The Relationship of Scholars to Libraries in the Digital Age:
The Challenge of Preserving and Making Available Digital Information
Library of Congress
October 1, 2003
Scholars have the expectation that accumulated knowledge and the information resources that are used to create knowledge will be maintained by libraries. That expectation has been met reasonably well in the world of print on paper, but as we enter the digital age, the contract that libraries have with society is difficult to honor.
Scholarship is cumulative, and libraries have always been integral components of universities and research laboratories whether as hospitable work environments, sources of expertise on the location and use of information, or collections of current and historical data. But despite the promises of digital technology, its speed, the seemingly boundless capacity of a silicon chip to store information, and so on, we are finding that the new digital technologies are as much friend as foe to long term research and its storage and preservation. The amount of information we create on our computers plus the information to which we can have access through the networks is growing exponentially, leaving us with a quantity of data to organize, archive and preserve that we have never coped with before. And even if we could solve the well-known technical problems of media and signal degradation and hardware and software obsolescence, we have yet to solve the very grave challenges of searching and retrieving relevant information in the almost unfathomable quantities of heterogeneous information that we are creating.
Experts in and outside of Government are still looking for the best solution to the problem. But clearly one thing that is not the solution is merely and mindlessly to save every tidbit of data generated. Indeed, our much-envied research libraries – Harvard, Stanford, the University of Illinois and others – manage library collections that are useful to scholars, students and the public precisely because they were shaped by men and women who used their critical judgment and subject expertise to select items that would be of value. "Save everything" is not now the rule in Government either. At most, only five percent of all Government records are selected for permanent retention. Those who decide what to keep are professional archivists who work to legally sanctioned guidelines. These guidelines are periodically reviewed to prevent partisan opinions from compromising the intellectual audit trail of state and Federal governments. Even individuals don't save every scrap of paper in their files -- tax records, restaurant receipts, dry-cleaning tickets -- and deed them to their descendants with the injunction that they be kept forever. That would be irresponsible, some might even say self-centered and lazy. We all employ judgment and selection within a framework of expectations, be those driven by the Internal Revenue Service or some vague notion of “history.” Those editorial skills will become no less important in the digital era.
As a nation, where full citizenship is based on free and unfettered access to public information, we must take responsibility in shaping our digital legacy. As James Madison wrote: "A popular Government, without popular information, or the means of acquiring it, is but a prologue to a Farce or a Tragedy, or, perhaps, both. Knowledge will forever govern ignorance." Madison, of course, was worried about censorship. In our age, when too much information can numb the mind and paralyze the will, knowledge may develop a new sort of elusiveness.
But whose business is this anyway? If we can’t save everything, then how do we go about better understanding what is important to save?
Part of the answer to these questions lies in understanding the scope of the problem and the nature of the challenges that face us.
The Digital Problem; the Digital Challenge
Since the mid-1990s, we’ve known that digital preservation is a technical, legal, and organizational problem. Storage media degrade; signals decay; and the configuration of hardware and software required to render and display stored data becomes obsolete. Try running Microsoft Word 2.0 -- or any pre-1990 software -- on your Windows XP system and let me know how it goes. Moreover, given the networks, it is an international problem – a facet that you, as researchers, know well. Research thrives in an atmosphere of free exchange, hence the notion of pre-competitive research. Barriers thrown up by different copyright, database, patent, and trademark regimes, while perhaps relevant to economic competitiveness, are also antithetical to an intellectual environment conducive to discovery and innovation.
This conundrum is by no means unique to digital information but it obtains scale and scope precisely because of digital information, in particular, because of digital communications systems that enable informal e-mail, chat and listservs among skilled practitioners and investigators and then automatically create a record of the communications. The library in your lab or university may not be called upon to referee what is and is not permissible speech. But if an archive of e-mail is turned over to the university’s library, then the library has the very practical problem not only of saving the “bits,” meaning preserving the ability for future generations to display the material, but also of determining when, by whom, and under what conditions the material may be retrieved and shared. To be sure, manuscript curators have faced these problems, but these were relatively specialized in the context of the library’s collections. They are no longer, and manuscript curators certainly did not face issues of encryption or of licensing even to get at the material for which they were responsible.
Technology offers us tools to manage intellectual property issues and is likely to offer us more options as research flourishes. However, these tools must be deployed in organizational contexts. As the well-known examples from intellectual property law show us, the economic, social and organizational contexts are likely to pose even greater challenges than preserving the material itself. Indeed, one of the most surprising conclusions from a series of interviews, white papers and stakeholder meetings conducted in summer and fall of 2001 under the leadership of the Library of Congress’s National Digital Information Infrastructure and Preservation Program (NDIIPP) was precisely that: the technological dimensions of long term preservation of digital content are difficult, but they are not the most difficult dimensions of the preservation problem and the solutions to issues of long term digital preservation are not purely technological.
As many of you may know, legislation passed in December 2000 charged the Library of Congress to lead a national effort to develop an infrastructure to support long term preservation of so-called “born digital” content, that is, material created in digital form for which there is no analog equivalent and which is experienced by the user as seamless digital information. Obvious examples are Web sites that have an interface and “serve up” data from distributed databases or interactive online games that might appear differently to different users at different times. The intent of the background research, interviews and planning that took place in 2001 and 2002 was to define the basic issues while illuminating the concerns brought by the library, preservation, and archival communities as well as those of the content creator and distributor communities -- publishers, authors, studios, professional associations like the Recording Industry Association of America, the Motion Picture Association of America, and the Association of American Publishers, and so on.
Not surprisingly, different communities placed emphasis on different problems but there was surprising consensus on the fundamentals:
1. Technology: The basic issues are well known: storage media degrade, and the signals encoded thereon also degrade but at a different rate; software and hardware obsolesce, and preserving equipment is not a viable long term solution. Research is underway to examine these issues – how long does a DVD last, for example? But there is less consensus about strategies such as migration versus emulation, about the scope and structure of metadata, and about architectures, although there was firm consensus on the need for some sort of distributed environment. This, of course, immediately raises the organizational question: whose responsibility for what? And how will one node in a distributed system “know” what other nodes have?
Although this view of collecting and lending policies is mistaken, I do believe that the underlying faith in the American library system revealed by these opinions is wonderful. It speaks to our belief in the democratizing effects of knowledge and the cultural role libraries play in furthering that democratization. Now we have to find a way to protect that role as democratic stewards while ensuring fairness and balance among all interested parties. It is not our role alone -- we will need many partners -- but clearly the public looks to libraries for leadership.
What can scholars expect from their libraries in the digital age? It should be obvious by now that long term preservation of digital content requires participation from many individuals and entities at many steps in the process -- from decisions by individuals in government agencies, for example, about weeding their e-mail files, to software designers who must come up with systems that enable users to save relevant information with the click of a mouse, to managers who must deploy these systems within coherent policies that their employees understand. Government information is a relatively well-structured example. Libraries and archives that deal with even more heterogeneous information -- everything from diaries to commercial motion pictures and so-called “indie” productions -- face an even more challenging set of questions about collections and use policies. Most librarians would agree that trying to gauge the future expectations of users 50 or 100 years hence is both necessary and problematic but key to working out a reasonable set of policies.
To understand what digital assets are being collected and archived, the Council on Library and Information Resources (CLIR), which played a pivotal role in the background studies for NDIIPP in 2001 and 2002, commissioned a series of investigations into how scholars use digital information and what institutions were doing to provide and protect access to that information. I was CLIR’s president at the time.
Our findings were simultaneously disturbing and encouraging. First, the bad news:
A survey of 33 North American initiatives representing a cross-section of the cultural community, from performing arts organizations and scholarly and library associations to publishing groups and standards initiatives, revealed a diverse array of missions, programs, services, and products and equally diverse – and frequently fragile – organizational structures with staff sizes that range from a few volunteers to large groups of paid professionals.
Philanthropic foundations are the largest source of financial support for digital cultural initiatives, followed by membership fees and by grants from federal, state, municipal, and other local public agencies. But philanthropic support is unstable. The Andrew W. Mellon Foundation, a leading financial contributor, regards support for digital cultural heritage initiatives as a by-product of its mission to support scholarship. The Institute of Museum and Library Services (IMLS), the only federal agency with statutory authority to support digitization, does view such support as its mandate. However, funders evaluate sustainability among recipients and include demonstration of demand and institutional support and benchmarks to ensure accessibility over time.
Many of the initiatives surveyed in fact have not yet reached a sustainable state, and recent economic developments contribute to their tenuous status. Collaborative efforts are increasingly rare and projects observed an alarming trend among foundations to discontinue arts funding. Commercial ventures present one opportunity but may jeopardize their nonprofit purpose and status.
Many cultural organizations fail to treat digital cultural heritage projects as a permanent part of their operations. This tentativeness results in inadequate financial resources, a lack of long-term planning, and huge burdens on staff, and is exacerbated by an absence of community-wide preservation and archiving standards and management policies.
Now for the good news:
Among the projects surveyed by CLIR there was recognition of the problem and some consensus about solutions, which converge with solutions proposed by NDIIPP and other international preservation initiatives. Indeed, the existence of these national initiatives is the second bit of good news. Initiatives at the National Library of the Netherlands, in the Scandinavian countries, at the Bibliothèque Nationale in France, at the British Library in the UK, and at the national libraries of Australia and New Zealand, as well as NDIIPP, programs at the National Library of Medicine and the National Agricultural Library, and the electronic records program at the National Archives, are making headway at several levels and, most importantly, are aware of concurrent work. These admittedly nascent interactions represent the third bit of good news.
Whether talking to representatives of individual initiatives while at CLIR or to leaders of other national libraries in my present position as Associate Librarian of Congress, I hear very consistently that collaboration and cooperation are essential to long term digital preservation. You will recall that one of the implications of distributed systems was understanding distributed responsibility for these systems and hence communication and cooperation among them. For some of you, that may mean the very hard questions of technical interoperability; for others (and I include myself in this category), it means the organizational challenges of partitioning responsibility for building collections, for sharing information about those collections, and ultimately for sharing the contents of those collections – again, while balancing other important concerns like copyright and integrity of documents and their display.
Increasingly, the library and archives communities are coming to terms with the notion of distributed responsibility for building collections and for long term stewardship of those collections. In addition to increasing awareness of preservation, we are also beginning to come up with at least preliminary standards and guides to good practice so that we can begin the important work of saving data at least in a primitive way while computer scientists, engineers, and behavioral scientists conduct their research. Colin Webb, from the National Library of Australia, is preparing an important new document for submission to the General Conference of UNESCO, the United Nations Educational, Scientific and Cultural Organization, “Guidelines for the Preservation of Digital Heritage”. Four points deserve particular emphasis today:
1. Digital preservation will happen only if organizations and individuals accept responsibility for it. The starting point for action is a decision about responsibility.
2. Everyone does not have to do everything; everything does not have to be done all at once.
3. Preservation action should not be delayed until a single ‘digital preservation standard’ appears.
4. It is reasonable for programs to choose multiple strategies for preserving access, especially to diverse collections. They should consider the potential benefits of maintaining the original data streams of materials as well as any modified versions, as insurance against the failure of still-uncertain strategies.
Challenges for the Future
Probably the biggest challenge to preserving the burgeoning amount of digital information that is being created today is agreeing that this is a problem that cannot be addressed by libraries and archives alone: it is a problem that is shared by all who create and use digital information – technologists, scientists in academe and R&D labs, scholars in the arts and humanities, historical societies that have traditionally looked after local history and genealogy, and the state and national libraries and archives with mandates of national and international scope.
We can certainly articulate many of the key challenges and go to work on solutions. What are they? Well, you can probably guess:
Copyright, rights ownership and management: As we acquire content protected by copyright we must be able to preserve it, which includes copying it across platforms and media, and we must understand how it might be made available for use.
Selection: This is generally understood by librarians as collection development policy. We need to think about and implement models for collaboration. To do this, we also need to understand how digital information is being used, what is important (of enduring value), and where responsibility for harvesting, cataloging, organizing, and maintaining content is shared.
Technology: Solutions that we already use in very small collections need to be developed and tested and scaled up to meet future needs. Some of the specifics are passive and active capture methods, harvesting, documentation, metadata, and interoperability standards, registries of digital collections and formats (current and obsolete), agreement on what constitutes a “trusted digital repository”, agreement on what constitutes good life cycle management of digital content, and establishment of pathways for delivering content including sounds and moving images. There is very good work underway in many of these topics and that good work should be fostered and shared.
More trusted partners are needed who are willing to share collecting and administrative responsibility (and costs) for the long term preservation of content. The academic and research libraries have begun to evolve models for collaboration, and the NDIIPP holds great promise for developing a sound infrastructure for preservation. But the results of these efforts will need to be tested.
But sustainable digital preservation is much more than a technical challenge; it is a cultural challenge that requires scholars, among others, to recommit themselves to the archival enterprise. It also requires that we build new communities of stakeholders who have a vested interest in seeing that what we create today is available for our children and our children’s children. Of no little importance will be our need to build and strengthen the social capital in our institutions to bring together the librarians, the scholars, the IT experts, and the computer scientists. As our good friend Benjamin Franklin said at the birth of the Republic, “We must all hang together, or we will most assuredly hang separately.”