
Introduction

Information and communication technology has not merely influenced our daily lives; it has become an integral part of our society. Digital processing has permeated industry, scholarly research, and communication. It has created a new economy, brought forth new services, and produced a new information medium, the Internet. An enormous number of benefits have emerged from the information infrastructure, and the revolution is far from over. "The paradigmatic shifts resulting from the introduction of new and evolving technologies will almost certainly continue well into the 21st century." [Rus99] We are in the midst of a process that is altering civilisation dramatically.

Being a major achievement of technology as well as a reason for its advance, the Internet embodies progress. Originally reserved for a privileged few, it has become a critical element of the public communications infrastructure. Moreover, the Internet is not just a means of communication like the postal service or the telephone. It surpasses traditional media such as books, radio, and television, as it combines all of them and goes beyond them in functionality. Having entered everyday life, it is growing at an incredible rate: international Internet backbones registered a combined growth of 382 percent in 2000 [Tel01].

As much of the potential of the digital age remains undiscovered, it is hard to predict how society is going to incorporate the new possibilities. Yet analogies can be drawn to the evolution that other media underwent. The revolution taking place today can indeed be compared to the impact of Gutenberg's invention of the printing press. Appealing as this development may be, these parallels also give rise to concern.

Only fragments of early writings are still available. Printed books have deteriorated beyond readability. Many of the oldest television broadcasts were live and, thus, not preserved. Recordings were often erased so that the videotape they occupied could be reused. The loss of early media can be traced in paper, film, and photography, as well as in the early days of radio and television [LB98]. Digital materials are now in danger of suffering a similar fate. The average lifetime of a document has become relatively short. Data on the Internet is alarmingly volatile [Ger00]. The early days of the Internet have already faded away. Yet humankind shows that it is indeed capable of learning from its past. Initiatives have been launched to prevent further loss by archiving the Internet for future generations.

Nevertheless, the quality of the material disseminated on the Internet is contestable. In fact, a high percentage of the available data may be considered pure junk, useless, or even misleading information. Still, these artefacts could be an important source for research, as the example of old newspapers shows. Scientists consider the advertisements and obituaries they contain very interesting, in fact more interesting than the plain information in the articles.

As the potential of the new technology is explored, new forms of information representation emerge. The expressive power of hypertext documents with their non-linear link structure, as well as multimedia documents integrating video, sound, and interactive components, cannot be adequately represented in traditional forms. The network formed by cross-referenced documents offers another dimension unmatched in conventional media [RA01].

In addition, the value waiting to be exploited on the Web goes far deeper than information representation alone. The revolutionary innovation of the new medium is its interactivity. Formerly a passive consumer of information, the user is now offered the possibility to participate actively in the creation of the information space, be it by contributing to existing web pages via discussion forums, or even by creating his or her own home page.

On-line, groups of users sharing the same interest gather to communicate and exchange information, which adds a social component to the Web. The meeting of people from various countries with differing cultures and backgrounds in these on-line communities opens up a previously unimagined potential, a driving force for development that goes beyond economically coined expressions like "globalisation". Free from the disadvantages of location and distance, Cyberspace has become like a big city. In fact, people call their own pages "home" [Neg96].

For these reasons, the Internet, as a mirror of society, is an important source for research. Scholars from various fields, such as sociology, history, linguistics, and many more, have already underlined the importance of documenting its development. It is therefore essential to build archives that treasure our digital cultural heritage. The evolution of the Internet reflects the development of society and, hence, should be preserved.

Numerous challenges have to be addressed when archiving digital material, specifically when the source is the Internet. These range from the acquisition of the documents to their storage and preservation, and to the provision of access. First of all, the need to capture the characteristics of the Web, i.e. to obtain the documents, their content, and their look and feel, as well as their role within the larger network of interlinked information, poses serious challenges. The vast amount of material available on the Web, its decentralised organisation, and the volatility of the data all call for carefully designed methods of data acquisition if the goals of building a digital archive are to be met.

Furthermore, a suitable infrastructure has to be established for storing and managing the huge masses of data obtained. Access provision poses additional challenges. Resource discovery in such a large repository demands a sound organisational framework. Beyond this, further services are conceivable that improve usability and thereby facilitate the exploitation of the wealth of information to be found in the collections. Depending on the needs and goals of the archive's users, multiple ways of accessing and analysing the available data can be provided.

Yet not only technical issues must be considered. Finances are, of course, a concern for any organisation, and a long-term project of this kind needs to address them thoroughly. At present, most countries do not provide a legal framework backing the creation of an Internet archive, since the consequences this entails have not yet been sufficiently explored. Consequently, great sensitivity has to be exercised in a project that touches on copyright as well as other legal issues.

Creating the collections does not guarantee that future generations will be able to benefit from the treasured information. Digital material is in imminent danger of becoming unusable due to the phenomenal rate at which technology evolves. While it offers undeniable possibilities, the speed with which technology advances is both one of its great strengths and its most dangerous weakness. New hardware superseding the old, or a new version of software, could cause loss of information from one day to the next.

Our blooming digital culture is heading for oblivion. Worse than hardly maintainable constructs of legacy data, information increasingly disappears into a digital gap. Several cases can be pinpointed in which valuable "born digital" documents were lost and cannot be recovered. Dismayingly, our current time may be considered a "digital dark age" [Bra99], for there has never been a time of such drastic and irretrievable information loss.

Under these circumstances, any information service provider is prompted to care for the long-term preservation of its resources. Of course, this is the case for libraries, which are increasingly extending their scope to the digital domain as they retain digital documents as supplements to and parallels of print materials. Yet it concerns any other institution that stores and maintains digital material to the same extent. Whether public or private, governmental, commercial, a charitable society, or any other kind of organisation, all have to be aware of the importance of guaranteeing the permanent preservation of their digital resources. NASA has lost up to 20 percent of the information collected during the 1976 Viking mission to Mars, since the data was trapped on decaying digital magnetic tape [Ste98]. "From 1976 through 1979, the National Archives worked on recovering certain 1960 census data from tapes designed to run on long-obsolete machines." [Man98] Ensuring that digital documents retain their functionality beyond another cycle in the development of technology is, hence, an issue for the whole digital community.

In the following, we take a closer look at the challenges of archiving digital information. Chapter 2 discusses these challenges in detail. Other initiatives in this field are presented in Chapter 3. To convey a more tangible view, we introduce the Austrian On-Line Archive in Chapter 4. Thereafter, Chapter 5 tackles a so far unsolved challenge, the automatic retrieval of interactive documents, with a suggested solution and experiments with a prototype that has been implemented. Chapter 6 reviews our own experiences and concludes this thesis with the lessons learned.


Andreas Aschenbrenner