The type of storage media to employ has to be chosen carefully. Conflicting goals, such as high availability and accessibility on the one hand and low system cost on the other, have to be reconciled with the archive's purpose in mind. Above all, the storage media must be reliable and durable, since the data stored in the archive is meant to be valued in the distant future. Yet not only do the physical media deteriorate over time; entire technological standards are in danger of becoming obsolete.
By definition, any such archive accumulates more and more documents and thus retains massive amounts of data. The repository must not only provide ample storage from the very beginning, but should also be scalable to meet future demands. The Internet Archive, for example, stored more than 43 terabytes of data in October 2001, a volume that increases continuously as the information made available on the Internet grows at an exponential rate3. Storage solutions of this scale that still offer timely access are hard to find.
Magnetic tape is a storage medium that offers large capacity. However, its accessibility is very limited, even if the tapes do not have to be located and loaded into the drive manually. A robotic device that mounts and reads tapes, a so-called jukebox, manages the tapes largely automatically. At the same time, however, installing one is a major investment. Any other kind of removable storage media faces similar limitations. Still, magnetic tapes offer a solution at a comparatively low price.
Hard disks, by contrast, provide acceptable access times. To keep costs in check, the hardware must be tailored to mass storage. This can be achieved by putting many disks into a single computer, which is best done using specialised rack-mount systems. Clustering ordinary desktop machines, however, offers an even cheaper solution [Ale01]. The lower cost of this hardware, compared with the equipment needed for rack-mount systems, compensates for the larger number of machines required. Since each desktop machine is in principle an autonomous computer, data availability increases at the same time.
Hierarchical Storage Management systems [Hun01] attempt to combine the advantages of both solutions, tapes and hard disks. They exploit the fact that files are accessed with different frequencies: while some are used often, the bulk remains largely untouched. A caching scheme is therefore applied. Off-line media (e.g. magnetic tape) or near-line media (e.g. tape jukeboxes) store the whole collection. They are combined with fast hard-disk storage, which temporarily holds a small portion of the data, namely the frequently accessed files.
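The caching idea behind such a hierarchy can be illustrated with a small sketch. It is not taken from [Hun01]; it assumes a least-recently-used (LRU) eviction policy, which the text does not prescribe, and it models the tape archive and the disk cache as in-memory dictionaries. The names `TieredStore`, `disk_capacity` and `tape_reads` are hypothetical and chosen for illustration only.

```python
from collections import OrderedDict


class TieredStore:
    """Toy two-tier store: a small LRU 'disk' cache in front of a slow 'tape' archive.

    Assumption: eviction is least-recently-used; real systems may use other policies.
    """

    def __init__(self, tape, disk_capacity):
        self.tape = tape                    # dict name -> bytes, holds the whole collection
        self.disk = OrderedDict()           # cached files, least recently used first
        self.disk_capacity = disk_capacity  # maximum number of files kept on disk
        self.tape_reads = 0                 # counts slow accesses to the tape tier

    def read(self, name):
        if name in self.disk:
            # Fast path: file is on disk; mark it as most recently used.
            self.disk.move_to_end(name)
            return self.disk[name]
        # Slow path: fetch from tape (raises KeyError if the file is not archived).
        data = self.tape[name]
        self.tape_reads += 1
        self.disk[name] = data
        if len(self.disk) > self.disk_capacity:
            # Disk is full: evict the least recently used file.
            self.disk.popitem(last=False)
        return data


if __name__ == "__main__":
    tape = {f"doc{i}": f"content {i}".encode() for i in range(5)}
    store = TieredStore(tape, disk_capacity=2)
    for name in ["doc0", "doc1", "doc0", "doc2", "doc0", "doc1"]:
        store.read(name)
    print(store.tape_reads, list(store.disk))  # prints: 4 ['doc0', 'doc1']
```

In this run, the frequently requested `doc0` is fetched from tape only once and served from disk afterwards, whereas rarely used files are evicted and must be fetched again. A production system would additionally have to consider file sizes, tape mount latency and write-back, none of which is modelled here.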
Besides magnetic media in the form of tapes, optical media such as CD-ROMs could be considered as the primary storage media. Since this technology allows random access to the stored data, retrieval is considerably faster than with tapes, which must be wound to the requested location. Yet for archives with high demands on storage space, tapes offer more capacity. It remains to be seen, however, whether advances in technology will change this, as the development of the DVD promises to.
Most archives originally started out with a tape-based solution. More and more projects, however, are now moving towards disk-based storage systems, with tapes used as additional backups.