Next: Emulation Up: Digital Preservation Previous: Technology preservation   Contents


Conversion and standard formats

Conversion (see footnote 6) describes the periodic transfer of digital material to an up-to-date system configuration. Refreshing merely copies the data unchanged to new storage media to protect it from physical decay (cf. Section 2.3.2). Conversion, by contrast, changes the format of the digital material so that it can be interpreted by the software currently in use. For example, HTML or PostScript files might be converted to Adobe's Portable Document Format (PDF). In this way, the accessibility of the digital objects is maintained from one generation of computer technology to the next.
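The difference between refreshing and Conversion can be illustrated with a minimal Python sketch. It is only an analogy under stated assumptions: a change of character encoding stands in for a change of file format, and the function names `refresh`, `convert` and `digest` are hypothetical, not taken from any preservation tool.

```python
import hashlib


def refresh(data: bytes) -> bytes:
    """Refreshing: copy the bit stream unchanged (e.g. to new media)."""
    return bytes(data)


def convert(data: bytes, old_encoding: str, new_encoding: str) -> bytes:
    """Conversion (illustrative stand-in): re-express the same content
    in a different representation, here a different text encoding."""
    return data.decode(old_encoding).encode(new_encoding)


def digest(data: bytes) -> str:
    """SHA-256 checksum, used to check whether the bit stream changed."""
    return hashlib.sha256(data).hexdigest()


# Assumed sample content containing one non-ASCII character.
original = "Archivgeb\u00e4ude".encode("latin-1")

refreshed = refresh(original)
converted = convert(original, "latin-1", "utf-8")

# Refreshing preserves the exact bit stream; Conversion does not.
print(digest(refreshed) == digest(original))   # True
print(digest(converted) == digest(original))   # False

# The converted content is still readable with the newer representation.
print(converted.decode("utf-8") == original.decode("latin-1"))  # True
```

In this toy case the content survives intact. Real format conversions, as discussed in the following paragraphs, may not be this benign.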

Altering a document from the form in which it was created obviously changes its original form and compromises its authenticity. Conversion to a format with different functionality may cause an irreparable loss of information. For documents in rare formats, or documents whose functionality depends on a specific characteristic of a data type that cannot be converted, this can mean the total loss of the document. This could, for instance, be the case with particular forms of interactive art. Yet the authentic look of a document is likely to be altered even by a minor step to an only slightly different data format. Consider, for example, commonly used Winword or Excel files: merely upgrading to a newer version of the application risks altering the layout, or even the functionality, of documents.

The work required to develop and install a conversion must also be considered. It is a labour-intensive, time-consuming and error-prone process that must be repeated very often, since every type of document and every new data format requires its own solution. In fact, particularly complex conversions could ultimately lead to the abandonment of a whole array of documents [Rot99].

However, this considerable effort can be reduced by adhering to a small number of standard formats. Typical standard formats are, e.g., ASCII for text, TIFF for images, or PostScript for the presentation of layout. Although a large variety of data types exists, the number of different formats within the archive can be kept relatively small. This entails immediately converting each document that is newly added to the archive into a prevalent format, or perhaps into two formats that offer different functionality and hence complement each other (e.g. ASCII and PostScript). A very sophisticated animation could, for example, be retained as a series of screen-shots.

As a result, fewer converters are needed in any cycle of conversion, since fewer types requiring a specialised solution are held in the archive. Furthermore, it is very likely that converters from a then obsolete standard format to a new, superior standard format will be available, because of the tremendous number of files existing in any standard file format, all of which need to be converted. Needing only a few converters, and having the prospect that those will be available, facilitates the process of Conversion, at the price of accepting some loss of information.
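The effect of normalising on ingest can be sketched in a few lines of Python. The mapping `STANDARD_FORMAT_FOR` and the sample list of incoming formats are hypothetical and purely illustrative; the sketch also assumes, as a simplification, that each distinct format held in the archive needs exactly one converter per cycle.

```python
# Hypothetical ingest policy: every incoming format is mapped to one of
# the standard formats named in the text (ASCII, TIFF, PostScript).
# Some of these mappings would lose information in practice.
STANDARD_FORMAT_FOR = {
    "txt": "ascii",
    "html": "ascii",
    "gif": "tiff",
    "bmp": "tiff",
    "doc": "postscript",
    "xls": "postscript",
}


def formats_in_archive(incoming, normalise):
    """Return the set of distinct formats the archive has to maintain."""
    if normalise:
        return {STANDARD_FORMAT_FOR[fmt] for fmt in incoming}
    return set(incoming)


def converters_needed(incoming, normalise):
    """Simplifying assumption: one converter per distinct format per cycle."""
    return len(formats_in_archive(incoming, normalise))


# Assumed sample of formats of newly submitted documents.
incoming = ["txt", "html", "gif", "bmp", "doc", "xls", "doc", "gif"]

print(converters_needed(incoming, normalise=False))  # 6
print(converters_needed(incoming, normalise=True))   # 3
```

The saving grows with the variety of incoming formats, while the number of standard formats, and hence of converters per cycle, stays fixed.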

Another aspect to be considered is proprietary data formats. These create a dependency on the holders of the format, which not only limits the availability of access software but could also raise legal issues. This is avoided by adhering to open formats, which also substantially facilitate the development of converters.

An obvious advantage of Conversion is the fast accessibility of a collection item. Since it will be in a prevailing standard format at any point in time, the document can be viewed with a then up-to-date system. In most cases, the conversion of digital material suffices to convey an impression of its original form. It is a viable solution; in fact, it is already in use today, as people update their documents when changing to a new computer or merely to a new software version.

However, an immediate deficiency of the strategy is that its consequences cannot be predicted. It can hardly be foreseen when, let alone how often, these cycles of conversion will have to be carried out. Standards might, in fact, turn out to be very short-lived in the digital environment. Also, each successive cycle will demand a new solution for every single conversion, deriving little or no benefit or cost savings from previous cycles [Rot99]. Costs therefore accumulate over time and increase with the size of the archive.
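How such costs add up can be made concrete with a deliberately simple model. The linear cost structure, the parameter names and all figures below are assumptions chosen for illustration; they are not taken from [Rot99] or from any measured data.

```python
def total_conversion_cost(cycles, n_formats, n_documents,
                          converter_cost, per_document_cost):
    """Assumed linear model: in every cycle a new converter is built for
    each format (no reuse from earlier cycles), and every document in
    the archive is processed once."""
    cost_per_cycle = (n_formats * converter_cost
                      + n_documents * per_document_cost)
    return cycles * cost_per_cycle


# Arbitrary cost units; all values are hypothetical.
print(total_conversion_cost(cycles=3, n_formats=6, n_documents=1000,
                            converter_cost=100, per_document_cost=1))  # 4800
print(total_conversion_cost(cycles=3, n_formats=3, n_documents=1000,
                            converter_cost=100, per_document_cost=1))  # 3900
```

Under these assumptions, the cost grows with both the number of cycles and the size of the archive, and reducing the number of formats lowers only the converter share of it.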

The points of criticism can hence be summarised as follows: Conversion is labour-intensive, time-consuming, expensive, risky and non-scalable, and it corrupts the original form of the document, its look and feel. There is some justice in each of these points, yet not all of them apply with equal force. After all, digital preservation is a substantial problem, the magnitude of which cannot be ascertained in the near future. Therefore, the labour, time, and money to be invested in the strategy of Conversion might be justifiable.

Yet, bearing in mind that there is no other practicable technical solution for the time being, this is a viable approach, at least in the short run. Accessibility can be attained at relatively low cost, even for documents in rather obscure formats. How severely the original form is corrupted depends, of course, on the documents involved. Even if their authenticity is affected in the process of Conversion, the result might, in fact, suffice to serve the needs of the user.



Footnotes

6. Conversion is often (misleadingly) referred to as Migration in the literature. This usage often leads to confusion with the migration (i.e. identical copying) of data from one physical storage medium to another, as described in Section 2.3.2.

Andreas Aschenbrenner