Department of Software Technology
Vienna University of Technology


The SOMLib Digital Library - Motivation

Over the last few years we have witnessed an uninterrupted rise in the amount of information available in electronic form. While the size and availability of electronic collections have changed considerably, and in spite of highly sophisticated algorithms for searching these archives, ways of representing and interacting with those collections have not kept pace. Most information repositories still present themselves as varieties of lists of entries, ranging from filename listings and commented lists of documents to manually created hierarchies of pieces of information, which usually try to find one single place for every document in the collection. Searching these collections requires users to formulate their queries as Boolean expressions, specifying large numbers of keywords, synonyms and antonyms, which demands both knowledge of the problem domain and basic experience in query formulation. Results of queries are usually presented as long lists of (both relevant and irrelevant) retrieved documents sorted by some ranking criterion, with the large overall number of documents retrieved usually inhibiting efficient search. Information on the documents retrieved from a collection is at most presented as a rather long textual description of the available metadata.

On the other hand, taking a look at conventional libraries (which have a long history and thus had time to evolve and adapt to our needs) and the way we approach and query them, we find a completely different situation:

Libraries usually exhibit a clearly detectable structure by organizing books by topic into sections and shelves. This structure allows us to gain insight into the contents of the library as well as to get a rough overview of the amount of information available on specific topics. When entering a library or large book store, in spite of the overwhelming amount of information present in such locations, users usually manage to orient themselves and find the way to their section of interest quite easily. Without being able to read the titles of books from a distance, without actually knowing where to find a book by a specific author, or even without knowing the title or author of a book, most people are able to locate the respective sections when looking for a dictionary, a poetry collection or a story book for children.
Furthermore, the spatial organization has proven useful even for professional information researchers, who - while often remembering neither the title nor the author of a certain book - will often know exactly where to find information they came across some time ago. This also relates to the long-running discussion about getting lost in hyperspace, a problem which may be traced back to a lack of constancy and of an understandable spatial organization.

Searching a library can take several forms. First, you might start browsing from the entrance via different floors to a specific section and shelf, which is then searched entry by entry. Note that most libraries have a map at the entrance, giving an overview of which topics may be found in which section. Second, you might search the keyword, author and title catalogues. Third, you might ask a librarian to help you find the requested pieces of information by giving a rough idea of the desired book. The outcome of such an inquiry is usually not only a list of titles or a pile of books, but also includes some recommendations based on the experience of the librarian. Furthermore, locating one book in the library usually leaves you, due to the topical structure, with several other relevant ones nearby.

Once you find the corresponding shelf, scanning the books on it usually lets you tell the age of a book, how often it has been used before (at least in a public library rather than in a bookstore), as well as the amount and type of information to be expected in the books, simply by looking at them. The cover of the book, the title, the type of binding, the condition of the binding (brand new versus well-thumbed and almost torn apart), the size of the book, its color and other properties of an item on the shelf contain a wealth of information that most people are accustomed to and able to interpret intuitively. Thus, it is easy for us to gain an overview of the contents of a library, the type of information present, how many copies of a specific title can be found, etc. All these features make orientation rather easy in spite of the wealth of information present.

Thus, we find conventional libraries and article collections in some aspects very well suited for the task they are intended to serve, whereas in other aspects digital libraries undoubtedly offer more possibilities. Adopting these characteristics of conventional libraries for electronic media, so as to combine the benefits of the evolved structures of conventional systems with the benefits of digital systems, has proven to be difficult. This is partially due to the sheer growth in the amount of information. Reading and manually classifying all entries in an information repository to create an order similar to the one found in conventional libraries proves to be a Sisyphean struggle, as does searching and browsing these huge collections. The SOMLib system aims to bridge this gap by providing an automatic spatial organization of documents by content, similar to that of conventional systems, combined with metaphorical representations of metadata.


Comments: rauber@ifs.tuwien.ac.at