My PhD research is on designing a generic model for semantic multimodal information retrieval, named Astera. Finding useful information in large multimodal document collections such as the WWW is one of the major challenges of Information Retrieval (IR). The many sources of information now available - text, images, audio, video, and more - increase the need for multimodal search. Equally important is the recognition that each information item is inherently multimodal (i.e., it has aspects of its information character that stem from different modalities) and forms part of a networked set of related information items.
In Astera, I model multimodal domain-specific collections (such as music, patent, or medical collections) with the help of different relation types, and enrich the available data by extracting inherent information in the form of facets. The model is being tested on the ImageCLEF 2011 multimodal collection.
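As a rough illustration of this kind of data model (the class and field names below are hypothetical, not Astera's actual implementation), each information item can carry extracted facets and typed relations to other items:

```python
# Hypothetical sketch: items enriched with facets and linked by typed relations.
from dataclasses import dataclass, field

@dataclass
class Item:
    item_id: str
    modality: str                                   # e.g. "text", "image"
    facets: dict = field(default_factory=dict)      # extracted facet -> value
    relations: list = field(default_factory=list)   # (relation_type, target_id)

# Example: a patent document linked to one of its figures (illustrative data).
patent = Item("pat-123", "text", facets={"ipc_class": "G06F"})
figure = Item("img-1", "image", facets={"color": "grayscale"})
patent.relations.append(("has_figure", figure.item_id))
```

Typed relations of this form are what make the collection a graph that later retrieval steps can traverse.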
As shown in Figure 1, we use a hybrid retrieval method that consists of two steps: 1) In the first step, we perform an initial search with Lucene and use the resulting list as a starting point. 2) In the second step, using this first set of data objects as seeds, we traverse the weighted edges of the graph from these initiating points for a number of steps. We perform spreading activation and, at the end, recompute the ranking based on the activation values that nodes receive via propagation.
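The second step above can be sketched as follows. This is a minimal, generic spreading-activation routine, not Astera's actual code: the graph structure, seed activations (standing in for Lucene scores), step count, and decay factor are all illustrative assumptions.

```python
def spreading_activation(graph, seeds, steps=2, decay=0.5):
    """Propagate activation from seed nodes over weighted edges.

    graph: {node: [(neighbor, edge_weight), ...]}
    seeds: {node: initial_activation} (e.g. scores from an initial search)
    Returns nodes ranked by accumulated activation, highest first.
    """
    activation = dict(seeds)   # total activation accumulated per node
    frontier = dict(seeds)     # activation newly received in the last step
    for _ in range(steps):
        next_frontier = {}
        for node, act in frontier.items():
            for neighbor, weight in graph.get(node, []):
                # activation decays as it spreads along weighted edges
                spread = act * weight * decay
                next_frontier[neighbor] = next_frontier.get(neighbor, 0.0) + spread
        for node, act in next_frontier.items():
            activation[node] = activation.get(node, 0.0) + act
        frontier = next_frontier
    return sorted(activation.items(), key=lambda kv: -kv[1])

# Toy example: "d1" is the only seed; activation reaches "d4" in two steps.
graph = {"d1": [("d2", 0.8), ("d3", 0.4)], "d2": [("d4", 0.9)]}
ranking = spreading_activation(graph, {"d1": 1.0})
```

In this toy run the seed keeps the highest score and indirectly connected nodes receive progressively less activation, which is the reranking effect the hybrid method relies on.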