

The Database

Because generating a query is a complicated and probably time-consuming process, it should be possible to save a completely generated query for reuse at a later point in time. This also makes it possible to define the request for particular dynamic pages manually. Consequently, a Database is required. For each form, the Database stores the three characteristic components extracted by the Parser: the destination URL, the type of transmission, and the fields.

A single page processed by the Parser may contain several interaction forms, but the Database receives them one by one. Each form is compared to the previously stored data. If a corresponding entry is found, the query can be compiled from the stored values. Together with the destination URL and the type of transmission, it is forwarded to the Harvester, which is then able to retrieve the dynamic document.
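The step of compiling a query from stored values could be sketched as follows. This is only an illustration under stated assumptions: the function name compile_request, the dictionary layout handed to the Harvester, and the example URL are hypothetical and not taken from the system described here.

```python
from urllib.parse import urlencode


def compile_request(url, method, stored_values):
    """Build a request description from a stored form entry.

    Hypothetical helper: `stored_values` maps field names to the
    values saved in the Database for this form.
    """
    query = urlencode(stored_values)  # URL-encode the name/value pairs
    if method.upper() == "GET":
        # GET: the query string is appended to the destination URL.
        separator = "&" if "?" in url else "?"
        return {"method": "GET", "url": url + separator + query, "body": None}
    # POST: the query travels in the request body instead.
    return {"method": "POST", "url": url, "body": query}


# Example entry (made-up values), as it might be handed to the Harvester.
request = compile_request(
    "http://example.org/search", "get", {"q": "web archive", "lang": "en"}
)
print(request["url"])  # http://example.org/search?q=web+archive&lang=en
```

The type of transmission decides where the encoded query ends up, which is one reason it has to be stored alongside the URL and the fields.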

When searching for an entry in the Database, the entry has to match the description extracted by the Parser exactly. First and foremost, the URL the query will be directed at must be the same. The type of transmission must match as well. The fields, as the third characteristic component of an interaction form, are examined too: the number of fields and their names must be the same. Yet there is still another feature of a form that must be considered.

The so-called hidden fields are a peculiarity. The user is not aware of these fields when viewing a page in a browser. Nevertheless, they must not simply be ignored, because their constant values can have a big impact on the result. Thus, when searching the existing entries, only forms that have the same hidden fields, in addition to the features already mentioned, can be considered a match.
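The matching criteria can be illustrated with a small sketch. It assumes, hypothetically, that the Database is a dictionary keyed by a form signature, and that hidden fields must agree in both name and value. The names Field, form_signature, and lookup are invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    name: str
    field_type: str = "text"  # e.g. "text", "hidden", "select"
    value: str = ""           # default or constant value found in the form


def form_signature(url, method, fields):
    """Reduce a parsed form to the features that must match exactly."""
    # Sorted tuple: keeps the number of fields, ignores their order.
    names = tuple(sorted(f.name for f in fields))
    # Assumption: hidden fields are compared by name AND constant value.
    hidden = tuple(sorted((f.name, f.value) for f in fields
                          if f.field_type == "hidden"))
    return (url, method.upper(), names, hidden)


def lookup(database, url, method, fields):
    """Return the stored values for a matching form, or None."""
    return database.get(form_signature(url, method, fields))


# A stored entry with made-up values.
database = {}
stored_form = [Field("q"), Field("lang", "hidden", "en")]
database[form_signature("http://example.org/search", "get", stored_form)] = {
    "q": "web archive", "lang": "en"}

same_form = [Field("lang", "hidden", "en"), Field("q")]
other_hidden = [Field("q"), Field("lang", "hidden", "de")]

print(lookup(database, "http://example.org/search", "GET", same_form))
print(lookup(database, "http://example.org/search", "GET", other_hidden))
```

In this sketch the first lookup finds the stored values, while the second returns None because the hidden field carries a different constant value.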

If the query corresponding to a given form is not in the Database, a proper query cannot be compiled that easily. In this case a "best guess" query may be created, that is, an attempt to build a meaningful query based on the information in the Database. The objective is to first interpret the meaning of the fields and then, based on the outcome, to assign values to them. To keep the interpretation and the assignment of values as separate steps, an abstraction layer is put between them: each field is assigned to a category. Therefore, a field's entry in the Database consists not only of its name, value, and type, but also of the category it belongs to.
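A minimal sketch of such a field entry and of the two-step "best guess" follows. The category labels, the stored entries, and the deliberately naive lookups are assumptions made for illustration only. They stand in for the Categoriser and Value-Select modules and do not describe how those modules actually work.

```python
from dataclasses import dataclass


@dataclass
class FieldEntry:
    name: str
    value: str
    field_type: str
    category: str  # abstraction layer between interpretation and value


# Made-up entries, as they might already exist in the Database.
known_entries = [
    FieldEntry("q", "web archive", "text", "search_term"),
    FieldEntry("query", "web archive", "text", "search_term"),
    FieldEntry("lang", "en", "select", "language"),
]


def guess_category(field_name, entries):
    """Step 1 (naive stand-in): reuse the category of a stored field
    that has the same name."""
    for entry in entries:
        if entry.name == field_name:
            return entry.category
    return None


def guess_value(category, entries):
    """Step 2 (naive stand-in): reuse a stored value of that category."""
    for entry in entries:
        if entry.category == category:
            return entry.value
    return None


def best_guess_query(field_names, entries):
    """Assign values only to fields whose category could be determined."""
    query = {}
    for name in field_names:
        category = guess_category(name, entries)
        if category is not None:
            value = guess_value(category, entries)
            if value is not None:
                query[name] = value
    return query


print(best_guess_query(["query", "lang", "session"], known_entries))
# {'query': 'web archive', 'lang': 'en'}
```

Because values are chosen per category rather than per field name, either step can be replaced without touching the other, which is the point of the abstraction layer.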

In short, when a new form is found, the category and the value of each field have to be determined. The former is handled by the Categoriser module (cf. Section 5.2.4); the assignment of values is done by the Value-Select module (cf. Section 5.2.5).


Andreas Aschenbrenner