Next: Interpreting Fields Separately Up: The Prototype Previous: Comparison-Algorithm   Contents


Categorisation

This module is consulted every time a new form is encountered. Its task is to determine what a form signifies by assigning a category to each of its fields. Scrutinising the whole document just to interpret an interaction form on it is hardly promising: keyword searches and login interfaces appear on pages that look entirely different from one another. The interpretation therefore focuses on the form itself.

The categorisation proceeds in two steps. In the first step, each field is assigned its possible categories separately, using the information provided by the Parser. The second step takes into account that a form is more than a set of fields interpreted in isolation. For example, a single form is very unlikely to contain several fields of the same category, and fields expecting an address tend to occur together with one for the person's name. The context given by neighbouring fields therefore has to be accounted for. The implicit information available in the Database lends itself to this purpose, so the entry most similar to the new form has to be found in the Database.
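
The two steps can be illustrated with a minimal Python sketch. It is not the prototype's implementation: the `Field` record, the `KEYWORDS` table, and the representation of a Database entry as a list of categories are all assumptions made for illustration. The Jaccard overlap used to pick the most similar entry is likewise only a stand-in for the prototype's own comparison algorithm.

```python
from dataclasses import dataclass

# Hypothetical keyword hints per category; the original text does not list them.
KEYWORDS = {
    "name": ("name",),
    "address": ("address", "street"),
    "password": ("password", "passwd"),
    "query": ("search", "query"),
}


@dataclass
class Field:
    name: str   # the field's name attribute, as a parser might report it (assumed)
    label: str  # visible text next to the field (assumed)


def candidate_categories(field):
    """Step 1: list every category whose keywords occur in the field's own text."""
    text = f"{field.name} {field.label}".lower()
    return [cat for cat, words in KEYWORDS.items() if any(w in text for w in words)]


def most_similar_entry(candidates, database):
    """Pick the stored form whose categories overlap most with the candidates.

    Jaccard overlap is an assumed stand-in for the real comparison algorithm.
    """
    seen = {cat for cands in candidates for cat in cands}

    def jaccard(entry):
        stored = set(entry)
        union = seen | stored
        return len(seen & stored) / len(union) if union else 0.0

    return max(database, key=jaccard, default=())


def categorise(fields, database):
    """Step 2: resolve the candidates in the context of the whole form."""
    candidates = [candidate_categories(f) for f in fields]
    reference = set(most_similar_entry(candidates, database))
    used, result = set(), []
    for cands in candidates:
        # Prefer categories that also occur in the most similar stored form.
        ranked = sorted(cands, key=lambda cat: cat not in reference)
        # A category is assigned at most once per form.
        choice = next((cat for cat in ranked if cat not in used), None)
        if choice is not None:
            used.add(choice)
        result.append(choice)
    return result


database = [["name", "address"], ["query"], ["name", "password"]]
form = [Field("n", "Name"), Field("s", "Street name")]
print(categorise(form, database))
```

Taken on its own, the second field matches both `name` and `address`. Because `name` is already taken by the first field and the most similar stored form contains both categories, the second field is resolved to `address`.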



Andreas Aschenbrenner