Comparison of New Forms

In the first stage, a likelihood is assigned to each category for every field of the new form. These values form a matrix with one row per field and one column per category, where i is the category index.

As a field can belong to only one category, each row of the matrix is normalised to length 1.0. The normalisation is applied for each field f to its entries k_i, where i is the category index.

To compare this matrix with the existing forms in the Database, it is reduced to a single vector by summing each column. The resulting vector has n dimensions, one per category, and can be compared directly to other key-vectors as described in Section 5.3.1.

Here j is the field index and i is the category index.
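
The row normalisation and the column sum can be illustrated with a minimal sketch. The function names, the example likelihoods and the choice of norm are assumptions made for illustration only: the text does not say whether "length" means the sum of the entries or the Euclidean length, and the sketch uses the sum. The comparison with stored key-vectors (Section 5.3.1) is not shown.

```python
def normalise_rows(matrix):
    """Scale each row (one field) so that its entries sum to 1.0.

    Assumption: "length" is taken as the sum of the entries; a Euclidean
    norm would be an equally plausible reading of the text.
    """
    normalised = []
    for row in matrix:
        total = sum(row)
        # A row of zeros carries no information; keep it unchanged.
        normalised.append([v / total for v in row] if total > 0 else list(row))
    return normalised


def key_vector(matrix):
    """Sum each column (one category) over all fields."""
    return [sum(column) for column in zip(*matrix)]


# Hypothetical likelihoods: 3 fields (rows) x 4 categories (columns).
likelihoods = [
    [2.0, 1.0, 1.0, 0.0],
    [0.0, 3.0, 0.0, 1.0],
    [1.0, 0.0, 0.0, 0.0],
]

vector = key_vector(normalise_rows(likelihoods))
print(vector)  # [1.5, 1.0, 0.25, 0.25]
```

With this choice of norm every field contributes a total weight of 1.0, so the entries of the resulting vector sum to the number of fields.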

A proper definition of the default probability in the first stage is essential for this comparison. If that step sets the likelihood values too high, the comparison tends to favour vectors with many entries. Similarly, short vectors will be favoured if the initialisation is pessimistic.

Experiments have shown, however, that this way of comparing not yet categorised forms with existing entries in the Database is stable. Any misinterpretations can be tolerated and corrected in a subsequent step.

Andreas Aschenbrenner