Next: Nearest Neighbour Up: Categorisation Previous: Interpreting Fields Separately   Contents


Comparison of New Forms

In the first stage, every field of the new form is assigned a likelihood for each category. Together, these values form a matrix with one line per field and one column per category.

X = (x_{ji})
x_{ji} ... likelihood of category k_i in field f_j
f_j ... fields, j is index
k_i ... categories, i is index


As a field can only be of a single category, the length of each line is normalised to 1.0.

\lVert x_j \rVert = 1 for each field f_j
x_j ... line of the matrix holding the likelihoods of field f_j
k_i ... categories, i is index
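
The normalisation step can be sketched as follows. This is only an illustration under assumptions: the text does not say which norm "length" refers to, so the Euclidean length is assumed here, and the function name `normalise_rows` and the sample likelihoods are hypothetical.

```python
import math


def normalise_rows(matrix):
    """Scale every line (one field) to Euclidean length 1.0.

    Assumption: "length" means the Euclidean norm. Lines that contain
    only zeros are copied unchanged to avoid a division by zero.
    """
    result = []
    for row in matrix:
        length = math.sqrt(sum(value * value for value in row))
        if length == 0.0:
            result.append(list(row))
        else:
            result.append([value / length for value in row])
    return result


# Hypothetical likelihoods: 2 fields (lines) x 3 categories (columns).
X = [
    [3.0, 4.0, 0.0],
    [0.0, 0.0, 2.0],
]
X_norm = normalise_rows(X)
```

If the likelihoods are instead meant to sum to 1.0 per field, replacing the square root of the squared sum with a plain `sum(row)` would give that variant.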


To compare this matrix with the existing forms in the Database, it is reduced to a vector by summing over each column. The resulting vector has n dimensions, one per category, and can be compared directly to other key-vectors as described in Section 5.3.1.

a = (a_1, ..., a_n), a_i = \sum_j x_{ji}
x_{ji} ... likelihood of category k_i in field f_j
f_j ... fields, j is index
k_i ... categories, i is index
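
A minimal sketch of the column sum, assuming the matrix is stored as a list of lines with one line per field. The function name `key_vector` and the sample values are hypothetical; the comparison with stored key-vectors from Section 5.3.1 is not reproduced here.

```python
def key_vector(matrix):
    """Sum each column of a field-by-category matrix.

    Returns one entry per category. An empty matrix yields an empty vector.
    """
    if not matrix:
        return []
    # zip(*matrix) transposes the matrix, so each tuple is one column.
    return [sum(column) for column in zip(*matrix)]


# Hypothetical, already normalised likelihoods: 3 fields x 3 categories.
X_norm = [
    [0.6, 0.8, 0.0],
    [0.0, 0.0, 1.0],
    [1.0, 0.0, 0.0],
]
a = key_vector(X_norm)
```

Each entry of `a` accumulates the evidence for one category across all fields, so its length equals the number of categories regardless of how many fields the form has.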

A proper definition of the default probability in the first stage is essential for this comparison. If the default likelihood is set too high, the comparison tends to favour vectors with many entries. Conversely, if the initialisation is too pessimistic, short vectors are favoured.

Experiments have shown, however, that this means of comparing forms that are not yet categorised with existing entries in the Database is stable. Any misinterpretations can be tolerated at this point and corrected in a subsequent step.


Andreas Aschenbrenner