next up previous
Next: SOM and Adaptive Coordinates Up: Finding Structure in Text Previous: Introduction

Sammon's Mapping

Sammon's Mapping (SM) [5] provides a mapping from a high-dimensional vector space onto a 2-dimensional output space. The basic idea is to arrange all the data points on a 2-dimensional plane in such a way, that the distances between the data points in this output plane resemble the distances in vector space as defined by some metric as faithfully as possible. More formally, given a set of data points xi in $\Re^{n}$ with d(xi, xj) being the distance between two data points according to some metric, we obtain a distance matrix D with elements dij in input space. Let oi be the image of the data item xi in the 2-dimensional output space. With O we denote the distance matrix containing the pairwise distances between images as measured by the Euclidean vector norm $\Vert o_i - o_j\Vert$. The goal is to place the oi in such a way that the distance matrix O resembles as closely as possible matrix D, i.e. to optimize an error function E by following an interative steepest-descent process.


 \begin{displaymath}
E=\frac{1}{\sum_{i}{\sum_{j>i}{ d_{ij} } }} \sum_{i}{\sum_{j>i}{ \frac{(d_{ij}-\Vert o_i - o_j\Vert)^2}{d_{ij}} } }
\end{displaymath} (1)

The resulting visualization depicts clusters in input space as groups of data points mapped close to each other in the output plane. Thus, the inherent structure of the input signals can be told from the structure detected in the 2-dimensional visualization.


next up previous
Next: SOM and Adaptive Coordinates Up: Finding Structure in Text Previous: Introduction
Andreas RAUBER
1998-04-28