Next: SOM and Adaptive Coordinates Up: Finding Structure in Text Previous: Introduction

Sammon's Mapping

Sammon's Mapping (SM) [5] provides a mapping from a high-dimensional vector space onto a 2-dimensional output space. The basic idea is to arrange all the data points on a 2-dimensional plane in such a way, that the distances between the data points in this output plane resemble the distances in vector space as defined by some metric as faithfully as possible. More formally, given a set of data points x_i in $\Re^{n}$ with d(x_i, x_j) being the distance between two data points according to some metric, we obtain a distance matrix D with elements d_ij in input space. Let o_i be the image of the data item x_i in the 2-dimensional output space. With O we denote the distance matrix containing the pairwise distances between images as measured by the Euclidean vector norm $\Vert o_i - o_j\Vert$ . The goal is to place the o_i in such a way that the distance matrix O resembles as closely as possible matrix D, i.e. to optimize an error function E by following an interative steepest-descent process.

$\begin{displaymath} E=\frac{1}{\sum_{i}{\sum_{j>i}{ d_{ij} } }} \sum_{i}{\sum_{j>i}{ \frac{(d_{ij}-\Vert o_i - o_j\Vert)^2}{d_{ij}} } } \end{displaymath}$

(1)

The resulting visualization depicts clusters in input space as groups of data points mapped close to each other in the output plane. Thus, the inherent structure of the input signals can be told from the structure detected in the 2-dimensional visualization.

Next: SOM and Adaptive Coordinates Up: Finding Structure in Text Previous: Introduction

Andreas RAUBER
1998-04-28