Vienna University of Technology
Institute of Software Technology and Interactive Systems
Data Mining with the Java SOMToolbox

Step-by-step guide to train and view Maps

To analyse data with a Self-Organising Map, you need to perform a series of steps:
  1. Data Preprocessing
  2. Feature Extraction
  3. Feature Processing
  4. SOM Training
  5. SOM Viewing

Data preprocessing

Depending on your data, you might need to perform some preprocessing steps. Examples of preprocessing might be

Feature Extraction

The Self-Organising Map can handle only numerical representations of data. Thus, you might need to apply some feature extraction, which is the process of describing certain characteristics of the data with numeric attributes. Some data (such as sales data) may already be in numeric form and need no feature extraction, although it may still need some processing such as normalisation (see below).

Specifically, our implementation of the SOM requires the data to be in the SOMLib file format, a rather simple ASCII format describing the features and the numeric representation of the data instances.

Data that might need feature extraction may for example be:

Feature processing: normalisation

This step is optional, and you should be aware of what kind of normalisation you want to apply to your data. The Java SOMToolbox provides the following normalisation methods:

./somtoolbox.sh SOMLibVectorNormalization -m UNIT_LEN <inputfile> <outputfile>

(On Windows, use somtoolbox.bat instead of ./somtoolbox.sh.)
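To make the effect of the UNIT_LEN method more concrete, here is a minimal sketch of unit-length normalisation, assuming that UNIT_LEN scales each vector to Euclidean length 1. The function name unit_len and the plain input (one vector per line, numbers only, no SOMLib header or labels) are assumptions for illustration; this is not the SOMToolbox implementation.

```shell
# Scale every vector (one per line, whitespace-separated numbers) to
# Euclidean length 1. Illustration only: SOMLib header lines and labels
# are not handled here.
unit_len() {
  awk '{
    sum = 0
    for (i = 1; i <= NF; i++) sum += $i * $i
    norm = sqrt(sum)
    for (i = 1; i <= NF; i++) {
      # An all-zero vector has no direction, so it is left as zeros.
      v = (norm > 0) ? $i / norm : 0
      printf "%s%.4f", (i > 1 ? " " : ""), v
    }
    printf "\n"
  }'
}

printf '3 4\n' | unit_len   # prints 0.6000 0.8000
```

For real data, use the SOMLibVectorNormalization command above, which reads and writes complete SOMLib files.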

For a brief introduction to the SOMLib input vector format, see the quick guide on input files, or take a look at the detailed specification.

Self-Organising Map training

Setup

Download the som.prop properties file and edit:

workingDirectory = <the directory with your data files>
outputDirectory = <directory where files will be created; empty means use workingDirectory>
namePrefix = <any project name you like>
vectorFileName = <name of *normalized* vector file - see "Feature processing: normalisation" above>
sparseData = <yes|no> ... use yes if vectors are sparse (e.g. text data), no if vectors are not sparse (e.g. audio data)
isNormalized = <yes|no> ... set yes if vectorFile has been previously normalized
templateFileName=vector.tv (the template vector file - see below)

Note: Under Windows use double backslashes \\ as path separator.
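As an illustration of the settings above, the following sketch writes the file-related part of a som.prop from the shell. All values (the directory /home/user/somdata, the prefix demo, the file name demo.norm.vec, and the location of the generated file) are hypothetical placeholders; only the key names are taken from the list above.

```shell
# Hypothetical location for the generated properties file.
prop_file="${TMPDIR:-/tmp}/som.prop"

# Hypothetical paths and names; replace them with your own.
# outputDirectory is left empty, which means "use workingDirectory".
cat > "$prop_file" <<'EOF'
workingDirectory = /home/user/somdata
outputDirectory =
namePrefix = demo
vectorFileName = demo.norm.vec
sparseData = no
isNormalized = yes
templateFileName = vector.tv
EOF

echo "wrote $prop_file"
```

The algorithm parameters described next still have to be added to this file before training.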

The remaining parameters control the SOM algorithm and can be experimented with:

xSize=20 ... size of map in x direction
ySize=14 ... size of map in y direction
learnrate=0.75
#sigma=12
#tau=
#metricName=
numIterations=2000 ... should be larger than the # of vectors in vectorFile (recommended: 5*<#_of_vectors>)
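The recommendation for numIterations can be computed from the vector file. The sketch below assumes that every non-empty line not starting with "$" is one data vector (that is, that SOMLib header lines start with "$"); check this against the format specification. The function names count_vectors and recommended_iterations are made up for this example.

```shell
# Count data lines: every line that is neither empty nor a header line
# starting with "$" (assumed SOMLib header convention) counts as one vector.
count_vectors() {
  grep -v -e '^\$' -e '^[[:space:]]*$' "$1" | wc -l | tr -d ' '
}

# Apply the rule of thumb from above: 5 * <#_of_vectors>.
recommended_iterations() {
  echo $(( 5 * $(count_vectors "$1") ))
}

# Usage (hypothetical file name):
#   echo "numIterations=$(recommended_iterations demo.norm.vec)"
```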

You have to provide an appropriate template vector file:

Note: you can also take a look at the complete and documented properties file.

Training

Now you are ready to train the SOM:

 ./somtoolbox.sh GrowingSOM [path/to/]som.prop

If an error occurs, please check the parameters provided.
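One way to check the parameters before training is a small pre-flight test on som.prop. This is only a sketch: the function name check_prop is hypothetical, and treating exactly these four keys as mandatory is an assumption based on the Setup list, not a statement about what GrowingSOM itself requires.

```shell
# Report every key from the Setup list that is missing or has an empty value.
# outputDirectory is skipped because an empty value means
# "use workingDirectory". Commented-out lines (starting with #) do not count.
check_prop() {
  prop_file=$1
  status=0
  for key in workingDirectory namePrefix vectorFileName templateFileName; do
    if ! grep -Eq "^[[:space:]]*${key}[[:space:]]*=[[:space:]]*[^[:space:]]" "$prop_file"; then
      echo "missing or empty: $key" >&2
      status=1
    fi
  done
  return $status
}

# Usage:
#   check_prop path/to/som.prop && ./somtoolbox.sh GrowingSOM path/to/som.prop
```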

At this point, check that four files have been created in your outputDirectory, named with the namePrefix provided in som.prop and the following extensions:

Analysing with the SOM Viewer

 ./somtoolbox.sh SOMViewer -u /path/to/file.unit.gz -w /path/to/file.wgt.gz --dw /path/to/file.dwm.gz
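The viewer command above reads three of the training output files. A small wrapper can confirm that they exist before launching the viewer. The function names have_outputs and view_map are hypothetical, and the base path (the file path without its extension) is a placeholder; the flags are the ones shown in the command above.

```shell
# Succeed only if the three files named in the viewer command exist.
# $1 is the path without extension, e.g. /path/to/file.
have_outputs() {
  for ext in unit.gz wgt.gz dwm.gz; do
    if [ ! -f "$1.$ext" ]; then
      echo "missing: $1.$ext" >&2
      return 1
    fi
  done
  return 0
}

# Launch the viewer with the same options as the command above.
view_map() {
  have_outputs "$1" || return 1
  ./somtoolbox.sh SOMViewer -u "$1.unit.gz" -w "$1.wgt.gz" --dw "$1.dwm.gz"
}

# Usage:
#   view_map /path/to/file
```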