Researchers: Jouko Lampinen, Timo Kostiainen
The project is financed by TEKES and carried out in co-operation with the Finnish Institute of Occupational Health.
The Self-Organizing Map (SOM) is a very popular tool in exploratory data analysis and is often used to visualize high-dimensional data. A theoretical and practical challenge with the SOM has been the difficulty of treating the method as a statistical model-fitting procedure. This has greatly undermined the reliability of data analysis results and has thus led to a lot of time-consuming work validating the results by other means.
In earlier attempts to associate a probability density model with the SOM, the SOM model itself has been modified. In this work we have derived the probability density model for which the unchanged SOM training algorithm gives the maximum likelihood estimate. The density model allows model selection techniques to be applied when choosing the parameters of the SOM, so that the map generalizes to the data as well as possible. Quantitative analysis of dependencies between data variables can also be carried out by calculating conditional distributions from the density model.
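The exact form of the derived density is not stated here, so the following is only a minimal sketch under an assumed form consistent with the description: p(x) proportional to exp(-β Σ_j h_{c(x)j} ||x − w_j||²), where c(x) is the nearest unit (the SOM's winner), h is a Gaussian neighborhood on the map grid, and β is a precision parameter. The function names, β, σ, the integration box and the prototype positions are all hypothetical choices for illustration. The sketch shows how such a density can be normalized numerically in 2-D, scored on data by average log-likelihood (the kind of criterion model selection could compare across SOM parameters), and sliced into a conditional distribution.

```python
import math

# All names below are hypothetical; the density form is an assumption, not a quote
# of the project's derivation.

def grid_neighborhood(rows, cols, sigma):
    """Gaussian neighborhood weights h[i][j]; unit k sits at map position (k // cols, k % cols)."""
    coords = [(r, c) for r in range(rows) for c in range(cols)]
    return [[math.exp(-((ri - rj) ** 2 + (ci - cj) ** 2) / (2.0 * sigma ** 2))
             for (rj, cj) in coords]
            for (ri, ci) in coords]


def sq_dist(x, w):
    return sum((a - b) ** 2 for a, b in zip(x, w))


def winner(x, weights):
    """Nearest prototype: the SOM's winner-take-all rule."""
    return min(range(len(weights)), key=lambda j: sq_dist(x, weights[j]))


def energy(x, weights, h):
    """Neighborhood-weighted squared error to all prototypes, using the winner of x."""
    c = winner(x, weights)
    return sum(h[c][j] * sq_dist(x, w) for j, w in enumerate(weights))


def unnormalized_density(x, weights, h, beta):
    return math.exp(-beta * energy(x, weights, h))


def midpoints(lo, hi, n):
    """Cell midpoints and spacing of a 1-D grid used for numerical integration."""
    step = (hi - lo) / n
    return [lo + (k + 0.5) * step for k in range(n)], step


def normalizer(weights, h, beta, lo, hi, n):
    """Approximate Z by a midpoint sum over the square [lo, hi]^2 (2-D data only)."""
    xs, step = midpoints(lo, hi, n)
    return sum(unnormalized_density((a, b), weights, h, beta)
               for a in xs for b in xs) * step * step


def mean_log_likelihood(data, weights, h, beta, z):
    """Average log p(x) over data, e.g. a held-out set when comparing parameter choices."""
    return sum(-beta * energy(x, weights, h) - math.log(z) for x in data) / len(data)


def conditional_x2_given_x1(a, weights, h, beta, lo, hi, n):
    """Grid approximation of p(x2 | x1 = a), proportional to p(a, x2); Z cancels out."""
    xs, step = midpoints(lo, hi, n)
    vals = [unnormalized_density((a, b), weights, h, beta) for b in xs]
    total = sum(vals) * step
    return xs, [v / total for v in vals]


# A 2x4 map like the one in Figure 2; prototype positions are made up for illustration.
ROWS, COLS = 2, 4
weights = [(float(k % COLS), float(k // COLS)) for k in range(ROWS * COLS)]
h = grid_neighborhood(ROWS, COLS, sigma=0.5)
beta = 2.0
z = normalizer(weights, h, beta, lo=-3.0, hi=6.0, n=90)
ll = mean_log_likelihood([(0.2, 0.1), (2.5, 0.9)], weights, h, beta, z)
xs, p_cond = conditional_x2_given_x1(1.0, weights, h, beta, lo=-3.0, hi=6.0, n=90)
print(z, ll)
```

The grid-based normalization is the simplest choice for a 2-D illustration; it assumes the box is wide enough that the density is negligible outside it, and it would not scale to high-dimensional data.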
Figure 2: Left: Training data points and a self-organizing map consisting of 2x4 units. Right: Probability density model associated with the SOM. The density model attempts to describe the distribution of the training data. The density function is discontinuous because of the winner-take-all training rule of the SOM.
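One way to see where such a discontinuity can come from, assuming (as an illustration, not as the project's stated result) a density of the form p(x) ∝ exp(-β Σ_j h_{c(x)j} ||x − w_j||²) with prototypes w_j, neighborhood weights h_{cj}, precision β, and c(x) the nearest unit, is to complete the square inside a single Voronoi cell V_c, where the winner c is fixed:

```latex
E_c(\mathbf{x}) = \sum_j h_{cj}\,\lVert \mathbf{x}-\mathbf{w}_j \rVert^2
  = H_c\,\lVert \mathbf{x}-\mathbf{m}_c \rVert^2 + K_c,
\qquad
H_c = \sum_j h_{cj},\quad
\mathbf{m}_c = \frac{1}{H_c}\sum_j h_{cj}\,\mathbf{w}_j,\quad
K_c = \sum_j h_{cj}\,\lVert \mathbf{w}_j \rVert^2 - H_c\,\lVert \mathbf{m}_c \rVert^2

p(\mathbf{x}) = \frac{1}{Z}\, e^{-\beta K_c}\, e^{-\beta H_c \lVert \mathbf{x}-\mathbf{m}_c \rVert^2},
\qquad \mathbf{x} \in V_c
```

Under this assumption each cell carries its own truncated Gaussian piece, centered at the neighborhood-weighted mean m_c with variance 1/(2βH_c) per coordinate. Where two cells V_c and V_d meet, the pieces generally take different values, so the density jumps at the cell boundary, as in the right panel of the figure.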