Researchers: Jouko Lampinen, Aki Vehtari, Paula Litkey, Ilkka Kalliomäki, Simo Särkkä, and Jani Lahtinen
Neural networks are popular tools in classification and non-linear function approximation. The main difficulty with neural networks is controlling the complexity of the model. It is well known that the optimal number of degrees of freedom in the model depends on the number of training samples, the amount of noise in the samples, and the complexity of the underlying function being estimated. With standard neural network techniques, the means for both determining the correct model complexity and setting up a network with the desired complexity are rather crude and often computationally very expensive. Also, tools for analyzing the models (such as confidence intervals) are rather primitive. The Bayesian approach is based on a consistent way to do inference by combining the evidence from the data with prior knowledge of the problem, and it provides efficient tools for model selection and analysis.
In the Laboratory of Computational Engineering we are studying the full Bayesian approach for neural networks, where the high-dimensional integrals required in the computation of various marginal distributions are approximated by Markov chain Monte Carlo (MCMC) methods. We are developing methods for using more general prior distributions for model parameters and noise models than are currently available. Examples of recent results are non-Gaussian noise models with arbitrary correlations between model outputs, outlier-tolerant noise models with adaptable tail thickness, and Bayesian bootstrap methods for analyzing the performance of the models.
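As an illustration of the sampling-based approach, the sketch below fits a toy one-hidden-layer network to 1-D data using random-walk Metropolis sampling over the weights, and forms the predictive mean by averaging over posterior draws. This is only a minimal didactic sketch: the actual work uses far more advanced samplers (e.g. hybrid Monte Carlo) and richer priors, and all data, dimensions, and step sizes here are illustrative assumptions.

```python
# Minimal sketch: MCMC inference for a Bayesian neural network via
# random-walk Metropolis. All settings are illustrative, not the
# actual methodology of the project.
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D regression data
x = np.linspace(-3, 3, 40)[:, None]
y = np.sin(x[:, 0]) + 0.1 * rng.standard_normal(40)

H = 8                       # hidden units
n_w = 3 * H + 1             # w1 (1->H), b1, w2 (H->1), b2

def unpack(w):
    w1 = w[:H][None, :]; b1 = w[H:2 * H]
    w2 = w[2 * H:3 * H][:, None]; b2 = w[-1]
    return w1, b1, w2, b2

def predict(w, x):
    w1, b1, w2, b2 = unpack(w)
    return (np.tanh(x @ w1 + b1) @ w2)[:, 0] + b2

def log_post(w, sigma=0.1, prior_std=2.0):
    # Gaussian likelihood plus a Gaussian prior on all weights
    resid = y - predict(w, x)
    return (-0.5 * np.sum(resid ** 2) / sigma ** 2
            - 0.5 * np.sum(w ** 2) / prior_std ** 2)

w = 0.1 * rng.standard_normal(n_w)
lp = log_post(w)
samples = []
for step in range(20000):
    w_prop = w + 0.02 * rng.standard_normal(n_w)   # random-walk proposal
    lp_prop = log_post(w_prop)
    if np.log(rng.uniform()) < lp_prop - lp:       # Metropolis accept rule
        w, lp = w_prop, lp_prop
    if step > 10000 and step % 20 == 0:            # burn-in, then thin
        samples.append(w.copy())

# Monte Carlo approximation of the predictive distribution: average the
# network output over posterior weight samples instead of one point fit.
preds = np.array([predict(s, x) for s in samples])
pred_mean, pred_std = preds.mean(axis=0), preds.std(axis=0)
```

The posterior standard deviation `pred_std` is what makes the confidence intervals mentioned above available essentially for free once the samples exist.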
We have applied Bayesian neural networks in a number of modelling tasks. In practical applications the Bayesian approach usually requires more expert work than the standard error-minimization approach, to build the probability models and priors and to integrate out all the hyperparameters. In our experience, however, the results obtained have been consistently better than with other statistical estimation methods, and the possibility to compute reliable confidence intervals for the results is necessary in real-world applications.
Figures 1 and 2 show two examples of Bayesian neural networks in function approximation and classification tasks.
Figure 1: Example of a Bayesian neural network for image reconstruction in Electrical Impedance Tomography (EIT). The left figure shows a cross section of a pipe filled with liquid and some gas bubbles (marked by dark green contours). The color shade shows the potential field due to injection of electric current from the redmost electrode, with the bluemost electrode grounded. The right figure shows the reconstruction of the conductivity image from the potential measurements of the 16 electrodes, using a Bayesian neural network. The color indicates the bubble probability and the blue contour the detected bubble boundary.
Figure 2: Example of classifying a forest scene into tree trunks and background. The figures from left to right are: the forest image; CART (Classification and Regression Tree); k-nearest neighbor classifier with k chosen by leave-one-out cross-validation; committee of early-stopped MLP neural networks; Bayesian MLP; Bayesian MLP with ARD prior.
Researchers: Aki Vehtari, Jouko Lampinen
The project is part of the PROMISE (Applications of Probabilistic Modelling and Search Methods) project funded by TEKES and participating industries.
The goal of the project is to study theoretically justified and computationally practical methods for Bayesian model assessment, comparison, and selection. A natural way to assess the goodness of a model is to estimate its future predictive capability by estimating the expected utilities, that is, the relative values of the consequences of its predictions.
The reliability of the estimated expected utility can be assessed by estimating its distribution. The distributions of the expected utilities can also be used to compare models, e.g., by computing the probability of one model having a better expected utility than another. The expected utilities take into account how the model predictions are going to be used, and may thus reveal that even the best model selected is inadequate or not practically better than previously used models.
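One concrete way to obtain such a distribution is the Bayesian bootstrap over per-sample utilities, mentioned above as one of the developed tools. The sketch below compares two hypothetical classifiers whose utility is 0/1 correctness on held-out data; the accuracy levels, sample size, and number of draws are purely illustrative.

```python
# Sketch: distribution of the expected utility via the Bayesian
# bootstrap, and the probability that one model beats another.
# The data here are synthetic placeholders.
import numpy as np

rng = np.random.default_rng(1)
n = 500
# Hypothetical 0/1 correctness indicators for two models on n test cases
u_a = (rng.uniform(size=n) < 0.92).astype(float)
u_b = (rng.uniform(size=n) < 0.90).astype(float)

def bb_expected_utility(u, draws=4000):
    # Each draw: Dirichlet(1, ..., 1) weights over the n samples give
    # one plausible value of the expected utility under resampling
    # uncertainty.
    w = rng.dirichlet(np.ones(len(u)), size=draws)
    return w @ u

dist_a = bb_expected_utility(u_a)
dist_b = bb_expected_utility(u_b)
# Probability that model A has a better expected utility than model B
p_a_better = np.mean(dist_a > dist_b)
```

If `p_a_better` is close to 0.5, the data do not support preferring either model, which is exactly the kind of conclusion the comparison above is designed to expose.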
The main contributions of the work are theoretical and methodological advances that provide a solid framework for assessing the performance of the models in terms meaningful to the application specialist, while properly taking into account the uncertainties in the process. The developed methods have already been applied with great success in model assessment and selection in a real-world concrete quality modelling case in cooperation with Lohja Rudus. Using the models, and the conclusions drawn from them by the concrete expert, it is possible to achieve, e.g., 5-15% savings in material costs in a concrete factory.
Below is another example from a case project in which one subgoal was the classification of forest scene image pixels into tree and non-tree classes. The main problem in the task was the large variance within the classes. The appearance of the tree trunks varies in color and texture due to varying lighting conditions, epiphytes, and species-dependent variations. In the non-tree class the diversity is much larger, containing, for example, terrain, tree branches, and sky. This diversity makes it difficult to choose the optimal features for the classification. Figure 3 shows a comparison of the expected classification accuracies for the two models. Model 1 uses 84 texture and statistical features extracted from the images. Model 2 uses only 18 features, selected using the methods developed in the project from the set of all 84 features used by Model 1. Although Model 2 is simpler, it has practically the same expected accuracy as Model 1.
Figure 3: An example of Bayesian model assessment and selection using expected utilities in a forest scene classification problem. The figure shows the distributions of the expected classification accuracies for two different models classifying image pixels into tree trunks and background. The distribution describes how likely different values of the expected utility are.
Researchers: Jouko Lampinen, Timo Kostiainen
The Self-Organizing Map (SOM) is a very popular tool in exploratory data analysis, often used for the visualization of high-dimensional data. A theoretical and practical challenge with the SOM has been the difficulty of treating the method as a statistical model-fitting procedure. This has greatly undermined the reliability of the results of data analysis and thus led to much time-consuming work in validating the results by other means.
In earlier attempts to associate a probability density model with the SOM, the SOM model itself has been modified. In this work we have derived the probability density model for which the unchanged SOM training algorithm gives the maximum likelihood estimate. The density model allows the application of model selection techniques to choose the parameters of the SOM so as to ensure as good generalization to the data as possible. Quantitative analysis of dependencies between data variables can also be carried out by calculating conditional distributions from the density model.
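For reference, the standard (unchanged) SOM training rule referred to above can be sketched as follows. The 2 × 4 grid matches the map in Figure 4; the toy data, learning-rate and radius schedules, and Gaussian neighbourhood are illustrative choices, not the specific settings of this work.

```python
# Sketch of the standard winner-take-all SOM training algorithm
# on toy 2-D data, with a 2 x 4 map as in Figure 4.
import numpy as np

rng = np.random.default_rng(2)
data = rng.uniform(size=(500, 2))            # toy training data in the unit square
# Map grid coordinates of the 2 x 4 units
grid = np.array([(i, j) for i in range(2) for j in range(4)], dtype=float)
units = rng.uniform(size=(8, 2))             # codebook vectors in data space

T = 2000
for t in range(T):
    lr = 0.5 * (1 - t / T)                   # decaying learning rate
    radius = 2.0 * (1 - t / T) + 0.1         # shrinking neighbourhood radius
    x = data[rng.integers(len(data))]        # pick one training sample
    # Winner-take-all: the unit closest to x in data space
    winner = np.argmin(np.sum((units - x) ** 2, axis=1))
    # Gaussian neighbourhood function measured on the map grid,
    # not in data space
    d2 = np.sum((grid - grid[winner]) ** 2, axis=1)
    h = np.exp(-d2 / (2 * radius ** 2))
    units += lr * h[:, None] * (x - units)   # pull units toward the sample
```

The discontinuity of the associated density model noted in Figure 4 stems from the hard `argmin` winner selection in this rule: the winner, and hence the update, changes abruptly at the boundaries between the units' regions.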
Figure 4: Left: Training data points and self-organizing map which consists of 2 × 4 units. Right: Probability density model associated with the SOM. The density model attempts to describe the distribution of the training data. The density function is not continuous due to the winner-take-all training rule of the SOM.
Researchers: Jouko Lampinen, Paula Litkey,
Laboratory of Computational Engineering, HUT
Jukka Laine, Laboratory of Metallurgy, HUT
The project is carried out in co-operation with the Laboratory of Metallurgy, within a TEKES project that studies the possibilities of using neural networks in steel manufacturing and casting. The project is an example of an application-oriented modelling task in an industrial environment, but it also requires methodological research related to assessing the reliability of the models and scaling the modelling methods to real-world problems.
The Jominy curve represents the hardenability of steel. It is important that a steel plant can provide a reliable measure of steel hardenability to a client who is manufacturing demanding products, e.g., safety-critical parts for automobiles.
Normally, the Jominy curve is determined by quenching a steel sample and measuring the hardness at several points along the specimen, which is an expensive test. With a reliable model for Jominy curves the physical tests, ca. 1500 pieces per year, can be avoided and substantial savings achieved. The model can also be used to control the alloying of the steel during manufacturing and in the development of new steel grades. An ideal model should be applicable to a wide range of steel grades with different analyses and warn the user if the chemical composition is outside the range of the model.
The goal of the project is to develop usable neural network models for the steel industry using large sets of real-life process data. In addition to obtaining good models, we are examining means to ensure easy updating of the models with new data.
The interactions between different chemical elements are nonlinear, which makes it impossible to determine the complexity of the data from the variation of the input values alone. Neural networks make it relatively easy to model the relationship between the chemical composition and the shape of the measured Jominy curve, which is quite a difficult task for, e.g., a regression model.
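The shape of such a model can be sketched as a small multi-output network mapping a composition vector to hardness values at several distances along the specimen. Everything below is a synthetic stand-in: the data, the element count, the interaction term, and the network size are invented for illustration and bear no relation to the actual plant data or model.

```python
# Illustrative sketch only: a tiny MLP trained by backpropagation to map
# a synthetic "chemical composition" to a synthetic "Jominy curve".
import numpy as np

rng = np.random.default_rng(3)
n, n_elem, n_dist = 200, 5, 10   # samples, alloying elements, curve points

# Synthetic data with a nonlinear element interaction (X0 * X1 term)
X = rng.uniform(size=(n, n_elem))
base = np.tanh(X @ rng.uniform(size=(n_elem, 1)) - X[:, :1] * X[:, 1:2])
curve = 40 + 20 * base                                   # peak hardness level
Y = curve * np.exp(-np.linspace(0, 1, n_dist))[None, :]  # hardness decays along specimen

H = 16
W1 = 0.5 * rng.standard_normal((n_elem, H)); b1 = np.zeros(H)
W2 = 0.5 * rng.standard_normal((H, n_dist)); b2 = np.zeros(n_dist)

def forward(X):
    return np.tanh(X @ W1 + b1) @ W2 + b2

mse0 = np.mean((forward(X) - Y) ** 2)    # error before training

lr = 0.01
for epoch in range(3000):
    Z = np.tanh(X @ W1 + b1)
    E = Z @ W2 + b2 - Y
    # Backpropagation of the mean squared error
    gW2 = Z.T @ E / n; gb2 = E.mean(axis=0)
    dZ = (E @ W2.T) * (1 - Z ** 2)
    gW1 = X.T @ dZ / n; gb1 = dZ.mean(axis=0)
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2

mse = np.mean((forward(X) - Y) ** 2)     # error after training
```

The point of the sketch is structural: one network produces the whole curve at once, so the correlations between hardness values at adjacent distances are modelled jointly rather than point by point.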
Figure 5: Example of Jominy curve estimates from a neural network model. The curves shown were selected randomly from the test set used to evaluate the model performance.
Researchers: Jukka Heikkonen, Timo Koskela,
Jouko Lampinen, and Kimmo Kaski
Project homepage: http://www.lce.hut.fi/research/InternetModelling/
The Internet Services Modelling project is funded by the Technology Development Centre (TEKES) and the Ministry of Transport and Communications. The research is conducted in collaboration with Lappeenranta University of Technology (LUT), the Center for Scientific Computing (CSC), and Nokia Research Center.
The objective of the project was to seek tools and models for classifying and modelling the characteristics of Internet services from the user's point of view. Figure 6 shows the elements of an Internet service in a typical case. These include the user, the Internet Service Provider (ISP), a proxy cache, the network, and the service used.
The focus during the third year of the project was on developing a measurement scenario for the ISP's quality of service. Measurements of the services provided by the ISP were carried out using a client program installed on the user's computer. Results from a questionnaire study were used to select the most popular Internet services to be measured.
In order to obtain useful quality measures for the services, the acquired measurements must be weighted according to the actual behaviour of the users. User profiling turned out to be difficult for a variety of reasons. One possible solution is a client program installed on the user's computer that logs the user's behaviour.
The project's third-year report presents a specification of a measurement system that allows estimating the quality of service (QoS) of an ISP. It covers the services to be measured, the measurement techniques, the quality requirements for the measurements, and user profiling issues.