Modelling and Data-Analysis

Bayesian Methods for Neural Networks

Researchers: Jouko Lampinen, Aki Vehtari, Paula Litkey, Ilkka Kalliomäki, Simo Särkkä, and Jani Lahtinen

Neural networks are popular tools in classification and non-linear function approximation. The main difficulty with neural networks is in controlling the complexity of the model. It is well known that the optimal number of degrees of freedom in the model depends on the number of training samples, the amount of noise in the samples, and the complexity of the underlying function being estimated. With standard neural network techniques, the means for both determining the correct model complexity and setting up a network with the desired complexity are rather crude and often computationally very expensive. Also, tools for analyzing the models (such as confidence intervals) are rather primitive. The Bayesian approach is based on a consistent way to do inference by combining the evidence from the data with prior knowledge about the problem, and it provides efficient tools for model selection and analysis.

In the Laboratory of Computational Engineering we are studying the full Bayesian approach for neural networks, where the high-dimensional integrals required in the computation of various marginal distributions are approximated by Markov chain Monte Carlo (MCMC) methods. We are developing methods for using more general prior distributions for model parameters and noise models than currently available. Examples of recent results are non-Gaussian noise models with arbitrary correlations between model outputs, outlier-tolerant noise models with adaptable tail heaviness, and Bayesian bootstrap methods for analyzing the performance of the models.
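
To illustrate what an outlier-tolerant noise model with adaptable tail heaviness can look like, the following minimal sketch compares a Gaussian log-likelihood with a Student-t log-likelihood. The Student-t form, the function names, and the residual values are assumptions made for illustration; the text above does not specify which distribution is used.

```python
import math

def gaussian_logpdf(residual, sigma):
    """Log density of a zero-mean Gaussian with standard deviation sigma."""
    return (-0.5 * math.log(2.0 * math.pi * sigma ** 2)
            - residual ** 2 / (2.0 * sigma ** 2))

def student_t_logpdf(residual, sigma, nu):
    """Log density of a zero-mean Student-t with scale sigma and nu degrees of freedom.

    Small nu gives heavy tails; large nu approaches the Gaussian.
    """
    z = residual / sigma
    return (math.lgamma((nu + 1.0) / 2.0) - math.lgamma(nu / 2.0)
            - 0.5 * math.log(nu * math.pi) - math.log(sigma)
            - (nu + 1.0) / 2.0 * math.log1p(z * z / nu))

# Hypothetical residuals: four typical values and one gross outlier.
residuals = [0.1, -0.2, 0.05, 0.15, 8.0]
gauss_terms = [gaussian_logpdf(r, 1.0) for r in residuals]
t_terms = [student_t_logpdf(r, 1.0, 4.0) for r in residuals]
```

The outlier contributes a far less negative term under the heavy-tailed model, so it pulls the fit less. Treating `nu` as an unknown parameter is one way to let the data decide how heavy the tails should be.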

We have applied Bayesian neural networks in a number of modelling tasks. In practical applications the Bayesian approach usually requires more expert work than the standard error minimization approach, to build the probability models and priors, and to integrate out all the hyperparameters. In our experience, the results obtained have been consistently better than with other statistical estimation methods, and the possibility to compute reliable confidence intervals for the results is necessary in real-world applications. Figures 1 and 2 show two examples of Bayesian neural networks in function approximation and classification tasks.

Figure 1a Figure 1b

Figure 1: Example of a Bayesian neural network for image reconstruction in Electrical Impedance Tomography (EIT). The left figure shows a cross section of a pipe filled with liquid and some gas bubbles (marked by dark green contours). The color shade shows the potential field due to injection of electric current from the redmost electrode, with the bluemost electrode grounded. The right figure shows the reconstruction of the conductivity image from the potential measurements of the 16 electrodes, using a Bayesian neural network. The color indicates the bubble probability and the blue contour the detected bubble boundary.

Figure 2

Figure 2: Example of classifying a forest scene into tree trunks and background. The figures from left are: the forest image; CART (Classification and Regression Tree); k-Nearest Neighbor classifier with k chosen by leave-one-out cross-validation; committee of early-stopped MLP neural networks; Bayesian MLP; Bayesian MLP with ARD prior.

Bayesian Model Assessment and Selection Using Expected Utilities

Researchers: Aki Vehtari, Jouko Lampinen

The goal of the project is to study theoretically justified and computationally practical methods for Bayesian model assessment, comparison, and selection. A natural way to assess the goodness of a model is to estimate its future predictive capability by estimating expected utilities, that is, the relative values of consequences. We synthesize and extend the previous work in several ways. We give a unified presentation from the Bayesian viewpoint, emphasizing the assumptions made, and propose practical methods to obtain the distributions of the expected utility estimates.

The reliability of the estimated expected utility can be assessed by estimating its distribution. The distributions of the expected utilities can also be used to compare models, for example, by computing the probability of one model having a better expected utility than some other model. The expected utilities take into account how the model predictions are going to be used and thus may reveal that even the best model selected may be inadequate or not practically better than the previously used models.
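
The comparison described above can be sketched as follows, assuming the distribution of the expected utility is approximated with the Bayesian bootstrap over per-sample utilities. The function names and the toy data are hypothetical and serve only as an illustration; this is not necessarily the exact procedure used in the project.

```python
import random

def dirichlet_weights(n, rng):
    """Draw one weight vector from a flat Dirichlet(1, ..., 1) distribution."""
    draws = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(draws)
    return [d / total for d in draws]

def prob_first_better(utilities_a, utilities_b, n_draws=2000, seed=0):
    """Estimate P(expected utility of A > expected utility of B).

    Both lists hold per-sample utilities evaluated on the same samples,
    so the comparison is paired: each Bayesian bootstrap draw reweights
    the per-sample differences.
    """
    rng = random.Random(seed)
    diffs = [a - b for a, b in zip(utilities_a, utilities_b)]
    wins = 0
    for _ in range(n_draws):
        weights = dirichlet_weights(len(diffs), rng)
        if sum(w * d for w, d in zip(weights, diffs)) > 0.0:
            wins += 1
    return wins / n_draws

# Hypothetical per-sample utilities (1 = correct, 0 = wrong) for two models.
correct_a = [1, 1, 0, 1, 1, 0, 1, 1, 1, 0]
correct_b = [1, 0, 1, 1, 1, 0, 1, 0, 1, 0]
p_a_better = prob_first_better(correct_a, correct_b)
```

A value of `p_a_better` near 0.5 would suggest that the two models are not practically distinguishable on this utility, while values near 0 or 1 indicate a clear preference.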

The developed methods have already been applied with great success to model assessment and selection in a real-world concrete quality modelling case in cooperation with Lohja Rudus. By using the models, and the conclusions the concrete expert has drawn from them, it is possible to achieve, for example, 5-15% savings in material costs in a concrete factory.

Below is another example from a case project where one subgoal was the classification of forest scene image pixels into tree and non-tree classes. The main problem in the task was the large variance within the classes. The appearance of the tree trunks varies in color and texture due to varying lighting conditions, epiphytes, and species-dependent variations. In the non-tree class the diversity is much larger, containing, for example, terrain, tree branches, and sky. This diversity makes it difficult to choose the optimal features for the classification. Figure 3 shows a comparison of the expected classification accuracies for two models. Model 1 uses 84 texture and statistical features extracted from images. Model 2 uses only 18 features, selected using the methods developed in the project, from the set of all 84 features used by Model 1. Although Model 2 is simpler, it has practically the same expected accuracy as Model 1.

Figure 3

Figure 3: An example of Bayesian model assessment and selection using expected utilities in a forest scene classification problem. The figure shows the distributions of the expected classification accuracies for two different models classifying image pixels into tree trunks and background. Each distribution describes how likely different values of the expected utility are.

Probability Density Model for the Self-Organizing Map

Researchers: Jouko Lampinen, Timo Kostiainen

The Self-Organizing Map (SOM) is a very popular tool in exploratory data analysis. It is often used for the visualization of high-dimensional data. A theoretical and practical challenge in the SOM has been the difficulty of treating the method as a statistical model fitting procedure. This has greatly undermined the reliability of the results of data analysis and thus led to a lot of time-consuming work in validating the results by other means.

In earlier attempts to associate a probability density model with the SOM, the SOM model has been modified. In this work we have derived the probability density model for which the unchanged SOM training algorithm gives the maximum likelihood estimate. The density model allows the application of model selection techniques to choose the parameters of the SOM to ensure as good generalization to the data as possible. Quantitative analysis of dependencies between data variables can also be carried out by calculating conditional distributions from the density model.

Figure 4

Figure 4: Left: Training data points and self-organizing map which consists of 2 × 4 units. Right: Probability density model associated with the SOM. The density model attempts to describe the distribution of the training data. The density function is not continuous due to the winner-take-all training rule of the SOM.

Learning Scene and Object Analysis

Researchers: Jouko Lampinen, Timo Kostiainen, Ilkka Kalliomäki, Toni Tamminen, and Aki Vehtari

The project is funded by TEKES and participating enterprises in the USIX technology programme. The project started in June 2000 and is scheduled for three years.

The goal of the project is to develop an object recognition and scene analysis system that can locate and recognize the objects in the scene and analyze the 3D structure of the objects and the scene. The approach we study is based on combining elements from view-based and model-based methods using full Bayesian inference. The objects are defined by prior models that are learned from example images. The 3D representation is based on eigen shapes, so that a linear combination of the base shapes produces the perceived shape. The view-based approach is used in matching the perceived image and the image due to the assumed 3D structure.

We have developed a distortion-tolerant feature matching method based on a subspace of Gabor filter responses. The object is defined as a set of locations, with associated Gabor features, and a prior model that defines the variations of the feature locations. The eigen-shape model corresponds to determining the covariance matrix for the feature locations, which is learned in a bootstrap fashion by matching a large number of images using simpler prior models.

We have constructed efficient MCMC samplers for drawing samples from the posterior distributions of the matching locations, using mainly Gibbs and Metropolis sampling.
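
As a rough illustration of Metropolis sampling for a matching location, the sketch below runs a random-walk Metropolis sampler on a toy two-dimensional posterior. The target density `toy_log_post`, the step size, and all names are hypothetical stand-ins; the project's actual posterior combines Gabor feature likelihoods with a shape prior and is not reproduced here.

```python
import math
import random

def metropolis(log_post, start, step, n_samples, seed=0):
    """Random-walk Metropolis sampler with an isotropic Gaussian proposal."""
    rng = random.Random(seed)
    current = list(start)
    current_lp = log_post(current)
    samples = []
    for _ in range(n_samples):
        proposal = [c + rng.gauss(0.0, step) for c in current]
        proposal_lp = log_post(proposal)
        # Accept with probability min(1, p(proposal) / p(current)).
        if math.log(1.0 - rng.random()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
        samples.append(tuple(current))
    return samples

def toy_log_post(loc):
    """Hypothetical unnormalized log posterior of a feature location,
    peaked at (3, -1) with unit variance in both coordinates."""
    x, y = loc
    return -0.5 * ((x - 3.0) ** 2 + (y + 1.0) ** 2)

samples = metropolis(toy_log_post, start=(0.0, 0.0), step=1.0, n_samples=5000)
kept = samples[1000:]  # discard the first draws as burn-in
mean_x = sum(s[0] for s in kept) / len(kept)
mean_y = sum(s[1] for s in kept) / len(kept)
```

The spread of the retained samples gives a direct picture of how precisely the location is determined, which is the kind of information a single best-match estimate does not provide.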

Figure 5 shows an example of feature location matching. Figure 6 shows an example of inferring the 3D shape of a human head from one image, without any manual assistance in the process.

Figure 5a Figure 5b Figure 5c

Figure 5: Example of detail matching. Local features at the grid points in the left figure are matched to the image in the middle figure, using Gibbs sampling. In the nodes marked with a yellow ring, the probability of finding a matching detail is low. The right figure shows samples from the posterior distribution of the grid node locations, giving an idea of the accuracy of the detail matching.

Figure 6a Figure 6b Figure 6c

Figure 6: Example of inferring the 3D shape of a human head. The left figure shows the perceived image. The middle figure shows the posterior median of the locations of the feature grid nodes over the image. The right figure shows a standard head shape morphed according to the distortion of the feature grid to match the estimated shape of the perceived face.

The goal is to recognize 3-D objects in a 2-D scene by inferring the 3-D shape and the texture of the object. There exists no unambiguous solution to the inference problem without additional constraints on the solution space. The external or prior information is represented using models for known objects or classes of objects, represented in terms of prior probabilities for different configurations of the models, and learnt from example images. Figure 7 shows an example of a shape model for human faces. The shape is represented as a linear combination of eigen shapes, learnt from a set of manually cropped training images of human faces.
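
The linear shape representation can be sketched as follows. This minimal example assumes the shape is stored as a flat list of landmark coordinates and that the eigen shapes are added to a mean shape; the mean shape, the function name, and the toy numbers are illustrative assumptions, not the project's actual model.

```python
def synthesize_shape(mean_shape, eigen_shapes, coefficients):
    """Return mean_shape + sum_k coefficients[k] * eigen_shapes[k]."""
    shape = list(mean_shape)
    for coeff, basis in zip(coefficients, eigen_shapes):
        for i, value in enumerate(basis):
            shape[i] += coeff * value
    return shape

# Hypothetical toy data: two landmarks stored as [x1, y1, x2, y2].
mean_shape = [0.0, 0.0, 1.0, 0.0]
eigen_shapes = [
    [0.0, 1.0, 0.0, 1.0],   # moves both landmarks upwards
    [-1.0, 0.0, 1.0, 0.0],  # moves the landmarks apart horizontally
]
shape = synthesize_shape(mean_shape, eigen_shapes, [0.5, 0.25])
```

Under this representation, a prior over shapes reduces to a prior over the few coefficients, which keeps the inference problem low-dimensional.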

Figure 7

Figure 7: Leading eigen-shapes of faces, learnt from a set of training images. The face on the left has been morphed according to the eigen shapes, in the positive direction (upper row) and the negative direction (lower row). It can be seen that components 2 and 3 are related to rotations of the head, while components 1, 4, and 5 are shape-related.

Figure 8

Figure 8: Example of estimation of shape, texture and illumination, and synthesis of a novel view of the object. a) Target image. b) Estimated object shape. c) Illumination-corrected estimate of the texture of the object. d) Estimate of object shape and the texture. e) Image with novel view and illumination.

Prediction of Steel Jominy Curves

Researchers: Jouko Lampinen, Paula Litkey, Laboratory of Computational Engineering, HUT
Jukka Laine, Laboratory of Metallurgy, HUT

The project is done in co-operation with the Laboratory of Metallurgy, within a TEKES project that studies the possibilities of using neural networks in steel manufacturing and casting. The project is an example of an application-oriented modelling task in an industrial environment, but it also requires methodological research related to assessing the reliability of the models and scaling the modelling methods to real-world problems.

The steel Jominy curve represents the hardenability of steel. It is important that a steel plant can provide a reliable measure of the steel's hardenability to a client who is manufacturing demanding products, e.g. safety-critical parts for automobiles.

Normally, the Jominy curve is determined by quenching a steel sample and measuring the hardness at several points along the specimen, which is an expensive test. With a reliable model for Jominy curves, the physical tests (approximately 1500 per year) can be avoided and substantial savings can be achieved. The model can also be used to control the alloying of the steel during manufacturing and in the development of new steel grades. An ideal model should be applicable to a wide range of steel grades with different chemical analyses and warn the user if the chemical composition is outside the range of the model.

The goal of the project is to develop usable neural network models for the steel industry using large sets of real-life process data. In addition to obtaining good models, we are examining ways to ensure that the model can be updated easily with new data.

There are nonlinear interactions between different chemical elements, which makes it impossible to determine the complexity of the data only from the variation of the input values. The use of neural networks makes it relatively easy to model the relationship between the chemical composition and the shape of the measured Jominy curve, which is quite a difficult task for, e.g., a regression model.

Figure 9

Figure 9: Example of Jominy curve estimates from a neural network model. The curves shown are selected randomly from the test set used to evaluate the model performance.

Optimizing the Web Cache

Researchers: Timo Koskela, Jukka Heikkonen, and Kimmo Kaski

Recently we studied the quality of service (QoS) of Internet services from the user's point of view. Currently the WWW is the most important service for the end users, generating most of the traffic volume in the Internet. Web caching is a technique where Web objects requested by clients are stored in a cache which is located near the clients. Subsequent requests for the same object are then served from the cache, improving the response time for the end users, reducing the overall network traffic, and reducing the load on the server.

Figure 10

Figure 10: Proxy cache fetches and stores the objects requested by the clients.

Figure 10 shows how the requests from the clients are routed through the proxy, which fetches the objects and stores them in the cache. Since the cache's storage is limited, an important problem in optimizing the cache's operation is deciding which strategy to use for replacing cached objects. Commonly, heuristic rules are used to decide which objects to replace. Our proposed model predicts the popularity of each object by using syntactic features collected from the HTTP responses and from the HTML structure of the document. The cache's operation can then be optimized by using the predicted object popularities.

In a case study, about 50000 HTML documents were classified according to their popularity by using linear and nonlinear models. The results showed that the linear model could not find a correlation between the features and document popularity. The nonlinear model gave better results, yielding mean classification percentages of 64 and 74 for the documents to be stored in or removed from the Web cache, respectively.

The replacement strategy, as well as the prefetch and refresh strategies of the cache, can be devised to use the predicted object popularities. For instance, a replacement strategy can first replace the largest objects that have the lowest popularity. This will free space for several smaller objects, increase the hit rate of the cache, and decrease the outbound traffic volume. Similarly, a refresh strategy can check the validity of only the most popular objects, and fetch in advance those objects that have changed. This will minimize the traffic generated by the validation requests and improve the response time for the end users.
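
One possible reading of such a replacement strategy is sketched below: objects predicted to be unpopular are evicted first, largest first, and popular objects are touched only as a last resort. The tuple layout, the popularity threshold, and the function name are hypothetical choices made for this illustration.

```python
def choose_evictions(objects, bytes_needed, popularity_threshold=0.5):
    """Select URLs to evict until at least bytes_needed bytes are freed.

    objects is a list of (url, size_bytes, predicted_popularity) tuples,
    where predicted_popularity is a score between 0 and 1.
    """
    unpopular = [o for o in objects if o[2] < popularity_threshold]
    popular = [o for o in objects if o[2] >= popularity_threshold]
    # Largest unpopular objects first; popular objects only as a last
    # resort, starting from the least popular one.
    candidates = (sorted(unpopular, key=lambda o: -o[1])
                  + sorted(popular, key=lambda o: o[2]))
    evicted, freed = [], 0
    for url, size, _ in candidates:
        if freed >= bytes_needed:
            break
        evicted.append(url)
        freed += size
    return evicted

# Hypothetical cache contents.
cache = [
    ("a.html", 5000, 0.9),
    ("b.html", 40000, 0.1),
    ("c.html", 1000, 0.2),
    ("d.html", 20000, 0.3),
]
to_evict = choose_evictions(cache, bytes_needed=50000)
```

In this toy cache the two large, unpopular documents are removed, while the small unpopular one and the popular one stay cached.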

The On-line Adaptive Brain-Computer Interface

Researchers: Mikko Sams, Jukka Heikkonen, Tommi Nykopp, Janne Lehtonen, Laura Laitinen and Mikko Viinikainen

Brain-computer interfaces (BCIs) are intended to enable both severely motor-disabled and healthy people to operate electrical devices and applications using brain signals. Our approach is based on an artificial neural network that recognizes and classifies different brain activation patterns associated with carefully selected mental tasks. In this way we aim to develop a robust classifier with a short classification time and, most importantly, a low rate of false positives (i.e. wrong classifications). Figure 11 demonstrates a BCI in use.

Our group is especially interested in the neurophysiological basis of BCIs. We believe that before the signals can be classified they need to be fully understood. We study the signals, e.g., using time-frequency representations (TFRs) and pick out important features from them. Figure 12 shows an example of a TFR. We are especially interested in the activation of the motor cortex. The user controls the BCI either by moving his hands or by imagining doing so.
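
A time-frequency representation can be computed in several ways. The sketch below assumes a short-time Fourier transform with a Hann window, which is one common choice; the text does not state which method the group uses, and the function name, the parameters, and the synthetic test signal are illustrative assumptions.

```python
import cmath
import math

def spectrogram(signal, sample_rate, window_size, hop):
    """Return (times, freqs, power); power[t][f] is the squared magnitude
    of the Hann-windowed DFT of the frame centred at times[t]."""
    window = [0.5 - 0.5 * math.cos(2.0 * math.pi * n / window_size)
              for n in range(window_size)]
    n_bins = window_size // 2 + 1
    times, power = [], []
    for start in range(0, len(signal) - window_size + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(window_size)]
        row = []
        for k in range(n_bins):
            coeff = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / window_size)
                        for n in range(window_size))
            row.append(abs(coeff) ** 2)
        times.append((start + window_size / 2.0) / sample_rate)
        power.append(row)
    freqs = [k * sample_rate / window_size for k in range(n_bins)]
    return times, freqs, power

# Synthetic stand-in for a sensor signal: a 20 Hz oscillation sampled at 200 Hz.
sample_rate = 200.0
signal = [math.sin(2.0 * math.pi * 20.0 * n / sample_rate) for n in range(400)]
times, freqs, power = spectrogram(signal, sample_rate, window_size=100, hop=50)
# Frequency with the highest power in each time frame.
peak_freqs = [freqs[max(range(len(row)), key=row.__getitem__)] for row in power]
```

Plotting `power` with `times` on the x-axis and `freqs` on the y-axis gives the kind of picture shown in Figure 12. The direct DFT used here is slow and only meant to keep the example dependency-free.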

Currently we are developing a BCI that measures the signals produced in the brain with magnetoencephalography (MEG). Most BCI groups measure the electric activity of the brain using electroencephalography (EEG). MEG signals are more localised than EEG signals and thus more easily classified. In the near future we will extend our research on BCIs to simultaneous EEG and MEG recordings.

The project has gained some media coverage. In 2002 we appeared twice on television: on 5th May in YLE Teema's news and on 3rd December in FST's current affairs program OBS.

The project is part of the Academy of Finland's Research Programme on Proactive Computing from November 2002 until the end of 2005.

Figure 11

Figure 11: The user is wearing an EEG cap. By thinking about left and right hand movement, the user controls the virtual keyboard with her brain activity.

Figure 12

Figure 12: TFR of an MEG sensor over the motor cortex. The activation of the brain is plotted with time on the x-axis and frequency on the y-axis. The colour scale represents the power of the activation. The subject began to move his right finger at time zero. Strong activation in the 10-30 Hz range can be detected after the movement has ended.