Machine Vision

Learning Scene and Object Analysis

Researchers: Jouko Lampinen, Timo Kostiainen, Ilkka Kalliomäki,
Toni Tamminen, and Aki Vehtari

The project is funded by Tekes and participating enterprises in the USIX technology programme. The project started in June 2000 and is scheduled to run for three years.

The goal of the project is to develop an object recognition and scene analysis system that can locate and recognize the objects in a scene and analyze the 3D structure of both the objects and the scene. The approach we study combines elements of view-based and model-based methods using full Bayesian inference. The objects are defined by prior models that are learned from example images. The 3D representation is based on eigen-shapes, so that a linear combination of the base shapes produces the perceived shape. A view-based approach is used to match the perceived image with the image implied by the assumed 3D structure.
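The linear-combination idea behind the eigen-shape representation can be illustrated with a minimal sketch. It assumes that each base shape is stored as a flattened list of 3D coordinates; the function name `combine_base_shapes` and the toy base shapes are hypothetical and are not taken from the project's software.

```python
from typing import List, Sequence


def combine_base_shapes(base_shapes: Sequence[Sequence[float]],
                        coefficients: Sequence[float]) -> List[float]:
    """Return sum_k coefficients[k] * base_shapes[k] as a flattened shape vector."""
    if not base_shapes:
        raise ValueError("at least one base shape is required")
    if len(base_shapes) != len(coefficients):
        raise ValueError("one coefficient per base shape is required")
    length = len(base_shapes[0])
    shape = [0.0] * length
    for base, coeff in zip(base_shapes, coefficients):
        if len(base) != length:
            raise ValueError("all base shapes must have the same length")
        for i, value in enumerate(base):
            shape[i] += coeff * value
    return shape


# Two hypothetical base shapes, each holding two 3D points as (x, y, z, x, y, z).
base_shapes = [
    [1.0, 0.0, 0.0, -1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0, 0.0, 1.0],
]
shape = combine_base_shapes(base_shapes, [0.5, 2.0])
```

In a sketch like this, inferring a shape amounts to inferring the coefficient vector, which is where a learned prior over the coefficients would enter.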

Feature Matching

We have developed a distortion-tolerant feature matching method based on a subspace of Gabor filter responses. The object is defined as a set of locations with associated Gabor features, together with a prior model that defines the variations of the feature locations. The eigen-shape model corresponds to determining the covariance matrix of the feature locations, which is learned in a bootstrap fashion by matching a large number of images using simpler prior models.

We have constructed efficient Markov chain Monte Carlo (MCMC) samplers for drawing samples from the posterior distributions of the matching locations, using mainly Gibbs and Metropolis sampling.
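As a rough illustration of Metropolis sampling for a single feature location, the sketch below draws samples from a toy two-dimensional posterior. Everything in it is an assumption made for the example: the posterior is the product of an isotropic Gaussian location prior and an isotropic Gaussian stand-in for the feature-match likelihood, and the names `metropolis_sample` and `toy_log_posterior` are hypothetical. The project's samplers are not described in enough detail here to reproduce them.

```python
import math
import random
from typing import Callable, List, Tuple

Point = Tuple[float, float]

# Hypothetical toy model: prior on the feature location and a match-likelihood peak.
PRIOR_MEAN, PRIOR_SD = (10.0, 20.0), 5.0
MATCH_PEAK, MATCH_SD = (12.0, 18.0), 2.0


def toy_log_posterior(p: Point) -> float:
    """Unnormalized log posterior: Gaussian prior times Gaussian match likelihood."""
    log_prior = -((p[0] - PRIOR_MEAN[0]) ** 2 + (p[1] - PRIOR_MEAN[1]) ** 2) / (2 * PRIOR_SD ** 2)
    log_lik = -((p[0] - MATCH_PEAK[0]) ** 2 + (p[1] - MATCH_PEAK[1]) ** 2) / (2 * MATCH_SD ** 2)
    return log_prior + log_lik


def metropolis_sample(log_posterior: Callable[[Point], float],
                      start: Point, step: float, n_samples: int,
                      rng: random.Random) -> List[Point]:
    """Random-walk Metropolis with a symmetric Gaussian proposal."""
    samples: List[Point] = []
    current = start
    current_lp = log_posterior(current)
    for _ in range(n_samples):
        proposal = (current[0] + rng.gauss(0.0, step),
                    current[1] + rng.gauss(0.0, step))
        proposal_lp = log_posterior(proposal)
        log_ratio = proposal_lp - current_lp
        # Accept with probability min(1, posterior ratio).
        if log_ratio >= 0.0 or rng.random() < math.exp(log_ratio):
            current, current_lp = proposal, proposal_lp
        samples.append(current)
    return samples


samples = metropolis_sample(toy_log_posterior, PRIOR_MEAN, 2.0, 5000, random.Random(0))
```

The spread of such samples gives an indication of how well a location is determined, in the spirit of the sample clouds described for Figure 7.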

Figure 7 shows an example of feature location matching. Figure 8 shows an example of inferring the 3D shape of a human head from a single image, without any manual assistance in the process.

Figure 7a Figure 7b Figure 7c

Figure 7: Example of detail matching. Local features at the grid points in the left figure are matched to the image in the middle figure. At the nodes marked with a yellow ring, the probability of finding a matching detail is low. The right figure shows samples from the posterior distribution of the grid node locations, giving an idea of the accuracy of the detail matching.

Figure 8a Figure 8b Figure 8c

Figure 8: Example of inferring the 3D shape of a human head. The left figure shows the perceived image. The middle figure shows the posterior median of the locations of the feature grid nodes over the image. The right image shows the estimated 3D head.

3D Object Recognition

The goal is to recognize 3-D objects in a 2-D scene by inferring the 3-D shape and the texture of the object. There is never a single unambiguous solution to such a problem without additional constraints on the solution space. We use Bayesian inference to combine external information with information obtained from the 2-D image of the scene, such as edge locations, regions of constant texture and local features. Bayesian probabilistic inference makes it possible to combine these different types of information. The external, or prior, information is represented using models for known objects or classes of objects. The object models are defined in terms of prior probabilities for different configurations of the models. These probabilities can be learned from example objects or defined manually. Markov chain Monte Carlo (MCMC) techniques are used to infer the probability distribution of possible solutions.

Figure 9

Figure 9: Example of matching a wireframe model of a rectangular solid to an office scene using edge information in the image. MCMC methods yield a number of possible solutions, and a probability value can be computed for each of them. Left: the two most likely matches. Middle and right: possible but unlikely matches.

Figure 10

Figure 10: Estimation of shape and texture. a) Target image. b) Estimated object shape. c) Illumination-corrected estimate of texture of the object. d) Estimate of object shape and the texture. e) New view and illumination.

Combining Statistical and Model-Based Methods in Computer Vision

Researchers: Jukka Heikkonen and Jouni Juujärvi

The goal of the project is to develop efficient methods for combining statistical and model-based methods in computer vision applications. The central theme of the project is to separate, in a given computer vision application, the unchangeable part (e.g. the classifier) from the changeable part (e.g. the image data). The changeable part should then be processed in the best possible way to maintain the performance of the unchangeable part.

The research can be divided into the following two main themes:

1) Normalization of images in such a way that an already trained classifier can work correctly with possibly changed image data. For instance, in computer-vision-based wood quality control, the surface color of a wood board can change unexpectedly due to the drying process, resulting in errors in quality classification. The effect of this type of change can be diminished by efficient image normalization.

2) Combining different information sources. One interesting research issue in this category is the combination of different classifiers (a committee of classifiers). We also approach the problem of creating a new classifier from already trained ones, none of which is exactly optimal for coping with the changes that occur during on-line operation of the system, whereas the new classifier, created without additional training, should perform better.
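One simple way to form a committee of classifiers is to average the class probabilities produced by the individual members. The sketch below shows only this generic averaging rule. It is not necessarily the combination scheme studied in the project, and the name `committee_predict` and the example probabilities are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def committee_predict(member_probs: Sequence[Sequence[float]],
                      weights: Optional[Sequence[float]] = None
                      ) -> Tuple[int, List[float]]:
    """Combine per-member class probabilities by a weighted average.

    Returns the index of the winning class and the combined probabilities.
    """
    if not member_probs:
        raise ValueError("at least one committee member is required")
    n_members = len(member_probs)
    n_classes = len(member_probs[0])
    if weights is None:
        weights = [1.0] * n_members  # equal votes by default
    if len(weights) != n_members:
        raise ValueError("one weight per committee member is required")
    total = sum(weights)
    if total <= 0.0:
        raise ValueError("weights must sum to a positive value")
    combined = [0.0] * n_classes
    for probs, weight in zip(member_probs, weights):
        if len(probs) != n_classes:
            raise ValueError("all members must report the same classes")
        for c, p in enumerate(probs):
            combined[c] += weight * p / total
    best = max(range(n_classes), key=lambda c: combined[c])
    return best, combined


# Three hypothetical already-trained classifiers scoring one sample (two classes).
best_class, combined = committee_predict([[0.6, 0.4], [0.2, 0.8], [0.3, 0.7]])
```

Because only the weights change, a committee of this kind can be adjusted without retraining its members, which fits the requirement of creating a new classifier without additional training.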

The funding for the project is provided by Tekes for the years 2001-2002.

Figure 11
Figure 11: Different types of image normalizations.
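As an illustration of the normalization theme, the sketch below rescales the values of one image channel so that their mean and standard deviation match reference statistics, for instance those of the data on which the classifier was trained. This is only one simple, generic normalization, chosen here as an assumption. The function name `match_mean_and_std` and the numbers are hypothetical.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def match_mean_and_std(values: Sequence[float],
                       ref_mean: float, ref_std: float) -> List[float]:
    """Linearly rescale values so their mean and std match the reference."""
    current_mean = mean(values)
    current_std = pstdev(values)
    if current_std == 0.0:
        # A constant channel carries no contrast; map it to the reference mean.
        return [ref_mean] * len(values)
    scale = ref_std / current_std
    return [ref_mean + (v - current_mean) * scale for v in values]


# Hypothetical channel values whose brightness has drifted after drying.
drifted = [90.0, 100.0, 110.0, 120.0]
normalized = match_mean_and_std(drifted, ref_mean=128.0, ref_std=20.0)
```

After such a step the trained classifier sees inputs with the statistics it was trained on, so the classifier itself does not need to be changed.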

Multiple View Geometry in Computer Vision

Researchers: Sami Brandt and Jukka Heikkonen

Viewing geometry imposes constraints on images taken of an arbitrary object or scene. The underlying geometrical constraints can be represented by multiple view tensors: the fundamental matrix, the trifocal tensor and the quadrifocal tensor. In short, these tensors contain all the projective information available from two, three and four images, respectively; that is, the camera projection matrices can be reconstructed from them up to a projective transformation. Moreover, they can be used directly in computing a projective reconstruction from given correspondences in the images.
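For two views, the constraint carried by the fundamental matrix F is the standard epipolar constraint x'^T F x = 0 for corresponding homogeneous image points x and x'. The sketch below evaluates the epipolar line F x and the residual x'^T F x. The function names are hypothetical, and the example matrix is an assumption that corresponds to an idealized camera pair related by a pure horizontal translation, in which corresponding points lie on the same image row.

```python
from typing import Sequence, Tuple

Matrix3 = Sequence[Sequence[float]]
Point2 = Tuple[float, float]


def epipolar_line(F: Matrix3, point: Point2) -> Tuple[float, float, float]:
    """Return the line (a, b, c) = F x in the second image, with x = (x, y, 1)."""
    x = (point[0], point[1], 1.0)
    a, b, c = (sum(F[r][k] * x[k] for k in range(3)) for r in range(3))
    return a, b, c


def epipolar_residual(F: Matrix3, point: Point2, point_prime: Point2) -> float:
    """Return x'^T F x, which is zero for an exact correspondence."""
    a, b, c = epipolar_line(F, point)
    return a * point_prime[0] + b * point_prime[1] + c


# Fundamental matrix of an idealized, purely horizontally translated camera pair.
F_example = [
    [0.0, 0.0, 0.0],
    [0.0, 0.0, -1.0],
    [0.0, 1.0, 0.0],
]
line = epipolar_line(F_example, (3.0, 5.0))                      # the row y' = 5
residual = epipolar_residual(F_example, (3.0, 5.0), (7.0, 5.0))  # same row
```

With noisy or mismatched points the residual is not exactly zero, which is why the estimation of F from correspondences calls for robust methods.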

In estimating the geometrical constraints between multiple views, robust methods are needed to deal with mismatches. We have proposed a robust estimator that is optimal in the sense of consistency, under assumptions similar to those of the ordinary maximum likelihood estimator. So far, we have applied the estimator to two-view geometry and its uncertainty estimation with both affine and projective camera models, though the method could be used in any problem where robustness is needed. The results obtained are promising and agree well with the theoretical considerations.

Image matching is one problem where the geometric constraints can be used. We have proposed a wavelet-based method for obtaining reliable point matches for tracking purposes. The novelty lies in understanding and utilizing the uncertainty of the fundamental matrix. Even though the fundamental matrix represents a point-line correspondence between two views, its covariance gives a probability distribution for point-point matches since, to some extent, it learns the disparity content of the scene (see Figure 12).

Figure 12a Figure 12b

Figure 12: a) A point in the left corner of the mouth is selected in the left image of a stereo image pair. b) The estimated epipolar line (dashed) and the probability distribution contours obtained from the covariance of the fundamental matrix. The probability distribution has its maximum at the most probable location of the correspondence. (Original image copyrights belong to INRIA-Syntim.)

Computer Vision for Electron Tomography

Researchers: Sami Brandt and Jukka Heikkonen

In structural biology, electron tomography is used to reconstruct three-dimensional objects such as macromolecules, viruses, and cellular organelles in order to learn their three-dimensional structures and properties. The reconstruction is made from a set of transmission electron microscope (TEM) images, which may be obtained either by tilting the specimen stage by small angular increments (single axis tilting) or by tilting it to a fixed angle and rotating it by constant increments in the specimen plane (conical tilting).

In order to perform the 3D reconstruction successfully in electron tomography, the transmission electron microscope images have to be accurately aligned, or registered. So far, the problem has been solved either manually, by indicating the corresponding fiducial markers in the set of images, or automatically, by simple correlation between the images over several rotations and scales. The present solutions, however, share the problem of being inefficient and/or inaccurate.

We have therefore developed two methods in which the registration is automated. The most accurate alignment can be achieved if conventional colloidal gold markers are used. In contrast to manual picking, our method locates the gold beads automatically using recent computer vision techniques. For cases where it is not possible to sprinkle gold particles on the preparation, we have proposed an alternative method based on tracking the high curvature points of the intensity surface of the images. The results show performance almost as good as that obtained with fiducial markers (Figure 13).

Figure 13a Figure 13b

Figure 13: Stereo image pair of a reconstructed mitochondrion where the image series has been aligned by tracking interest points of the image intensity surface.

The Adaptive Brain Interfaces

Researchers: Fabio Babiloni, Jukka Heikkonen, Kimmo Kaski, Tommi Nykopp and Markus Varsta of the Laboratory of Computational Engineering, Helsinki University of Technology, Finland; José Millan and Josep Mourino of the Joint Research Centre of the European Commission, Ispra, Italy; Prof. Maria Marciani of the IRCCS Ospedale di Riabilitazione S. Lucia, Italy; and Fabio Topani of Fase Sistemi, Rome, Italy.

The objective of the Adaptive Brain Interfaces (ABI) project is to use EEG signals as an alternative means of interaction with computers. The ABI project seeks to build individual brain interfaces rather than universal ones valid for everybody. Our approach is based on a mutual learning process whereby the individual user and the ABI are coupled and adapt to each other during a short training period. During training, a neural network learns user-specific EEG patterns describing the mental tasks, while subjects learn to think in such a way that they are better understood by their personal interface. In other words, every user chooses his/her most natural mental tasks to concentrate on (e.g., relaxation, visualisation, music composition, arithmetic, preparation of movements) and also the preferred strategies for undertaking those tasks. The interface learns the mental tasks the user is concentrating on by analysing variations of EEG rhythms over several cortical areas of the brain. By associating each mental task with a suitable application-specific command, e.g. move cursor left, the user is able to communicate with the surrounding world. During the project, the ABI was tested with several different applications, such as selecting letters from a virtual keyboard on a computer screen to write a message, and playing the classical Pacman video game.

Another concern of this project was the robust recognition of EEG patterns outside laboratory settings. During the project, appropriate EEG equipment that is compact, easy to use, and suitable for deployment in natural environments was produced and tested.

The funding for the project is provided by the EU for the years 1998-2001.

Figure 14

Figure 14: The measurement of the EEG signals in the ABI project.