Modelling of Learning and Perception

Centre of Excellence in Computational Complex Systems Research


Learning Scene and Object Analysis

Researchers: Jouko Lampinen, Timo Kostiainen, Ilkka Kalliomäki, Toni Tamminen, and Aki Vehtari

The project is funded by TEKES and participating enterprises in the USIX technology programme. The project started in June 2000 and is scheduled to run for three years.

The goal of the project is to develop an object recognition and scene analysis system that can locate and recognize the objects in a scene and analyze the 3D structure of the objects and the scene. The approach we study combines elements from view-based and model-based methods using full Bayesian inference. The objects are defined by prior models that are learned from example images. The 3D representation is based on eigen shapes, so that a linear combination of the base shapes produces the perceived shape. A view-based approach is used in matching the perceived image against the image predicted by the assumed 3D structure.
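The eigen-shape idea above can be illustrated with a minimal sketch: a shape is the mean shape plus a weighted sum of base (eigen) shapes. All names, dimensions, and data here are hypothetical placeholders, not the project's actual model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shape model: 5 landmark points in 2D, 3 eigen shapes.
n_points = 5
mean_shape = rng.normal(size=(n_points, 2))       # average shape
eigen_shapes = rng.normal(size=(3, n_points, 2))  # base shape deformations


def synthesize_shape(coefficients):
    """Mean shape deformed by a linear combination of eigen shapes."""
    deformation = np.tensordot(coefficients, eigen_shapes, axes=1)
    return mean_shape + deformation


shape = synthesize_shape(np.array([0.5, -1.0, 0.2]))
print(shape.shape)  # (5, 2)
```

With all coefficients zero the synthesized shape reduces to the mean shape; varying one coefficient moves the shape along the corresponding eigen shape.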

Feature Matching

We have developed a distortion-tolerant feature matching method based on a subspace of Gabor filter responses. An object is defined as a set of locations with associated Gabor features, together with a prior model that defines the variations of the feature locations. The eigen-shape model corresponds to determining the covariance matrix of the feature locations, which is learned in bootstrap fashion by matching a large number of images with simpler prior models.
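The following sketch shows the general flavour of Gabor-feature matching: a small filter bank is evaluated at an image location, and candidate locations are compared by the normalised correlation of their response vectors. The kernel parameters and helper names are illustrative assumptions, not the project's implementation.

```python
import numpy as np


def gabor_kernel(size, wavelength, theta, sigma):
    """Real-valued Gabor kernel: a sinusoid windowed by a Gaussian."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)          # rotated carrier axis
    envelope = np.exp(-(x**2 + y**2) / (2 * sigma**2))  # Gaussian window
    return envelope * np.cos(2 * np.pi * xr / wavelength)


def feature_vector(image, row, col, kernels):
    """Filter-bank responses of the patch centred at (row, col)."""
    half = kernels[0].shape[0] // 2
    patch = image[row - half:row + half + 1, col - half:col + half + 1]
    return np.array([np.sum(patch * k) for k in kernels])


def similarity(f1, f2):
    """Normalised correlation of two Gabor feature vectors."""
    return f1 @ f2 / (np.linalg.norm(f1) * np.linalg.norm(f2) + 1e-12)
```

In a matching step, `similarity` between the model's stored feature vector and the vector extracted at a candidate location would serve as the (unnormalised) likelihood of that location.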

We have constructed efficient MCMC samplers for drawing samples from the posterior distributions of the matching locations, using mainly Gibbs and Metropolis sampling.
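A toy Metropolis sampler for a single node location conveys the idea: the posterior combines a Gaussian prior on the location with a feature-match likelihood (here both replaced by simple Gaussian stand-ins, so the numbers below are illustrative only).

```python
import numpy as np

rng = np.random.default_rng(1)


def log_posterior(loc, prior_mean, match_mean):
    """Toy log posterior: Gaussian location prior times a Gaussian
    feature-match likelihood (stand-ins for the real model terms)."""
    log_prior = -0.5 * np.sum((loc - prior_mean) ** 2)          # sd = 1
    log_like = -0.5 * np.sum((loc - match_mean) ** 2) / 0.5**2  # sd = 0.5
    return log_prior + log_like


def metropolis(n_samples, prior_mean, match_mean, step=0.5):
    """Random-walk Metropolis sampling of the node location."""
    loc = prior_mean.copy()
    lp = log_posterior(loc, prior_mean, match_mean)
    samples = []
    for _ in range(n_samples):
        proposal = loc + step * rng.normal(size=2)
        lp_prop = log_posterior(proposal, prior_mean, match_mean)
        if np.log(rng.uniform()) < lp_prop - lp:  # accept/reject
            loc, lp = proposal, lp_prop
        samples.append(loc)
    return np.array(samples)


samples = metropolis(5000, np.zeros(2), np.array([1.0, 1.0]))
print(samples.mean(axis=0))  # ≈ [0.8, 0.8], the precision-weighted mean
```

For this Gaussian toy case the posterior mean is available in closed form (precision-weighted average of the prior and likelihood means), which makes the sampler easy to sanity-check.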

Figure 1 shows an example of feature location matching. Figure 2 shows an example of inferring the 3D shape of a human head from a single image, without any manual assistance in the process.

Figure 1a
Figure 1b
Figure 1c

Figure 1. Example of detail matching. Local features at the grid points in the left figure are matched to the image in the middle figure using Gibbs sampling. In the nodes marked with a yellow ring, the probability of finding a matching detail is low. The right figure shows samples from the posterior distribution of the grid node locations, giving an idea of the accuracy of the detail matching.

Figure 2a
Figure 2b
Figure 2c

Figure 2. Example of inferring the 3D shape of a human head. The left figure shows the perceived image. The middle figure shows the posterior median of the locations of the feature grid nodes over the image. The right figure shows a standard head shape morphed according to the distortion of the feature grid to match the estimated shape of the perceived face.

3D Object Recognition

The goal is to recognize 3-D objects in a 2-D scene by inferring the 3-D shape and texture of the object. There is no unambiguous solution to this inference problem without additional constraints on the solution space. The external or prior information is represented by models for known objects or classes of objects, expressed in terms of prior probabilities for different configurations of the models and learnt from example images. Figure 3 shows an example of a shape model for human faces. The shape is represented as a linear combination of eigen shapes, learnt from a set of manually cropped training images of human faces.
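Learning eigen shapes from training shapes amounts to principal component analysis on the landmark coordinates: subtract the mean shape and take the leading singular vectors. The sketch below uses synthetic stand-in data; the dimensions and names are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical training set: each row is one training shape,
# flattened to a vector of (x, y) landmark coordinates.
n_train, n_points = 50, 10
mean_true = rng.normal(size=2 * n_points)
shapes = mean_true + rng.normal(scale=0.1, size=(n_train, 2 * n_points))


def learn_eigen_shapes(shapes, n_components):
    """PCA on landmark coordinates: mean shape plus leading eigen shapes."""
    mean = shapes.mean(axis=0)
    centred = shapes - mean
    # Rows of vt are the principal directions, ordered by variance explained.
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return mean, vt[:n_components]


mean, components = learn_eigen_shapes(shapes, 5)
print(components.shape)  # (5, 20)
```

A new shape is then encoded by projecting its centred landmark vector onto the components, and reconstructed as the mean plus the weighted component sum, exactly the linear-combination form used in the recognition model.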

Figure 3

Figure 3. Leading eigen shapes of faces, learnt from a set of training images. The face on the left has been morphed along each eigen shape, in the positive direction (upper row) and the negative direction (lower row). It can be seen that components 2 and 3 are related to rotations of the head, while components 1, 4, and 5 are shape-related.

Figure 4

Figure 4. Example of estimation of shape, texture and illumination, and synthesis of a novel view of the object. a) Target image. b) Estimated object shape. c) Illumination-corrected estimate of the texture of the object. d) Estimate of object shape and the texture. e) Image with novel view and illumination.

See also