# Learning Scene and Object Analysis

Researchers: Jouko Lampinen, Timo Kostiainen, Ilkka Kalliomäki, Toni Tamminen, and Aki Vehtari

The project is funded by TEKES and participating enterprises in the USIX technology programme. The project started in June 2000 and is scheduled to run for three years.

The goal of the project is to develop an object recognition and scene analysis system that can locate and recognize the objects in a scene and analyze the 3D structure of both the objects and the scene. Our approach combines elements of view-based and model-based methods using full Bayesian inference. Objects are defined by prior models that are learned from example images. The 3D representation is based on eigen shapes: the perceived shape is produced as a linear combination of the base shapes. A view-based approach is used to match the perceived image against the image produced by the assumed 3D structure.
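To make the eigen-shape representation concrete, the following is a minimal NumPy sketch that builds a shape as a linear combination of base shapes. It assumes a shape is stored as an (N, 3) array of 3D vertex positions and that a mean shape is added to the weighted sum. The text only states that a linear combination of base shapes is used, so the array layout, the mean term, and the function name `synthesize_shape` are hypothetical.

```python
import numpy as np


def synthesize_shape(mean_shape, eigen_shapes, coeffs):
    """Return mean_shape + sum_k coeffs[k] * eigen_shapes[k].

    Hypothetical layout (an assumption, not from the project):
      mean_shape:   (N, 3) array of 3D vertex positions
      eigen_shapes: (K, N, 3) array of base-shape displacements
      coeffs:       (K,) array of shape coefficients
    """
    mean_shape = np.asarray(mean_shape, dtype=float)
    eigen_shapes = np.asarray(eigen_shapes, dtype=float)
    coeffs = np.asarray(coeffs, dtype=float)
    # tensordot contracts the K axis: (K,) x (K, N, 3) -> (N, 3)
    return mean_shape + np.tensordot(coeffs, eigen_shapes, axes=1)


# Usage: two vertices, two base shapes that each move one vertex.
mean = np.zeros((2, 3))
eig = np.array([[[1.0, 0, 0], [0, 0, 0]],
                [[0, 0, 0], [0, 1.0, 0]]])
shape = synthesize_shape(mean, eig, [2.0, -1.0])
```

Because the base shapes are fixed once learned, inferring a shape in this representation reduces to inferring the K coefficients.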

#### Feature Matching

We have developed a distortion-tolerant feature matching method based on a subspace of Gabor filter responses. An object is defined as a set of locations with associated Gabor features, together with a prior model that defines how the feature locations vary. In this setting, the eigen-shape model corresponds to determining the covariance matrix of the feature locations; this matrix is learned in a bootstrap fashion by matching a large number of images using simpler prior models.
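The sketch below illustrates what "a location with associated Gabor features" can look like: the magnitudes of a small bank of complex Gabor filters evaluated at one image point (often called a Gabor jet), compared by a normalized dot product. The filter-bank parameters (two wavelengths, four orientations, a 15×15 kernel, `sigma = 0.5 * wavelength`) and the names `gabor_kernel`, `gabor_jet`, and `jet_similarity` are assumptions for illustration; the project's actual filters and its subspace construction are not specified in the text.

```python
import numpy as np


def gabor_kernel(size, wavelength, theta, sigma):
    """Complex Gabor kernel of shape (size, size); size should be odd."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    # Rotate coordinates so the carrier wave runs along direction theta.
    x_rot = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x**2 + y**2) / (2.0 * sigma**2))
    carrier = np.exp(1j * 2.0 * np.pi * x_rot / wavelength)
    return envelope * carrier


def gabor_jet(image, row, col, wavelengths=(4.0, 8.0), n_orient=4, size=15):
    """Magnitudes of Gabor responses at (row, col) for a small, assumed filter bank."""
    half = size // 2
    patch = image[row - half:row + half + 1, col - half:col + half + 1]
    if patch.shape != (size, size):
        raise ValueError("location too close to the image border")
    jet = []
    for wl in wavelengths:
        for k in range(n_orient):
            theta = np.pi * k / n_orient
            kern = gabor_kernel(size, wl, theta, sigma=0.5 * wl)
            # Filter response at a single point: sum of patch * conj(kernel).
            jet.append(abs(np.sum(patch * np.conj(kern))))
    return np.array(jet)


def jet_similarity(a, b):
    """Normalized dot product of two jets (1.0 means equal up to scale)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
```

Using response magnitudes rather than complex values makes the features less sensitive to small shifts of the location, which is one common way to obtain some distortion tolerance.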

We have constructed efficient MCMC samplers for drawing samples from the posterior distribution of the matching locations, mainly using Gibbs and Metropolis sampling.
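As a minimal sketch of this kind of sampler, the code below runs single-site random-walk Metropolis updates (Metropolis-within-Gibbs): each sweep proposes a move for one feature point at a time and accepts it with the Metropolis rule. It assumes a Gaussian prior on the flattened feature locations, with a mean and a covariance matrix as described for the feature locations, plus a user-supplied log-likelihood `log_lik` standing in for the image-match score. The function names, the proposal step size, and the form of the likelihood are hypothetical, not the project's actual samplers.

```python
import numpy as np


def log_prior(locs, mean, cov_inv):
    """Gaussian log-prior (up to a constant) on the flattened (K*2,) locations."""
    d = locs.ravel() - mean
    return -0.5 * d @ cov_inv @ d


def metropolis_within_gibbs(log_lik, mean, cov, init, n_iter=2000, step=1.0, seed=0):
    """Sample feature locations of shape (K, 2), updating one point at a time.

    log_lik(locs) -> float is a hypothetical image-match log-likelihood.
    Returns an array of shape (n_iter, K, 2) with the state after each sweep.
    """
    rng = np.random.default_rng(seed)
    cov_inv = np.linalg.inv(cov)
    locs = np.array(init, dtype=float)
    current = log_lik(locs) + log_prior(locs, mean, cov_inv)
    samples = []
    for _ in range(n_iter):
        for k in range(locs.shape[0]):
            proposal = locs.copy()
            # Symmetric random-walk proposal for feature point k only.
            proposal[k] += step * rng.standard_normal(2)
            cand = log_lik(proposal) + log_prior(proposal, mean, cov_inv)
            # Accept with probability min(1, exp(cand - current)).
            if np.log(rng.random()) < cand - current:
                locs, current = proposal, cand
        samples.append(locs.copy())
    return np.array(samples)
```

Updating one point at a time keeps each proposal small, so acceptance stays reasonable even when the number of feature points is large; the cost is stronger correlation between successive samples.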

Figure 1 shows an example of feature location matching. Figure 2 shows an example of inferring the 3D shape of a human head from a single image, without any manual assistance in the process.

#### 3D Object Recognition

The goal is to recognize 3D objects in a 2D scene by inferring the 3D shape and texture of each object. Without additional constraints on the solution space, the inference problem has no unambiguous solution. This external, or prior, information is represented by models of known objects or object classes; the models are expressed as prior probabilities over their possible configurations and are learned from example images. Figure 3 shows an example of a shape model for human faces. The shape is represented as a linear combination of eigen shapes, learned from a set of manually cropped training images of human faces.
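One standard way to learn eigen shapes from training data is principal component analysis: the eigen shapes are the leading eigenvectors of the sample covariance of the training shapes. The minimal sketch below computes them with an SVD and shows how to project a shape onto the learned basis and reconstruct it. It assumes each training shape has already been extracted and aligned into a flattened coordinate vector; that preprocessing, the use of PCA via SVD, and the names `learn_eigen_shapes`, `project`, and `reconstruct` are assumptions, not details given in the text.

```python
import numpy as np


def learn_eigen_shapes(shapes, n_components):
    """PCA of training shapes.

    shapes: (M, D) array, one flattened, pre-aligned shape vector per row (assumed).
    Returns (mean, components, variances); components has shape (n_components, D).
    """
    shapes = np.asarray(shapes, dtype=float)
    mean = shapes.mean(axis=0)
    centered = shapes - mean
    # Rows of vt are orthonormal principal directions, sorted by singular value.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    variances = s**2 / (shapes.shape[0] - 1)
    return mean, vt[:n_components], variances[:n_components]


def project(shape, mean, components):
    """Coefficients b such that mean + b @ components approximates shape."""
    return components @ (np.asarray(shape, dtype=float) - mean)


def reconstruct(coeffs, mean, components):
    """Shape given by the mean plus a linear combination of eigen shapes."""
    return mean + np.asarray(coeffs, dtype=float) @ components
```

The per-component variances returned here are what a Gaussian prior over the shape coefficients would typically use, so the same decomposition yields both the shape basis and a prior over it.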