Bayesian Model Assessment and Selection Using Expected Utilities
Researchers: Aki Vehtari, Jouko Lampinen, Janne Ojanen
The goal of the project is to study theoretically justified and computationally practical methods for Bayesian model assessment, comparison, and selection. A natural way to assess the goodness of a model is to estimate its future predictive capability by estimating expected utilities, that is, the relative values of consequences. We synthesize and extend previous work in several ways. We give a unified presentation from the Bayesian viewpoint, emphasizing the assumptions made, and propose practical methods for obtaining the distributions of the expected utility estimates.
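For concreteness, one common way to write the quantity being estimated is sketched below. The notation is an assumption introduced here for illustration and does not come from this page: u is a utility function, D the training data of size n, M the model, and D^{(\setminus i)} the data with the i-th observation left out.

```latex
\bar{u} = \mathrm{E}_{(x^{(n+1)},\, y^{(n+1)})}\!\left[ u\!\left(y^{(n+1)}, x^{(n+1)}, D, M\right) \right]
\qquad
\bar{u}_{\mathrm{LOO}} = \frac{1}{n} \sum_{i=1}^{n} u\!\left(y^{(i)}, x^{(i)}, D^{(\setminus i)}, M\right)
```

The first expression is the expected utility for a future observation. The second is a leave-one-out cross-validation estimate, which approximates it by reusing the observed data.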
The reliability of the estimated expected utility can be assessed by estimating its distribution. The distributions of the expected utilities can also be used to compare models, for example, by computing the probability of one model having a better expected utility than some other model. The expected utilities take into account how the model predictions are going to be used and thus may reveal that even the best model selected may be inadequate or not practically better than the previously used models.
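The comparison described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the project's implementation. It assumes that pointwise utilities (here 0/1 classification correctness) have already been computed on the same held-out points for both models, and it uses the Bayesian bootstrap as one possible way to obtain a distribution for the expected utility. The function names and the synthetic data are hypothetical.

```python
import numpy as np


def bayesian_bootstrap_means(u, n_draws=4000, rng=None):
    """Draw samples from an approximate distribution of the expected utility.

    u       : 1-D array of pointwise utilities (e.g. from cross-validation).
    n_draws : number of Bayesian bootstrap replicates.
    rng     : seed or numpy Generator (assumption: any value accepted by
              numpy.random.default_rng).
    """
    rng = np.random.default_rng(rng)
    u = np.asarray(u, dtype=float)
    # Each replicate reweights the observations with Dirichlet(1, ..., 1) weights.
    weights = rng.dirichlet(np.ones(u.shape[0]), size=n_draws)
    return weights @ u


def prob_better(u_a, u_b, n_draws=4000, rng=0):
    """Estimate P(expected utility of model A > expected utility of model B).

    Assumes u_a and u_b are paired: element i of each array refers to the
    same held-out data point.
    """
    diff = np.asarray(u_a, dtype=float) - np.asarray(u_b, dtype=float)
    draws = bayesian_bootstrap_means(diff, n_draws=n_draws, rng=rng)
    return float(np.mean(draws > 0))


if __name__ == "__main__":
    # Synthetic stand-in data: 1 = pixel classified correctly, 0 = incorrectly.
    demo_rng = np.random.default_rng(42)
    correct_model_1 = demo_rng.binomial(1, 0.90, size=500)
    correct_model_2 = demo_rng.binomial(1, 0.90, size=500)

    acc_draws = bayesian_bootstrap_means(correct_model_1, rng=1)
    low, high = np.percentile(acc_draws, [5, 95])
    print(f"Model 1 expected accuracy, 90% interval: [{low:.3f}, {high:.3f}]")
    print("P(Model 1 better than Model 2):",
          prob_better(correct_model_1, correct_model_2))
```

Working with paired differences rather than two independent distributions keeps the comparison sensitive to cases where both models err on the same points. A probability near 0.5 suggests that neither model is clearly better in terms of the chosen utility.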
The developed methods have already been applied with great success to model assessment and selection in a real-world concrete quality modeling case in cooperation with Lohja Rudus. Using the models, and the conclusions the concrete expert draws from them, it is possible, for example, to achieve 5-15% savings in material costs in a concrete factory.
Below is another example from a case project in which one subgoal was to classify the pixels of forest scene images into tree and non-tree classes. The main problem in the task was the large variance within the classes. The appearance of tree trunks varies in color and texture due to varying lighting conditions, epiphytes, and species-dependent variations. In the non-tree class the diversity is much larger, containing, for example, terrain, tree branches, and sky. This diversity makes it difficult to choose the optimal features for the classification. The figure shows a comparison of the expected classification accuracies for two models. Model 1 uses 84 texture and statistical features extracted from the images. Model 2 uses only 18 features, selected from the set of all 84 features used by Model 1 with the methods developed in the project. Although Model 2 is simpler, it has practically the same expected accuracy as Model 1.
An example of Bayesian model assessment and selection using expected utilities in a forest scene classification problem. The figure shows the distributions of the expected classification accuracies for two different models that classify image pixels into tree trunks and background. Each distribution describes how likely different values of the expected utility are.
See also
Cross-validation vs. DIC using stack loss data
References
- Jarno Vanhatalo and Aki Vehtari (2009). Discussion to 'Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations' by Håvard Rue, Sara Martino and Nicolas Chopin. Journal of the Royal Statistical Society, Series B (Statistical Methodology), 71(2):383. (Available online 6 April 2009)
- Aki Vehtari (2007). Discussion to 'Some Aspects of Bayesian Model Selection for Prediction' by Chakrabarti, A., and Ghosh, J. K. In J. M. Bernardo, et al., editors, Bayesian Statistics 8, pp. 83-84. Oxford University Press.
- Simo Särkkä, Aki Vehtari, and Jouko Lampinen (2004). Time series prediction by Kalman smoother with cross-validated noise density. In IJCNN'2004: Proceedings of the 2004 International Joint Conference on Neural Networks, Budapest, July 2004. The Winner of Time Series Prediction Competition - The CATS Benchmark.
- Vehtari, A. and Lampinen, J. (2003). Expected utility estimation via cross-validation. In J. M. Bernardo, et al., editors, Bayesian Statistics 7, pp. 701-710. Oxford University Press.
- Vehtari, A. and Lampinen, J. (2002). Bayesian model assessment and comparison using cross-validation predictive densities. Neural Computation, 14(10):2439-2468.
- Vehtari, A. (2002). Discussion to 'Bayesian measures of model complexity and fit' by Spiegelhalter, D. J., Best, N. G., Carlin, B. P., and van der Linde, A. Journal of the Royal Statistical Society, Series B (Statistical Methodology), 64(4):620.

