- Bayesian methods for neural networks - FAQ
- MCMC Preprints (Bristol MCMC Research Group)
- D. MacKay, Bayesian Methods for Adaptive Models
- Radford M. Neal, Bayesian Learning for Neural Networks
- C. Bishop, Neural Networks for Pattern Recognition, Oxford University Press, 1995, Chapter 10.

- Introduction to Gaussian processes.
- Carl Rasmussen's Gaussian Process web site, from which most of this material is taken.
- MacKay's Gaussian processes web page

Carl Edward Rasmussen (1996). **Bayesian Regression using Gaussian Process priors**, presentation at the meeting of the American Statistical Association in Chicago, Aug. 4-8. Slides are available in postscript.

Carl Edward Rasmussen (1996). **Evaluation of Gaussian Processes and other Methods for Non-Linear Regression**, PhD thesis, graduate department of Computer Science, University of Toronto. postscript.

C. K. I. Williams (1996). **Computing with infinite networks**, in M. C. Mozer, M. I. Jordan and T. Petsche, eds., Advances in Neural Information Processing Systems 9, MIT Press, Cambridge, MA. postscript.

C. K. I. Williams (1998). **Prediction with Gaussian Processes: From Linear Regression to Linear Prediction and Beyond**, in Learning and Inference in Graphical Models, M. I. Jordan, ed., Kluwer Academic Press. Also available as technical report NCRG/97/012, Aston University. postscript.

C. K. I. Williams and Carl Edward Rasmussen (1996). **Gaussian Processes for Regression**, in Touretzky, Mozer and Hasselmo, eds., Advances in Neural Information Processing Systems 8, MIT Press. postscript.

Huaiyu Zhu, C. K. I. Williams, Richard Rohwer and Michal Morciniec (1997). **Gaussian Regression and Optimal Finite Dimensional Linear Models**, technical report, NCRG/97/011, Aston University. postscript.

- MacKay, Introduction to Monte Carlo methods, a review paper to appear in the proceedings of an Erice summer school, ed. M. Jordan.
- Markov Chain Monte Carlo in Practice, W.R. Gilks, S. Richardson and D. J. Spiegelhalter, Chapman & Hall, London, 1996.

- Flexible Bayesian Models
- Netlab neural network toolbox
- BUGS Bayesian inference Using Gibbs Sampling
- MacKay's Bigback
- Mark Gibbs's Gaussian Process site and Tpros

- D. MacKay, **Hyperparameters: optimize, or integrate out?** alpha.ps.gz, abstract | **ps mirror, Canada**

  To appear in Neural Computation under the title **Comparison of Approximate Methods for Handling Hyperparameters**. pred.ps.gz, abstract | **ps mirror, Canada**

- *David H. Wolpert and Charles E. M. Strauss*, **What Bayes Has to Say About the Evidence Procedure**. [postscript], [abstract]

- NCRG/98/002
**Regression with Input-dependent Noise: A Gaussian Process Treatment** - P. W. Goldberg, C. K. I. Williams and C. M. Bishop

In *Advances in Neural Information Processing Systems 10*. Editors: M. I. Jordan, M. J. Kearns and S. A. Solla. Lawrence Erlbaum.

**Abstract:** Gaussian processes provide natural non-parametric prior distributions over regression functions. In this paper we consider regression problems where there is noise on the output, and the variance of the noise depends on the inputs. If we assume that the noise is a smooth function of the inputs, then it is natural to model the noise variance using a second Gaussian process, in addition to the Gaussian process governing the noise-free output value. We show that prior uncertainty about the parameters controlling both processes can be handled and that the posterior distribution of the noise rate can be sampled from using Markov chain Monte Carlo methods. Our results on a synthetic data set give a posterior noise variance that well-approximates the true variance.

- NCRG/97/002
**Regression with Input-Dependent Noise: A Bayesian Treatment** - Christopher M. Bishop and Cazhaow S. Qazaz

In *Advances in Neural Information Processing Systems*. MIT Press.

**Abstract:** In most treatments of the regression problem it is assumed that the distribution of target data can be described by a deterministic function of the inputs, together with additive Gaussian noise having constant variance. The use of maximum likelihood to train such models then corresponds to the minimization of a sum-of-squares error function. In many applications a more realistic model would allow the noise variance itself to depend on the input variables. However, the use of maximum likelihood to train such models would give highly biased results. In this paper we show how a Bayesian treatment can allow for an input-dependent variance while overcoming the bias of maximum likelihood.
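The core idea in the two abstracts above can be sketched numerically. The snippet below is a minimal illustration, not the authors' method: it does standard GP regression but replaces the usual constant-variance term with a per-point noise variance supplied as an array. In the Goldberg/Williams/Bishop treatment that noise variance would itself be governed by a second Gaussian process and sampled by MCMC; here it is simply given. The kernel, its hyperparameter values, and the synthetic data are all illustrative assumptions.

```python
import numpy as np

def kernel(a, b, length=1.0, signal_var=1.0):
    """Squared-exponential covariance; hyperparameters are illustrative."""
    d = a[:, None] - b[None, :]
    return signal_var * np.exp(-0.5 * (d / length) ** 2)

def gp_predict(x_train, y_train, x_test, noise_var):
    """GP regression with input-dependent (heteroscedastic) noise.

    `noise_var` holds the noise variance at each training input, so the
    usual sigma^2 * I term becomes diag(noise_var).
    """
    K = kernel(x_train, x_train) + np.diag(noise_var)
    K_s = kernel(x_test, x_train)
    alpha = np.linalg.solve(K, y_train)
    mean = K_s @ alpha
    # Predictive variance of the latent (noise-free) function.
    v = np.linalg.solve(K, K_s.T)
    var = np.diag(kernel(x_test, x_test)) - np.sum(K_s * v.T, axis=1)
    return mean, var

# Synthetic data whose noise variance grows with the input.
rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 40)
r = 0.05 + 0.2 * (x + 3) / 6
y = np.sin(x) + rng.normal(0.0, np.sqrt(r))
xs = np.array([0.0, 2.5])
mu, var = gp_predict(x, y, xs, r)
```

Points in the high-noise region are automatically downweighted, which is exactly the effect a constant-variance model cannot capture and the motivation both papers share.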

URL: http://www.lce.hut.fi/teaching/S-114.202/references.html