MLP Network in Regression Problem with 2 Inputs, 'demo_2input'
The demonstration program solves a simple 2-dimensional regression problem using an MLP network with Bayesian lerning. The demonstration is discussed completely in the documentation. The data in the demonstration has been used in the work by Christopher J. Paciorek and Mark J. Schervish (2004). The training data has an unknown Gaussian noise and can be seen in the figure 1.
An MLP network with 10 hidden units and one output is created. The
weight and bias structure is hierarchical with three levels of hyper-parameters
and automatic relevance determination (ARD). The sampling for hyperpameters
is done with Gibbs sampling and for weights with Hybrid Monte Carlo
sampling. We sampled for 2500 samples from which 250 burn in samples
were thrown away. After this We still thinned the sample chain by taking
only every 138'th sample. This was done to ensure the independence of
all samples. The convergence, independence and explicitly of samples
was examined by Markov Chain
Monte Carlo diagnostics tools. Figure 2 shows the prediction from
the network.
The code needed to get the results is shown below and is treated more
carefully in the documentation. The noisy data and the prediction of
network is shown in the bottom.
function demo_2input
% Load the data.
data=load('dat.1');
x = [data(:,1) data(:,2)];
y = data(:,3);
% Draw the data. The data is not sorted.
figure
title({'The noisy training data'});
[xi,yi,zi]=griddata(data(:,1),data(:,2),data(:,3),-1.8:0.01:1.8,[-1.8:0.01:1.8]');
mesh(xi,yi,zi)
% Create an MLP network for regression model. function mlp
% initializes weights to zero.
nin=size(x,2);
nhid=10;
nout=1;
net = mlp2('mlp2r', nin, nhid, nout);
% Create a Gaussian multivariate hierarchical prior with ARD
% for network...
net=mlp2normp(net, {{repmat(0.1,1,net.nin) 0.05 0.5 -0.05 1} ...
{0.1 0.05 0.5} ...
{0.1 -0.05 0.5} ...
{1}});
% ...and the same for residual.
net.p.r = norm_p({0.05 0.05 0.5});
% Set the options for Monte Carlo sampling
opt=mlp2r_mcopt;
opt.repeat=50;
opt.plot=0;
opt.hmc_opt.steps=40;
opt.hmc_opt.stepadj=0.1;
hmc2('state', sum(100*clock));
% Initialize weights to zero and sample for the first time.
% The opt.gibbs = 0 for this round of MC-sampling, so the
% sampling is done only for weigths. This way we can reach the
% equilibrium distribution more quickly.
net = mlp2unpak(net, mlp2pak(net)*0);
[r1,net1]=mlp2r_mc(opt, net, x, y);
% Now sample also for hyper-priors with Gibbs sampling.
opt.hmc_opt.stepadj=0.2;
opt.hmc_opt.steps=60;
opt.repeat=70;
opt.gibbs=1;
[r2,net2]=mlp2r_mc(opt, net1, x, y, [], [], r1);
% Now we should be close to equilibrium already and we can start
% the sampling with windowing and persistence.
opt.hmc_opt.stepadj=0.3;
opt.hmc_opt.steps=100;
opt.hmc_opt.window=5;
opt.hmc_opt.persistence=1;
[r3,net3]=mlp2r_mc(opt, net2, x, y, [], [], r2);
% Sample for the posterior 2500 samples from which every second
% is taken after 41 burn-in samples (the function thin).
opt.repeat=60;
opt.hmc_opt.steps=100;
opt.hmc_opt.stepadj=0.5;
opt.nsamples=2500;
[r,net]=mlp2r_mc(opt, net3, x, y, [], [], r3);
mlp=thin(r,150,25);
% Create new data with the right scale checked above
figure
[p1,p2]=meshgrid(-1.8:0.05:1.8,-1.8:0.05:1.8);
p=[p1(:) p2(:)];
tms=squeeze(mlp2fwds(mlp,p))';
g=mean(tms);
%Plot the new data
gp=zeros(size(p1));
gp(:)=g;
mesh(p1,p2,gp);
axis on;
Figure 1. A training data with Gaussian noise.
Figure 2. The prediction from MLP network.

