Bayesian Statistical Methods

Centre of Excellence in Computational Complex Systems Research

 

Gaussian process in 2-class classification problem, 'demo_2classgp'

In this demonstration program a Gaussian process with Bayesian learning is used for a two-class classification problem.

The demonstration program is based on synthetic two-class data used by B. D. Ripley (http://www.stats.ox.ac.uk/pub/PRNN/). The data consists of 2-dimensional vectors that are divided into two classes, labeled 0 or 1. Each class has a bimodal distribution generated from an equal mixture of two Gaussian distributions with identical covariance matrices.
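
As an illustration of this data-generating mechanism, the following sketch draws points for each class from an equal mixture of two Gaussians with a common covariance. The component means and the standard deviation below are example values chosen for the sketch, not the exact parameters of Ripley's data set.

                  % Illustrative sketch: draw two-class data from equal mixtures
                  % of Gaussians with identical covariance matrices (the means
                  % and standard deviation here are example values only)
                  n = 125;                        % points per class
                  s = 0.25;                       % common standard deviation
                  mu0 = [-0.3 0.7;  0.4 0.7];     % component means for class 0
                  mu1 = [-0.7 0.3;  0.3 0.3];     % component means for class 1
                  c0 = ceil(2*rand(n,1));         % pick a mixture component
                  c1 = ceil(2*rand(n,1));
                  x0 = mu0(c0,:) + s*randn(n,2);  % class 0 inputs
                  x1 = mu1(c1,:) + s*randn(n,2);  % class 1 inputs
                  xs = [x0; x1];                  % 2-dimensional input vectors
                  ys = [zeros(n,1); ones(n,1)];   % class labels 0/1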

There are two data sets, one of which is used for training and the other for testing the predictions. The trained Gaussian process misclassifies only about 10% of the test cases. The decision line and the training data are shown in Figure 1 below.

The code of the demonstration program is shown below.

                  % Load the data
                  x=load('demos/synth.tr');
                  y=x(:,end);
                  x(:,end)=[];

                  [n, nin] = size(x);
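                  % Create a Gaussian process structure with an exponential
                  % covariance function and a jitter term, and give initial
                  % values for the hyperparameters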
                  gp=gp2('gp2b',nin,'exp', 'jitter');
                  gp.jitterSigmas=10;
                  gp.expScale=0.2;
                  gp.expSigmas=repmat(0.1,1,gp.nin);
                  gp.noiseSigmas = 0.2;
                  gp.f='norm';

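                  % Set inverse-gamma priors for the covariance function
                  % hyperparameters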
                  gp.p.expSigmas=invgam_p({0.1 0.5 0.05 1});
                  gp.p.expScale=invgam_p({0.05 0.5});
                  
		  % Initialize the hyperparameters to small random values and set the optimization parameters...
		  w=randn(size(gp2pak(gp)))*0.01;
		  
		  fe=str2fun('gp2r_e');
		  fg=str2fun('gp2r_g');
		  n=length(y);
		  itr=1:floor(0.5*n);     % training set of data for early stop
		  its=floor(0.5*n)+1:n;   % test set of data for early stop
		  optes=scges_opt;
		  optes.display=1;
		  optes.tolfun=1e-1;
		  optes.tolx=1e-1;
		  
		  % ... Start scaled conjugate gradient optimization with early stopping.
		  [w,fs,vs]=scges(fe, w, optes, fg, gp, x(itr,:),y(itr,:), gp,x(its,:),y(its,:));
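		  % Set the optimized hyperparameter values back into the GP structure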
		  gp=gp2unpak(gp,w);
		  
		  % Set the options for the MCMC sampling
		  opt=gp2_mcopt;
		  opt.repeat=20;
		  opt.nsamples=1;
		  opt.hmc_opt.steps=20;
		  opt.hmc_opt.stepadj=0.1;
		  opt.hmc_opt.nsamples=1;
		  
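		  % Set the options for sampling the latent values, initialize the
		  % latent values at random and set the state of the HMC sampler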
		  opt.sample_latent = 20;
		  opt.sample_latent_scale = 0.5;
		  gp.latentValues = randn(size(y));
		  hmc2('state', sum(100*clock));
		  
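		  % Run one short sampling round to get a starting state for the
		  % main sampling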
		  [r1,g1,rstate1]=gp2b_mc(opt, gp, x, y);
		  
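		  % Adjust the options for the main sampling: persistent HMC with
		  % windowing and 1000 samples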
		  opt.repeat=10;
		  opt.nsamples = 1000;
		  opt.hmc_opt.persistence=1;
		  opt.sample_variances=0;
		  opt.hmc_opt.window=5;
		  opt.hmc_opt.stepadj=0.75;
		  opt.hmc_opt.steps=10;
		  
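		  % Continue the sampling from the state reached in the first round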
		  [r,g,rstate]=gp2b_mc(opt, g1, x, y, [], [], r1, rstate1);
		  
		  % Thin the sample chain: discard the first 50 samples as burn-in
		  % and keep every 8th of the rest
		  r=thin(r,50,8);
		  
		  % Draw the decision line and training points in the same plot
		  [p1,p2]=meshgrid(-1.3:0.05:1.1,-1.3:0.05:1.1);
		  p=[p1(:) p2(:)];
		  
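		  % Compute the predictive probability of class 1 at each grid point
		  % by averaging logsig of the latent predictions over the samples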
		  tms2=mean((logsig(gp2fwds(r, x, r.latentValues', p))),3);
		  
		  % Plot the decision line (the p = 0.5 contour of the predictive
		  % probability); reshape the probabilities onto the grid first
		  G=zeros(size(p1));
		  G(:)=tms2;
		  contour(p1,p2,G,[0.5 0.5],'k');
		  
		  hold on;
		  % Plot the training data: o = class 0, x = class 1
		  plot(x(y==0,1),x(y==0,2),'o');
		  plot(x(y==1,1),x(y==1,2),'x');
		  hold off;
		  
		  % Test how well the Gaussian process predicts the test data
		  tx=load('demos/synth.ts');
		  ty=tx(:,end);
		  tx(:,end)=[];
		  
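		  % Compute the predictive probabilities for the test inputs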
		  tga=mean(logsig(gp2fwds(r, x, r.latentValues', tx)),3);
		  
		  % Calculate the percentage of misclassified test points
		  missed = sum(abs(round(tga)-ty))/size(ty,1)*100;
	      
Figure 1. The training points and the decision line.