Bayesian Statistical Methods

Centre of Excellence in Computational Complex Systems Research

 

MLP network in a 2-class classification problem, 'demo_2class'

In the demonstration program an MLP network with Bayesian learning is used for a two-class classification problem.
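In the Bayesian approach the class probability of a new input x is obtained by averaging the network output over the posterior distribution of the weights. With MCMC this integral is approximated by an average over the stored weight samples, which is what the averaging of logsig(mlp2fwds(r,p)) over the sample dimension does in the code below:

$$ p(y=1 \mid x, D) = \int \sigma\big(f(x; w)\big)\, p(w \mid D)\, dw \;\approx\; \frac{1}{M} \sum_{i=1}^{M} \sigma\big(f(x; w^{(i)})\big) $$

where sigma is the logistic output function, f(x; w) is the MLP output, and w^(1), ..., w^(M) are the posterior weight samples.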

The demonstration program is based on the synthetic two-class data used by B. D. Ripley (http://www.stats.ox.ac.uk/pub/PRNN/). The data consist of 2-dimensional vectors that are divided into two classes, labeled 0 or 1. Each class has a bimodal distribution generated from an equal mixture of Gaussian distributions with identical covariance matrices.
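As a rough sketch, data of this kind could be generated as follows; the mixture centres and the common standard deviation here are illustrative placeholders, not Ripley's exact parameters:

	      % Generate two-class data: each class is an equal mixture of two
	      % Gaussians with a shared isotropic covariance (illustrative values).
	      n = 125;                                   % points per class
	      s = 0.25;                                  % common standard deviation
	      mu0 = [-0.3 0.7;  0.4 0.7];                % hypothetical centres, class 0
	      mu1 = [-0.7 0.3;  0.3 0.3];                % hypothetical centres, class 1
	      pick = @(mu) mu(ceil(2*rand(n,1)),:) + s*randn(n,2);
	      x = [pick(mu0); pick(mu1)];                % 2-dimensional inputs
	      y = [zeros(n,1); ones(n,1)];               % class labels 0 and 1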

There are two data sets, one of which is used for network training and the other for testing the predictions. The trained network misclassifies only about 10% of the test cases. The decision line and the training data are shown in the figure below.

The code of the demonstration program is shown below.

	      function demo_2class
	      
	      % Load the data
	      x=load('demos/synth.tr');
	      y=x(:,end);
	      x(:,end)=[];

	      nin=size(x,2);
	      nhid=10;   
	      nout=size(y,2);
	      % create MLP with logistic output function ('mlp2b')
	      net = mlp2('mlp2b', nin, nhid, nout);
	      
	      % Create a Gaussian multivariate hierarchical prior with ARD
	      % for the network weights...
	      net=mlp2normp(net, {{repmat(10,1,net.nin) 0.05 0.5 -0.05 1}... % input-hidden weight
	                          {10 0.05 0.5} ...                          % bias-hidden
	                          {10 -0.05 0.5} ...                         % hidden-output weight
	                          {1}})                                      % bias-output

	      
	      % Initialize the weights with small random values and set the optimization parameters...
	      w=randn(size(mlp2pak(net)))*0.01;
	      
	      fe=str2fun('mlp2b_e');
	      fg=str2fun('mlp2b_g');
	      n=length(y);
	      itr=1:floor(0.5*n);     % training set of data for early stop
	      its=floor(0.5*n)+1:n;   % test set of data for early stop
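	      % Early stopping: optimization on the first half of the data is
	      % stopped when the error on the second half stops decreasing, which
	      % avoids overfitting and gives sensible initial weights for sampling.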
	      optes=scges_opt;
	      optes.display=1;
	      optes.tolfun=1e-1;
	      optes.tolx=1e-1;
	      
	      % ... Start scaled conjugate gradient optimization with early stopping.
	      [w,fs,vs]=scges(fe, w, optes, fg, net, x(itr,:),y(itr,:), net,x(its,:),y(its,:));
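	      % Copy the optimized weight vector back into the network structure.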
	      net=mlp2unpak(net,w);

	      % Set the starting values for the weight hyperparameters to be the
	      % variances of the early-stopped weights.
	      shypw1 = std(net.w1,0,2)';
	      shypb1 = std(net.b1);
	      shypw2 = std(net.w2(:));

	      net=mlp2normp(net, {{shypw1.^2  0.5 0.05 -0.05 1}... % input-hidden weight
	                          {shypb1.^2  0.5 0.05} ...        % bias-hidden
	                          {shypw2.^2 -0.5 0.05} ...        % hidden-output weight
	                          {1}})                            % bias-output
	      
	      % First we initialize the random seed for the Monte
	      % Carlo sampling and set the sampling options to their defaults.
	      hmc2('state', sum(100*clock));
	      opt=mlp2b_mcopt;
	      opt.sample_inputs=0; % do not use RJMCMC for input variables
	      
	      %  Here we do the sampling without persistence.
	      opt.repeat=70;           
	      opt.hmc_opt.steps=10;    
	      opt.hmc_opt.stepadj=0.2; 
	      opt.gibbs=1;
	      opt.nsamples=1;
	      [r,net,rstate]=mlp2b_mc(opt, net, x, y);
	      
	      % Now that the starting values are found we start the main sampling.
	      opt.hmc_opt.stepadj=0.5; 
	      opt.hmc_opt.persistence=1;
	      opt.hmc_opt.steps=40;    
	      opt.hmc_opt.decay=0.95;  
	      opt.repeat=50;           
	      opt.nsamples=600;        
	      opt.hmc_opt.window=5;     
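	      % Above, persistence means that the momentum is only partially
	      % refreshed between trajectories (controlled by the decay parameter),
	      % and window makes the accept/reject decision over windows of states;
	      % both are standard refinements that improve HMC sampling efficiency.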
	      
	      [r,net,rstate]=mlp2b_mc(opt, net, x, y, [], [], r, rstate);
	      
	      % Thin the sample chain: discard the first 200 samples as burn-in
	      % and keep every 6th of the remaining samples.
	      r=thin(r,200,6);

	      % Draw the decision line and training points in the same plot
	      [p1,p2]=meshgrid(-1.3:0.05:1.1,-1.3:0.05:1.1);
	      p=[p1(:) p2(:)];
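	      % Monte Carlo estimate of p(y=1|x) on the grid: forward propagate
	      % the grid points through every sampled network, map the outputs
	      % through the logistic function and average over the samples.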
	      tms=mean((logsig(mlp2fwds(r,p))),3);
	      
	      % Plot the decision line (the 0.5 contour of the predictive probability)
	      gp=zeros(size(p1));
	      gp(:)=tms;
	      contour(p1,p2,gp,[0.5 0.5],'k');
	      
	      hold on;
	      % Plot the training data: o = class 0, x = class 1
	      plot(x(y==0,1),x(y==0,2),'o');
	      plot(x(y==1,1),x(y==1,2),'x');
	      hold off;
	      
	      % Let's test how well the network predicts the test data.
	      tx=load('demos/synth.ts');
	      ty=tx(:,end);
	      tx(:,end)=[];
	      
	      tga=mean(logsig(mlp2fwds(r,tx)),3);
	      
	      % Calculate the percentage of misclassified points
	      missed = sum(abs(round(tga)-ty))/size(ty,1)*100;
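	      % With this data the misclassification rate is typically about 10%.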
	      
Figure 1. The training points and the decision line.