Computational Neuroscience

Research Group at Laboratory of Computational Engineering

 

Adaptive control inspired by the cerebellar system

Introduction

Despite the inherent delays of the sensorimotor system and the inertia of the controlled limbs, humans and animals are able to produce fluent, coordinated and accurately timed movements. A wide spectrum of lesion, recording and imaging studies indicates that the cerebellum participates in the timing, learning and coordination of motor control (Ivry, 1996; Thach, 1998), in adjusting body posture (Ioffe et al., 2006), and even in non-motor functions such as attention and perception (Allen et al., 1997; Ivry, 1996). Presumably, the cerebellum collaborates with higher motor areas such as the primary motor cortex. It is also known that the cerebellum can act more directly on the muscles through different deep nuclei.

Eyelid conditioning (Grethe and Thompson, 2003; Kim and Thompson, 1997), the vestibulo-ocular reflex (VOR) (Anastasio, 2003; Belton and McCrea, 2000; Blazquez et al., 2003) and the optokinetic response (OKR) (Kawato and Gomi, 1992) are types of motor control that depend on cerebellar information processing and plasticity. The main result of these studies is that the cerebellar system is able to calculate accurately timed motor PREDICTIONS using input information about the state of the body and the environment, mediated by the so-called mossy fibers, and an error signal mediated by the climbing fibers. (See Joensuu, 2006, Section 2.1 for a review.)

In this project I have studied adaptive control inspired by the cerebellar system. I have designed an adaptive controller with properties similar to those of the cerebellum. The model is a top-down model of the cerebellar system: its main aim is not to mimic the neuroanatomical structure of the cerebellum or the exact neurophysiological properties of its neurons in as much detail as possible. Rather, the aim is to capture the properties of the cerebellar system found in many psychophysical experiments, such as classical eyelid conditioning (Grethe and Thompson, 2003; Medina et al., 2000b; Ohyama et al., 2003) and the vestibulo-ocular reflex (VOR) (Broussard and Kassardjian, 2004).

The adaptive model is applied to a simulated pole-balancing problem to study what kind of calculations the cerebellum, or any artificial motor control system, might need to perform to control a one-joint limb. A further purpose of implementing the model and applying it to motor control problems is to reveal what kind of sensory input information is needed in motor control and how it would be used in realistic motor control problems.

The study is closely connected to the current focus of the research group, the sensory processing of an autonomous agent, because in an autonomous agent sensory processing and motor control are tightly coupled. On the one hand, the motor control task is some meaningful action or aim that the system tries to achieve by using sensory information, so the motor control task defines the meaning of the sensory inputs. On the other hand, the motor control system can be used to evaluate the sensory representations that the autonomous agent has learnt.

The model

The overall structure of the model is illustrated in the schematic block diagram in Figure 1.


Fig. 1. Schematic block diagram of the model. e: error or reflex, y: motor command, d1: delay from the plant to the teacher, d2: efferent delay from the controller to the plant, d3: efference-copy delay, d4: feedback delay from the plant and environment to the controller, I1: reference signal, I2: efference-copy signal, I3: sensory feedback signal representing the state of the plant, I4: sensory feedback signal representing the state of the environment.

The model is composed of four main blocks: plant, learning system, environment and teacher.

The PLANT is the object controlled by the adaptive controller. The ENVIRONMENT, in turn, is the physical world in which the plant is located. The simulated plant and environment of the pole-balancing problem are illustrated in Figure 2.


Fig. 2. The plant and the environment.

The simulation begins when the pole is placed in the middle of the arena. The task of the controller is to keep the pole balanced after an initial displacement or "push" and to bring it back to the middle of the arena. The plant and environment are implemented using the Webots robot simulation software (http://www.cyberbotics.com).

The LEARNING SYSTEM is the adaptive part of the model; it is the actual adaptive controller. The main component of the adaptive controller is a single output neuron, similar to the one in Figure 3.


Fig. 3. Simplified model of a neuron labeled k. x_j: input nodes, w_kj: synaptic weight of the neuron k, v_k: total sum of the weighted inputs.

The adaptive controller in Figure 3 receives three types of input: reference signals, sensory inputs and efference-copy signals. The sensory inputs I3 and I4 represent the state of the plant and the state of the environment (Figure 2), respectively. They provide the adaptive controller with contextual information. The signal I2 is an efference copy of the past actions or outputs of the adaptive controller. Finally, the signal I1 is the reference signal. It is always zero in the following simulations. These signals are transformed into input node activations that the model can use in control and learning (for more details about the input coding, see Joensuu (2006)).

The output y of the learning system (Figure 4), in turn, is a motor command that is sent back to the plant. It is a function of the total sum of the weighted inputs v_k.


Fig. 4. The output force of the model as a function of the total sum v_k. The maximum output is 15 to the left and -15 to the right.

All muscles and servos have a limited output range. For this reason the output of the model is restricted to values between -15 and 15 (positive values point to the left, negative values to the right).
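The relation between the weighted sum v_k and the bounded output y can be sketched in a few lines of Python. This is only an illustration: the exact shape of the curve in Figure 4 is not given in the text, so the tanh saturation below is an assumption, and the names neuron_output and F_MAX are hypothetical. Only the limits of -15 and 15 come from the description above.

```python
import math

F_MAX = 15.0  # maximum output magnitude stated in the text


def neuron_output(weights, inputs, f_max=F_MAX):
    """Return the motor command y for one output neuron.

    v_k is the total sum of the weighted inputs (Figure 3).
    The tanh squashing is an ASSUMED saturation curve; the source
    only states that the output stays between -f_max and f_max.
    """
    v_k = sum(w * x for w, x in zip(weights, inputs))
    return f_max * math.tanh(v_k / f_max)


# Positive outputs push to the left, negative outputs to the right.
y = neuron_output([0.5, -0.2, 1.0], [1.0, 1.0, 0.0])
```

With this choice the output is nearly linear for small v_k and levels off smoothly near the limits. A hard clipping function would satisfy the same bounds.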

The TEACHER provides the adaptive controller with an on-line teaching signal, or error signal e, called the reflex. It is a rough correcting signal already expressed in motor coordinates, and it is used to adjust the weights w_kj of the model neuron in Figure 3. The reflex is more like a desired direction for the adaptive controller than an explicit desired output value. This means that the reflex signal alone is not able to control the pole properly. However, it is a good enough teaching signal for the model. The calculation of the reflex signal in the teacher block is illustrated by the flow diagram in Figure 5.


Fig. 5. The block diagram of the reflex used in the control problem. POS (d): (desired) position, S (d): (desired) speed, (d): (desired) angle and f : reflex force applied to the wheels of the robot.

The teacher block is a simple feedback controller in which the desired value (the desired position of the pole, POS d) is compared to the actual position. The position error defines the desired speed S d of the robot. If the speed is not high enough, the angle of tilt must be modified. Finally, the angle is modified by adjusting the speed of the wheels of the robot using the force f. For example, think about a situation in which the robot has drifted to the right of the desired location in the middle and is tilted a little to the right. The reflex force then makes the robot accelerate even more to the right so that it first tilts to the left. When the robot is tilted to the left, the reflex makes it brake and accelerate to the left, towards the desired location. On the way to the desired location the speed of the wheels is adjusted by controlling the angle of the robot, which makes the robot accelerate and decelerate.
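The cascade described above (position error to desired speed, speed error to desired angle, angle error to force) can be sketched as follows. This is a minimal illustration and not the teacher block of the actual implementation: the function name reflex_force, all gains and all limits are hypothetical values chosen for the example, and positions, speeds, angles and forces are all assumed to be positive to the left.

```python
def clip(value, limit):
    """Restrict value to the range [-limit, limit]."""
    return max(-limit, min(limit, value))


def reflex_force(pos, speed, angle, pos_d=0.0,
                 k_pos=1.0, k_speed=0.2, k_angle=50.0,
                 max_speed=1.0, max_angle=0.2, max_force=5.0):
    """Cascaded feedback sketch of the reflex (all gains are assumptions).

    Sign convention (assumed): every quantity is positive to the left.
    """
    # The position error defines the desired speed.
    speed_d = clip(k_pos * (pos_d - pos), max_speed)
    # The speed error defines the desired tilt angle.
    angle_d = clip(k_speed * (speed_d - speed), max_angle)
    # To tilt further to the left the wheels must accelerate to the
    # right, which is why the angle error enters with a minus sign.
    return clip(-k_angle * (angle_d - angle), max_force)


# Robot drifted to the right and tilted slightly to the right:
f = reflex_force(pos=-1.0, speed=0.0, angle=-0.05)
```

In the situation from the text (drifted right, tilted slightly right) the sketch returns a negative force, that is, an acceleration further to the right that tips the pole to the left. Once the pole leans to the left by more than the desired angle, the sign of the force flips and the robot accelerates to the left.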

The predictive property of the model is implemented using so-called eligibility traces (Figure 6). If learning is based on past or delayed inputs, that is, on eligibility traces, while the actual output of the neuron for the same input is calculated in real time, the adaptive system implicitly learns to predict.


Fig. 6. Eligibility traces of one (deg_y = 1, blue line), two (deg_y = 2, green line) or three (deg_y = 3, red line) cascaded leaky integrators.

Different kinds of eligibility traces are calculated using one (deg_y=1) or more (deg_y>1) cascaded leaky integrators.
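As a rough illustration of how cascaded leaky integrators shape an eligibility trace, the following Python sketch filters an input sequence through deg_y first-order stages. The time constant tau, the step dt and the function names are assumptions made for the example. The weight update at the end is one plausible way to combine a trace with the reflex signal, and it is not taken from the source.

```python
def eligibility_trace(x, deg_y=1, tau=0.2, dt=0.01):
    """Pass the input sequence x through deg_y cascaded leaky integrators.

    tau and dt are hypothetical values; each stage follows
    state += (dt / tau) * (input - state).
    """
    alpha = dt / tau
    states = [0.0] * deg_y
    trace = []
    for sample in x:
        signal = sample
        for i in range(deg_y):
            states[i] += alpha * (signal - states[i])
            signal = states[i]  # output of stage i feeds stage i + 1
        trace.append(signal)
    return trace


def update_weights(weights, traces, reflex, rate=0.01):
    """ASSUMED learning rule: change each weight in proportion to the
    current reflex (error) and the eligibility trace of its input."""
    return [w + rate * reflex * tr for w, tr in zip(weights, traces)]


# Response to a single input pulse at the first time step.
impulse = [1.0] + [0.0] * 299
traces = {deg: eligibility_trace(impulse, deg_y=deg) for deg in (1, 2, 3)}
```

With more cascaded stages the trace peaks later and is more spread out in time, so an error arriving after a delay is credited to inputs that were active earlier. Because the output itself is computed from the undelayed inputs, the learned response ends up preceding the error.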

Simulations

Next I will illustrate the learning process of the model using some videos of the simulations.

Video 1 illustrates the performance of the strong reflex.

Video 1.

As Video 1 shows, the reflex force calculated by the teacher is, on its own, too rough and too "slow" to keep the pole balanced. Because of the inertia of the controlled pole, the controller must be predictive. It must, for example, start slowing down in good time before it reaches the desired location in the middle of the arena. Otherwise it will overshoot.

In Video 2 the pole is controlled by the adaptive controller, which has already been trained for several runs. At the beginning of the learning process the performance of the controller was poor, but by the time of the video it has learnt how to control the pole. The maximum output force of the reflex is set to a small value compared to the maximum output of the adaptive controller.

Video 2.

In Video 2, too, the pole is first displaced, after which the controller must keep the pole upright and bring it to the desired location in the middle of the arena.

The next three videos demonstrate the ability of the adaptive controller to use contextual information representing the state of the environment. The task of the controller is to learn to anticipate the hit of a soccer ball that is thrown towards it. The controller has a simple "vision system" that emulates the vision system of biological organisms. It consists of two input neurons, one for each direction from which the ball can approach the pole. The closer the soccer ball is to the pole, the larger the activation of the corresponding input neuron.
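A coding of this kind could look roughly like the following Python sketch. The text only states that the activation grows as the ball gets closer, so the linear ramp, the max_range parameter and the function name vision_inputs are all assumptions, as is the convention that a positive offset means the ball is to the left of the pole.

```python
def vision_inputs(ball_offset, max_range=2.0):
    """Return (left, right) activations of the two vision neurons.

    ball_offset: signed distance from the pole to the ball
                 (ASSUMED convention: positive means to the left).
    max_range:   hypothetical distance beyond which the ball is not seen.
    The linear ramp is an assumption; any function that increases
    as the ball approaches would match the description.
    """
    distance = abs(ball_offset)
    activation = max(0.0, 1.0 - distance / max_range)
    if ball_offset > 0:
        return (activation, 0.0)
    if ball_offset < 0:
        return (0.0, activation)
    return (activation, activation)  # ball exactly at the pole


left, right = vision_inputs(0.5)  # ball approaching from the left
```

Only the neuron on the side of the approaching ball becomes active, so the controller can in principle learn a different response for each direction.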

In Video 3 the ball is thrown against the pole for the first time.

Video 3.

At first the signals from the "vision system" are meaningless to the controller, and it does not react to them. However, the controller slowly learns that these signals are important and how to use them in control (Video 4).

Video 4.

Video 4 shows that the controller learns that the activation of the two input neurons of the "vision system" reliably predicts a hefty hit and an error signal from the teacher. The model also learns how to use these signals to minimize the error and the displacement: it leans towards the ball. Next we decrease the mass of the ball to nearly zero (Video 5). The hit of the ball is then much weaker than what the adaptive controller had learnt to expect during the previous runs. As a result, the controller commands the pole to lean too far towards the ball, and the pole loses its balance.

Video 5.

Implementation

Here is the Matlab implementation of the model:

implementation.tar.bz2

The Gate software needed for communication between Matlab and Webots is available on the web page of Harm Aarts.

The controllers, objects and world file used in the Webots simulations can also be downloaded:

controllers.tar.bz2

world.wbt

objects.tar.bz2

References

Allen, G., Buxton, R. B., Wong, E. C., and Courchesne, E. (1997). Attentional activation of the cerebellum independent of motor involvement. Science, 275:1940-1943.

Anastasio, T. J. (2003). Vestibulo-ocular reflex. In Arbib, M. A., editor, The Handbook of Brain Theory and Neural Networks, pages 1192-1196. The MIT Press, 2nd edition.

Belton, T. and McCrea, R. A. (2000). Role of the cerebellar flocculus region in cancellation of the VOR during passive whole body rotation. Journal of Neurophysiology, 85:1599-1613.

Blazquez, P. M., Hirata, Y., Heiney, S. A., Green, A. M., and Highstein, S. M. (2003). Cerebellar signatures of vestibulo-ocular reflex motor learning. The Journal of Neuroscience, 23(30):9742-9751.

Broussard, D. M. and Kassardjian, C. D. (2004). Learning in a simple motor system. Learning & Memory, 11:127-136.

Grethe, J. S. and Thompson, R. F. (2003). Cerebellum and conditioning. In Arbib, M. A., editor, The Handbook of Brain Theory and Neural Networks, pages 187-190. The MIT Press, 2nd edition.

Ioffe, M. E., Ustinova, K. I., Chernikova, L. A., and Kulikov, M. A. (2006). Supervised learning of postural tasks in patients with poststroke hemiparesis, Parkinson's disease or cerebellar ataxia. Experimental Brain Research, 168(3):384-394.

Ivry, R. B. (1996). The representation of temporal information in perception and motor control. Current Opinion in Neurobiology, 6(6):851-857.

(pdf) Joensuu, H. O. (2006). Adaptive control inspired by the cerebellar system. Master's Thesis, Helsinki University of Technology.

Kawato, M. and Gomi, H. (1992). The cerebellum and VOR/OKR learning models. Trends in Neurosciences, 15(11):445-453.

Kim, J. J. and Thompson, R. F. (1997). Cerebellar circuits and synaptic mechanisms involved in classical eyeblink conditioning. Trends in Neurosciences, 20(4):177-181.

Medina, J. F., Nores, W. L., Ohyama, T., and Mauk, M. D. (2000b). Mechanisms of cerebellar learning suggested by eyelid conditioning. Current Opinion in Neurobiology, 10(6):717-724.

Ohyama, T., Nores, W. L., Murphy, M., and Mauk, M. D. (2003). What the cerebellum computes. Trends in Neurosciences, 26(4):222-227.

Thach, W. T. (1998). A role for the cerebellum in learning movement coordination. Neurobiology of Learning and Memory, 70(1):177-188.