Computational Systems Biology

Computational systems biology is a new and rapidly developing field of research that aims to understand the structure and processes of biological systems at the molecular, cellular, tissue and organ levels through computational modeling and novel information-theoretic methods for data and image analysis. With the breakthrough in deciphering the human genome, achieved with the most up-to-date computational approaches and modern experimental biotechnology, it has become possible to study the structure and functions of biomolecules, the information stored in DNA (bioinformatics), its expression as proteins, protein structures (proteomics), metabolic pathways and networks, intra- and intercellular signaling, and the physico-chemical mechanisms involved in them (biophysics).

By applying computational information-theoretic and modelling methodologies to experimental genotype and phenotype data, obtained for example with microarray techniques, gel-based techniques, mass spectrometry of proteins, and molecular and cell imaging and microscopy, it is possible to understand the structure and function of biosystems. Generally speaking, Computational Systems Biology focuses either on information processing of biological data or on modeling the physical and chemical processes of biosystems. Through this type of quantitative systems approach, Computational Systems Biology can play a central role in disease prediction and preventive medicine, in gene technology and pharmaceuticals, and in other fields of biotechnology.

For these reasons, Computational Systems Biology has been added to the educational curriculum of the Laboratory of Computational Engineering. The aim is to train all-round bio-computing experts for research, development, design, consulting, and services in the public as well as the private sector.

Computer Vision for Electron Tomography

Researchers: Sami Brandt, Vibhor Kumar, Jukka Heikkonen, and Peter Engelhardt

In structural biology, electron tomography is used to reconstruct three-dimensional objects such as macromolecules, viruses, and cellular organelles in order to learn their three-dimensional structures and properties. The reconstruction is made from a set of transmission electron microscope (TEM) images, which may be obtained by tilting the specimen stage in small angular increments (single axis tilting). For the 3D reconstruction to succeed, the TEM images have to be accurately aligned, or registered. The alignment problem can be posed as a motion estimation problem that can be solved with geometric computer vision methods.

Previously, we have developed two methods in which the registration is automated. The most accurate alignment is achieved when conventional colloidal gold markers are used. In contrast to manual picking, our method locates the gold beads automatically using recent computer vision techniques. For cases in which gold particles cannot be used, we have proposed an alternative method based on tracking high-curvature points of the intensity surface of the images. The results show almost as good performance as we have obtained with fiducial markers (Figure 17). Development of the alignment algorithms continues, with the aim of improving accuracy and taking computational aspects into consideration.
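As a rough illustration of what marker-based alignment involves, the following minimal sketch recovers the unknown image shifts of a single-axis tilt series from tracked bead positions. It is not the method described above: it assumes orthographic projection, a tilt axis along the image y axis, known tilt angles, beads visible in every image, and a coordinate origin placed at the centroid of the beads. The function names and the bead and shift values are hypothetical.

```python
import math


def project(point, tilt_deg, shift):
    """Orthographic projection of a 3D point after tilting about the y axis.

    `shift` is the unknown (u, v) displacement of the image that alignment
    has to recover.
    """
    x, y, z = point
    a = math.radians(tilt_deg)
    u = x * math.cos(a) + z * math.sin(a) + shift[0]
    v = y + shift[1]
    return (u, v)


def estimate_shifts(tracks):
    """Estimate the shift of each image as the centroid of its bead positions.

    tracks[i][j] is the 2D position of bead j in image i. Because projection
    is linear, the projected centroid of beads centred on the origin is zero,
    so whatever centroid remains is the image shift.
    """
    shifts = []
    for image in tracks:
        n = len(image)
        shifts.append((sum(p[0] for p in image) / n,
                       sum(p[1] for p in image) / n))
    return shifts


# Hypothetical beads whose 3D centroid is the origin.
beads = [(1.0, 2.0, 0.5), (-2.0, 1.0, -1.0), (1.0, -3.0, 0.5)]
tilts = [-60.0, -30.0, 0.0, 30.0, 60.0]
true_shifts = [(3.0, -1.0), (0.5, 2.0), (0.0, 0.0), (-2.0, 1.5), (4.0, -3.0)]

tracks = [[project(b, t, s) for b in beads]
          for t, s in zip(tilts, true_shifts)]
estimated = estimate_shifts(tracks)
```

A practical alignment would also have to estimate rotations, tilt-angle errors and magnification changes, and to cope with beads that are missing from some images; the sketch covers only the translational part.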

Figure 17

Figure 17: Stereo image pair of a reconstructed microvillus where the image series has been aligned by tracking certain interest points of the image intensity surface.

cryo-EM Single particle reconstruction

Researchers: Vibhor Kumar, Jukka Heikkonen, and Peter Engelhardt

Three-dimensional reconstruction of macromolecules from cryo-electron microscopy (cryo-EM) images is increasingly seen as a good alternative for studying viruses and protein structures. Compared with X-ray crystallography, cryo-EM is less expensive, so it is considered a good method for solving protein structures, especially those of proteins that are hard to crystallize, such as membrane proteins. It can also be used to validate X-ray crystallographic structures of proteins. The 3D reconstruction is made from a large set of cryo-EM images of the specimen. For the reconstruction to succeed, the right particles have to be picked from the cryo-EM micrographs, properly preprocessed, and accurately aligned or registered. The particle picking problem can be approached with different techniques.

We have developed methods for picking particles from filtered micrographs. These methods work even when the signal-to-noise ratio of the micrograph is low. In addition, we carry out 3D reconstructions of macromolecules; for example, we have reconstructed the N protein of Hantavirus in order to study its structure and function. We are now looking for new and efficient ways to do high-resolution 3D reconstruction. To this end we have already proposed a filtering method for cryo-EM images. For determining the orientations of the specimen images and reconstructing the 3D model, we are investigating alternatives such as the maximum entropy method, clustering and accurate averaging.

Figure 18

Figure 18: (a) Picking (yellow) of circular projections of KLH protein and subsequent refinement of the results by Gabor filter (light blue) and histogram (red) based methods; (b) a 3D model of the Hantavirus N protein reconstructed from a cryo-EM micrograph.

Signal denoising with Minimum Description Length principle

Researchers: Jukka Heikkonen, Vibhor Kumar, Janne Ojanen, Jorma Rissanen

The need for high-throughput assays in molecular biology places increasing requirements on the applied signal processing and modelling methods. The effects of undesirable noise in the measured data must be somehow modelled and removed if we want to extract information about the studied data generating machinery. An efficient denoising method enables smaller details to be extracted reliably in high-throughput applications, where cost effectiveness demands minimization of the reaction volumes.

Denoising can be done in an elegant and efficient way with the Minimum Description Length (MDL) principle, which defines noise as the part of the data that cannot be compressed and thereby separates it from the useful information. The MDL denoising method requires no ad hoc parameters and no knowledge of the noise characteristics. It provides a basis for high-speed automated processing systems that do not require continual user intervention to validate the results, as conventional signal processing methods do.
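The idea can be made concrete with a minimal sketch; it is not the implementation used in this project. It assumes an orthonormal DCT basis (wavelet bases are a common alternative) and selects the number k of retained coefficients by minimizing a criterion of the form given in Rissanen's work on MDL denoising, (n-k)/2 ln(R/(n-k)) + k/2 ln(S/k) + 1/2 ln(k(n-k)), where S is the energy of the k largest coefficients and R the energy of the rest. The exact form of the criterion, the function names and the test signal are all assumptions made for illustration.

```python
import math
import random


def dct(x):
    """Orthonormal DCT-II of a sequence (O(n^2), adequate for short signals)."""
    n = len(x)
    out = []
    for j in range(n):
        scale = math.sqrt(1.0 / n) if j == 0 else math.sqrt(2.0 / n)
        out.append(scale * sum(x[i] * math.cos(math.pi * (i + 0.5) * j / n)
                               for i in range(n)))
    return out


def idct(c):
    """Inverse of dct(): the transpose of the orthonormal DCT-II matrix."""
    n = len(c)
    out = []
    for i in range(n):
        total = c[0] * math.sqrt(1.0 / n)
        total += sum(math.sqrt(2.0 / n) * c[j]
                     * math.cos(math.pi * (i + 0.5) * j / n)
                     for j in range(1, n))
        out.append(total)
    return out


def mdl_denoise(x):
    """Keep the k largest-magnitude coefficients, with k chosen by the criterion."""
    n = len(x)
    c = dct(x)
    order = sorted(range(n), key=lambda j: c[j] ** 2, reverse=True)
    sq = [c[j] ** 2 for j in order]
    # tail[k] = energy of the n - k smallest coefficients (treated as noise).
    tail = [0.0] * (n + 1)
    for k in range(n - 1, -1, -1):
        tail[k] = tail[k + 1] + sq[k]
    best_k, best_crit, head = 1, float("inf"), 0.0
    for k in range(1, n):
        head += sq[k - 1]  # energy of the k largest coefficients
        if head <= 0.0 or tail[k] <= 0.0:
            continue  # the logarithms below are undefined for zero energy
        crit = (0.5 * (n - k) * math.log(tail[k] / (n - k))
                + 0.5 * k * math.log(head / k)
                + 0.5 * math.log(k * (n - k)))
        if crit < best_crit:
            best_k, best_crit = k, crit
    kept = [0.0] * n
    for j in order[:best_k]:
        kept[j] = c[j]
    return idct(kept), best_k


# Hypothetical test signal: three DCT components plus unit-variance noise.
rng = random.Random(0)
n = 128
coef = [0.0] * n
coef[3], coef[10], coef[25] = 60.0, -45.0, 30.0
clean = idct(coef)
noisy = [s + rng.gauss(0.0, 1.0) for s in clean]

denoised, k = mdl_denoise(noisy)
err_noisy = sum((a - b) ** 2 for a, b in zip(noisy, clean))
err_denoised = sum((a - b) ** 2 for a, b in zip(denoised, clean))
```

Note that the sketch has no threshold or noise-variance parameter to tune: the number of retained coefficients follows from the data alone, which is the property the text emphasizes.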

Our analysis of the denoising problem in 1-D signals, such as mass spectrometry, capillary electrophoresis genotyping and DNA sequencing signals, as well as in 2-D cryo-EM images, shows that the MDL denoising method produces robust and intuitively appealing results, sometimes even in situations where competing approaches perform poorly.

Figure 19

Figure 19: Left: The original cryo-EM image of PRD1 virus. Right: The result of MDL based denoising.

Gene regulatory networks

Researchers: Jukka Heikkonen, Vibhor Kumar, Aatu Kaapro

Gene regulatory networks govern which genes are expressed in a cell at any given time, how much product is made from each one, and the cell’s responses to diverse environmental cues and intracellular signals. A popular model of regulation is to represent networks of genes as if they directly affect each other. Such networks do not explicitly represent the proteins and metabolites that actually mediate cell interactions. Understanding, describing and modelling such gene regulation networks is one of the most challenging problems in functional genomics.

Since the development of the DNA microarray technique in the mid-1990s, there has been an enormous increase in gene expression data from several organisms. This flood of large-scale data can be used for mining gene-to-gene interactions. Methods that have been applied to gene regulatory network inference include, among others, Boolean networks, Bayesian networks and recurrent neural networks. There are known effects, for example time delays, that the current models do not take into account.

Because real regulatory networks in living cells are not well characterized, models have to be verified with artificial data. Our first goal is to build an expression data simulator capable of producing synthetic data that capture the known features of gene regulation. The research so far has concentrated on recurrent neural network models of gene regulation.
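To illustrate what a recurrent neural network model of gene regulation can look like, the following minimal sketch generates synthetic expression time series. It assumes one common formulation, in which each expression level relaxes towards a sigmoid of the weighted sum of the other levels; the project's actual simulator may differ. The function name, the three-gene network, and all weights and parameters are hypothetical.

```python
import math
import random


def simulate_expression(weights, bias, x0, steps, dt=0.1, tau=1.0,
                        noise_sd=0.0, seed=0):
    """Euler simulation of tau * dx_i/dt = sigmoid(sum_j w_ij x_j + b_i) - x_i.

    weights[i][j] > 0 means gene j activates gene i; < 0 means it represses it.
    Returns a list of steps + 1 expression vectors with values in [0, 1].
    """
    rng = random.Random(seed)
    n = len(x0)
    x = list(x0)
    series = [list(x)]
    for _ in range(steps):
        new_x = []
        for i in range(n):
            drive = bias[i] + sum(weights[i][j] * x[j] for j in range(n))
            target = 1.0 / (1.0 + math.exp(-drive))
            value = x[i] + (dt / tau) * (target - x[i])
            if noise_sd > 0.0:
                value += rng.gauss(0.0, noise_sd)  # simple measurement noise
            new_x.append(min(1.0, max(0.0, value)))  # keep levels in [0, 1]
        x = new_x
        series.append(list(x))
    return series


# Hypothetical three-gene loop: gene 0 activates gene 1, gene 1 activates
# gene 2, and gene 2 represses gene 0.
weights = [[0.0, 0.0, -4.0],
           [4.0, 0.0, 0.0],
           [0.0, 4.0, 0.0]]
bias = [1.0, -2.0, -2.0]
x0 = [0.1, 0.1, 0.1]

series = simulate_expression(weights, bias, x0, steps=200,
                             noise_sd=0.01, seed=1)
```

Because the generating weights are known, an inference method run on such a series can be scored against them, which is the purpose of verifying models with artificial data.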

Automated allele calling method for capillary array electrophoresis genotyping

Researchers: Jukka Heikkonen, Janne Ojanen, Timo Miettinen*
*Finnish Genome Center, University of Helsinki

The project is carried out in co-operation with the Finnish Genome Center.

Capillary array electrophoresis instruments provide a platform for high-throughput genotyping, on which more than 10 000 genotypes can be generated per day. However, the capacity of the available genotyping software for analyzing the data does not match the throughput of the electrophoresis instruments. To ensure high-quality genotypes, most software packages require substantial manual editing after an initial semi-automated allele calling step. The current allele calling methods have therefore become a serious bottleneck for the entire genotyping pipeline.

Our aim is to develop a fully automated method that minimizes user interaction. In addition, we have implemented a number of quality measures that remove ambiguous results in order to avoid miscalls. Quality scores are calculated separately for each processing step to provide information on the quality of the signal and on the reliability of the program's decision-making processes.

The alleles that the new method was able to read agreed 100% with the alleles called manually. The allele sizes also corresponded to the sizes determined with the software provided by the manufacturer of the instrument. Thus, the new method provides a tool for fully automated, high-accuracy genotyping. The automated genotyping software based on the proposed method will be made available free of charge under the GNU General Public License (GPL).

Figure 20

Figure 20: Capillary array electrophoresis genotyping workflow.

Modeling of Bacterial Metabolism

Researchers: Mika Toivanen, Antti Nyyssölä*, Matti Leisola* and Kimmo Kaski
*Laboratory of Bioprocess Engineering, HUT

Interest in computational methods for biological applications has recently increased greatly. This is due, in part, to the need to integrate quantitatively biological knowledge that is diverse and mostly qualitative. As part of a larger systems biology effort, we have started on this task by formulating a model of glucose metabolism in the lactic acid bacterium Lactococcus lactis.

Our model characterizes 24 enzymatic reactions involved in the metabolic pathway. The reaction rate equations depend on the concentrations of 31 metabolites and on 131 kinetic parameters. The rate equations and the stoichiometry of the reaction network form a system of ordinary differential equations, which is solved numerically with Matlab. A limitation of this approach is that the parameters have been collected from a number of articles and are therefore not directly comparable. In the future we wish to test our model against direct experimental data.

The model is useful, for example, in designing knock-out mutant strains of the bacterium. It predicts how the carbon fluxes change if an enzyme is deleted. Such knock-out mutants are a common tool in biotechnology when the fermentation characteristics of these bacteria are engineered. The model also gives us more insight into the control of metabolism at the enzymatic level, and it is an excellent starting point for more advanced models.
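The structure of such a model, rate equations combined with a stoichiometric matrix into a system of ordinary differential equations, can be sketched in a few lines. The example below is a toy and not the 24-reaction L. lactis model: it assumes a hypothetical three-metabolite chain S0 -> S1 -> S2 with Michaelis-Menten kinetics, made-up parameter values, and a hand-written fourth-order Runge-Kutta integrator in place of a Matlab solver. An enzyme knock-out is modelled by setting the corresponding maximum rate to zero.

```python
# Stoichiometric matrix: rows are metabolites S0, S1, S2; columns are the two
# reactions (S0 -> S1, S1 -> S2).
STOICH = [[-1, 0],
          [1, -1],
          [0, 1]]


def rates(conc, vmax, km):
    """Michaelis-Menten rate of each reaction at the given concentrations."""
    s0, s1, _ = conc
    return [vmax[0] * s0 / (km[0] + s0),
            vmax[1] * s1 / (km[1] + s1)]


def derivative(conc, vmax, km):
    """dS/dt = N v(S): the stoichiometric matrix times the rate vector."""
    v = rates(conc, vmax, km)
    return [sum(STOICH[m][r] * v[r] for r in range(len(v)))
            for m in range(len(conc))]


def integrate(conc0, vmax, km, t_end, dt=0.01):
    """Fixed-step fourth-order Runge-Kutta integration from t = 0 to t_end."""
    conc = list(conc0)
    for _ in range(int(round(t_end / dt))):
        k1 = derivative(conc, vmax, km)
        k2 = derivative([c + 0.5 * dt * k for c, k in zip(conc, k1)], vmax, km)
        k3 = derivative([c + 0.5 * dt * k for c, k in zip(conc, k2)], vmax, km)
        k4 = derivative([c + dt * k for c, k in zip(conc, k3)], vmax, km)
        conc = [c + dt / 6.0 * (a + 2.0 * b + 2.0 * d + e)
                for c, a, b, d, e in zip(conc, k1, k2, k3, k4)]
    return conc


# Hypothetical initial concentrations and kinetic parameters.
conc0 = [10.0, 0.0, 0.0]
km = [0.5, 0.5]

wild_type = integrate(conc0, vmax=[2.0, 1.0], km=km, t_end=5.0)
# Knock-out of the second enzyme: its maximum rate is set to zero.
knock_out = integrate(conc0, vmax=[2.0, 0.0], km=km, t_end=5.0)
```

In the knock-out run the intermediate S1 accumulates and no S2 is formed, a small-scale version of the flux redistribution that the full model is meant to predict.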

Figure 21

Figure 21: The interplay of enzymatic reactions dictates the carbon fluxes in the cell. It is in our interests to direct the flux towards economically desirable products by combining our knowledge of systems biology and bioprocess engineering.

Genetic and Environmental Causes of Nephropathy in Type 1 Diabetes

Researchers: Ville-Petteri Mäkinen, Per-Henrik Groop*, Maija Wessman* and Carol Forsblom*
*Folkhälsan Research Centre, Biomedicum Helsinki

Diabetes is a complex disease with both a hereditary and an environmental background, and it is unlikely that a single gene dictates its incidence. It is therefore imperative to apply a multifaceted and holistic approach in identifying the most significant risk factors.

The FinnDiane study, headed by Doc. Per-Henrik Groop from the Folkhälsan Research Centre, aims at the identification and early detection of diabetic complications. So far, the research group has accumulated clinical information on about 4000 type 1 diabetic patients and 1500 relatives in Finland. At the moment, a genome-wide scan of some 120 selected families is in progress, which will give us a data set that is unique in the world and an excellent opportunity to learn more about the disease and its complications.

Maintaining and analysing the huge database is a difficult task. It is no longer enough to associate one gene with a particular phenotype; we have to be able to go beyond this first level of organisation and find larger patterns and clusters of susceptibility. LCE has the key role of providing expertise and know-how in both data visualisation and statistical analysis.

Diabetes is turning into an epidemic in the developed world: at the moment there are over 30,000 type 1 and 200,000 type 2 patients in Finland alone. The FinnDiane study focuses on type 1 diabetes, which is characterised by an autoimmune reaction against the insulin-producing cells in the pancreas. As a result, an affected person rapidly becomes dependent on an external source of insulin.

After about 20 years of type 1 diabetes, a third of the patients have, or are in the process of developing, kidney disease. Complications are the most dangerous aspect of diabetes, since they are mostly irreversible and very costly in every sense of the word. Furthermore, we currently lack a reliable method of early diagnosis that would facilitate efficient prevention and treatment.

Figure 22

Figure 22: Pedigrees with diabetes and complications.

Systems biology of sexual reproduction

Researchers: Margareta Segerståhl and Kimmo Kaski

The multitude of different sex determination and reproduction mechanisms found in nature is not easily approached by conventional methods. First of all, this diversity is an evolutionary paradox: Darwinian natural selection should favor and spread a good solution for a function as important as reproduction, not scatter it. Secondly, the biological concept of sex determination is based on terms like maleness and femaleness. The scientific exactness of these attributes is far from good, and a more formal understanding of this apparent dichotomy is to be hoped for. Thirdly, the germ cells are often neglected in sex determination studies. This lack of interest is most surprising, because individuals with no functioning germ cells immediately become evolutionary dead ends. A further complication is the observation that germ cell sex determination does not necessarily follow the same sex determination program that establishes all the other sexual characteristics of an individual.

In order to explore this important area of biology, we have started a systems approach intended to provide a conceptual framework for better analysis of experimental results. We want to create a formal model that allows effective use of computational and mathematical modeling methods, because the amount of experimental data is increasing extremely fast. It is also important that the model can be presented in a way that allows co-evolution of theory and experiment.

The preliminary results from an unconventional analysis of germ cell identities show a surprising way to link sexual reproduction and multicellular development. This allows us to model a new level of biological organization between cell biology and complex multicellularity. It integrates different biological disciplines with systems analysis and modeling and enables us to look at many biological phenomena from a new perspective.