## Archive for April, 2012

### Low throughput biology

April 24, 2012

In modern biology there is a strong tendency to collect huge quantities of data with high throughput techniques. This data is only useful if we have good techniques for analysing it to obtain a better understanding of the biological systems being studied. One approach to doing this is to build mathematical models. An idea which is widespread is that the best models are those which are the closest to reality in the sense that they take account of as many effects as possible and use as many measured quantities as possible. Suppose for definiteness that the model is given by a system of ordinary differential equations. Then this idea translates into using systems with many variables and many parameters. There are several problems which may come up. The first is that some parameters have not been measured at all. The second is that those which have been measured are only known with poor accuracy and different parameters have been measured in different biological systems. A third problem is that even if the equations and parameters were known perfectly we would still be faced with the difficult problem of analysing at least some aspects of the qualitative behaviour of solutions of a dynamical system of high dimension. The typical way of getting around this is to put the equations on the computer and calculate the solutions numerically for some initial data. Then we have the problem that we can only do the calculations for a finite number of initial data sets and it is difficult to know how typical the solutions obtained really are. To have a short name for the kind of model just described I will refer to it as a ‘complex model’.

In view of all these difficulties with complex models it makes sense to complement the above strategy with one which goes in a very different direction. The idea for this alternative approach is to build models which are as simple as possible subject to the condition that they include a biological effect of interest. The hope is then that a detailed analysis of the simple model will generate new and useful ideas for explaining biological phenomena or will give a picture of what is going on which may be crude but is nevertheless helpful in practice, perhaps even more helpful than a complex model.

It often happens that in analysing a complex model many of the parameters have to be guessed (perhaps just in an order of magnitude way) or estimated by some numerical technique. It is then justified to ask whether adding more variables and corresponding parameters really means adding information. How can we hope to understand complex models at all? If these were generic dynamical systems with the given number of unknowns and parameters this would be hopeless. Fortunately the dynamical systems arising in biology are far from generic. They have arisen by the action of evolution optimizing certain properties under strong constraints. Given that this is the case it makes sense to try and understand in what ways these systems are special. If key mechanisms can be identified then we can try to isolate them and study them intensively in relatively simple situations. My intention is not to deny the value of high throughput techniques. What I want to promote is the idea that it is bad if the pursuit of those approaches leads to the neglect of others which may be equally valuable. On a theoretical level this means the use of ‘simple models’ in contrast to ‘complex models’. There is a corresponding idea on the experimental side which may be even more necessary. This is to focus on the study of certain simple biological systems as a complement to high throughput techniques. This alternative might be called ‘low throughput biology’. It occurred to me that if I had this idea under this name then it might also have been introduced by others. Searching for the phrase with Google I only found a few references and as far as I could see the phrase was generally associated with a negative connotation. Rather than making an opposition between low throughput and high throughput techniques like David and Goliath it would be better to promote cooperation between the two. I have come across one good example of this in the work of Uri Alon and his collaborators on network motifs. This work is well explained in the lectures of Alon on systems biology which are available on YouTube. The idea is to take a large quantity of data (such as the network of all transcription factors of E. coli) and to use statistical analysis to identify qualitative features of the network which make it different from a random network. These features can then be isolated, analysed and, most importantly, understood in an intuitive way.

### Dynamics of the MAP kinase cascade

April 7, 2012

The MAP kinase cascade is a group of enzymes which can iteratively add phosphate groups to each other. More specifically, when a suitable number of phosphate groups have been added to one enzyme in the cascade it becomes activated and can add a phosphate to the next enzyme in the row. I found this kind of idea of enzymes modifying each other with the main purpose of activating each other fascinating when I first came across it. (The first example I saw was actually the complement cascade which occurs in immunology.) This type of structure is just asking to be modelled mathematically and not surprisingly a lot of work has been done on it. Here I will survey some of what is known.

The MAP kinase cascade is a structure which occurs in many types of cells. It has three layers. The first layer consists of a protein which can be phosphorylated once. The second layer consists of a protein which can be phosphorylated twice by the same enzyme. This enzyme is the phosphorylated form of the protein in the first layer. The third layer also consists of a protein which can be phosphorylated twice by the same enzyme. This enzyme is the doubly phosphorylated form of the protein in the second layer. The protein in the third layer is the one which is called MAP kinase (mitogen activated protein kinase, MAPK). A kinase is an enzyme which phosphorylates something else and so it is not surprising that the protein in the second layer is called a MAP kinase kinase (MAPKK). The protein in the first layer is accordingly called a MAP kinase kinase kinase (MAPKKK). The roles of the players in this scheme can be taken by different enzymes. For concreteness I name those which occur in the case of human T cells. There the proteins in the first, second and third layers are called Raf, MEK and ERK, respectively. The protein which activates Raf, and hence starts the whole cascade, is Ras. It, or rather the corresponding gene ras, is famous as an oncogene. This means that when the gene is not working properly cancer can result. In fact many drugs used in cancer treatment target proteins belonging to the MAP kinase cascade.

A model for the MAP kinase cascade was written down by Huang and Ferrell (PNAS, 93, 10078). They used a description of Michaelis-Menten type where for each basic substance three species are included in the network. These are the substance itself (free substrate), the enzyme and the complex of the two. Of course since in the MAP kinase cascade certain proteins act both as substrate and enzyme in different reactions there is some overlap between these. For clarity this may be called the ‘extended Michaelis-Menten’ description to contrast it with the ‘effective Michaelis-Menten’ description arising from the extended version by a quasi-steady state limiting process. Note that for a given basic reaction network with $m$ species the extended MM description has more than $m$ species but still has mass-action kinetics whereas the effective MM description has $m$ species but kinetics more complicated than mass action. Phosphatases catalysing the reverse reactions are also included in the model. The phosphatase which removes both phosphate groups of ERK is called MKP3.
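To make the distinction between the two descriptions concrete, here is the textbook case of a single reaction in which an enzyme $E$ converts a substrate $S$ into a product $P$ via a complex $C$. The rate constants $k_1$, $k_{-1}$, $k_2$ and the symbols $E_{\rm tot}$ and $K_M$ are notation introduced only for this illustration and are not taken from the paper of Huang and Ferrell. The basic network $S\to P$ has $m=2$ species. The extended MM description has the four species $S$, $E$, $C$, $P$ and mass-action kinetics:

$$\dot S=-k_1SE+k_{-1}C,\quad \dot E=-k_1SE+(k_{-1}+k_2)C,\quad \dot C=k_1SE-(k_{-1}+k_2)C,\quad \dot P=k_2C.$$

The total amount of enzyme $E_{\rm tot}=E+C$ is conserved. Setting $\dot C=0$ as a quasi-steady state assumption and eliminating $E$ gives $C=E_{\rm tot}S/(K_M+S)$ with $K_M=(k_{-1}+k_2)/k_1$, and hence the effective MM description for the two remaining species:

$$\dot S=-\frac{k_2E_{\rm tot}S}{K_M+S},\quad \dot P=\frac{k_2E_{\rm tot}S}{K_M+S}.$$

The right hand side is now a rational function rather than a polynomial, which is the sense in which the kinetics are more complicated than mass action.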

In the paper the steady states of the model are studied and an input-output relation is computed numerically. The activity of the MAPK is plotted as a function of the concentration of the first enzyme (Ras in the example). A sigmoidal curve is found which corresponds to what is called ultrasensitivity. The dynamical properties of the model are not discussed. In particular it is not discussed whether there might be multistability (more than one stable stationary solution for fixed values of the parameters) or periodic solutions. The authors also did experiments whose results agreed well with the theoretical predictions. The experiments were done with extracts from the oocytes (immature egg cells) of the frog Xenopus laevis.
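The steepness of a sigmoidal input-output curve is often summarised by an effective Hill coefficient $n_H=\ln 81/\ln({\rm EC}_{90}/{\rm EC}_{10})$, where ${\rm EC}_{10}$ and ${\rm EC}_{90}$ are the inputs giving 10% and 90% of the maximal response. The following Python sketch is only an illustration of that standard measure and is not taken from the paper. The function names are hypothetical, and the Hill function used as a test curve is a stand-in for a numerically computed response of the cascade.

```python
import math


def hill(x, n, k=1.0):
    """Hill function with exponent n and half-saturation constant k (test curve only)."""
    return x**n / (k**n + x**n)


def input_for_response(response, target, lo=1e-9, hi=1e9, iters=200):
    """Find x with response(x) == target by bisection on a logarithmic scale.

    Assumes that response is increasing and that the target is attained
    somewhere between lo and hi.
    """
    for _ in range(iters):
        mid = math.sqrt(lo * hi)  # geometric midpoint
        if response(mid) < target:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)


def effective_hill_coefficient(response, max_response=1.0):
    """Effective Hill coefficient ln(81) / ln(EC90 / EC10) of an increasing curve."""
    ec10 = input_for_response(response, 0.1 * max_response)
    ec90 = input_for_response(response, 0.9 * max_response)
    return math.log(81.0) / math.log(ec90 / ec10)


if __name__ == "__main__":
    # A hyperbolic (Michaelis-Menten like) curve and a steeper sigmoidal one.
    for n in (1.0, 4.0):
        print(n, effective_hill_coefficient(lambda x: hill(x, n)))
```

For an exact Hill function the measure returns the exponent itself, so a value well above one indicates a response which is steeper than a hyperbolic one. In practice `response` would be replaced by a function which computes the steady state MAPK activity for a given input.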

The possible dynamic behaviour was investigated in later papers. In some of these the effect of adding an additional feedback was considered. This kind of feedback is probably important in real biological systems. It may, for instance, explain why the results of experiments on whole oocytes are different from those done with extracts. Here, for mathematical simplicity, I will restrict to the case without additional feedback, in other words to the original Huang-Ferrell model. Multistability in this type of model was found in a paper of Markevich, Hoek and Kholodenko (J. Cell Biol. 164, 353). They investigate both extended and effective MM dynamics numerically and find bistability for both. In the extended MM model, which is the one I am most interested in here, the phosphorylation is supposed to be distributive. In other words the kinase is released between the two phosphorylation steps. The alternative to this is called processive phosphorylation. In a paper of Conradi et al. this result is compared with chemical reaction network theory (CRNT). It is found that while techniques from CRNT yield results agreeing with those of Markevich et al. for the case where both the kinase and the phosphatase act in a distributive way, if one of these is replaced by a processive mechanism it can be proved using the Deficiency One Algorithm of CRNT that there is no multistationarity. The case with distributive phosphorylation is the special case $n=2$ of what is called a multiple futile cycle with $n$ steps. Wang and Sontag (J. Math. Biol. 57, 29) proved upper and lower bounds for the number of steady states in this type of system under certain assumptions on the parameters. In particular this confirms that there can be three steady states (without determining their stability). Going beyond the single layer to the full cascade opens up more possibilities. Numerical evidence has been presented by Qiao et al. (PLOS Comp. Biol. 9, 2007) that there are periodic solutions. To understand why these should exist it might be best to think of them as relaxation oscillations.
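For readers who want to experiment, here is a minimal Python sketch of the single layer discussed above, the multiple futile cycle with $n=2$, in the extended MM description with distributive kinase $E$ and phosphatase $F$. It assumes the usual mass-action form with substrates $S_0$, $S_1$, $S_2$ and four enzyme-substrate complexes. The rate constants and initial data are made-up values, not those of any of the papers cited, and the code makes no attempt to reproduce bistability. It integrates the nine equations with a hand-written Runge-Kutta step and checks the three conserved totals (substrate, kinase, phosphatase), which is a useful sanity test before scanning parameters.

```python
def futile_cycle_rhs(x, steps):
    """Mass-action right-hand side of the dual futile cycle (n = 2).

    x     = [S0, S1, S2, E, F, ES0, ES1, FS2, FS1]
    steps = four (k_on, k_off, k_cat) triples, in the order
            E on S0, E on S1, F on S2, F on S1.
    """
    s0, s1, s2, e, f, es0, es1, fs2, fs1 = x
    (on1, off1, cat1), (on2, off2, cat2), (on3, off3, cat3), (on4, off4, cat4) = steps
    # Net binding fluxes (substrate + enzyme <-> complex).
    j1 = on1 * s0 * e - off1 * es0
    j2 = on2 * s1 * e - off2 * es1
    j3 = on3 * s2 * f - off3 * fs2
    j4 = on4 * s1 * f - off4 * fs1
    # Catalytic fluxes (complex -> product + enzyme). The enzyme is released
    # after every step, which is what makes the mechanism distributive.
    r1, r2, r3, r4 = cat1 * es0, cat2 * es1, cat3 * fs2, cat4 * fs1
    return [
        -j1 + r4,                # S0
        r1 - j2 + r3 - j4,       # S1
        r2 - j3,                 # S2
        -j1 + r1 - j2 + r2,      # E
        -j3 + r3 - j4 + r4,      # F
        j1 - r1,                 # ES0
        j2 - r2,                 # ES1
        j3 - r3,                 # FS2
        j4 - r4,                 # FS1
    ]


def rk4_step(x, h, steps):
    """One classical fourth-order Runge-Kutta step of size h."""
    k1 = futile_cycle_rhs(x, steps)
    k2 = futile_cycle_rhs([xi + 0.5 * h * ki for xi, ki in zip(x, k1)], steps)
    k3 = futile_cycle_rhs([xi + 0.5 * h * ki for xi, ki in zip(x, k2)], steps)
    k4 = futile_cycle_rhs([xi + h * ki for xi, ki in zip(x, k3)], steps)
    return [xi + h * (a + 2 * b + 2 * c + d) / 6
            for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]


def simulate(x0, steps, h=1e-3, n_steps=2000):
    """Integrate from x0 with fixed step size h and return the final state."""
    x = list(x0)
    for _ in range(n_steps):
        x = rk4_step(x, h, steps)
    return x


def totals(x):
    """Conserved quantities: total substrate, total kinase, total phosphatase."""
    s0, s1, s2, e, f, es0, es1, fs2, fs1 = x
    return (s0 + s1 + s2 + es0 + es1 + fs2 + fs1, e + es0 + es1, f + fs2 + fs1)


# Hypothetical rate constants and initial data, chosen only for illustration.
STEPS = [(1.0, 1.0, 1.0)] * 4
X0 = [10.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0]

if __name__ == "__main__":
    x_end = simulate(X0, STEPS)
    print("totals at start:", totals(X0))
    print("totals at end:  ", totals(x_end))
```

To look for multistationarity numerically one would fix the three totals, start from many different initial data compatible with them and compare the long-time limits. As remarked in the previous post, this can only ever cover finitely many initial data.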