In modern biology there is a strong tendency to collect huge quantities of data with high throughput techniques. This data is only useful if we have good techniques for analysing it to obtain a better understanding of the biological systems being studied. One approach to doing this is to build mathematical models. An idea which is widespread is that the best models are those which are the closest to reality in the sense that they take account of as many effects as possible and use as many measured quantities as possible. Suppose for definiteness that the model is given by a system of ordinary differential equations. Then this idea translates into using systems with many variables and many parameters. There are several problems which may come up. The first is that some parameters have not been measured at all. The second is that those which have been measured are often only known with poor accuracy and that different parameters have been measured in different biological systems. A third problem is that even if the equations and parameters were known perfectly we would still be faced with the difficult problem of analysing at least some aspects of the qualitative behaviour of solutions of a dynamical system of high dimension. The typical way of getting around this is to put the equations on the computer and calculate the solutions numerically for some initial data. Then we have the problem that we can only do the calculations for a finite number of sets of initial data and it is difficult to know how typical the solutions obtained really are. To have a short name for the kind of model just described I will refer to it as a ‘complex model’.
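To make the numerical side of this concrete, here is a minimal sketch in Python (using scipy's solve_ivp) of the kind of computation meant above: a small ODE system integrated from one chosen set of initial data. The two-variable model and its parameter values are purely illustrative and not taken from any particular biological system.

```python
# Minimal sketch: numerically integrating a small ODE model from one
# chosen initial condition. The model and parameter values are purely
# illustrative (a toy pair of interacting species), not from a real system.
import numpy as np
from scipy.integrate import solve_ivp

# Illustrative rate constants; in a 'complex model' there could be dozens
# of these, many only known to within an order of magnitude.
k_prod, k_deg, k_act, k_rep = 1.0, 0.5, 2.0, 1.5

def rhs(t, y):
    x, z = y  # two interacting species
    dx = k_prod + k_act * z - k_deg * x
    dz = k_rep * x / (1.0 + x) - k_deg * z
    return [dx, dz]

y0 = [0.1, 0.0]                      # one particular choice of initial data
sol = solve_ivp(rhs, (0.0, 50.0), y0)

print(sol.y[:, -1])  # state at the final time for this single trajectory
```

Each run of this kind only answers a question about one trajectory, so the difficulty mentioned above, of knowing how typical that trajectory is, remains.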
In view of all these difficulties with complex models it makes sense to complement the above strategy by one which goes in a very different direction. The idea for this alternative approach is to build models which are as simple as possible subject to the condition that they include a biological effect of interest. The hope is then that a detailed analysis of the simple model will generate new and useful ideas for explaining biological phenomena or will give a picture of what is going on which may be crude but is nevertheless helpful in practice, perhaps even more helpful than a complex model.
It often happens that in analysing a complex model many of the parameters have to be guessed (perhaps just to within an order of magnitude) or estimated by some numerical technique. It is then reasonable to ask whether adding more variables and corresponding parameters really means adding information.

How can we hope to understand complex models at all? If these were generic dynamical systems with the given number of unknowns and parameters this would be hopeless. Fortunately the dynamical systems arising in biology are far from generic. They have arisen through the action of evolution optimizing certain properties under strong constraints. Given that this is the case it makes sense to try to understand in what ways these systems are special. If key mechanisms can be identified then we can try to isolate them and study them intensively in relatively simple situations.

My intention is not to deny the value of high throughput techniques. What I want to promote is the idea that it is bad if the pursuit of those approaches leads to the neglect of others which may be equally valuable. On a theoretical level this means the use of ‘simple models’ in contrast to ‘complex models’. There is a corresponding idea on the experimental side which may be even more necessary. This is to focus on the study of certain simple biological systems as a complement to high throughput techniques. This alternative might be called ‘low throughput biology’. It occurred to me that if I had come up with this idea under this name then it might also have been introduced by others. Searching for the phrase with Google I only found a few references and as far as I could see the phrase was generally associated with a negative connotation. Rather than setting up an opposition between low throughput and high throughput techniques, like David and Goliath, it would be better to promote cooperation between the two.

I have come across one good example of this in the work of Uri Alon and his collaborators on network motifs. This work is well explained in Alon's lectures on systems biology, which are available on YouTube. The idea is to take a large quantity of data (such as the network of all transcription factors of E. coli) and to use statistical analysis to identify qualitative features of the network which make it different from a random network. These features can then be isolated, analysed and, most importantly, understood in an intuitive way.
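As a rough illustration of the statistical idea behind motif detection, the following sketch counts occurrences of one motif, the feed-forward loop, in a directed network and compares the count with the counts in degree-preserving randomizations of the same network. The tiny example network and the helper functions here are my own illustrations, not Alon's data or code.

```python
# Sketch of the motif-counting idea: count feed-forward loops (edges
# a->b, a->c, b->c) in a directed network and compare with counts in
# degree-preserving randomizations. The tiny example network is made up.
import random
from itertools import permutations

def count_ffl(edges):
    """Count feed-forward loops in an iterable of directed edges (u, v)."""
    e = set(edges)
    nodes = {u for u, v in e} | {v for u, v in e}
    return sum(1 for a, b, c in permutations(nodes, 3)
               if (a, b) in e and (a, c) in e and (b, c) in e)

def randomize(edges, n_swaps=1000, rng=random):
    """Degree-preserving randomization by repeated edge swaps:
    (a, b), (c, d) -> (a, d), (c, b), skipped whenever it would create
    a self-edge or a duplicate edge. In- and out-degrees are preserved."""
    e = list(edges)
    s = set(e)
    for _ in range(n_swaps):
        i, j = rng.randrange(len(e)), rng.randrange(len(e))
        (a, b), (c, d) = e[i], e[j]
        if a == d or c == b or (a, d) in s or (c, b) in s:
            continue
        s.discard((a, b)); s.discard((c, d))
        s.add((a, d)); s.add((c, b))
        e[i], e[j] = (a, d), (c, b)
    return e

# Toy 'real' network (purely illustrative, not E. coli data).
real = [("x", "y"), ("x", "z"), ("y", "z"), ("z", "w"), ("x", "w")]
observed = count_ffl(real)
null = [count_ffl(randomize(real)) for _ in range(200)]
print("observed:", observed, "randomized mean:", sum(null) / len(null))
```

A motif is then a pattern which occurs much more often in the real network than in the randomized ensemble; the statistics only flag the pattern, and the intuitive understanding of why it is there is a separate step.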
April 24, 2012 at 8:19 am
You probably already know it, but your post made me think of Takens' reconstruction theorem http://en.wikipedia.org/wiki/Takens'_theorem which allows one to reconstruct the attractor of a system from data collected from only one (generic) observable, and then to estimate the dimension of the attractor that describes the long time behaviour of the system. It may be a way to get a clue about how complex the system should be, at least in terms of dimension, and to use this huge amount of data to simplify the problem.
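To give a concrete picture of the reconstruction this comment refers to, here is a minimal sketch of delay-coordinate embedding, the construction underlying Takens' theorem. The time series, delay and embedding dimension are arbitrary illustrative choices; estimating the attractor dimension from the embedded points (for example by a correlation-dimension method) would be a further step not shown here.

```python
# Minimal sketch of delay-coordinate embedding: from a single observed
# time series x(t), build vectors (x(t), x(t + tau), ..., x(t + (m-1)*tau)).
# The signal, delay and embedding dimension are arbitrary illustrations.
import numpy as np

def delay_embed(x, m, tau):
    """Return the m-dimensional delay-coordinate vectors of series x."""
    n = len(x) - (m - 1) * tau
    return np.column_stack([x[i * tau : i * tau + n] for i in range(m)])

t = np.linspace(0.0, 100.0, 5000)
x = np.sin(t) + 0.5 * np.sin(2.3 * t)   # stand-in for one measured observable

points = delay_embed(x, m=3, tau=25)    # reconstructed 3-D trajectory
print(points.shape)
```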
April 24, 2012 at 8:50 am
Thanks for this comment. My level of knowledge on the Takens story is about epsilon squared and it sounds to me like something I should know more about.
August 17, 2012 at 1:46 pm
For some ideas going in the same direction as this post and coming from a more authoritative source I recommend the article ‘Sillycon valley fever’ by Sydney Brenner (Current Biology 9, R671).
October 22, 2016 at 6:39 am
A very relevant perspective to be conscious of when seeking solutions to biological questions.