Archive for the ‘Uncategorized’ Category

Stability of steady states in models of the Calvin cycle

April 25, 2016

I have just written a paper with Stefan Disselnkötter on stationary solutions of models for the Calvin cycle and their stability. There we concentrate on the simplest models for this biological system. There were already some analytical results available on the number of positive stationary solutions (let us call them steady states for short), with the result that this number is zero, one or two in various circumstances. We were able to extend these results, in particular showing that in a model of Zhu et al. there can be two steady states or, in exceptional cases, a continuum of steady states. This is at first sight surprising since those authors stated that there is at most one steady state. However they impose the condition that the steady states should be ‘physiologically feasible’. In fact for their investigations, which are done by means of computer calculations, they assume among other things that certain Michaelis constants which occur as parameters in the system have specific numerical values. This assumption is biologically motivated but at the moment I do not understand how the numbers they give follow from the references they quote. In any case, if these values are assumed our work gives an analytical proof that there is at most one steady state.

While there are quite a lot of results in the literature on the number of steady states in systems of ODE modelling biochemical systems there is much less on the question of the stability of these steady states. It was a central motivation of our work to make some progress in this direction for the specific models of the Calvin cycle and to develop some ideas for approaching this type of question more generally. One key idea is that if it can be shown that there is a bifurcation with a one-dimensional centre manifold this can be very helpful in getting information on the stability of steady states which arise in the bifurcation. Given enough information on a sufficient number of derivatives at the bifurcation point this is a standard fact. What is interesting and perhaps less well known is that it may be possible to get conclusions without having such detailed control. One type of situation occurring in our paper is one where a stable steady state and a saddle arise. This is roughly the situation of a fold bifurcation but we do not prove that it is generic. Doing so would presumably involve heavy calculations.
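A minimal sketch of the standard picture, assuming the textbook one-dimensional normal form x'=\mu-x^2 of a fold bifurcation (not the Calvin cycle equations themselves): for \mu<0 there is no steady state, at \mu=0 there is one with a zero eigenvalue, and for \mu>0 two steady states \pm\sqrt{\mu} appear, one stable and one unstable. In a higher-dimensional system whose remaining eigenvalues have negative real parts, the unstable one on the centre manifold is a saddle.

```python
import math

def fold_steady_states(mu):
    """Steady states of x' = mu - x**2, each paired with the derivative of the right-hand side there."""
    if mu < 0:
        return []                      # before the bifurcation: no steady states
    if mu == 0:
        return [(0.0, 0.0)]            # at the bifurcation: one steady state with a zero eigenvalue
    r = math.sqrt(mu)
    # The derivative of mu - x**2 is -2x: negative means stable, positive means unstable.
    return [(r, -2.0 * r), (-r, 2.0 * r)]

def classify(derivative):
    if derivative < 0:
        return "stable"
    if derivative > 0:
        return "unstable"
    return "degenerate"

for mu in (-0.5, 0.0, 0.25):
    print(mu, [(x, classify(d)) for x, d in fold_steady_states(mu)])
```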

The centre manifold calculation only controls one eigenvalue. The other important input needed to see that there is a stable steady state for at least some choice of the parameters is a proof that the remaining eigenvalues have negative real parts. This is done by considering a limiting case where the linearization simplifies and then choosing parameters close to those of the limiting case. The arguments in this paper show how wise it can be to work with the rates of the reactions as long as possible, without using species concentrations. This kind of approach is popular with many people; it has just taken me a long time to get the point.
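The continuity argument behind the limiting case can be illustrated with a toy computation. The matrices J0 and J1 below are hypothetical and chosen only for illustration: J0 plays the role of the linearization in the limiting case, with one zero eigenvalue (the direction controlled by the centre manifold) and the others negative, while J0+\epsilon J1 stands for nearby parameters. Since eigenvalues depend continuously on the matrix, the eigenvalues which are negative at \epsilon=0 stay in the left half-plane for small \epsilon, whatever happens to the one near zero.

```python
import numpy as np

# Hypothetical linearization in the limiting case: upper triangular, eigenvalues 0, -1, -2.
# The zero eigenvalue stands for the direction handled by the centre manifold calculation.
J0 = np.array([[0.0, 1.0, 0.0],
               [0.0, -1.0, 1.0],
               [0.0, 0.0, -2.0]])
# Hypothetical first-order change of the linearization when the parameters move away.
J1 = np.array([[0.3, 0.0, 1.0],
               [1.0, 0.5, 0.0],
               [0.0, 1.0, 0.2]])

def eigs_sorted(eps):
    """Eigenvalues of J0 + eps * J1, ordered by decreasing real part."""
    ev = np.linalg.eigvals(J0 + eps * J1)
    return ev[np.argsort(-ev.real)]

for eps in (0.0, 1e-2, 1e-3):
    ev = eigs_sorted(eps)
    # ev[0] is the eigenvalue near zero; the others should stay in the left half-plane.
    print(eps, np.round(ev, 4), bool(np.all(ev[1:].real < 0)))
```

Here the eigenvalue near zero actually becomes slightly positive for \epsilon>0; deciding its sign is exactly the job of the centre manifold calculation, not of the continuity argument.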

The advanced deficiency algorithm

January 23, 2016

Here I discuss another tool for analysing chemical reaction networks of deficiency greater than one. This is the Advanced Deficiency Algorithm developed by Feinberg and Ellison. It seems that the only direct reference for the mathematical proofs is Ellison’s PhD thesis. There is a later PhD thesis by Haixia Ji in which she introduces an extension of this called the Higher Deficiency Algorithm and where some of the ideas of Ellison are also recapitulated. In my lecture course, which ends next week, I will only have time to discuss the structure of the algorithm and give an extended example without proving much.

The Advanced Deficiency Algorithm has a general structure which is similar to that of the Deficiency One Algorithm. In some cases it can rule out multistationarity. Otherwise it gives rise to several sets of inequalities. If one of these has a solution then there is multistationarity and if none of them does there is no multistationarity. It is not clear to me if this is really an algorithm which is guaranteed to give a diagnostic test in all cases. I think that this is probably not the case and that one of the themes of Ji’s thesis is trying to improve on this. An important feature of this algorithm is that the inequalities it produces are in general nonlinear and thus may be much more difficult to analyse than the linear inequalities obtained in the case of the Deficiency One Algorithm.

Now I have come to the end of my survey of deficiency theory for chemical reaction networks. I feel I have learned a lot and now is the time to profit from that by applying these techniques. The obvious next step is to try out the techniques on some of my favourite biological examples. Even if the result is only that I see why the techniques do not give anything interesting in these cases it will be useful to understand why. Of course I hope that I will also find some positive results.

The deficiency one theorem

December 2, 2015

Here I continue with the discussion of chemical reaction network theory begun in the previous post. After having presented a proof of the Deficiency Zero Theorem in my course I proceeded to the Deficiency One Theorem following the paper of Feinberg in Arch. Rat. Mech. Anal. 132, 311. If we have a reaction network we can consider each linkage class as a network in its own right and thus we can define its deficiency. If we denote the deficiency of the full network by \delta and the deficiency of the ith linkage class by \delta_i then in general \delta\ge\sum_i \delta_i. The first hypothesis of the Deficiency One Theorem is that the deficiencies of the linkage classes are no greater than one. The second is that equality holds in the inequality relating the deficiency of the network to those of its linkage classes. The stoichiometric subspace S of the full network is the sum of the corresponding spaces S_i for the linkage classes. The sum is direct precisely when equality holds in the inequality for the deficiencies. The third condition is that there is precisely one terminal strong linkage class in each linkage class (t=l). The first conclusion is that if there exists a positive stationary solution then there is precisely one positive stationary solution in each positive stoichiometric compatibility class. The second is that if the network is weakly reversible there exists a positive stationary solution. Thus two of the conclusions of the Deficiency Zero Theorem have direct analogues in this case. Others do not. There is no statement about the stability of the stationary solutions. Networks which are not weakly reversible but satisfy the three conditions of the Deficiency One Theorem can also have positive stationary solutions, but their existence is not guaranteed. In the paper of Feinberg the proof of the Deficiency One Theorem is intertwined with that of another result. It says that the linearization about a stationary solution of the restriction of the system to a stoichiometric compatibility class has trivial kernel.
A related result proved in the same paper says that for a weakly reversible network of deficiency zero each stationary solution is a hyperbolic sink within its stoichiometric compatibility class. Statements of this type ensure that these stationary solutions possess a certain structural stability.
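The inequality between the deficiencies and the characterization of equality come from a short count. Write n, l and s for the number of complexes, the number of linkage classes and the dimension of S, and n_i and s_i=\dim S_i for the corresponding quantities of the ith linkage class, which, regarded as a network in its own right, has exactly one linkage class. Using n=\sum_i n_i and the fact that l is the number of terms in the sum:

```latex
\delta-\sum_i\delta_i=(n-l-s)-\sum_i(n_i-1-s_i)=\sum_i s_i-s
=\sum_i\dim S_i-\dim\Big(\sum_i S_i\Big)\ge 0 .
```

Equality holds exactly when \dim(\sum_i S_i)=\sum_i\dim S_i, i.e. when the sum of the S_i is direct.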

In the proof of the Deficiency One Theorem the first step (Step 1) is to show that when the assumptions of the theorem are satisfied and there is a positive stationary solution c^* then the set of positive stationary solutions is equal to the set of positive points c for which \log c-\log c^* lies in the orthogonal complement of the stoichiometric subspace. From this the conclusion that there is exactly one positive stationary solution in each positive stoichiometric compatibility class follows just as in the proof of the Deficiency Zero Theorem (Step 2). To complete the proof it then suffices to prove the existence of a positive stationary solution in the weakly reversible case (Step 3). In Step 1 the proof is reduced to the case where the network has only one linkage class by regarding the linkage classes of the original network as networks in their own right. In this context the concept of a partition of a network (\cal S,\cal C,\cal R) is introduced. This is a set of subnetworks ({\cal S},{\cal C}^i,{\cal R}^i). The set of species is unchanged. The set of reactions \cal R is a disjoint union of the {\cal R}^i and {\cal C}^i is the set of complexes occurring in the reactions contained in {\cal R}^i. The partition is called direct if the stoichiometric subspace of the full network is the direct sum of those of the subnetworks. The most important example of a partition of a network in the present context is that given by the linkage classes of any network. That it is direct is the second condition of the Deficiency One Theorem. The other part of Step 1 of the proof is to show that the statement holds in the case that there is only one linkage class. The deficiency is then either zero or one. Since the case of deficiency zero is already taken care of by the Deficiency Zero Theorem we can concentrate on the case where \delta=1. Then the dimension of {\rm ker} A_k is one, since there is only one terminal strong linkage class. The dimension of {\rm ker} (YA_k) is two, since {\rm im} A_k is then the span of the reaction vectors in F({\cal C}) and its intersection with {\rm ker} Y has dimension \delta.
The rest of Step 1 consists of a somewhat intricate algebraic calculation in this two-dimensional space. It remains to discuss Step 3. In this step the partition given by the linkage classes is again used to reduce the problem to the case where there is only one linkage class. The weak reversibility is preserved by this reduction. Again we can assume without loss of generality that \delta=1. The subspace U={\rm im} Y^T+{\rm span}\omega_{\cal C} is a hyperplane in F({\cal C}). We define \Gamma to be the set of functions of the form \log a with a a positive element of {\rm ker} (YA_k). The desired stationary solution is obtained as a point of the intersection of U with \Gamma. To show that this intersection is non-empty it is proved that there are points of \Gamma on both sides of U. This is done by a careful examination of the cone of positive elements of {\rm ker} (YA_k).
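To make the objects in Steps 1 and 3 concrete, here is a sketch for a hypothetical example network 2A \rightleftharpoons A+B \rightleftharpoons 2B with mass-action kinetics and arbitrarily chosen rate constants. It has one linkage class, n=3 complexes, one terminal strong linkage class and a one-dimensional stoichiometric subspace S spanned by (-1,1), so \delta=3-1-1=1. The code assumes the convention that the entry of A_k in row i and column j is the rate constant of the reaction from complex j to complex i, with diagonal entries making the column sums vanish. It checks the dimensions of {\rm ker} A_k, {\rm ker} (YA_k) and U, and that the points c with \log c-\log c^* orthogonal to S are stationary, as Step 1 asserts.

```python
import numpy as np

# Hypothetical example: 2A <-> A+B <-> 2B with mass-action kinetics.
# Complexes are ordered 2A, A+B, 2B and species A, B.
Y = np.array([[2.0, 1.0, 0.0],    # coefficient of A in each complex
              [0.0, 1.0, 2.0]])   # coefficient of B in each complex
# Arbitrary rate constants, keyed by (source complex, target complex).
rates = {(0, 1): 1.0, (1, 0): 2.0, (1, 2): 3.0, (2, 1): 0.5}

# Kinetic matrix on F(C): entry (i, j) is the rate constant of j -> i, columns sum to zero.
Ak = np.zeros((3, 3))
for (src, tgt), k in rates.items():
    Ak[tgt, src] += k
    Ak[src, src] -= k

def kernel_dim(M):
    return M.shape[1] - np.linalg.matrix_rank(M)

# U = im Y^T + span(omega_C), where omega_C is the function equal to 1 on every complex.
U_dim = np.linalg.matrix_rank(np.column_stack([Y.T, np.ones(3)]))
print(kernel_dim(Ak), kernel_dim(Y @ Ak), U_dim)   # expected: 1 2 2 (U is a hyperplane)

def psi(c):
    a, b = c
    return np.array([a * a, a * b, b * b])          # mass-action monomials for 2A, A+B, 2B

def f(c):
    return Y @ Ak @ psi(c)                           # right-hand side of the species equations

# One positive stationary solution c* with a = 1: da/dt = 0 is a quadratic in r = b/a.
coeffs = (Y @ Ak)[0][::-1]                            # coefficients of r^2, r, 1
r = max(z.real for z in np.roots(coeffs) if abs(z.imag) < 1e-12 and z.real > 0)
c_star = np.array([1.0, r])

# Step 1 predicts the stationary set {c : log c - log c* in S^perp}, with S = span((-1, 1)).
s_perp = np.array([1.0, 1.0])
for t in (-1.0, 0.5, 2.0):
    print(t, np.abs(f(c_star * np.exp(t * s_perp))).max())     # numerically zero
print(np.abs(f(c_star * np.exp(np.array([0.3, -0.2])))).max())   # clearly non-zero
```

In this example the stationary set is the ray b/a=1+\sqrt{3}, and each compatibility class a+b={\rm const} meets it exactly once, in line with Step 2.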

To my knowledge the number of uses of the Deficiency Zero Theorem and the Deficiency One Theorem in problems coming from applications is small. If anyone reading this has other information I would like to hear it. I will now list the relevant examples I know. The Deficiency Zero Theorem was applied by Sontag to prove asymptotic stability for McKeithan’s kinetic proofreading model of T cell activation. I applied it to the model of Salazar and Höfer for the dynamics of the transcription factor NFAT. Another potential application would be the multiple futile cycle. This network is not weakly reversible. The simple futile cycle has deficiency one and the dual futile cycle deficiency two. The linkage classes have deficiency zero in both cases. Thus while the first and third conditions of the Deficiency One Theorem are satisfied the second is not. Replacing the distributive phosphorylation in the multiple futile cycle by processive phosphorylation may simplify things. This has been discussed by Conradi et al., IEE Proc. Systems Biol. 152, 243. In the case of two phosphorylation steps the system obtained is of deficiency one but the linkage classes have deficiency zero, so that the second condition is still not satisfied. It seems that the Deficiency One Algorithm may be more helpful and that will be the next subject I cover in my course.
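The deficiency counts for the simple futile cycle can be checked mechanically. The sketch assumes the usual mass-action network S_0+E \rightleftharpoons S_0E \rightarrow S_1+E, S_1+F \rightleftharpoons S_1F \rightarrow S_0+F; the helper deficiency_data is hypothetical, written only for this illustration. It computes \delta=n-l-s for the whole network and for each linkage class; it does not check the third condition t=l.

```python
import numpy as np

def deficiency_data(species, reactions):
    """Deficiency of a network and of each of its linkage classes (hypothetical helper).

    Each reaction is a (source, target) pair of complexes; a complex is a dict
    mapping species names to stoichiometric coefficients.
    """
    pos = {s: i for i, s in enumerate(species)}

    def vec(cplx):
        v = np.zeros(len(species))
        for s, m in cplx.items():
            v[pos[s]] = m
        return v

    def key(cplx):
        return tuple(sorted(cplx.items()))

    # Union-find on complexes: the resulting classes are the linkage classes.
    parent = {}
    for src, tgt in reactions:
        parent.setdefault(key(src), key(src))
        parent.setdefault(key(tgt), key(tgt))

    def find(c):
        while parent[c] != c:
            c = parent[c]
        return c

    for src, tgt in reactions:
        parent[find(key(src))] = find(key(tgt))

    def rank(vectors):
        return int(np.linalg.matrix_rank(np.array(vectors))) if vectors else 0

    roots = {find(c) for c in parent}
    n, l = len(parent), len(roots)
    delta = n - l - rank([vec(t) - vec(s) for s, t in reactions])
    class_deltas = []
    for root in roots:
        n_i = sum(1 for c in parent if find(c) == root)
        s_i = rank([vec(t) - vec(s) for s, t in reactions if find(key(s)) == root])
        class_deltas.append(n_i - 1 - s_i)
    return delta, sorted(class_deltas)

species = ["S0", "S1", "E", "F", "S0E", "S1F"]
reactions = [
    ({"S0": 1, "E": 1}, {"S0E": 1}), ({"S0E": 1}, {"S0": 1, "E": 1}),   # S0 + E <-> S0E
    ({"S0E": 1}, {"S1": 1, "E": 1}),                                       # S0E -> S1 + E
    ({"S1": 1, "F": 1}, {"S1F": 1}), ({"S1F": 1}, {"S1": 1, "F": 1}),   # S1 + F <-> S1F
    ({"S1F": 1}, {"S0": 1, "F": 1}),                                       # S1F -> S0 + F
]
delta, class_deltas = deficiency_data(species, reactions)
print(delta, class_deltas, delta == sum(class_deltas))   # 1 [0, 0] False
```

The last value is the second condition of the Deficiency One Theorem, which fails here because the stoichiometric subspaces of the two linkage classes both contain S_1-S_0.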

Conference on biological oscillators at EMBL in Heidelberg

November 17, 2015

EMBL, the European Molecular Biology Laboratory, is an international institution consisting of laboratories at five sites, two in Germany, one in the UK, one in France and one in Italy. I recently attended a meeting on the theme ‘Biological Oscillators’ at the site in Heidelberg. The impressive building is in the form of a double helix. There are two spiral ramps over several storeys which are linked by bridges (‘hydrogen bonds’, in German Wasserstoffbrücken). This helix provides an original setting for the poster sessions. The building is reached by ascending steep hills in the area behind the castle. I took the comfortable option of using the bus provided by the institute. This meeting had about 130 participants but I think that the capacity is much greater.

One of the most interesting talks on the first day from my point of view was by Petra Schwille from the Max Planck Institute for Biochemistry. She talked about the Min system which is used by bacteria to determine their plane of division. The idea is that certain proteins (whose identity is explicitly known) oscillate between the ends of the cell and that the plane of division is the nodal surface of the concentration of one of these. The speaker and her collaborators have been able to reconstitute this system in a cell-free context. A key role is played by the binding of the proteins to the cell membrane. Diffusion of bound proteins is much slower than that of proteins in solution and this situation of having two different diffusion constants in a coupled system is similar to the classical scenario known from the Turing instability. It sounds like modelling this system mathematically can be a lot of fun and that there is no lack of people interested in doing so.

There was also a ‘Keynote Lecture’ by Jordi Garcia-Ojalvo which lived up to the promise of its special title. The topic was the growth of a colony of Bacillus subtilis. (The published reference is Nature 523, 550.) In fact, to allow better control, the colony is constrained to be very thin and is contained in a microfluidic system which allows its environment to be manipulated precisely. A key observation is that the colony does not grow at a constant rate. Instead its growth rate is oscillatory. The speaker explained that this can be understood in terms of the competition between the cells near the edge of the colony and those in the centre. The colony is only provided with limited resources (glycerol, glutamate and salts). It may be asked which resource limits the growth rate. It is not the glycerol, which is the primary carbon source. Instead it is the glutamate, which is the primary source of nitrogen. An important intermediate compound in the use of glutamate is ammonium. If cells near the boundary of the colony produced ammonium it would be lost to the surroundings. Instead they use ammonium produced by the interior cells. It is the exterior cells which grow and they can deprive the inner cells of glutamate. This prevents the inner cells producing ammonium which is then lacking for the growth of the outer cells. This establishes a negative feedback loop which can be seen as the source of the oscillations in growth rate. The feasibility of this mechanism was checked using a mathematical model. The advantage of the set-up for the bacteria is that if the colony is exposed to damage from outside it can happen that only the exterior cells die and the interior cells generate a new colony. The talk also included a report on further work (Nature 527, 59) concerning the role of ion channels in biofilms. There are close analogies to the propagation of nerve signals and the processes taking place can be modelled by equations closely related to the Hodgkin-Huxley system.

I will now mention a collection of other topics at the conference which I found particularly interesting. One recurring theme was NF-κB. This transcription factor is known to exhibit oscillations. A key question is what their function is, if any. One of the pioneers in this area, Mike White, gave a talk at the conference. There were also a number of other people attending who work on related topics. I do not want to go any deeper here since I think that this is a theme to which I might later devote a post of its own, if not more than one. I just note two points from White’s talk. One is that this transcription factor is a kind of hub or bow-tie with a huge number of inputs and outputs. Another is that the textbook picture of the basic interactions of NF-κB is a serious oversimplification. Another transcription factor which came up to a comparable extent during the conference is Hes1, which I had never previously heard of. Jim Ferrell gave a talk about the coordination of mitosis in Xenopus eggs. These are huge cells where communication by means of diffusion would simply not be fast enough. The alternative proposed by Ferrell is trigger waves, which can travel much faster. Carl Johnson talked about mechanisms ensuring the stability of the KaiABC oscillator. He presented videos showing the binding of individual KaiA molecules to KaiC. I was amazed that these things can be seen directly and are not limited to the cartoons to be found in biology textbooks. Other videos I found similarly impressive were those of Alexander Aulehla showing the development of early mouse embryos (segmentation clock), where it could be seen how waves of known chemical events propagating through the tissues orchestrate the production of structures in the embryo. These pictures brought the usual type of explanations used in molecular biology to a new level of concreteness in my perception.

Computer-assisted proofs

June 26, 2015

My spontaneous reaction to a computer-assisted proof is to regard it as having a lesser status than one done by hand. Here I want to consider why I feel this way and if and under what circumstances this point of view is justified. I start by considering the situation of a traditional mathematical proof, done by hand and documented in journal articles or books. In this context it is impossible to write out all details of the proof. Somehow the aim is to bridge the gap between the new result and what experts in the area are already convinced of. This general difficulty becomes more acute when the proof is very long and parts of it are quite repetitive. There is the tendency to say that the next step is strictly analogous to the previous one and if the next step is written out there is the tendency for the reader to think that it is strictly analogous and to gloss over it. Human beings (a class which includes mathematicians) make mistakes and have a limited capacity to concentrate. To sum up, a traditional proof is never perfect and very long and repetitive proofs are likely to be less so than others. So what is it that often makes a traditional proof so convincing? I think that in the end it is its embedding in a certain context. An experienced mathematician has met with countless examples of proofs, his own and those of others, errors large and small in those proofs and how they can often be repaired. This is complemented by experience of the interactions between different mathematicians and their texts. These things give a basis for judging the validity of a proof which is by no means exclusively on the level of explicit logical argumentation.

How is it, by comparison, with computer-assisted proofs? The first point to be raised is what is meant by that phrase. Let me start with a rather trivial example. Suppose I use a computer to calculate the kth digit of n factorial where k and n are quite large. If for given choices of the numbers a well-tested computer programme can give me the answer in one minute then I will not doubt the answer. Why is this? Because I believe that the answer comes from an algorithm which determines a unique answer. No approximations or floating point operations are involved. For me such a computation, like one using interval arithmetic, which I discussed in a previous post, has the same level of credibility as a computer-free proof. There could be an error in the hardware or the software or the programme but this is not essentially different from the uncertainties connected with a traditional proof. So what might be the problem in other cases? One problem is that of transparency. If a computer-assisted proof is to be convincing for me then I must either understand what algorithm the computer is supposed to be implementing or at least have the impression that I could do so if I invested some time and effort. Thus the question arises to what extent this aspect is documented in a given case. There is also the issue of the loss of the context which I mentioned previously. Suppose I believe that showing that the answer to a certain question is ‘yes’ in 1000 cases constitutes a proof of a certain theorem but that checking these cases is so arduous that a human being can hardly do so. Suppose further that I understand an algorithm which, if implemented, can carry out this task on a computer. Will I then be convinced? I think the answer is that I will but I am still likely to be left with an uncomfortable feeling if I do not have the opportunity to see the details in a given case when I want to.
In addition to the question of whether the nature of the application is documented there is the question of whether this has been done in a way that is sufficiently palatable that mathematicians will actually carefully study the documentation. Rather than remain on the level of generalities I prefer to now go over to an example.
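The trivial example can be made concrete with a few lines of Python. The function name is hypothetical and the digit is assumed to be counted from the right (k=1 is the units digit); the point is only that the computation uses exact integer arithmetic, with no rounding anywhere.

```python
import math

def kth_digit_of_factorial(n, k):
    """kth decimal digit of n!, counted from the right (k = 1 is the units digit).

    Everything is exact integer arithmetic: no approximation or floating point is involved.
    """
    return (math.factorial(n) // 10 ** (k - 1)) % 10

print(kth_digit_of_factorial(1000, 250))
```

A cross-check that needs no trust in the programme: 1000! ends in \lfloor 1000/5\rfloor+\lfloor 1000/25\rfloor+\lfloor 1000/125\rfloor+\lfloor 1000/625\rfloor=249 zeros, so the digit with k=249 must be 0 and the one with k=250 must not be.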

Perhaps the most famous computer-assisted proof is that of the four colour theorem by Appel and Haken. To fill myself in on the background on this subject I read the book ‘Four colors suffice’ by Robin Wilson. The original problem is to colour a map in such a way that no two countries with a common border have the same colour. The statement of the theorem is that it is always possible with four colours. This statement can be reformulated as a question in graph theory. Here I am not interested in the details of how this reformulation is carried out. The intuitive idea is to associate a vertex to each country and an edge to each common border. Then the problem becomes that of colouring the vertices of a planar graph in such a way that no two adjacent vertices have the same colour. From now on I take this graph-theoretic statement as basic. (Unfortunately it is in fact not just a graph-theoretic, in particular combinatorial, statement since we are talking about planar graphs.) What I am interested in is not so much the problem itself as what it can teach us about computer-assisted proofs in general. I found the book of Wilson very entertaining but I was disappointed by the fact that he consistently avoids going over to the graph-theoretic formulation which I find more transparent (that word again). In an article by Robin Thomas (Notices of the AMS, 44, 848) he uses the graph-theoretic formulation more but I still cannot say I understood the structure of the proof on the coarsest scale. Thomas does write that in his own simplified version of the original proof the contribution of computers only involves integer arithmetic. Thus this proof does seem to belong to the category of things I said above I would tend to accept as a mathematical proof, modulo the fact that I would have to invest the time and effort to understand the algorithm. There is also a ‘computer-checked proof’ of the four colour theorem by Georges Gonthier. 
I found this text interesting to look at but felt as if I was quickly getting into logical deep waters. I do not really understand what is going on there.

To sum up this discussion, I am afraid that in the end the four colour problem was not the right example for me to start with and that I need to take some other example which is closer to mathematical topics which I know better and perhaps also further from having been formalized and documented.

Organizing posts by categories

August 25, 2012

I have a tendency to use the minimal amount of technology I have to in order to achieve a particular goal. So for instance, having been posting things on this blog for several years, I have made use of hardly any of the technical possibilities available. Among other things I did not assign my posts to categories, just putting them in one long list. I can well understand that not everyone who wants to read about immunology wants to read about general relativity and vice versa. Hence it is useful to have a sorting mechanism which can help to direct people to what they are interested in. Now I have invested the effort to add information on categories to most of the posts. It was easy (though time-consuming) to do and I find that the results are useful. It is helpful for me myself to navigate through the material and it is interesting for me to see at a glance how many posts on which subjects there are. From now on I will systematically assign (most) new posts to a category and the effort to do so should be negligible. This post is an exception since it does not really fit into any category I have.

Do you know these matrices?

March 9, 2012

I have come across a class of matrices with some interesting properties. I feel that they must be known but I have not been able to find anything written about them. This is probably just because I do not know the right place to look. I will describe these matrices here and I hope that somebody will be able to point out a source where I can find more information about them. Consider an n\times n matrix A with elements a_{ij} having the following properties. The elements with i=j (call them b_i) are negative. The elements with j=i+1\ {\rm mod}\ n (call them c_i) are positive. All other elements are zero. The determinant of a matrix of this type is \prod_i b_i+(-1)^{n+1}\prod_i c_i. Notice that the two terms in this sum always have opposite signs. A property of these matrices which I found surprising is that B=(-1)^{n+1}(\det A)A^{-1} is a positive matrix, i.e. all its entries b_{ij} are positive. In proving this it is useful to note that the definition of the class is invariant under cyclic permutation of the indices. Therefore it is enough to show that the entries in the first column of B are positive. Since (\det A)A^{-1} is the adjugate of A, the entry b_{j1} is (-1)^{n+1} times the cofactor of a_{1j}, that is (-1)^{n+j} times the determinant of the matrix obtained by removing the first row and the jth column from A. Removing the first row and the first column from A leaves an upper triangular matrix with diagonal entries b_2,\ldots,b_n. Removing the first row and a column other than the first from A leaves a matrix where a_{n1} is alone in its column. Thus the determinant can be expanded about that element. The result is that we are left to compute the determinant of an (n-2)\times (n-2) matrix which is block diagonal, the first diagonal block being upper triangular with diagonal entries b_2,\ldots,b_{j-1} and the second diagonal block being lower triangular with diagonal entries c_j,\ldots,c_{n-1}. With these remarks it is then easy to compute the determinant of the (n-1)\times (n-1) matrix resulting in each of these cases. In more detail b_{11}=(-1)^{n+1}b_2b_3\ldots b_n and b_{j1}=(-1)^{j}b_2b_3\ldots b_{j-1}c_j\ldots c_n for j>1. Each of these is positive, since the number of negative factors b_i is n-1 in the first case and j-2 in the second.
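The determinant formula, the positivity of B and the formulas for the entries of its first column can be checked numerically for a particular member of the class. This is a sanity check rather than a proof; the helper cyclic_matrix and the numerical values of the b_i and c_i (here n=5) are chosen only for illustration.

```python
import numpy as np

def cyclic_matrix(b, c):
    """Matrix with diagonal b, entry c[i] at (i, i+1 mod n) and zeros elsewhere (n >= 2)."""
    n = len(b)
    A = np.diag(np.asarray(b, dtype=float))
    for i in range(n):
        A[i, (i + 1) % n] = c[i]
    return A

b = np.array([-1.0, -2.0, -0.5, -1.5, -1.0])   # negative diagonal entries b_1, ..., b_n
c = np.array([1.0, 0.5, 2.0, 1.0, 3.0])        # positive entries c_1, ..., c_n
n = len(b)
A = cyclic_matrix(b, c)

det_formula = np.prod(b) + (-1) ** (n + 1) * np.prod(c)
B = (-1) ** (n + 1) * np.linalg.det(A) * np.linalg.inv(A)

# First column of B from the cofactor expansion (j is the 1-based row index).
first_col = np.array([(-1) ** (n + 1) * np.prod(b[1:])] +
                     [(-1) ** j * np.prod(b[1:j - 1]) * np.prod(c[j - 1:]) for j in range(2, n + 1)])

print(np.isclose(np.linalg.det(A), det_formula), bool(np.all(B > 0)), np.allclose(B[:, 0], first_col))
```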

Knowing the positivity of (-1)^{n+1}(\det A)A^{-1} means that it is possible to apply the Perron-Frobenius theorem to this matrix. In the case that \det A has the same sign as (-1)^{n+1} it follows that A^{-1} has an eigenvector all of whose entries are positive. The corresponding eigenvalue is positive and larger in magnitude than any other eigenvalue of A^{-1}. This vector is also an eigenvector of A with a positive eigenvalue. Looking at the characteristic polynomial it is easy to see that if (-1)^n(b_1b_2\ldots b_n+(-1)^{n+1}c_1c_2\ldots c_n)<0 the matrix A has exactly one positive eigenvalue and that none of its eigenvalues is zero.
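A numerical illustration of these statements, for one member of the class with n=4 and values of the b_i and c_i chosen only for illustration so that the sign condition holds: it counts the positive real eigenvalues of A and checks that the corresponding eigenvector can be scaled to have positive entries. The helper cyclic_matrix is an assumption of this sketch.

```python
import numpy as np

def cyclic_matrix(b, c):
    """Matrix with diagonal b, entry c[i] at (i, i+1 mod n) and zeros elsewhere (n >= 2)."""
    n = len(b)
    A = np.diag(np.asarray(b, dtype=float))
    for i in range(n):
        A[i, (i + 1) % n] = c[i]
    return A

b = np.array([-1.0, -0.5, -2.0, -1.5])
c = np.array([1.0, 2.0, 1.5, 1.0])
n = len(b)
A = cyclic_matrix(b, c)

# Sign condition from the text: (-1)^n (b_1...b_n + (-1)^(n+1) c_1...c_n) < 0.
condition = (-1) ** n * (np.prod(b) + (-1) ** (n + 1) * np.prod(c))

eigvals, eigvecs = np.linalg.eig(A)
positive = [i for i, z in enumerate(eigvals) if abs(z.imag) < 1e-9 and z.real > 0]
v = eigvecs[:, positive[0]].real
v = v / v[np.argmax(np.abs(v))]   # scale so that the largest component is +1

print(condition, len(positive), v)
```

The value of the condition is the characteristic polynomial \prod_i(\lambda-b_i)-\prod_i c_i evaluated at \lambda=0; since the product is increasing for \lambda\ge 0, a negative value there gives exactly one positive root.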

The Perron-Frobenius theorem

October 20, 2011

The Perron-Frobenius theorem is a result in linear algebra which I have known about for a long time. On the other hand I never took the time to study a proof carefully and think about why the result holds. I was now motivated to change this by my interest in chemical reaction network theory and the realization that the Perron-Frobenius theorem plays a central role in CRNT. In particular, it lies at the heart of the original proof of the existence part of the deficiency zero theorem. Here I will review some facts related to the Perron-Frobenius theorem and its proof.

Let A be a square matrix all of whose entries are positive. Note how this condition makes no sense for an endomorphism of a vector space in the absence of a preferred basis. Then A has a positive eigenvalue \lambda_+ and it is bigger than the magnitude of any other eigenvalue. The dimension of the generalized eigenspace corresponding to this eigenvalue is one. There is a vector in the eigenspace all of whose components are positive. Let C_i be the sum of the entries in the ith column of A. Then \lambda_+ lies between the minimum and the maximum of the C_i.
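These properties are easy to observe numerically. The sketch below takes an arbitrary 3\times 3 matrix with positive entries (the numbers have no special meaning) and checks that the eigenvalue of largest magnitude is real and positive, strictly dominates the others, has a positive eigenvector and lies between the smallest and largest column sums.

```python
import numpy as np

A = np.array([[2.0, 1.0, 0.5],
              [0.3, 1.0, 2.0],
              [1.0, 0.2, 1.0]])        # an arbitrary matrix with positive entries

eigvals, eigvecs = np.linalg.eig(A)
i = int(np.argmax(np.abs(eigvals)))
lam = eigvals[i]                       # the eigenvalue of largest magnitude
v = eigvecs[:, i].real
v = v / v.sum()                        # normalize so that the components sum to 1
col_sums = A.sum(axis=0)

print(lam, v, col_sums.min(), col_sums.max())
```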

If the assumption on A is weakened to its having non-negative entries then most of the properties listed above are lost. However analogues can be obtained if the matrix is irreducible. This means by definition that the matrix has no invariant coordinate subspace other than the zero subspace and the whole space. In that case A has a positive eigenvalue which is at least as big as the magnitude of any other eigenvalue. As in the positive case it has multiplicity one. There is a vector in the eigenspace all of whose elements are positive. In general there are other eigenvalues of the same magnitude as the maximal positive eigenvalue and they are related to it by multiplication with powers of a root of unity. The estimate for the maximal real eigenvalue in terms of column sums remains true. The last statement follows from the continuous dependence of the eigenvalues on the matrix.
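Two small examples show the difference. The cyclic permutation matrix P is non-negative and irreducible; its eigenvalues are the three cube roots of unity, all of modulus one, and the Perron eigenvalue 1 has the positive eigenvector (1,1,1). The matrix R is reducible (the span of the first basis vector is invariant) and its eigenvalue 1 has multiplicity two with no positive eigenvector. The irreducibility test used here, that a non-negative n\times n matrix A is irreducible if and only if (I+A)^{n-1} has only positive entries, is a standard criterion.

```python
import numpy as np

def is_irreducible(A):
    """A non-negative n x n matrix is irreducible iff (I + A)^(n-1) has only positive entries."""
    n = A.shape[0]
    M = np.linalg.matrix_power(np.eye(n) + A, n - 1)
    return bool(np.all(M > 0))

P = np.array([[0.0, 0.0, 1.0],
              [1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])   # cyclic permutation: irreducible, eigenvalues are cube roots of unity
R = np.array([[1.0, 1.0],
              [0.0, 1.0]])        # reducible: the span of the first basis vector is invariant

print(is_irreducible(P), np.round(np.linalg.eigvals(P), 6))
print(is_irreducible(R), np.linalg.eigvals(R))
```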

Suppose now that a matrix B has the properties that its off-diagonal elements are non-negative and that the sum of the elements in each of its columns is zero. Then the sum of the elements in each column of a matrix of the form B+\lambda I is \lambda. On the other hand for \lambda sufficiently large the entries of the matrix B+\lambda I are non-negative. If B is irreducible then it can be concluded that the Perron eigenvalue of B+\lambda I is \lambda, that the kernel of B is one-dimensional and that it is spanned by a vector all of whose components are positive. In the proof of the deficiency zero theorem this is applied to certain restrictions of the kinetic matrix. The irreducibility property of B follows from the fact that the network is weakly reversible.
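The argument can be followed numerically. The sketch assumes the same convention as for a kinetic matrix (the entry in row i and column j is the rate constant of the reaction from j to i, and the diagonal entries make every column sum zero) and uses hypothetical rate constants on the strongly connected graph 0\to 1\to 2\to 0, 1\to 0, so that B is irreducible. It checks that the Perron eigenvalue of B+\lambda I is \lambda and that the kernel of B is spanned by a positive vector.

```python
import numpy as np

# Hypothetical rate constants, keyed by (source, target); the graph is strongly connected.
rates = {(0, 1): 1.0, (1, 2): 2.0, (2, 0): 0.5, (1, 0): 3.0}
B = np.zeros((3, 3))
for (src, tgt), k in rates.items():
    B[tgt, src] += k     # non-negative off-diagonal entries
    B[src, src] -= k     # makes every column sum zero

lam = 1.0 - B.diagonal().min()    # large enough that B + lam * I has no negative entries
perron = np.linalg.eigvals(B + lam * np.eye(3)).real.max()

_, sing, vt = np.linalg.svd(B)
kernel_dim = int(np.sum(sing < 1e-10))
v = vt[-1] / vt[-1].sum()          # kernel vector, scaled to have positive sum

print(B.sum(axis=0), perron, kernel_dim, v)
```

For these rates the kernel is spanned by (5,1,4), which after normalization is (0.5,0.1,0.4).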

The Perron-Frobenius theorem is proved in Gantmacher’s book on matrices. He proves the non-negative case first and uses that as a basis for the positive case. I would have preferred to see a proof for the positive case in isolation. I was not able to extract a simple conceptual picture which I found useful. I have seen some mention of the possibility of applying the Brouwer fixed point theorem but I did not find a complete treatment of this kind of approach written anywhere. There is an infinite-dimensional version of the theorem (the Krein-Rutman theorem). It applies to compact operators on a Banach space which satisfy a suitable positivity condition. In fact this throws some light on the point raised above concerning a preferred basis. Some extra structure is necessary but it does not need to be as much as a basis. What is needed is a positive cone. Let K be the set of vectors in n-dimensional Euclidean space, all of whose components are non-negative. A matrix is non-negative if and only if it leaves K invariant and this is something which can reasonably be generalized to infinite dimensions. Thus the set K is the only extra structure which is required.
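One way a fixed point argument could enter (a standard idea, not something taken from Gantmacher's book): for a matrix A with positive entries the map x\mapsto Ax/(\sum_i (Ax)_i) sends the simplex \{x\in K:\sum_i x_i=1\} continuously into itself, so the Brouwer fixed point theorem gives a fixed point, and a fixed point is exactly a non-negative eigenvector with a positive eigenvalue. The sketch below proves nothing; it simply iterates this map (which is the power method) for an arbitrary positive matrix and checks that the limit is a positive eigenvector belonging to the eigenvalue of largest magnitude. Convergence of the iteration is a separate fact, not a consequence of Brouwer's theorem.

```python
import numpy as np

def simplex_map(A, x):
    """Normalized action of A: maps the simplex {x >= 0, sum(x) = 1} to itself when A > 0."""
    y = A @ x
    return y / y.sum()

A = np.array([[2.0, 1.0, 0.5],
              [0.3, 1.0, 2.0],
              [1.0, 0.2, 1.0]])   # an arbitrary matrix with positive entries
x = np.full(3, 1.0 / 3.0)          # start at the barycentre of the simplex
for _ in range(200):
    x = simplex_map(A, x)

lam = (A @ x).sum()   # at a fixed point A x = lam x with sum(x) = 1
print(x, lam)
```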

Me on TV

November 26, 2010

Recently I was interviewed by TV journalists for a documentary of the channel 3Sat called “Rätsel Dunkle Materie” [The riddle of dark matter]. It was broadcast yesterday. Before I say more about my experience with this let me do a flashback to the only other time in my life I appeared on TV. On that occasion the BBC visited our school. I guess I was perhaps twelve at the time although I do not know for sure. I was filmed reading a poem which I had written myself. I was seen sitting in a window of the Bishops’ Palace in Kirkwall, looking out. I suppose only my silhouette was visible. I no longer have the text of the poem. All I know is that the first line was ‘Björn, adventuring at last’ and that later on there was some stuff about ravens. At that time I was keen on Vikings. The poem was no doubt very heroic, so that the pose looking out the window was appropriate.

Coming back to yesterday, the documentary consisted of three main elements. There was a studio discussion with three guests – the only one I know personally is Simon White. There were some clips illustrating certain ideas. Thirdly, there were short sequences from interviews with some other people. I was one of these people. They showed a few short extracts of the interview with me and I was quite happy with the selection they made. Conversely, this means that they nicely cut out things which I might not have liked so much. I was answering questions which were posed by one of the journalists and which were not heard on TV. They told me in advance that this would be the case and that for this reason I should not refer to the questions in my answers. I found this difficult and I think I would need some practice to do it effectively. Fortunately it seems that they efficiently cut out these imperfections. I did not know the questions in advance of the filming and this led to some hesitant starts to my answers. This also did not come through much in what was shown. Summing up, it was an interesting experience and I would do it again if I had the chance. Of course being a studio guest would be even more interesting …

I found the documentary itself not so bad. I could have done without the part about religion at the end. Perhaps the inclusion of this is connected with the fact that the presenter of the series, Gert Scobel, studied theology and also has a doctorate in hermeneutics. (I had to look up that word to find out what it meant.) One aspect of the presentation which was a bit off track was that it gave the impression that the idea of a theory unifying general relativity and quantum theory was solely due to Stephen Hawking. Before ending this post I should perhaps say something about my own point of view on dark matter and dark energy. Of course they are symptoms of serious blemishes in our understanding of reality. I believe that dark matter and dark energy are better approaches to explaining the existing observational anomalies than any other alternative which is presently available. In the past I have done some work related to dark energy myself. The one thing I do not like about a lot of the research in this area is that while people are very keen on proposing new ‘theories’ (which are often just more or less vague ideas for models), there is much less enthusiasm for working out these ideas to obtain a logically sound proposal. Of course that would be more difficult. A case study in this direction was carried out in the diploma thesis of Nikolaus Berndt, which was done under my supervision. The theme was to what extent the so-called Cardassian models (do not) deserve to be called a theory. We later produced a joint publication on this. It has not received much attention in the research community and as far as I know has only been cited once.

The principle of symmetric criticality

May 12, 2010

There are many interesting partial differential equations which can be expressed as the Euler-Lagrange equations corresponding to some Lagrangian. Thus they are equivalent to the condition that the action defined by the Lagrangian is stationary under all variations. Sometimes we want to study solutions of the equations which are invariant under some symmetry group. Starting from the original equations, it is possible to calculate the symmetry-reduced equations. This is what I and many others usually do, without worrying about a Lagrangian formulation. Suppose that in some particular case the task of doing a symmetry reduction of the Lagrangian is significantly easier than the corresponding task for the differential equations. Then it is tempting to take the Euler-Lagrange equations corresponding to the symmetry-reduced action and hope that, for symmetric solutions, they are equivalent to the Euler-Lagrange equations without symmetry. This procedure is often implicit in physics papers, where the variational formulation is more at the centre of interest than the equations of motion. But is it always justified? The Euler-Lagrange equations without symmetry are equivalent to stationarity under all variations, while the Euler-Lagrange equations for the symmetry-reduced action are equivalent to stationarity under symmetric variations only. The second property is a priori weaker than the first.
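To make the procedure concrete, here is a worked example of my own choosing. The Lagrangian and the symmetry group are assumptions for illustration, not taken from any particular paper. Consider a scalar field u on ℝ^n with Lagrangian ½|∇u|² + V(u), reduced using rotational symmetry. Because the rotation group is compact, the result of Palais discussed below guarantees that the two routes agree, and in this case the agreement can also be checked by hand.

```latex
% Full action on R^n and its Euler-Lagrange equation
S[u] = \int_{\mathbb{R}^n} \Big( \tfrac12 |\nabla u|^2 + V(u) \Big)\, dx ,
\qquad -\Delta u + V'(u) = 0 .

% Symmetry-reduced action for radial u = u(r); \omega_{n-1} is the area of S^{n-1}
S_{\mathrm{red}}[u] = \omega_{n-1} \int_0^\infty \Big( \tfrac12 u'(r)^2 + V(u(r)) \Big)\, r^{n-1}\, dr .

% Euler-Lagrange equation of the reduced action
-\big( r^{n-1} u' \big)' + r^{n-1} V'(u) = 0
\quad\Longleftrightarrow\quad
-u'' - \frac{n-1}{r}\, u' + V'(u) = 0 .

% This is the full equation restricted to radial functions, because for u = u(r)
\Delta u = u'' + \frac{n-1}{r}\, u' .
```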

The potential problem just discussed is rarely if ever mentioned in the physics literature. Fortunately this question was examined a long time ago by Richard Palais in a paper entitled ‘The principle of symmetric criticality’ (Commun. Math. Phys. 69, 19). I have known of the existence of this paper for many years but I never took the trouble to look at it seriously. Now I have finally done so. Palais shows that the principle is true if the group is compact or if the action is by isometries on a Riemannian manifold. Here the manifold is allowed to be an infinite-dimensional Hilbert manifold, so that examples of relevance to field theories in physics are included. The proof in the Riemannian case is conceptually simple and so I will give it here. Suppose that (M,g) is a Riemannian manifold and f a function on M. Let a group G act smoothly on M by isometries of g, leaving f invariant. Let p be a critical point of the restriction of f to the set F of fixed points of the group action. It can be shown that F is a smooth totally geodesic submanifold. (In more general situations a key question is whether the fixed point set is a submanifold at all. If it is not, even the formulation of the principle may be problematic.) Since p is a critical point of the restriction of f to F, the derivative of f at p vanishes on all vectors tangent to F, so the gradient of f at p is orthogonal to F. On the other hand, each element of G fixes p and preserves both g and f, so its derivative at p maps the gradient of f at p to itself. Now consider the geodesic starting at p with initial tangent vector equal to this gradient. Each element of G maps it to the geodesic with the same initial point and the same initial tangent vector, so by uniqueness of geodesics it fixes every point of this geodesic. It follows that the geodesic consists of fixed points of the action of G, so it lies in F and its initial tangent vector, the gradient of f at p, is tangent to F. Being both tangent and orthogonal to F, the gradient of f at p vanishes.
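The two halves of this argument can be seen numerically in a toy case. This is a minimal sketch with a hypothetical function chosen only for illustration, not an example from Palais's paper. The group is {identity, R} with R(x, y) = (x, −y), a compact group acting on the Euclidean plane by isometries, and the fixed point set F is the x-axis. Gradients are approximated by central differences, so the printed values are approximations.

```python
# Hypothetical example: the two-element group {identity, R}, R(x, y) = (x, -y),
# acts on the Euclidean plane by isometries. Its fixed point set F is {y = 0}.

def f(x, y):
    # Invariant under R because y enters only through y**2.
    return (x - 1) ** 2 + y ** 2 + x * y ** 2


def grad(func, x, y, h=1e-6):
    """Central-difference approximation to the Euclidean gradient of func."""
    gx = (func(x + h, y) - func(x - h, y)) / (2 * h)
    gy = (func(x, y + h) - func(x, y - h)) / (2 * h)
    return gx, gy


# Invariance step: at a point of F the gradient is fixed by dR = diag(1, -1),
# so its component normal to F (the y-component) vanishes at every point of F.
for x0 in (-2.0, 0.0, 0.5, 3.0):
    print(x0, grad(f, x0, 0.0))

# Criticality step: f restricted to F is (x - 1)**2, which is critical at x = 1.
# There the component tangent to F vanishes too, so grad f(1, 0) = (0, 0).
print(grad(f, 1.0, 0.0))
```

In this example the invariance step alone already forces the gradient to be tangent to F at every point of F; criticality of the restriction then removes the remaining tangential component.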

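When does the principle fail? Perhaps the simplest example is given by the action of the real numbers on the plane generated by the vector field $x\frac{\partial}{\partial y}$, together with the function $f(x,y)=x$. The flow of this vector field is the family of shears $(x,y)\mapsto(x,y+tx)$, which leaves $f$ invariant. The function $f$ has no critical points, but its restriction to the fixed point set, which is the $y$-axis, is constant and so has critical points everywhere. Here the group is not compact and the shears are not isometries of the Euclidean metric.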
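The counterexample can be checked directly with a short sketch in plain Python. The helper names are hypothetical and the sampled parameter values are arbitrary. The function dphi shows exactly where the proof given above breaks down. Because the shear is not an isometry, its derivative does not preserve the Euclidean gradient of f at a fixed point. As a result, the geodesic argument cannot force the gradient to be tangent to F.

```python
import math

# Flow of the vector field x d/dy: the one-parameter group of shears
# phi_t(x, y) = (x, y + t * x), with t ranging over the (non-compact) real line.

def phi(t, x, y):
    return x, y + t * x


def dphi(t, v):
    """Derivative of phi_t, the constant matrix [[1, 0], [t, 1]], applied to v."""
    return v[0], t * v[0] + v[1]


def f(x, y):
    return x  # invariant, since phi_t never changes the x-coordinate


def grad_f(x, y):
    return (1.0, 0.0)  # Euclidean gradient of f: never zero


def is_fixed(x, y, ts=(-2.0, -0.5, 0.5, 3.0)):
    """Check whether (x, y) is fixed by phi_t for a few sample values of t.

    phi_t moves (x, y) for every t != 0 exactly when x != 0, so sampling
    a few nonzero t is enough to detect the fixed point set {x = 0}.
    """
    return all(phi(t, x, y) == (x, y) for t in ts)


def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])
```

For instance, phi with t = 1 maps the points (0, 0) and (1, 0), at distance 1, to (0, 0) and (1, 1), at distance √2. At the fixed point (0, 0), dphi with t = 2 maps the gradient (1, 0) to (1, 2).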

