Light and lighthouses

June 3, 2019

I recently had the idea that I should improve my university web pages. The most important thing was to give a new presentation of my research. At the same time I had the idea that the picture of me on the main page was not very appropriate for attracting people’s attention and I decided to replace it with a different one. Now I have a picture of me in front of the lighthouse ‘Les Éclaireurs’ in the Beagle Channel, taken by my wife. I always felt a special attachment to lighthouses. This was related to the fact that as a child I very much liked the adventure of visiting uninhabited or sparsely inhabited small islands and these islands usually had lighthouses on them. This was in particular true in the case of Auskerry, an island which I visited during several summers to ring birds, especially storm petrels. I wrote some more about this in my very first post on this blog. For me the lighthouse is a symbol of adventure and of things which are far away and not so easy to reach. In this sense it is an appropriate symbol for how I feel about research. There too the goals are far away and hard to reach. In this context I am reminded of a text of Marcel Proust which is quoted by Mikhail Gromov in the preface to his book ‘Metric structures for Riemannian and non-Riemannian spaces’:

‘Même ceux qui furent favorables à ma perception des vérités que je voulais ensuite graver dans le temple, me félicitèrent de les avoir découvertes au microscope, quand je m’étais au contraire servi d’un télescope pour apercevoir des choses, très petites en effet, mais parce qu’elles étaient situées à une grande distance, et qui étaient chacune un monde’

[Even those who were favourable to my perception of the truths which I wanted to engrave in the temple, congratulated me on having discovered them with a microscope, when on the contrary I used a telescope to perceive things, in fact very small, but because they were situated at a great distance, and each of which was a world in itself.]

I feel absolutely in harmony with that text. Returning to lighthouses, I think they are also embedded in my unconscious. Years ago, I was fascinated by lucid dreams. A lucid dream usually includes a key moment, where lucidity begins, i.e. where the dreamer becomes conscious of being in a dream. In one example I experienced this moment was brought about by the fact of simultaneously seeing three lighthouses, those of Copinsay, Auskerry and the Brough of Birsay. Since I knew that in reality it is impossible to see all three at the same time this made it clear to me that I must be dreaming.

The function of a lighthouse is to use light to convey information and to allow people (seafarers) to recognise things which are important for them. Thus a lighthouse is a natural symbol for such concepts as truth, reason, reliability, learning and science. These concepts are of course also associated with the idea of light itself, that which allows us to see things. These are the elements which characterize the phase of history called the Enlightenment. Sometimes I fear that we are now entering a phase which is just the opposite of that. Perhaps it could be called the age of obscurity. It is characterized by an increasing amount of lies, deceit, ignorance and superstition. Science continues its progress but sometimes it seems to me like a thin ray among gathering darkness. A future historian might describe the arc leading from the eighteenth to the twenty-first century. I recently watched a video of the Commencement speech of Angela Merkel at Harvard. In a way many of the things she said were commonplaces, nothing new, but listening to her speech and seeing the reactions of the audience it became clear to me that it is important these days to repeat these simple truths. Those of us who have not forgotten them should propagate them. And with some luck, the age of obscurity may yet be averted.


Book on cancer therapy using immune checkpoints, part 2

April 20, 2019

I have now finished reading the book of Graeber I wrote about in the last post. Here are some additional comments. Chapter 7 is about CAR T cells, a topic which I wrote about briefly here. I also mentioned in that post that there is a mathematical model related to this in the literature but I have not got around to studying it. Chapter 8 is a summary of the present state of cancer immunotherapy while the last chapter is mainly concerned with an individual case where PD-1 therapy showed a remarkable success but the patient, while against all odds still alive, is still not cancer-free. It should not be forgotten that the impressive success stories in this field are accompanied by numerous failures and the book also reports at length on what these failures can look like for individual patients.

For me the subject of this book is the most exciting topic in medicine I know at the moment. It is very dynamic with numerous clinical studies taking place. It is suggested in the book that there is a lot of redundancy in this and correspondingly a lot of waste, financial and human. My dream is that progress in this area could be helped by more theoretical input. What do I mean by progress? There are three directions which occur to me. (1) Improving the proportion of patients with a given type of cancer who respond by modifying a therapy or replacing it by a different one. (2) Identifying in advance which patients with a given type of cancer will respond to which therapy, so as to allow rational choices between therapies in individual cases. (3) Identifying new types of cancer which are promising targets for a given therapy. By theoretical input I mean getting a better mechanistic understanding of the ways in which given therapies work and using that to obtain a better understanding of the conditions needed for success. The dream goes further with the hope that this theoretical input could be improved by the formulation and analysis of mathematical models.

What indications are there that this dream can lead to something real? I have already mentioned one mathematical model related to CAR T-cells. I have mentioned a mechanistic model for PD-1 by Mellman and collaborators here. This has been made into a mathematical model in a 2018 article by Arulraj and Barik (PLoS ONE 13(10): e0206232). There is a mathematical model for CTLA-4 by Jansson et al. (J. Immunol. 175, 1575) and it has been extended to model the effects of related immunotherapy in a 2018 paper of Ganesan et al. (BMC Med. Inform. Decis. Mak. 18,37).

I conclude by discussing one topic which is not mentioned in the book. In Mainz (where I live) there is a company called BIONTECH with 850 employees whose business is cancer immunotherapy. The CEO of the company is Ugur Sahin, who is also a professor at the University of Mainz. I have heard a couple of talks by him, which were on a relatively general level. I did not really understand what his speciality is, only that it has something to do with mRNA. I have now tried to learn some more about this and I realised that there is a relation to a topic mentioned in the book, that of cold and hot tumours. The most favourable situation for immune checkpoint therapies is where a tumour does in principle generate a strong immune response and has adapted to switch that off. Then the therapy can switch it back on. This is the case of a hot tumour, which exhibits a lot of mutations and where enough of these mutations are visible to the immune system. By contrast for a cold tumour, with no obvious mutations, there is no basis for the therapy to work on. The idea of the type of therapy being developed by Sahin and collaborators is as follows (my preliminary understanding). First analyse DNA and RNA from the tumour of a patient to identify existing mutations. Then try to determine by bioinformatic methods which of these mutations could be presented effectively by the MHC molecules of the patient. This leads to candidate proteins which might stimulate the immune system to attack the tumour cells. Now synthesise mRNA coding for those proteins and use it as a vaccine. The results of the first trials of this technique are reported in a 2017 paper in Nature 547, 222. It has 295 citations in Web of Science which indicates that it has attracted some attention.

Book on cancer therapy using immune checkpoints

April 19, 2019

In a previous post I wrote about cancer immunotherapy and, in particular, about the relevance of immune checkpoints such as CTLA-4. For the scientific work leading to this therapy Jim Allison and Tasuku Honjo were awarded the Nobel Prize for Medicine in 2018. I am reading a book on this subject, ‘The Breakthrough. Immunotherapy and the Race to Cure Cancer’ by Charles Graeber. I did not feel in harmony with this book due to some notable features which made it far from me. One was the use of words and concepts which are typically American and whose meanings I as a European do not know. Of course I could go out and google them but I do not always feel like it. A similar problem arises from the fact that I belong to a different generation than the author. It is perhaps important to realise that the author is a journalist and not someone with a strong background in biology or medicine. One possible symptom of this is the occurrence of spelling mistakes or unconventional names (e.g. ‘raff’ instead of ‘raf’, ‘Mederex’ instead of ‘Medarex’ for the company which played an essential role in the development of antibodies for cancer immunotherapy, ‘dendrites’ instead of ‘dendritic cells’). As a consequence I think that if a biological statement made in the book looks particularly interesting it is worth trying to verify it independently. For example, the claim in one of the notes to Chapter 5 that penicillin is fatal to mice is false. This is not only of interest as a matter of scientific fact since it has also been used as an (unjustified) argument by protesters against medical experiments in animals. More details can be found here.

Despite this I find the book a very rewarding read due to the stories it tells. It was exciting to read the first chapter which describes the experiences of one of the first patients to experience what seemed like a miracle cure due to treatment with an antibody to PD-L1. I find it fascinating to get an impression of what a person in this type of situation actually lives through. On a personal note, I was happy to see that when the patient met the team of researchers who had developed the treatment one of the people present was Ira Mellman. As I mentioned in a previous post I have been present at a lecture of Mellman. The second chapter describes known cases where an infectious disease can lead to the elimination of a tumour. It describes how, more than a hundred years ago, William Coley tried to turn this observation into a therapy. His success in doing so was very limited and this was unavoidable. The ideas needed to understand what might be going on in such a situation simply did not exist at that time. Without understanding it was impossible to pursue a therapy in a controlled way. I knew something about this story before reading the book but it filled in a lot more background for me. The key figure in the third chapter is Steven Rosenberg. I had not heard his name before. He had an important position at the NIH and pursued research into cancer immunotherapy during a period where there were few returns. One substance which he tried to use therapeutically was IL-2. Here again I was pleased to come across the name of a person who I have heard give a talk, as mentioned in a previous post. This is Kendall Smith, the discoverer of IL-2.

Chapter four is concerned with Jim Allison, the discoverer of the first type of cancer immunotherapy using CTLA-4. I find it interesting that in his research Allison was not driven by the wish to find a cancer therapy. He wanted to understand T cells and their activation. While doing so he discovered CTLA-4, as an important ‘off switch’ for T cells. It seems that from the beginning Allison liked to try certain experiments just to see what would happen. If what he found was more complicated than he expected he found that good. In any case, Allison did an experiment where mice with tumours were given antibodies to CTLA-4. This disables the off switch. The result was that while the tumours continued to grow in the untreated control mice they disappeared in the treated mice. The 100% response was so unexpected that Allison immediately repeated the experiment to rule out having made some mistake. The result was the same.

The fifth chapter throws some light on the question why researchers were so sceptical for so long about the idea that the immune system can effectively fight cancer. The central conceptual reason is that in order to interpret the results of certain experiments it is not enough to consider the typical cancer cell. Instead it is necessary to think on a population level and see the tumour as an ecosystem. When cancer cells are attacked in some way and the majority die there will be a few left over which are immune to that particular type of attack. That small population will then expand and the tumour will grow again. The genetic composition of a typical cell in the new tumour will be very different from that of the old one. In the case of an attack by the immune system this gives rise to the concept of ‘cancer immunoediting’. On the road to transforming the experimental results of Allison into a therapy there were further conceptual obstacles. In the phase 2 clinical trial for an antibody against CTLA-4 run by Bristol-Myers-Squibb (which had taken over the company Medarex which had started the development of the drug) the criteria for success were badly chosen. They were based on what might have been good criteria for a chemotherapy but were not good for the new type of therapy. Success was based on the tumours of a certain percentage of patients having shrunk by a certain amount after three months. The problem was that the time scale (which had been chosen to limit the expense) was too short and tumour size was not the right thing to look at. It could be seen that the tumours of certain patients had grown but they reported that they were feeling better. It happened that a patient who was about to die called up months later, after the planned endpoint of the trial and said ‘I’m fine’. What is the explanation for these things?
The first aspect is that the immune response required to attack the tumour takes a considerable time to develop and success needs more than three months. The other is that a bigger tumour does not necessarily mean more cancer cells. It can also mean that there are huge numbers of immune cells in the tumour. Imaging the size of the tumour misses that. A similar trial to that of BMS gave similar results and was abandoned. That the same did not happen with the BMS trial was apparently due to someone called Axel Hoos. He persuaded the company to extend the trial and introduced a better endpoint criterion, the proportion of patients who live for a certain time. This led to success and eventually to the approval of the drug ipilimumab. Its rate of success, in the case of metastatic melanoma, is that about 20% of the patients are cured (in the sense that the tumours go away and have not come back until today, the survival curve flattens out at a positive value). The side effects are formidable, due to autoimmune reactions.

Chapter six comes back to the therapy with PD-L1 with which the book started. The treatments with antibodies against PD-1 and PD-L1 have major advantages compared to those with CTLA-4. The success rate with metastatic melanoma can exceed 50% and the side effects are much less serious. The latter aspect has to do with the fact that in this case the mode of action is less to activate T cells in general than to sustain the activation of cells which are already attacking the tumour. This does not mean that treatments targeting CTLA-4 have been superseded. For certain types of cancer they can be better than those targeting PD-1 or PD-L1 and combinations may be better than either type of therapy alone. For the second class of drugs getting them on the market was also not easy. In the book it is described how this worked in the case of a drug developed by Genentech. It had to be decided whether the company wanted to develop this drug or a more conventional cancer therapy. The first was more risky but promised a more fundamental advance if successful. There was a showdown between the oncologists and the immunologists. After a discussion which lasted several hours the person responsible for the decision said ‘This is enough, we are moving forward’ and chose the risky alternative.

This post has already got quite long and it is time to break it off here. What I have described already covers the basic discussion in the book of the therapies using CTLA-4 and PD-1 or PD-L1. I will leave everything else for another time.

Banine’s ‘Jours caucasiens’

April 11, 2019

I have just read the novel ‘Jours caucasiens’ by Banine. This is an autobiographical account of the author’s childhood in Baku. I find it difficult to judge how much of what she writes there is true and how much is a product of her vivid imagination. I do not find that so important. In any case I found it very interesting to read. It is not for readers who are easily shocked. Banine is the pen name of Umm-El-Banine Assadoulaeff. She was born in Baku into a family of oil magnates and multimillionaires. In fact she herself was in principle a multimillionaire for a few days after the death of her grandfather, until her fortune was destroyed when Azerbaijan was invaded by the Soviet Union. In later years she lived in Paris and wrote in French. To my taste she writes very beautifully in French. I first heard of her through the diaries of Ernst Jünger. While he was an officer in the German army occupying Paris during the Second World War he got to know Banine and visited her regularly. It was not entirely unproblematic for her during the occupation when she was visited at her apartment by a German army officer in uniform. She seemed to regard this with humour. The two had a close but platonic relationship.

As I mentioned in a previous post, during the year I lived in Paris I was a frequent visitor to the library of the Centre Pompidou. One of the books I found and read there was a book by Banine about her meetings with Jünger. I believe she actually wrote three books about him and I am not sure which one it was I read. I enjoyed reading her book and it was nice to see an account of Jünger’s time in Paris which was complementary to his own. I had completely forgotten about Banine until recently. I was reminded of her by the following chain of circumstances. My wife and I were thinking of going on holiday to Georgia. I found an interesting organised tour visiting Georgia, Armenia and Azerbaijan. One of the places to be visited was Baku. I must confess that it was not clear to me at that point that Baku was in Azerbaijan. In any case, it occurred to me that Banine was born in Baku and I looked her up in Wikipedia. I found out that she had written the book ‘Jours caucasiens’ and I thought it might be good to read it before the planned trip. I got the book from the university library without being sure I wanted to read it. The prose of the first page captured me immediately and did not let me go. The potential trip to Georgia will not take place this year, if at all. Even if it does not it has had the pleasant consequence of leading me to rediscover Banine.

The society in which Banine grew up was the result of the discovery of oil. Her ancestors had been poor farmers who suddenly became very rich because oil-wells were built on their land. She presents her family as being very uncivilised. They were Muslims but had already been strongly affected by western culture. I found an article in the magazine ‘Der Spiegel’ from 1947 where ‘Jours caucasiens’ is described by the words ‘gehören zu den skandalösesten Neuerscheinungen in Paris’ [is one of the most scandalous new publications in Paris]. It also says that her family was very unhappy about the way they were presented in the book and I can well understand that. It seems that she had a low opinion of her family and their friends and the culture they belonged to, although she herself did not seem to mind being part of it. She was attracted by Western culture and Paris was the place of her dreams. As a child she had a German governess. Her mother died when she was very young and after her father had remarried she had a French and an English teacher for those languages. She quickly fell in love with French. On the other hand, she saw having to learn English as a bit of a nuisance. Her impression was that the English had just taken the words from German and French and changed them in a strange way.

After the Russian invasion Banine’s father, who had been a government minister in the short-lived Azerbaijan Republic, was imprisoned. He was released due to the efforts of a man whose motivation for doing so was the desire to marry Banine. She was very much against this. Perhaps the strongest reason was that he had red hair. There was a superstition that red-haired people, who were not very common in that region, had evil supernatural powers. Banine’s grandmother told her a story about an alchemist who discovered the secret of red-haired people. According to him they should be treated in the following way. He cut off their head, boiled it in a pot and put the head on a pedestal. If this was done correctly then the heads would start to speak and make prophecies which were always true. Banine could not help associating her potential husband with this horrible myth. Unfortunately she was under a lot of social pressure and after hesitating a bit agreed to the marriage. Apart from being a sign of gratitude for her father’s release this was also a way of persuading her suitor to use his influence to get a visa for her father to allow him to leave Russia. In the end she accepted this arrangement instead of running away with the man she loved. At this time she was fifteen years old. Her father got the visa and left the country. Later she also got a visa and was able to leave. The last stage of her journey was with the Orient Express from Constantinople to Paris. The book ends as the train is approaching Paris and a new life is starting for her.

Stability in the multiple futile cycle

April 8, 2019

In a previous post I described the multiple futile cycle, where a protein can be phosphorylated up to n times. About ten years ago Wang and Sontag proved that with a suitable choice of parameters this system has 2k+1 steady states. Here k denotes the integral part of n/2. The question of the stability of these steady states was left open. On an intuitive level it is easy to have the picture that stable steady states and saddle points should alternate. This suggests that there should be k+1 stable states and k saddles. On the other hand it is not clear where this intuition comes from and it is very doubtful whether it is reliable in a high-dimensional dynamical system. I have thought about this issue for several years now. I had some ideas but was not able to implement them in practice. Now, together with Elisenda Feliu and Carsten Wiuf, I have written a paper where we prove that indeed there are parameters for which there exist steady states with the stability properties suggested by the intuitive picture.

How can information about the relative stability of steady states be obtained by analytical calculations? For this it is good if the steady states are close together so that their stability can be investigated by local calculations. One way they can be guaranteed to be close together is if they all originate in a single bifurcation as a parameter is varied. This is the first thing we arranged in our proof. The next observation is that the intuition I have been talking about is based on thinking in one dimension. In a one-dimensional dynamical system alternating stability of steady states does happen, provided degenerate situations are avoided. Thus it is helpful if the centre manifold at the bifurcation is one-dimensional. This is the second thing we arranged in our proof. To get the particular kind of alternating stability mentioned above we also need the condition that the flow is contracting towards the centre manifold. I had previously solved the case n=2 of this problem with Juliette Hell but we had no success in extending it to larger values of n. The calculations became unmanageable. One advantage of the case n=2 is that the bifurcation there was a cusp and certain calculations are done in great generality in textbooks. These are based on the presence of a one-dimensional centre manifold but it turns out to be more efficient for our specific problem to make this explicit.
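The one-dimensional intuition can be made concrete with a toy example. The sketch below is not taken from the paper: it assumes a scalar equation x' = f(x) in which f is a polynomial with simple real roots and negative leading coefficient, and the function name is made up for illustration. The sign of f' at each root decides stability, and these signs alternate.

```python
def classify_steady_states(roots, leading_sign=-1):
    """Classify the steady states of x' = leading_sign * prod(x - r).

    Assumes the roots are real and pairwise distinct (the non-degenerate
    case). A root r is stable when f'(r) < 0 and unstable when f'(r) > 0.
    """
    roots = sorted(roots)
    labels = []
    for r in roots:
        # For simple roots, f'(r) = leading_sign * prod_{s != r} (r - s).
        derivative = leading_sign
        for s in roots:
            if s != r:
                derivative *= (r - s)
        labels.append("stable" if derivative < 0 else "unstable")
    return labels


if __name__ == "__main__":
    # Three steady states (k = 1): expect k+1 = 2 stable and k = 1 unstable.
    print(classify_steady_states([1, 2, 3]))
```

With 2k+1 roots this gives k+1 stable points separated by k unstable ones, which is the pattern that the contraction towards the centre manifold transfers to the full system, with the unstable points becoming saddles.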

The general structure of the proof is that we first reduce the multiple futile cycle, which has mass action kinetics, to a Michaelis-Menten system which is much smaller. This reduction is well-behaved in the sense of geometric singular perturbation theory (GSPT), since the eigenvalues of a certain matrix are negative. With this in place steady states can be lifted from the Michaelis-Menten system to the full system while preserving their stability properties. The bifurcation arguments mentioned above are then applied to the Michaelis-Menten system.

The end result of the ideas discussed so far is that the original analytical problem is reduced to three algebraic problems. The first is the statement about the eigenvalues required for the application of GSPT. This was obtained for the case n=2 in my work with Juliette but we had no idea how to extend it to higher values of n. The second is to analyse the eigenvalues of the linearization of the system about the bifurcation point. What we want is that two eigenvalues are zero and that all others have negative real parts. (One zero eigenvalue arises because there is a conservation law while the second corresponds to the one-dimensional centre manifold.) There are many parameters which can be varied when choosing the bifurcation point and a key observation is that this choice can be made in such a way that the linearization is a symmetric matrix, which is very convenient for studying eigenvalues. The third problem is to determine the leading order coefficient which determines the stability of the bifurcation point within the centre manifold.

I started to do parts of the algebra and I would describe it as being like entering a jungle with a machete. I was able to find a direction to proceed and show that some progress could be made but I very soon got stuck. Fortunately my coauthors came and built a reliable road to the final goal.

The pole-shifting theorem, part 2

March 25, 2019

In the last post I wrote about the pole-shifting theorem but included almost no information about the proof. Now I want to sketch that proof, following the account in the book of Sontag. It is convenient to talk about prescribing the characteristic polynomial rather than prescribing the eigenvalues. The characteristic polynomial contains the information about the set of eigenvalues together with information about their multiplicity. It is helpful to use changes of basis in the state space to simplify the problem. A change of basis leads to a similarity transformation of A and so does not change the characteristic polynomial. It also does not change the rank of R(A,B). Hence the property of controllability is not changed. Which polynomials can be obtained from matrices of the form A+BF also does not change since the change of basis can be used to transform F in an obvious way. Putting these things together shows that when proving the theorem for given matrices it is allowed to pass to new matrices by a change of basis when convenient.

The first step in the proof is to look at the theorem in the case of one control variable (m=1). I will use the notation (A,b). In this case the system can be brought into a special form, the controller form, by a change of basis. Then it is elementary to solve for the unique feedback which solves the problem. The next step is to reduce the case of general m to a modified control problem with m=1. Let v be any vector with Bv non-zero and b=Bv. The idea is to show that there is an F_1 such that (A+BF_1,b) is controllable. If this can be done then the result for m=1 gives, for a given polynomial \chi, a matrix f such that the characteristic polynomial of (A+BF_1)+bf is \chi. But (A+BF_1)+bf=A+B(F_1+vf) and so taking F=F_1+vf solves the desired problem.
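For the case m=1 the unique feedback can also be written in closed form. The following sketch uses Ackermann's formula, which is a standard equivalent of passing through the controller form and is not the change of basis used in Sontag's proof. The function name is hypothetical, and the sign convention is chosen so that the closed-loop matrix is A+bf as in the text.

```python
import numpy as np


def place_single_input(A, b, desired_eigs):
    """Return the row vector f such that A + b f has the given eigenvalues.

    Uses Ackermann's formula f = -e_n^T R(A,b)^{-1} chi(A), where chi is the
    desired monic characteristic polynomial. Complex eigenvalues must be
    supplied in conjugate pairs so that chi has real coefficients.
    """
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float).reshape(-1, 1)
    n = A.shape[0]
    # Reachability matrix R(A,b) = [b, Ab, ..., A^{n-1} b].
    R = np.hstack([np.linalg.matrix_power(A, k) @ b for k in range(n)])
    if np.linalg.matrix_rank(R) < n:
        raise ValueError("(A, b) is not controllable")
    # Coefficients of chi, highest degree first; coeffs[0] == 1.
    coeffs = np.real(np.poly(desired_eigs))
    # chi(A) = A^n + c_1 A^{n-1} + ... + c_n I.
    chi_A = sum(c * np.linalg.matrix_power(A, n - k) for k, c in enumerate(coeffs))
    # Last row of R^{-1} chi(A), with a minus sign for the A + b f convention.
    return -np.linalg.solve(R, chi_A)[-1:, :]


if __name__ == "__main__":
    A = [[0.0, 1.0], [-1.0, 0.0]]   # undamped oscillator
    b = [0.0, 1.0]
    f = place_single_input(A, b, [-1.0, -2.0])
    closed_loop = np.asarray(A) + np.asarray(b).reshape(-1, 1) @ f
    print(f, np.poly(closed_loop))
```

Comparing characteristic polynomials rather than eigenvalues in the check mirrors the remark above that it is more convenient to prescribe the polynomial.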

It remains to find F_1. For this purpose a sequence of vectors is constructed as follows. First choose a vector v such that Bv is non-zero and let x_1=Bv. Then x_1 is a non-zero element of the state space. Next choose x_2=Ax_1+u_1, where u_1 belongs to the image of B, in such a way that x_1 and x_2 are linearly independent. If this succeeds continue to choose x_3=Ax_2+u_2 in a similar way. The idea is to construct a maximal chain of linearly independent vectors \{x_1,\ldots,x_k\} of this type. The claim is now that if (A,B) is controllable k=n. Consider the space spanned by the x_i. It is of dimension k. Since the chain cannot be extended Ax_k+Bu must also belong to this space, for any choice of u. In particular Ax_k belongs to the space. Hence the image of B belongs to the space. The definition of the x_i then implies that Ax_i belongs to the space for all i so that the space is invariant under A. Putting these facts together shows that the image of R(A,B) is contained in this space. By controllability it must therefore be the whole n-dimensional Euclidean space. Next define F_1x_i=u_i for i=1,2,\ldots,k-1 and F_1x_k arbitrarily. Then R(A+BF_1,x_1)=(x_1,\ldots,x_n), which completes the proof.

In fact this theorem can be extended to one which describes which polynomials can be assigned when (A,B) is not controllable. They are the polynomials of the form \chi_1\chi_u where \chi_1 is an arbitrary monic polynomial of degree r, with r the rank of R(A,B), and \chi_u is a polynomial defined by (A,B) called the uncontrollable part of the characteristic polynomial of A. What this means is that some poles (the uncontrollable ones) are fixed once and for all and the others can be shifted arbitrarily.
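A small worked example may help here. The matrices below are chosen purely for illustration and do not come from Sontag's book.

```latex
A=\begin{pmatrix}1&0\\0&2\end{pmatrix},\quad
B=\begin{pmatrix}1\\0\end{pmatrix},\quad
F=\begin{pmatrix}f_1&f_2\end{pmatrix},\qquad
R(A,B)=\begin{pmatrix}1&1\\0&0\end{pmatrix},

A+BF=\begin{pmatrix}1+f_1&f_2\\0&2\end{pmatrix},\qquad
\det(\lambda I-(A+BF))=(\lambda-1-f_1)(\lambda-2).
```

Here R(A,B) has rank 1, so the pair is not controllable. The factor \lambda-2 is the uncontrollable part \chi_u, which no choice of F can change, while \chi_1=\lambda-(1+f_1) is an arbitrary monic polynomial of degree 1.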

The pole-shifting theorem

March 22, 2019

Here I discuss a theorem of linear algebra whose interest comes from its applications in control theory. The discussion follows that in the book ‘Mathematical Control Theory’ by Eduardo Sontag. Let A be an n\times n matrix and B an n\times m matrix. We consider the expression A+BF for an m\times n matrix F. The game is now to fix A and B and attempt, by means of a suitable choice of F, to give the eigenvalues of A+BF specified values. The content of the theorem is that this is always possible, provided a suitable genericity assumption, called controllability, is satisfied. In fact this statement has to be modified slightly. I want to work with real matrices and thus the eigenvalues automatically come in complex conjugate pairs. Thus the correct statement concerns candidates for the set of eigenvalues which satisfy that restriction. Where does the name of the theorem come from? Its source is the fact that eigenvalues of a matrix M can be thought of as poles of the function (\det (M-\lambda I))^{-1} or the matrix-valued function (M-\lambda I)^{-1}. This is a picture popular in classical control theory. The primary importance of this result for control theory is that the stability of a control system is in many cases determined by a matrix of the form A+BF. If we can choose F so that the eigenvalues of A+BF are all real and negative then we have shown how a system can be designed for which the desired state is asymptotically stable. When the state is perturbed it returns to the desired state. It even does so in a monotone manner, i.e. without any overshoot.
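To see the game in the smallest non-trivial case, consider the following example with n=2 and m=1. The specific matrices are my own choice for illustration and are not taken from the book.

```latex
A=\begin{pmatrix}0&1\\0&0\end{pmatrix},\quad
B=\begin{pmatrix}0\\1\end{pmatrix},\quad
F=\begin{pmatrix}f_1&f_2\end{pmatrix},\qquad
A+BF=\begin{pmatrix}0&1\\f_1&f_2\end{pmatrix},

\det(\lambda I-(A+BF))=\lambda^2-f_2\lambda-f_1 .
```

Both coefficients of the characteristic polynomial can be chosen freely. To obtain the eigenvalues -1 and -2, for instance, the target polynomial is \lambda^2+3\lambda+2, which gives f_1=-2 and f_2=-3.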

What is the genericity condition? It is simplest to explain in the case m=1. Then B is a column vector and we can consider the vectors B, AB, A^2B, \ldots. After at most n-1 steps the span of these vectors stops growing, since by the Cayley-Hamilton theorem A^nB is a linear combination of B, AB, \ldots, A^{n-1}B. Controllability is the condition that the vectors generated in this way span the whole space. This condition can be reformulated as follows. We identify a set of n vectors in n dimensions with a point of the Euclidean space of n^2 dimensions. To put it another way we place the vectors side by side as the columns of a matrix. Then the condition of controllability is nothing other than the condition that the rank of the resulting matrix is n. The path to the general definition is then simple. The objects B, AB, \ldots, A^{n-1}B listed before are no longer vectors but n\times m matrices, and we place them side by side to get an n\times mn matrix R(A,B), the reachability or controllability matrix. The condition for controllability is then that this matrix has rank n.
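The rank condition is easy to check numerically. The following is a minimal sketch using NumPy; the function names and the two test pairs are my own illustrative choices, and the numerical rank computed by matrix_rank depends on a floating point tolerance, so this is an illustration rather than a robust test.

```python
import numpy as np


def controllability_matrix(A, B):
    """Return R(A, B) = [B, AB, ..., A^{n-1}B] as an n x (m*n) array."""
    n = A.shape[0]
    blocks = [B]
    for _ in range(n - 1):
        blocks.append(A @ blocks[-1])  # next block is A times the previous one
    return np.hstack(blocks)


def is_controllable(A, B):
    """Rank condition: (A, B) is controllable if R(A, B) has rank n."""
    return np.linalg.matrix_rank(controllability_matrix(A, B)) == A.shape[0]


A = np.array([[0.0, 1.0],
              [0.0, 0.0]])
B_good = np.array([[0.0], [1.0]])  # B, AB span the plane
B_bad = np.array([[1.0], [0.0]])   # AB = 0, so the span is one-dimensional

print(is_controllable(A, B_good), is_controllable(A, B_bad))
```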

There are also other equivalent conditions for controllability, known under the name of the Hautus Lemma. This says that the rank condition (call this condition (i)) is equivalent to the condition (call it condition (ii)) that the rank of the matrix obtained by placing \lambda I-A next to B is n for all complex numbers \lambda. It is easily seen that it is equivalent to assume this only in the case that \lambda is an eigenvalue of A, since for all other \lambda the block \lambda I-A already has rank n. The proof that (i) implies (ii) is elementary linear algebra. The converse is more complicated and relies on the concept of the Kalman controllability decomposition. The proof of the pole-shifting theorem itself is rather involved and I will not discuss it here.
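Condition (ii) can also be tried out numerically, checking only the eigenvalues of A as just described. This is a minimal sketch with NumPy; the function name and the example matrices are assumptions made for illustration, and computing eigenvalues in floating point can be delicate for matrices which are far from diagonalizable.

```python
import numpy as np


def hautus_controllable(A, B):
    """Condition (ii): rank [lambda*I - A, B] = n for every eigenvalue lambda of A."""
    n = A.shape[0]
    for lam in np.linalg.eigvals(A):
        M = np.hstack([lam * np.eye(n) - A, B])  # n x (n + m) matrix
        if np.linalg.matrix_rank(M) < n:
            return False
    return True


A = np.diag([1.0, 2.0])
B_good = np.array([[1.0], [1.0]])  # R(A, B) = [[1, 1], [1, 2]] has rank 2
B_bad = np.array([[1.0], [0.0]])   # the rank drops at the eigenvalue 2

print(hautus_controllable(A, B_good), hautus_controllable(A, B_bad))
```

For these two pairs the answer agrees with what the rank condition (i) gives, as the lemma says it must.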

Herd immunity

February 14, 2019

I have a long term interest in examples where mathematics has contributed to medicine. Last week I heard a talk at a meeting of the Mainzer Medizinische Gesellschaft about vaccination. The speaker was Markus Knuf, director of the pediatric section of the Helios Clinic in Wiesbaden. In the course of his talk he mentioned the concept of ‘herd immunity’ several times. I was familiar with this concept and I have even mentioned it briefly in some of my lectures and seminars. It had never occurred to me that in fact this is an excellent example of a case where medical understanding has benefited from mathematical considerations. Suppose we have a population of individuals who are susceptible to a particular disease. Suppose further that there is an endemic state, i.e. that the disease persists in the population at a constant non-zero level. It is immediately plausible that if we vaccinate a certain proportion \alpha of the population against the disease then the proportion of the population suffering from the disease will be lower than it would have been without vaccination. What is less obvious is that if \alpha exceeds a certain threshold \alpha_* then the constant level will be zero. This is the phenomenon of herd immunity. The value of \alpha_* depends on how infectious the disease is. A well-known example with a relatively high value is measles, where \alpha_* is about 0.95. In other words, if you want to get rid of measles from a population then it is necessary to vaccinate at least 95% of the population. It occurs to me that this idea is very schematic since measles does not occur at a constant rate. Instead it occurs in large waves. This idea is nevertheless one which is useful when making public health decisions. Perhaps a better way of looking at it is to think of the endemic state as a steady state of a dynamical model.
The important thing is that this state is asymptotically stable in the dynamic set-up so that it recovers from any perturbation (infected individuals coming in from somewhere else). It just so happens that in the usual mathematical models for this type of phenomenon whenever a positive steady state (i.e. one where all populations are positive) exists it is asymptotically stable. Thus the distinction between the steady state and dynamical pictures is not so important. After I started writing this post I came across another post on the same subject by Mark Chu-Carroll. I am not sad that he beat me to it. The two posts give different approaches to the same subject and I think it is good if this topic gets as much publicity as possible.

Coming back to the talk I mentioned, a valuable aspect of it was that the speaker could report on things from his everyday experience in the clinic. This makes things much more immediate than if someone is talking about the subject on an abstract level. Let me give an example. He showed a video of a small boy with an extremely persistent cough. (He had permission from the child’s parents to use this video for the talk.) The birth was a bit premature but the boy left the hospital two weeks later in good health. A few weeks after that he returned with the cough. It turned out that he had whooping cough which he had caught from an adult (non-vaccinated) relative. The man had had a bad cough but the cause was not realised and it was attributed to side effects of a drug he was taking for a heart problem. The doctors did everything to save the boy’s life but the infection soon proved fatal. It is important to realize that this is not an absolutely exceptional case but a scenario which happens regularly. It brings home what getting vaccinated (or failing to do so) really means. Of course an example like this has no statistical significance but it can nevertheless help to make people think.

Let me say some more about the mathematics of this situation. A common type of model is the SIR model. The dependent variables are S, the number of individuals who are susceptible to infection by the disease, I, the number of individuals who are infected (or infectious, this model ignores the incubation time) and R, the number of individuals who have recovered from the disease and are therefore immune. These three quantities depend on time and satisfy a system of ODEs containing a number of parameters. There is a certain combination of these parameters, usually called the basic reproductive rate (or ratio) and denoted by R_0, whose value determines the outcome of the dynamics. If R_0\le 1 the infection dies out – the solution converges to a steady state on the boundary of the state space where I=0. If, on the other hand, R_0>1 there exists a positive steady state, an endemic equilibrium. The stability of this steady state can be examined by linearizing about it. In fact it is always stable. Interestingly, more is true. When the endemic steady state exists it is globally asymptotically stable. In other words all solutions with positive initial data converge to that steady state at late time. For a proof of this see a paper by Korobeinikov and Wake (Appl. Math. Lett. 15, 955). They use a Lyapunov function to do so. At this point it is appropriate to mention that my understanding of these things has been improved by the hard work of Irena Vogel, who recently wrote her MSc thesis on the subject of Lyapunov functions in population models under my supervision.
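The threshold behaviour can be seen in a small simulation. The equations are not written down above, so the following sketch rests on assumptions of my own: a standard SIR model with births and deaths at the same rate \mu, normalized so that S+I+R=1, in which a fraction \alpha of newborns is vaccinated, i.e. S'=\mu(1-\alpha)-\beta SI-\mu S and I'=\beta SI-(\gamma+\mu)I. In this model R_0=\beta/(\gamma+\mu), the threshold is \alpha_*=1-1/R_0 and the endemic level is I_*=\mu((1-\alpha)R_0-1)/\beta when this is positive. The parameter values are purely illustrative and are not meant to describe any real disease.

```python
def threshold(beta=8.0, gamma=1.0, mu=1.0):
    """Herd immunity threshold alpha_* = 1 - 1/R_0 in the assumed model."""
    r0 = beta / (gamma + mu)
    return 1.0 - 1.0 / r0


def endemic_level(alpha, beta=8.0, gamma=1.0, mu=1.0):
    """Infected fraction at the steady state the solution is expected to approach."""
    r0 = beta / (gamma + mu)
    return max(0.0, mu * ((1.0 - alpha) * r0 - 1.0) / beta)


def simulate(alpha, beta=8.0, gamma=1.0, mu=1.0,
             s0=0.5, i0=0.01, dt=0.01, t_end=50.0):
    """Integrate the (S, I) equations with the classical Runge-Kutta method.

    R is omitted because it does not feed back into the equations for S and I.
    Returns the values of S and I at time t_end.
    """
    def rhs(s, i):
        ds = mu * (1.0 - alpha) - beta * s * i - mu * s
        di = beta * s * i - (gamma + mu) * i
        return ds, di

    s, i = s0, i0
    for _ in range(int(round(t_end / dt))):
        k1 = rhs(s, i)
        k2 = rhs(s + 0.5 * dt * k1[0], i + 0.5 * dt * k1[1])
        k3 = rhs(s + 0.5 * dt * k2[0], i + 0.5 * dt * k2[1])
        k4 = rhs(s + dt * k3[0], i + dt * k3[1])
        s += dt * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6.0
        i += dt * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6.0
    return s, i


# With these parameters R_0 = 4, so the threshold is 0.75.
for alpha in (0.5, 0.9):
    s_end, i_end = simulate(alpha)
    print(alpha, threshold(), endemic_level(alpha), i_end)
```

Below the threshold the infected fraction settles at the positive endemic level, above it the infection dies out. In this model a threshold of 0.95, the figure quoted for measles, would correspond to R_0=20.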


The probability space as a fiction

February 12, 2019

I have always had the impression that I understood probability theory very poorly. I had a course on elementary probability theory as an undergraduate and I already had difficulties with that. I was very grateful that in the final exam there was a question on the Borel-Cantelli Lemma which was about the only thing I did understand completely. More recently I have taught elementary probability myself and I do now have a basic understanding there. As a source I used the book of Feller which was the text I had as an undergraduate. I nevertheless remained without a deeper understanding of the subject. In the more recent past I have often been to meetings on reaction networks and on such occasions there are generally talks about both the deterministic and stochastic cases. I did learn some things in the stochastic talks but I was missing the mathematical background, the theory of continuous time Markov chains. My attempts to change this by background reading met with limited success. Yesterday I found a book called ‘Markov Chains’ by J. R. Norris and this seems to me more enlightening than anything I had tried before.

Looking at this book also led to progress of a different kind. I started thinking about the question of why I found probability theory so difficult. One superficial view of the subject is that it is just measure theory except that the known objects are called by different names. Since I do understand measure theory and I have a strong affinity for language, if that was the only problem I should have been able to overcome it. Then I noticed a more serious difficulty, which had previously only been hovering on the edge of my consciousness. In elementary probability the concept of a probability space is clear – it is a measure space with total measure one. In more sophisticated probability theory it seems to vanish almost completely from the discussion. My impression in reading texts or listening to talks on the subject is that there is a probability space around in the background but that you never get your hands on it. You begin to wonder if it exists at all and this is the reason for the title of this post. I began to wonder if it is like the embedding into Euclidean space which any manifold in principle has but which plays no role in large parts of differential geometry. An internet search starting from this suspicion led me to an enlightening blog post of Terry Tao called ‘Notes 0: A review of probability theory’. There he reviews ‘foundational aspects of probability theory’. Fairly early in this text he compares the situation with that in differential geometry. He compares the role of the probability space to that of a coordinate system in differential geometry, a probably better variant of my thought with the embeddings. He talks about a ‘probabilistic way of thinking’ as an analogue of the ‘geometric way of thinking’. So I believe that I have now discovered the basic thing I did not understand in this context – I have not yet understood the probabilistic way of thinking.
When I consider the importance when doing differential geometry of (not) understanding the geometric way of thinking I see what an enormous problem this is. It is the key to understanding the questions of ‘what things are’ and ‘where things live’. For instance, to take an example from Tao’s notes, Poisson distributions are probability measures (‘distribution’ is the probabilistic translation of the word ‘measure’) on the natural numbers, the latter being thought of as a potential codomain of a random variable. Tao writes ‘With this probabilistic viewpoint, we shall soon see the sample space essentially disappear from view altogether …’ Why am I thinking about the Cheshire cat?

In a sequel to the blog post just mentioned Tao continues to discuss free probability. This is a kind of non-commutative extension of ordinary probability. It is a subject I do not feel I have to learn at this moment but I do think that it would be useful to have an idea how it reduces to ordinary probability in the commutative case. There is an analogy between this and non-commutative geometry. The latter subject is one which fascinated me sufficiently at the time I was at IHES to motivate me to attend a lecture course of Alain Connes at the Collège de France. The common idea is to first replace a space (in some sense) by the algebra of (suitably regular) functions on that space with pointwise operations. In practice this is usually done in the context of complex functions so that we have a * operation defined by complex conjugation. This then means that continuous functions on a compact topological space define a commutative C^*-algebra. The space can be reconstructed from the algebra. This leads to the idea that a C^*-algebra can be thought of as a non-commutative topological space. I came into contact with these things as an undergraduate through my honours project, supervised by Ian Craw. Non-commutative geometry has to do with extending this to replace the topological space by a manifold. Coming back to the original subject, this procedure has an analogue for probability theory. Here we replace the continuous functions by L^\infty functions, which also form an algebra under pointwise operations. In fact, as discussed in Tao’s notes, it may be necessary to replace this by a restricted class of L^\infty functions which are in particular in L^1. The reason for this is that a key structure on the algebra of functions (random variables) is the expectation. In this case the * operation is also important. The non-commutative analogue of a probability space is then a W^*-algebra (von Neumann algebra).
Comparing with the start of this discussion, the connection here is that while the probability space fades into the background the random variables (elements of the algebra) become central.

Minimal introduction to Newton polygons

January 24, 2019

In my work on dynamical systems I have used Newton polygons as a practical tool but I never understood the theoretical basis for why they are helpful. Here I give a minimal theoretical discussion. I do not yet understand the link to the applications I just mentioned but at least it is a start. Consider a polynomial equation of the form p(x,y)=0. The polynomial p can be written in the form p(x,y)=\sum_{i,j}a_{ij}x^iy^j. Suppose that p(0,0)=0, i.e. that a_{00}=0. I look for a family y=u(x) of solutions satisfying u(x)=Ax^\alpha+\ldots. We have p(x,u(x))=0. Intuitively the zero set of p is an algebraic variety which near the origin is the union of a finite number of branches. The aim is to get an analytic approximation to these branches. Substituting the ansatz into the equation gives \sum_{i,j}a_{ij}x^iy^j=\sum_{i,j}a_{ij}A^jx^{\alpha j+i}+\ldots=0. If we compare the size of the summands in this expression then we see that summands have the same order of magnitude if they satisfy the equation \alpha j+i=C for the same constant C. Let S be the set of points of the plane with coordinates (i,j) for which a_{ij}\ne 0. For C=0 the line L with equation \alpha j+i=C does not intersect S. If we increase C then eventually the line L will meet S. If it meets S in exactly one point then the ansatz is not consistent, because the single leading term has nothing to cancel against. A given value of \alpha is only possible if the line meets S in more than one point for some C. Let \tilde S be the set of points with coordinates (k,l) such that k\ge i and l\ge j for some (i,j)\in S and let K be the convex hull of \tilde S. Then for an acceptable value of \alpha the line L must have a segment in common with K. There are only finitely many values of \alpha for which this is the case. A case which could be of particular interest is that of the smallest branch, i.e. that for which \alpha takes the smallest value. Consider for simplicity the case that only two points of L belong to S. Call their coordinates (i_1,j_1) and (i_2,j_2).
Then the coefficient A is determined by the relation A^{j_2-j_1}=-\frac{a_{i_1j_1}}{a_{i_2j_2}}. Further questions which arise are whether the formal asymptotic expansion whose leading term has been calculated can be extended to higher order and whether there is a theorem asserting the existence of a branch for which this is actually an asymptotic approximation. These will not be pursued here.
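The search for the admissible exponents can be sketched in a few lines of Python. This is a brute-force illustration under assumptions of my own rather than a standard algorithm: it tries the line through each pair of points of S and keeps it if \alpha>0 and no point of S lies below it, which is the condition that the line first meets S in more than one point. The function names and the example p(x,y)=x^3+y^3-3xy (the folium of Descartes) are my own choices, and the coefficients are assumed to be integers so that exact fractions can be used.

```python
from fractions import Fraction
from itertools import combinations


def newton_exponents(coeffs):
    """Map each admissible exponent alpha to the points of S on the line alpha*j + i = C.

    coeffs maps (i, j) to the integer coefficient a_ij of x^i y^j.
    """
    support = [pt for pt, a in coeffs.items() if a != 0]
    result = {}
    for (i1, j1), (i2, j2) in combinations(support, 2):
        if j1 == j2:
            continue  # both points give the same power of A, no finite alpha
        alpha = Fraction(i1 - i2, j2 - j1)
        if alpha <= 0:
            continue
        c = i1 + alpha * j1
        # Keep the line only if no point of S lies strictly below it.
        if all(i + alpha * j >= c for i, j in support):
            result[alpha] = sorted(
                pt for pt in support if pt[0] + alpha * pt[1] == c
            )
    return result


def leading_relation(coeffs, on_line):
    """For exactly two points on the line return (j2 - j1, -a_{i1 j1}/a_{i2 j2})."""
    (i1, j1), (i2, j2) = sorted(on_line, key=lambda pt: pt[1])
    return j2 - j1, Fraction(-coeffs[(i1, j1)], coeffs[(i2, j2)])


# p(x, y) = x^3 + y^3 - 3xy
folium = {(3, 0): 1, (0, 3): 1, (1, 1): -3}
branches = newton_exponents(folium)
for alpha, pts in sorted(branches.items()):
    power, value = leading_relation(folium, pts)
    print(f"alpha = {alpha}: A^{power} = {value}")
```

For this example the sketch finds \alpha=2 with A=1/3 and \alpha=1/2 with A^2=3, corresponding to the leading terms y\approx x^2/3 and y\approx\pm\sqrt{3}x^{1/2} of the two branches of the curve through the origin.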