I have always had the impression that I understood probability theory very poorly. I had a course on elementary probability theory as an undergraduate and I already had difficulties with that. I was very grateful that in the final exam there was a question on the Borel-Cantelli Lemma which was about the only thing I did understand completely. More recently I have taught elementary probability myself and I do now have a basic understanding there. As a source I used the book of Feller which was the text I had as an undergraduate. I nevertheless remained without a deeper understanding of the subject. In the more recent past I have often been to meetings on reaction networks and on such occasions there are generally talks about both the deterministic and stochastic cases. I did learn some things in the stochastic talks but I was missing the mathematical background, the theory of continuous time Markov chains. My attempts to change this by background reading met with limited success. Yesterday I found a book called ‘Markov Chains’ by J. R. Norris and this seems to me more enlightening than anything I had tried before.
Looking at this book also led to progress of a different kind. I started thinking about the question of why I found probability theory so difficult. One superficial view of the subject is that it is just measure theory except that the known objects are called by different names. Since I do understand measure theory and I have a strong affinity for language, if that were the only problem I should have been able to overcome it. Then I noticed a more serious difficulty, which had previously only been hovering on the edge of my consciousness. In elementary probability the concept of a probability space is clear – it is a measure space with total measure one. In more sophisticated probability theory it seems to vanish almost completely from the discussion. My impression in reading texts or listening to talks on the subject is that there is a probability space around in the background but that you never get your hands on it. You begin to wonder if it exists at all and this is the reason for the title of this post. I began to wonder if it is like the embedding into Euclidean space which any manifold in principle has but which plays no role in large parts of differential geometry. An internet search starting from this suspicion led me to an enlightening blog post of Terry Tao called ‘Notes 0: A review of probability theory’. There he reviews ‘foundational aspects of probability theory’. Fairly early in this text he compares the situation with that in differential geometry. He compares the role of the probability space to that of a coordinate system in differential geometry, probably a better variant of my thought with the embeddings. He talks about a ‘probabilistic way of thinking’ as an analogue of the ‘geometric way of thinking’. So I believe that I have now discovered the basic thing I did not understand in this context – I have not yet understood the probabilistic way of thinking.
When I consider how important it is, when doing differential geometry, to understand the geometric way of thinking (and how much is lost by not understanding it), I see what an enormous problem this is. It is the key to understanding the questions of ‘what things are’ and ‘where things live’. For instance, to take an example from Tao’s notes, Poisson distributions are probability measures (‘distribution’ is the probabilistic translation of the word ‘measure’) on the natural numbers, the latter being thought of as a potential codomain of a random variable. Tao writes ‘With this probabilistic viewpoint, we shall soon see the sample space essentially disappear from view altogether …’ Why am I thinking about the Cheshire cat?
In a sequel to the blog post just mentioned Tao continues to discuss free probability. This is a kind of non-commutative extension of ordinary probability. It is a subject I do not feel I have to learn at this moment but I do think that it would be useful to have an idea how it reduces to ordinary probability in the commutative case. There is an analogy between this and non-commutative geometry. The latter subject is one which fascinated me sufficiently at the time I was at IHES to motivate me to attend a lecture course of Alain Connes at the Collège de France. The common idea is to first replace a space (in some sense) by the algebra of (suitably regular) functions on that space with pointwise operations. In practice this is usually done in the context of complex functions so that we have a * operation defined by complex conjugation. This then means that continuous functions on a compact topological space define a commutative $C^*$-algebra. The space can be reconstructed from the algebra. This leads to the idea that a $C^*$-algebra can be thought of as a non-commutative topological space. I came into contact with these things as an undergraduate through my honours project, supervised by Ian Craw. Non-commutative geometry has to do with extending this to replace the topological space by a manifold. Coming back to the original subject, this procedure has an analogue for probability theory. Here we replace the continuous functions by $L^\infty$ functions, which also form an algebra under pointwise operations. In fact, as discussed in Tao’s notes, it may be necessary to replace this by a restricted class of functions which are in particular in $L^1$. The reason for this is that a key structure on the algebra of functions (random variables) is the expectation. In this case the * operation is also important. The non-commutative analogue of a probability space is then a $W^*$-algebra (von Neumann algebra). Comparing with the start of this discussion, the connection here is that while the probability space fades into the background the random variables (elements of the algebra) become central.
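To make the commutative case a little more concrete, here is a minimal sketch in standard notation. The symbols $(\Omega,\mathcal{F},P)$, $\mathcal{A}$, $\tau$ and $\mu_X$ are labels chosen here for illustration and are not taken from Tao’s notes; $\mu_X$ denotes the distribution of $X$, that is, the measure $\mu_X(B) = P(X \in B)$. The sketch assumes bounded random variables, so that all the integrals exist.

```latex
\begin{align*}
\mathcal{A} &= L^\infty(\Omega,\mathcal{F},P), & X^* &= \overline{X}, \\
\tau(X) &= \mathbb{E}[X] = \int_\Omega X \, dP, & \tau(1) &= 1, \\
\tau(X^* X) &= \mathbb{E}\bigl[|X|^2\bigr] \ge 0, &
\tau(X^k) &= \int_{\mathbb{R}} x^k \, d\mu_X(x) \quad (X \text{ real-valued}).
\end{align*}
```

The last identity is one way of seeing the sample space drop out: the moments $\tau(X^k)$ refer only to the distribution $\mu_X$ on the codomain, and for a bounded real random variable they determine $\mu_X$, so nothing about $\Omega$ itself needs to be known.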
February 14, 2019 at 1:18 pm |
A very interesting post. Since I now work on statistics it is nice to see similarities with differential geometry. In fact, I was recently reminded of a possible connection when I saw that an important technique in statistics is Principal Component Analysis, which consists of a change of variables in which one projects onto a different space using linear combinations of the old feature variables.