## From classical deduction to Bayesian probability

Terence Tao on classical deduction and Bayesian probability:

In classical logic, one can represent one’s information about a system as a set of possible states that the system could be in, based on the information at hand. With each new measurement of the system, some possibilities could be eliminated, leading to an updated posterior set of information that is an improvement over the prior set of information. A good example of this type of updating occurs when solving a Sudoku puzzle; each new cell value that one learns about constrains the possible values of the remaining cells. Other examples can be found in the classic detective stories of Arthur Conan Doyle featuring Sherlock Holmes. Proof by contradiction can also be viewed as an instance of this type of deduction.
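The Sudoku-style updating described above can be sketched as plain set elimination: one's knowledge is a set of possible states, and each measurement intersects it with the states consistent with that measurement. (The specific cell constraints below are invented for illustration.)

```python
# Classical deduction as set elimination: the state of knowledge about a
# Sudoku cell is the set of values it could still take, and each new
# measurement removes the inconsistent possibilities.

def update(possible, consistent):
    """Posterior set = prior set minus states ruled out by the measurement."""
    return {state for state in possible if consistent(state)}

# Prior information: the cell could hold any digit 1-9.
cell = set(range(1, 10))

# Measurement 1: the cell's row already contains 2, 5, and 7.
cell = update(cell, lambda v: v not in {2, 5, 7})

# Measurement 2: the cell's 3x3 box already contains 1, 3, 4, and 8.
cell = update(cell, lambda v: v not in {1, 3, 4, 8})

print(sorted(cell))  # → [6, 9]
```

Each update can only shrink the set, mirroring how each clue in a puzzle or detective story strictly narrows the possibilities.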

A modern refinement of classical deduction is that of Bayesian probability. Here, one’s information about a system is not merely represented as a set of possible states, but by a probability distribution on the space of all states, indicating one’s current beliefs on the likelihood of each particular state actually being the true state. Each new measurement of the system then updates a prior probability distribution to a posterior probability distribution, using Bayes’ formula

$$P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}.$$

Bayesian probability is widely used in statistics, in machine learning, and in the sciences.
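A single application of Bayes' formula can be worked through numerically. The numbers below are invented for illustration, with $A$ = "the hypothesis holds" and $B$ = "a test comes back positive":

```python
# One Bayesian update via Bayes' formula, P(A|B) = P(B|A) P(A) / P(B),
# with illustrative (made-up) numbers.

p_a = 0.01              # prior P(A)
p_b_given_a = 0.95      # P(B | A): test sensitivity
p_b_given_not_a = 0.05  # P(B | not A): false-positive rate

# Law of total probability: P(B) = P(B|A)P(A) + P(B|not A)P(not A).
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# Bayes' formula turns the prior into the posterior.
p_a_given_b = p_b_given_a * p_a / p_b
print(round(p_a_given_b, 3))  # → 0.161
```

Note how a rare hypothesis (prior 1%) remains fairly unlikely (about 16%) even after a positive test, because the false positives from the much larger complement dominate the denominator.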

To relate Bayesian probability to classical deduction, recall that every probability distribution has a support, which (in the case when the space of states is discrete) is the set of all states that occur with non-zero probability. When performing a Bayesian update on a discrete space, any state which is inconsistent with the new piece of information will have its posterior probability set to zero, and thus be removed from the support. Thus we see that whilst the probability distribution evolves by Bayesian updating, the support evolves by classical deductive logic. In this sense one can view classical logic as the qualitative projection of Bayesian probability, or equivalently, one can view Bayesian probability as a quantitative refinement of classical logic.
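The relationship between the two modes of updating can be made concrete on a small discrete space. In this sketch (with invented states and probabilities), a state inconsistent with the observation gets likelihood zero, so it drops out of the support:

```python
# A Bayesian update on a discrete space: the weights evolve by Bayes'
# formula, while the support evolves by classical elimination.

prior = {"s1": 0.5, "s2": 0.3, "s3": 0.2}

# Likelihood of the new observation under each state; s3 cannot produce it.
likelihood = {"s1": 0.9, "s2": 0.4, "s3": 0.0}

# P(observation), by the law of total probability.
evidence = sum(prior[s] * likelihood[s] for s in prior)

posterior = {s: prior[s] * likelihood[s] / evidence for s in prior}

# The support after the update: s3 has been eliminated outright.
support = {s for s, p in posterior.items() if p > 0}
print(support)  # → {'s1', 's2'}
```

Projecting away the numerical weights and keeping only the support recovers exactly the set-elimination picture of classical deduction.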

Alternatively, one can view Bayesian probability as a special case of classical logic by taking a frequentist interpretation. In this interpretation, one views the actual universe (or at least the actual system) as just one of a large number of possible universes (or systems). In each of these universes, the system is in one of the possible states; the probability assigned to each state is then the proportion of the possible universes in which that state is attained. Each new measurement eliminates some fraction of the universes in a given state, depending on how likely or unlikely that state was to actually produce that measurement; the surviving universes then have a new posterior probability distribution, which is related to the prior distribution by Bayes’ formula.
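The many-universes picture can be checked by simulation. In this sketch (states, priors, and likelihoods all invented), universes are populated according to the prior, each survives with probability equal to its state's likelihood of producing the measurement, and the surviving proportions are compared against Bayes' formula:

```python
# Frequentist reading of a Bayesian update: populate many "universes"
# according to the prior, discard each one with probability equal to its
# state's chance of NOT producing the observed measurement, and compare
# the surviving proportions with Bayes' formula.

import random

random.seed(0)
prior = {"s1": 0.5, "s2": 0.5}
likelihood = {"s1": 0.8, "s2": 0.2}  # P(measurement | state)

# Each universe is in one definite state, in proportion to the prior.
universes = [s for s in prior for _ in range(int(prior[s] * 100_000))]

# A universe survives the measurement with probability = its likelihood.
survivors = [s for s in universes if random.random() < likelihood[s]]

empirical = survivors.count("s1") / len(survivors)
exact = prior["s1"] * likelihood["s1"] / sum(
    prior[s] * likelihood[s] for s in prior
)
print(empirical, exact)  # the two agree up to sampling noise
```

The surviving universes are distributed (up to sampling noise) exactly as the Bayesian posterior prescribes, which is the content of the frequentist interpretation above.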

It is instructive to interpret Sherlock Holmes’ famous quote, “When you have eliminated all which is impossible, then whatever remains, however improbable, must be the truth,” from a Bayesian viewpoint. The statement is technically correct; however, when performing this type of elimination to an (a priori) improbable conclusion, the denominator in Bayes’ formula is extremely small, and so the deduction is unstable if it later turns out that some of the possibilities thought to have been completely eliminated, were in fact only incompletely eliminated. (See also the mantra “extraordinary claims require extraordinary evidence”, which can be viewed as the Bayesian counterpoint to Holmes’ classical remark.)
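The instability can be made numerical. In this sketch (all figures invented), an a priori very improbable hypothesis explains the evidence perfectly, while the mundane alternatives are believed to be "eliminated"; a tiny residual chance that they were only incompletely eliminated collapses the posterior:

```python
# Why "eliminating the impossible" is fragile: with an a priori improbable
# hypothesis, a tiny residual probability left to a supposedly eliminated
# alternative swings the posterior dramatically.

def posterior_improbable(p_residual):
    """Posterior of the improbable hypothesis, given the evidence, when the
    mundane alternatives explain the evidence with probability p_residual."""
    p_improbable = 1e-6         # prior of the "however improbable" truth
    p_e_given_improbable = 1.0  # it explains the evidence perfectly
    # Denominator of Bayes' formula: tiny when elimination is complete.
    p_e = p_e_given_improbable * p_improbable + p_residual * (1 - p_improbable)
    return p_e_given_improbable * p_improbable / p_e

print(posterior_improbable(0.0))   # complete elimination → posterior 1.0
print(posterior_improbable(1e-4))  # tiny residual → posterior ≈ 0.01
```

A residual of one part in ten thousand, barely distinguishable from "impossible", takes the conclusion from certainty to roughly one percent, which is the instability described above.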

Another interesting place where one can contrast classical deduction with Bayesian deduction is with regard to taking converses. In classical logic, if one knows that $A$ implies $B$, one cannot then deduce that $B$ implies $A$. However, in Bayesian probability, if one knows that the presence of $A$ elevates the likelihood that $B$ is true, then an observation of $B$ will conversely elevate the prior probability that $A$ is true, thanks to Bayes’ formula: if $P(B \mid A) > P(B)$, then $P(A \mid B) > P(A)$. This may help explain why taking converses is an intuitive operation to those who have not yet been thoroughly exposed to classical logic. (It is also instructive to understand why this disparity between the two types of deduction is not in conflict with the previously mentioned links between the two. This disparity is roughly analogous to the disparity between worst-case analysis and average-case analysis.)
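The probabilistic "converse" can be verified on any concrete distribution. In this sketch the conditional probabilities are chosen arbitrarily so that $A$ elevates the likelihood of $B$:

```python
# Numeric check: if P(B|A) > P(B), then Bayes' formula forces P(A|B) > P(A).

p_a = 0.3
p_b_given_a = 0.9
p_b_given_not_a = 0.2

# P(B) by total probability, then P(A|B) by Bayes' formula.
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)
p_a_given_b = p_b_given_a * p_a / p_b

assert p_b_given_a > p_b    # A elevates the likelihood of B...
assert p_a_given_b > p_a    # ...so observing B elevates that of A.
print(round(p_a_given_b, 3))  # → 0.659
```

The implication is immediate from the formula itself: $P(A \mid B) = \bigl(P(B \mid A)/P(B)\bigr)\, P(A)$, and the bracketed ratio exceeds 1 exactly when $P(B \mid A) > P(B)$.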

Bayesian probability can be generalised further; for instance, quantum mechanics (with the Copenhagen interpretation) can be viewed as a noncommutative generalisation of Bayesian probability, though the connection to classical logic is then lost when one is dealing with observables that do not commute. But this is another story…

__________

Please note that Terence Tao’s original post contains neither links nor boldface highlighting; I took the liberty of adding those for convenience and emphasis. To improve legibility I also typeset the mathematical expressions in $\LaTeX$.

__________

Source:

Terence Tao, “A modern refinement of classical deduction is that of Bayesian probability”, Google Buzz, April 4, 2010.