emeinhardt/Probability theory resources.md

## Probability theory resources.md

      
Supplementary interactive learning resources


Peter Norvig has some Jupyter notebooks on elementary probability theory - https://nbviewer.jupyter.org/url/norvig.com/ipython/Probability.ipynb
Russell & Norvig's AI textbook has associated code in a variety of languages, including code for probabilistic inference and for some of the examples discussed in the text.
probmods.org is a web textbook (with in-browser executable code) on probabilistic models of cognition. (For those with prior functional programming experience, note that the first edition of the textbook uses a subset of Scheme rather than JavaScript as the probabilistic programming language.)

History and Motivation

If you're interested in the history of ideas, the role of probabilistic reasoning in the early to middle history of AI and of computational models of cognition, and the importance of having a systematic approach to reasoning about beliefs under uncertainty:

Russell & Norvig's AI textbook (Artificial Intelligence: A Modern Approach) has excellent chapter and section endnotes that cover the history of each topic in AI; the arc of textbook topics will also motivate the different kinds of problems that (purely) logical, probabilistic, and other methods are each well suited for.
Judea Pearl's work in the early-to-mid 1980s is one of the biggest reasons why probabilistic reasoning (and specifically Bayesian causal models) swept AI in the 1980s and 1990s. Pearl's 1988 book Probabilistic Reasoning in Intelligent Systems (esp. chapters 1-2 and 9-10) covers more on the relationship between logic and probability, particularly the approach to handling uncertainty (nonmonotonic logics) that was prevalent in the late 1970s and early 1980s.
The first part of E. T. Jaynes's book Probability Theory: The Logic of Science offers an intuitive motivation for probability theory starting from propositional logic.
Classical/Shannon information theory is basically 'the logarithm' of probability theory - among other things, it defines measures of 'uncertainty' about the outcome of a random variable and of the amount of 'information' that one variable has about another. There are several ways to start from axioms that you'd like a measure of 'uncertainty' (or 'information') to satisfy and then arrive uniquely at the classic information measures.

Chris Olah has an excellent longform article Visual Information Theory introducing basic information measures. (It also has some graphics that are great ways of visually or geometrically understanding Bayes' Theorem.)
Stone's Information Theory: A Tutorial Introduction is a very accessible elementary overview.
Cover & Thomas's Elements of Information Theory is the standard graduate introductory textbook on information theory.
If you're already comfortable with probability theory, Csiszár's 2008 article 'Axiomatic Characterizations of Information Measures' covers different ways of arriving at basic information measures.
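
To make the 'logarithm of probability theory' remark concrete, here is a minimal Python sketch of the two measures mentioned above: Shannon entropy (uncertainty about one variable's outcome) and mutual information (how much one variable tells you about another). The function names and the toy distributions are illustrative assumptions and are not taken from any of the resources listed here.

```python
import math

def entropy(dist):
    """Shannon entropy, in bits, of a distribution given as {outcome: probability}."""
    # Each outcome contributes p * (-log2 p): its probability times its 'surprisal'.
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def mutual_information(joint):
    """Mutual information, in bits, of a joint distribution {(x, y): probability}."""
    # Marginalize the joint distribution to get p(x) and p(y).
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    # Sum p(x, y) * log2( p(x, y) / (p(x) * p(y)) ) over outcomes with nonzero mass.
    return sum(
        p * math.log2(p / (px[x] * py[y]))
        for (x, y), p in joint.items()
        if p > 0
    )

# Hypothetical toy distributions, chosen only for illustration.
fair_coin = {"H": 0.5, "T": 0.5}
biased_coin = {"H": 0.9, "T": 0.1}
copied = {(0, 0): 0.5, (1, 1): 0.5}  # Y is an exact copy of a fair bit X
independent = {(x, y): 0.25 for x in (0, 1) for y in (0, 1)}  # two independent fair bits

print(entropy(fair_coin))               # 1 bit: maximal uncertainty for two outcomes
print(entropy(biased_coin))             # about 0.469 bits: the outcome is more predictable
print(mutual_information(copied))       # 1 bit: knowing X fully determines Y
print(mutual_information(independent))  # 0 bits: X says nothing about Y
```

Using base-2 logarithms gives answers in bits; switching to `math.log` would give nats instead.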