## What Are Probability and Information Theory?

### Introduction

Probability theory is a mathematical framework for representing uncertain statements. It provides a means of quantifying uncertainty and axioms for deriving new uncertain statements. In AI applications, we use probability theory in two major ways. First, the laws of probability tell us how AI systems should reason, so we design our algorithms to compute or approximate expressions derived using probability theory. Second, we can use probability and statistics to theoretically analyze the behavior of proposed AI systems. Probability theory is a fundamental tool of many disciplines of science and engineering.

### Why Probability?

Many branches of computer science deal mostly with entities that are entirely deterministic and certain. A programmer can usually safely assume that a CPU will execute each machine instruction flawlessly. Errors in hardware do occur, but they are rare enough that most software applications do not need to be designed to account for them. Given that many computer scientists and software engineers work in a relatively clean and certain environment, it can be surprising that machine learning makes heavy use of probability theory.

This is because machine learning must always deal with uncertain quantities and sometimes may also need to deal with stochastic (non-deterministic) quantities. Uncertainty and stochasticity can arise from many sources. Researchers have made compelling arguments for quantifying uncertainty using probability since at least the 1980s. Many of the arguments presented here are summarized from or inspired by Pearl (1988).

Nearly all activities require some ability to reason in the presence of uncertainty. In fact, beyond mathematical statements that are true by definition, it is difficult to think of any proposition that is absolutely true or any event that is absolutely guaranteed to occur. There are three possible sources of uncertainty:

1. Inherent stochasticity in the system being modeled. For instance, most interpretations of quantum mechanics describe the dynamics of subatomic particles as being probabilistic. We can also create theoretical scenarios that we postulate to have random dynamics, such as a hypothetical card game where we assume that the cards are truly shuffled into a random order.

2. Incomplete observability. Even deterministic systems can appear stochastic when we cannot observe all of the variables that drive the behavior of the system. For instance, in the Monty Hall problem, a game show contestant is asked to choose between three doors and wins a prize held behind the chosen door. Two doors lead to a goat while a third leads to a car. The outcome given the contestant’s choice is deterministic, but from the contestant’s point of view, the outcome is uncertain.

3. Incomplete modeling. When we use a model that must discard some of the information we have observed, the discarded information results in uncertainty in the model’s predictions. For instance, suppose we build a robot that can exactly observe the location of every object around it. If the robot discretizes space when predicting the future location of these objects, then the discretization makes the robot immediately become uncertain about the precise position of objects: each object could be anywhere within the discrete cell that it was observed to occupy.

In many cases, it is more practical to use a simple but uncertain rule rather than a complex but certain one, even if the true rule is deterministic and our modeling system has the fidelity to accommodate a complex rule. For instance, the simple rule “Most birds fly” is cheap to develop and is broadly useful, while a rule of the form “Birds fly, except for very young birds that have not yet learned to fly, sick or injured birds that have lost the ability to fly, flightless species of birds including the cassowary, ostrich, and kiwi...” is expensive to develop, maintain, and communicate, and after all this effort is still very brittle and prone to failure.

While it should be clear that we need a means of representing and reasoning about uncertainty, it is not immediately obvious that probability theory can provide all of the tools we want for AI applications. Probability theory was originally developed to analyze the frequencies of events. It is easy to see how probability theory can be used to study events like drawing a certain hand of cards in a game of poker. These kinds of events are often repeatable. When we say that an outcome has a probability p of occurring, it means that if we repeated the experiment (e.g., drawing a hand of cards) infinitely many times, then a proportion p of the repetitions would result in that outcome.
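
The frequency reading of probability can be checked numerically. The sketch below is only an illustration under assumed choices that are not in the text: the event "a five-card hand contains at least one ace", the function names, and the card labeling are all hypothetical. It compares the exact probability obtained by counting hands with the proportion observed over a growing number of simulated draws.

```python
import random
from math import comb

HAND_SIZE = 5
DECK_SIZE = 52
NUM_ACES = 4


def exact_prob_at_least_one_ace():
    """Exact probability by counting: 1 - P(hand contains no aces)."""
    no_ace_hands = comb(DECK_SIZE - NUM_ACES, HAND_SIZE)
    all_hands = comb(DECK_SIZE, HAND_SIZE)
    return 1 - no_ace_hands / all_hands


def estimate_prob_at_least_one_ace(trials, seed=0):
    """Proportion of simulated hands that contain at least one ace."""
    rng = random.Random(seed)
    deck = range(DECK_SIZE)  # assumption: labels 0..3 stand for the four aces
    hits = 0
    for _ in range(trials):
        hand = rng.sample(deck, HAND_SIZE)  # draw without replacement
        if any(card < NUM_ACES for card in hand):
            hits += 1
    return hits / trials


if __name__ == "__main__":
    print(f"exact: {exact_prob_at_least_one_ace():.4f}")
    for trials in (100, 10_000, 100_000):
        estimate = estimate_prob_at_least_one_ace(trials)
        print(f"{trials:>7} trials: {estimate:.4f}")
```

With more trials, the observed proportion should settle near the exact value of roughly 0.34. A finite simulation only approximates the "infinitely many repetitions" in the definition, so small deviations are expected.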
This frequency-based reasoning does not seem immediately applicable to propositions that are not repeatable. If a doctor analyzes a patient and says that the patient has a 40% chance of having the flu, this means something very different: we cannot make infinitely many replicas of the patient, nor is there any reason to believe that different replicas of the patient would present with the same symptoms yet have varying underlying conditions. In the case of the doctor diagnosing the patient, we use probability to represent a degree of belief, with 1 indicating absolute certainty that the patient has the flu and 0 indicating absolute certainty that the patient does not have the flu. The former kind of probability, related directly to the rates at which events occur, is known as frequentist probability, while the latter, related to qualitative levels of certainty, is known as Bayesian probability.

If we list several properties that we expect common-sense reasoning about uncertainty to have, then the only way to satisfy those properties is to treat Bayesian probabilities as behaving exactly the same as frequentist probabilities. For instance, if we want to compute the probability that a player will win a poker game given that she has a certain set of cards, we use exactly the same formulas as when we compute the probability that a patient has a disease given that she has certain symptoms. For more details about why a small set of common-sense assumptions implies that the same axioms must control both kinds of probability, see Ramsey (1926).

Probability can be seen as the extension of logic to deal with uncertainty. Logic provides a set of formal rules for determining what propositions are implied to be true or false given the assumption that some other set of propositions is true or false. Probability theory provides a set of formal rules for determining the likelihood of a proposition being true given the likelihood of other propositions.
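
To make the "same formulas" point concrete, here is a minimal sketch that applies one function to both kinds of question. It uses Bayes' rule, a standard identity for conditional probability that the text above does not state explicitly, so treat it as one possible illustration. The function name and every number are hypothetical. The diagnosis inputs were picked so that the result matches the 40% figure mentioned earlier.

```python
def posterior(prior, p_evidence_if_true, p_evidence_if_false):
    """P(hypothesis | evidence) computed with Bayes' rule."""
    # Total probability of seeing the evidence under both possibilities.
    p_evidence = (p_evidence_if_true * prior
                  + p_evidence_if_false * (1 - prior))
    return p_evidence_if_true * prior / p_evidence


# Degree-of-belief question: does this patient have the flu, given symptoms?
# Hypothetical inputs: 25% prior belief, symptoms seen in 80% of flu cases
# and in 40% of non-flu cases.
p_flu = posterior(prior=0.25, p_evidence_if_true=0.8, p_evidence_if_false=0.4)

# Frequency question: does the player win, given a certain set of cards?
# Hypothetical inputs: she wins half of all games, holds such cards in 60%
# of the games she wins and in 30% of the games she loses.
p_win = posterior(prior=0.5, p_evidence_if_true=0.6, p_evidence_if_false=0.3)

print(f"P(flu | symptoms) = {p_flu:.2f}")
print(f"P(win | cards)    = {p_win:.2f}")
```

The function does not know whether its inputs are long-run frequencies or degrees of belief. The arithmetic is identical in both cases, which is the practical meaning of treating Bayesian and frequentist probabilities as obeying the same rules.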