### Introduction

Probability theory is a mathematical framework for representing uncertain statements. It provides a means of quantifying uncertainty as well as axioms for deriving new uncertain statements. In AI applications, we use probability theory in two major ways. First, the laws of probability tell us how AI systems should reason, so we design our algorithms to compute or approximate expressions derived using probability theory. Second, we can use probability and statistics to theoretically analyze the behavior of proposed AI systems. Probability theory is a fundamental tool of many disciplines of science and engineering.

### Why Probability?

Many branches of computer science deal mostly with entities that are entirely deterministic and certain. A programmer can usually safely assume that a CPU will execute each machine instruction flawlessly. Errors in hardware do occur but are rare enough that most software applications never need to be designed to account for them.

Given that many computer scientists and software engineers work in a relatively clean and certain environment, it can be surprising that machine learning makes heavy use of probability theory. This is because machine learning must always deal with uncertain quantities and sometimes must also deal with stochastic (nondeterministic) quantities. Uncertainty and stochasticity can arise from many sources. Researchers have made compelling arguments for quantifying uncertainty using probability since at least the 1980s; many of the arguments presented here are summarized from or inspired by Pearl (1988). Nearly all activities require some ability to reason in the presence of uncertainty. In fact, beyond mathematical statements that are true by definition, it is difficult to think of any proposition that is absolutely true or any event that is absolutely guaranteed to occur. There are three possible sources of uncertainty:

1. Inherent stochasticity in the system being modeled. For example, most interpretations of quantum mechanics describe the dynamics of subatomic particles as being probabilistic. We can also create theoretical scenarios that we postulate to have random dynamics, such as a hypothetical card game in which we assume the cards are truly shuffled into a random order.

2. Incomplete observability. Even deterministic systems can appear stochastic when we cannot observe all the variables that drive the system's behavior.

3. Incomplete modeling. When we use a model that must discard some of the information we have observed, the discarded information results in uncertainty in the model's predictions.
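The hypothetical card game mentioned under the first source can be sketched with Python's standard `random` module. The deck encoding and five-card hand below are illustrative choices, not anything prescribed by the text; the point is only that a truly random shuffle makes each deal inherently stochastic even though the dealing procedure is fully specified.

```python
import random

# Build a standard 52-card deck: 13 ranks x 4 suits.
ranks = "23456789TJQKA"
suits = "shdc"  # spades, hearts, diamonds, clubs
deck = [r + s for r in ranks for s in suits]

rng = random.Random()  # unseeded: each run yields a different order
rng.shuffle(deck)      # the postulated "truly random" shuffle
hand = deck[:5]        # deal a 5-card hand
print(hand)
```

Running the script twice will almost certainly print two different hands, which is exactly the stochasticity we postulate when we model the game.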

Given that we need a means of representing and reasoning about uncertainty, it is not immediately obvious that probability theory can provide all the tools we want for AI applications. Probability theory was originally developed to analyze the frequencies of events. It is easy to see how probability theory can be used to study events like drawing a certain hand of cards in a game of poker. These kinds of events are often repeatable. When we say that an outcome has a probability p of occurring, it means that if we repeated the experiment (e.g., drawing a hand of cards) infinitely many times, then a proportion p of the repetitions would result in that outcome. This kind of reasoning does not seem immediately applicable to propositions that are not repeatable. If a doctor analyzes a patient and says that the patient has a 40% chance of having the flu, this means something very different: we cannot make infinitely many replicas of the patient, nor is there any reason to believe that different replicas of the patient would present with the same symptoms yet have varying underlying conditions. In the case of the doctor diagnosing the patient, we use probability to represent a degree of belief, with 1 indicating absolute certainty that the patient has the flu and 0 indicating absolute certainty that the patient does not have the flu. The former kind of probability, related directly to the rates at which events occur, is

known as frequentist probability, while the latter, related to qualitative levels of certainty, is known as Bayesian probability. If we list several properties that we expect common sense reasoning about uncertainty to have, then the only way to satisfy those properties is to treat Bayesian probabilities as behaving exactly the same as frequentist probabilities. For example, if we want to compute the probability that a player will win a poker game given that she has a certain set of cards, we use exactly the same formulas as when we compute the probability that a patient has a disease given that she has certain symptoms. For more details about why a small set of common sense assumptions implies that the same axioms must control both kinds of probability, see Ramsey (1926).

Probability can be seen as the extension of logic to deal with uncertainty. Logic provides a set of formal rules for determining what propositions are implied to be true or false given the assumption that some other set of propositions is true or false. Probability theory provides a set of formal rules for determining the likelihood of a proposition being true given the likelihood of other propositions.
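The frequentist reading above, that a proportion p of repetitions produces the outcome, can be checked numerically. The following sketch (the deck encoding, seed, and trial count are illustrative assumptions) estimates the probability that a single drawn card is a heart, which is 13/52 = 0.25:

```python
import random

# One repetition of the experiment: draw a single card from a full deck.
# The outcome of interest is "the card is a heart", with probability 0.25.
rng = random.Random(0)  # seeded so the estimate is reproducible
deck = [(rank, suit) for rank in range(13) for suit in "shdc"]

trials = 100_000
hits = 0
for _ in range(trials):
    card = rng.choice(deck)
    if card[1] == "h":
        hits += 1

estimate = hits / trials
print(estimate)  # approaches 0.25 as `trials` grows
```

The gap between the estimate and 0.25 shrinks as the number of repetitions increases, which is exactly the long-run-frequency interpretation described above.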
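Because Bayesian probabilities obey the same formulas as frequentist ones, the doctor's degree of belief can be updated with exactly the machinery used for repeatable events. Below is a sketch of Bayes' rule applied to the flu diagnosis; the prior and likelihood numbers are made up for illustration and are not medical data.

```python
# Bayes' rule: P(flu | symptom) = P(symptom | flu) P(flu) / P(symptom),
# where P(symptom) = P(symptom | flu) P(flu) + P(symptom | not flu) P(not flu).

p_flu = 0.05            # hypothetical prior degree of belief in flu
p_sym_given_flu = 0.90  # hypothetical likelihood of the symptom given flu
p_sym_given_not = 0.20  # hypothetical likelihood of the symptom otherwise

p_sym = p_sym_given_flu * p_flu + p_sym_given_not * (1 - p_flu)
p_flu_given_sym = p_sym_given_flu * p_flu / p_sym
print(round(p_flu_given_sym, 3))  # roughly 0.19 with these numbers
```

Observing the symptom raises the degree of belief from 0.05 to roughly 0.19: the same arithmetic that governs card-drawing frequencies governs the update of a non-repeatable proposition.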