We want to formalize the assignment of probabilities to events in a sample space.

Roadmap:

  1. Introduce sigma algebras (also called sigma fields) as the structure used to formalize events in a sample space: the events to which we can assign a “probability.”
  2. Define probability measures to assign probabilities to those events.

Sigma Fields

Definition

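A collection $\mathcal{F}$ of subsets of $\Omega$ is a sigma-field on $\Omega$ if the following hold:

  • $\emptyset \in \mathcal{F}$ (Contains the empty set)
  • $A \in \mathcal{F} \implies \Omega \setminus A \in \mathcal{F}$ (Closed under complements relative to $\Omega$)
  • $A_1, A_2, \dots \in \mathcal{F} \implies \bigcup_{i=1}^{\infty} A_i \in \mathcal{F}$ (Closed under countable unions)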
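
On a finite sample space the three conditions can be checked mechanically. The following is a minimal sketch, not part of the original notes: the function name `is_sigma_field` and the example families are hypothetical, and it relies on the fact that a finite family closed under pairwise unions is closed under countable unions (a countable union of sets from a finite family involves only finitely many distinct sets).

```python
from itertools import combinations


def is_sigma_field(omega, family):
    """Check the sigma-field axioms for a family of subsets of a FINITE omega."""
    omega = frozenset(omega)
    family = {frozenset(a) for a in family}
    # Every member must be a subset of omega.
    if any(not a <= omega for a in family):
        return False
    # Axiom 1: contains the empty set.
    if frozenset() not in family:
        return False
    # Axiom 2: closed under complements relative to omega.
    if any(omega - a not in family for a in family):
        return False
    # Axiom 3: on a finite family, pairwise unions suffice for countable unions.
    return all(a | b in family for a, b in combinations(family, 2))


omega = {1, 2, 3}
trivial = [set(), {1, 2, 3}]
coarse = [set(), {1}, {2, 3}, {1, 2, 3}]
broken = [set(), {1}, {1, 2, 3}]  # missing the complement {2, 3}

print(is_sigma_field(omega, trivial))  # True
print(is_sigma_field(omega, coarse))   # True
print(is_sigma_field(omega, broken))   # False
```

The `coarse` family shows that a sigma-field can be much smaller than the collection of all subsets: it can only distinguish "the throw was 1" from "the throw was not 1."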

Motivation

The basis for probability will be the assignment of numerical probability values to certain subsets of a sample space. We will demand that some relationships hold between set operations on those subsets and the corresponding probability values we assign.

Let $\Omega$ be our sample space. Loosely speaking, it contains the things that can happen in a single trial of an experiment.

If we are throwing a 3-sided die: $\Omega = \{1, 2, 3\}$

On this sample space we want to be able to ask questions about the outcome of a single trial. Using a random variable $X$ that records the result of the first throw:

  • What is the probability that the first throw is 1?
  • What is the probability the first throw is nothing?
  • What is the probability the first throw is 3 OR 1?
  • What is the probability the first throw is 1 OR 2 OR 3?

These questions correspond to asking for the probabilities assigned to the following subsets of $\Omega$, respectively: $\{1\}$, $\emptyset$, $\{1, 3\}$, $\{1, 2, 3\}$.

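On just $\Omega$, we cannot ask “AND” questions that span several throws (for example, “the first throw is 1 AND the second throw is 2”), as the sample space for such questions is a larger one, such as $\Omega \times \Omega$.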

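An initial attempt to answer the first underlying question is to consider the power set $2^{\Omega}$, which is the set of all subsets of $\Omega$.

For our example above: $2^{\Omega} = \{\emptyset, \{1\}, \{2\}, \{3\}, \{1, 2\}, \{1, 3\}, \{2, 3\}, \{1, 2, 3\}\}$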

It feels like any question we can pose on a single trial of our “tossing a three-sided die once” experiment can be phrased by asking about the measure assigned to some element of this power set, at least without worrying about how to formalize that assignment for now.
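
To make the power set of the three-sided die concrete, here is a short sketch that enumerates it. The helper name `power_set` is an assumption made for illustration; it is not notation from these notes.

```python
from itertools import combinations


def power_set(omega):
    """Return every subset of a finite set, as a list of frozensets."""
    items = sorted(omega)
    return [
        frozenset(c)
        for r in range(len(items) + 1)
        for c in combinations(items, r)
    ]


events = power_set({1, 2, 3})
print(len(events))  # 8, i.e. 2**3 subsets

# The four questions from the list above each name one of these subsets.
questions = [frozenset({1}), frozenset(), frozenset({1, 3}), frozenset({1, 2, 3})]
print(all(q in events for q in questions))  # True
```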

The problem with power sets

The problem is that a power set is not abstract enough for our purposes. It happens to work for discrete sample spaces, but what about continuous regions?

In a continuous sample space, we cannot assign a probability to every subset of $\Omega$ unless we accept inconsistencies like the Banach–Tarski paradox.

The power set is too general and too large, so we restrict our attention to a selected collection of subsets which satisfies some properties.

Measurable Space

The pair $(\Omega, \mathcal{F})$ is called a measurable space, where $\Omega$ is a set and $\mathcal{F}$ is a sigma-field on $\Omega$.

In the context of probability theory, $\Omega$ is called the sample space, $\mathcal{F}$ is called the collection of all events, and an $A \in \mathcal{F}$ is called an event.

Sample Spaces

The interpretation of a sample space $\Omega$ is that it represents the set of possible outcomes for some experiment. Often we assume that our sample space keeps track of more things than we actually care about. We can think of the sample space as representing how the whole universe unfolds, and from that we can deduce what we care about regarding our experiment. We will construct probability theory in such a way that tracking extra irrelevant information does not affect the answers we would derive for questions we care about. Thus, coming up with different sample spaces for the same experiment is fine, and sometimes even helpful!

Examples

  1. If our experiment is “flipping a coin and recording whether it came up heads or tails” then the sample space might be $\Omega = \{H, T\}$.

  2. If our experiment is “flipping a coin 100 times and recording whether it came up heads or tails each time” then the sample space might be $\Omega = \{H, T\}^{100}$.

  3. If we are ultimately interested in the outcome of a single coin flip, we could use the sample space $\{H, T\}$, where the outcome we care about is the only thing one could deduce from the sample space, or we could use the sample space $\{H, T\}^{100}$, where the outcome we care about can be deduced from the first of the 100 flips via the projection function that maps a tuple of length 100 to its first component: $\pi_1(\omega_1, \dots, \omega_{100}) = \omega_1$, effectively throwing away other irrelevant information. Similar projection functions can be used when discussing discrete stochastic processes.

  4. Similarly, if we are interested in the result of flipping a coin 100 times and then counting the number of H, we could either choose our sample space to record the outcome of every flip, $\{H, T\}^{100}$, and then reason about the quantity we care about via a random variable $X : \{H, T\}^{100} \to \{0, 1, \dots, 100\}$ given by $X(\omega) = \sum_{i=1}^{100} \mathbb{1}[\omega_i = H]$, or, perhaps more efficiently, we could choose our sample space to be $\{0, 1, \dots, 100\}$ to directly record the count.

  5. If our experiment is “rolling a 6-sided die and recording the number on the top face as well as the air temperature, the number of people in the room and the colour of the shirt of the person on our right” then the sample space might be $\Omega = \{1, \dots, 6\} \times \mathbb{R} \times \mathbb{N} \times \{\text{colours}\}$.

So, it seems like a sample space $\Omega_1$ is “at least as general” as another sample space $\Omega_2$ describing the same experiment if there is a suitable surjective function from $\Omega_1 \to \Omega_2$. Examples of such functions we’ve seen so far include projections onto components of a tuple, and aggregation of the values of components of a tuple.
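
The projection and aggregation maps from the coin-flip examples can be sketched in a few lines. This is an illustration under stated assumptions: it uses 3 flips instead of 100 so the sample space can be enumerated, and the names `first_flip` and `count_heads` are hypothetical.

```python
from itertools import product

n = 3  # assumption: a small stand-in for the 100 flips in the examples
omega_big = list(product("HT", repeat=n))  # every sequence of n flips


def first_flip(outcome):
    """Projection onto the first component of the tuple."""
    return outcome[0]


def count_heads(outcome):
    """Aggregation: the number of H in the tuple."""
    return sum(1 for flip in outcome if flip == "H")


# Both maps are surjective onto the smaller sample spaces.
image_first = {first_flip(w) for w in omega_big}
image_count = {count_heads(w) for w in omega_big}

print(len(omega_big))              # 8
print(image_first == {"H", "T"})   # True
print(image_count == {0, 1, 2, 3}) # True
```

Surjectivity is what guarantees that every outcome of the smaller sample space is accounted for by at least one outcome of the larger one.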

Probability Measures

Recall a measurable space $(\Omega, \mathcal{F})$ consists of a set $\Omega$ and a sigma-field $\mathcal{F}$ on $\Omega$.

We’d like to formally assign probabilities to elements of our sigma fields. To do this we will make use of measures.

There are a number of things we want out of a probability measure:

  • Additivity
  • Add up to 1

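Let $\mathcal{A}$ be an algebra. A set function $\mu : \mathcal{A} \to [0, \infty]$ is a finitely additive measure on $\mathcal{A}$ if, for $A_1, \dots, A_n \in \mathcal{A}$ disjoint, with $\bigcup_{i=1}^{n} A_i \in \mathcal{A}$, we have $\mu\left(\bigcup_{i=1}^{n} A_i\right) = \sum_{i=1}^{n} \mu(A_i)$.

A measure on an algebra $\mathcal{A}$ is a set function $\mu : \mathcal{A} \to [0, \infty]$ that extends finite additivity to include countably infinitely many events, i.e. if $A_1, A_2, \dots \in \mathcal{A}$ are such that $A_i \cap A_j = \emptyset$ for $i \neq j$ and if $\bigcup_{i=1}^{\infty} A_i \in \mathcal{A}$, then the measure is additive: $\mu\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} \mu(A_i)$.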

What is the fuss about measures on sigma algebras over measures on algebras? In a regular algebra the union of any finitely many sets is still in the algebra, BUT there is no guarantee that the union of infinitely many sets from the algebra is still in the algebra. We do have this guarantee in the case of a sigma algebra: the union of countably many sets will be in the sigma algebra, and so countable additivity always makes sense. A sigma algebra is a more restrictive thing.

So if $\mu(\Omega) = 1$ and $\mu$ is a measure, then $\mu$ is a probability measure: a measure with “total mass one.”
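
On a finite sample space, additivity and total mass one can be verified directly. The sketch below assumes a fair three-sided die (the uniform measure is an assumption, not something these notes specify) and uses hypothetical helper names. It checks additivity on disjoint pairs, which on a finite space is enough to give finite and hence countable additivity.

```python
from fractions import Fraction
from itertools import combinations

omega = frozenset({1, 2, 3})


def power_set(s):
    """Every subset of a finite set, as frozensets."""
    items = sorted(s)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def uniform(event):
    """Assumed fair die: each outcome carries mass 1/3."""
    return Fraction(len(event), len(omega))


def squared(event):
    """A non-example: total mass one, but not additive."""
    return Fraction(len(event), len(omega)) ** 2


def is_probability_measure(mu, events):
    # Total mass one, and the empty set has measure zero.
    if mu(frozenset()) != 0 or mu(omega) != 1:
        return False
    # Non-negativity.
    if any(mu(a) < 0 for a in events):
        return False
    # Additivity on every pair of disjoint events.
    return all(mu(a | b) == mu(a) + mu(b)
               for a, b in combinations(events, 2) if not (a & b))


events = power_set(omega)
print(is_probability_measure(uniform, events))  # True
print(is_probability_measure(squared, events))  # False
print(uniform(frozenset({1, 3})))               # 2/3
```

The `squared` example shows that "adds up to 1" alone is not enough: it assigns 4/9 to $\{1, 2\}$ but only 1/9 to each of $\{1\}$ and $\{2\}$.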