\section{Permutations and Combinations}

With just a few more simple counting rules, we'll reach a critical
mass.  We'll then be able to solve a huge range of counting problems
by mixing and matching those rules.









But first, let's state the Product Rule.

\begin{theorem}[Product Rule]
If $A_1, A_2, \dots, A_n$ are sets, then:

\[
|A_1 \times A_2 \times \dots \times A_n| =
        |A_1| \cdot |A_2| \cdot \dots \cdot |A_n|
\]
\end{theorem}

In other words, the Product Rule says that the number of ways to
pick one item from $A_1$, one item from $A_2$, $\dots$, and one item from
$A_n$ is $\prod_{i=1}^n |A_i|$.
The proof is a routine induction on $n$, which we leave to the
reader.
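
For example, if $A_1 = \set{a, b}$ and $A_2 = \set{1, 2, 3}$, then
\[
A_1 \times A_2 = \set{(a,1),\ (a,2),\ (a,3),\ (b,1),\ (b,2),\ (b,3)},
\]
so $|A_1 \times A_2| = 2 \cdot 3 = 6$, just as the rule predicts.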




\noindent The situation is represented schematically below.

\bigskip
\centerline{\resizebox{!}{1.5in}{\includegraphics{lec11-king.eps}}}
\bigskip




\subsection{Indicator Random Variables}

\emph{Indicator} random variables describe experiments that detect whether
or not something happened.  Recall our running experiment of flipping three
fair, independent coins, where the random variable $M$ equals 1 if all three
coins match and 0 otherwise, and the random variable $C$ counts the number
of heads.  Then $M$ is an example of an indicator variable, indicating
whether or not all three coins match.

\begin{definition}
An \emph{indicator random variable} is a random variable that maps
every outcome to either 0 or 1.
\end{definition}

Indicator random variables are also called \emph{Bernoulli} or
\emph{characteristic} random variables.  Typically, indicator random
variables identify all outcomes that share some property
(``characteristic''): outcomes with the property are mapped to 1, and
outcomes without the property are mapped to 0.
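
If you like to check such things by machine, here is a minimal Python
sketch (the names \texttt{M} and \texttt{C} are our own, chosen to match
the notation above) that enumerates the eight outcomes of flipping three
coins and evaluates both random variables on each one:

\begin{verbatim}
from itertools import product

# Sample space: the 8 outcomes of flipping three coins.
outcomes = [''.join(flips) for flips in product('HT', repeat=3)]

def M(w):
    """Indicator: 1 if all three coins match, else 0."""
    return 1 if w in ('HHH', 'TTT') else 0

def C(w):
    """Number of heads among the three coins."""
    return w.count('H')

for w in outcomes:
    print(w, M(w), C(w))
\end{verbatim}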

\subsection{Events Defined by a Random Variable}

There is a natural relationship between random variables and events.
Recall that an event is just a subset of the outcomes in the sample
space of an experiment.

The relationship is simplest for an indicator random variable.  An
indicator random variable partitions the sample space into two blocks:
outcomes mapped to 1 and outcomes mapped to~0.  These two sets of outcomes
are events.  For example, the random variable $M$ partitions the sample
space as follows:
\[
\underbrace{\mathtt{HHH} \quad \mathtt{TTT}}_{\mbox{mapped to 1}} \quad
\underbrace{\mathtt{HHT} \quad \mathtt{HTH} \quad \mathtt{HTT} \quad
        \mathtt{THH} \quad \mathtt{THT} \quad \mathtt{TTH}}_{\mbox{mapped to 0}}
\]
Thus, the random variable $M$ defines two events, the event $[M = 1]$ that
all coins match and the event $[M = 0]$ that not all coins match.

The random variable $C$ partitions the sample space into four blocks:
\[
\underbrace{\mathtt{TTT}}_{\mbox{mapped to 0}} \quad
\underbrace{\mathtt{TTH} \quad \mathtt{THT} \quad \mathtt{HTT}}_{\mbox{mapped to 1}} \quad
\underbrace{\mathtt{THH} \quad \mathtt{HTH} \quad \mathtt{HHT}}_{\mbox{mapped to 2}} \quad
\underbrace{\mathtt{HHH}}_{\mbox{mapped to 3}} 
\]
Thus, the random variable $C$ defines the four events $[C=i]$ for $i \in
\set{0, 1, 2, 3}$.  These are the events that no coin is heads, that one
coin is heads, that two coins are heads, and finally that three coins are
heads.

A general random variable may partition the sample space into many blocks.
A block contains all outcomes mapped to the same value by the random
variable.

\iffalse
In general, suppose $P(r_1,\dots,r_n)$ is a proposition about real numbers
$r_1,\dots,r_n$, and $R_1,\dots,R_n$ are random variables.  Then we use
the notation $[P(R_1,\dots,R_n)]$ to denote the event that the values of
$R_1,\dots,R_n$ satisfy $P$.  That is,
\[
[P(R_1,\dots,R_n)] \eqdef \set{ w \in \sspace \suchthat P(R_1(w),\dots,R_n(w))}.
\]
\fi

\subsection{Probability of Events Defined by a Random Variable}

Recall that the probability of an event is the sum of the probabilities of
the outcomes it contains.  From this rule, we can compute the probability
of various events associated with a random variable.  For example, if $R :
\sspace \to \reals$ is a random variable and $x$ is a real number, then
\begin{equation*}
\pr{R = x}  =  \sum_{w \in [R = x]} \pr{w}.
\end{equation*}
In particular, we can compute $\pr{C = 2}$ as follows:
\begin{align*}
\pr{C = 2} & = \sum_{w \in [C = 2]} \pr{w} & & \text{(def of $\pr{}$)}\\
           & = \pr{\mathtt{THH}} + \pr{\mathtt{HTH}}
               + \pr{\mathtt{HHT}} & & \text{(the 3 outcomes in $[C=2]$)}\\
           & = \frac{1}{8} + \frac{1}{8} + \frac{1}{8} = \frac{3}{8}.
\end{align*}
Note that each outcome has probability $1/8$, since the three coins are
fair and independent.

Similarly, we can compute $\pr{M = 1}$ and $\pr{C \geq 2}$:
\begin{eqnarray*}
\pr{M = 1}      & = & \sum_{w \in [M = 1]} \pr{w} \\
                & = & \pr{\mathtt{HHH}} + \pr{\mathtt{TTT}}\\
                & = & \frac{1}{8} + \frac{1}{8} = \frac{1}{4}.
\\
\pr{C \geq 2}   & = &   \sum_{w \in [C \geq 2]} \pr{w} \\
                & = &   \pr{\mathtt{THH}} + \pr{\mathtt{HTH}} + \pr{\mathtt{HHT}} + \pr{\mathtt{HHH}} \\
             & = &   \frac{1}{8}+\frac{1}{8}+\frac{1}{8}+\frac{1}{8} = \frac{1}{2}.
\end{eqnarray*}
The justification for each step is the same as before.

It's common in such calculations to group outcomes by their value.
For instance, we could also have calculated:
\begin{eqnarray*}
\pr{C \geq 2}   & = &   \pr{C = 2} + \pr{C = 3} \\
                & = &   \pr{\mathtt{THH}, \mathtt{HTH}, \mathtt{HHT}} + \pr{\mathtt{HHH}} \\
                & = &   \frac{3}{8}+\frac{1}{8} = \frac{1}{2}.
\end{eqnarray*}

\iffalse
If the range of $R$ is $\set{a_0,a_1,\dots} \subset \reals$, an expression
of the form $\pr{R \geq x}$ can also be evaluated with the summation
$\sum_{a_i \geq x} \pr{R = a_i}$.  That is, instead of summing over outcomes
mapped to at least $x$ as above, we can sum probabilities over all values
of the random variable that are greater than or equal to $x$.  The result
is the same, since both summations cover the same outcomes.
\fi

Similarly, we can find the probability of the event that $C \in \set{1, 3}$:
\iffalse
\begin{eqnarray*}
\pr{C \in \{1, 3 \}}
        & = &   \sum_{w \in [C \in \set{1, 3}]} \pr{w} \\
        & = &   \pr{\mathtt{TTH}}+\pr{\mathtt{THT}}+\pr{\mathtt{HTT}} + \pr{\mathtt{HHH}} \\
        & = &   \frac{1}{8} + \frac{1}{8} + \frac{1}{8} + \frac{1}{8} = \frac{1}{2}.
\end{eqnarray*}

As in the preceding example, this probability could be evaluated by
summing probabilities over values of $C$ instead of by summing over
outcomes:
\fi
\begin{eqnarray*}
\pr{C \in \{1, 3 \}} & = &   \pr{C = 1} + \pr{C = 3} \\
                     & = &   \pr{\mathtt{TTH}, \mathtt{THT}, \mathtt{HTT}}
                              + \pr{\mathtt{HHH}} \\
        & = &   \frac{3}{8}+\frac{1}{8} = \frac{1}{2}.
\end{eqnarray*}

In general, for a set $A = \set{a_0,a_1,\dots}$ of real numbers, $\pr{R
\in A}$ can also be evaluated by summing over the values in $A$.  That is,
\[
\pr{R \in A} = \sum_{a \in A} \pr{R = a}.
\]
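
Since the sample space here has only eight equiprobable outcomes, every
probability computed in this subsection can be double-checked by brute
force.  Here is a minimal Python sketch (with our own helper names) that
does so:

\begin{verbatim}
from fractions import Fraction
from itertools import product

outcomes = [''.join(f) for f in product('HT', repeat=3)]
p = Fraction(1, 8)   # the coins are fair and independent

def C(w): return w.count('H')                      # number of heads
def M(w): return 1 if w in ('HHH', 'TTT') else 0   # all three match?

print(sum(p for w in outcomes if C(w) == 2))       # 3/8
print(sum(p for w in outcomes if M(w) == 1))       # 1/4
print(sum(p for w in outcomes if C(w) >= 2))       # 1/2
print(sum(p for w in outcomes if C(w) in (1, 3)))  # 1/2
\end{verbatim}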

\subsection{Conditional Probability}

Mixing conditional probabilities and events involving random variables
creates no new difficulties.  For example, $\prcond{C \geq 2}{M = 0}$
is the probability that at least two coins are heads ($C \geq 2$),
given that all three coins are not the same ($M = 0$).  We can compute
this probability directly from the definition of conditional probability:

\begin{eqnarray*}
\prcond{C \geq 2}{M = 0}
        & = &   \frac{\pr{C \geq 2 \land M = 0}}{\pr{M = 0}} \\
        & = &   \frac{\pr{\set{\mathtt{THH}, \mathtt{HTH}, \mathtt{HHT}}}}
                     {\pr{\set{\mathtt{THH}, \mathtt{HTH}, \mathtt{HHT},
                             \mathtt{HTT}, \mathtt{THT}, \mathtt{TTH}}}} \\
        & = &   \frac{3/8}{6/8} =  \frac{1}{2}.
\end{eqnarray*}
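
This calculation, too, is easy to confirm by enumeration.  In the sketch
below (again our own code), the conditional probability reduces to a ratio
of outcome counts because the eight outcomes are equiprobable:

\begin{verbatim}
from fractions import Fraction
from itertools import product

outcomes = [''.join(f) for f in product('HT', repeat=3)]

def C(w): return w.count('H')
def M(w): return 1 if w in ('HHH', 'TTT') else 0

# Pr[C >= 2 | M = 0] = Pr[C >= 2 and M = 0] / Pr[M = 0]
both = sum(1 for w in outcomes if C(w) >= 2 and M(w) == 0)
given = sum(1 for w in outcomes if M(w) == 0)
print(Fraction(both, given))   # 1/2
\end{verbatim}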

\subsection{Independence}

\iffalse
The notion of independence does not carry over to random variables so
easily.  In analogy to event independence, we will first define
independence for a pair of random variables and then define mutual
independence for two or more random variables.
\fi

\subsubsection{Independence for Two Random Variables}

\begin{definition}\label{ind1}
Two random variables $R_1$ and $R_2$ are \emph{independent}\footnote{This
definition works for sample spaces $\sspace = \set{w_0,w_1,\dots}$ of the
kind we consider in 6.042.  For more general sample spaces, the definition
is that
\[
\prcond{y_1 \le R_1  \le x_1}{y_2 \le R_2 \le x_2} =
                         \pr{y_1 \le R_1 \le x_1}
\]
for all $y_1,x_1,y_2, x_2 \in \reals$ and $\pr{y_2 \le R_2 \le x_2} \neq 0$.}
if for all $x_1, x_2 \in \reals$ such that $\pr{R_2 = x_2} \neq 0$,
we have:
\[
\prcond{R_1 = x_1}{R_2 = x_2} = \pr{R_1 = x_1}
\]
\end{definition}

As with independence of events, we can also formulate independence of two
random variables in terms of the conjunction of events:
\begin{definition}\label{ind2}
Two random variables $R_1$ and $R_2$ are \emph{independent} if for all
$x_1, x_2 \in \reals$, we have:
\[
\pr{R_1 = x_1 \land R_2 = x_2} = \pr{R_1 = x_1} \cdot \pr{R_2 = x_2}.
\]
\end{definition}

Definition~\ref{ind1} says that the probability that $R_1$ has a
particular value is unaffected by the value of $R_2$, reflecting the
intuition behind independence.  Definition~\ref{ind2} has the slight
technical advantage that it applies even if $\pr{R_2 = x_2} = 0$.
Otherwise, the two definitions are equivalent, and we will use them
interchangeably.

\subsubsection{Proving that Two Random Variables are Not Independent}

Are $C$ and $M$ independent?  Intuitively, no: the number of heads, $C$,
not only affects, but completely determines whether all three coins match,
that is, whether $M = 1$.  To verify this, let's use
Definition~\ref{ind2} of independence.  We must find some $x_1, x_2 \in
\reals$ such that the product condition in that definition is false.  For
example, the condition does not hold for $x_1 = 2$ and $x_2 = 1$:
\begin{eqnarray*}
\pr{C = 2 \land M = 1} = 0
        & \mbox{ but } &
\pr{C = 2} \cdot \pr{M = 1} = \frac{3}{8} \cdot \frac{1}{4} \neq 0
\end{eqnarray*}

The first probability is zero because we never have exactly two heads ($C
= 2$) when all three coins match ($M = 1$).  The other two probabilities
were computed earlier.
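
The failing pair of values is easy to confirm mechanically.  The helper
\texttt{independent} below is our own (not a library routine); it tests
the product condition of Definition~\ref{ind2} for every pair of values
the two random variables can take:

\begin{verbatim}
from fractions import Fraction
from itertools import product

outcomes = [''.join(f) for f in product('HT', repeat=3)]
pr = {w: Fraction(1, 8) for w in outcomes}

def C(w): return w.count('H')
def M(w): return 1 if w in ('HHH', 'TTT') else 0

def independent(R1, R2):
    # Pr[R1 = a and R2 = b] == Pr[R1 = a] * Pr[R2 = b] for all a, b?
    vals1 = {R1(w) for w in outcomes}
    vals2 = {R2(w) for w in outcomes}
    return all(
        sum(pr[w] for w in outcomes if R1(w) == a and R2(w) == b)
        == sum(pr[w] for w in outcomes if R1(w) == a)
        * sum(pr[w] for w in outcomes if R2(w) == b)
        for a in vals1 for b in vals2)

print(independent(C, M))   # False: the pair (2, 1) fails
\end{verbatim}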

\subsubsection{A Dice Example}

Suppose that we roll two fair, independent dice.  We can regard the
numbers that turn up as random variables, $D_1$ and $D_2$.  For
example, if the outcome is $w = (3, 5)$, then $D_1(w) = 3$ and $D_2(w)
= 5$.

Let $T = D_1 + D_2$.  Then $T$ is also a random variable, since it is
a function mapping each outcome to a real number, namely the sum of
the numbers shown on the two dice.  For outcome $w = (3, 5)$, we have
$T(w) = 3 + 5 = 8$.

Define $S$ as follows:
\[
S \eqdef \left\{
\begin{array}{cl}
1 & \mbox{if $T = 7$}, \\
0 & \mbox{if $T \neq 7$}.
\end{array}
\right.
\]
That is, $S = 1$ if the sum of the dice is 7, and $S = 0$ if the sum
of the dice is not 7.  For example, for outcome $w = (3, 5)$, we have
$S(w) = 0$, since the sum of the dice is 8.  Since $S$ is a function
mapping each outcome to a real number, $S$ is also a random variable.
In particular, $S$ is an indicator random variable, since every
outcome is mapped to 0 or 1.

The definitions of random variables $T$ and $S$ illustrate a general
rule: \emph{any function of random variables is also a random variable}.

Are $D_1$ and $T$ independent?  That is, is the sum, $T$, of the two dice
independent of the outcome, $D_1$, of the first die?  Intuitively, the
answer appears to be no.  To prove this, let's use
Definition~\ref{ind1} of independence.  We must find $x_1, x_2 \in \reals$
such that $\pr{D_1 = x_2} \neq 0$ and the condition in that definition
does not hold.

For example, we can choose $x_1 = 2$ and $x_2 = 3$:
\[
\prcond{T = 2}{D_1 = 3} = 0 \neq \frac{1}{36} = \pr{T = 2}.
\]
The first probability is zero, since if we roll a three on the first die
($D_1 = 3$), then there is no way that the sum of both dice is two ($T =
2$).  On the other hand, if we throw both dice, the probability that the
sum is two is $1/36$, since we could roll two ones.

Are $S$ and $D_1$ independent?  That is, is the indicator, $S$, of the
event that the sum of both dice is seven independent of the outcome, $D_1$,
of the first die?  Once again, intuition suggests that the answer is
``no''.  Surprisingly, however, these two random variables \emph{are}
actually independent!

Proving that two random variables are independent requires some work.
Let's use Definition~\ref{ind1} of independence based on conditional
probability.  We must show that for all $x_1, x_2$ in $\reals$ such that
$\pr{D_1 = x_2} \neq 0$:
\[
\prcond{S = x_1}{D_1 = x_2} = \pr{S = x_1}.
\]

First, notice that we only have to show the equation for values of $x_2$
such that $\pr{D_1 = x_2} \neq 0$.  This means we only have to consider
$x_2$ equal to 1, 2, 3, 4, 5, or 6.  If $x_1$ is neither 0 nor 1, then the
condition holds trivially because both sides are zero.  So it remains to
check the equation for the cases where $x_1 \in \set{0, 1}$ and $x_2 \in
\set{ 1,2,3,4,5,6}$, that is, a total of $2 \cdot 6 = 12$ cases.

Two observations make this easier.  First, there are $6 \cdot 6 = 36$
outcomes in the sample space for this experiment.  The outcomes are
equiprobable, so each outcome has probability $1/36$.  The two dice sum to
seven in six outcomes: $1+6$, $2+5$, $3+4$, $4+3$, $5+2$, and $6+1$.
Therefore, the probability of rolling a seven, $\pr{S = 1}$, is $6/36 =
1/6$.

Second, after we know the result of the first die, there is always exactly
one value for the second die that makes the sum seven.  For example, if
the first die is 2, then the sum is seven only if the second die is a 5.
Therefore, $\prcond{S = 1}{D_1 = x_2} = 1/6$ for $x_2 = 1, 2, 3, 4, 5,$ or
$6$.

These two observations establish the independence condition in six
cases:
\begin{eqnarray*}
\prcond{S = 1}{D_1 = 1} = & \cfrac{1}{6} & = \pr{S = 1} \\
\prcond{S = 1}{D_1 = 2} = & \cfrac{1}{6} & = \pr{S = 1} \\
                    &    \vdots   &  \\
\prcond{S = 1}{D_1 = 6} = & \cfrac{1}{6} & = \pr{S = 1}
\end{eqnarray*}
The remaining cases are complementary to the first six.  For
example, we know that $\pr{S = 0} = 5/6$, since the
complementary event, $S = 1$, has probability $1/6$.
\begin{eqnarray*}
\prcond{S = 0}{D_1 = 1} = & \cfrac{5}{6} & = \pr{S = 0} \\
\prcond{S = 0}{D_1 = 2} = & \cfrac{5}{6} & = \pr{S = 0} \\
                    &    \vdots   &  \\
\prcond{S = 0}{D_1 = 6} = & \cfrac{5}{6} & = \pr{S = 0}
\end{eqnarray*}

We have established that the independence condition holds for all
necessary $x_1, x_2 \in \reals$.  This proves that $S$ and $D_1$ are
independent after all!
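
Both conclusions in this example can be confirmed by the same kind of
brute-force check, this time over the 36 equiprobable outcomes of rolling
two dice.  As before, \texttt{independent} is our own helper testing the
product condition of Definition~\ref{ind2}:

\begin{verbatim}
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))  # 36 equiprobable rolls
p = Fraction(1, 36)

def D1(w): return w[0]                   # value of the first die
def T(w):  return w[0] + w[1]            # sum of both dice
def S(w):  return 1 if T(w) == 7 else 0  # indicator: is the sum seven?

def independent(R1, R2):
    vals1 = {R1(w) for w in outcomes}
    vals2 = {R2(w) for w in outcomes}
    return all(
        sum(p for w in outcomes if R1(w) == a and R2(w) == b)
        == sum(p for w in outcomes if R1(w) == a)
        * sum(p for w in outcomes if R2(w) == b)
        for a in vals1 for b in vals2)

print(independent(D1, T))  # False: T is not independent of D1
print(independent(D1, S))  # True: S and D1 really are independent
\end{verbatim}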

\subsubsection{Mutual Independence}

The definition of mutual independence for random variables is similar
to the definition for events.

\begin{definition}
Random variables $R_1, R_2, \dots$ are \emph{mutually independent} iff
\[
\pr{\lgintersect_{i} [R_i = x_i]} = \prod_{i} \pr{R_i = x_i},
\]
for all $x_1, x_2, \dots \in \reals$.
\end{definition}

For example, consider the experiment of throwing three independent, fair
dice.  Random variable $R_1$ is the value of the first die.  Random
variable $R_2$ is the sum of the first two dice, mod 6.  Random variable
$R_3$ is the sum of all three values, mod 6.  These three random variables
are mutually independent.  Can you prove it?
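
Since the sample space has only $6^3 = 216$ equiprobable outcomes, the
proof amounts to a finite check, and a few lines of Python (again with our
own names) can carry it out by testing the product condition for every
triple of values:

\begin{verbatim}
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=3))  # 216 equiprobable rolls
p = Fraction(1, 216)

def R1(w): return w[0]                      # first die
def R2(w): return (w[0] + w[1]) % 6         # first two dice, mod 6
def R3(w): return (w[0] + w[1] + w[2]) % 6  # all three dice, mod 6

ok = all(
    sum(p for w in outcomes
        if (R1(w), R2(w), R3(w)) == (a, b, c))
    == sum(p for w in outcomes if R1(w) == a)
    * sum(p for w in outcomes if R2(w) == b)
    * sum(p for w in outcomes if R3(w) == c)
    for a in range(1, 7) for b in range(6) for c in range(6))
print(ok)   # True: the three variables are mutually independent
\end{verbatim}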





\section{The Hat Check Problem}

There is a dinner party where $N$ men check their hats.  The hats are
mixed up during dinner, so that afterward each man receives a random
hat.  In particular, each man gets his own hat with probability $1/N$.
What is the expected number of men who get their own hat?

This is a fantastic problem!  Working out the solution directly from
either definition of expected value presents tremendous difficulties.
For example, the problem statement does {\em not} guarantee that all
permutations of the hats are equally likely.  Therefore, we do not
even know how to assign a probability to every outcome in the sample
space.  Given that, how can we possibly hope to solve the problem?!

Amazingly, we will get an answer by using no fewer than {\em three}
neat tricks.

\subsection{Modelling the Problem with Indicator Variables}

Let the random variable $R$ be the number of men who get their own
hat.  The first trick is to express $R$ as a sum of indicator
variables.  In particular, let $R_i$ be an indicator for the event
that the $i$-th man gets his own hat; that is, $R_i = 1$ if the $i$-th
man gets his own hat, and $R_i = 0$ if he gets the wrong hat.  The number
of men who get their own hat is the sum of these indicators:

\beqn
R & = & R_1 + R_2 + \ldots + R_N
\eeqn

\noindent Now we can take the expected value of both sides:

\beqn
\Ex{R} & = & \Ex{R_1 + R_2 + \ldots + R_N}
\eeqn
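
Before going further, it is worth seeing what simulation suggests.  The
sketch below is our own and makes a simplifying assumption that the
problem statement does not grant: that the hats come back as a uniformly
random permutation.  Under that assumption, the average number of matches
hovers around 1, whatever the value of $N$:

\begin{verbatim}
import random

def matches(n):
    # One trial: how many of n men get their own hat back, assuming
    # the hats come back as a uniformly random permutation.
    hats = list(range(n))
    random.shuffle(hats)
    return sum(1 for man, hat in enumerate(hats) if man == hat)

n, trials = 10, 100_000
print(sum(matches(n) for _ in range(trials)) / trials)  # close to 1.0
\end{verbatim}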

