Discrete Distributions#

import numpy as np
import matplotlib.pyplot as plt
import scipy.special as sp

Binomial Coefficient#

The Python package scipy.special includes the function scipy.special.factorial to compute the factorial

\[ n! = n(n-1)(n-2) \cdots (2)(1) \]

and the function scipy.special.comb to compute the binomial coefficient

\[ { n \choose k } = \frac{n!}{k!(n-k)!} \]

The binomial coefficient \({n \choose k}\) is the number of combinations of \(k\) elements chosen from a set of \(n\) elements.

For example, compute \(7!\):

sp.factorial(7)
5040.0

Compute \({5 \choose 2}\):

sp.comb(5,2)
10.0

Compute \({5 \choose 2}\) using the formula \(\frac{5!}{2!(5-2)!}\):

sp.factorial(5)/(sp.factorial(5-2)*sp.factorial(2))
10.0

Let’s plot the binomial coefficient (for fixed n) as a function of k.

n = 10
k = np.arange(0,n+1)
b = sp.comb(n,k)
plt.bar(k,b)
plt.show()
../_images/c3360296dcdfe220043baad609b81b730be73e30c8b54d942ec901b1b26b6961.png

Look at the array of binomial coefficients:

print(b)
[  1.  10.  45. 120. 210. 252. 210. 120.  45.  10.   1.]

Random Sampling#

The Python package NumPy contains a subpackage numpy.random which includes functions to generate random numbers. For example, the function np.random.randint(a,b,size=(n,m)) generates a \(n \times m\) matrix of integers sampled from the half-closed interval \([a,b)\) with equal probability.

Flipping Coins#

Simulate flipping 2 coins 5 times (where tails is 0 and heads is 1):

flips = np.random.randint(0,2,size=(5,2))
print(flips)
[[0 1]
 [0 0]
 [0 1]
 [0 1]
 [1 1]]

Rolling Dice#

Simulate rolling 3 dice 10 times:

rolls = np.random.randint(1,7,size=(10,3))
print(rolls)
[[4 6 1]
 [2 1 3]
 [6 5 2]
 [1 5 6]
 [5 1 5]
 [2 5 4]
 [3 5 2]
 [2 1 5]
 [2 1 4]
 [6 4 1]]

Drawing Cards#

Use the function numpy.random.choice to randomly choose elements from a set. For example, we can create a deck of cards as a Python list with text entries (each text entry represents a card in the standard deck) and then draw a certain number of them (without replacement).

deck = ['2H','3H','4H','5H','6H','7H','8H','9H','10H','JH','QH','KH','AH',
        '2D','3D','4D','5D','6D','7D','8D','9D','10D','JD','QD','KD','AD',
        '2S','3S','4S','5S','6S','7S','8S','9S','10S','JS','QS','KS','AS',
        '2C','3C','4C','5C','6C','7C','8C','9C','10C','JC','QC','KC','AC']

hand = np.random.choice(deck,size=5,replace=False)
print(hand)
['6D' 'QS' '9D' 'KH' '2D']

Write a list comprehension to simulate drawing 5 cards 10 times:

hands = np.array([np.random.choice(deck,size=5,replace=False) for _ in range(10)])
print(hands)
[['10S' '7S' 'KH' 'KC' '8D']
 ['10C' 'KC' '6S' '4H' '8H']
 ['9C' '4S' '9D' 'JH' 'QS']
 ['AD' 'AS' 'JS' '3H' 'QH']
 ['2S' '5S' '10S' 'QC' 'QH']
 ['10D' '8C' '4H' 'JC' 'JD']
 ['6D' '10H' 'KC' '8H' '9C']
 ['6H' '10D' 'KS' 'AC' 'QH']
 ['JS' '3H' 'AC' '4H' '8H']
 ['8D' 'QD' 'JS' '10D' '4S']]

Custom Distributions#

We can also define our own custom distributions using the fucntion np.random.choice. We specify the sample space omega and the corresponding vector of probabilities probs.

For example, let’s consider a weighted coin such that flips result in heads with probability 2/3 and tails with probability 1/3. Genreate 10 samples of flipping the weighted coin three times.

omega = [0,1]
probs = [1/3,2/3]
n = 10
t = 3
samples = np.random.choice(omega,size=(n,t),p=probs)
print(samples)
[[1 1 1]
 [1 1 0]
 [1 1 1]
 [0 0 1]
 [1 0 1]
 [1 1 0]
 [0 1 0]
 [0 0 1]
 [1 0 1]
 [1 0 1]]

Random Variables#

A random variable on a sample space \(\Omega\) is a function \(X : \Omega \to \mathbb{R}\). In the previous section, we constructed random samples as NumPy arrays where each row was the result of a random experiment. In other words, each row was an element in the sample space \(\Omega\). Apply NumPy functions such as np.sum, np.max, np.min and np.any to simulate random variables.

For example, let’s generate random samples of rolling 4 dice.

rolls = np.random.randint(1,7,size=(10,4))
print(rolls)
[[4 6 4 1]
 [4 3 2 3]
 [3 1 2 1]
 [5 1 5 3]
 [3 5 1 2]
 [6 6 4 3]
 [1 5 3 6]
 [3 6 2 4]
 [5 1 3 2]
 [5 6 3 4]]

Use the function np.sum to compute the sum of each row.

sum_rolls = np.sum(rolls,axis=1)
print(sum_rolls)
[15 12  7 14 11 19 15 15 11 18]

The parameter axis=1 specifies that we sum the values of rolls across the rows. If we had entered axis=0 then the result would be the sum down the columns, and if we don’t specify then the result is the sum of all values in the array.

We can also use np.max and np.min to comute the maximum and minimum values in each roll:

max_rolls = np.max(rolls,axis=1)
print(max_rolls)
[6 4 3 5 5 6 6 6 5 6]

The following code computes the random variable which returns 1 if the roll includes a 3 and 0 otherwise.

roll_3 = np.any(rolls == 3,axis=1).astype(int)
print(roll_3)
[0 1 1 1 1 1 1 1 1 1]

Histograms#

Now that we can compute values of random variables on sample spaces of rolling dice and flipping coins, let’s visualize results as histograms.

For example, construct the histogram of 5000 random samples of flipping 5 coins and counting the number of heads.

ncoins = 5
nsamples = 5000
flips = np.random.randint(0,2,size=(nsamples,ncoins))
heads = np.sum(flips,axis=1)

Look at the first 10 samples:

flips[:10]
array([[1, 0, 0, 0, 0],
       [1, 0, 1, 0, 0],
       [1, 1, 1, 1, 1],
       [0, 0, 1, 0, 0],
       [1, 0, 0, 0, 0],
       [0, 1, 1, 1, 1],
       [1, 1, 0, 1, 1],
       [1, 0, 1, 1, 0],
       [1, 0, 1, 0, 0],
       [0, 1, 1, 0, 1]])

Look at the sum of the first 10 samples to verify:

heads[:10]
array([1, 2, 5, 1, 1, 4, 4, 3, 2, 3])

Now let’s contruct the histogram:

freqs,bins,fig = plt.hist(heads,bins=range(0,ncoins+2),density=True,align='left',rwidth=0.5)
../_images/c74b531e6e6c17f0d32a2e15cc08cada14fb33c56e70358b2737338ad807ba6e.png

Take a closer look at the values of the parameters entered in

  • bins=range(0,ncoins+2) specifies the limits of the bins are the integers from 0 to ncoins+2 (exclusive). Therefore the bins in this case are \([0,1),[1,2),[2,3),[3,4),[4,5),[5,6)\).

  • density=True specifies that the height of each bar is the relative frequency of each value as opposed to the total count of each value.

  • align='left' and rwidth=0.5 specify that the center of each bar is located at the edge of the bin and the width of each bar is 0.5 times the width of the bin.

Finally, the function plt.hist returns 3 objects:

  • freqs is the array of relative frequencies (in other words, the height of each bar in the graph)

  • bins is the array of limits of the bins

  • fig is the Matplotlib figure object (in case we want to further modify or style the figure)

print(freqs)
[0.0294 0.1644 0.3098 0.316  0.1506 0.0298]
print(bins)
[0. 1. 2. 3. 4. 5. 6.]

We know that the probability of getting \(k\) heads when flipping \(n\) coins is given by

\[ \frac{n \choose k}{2^n} \]

Therefore the histogram above should match approximately the figure below:

n = ncoins
k = np.arange(0,n+1)
b = sp.comb(n,k)
plt.bar(k,b/2**n)
plt.show()
../_images/33eeff0828098340c28a88cd5ec79cb7a60823ce0bc001b3cb8eeb180a4b23d4.png
print(b/2**n)
[0.03125 0.15625 0.3125  0.3125  0.15625 0.03125]

Examples#

Rolling Dice#

Generate 10000 random samples of the sum of 3 dice and construct the histogram.

ndice = 3
nsides = 6
nsamples = 10000
samples = np.random.randint(1,nsides+1,size=(nsamples,ndice))
results = np.sum(samples,axis=1)
freqs,bins,fig = plt.hist(results,bins=range(ndice,nsides*ndice+2),density=True,align='left',rwidth=0.5)
plt.grid(True)
plt.show()
../_images/d33a78284899ed53d610f8962237ff719a52d37411733eea27497fff2644c97f.png
print("Relative Frequencies:",freqs)
Relative Frequencies: [0.0055 0.0147 0.029  0.0456 0.0685 0.1079 0.1102 0.1305 0.1209 0.1195
 0.0873 0.0705 0.0428 0.0294 0.0123 0.0054]

Weighted Coin#

Consider an unfair coin which flips to heads with probability 2/3. Generate 1000 random samples of 7 flips, count the number of heads and construct the histogram.

omega = [0,1]
probs = [1/3,2/3]
nsamples = 1000
nflips = 7
samples = np.random.choice(omega,size=(nsamples,nflips),p=probs)
results = np.sum(samples,axis=1)
freqs,bins,fig = plt.hist(results,bins=range(0,nflips+2),density=True,align='left',rwidth=0.5)
plt.grid(True)
plt.show()
../_images/501bcec1adbb022af6f0dad9ec88fd673fb8d3ccadf48176cba434024bb2759f.png
print(freqs)
[0.    0.001 0.032 0.141 0.263 0.301 0.204 0.058]