Exponential Distribution

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

The exponential distribution (with parameter \(\lambda\)) is given by the probability density function

\[\begin{split} f(x) = \left\{ \begin{array}{ccc} \lambda e^{- \lambda x} & , & x \ge 0 \\ 0 & , & x < 0 \end{array} \right. \end{split}\]

We denote the exponential distribution by \(Exp(\lambda)\). The mean and variance are given by

\[ \mu = \frac{1}{\lambda} \hspace{1in} \sigma^2 = \frac{1}{\lambda^2} \]
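
As a quick numerical check of these formulas, the sketch below draws random samples and compares the sample mean and variance with \(1/\lambda\) and \(1/\lambda^2\). Note that NumPy's `Generator.exponential` is parameterized by the scale \(1/\lambda\) rather than the rate \(\lambda\). The value \(\lambda = 2\), the seed and the sample size are arbitrary choices for illustration.

```python
import numpy as np

# Assumed values, chosen only for illustration
lam = 2.0
rng = np.random.default_rng(0)

# NumPy parameterizes the exponential distribution by scale = 1/lambda
samples = rng.exponential(scale=1/lam, size=100_000)

sample_mean = samples.mean()
sample_var = samples.var()
print('sample mean =', sample_mean, ', expected =', 1/lam)
print('sample variance =', sample_var, ', expected =', 1/lam**2)
```

The two printed pairs should agree to within sampling error, which shrinks as the sample size grows.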

Let’s plot the exponential distribution for different values of \(\lambda\).

plt.figure(figsize=(10,4))
x = np.linspace(-1,4,1000)
# PDF of Exp(lam); the Heaviside factor sets the density to 0 for x < 0
exponential = lambda x,lam: lam*np.exp(-lam*x)*np.heaviside(x,1)
for lam in [.25,.5,1,2]:
    y = exponential(x,lam)
    plt.plot(x,y)
# Raw strings keep the LaTeX backslashes from being read as escape sequences
plt.title(r'Exponential Distribution $Exp(\lambda)$')
plt.legend([r'$\lambda=1/4$',r'$\lambda=1/2$',r'$\lambda=1$',r'$\lambda=2$'])
plt.grid(True)
plt.show()

See also

Check out Wikipedia: Exponential Distribution for more information.

Example: Precipitation Data

The file precipitation.csv contains daily precipitation measured at the Vancouver Airport from 1995 to 2023. Let’s import the data, look at the first few rows and then plot the histogram of precipitation.

df = pd.read_csv('precipitation.csv')
df.head()
   day  month  year  dayofyear  precipitation
0   13      4  2023        103            0.0
1   12      4  2023        102            0.0
2   11      4  2023        101            6.2
3   10      4  2023        100            0.0
4    9      4  2023         99            9.1
df['precipitation'].hist(bins=np.arange(0,40.5,0.5),density=True)
plt.xlabel('Precipitation (mm)'), plt.ylabel('Frequency')
plt.title('Daily Precipitation (1995-2023)')
plt.grid(True)
plt.show()

Let’s focus on days with more than 2 mm of precipitation:

# Copy the filtered rows so that adding a column later does not act on a view
df = df[df['precipitation'] > 2].copy()
df['precipitation'].hist(bins=np.arange(0,40.5,0.5),density=True)
plt.xlabel('Precipitation (mm)'), plt.ylabel('Frequency')
plt.title('Days with Precipitation above 2mm (1995-2023)')
plt.grid(True)
plt.show()

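The exponential density starts at \(x = 0\), but the filtered data starts at 2 mm. To fit an exponential distribution we therefore shift the data by subtracting 2: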

df['precipitation2'] = df['precipitation'] - 2
df['precipitation2'].hist(bins=np.arange(0,40.5,0.5),density=True)
plt.xlabel('Precipitation in excess of 2mm (mm)'), plt.ylabel('Frequency')
plt.title('Days with Precipitation above 2mm (1995-2023)')
plt.grid(True)
plt.show()

Compute the sample mean and variance:

mu = df['precipitation2'].mean()
sigma2 = df['precipitation2'].var()
print('mean =',mu,', variance =',sigma2)
mean = 8.064954486345904 , variance = 71.13641043695192
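
For an exponential distribution the variance equals the square of the mean, so comparing the two gives a rough sanity check before fitting. A minimal sketch, using the values from the printed output above (copied by hand rather than recomputed from the file):

```python
# Sample statistics copied from the printed output above
mu = 8.064954486345904
sigma2 = 71.13641043695192

# For Exp(lambda), sigma^2 = mu^2, so this ratio is 1 for an exact exponential
ratio = sigma2 / mu**2
print('variance / mean^2 =', ratio)
```

A ratio near 1 is consistent with an exponential model, though it is not a formal goodness-of-fit test.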

Since \(\mu = 1/\lambda\), the reciprocal of the sample mean provides an estimate of the parameter \(\lambda\):

lam = 1/mu
print('lambda =',lam)
lambda = 0.12399326018429686
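
The reciprocal of the sample mean is also the maximum likelihood estimate: for samples \(x_1, \dots, x_n\) the log-likelihood is \(n \log \lambda - \lambda \sum x_i\), which is maximized at \(\lambda = n / \sum x_i\). The sketch below checks this numerically on synthetic data. The seed, the true rate, the grid and the names `log_likelihood`, `lam_grid` and `lam_mean` are illustrative assumptions and not part of the precipitation analysis.

```python
import numpy as np

# Synthetic data for illustration (not the precipitation data)
rng = np.random.default_rng(1)
data = rng.exponential(scale=1/0.125, size=5000)

def log_likelihood(rate, x):
    # Log-likelihood of Exp(rate) for nonnegative samples x
    return len(x)*np.log(rate) - rate*np.sum(x)

# Evaluate the log-likelihood on a grid of candidate rates (step 0.0001)
grid = np.linspace(0.05, 0.3, 2501)
values = np.array([log_likelihood(r, data) for r in grid])
lam_grid = grid[np.argmax(values)]

# Closed-form estimate: reciprocal of the sample mean
lam_mean = 1/data.mean()
print('grid maximizer =', lam_grid, ', 1/mean =', lam_mean)
```

The two estimates should agree up to the grid spacing.
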
df['precipitation2'].hist(bins=np.arange(0,40.5,0.5),density=True)
x = np.linspace(0,40,200)
y = exponential(x,lam)
plt.plot(x,y)
plt.xlabel('Precipitation in excess of 2mm (mm)'), plt.ylabel('Frequency')
plt.title('Days with Precipitation above 2mm (1995-2023)')
plt.grid(True)
plt.show()

Finally, shift the fitted density back by 2 mm so that it can be plotted over the original precipitation values:

df['precipitation'].hist(bins=np.arange(0,40.5,0.5),density=True)
x = np.linspace(0,40,400)
y = exponential(x-2,lam)
plt.plot(x,y)
plt.title('Days with Precipitation above 2mm (1995-2023)')
plt.xlabel('Precipitation (mm)'), plt.ylabel('Frequency')
plt.grid(True)
plt.show()
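
The curve in this last plot is the shifted density \(f(x - 2)\), which is zero for \(x < 2\). As a rough check that the shifted curve is still a probability density, the sketch below integrates it numerically with the trapezoid rule. The function name `shifted_exponential` and the integration range are assumptions for illustration, and `lam` is copied from the printed estimate.

```python
import numpy as np

lam = 0.12399326018429686  # estimate copied from the printed output
shift = 2                  # threshold in mm

def shifted_exponential(x, lam, shift):
    # Density of shift + Exp(lam): zero below the shift
    z = x - shift
    return lam*np.exp(-lam*z)*np.heaviside(z, 1)

# Trapezoid rule on a wide interval; the tail beyond 200 mm is negligible
x = np.linspace(shift, 200, 200001)
y = shifted_exponential(x, lam, shift)
area = np.sum((y[1:] + y[:-1])/2*np.diff(x))
print('area under shifted density =', area)
```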