Open In Colab

We’re on a bit of a roll with the basics, so let’s do a review/examples of common ideas in sampling real quick:

  • Inverse CDF trick
  • Rejection sampling
  • Importance sampling

We’ll also touch on the central limit theorem. We might also come back to Markov chain Monte Carlo later.

# But first, imports.
from typing import Union
import warnings

import numpy as np
import pandas as pd
import plotnine as gg
from scipy import stats


Inverse CDF trick

Say we’re given a density such that its cumulative distribution function is invertible.

If we have , then by change of variables:

which implies that .


Given a density

So the CDF is

Inverting this on :

num_samples = 100000
ys = np.random.rand(num_samples)
xs = np.arccos(1 - 2 * ys)
ps = 0.5 * np.sin(xs)

df = pd.DataFrame({
    'x': xs,
    'p': ps,
p = (gg.ggplot(df)
     + gg.aes(x='x')
     + gg.geom_histogram(gg.aes(y='stat(density)'), bins=30, color='black', fill='pink')
     + gg.geom_line(gg.aes(y='p'), size=2)


Rejection sampling

Say I’m given a pdf , but I can’t get its inverse CDF for whatever reason. But, there’s another distribution that I do know how to sample from, and I have everywhere in the support of . Then we can use rejection sampling:

  1. Draw .
  2. Accept with probability .

The rejection probability will be

so clearly if we want an efficient sampling procedure we want to pick such that we don’t require too large.


Let’s consider the density:

Say I know how to sample from the normal distribution .

Let’s do rejection sampling.

def p(x: Union[float, np.floating, np.ndarray]) -> Union[float, np.ndarray]:
  """Piecewise definition of p(x)=1/pi sin^2(x) for x in [0, 2pi]."""
  lower_bound = 0
  upper_bound = 2 * np.pi
  y = np.sin(x)**2 / np.pi

  if isinstance(x, np.ndarray):
    y[x < 0] = 0.
    y[x > 2*np.pi] = 0.
  elif x < lower_bound or x > upper_bound:
    return 0.

  return y
# Define q, the distribution we can sample from.
normal = stats.distributions.norm(loc=np.pi)
q = normal.pdf

# Set k=10 (we'll see that this is *just* enough).
k = 10

# Plot these distributions
xs = np.linspace(-np.pi, 3*np.pi, 1000)
qs = k * q(xs)
ps = p(xs)

df = pd.DataFrame({
    r'$p(x)$': ps,
    r'$kq(x)$': qs,
    'x': xs

df = pd.melt(df, id_vars=['x'], var_name='distribution', value_name='density')

plot = (gg.ggplot(df)
        + gg.aes(x='x', y='density', color='distribution') 
        + gg.geom_line(size=2)


# Do rejection sampling.
num_samples = 10000

samples = normal.rvs(size=num_samples)
ratios = p(samples) / (k * q(samples))
accept = np.random.rand(num_samples) < ratios
accepted_samples = samples[accept]

print('Acceptance rate:', len(accepted_samples) / num_samples, '(k={}).'.format(k))
Acceptance rate: 0.0981 (k=10).
samples_df = pd.DataFrame({'x': accepted_samples})
plot + gg.geom_histogram(data=samples_df, mapping=gg.aes(x='x', y='stat(density)', inherit_aes=False), 
                         bins=50, color='black', fill='teal', alpha=0.6)


Importance sampling

Often we don’t care so much about drawing samples from some distribution but rather to compute expectations of things. This motivates importance sampling:

This allows us to generate Monte-Carlo estimates of w.r.t. by drawing samples from a completely different distribution .


In this example, we’ll use samples from the exponential distribution to compute the mean of the beta distribution .

p = stats.distributions.beta(a=2, b=4)
q = stats.distributions.expon()

xs = np.linspace(0, 5, num=1000)
ps = p.pdf(xs)
qs = q.pdf(xs)

df = pd.DataFrame({
    'x': xs,
    'p': ps,
    'q': qs,

df = pd.melt(df, id_vars=['x'], var_name='distribution', value_name='density')
gg.ggplot(df) + gg.aes(x='x', y='density', color='distribution') + gg.geom_line(size=2)


num_samples = 10000
samples = q.rvs(size=num_samples)
mean = np.mean(samples * p.pdf(samples) / q.pdf(samples))

print('Importance weighted estimate: {mean:.4f}. True mean: {true_mean:.4f}.'.format(mean=mean, true_mean=p.mean()))
Importance weighted estimate: 0.3357. True mean: 0.3333.

Central limit theorem

Say we draw a dataset of samples i.i.d. from some finite-variance distribution :

The central limit theorem (CLT) states that the sample mean

is a normally-distributed random variable:

Note that this holds regardless of – an astonishing result.


To demonstrate this we’ll draw a bunch of different datasets (trials) from an exponential distribution and compute the statistics of the sample mean, and compare them to the distribution we expect from the CLT.

# Empirically compute a histogram of the sample mean for various trials.

num_samples = 100  # Size of D.
num_trials = 1000

p = stats.distributions.expon()
results = []
for _ in range(num_trials):
  result = p.rvs(size=num_samples).mean()

df = pd.DataFrame({
    r'$\mu$': np.array(results)
# CLT says we should expect the sample mean to follow this distribution.
normal = stats.distributions.norm(loc=p.mean(), scale=np.sqrt(p.var() / num_samples))
xs = np.linspace(0.6, 1.4, num=1000)

p_df = pd.DataFrame({
  'x': xs,
  'p': normal.pdf(xs)
plot = (gg.ggplot(df)
        + gg.geom_line(data=p_df, mapping=gg.aes(x='x', y='p'), size=2, linetype='dashed')
        + gg.geom_histogram(mapping=gg.aes(x='$\mu$', y='stat(density)'), bins=20, colour='black', fill='beige', alpha=0.6)