Lecture 4: Sampling non-uniform distributions

One-dimensional distribution sampling

Introduction

A computer knows how to produce a good random number $\Upsilon$ in $[0,1]$. This is perfect if one wants to sample a uniform distribution. But what do we do if the distribution we want to sample is not uniform? We shall illustrate this with some examples. There are many efficient ways to sample a Gaussian distribution.

Markov sampling of a Gaussian distribution

One way is to use $\Upsilon =$ random.uniform(0,1) to decide the accepted moves in a Markov-chain Metropolis algorithm, according to the Metropolis rule. Here is the naive Markov-chain sampling algorithm.

In [4]:
import random, math, pylab
%matplotlib inline

x = 0.0
data = []
delta = 0.5
n_reject =0
for k in xrange(100000):
    x_new = x + random.uniform(-delta, delta)
    # Metropolis rule: accept with probability min(1, pi(x_new) / pi(x))
    if random.uniform(0.0, 1.0) < \
         math.exp(-x_new ** 2 / 2.0) / math.exp(-x ** 2 / 2.0):
        x = x_new  
    else: 
         n_reject += 1       
    #print x        
    data.append(x)
print 'rejected moves', n_reject   
pylab.title('Markov sampling of a Gaussian distribution')
pylab.xlabel('x')
pylab.hist(data,bins=100,normed=True)
pylab.savefig('gauss_markov.png')
pylab.show()
rejected moves 10102

Direct sampling of a Gaussian distribution

But it is not necessary to run a Markov chain if the distribution function can be "boxed". In this case we can do direct sampling, as in the children's pebble game on the Monte Carlo beach.

In [5]:
import random, math,pylab
%matplotlib inline

data = []
y_max = 1.0 / math.sqrt(2.0 * math.pi)
x_cut = 5.0
n_data = 100000
n_accept = 0
n_reject =0
while n_accept < n_data:
    y = random.uniform(0.0, y_max)
    x = random.uniform(-x_cut, x_cut)
    # accept the pebble if it falls below the Gaussian curve pi(x)
    if y < math.exp(- x ** 2 / 2.0) / math.sqrt(2.0 * math.pi): 
        n_accept += 1
        #print x
        data.append(x)
    else:
        n_reject += 1
print 'rejected moves', n_reject       
pylab.title('Direct-reject sampling of a Gaussian distribution')
pylab.xlabel('x')
pylab.xlim(-5, 5)
pylab.hist(data,bins=100,normed=True)
pylab.savefig('gauss_reject.png')
pylab.show()
rejected moves 300102

In this case direct sampling works nicely. However, the rejection rate may become very high, especially when the boxing is an ill-defined procedure. Let's consider, for example, another not-so-nice distribution.

Nasty case

Let's consider a function that is difficult to bound because it diverges at some point, for instance $\pi(x) = 1/(2\sqrt{x})$ for $x$ in $(0,1)$.

Direct Sampling

In [6]:
import random, math, pylab

y_max = 500.0
x_cut = 1.0
n_data = 10000
data = []
n_accept = 0
n_reject = 0
while n_accept < n_data: 
    y = random.uniform(0.0, y_max)
    x = random.uniform(0.0, x_cut)
    # accept the pebble if it falls below pi(x) = 1/(2 sqrt(x))
    if y < 1.0 / (2.0 * math.sqrt(x)):
        n_accept += 1
        data.append(x)
    else:
        n_reject += 1
        
print 'acceptance ratio', float(n_accept)/float(n_reject)
pylab.hist(data, bins=100, normed=True)
x = [a / 100.0 for a in xrange(1, 100)]
y = [1.0 / (2.0 * math.sqrt(a)) for a in x]
pylab.plot(x, y, 'red', linewidth = 2)
pylab.title('Theoretical distribution $\pi(x)={1}/{(2 \sqrt{x})}$ and normalized\
    \n histogram for '+str(n_accept)+' accepted samples',fontsize=16)
pylab.xlabel('$x$', fontsize=18)
pylab.ylabel('$\pi(x)$', fontsize=18)
pylab.show()
acceptance ratio 0.00199449439766

Markov Chain

In [7]:
import random, math, pylab

x = 0.2
delta = 0.5
data = []
y_max = 0
n_trials = 10000
for k in xrange(n_trials):
    x_new = x + random.uniform(-delta, delta)
    if x_new > 0.0 and x_new < 1.0:
        # Metropolis rule for pi(x) = 1/(2 sqrt(x)): ratio pi(x_new)/pi(x) = sqrt(x)/sqrt(x_new)
        if random.uniform(0.0, 1.0) < math.sqrt(x) / math.sqrt(x_new):
            x = x_new
    # track the running maximum of 1/sqrt(x), showing the divergence near x = 0
    if 1.0 / math.sqrt(x) > y_max:
        y_max = 1.0 / math.sqrt(x)
        print y_max, x, k
    data.append(x)

pylab.hist(data, bins=100, normed=True)
pylab.xlabel('$x$', fontsize=16)
pylab.ylabel('$\pi(x)$', fontsize=16)
x = [a / 100.0 for a in xrange(1, 101)]
y = [0.5 / math.sqrt(a) for a in x]
pylab.plot(x, y, linewidth=1.5, color='r')
pylab.title('Theoretical distribution $\pi(x)={1}/{(2 \sqrt{x})}$ and normalized\
    \n histogram for '+str(len(data))+' samples',fontsize=16)
pylab.show()
2.2360679775 0.2 0
7.4942128026 0.017805245168 2
10.8943900969 0.00842547037467 81
13.3507233522 0.00561035582734 154
34.2400528486 0.000852964875015 810
61.4187973727 0.000265092464851 1238

Can we do better? Yes. Let's start with a discrete-distribution problem: Alberto's Saturday-night problem.

Tower Sampling

Discrete distribution

The question is: what to do this evening? We have a finite set of $K$ possible activities to choose from; each activity $k$ has a probability $\pi_k$ (expressing your desire to pick it).

Direct sampling

A possibility is to box these activities, the size of each box being proportional to $\pi_k$, and then "throw pebbles", as portrayed in this picture.

As for the Gaussian direct sampling, the method works, but there is a rejection rate, because some pebbles fall outside the boxes. The expected number of pebbles per accepted sample is of order $\pi_{\max}/\pi_{\text{av}}$, which can sometimes become large. Can we find a method which is rejection-free?
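Here is a minimal sketch (an illustration added here, not part of the original lecture code) of this pebble-throwing rejection method for a discrete set of weights; the weights themselves are an arbitrary illustrative choice (the same ones used in the tower-sampling cell further below):

import random

# illustrative weights pi_k for the K activities (an arbitrary choice)
weights = [0.4, 0.3, 0.8, 0.1, 0.2]
pi_max = max(weights)
n_samples = 10000
samples = []
n_reject = 0
while len(samples) < n_samples:
    k = random.randint(0, len(weights) - 1)       # pick a box uniformly
    if random.uniform(0.0, pi_max) < weights[k]:  # does the pebble land inside it?
        samples.append(k)
    else:
        n_reject += 1
print 'rejected pebbles', n_reject

With these weights, roughly $1 - \pi_{\text{av}}/\pi_{\max} = 1 - 0.36/0.8 = 55\%$ of the pebbles are rejected.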

Discrete Tower Sampling

The idea is to pile up the boxes (each one representing a different activity/choice) to form a tower of height 1. Again, the size of each box is proportional to $\pi_k$.

Now we know for sure that each pebble selects an activity, so the rejection rate is zero. The procedure is simple:

1) Pile up the boxes such that $\Pi_1 = \pi_1$, $\Pi_2 = \pi_1 + \pi_2$, ..., up to $\Pi_K = \sum_j \pi_j = 1$.

2) Throw the pebble, i.e. draw a random number $\Upsilon$ in $(0,1)$.

3) Find the activity $k$ corresponding to $\Upsilon$.

The last point, 3), is the tricky one... how to do it?

3.i) If the number of discrete choices is not too big, one just goes over the list of $\Pi_k$ and finds the $k$ such that $\Pi_{k-1} < \Upsilon < \Pi_k$ (a minimal sketch of this linear scan follows the list below).

3.ii) Use a bisection method:

Iterate with the starting condition $k_{\min} = 1$, $k_{\max} = K$:

Take $k_{\min}$ and $k_{\max}$ and calculate $\bar{k} = \lfloor (k_{\min} + k_{\max})/2 \rfloor$.

If $\Upsilon > \Pi_{\bar{k}}$, set $k_{\min} = \bar{k}$;

else, if $\Upsilon < \Pi_{\bar{k}-1}$, set $k_{\max} = \bar{k}$;

else, output $\bar{k}$ and exit.
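Before the bisection version implemented in the next cell, here is a minimal sketch (added for illustration) of the linear scan of step 3.i, using the same example weights as that cell; like the cell, it throws the pebble in $(0, \sum_k \pi_k)$, so the weights need not be normalized:

import random

weights = [0.4, 0.3, 0.8, 0.1, 0.2]
# build the tower Pi_1, Pi_2, ..., Pi_K of cumulative weights
Pi = []
total = 0.0
for w in weights:
    total += w
    Pi.append(total)
Upsilon = random.uniform(0.0, total)  # throw the pebble at the tower
# linear scan: the first k with Upsilon < Pi_k is the sampled activity
k = 0
while Pi[k] < Upsilon:
    k += 1
print 'sampled activity', k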

In [8]:
import random

# bisection search to find the bin corresponding to eta
def bisection_search(eta, w_cumulative):
    kmin = 0
    kmax = len(w_cumulative)
    while True:
        k = int((kmin + kmax) / 2)
        if w_cumulative[k] < eta:
            kmin = k
        elif w_cumulative[k - 1] > eta:
            kmax = k
        else:
            return k - 1

# sample an integer number according to weights
def tower_sample(weights):
    sum_w = sum(weights)
    w_cumulative = [0.0]
    for l in xrange(len(weights)):
        w_cumulative.append(w_cumulative[l] + weights[l])
    eta = random.random() * sum_w
    sampled_choice = bisection_search(eta, w_cumulative)
    return sampled_choice

weights = [0.4, 0.3, 0.8, 0.1, 0.2]
n_samples = 20
for sample in xrange(n_samples):
    print tower_sample(weights)
0
2
2
0
2
1
2
0
1
0
2
4
2
0
2
2
1
2
1
1

Continuous distributions with tower sampling

Can the discrete tower-sampling idea be used to sample continuous distributions, especially to help with the nasty cases? (It is by now clear that for the Gaussian case everything works, and we do not actually need tower sampling.) Let's consider once again our nasty function $\pi(x) = 1/(2\sqrt{x})$. The idea is to "discretize" it into histograms, i.e. many boxes.

In the limit of infinitely many histogram bins, the tower becomes the cumulative function $$\Pi(x) = \int^{x} dx'\, \pi(x').$$ The procedure is the same:

1) Throw a pebble at the tower, i.e. produce a random number $\Upsilon =$ random.uniform(0,1).

2) Solve $\Upsilon = \Pi(x)$ (e.g. by bisection).

3) Find $x = \Pi^{-1}(\Upsilon)$.

The weakness of the method lies in determining $\Pi(x)$, which is equivalent to finding the primitive of a function and can be quite complicated.

For our nasty function $\pi(x) = 1/(2\sqrt{x})$ this can be done analytically: \begin{equation} \Pi(x) = \sqrt{x} = \Upsilon \quad \quad x = \Upsilon^2 \quad \quad (*) \end{equation} and this is just a trivial change of variable.
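As a quick sanity check (a small sketch added here, not from the original notebook), the relation $(*)$ gives a rejection-free direct-sampling algorithm for the nasty function in essentially one line:

import random, pylab

# inverse transform (*): Upsilon = Pi(x) = sqrt(x), hence x = Upsilon ** 2
data = [random.uniform(0.0, 1.0) ** 2 for k in xrange(100000)]
pylab.hist(data, bins=100, normed=True)
pylab.title('Rejection-free sampling of $\pi(x)=1/(2 \sqrt{x})$ via $x=\Upsilon^2$')
pylab.show()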

The following code, which discretizes $\pi(x)$ into $n$ bins and applies tower sampling, shows that the result is not bad:

In [9]:
import random
import pylab, bisect
 
def cumulative(l):
    acc,  L = 0, [0]
    for x in l:
        acc += x
        L.append(acc)
    return L
 
class rand_tower:
    def __init__(self,l):
        self.L = cumulative(l)
    def call(self):
        u = random.uniform(0.0, self.L[-1])
        return bisect.bisect_right(self.L, u) - 1
 
def test(n, r):
    l = [ 1.0/float(i + 1.)**0.5 for i in range(n)]
    g = rand_tower(l)
    samples = [g.call() for i in range(r)]
    pylab.hist(samples, bins=n, range=(-0.5,n - 0.5), normed=True)
    y_vec = [l[k]/sum(l)  for k in range(n)]
    pylab.plot(range(n),y_vec,'ro')
    pylab.title('Tower sampling')
    pylab.savefig('Tower_sampling.png')
    pylab.show()
 
test(100, 1000000)

In particular, our histograms never overshoot the function, and even when increasing the precision we do not hit the problems around zero that we encountered with the Markov-chain and direct rejection sampling.

The Gaussian case again...

is, however, not conveniently handled with the tower method, because of the primitive-determination issue. For the Gaussian distribution $\pi(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}$, the primitive is $$ \Pi(x) = \frac{1}{2}\left[1 + \text{erf}(x/\sqrt{2})\right] $$ with $\text{erf}(x) = \frac{2}{\sqrt{\pi}} \int_0^x dy\, e^{-y^2}$, which cannot be calculated analytically. Moreover, in order to determine $x$, we need its inverse. Fortunately, erf is a very well-known function and can be evaluated (and inverted) numerically very easily. But all these issues make you understand that the tower-sampling method is not convenient in this case.
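For instance, assuming SciPy's scipy.special.erfinv is available, a minimal sketch of this numerical-inversion route would be:

import random, math
from scipy.special import erfinv

# invert Pi(x) = (1/2) * (1 + erf(x / sqrt(2))) numerically:
# Upsilon = Pi(x)  =>  x = sqrt(2) * erfinv(2 * Upsilon - 1)
data = []
for k in xrange(10000):
    Upsilon = random.uniform(0.0, 1.0)
    data.append(math.sqrt(2.0) * erfinv(2.0 * Upsilon - 1.0))
print 'sample mean', sum(data) / len(data)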

Integration and sampling

The simplicity of the $\Pi(x)$ calculation for the "nasty case" $\pi(x) = 1/(2\sqrt{x})$ led to the simple relation $(*)$ above between the variable to sample, $x$, and the uniform random number $\Upsilon$. Doesn't this remind you of a change of variables? This gives us a brilliant idea for sampling a non-uniform distribution using the uniform one ($\Upsilon$), whenever a change of variables is possible. Let's consider for instance a general distribution $\pi(x) \sim x^{\sigma}$. We want $$ \int_{0}^{1} 1 \, d\Upsilon = C \int_{0}^{1} x^{\sigma} \, dx, $$ so we seek the transformation $$ d\Upsilon = C \, x^{\sigma} \, dx, $$ which leads to $$ \text{ran}(0,1) = \Upsilon = C' x^{\sigma+1} + C''. $$

Fixing the constants in such a way that the bounds of $\text{ran}(0,1)$ correspond to $x = 0$ and $x = 1$, we obtain

$$ x = \text{ran}(0,1)^{1/(\sigma+1)}, $$

so to sample $x$ it is enough to make a change of variables and use random.uniform!

In [10]:
import random, pylab
%matplotlib inline
sigma = -0.5
n_trials = 1000000
data=[]
for trial in xrange(n_trials):
    x = (random.uniform(0.0, 1.0)) ** (1.0 / (sigma + 1.0))
    data.append(x)

b_n= 100       
#y_vec = [ (float(k+1)/float(b_n))**sigma  for k in range(b_n)]
x = pylab.arange(0.0025, 1, 0.01)
y = (sigma+1)*x**sigma
pylab.title('Change-of-variables sampling')
pylab.xlabel('x')
pylab.hist(data,bins=b_n,normed=True)
pylab.plot(x, y,linewidth=2.5)
pylab.savefig('variable_sigma.png')
pylab.show() 

Multidimensional sampling with Gaussian integrals

We now relate the concepts of sampling and integration. We then treat general methods for directly sampling non-uniform distributions. A number of very detailed texts have been written on the subject; see for example [1].

Sample transformation: The example of Gaussian random numbers

As an illustration of how sampling relates to integration, we consider the Gaussian integral:

$$ \int_{-\infty}^{\infty} \frac{dx}{\sqrt{2 \pi}} \exp(-x^2/2) = 1 $$

As we learned many years ago, we square this integral, writing it down once in $x$ and once in $y$:

$$ \begin{align} \left[\int_{-\infty}^{\infty} \frac{dx}{\sqrt{2 \pi}} \exp(-x^2/2) \right]^2 & = \int_{-\infty}^{\infty} \frac{dx}{\sqrt{2 \pi}} \exp(-x^2/2) \int_{-\infty}^{\infty} \frac{dy}{\sqrt{2 \pi}} \exp(-y^2/2)\\ \ldots & = \int_{-\infty}^{\infty} \frac{dx dy}{2 \pi} \exp[-(x^2+y^2)/2], \end{align} $$

and then switch to polar coordinates $dx \, dy = r \, dr \, d\phi$ to reach the following expression for the integral

$$ \begin{align} \ldots & = \int_{0}^{2 \pi} \frac{d\phi}{2 \pi} \int_{0}^{\infty} r dr \exp(-r^2/2), \end{align} $$

We may then perform the substitutions

$$ r^2/2=\Upsilon (r dr = d\Upsilon)\quad \text{and}\quad \exp(-\Upsilon) = \Psi $$

to reach

$$ \begin{align} \ldots & = \int_{0}^{2 \pi} \frac{d\phi}{2 \pi}\ \int_{0}^{\infty} d\Upsilon \exp(-\Upsilon)\\ \ldots & = \int_{0}^{2 \pi} \frac{d\phi}{2 \pi}\ \int_{0}^{1} d\Psi = 1. \end{align} $$

In this last equation, we can compute the integrals (thus giving the value of the Gaussian integral), but we may in particular sample them:

$\phi$ is a uniform random variable between 0 and $2\pi$, and $\Psi$ is a uniform random variable between 0 and 1. In the program below, we actually sample the random variables $\phi$ and $\Psi$ and transform these samples, in several steps, back to $x$ and $y$. This procedure carries the name of "sample transformation".

Python program for the Gaussian sample transformation

Note that the substitutions are applied to the samples $\phi$ and $\Psi$. This is also known as the Box-Muller transform.

In [11]:
from random import uniform
import math
import pylab
%matplotlib inline

def gauss(sigma):
    phi = uniform(0.0, 2.0 * math.pi)
    Psi = uniform(0.0, 1.0)
    Upsilon = - math.log(Psi)
    r = sigma * math.sqrt(2.0 * Upsilon)
    x = r * math.cos(phi)
    y = r * math.sin(phi)
    return x, y
 
listx = []; listy = []
for k in range(10000):
    x, y = gauss(1.0)
    listx.append(x)
    listy.append(y)
samples = listx + listy
pylab.figure(1)
pylab.hist(samples, bins=25, normed=True)
pylab.title('Histogram Gaussian')
pylab.savefig('gaussians_histogram.png')
pylab.figure(2)
pylab.title('Correlation x,y')
pylab.axis('equal')
pylab.plot(listx,listy,'ro')
pylab.savefig('gaussians_correlations.png')
pylab.show()

The advantage of Gaussian sampling is that it gives a multidimensional, spherically symmetric sampling.

Look at how we can use Gaussian sampling to uniformly sample the surface of a sphere:

In [12]:
from mpl_toolkits.mplot3d import Axes3D
import pylab, random, math
%matplotlib inline

 
ax = pylab.gca(projection='3d')
ax.set_aspect('equal')
x, y, z = [], [], []
for i in range(2000):
    a, b, c = random.gauss(0.0, 1.0), random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)
    length = math.sqrt(a ** 2 + b ** 2 + c ** 2)
    x.append(a / length)
    y.append(b / length)
    z.append(c / length)
ax.plot(x, y, z, '.')
pylab.savefig('surface_sphere.png')
pylab.show()