Random numbers

Uniform and normal distributions

To generate a random number drawn from the standard uniform distribution, we can use the command rand from the Random standard library:

using Random 

rand()
0.22686673754119435

To get many random numbers arranged in an array, we provide the desired dimensions:

rand(3, 4)
3×4 Matrix{Float64}:
 0.320465  0.447005  0.560518  0.990463
 0.189469  0.514505  0.499002  0.00913538
 0.976965  0.881017  0.449687  0.623803

The same command can be used to draw from a discrete uniform distribution. To this end, we provide the list of elements as the first argument, either as an array or as a range. All elements are drawn with equal probability:

@show  rand([.2 .3 .5]);
@show  rand(100:130);
rand([0.2 0.3 0.5]) = 0.5
rand(100:130) = 128

An array of size \(S\times K\times N\) with elements drawn from list_of_elements can be generated with rand(list_of_elements, dim1, dim2, ...), e.g.

rand(["a" "b" "c"], 4,3,2)
4×3×2 Array{String, 3}:
[:, :, 1] =
 "a"  "b"  "a"
 "a"  "b"  "a"
 "a"  "c"  "a"
 "a"  "c"  "b"

[:, :, 2] =
 "c"  "c"  "a"
 "a"  "b"  "b"
 "a"  "a"  "b"
 "a"  "c"  "a"

Analogously, you can draw numbers from the standard normal distribution \(\mathcal{N}(\mu=0,\sigma^2=1)\) by using randn:

randn(2,3)
2×3 Matrix{Float64}:
 -0.803837  -0.891013  -1.40048
 -0.640025  -0.110768   0.361461
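
Note that randn draws only from the standard normal distribution. A draw from \(\mathcal{N}(\mu,\sigma^2)\) can be obtained by shifting and scaling standard normal draws. A minimal sketch (the values of \(\mu\) and \(\sigma\) below are arbitrary, chosen only for illustration):

μ, σ = 2.0, 0.5             # illustrative mean and standard deviation
x = μ .+ σ .* randn(1000);  # draws from N(μ, σ²)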

Multivariate normal distribution

Example 7

Draw a sample of \(10{,}000\) observations from the following multivariate normal distribution:

\[\begin{split}\left(\substack{y_1\\y_2}\right)\sim \mathcal{N} \left( \left(\substack{1\\2}\right), \left[ \begin{array}{cc} .75 & .1 \\ .1 & .25 \\ \end{array} \right]\right).\end{split}\]

We know that if \(x\sim \mathcal{N}(\mu, \Sigma)\) and \(y=\alpha+Bx,\) then \(y\sim\mathcal{N}(\alpha+B \mu, B \Sigma B').\) Given \(x\sim \mathcal{N}(0, \mathbf{I}),\) we can produce \(y\) by the transformation \(y=\left(\substack{1\\2}\right)+Bx,\) where \(B\) is the lower triangular matrix from the Cholesky decomposition of the target covariance matrix, i.e.

\[\begin{split}B B'=\left[ \begin{array}{cc} .75 & .1 \\ .1 & .25 \\ \end{array} \right].\end{split}\]

Using this property, to get \(\mathbf{y}\) we first draw two random variables \(x_1\) and \(x_2\) independently from the standard normal distribution.

X = randn(10000, 2); #2 columns; each column is a r.v.
10000×2 Matrix{Float64}:
  0.614505   -0.668119
  1.03135    -0.0948743
 -0.29912    -0.100329
 -1.04933     0.719724
  0.693856    0.756485
 -0.0218002  -0.0695528
 -0.926921   -2.09854
  0.595044    0.623781
  1.78484     1.07401
  0.380712   -1.4075
  0.43448    -0.866373
 -0.584113   -0.775322
 -0.214707   -1.65594
  ⋮          
  0.269616    1.08181
 -0.0072384   0.12458
  1.16377     1.91824
 -0.0882252  -0.392798
  0.163183   -0.595299
  0.1951      1.25567
 -1.37787     0.375443
  0.263264    0.43212
  1.23051     0.724854
  0.541672    0.0221197
 -0.8804     -0.0077747
 -0.79008    -2.21579

As we can see, the mean and variance of both variables are approximately equal to \(0\) and \(1,\) respectively. Both variables are independent, so their covariance is close to \(0.\)

using Statistics

println("mean(x₁)=$(mean(X[:,1]))")
println("var(x₁)=$(var(X[:,1]))")

println("mean(x₂)=$(mean(X[:,2]))")
println("var(x₂)=$(var(X[:,2]))")

println("cov(x₁, x₂)=$(cov(X[:,1], X[:,2]))")
mean(x₁)=0.011790572458630223
var(x₁)=1.017881472956483
mean(x₂)=0.00869419286907865
var(x₂)=0.9779904586024419
cov(x₁, x₂)=-0.004860308248876938

In our example, the covariance matrix of \(\mathbf{y}\) is given by:

BΣBt = [.75 .1;
        .1 .25]
2×2 Matrix{Float64}:
 0.75  0.1
 0.1   0.25

To conduct the Cholesky decomposition we use the command cholesky from the package LinearAlgebra. Without going into too much detail, it returns a factorization object whose components can be accessed with . after the object's name. To get the lower triangular matrix from the decomposition we call cholesky(BΣBt).L; this field-access style is typical for factorization objects in Julia.

using LinearAlgebra

B = cholesky(BΣBt).L
2×2 LowerTriangular{Float64, Matrix{Float64}}:
 0.866025   ⋅ 
 0.11547   0.486484

Indeed, our new matrix \(B\) is the one we are looking for, since

\[\begin{split}B B'=\left[ \begin{array}{cc} .75 & .1 \\ .1 & .25 \\ \end{array} \right].\end{split}\]
B*B'
2×2 Matrix{Float64}:
 0.75  0.1
 0.1   0.25

To get \(\mathbf{y},\) for each pair \((x_1, x_2)\) we need to perform the operation \(\left(\substack{y_1\\y_2}\right) =\left(\substack{1\\2}\right)+B\left(\substack{x_1\\x_2}\right).\) We can use map for this:

y_map = map( (x₁, x₂) -> begin
                 x_vec = [x₁, x₂]
                 y₁, y₂ = [1, 2] + B*x_vec   # shift by the mean vector and transform by B
                 return [y₁ y₂]
             end,
             X[:,1], X[:,2])
10000-element Vector{Matrix{Float64}}:
 [1.53217655115301 1.7459274847477118]
 [1.8931791461680745 2.07293574116509]
 [0.7409543704848185 1.916652193182746]
 [0.09125707985321885 2.228968606594767]
 [1.6008968037577533 2.4481374946126477]
 [0.9811204469598399 1.963646380409594]
 [0.19726301281148062 0.8720627526760132]
 [1.5153230167250609 2.372169057107908]
 [2.5457150109368563 2.7285827782992755]
 [1.329706157059598 1.3592360754313089]
 [1.376270955025524 1.628692827112217]
 [0.4941436989058252 1.5553709014876684]
 [0.8140586580676017 1.1696216776795643]
 ⋮
 [1.2334940938277101 2.557414163254594]
 [0.9937313599170234 2.059770309415118]
 [2.0078514083441 3.067571893908056]
 [0.9235947662915548 1.7987225969622231]
 [1.1413205515461562 1.7292395134391374]
 [1.1689612526831619 2.6333912202223053]
 [-0.19326890795427332 2.023544627712599]
 [1.2279933103053142 2.2406186240020043]
 [2.0656503068252374 2.494716337140241]
 [1.469101416967663 2.0733077324878817]
 [0.2375514392985837 1.8945579256179952]
 [0.31577041485013124 0.8308227480454937]

The returned object is a vector whose elements are 1×2 arrays containing \((y_1, y_2)\). For convenience, we may want to extract these elements and arrange \((y_1, y_2)\) in a two-column array:

Y = Array{Float64}(undef, 10000,2);
Y[:, 1] =  map(x-> x[1] ,y_map);
Y[:, 2] =  map(x-> x[2] ,y_map);

As we can see, the means, variances, and covariance are approximately equal to the parameters of the distribution we wanted to draw from:

println("mean(y₁)=$(mean(Y[:,1]))")
println("var(y₁)=$(var(Y[:,1]))")

println("mean(y₂)=$(mean(Y[:,2]))")
println("var(y₂)=$(var(Y[:,2]))")

println("cov(y₁, y₂) = $(cov(Y[:,1], Y[:,2]))")
mean(y₁)=1.0102109352743347
var(y₁)=0.763411104717362
mean(y₂)=2.005591043620997
var(y₂)=0.24448344570534644
cov(y₁, y₂) = 0.09974046303320667

Instead of using map, we could perform simple matrix operations. The result would be exactly the same:

[1 2] .+ X*B'
10000×2 Matrix{Float64}:
  1.53218    1.74593
  1.89318    2.07294
  0.740954   1.91665
  0.0912571  2.22897
  1.6009     2.44814
  0.98112    1.96365
  0.197263   0.872063
  1.51532    2.37217
  2.54572    2.72858
  1.32971    1.35924
  1.37627    1.62869
  0.494144   1.55537
  0.814059   1.16962
  ⋮          
  1.23349    2.55741
  0.993731   2.05977
  2.00785    3.06757
  0.923595   1.79872
  1.14132    1.72924
  1.16896    2.63339
 -0.193269   2.02354
  1.22799    2.24062
  2.06565    2.49472
  1.4691     2.07331
  0.237551   1.89456
  0.31577    0.830823
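
As a side note, a ready-made alternative to the manual Cholesky construction is the MvNormal type from the Distributions package, which samples directly from a multivariate normal distribution. A minimal sketch (assuming the Distributions package is installed; it is not used elsewhere in this section):

using Distributions
d = MvNormal([1.0, 2.0], [.75 .1; .1 .25])  # mean vector and covariance matrix
Y_alt = rand(d, 10000)'                      # 10000×2: each row is a draw of (y₁, y₂)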

Simulating Markov chains

Example 8

Suppose that the income of a single household follows a Markov chain with the transition probability matrix:

\[\begin{split}\mathbb{P}=\begin{bmatrix} .8 & .1 & .1\\ .1 & .8 & .1\\ .1 & .1 & .8\\ \end{bmatrix}\end{split}\]

and income states equal to \(\mathbf{y} = [e^{-\frac{1}{2}}, 1 , e^{\frac{1}{2}}].\)

Simulate the dynamics of income over a horizon of \(100\) periods. The initial income is \(y_0=1.\)

The command rand allows sampling from a finite set of elements, where each element has the same probability. A similar thing can be done using the function sample from the package StatsBase:

using StatsBase

sample( exp.([-.5, 0, .5]) )

As in the previous example, the probability is the same across elements:

draws = [sample(exp.([-.5, 0, .5])) for i ∈ 1:10000] # sample(exp.([-.5, 0, .5]) , 10000 ) will give the same result
countmap( draws )
Dict{Float64, Int64} with 3 entries:
  1.64872  => 3356
  1.0      => 3380
  0.606531 => 3264

If we want to draw elements with different probabilities, we have to provide weights:

draws = [sample( exp.([-.5, 0, .5]), Weights( [.8, .1, .1]  ) ) for i ∈ 1:10000]
countmap( draws )
Dict{Float64, Int64} with 3 entries:
  1.64872  => 956
  1.0      => 968
  0.606531 => 8076

In the considered example, the probabilities of the states in the next period depend on the current state. The transition matrix is given by:

P = [.8 .1 .1; 
    .1 .8 .1; 
    .1 .1 .8]
3×3 Matrix{Float64}:
 0.8  0.1  0.1
 0.1  0.8  0.1
 0.1  0.1  0.8

To simulate the income dynamics we build a loop in which the Weights are taken from the row of the transition matrix associated with the current state:

using StatsBase

y = exp.([-.5, 0, .5]);           # income states

history = Array{Int}(undef, 101); # index (!) of the current state
history[1] = 2;                   # start in state 2, i.e. y₀ = y[2] = 1

for τ ∈ 2:length(history)
    history[τ] = sample(1:length(y), 
                        Weights( P[history[τ-1], :] )
                        )
end

Eventually, we obtain one simulated path of the Markov chain:

using Plots
plot(y[history])
ylabel!("Income")
xlabel!("Time")
_images/random_numbers_46_0.svg
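
As a quick consistency check (a sketch going beyond the original example), we can simulate a much longer path and estimate the empirical transition frequencies, which should be close to the rows of \(\mathbb{P}\):

T = 100_000
long_path = Vector{Int}(undef, T);
long_path[1] = 2;
for τ ∈ 2:T
    long_path[τ] = sample(1:3, Weights( P[long_path[τ-1], :] ))
end

counts = zeros(3, 3)
for τ ∈ 2:T
    counts[long_path[τ-1], long_path[τ]] += 1   # count observed transitions
end
P̂ = counts ./ sum(counts, dims=2)               # row-normalized frequencies ≈ P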

Random numbers generated through a quantile function

Continuous cdfs

Example 9

Suppose that we want to draw random numbers according to a distribution given by the following cumulative distribution function:

\[\begin{split} F(x) = \begin{cases} 0, \text{if $x< 0$ }\\ x^2, \text{if $x\in[0,1]$ }\\ 1, \text{if $x>1$ } \end{cases}\end{split}\]

The function has different definitions on three intervals, \((-\infty, 0), [0,1], (1,+\infty)\). Using a similar approach to the one used in creating the CRRA utility function, we can use the ternary operator ?: twice to define \(F(x)\) in Julia:

F(x) = 0 ≤ x ≤ 1 ? x^2 :
                   ( x < 0 ? 0 : 1)
F (generic function with 1 method)

As we can see, \(F(x)\) is continuous and strictly increasing for \(x\in[0,1].\)

using Plots
plot(F, -.1, 1.2)
_images/random_numbers_53_0.svg

Notice that for any continuous cdf \(F,\) the value \(F(X)\) is uniformly distributed on \([0,1],\) i.e. each value \(p\in[0,1]\) is equally likely. To draw random numbers for a random variable distributed according to \(F(x),\) we can use the quantile function (also called the inverse cdf), which maps a probability \(p\) to the threshold value \(x\) below which random draws fall \(p\cdot 100\) percent of the time. Thus, draws from the uniform distribution can be treated as quantiles of the distribution given by \(F(x),\) and the quantile function projects them onto the actual values \(x.\)
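
The reason is the probability integral transform: for a continuous, strictly increasing cdf \(F\),

\[ \mathbb{P}\left(F(X) \le p\right) = \mathbb{P}\left(X \le F^{-1}(p)\right) = F\left(F^{-1}(p)\right) = p, \qquad p\in[0,1], \]

so \(F(X)\) is uniformly distributed on \([0,1].\) Reading the argument in reverse, if \(U\sim\mathcal{U}(0,1),\) then \(F^{-1}(U)\) has cdf \(F.\)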

For monotonically increasing and continuous cdf’s, this is simply the inverse function of the cdf. In our case, it is:

\[ Q(p) = \sqrt{p}, \]

where \(p\in[0,1].\) In Julia we can define this function in the following way:

invF(p) = 0 ≤ p ≤ 1 ? √p : @error "Forbidden argument: p∉[0,1]"
invF (generic function with 1 method)
plot(invF, 0, 1)
xlabel!("p"); ylabel!("x")
_images/random_numbers_56_0.svg

Now we can generate numbers from the uniform distribution and project them onto the actual values of \(x\) using invF. We do this exercise for \(10{,}000\) draws and for \(20\) draws:

draws   = rand(10000);
draws_x = invF.(draws);
draws_very_few   = rand(20);
draws_x_very_few = invF.(draws_very_few);

First, let’s plot the empirical cdf’s of the numbers drawn from the uniform distribution and compare them with the true cdf (which for the standard uniform distribution is simply \(x\)). To get the empirical cdf’s, we will use ecdf from the package StatsBase. The output of ecdf is a function that returns the implied values of the cdf for any value (not only the drawn ones).

using StatsBase
draws_ecdf          = ecdf(draws);
draws_very_few_ecdf = ecdf(draws_very_few);
draws_ecdf(.5)
0.4986
draws_very_few_ecdf.( [.2 .4 .6] )
1×3 Matrix{Float64}:
 0.3  0.6  0.75
using Plots
using Pipe
@pipe range(0,1, length=100) |> plot(_, draws_ecdf.(_), label="Empirical CDF (N=10,000)",
                                legend = :topleft)
@pipe range(0,1, length=100) |> plot!(_, draws_very_few_ecdf.(_), label ="Empirical CDF (N=20)")
@pipe range(0,1, length=100) |> plot!(_, map( x-> x, _), label ="True CDF of the uniform dist.")
title!("Draws of quantiles vs. cdf of the uniform dist.")
xlabel!("p")
ylabel!("cdf")
_images/random_numbers_64_0.svg

As can be seen, the empirical cdf with 10,000 draws resembles the true cdf of the uniform distribution very closely, while the discrepancy is much larger for the simulation with only 20 observations.

Now let’s compare the empirical cdf of the implied \(x\) for 10,000 and 20 draws with the true cdf, the quadratic function from our example.

using StatsBase
cdf_x = ecdf(draws_x);
cdf_x_very_few = ecdf(draws_x_very_few);
using Pipe
@pipe range(0,1, length=100) |> plot(_, cdf_x.(_), label="Empirical CDF (N=10000)",
                                legend=:topleft)
@pipe range(0,1, length=100) |> plot!(_, cdf_x_very_few.(_), label ="Empirical CDF (N=20)")
@pipe range(0,1, length=100) |> plot!(_, F.(_), label ="True CDF")
_images/random_numbers_67_0.svg

Our exercise shows that by using the quantile function and a uniform random number generator, we can draw numbers from any distribution with a continuous cdf whose quantile function is available in closed form.
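
The whole recipe can be wrapped in a small helper function. A minimal sketch (the name draw_from_quantile is hypothetical and is not used elsewhere in this text):

draw_from_quantile(Q, n) = Q.(rand(n))            # Q: quantile function, n: number of draws

draws_x_alt = draw_from_quantile(invF, 10_000);   # equivalent to invF.(rand(10000))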

Continuous cdf but no closed-form solution for the quantile function

Example 10

Suppose that we want to draw random numbers according to a distribution given by the following cumulative distribution function:

\[\begin{split} G(x) = \begin{cases} 0, \text{if $x< 0$ }\\ \frac{x^2+x\sqrt{x+3}}{3}, \text{if $x\in[0,1]$ }\\ 1, \text{if $x>1$ } \end{cases}\end{split}\]

Function \(G(x)\) is a continuous cdf just like in our previous example:

G(x) = 0 ≤ x ≤ 1 ? ( x^2 + x*√(x+3) )/3 :
                   ( x < 0 ? 0 : 1)

plot(G, -.1, 1.1)
_images/random_numbers_71_0.svg

Similarly to \(F(x)\), \(G(x)\) is a continuous cdf. In contrast to our previous example, it is not easy to derive the inverse function of \(G(x)\) in closed form. Recall that for a given quantile \(p^*,\) we want to find the \(x\) that satisfies \(p^* = G(x)\). Therefore, the quantile function is defined as:

\[ Q(p^*) = \{x\in \mathbb{R}:\quad p^* = G(x) \}. \]

This issue can be addressed numerically. We can run a root-finding procedure for the function given by:

\[f(x) = p^* - G(x),\]

where \(p^*\) is a parameter.

Let’s illustrate this by finding the median of the distribution given by \(G(x).\) For this, we can use the function find_zero from the package Roots:

using Roots
median = find_zero( x -> .5 - G(x), .1) #2nd argument: initial guess for $x$
0.6004852578317297

For the distribution given by \(G(x)\), the median is approximately \(0.6.\) We can illustrate this with a plot:

plot(G, -.1, 1.1, label="cdf", 
                  legend=:topleft)
plot!([-.1,  median], [.5, .5], label="value of p*")
plot!([median,  median], [0, .5], label="root of the cdf given p*")
xlabel!("x")
ylabel!("p")
_images/random_numbers_76_0.svg

Note

More generally, we can define a function invG(p), which for a given input p runs the root-finding procedure find_zero( x -> p - G(x), .1):

invG(p) = find_zero( x -> p - G(x), .1)
invG (generic function with 1 method)
draws   = rand(10000);
draws_G = invG.(draws);

draws_very_few   = rand(20);
draws_G_very_few = invG.(draws_very_few);
ecdf_G          = ecdf(draws_G);
ecdf_G_very_few = ecdf(draws_G_very_few);
@pipe range(0,1, length=100) |> plot(_, ecdf_G.(_), label="Empirical CDF (N=10000)",
                                legend=:topleft)
@pipe range(0,1, length=100) |> plot!(_, ecdf_G_very_few.(_), label ="Empirical CDF (N=20)")
@pipe range(0,1, length=100) |> plot!(_, G.(_), label ="True CDF")
_images/random_numbers_81_0.svg

Cdfs with discrete jumps

Example 11

Suppose that we want to draw random numbers according to a distribution given by the following cumulative distribution function:

\[\begin{split} H(x) = \begin{cases} 0, \text{if $x< 0$ }\\ x^3, \text{if $x\in [0,\frac{1}{2})$ }\\ x^2, \text{if $x\in[\frac{1}{2},1]$}\\ 1, \text{if $x>1$ } \end{cases} \end{split}\]

Just like before, we can define the cdf using the ?: ternary operator:

H(x) = 0 ≤ x ≤ 1 ? ( x < .5 ? x^3 : x^2) :
                   ( x < 0  ? 0   :  1)
H (generic function with 1 method)
plot(H, -.1, 1.1)
_images/random_numbers_86_0.svg

The function has a discrete jump at \(x=0.5,\) which means that there is a probability mass at this point. In our case, the probability that the drawn number will be exactly equal to \(0.5\) is \(\left({.5}\right)^2-\left({.5}\right)^3=.125.\)

For distributions with discrete jumps, we need to use a generalized version of the quantile function:

\[ Q(p)\,=\,\inf\left\{ x\in \mathbb{R} : p \le F(x) \right\}. \]

Consequently, the (generalized) quantile function for our example is as follows. For \(p\in[\frac{1}{8},\frac{1}{4}],\) the smallest \(x\) with \(p \le H(x)\) is \(0.5,\) because \(H\) jumps from \((0.5)^3=\frac{1}{8}\) to \((0.5)^2=\frac{1}{4}\) at that point:

\[\begin{split} Q(p) = \begin{cases} \sqrt[3]{p}, \text{if $p\in [0,\frac{1}{8})$ }\\ 0.5, \text{if $p\in [\frac{1}{8}, \frac{1}{4}]$ }\\ \sqrt{p}, \text{if $p\in(\frac{1}{4},1]$}\\ \end{cases} \end{split}\]
invH(p) = 0 ≤ p ≤ 1 ? ( p < .125        ? ∛p :
                        .125 ≤ p ≤ .25  ? .5 :
                                           √p ) :
                      @error "Forbidden argument: p∉[0,1]"
invH (generic function with 1 method)

All other steps will be exactly the same as in the initial example with a continuous cdf.

draws   = rand(10000);
draws_H = invH.(draws);

draws_very_few   = rand(20);
draws_H_very_few = invH.(draws_very_few);
ecdf_H          = ecdf(draws_H);
ecdf_H_very_few = ecdf(draws_H_very_few);
@pipe range(0,1, length=100) |> plot(_, ecdf_H.(_), label="Empirical CDF (N=10000)",
                                legend=:topleft)
@pipe range(0,1, length=100) |> plot!(_, ecdf_H_very_few.(_), label ="Empirical CDF (N=20)")
@pipe range(0,1, length=1000) |> plot!(_, H.(_), label ="True CDF")
_images/random_numbers_93_0.svg

As we can see, the procedure (with a sufficiently large number of draws) successfully replicates the discrete jump at \(0.5.\)
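
As a final numerical check (a sketch), the share of draws exactly equal to \(0.5\) should be close to the size of the jump, \(0.125\):

mean(draws_H .== 0.5)   # ≈ 0.125 in a large sample, since invH returns exactly 0.5 for p ∈ [0.125, 0.25]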