Random numbers
Uniform and normal distributions
To generate a random number drawn from the standard uniform distribution, we can use the command rand from the package Random:
using Random
rand()
0.22686673754119435
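Incidentally, every call to rand returns a different number. When results need to be reproducible, the generator can be seeded with Random.seed! (a standard-library function; the seed value below is arbitrary). A minimal sketch:

```julia
using Random

# Seeding the global RNG makes the draw sequence reproducible
# (1234 is an arbitrary illustrative seed):
Random.seed!(1234)
a = rand()
Random.seed!(1234)
b = rand()
a == b   # true: same seed, same draw
```
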
To get many random numbers arranged in an array, we must provide the dimensions:
rand(3, 4)
3×4 Matrix{Float64}:
0.320465 0.447005 0.560518 0.990463
0.189469 0.514505 0.499002 0.00913538
0.976965 0.881017 0.449687 0.623803
The same command can be used to draw from the discrete uniform distribution. To this end, as the first argument we provide the list of elements as an array or a range. All the elements are drawn with equal probabilities:
@show rand([.2 .3 .5]);
@show rand(100:130);
rand([0.2 0.3 0.5]) = 0.5
rand(100:130) = 128
An array of size \(S\times K\times N\) with elements drawn from list_of_elements can be generated with rand(list_of_elements, dim1, dim2, ...), e.g.:
rand(["a" "b" "c"], 4,3,2)
4×3×2 Array{String, 3}:
[:, :, 1] =
"a" "b" "a"
"a" "b" "a"
"a" "c" "a"
"a" "c" "b"
[:, :, 2] =
"c" "c" "a"
"a" "b" "b"
"a" "a" "b"
"a" "c" "a"
Analogously, you can draw numbers from the standard normal distribution \(\mathcal{N}(\mu=0,\sigma^2=1)\) by using randn:
randn(2,3)
2×3 Matrix{Float64}:
-0.803837 -0.891013 -1.40048
-0.640025 -0.110768 0.361461
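Draws from any univariate normal \(\mathcal{N}(\mu,\sigma^2)\) can be obtained by shifting and scaling standard normal draws, since \(\mu+\sigma z\sim\mathcal{N}(\mu,\sigma^2)\) for \(z\sim\mathcal{N}(0,1)\). A quick sketch (the values \(\mu=3\), \(\sigma=2\) are illustrative):

```julia
using Statistics

# Shift and scale standard normal draws to get N(μ = 3, σ² = 4):
μ, σ = 3.0, 2.0
x = μ .+ σ .* randn(100_000)
mean(x), var(x)   # ≈ (3.0, 4.0)
```

This is the univariate special case of the affine-transformation property used for the multivariate normal in the next section.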
Multivariate normal distribution
Draw a sample of \(10000\) observations from the multivariate normal distribution \(\mathcal{N}(\mu_y, \Sigma_y)\) with \(\mu_y=\left(\substack{1\\2}\right)\) and \(\Sigma_y=\left(\begin{smallmatrix} 0.75 & 0.1 \\ 0.1 & 0.25 \end{smallmatrix}\right).\)
We know that if \(x\sim \mathcal{N}(\mu, \Sigma)\) and \(y=\alpha+Bx,\) then \(y\sim\mathcal{N}(\alpha+B \mu,\, B \Sigma B').\) Given \(x\sim \mathcal{N}(0, \mathbf{I}),\) we can produce \(y\) by the transformation \(y=\left(\substack{1\\2}\right)+Bx,\) where \(B\) is the lower triangular matrix of the Cholesky decomposition of \(\Sigma_y,\) i.e. \(BB'=\Sigma_y.\)
Using this property, to get \(\mathbf{y}\) we first draw two random variables \(x_1\) and \(x_2\) independently from the standard normal distribution.
X = randn(10000, 2); #2 columns; each column is a r.v.
10000×2 Matrix{Float64}:
0.614505 -0.668119
1.03135 -0.0948743
-0.29912 -0.100329
-1.04933 0.719724
0.693856 0.756485
-0.0218002 -0.0695528
-0.926921 -2.09854
0.595044 0.623781
1.78484 1.07401
0.380712 -1.4075
0.43448 -0.866373
-0.584113 -0.775322
-0.214707 -1.65594
⋮
0.269616 1.08181
-0.0072384 0.12458
1.16377 1.91824
-0.0882252 -0.392798
0.163183 -0.595299
0.1951 1.25567
-1.37787 0.375443
0.263264 0.43212
1.23051 0.724854
0.541672 0.0221197
-0.8804 -0.0077747
-0.79008 -2.21579
As we can see, the mean and variance of both variables are around \(0\) and \(1,\) respectively. The two variables are independent, so their covariance is close to \(0.\)
using Statistics
println("mean(x₁)=$(mean(X[:,1]))")
println("var(x₁)=$(var(X[:,1]))")
println("mean(x₂)=$(mean(X[:,2]))")
println("var(x₂)=$(var(X[:,2]))")
println("cov(x₁, x₂)=$(cov(X[:,1], X[:,2]))")
mean(x₁)=0.011790572458630223
var(x₁)=1.017881472956483
mean(x₂)=0.00869419286907865
var(x₂)=0.9779904586024419
cov(x₁, x₂)=-0.004860308248876938
In our example, the covariance matrix of \(\mathbf{y}\) is given by:
BΣBt = [.75 .1;
.1 .25]
2×2 Matrix{Float64}:
0.75 0.1
0.1 0.25
To conduct the Cholesky decomposition we use the command cholesky from the package LinearAlgebra. Without going into details, it returns a factorization object whose fields can be accessed with . after the object's name. To get the lower triangular matrix of the decomposition we call cholesky(BΣBt).L.
using LinearAlgebra
B = cholesky(BΣBt).L
2×2 LowerTriangular{Float64, Matrix{Float64}}:
0.866025 ⋅
0.11547 0.486484
Indeed, our new matrix \(B\) is the one we are looking for, since \(BB'\) reproduces the covariance matrix:
B*B'
2×2 Matrix{Float64}:
0.75 0.1
0.1 0.25
To get \(\mathbf{y},\) for each pair \((x_1, x_2)\) we need to perform the operation \(\left(\substack{y_1\\y_2}\right)=\left(\substack{1\\2}\right)+B\left(\substack{x_1\\x_2}\right).\) We can use map for this:
y_map = map( (x₁, x₂) -> begin
                 x_vec = [x₁, x₂]
                 y₁, y₂ = [1, 2] + B*x_vec
                 return [y₁ y₂]
             end,
             X[:,1], X[:,2])
10000-element Vector{Matrix{Float64}}:
[1.53217655115301 1.7459274847477118]
[1.8931791461680745 2.07293574116509]
[0.7409543704848185 1.916652193182746]
[0.09125707985321885 2.228968606594767]
[1.6008968037577533 2.4481374946126477]
[0.9811204469598399 1.963646380409594]
[0.19726301281148062 0.8720627526760132]
[1.5153230167250609 2.372169057107908]
[2.5457150109368563 2.7285827782992755]
[1.329706157059598 1.3592360754313089]
[1.376270955025524 1.628692827112217]
[0.4941436989058252 1.5553709014876684]
[0.8140586580676017 1.1696216776795643]
⋮
[1.2334940938277101 2.557414163254594]
[0.9937313599170234 2.059770309415118]
[2.0078514083441 3.067571893908056]
[0.9235947662915548 1.7987225969622231]
[1.1413205515461562 1.7292395134391374]
[1.1689612526831619 2.6333912202223053]
[-0.19326890795427332 2.023544627712599]
[1.2279933103053142 2.2406186240020043]
[2.0656503068252374 2.494716337140241]
[1.469101416967663 2.0733077324878817]
[0.2375514392985837 1.8945579256179952]
[0.31577041485013124 0.8308227480454937]
The returned object is a vector of \(1\times 2\) matrices \([y_1\ y_2]\). For convenience, we may want to extract those elements and arrange \((y_1, y_2)\) in a two-column array:
Y = Array{Float64}(undef, 10000,2);
Y[:, 1] = map(x-> x[1] ,y_map);
Y[:, 2] = map(x-> x[2] ,y_map);
As we see, the means and covariances are approximately equal to the parameters of the distribution we wanted to draw from:
println("mean(y₁)=$(mean(Y[:,1]))")
println("var(y₁)=$(var(Y[:,1]))")
println("mean(y₂)=$(mean(Y[:,2]))")
println("var(y₂)=$(var(Y[:,2]))")
println("cov(y₁, y₂) = $(cov(Y[:,1], Y[:,2]))")
mean(y₁)=1.0102109352743347
var(y₁)=0.763411104717362
mean(y₂)=2.005591043620997
var(y₂)=0.24448344570534644
cov(y₁, y₂) = 0.09974046303320667
Instead of using map, we could perform simple matrix operations. The result is exactly the same:
[1 2] .+ X*B'
10000×2 Matrix{Float64}:
1.53218 1.74593
1.89318 2.07294
0.740954 1.91665
0.0912571 2.22897
1.6009 2.44814
0.98112 1.96365
0.197263 0.872063
1.51532 2.37217
2.54572 2.72858
1.32971 1.35924
1.37627 1.62869
0.494144 1.55537
0.814059 1.16962
⋮
1.23349 2.55741
0.993731 2.05977
2.00785 3.06757
0.923595 1.79872
1.14132 1.72924
1.16896 2.63339
-0.193269 2.02354
1.22799 2.24062
2.06565 2.49472
1.4691 2.07331
0.237551 1.89456
0.31577 0.830823
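The equivalence of the two approaches can be verified directly. A self-contained sketch (same \(B\) and mean vector as above, on a fresh small sample):

```julia
using LinearAlgebra

# Same covariance matrix and Cholesky factor as above:
BΣBt = [.75 .1;
        .1 .25]
B = cholesky(BΣBt).L
X = randn(1_000, 2)

# Element-by-element construction with map...
y_map = map((x₁, x₂) -> [1, 2] + B * [x₁, x₂], X[:,1], X[:,2])
# ...versus one broadcasted matrix operation:
Y_mat = [1 2] .+ X * B'

# Both give the same numbers up to floating-point rounding:
max_diff = maximum(abs(y_map[i][j] - Y_mat[i, j]) for i in 1:1_000, j in 1:2)
```

The matrix form avoids building an intermediate vector per observation and is the idiomatic choice for larger samples.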
Simulating Markov chains
Suppose that the income of a single household follows a Markov chain with the transition matrix
\(P = \left(\begin{smallmatrix} 0.8 & 0.1 & 0.1 \\ 0.1 & 0.8 & 0.1 \\ 0.1 & 0.1 & 0.8 \end{smallmatrix}\right)\)
and income states equal to \(\mathbf{y} = [e^{-\frac{1}{2}}, 1 , e^{\frac{1}{2}}].\)
Simulate the dynamics of the income over a horizon of \(100\) periods. The initial income is \(y_0=1.\)
The command rand allows sampling from a finite set of elements where each element has equal probability. A similar thing can be done using sample from the package StatsBase:
using StatsBase
sample( exp.([-.5, 0, .5]) )
As in the previous example, the probability is the same across elements:
draws = [sample(exp.([-.5, 0, .5])) for i ∈ 1:10000] # sample(exp.([-.5, 0, .5]), 10000) will give the same result
countmap( draws )
Dict{Float64, Int64} with 3 entries:
1.64872 => 3356
1.0 => 3380
0.606531 => 3264
If we want to draw elements with different probabilities, we have to provide weights:
draws = [sample( exp.([-.5, 0, .5]), Weights( [.8, .1, .1] ) ) for i ∈ 1:10000]
countmap( draws )
Dict{Float64, Int64} with 3 entries:
1.64872 => 956
1.0 => 968
0.606531 => 8076
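Weighted draws can also be done without StatsBase, using only the standard library: draw a uniform number and pick the first element whose cumulative weight reaches it. This is the inverse-cdf idea used later in this chapter; weighted_draw below is a hypothetical helper name, not part of any package:

```julia
# Inverse-cdf sampling over a finite set: searchsortedfirst finds the
# first index whose cumulative weight is ≥ the uniform draw.
weighted_draw(values, w) = values[searchsortedfirst(cumsum(w), rand())]

vals  = exp.([-.5, 0, .5])
draws = [weighted_draw(vals, [.8, .1, .1]) for _ in 1:10_000]
share = count(==(vals[1]), draws) / 10_000   # ≈ 0.8
```
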
In the considered example, the probability of each state in the next period depends on the current state. The transition matrix is given by:
P = [.8 .1 .1;
.1 .8 .1;
.1 .1 .8]
3×3 Matrix{Float64}:
0.8 0.1 0.1
0.1 0.8 0.1
0.1 0.1 0.8
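Two quick sanity checks on \(P\) (a sketch; the limit claim uses the fact that powers of an ergodic transition matrix converge row-wise to the stationary distribution, which is uniform here because \(P\) is doubly stochastic):

```julia
using LinearAlgebra

P = [.8 .1 .1;
     .1 .8 .1;
     .1 .1 .8]

# Every row of a transition matrix must sum to one:
rowcheck = all(sum(P, dims=2) .≈ 1)   # true

# Powers of P converge: each row of P^n approaches the stationary
# distribution, here (1/3, 1/3, 1/3):
Plim = P^100
```
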
To simulate the income dynamics we need to build a loop in which Weights are taken from the row of the transition matrix associated with the current state:
using StatsBase
y = exp.([-.5, 0, .5]); #income states
history = Array{Int}(undef, 101); #index (!) of the current state
history[1] = 2; #start in the middle state, so y₀ = y[2] = 1
for τ ∈ 2:length(history)
    history[τ] = sample(1:length(y),
                        Weights( P[history[τ-1], :] )
                        )
end
Eventually, we have one simulated path of the Markov chain:
using Plots
plot(y[history])
ylabel!("Income")
xlabel!("Time")
Random numbers generated through a quantile function
Continuous cdfs
Suppose that we want to draw random numbers according to a distribution given by the following cumulative distribution function:
\(F(x) = \begin{cases} 0 & \text{for } x < 0 \\ x^2 & \text{for } 0 \le x \le 1 \\ 1 & \text{for } x > 1 \end{cases}\)
The function has different definitions on three intervals, \((-\infty, 0), [0,1], (1,+\infty)\).
Using a similar approach to the one used in creating the CRRA utility function, we can use the ternary operator ?: twice to create \(F(x)\) in Julia:
F(x) = 0≤x≤1 ? x^2 :
( x< 0 ? 0 : 1)
F (generic function with 1 method)
As we can see, \(F(x)\) is continuous and strictly increasing for \(x\in[0,1].\)
using Plots
plot(F, -.1, 1.2)
Notice that for any continuous cdf, the values \(F(x)\) are uniformly distributed: each value \(p\in[0,1]\) is equally likely. To draw random numbers for a random variable distributed according to \(F(x),\) we can use the quantile function (also called the inverse cdf). This function maps a quantile \(p\) to the threshold value \(x\) below which random draws fall \(p\) percent of the time. Hence, draws from the uniform distribution can be treated as quantiles of the distribution given by \(F(x),\) and the quantile function projects those draws onto actual values \(x.\)
For monotonically increasing and continuous cdf's, this is simply the inverse function of the cdf. In our case, it is \(F^{-1}(p)=\sqrt{p},\) where \(p\in[0,1].\)
In Julia we can define this function in the following way:
invF(p) = 0≤p≤1 ? √(p) : @error "Forbidden argument: p∉[0,1]"
invF (generic function with 1 method)
plot(invF, 0, 1)
xlabel!("p"); ylabel!("x")
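Before using the quantile function on random draws, it is worth checking that it really inverts the cdf on \([0,1]\). A self-contained sketch (the definitions of \(F\) and its inverse are restated under prefixed names so nothing above is overwritten):

```julia
# Restated cdf and quantile function from above:
F_cdf(x) = 0 ≤ x ≤ 1 ? x^2 : (x < 0 ? 0 : 1)
F_inv(p) = √p          # valid for p ∈ [0, 1]

# Applying the quantile function after the cdf should return the
# original value everywhere on [0, 1]:
roundtrip_ok = all(F_inv(F_cdf(x)) ≈ x for x in 0:0.01:1)   # true
```
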
Now we can generate some numbers from the uniform distribution and then by using invF
we project those draws on the actual values of \(x\). We do this exercise for \(10,000\) draws and \(20\) draws:
draws = rand(10000);
draws_x = invF.(draws);
draws_very_few = rand(20);
draws_x_very_few = invF.(draws_very_few);
First, let's plot the empirical cdf's of numbers drawn from the uniform distribution and compare them with the true cdf (which for the standard uniform distribution is \(G(x)=x\)). To get empirical cdf's, I will use ecdf from the package StatsBase. The output of ecdf is a function that returns the implied values of the cdf for all values (not only the drawn ones).
using StatsBase
draws_ecdf = ecdf(draws);
draws_very_few_ecdf = ecdf(draws_very_few);
draws_ecdf(.5)
0.4986
draws_very_few_ecdf.( [.2 .4 .6] )
1×3 Matrix{Float64}:
0.3 0.6 0.75
using Plots
using Pipe
@pipe range(0,1, length=100) |> plot(_, draws_ecdf.(_), label="Empirical CDF (N=10,000)",
legend = :topleft)
@pipe range(0,1, length=100) |> plot!(_, draws_very_few_ecdf.(_), label ="Empirical CDF (N=20)")
@pipe range(0,1, length=100) |> plot!(_, map( x-> x, _), label ="True CDF of the uniform dist.")
title!("Draws of quantiles vs. cdf of the uniform dist.")
xlabel!("p")
ylabel!("cdf")
As can be seen, the empirical cdf with 10,000 draws resembles the true cdf of the uniform distribution very closely, while the discrepancy is much bigger for the simulation with only 20 observations.
Now let’s compare the empirical cdf of implied \(x\) for 10,000 and 20 draws with the true cdf, the quadratic function from our example.
using StatsBase
cdf_x = ecdf(draws_x);
cdf_x_very_few = ecdf(draws_x_very_few);
using Pipe
@pipe range(0,1, length=100) |> plot(_, cdf_x.(_), label="Empirical CDF (N=10000)",
legend=:topleft)
@pipe range(0,1, length=100) |> plot!(_, cdf_x_very_few.(_), label ="Empirical CDF (N=20)")
@pipe range(0,1, length=100) |> plot!(_, F.(_), label ="True CDF")
Our exercise shows that by using the quantile function and a uniform random number generator, we can draw numbers from any distribution that has a continuous cdf and a closed-form quantile function.
Continuous cdf but no closed-form solution for the quantile function
Suppose that we want to draw random numbers according to a distribution given by the following cumulative distribution function:
\(G(x) = \begin{cases} 0 & \text{for } x < 0 \\ \frac{x^2 + x\sqrt{x+3}}{3} & \text{for } 0 \le x \le 1 \\ 1 & \text{for } x > 1 \end{cases}\)
Function \(G(x)\) is a continuous cdf just like in our previous example:
G(x) = 0≤x≤1 ? ( x^2 + x * √(x+3) )/3 :
( x< 0 ? 0 : 1)
plot(G, -.1, 1.1)
Similarly to \(F(x)\), \(G(x)\) is a continuous cdf. In contrast to our previous example, it is not easy to derive the inverse of \(G(x)\) in closed form. Recall that for a given quantile \(p^*,\) we want to find the \(x\) that satisfies \(p^* = G(x)\). Therefore, the quantile function is defined as \(G^{-1}(p^*) = \{x : G(x) = p^*\}.\)
This issue can be addressed numerically. We can run a root-finding procedure for the function \(f(x) = p^* - G(x),\) where \(p^*\) is the parameter.
Let's illustrate this by finding the median of the distribution given by \(G(x).\)
For this, we can use the function find_zero from the package Roots:
using Roots
median = find_zero( x -> .5 - G(x), .1) #2nd argument: initial guess for $x$
0.6004852578317297
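Roots is a third-party package; when it is unavailable, the same root can be found with a short hand-rolled bisection. A sketch (the bracket \([0,1]\) is valid because the cdf increases from \(G(0)=0\) to \(G(1)=1\); bisect_quantile is a hypothetical helper name, and \(G\) is restated so the snippet is self-contained):

```julia
G_cdf(x) = 0 ≤ x ≤ 1 ? (x^2 + x * √(x + 3)) / 3 : (x < 0 ? 0 : 1)

# Bisection: repeatedly halve a bracket [lo, hi] that contains the root
# of G(x) = p, keeping G(lo) ≤ p ≤ G(hi).
function bisect_quantile(p; lo = 0.0, hi = 1.0, tol = 1e-10)
    while hi - lo > tol
        mid = (lo + hi) / 2
        G_cdf(mid) < p ? (lo = mid) : (hi = mid)
    end
    return (lo + hi) / 2
end

m_bisect = bisect_quantile(0.5)   # ≈ 0.60049, the same median as find_zero
```

Bisection is slower than the derivative-based methods find_zero can use, but it is robust for any monotone cdf.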
For the distribution given by \(G(x)\), the median is equal to around \(0.6.\) We can illustrate this by using a plot:
plot(G, -.1, 1.1, label="cdf",
legend=:topleft)
plot!([-.1, median], [.5, .5], label="value of p*")
plot!([median, median], [0, .5], label="root of the cdf given p*")
xlabel!("x")
ylabel!("p")
Note: more generally, we can define a function invG(p) which, for a given input p, conducts the root-finding procedure find_zero( x -> p - G(x), .1).
invG(p) = find_zero( x -> p - G(x), .1)
invG (generic function with 1 method)
draws = rand(10000);
draws_G = invG.(draws);
draws_very_few = rand(20);
draws_G_very_few = invG.(draws_very_few);
ecdf_G = ecdf(draws_G);
ecdf_G_very_few = ecdf(draws_G_very_few);
@pipe range(0,1, length=100) |> plot(_, ecdf_G.(_), label="Empirical CDF (N=10000)",
legend=:topleft)
@pipe range(0,1, length=100) |> plot!(_, ecdf_G_very_few.(_), label ="Empirical CDF (N=20)")
@pipe range(0,1, length=100) |> plot!(_, G.(_), label ="True CDF")
Cdfs with discrete jumps
Suppose that we want to draw random numbers according to a distribution given by the following cumulative distribution function:
\(H(x) = \begin{cases} 0 & \text{for } x < 0 \\ x^3 & \text{for } 0 \le x < 0.5 \\ x^2 & \text{for } 0.5 \le x \le 1 \\ 1 & \text{for } x > 1 \end{cases}\)
Just like before, we can define the cdf using the ?: ternary operator:
H(x) = 0≤x≤1 ? ( x <.5 ? x^3 : x^2) :
( x < 0 ? 0 : 1)
H (generic function with 1 method)
plot(H, -.1, 1.1)
In the considered function, there is a discrete jump at \(x=0.5,\) which means that there is a probability mass at this point. In our case, the probability that the drawn number equals exactly \(0.5\) is \(\left(0.5\right)^2-\left(0.5\right)^3=0.125.\)
For distributions with discrete jumps, we need a generalized version of the quantile function: \(Q(p) = \inf\{x : F(x) \ge p\}.\) Consequently, the (generalized) quantile function for the discussed example is as follows:
invH(p) = 0≤p≤1 ? ( p<.125 ? ∛(p) :
.125 ≤ p ≤ .25 ? .5 :
√(p) ) :
@error "Forbidden argument: p∉[0,1]"
invH (generic function with 1 method)
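A Monte Carlo check of the point mass: the share of uniform draws that the quantile function maps exactly onto \(0.5\) should estimate the jump size of \(0.125\) computed above. A self-contained sketch (the quantile function is restated under a prefixed name, using error instead of @error so invalid inputs throw):

```julia
using Statistics

# Restated generalized quantile function from above:
H_inv(p) = 0 ≤ p ≤ 1 ? (p < .125 ? ∛p :
           p ≤ .25   ? .5 : √p) : error("p ∉ [0,1]")

# Share of draws landing exactly on 0.5 estimates the point mass:
mass_est = mean(H_inv.(rand(100_000)) .== 0.5)   # ≈ 0.125
```
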
All other steps will be exactly the same as in the initial example with a continuous cdf.
draws = rand(10000);
draws_H = invH.(draws);
draws_very_few = rand(20);
draws_H_very_few = invH.(draws_very_few);
ecdf_H = ecdf(draws_H);
ecdf_H_very_few = ecdf(draws_H_very_few);
@pipe range(0,1, length=100) |> plot(_, ecdf_H.(_), label="Empirical CDF (N=10000)",
legend=:topleft)
@pipe range(0,1, length=100) |> plot!(_, ecdf_H_very_few.(_), label ="Empirical CDF (N=20)")
@pipe range(0,1, length=1000) |> plot!(_, H.(_), label ="True CDF")
As we can see, the procedure (with a sufficient number of draws) successfully replicates the discrete jump at \(0.5.\)