Pipes
Pipes are a powerful tool for chaining commands: we write a sequence of steps in which the output of one step is the input of the next. Suppose that you want to evaluate a composition of functions, \((h \circ g \circ f)(x).\)
One way to do this in Julia is to copy the formula directly, i.e. h(g(f(x))).
However, this formulation can be problematic for two reasons.
First, the code quickly becomes hard to read.
Second, adding or removing a function in the composition is cumbersome, because the closing parenthesis can be far from the function it belongs to.
To address these problems we can use pipes.
Piping is a convenient way of writing sequences of steps, not only in many programming languages (e.g. %>% in the tidyverse in R) but also in UN*X systems (with the | operator).
In Julia, pipes can be used either through the built-in |> operator from Base or through the Pipe package. In this section I am going to focus on the latter, which is the more powerful solution.
The composition \((h \circ g \circ f)(x)\) can be written in the following way (note that the functions appear in the reverse order compared with \(h \circ g \circ f\)):
@pipe x |> f(_) |> g(_) |> h(_)
In this example the pipe starts with the input x, which is passed to the function f, where it is represented by _.
Next, the result f(x) is passed as an argument (again represented by _) to the function g.
The output of g is then the input of h. The whole pipe is therefore equivalent to h(g(f(x))).
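To make the equivalence concrete, here is a minimal sketch that uses only Base Julia. The definitions of f, g and h are hypothetical stand-ins chosen for illustration; any one-argument functions would behave the same way.

```julia
# Hypothetical stand-ins for f, g and h (not part of the original example).
f(x) = x + 1
g(x) = 2x
h(x) = x^2

x = 3

nested   = h(g(f(x)))          # nested calls: read from the inside out
piped    = x |> f |> g |> h    # Base pipe: read from left to right
composed = (h ∘ g ∘ f)(x)      # Base function composition

println((nested, piped, composed))
```

All three expressions compute the same value; only the reading order differs.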
Let’s study the exponential grid from our previous example once again.
(I am aware that using collect to convert a range to an array is not necessary in this context. I keep it to illustrate several steps in one line.)
exp.( collect( range(log(1), stop=log(42), length=73) ) )
73-element Vector{Float64}:
1.0
1.0532831317160591
1.1094053555575891
1.1685179472442655
1.2307802429398609
1.2963600687379486
1.3654341930319522
1.4381888029888847
1.5148200064111028
1.5955343603388272
1.6805494278182591
1.7700943643360474
1.8644105355008187
⋮
23.727548568154436
24.991826663810592
26.323469455763327
27.726066345998433
29.203397991080454
30.7594464927957
32.3984061317844
34.12469467309464
35.94296527413145
37.85811902709873
39.87531816974189
42.00000000000001
Using pipes, we can do this in the following way:
using Pipe
@pipe range(log(1), stop=log(42), length=73) |> collect |> exp.(_)
73-element Vector{Float64}:
1.0
1.0532831317160591
1.1094053555575891
1.1685179472442655
1.2307802429398609
1.2963600687379486
1.3654341930319522
1.4381888029888847
1.5148200064111028
1.5955343603388272
1.6805494278182591
1.7700943643360474
1.8644105355008187
⋮
23.727548568154436
24.991826663810592
26.323469455763327
27.726066345998433
29.203397991080454
30.7594464927957
32.3984061317844
34.12469467309464
35.94296527413145
37.85811902709873
39.87531816974189
42.00000000000001
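For comparison, the same grid can be sketched with the Base |> operator alone. Base pipes pass the left-hand value as the single argument of the function on the right, so the broadcast step has to be wrapped in an anonymous function (or written with the broadcasting pipe .|>). The variable names below are hypothetical.

```julia
# Nested version, as in the text above
grid_nested = exp.(collect(range(log(1), stop=log(42), length=73)))

# Base pipe: the broadcast exp.(v) is wrapped in an anonymous function
grid_piped = range(log(1), stop=log(42), length=73) |> collect |> v -> exp.(v)

# Base broadcasting pipe: applies exp to every element of the vector
grid_dotpiped = range(log(1), stop=log(42), length=73) |> collect .|> exp

println(grid_piped == grid_nested)
```

The placeholder _ in @pipe avoids the anonymous function, which is the main convenience the Pipe package adds here.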
Some numerical methods operate on standardized intervals (e.g., Chebyshev interpolation).
Note
Suppose that we want to standardize our grid to the interval \([-1,1].\)
Using pipes, we can do this by adding one more step at the end while keeping the code quite clean.
@pipe range(log(1), stop=log(42), length=73) |>
collect |>
exp.(_) |>
map(x -> 2*(x-1)/(42-1) - 1, _ ) #standardizing to [-1, 1]
73-element Vector{Float64}:
-1.0
-0.997400822843119
-0.9946631533874347
-0.991779612329548
-0.9887424271736653
-0.9855434112810757
-0.9821739418033194
-0.9786249364395666
-0.974886828955556
-0.970949543398106
-0.9668024669356947
-0.9624344212519002
-0.9578336324145942
⋮
0.10866090576363097
0.17033300799076057
0.23529119296406464
0.3037105534633382
0.3757755117600221
0.45168031672174136
0.5316295674041169
0.6158387645412018
0.7045348914210465
0.7979570257121331
0.8963569838898482
1.0000000000000004
The drawback of this approach is that the endpoints of the interval are hard-coded in both the first operation, range(log(1), stop=log(42), length=73), and the last operation, map(x -> 2*(x-1)/(42-1) - 1, _ ).
To make the pipe more flexible, we can parametrize the first element, \(x_1,\) and the last element, \(x_n.\)
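The rescaling used in the last step is an affine map that sends \(x_1\) to \(-1\) and \(x_n\) to \(1.\) As a minimal sketch, independent of pipes, it can be written as a small helper; the name standardize and its argument names are hypothetical and do not appear in the original example.

```julia
# Hypothetical helper: affine map sending x₁ to -1 and xₙ to 1.
standardize(x, x₁, xₙ) = 2 * (x - x₁) / (xₙ - x₁) - 1

left  = standardize(1, 1, 42)      # left endpoint
right = standardize(42, 1, 42)     # right endpoint
mid   = standardize(21.5, 1, 42)   # midpoint of [1, 42]

println((left, right, mid))
```

With \(x_1 = 1\) and \(x_n = 42\) this is the same expression as the hard-coded 2*(x-1)/(42-1) - 1.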
The placeholder _, which represents the output of the previous step, is optional when the step is a function that takes a single argument (as with collect above).
The Pipe package also allows us to refer to individual elements of the previous step's output. For example, _[1] and _[2] refer to its first and second elements, respectively.
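A small sketch of both features, assuming the Pipe package is installed. The input vectors and variable names are made up for illustration.

```julia
using Pipe  # assumes Pipe.jl is available in the current environment

# The placeholder is optional for one-argument steps: both lines do the same thing.
with_placeholder    = @pipe [3, 1, 2] |> sort(_) |> sum(_)
without_placeholder = @pipe [3, 1, 2] |> sort |> sum

# Indexing into the previous result with _[1] and _[2]
endpoints_sum = @pipe [1 42] |> (_[1] + _[2])

println((with_placeholder, without_placeholder, endpoints_sum))
```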
In our example we can split the process of building the logarithmic grid into:
taking initial values \(x_1\) and \(x_n;\)
computing logs of those values;
building the equispaced grid for logs.
@pipe [1 42]|>
[log(_[1]) log(_[2])] |>
range(_[1], stop=_[2], length=73) |>
collect |>
exp.(_)
73-element Vector{Float64}:
1.0
1.0532831317160591
1.1094053555575891
1.1685179472442655
1.2307802429398609
1.2963600687379486
1.3654341930319522
1.4381888029888847
1.5148200064111028
1.5955343603388272
1.6805494278182591
1.7700943643360474
1.8644105355008187
⋮
23.727548568154436
24.991826663810592
26.323469455763327
27.726066345998433
29.203397991080454
30.7594464927957
32.3984061317844
34.12469467309464
35.94296527413145
37.85811902709873
39.87531816974189
42.00000000000001
This way, \(x_1\) and \(x_n\) are the very first inputs and can be changed easily.
However, the step map(x -> 2*(x-1)/(42-1) - 1, _ ) is still hard-coded at \(x_1 = 1\) and \(x_n=42.\) Admittedly, we could try to make it more flexible by using maximum and minimum in this step:
@pipe [1 42]|>
[log(_[1]) log(_[2])] |>
range(_[1], stop=_[2], length=73) |>
collect |>
exp.(_)|>
map(x -> 2*(x-minimum(_))/(maximum(_)-minimum(_)) - 1, _ )
73-element Vector{Float64}:
-1.0
-0.997400822843119
-0.9946631533874347
-0.991779612329548
-0.9887424271736653
-0.9855434112810757
-0.9821739418033194
-0.9786249364395666
-0.974886828955556
-0.970949543398106
-0.9668024669356947
-0.9624344212519002
-0.9578336324145942
⋮
0.10866090576363074
0.17033300799076034
0.23529119296406442
0.303710553463338
0.3757755117600219
0.45168031672174114
0.5316295674041167
0.6158387645412016
0.7045348914210461
0.7979570257121329
0.896356983889848
1.0
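In the pipe above, minimum(_) and maximum(_) sit inside the anonymous function passed to map, so they are evaluated again for every element of the grid. A minimal sketch of an alternative computes both extrema once with extrema from Base and then broadcasts the rescaling. The helper name standardize_grid is hypothetical, and the grid is rebuilt here with the Base |> operator only to keep the snippet self-contained.

```julia
# Hypothetical helper: rescale a vector to [-1, 1], computing the extrema once.
function standardize_grid(v)
    lo, hi = extrema(v)                       # single pass over v
    return @. 2 * (v - lo) / (hi - lo) - 1    # broadcast over all elements
end

# Exponential grid from the text, built with the Base pipe
grid = range(log(1), stop=log(42), length=73) |> collect |> v -> exp.(v)

grid_std = grid |> standardize_grid

println(extrema(grid_std))
```

Because standardize_grid takes a single argument, it can also be used as a final step in an @pipe chain without the placeholder.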
Without pipes, more Matlab-style code would look like this:
x₁ = 1
xₙ = 42
y_aux = exp.( collect( range(log(x₁), stop=log(xₙ), length=73) ) )
y_proc = 2*(y_aux .- minimum(y_aux))./(maximum(y_aux).-minimum(y_aux)) .- 1
y_aux = nothing
x₁ = nothing
xₙ = nothing
Now suppose that you want to create an equispaced grid instead, still standardized to the interval \([-1, 1].\)
With the Matlab-style syntax, this requires manual modifications inside range (removing the logs in log(x₁) and log(xₙ)) and getting rid of the exp. call.
x₁ = 1
xₙ = 42
y_aux = ( collect( range((x₁), stop=(xₙ), length=73) ) )
y_proc = 2*(y_aux .- minimum(y_aux))./(maximum(y_aux).-minimum(y_aux)) .- 1
y_aux = nothing
x₁ = nothing
xₙ = nothing
With pipes, on the other hand, we only need to comment out two lines:
@pipe [1 42]|>
# [log(_[1]) log(_[2])] |>
range(_[1], stop=_[2], length=73) |>
collect |>
# exp.(_)|>
map(x -> 2*(x-minimum(_))/(maximum(_)-minimum(_)) - 1, _ )
73-element Vector{Float64}:
-1.0
-0.9722222222222222
-0.9444444444444444
-0.9166666666666666
-0.8888888888888888
-0.8611111111111112
-0.8333333333333333
-0.8055555555555556
-0.7777777777777778
-0.75
-0.7222222222222222
-0.6944444444444444
-0.6666666666666667
⋮
0.6944444444444446
0.7222222222222223
0.75
0.7777777777777777
0.8055555555555554
0.8333333333333335
0.8611111111111112
0.8888888888888888
0.9166666666666665
0.9444444444444446
0.9722222222222223
1.0
Note
In my opinion, using pipes during exploratory analysis of our models is less error-prone than the alternative. Nonetheless, some of my friends from the private sector criticize the overuse of pipes. According to them, such code is harder to deploy to production and its execution is tougher to monitor. However, in my opinion, in contrast to industry, where code is run many times, in academia it is usually enough to run correct code once.
A very quick concluding example: pipes also work with dictionaries. The problem discussed here can be addressed with pipes too. It is not the nicest solution, but it illustrates the use of pipes with dictionaries:
equation = Dict([
("x" , collect(range(1, step=.1, length=100)) ),
("a", 1),
("b", 12),
("c", π)
])
equation["y"] = @pipe equation |> @. _["a"]*_["x"]^2 + _["b"]*_["x"] + _["c"];
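As a cross-check, the same computation can be sketched with the Base |> operator and an anonymous function in place of @pipe and _. The variable names d and y are hypothetical; the dictionary is repeated so that the snippet runs on its own.

```julia
equation = Dict([
    ("x", collect(range(1, step=0.1, length=100))),
    ("a", 1),
    ("b", 12),
    ("c", π)
])

# Base pipe: the dictionary is passed to the anonymous function as d,
# and @. broadcasts the arithmetic over the vector stored under "x".
y = equation |> d -> @. d["a"] * d["x"]^2 + d["b"] * d["x"] + d["c"]

# Spot check against a*x^2 + b*x + c at the first grid point (x = 1)
println(y[1] ≈ 1 + 12 + π)
```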