Pipes

Pipes are a very powerful tool for working with a sequence of commands. Piping consists of writing a sequence of jobs in which the output of one task is the input of the next one. Suppose that you want to evaluate a composition of several functions:

\[h \circ g\circ f (x)= h(g(f(x))).\]

One way to do this in Julia is simply to copy the formula directly, i.e., h(g(f(x))). However, this formulation can be problematic for two reasons. First, the code quickly becomes hard to read. Second, adding or removing functions from the composition can be cumbersome, because the closing parenthesis of a call may be far from the corresponding function name. To address this problem, we can use pipes. Piping is a common way of writing sequences of jobs, not only in many programming languages (e.g., %>% in R's tidyverse) but also in Unix-like systems (the | operator).

In Julia, pipes can be used either through the built-in |> operator from Base or through the Pipe package. In this section I focus on the latter, which is more powerful: Base's |> passes the previous result as the only argument of the next function, whereas the placeholder _ from Pipe lets us put it anywhere in a call. The composition \(h \circ g \circ f (x)\) can be written as follows (note that the order is reversed with respect to \(h \circ g \circ f (x)\)):

    @pipe x |> f(_) |> g(_) |> h(_)

In this example, the pipe starts with the input x, which is passed to the function f, where it is represented by _. Next, the result of f(x) is passed as an argument (again represented by _) to the function g. The output of g is, in turn, the input of h. Hence, the pipe is equivalent to h(g(f(x))).
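For comparison, here is a minimal sketch using only Base Julia, where the built-in |> operator and the composition operator ∘ express the same chain without the Pipe package. The functions f, g and h below are hypothetical stand-ins chosen only for illustration. Because Base's |> always passes the value as the single argument of the next function, no _ placeholder is needed (or available).

```julia
# Hypothetical example functions (not from the original text)
f(x) = x + 1
g(x) = 2x
h(x) = x^2

x = 3

nested   = h(g(f(x)))        # the formula copied directly
piped    = x |> f |> g |> h  # Base pipe: read left to right
composed = (h ∘ g ∘ f)(x)    # composition, written in mathematical order

println((nested, piped, composed))  # prints (64, 64, 64)
```

In the REPL, ∘ can be typed as \circ followed by Tab.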

Let’s study the exponential grid from our previous example once again. (I am aware that in this context using collect to convert the range to an array is not necessary; I keep it to illustrate several tasks in one line.)

exp.( collect( range(log(1), stop=log(42), length=73) ) )
73-element Vector{Float64}:
  1.0
  1.0532831317160591
  1.1094053555575891
  1.1685179472442655
  1.2307802429398609
  1.2963600687379486
  1.3654341930319522
  1.4381888029888847
  1.5148200064111028
  1.5955343603388272
  1.6805494278182591
  1.7700943643360474
  1.8644105355008187
  ⋮
 23.727548568154436
 24.991826663810592
 26.323469455763327
 27.726066345998433
 29.203397991080454
 30.7594464927957
 32.3984061317844
 34.12469467309464
 35.94296527413145
 37.85811902709873
 39.87531816974189
 42.00000000000001

Using pipes, we can do this in the following way:

using Pipe
@pipe range(log(1), stop=log(42), length=73) |> collect |> exp.(_)
73-element Vector{Float64}:
  1.0
  1.0532831317160591
  1.1094053555575891
  1.1685179472442655
  1.2307802429398609
  1.2963600687379486
  1.3654341930319522
  1.4381888029888847
  1.5148200064111028
  1.5955343603388272
  1.6805494278182591
  1.7700943643360474
  1.8644105355008187
  ⋮
 23.727548568154436
 24.991826663810592
 26.323469455763327
 27.726066345998433
 29.203397991080454
 30.7594464927957
 32.3984061317844
 34.12469467309464
 35.94296527413145
 37.85811902709873
 39.87531816974189
 42.00000000000001
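The same grid can also be built with Base's |> alone. In this minimal sketch, which assumes no Pipe package, the broadcast step exp.(_) is written as an anonymous function, since Base's |> only accepts a function of one argument.

```julia
# Exponential grid with Base's |> only (no Pipe package needed)
grid = range(log(1), stop=log(42), length=73) |>
       collect |>
       (v -> exp.(v))   # anonymous function plays the role of exp.(_)

# Same computation written as nested calls
reference = exp.(collect(range(log(1), stop=log(42), length=73)))
println(grid == reference)  # prints true
```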

Note

Some numerical methods operate on standardized intervals (e.g., Chebyshev interpolation).

Suppose that we want to standardize our grid, whose first and last elements are \(x_1\) and \(x_n\), to the interval \([-1,1]\):

\[x^s_i = 2\cdot \frac{x_i - x_1}{x_n - x_1} - 1\]
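As a quick check of the formula, here is a minimal sketch of it as a Julia function. The name standardize is hypothetical. It maps \(x_1\) to \(-1\), \(x_n\) to \(1\), and the midpoint of the interval to \(0\).

```julia
# Hypothetical helper implementing x^s_i = 2(x_i - x_1)/(x_n - x_1) - 1
standardize(x, x₁, xₙ) = 2 .* (x .- x₁) ./ (xₙ - x₁) .- 1

# Endpoints and midpoint of [1, 42]
println(standardize([1.0, 21.5, 42.0], 1.0, 42.0))  # prints [-1.0, 0.0, 1.0]
```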

Using pipes, we can do this by adding one more step at the end of the pipe, and the code stays quite clean.

@pipe range(log(1), stop=log(42), length=73) |> 
                                     collect |> 
                                     exp.(_) |>
            map(x -> 2*(x-1)/(42-1) - 1, _ ) #standardizing to [-1, 1]
73-element Vector{Float64}:
 -1.0
 -0.997400822843119
 -0.9946631533874347
 -0.991779612329548
 -0.9887424271736653
 -0.9855434112810757
 -0.9821739418033194
 -0.9786249364395666
 -0.974886828955556
 -0.970949543398106
 -0.9668024669356947
 -0.9624344212519002
 -0.9578336324145942
  ⋮
  0.10866090576363097
  0.17033300799076057
  0.23529119296406464
  0.3037105534633382
  0.3757755117600221
  0.45168031672174136
  0.5316295674041169
  0.6158387645412018
  0.7045348914210465
  0.7979570257121331
  0.8963569838898482
  1.0000000000000004

The drawback of this approach is that the endpoints of the interval are hard-coded both in the first operation, range(log(1), stop=log(42), length=73), and in the last one, map(x -> 2*(x-1)/(42-1) - 1, _ ).

To make the pipe more flexible, we can parametrize the first element, \(x_1\), and the last element, \(x_n\).

The placeholder _, which represents the output of the previous step, is optional when the next step is just the name of a function that takes a single argument (as with collect above).

Pipes from the Pipe package also let us refer to individual elements of the output from the previous step. For example, to access the first and second elements of the previous result, we can use _[1] and _[2], respectively.
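Base's |> has no _ placeholder, but the same indexing into the previous result can be sketched with an anonymous function whose argument (here the hypothetical name v) stands for that result:

```julia
# Base |> counterpart of the Pipe step [log(_[1]) log(_[2])]
logs = [1 42] |> (v -> [log(v[1]) log(v[2])])

println(logs)  # a 1×2 matrix holding log(1) and log(42)
```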

In our example we can split the process of building the logarithmic grid into:

  1. taking the initial values \(x_1\) and \(x_n\);

  2. computing the logs of those values;

  3. building an equispaced grid of the logs.

@pipe [1 42]|> 
            [log(_[1]) log(_[2])] |> 
range(_[1], stop=_[2], length=73) |> 
                          collect |> 
                            exp.(_)
73-element Vector{Float64}:
  1.0
  1.0532831317160591
  1.1094053555575891
  1.1685179472442655
  1.2307802429398609
  1.2963600687379486
  1.3654341930319522
  1.4381888029888847
  1.5148200064111028
  1.5955343603388272
  1.6805494278182591
  1.7700943643360474
  1.8644105355008187
  ⋮
 23.727548568154436
 24.991826663810592
 26.323469455763327
 27.726066345998433
 29.203397991080454
 30.7594464927957
 32.3984061317844
 34.12469467309464
 35.94296527413145
 37.85811902709873
 39.87531816974189
 42.00000000000001

This way, \(x_1\) and \(x_n\) are the very first inputs of the pipe and can be changed easily. However, the step map(x -> 2*(x-1)/(42-1) - 1, _ ) is still hard-coded for \(x_1 = 1\) and \(x_n = 42\). We can make it more flexible by using minimum and maximum in this step:

@pipe [1 42]|> 
            [log(_[1]) log(_[2])] |> 
range(_[1], stop=_[2], length=73) |> 
                          collect |> 
                           exp.(_)|>
map(x -> 2*(x-minimum(_))/(maximum(_)-minimum(_)) - 1, _ )
73-element Vector{Float64}:
 -1.0
 -0.997400822843119
 -0.9946631533874347
 -0.991779612329548
 -0.9887424271736653
 -0.9855434112810757
 -0.9821739418033194
 -0.9786249364395666
 -0.974886828955556
 -0.970949543398106
 -0.9668024669356947
 -0.9624344212519002
 -0.9578336324145942
  ⋮
  0.10866090576363074
  0.17033300799076034
  0.23529119296406442
  0.303710553463338
  0.3757755117600219
  0.45168031672174114
  0.5316295674041167
  0.6158387645412016
  0.7045348914210461
  0.7979570257121329
  0.896356983889848
  1.0
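One possible refinement, sketched here with Base's |> rather than Pipe: calling minimum and maximum inside the mapped anonymous function recomputes them for every element, so a helper can compute both once with Base's extrema. The helper name to_unit_interval is hypothetical.

```julia
# Hypothetical helper: rescale a vector to [-1, 1] using its own extremes
function to_unit_interval(v)
    lo, hi = extrema(v)                     # computed once, not per element
    return 2 .* (v .- lo) ./ (hi - lo) .- 1
end

grid_std = [1 42] |>
           (v -> range(log(v[1]), stop=log(v[2]), length=73)) |>
           collect |>
           (v -> exp.(v)) |>
           to_unit_interval

println(first(grid_std), " ", last(grid_std))
```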

Without pipes, more Matlab-style code would look like this:

x₁ = 1
xₙ = 42

y_aux   = exp.( collect( range(log(x₁), stop=log(xₙ), length=73) ) )
y_proc  = 2*(y_aux .- minimum(y_aux))./(maximum(y_aux).-minimum(y_aux)) .- 1

y_aux = nothing
x₁    = nothing
xₙ    = nothing
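Assigning nothing lets the arrays be garbage-collected, but the names stay defined in the global scope. An alternative, shown here as a sketch rather than the author's approach, is a let block: its local variables disappear once the block ends, so only the result is kept.

```julia
# Temporaries live only inside the let block; only y_proc is bound afterwards
y_proc = let x₁ = 1, xₙ = 42
    y_aux = exp.(collect(range(log(x₁), stop=log(xₙ), length=73)))
    2 * (y_aux .- minimum(y_aux)) ./ (maximum(y_aux) - minimum(y_aux)) .- 1
end

println(extrema(y_proc))
```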

Now suppose that you want to create an equispaced grid instead, still standardized to the interval \([-1, 1]\). With the Matlab-style syntax, this requires manual modifications inside range (removing the logs in log(x₁) and log(xₙ)) and removing the broadcast call exp.( ... ):

x₁ = 1
xₙ = 42

y_aux   = ( collect( range((x₁), stop=(xₙ), length=73) ) )
y_proc  = 2*(y_aux .- minimum(y_aux))./(maximum(y_aux).-minimum(y_aux)) .- 1

y_aux = nothing
x₁    = nothing
xₙ    = nothing

With pipes, on the other hand, we only need to comment out two lines:

@pipe [1 42]|> 
            # [log(_[1]) log(_[2])] |> 
range(_[1], stop=_[2], length=73) |> 
                          collect |> 
                        #    exp.(_)|>
map(x -> 2*(x-minimum(_))/(maximum(_)-minimum(_)) - 1, _ )
73-element Vector{Float64}:
 -1.0
 -0.9722222222222222
 -0.9444444444444444
 -0.9166666666666666
 -0.8888888888888888
 -0.8611111111111112
 -0.8333333333333333
 -0.8055555555555556
 -0.7777777777777778
 -0.75
 -0.7222222222222222
 -0.6944444444444444
 -0.6666666666666667
  ⋮
  0.6944444444444446
  0.7222222222222223
  0.75
  0.7777777777777777
  0.8055555555555554
  0.8333333333333335
  0.8611111111111112
  0.8888888888888888
  0.9166666666666665
  0.9444444444444446
  0.9722222222222223
  1.0
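As a sanity check, standardizing an equispaced grid should just give an equispaced grid on \([-1, 1]\) with the same number of points, so the output above should match range(-1, stop=1, length=73), whose step is 2/72 = 1/36 ≈ 0.0278. A sketch of that comparison with Base's |>:

```julia
# Equispaced grid on [1, 42], standardized to [-1, 1] (Base |> only)
equi_std = [1 42] |>
           (v -> range(v[1], stop=v[2], length=73)) |>
           collect |>
           (v -> 2 .* (v .- minimum(v)) ./ (maximum(v) - minimum(v)) .- 1)

target = collect(range(-1, stop=1, length=73))
println(equi_std ≈ target)
```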

Note

In my opinion, using pipes during exploratory analysis of our models is less prone to certain coding errors than the alternative. Nonetheless, some of my friends from the private sector criticize the overuse of pipes. According to them, such code is harder to deploy to production, and its execution is harder to monitor. However, unlike in industry, where code is run many times, in academia it is usually enough to run the correct code once.

A very quick concluding example: pipes can also be used with dictionaries. The problem discussed here can be addressed with pipes too. It is not the nicest solution, but it illustrates how pipes work on dictionaries:

equation  = Dict([
                    ("x" , collect(range(1, step=.1, length=100)) ),
                    ("a", 1),
                    ("b", 12),
                    ("c", π)
                ])

equation["y"] = @pipe equation |>  @. _["a"]*_["x"]^2 + _["b"]*_["x"] + _["c"];
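For comparison, a sketch of the same computation without the Pipe package: Base's |> passes the dictionary to an anonymous function, and @. broadcasts the arithmetic over the vector stored under "x". The dictionary is rebuilt so that the block runs on its own.

```julia
# Rebuild the dictionary from the text so the example is self-contained
equation = Dict([
    ("x", collect(range(1, step=.1, length=100))),
    ("a", 1),
    ("b", 12),
    ("c", π),
])

# Base pipe: d is the dictionary; @. broadcasts a*x^2 + b*x + c over x
equation["y"] = equation |> (d -> @. d["a"] * d["x"]^2 + d["b"] * d["x"] + d["c"])

println(length(equation["y"]))
```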