The Surprising and Unfortunate Precedence of the Pipe Operator in R

I love the pipe operator in R. It is hard to imagine using R without it. Before R had a native pipe operator (|>), we had magrittr’s pipe operator (%>%), which I would always load if my analysis involved tidyverse packages such as dplyr and tidyr. Piping encourages a style of programming that is particularly suited to data preparation, exploration, and modelling. Instead of ending up with either a lot of new variables in the environment or a sea of parentheses, we end up with a set of instructions neatly chained together by |>. One could argue the resulting pipeline reads like a paragraph describing the operations required to move from one object in our analysis to the next.

However, the pipe operator has a flaw. To understand why, we need to look at the magrittr pipe. The %>% operator is a user-defined infix operator of the form %any%, and R’s parser gives every such operator the same fixed precedence: lower than :, but higher than * and /. Think of precedence as the programming equivalent of order of operations. We can confirm it by defining a custom operator %add% that takes the input on the left and adds the number on the right. Unlike +, which has lower precedence than *, this new operator has higher precedence than *, so 2*3%add%1 is evaluated as 2*(3+1) rather than (2*3)+1.

`%add%` <- \(x, y) x + y  # the \(x) lambda shorthand needs R 4.1 or later
2*3+1

[1] 7

2*3%add%1

[1] 8

The precedence of infix operators like %add% and magrittr’s pipe can be narrowed down to somewhere between : and *. The following confirms that : has a higher precedence than %add%: the sequence 2:3 is built first, and then 1 is added to each element.

2:3%add%1

[1] 3 4

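We can also inspect the abstract syntax tree (AST) with lobstr::ast(). The operator at the root of the tree is the one applied last, so it is the one with the lower precedence.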

lobstr::ast(2:3%add%1)

█─`%add%` 
├─█─`:` 
│ ├─2 
│ └─3 
└─1

lobstr::ast(2*3%add%1)

█─`*` 
├─2 
└─█─`%add%` 
  ├─3 
  └─1
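
If lobstr is not installed, base R can give the same information. The following is a minimal sketch that relies only on quote(), which returns the parsed but unevaluated call. The first element of a call object is the function at the root of the tree. The variable names colon_call and times_call are mine, chosen for illustration.

```r
# quote() parses without evaluating, so %add% does not need to be defined here.
colon_call <- quote(2:3 %add% 1)
times_call <- quote(2 * 3 %add% 1)

# The first element of a call object is the function at the root of the AST.
colon_call[[1]]  # `%add%`: the sequence 2:3 is grouped first
times_call[[1]]  # `*`: 3 %add% 1 is grouped first

# The third element is the right-hand operand of the root call.
times_call[[3]]  # 3 %add% 1
```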

The base R pipe operator adopted the same precedence as the magrittr pipe. This, I think, is unfortunate. Consider a situation where we would like to start a pipe chain with a mathematical expression. Perhaps we square the elements of a matrix and then do some further data processing. If you were not aware of the pipe operator’s precedence, you might think something like the following would work. Wrapping the code in expression() shows how R actually parses it: only the second mat is piped.

expression(
  mat * mat |>
    f1() |>
    f2() # |> ...
)

expression(mat * f2(f1(mat)))

Instead, we must write the following.

expression(
  (mat * mat) |>
    f1() |>
    f2() # |> ...
)

expression(f2(f1((mat * mat))))
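
To see the difference with actual numbers, here is a small sketch. It assumes a 2 x 2 matrix and uses sum() in place of the placeholder f1(); neither choice comes from the example above, they are just convenient for checking the arithmetic by hand.

```r
# sum() stands in for the placeholder f1(); any function would do.
mat <- matrix(1:4, nrow = 2)

# Parsed as mat * sum(mat): only the right-hand mat is piped.
without_parens <- mat * mat |> sum()

# Parsed as sum(mat * mat): the whole product is piped.
with_parens <- (mat * mat) |> sum()

without_parens  # a 2 x 2 matrix: each element of mat multiplied by 10
with_parens     # 30
```

Both versions run without error or warning, which is what makes the mistake easy to miss.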

Is this a big deal? Probably not if you are aware of it. Then it is just an inconvenience. However, even after being made aware of this, it does seem counter-intuitive and a bit disappointing. The current precedence does buy us something: statements like the following work without parentheses. But I don’t expect people to write them.

expression(x |> f1() |> f2() * y |> f2())

expression(f2(f1(x)) * f2(y))

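On the other hand, I think someone might expect to be able to start a pipe with a computation or formula without having to add parentheses. The developers of the magrittr pipe were limited in their choice of precedence, because every %any% operator shares the same one. The base R pipe, however, is built into the parser and was not bound by this limitation. In my ideal version of R the pipe operator has precedence between ~ and ->, that is, lower than the arithmetic, comparison, logical, and formula operators, but still higher than assignment.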
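
The same surprise applies to every operator that sits below the pipe in the precedence table. The sketch below uses deparse(quote(...)) to show the call R builds in each case. The names x, y, and d are placeholders that are never evaluated, and the result variables are mine.

```r
# deparse(quote(...)) shows the call R builds after parsing the pipe.
arith   <- deparse(quote(1 + 2 |> sqrt()))        # "1 + sqrt(2)"
compare <- deparse(quote(x > 1 |> any()))         # "x > any(1)"
formula <- deparse(quote(y ~ x |> lm(data = d)))  # "y ~ lm(x, data = d)"

# Parentheses restore the intended reading.
fixed <- deparse(quote((y ~ x) |> lm(data = d)))  # "lm((y ~ x), data = d)"
```

With a precedence just below ~, the third line would instead parse as a call to lm() on the formula y ~ x, with no parentheses needed.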

In reality, we are left with a pipe that sometimes requires parentheses when it doesn’t seem like it should be necessary, while allowing us to drop parentheses in situations where we wouldn’t dare to anyway. However, I think there is a silver lining. The weird precedence of the pipe operator encourages us to take care in how we start a pipe. Starting with an object is a safe approach and this fits into the piping paradigm. A pipe chain describes how to move from object A to object B using a series of functions. Thus we solve this problem with more pipes, which can surely only be a good thing.

expression(
  mat |>
    (\(m)m*m)() |>
    f1() |>
    f2() # |> ...
)

expression(f2(f1((function(m) m * m)(mat))))
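
If the inline anonymous function feels noisy, a named helper does the same job. This is only a sketch: square() is a hypothetical helper I am defining here, and sum() and sqrt() stand in for the placeholders f1() and f2().

```r
# square() is a hypothetical helper; sum() and sqrt() stand in for f1() and f2().
square <- \(m) m * m

mat <- matrix(1:4, nrow = 2)

# The chain starts with an object, so precedence never comes into play.
result <- mat |>
  square() |>
  sum() |>
  sqrt()

result  # the square root of 30
```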

Maybe there is another silver lining – the behaviour of the pipe operator provides an endless supply of “R puzzles”.