Mistakes I have made. And made. And made again. Expect this page to grow as I (re)discover more gotchas.
library(purrr)
magrittr dot tension
The tidyverse takes its dot . pronoun from the magrittr package. It means “the thing we are operating on” and is also known as the “argument placeholder”.
You don’t need the dot when you’re using pipe-friendly functions and the planets align for you:
8 %>% log2()
#> [1] 3
## is same as
log2(8)
#> [1] 3
But sometimes the thing you’re passing into the right-hand side (RHS) is not the first argument:
2 %>% log(8)
#> [1] 0.3333333
## is not what I want and is not the same as
2 %>% log(8, .)
#> [1] 3
## or
2 %>% log(8, base = .)
#> [1] 3
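The same placement rule applies to any function whose data argument is not first. As a small sketch (the input string and the choice of gsub() are illustrative, not from the examples here), the dot routes the left-hand side into the third argument:

```r
library(magrittr)

# gsub(pattern, replacement, x): the data argument x comes third,
# so the dot tells the pipe where the left-hand side should go.
snake <- "dot-tension-example" %>% gsub("-", "_", .)
snake

# Naming the argument works too and makes the intent explicit.
snake_named <- "dot-tension-example" %>% gsub("-", "_", x = .)
identical(snake, snake_named)
```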
And sometimes you want to prevent the left-hand side from being used as the (invisible) first argument on the RHS. To do that, enclose the RHS in curly braces:
iris %>% {
  c(rows = nrow(.), cols = ncol(.))
}
#> rows cols
#>  150    5
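Why the braces matter: as documented for magrittr, when the dot appears only inside a nested call such as nrow(.), the left-hand side is still inserted as the first argument. A minimal sketch of the difference, using a made-up integer vector v:

```r
library(magrittr)

v <- 1:3

# The dot sits only inside the nested call length(.), so the pipe still
# passes v as the first argument: this evaluates c(v, length(v)).
without_braces <- v %>% c(length(.))

# Braces switch off the first-argument insertion: this evaluates c(length(v)).
with_braces <- v %>% { c(length(.)) }

without_braces
with_braces
```

The first result has four elements and the second has one, which is easy to miss when the RHS is a larger expression.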
One last thing … and this leads to the gotcha. The . can also be used to create a unary function:
att <- . %>% toupper() %>% paste("ALL THE THINGS!")
"open source" %>% att()
#> [1] "OPEN SOURCE ALL THE THINGS!"
"butter" %>% att()
#> [1] "BUTTER ALL THE THINGS!"
"teach" %>% att()
#> [1] "TEACH ALL THE THINGS!"
What is att anyway?
att
#> Functional sequence with the following components:
#>
#> 1. toupper(.)
#> 2. paste(., "ALL THE THINGS!")
#>
#> Use 'functions' to extract the individual functions.
It is a “functional sequence”.
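A short sketch of what that means in practice: a functional sequence is a regular function carrying an extra class, so it can be called without a pipe, and magrittr's functions() returns its steps as a list.

```r
library(magrittr)

att <- . %>% toupper() %>% paste("ALL THE THINGS!")

# A functional sequence is an ordinary function with an extra class.
class(att)
is.function(att)

# So it can be called directly, no pipe required.
direct <- att("teach")
direct

# functions() is exported by magrittr and returns the component
# functions as a list, one per step of the pipeline.
steps <- functions(att)
length(steps)
```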
It’s fairly easy to write code where you think . is a placeholder, but it actually generates a functional sequence.
Watch me.
library(purrr)
library(tibble)
x <- list(list(int = 1L, chr = 'a'), list(int = 2L, chr = 'b'))
## YES GOOD WORKS
x %>% {
  tibble(id = map_int(., "int"),
         chr = map_chr(., "chr"))
}
#> # A tibble: 2 x 2
#>      id chr
#>   <int> <chr>
#> 1     1 a
#> 2     2 b
## NO BAD DOES NOT WORK
x %>% {
  tibble(id = . %>% map_int("int"),
         chr = . %>% map_chr("chr"))
}
#> All columns in a tibble must be 1d or 2d objects:
#> * Column `id` is fseq
#> * Column `chr` is fseq
What went wrong?
The expression . %>% map_int("int") built a unary function instead of passing x into map_int(). Do not start a pipeline with . unless you want a unary function.
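For contrast, here is a sketch of the case where a leading dot is deliberate. The helper names get_id and get_chr are hypothetical; the point is that a functional sequence has to be called on the data before its result can become a column.

```r
library(magrittr)
library(purrr)
library(tibble)

x <- list(list(int = 1L, chr = "a"), list(int = 2L, chr = "b"))

# Here the leading dot is intentional: each line defines a reusable
# one-argument function (hypothetical helper names).
get_id  <- . %>% map_int("int")
get_chr <- . %>% map_chr("chr")

# The helpers are then *called* on x, so tibble() receives vectors,
# not functions.
df <- tibble(id = get_id(x), chr = get_chr(x))
df
```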
What does this have to do with purrr?
If you’ve got a complicated object x (e.g., a deeply nested list from JSON), you might build a data frame with repeated calls to map_*() functions. Be careful where you put your dot .!
purrr is strict about types
purrr’s type checking is very strict, which is overwhelmingly positive. But it will force you to be more aware of integer vs. double.
set.seed(4561)
(x <- sample(1:5))
#> [1] 4 5 1 2 3
times_two <- function(x) x * 2
times_two(x)
#> [1] 8 10 2 4 6
x_list <- as.list(x)
## WTF?
x_list %>%
  map_int(times_two)
#> Error: Can't coerce element 1 from a double to a integer
Why can I suddenly not multiply these numbers by 2?
Because we’ve said to expect integers back and, though the elements of x are integer, the result of multiplying by the double 2 is double.
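The promotion is easy to confirm in base R with typeof(). This sketch uses a stand-in value int_val rather than the sampled x:

```r
# Base R only: the literal 2 is a double, 2L is an integer.
typeof(2)
typeof(2L)

int_val <- 4L

# integer * double is promoted to double ...
type_with_double <- typeof(int_val * 2)

# ... while integer * integer stays integer.
type_with_integer <- typeof(int_val * 2L)

type_with_double
type_with_integer
```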
What can you do? Buckle down and make sure that integer stays integer, if that’s appropriate. Or loosen up and use map_dbl() instead.
## GOOD, in the buckle down sense
times_two <- function(x) x * 2L
x_list %>%
  map_int(times_two)
#> [1] 8 10 2 4 6
## GOOD, in the loosen up sense
times_two <- function(x) x * 2
x_list %>%
  map_dbl(times_two)
#> [1] 8 10 2 4 6
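A third possibility, not one of the two options described here, is to leave times_two() alone and convert each result back to integer inside the mapped function. This is only a sketch, with the values of x hard-coded so that it does not depend on the random seed:

```r
library(purrr)

# Same values as x, hard-coded instead of sampled.
x_list <- as.list(c(4L, 5L, 1L, 2L, 3L))
times_two <- function(x) x * 2

# Convert each double result to integer before map_int() checks its type.
doubled <- map_int(x_list, function(el) as.integer(times_two(el)))
doubled
typeof(doubled)
```

Note that as.integer() silently truncates fractional values, so this is only appropriate when the results are known to be whole numbers.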