Load purrr and repurrrsive, which contains recursive list examples. If you’re just jumping in here, the example datasets are introduced elsewhere, including via interactive listviewer widgets.
library(purrr)
library(repurrrsive)
map() overview
Recall the usage of purrr’s core map() function:
map(.x, .f, ...)
map(VECTOR_OR_LIST_INPUT, FUNCTION_TO_APPLY, OPTIONAL_OTHER_STUFF)
You can provide further arguments via ..., but you don’t have to. The above expands to something like this:
res <- vector(mode = "list", length = length(.x))
res[[1]] <- .f(.x[[1]], ...)
res[[2]] <- .f(.x[[2]], ...)
## and so on, until the end of .x
res
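To make the expansion concrete, here is a minimal sketch that writes the loop out by hand and compares it with map(). The input x and the choice of sum() are illustrative assumptions, not part of the tutorial's data.

```r
library(purrr)

# Hypothetical toy input: a list of two integer vectors.
x <- list(1:3, 4:6)

# Hand-written equivalent of map(x, sum): pre-allocate a list,
# then fill one element per element of the input.
res <- vector(mode = "list", length = length(x))
for (i in seq_along(x)) {
  res[[i]] <- sum(x[[i]])
}

# The same computation via map().
mapped <- map(x, sum)

identical(res, mapped)
```

Both approaches should produce the same list, which is the point of the expansion above.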
Note that any additional arguments provided via ... are used “as is” in each call to .f. In other words, map() is not vectorized over these arguments. If you need that, check out map2(), pmap(), and friends.
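A small sketch of the difference, using made-up inputs (x and seps are hypothetical): with map_chr() the extra argument is the same in every call, while map2_chr() lets it vary element by element.

```r
library(purrr)

# Hypothetical toy inputs.
x <- list(c("a", "b"), c("c", "d"))
seps <- c("-", "+")

# collapse = "-" is passed unchanged to every call of paste().
fixed <- map_chr(x, paste, collapse = "-")

# map2_chr() walks over x and seps in parallel, so each element
# of x is collapsed with its own separator.
varied <- map2_chr(x, seps, ~ paste(.x, collapse = .y))

fixed
varied
```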
map() function specification
One of the main reasons to use purrr is the flexible and concise syntax for specifying .f, the function to apply.
The shortcuts for extracting by name and position are covered thoroughly elsewhere and won’t be repeated here.
We demonstrate three more ways to specify general .f: an existing function, an anonymous function written in the conventional way, and an anonymous function written as a formula.
We work with the Game of Thrones character list, got_chars. Each character can have aliases, which are stored in a vector in each character’s component. We pull out the aliases for three characters to use as our demo.
aliases <- set_names(map(got_chars, "aliases"), map_chr(got_chars, "name"))
(aliases <- aliases[c("Theon Greyjoy", "Asha Greyjoy", "Brienne of Tarth")])
#> $`Theon Greyjoy`
#> [1] "Prince of Fools" "Theon Turncloak" "Reek" "Theon Kinslayer"
#>
#> $`Asha Greyjoy`
#> [1] "Esgred" "The Kraken's Daughter"
#>
#> $`Brienne of Tarth`
#> [1] "The Maid of Tarth" "Brienne the Beauty" "Brienne the Blue"
Use a pre-existing function. Or, as here, define one ourselves, which gives a nice way to build in our specification for the collapse argument.
my_fun <- function(x) paste(x, collapse = " | ")
map(aliases, my_fun)
#> $`Theon Greyjoy`
#> [1] "Prince of Fools | Theon Turncloak | Reek | Theon Kinslayer"
#>
#> $`Asha Greyjoy`
#> [1] "Esgred | The Kraken's Daughter"
#>
#> $`Brienne of Tarth`
#> [1] "The Maid of Tarth | Brienne the Beauty | Brienne the Blue"
Define an anonymous function on the fly, in the conventional way. Here we put our desired value for the collapse argument into the function definition itself.
map(aliases, function(x) paste(x, collapse = " | "))
#> $`Theon Greyjoy`
#> [1] "Prince of Fools | Theon Turncloak | Reek | Theon Kinslayer"
#>
#> $`Asha Greyjoy`
#> [1] "Esgred | The Kraken's Daughter"
#>
#> $`Brienne of Tarth`
#> [1] "The Maid of Tarth | Brienne the Beauty | Brienne the Blue"
Alternatively, you can simply name the function and provide collapse via ....
map(aliases, paste, collapse = " | ")
#> $`Theon Greyjoy`
#> [1] "Prince of Fools | Theon Turncloak | Reek | Theon Kinslayer"
#>
#> $`Asha Greyjoy`
#> [1] "Esgred | The Kraken's Daughter"
#>
#> $`Brienne of Tarth`
#> [1] "The Maid of Tarth | Brienne the Beauty | Brienne the Blue"
We saved possibly the best for last.
purrr provides a very concise way to define an anonymous function: as a formula. This should start with the ~ symbol and then look like a typical top-level expression, as you might write in a script. Use .x to refer to the input, i.e. an individual element of the primary vector or list.
map(aliases, ~ paste(.x, collapse = " | "))
#> $`Theon Greyjoy`
#> [1] "Prince of Fools | Theon Turncloak | Reek | Theon Kinslayer"
#>
#> $`Asha Greyjoy`
#> [1] "Esgred | The Kraken's Daughter"
#>
#> $`Brienne of Tarth`
#> [1] "The Maid of Tarth | Brienne the Beauty | Brienne the Blue"
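One way to see what the formula shortcut does is to convert it to a function yourself with purrr's as_mapper(). The sketch below uses a hypothetical toy list in place of aliases and checks that the formula, the converted function, and a conventional anonymous function all agree.

```r
library(purrr)

# Hypothetical toy input, standing in for the aliases list.
toy <- list(a = c("x", "y"), b = "z")

# as_mapper() turns a formula into an ordinary function of .x.
collapse_bar <- as_mapper(~ paste(.x, collapse = " | "))

via_formula  <- map_chr(toy, ~ paste(.x, collapse = " | "))
via_mapper   <- map_chr(toy, collapse_bar)
via_function <- map_chr(toy, function(x) paste(x, collapse = " | "))

via_formula
```

The three results should be identical named character vectors, so the choice between them is a matter of readability.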
It’s rare to write these calls perfect and whole the first time. You should probably pilot your idea on a single element. Then drop your proven, working logic into one of the above templates. When things aren’t working as expected, consider: have you tried to skip too many steps? Pull out an example, get everything to work there, check it on another example, then scale back up again.
A development process for the above might look like this:
(a <- map(got_chars, "aliases")[[19]]) ## OOPS! NULL --> a useless example
#> NULL
(a <- map(got_chars, "aliases")[[16]]) ## ok good
#> [1] "Bran" "Bran the Broken" "The Winged Wolf"
paste(a, sep = " | ") ## OOPS! not what I want
#> [1] "Bran" "Bran the Broken" "The Winged Wolf"
paste(a, collapse = " | ") ## ok good
#> [1] "Bran | Bran the Broken | The Winged Wolf"
got_chars[15:17] %>% ## I am a programming god
  map("aliases") %>%
  map_chr(paste, collapse = " | ")
#> [1] "Varamyr Sixskins | Haggon | Lump"
#> [2] "Bran | Bran the Broken | The Winged Wolf"
#> [3] "The Maid of Tarth | Brienne the Beauty | Brienne the Blue"
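The sep versus collapse slip in the session above is easy to make, so here is a short base R sketch of the difference. It reuses the Bran aliases shown above; the two-argument call with "Stark" is an illustrative addition.

```r
# The aliases vector from the pilot example above.
a <- c("Bran", "Bran the Broken", "The Winged Wolf")

# sep is placed between *separate arguments* of paste(). With a
# single vector argument there is nothing to join, so the vector
# comes back unchanged.
with_sep <- paste(a, sep = " | ")

# collapse joins the *elements* of the result into one string.
with_collapse <- paste(a, collapse = " | ")

# sep does its job once there are two arguments to combine.
two_args <- paste("Bran", "Stark", sep = " | ")

length(with_sep)
length(with_collapse)
two_args
```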
Since we’ve simplified the aliases to a single string for each character, we can hold them as an atomic character vector instead of as a list. Wouldn’t it be nice to put that in a data frame, with another variable holding the names? The enframe() function from tibble takes a named vector and promotes the names to a proper variable.
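As a quick illustration of enframe() in isolation, here is a sketch on a hypothetical named vector (the names first and second and their values are made up for the example).

```r
library(tibble)

# Hypothetical named character vector, one collapsed string per name.
collapsed <- c(first = "a | b", second = "c")

# The names become the `name` column; the values go into a column
# that we choose to call `aliases` via the `value` argument.
df <- enframe(collapsed, value = "aliases")

df
```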
From the top, using four characters to conserve space:
aliases <- set_names(map(got_chars, "aliases"), map_chr(got_chars, "name"))
map_chr(aliases[c(3, 10, 20, 24)], ~ paste(.x, collapse = " | ")) %>%
  tibble::enframe(value = "aliases")
#> # A tibble: 4 x 2
#> name aliases
#> <chr> <chr>
#> 1 Victarion Grey… The Iron Captain
#> 2 Davos Seaworth Onion Knight | Davos Shorthand | Ser Onions | Onion Lord…
#> 3 Eddard Stark Ned | The Ned | The Quiet Wolf
#> 4 Aeron Greyjoy The Damphair | Aeron Damphair
An alternative way to get the same data frame:
tibble::tibble(
  name = map_chr(got_chars, "name"),
  aliases = got_chars %>%
    map("aliases") %>%
    map_chr(~ paste(.x, collapse = " | "))
) %>%
  dplyr::slice(c(3, 10, 20, 24))
#> # A tibble: 4 x 2
#> name aliases
#> <chr> <chr>
#> 1 Victarion Grey… The Iron Captain
#> 2 Davos Seaworth Onion Knight | Davos Shorthand | Ser Onions | Onion Lord…
#> 3 Eddard Stark Ned | The Ned | The Quiet Wolf
#> 4 Aeron Greyjoy The Damphair | Aeron Damphair
This is a very typical workflow: take an unwieldy nested list and, via extraction and/or simplification, produce a more approachable data frame.
These are the different ways to specify the function .f in the map()-type functions in purrr.
map(aliases, function(x) paste(x, collapse = " | "))
map(aliases, paste, collapse = " | ")
map(aliases, ~ paste(.x, collapse = " | "))
Each character can be allied with one of the houses (or with several or with zero). These allegiances are held as a vector in each character’s component.
Create a list allegiances that holds the characters’ house affiliations.
Create a character vector nms that holds the characters’ names.
Apply the names in nms to the allegiances list via set_names.
We said that any additional arguments passed via ... would be used “as is”. Specifically they are not used in a vectorized fashion. What happens if you pass collapse = c(" | ", " * ")? Why is that?
map2()
What if you need to map a function over two vectors or lists in parallel?
You can use map2() for that. Here is the usage:
map2(.x, .y, .f, ...)
map2(INPUT_ONE, INPUT_TWO, FUNCTION_TO_APPLY, OPTIONAL_OTHER_STUFF)
map2() has all the type-specific friends you would expect: map2_chr(), map2_lgl(), etc.
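For instance, a minimal sketch of two of the typed variants on made-up numeric inputs (x and y are hypothetical):

```r
library(purrr)

# Hypothetical toy inputs of equal length.
x <- c(1, 5, 3)
y <- c(2, 4, 3)

# map2_dbl() returns a double vector, one value per pair.
sums <- map2_dbl(x, y, ~ .x + .y)

# map2_lgl() returns a logical vector, one value per pair.
bigger <- map2_lgl(x, y, ~ .x > .y)

sums
bigger
```

Each variant returns an atomic vector of the stated type and errors if a result cannot be stored that way.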
How will we specify the function to apply? All the usual options are open.
What shall our example be? Each character has a free text field, giving the date and possibly location of his or her birth. Let’s paste that together with the character’s name to get a sentence.
First, obtain the two inputs.
nms <- got_chars %>%
  map_chr("name")
birth <- got_chars %>%
  map_chr("born")
Now map over both with an existing function, defined by us.
my_fun <- function(x, y) paste(x, "was born", y)
map2_chr(nms, birth, my_fun) %>% head()
#> [1] "Theon Greyjoy was born In 278 AC or 279 AC, at Pyke"
#> [2] "Tyrion Lannister was born In 273 AC, at Casterly Rock"
#> [3] "Victarion Greyjoy was born In 268 AC or before, at Pyke"
#> [4] "Will was born "
#> [5] "Areo Hotah was born In 257 AC or before, at Norvos"
#> [6] "Chett was born At Hag's Mire"
Anonymous function, conventional form.
map2_chr(nms, birth, function(x, y) paste(x, "was born", y)) %>% head()
#> [1] "Theon Greyjoy was born In 278 AC or 279 AC, at Pyke"
#> [2] "Tyrion Lannister was born In 273 AC, at Casterly Rock"
#> [3] "Victarion Greyjoy was born In 268 AC or before, at Pyke"
#> [4] "Will was born "
#> [5] "Areo Hotah was born In 257 AC or before, at Norvos"
#> [6] "Chett was born At Hag's Mire"
Anonymous function via formula. Use .x and .y to refer to the individual elements of the two primary inputs.
map2_chr(nms[16:18], birth[16:18], ~ paste(.x, "was born", .y)) %>% tail()
#> [1] "Brandon Stark was born In 290 AC, at Winterfell"
#> [2] "Brienne of Tarth was born In 280 AC"
#> [3] "Catelyn Stark was born In 264 AC, at Riverrun"
pmap()
What if you need to map a function over two or more vectors or lists in parallel?
You can use pmap() for that. Here is the usage:
pmap(.l, .f, ...)
pmap(LIST_OF_INPUT_LISTS, FUNCTION_TO_APPLY, OPTIONAL_OTHER_STUFF)
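In the example below, the list of inputs is a data frame: its columns are the input lists, and pmap_chr() matches them to the arguments of my_fun by name.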
df <- got_chars %>% {
  tibble::tibble(
    name = map_chr(., "name"),
    aliases = map(., "aliases"),
    allegiances = map(., "allegiances")
  )
}
my_fun <- function(name, aliases, allegiances) {
  paste(name, "has", length(aliases), "aliases and",
        length(allegiances), "allegiances")
}
df %>%
  pmap_chr(my_fun) %>%
  tail()
#> [1] "Kevan Lannister has 1 aliases and 1 allegiances"
#> [2] "Melisandre has 5 aliases and 0 allegiances"
#> [3] "Merrett Frey has 1 aliases and 1 allegiances"
#> [4] "Quentyn Martell has 4 aliases and 1 allegiances"
#> [5] "Samwell Tarly has 7 aliases and 1 allegiances"
#> [6] "Sansa Stark has 3 aliases and 2 allegiances"
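To isolate how pmap() pairs inputs with function arguments, here is a small sketch on a hypothetical named list of numeric vectors (inputs and f are made up for illustration). Named inputs are matched to arguments by name, so their order does not matter; unnamed inputs are matched by position.

```r
library(purrr)

# Hypothetical toy inputs: three parallel numeric vectors.
inputs <- list(x = c(1, 2, 3), y = c(4, 5, 6), z = c(7, 8, 9))

# Hypothetical function whose argument names match the list names.
f <- function(x, y, z) x + y * z

# Matched by name.
by_name <- pmap_dbl(inputs, f)

# Reordering the named list does not change the result.
shuffled <- pmap_dbl(inputs[c("z", "x", "y")], f)

# Without names, inputs are matched by position instead.
by_position <- pmap_dbl(unname(inputs), f)

by_name
```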