Atomic vectors

It is useful to understand lists as a data structure that generalizes atomic vectors. So we really need to start there.

The garden variety R object is an atomic vector like these:

(v_log <- c(TRUE, FALSE, FALSE, TRUE))
#> [1]  TRUE FALSE FALSE  TRUE
(v_int <- 1:4)
#> [1] 1 2 3 4
(v_doub <- 1:4 * 1.2)
#> [1] 1.2 2.4 3.6 4.8
(v_char <- letters[1:4])
#> [1] "a" "b" "c" "d"

Atomic vectors are homogeneous. Each atom has the same flavor, by which I roughly mean type or storage mode, and is a scalar, by which I mean “has length one”. The above examples cover the most common flavors of R vectors (logical, integer, double, character), though you will eventually encounter more exotic ones.

Exercises

  1. Define the vectors above or similar. Use the family of is.*() functions to confirm vector type, e.g. is.logical(). You will need to guess or look some of them up. Long-term, you may wish to explore the rlang::is_*() family of functions.
  2. What do is.numeric(), is.integer(), and is.double() return for the vectors that hold floating point number versus integers?

You can construct a vector “by hand” with the c() function. We used it above to construct the logical vector. All the other vectors came about through other means and this is indicative of real life: most vectors aren’t made explicitly with c(). They tend to be created with some generator, like the 1:n shortcut, or via transformation of an existing object.

To “index a vector” means to address specific elements or atoms, either for reading or writing. We index a vector using square brackets, like so: x[something]. There are several ways to express which elements you want, i.e. there are several valid forms for something:

  • logical vector: keep elements of x for which something is TRUE and drop those for which it’s FALSE

    v_char[c(FALSE, FALSE, TRUE, TRUE)]
    #> [1] "c" "d"
    v_char[v_log]
    #> [1] "a" "d"
  • integer vector, all positive: the elements specified in something are kept
  • negative integers, all negative: the elements specified in something are dropped

    v_doub[2:3]
    #> [1] 2.4 3.6
    v_char[-4]
    #> [1] "a" "b" "c"
  • character vector: presumes that x is a named vector and the elements whose names are specified in something are kept not shown here, since none of our vectors are named

Exercises

  1. What happens when you request the zero-th element of one of our vectors?
  2. What happens when you ask for an element that is past the end of the vector, i.e. request x[k] when the length of x is less than k?
  3. We indexed a vector x with a vector of positive integers that is shorter than x. What happens if the indexing vector is longer than x?
  4. We indexed x with a logical vector of the same length. What happen if the indexing vector is shorter than x?

Do the exercises and you’ll see it’s possible to get an atomic vector of length zero and also to get elements that are NA. Notice that, in both of these scenarios, the underlying variable type is retained.

v_int[0]
#> integer(0)
typeof(v_int[0])
#> [1] "integer"
v_doub[100]
#> [1] NA
typeof(v_doub[100])
#> [1] "double"

Yes, there are different flavors of NA!

Coercion

Even though R’s vectors have a specific type, it’s quite easy to convert them to another type. This is called coercion. As a language for data analysis, this flexibility works mostly to our advantage. It’s why we generally don’t stress out over integer versus double in R. It’s why we can compute a proportion as the mean of a logical vector (we exploit automatic coercion to integer in this case). But unexpected coercion is a rich source of programming puzzles, so always consider this possibility when debugging.

There’s a hierarchy of types: the more primitive ones cheerfully and silently convert to those higher up in the food chain. Here’s the order:

  1. logical
  2. integer
  3. double
  4. character

For explicit coercion, use the as.*() functions.

v_log
#> [1]  TRUE FALSE FALSE  TRUE
as.integer(v_log)
#> [1] 1 0 0 1
v_int
#> [1] 1 2 3 4
as.numeric(v_int)
#> [1] 1 2 3 4
v_doub
#> [1] 1.2 2.4 3.6 4.8
as.character(v_doub)
#> [1] "1.2" "2.4" "3.6" "4.8"
as.character(as.numeric(as.integer(v_log)))
#> [1] "1" "0" "0" "1"

But coercion can also be triggered by other actions, such as assigning a scalar of the wrong type into an existing vector. Watch how easily I turn a numeric vector into character.

v_doub_copy <- v_doub
str(v_doub_copy)
#>  num [1:4] 1.2 2.4 3.6 4.8
v_doub_copy[3] <- "uhoh"
str(v_doub_copy)
#>  chr [1:4] "1.2" "2.4" "uhoh" "4.8"

Our numeric vector was silently coerced to character. Notice that R did this quietly, with no fanfare. Again, when debugging, always give serious thought to this question: Is this object of the type I think it is? How sure am I about that?

I end the discussion of atomic vectors with two specific examples of “being intentional about type”.

  • Use of type-specific NAs when doing setup.
  • Use of L to explicitly request integer. This looks weird but is a nod to short versus long integers. Just accept that this is how we force a literal number to be interpreted as an integer in R.
(big_plans <- rep(NA_integer_, 4))
#> [1] NA NA NA NA
str(big_plans)
#>  int [1:4] NA NA NA NA
big_plans[3] <- 5L
## note that big_plans is still integer!
str(big_plans)
#>  int [1:4] NA NA 5 NA
## note that omitting L results in coercion of big_plans to double
big_plans[1] <- 10
str(big_plans)
#>  num [1:4] 10 NA 5 NA

As the tutorial goes on, you’ll see how the purrr package makes it easier for you to be intentional and careful about type in your R programming.

Exercises

  1. Recall the hieararchy of the most common atomic vector types: logical < integer < numeric < character. Try to use the as.*() functions to go the wrong way. Call as.logical(), as.integer(), and as.numeric() on a character vector, such as letters. What happens?

Lists

What if you need to hold something that violates the constraints imposed by an atomic vector? I.e. one or both of these is true:

  • Individual atoms might have length greater than 1.
  • Individual atoms might not be of the same flavor.

You need a list!

A list is actually still a vector in R, but it’s not an atomic vector. We construct a list explicitly with list() but, like atomic vectors, most lists are created some other way in real life.

(x <- list(1:3, c("four", "five")))
#> [[1]]
#> [1] 1 2 3
#> 
#> [[2]]
#> [1] "four" "five"
(y <- list(logical = TRUE, integer = 4L, double = 4 * 1.2, character = "character"))
#> $logical
#> [1] TRUE
#> 
#> $integer
#> [1] 4
#> 
#> $double
#> [1] 4.8
#> 
#> $character
#> [1] "character"
(z <- list(letters[26:22], transcendental = c(pi, exp(1)), f = function(x) x^2))
#> [[1]]
#> [1] "z" "y" "x" "w" "v"
#> 
#> $transcendental
#> [1] 3.141593 2.718282
#> 
#> $f
#> function(x) x^2

We have explicit proof above that list components can

  • Be heterogeneous, i.e. can be of different “flavors”. Heck, they don’t even need to be atomic vectors – you can stick a function in there!
  • Have different lengths.
  • Have names. Or not. Or some of both.

Exercises

  1. Make the lists x, y, and z as shown above. Use is.*() functions to get to know these objects. Try to get some positive and negative results, i.e. establish a few things that x is and is NOT. Make sure to try is.list(), is.vector(), is.atomic(), and is.recursive(). Long-term, you may wish to explore the rlang::is_*() family of functions.

It should be clear that lists are much more general than atomic vectors. But they also share many properties: for example, they have length and they can be indexed.

List indexing

There is a new wrinkle when you index a list vs an atomic vector. There are 3 ways to index a list and the differences are very important:

  1. With single square brackets, i.e. just like we indexed atomic vectors. Note this always returns a list, even if we request a single component.

    x[c(FALSE, TRUE)]
    #> [[1]]
    #> [1] "four" "five"
    y[2:3]
    #> $integer
    #> [1] 4
    #> 
    #> $double
    #> [1] 4.8
    z["transcendental"]
    #> $transcendental
    #> [1] 3.141593 2.718282
  2. With double square brackets, which is new to us. This can only be used to access a single component and it returns the “naked” component. You can request a component with a positive integer or by name.

    x[[2]]
    #> [1] "four" "five"
    y[["double"]]
    #> [1] 4.8
  3. With the $, which you may already use to extract a single variable from a data frame (which is a special kind of list!). Like [[, this can only be used to access a single component, but it is even more limited: You must specify the component by name.

    z$transcendental
    #> [1] 3.141593 2.718282

My favorite explanation of the difference between the list-preserving indexing provided by [ and the always-simplifying behaviour of [[ is given by the “pepper shaker photos” in R for Data Science. Highly recommended!

Exercises

  1. Use [, [[, and $ to access the second component of the list z, which bears the name “transcendental”. Use the length() and the is.*() functions explored elsewhere to study the result. Which methods of indexing yield the same result vs. different?
  2. Put the same data into an atomic vector and a list:

    my_vec <- c(a = 1, b = 2, c = 3)
    my_list <- list(a = 1, b = 2, c = 3)

    Use [ and [[ to attempt to retrieve elements 2 and 3 from my_vec and my_list. What succeeds vs. fails? What if you try to retrieve element 2 alone? Does [[ even work on atomic vectors? Compare and contrast the results from the various combinations of indexing method and input object.

Vectorized operations

Many people are shocked to learn that the garden variety R object is a vector. A related fact is that many operations “just work” element-wise on vectors, with no special effort. This often requires an adjustment for people coming from other languages. Your code can be simpler than you think!

Consider how to square the integers 1 through n. The brute force method might look like:

n <- 5
res <- rep(NA_integer_, n) 
for (i in seq_len(n)) {
  res[i] <- i ^ 2
}
res
#> [1]  1  4  9 16 25

The R way is:

n <- 5
seq_len(n) ^ 2
#> [1]  1  4  9 16 25

Element-wise or vectorized operations are “baked in” to a surprising degree in R. Which is great. You get used to it soon enough.

But then there’s the let down. This happens for atomic vectors, but not, in general, for lists. This makes sense because, in general, there is no reason to believe that the same sorts of operations make sense for each component of a list. Unlike atomic vectors, they are heterogeneous.

Here’s a demo using as.list() to create the list version of an atomic vector.

## elementwise exponentiation of numeric vector works
exp(v_doub)
#> [1]   3.320117  11.023176  36.598234 121.510418
## put the same numbers in a list and ... this no longer works :(
(l_doub <- as.list(v_doub))
#> [[1]]
#> [1] 1.2
#> 
#> [[2]]
#> [1] 2.4
#> 
#> [[3]]
#> [1] 3.6
#> 
#> [[4]]
#> [1] 4.8
exp(l_doub)
#> Error in exp(l_doub): non-numeric argument to mathematical function

So, how do you apply a function elementwise to a list?! What is the list analogue of exp(v_doub)?

Use purrr::map()! The first argument is the list to operate on. The second is the function to apply.

library(purrr)
map(l_doub, exp)
#> [[1]]
#> [1] 3.320117
#> 
#> [[2]]
#> [1] 11.02318
#> 
#> [[3]]
#> [1] 36.59823
#> 
#> [[4]]
#> [1] 121.5104

Vocabulary: we talk about this as “mapping the function exp() over the list l_doub”. Conceptually, we loop over the elements of the list and apply a function.

my_list <- list(...)
my_output <- ## something of an appropriate size and flavor
for(i in seq_along(my_list)) {
  my_output[[i]] <- f(my_list([[i]]))
}

A major objective of this tutorial is to show you how to avoid writing these explicit for() loops yourself.