Mara Averick recently tweeted about a blog post from February 2015 by Zev Ross, demonstrating the conversion of JSON data into a data frame (Using R to download and parse JSON: an example using data from an open data portal). I thought it would be interesting to revisit this data wrangling task with purrr, which has only been on CRAN since fall 2015.
Zev is dealing with data on food markets in New York State that, sadly but realistically, cannot be well-represented in simple rectangular form, such as CSV. Therefore the data is available as JSON and it’s our job to turn it into a useful data frame. This is a very, very common task and a typical use case for purrr.
I downloaded and unpacked the zip archive provided by Zev: foodMarkets.zip. These files are available in this tutorial’s repo in an unpacked state: foodMarkets.
Packages used below:
tbl_df
), which is friendly to list-columns.library(jsonlite)
#>
#> Attaching package: 'jsonlite'
#> The following object is masked from 'package:purrr':
#>
#> flatten
library(purrr)
library(listviewer)
library(tibble)
library(dplyr)
food_mkts_raw <- fromJSON("foodMarkets/retail_food_markets.json",
simplifyVector = FALSE)
data
When you first import JSON, it is common to have no clue what’s in there. You must explore.
str(food_mkts_raw, max.level = 1)
#> List of 2
#> $ meta:List of 1
#> $ data:List of 28355
#> .. [list output truncated]
Ooh, a data
component sounds promising!
str(food_mkts_raw$data, max.level = 1, list.len = 5)
#> List of 28355
#> $ :List of 23
#> .. [list output truncated]
#> $ :List of 23
#> .. [list output truncated]
#> $ :List of 23
#> .. [list output truncated]
#> $ :List of 23
#> .. [list output truncated]
#> $ :List of 23
#> .. [list output truncated]
#> [list output truncated]
Working hypothesis: we have data of similar form for each of 28355 food markets. A slightly mysterious, but homogeneous list.
Pull out just the data
component and store as food_mkts
.
food_mkts <- food_mkts_raw[["data"]]
meta
and columns
Before we move on, what’s in the meta
component of the raw JSON? It turns out we should focus on its sole component, which is named view
.
jsonedit(food_mkts_raw[[c("meta", "view")]])
The columns
component is especially valuable, because it tells us what data we have for each food market over in the data
component. This is an example of a common but, in my opinion, dangerous practice of storing information with no explanatory names and providing a separate lexicon. This is kind of like a CSV with variables named X1
through Xn
or no names at all.
We need to pull out the name
elements of columns
. We kick off with [[
vector indexing into a nested list, to drill down to the columns
sub-component. Then map_chr(..., "name")
to grab the name
s as character vector.
(cnames <- food_mkts_raw[[c("meta", "view", "columns")]] %>%
map_chr("name"))
#> [1] "sid" "id" "position" "created_at"
#> [5] "created_meta" "updated_at" "updated_meta" "meta"
#> [9] "County" "License Number" "Operation Type" "Estab Type"
#> [13] "Entity Name" "DBA Name" "Street Number" "Street Name"
#> [17] "Address Line 2" "Address Line 3" "City" "State"
#> [21] "Zip Code" "Square Footage" "Location"
Downstream work will be easier if the food market elements bear the names in cnames
. I believe this is how the JSON should have been to begin with. This makes exploration and troubleshooting easier and, more importantly, it helps in our goal to create a data frame.
food_mkts <- food_mkts %>%
map(set_names, cnames)
As a warm-up activity, pull out and simplify the data for one column, across all food markets. From Zev’s post, I decided to look at DBA Name
(“doing business as”).
food_mkts %>%
map_chr("DBA Name") %>%
head()
#> [1] "PLAZA 23 TRUCK STOP " "PRICE CHOPPER #245 "
#> [3] "PEACOCK " "FINYOUR FISHMONGER "
#> [5] "R&A GROCERY STORE " "ANTHONYS CHOC DIP FRUIT"
I think we’ll want to trim whitespace at some point!
Again, freeriding on Zev’s post, I know that 22 of the 23 variables can be extracted in a fairly simple way. Everything but Location
, which holds some unparsed JSON we must tackle separately.
Create a vector of our targetted variable names:
(to_process <- cnames[cnames != "Location"])
#> [1] "sid" "id" "position" "created_at"
#> [5] "created_meta" "updated_at" "updated_meta" "meta"
#> [9] "County" "License Number" "Operation Type" "Estab Type"
#> [13] "Entity Name" "DBA Name" "Street Number" "Street Name"
#> [17] "Address Line 2" "Address Line 3" "City" "State"
#> [21] "Zip Code" "Square Footage"
Goal: create a data frame with these variables and one row per food market.
Voice from the future: not all of these 22 variables are populated for every food market. There are explicit NULL
s. Zev handled this with a two-layered approach: catch the NULL
s and replace with NA
s, prior to combining with data from other markets via sapply()
and data.frame()
.
purrr’s handling of void, explicit NULL
, and implicit NULL
is quite good, but doesn’t address this case either. Therefore I too must fanny around with NULL
s before I can make my beautiful data frame.
Ideally, I would use map_df()
like this for food markets instead of just the first 3:
food_mkts[1:3] %>%
map_df(`[`, to_process)
#> # A tibble: 3 × 22
#> sid id position created_at
#> <int> <chr> <int> <int>
#> 1 1 D943D633-8DB5-4740-B5F4-06AAB6FC52E2 1 1412261567
#> 2 2 7A069C98-90B2-4B11-A4A7-3C4761D14DDD 2 1412261567
#> 3 3 68DBE61E-ACF4-4803-BD90-87380FDAA4F9 3 1412261567
#> # ... with 18 more variables: created_meta <chr>, updated_at <int>,
#> # updated_meta <chr>, meta <chr>, County <chr>, `License Number` <chr>,
#> # `Operation Type` <chr>, `Estab Type` <chr>, `Entity Name` <chr>, `DBA
#> # Name` <chr>, `Street Number` <chr>, `Street Name` <chr>, `Address Line
#> # 2` <chr>, `Address Line 3` <chr>, City <chr>, State <chr>, `Zip
#> # Code` <chr>, `Square Footage` <chr>
But I have learned the hard way that the NULL
s are a dealkiller.
How does map_df()
actually work? First, it calls map()
and, in my application, gets the sub-list of non-Location
variables for each food market. Then a list of these lists is shipped off to dplyr::bind_rows()
for row-binding and type conversion. But the presence of NULL
s breaks this workflow when there are markets, like the one below, that lack, say, a street number. Basically most of R’s data-frame-making facilities balk at NULL
s. Here is some exploration of that.
food_mkts[[67]][14:16]
#> $`DBA Name`
#> [1] "PORTOBELLA PETES "
#>
#> $`Street Number`
#> NULL
#>
#> $`Street Name`
#> [1] "BLDG 12 STATE CAMPUS DOL "
data.frame(food_mkts[[67]][14:16])
#> Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, : arguments imply differing number of rows: 1, 0
food_mkts[66:68] %>%
map_df(`[`, to_process)
#> Error in eval(expr, envir, enclos): incompatible sizes (1 != 0)
I want to rescue this workflow, because the automatic type conversion from dplyr::bind_rows()
is very appealing.
Therefore, like Zev, I use a custom function safe_extract()
. It’s a slightly expanded version of [
that replaces NULL
with NA
.
safe_extract <- function(l, wut) {
res <- l[wut]
null_here <- map_lgl(res, is.null)
res[null_here] <- NA
res
}
safe_extract(food_mkts[[67]][14:16])
#> $`DBA Name`
#> [1] "PORTOBELLA PETES "
#>
#> $`Street Number`
#> [1] NA
#>
#> $`Street Name`
#> [1] "BLDG 12 STATE CAMPUS DOL "
Now I can get a data frame easily.
(mkts_df <- food_mkts %>%
map_df(safe_extract, to_process))
#> # A tibble: 28,355 × 22
#> sid id position created_at
#> <int> <chr> <int> <int>
#> 1 1 D943D633-8DB5-4740-B5F4-06AAB6FC52E2 1 1412261567
#> 2 2 7A069C98-90B2-4B11-A4A7-3C4761D14DDD 2 1412261567
#> 3 3 68DBE61E-ACF4-4803-BD90-87380FDAA4F9 3 1412261567
#> 4 4 9DA5A827-A0AD-4782-8CE2-5D6867B53293 4 1412261567
#> 5 5 5D624E4B-E4A8-4F3D-BF88-190663DF8FCF 5 1412261567
#> 6 6 86BC29A4-55DA-43C5-BDED-6EF2C92C3C8B 6 1412261567
#> 7 7 51EE0EB4-6D71-4C2E-AD44-E225A1A45987 7 1412261567
#> 8 8 22B148DF-F188-4245-8E69-1137F9A6064E 8 1412261567
#> 9 9 62526ABF-801E-46DE-97CD-05C2DDE6BB20 9 1412261567
#> 10 10 13153CBA-3E65-4D4D-9A4C-ACB24BCD33DF 10 1412261567
#> # ... with 28,345 more rows, and 18 more variables: created_meta <chr>,
#> # updated_at <int>, updated_meta <chr>, meta <chr>, County <chr>,
#> # `License Number` <chr>, `Operation Type` <chr>, `Estab Type` <chr>,
#> # `Entity Name` <chr>, `DBA Name` <chr>, `Street Number` <chr>, `Street
#> # Name` <chr>, `Address Line 2` <chr>, `Address Line 3` <chr>,
#> # City <chr>, State <chr>, `Zip Code` <chr>, `Square Footage` <chr>
Victory is ours. Well, partial victory. We still need to deal with Location
.
Self-contained code to go from raw JSON to data frame for 22 out of 23 variables, relying on automatic type conversion.
food_mkts_raw <- fromJSON("foodMarkets/retail_food_markets.json",
simplifyVector = FALSE)
cnames <- food_mkts_raw[[c("meta", "view", "columns")]] %>%
map_chr("name")
food_mkts <- food_mkts_raw[["data"]] %>%
map(set_names, cnames)
to_process <- cnames[cnames != "Location"]
safe_extract <- function(l, wut) {
res <- l[wut]
null_here <- map_lgl(res, is.null)
res[null_here] <- NA
res
}
mkts_df <- food_mkts %>%
map_df(safe_extract, to_process)
The more proper workflow is to build the columns first, with purrr’s type-specific mapping functions, then put them into a data frame. This is safer because it forces you to articulate and enforce type expectations for each variable.
NULL
s.Here’s a naïve approach for two variables:
food_mkts %>% {
tibble(
dba_name = map_chr(., "DBA Name"),
city = map_chr(., "City")
)
}
#> # A tibble: 28,355 × 2
#> dba_name city
#> <chr> <chr>
#> 1 PLAZA 23 TRUCK STOP ALBANY
#> 2 PRICE CHOPPER #245 WATERVLIET
#> 3 PEACOCK ALBANY
#> 4 FINYOUR FISHMONGER GUILDERLAND
#> 5 R&A GROCERY STORE ALBANY
#> 6 ANTHONYS CHOC DIP FRUIT ALBANY
#> 7 DOLLAR TREE #4168 GLENMONT
#> 8 GREAT WALL EXPRESS ASIA SLINGERLANDS
#> 9 INANC I OZBAY ALBANY
#> 10 SAGAMIA SLINGERLANDS
#> # ... with 28,345 more rows
Doing this for 22 variables is a drag, though it’s certainly possible, and even reasonable for the ingest of an important dataset.
However, because I can, I store the type of each variable and do extraction + type conversion programmatically. This lets us explore more of the functional programming side of purrr.
I exploit the variable types learned from the automatic type conversion in the “rows first” approach. Here’s a data frame with one row per variable, providing the expected class and the type-appropriate map_*()
and NA
. It will soon become clear why I use the specific variable names, .f
and .null
.
(vdf <- mkts_df %>%
map_chr(class) %>%
enframe(name = ".f", value = "type") %>%
mutate(vname = .f %>% tolower() %>% gsub("\\s+", "_", .),
mapper = c(integer = "map_int", character = "map_chr")[type],
.null = list(integer = NA_integer_,
character = NA_character_)[type]))
#> # A tibble: 22 × 5
#> .f type vname mapper .null
#> <chr> <chr> <chr> <chr> <list>
#> 1 sid integer sid map_int <int [1]>
#> 2 id character id map_chr <chr [1]>
#> 3 position integer position map_int <int [1]>
#> 4 created_at integer created_at map_int <int [1]>
#> 5 created_meta character created_meta map_chr <chr [1]>
#> 6 updated_at integer updated_at map_int <int [1]>
#> 7 updated_meta character updated_meta map_chr <chr [1]>
#> 8 meta character meta map_chr <chr [1]>
#> 9 County character county map_chr <chr [1]>
#> 10 License Number character license_number map_chr <chr [1]>
#> # ... with 12 more rows
Here’s how I could use do.call()
to create one variable based on one row of this data frame.
i <- 14 # DBA Name
vdf[i, ]
#> # A tibble: 1 × 5
#> .f type vname mapper .null
#> <chr> <chr> <chr> <chr> <list>
#> 1 DBA Name character dba_name map_chr <chr [1]>
do.call(vdf$mapper[i],
list(.x = food_mkts, .f = vdf$.f[i], .null = vdf$.null[[i]])) %>%
head()
#> [1] "PLAZA 23 TRUCK STOP " "PRICE CHOPPER #245 "
#> [3] "PEACOCK " "FINYOUR FISHMONGER "
#> [5] "R&A GROCERY STORE " "ANTHONYS CHOC DIP FRUIT"
do.call()
is a base R function that constructs and executes a function call from a function (name or actual function) and a list of arguments. It’s exactly what we need. But we also want to do this for each of the 22 variables.
How to scale up? We use the invoke()
family of functions in purrr. invoke()
itself is a pipe-friendly wrapper around do.call()
. invoke_map()
and friends allow you to map over functions and arguments in parallel, analagous to map2()
. The big difference with invoke_map()
is that the two primary inputs are a vector of functions and a vector of argument values. We use invoke_map_df()
here, which is the variant that requests to get a data frame back.
(mkts_df <- invoke_map_df(.f = set_names(vdf$mapper, vdf$vname),
.x = transpose(vdf[c(".f", ".null")]),
food_mkts))
#> # A tibble: 28,355 × 22
#> sid id position created_at
#> <int> <chr> <int> <int>
#> 1 1 D943D633-8DB5-4740-B5F4-06AAB6FC52E2 1 1412261567
#> 2 2 7A069C98-90B2-4B11-A4A7-3C4761D14DDD 2 1412261567
#> 3 3 68DBE61E-ACF4-4803-BD90-87380FDAA4F9 3 1412261567
#> 4 4 9DA5A827-A0AD-4782-8CE2-5D6867B53293 4 1412261567
#> 5 5 5D624E4B-E4A8-4F3D-BF88-190663DF8FCF 5 1412261567
#> 6 6 86BC29A4-55DA-43C5-BDED-6EF2C92C3C8B 6 1412261567
#> 7 7 51EE0EB4-6D71-4C2E-AD44-E225A1A45987 7 1412261567
#> 8 8 22B148DF-F188-4245-8E69-1137F9A6064E 8 1412261567
#> 9 9 62526ABF-801E-46DE-97CD-05C2DDE6BB20 9 1412261567
#> 10 10 13153CBA-3E65-4D4D-9A4C-ACB24BCD33DF 10 1412261567
#> # ... with 28,345 more rows, and 18 more variables: created_meta <chr>,
#> # updated_at <int>, updated_meta <chr>, meta <chr>, county <chr>,
#> # license_number <chr>, operation_type <chr>, estab_type <chr>,
#> # entity_name <chr>, dba_name <chr>, street_number <chr>,
#> # street_name <chr>, address_line_2 <chr>, address_line_3 <chr>,
#> # city <chr>, state <chr>, zip_code <chr>, square_footage <chr>
Partial victory is ours again! A data frame, one row per food market, with 22 variables. How did this work? Line-by-line:
.f = set_names(vdf$mapper, vdf$vname)
specifies the list of functions to iterate over: the type-specific mappers we pre-selected above. I bother to apply names here because they propagate to the variable names of the data frame..x = transpose(vdf[c(".f", ".null")])
gives the parallel list of function arguments, namely the strings specifying named elements to extract and the value to insert in place of explicit NULL
. Two things that might be confusing:
.f
is at the variable level, i.e. inside our iteration. In the previous line, .f
refers to the overall operation, i.e. setting up our iteration.transpose()
is needed to repackage the .f
and .null
variables. Instead of a list of two same-length vectors, we want a single list holding lists of length two.food_mkts
is the ...
argument. It’s a common input across all units of iteration.Self-contained code to go from raw JSON to data frame for 22 out of 23 variables, with explicit type specification.
food_mkts_raw <- fromJSON("foodMarkets/retail_food_markets.json",
simplifyVector = FALSE)
cnames <- food_mkts_raw[[c("meta", "view", "columns")]] %>%
map_chr("name")
food_mkts <- food_mkts_raw[["data"]] %>%
map(set_names, cnames)
vdf <- tribble(
~ .f, ~ type,
"sid", "integer",
"id", "character",
"position", "integer",
"created_at", "integer",
"created_meta", "character",
"updated_at", "integer",
"updated_meta", "character",
"meta", "character",
"County", "character",
"License Number", "character",
"Operation Type", "character",
"Estab Type", "character",
"Entity Name", "character",
"DBA Name", "character",
"Street Number", "character",
"Street Name", "character",
"Address Line 2", "character",
"Address Line 3", "character",
"City", "character",
"State", "character",
"Zip Code", "character",
"Square Footage", "character")
vdf <- vdf %>%
mutate(vname = .f %>% tolower() %>% gsub("\\s+", "_", .),
mapper = c(integer = "map_int", character = "map_chr")[type],
.null = list(integer = NA_integer_,
character = NA_character_)[type])
mkts_df <- invoke_map_df(.f = set_names(vdf$mapper, vdf$vname),
.x = transpose(vdf[c(".f", ".null")]),
food_mkts)
Location
We need to process the 23rd variable, Location
. There are no big new principles here, so this will heavy on the code and light on explanation.
Location
Pull Location
into its own list and inspect one element.
loc_raw <- food_mkts %>%
map("Location")
loc_raw[[345]]
#> [[1]]
#> [1] "{\"address\":\"196 VLIET BLVD\",\"city\":\"COHOES\",\"state\":\"NY\",\"zip\":\"12047\"}"
#>
#> [[2]]
#> [1] "42.77787640800045"
#>
#> [[3]]
#> [1] "-73.71585755099966"
#>
#> [[4]]
#> NULL
#>
#> [[5]]
#> [1] FALSE
Ah, more key-less JSON, giving rise to name-less lists. Before we do anything else, extract names from the meta
component and apply them. Re-inspect an element to verify we’ve succeeded.
i <- which(cnames == "Location")
location_meta <- food_mkts_raw[[c("meta", "view", "columns")]][[i]]
(lnames <- location_meta[["subColumnTypes"]] %>% flatten_chr())
#> [1] "human_address" "latitude" "longitude" "machine_address"
#> [5] "needs_recoding"
loc_raw <- loc_raw %>%
map(set_names, lnames)
loc_raw[[345]]
#> $human_address
#> [1] "{\"address\":\"196 VLIET BLVD\",\"city\":\"COHOES\",\"state\":\"NY\",\"zip\":\"12047\"}"
#>
#> $latitude
#> [1] "42.77787640800045"
#>
#> $longitude
#> [1] "-73.71585755099966"
#>
#> $machine_address
#> NULL
#>
#> $needs_recoding
#> [1] FALSE
The human_address
element of each location holds a JSON string:
ha_
to the variable names to avoid name conflicts downstream.ha <- loc_raw %>%
map("human_address") %>%
map_df(fromJSON) %>%
set_names(paste0("ha_", names(.)))
head(ha)
#> # A tibble: 6 × 4
#> ha_address ha_city ha_state ha_zip
#> <chr> <chr> <chr> <chr>
#> 1 240 CHURCH ST #242 ALBANY NY 12202
#> 2 515 19TH ST WATERVLIET NY 12189
#> 3 795 BROADWAY ALBANY NY 12207
#> 4 2050 WESTERN AVE GUILDERLAND NY 12084
#> 5 265 CENTRAL AVE ALBANY NY 12206
#> 6 1693 CENTRAL AVE ALBANY NY 12205
Convert everything else left in the location data to a data frame. Convert latitude
and longitude
into numeric variables and make sure they don’t look insane. This New York latitude and longitude map indicates that latitude should range between 40° and 50° and longitude between -80° and -72°.
ee <- loc_raw %>%
map_df(safe_extract, lnames[lnames != "human_address"]) %>%
mutate_at(vars(latitude, longitude), as.numeric)
head(ee)
#> # A tibble: 6 × 4
#> latitude longitude machine_address needs_recoding
#> <dbl> <dbl> <lgl> <lgl>
#> 1 42.63541 -73.75541 NA FALSE
#> 2 42.73111 -73.70496 NA FALSE
#> 3 42.65794 -73.74780 NA FALSE
#> 4 42.69787 -73.88924 NA FALSE
#> 5 42.66340 -73.77088 NA FALSE
#> 6 42.72312 -73.83912 NA FALSE
ee %>%
select(latitude, longitude) %>%
map(summary)
#> $latitude
#> Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
#> 40.51 40.71 40.84 41.46 42.45 45.00 1
#>
#> $longitude
#> Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
#> -79.75 -74.18 -73.94 -74.58 -73.84 -71.92 1
Column bind the main data frame mkts_df
with the human address data in ha
and everything else from Location
.
(mkts_df <- bind_cols(mkts_df, ha, ee))
#> # A tibble: 28,355 × 30
#> sid id position created_at
#> <int> <chr> <int> <int>
#> 1 1 D943D633-8DB5-4740-B5F4-06AAB6FC52E2 1 1412261567
#> 2 2 7A069C98-90B2-4B11-A4A7-3C4761D14DDD 2 1412261567
#> 3 3 68DBE61E-ACF4-4803-BD90-87380FDAA4F9 3 1412261567
#> 4 4 9DA5A827-A0AD-4782-8CE2-5D6867B53293 4 1412261567
#> 5 5 5D624E4B-E4A8-4F3D-BF88-190663DF8FCF 5 1412261567
#> 6 6 86BC29A4-55DA-43C5-BDED-6EF2C92C3C8B 6 1412261567
#> 7 7 51EE0EB4-6D71-4C2E-AD44-E225A1A45987 7 1412261567
#> 8 8 22B148DF-F188-4245-8E69-1137F9A6064E 8 1412261567
#> 9 9 62526ABF-801E-46DE-97CD-05C2DDE6BB20 9 1412261567
#> 10 10 13153CBA-3E65-4D4D-9A4C-ACB24BCD33DF 10 1412261567
#> # ... with 28,345 more rows, and 26 more variables: created_meta <chr>,
#> # updated_at <int>, updated_meta <chr>, meta <chr>, county <chr>,
#> # license_number <chr>, operation_type <chr>, estab_type <chr>,
#> # entity_name <chr>, dba_name <chr>, street_number <chr>,
#> # street_name <chr>, address_line_2 <chr>, address_line_3 <chr>,
#> # city <chr>, state <chr>, zip_code <chr>, square_footage <chr>,
#> # ha_address <chr>, ha_city <chr>, ha_state <chr>, ha_zip <chr>,
#> # latitude <dbl>, longitude <dbl>, machine_address <lgl>,
#> # needs_recoding <lgl>
Trim any leading/trailing whitespace from the character variables.
mkts_df <- mkts_df %>%
mutate_if(is.character, trimws)
Take a look at some of the most interpretable variables.
mkts_df %>%
select(dba_name, city, latitude, longitude, square_footage)
#> # A tibble: 28,355 × 5
#> dba_name city latitude longitude square_footage
#> <chr> <chr> <dbl> <dbl> <chr>
#> 1 PLAZA 23 TRUCK STOP ALBANY 42.63541 -73.75541 3600
#> 2 PRICE CHOPPER #245 WATERVLIET 42.73111 -73.70496 0
#> 3 PEACOCK ALBANY 42.65794 -73.74780 2000
#> 4 FINYOUR FISHMONGER GUILDERLAND 42.69787 -73.88924 2100
#> 5 R&A GROCERY STORE ALBANY 42.66340 -73.77088 1500
#> 6 ANTHONYS CHOC DIP FRUIT ALBANY 42.72312 -73.83912 0
#> 7 DOLLAR TREE #4168 GLENMONT 42.60176 -73.79454 10000
#> 8 GREAT WALL EXPRESS ASIA SLINGERLANDS 42.64117 -73.86302 0
#> 9 INANC I OZBAY ALBANY 42.71746 -73.83267 0
#> 10 SAGAMIA SLINGERLANDS 42.64117 -73.86302 1000
#> # ... with 28,345 more rows