Excerpt of the Gapminder data on life expectancy, GDP per capita, and population by country.
Format
The main data frame gapminder has 1704 rows and 6 variables:
- country: factor with 142 levels
- continent: factor with 5 levels
- year: ranges from 1952 to 2007 in increments of 5 years
- lifeExp: life expectancy at birth, in years
- pop: population
- gdpPercap: GDP per capita (US$, inflation-adjusted)
The supplemental data frame gapminder_unfiltered was not filtered on year or for complete data and has 3313 rows.
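The difference between the two data frames can be inspected directly. The following is a minimal sketch, assuming the gapminder package is installed; it only checks the row counts and year range stated above and makes no claim about which extra years or countries the unfiltered data contain.

```r
library(gapminder)

# Row counts of the main and the supplemental data frame
n_main <- nrow(gapminder)
n_unfiltered <- nrow(gapminder_unfiltered)
c(main = n_main, unfiltered = n_unfiltered)

# Years in the main data frame: 1952 to 2007 in steps of 5
main_years <- sort(unique(gapminder$year))
main_years

# Years that appear only in the unfiltered data
extra_years <- setdiff(unique(gapminder_unfiltered$year), main_years)
sort(extra_years)

# The number of observations per country can vary in the unfiltered data
range(table(gapminder_unfiltered$country))
```

Because gapminder_unfiltered was not restricted to complete data, per-country summaries computed on it may rest on different numbers of observations.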
See also
country_colors for a nice color scheme for the countries
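One way the color scheme might be used is sketched below. It assumes country_colors is a character vector of colors whose names are the country names used in gapminder; the helper names gap_2007 and cols are introduced here for illustration only.

```r
library(gapminder)

# Restrict to a single year so that each country appears once
gap_2007 <- subset(gapminder, year == 2007)

# Look up one color per row by country name
# (assumes country_colors is named by country)
cols <- country_colors[as.character(gap_2007$country)]

# Filled points (pch = 21) take their fill color from bg
plot(lifeExp ~ gdpPercap, gap_2007, log = "x", pch = 21, bg = cols)
```

Indexing by as.character(gap_2007$country) rather than by the factor itself avoids matching on the factor's integer codes.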
Examples
str(gapminder)
#> tibble [1,704 × 6] (S3: tbl_df/tbl/data.frame)
#> $ country : Factor w/ 142 levels "Afghanistan",..: 1 1 1 1 1 1 1 1 1 1 ...
#> $ continent: Factor w/ 5 levels "Africa","Americas",..: 3 3 3 3 3 3 3 3 3 3 ...
#> $ year : int [1:1704] 1952 1957 1962 1967 1972 1977 1982 1987 1992 1997 ...
#> $ lifeExp : num [1:1704] 28.8 30.3 32 34 36.1 ...
#> $ pop : int [1:1704] 8425333 9240934 10267083 11537966 13079460 14880372 12881816 13867957 16317921 22227415 ...
#> $ gdpPercap: num [1:1704] 779 821 853 836 740 ...
head(gapminder)
#> # A tibble: 6 × 6
#> country continent year lifeExp pop gdpPercap
#> <fct> <fct> <int> <dbl> <int> <dbl>
#> 1 Afghanistan Asia 1952 28.8 8425333 779.
#> 2 Afghanistan Asia 1957 30.3 9240934 821.
#> 3 Afghanistan Asia 1962 32.0 10267083 853.
#> 4 Afghanistan Asia 1967 34.0 11537966 836.
#> 5 Afghanistan Asia 1972 36.1 13079460 740.
#> 6 Afghanistan Asia 1977 38.4 14880372 786.
summary(gapminder)
#> country continent year lifeExp
#> Afghanistan: 12 Africa :624 Min. :1952 Min. :23.60
#> Albania : 12 Americas:300 1st Qu.:1966 1st Qu.:48.20
#> Algeria : 12 Asia :396 Median :1980 Median :60.71
#> Angola : 12 Europe :360 Mean :1980 Mean :59.47
#> Argentina : 12 Oceania : 24 3rd Qu.:1993 3rd Qu.:70.85
#> Australia : 12 Max. :2007 Max. :82.60
#> (Other) :1632
#> pop gdpPercap
#> Min. :6.001e+04 Min. : 241.2
#> 1st Qu.:2.794e+06 1st Qu.: 1202.1
#> Median :7.024e+06 Median : 3531.8
#> Mean :2.960e+07 Mean : 7215.3
#> 3rd Qu.:1.959e+07 3rd Qu.: 9325.5
#> Max. :1.319e+09 Max. :113523.1
#>
table(gapminder$continent)
#>
#> Africa Americas Asia Europe Oceania
#> 624 300 396 360 24
aggregate(lifeExp ~ continent, gapminder, median)
#> continent lifeExp
#> 1 Africa 47.7920
#> 2 Americas 67.0480
#> 3 Asia 61.7915
#> 4 Europe 72.2410
#> 5 Oceania 73.6650
plot(lifeExp ~ year, gapminder, subset = country == "Cambodia", type = "b")
plot(lifeExp ~ gdpPercap, gapminder, subset = year == 2007, log = "x")
if (require("dplyr")) {
  gapminder %>%
    filter(year == 2007) %>%
    group_by(continent) %>%
    summarise(lifeExp = median(lifeExp))

  # how many unique countries does the data contain, by continent?
  gapminder %>%
    group_by(continent) %>%
    summarize(n_obs = n(), n_countries = n_distinct(country))

  # by continent, which country experienced the sharpest 5-year drop in
  # life expectancy and what was the drop?
  gapminder %>%
    group_by(continent, country) %>%
    select(country, year, continent, lifeExp) %>%
    mutate(le_delta = lifeExp - lag(lifeExp)) %>%
    summarize(worst_le_delta = min(le_delta, na.rm = TRUE)) %>%
    filter(min_rank(worst_le_delta) < 2) %>%
    arrange(worst_le_delta)
}
#> `summarise()` has grouped output by 'continent'. You can override using the
#> `.groups` argument.
#> # A tibble: 5 × 3
#> # Groups: continent [5]
#> continent country worst_le_delta
#> <fct> <fct> <dbl>
#> 1 Africa Rwanda -20.4
#> 2 Asia Cambodia -9.10
#> 3 Americas El Salvador -1.51
#> 4 Europe Montenegro -1.46
#> 5 Oceania Australia 0.170