Solutions

us_data <- filter(gapminder, country == "United States")

Q01. Replicate that scatterplot using ggplot() with a points layer

ggplot(data = us_data, aes(x = gdpPercap, y = lifeExp)) +
  geom_point()

Q02. Add a main title and axis labels (labs) to your ggplot

us_data %>%
ggplot(data = ., aes(x = gdpPercap, y = lifeExp)) +
  geom_point() +
  labs(x = "GDP per Capita", 
       y = "Life Expectancy", 
       title = "GDP Per Capita vs Life Exp. in US")

Q03. Use multiple geom’s in a ggplot (if there’s no blank ’__’, you don’t need to write anything!):

us_data %>%
ggplot(data = ., aes(x = gdpPercap, y = lifeExp)) +
  geom_point() +
  geom_line() +
  geom_smooth()
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'

Q04. Change the plot above to facet by ‘year’ instead of faceting by ‘continent’.

ggplot(data = gapminder, aes(x = gdpPercap, y = lifeExp)) +
  geom_point() +
  facet_wrap(~ year)

Q05. Draw a scatterplot comparing gdpPercap and lifeExp, where different continents are drawn with different colors.

ggplot(data = gapminder, aes(x = gdpPercap, y = lifeExp, color = continent)) +
  geom_point()

Q06. Use ‘color’ to represent ‘continent’, and use ‘fill’ to represent ‘year’.

ggplot(
  data = gapminder,
  aes(x = gdpPercap, y = lifeExp, color = continent, fill = year)
  ) +
  geom_point(shape = 21)

Q07. Try mapping ‘shape’ to ‘continent’ inside of an aes() call.

ggplot(data = gapminder, aes(x = gdpPercap, y = lifeExp)) +
  geom_point(aes(shape = continent))

Q08. So far we have 3 aesthetic mappings other than x and y axis:

They are ‘color’, ‘fill’, and ‘shape’. The last ‘aes’ we’ll talk about is ‘size’, which adjusts point size. Add 2 ‘aes’ to this plot: map ‘continent’ to ‘color’, and map ‘pop’ to ‘size’. How many legends are there now?

ggplot(data = gapminder, aes(x = gdpPercap, y = lifeExp)) +
  geom_point(aes(color = continent, size = pop))

Q09. Draw a histogram to visualize lifeExp using geom_histogram().

ggplot(
  data = gapminder,
  aes(x = lifeExp, color = continent, fill = continent)
  ) +
  geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Q10. Experiment with different binwidths for your histogram.

ggplot(
  data = gapminder,
  aes(x = lifeExp, color = continent, fill = continent)
  ) +
  geom_histogram(binwidth = 5)

Q11. geom_freqpoly() is just like a histogram, but it uses lines instead of bars to communicate the number of observations in each bin.

Again, experiment with binwidth.

ggplot(
  data = gapminder, aes(x = lifeExp, color = continent)
  ) +
  geom_freqpoly(binwidth = 10)

Q12. Make a filled frequency polygon using an area plot: geom_area().

In the blank, experiment with setting ‘color’ versus ‘fill’ as ‘continent’.

ggplot(data = gapminder, aes(x = lifeExp, color = continent)) +
  geom_area(stat = "bin")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

ggplot(data = gapminder, aes(x = lifeExp, fill = continent)) +
  geom_area(stat = "bin")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Q13. Notice that the continents are stacked up on top of each other in the previous plot.

To change that behavior, set position = “dodge”.

ggplot(data = gapminder, aes(x = lifeExp, fill = continent)) +
  geom_area(stat = "bin", position = 'dodge')
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Q14. The issue now is that there’s overplotting. Data for the Americas is totally hidden behind the other continents! One way to fix this is to adjust the transparency of points through ‘alpha’.

Setting ‘alpha = .5’ reduces the ‘geom’ transparency to 50%. Experiment with different alphas.

ggplot(data = gapminder, aes(x = lifeExp, fill = continent)) +
  geom_area(stat = "bin", position = "dodge", alpha = 0.5)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Q15. Experiment with setting ‘color’ and ‘fill’ to ‘continent’.

ggplot(data = gapminder, aes(x = lifeExp, fill = continent)) +
  geom_density()

ggplot(data = gapminder, aes(x = lifeExp, color = continent)) +
  geom_density()

Q16. With geom_density and fill, experiment with different alpha’s.

ggplot(data = gapminder, aes(x = lifeExp, fill = continent)) +
  geom_density(alpha = 0.6)

Q17. Add a vertical line to this plot.

Hint: for a vertical line, you’ll need to specify an x-intercept (and for a horizontal line, you’d need to specify a y-intercept). Do this with ‘xintercept = 70’.

ggplot(
  data = gapminder,
  aes(x = lifeExp, color = continent, fill = continent)
  ) +
  geom_density(alpha = .3) +
  geom_vline(xintercept = 70)

Q18. Include the vertical line you did above, and also note how the annotation (‘annotate()’) labels the line.

ggplot(
  data = gapminder,
  aes(x = lifeExp, color = continent, fill = continent)
  ) +
  geom_density(alpha = .3) +
  geom_vline(xintercept = 70) +
  annotate(
    "text", x = 70, y = .075, label = "70 years", angle = 90
    )

Q19. We’ve already talked about basic scatterplots. Try to recall how to draw a ggplot that plots ‘gdpPercap’ on the x-axis and ‘lifeExp’ on the y-axis.

ggplot(
  data = gapminder, 
  aes(x = gdpPercap, y = lifeExp)
  ) +
  geom_point()

Q20. Add a smoothed line layer to the plot you did above using geom_smooth().

ggplot(
  data = gapminder, 
  aes(x = gdpPercap, y = lifeExp)
  ) +
  geom_point() +
  geom_smooth()
## `geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'

Q21. Remove the standard error ribbon using geom_smooth(se = FALSE).

ggplot(
  data = gapminder, 
  aes(x = gdpPercap, y = lifeExp)
  ) +
  geom_point(se = FALSE)
## Warning in geom_point(se = FALSE): Ignoring unknown parameters: `se`

Q22. Visualize the OLS fit by using geom_smooth(method = “lm”).

“lm” stands for “linear model”.

ggplot(
  data = gapminder, 
  aes(x = gdpPercap, y = lifeExp)
  ) +
  geom_point() +
  geom_smooth(method = "lm")
## `geom_smooth()` using formula = 'y ~ x'

Q23. Instead of plotting ‘gdpPercap’ on the x-axis, plot the log of ‘gdpPercap’ using log().

You should find that a linear model seems to fit this transformation much better.

ggplot(data = gapminder, aes(x = log(gdpPercap), y = lifeExp)) +
  geom_point() +
  geom_smooth(method = "lm")
## `geom_smooth()` using formula = 'y ~ x'

Notice that the units on the x-axis of the previous plot are in log terms, which is hard to interpret. I prefer this method: Do a log transformation of the x-axis using ‘scale_x_log10()’, and use ‘labels = scales::comma’ to suppress scientific notation on the labels for the x-axis.

ggplot(data = gapminder, aes(x = gdpPercap, y = lifeExp)) +
  scale_x_log10(labels = scales::comma) +
  geom_point() +
  geom_smooth(method = "lm")
## `geom_smooth()` using formula = 'y ~ x'

Q24. The second issue is that there’s over-plotting near the center.

I’d like a way to visualize how dense the points are getting. For this, we can replace ‘geom_point()’ with ‘geom_hex()’.

ggplot(data = gapminder, aes(x = gdpPercap, y = lifeExp)) +
  scale_x_log10(labels = scales::comma) +
  geom_hex() +
  geom_smooth(method = "lm")
## `geom_smooth()` using formula = 'y ~ x'

Q25. Draw a line plot using ‘geom_line’. Compare US ‘gdpPercap’ over time to two other countries of your choice.

gapminder %>%
  filter(country %in% c("United States", "Costa Rica", "Mexico")) %>%
  ggplot(aes(x = year, y = gdpPercap, color = country)) +
  geom_line()

Q26. Draw a boxplot with geom_boxplot() that compares the gdpPercap (on the y-axis) of different continents (on the x-axis, also using color).

Experiment with applying a log transformation to gdpPercap (now on the y axis).

ggplot(
  data = gapminder,
  aes(x = continent, y = gdpPercap, color = continent)
  ) +
  geom_boxplot()

ggplot(
  data = gapminder,
  aes(x = continent, y = log(gdpPercap), color = continent)
  ) +
  geom_boxplot()

Q27. Change the plot from the previous question to be a violin plot using ‘geom_violin’.

Also include ‘fill’. Make sure you use ‘scale_y_log10()’ to transform the y-axis into log terms.

ggplot(
  data = gapminder,
  aes(x = continent, y = log(gdpPercap), color = continent, fill = continent)
  ) +
  scale_y_log10() +
  geom_violin()

Finally, check out how simple it is to change your ‘facet_wrap’ to a slick animation with ‘gganimate’!

The ‘facet_wrap’ version:

ggplot(
  data = gapminder,
  aes(x = log(gdpPercap), y = lifeExp)
) +
  geom_point(aes(color = continent)) +
  geom_density2d(color = "grey", alpha = .5) +
  facet_wrap(~ year)