EC 320 - Introduction to Econometrics
2025
Data manipulation and querying are very useful skills.
An equally important one is being able to visualize the data.
In this lecture, you will learn a handful of recipes to draw plots using ggplot.
In particular, you will learn about:
We need some data.
As you’ll see, the type of plot you will want to draw depends crucially on the type of data that you have.
Suppose we have a tibble called students that looks like this:
sex | study_time | grade1 | final_grade |
---|---|---|---|
female | 5-10 hrs | 94.9 | 95.4 |
male | 2-5 hrs | 79.6 | 73.7 |
female | 5-10 hrs | 64.2 | 49.1 |
. . . | . . . | . . . | . . . |
Where study_time is hours per week spent studying math, grade1 is their first-semester math grade, and final_grade is their final grade in the math course.
We have some variables which are categorical (sex takes either “male” or “female” and study_time takes on 0-2 hrs, 2-5 hrs, etc.).
We have other variables which are numeric, and in particular, continuous (grade1 and final_grade can take any value between 0 and 100).
Because these are different “types” of variables, we use different “recipes” to visualize them.
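To make the distinction concrete, here is a minimal sketch that builds a stand-in for students from only the three rows displayed in the table and checks which columns are numeric. The stand-in tibble and the helper name col_is_numeric are assumptions for illustration; the real dataset has more rows.

```r
library(tibble)

# Stand-in for `students`: only the three rows displayed in the table.
students <- tibble(
  sex         = c("female", "male", "female"),
  study_time  = c("5-10 hrs", "2-5 hrs", "5-10 hrs"),
  grade1      = c(94.9, 79.6, 64.2),
  final_grade = c(95.4, 73.7, 49.1)
)

# TRUE for numeric (continuous) columns, FALSE for categorical ones.
col_is_numeric <- sapply(students, is.numeric)
col_is_numeric
```

Here sex and study_time come back FALSE (categorical), while grade1 and final_grade come back TRUE (numeric).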
I will be using the dataset named students, which you can find below the lecture.
Download it, open RStudio, and load it in the console using:
The exact path will change depending on where it is stored on your device. I recommend right-clicking on the file itself and copying the path shown in the properties.
Note: You do not need to do this. It is only for following along.
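A loading step might look like the sketch below. It assumes the file is a CSV, which the lecture does not state, and it uses a temporary file (students_path, a hypothetical name) holding only the three rows shown earlier so that the example runs anywhere. In practice you would pass your own copied path to read_csv() instead.

```r
library(readr)

# Hypothetical location: replace with the path copied from the file's properties.
# A tiny CSV is written to a temporary file first so this sketch is runnable.
students_path <- tempfile(fileext = ".csv")
write_lines(
  c(
    "sex,study_time,grade1,final_grade",
    "female,5-10 hrs,94.9,95.4",
    "male,2-5 hrs,79.6,73.7",
    "female,5-10 hrs,64.2,49.1"
  ),
  students_path
)

# Read the file into a tibble called `students`.
students <- read_csv(students_path, show_col_types = FALSE)
students
```

On Windows, a copied path uses backslashes, which must be doubled or replaced with forward slashes inside an R string.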
geom_bar()
Q: How many hours do students spend studying?
Here I pipe the tibble into the function ggplot().
ggplot() needs an aesthetic mapping. You need to tell it which variables in your dataset map to which visual aesthetic in the plot.
After the ggplot() call, add any extra layer with +.
geom_bar() draws the bar plot using the previous instructions.
The result admittedly looks ugly, but we can spice it up later.
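The steps above can be sketched as follows. The three-row stand-in tibble (taken from the table shown earlier) and the native pipe |> (R 4.1 or later) are assumptions; the lecture may use the full dataset and a different pipe.

```r
library(tibble)
library(ggplot2)

# Stand-in for `students`: only the three rows displayed earlier.
students <- tibble(
  sex         = c("female", "male", "female"),
  study_time  = c("5-10 hrs", "2-5 hrs", "5-10 hrs"),
  grade1      = c(94.9, 79.6, 64.2),
  final_grade = c(95.4, 73.7, 49.1)
)

bar_plot <- students |>            # pipe the tibble into ggplot()
  ggplot(aes(x = study_time)) +    # map study_time to the x-axis
  geom_bar()                       # add a layer: one bar per category, height = count

bar_plot
```

Only x is mapped because geom_bar() counts the rows in each category itself.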
geom_histogram()
Q: What is the grade distribution?
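A histogram for this question might be sketched as below. It assumes the grade of interest is final_grade, uses a three-row stand-in tibble taken from the table shown earlier, and picks binwidth = 10 purely for illustration.

```r
library(tibble)
library(ggplot2)

# Stand-in for `students`: only the three rows displayed earlier.
students <- tibble(
  sex         = c("female", "male", "female"),
  study_time  = c("5-10 hrs", "2-5 hrs", "5-10 hrs"),
  grade1      = c(94.9, 79.6, 64.2),
  final_grade = c(95.4, 73.7, 49.1)
)

hist_plot <- students |>
  ggplot(aes(x = final_grade)) +     # a continuous variable on the x-axis
  geom_histogram(binwidth = 10)      # bins 10 grade points wide (illustrative choice)

hist_plot
```

As with the bar plot, only x is mapped; the histogram computes the counts per bin.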
geom_boxplot()
Q. Do students who study more (discrete) earn higher grades (continuous)?
ggplot() needs the aesthetic mapping wrapped in aes().
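A box plot for this question could look like the sketch below, with the discrete variable on the x-axis and the continuous one on the y-axis. It assumes final_grade is the grade being compared and uses a three-row stand-in tibble taken from the table shown earlier, so the boxes are degenerate; the full dataset gives proper boxes.

```r
library(tibble)
library(ggplot2)

# Stand-in for `students`: only the three rows displayed earlier.
students <- tibble(
  sex         = c("female", "male", "female"),
  study_time  = c("5-10 hrs", "2-5 hrs", "5-10 hrs"),
  grade1      = c(94.9, 79.6, 64.2),
  final_grade = c(95.4, 73.7, 49.1)
)

box_plot <- students |>
  ggplot(aes(x = study_time, y = final_grade)) +  # discrete x, continuous y
  geom_boxplot()                                   # one box per study_time category

box_plot
```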
geom_point()
Q. How well does a student’s first-semester grade predict their final grade in a (high school) class?
This looks fine, but what about adding an additional layer: a best-fit line?
We can further modify our geom_smooth() line to make it a linear model using method = "lm" and also remove the standard-error band using se = FALSE.
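Put together, the scatterplot with a linear best-fit line might be sketched as follows. The three-row stand-in tibble (taken from the table shown earlier) and the native pipe |> are assumptions for illustration.

```r
library(tibble)
library(ggplot2)

# Stand-in for `students`: only the three rows displayed earlier.
students <- tibble(
  sex         = c("female", "male", "female"),
  study_time  = c("5-10 hrs", "2-5 hrs", "5-10 hrs"),
  grade1      = c(94.9, 79.6, 64.2),
  final_grade = c(95.4, 73.7, 49.1)
)

scatter_plot <- students |>
  ggplot(aes(x = grade1, y = final_grade)) +  # two continuous variables
  geom_point() +                               # layer 1: one point per student
  geom_smooth(method = "lm", se = FALSE)       # layer 2: straight fitted line, no band

scatter_plot
```

Both layers inherit the same aes() mapping from the ggplot() call, so the line is fitted to the same x and y as the points.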
geom_bar()
Q. Do females report studying for longer than males?
Relationship between time studied and sex
We can do this in two different ways:
Use the fill aesthetic mapping to color in the bars using separate colors for males and females.
Say we want to show a scatterplot and have it differentiate among categories using different colors. It is very similar to fill =, but instead we use color =.
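The two mappings can be compared side by side in the sketch below: fill = sex for the bars and color = sex for the points. The three-row stand-in tibble (taken from the table shown earlier) and the plot names fill_plot and color_plot are assumptions for illustration.

```r
library(tibble)
library(ggplot2)

# Stand-in for `students`: only the three rows displayed earlier.
students <- tibble(
  sex         = c("female", "male", "female"),
  study_time  = c("5-10 hrs", "2-5 hrs", "5-10 hrs"),
  grade1      = c(94.9, 79.6, 64.2),
  final_grade = c(95.4, 73.7, 49.1)
)

# Bars: fill colors the inside of each bar by sex.
fill_plot <- students |>
  ggplot(aes(x = study_time, fill = sex)) +
  geom_bar()

# Points: color colors each point by sex.
color_plot <- students |>
  ggplot(aes(x = grade1, y = final_grade, color = sex)) +
  geom_point()

fill_plot
color_plot
```

For bars, color = would only change the outline, which is why fill = is used there.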
Aesthetic mappings get wrapped in aes() and map variables in your tibble to aesthetics in your plot, like which variable gets drawn on the x-axis, which goes on the y-axis, and which variable is represented in color.
Geoms are added to the plot as layers using +.
This worksheet will help you learn coding by doing. You will:
EC320, R Lecture | ggplot