David J. Lilja
Department of Electrical and Computer Engineering
University of Minnesota, Minneapolis

For citation information see: https://z.umn.edu/mapsUsingR

Copyright 2021 David J. Lilja (This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License)

Abstract

Plotting data on a map can be a powerful technique for visualizing geographical information. Animating that data – that is, making it move – can further enhance the understanding of the underlying data. This tutorial will teach you how to plot data on simple maps using ggplot2 and animate it using gganimate. You also will learn how to use dplyr to partition data into subsets and compute summary statistics of these subsets to be plotted onto a map.

1 Introduction

Maps are a great way to visualize certain types of data sets since they show at a glance how the data is distributed in space. Additionally, the color and size of the individual data points can be changed to show extra features or characteristics about the data. Animating the data points, that is, making them move, provides a powerful enhancement to a map to show how the data evolves over time.

This tutorial provides several examples to teach you how to plot spatial data on a map and make it move using the R programming language. The underlying philosophy of this tutorial is to keep the map as simple as possible while gradually building up additional layers of information. Each new layer should be added only if it increases our understanding of the data. It is all too easy to add too much color and motion, for example, which may make a map look exciting, but ultimately ends up obscuring the information we are trying to extract from the data.

We use the R package ggplot2 to generate the maps. This software package has a rich set of features for producing plots of all types. This tutorial will not make you an expert on ggplot2, though. For that, I suggest you read its documentation and study the huge variety of examples available on various websites. We instead focus on only those features of ggplot2 that we need to generate some interesting and useful maps. You will probably get more out of this tutorial if you have used ggplot2 previously. However, I think you can understand the examples well enough even if you have never used ggplot2 before. Once we have learned to plot data on a map, we show how the R package gganimate can be used to add motion.

Finally, we introduce the dplyr package. The functions in dplyr can be used to easily partition data into subsets and compute summary statistics of the subsets. These statistics then can be plotted onto a map instead of plotting the individual data points.

I also want to point out that this is not the only way to generate maps using R. There are many packages available for the R environment to help you produce all sorts of maps. The examples below demonstrate just one approach. I hope they provide a useful initial step for your further experimentation generating maps using R.

2 The Data Format and Basic Steps

The data set to be plotted on a map requires each data point to have a latitude and longitude associated with it. The longitude will determine the x coordinate on the plot of the map while the latitude will determine the y coordinate. To animate the map to show how the data changes over time, a time value, such as the date the data point was collected, needs to be associated with each point. In addition, some extra values may be associated with each data point. These extra values could indicate interesting characteristics about the data, such as its cost, size, or some interesting statistic, for instance. These extra values can be highlighted on the map by using the value to determine some attribute of the plotted point, such as its color or size. Thus, each data point should consist of the tuple (x, y, t, z1, z2, …) where x = longitude, y = latitude, t = time parameter, and z1, z2, etc. are optional values that indicate some interesting characteristics of the data. As a simple example, we might have a collection of data where each data point contains the location where a temperature was taken, the temperature itself, and the time it was taken.
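
As a minimal sketch of this format, the following base R code builds a small data frame in which each row is one (x, y, t, z1) tuple. The column names and the values are made up purely for illustration; they are not part of the data used later in this tutorial.

```r
# Hypothetical observations: each row is one (x, y, t, z1) tuple.
# All names and values here are invented for illustration.
obs <- data.frame(
  long = c(-122.42, -122.27, -122.03),  # x: longitude (west is negative)
  lat  = c(37.77, 37.80, 37.37),        # y: latitude
  year = c(2001, 2005, 2010),           # t: time value used for animation
  temp = c(14.2, 16.8, 19.1)            # z1: extra value, e.g. temperature
)
str(obs)
```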

Once the data is in the proper format, we can generate the desired map, and animate the data on the map, by following these basic steps:

  1. Generate a base map using an existing database of the region on which we want to plot the data.

  2. Plot a dot on the base map for each data point using its (x,y) coordinates. Optionally, the color and size of the data points can be set using the (z1, z2, …) parameters.

  3. Use the time parameter, t, to animate the data thereby showing how it changes over time.

The following examples focus on maps in the United States. However, there are many databases available to generate base maps for other regions. The choice of what database to use depends on the geographical location of your data points and the availability of a corresponding map.

3 Example: Generating a Simple Animated Map

For the first example, we begin by generating some artificial data that we will plot on a map of the San Francisco Bay area. We decide to use artificial data for this example to emphasize the steps required to generate the map without the risk of getting lost in the details of a real data set. As we move to a real data set in the next example, though, we will see that the steps required to generate an interesting map are the same.

3.1 The Test Data

For the test data, we generate a set of data points with the format (lat, long, year, temp). The rnorm() function is used to generate random values that represent the latitude (lat) and longitude (long) of each point. These values are normally distributed with a standard deviation of 0.2 degrees and are centered around 37.8 degrees north latitude and 122.4 degrees west longitude (west longitudes are written as negative numbers, so the code uses -122.4). These specific latitude and longitude values were chosen since they are roughly in the middle of San Francisco (there is no trick to choosing these values; I simply looked them up). The year value is an integer that represents the year the data was collected. It is generated by applying floor() to values drawn with runif() from the range 1990 to 2020, so the resulting years are uniformly distributed from 1990 through 2019. Finally, the temp value is a uniformly distributed value between -10 and 40. It is intended to simulate some value, such as the temperature, that was recorded at the corresponding location in the given year. We will use this value to show how a characteristic of each data point, such as the color, can be controlled by this additional data.

Here is the R code used to generate this artificial data set and insert it into a new data frame:

center_lat <- 37.8
center_long <- -122.4
width <- 0.2
num_points <- 500
test_data <- data.frame('lat'=rnorm(num_points, mean=center_lat, sd=width),
                       'long'=rnorm(num_points, mean=center_long, sd=width),
                       'year'=floor(runif(num_points, min=1990, max=2020)),
                       'temp'=runif(num_points, min=-10, max=40)
                       )

We use the head() function to show the first few lines of this data set. Note that the specific values you generate are likely to be different since we are using a random number generator to produce the values.

head(test_data)
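
Because the values are random, two runs normally produce different data. One common way to make the output repeatable is to call set.seed() before generating the data. The sketch below assumes an arbitrary seed value of 42 and a helper named make_test_data, neither of which is part of the original example. It also checks the range of years that floor(runif(...)) produces.

```r
# Assumption: the seed value 42 is arbitrary; any fixed integer works.
# make_test_data is a hypothetical helper introduced for this sketch.
make_test_data <- function(num_points = 500, seed = 42) {
  set.seed(seed)  # fix the random number stream
  data.frame(
    lat  = rnorm(num_points, mean = 37.8, sd = 0.2),
    long = rnorm(num_points, mean = -122.4, sd = 0.2),
    year = floor(runif(num_points, min = 1990, max = 2020)),
    temp = runif(num_points, min = -10, max = 40)
  )
}

run1 <- make_test_data()
run2 <- make_test_data()
identical(run1, run2)   # the two runs match because the seed is the same
range(run1$year)        # years fall between 1990 and 2019 inclusive
```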

3.2 The Necessary Libraries

We need to load the ggplot2 and gganimate packages into the R environment before we can continue to generate the maps. We also need the gifski package. This package is used by gganimate to convert a series of images into GIF animations to be displayed on the screen.

library(ggplot2)
library(gganimate)
library(gifski)

3.3 The Base Map

To use ggplot2 to create any sort of plot, you must provide two basic items: (1) the data to be plotted, and (2) what the authors of ggplot2 refer to as an “aesthetic mapping.” This mapping, which is specified using aes(), provides a powerful mechanism for telling ggplot2 how the data should be mapped onto the plot. Additional functions are then added to the ggplot() function call to layer more features onto the plot. We just touch on the basic capabilities of ggplot2 in this tutorial, but the examples should be sufficient to get you started making your own maps.

The first step before we can plot our data onto a map is to create a data frame that contains the base map. The maps package provides data about the boundaries of many different regions. The map_data() function, which is part of ggplot2 and requires the maps package to be installed, is used to access this data and return a data frame that can be plotted with ggplot2. In the following, we assign the variable which_state the name of the state we want to plot, "california" in this case. The string "county" in the call to map_data() specifies that we want county-level boundaries for the indicated region.

which_state <- "california"
county_info <- map_data("county", region=which_state)

Here are the first few lines of the data obtained from the map_data function call:

head(county_info)

This data set describes the shape of each county in California by specifying the latitude and longitude of the corners of polygons that approximate the shapes. The group column indicates the county to which the corresponding set of polygon corners belongs. For example, group 1 corresponds to Alameda county, group 2 is Alpine county, and so on. The order column specifies the order in which the points should be connected to correctly draw each polygon. Finally, the region identifies the state and the subregion identifies the county.
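
To see how the group numbers line up with the county names, one can list the distinct (group, subregion) pairs. This is a small sketch that assumes the ggplot2 and maps packages are installed; the variable name county_groups is introduced here only for illustration.

```r
library(ggplot2)  # provides map_data(); the maps package must be installed

county_info <- map_data("county", region = "california")

# One row per distinct (group, subregion) pair
county_groups <- unique(county_info[, c("group", "subregion")])
head(county_groups)

# Number of polygon corners stored for each county
head(table(county_info$subregion))
```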

With this data, we can use ggplot to plot an outline of the state that also contains the outlines of the counties, as follows:

base_map <- ggplot(data = county_info, mapping = aes(x = long, y = lat, group = group)) +
  geom_polygon(color = "black", fill = "white") +
  coord_quickmap() +
  theme_void()

This code produces the following map:

base_map

It may not be entirely obvious what is happening here, so let’s dissect each line of code in this example.

The first line,

base_map <- ggplot(data = county_info, mapping = aes(x=long, y=lat, group=group)) +

calls the ggplot function telling it to use the aesthetic mapping defined by aes() to plot the county_info data frame that we previously created. The aes() mapping says to use the long column from the county_info data frame as the x coordinate in the plot and the lat column as the y coordinate. The points should be grouped by ggplot using the group column in county_info. This last statement can be confusing since group appears twice. The first group in group=group refers to a parameter defined in the aes() function called group. The second group is the name of the column in the data frame county_info. The semantics of the language make it clear to the R interpreter which group means what, but it can be confusing the first time you see it.

The next line,

  geom_polygon(color = "black", fill = "white") +

adds a layer to the map we are creating. Note the + sign at the end of the previous line. That + sign tells ggplot that the next line is to be added to the previous line as another layer in the plot. This geom_polygon function tells ggplot that we want polygons to be drawn with black lines and filled with white. These specifications can be changed to any colors you want.

The next two lines,

  coord_quickmap() +
  theme_void()

are used to make the map look nice. coord_quickmap() causes ggplot to scale the axes so that the map has the proportions we expect. Without it, the map will look strangely stretched along one axis. theme_void() removes the axes, grid lines, and default gray background, leaving only the map on a blank background.
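
The map looks stretched without this correction because a degree of longitude covers less ground than a degree of latitude everywhere except at the equator. The following base R sketch illustrates the idea with a simple cosine approximation. It is not ggplot2's actual implementation, and the helper name aspect_at is made up for this example.

```r
# Approximate ratio of the ground length of one degree of latitude to one
# degree of longitude at a given latitude (spherical-Earth approximation).
# aspect_at is a hypothetical helper, not a ggplot2 function.
aspect_at <- function(lat_degrees) {
  1 / cos(lat_degrees * pi / 180)
}

aspect_at(0)     # 1 at the equator: no correction needed
aspect_at(37.8)  # roughly 1.27 near San Francisco
```

In other words, near San Francisco each degree of latitude should be drawn about 1.27 times as tall as a degree of longitude is wide for the shapes to look right.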

Finally, notice that base_map <- in the first line assigns the information about the plot to the variable base_map. Thus, by simply typing the variable name, base_map, the map is displayed.

3.4 Adding Data to the Base Map

Now that we have a base map we like, we can add our own data set to the map using geom_point(), as shown below. This function adds another layer to the base map by plotting each of the data points from our data set, test_data, as a scatter plot using long and lat as the x and y coordinates, respectively. The data points are grouped using year. The importance of this grouping will become apparent later when we animate the map.

map_with_data <- base_map +
  geom_point(data = test_data, aes(x = long, y = lat, group=year))
map_with_data

We can see that all of our data points are tightly clustered around San Francisco in this map. In fact, they are too tightly clustered to see the individual values. Furthermore, there are large areas of the map shown that really are not interesting with this data set. To zoom in on the portion of the map around San Francisco, we find the range of positions of the data points using the minimum and maximum of the latitude and longitude values. We use coord_quickmap() to set the x and y limits on the map as follows:

min_long <- min(test_data$long)
max_long <- max(test_data$long)
min_lat <- min(test_data$lat)
max_lat <- max(test_data$lat)
map_with_data <- map_with_data +
  coord_quickmap(xlim = c(min_long, max_long),  ylim = c(min_lat, max_lat))
map_with_data

Notice that when we execute this code the system produces the warning message: Coordinate system already present. Adding new coordinate system, which will replace the existing one. This message occurs because we called the function coord_quickmap() a second time. The warning can be safely ignored.

3.5 Using Color and Size to Highlight Another Data Dimension

The maps we have generated so far simply plot a small black dot at the coordinates associated with each data point. Sometimes that is sufficient. If we have additional data associated with the data points, though, we can enhance the map by coloring the dots to correspond to these extra data values. In the test_data we generated, for instance, the value temp is associated with each data point. By assigning temp to the color parameter in geom_point(), ggplot automatically varies the color for each dot based on the range of temp values, as shown below:

map_with_data <- base_map +
  geom_point(data = test_data, aes(x = long, y = lat, color=temp, group=year)) +
  coord_quickmap(xlim = c(min_long, max_long),  ylim = c(min_lat, max_lat))

map_with_data

This example shows the default colors used by ggplot. Later, we will show how to change the color palette to obtain a different range of colors.

Color is useful to show differences in the data, but we also can change the size of the plotted points based on the additional data. The following example uses temp to determine the size of the points in addition to their color by assigning color=temp and size=temp:

map_with_data <- base_map +
  geom_point(data = test_data, aes(x = long, y = lat, color=temp, size=temp, group=year)) +
  coord_quickmap(xlim = c(min_long, max_long),  ylim = c(min_lat, max_lat))

map_with_data

Sometimes color is appropriate to reveal more about the data, while sometimes size, or a combination of size and color, is more useful. You must use your own judgment to decide what looks best and most readily conveys the information you want to highlight.

3.6 Animating the Map

Animation is a powerful visualization technique to show how the data changes over time. Adding animation to the map we have produced so far is relatively straightforward using ggplot and gganimate.

In the following code, we show how to add more layers to the map_with_data object we created above to make the dots appear sequentially as determined by the year column in the test_data data frame. The function transition_time(year) specifies that we want to use year to move from one frame of the animation to the next. To add a title to the map, we use ggtitle where the frame_time parameter is the time index associated with the frame currently being displayed, which in this case is year; frame is the sequence number of the current frame; and nframes is the total number of frames in the animation. We compute the total number of years to display, and thus the total number of frames to generate, by finding the range of years in the data set. Finally, the function animate() renders the frames to display one-at-a-time.

map_with_animation <- map_with_data +
  transition_time(year) +
  ggtitle('Year: {frame_time}',
          subtitle = 'Frame {frame} of {nframes}')
num_years <- max(test_data$year) - min(test_data$year) + 1
animate(map_with_animation, nframes = num_years)

This sequence of steps produces an animation of the data we plotted on the base map of the San Francisco area, just as we expected. However, the dots that are plotted look like a bunch of insects buzzing around. It is hard to see any particular patterns in the animation or how things progress through time.
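
One way to see why each frame looks so sparse is to count how many points fall in each year. The sketch below regenerates year values with the same distribution as in test_data (the seed value is an arbitrary assumption) and tabulates them. With 500 points spread over 30 possible years, each frame shows only about 17 dots on average.

```r
set.seed(1)  # arbitrary seed, used only to make this sketch repeatable
num_points <- 500
years <- floor(runif(num_points, min = 1990, max = 2020))

points_per_year <- table(years)  # number of points that share each year
points_per_year
mean(points_per_year)            # average number of dots drawn per frame
```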

The problem is that we are seeing only a single year’s data displayed in each frame before immediately jumping to the next frame. This presentation may be precisely what you want for some types of data. Sometimes, however, you may want to see how the changes in the data accumulate over time. The shadow_mark() function does exactly that. It leaves the data from each previous frame on the current frame to show how the data accumulated to produce the current frame. This form of animation is often more interesting than the standard animation since it can be easier to observe patterns of growth and change. Here is an example:

map_with_shadow <- map_with_animation +
  shadow_mark()
animate(map_with_shadow, nframes = num_years)

A few input parameters can be passed to shadow_mark to adjust how it displays the previous data points, as described in its documentation. There also are additional shadow functions available that provide variations on how the previous data is displayed, including shadow_trail and shadow_wake. You are encouraged to try these variations and see what they do to this data set.
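
As a hedged sketch of two of these variations, the code below rebuilds a small animated map and then applies either a faded shadow_mark() or a shadow_wake(). It assumes the ggplot2, gganimate, gifski, and maps packages are installed. The seed, the alpha value of 0.3, and the wake_length of 0.2 are arbitrary choices made for illustration.

```r
library(ggplot2)
library(gganimate)

set.seed(1)  # arbitrary seed so the sketch is repeatable
test_data <- data.frame(
  lat  = rnorm(200, mean = 37.8, sd = 0.2),
  long = rnorm(200, mean = -122.4, sd = 0.2),
  year = floor(runif(200, min = 1990, max = 2020))
)
county_info <- map_data("county", region = "california")

animated <- ggplot(county_info, aes(x = long, y = lat, group = group)) +
  geom_polygon(color = "black", fill = "white") +
  geom_point(data = test_data, aes(x = long, y = lat, group = year)) +
  coord_quickmap(xlim = range(test_data$long), ylim = range(test_data$lat)) +
  theme_void() +
  transition_time(year)

# Variation 1: keep the points from earlier frames, but draw them faded
faded_marks <- animated + shadow_mark(alpha = 0.3)

# Variation 2: show only the most recent frames, gradually fading out
short_wake <- animated + shadow_wake(wake_length = 0.2)

# Uncomment to render (this can take a little while):
# animate(faded_marks, nframes = 30, fps = 2)
# animate(short_wake, nframes = 30, fps = 2)
```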

If you do not like the speed of the animation, you can change it using the fps (frames per second) parameter in animate. Here is the same map as above slowed down to two frames per second:

animate(map_with_shadow, nframes = num_years, fps = 2)
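
The playing time of the animation follows directly from these two parameters: it is the number of frames divided by the frames per second. The short sketch below works through that arithmetic, assuming 30 frames as in the test data. The names num_frames and duration_at are introduced only for this illustration.

```r
num_frames <- 30  # assumed: one frame per year in the test data

# Duration in seconds = number of frames / frames per second
duration_at <- function(nframes, fps) nframes / fps

duration_at(num_frames, fps = 10)  # 3 seconds at gganimate's default of 10 fps
duration_at(num_frames, fps = 2)   # 15 seconds when slowed to 2 fps
```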

3.7 Save the Map to a File

Finally, you may want to save your animated map as a GIF to put on a web site, or to use in a presentation or some other document. The function anim_save saves the last animation that was produced into the named file, as follows:

anim_save("example1.gif")

Alternatively, you can specify the particular data object you want to save into the file, as shown below. Notice that anim_save generates the animation for the given data object before it stores it into the file. Consequently, you can pass the same parameters to anim_save as you can to animate.

anim_save("example1.gif", map_with_shadow, nframes = num_years, fps = 2)

4 Example: Mapping Housing Growth

We have seen how to plot data values on to a map using the latitude and longitude coordinates for each data point, and to vary the size and color of the plotted points. We also learned how to animate the data based on a time value associated with each data point.

Now let’s try another example that uses some real data to pull together the concepts we learned above, and to introduce a few new ideas.

Before moving on with this example, I suggest restarting the R console you have running now. This will make sure that any values or variables left over from the previous example will not accidentally interfere with the next example. In fact, it is good practice to restart R to ensure you have a clean environment before starting any new project. If you are running these examples using RStudio, select the Run pulldown menu near the top of the window and select Restart R and Clear Output. If you are running R in a console window, I suggest you quit and restart.

The first step in this example is to load the necessary libraries. The following code loads the same packages we used in the previous example:

library(ggplot2)
library(gganimate)
library(gifski)

4.1 The Example Data

The data set used in this example was downloaded from kaggle.com. It is a public domain data set that contains information about house sales in the Seattle area from May 2014 through May 2015. Make sure you download the data file into the same directory you are using to run these examples. I saved the data into a file named house_data.csv, but you can name the file anything you want. If you do use a different file name, though, be sure to change the name between the quotes in the read.csv function call below. This function reads the data from the file and creates the data frame house_data.

house_data <- read.csv("house_data.csv")

Here are the first few lines of the data set:

head(house_data)

For this example, we are interested primarily in the data that shows the location of each house, which is given by the lat and long columns of the data frame; the year it was built, which is shown in the column labeled yr_built; and the purchase price given in the price column.

4.2 Generating the Animated Map

As in the previous example, the first step in plotting this data on to a map is to generate an appropriate base map. This time we want a base map for Washington state:

which_state <- "washington"
county_info <- map_data("county", region=which_state)
base_map <- ggplot(data = county_info, mapping = aes(x = long, y = lat, group = group)) +
  geom_polygon(color = "black", fill = "white") +
  coord_quickmap() +
  theme_void()

This code produces the following base map:

base_map

We need to find the range of the latitudes and longitudes for all of the data points to allow us to focus on only that part of the map that has useful data. We also compute the number of years of data, which will be useful when we animate the map.

min_long <- min(house_data$long)
max_long <- max(house_data$long)
min_lat <- min(house_data$lat)
max_lat <- max(house_data$lat)
num_years <- max(house_data$yr_built) - min(house_data$yr_built) + 1

Next, we add several layers to the base map.

  1. Using geom_point, a dot is added for each point at the location determined by its latitude and longitude.
  2. The data points are grouped by the year each house was built.
  3. coord_quickmap rescales the map to focus on the area with useful data.
  4. The animation transition time is set to be the year built using transition_time.
  5. A title is added with ggtitle.
  6. The data points from previous frames are included with the current frame using shadow_mark.
  7. The map is turned into a GIF animation using animate.

The code to add these layers and compute the animation is shown below:

map_with_data <- base_map +
  geom_point(data = house_data, aes(x = long, y = lat, group=yr_built)) +
  coord_quickmap(xlim = c(min_long, max_long),  ylim = c(min_lat, max_lat)) +
  transition_time(yr_built) +
  ggtitle('Year: {frame_time}',
          subtitle = 'Frame {frame} of {nframes}') +
  shadow_mark()
animate(map_with_data, nframes = num_years, fps = 2)

The final animated map shows how the number of houses expanded around the Seattle area during the range of years these houses were built. Of course, this is not a complete picture of the houses in Seattle since it includes only those houses that happen to have been sold during the period from May 2014 through May 2015. The use of animation, though, does provide a nice visualization to show how newly built houses tended to spread out from the central core over time.

4.3 Using Color to Emphasize Data Values

While it is interesting to see the spread in the location of housing in the previous map, we can enhance the information displayed by adding color to each data point based on a characteristic of the data. For example, the color of each data point can be used to show its price. To do this, we add color=price to geom_point. We also add scale_color_gradient to map a color range to the range of prices. In this case, we specify colors on a scale from green to red so that higher prices correspond to “redder” colors.

map_with_data <- base_map +
  geom_point(data = house_data, aes(x = long, y = lat, group=yr_built, color=price)) +
  coord_quickmap(xlim = c(min_long, max_long),  ylim = c(min_lat, max_lat)) +
  transition_time(yr_built) +
  ggtitle('Year: {frame_time}',
          subtitle = 'Frame {frame} of {nframes}') +
  shadow_mark() +
  scale_color_gradient(low = "green", high = "red")
animate(map_with_data, nframes = num_years, fps = 2)

From this animation, we can see that the red dots, which correspond to the more expensive houses, tend to cluster closer to the waterfront in the upper left corner of the map. There are so many lower price houses, though, that the green dots tend to overwhelm the red dots. We can address this issue of masking potentially interesting data by computing some useful statistics for groups of houses instead of looking at each house individually. This approach is discussed in the next section.

First, however, it is interesting to point out that there are other methods for adding color to the data points. For instance, you can specify your own multi-color scale using scale_color_gradientn, as shown in the following examples.

This variation of the previous map uses scale_color_gradientn to define a rainbow of colors:

map_with_data <- base_map +
  geom_point(data = house_data, aes(x = long, y = lat, group=yr_built, color=price)) +
  coord_quickmap(xlim = c(min_long, max_long),  ylim = c(min_lat, max_lat)) +
  transition_time(yr_built) +
  ggtitle('Year: {frame_time}',
          subtitle = 'Frame {frame} of {nframes}') +
  shadow_mark() +
  scale_color_gradientn(colors = rainbow(7))
animate(map_with_data, nframes = num_years, fps = 2)

Here is another color scheme that uses the colors returned by R's palette() function:

map_with_data <- base_map +
  geom_point(data = house_data, aes(x = long, y = lat, group=yr_built, color=price)) +
  coord_quickmap(xlim = c(min_long, max_long),  ylim = c(min_lat, max_lat)) +
  transition_time(yr_built) +
  ggtitle('Year: {frame_time}',
          subtitle = 'Frame {frame} of {nframes}') +
  shadow_mark() +
  scale_color_gradientn(colors = palette())
animate(map_with_data, nframes = num_years, fps = 2)

The R environment provides a wide range of options for controlling the color, shapes, fill styles, and shading for plots. I encourage you to experiment with various combinations to determine what works best for your data.
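
As one further option, ggplot2 includes the viridis color scales, which are designed to remain readable when printed in grayscale and for viewers with common forms of color blindness. The sketch below applies scale_color_viridis_c() to a small made-up data frame rather than to the house data; all of the values are invented for illustration.

```r
library(ggplot2)

# Made-up points standing in for the house data
toy <- data.frame(
  long  = c(-122.35, -122.30, -122.20, -122.10),
  lat   = c(47.60, 47.65, 47.55, 47.70),
  price = c(250000, 480000, 900000, 2100000)
)

toy_plot <- ggplot(toy, aes(x = long, y = lat, color = price)) +
  geom_point(size = 4) +
  coord_quickmap() +
  scale_color_viridis_c()   # continuous viridis palette

toy_plot
```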

5 Plotting Computed Statistics on the Map

Plotting individual data points on a map often can show interesting trends over time. For instance, in the previous example, we saw how housing spread out over the Seattle area each year. However, too many data points can fill up the map with too much detail and obscure larger trends. We saw this in the previous example when the relatively small number of expensive houses, which were shown in red, were overwritten by the larger number of less expensive houses, shown in green.

A potential solution to this problem is to partition the data into appropriate subsets and then compute aggregate statistics for each subset. This approach reduces each subset of data points into a single value that is appropriately representative of the subset. Then these subset values are plotted on the map instead of all of the individual data points. This creates a map on which it may be easier to see larger trends.

We continue with the house price example from the previous section to demonstrate this partitioning idea. We note that the price of a house is often heavily dependent on its location. Fortunately, we see that our data set includes the zip code for each property. Since zip codes identify geographically contiguous areas, we can use the zip code to partition the house data into smaller subsets. We then can compute an appropriate statistic that represents the central tendency of the prices of that subset of houses. Reasonable statistics include the mean or the median, for instance. For this example, we choose the median value of the prices since the median is commonly used to summarize real estate prices.

Before proceeding further, it would again be useful to clear and restart your R environment. Also, execute the following lines of code to load the libraries needed for this example, and to read the data file to create the house_data data frame.

library(ggplot2)
library(gganimate)
library(gifski)
library(dplyr)

house_data <- read.csv("house_data.csv")

5.1 Partitioning the Data

You probably noticed that we loaded a new package into the R environment, dplyr, in addition to those we used previously. This package provides a useful set of function calls and other operations to easily extract only the desired rows and columns from a data frame to thereby partition the data into various subsets. It also provides useful mechanisms for computing new values for the subsets and generating new columns to be inserted into data frames. The following example only scratches the surface of what the dplyr package provides. I encourage you to dig into it further when you need to partition data and compute new values from across the partitions.

We will introduce the basics of the dplyr functionality by working through a specific example using the same house price data set as before. The general idea is to partition the house data into subsets where every data point in a subset has the same zip code. We then compute some useful statistics for each subset. We plot the desired summary statistics on the map instead of plotting each individual data point.

More specifically, we want the code to perform the following steps:

  1. Partition the data into subsets where each subset has the same zip code.
  2. Compute the median price across all of the houses in each zip code.
  3. Compute the latitude and longitude for a point roughly in the center of each zip code. Since we do not know the actual boundaries of the zip code, we compute the mean of all the latitudes and the mean of all the longitudes for all of the houses in the zip code. This should give us a reasonable point around which the houses in the subset cluster.
  4. Compute the median year the houses in the zip code were built.
  5. Compute the median price per square foot of the houses in each zip code. This value is often used in real estate to compare the cost of houses normalized to their size.
  6. Plot on the map the median price at the coordinates for the approximate center of each zip code region. (We will later use the median price per square foot to modify the plotted points.)
  7. Animate the map using the median year built as the transition time.

Before we proceed with these steps, we need to add the price per square foot for each house to the house_data data frame. R provides an easy way to create a new column in the data frame for this newly computed value using the $ notation, as follows:

house_data$price_per_sqft <- house_data$price / house_data$sqft_living15
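
The same column can also be created with the mutate() function from dplyr. The sketch below compares the two approaches on a tiny made-up data frame that uses the same column names as the house data; the values are invented for illustration.

```r
library(dplyr)

# Made-up rows using the same column names as the house data
toy_houses <- data.frame(
  price         = c(300000, 450000, 600000),
  sqft_living15 = c(1500, 1800, 2000)
)

# Base R: add the column with the $ notation
base_version <- toy_houses
base_version$price_per_sqft <- base_version$price / base_version$sqft_living15

# dplyr: add the same column with mutate()
dplyr_version <- mutate(toy_houses, price_per_sqft = price / sqft_living15)

all.equal(base_version, dplyr_version)  # both approaches give the same result
```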

The above list of steps may appear to be rather complex. However, the dplyr package makes it reasonably straightforward to extract all of this information, as shown here:

data_by_zipcode <- house_data %>% 
  group_by(zipcode) %>% 
  summarize(
    count = n(),
    med_price = median(price),
    mean_lat = mean(lat),
    mean_long = mean(long),
    med_yr_built = median(yr_built),
    med_price_per_sqft = median(price_per_sqft)
  )

This code needs a bit of explanation. One of the key things to notice is the use of the %>% notation. This operator, which dplyr makes available from the magrittr package, sets up a pipeline between separate computational steps in which the output of the left-hand computation becomes the first input to the right-hand computation. The intermediate result is passed directly from one step to the next, so we do not need to introduce any temporary variables to store it. While this code might look a bit complicated at first, the pipeline operation does make it simpler to write.
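
To make the pipeline idea concrete, the following sketch computes the same summary twice, once with nested function calls and once with the pipe. It uses a small made-up data frame, and the zip codes and prices are invented for illustration.

```r
library(dplyr)

# Made-up data: three zip codes with a few sale prices each
toy_houses <- data.frame(
  zipcode = c(98001, 98001, 98002, 98002, 98002, 98003),
  price   = c(200000, 300000, 400000, 500000, 900000, 350000)
)

# Without the pipe: nested calls, read from the inside out
nested <- summarize(group_by(toy_houses, zipcode), med_price = median(price))

# With the pipe: the same steps, read from top to bottom
piped <- toy_houses %>%
  group_by(zipcode) %>%
  summarize(med_price = median(price))

all.equal(nested, piped)  # the two forms produce the same table
piped
```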

Using this pipeline operation, the contents of the data frame house_data are passed to the group_by function. This function groups the rows so that all of the houses with the same zip code belong to the same subset, with one subset per zip code. The grouped data frame is then passed to the summarize function.

The summarize function evaluates the expressions given between the parentheses once for each zip code subset. It creates a new data frame, which in this example gets assigned to the variable data_by_zipcode. This new data frame contains one row for each zip code, as shown below:

head(data_by_zipcode)

The new variables that we created in the summarize function become the column headings in the data frame data_by_zipcode. The values in each row are the results returned by the corresponding function calls. In this example, we computed the median price for each zip code, the mean latitude and longitude for all of the houses in each zip code, the median year built, and the median price per square foot. The count column shows the total number of houses contained in that zip code.
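
The same grouping and summarizing pattern can be tried on a tiny made-up data frame where the results are easy to verify by hand. The data frame toy_houses below is hypothetical; only the group_by, summarize, n, and median calls mirror the code above.

```r
library(dplyr)

# Hypothetical toy data: three houses in one zip code and two in another.
toy_houses <- data.frame(
  zipcode = c(98001, 98001, 98001, 98002, 98002),
  price   = c(200000, 300000, 400000, 500000, 700000)
)

toy_by_zipcode <- toy_houses %>%
  group_by(zipcode) %>%
  summarize(
    count = n(),               # number of rows in each zip code
    med_price = median(price)  # median price within each zip code
  )

print(toy_by_zipcode)
```

The result has one row per zip code, with the new names count and med_price as its column headings.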

Any functions you use within summarize must return a single value. You can write your own functions to be called from within summarize as long as they return a single value.
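
As a sketch of a user-written function called from within summarize, the example below defines a hypothetical helper named price_spread that returns a single value, the difference between the highest and lowest price in a subset. The data frame and its values are again made up for illustration.

```r
library(dplyr)

# Hypothetical helper: returns one number for a vector of prices.
price_spread <- function(x) {
  max(x) - min(x)
}

# Hypothetical toy data.
toy_houses <- data.frame(
  zipcode = c(98001, 98001, 98001, 98002, 98002),
  price   = c(200000, 300000, 400000, 500000, 750000)
)

toy_spread <- toy_houses %>%
  group_by(zipcode) %>%
  summarize(spread = price_spread(price))

print(toy_spread)
```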

5.2 Plotting the Computed Statistics

Now that we have computed some useful summary statistics for each zip code, we can plot that information on a map using the same technique we used in the previous example.

First, we find the range of latitude and longitude values in the data set to focus only on the area of the map that has data. We also will use the total number of unique zip codes as the number of frames to generate in the animation. This value is simply the number of rows in data_by_zipcode.

min_long <- min(house_data$long)
max_long <- max(house_data$long)
min_lat <- min(house_data$lat)
max_lat <- max(house_data$lat)

num_frames <- nrow(data_by_zipcode)
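
As an alternative sketch, base R's range function returns the minimum and maximum together as a two-element vector, which is the form that the xlim and ylim arguments of coord_quickmap expect. The coordinates below are made up for illustration and the variable names are hypothetical.

```r
# Hypothetical coordinates, not taken from the house data set.
toy_long <- c(-122.5, -121.3, -122.1)
toy_lat  <- c(47.2, 47.8, 47.5)

# range() returns c(minimum, maximum).
long_limits <- range(toy_long)
lat_limits  <- range(toy_lat)

print(long_limits)
print(lat_limits)
```

With limits stored this way, the map layer could be written as coord_quickmap(xlim = long_limits, ylim = lat_limits).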

As we did in the previous example, we can now generate the base map and add layers to it to include the data, the title, and so on. We animate the map using the median year built in each zip code as the transition parameter. Notice that the median price is used to determine the color of each data point and the median price per square foot determines the size.

which_state <- "washington"

county_info <- map_data("county", region=which_state)  # County boundaries
map_with_data <- ggplot(data = county_info, mapping = aes(x = long, y = lat, group = group)) +
  geom_polygon(color = "black", fill = "white") +
  theme_void() +
  geom_point(data = data_by_zipcode, aes(x = mean_long, y = mean_lat, color=med_price,
                                         size=med_price_per_sqft, group=zipcode)) +
  coord_quickmap(xlim = c(min_long, max_long),  ylim = c(min_lat, max_lat)) +
  transition_time(med_yr_built) +
  ggtitle('Year: {floor(frame_time)}',
          subtitle = 'Frame {frame} of {nframes}') +
  shadow_mark() +
  scale_color_gradient(low = "green", high = "red")

animate(map_with_data, nframes = num_frames, fps = 3)

On this map, we see that there is one red dot that is relatively large. The red color indicates that this zip code has the highest median price among all of the zip codes. The houses in this zip code also have a high median price per square foot, as indicated by the large size of the dot compared to the other dots.

As a variation of this map, we can make the size of the data points depend on the number of houses in each zip code by setting size=count in geom_point().

which_state <- "washington"

county_info <- map_data("county", region=which_state)  # County boundaries
map_with_data <- ggplot(data = county_info, mapping = aes(x = long, y = lat, group = group)) +
  geom_polygon(color = "black", fill = "white") +
  theme_void() +
  geom_point(data = data_by_zipcode, aes(x = mean_long, y = mean_lat, color=med_price, 
                                         size=count, group=zipcode)) +
  coord_quickmap(xlim = c(min_long, max_long),  ylim = c(min_lat, max_lat)) +
  transition_time(med_yr_built) +
  ggtitle('Year: {floor(frame_time)}',
          subtitle = 'Frame {frame} of {nframes}') +
  shadow_mark() +
  scale_color_gradient(low = "green", high = "red")

animate(map_with_data, nframes = num_frames, fps = 4)

In this new map, the red dot representing the most expensive zip code is quite small since there are relatively few houses in this zip code.

Here is one more variation of this map that uses the zip code as the transition parameter for the animation.

which_state <- "washington"

county_info <- map_data("county", region=which_state)  # County boundaries
map_with_data <- ggplot(data = county_info, mapping = aes(x = long, y = lat, group = group)) +
  geom_polygon(color = "black", fill = "white") +
  theme_void() +
  geom_point(data = data_by_zipcode, aes(x = mean_long, y = mean_lat, color=med_price,
                                         size=med_price_per_sqft, group=zipcode)) +
  coord_quickmap(xlim = c(min_long, max_long),  ylim = c(min_lat, max_lat)) +
  transition_time(zipcode) +
  ggtitle('Zipcode: {floor(frame_time)}',
          subtitle = 'Frame {frame} of {nframes}') +
  shadow_mark() +
  scale_color_gradient(low = "green", high = "red")

animate(map_with_data, nframes = num_frames, fps = 3)

How you want to display your data depends entirely on what information you are trying to convey about your data set to your audience. I hope these examples trigger some interesting ideas as you decide how to plot your own data.

6 Generating Videos

The examples so far have all generated repeating GIFs to show the animation. It also is possible to use another package called av to create videos of the animated maps. You must load the av library:

library(av)

You then can call the animate function specifying the renderer from the av package to produce the video:

animate(map_with_data, nframes = num_frames, fps = 4, renderer = av_renderer())

You can save the video to a file as follows:

anim_save("example2.mp4")
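
By default, anim_save writes out the most recently rendered animation. A minimal sketch of a more explicit approach is to keep the rendered video in a variable and pass it through the animation argument. The toy data, the variable names, and the temporary output file below are all hypothetical, and the example assumes that the ggplot2, gganimate, and av packages are installed.

```r
library(ggplot2)
library(gganimate)
library(av)

# Hypothetical toy data: one point moving along a diagonal over five steps.
toy_points <- data.frame(x = 1:5, y = 1:5, step = 1:5)

toy_plot <- ggplot(toy_points, aes(x = x, y = y)) +
  geom_point(size = 4) +
  transition_time(step)

# Keep the rendered video in a variable rather than relying on the
# most recently rendered animation.
toy_video <- animate(toy_plot, nframes = 10, fps = 5, renderer = av_renderer())

# Write the video to a temporary file; the file name is arbitrary.
video_file <- tempfile(fileext = ".mp4")
anim_save(video_file, animation = toy_video)
```

Passing the animation explicitly avoids saving the wrong one when several animations have been rendered in the same session.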

7 Final Thoughts

The R programming environment includes many packages that simplify the process of plotting geographical data points onto a map and additionally animating the data to appear on the map sequentially. There are many parameters you can change in the functions described above to change the appearance of the maps you generate. This tutorial has just scratched the surface of what is possible. I encourage you to dig into the documentation further, search around for more examples of interesting maps, and simply experiment to see what you can do. Maps are a powerful tool – I hope you have some fun with them.

Notes


  1. This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. You are free to:
    Share – copy and redistribute the material in any medium or format.
    Adapt – remix, transform, and build upon the material.
    Under the following terms:
    Attribution – You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
    NonCommercial – You may not use the material for commercial purposes.
    ShareAlike – If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original.
    No additional restrictions – You may not apply legal terms or technological measures that legally restrict others from doing anything the license permits.
    The licensor cannot revoke these freedoms as long as you follow the license terms.