Maps in R: choropleth maps

This is the third article of the Maps in R series. After having shown how to draw a map without placing data on it and how to plot point data on a map, in this installment the creation of a choropleth map will be presented.

A choropleth map is a thematic map featuring regions colored or shaded according to the value assumed by the variable of interest in that particular region.

Packages used

We make use of three packages (plus their dependencies) to produce the maps in this post.

  • maptools: a set of tools for reading and handling spatial objects. In particular, it will be used to read .shp files, a common format used by geographic information systems software.
  • ggplot2: one of the powerful graphics engines available to R users
  • ggmap: a package for spatial visualization using popular on-line mapping systems, such as GoogleMaps or OpenStreetMap.

So, let's load them into R.

Data used

In order to produce the maps displayed later in this post, we used data publicly available from Eurostat.

The polygons for drawing the administrative boundaries were obtained from this link. In particular, the NUTS 2010 shapefile in the 1:60 million scale was downloaded and used. The other available scales would allow the drawing of better defined maps, but at a computational cost. The zipped file has to be extracted in a folder of choice for using it later.

Note: for an introduction on NUTS (Nomenclature of territorial units for statistics) classification, look at this Introduction on Eurostat website.

The statistic that will be drawn in the map is the the public expenditure on education as a percentage of the Gross Domestic Product (year 2009), and is one of the many indicators available for download at Eurostat's statistics portal.

In order to make the following code fully reproducible, I'm uploading below the .csv file I obtained after performing the various selections on the Eurostat portal.

csv file: educ_thexp_1_Data

With the following commands we read both the shapefile and the csv in R.

Data preparation

The eurMap object could potentially be used as is to produce a map. In fact, a command as simple as plot(eurMap) would draw the map of Europe at the NUTS-3 level.

However, since we are going to make use of the great versatility of the ggplot2 graphics facility, we need to transform eurMap to a data frame, as that is the only type of object accepted by the ggplot() function. The ggplot2 package has the fortify() function that takes care of said conversion.

After that, we can add the data on education expenditure to the newly created data frame. Note that, since administrative borders will be plotted as paths, it's important to have them ordered as they have to be drawn.

Finally, to avoid the plotting of European territories far away from mainland Europe, we put some coordinates limits to the data, using the geocode() function we introduced in previous posts.

Mapping with ggplot2

While not specifically designed for the purpose, the ggplot2 package is well suited for the display of maps. Let's first create an empty map (just showing borders,) by adding a geom_path layer to a ggplot() call.

The map in the following image is obtained by typing m1 in the R console.

Then, by adding a geom_polygon layer, we color the country area according to the education expenditure value (stored in the Value column.)

And here is the result of typing m2 in the R console.

The country borders are not visible anymore because the filled polygons have been stacked on top of them. Changing the order of layers will produce a more readable map.

Overlay on GoogleMaps

Polygon and path layers can be overlaid on a map obtained from GoogleMaps (or OpenStreetMaps) using the ggmap package introduced in the previous posts.

It must be noted that the overlaying will work only if the boundary files have the same coordinate system and the same projection of the underlying map. In this case, the combination of Eurostats shapefiles and GoogleMaps works fine.

As a final touch, let's put text on the map. Using the summaryBy() funcion of the doBy package, we calculate average values for multiple variables. The solution of using the average coordinates of all boundary points as the position where to place the text is a lazy one, as appropriate ways to calculate the center of mass of a polygon should be used instead.

Where to find Shapefiles

If you are looking for shapefiles, your best choice is probably googling for them. However, here are a few places where you can find them.

  • The Eurostat page on Administrative and statistical units mentioned at the beginning of this post.
  • ISTAT's cartography page for Italian administrative units. Also contains geocoded data points of ports, police stations and many other things.
  • The Global Administrative Areas website has worldwide boundaries at various levels in several formats.
  • CShapes features historical boundaries going back to 1946, thus useful for displaying older data. They can be downloaded as shapefiles, but chsapes also comes as an R package.

View (and download) the full code:

11
Shares

About Max Marchi

Statistician on infectious disease data for the Regione Emilia-Romagna by day. Sport data analyst for Baseball Prospectus by night.
This entry was posted in R and tagged , , . Bookmark the permalink.

9 Responses to Maps in R: choropleth maps

  1. Stefan says:

    I have a problew viewing the code snippets... Some of the characters are missing, and others are represented with the "html" versions of them...
    Is it just me?

  2. Constantine says:

    I too have problems seeing the code.
    Max, is there any chance we can have the code? Many thanks!

  3. Milano R net says:

    Dear Stefan and Constantine,
    thank you for your comment.

    I am sorry with Max and with you for the trouble.

    As soon as possible, I'll fix code chunks in the text. By the way, I just fixed the full code at the end of the article, you can download and execute it.

  4. sfleite says:

    First place, thanks so much for your post. You helped me. In my case, I have problems about polygons and I tried to improve the view to my data.
    My scrip is:

    library(maptools)
    gpclibPermit()
    library(ggplot2)
    library(rgdal)
    library(rgeos)
    library(ggmap)

    # read administrative boundaries (change folder appropriately)
    brMap <- readShapePoly("BRASIL.shp")
    brMap

    # read downloaded data (change folder appropriately)
    brRen <- read.csv("Renda.csv", sep = ";", quote = "\"", dec=".", stringsAsFactors = FALSE)
    is.numeric(brRen$rendapc)
    brRen$rendapc <- as.numeric(as.character(brRen$rendapc)) # format as numeric

    #convert shp to UTF8
    library(descr)
    brMap$ESTADO<-toUTF8(brMap$ESTADO, "IBM850")

    # convert shp data to data frame for ggplot
    as.data.frame(brMap) # defining region
    brMap = gBuffer(brMap, width=0, byid=TRUE) #correct polygons problem - TopologyException
    brMapDf <- fortify(brMap, region="UF")

    # merge map and data
    brRenMapDf <- merge(brMapDf, brRen, by.x="id", by.y="Iden")
    brRenMapDf <- brRenMapDf[order(brRenMapDf$order),]
    brRenMapDf

    # limit data to main Europe
    brazil.limits <- geocode(c("Monte Caburaí", "Barra do Chuí", "Serra do Divisor", "Ilhas Martin Vaz"))
    brRenMapDf min(brazil.limits$lon) & long min(brazil.limits$lat) & lat < max(brazil.limits$lat))

    # ggplot mapping
    # data layer
    m0 <- ggplot(data=brRenMapDf)
    # empty map (only borders)
    m1 <- m0 + geom_path(aes(x=long, y=lat, group=group), color='gray') + coord_equal()
    m1

    # fill with education expenditure data
    m2 <- m1 + geom_polygon(aes(x=long, y=lat, group=group, fill=rendapc))
    m2

    # inverse order (to have visible borders)
    m0 <- ggplot(data=brRenMapDf)
    m1 <- m0 + geom_polygon(aes(x=long, y=lat, group=group, fill=rendapc)) + coord_equal()
    m2 <- m1 + geom_path(aes(x=long, y=lat, group=group), color='black')
    m2

    # over a GoogleMap (not working if not correctly projected)
    map <- get_map(location = 'Brazil', zoom=4)
    m0 <- ggmap(map)
    m1 <- m0 + geom_polygon(aes(x=long, y=lat, group=group, fill=rendapc), data=brRenMapDf, alpha=.9)
    m2 <- m1 + geom_path(aes(x=long, y=lat, group=group), data=brRenMapDf, color='black')
    m2

    # add text
    library(doBy)
    txtVal <- summaryBy(long + lat + rendapc ~ id, data=brRenMapDf, FUN=mean, keep.names=T)
    m3 <- m2 + geom_text(aes(x=long, y=lat, label=rendapc), data=txtVal, col="yellow", hjust=0.5, vjust=0.5, cex=3) + scale_fill_gradientn(colours=.colors(6)) #change colors break
    m3

  5. sfleite says:

    Sorry: problem in my script:

    m3 <- m2 + geom_text(aes(x=long, y=lat, label=rendapc), data=txtVal, col="yellow", hjust=0.5, vjust=0.5, cex=3) + scale_fill_gradientn(colours=terrain.colors(6))

  6. maj says:

    First of all: Thanks for creating this tutorial.
    Secondly, I'd like to ask under what circumstances whitespaces are used, as in "eurMap eurEdueurEdu$Value" (second code snippet).
    Thirdly and most importantly, I downloaded the r source code, but it only works until the following command:

    > eurEduMapDf <- merge(eurMapDf, eurEdu, by.x="id", by.y="GEO")
    Error in merge(eurMapDf, eurEdu, by.x = "id", by.y = "GEO") :
    object 'eurMapDf' not found

    RStudio doesn't seem to find the eurMapDf object, am I doing something wrong?

    Thanks in advance.
    Markus

  7. ari says:

    I think that you might be interested in the choroplethr package that I created - it simplifies the creation of choropleths for a certain handful of maps (currently US States, US Counties and US ZIP codes). It also provides native support for mapping data from the US Census.

    Here are some links:
    https://github.com/trulia/choroplethr
    https://github.com/trulia/choroplethr/wiki
    http://blog.revolutionanalytics.com/2014/01/easy-data-maps-with-r-the-choroplethr-package-.html

  8. Immo says:

    Hi Max,

    thanks for your post. However, I tried to reproduce your results using your code. Seems, there is a problem with converting the shapefile into a dataframe with the fortify function.
    Here is the output:
    > # convert shp data to data frame for ggplot
    > eurMapDf <- fortify(eurMap, region='NUTS_ID')
    Error: isTRUE(gpclibPermitStatus()) ist nicht TRUE

    I actually ran your code with only the working directories respecified...

    Hope, there is an easy fix to that issue!
    Thank you already for your help
    Best
    Jörg

    • Immo says:

      solved it. I removed the packages maptools and rgeos and then installed them again (with dependencies=TRUE option) - first maptools, then rgeos (not sure if that is important, but just to mention it).
      then I used both by
      library(maptools)
      library(rgeos)

      and it worked out fine. (tried to install gpclib under R 3.0.2 but that did not work). Obviously it suffices to have rgeos loaded.

      Best

Leave a Reply