Visualizing economic impact of global warming – Part I: Programming with R

 

What we will end up building

 

This data visualization tutorial covers data cleaning, data analysis, designing a chart and coding with R and D3.js. It is divided into three parts. Part one uses the programming language R to clean and analyze the dataset. Part two uses Adobe Illustrator to fix the R chart made at the end of part one. Part three turns it into an interactive and animated chart using the JavaScript data visualization library D3.js.

In the end you will get an interactive lollipop chart that allows switching between two datasets: the average change in GDP in countries based on a global temperature increase of 1.5 degrees Celsius and 2 degrees Celsius.

This tutorial assumes you have basic knowledge of programming.

Tools used in this tutorial series:

⦁ R and R Studio,
⦁ Adobe Illustrator (or any other SVG editing software),
⦁ Data-Driven Documents (D3.js).

Part one covers:

⦁ data cleaning in R,
⦁ data analysis in R,
⦁ creating a lollipop chart in R.

Part two covers:

⦁ improving R’s native chart in Adobe Illustrator. This design will be used as a guide for the D3 version in part three.

Part three covers:

⦁ using HTML, CSS3, JavaScript and D3.js to design and build an interactive lollipop chart based on the design made in part two.

 

_______________________________________________________________________________________

Part one: programming in R

Cleaning data, analyzing data and creating a lollipop chart in R.

 

Installing R and R Studio

To get started you will need to install the R language on your machine and install the R Studio editor.

We will use R because it allows for quick data manipulation and quick data visualization. It’s great for exploring and manipulating your data. We will also use the ggplot2 library for R. It’s fantastic for data visualization: ggplot2 can create quick graphs with relatively little code compared to D3.js.

You can learn more about the R Project and the ggplot2 data visualization library if you’re curious. I recommend it – it’s a short read.

 

Looking for a dataset

There are numerous online sources available for finding datasets. I found mine on Our World in Data. When searching for “Economic Impact” you will find two entries: Economic Impacts of 1.5°C and Economic Impacts of 2°C. Download these two datasets. You can download them by going to DATA > download CSV. After this you should have two datasets named econimpact15c.csv and econimpact2c.csv.

These datasets contain multiple columns: countries, country codes, a year, several percentile values and a median. For the sake of this tutorial we will only visualize the median value. Read up on what the median is if you need a refresher.

 

Getting started in R Studio

Let’s start using R Studio. I recommend creating a folder somewhere on your machine where you place your datasets and your R file. In R Studio, type getwd() and hit CTRL + Enter. R Studio’s console will now tell you where your current directory is. Set it to your folder using setwd(). Here’s an example of setting your folder if it’s on your desktop:

setwd('C:/Users/Kenny/desktop/LollipopChartFolder')
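
If you want to try getwd() and setwd() without touching your own folders first, here is a small sketch. It assumes a throwaway folder inside R's temporary directory; the folder name is only an example.

```r
# Hypothetical project folder inside R's temporary directory
project_dir <- file.path(tempdir(), "LollipopChartFolder")
dir.create(project_dir, showWarnings = FALSE)

# setwd() returns the previous working directory, so we can restore it later
old_dir <- setwd(project_dir)

current <- getwd()  # now points at the project folder
print(current)
list.files()        # files R can see in the current folder (none yet)

# Go back to where we started
setwd(old_dir)
```

Forward slashes work in paths on Windows too, which is why the example above uses them.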

Next we will install two libraries: tidyverse and ggthemes. The tidyverse library includes dplyr and ggplot2, which make our lives a lot easier when manipulating and visualizing data. More on this later. The ggthemes library will make our end result look a bit nicer as it allows setting pre-made themes for your graphs. Enter the following text and don’t forget to hit CTRL + Enter after each entry to install and load each library:

install.packages('tidyverse')
library(tidyverse)

install.packages('ggthemes')
library(ggthemes)
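
Running install.packages() every time you open the script downloads the libraries again. A possible alternative, sketched below, only installs what is missing. The variable names are my own, and it assumes a CRAN mirror is already configured (R Studio sets one up by default).

```r
# Libraries this tutorial needs
packages <- c("tidyverse", "ggthemes")

# TRUE for every library that can already be loaded
is_installed <- vapply(packages, requireNamespace, logical(1), quietly = TRUE)

# Keep only the ones that are not installed yet
not_installed <- packages[!is_installed]

# Install the missing ones; skipped when nothing is missing
if (length(not_installed) > 0) {
  install.packages(not_installed)
}

library(tidyverse)
library(ggthemes)
```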

Next up we will load our datasets. We will assign these two datasets to two variables. In R, a variable is assigned using <- after the variable name, followed by the value you want to store. The variables are named onepointfivedegrees and twodegrees, and they will hold our downloaded datasets. The datasets are both CSV files, so we will use read_csv to load them.

onepointfivedegrees <- read_csv("econimpact15c.csv")
twodegrees <- read_csv("econimpact2c.csv")
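
After loading, it helps to check what read_csv actually gave you. The sketch below does this on a tiny stand-in file: the countries, year and numbers are made up, and only the column layout mimics the downloaded files (with fewer percentile columns).

```r
library(tidyverse)

# Made-up rows that only mimic the column layout of the downloaded files
csv_lines <- c(
  "Entity,Code,Year,Median (%),17th percentile (%),83rd percentile (%)",
  "Exampleland,EXL,2100,-4.2,-9.1,0.3",
  "Samplestan,SMP,2100,1.7,-0.5,3.9"
)

# Write the stand-in file to a temporary location and load it
example_path <- tempfile(fileext = ".csv")
writeLines(csv_lines, example_path)
example <- read_csv(example_path)

names(example)    # column names, including the ones with spaces and symbols
glimpse(example)  # one line per column with its type and first values
```

On the real data you would call names(onepointfivedegrees) and glimpse(onepointfivedegrees) instead.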

 

Removing unnecessary information

As previously mentioned, for the sake of this tutorial only the median value will be visualized. The other percentile ratings won’t be used. We can remove the columns we don’t want using the select argument of base R’s subset() function.

But first, create a new variable called df15c. This will store the values we want. Base R’s subset() returns, you probably guessed it, a new data frame containing only the selected columns.

To remove the columns we don’t want, we put a minus sign in front of the c (see the example below). The minus sign excludes the listed columns, so we end up with the country names and their respective median values. The c after select= is another function: c() combines values into a vector. Column names that contain spaces or symbols, such as `83rd percentile (%)`, must be wrapped in backticks.

df15c <- subset(onepointfivedegrees, select=-c(Year,
  Code,
  `83rd percentile (%)`,
  `17th percentile (%)`,
  `2.5th percentile (%)`,
  `97.5th percentile (%)`))

I use the same code for the 2°C data and store it in a variable called df2c.
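
If you prefer dplyr, its select() function can drop columns in the same way. Here is a minimal sketch on a made-up data frame (the country names and numbers are invented; only the column names follow the dataset) that compares both approaches:

```r
library(tidyverse)

# Made-up values; only the column names follow the tutorial's dataset
example <- tibble(
  Entity = c("Exampleland", "Samplestan"),
  Code = c("EXL", "SMP"),
  Year = c(2100, 2100),
  `Median (%)` = c(-4.2, 1.7),
  `17th percentile (%)` = c(-9.1, -0.5)
)

# Base R: drop columns with a minus sign in front of c()
with_subset <- subset(example, select = -c(Year, Code, `17th percentile (%)`))

# dplyr: same idea, also with a minus sign in front of c()
with_select <- example %>%
  select(-c(Year, Code, `17th percentile (%)`))

# Both keep the same columns
identical(names(with_subset), names(with_select))
```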

 

Renaming column names using dplyr

I’m not happy with the current column names. dplyr has a nice function called rename() that, as you probably guessed, renames columns. Cool! rename() takes the new name first, followed by the current name. I renamed Entity to country and Median (%) to value. Simple and effective. View the example below.

But wait, what is this %>% thing? This is the pipe, an infix operator. It’s not a part of base R; it comes from the magrittr package and is loaded along with dplyr. It works like chaining methods: the pipe passes the left-hand side of the operator as the first argument of the function on the right-hand side. It’s only used for rename() now, but you can imagine its usefulness when chaining multiple dplyr functions. We will do this in the next part.

df15c <- df15c %>%
rename(country = Entity, value = `Median (%)`)

I use the same code for the 2°C data.
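
To see what the pipe does, it can help to write the same rename() call with and without it. This is a small sketch on made-up data (the countries and values are invented):

```r
library(tidyverse)

# Made-up values; only the column names follow the tutorial's dataset
example <- tibble(
  Entity = c("Exampleland", "Samplestan"),
  `Median (%)` = c(-4.2, 1.7)
)

# Without the pipe: the data frame is the first argument of rename()
renamed_plain <- rename(example, country = Entity, value = `Median (%)`)

# With the pipe: the left-hand side becomes that first argument
renamed_piped <- example %>%
  rename(country = Entity, value = `Median (%)`)

# Both calls give the same result
identical(renamed_plain, renamed_piped)

# Chaining: each step receives the result of the previous one
example %>%
  rename(country = Entity, value = `Median (%)`) %>%
  arrange(value)
```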

 

Sorting and merging data

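Next I want to grab the top 10 and bottom 10 of the most affected countries. I will assign the top 10 to one new variable and the bottom 10 to another. top_n(10) keeps the 10 rows with the highest values and top_n(-10) keeps the 10 rows with the lowest. Because no column is named, top_n() ranks by the last column, which is value. Then we combine these two data frames using the rbind() function.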

sortData15cTop10 <- df15c %>%
     top_n(10)

sortData15cBottom10 <- df15c %>%
     top_n(-10)

sortData15cMerged <- rbind(sortData15cTop10, sortData15cBottom10)

I use the same code for the 2°C data.
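
In dplyr 1.0.0 and newer, top_n() is superseded by slice_max() and slice_min(), which make you name the ranking column. The sketch below shows that alternative on 30 made-up countries; it is not the code I used above, and it swaps rbind() for dplyr's bind_rows().

```r
library(tidyverse)

# Made-up data: 30 hypothetical countries with distinct values
example <- tibble(
  country = paste("Country", 1:30),
  value = seq(-14.5, 14.5, by = 1)
)

# 10 largest values, with the ranking column named explicitly
top10 <- example %>% slice_max(value, n = 10)

# 10 smallest values
bottom10 <- example %>% slice_min(value, n = 10)

# Stack both data frames on top of each other
merged <- bind_rows(top10, bottom10)
nrow(merged)
```

Both functions keep tied rows by default, so a dataset with ties at the cut-off can return more than 10 rows.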

 

Creating a bar chart

There are tons of resources online that show how certain graphs are created. I will post the code I used below. I recommend going through chapter 3 of the online book R for Data Science, focused on data visualization.

barplot <- ggplot(sortData15cMerged, mapping = aes(x=reorder(country,value), y=value)) +
geom_bar(stat = "identity") +
coord_flip() +
theme_tufte()

barplot # hit CTRL + Enter to draw the chart

Playing around with the themes might be fun. I’ve set it to theme_tufte() because I appreciate the minimalism. Don’t forget to check out more themes.

 

Creating a lollipop chart

I’m not a big fan of showing 20 bars. The number of bars and their thickness distract from the actual goal: showing the value at the end of each bar. An alternative is using a dot plot. This brings viewers’ attention to the actual value rather than distracting them with a lot of unnecessary junk.

This brings me to the lollipop chart. A lollipop chart is simply a dot plot with very thin bars leading up to the value. It’s the same concept as a bar chart, but with less clutter. I chose this because by adding a relatively thick vertical center bar in the middle, on the zero value, viewers can easily see diverging changes. In this case, positive and negative changes in GDP per capita based on global temperature increase.

I also played around with the settings, creating my own little custom theme.

lollipop <- ggplot(sortData15cMerged, 
     mapping = aes(x=reorder(country,value), 
                y=value,
                col=ifelse(value >= 0, 'Positive', 'Negative'))) +
     geom_segment(aes(x = reorder(country,value), 
                   y = 0, 
                   xend = country, 
                   yend = value), 
                   color = "darkgrey") +
     geom_point(size=2) +
     coord_flip() +
     theme(
          panel.background = element_rect(fill = "white",
          colour = "white",
          size = 0.5, linetype = "solid"),
          panel.grid.major.y = element_line(size = 0.5, linetype = 'solid', colour = "white"), 
          panel.grid.major = element_line(size = 0.25, linetype = 'dashed', colour = "gainsboro"),
          panel.grid.minor = element_line(size = 0.25, linetype = 'dashed',
          colour = "gainsboro")
     )

lollipop # hit CTRL + Enter to draw the chart
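
The thick center bar on the zero value is not part of the code above. One way it could be drawn in R is with geom_hline(), sketched here on made-up data. The linewidth argument assumes ggplot2 3.4.0 or newer; older versions use size instead.

```r
library(tidyverse)

# Made-up countries and values, for illustration only
example <- tibble(
  country = c("Exampleland", "Samplestan", "Testonia", "Demoland"),
  value = c(-4.2, 1.7, -0.8, 3.1)
)

lollipop_with_zero_line <- ggplot(example, aes(x = reorder(country, value), y = value)) +
  # Drawn first so it sits behind the lollipops.
  # It is horizontal here and becomes vertical after coord_flip().
  geom_hline(yintercept = 0, colour = "black", linewidth = 1) +
  geom_segment(aes(xend = country, y = 0, yend = value), colour = "darkgrey") +
  geom_point(size = 2) +
  coord_flip()

lollipop_with_zero_line
```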

Exporting a PDF

Now that we’ve plotted our graph and the data is nice and tidy, we can export a PDF to the folder. PDF files can be opened in Adobe Illustrator, where we will mess around with the design. To export the chart to PDF, hit ‘export’ located above the plotted chart, and click ‘Save as PDF…’.

If you don’t have Adobe Illustrator, you can use other free SVG editing software. Or you can jump straight to the JavaScript in part three.
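
If you would rather export from code than click through the menu, ggplot2's ggsave() can write a PDF as well. This is a sketch with a made-up plot, a hypothetical file name and an arbitrary page size:

```r
library(tidyverse)

# Made-up plot, standing in for the lollipop chart
example <- tibble(country = c("Exampleland", "Samplestan"), value = c(-4.2, 1.7))
example_plot <- ggplot(example, aes(x = country, y = value)) +
  geom_point()

# Hypothetical file name; saved to a temporary folder so nothing is overwritten
pdf_path <- file.path(tempdir(), "lollipop-example.pdf")

# The file extension picks the format; width and height are in inches by default
ggsave(pdf_path, plot = example_plot, width = 8, height = 6)

file.exists(pdf_path)
```

For the chart in this tutorial you would pass plot = lollipop and a path inside your own folder.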

 

You should have something that looks like this

 

Important last step

Now that we’ve plotted our graphs and the data is nice and tidy, we can export those datasets to the folder with the R file. This will be used in part three of the tutorial. Use write.csv to export the dataset as a CSV file.

write.csv(sortData15cMerged, 'C:/path/to/folder/dataset.csv')
write.csv(sortData2cMerged, 'C:/path/to/folder/dataset2c.csv')
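
By default write.csv() also writes the row numbers as an extra, unnamed first column. If you don't want that column in the file, row.names = FALSE leaves it out. Here is a sketch with made-up data and a temporary file:

```r
# Made-up countries and values, for illustration only
example <- data.frame(
  country = c("Exampleland", "Samplestan"),
  value = c(-4.2, 1.7)
)

# Hypothetical file name inside R's temporary directory
csv_path <- file.path(tempdir(), "dataset-example.csv")

# row.names = FALSE leaves out the extra column of row numbers
write.csv(example, csv_path, row.names = FALSE)

# Read the file back to check which columns ended up in it
exported <- read.csv(csv_path)
names(exported)
```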

Download the full R file here.