-
Notifications
You must be signed in to change notification settings - Fork 17
/
Copy pathyale_markdown.Rmd
97 lines (75 loc) · 3.19 KB
/
yale_markdown.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
---
title: "Gross Domestic Product and Carbon Dioxide Emissions"
editor_options:
chunk_output_type: console
date: '2018-03-09'
output:
html_document:
toc: yes
toc_depth: 3
toc_float: true
editor_options:
chunk_output_type: console
params:
year: 2000
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
options(repos="http://cran.rstudio.com/")
pkgs <- c("ggplot2", "googlesheets", "dplyr", "tidyr", "readr", "plotly", "DT")
for(i in pkgs){
if(!i %in% installed.packages()){
install.packages(i)
}
}
x<-lapply(pkgs, library, character.only = TRUE)
#install.packages("lubridate")
#library(lubridate)
```
# Purpose
The purpose of this document is to serve as motivating example (i.e. R, Markdown, and Open data are cool!), but will also serve to structure the rest of this workshop in that we will see how to work with and visualize data in R, combine code and documentation with R Markdown, and introduce Git and GitHub. Additionally, the tools we will be using come from [The Tidyverse](https://tidyverse.org) which is an opinionated (but effective) way to think about organizing and analyzing data in R.
# The Example
I am new to Industrial Ecology so my example may be a bit tortured, but hopefully it is relevant enough. We are going to be using data from the [Gapminder Project](https://www.gapminder.org/data/). In particular we will look at per country Carbon Dioxide emissions and Gross Domestic Product.
## Get Data
The data we need is in two separate Gapminder datasets so let's load them.
First we can get the dataset that we have saved as a `.csv` in this repository.
```{r read_csv, message=FALSE, echo = FALSE}
gap_gdp <- read_csv("gapminder_gdp.csv")
```
Second, let's the get CO~2~ emissions data straight from the Google Sheets in which it is stored.
```{r gs_read, message=FALSE}
gap_co2_url <- "https://docs.google.com/spreadsheets/d/1qJR55SL3lHcx1d3hMHJ1dMYqAog9sXofaPWjPkpO4QQ/pub"
gap_co2 <- gs_read(gs_url(gap_co2_url))
```
## Manipulate Data
Let's tidy up these two datasets and join them together
```{r tidy}
gap_gdp_tidy <- gap_gdp %>%
gather("year","gdp",-1) %>%
select(country = `GDP (constant 2000 US$)`, everything())
gap_co2_tidy <- gap_co2 %>%
gather("year","co2_emiss", -1) %>%
select(country = `CO2 emission total`, everything())
gap_data <- gap_gdp_tidy %>%
left_join(gap_co2_tidy, by = c("country", "year")) %>%
filter(complete.cases(gdp,co2_emiss))
datatable(gap_data)
```
## Visualize Data
Next step is to visualize the data. Let's look at the association between average yearly emissions and average yearly gross domestic product since `r params$year` for the 10 countries with the highest GDP.
```{r plot_it, warning=FALSE, message=FALSE}
emiss_gdp_gg <- gap_data %>%
filter(year >= params$year) %>%
group_by(country) %>%
summarize(mean_co2 = round(mean(co2_emiss),1),
mean_gdp = round(mean(gdp),1)) %>%
ungroup() %>%
ggplot(aes(x=mean_gdp,y=mean_co2)) +
geom_point(aes(group = country),size = 2) +
geom_smooth(method = "lm") +
theme_classic() +
labs(title = paste("GDP and C02 since", params$year),
x = "Mean Gross Domestic Product",
y = "Mean Carbon Dioxide Emissions")
ggplotly(emiss_gdp_gg)
```