-
Notifications
You must be signed in to change notification settings - Fork 10
/
Copy pathch7.Rmd
232 lines (170 loc) · 6.83 KB
/
ch7.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
---
title: "活用資料框"
author: "郭耀仁"
date: "`r Sys.Date()`"
output: slidy_presentation
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, results = 'hide', warning = FALSE, fig.show = 'hide')
```
## 建立一個 Data Frame 來練習
- 先建立向量,再以 `data.frame()` 函數將這些向量集合成為一個資料框
```{r}
# 角色設定的向量
name <- c("Monkey D. Luffy", "Roronoa Zoro", "Nami", "Usopp", "Vinsmoke Sanji", "Tony Tony Chopper", "Nico Robin", "Franky", "Brook")
gender <- c("Male", "Male", "Female", "Male", "Male", "Male", "Female", "Male", "Male")
occupation <- c("Captain", "Swordsman", "Navigator", "Sniper", "Cook", "Doctor", "Archaeologist", "Shipwright", "Musician")
bounty <- c(500000000, 320000000, 66000000, 200000000, 177000000, 100, 130000000, 94000000, 83000000)
age <- c(19, 21, 20, 19, 21, 17, 30, 36, 90)
birthday <- c("05-05", "11-11", "07-03", "04-01", "03-02", "12-24", "02-06", "03-09", "04-03")
height <- c(174, 181, 170, 176, 180, 90, 188, 240, 277)
# 建立草帽海賊團角色設定的資料框
straw_hat_df <- data.frame(name, gender, occupation, bounty, age, birthday, height)
```
## 建立一個 Data Frame 來練習(2)
- 使用我們提過的函數觀察這個資料框
```{r}
load(url("https://storage.googleapis.com/r_rookies/straw_hat_df.RData"))
dim(straw_hat_df)
str(straw_hat_df)
summary(straw_hat_df)
View(straw_hat_df)
```
## 建立新的變數
- 用系統日期去推算出生年份
- 新增出生年份(birth_year)這個變數到原本的資料框
```{r}
load(url("https://storage.googleapis.com/r_rookies/straw_hat_df.RData"))
this_year <- as.numeric(format(Sys.Date(), '%Y'))
birth_year <- this_year - straw_hat_df$age
straw_hat_df$birth_year <- birth_year
View(straw_hat_df)
```
## 建立新的變數(2)
- 使用 `cbind()` 函數新增一個 `favorite_food` 的欄位
```{r}
load(url("https://storage.googleapis.com/r_rookies/straw_hat_df.RData"))
favorite_food <- c("Meat", "Food matches wine", "Orange", "Fish", "Food matches black tea", "Sweets", "Food matches coffee", "Food matches coke", "Milk")
straw_hat_df <- cbind(straw_hat_df, favorite_food)
View(straw_hat_df)
```
## 建立新的變數(3)
- 善用 `ifelse()` 函數
- 針對 `age` 變數新增一個 `age_category`
```{r}
load(url("https://storage.googleapis.com/r_rookies/straw_hat_df.RData"))
straw_hat_df$age_category <- ifelse(straw_hat_df$age < 20, "Below 20",
ifelse(straw_hat_df$age > 30, "Above 30", "Between 20 and 30")
)
View(straw_hat_df)
```
## 建立新的變數(4)
- 使用 `cut()` 函數
- 針對轉換連續型為類別型變數
```{r}
load(url("https://storage.googleapis.com/r_rookies/straw_hat_df.RData"))
straw_hat_df$age_category <- cut(straw_hat_df$age, breaks = c(0, 19, 29, Inf), labels = c("Below 20", "Between 20 and 30", "Above 30"))
View(straw_hat_df)
```
## 刪除變數
- 將變數指定為 `NULL` 來進行刪除
- 刪除 `occupation`
```{r}
load(url("https://storage.googleapis.com/r_rookies/straw_hat_df.RData"))
straw_hat_df$occupation <- NULL
View(straw_hat_df)
```
## 新增觀測值
- 使用 `rbind()` 函數新增一筆 `jinbe` 的觀測值
```{r}
load(url("https://storage.googleapis.com/r_rookies/straw_hat_df.RData"))
jinbe <- c("Jinbe", "Male", "Shichibukai", 400000000, 46, "04-02", 301)
straw_hat_df <- rbind(straw_hat_df, jinbe)
```
## 刪除觀測值
- 使用中括號 `[]` 搭配索引值
- 或使用邏輯值
```{r}
load(url("https://storage.googleapis.com/r_rookies/straw_hat_df.RData"))
straw_hat_df <- straw_hat_df[-9, ]
straw_hat_df <- straw_hat_df[straw_hat_df$name != "Franky", ]
```
## 使用 `subset()` 函數
- `subset()` 函數可以一次對欄位與觀測值進行篩選
```{r}
load(url("https://storage.googleapis.com/r_rookies/straw_hat_df.RData"))
male_bounty <- subset(straw_hat_df, gender == "Male", select = c(name, gender, bounty))
View(male_bounty)
```
## 轉換格式
- 新增一個變數為生日的年月日,格式為 Date
- 先算生日的年份,跟原本的月日結合,然後轉換格式
```{r}
load(url("https://storage.googleapis.com/r_rookies/straw_hat_df.RData"))
```
## 資料轉置:寬表格變長表格
- 透過 `tidyr` 套件中的 `gather()` 函數
- 建立一個寬表格 `straw_hat_wide_df` 僅包含姓名、年齡與身高這三個欄位。
```{r}
load(url("https://storage.googleapis.com/r_rookies/straw_hat_df.RData"))
straw_hat_wide_df <- straw_hat_df[, c("name", "age", "height")]
View(straw_hat_wide_df)
```
- 使用 `tidyr::gather()` 函數把 `straw_hat_wide_df` 轉置為 `straw_hat_long_df`
```{r}
# install.packages("tidyr")
library(tidyr)
straw_hat_long_df <- gather(straw_hat_wide_df, key = category, value = values, age, height)
View(straw_hat_long_df)
```
## 資料轉置:長表格變寬表格
- 使用 `tidyr::spread()` 函數把 `straw_hat_long_df` 轉置為 `straw_hat_wide_df`
```{r}
straw_hat_wide_df <- spread(straw_hat_long_df, key = category, value = values)
```
## 為什麼要學資料轉置
- 先看我們的原始寬表格
- 如果我們想要快速地畫在一個長條圖上畫兩組數據呢?
```{r}
load(url("https://storage.googleapis.com/r_rookies/straw_hat_df.RData"))
straw_hat_wide_df <- straw_hat_df[, c("name", "age", "height")]
View(straw_hat_wide_df)
library(ggplot2)
ggplot(straw_hat_wide_df, aes(x = factor(name), y = age)) + geom_bar(stat = "identity")
ggplot(straw_hat_wide_df, aes(x = factor(name), y = height)) + geom_bar(stat = "identity")
```
- 最快的方法就是轉換成長表格
```{r}
straw_hat_long_df <- gather(straw_hat_wide_df, key = category, value = values, age, height)
ggplot(straw_hat_long_df, aes(x = factor(name), y = values, fill = category)) + geom_bar(stat = "identity", position = "dodge")
```
## 聯結資料框
- 先看看我們待會要聯結的資料框
```{r}
load(url("https://storage.googleapis.com/r_rookies/straw_hat_devil_fruit.RData"))
View(straw_hat_devil_fruit)
```
## 聯結資料框(2)
- 內部聯結
```{r}
straw_hat_df_devil_fruit <- merge(straw_hat_df, straw_hat_devil_fruit)
View(straw_hat_df_devil_fruit)
```
- 左外部聯結(all.x = TRUE)
```{r}
straw_hat_df_devil_fruit <- merge(straw_hat_df, straw_hat_devil_fruit, all.x = TRUE)
View(straw_hat_df_devil_fruit)
```
- 右外部聯結(all.y = TRUE)
```{r}
straw_hat_df_devil_fruit <- merge(straw_hat_df, straw_hat_devil_fruit, all.y = TRUE)
View(straw_hat_df_devil_fruit)
```
- 全外部聯結(all.x = TRUE, all.y = TRUE)
```{r}
straw_hat_df_devil_fruit <- merge(straw_hat_df, straw_hat_devil_fruit, all.x = TRUE, all.y = TRUE)
View(straw_hat_df_devil_fruit)
```
## 隨堂練習
- [活用資料框 - 隨堂練習](https://yaojenkuo.github.io/r_programming/ch7_exercise)
- [資料框整理技巧](https://www.datacamp.com/community/open-courses/%E8%B3%87%E6%96%99%E6%A1%86%E6%95%B4%E7%90%86%E6%8A%80%E5%B7%A7#gs.lmgdhpc)