# Programming primers {#programmingprimers}
**Chapter lead author: Pepa Aran**
## Learning objectives
After you've gone over the lecture and solved the exercises, you should be able to:
- Use loops, conditional statements and functions in your code
- Write clean, stylized and structured code
- Look for help
## Tutorial
### Programming basics
In this section, we will review the most basic programming elements (conditional statements, loops, functions...) for the R syntax.
#### Conditional statements
In cases where we want certain statements to be executed or not, depending on a criterion, we can use *conditional statements* `if`, `else if`, and `else`. Conditionals are an essential feature of programming and available in all languages. The R syntax for conditional statements looks like this:
```{r eval=FALSE}
if (temp < 0.0){
  is_frozen <- TRUE
}
```
The evaluation of the criterion inside the round brackets (here `(temp < 0.0)`) has to return either `TRUE` or `FALSE`. Whenever the statement between brackets is `TRUE`, the chunk of code between the subsequent curly brackets is executed. You can also write a conditional that covers all possibilities, like this:
```{r}
temp <- -0.5
if (temp < 0.0){
  is_frozen <- TRUE
} else {
  is_frozen <- FALSE
}
```
When the temperature is below 0, the first chunk of code is executed. Whenever it is greater than or equal to 0 (i.e., the condition returns `FALSE`), the second chunk of code is evaluated.
You can also write more than two conditions, covering several cases:
```{r}
is_frozen <- FALSE
just_cold <- FALSE
if (temp < 0.0){
  is_frozen <- TRUE
} else if (temp < 10){
  just_cold <- TRUE
}
```
> Note: In the code chunks above, *indentation* was used to highlight which parts go together, which makes the code easy to understand. Indentation is not evaluated by R per se (unlike in some other programming languages, e.g., Python), but it helps to make the code easier to read.
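As a quick check of which branch runs, the chain of conditions above can be tried with a value between 0 and 10. This is only a minimal sketch: the variable `temp_example` and its value are made up for illustration.

```{r}
temp_example <- 4.2   # hypothetical value, chosen to fall between 0 and 10

is_frozen <- FALSE
just_cold <- FALSE
if (temp_example < 0.0){
  is_frozen <- TRUE     # skipped: 4.2 is not below 0
} else if (temp_example < 10){
  just_cold <- TRUE     # executed: the first condition was FALSE and 4.2 < 10
}

is_frozen   # FALSE
just_cold   # TRUE
```

Note that the `else if` condition is only checked when the first condition is `FALSE`, so at most one of the two chunks is executed.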
#### Loops
*Loops* are essential for solving many common tasks. `for` and `while` loops let us repeatedly execute the same set of commands, while changing an *index* or *counter variable* to take a sequence of different values. The following example calculates the sum of elements in the vector `vec_temp` by iteratively adding them together.
```{r}
vec_temp <- seq(10) # equivalent to 1:10
temp_sum <- 0 # initialize sum
for (idx in seq(length(vec_temp))){
  temp_sum <- temp_sum + vec_temp[idx]
}
temp_sum
```
Of course, this is equivalent to just using the `sum()` function.
```{r, eval=FALSE}
sum(vec_temp)
```
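One caveat about the index sequence used in the `for` loop above: `seq(length(x))` counts down from 1 to 0 when `x` is empty, so the loop body would run twice instead of not at all. The sketch below illustrates this with a made-up empty vector `vec_empty` and shows `seq_along()` as a safer way to build the index.

```{r}
vec_empty <- numeric(0)       # hypothetical empty vector

seq(length(vec_empty))        # 1 0: two index values, although there are no elements
seq_along(vec_empty)          # integer(0): no index values

n_iter <- 0                   # count how often the loop body runs
for (idx in seq_along(vec_empty)){
  n_iter <- n_iter + 1
}
n_iter                        # 0: the loop body is never executed
```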
Instead of directly telling R how many iterations it should do, we can also define a condition and use a `while` loop. As long as the condition is `TRUE`, R will continue iterating. As soon as it is `FALSE`, R stops the loop. The following lines of code do the same operation as the `for` loop above. What is different? What is the same?
```{r, eval=FALSE}
idx <- 1         # initialize counter
temp_sum <- 0    # initialize sum
while (idx <= 10){
  temp_sum <- temp_sum + vec_temp[idx]
  idx <- idx + 1
}
}
temp_sum
```
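A `while` loop is most useful when the number of iterations is not known in advance. As a minimal sketch, the loop below keeps adding elements until the running sum exceeds a threshold. The names `threshold`, `running_sum` and `n_added`, and the threshold value, are assumptions made for this illustration.

```{r}
vec_temp <- seq(10)    # same vector as above
threshold <- 20        # hypothetical threshold
running_sum <- 0       # initialize sum
n_added <- 0           # initialize counter of added elements

# stop as soon as the sum exceeds the threshold or the vector is used up
while (running_sum <= threshold && n_added < length(vec_temp)){
  n_added <- n_added + 1
  running_sum <- running_sum + vec_temp[n_added]
}

running_sum   # 21, i.e. 1 + 2 + 3 + 4 + 5 + 6
n_added       # 6
```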
#### Functions
Often, analyses require many steps and your scripts may get excessively long. An important aspect of good programming is to avoid duplicating code. If the same sequence of multiple statements or functions is to be applied repeatedly, then it is usually advisable to bundle them into a new *function* and apply this single function to each object. This also has the advantage that if some requirement or variable name changes, it has to be edited only in one place. A further advantage of writing functions is that you can give the function an intuitively understandable name, so that your code reads like a sequence of orders given to a human.
For example, the following code, converting temperature values provided in Fahrenheit to degrees Celsius, could be turned into a function.
```{r eval=FALSE}
# not advisable
temp_soil <- (temp_soil - 32) * 5 / 9
temp_air <- (temp_air - 32) * 5 / 9
temp_leaf <- (temp_leaf - 32) * 5 / 9
```
Functions are a set of instructions encapsulated within curly brackets (`{}`) that generate a desired outcome. Functions contain four main elements:
- They start with a *name* to describe their purpose,
- then they need *arguments*, which are a list of the objects being input,
- then curly brackets, as in `function(x){ ... }`, which enclose the code making up the *body* of the function,
- and lastly, within the body, a *return* statement indicating the output of the function.
Below, we define our own function `f2c()`:
```{r eval=FALSE}
# advisable
f2c <- function(temp_f){
  temp_c <- (temp_f - 32) * 5 / 9
  return(temp_c)
}
temp_soil <- f2c(temp_soil)
temp_air <- f2c(temp_air)
temp_leaf <- f2c(temp_leaf)
```
Functions are essential for efficient programming. Functions have their own environment, which means that variables inside functions are only defined and usable within that function and are not saved to the global environment. Functions restrict the scope of the domain in which variables are defined. Information flows inside the function only through its arguments, and flows out of the function only through its returned variable.
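The scoping described above can be checked directly. In this minimal sketch, `f2c()` is defined as before and called with a made-up input (212 °F). The variable `temp_c` exists only while the function runs and is not visible afterwards.

```{r}
f2c <- function(temp_f){
  temp_c <- (temp_f - 32) * 5 / 9   # temp_c lives only inside the function
  return(temp_c)
}

temp_boiling <- f2c(212)   # hypothetical input: boiling point of water in Fahrenheit
temp_boiling               # 100

# FALSE, assuming no object called temp_c was created outside the function
exists("temp_c")
```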
Functions (particularly long ones) can be written to separate source files with the suffix `.R` and saved in your `./R` directory. Here, "written" means copying the function text, as in the code chunk above, and pasting it into a text file with the `.R` suffix. Preferably, the file has the same name as the function. We can save the previous function in a script `./R/f2c.R` and load it later by running `source("./R/f2c.R")`. It's good practice to keep one file per function, unless a function calls another function that is called nowhere else. In that case, the "subordinate" function can be placed inside the same `.R` file.
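The save-and-source workflow can be tried without touching your project. The sketch below writes the function text to a file in R's temporary directory (an assumption made only for this illustration; in a real project the file would live in `./R`) and then loads it with `source()`. The name `path_f2c` is made up for this example.

```{r}
# path to a file in the temporary directory, standing in for ./R/f2c.R
path_f2c <- file.path(tempdir(), "f2c.R")

# write the function definition to the file, one line of code per element
writeLines(
  c("f2c <- function(temp_f){",
    "  temp_c <- (temp_f - 32) * 5 / 9",
    "  return(temp_c)",
    "}"),
  path_f2c
)

source(path_f2c)   # runs the file, which defines f2c()
f2c(32)            # 0
```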
### Style your code
Nice code is clean, readable, consistent, and extensible (easily modified or adapted). Ugly code works, but is hard to work with. There is no right or wrong about coding style, but certain aspects make it easier to read and use code. Here are a few points to consider.
#### Spaces and breaks
Adding enough white spaces and line breaks in the right locations greatly helps the legibility of code. Cramming variables, operators, and brackets without spaces leaves an unintelligible sequence of characters and it will not be clear what parts go together. Therefore, consider the following points:
- Use spaces around operators (`=`, `+`, `-`, `<-`, `>`, etc.).
- Use `<-`, not `=`, for allocating a value to a variable.
- Code inside curly brackets should be *indented* (recommended: two white spaces at the beginning of each line for each indentation level - don't use tabs).
For example:
```{r eval=FALSE}
if (temp > 5.0){
  growth_temp <- growth_temp + temp
}
```
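To see the effect of spacing, the sketch below writes the same statement twice, once crammed and once with spaces and indentation. The variable names and values are made up for illustration. The last lines show that a missing space can even change what the code does.

```{r}
temp_obs <- 7.5   # hypothetical observed temperature
growth_a <- 0
growth_b <- 0

# hard to read: no spaces, no line breaks, no indentation
if(temp_obs>5.0){growth_a<-growth_a+temp_obs}

# easier to read: same logic, with spaces and indentation
if (temp_obs > 5.0){
  growth_b <- growth_b + temp_obs
}

identical(growth_a, growth_b)   # TRUE: both versions do the same

# spaces can change the meaning
x <- 2
x<-3      # assigns 3 to x
x < -3    # compares x with -3 and returns FALSE
```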
#### Variable naming
It is recommended to use concise and descriptive variable names. Different variable naming styles are in use. In this course, we use lowercase letters, and underscores (`_`) to separate words within a variable name. Avoid periods (`.`) in names, as R uses them for certain types of objects (e.g., S3 methods such as `print.data.frame`). Also, avoid naming your objects with names of common functions and variables, since your re-definition will mask the already defined objects.
For example, `df_daily` is a data frame with data at a daily resolution. Or `clean_daily` is a function that cleans daily data. Note that a verb is used as a name for a function and an underscore (`_`) is used to separate words.
It is also recommended to avoid variable names consisting of only one character. Single-letter names make it practically impossible to search for that variable.
```{r eval=F}
# Good
day_01
# Bad
DayOne
day.one
first_day_of_the_month
djm1
# Very bad
mean <- function(x) sum(x)/length(x) # mean() itself is already a function
T <- FALSE # T is an abbreviation of TRUE
c <- 10 # c() is used to create a vector (example <- c(1, 2, 3))
```
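What masking means in practice can be seen with `T`. This is a minimal sketch: after the re-definition, code that relies on `T` meaning `TRUE` silently changes its behaviour, and removing the re-definition with `rm()` makes the built-in value visible again.

```{r}
T <- FALSE   # masks the built-in abbreviation of TRUE
isTRUE(T)    # FALSE: our re-definition is found first

rm(T)        # remove our re-definition
isTRUE(T)    # TRUE: the built-in value is visible again
```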
#### Script style
Load libraries at the very beginning of a script, followed by reading data or functions from files. Functions should be defined in separate `.R` files, unless they are only a few lines long. Then, place the sequence of statements. The name of the script should be short and concise and indicate what the script does.
Use comments to describe in human-readable text what the code does. Comments are all that appears to the right of a `#` and are code parts that are not interpreted by R and not executed. Adding comments in the code greatly helps you and others to read your code, understand what it does, modify it, and resolve errors (bugs).
To visually separate parts of a script, use commented lines (e.g., `#----`). The RStudio text editor automatically recognizes `----` added to the right end of a commented line and interprets it as a block of content, which can be navigated using the document outline (**Outline** button in the top right corner of the editor panel).
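Put together, a short script following these points could look like the sketch below. It is only an illustration: `library(stats)` stands in for whatever packages a script needs, and the data frame `df_temp` is a made-up stand-in for data read from a file.

```{r}
# Load libraries ----
library(stats)   # ships with R; stands in for the packages your script needs

# Define functions ----
# short function, so it is defined here rather than in a separate .R file
f2c <- function(temp_f){
  temp_c <- (temp_f - 32) * 5 / 9
  return(temp_c)
}

# Read data ----
# hypothetical stand-in for reading data from a file
df_temp <- data.frame(temp_f = c(32, 50, 68))

# Convert units ----
df_temp$temp_c <- f2c(df_temp$temp_f)   # adds a column with 0, 10 and 20
df_temp
```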
Avoid reading entire workspace environments (e.g., `load("old_environment.RData")`), deleting environments (`rm(list=ls())`), loading hidden dependencies (e.g., `.Rprofile`), or changing the working directory (`setwd("~/other_dir/")`) as part of a script.
Note that information about the author of the script, its creation date, and modifications, etc. should not be added as commented text in the file. In Chapter \@ref(codemgmt), we will learn about the code versioning control system *git*, which keeps track of all this as meta information associated with files.
A good and comprehensive best practices guide is given by the [tidyverse style guide](https://style.tidyverse.org/).
### Where to find help {#findinghelp}
The material covered in this course will give you a solid basis for your future projects. Even more so, it provides you with code examples that you can adapt to your own purposes. Naturally, you will face problems we did not cover in the course and you will learn more as you go. Different approaches to getting help can be taken for different types of problems and questions.
#### Within R
"*I know the name of a function that might help solve the problem but I do not know how to use it.*" Typing a `?` in front of the function will open the documentation of the function, giving information about a function's purpose and method, arguments, the returned object, and examples. You have learned a few things about plots but you may not know how to make a boxplot:
```{r eval=FALSE}
?graphics::boxplot
```
Running the above code will open the information on making boxplots in R.
"*There must be a function that does task X but I do not know which one.*" Typing `??` will call the function `help.search()`. Maybe you want to save a plot as a JPEG but you do not know how:
```{r eval=FALSE}
??jpeg
```
Note that it only looks through your *installed* packages.
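The shortcuts `?` and `??` have function equivalents, which can be handy when the topic is stored in a variable or contains special characters. The sketch below repeats the two searches above in that form and adds `args()`, which prints a function's argument list without opening the full documentation.

```{r eval=FALSE}
help("boxplot", package = "graphics")   # same as ?graphics::boxplot
help.search("jpeg")                     # same as ??jpeg
args(graphics::boxplot)                 # prints the argument list of boxplot()
```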
#### Online
To search in the entire library of R go to the website [rdocumentation.org](https://www.rdocumentation.org/) or turn to a search engine of your choice. It will send you to the appropriate function documentation or a helpful forum where someone has already asked a similar question. Most of the time you will end up on [stackoverflow.com](https://stackoverflow.com/), a forum where most questions have already been answered.
#### Error messages
If you do not understand an error message, start by searching for it on the web. Be aware that this is not always useful, as many developers rely on the generic error catching provided by R, so the message may say little about the actual cause. To get a more specific answer, add the names of the function and package you are using to your search.
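If you want the exact wording of an error message for a search, you can capture it as text. The following is a minimal sketch using `tryCatch()` and `conditionMessage()`. The faulty call `log("ten")` and the variable names are made up for illustration.

```{r}
# capture the error message as a character string instead of stopping
err_msg <- tryCatch(
  log("ten"),   # hypothetical faulty call: log() needs a number, not text
  error = function(e) conditionMessage(e)
)

# combine package, function and message into a search query
search_query <- paste("R base log", err_msg)
search_query
```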
#### Worked examples
Worked examples are implementations of certain workflows that may serve as a template for your own purpose. It is often simpler to adjust existing code to fulfill your purpose than to write it from scratch. *Vignettes* are provided for many packages and serve as example workflows that demonstrate the utility of package functions. You can type ...
```{r eval=FALSE}
vignette("caret", package = "caret")
```
... to get information about how to use the {caret} package in an easily digestible format. (You will learn more about caret in Chapters \@ref(supervisedmli) and \@ref(supervisedmlii)). Several blogs serve similar purposes and are a great entry point to learn about new topics. Examples are the [Posit Blog](https://posit.co/blog/) (Posit is the company developing and maintaining RStudio and several R packages), [R-bloggers](https://www.r-bloggers.com/), [R-Ladies](https://www.rladies.org/blog/), etc.
#### Asking for help
If you cannot find a solution online, start by asking your friends and colleagues. Someone with more experience than you might be able and willing to help you. When asking for help, it is important to think about how you state the problem. The key to receiving help is to make it as easy as possible to understand the issue you are facing. Try to reduce what does not work to a simple example. Reproduce a problem with a simple data frame instead of one with thousands of rows. Generalize it in a way that people who do not do research in your field can understand the problem. If you are asking a question in an online forum, include the output of `sessionInfo()` (it provides information about the R version, the packages you're using, ...) and other information that can be helpful to understand the problem. [Stackoverflow](https://www.stackoverflow.com) has its own [guidelines](https://stackoverflow.com/help/how-to-ask) on how to ask a good question, which you should follow. Here's a [great template](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) you should use for R-specific questions. If your question is well crafted and has not been answered before, you can sometimes get an answer within 5 minutes.
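A reduced example for a question could be prepared as in the sketch below. The data frame `df_small` is made up for illustration. `dput()` prints an object as R code that others can paste into their own session, and `sessionInfo()` reports your R version and loaded packages.

```{r}
# small, made-up data frame that stands in for your real data
df_small <- data.frame(
  day = 1:3,
  temp = c(-0.5, 4.2, NA)
)

dput(df_small)   # prints R code that recreates df_small
sessionInfo()    # R version, operating system and loaded packages
```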
## Exercises
### Gauss variations {-}
Use a `for` loop to compute the sum of all natural numbers from 1 to 100. Print the result to the screen. Repeat this exercise but use a `while` loop.
Add up all numbers between 1 and 100 that are at the same time a multiple of 3 and a multiple of 7. Print the result to the screen in the form of: `The sum of multiples of 3 and 7 within 1-100 is: {your result}`.
### Nested loops {-}
Given a matrix `mymat` and a vector `myvec` (see below), implement the following algorithm:
1. Start with the first row in `mymat`.
2. Fill all missing values in the current row of `mymat` with the maximum value in `myvec`.
3. Drop the maximum value from `myvec`.
4. Proceed to the next row of `mymat` and repeat steps 2-4.
`mymat` and `myvec` are defined as:
```{r}
mymat <- matrix(c(6, 7, 3, NA, 15, 6, 7,
                  NA, 9, 12, 6, 11, NA, 3,
                  9, 4, 7, 3, 21, NA, 6,
                  rep(NA, 7)),
                nrow = 4, byrow = TRUE)
myvec <- c(8, 4, 12, 9, 15, 6)
```
### Interpolation {-}
Define a vector $\vec{v}$ of length 100 so that $v_i = 6$ for $i = 1 : 25$ and $v_i = -20$ for $i = 66 : 100$. The remaining elements are to be defined as missing (`NA`). Linearly interpolate the missing values. Plot the values of $\vec{v}$ using `plot(vec)` (assuming you named the vector `vec`).