# Functions
```{block2, type = 'rmdtip'}
In previous lessons, we've used various functions (like `cat()`, `print()`) without delving deeply into their mechanics. This lesson aims to demystify functions by guiding you through the creation and understanding of custom functions. We'll also explore the concepts of global and local variable scopes.
```
## Defining Functions
Defining a function in R follows a syntax similar to a variable assignment. We start with an identifier (e.g., `add_one`) on the left of an assignment operator (`<-`).
The function definition follows: the keyword `function`, the **parameter(s)** enclosed in parentheses (e.g., `input_value`), and the function's body within curly braces. A function returns the value of its last evaluated expression unless `return()` is called explicitly, as in this example:
```{r, echo=TRUE}
add_one <- function (input_value) {
  output_value <- input_value + 1
  return(output_value) # Explicit return
}
```
```{r, echo=TRUE}
add_one(3)
```
Call the function by writing its **identifier** followed by the required argument(s) in parentheses. In our example, `add_one(3)` returns `4`.
### Understanding the `return()` Function
In R, functions automatically return the value of the last expression evaluated. The decision to use the `return()` function explicitly or rely on implicit return is often guided by the specific context of your work, especially in fields like GIS.
**Implicit Return:**
If `return()` is not called explicitly, the function returns the value of the last expression it evaluates. This is often sufficient, especially in simpler functions. For example:
```{r, echo=TRUE}
add_one <- function (input_value) {
  input_value + 1
}
add_one(3) # Returns 4
```
**Explicit Return:**
Using `return()`, you can explicitly specify what the function should return. This approach is particularly useful when the function has complex logic, or when you need to return a value before reaching the end of the function. For example:
```{r, echo=TRUE}
add_one <- function (input_value) {
  result <- input_value + 1
  return(result)
}
add_one(3) # Also returns 4
```
In both examples, calling `add_one(3)` returns 4. The explicit use of `return()` in the second example provides clarity about the intended output of the function.
```{block2, type = 'rmdtip'}
The choice between implicit and explicit return in R functions can depend on various factors, including the complexity of the function and the coding standards of your team or organization. In GIS and similar fields, it's important to understand both approaches and adapt to the coding practices of your workplace. Always consider the context and ask, "What is the standard here?"
```
<!-- Note from 01.03 for next Review: It might be good to explain it more thoroughly, see Discussion Forum - Jens about explicit vs. implicit and my answer: https://elearn.unigis.at/mod/forum/discuss.php?d=9284 -->
**Early Return Example:**
Sometimes, it's necessary to exit a function early, for instance in error handling:
```{r, echo=TRUE}
safe_divide <- function (numerator, denominator) {
  if (denominator == 0) {
    return("Error: Division by zero")
  }
  numerator / denominator
}
safe_divide(10, 0) # Returns "Error: Division by zero"
```
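Returning a character string on failure means the caller has to inspect the result to find out whether the call succeeded. As an alternative sketch (not part of the lesson's own code), base R's `stop()` signals an error and `tryCatch()` lets the caller handle it. The name `strict_divide` and the `NA_real_` fallback are assumptions made for this illustration:

```{r, echo=TRUE}
# Hypothetical variant of safe_divide that signals an error instead of
# returning a message string
strict_divide <- function (numerator, denominator) {
  if (denominator == 0) {
    stop("Division by zero")  # Exits the function immediately with an error
  }
  numerator / denominator
}

# tryCatch() lets the caller decide what happens when the error occurs
result <- tryCatch(
  strict_divide(10, 0),
  error = function (e) {
    cat("Caught:", conditionMessage(e), "\n")
    NA_real_  # Fallback value chosen for this sketch
  }
)
result
```

Whether to return a message or signal an error is a design choice; signalling keeps the return type numeric in all successful calls.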
So, in GIS and data analysis, adapting to the specific requirements of your project and following established coding standards are key to effective and collaborative work.
## More Parameters
Functions can have multiple parameters, separated by commas.
> A function expects an argument for every parameter that has no default value. R raises an error if a required argument is missing when the function uses it, or if more arguments are supplied than there are parameters.
The `area_rectangle` function includes two parameters (`height` and `width`), calculates the area by multiplying the inputs, and returns the area as a numeric value:
```{r, echo=TRUE}
area_rectangle <- function (height, width) {
  return(height * width)
}
area_rectangle(3, 2)
```
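To illustrate the rule about argument counts, the sketch below calls a two-parameter function with too few and too many arguments and captures the resulting errors with `tryCatch()`. The names `area_rect_demo` and `try_call` are hypothetical helpers introduced only for this example:

```{r, echo=TRUE}
area_rect_demo <- function (height, width) {
  height * width
}

# Helper for this sketch: evaluate an expression and report whether it failed
try_call <- function (expr) {
  tryCatch(
    {
      expr
      "ok"
    },
    error = function (e) "error"
  )
}

try_call(area_rect_demo(3, 2))     # "ok"
try_call(area_rect_demo(3))        # "error": `width` is missing, no default
try_call(area_rect_demo(3, 2, 1))  # "error": unused argument
```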
```{block2, type = 'rmdtip'}
Default parameters can enhance a function's flexibility. For instance, you can modify the `area_rectangle` function parameters as `(height, width = 3)`. Now, with only one input, the function should return `YOUR INPUT * 3`. Specifying both parameters overrides the default `width`. Here's how it looks:
```
```{r, echo=TRUE}
area_rectangle <- function (height, width = 3) {
  return(height * width)
}
# Calling with one parameter uses the default width of 3
area_rectangle(3) # Returns 9
# Calling with two parameters uses the specified width
area_rectangle(3, 2) # Returns 6
```
This approach allows the function to be more flexible and adaptable to different use cases.
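Arguments can also be passed by name, in which case their order no longer matters. A brief sketch, repeating the definition above so that the chunk runs on its own:

```{r, echo=TRUE}
area_rectangle <- function (height, width = 3) {
  return(height * width)
}
# Only height is given, so the default width of 3 is used
area_rectangle(height = 4)             # Returns 12
# Named arguments may appear in any order
area_rectangle(width = 2, height = 5)  # Returns 10
```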
## Returning Multiple Values
To return multiple values from a function, encapsulate them in a list. For example, `rectangle_metrics` calculates and returns both the `area` and `perimeter` of a rectangle:
```{r, echo=TRUE}
rectangle_metrics <- function (height, width) {
  area <- height * width
  perimeter <- 2 * (height + width)
  return(list(area = area, perimeter = perimeter))
}
```
Retrieve these values by position with the list indices `[[1]]` and `[[2]]`, or by name with the `$` operator:
```{r, echo=TRUE}
metrics <- rectangle_metrics(3, 2)
# Using list indices
cat("Area (list index): ", metrics[[1]], "\n")
cat("Perimeter (list index): ", metrics[[2]], "\n")
# Using names
cat("Area (name): ", metrics$area, "\n")
cat("Perimeter (name): ", metrics$perimeter, "\n")
```
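Two further base R ways of working with the returned list, shown as a sketch: `names()` lists the available components, and `[["name"]]` extracts a component by a name held in a character string. The definition is repeated so that the chunk is self-contained, and `metric_name` is a name chosen for this example:

```{r, echo=TRUE}
rectangle_metrics <- function (height, width) {
  area <- height * width
  perimeter <- 2 * (height + width)
  return(list(area = area, perimeter = perimeter))
}
metrics <- rectangle_metrics(3, 2)
names(metrics)          # "area" "perimeter"
metrics[["perimeter"]]  # 10
# Loop over all components without hard-coding their names
for (metric_name in names(metrics)) {
  cat(metric_name, "=", metrics[[metric_name]], "\n")
}
```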
```{block2, type = 'rmdtip'}
Once defined, a function appears in the Environment pane of RStudio, just like a variable. When the function is called, the R interpreter executes the definition stored in memory.
```
## Functions and Control Structures
Functions can contain loops and conditional statements. Here's a function that uses a loop to calculate the factorial of a number (e.g., factorial of 3 = 1 * 2 * 3 = 6):
```{r, echo=TRUE}
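# Note: this definition masks base R's built-in factorial()
factorial <- function (input_value) {
  result <- 1
  # seq_len() yields an empty sequence for 0, so factorial(0) returns 1
  for (i in seq_len(input_value)) {
    cat("Current:", result, " | i:", i, "\n")
    result <- result * i
  }
  return(result)
}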
factorial(3)
```
The function takes a single numeric value as input, initializes a variable named `result` to `1`, and then loops over the integers from 1 to `input_value`. In each iteration, the current value of `result` is multiplied by the value of the iterator `i`, and the product is stored back in `result`.
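Conditional statements work inside functions in the same way. As a sketch with hypothetical names (`count_above`, `values`, `threshold`), the following function loops over a vector and uses `if` to count the elements greater than a threshold:

```{r, echo=TRUE}
count_above <- function (values, threshold) {
  count <- 0
  for (value in values) {
    # Only values strictly greater than the threshold are counted
    if (value > threshold) {
      count <- count + 1
    }
  }
  return(count)
}
count_above(c(120, 850, 430, 990), 500)  # Returns 2
```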
```{block2, type = 'rmdtip'}
Though technically possible, defining a function within loops or conditional statements is generally avoided.
```
## Scope
Function parameters are internal variables acting as a bridge between the function and its environment. They are 'visible' only within the function's scope.
> The scope of a variable refers to the code region where the variable is accessible.
Understanding scope is crucial for several reasons:
- **Avoiding Variable Conflicts:** Variables with the same name can exist in different scopes without conflicting with each other. This prevents unexpected behavior caused by variable name overlaps.
- **Predictable Behavior:** Knowing whether a variable is global or local helps predict its behavior and influence within your script. It clarifies where and how variables can be accessed or modified.
- **Resource Management:** Local variables are often cleared from memory once the function execution is complete, aiding in efficient resource management in larger scripts or applications.
In R, the scope of variables is defined as follows:
- **Global Variables:** Defined outside of any function and accessible anywhere in the script, including within functions.
- **Local Variables:** Defined within a function and accessible only in that function's scope. This includes function parameters, which act as internal variables bridging the function and its environment.
In the following example, `x_value` is a global variable, while `new_value` and `input_value` are local to the `times_x` function. Accessing `new_value` or `input_value` outside `times_x` would result in an error, but `x_value` can be referred to within the function:
```{r, echo=TRUE}
x_value <- 10
times_x <- function (input_value) {
  new_value <- input_value * x_value
  return(new_value)
}
times_x(2)
```
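A short sketch of these rules, using hypothetical names (`offset`, `shift_value`, `local_result`): an assignment inside a function creates a local variable rather than changing a global of the same name, and `exists()` confirms that a local variable is not visible after the call.

```{r, echo=TRUE}
offset <- 100  # Global variable

shift_value <- function (input_value) {
  offset <- 1  # Local variable; masks the global `offset` inside the function
  local_result <- input_value + offset
  return(local_result)
}

shift_value(5)          # Returns 6, using the local offset
offset                  # Still 100: the global was not modified
exists("local_result")  # FALSE: the local variable is not visible here
```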
Understanding these distinctions is essential for writing robust and error-free R code, especially in more complex data analysis or GIS projects.
```{block2, type = 'rmdexercise'}
Referring to external global variables in a function is possible but can be dangerous. At the time of execution, one cannot be sure what the value of the global variable is. For instance, code elsewhere in the script might have changed its value, which affects the behavior of the function. To fix this problem, define `x_value` as a parameter of the function `times_x` with a default value.
<details closed>
<summary><ins>**See solution!**</ins></summary>
<p><i><font color="grey">
times_x <- function (input_value, x_value = 10) {
return(input_value * x_value)
}
</font></i>
</p>
</details>
```
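The risk described in the exercise can be made concrete. In this sketch (hypothetical names `scale_factor` and `scale_by_global`), the same call returns different results once the global variable has changed:

```{r, echo=TRUE}
scale_factor <- 10
scale_by_global <- function (input_value) {
  input_value * scale_factor  # The global is looked up when the function runs
}
first_result <- scale_by_global(2)   # 20
scale_factor <- 0.5                  # Changed elsewhere in the script
second_result <- scale_by_global(2)  # 1
c(first_result, second_result)
```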
```{block2, type = 'rmdtip'}
These lessons have introduced fundamental R programming concepts. The [Base R Cheatsheet](https://github.com/rstudio/cheatsheets/blob/main/base-r.pdf){target="_blank"} offers a concise summary of key operations.
```