forked from r4bds/r4bds.github.io
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathlab08.qmd
444 lines (336 loc) · 21.6 KB
/
lab08.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
# Lab 8: Creating a Simple R-package {.unnumbered}
## Package(s)
- [devtools](https://github.com/r-lib/devtools)
- [usethis](https://devtools.r-lib.org)
- [roxygen2](https://roxygen2.r-lib.org)
- [testthat](https://testthat.r-lib.org)
- [gert](https://rdrr.io/cran/gert/)
## Schedule
- 08.00 - 08.15: [Recap of Lab 7](https://raw.githack.com/r4bds/r4bds.github.io/main/recap_lab08.html)
- 08.15 - 08.45: [Lecture](https://raw.githack.com/r4bds/r4bds.github.io/main/lecture_lab08.html)
- 08.45 - 09.00: Break
- 09.00 - 12.00: [Exercises](#sec-exercises)
## Learning Materials
_Please prepare the following materials:_
- [Primer on R Packages](primer_on_r_packages.qmd)
- Cheatsheet: [Package development with devtools](https://raw.githubusercontent.com/rstudio/cheatsheets/main/package-development.pdf) -- When in doubt, check here first
- Web: [R Packages: A Beginner's Tutorial](https://www.datacamp.com/community/tutorials/r-packages-guide) -- Read this before class
- Web: [Developing Packages with RStudio](https://support.rstudio.com/hc/en-us/articles/200486488-Developing-Packages-with-RStudio) -- Creating an R Package with RStudio
- Web: [R package primer a minimal tutorial](https://kbroman.org/pkg_primer/) -- Brief but comprehensive R Package tutorial
- Book: [R Packages - 1 Introduction](https://r-pkgs.org/introduction.html) -- Everything you need to know about R Packages
_Unless explicitly stated, do not do the per-chapter exercises in the R4DS2e book_
## Learning Objectives
_A student who has met the objectives of the session will be able to:_
- Prepare a simple R package for distributing documented functions
- Explain the terms Repository, Dependencies, and Namespace
- Implement testing in an R package
- Collaboratively work on an R package on GitHub
## Exercises {#sec-exercises}
_Read the steps of this exercises carefully while completing them_
### Introduction
The aim of these exercises is to set up a collaborative coding project using GitHub and collaborate on creating a simple R package that replicates the central dogma of molecular biology.
The exercises will build upon what you learned in [Lab 7](lab07.qmd) and its [exercises](lab07.qmd#sec-exercises).
But don't worry too much, the setup steps will be given here as well.
----
### TASKS
If you haven't read the [Primer on R packages](primer_on_r_packages.qmd) yet, consider reading it now.
### Task 1 - Setting up the R package
In the first task, **one team member** (you decide who) will initiate a package project and push it to Github. Then, when instructed, the **remaining group members** will connect to that repository, and the real package building will begin.
#### Task 1.1 - Create R Package
1. Go to [R for Bio Data Science RStudio Server](https://teaching.healthtech.dtu.dk/22100/rstudio.php) and login.
2. Click `Create a project`![](images/L08_create.png){width="5%"} in the top left and choose `New Directory`.
3. Select `R Package` and pick a fitting Package name.
- The package you will create will replicate the central dogma of molecular biology. Let that inspire you
- Look up naming rules [here](https://r-pkgs.org/workflow101.html#name-your-package)
- The most important rule: `_`, `-`, and `'` are not allowed.
4. Tick the "Create git repository" box.
5. Click "Create project".
RStudio will now create an R project for you that contains all necessary files and folders.
6. Open the `DESCRIPTION` file and write a title for your package, a small description, and add authors.
- When adding authors, the format is:
```{r}
#| eval: false
Authors@R:
c(person(given = "firstname",
family = "lastname",
role = c("aut", "cre"), # There must be a "cre", but there can only be one
email = "[email protected]"),
person(given = "firstname",
family = "lastname",
role = "aut",
email = "[email protected]"))
```
7. Create an MIT license (from the console) `usethis::use_mit_license("Group name")`
8. _If you didn't tick the "Create git repository"_: Run `usethis::use_git()`. _If the "Git" tab is not in the lower left panel_: reopen the R project.
9. <details>
<summary>**Optional** setup steps (generally good, but not essential for this course)</summary>
<ul>
<li>Add a Readme file `usethis::use_readme_rmd( open = FALSE )`. You will do that in the Group Assignment later </li>
<li>Add a lifecycle badge: `usethis::use_lifecycle_badge( "Experimental" )`</li>
<li>Write vignettes: `usethis::use_vignette("Vignette name")`</li>
</ul>
</details>
#### Task 1.2 - Setup GitHub Repository for your group's R package
During the previous class, you have been using a combination of console and terminal to set up GitHub. This week, you will solely be using the console.
**Still only the first team member**
1. Go to [https://github.com/rforbiodatascience24](https://github.com/rforbiodatascience24)
2. Click the green `New` button
3. Create a new repository called `group_X_package`. Remember to replace the **X**
4. Select `rforbiodatascience24` as owner
5. Make the repository `Public`
6. Click the green `Create repository` button
7. In this new repository, click the `settings` tab
8. Click `Collaborators and Team`
9. Click the green `Add people` button
10. Invite the remaining group members
11. All other team members should now have access to the repository, **but do not create your own project yet.**
#### Task 1.3 - Connect your RStudio Server Project to the GitHub Repository
**Still only the first team member**
1. Find your `PAT` key or create a new
<details> <summary>How to create a new one</summary>
<ul>
<li>Type in the console `usethis::create_github_token()`. You’re going to be redirected to GitHub website.
<li>In case you did that step manually, remember to give permission to access your repositories.
<li>You need to type your password. Don’t change the default settings of creating the token except for the description – set ‘RStudio Server’ or something similar. Then hit Generate token.
<li>Copy the generated token (should start with ghp_) and store it securely (e.g., in a password manager). Do not keep it in a plain file.
</ul></details>
2. Go back to the RStudio Server project
3. Type in the console `gitcreds::gitcreds_set()` and paste the `PAT` key
4. Stage all files (in the `git` window in the top right) by ticking all boxes under `Staged`
5. Still in the console, run the following commands. Replace your group number and use your GitHub username and email. You run these to link your project to the GitHub repository you created. Remember to replace the **X**.
```{r}
#| eval: false
usethis::use_git_remote(name = "origin", "https://github.com/rforbiodatascience24/group_X_package.git", overwrite = TRUE) # Remember to replace the X
usethis::use_git_config(user.name = "USERNAME", user.email = "[email protected]")
gert::git_commit_all("Package setup")
gert::git_push("origin")
```
**All other team members**
_After your teammate has pushed to GitHub._
1. In the RStudio Server, create a new project based on the GitHub Repository just created by your teammate.
- Click `Create a project`![](images/L08_create.png){width="5%"} in the top left.
- Choose `Version Control` and then `Git`.
- Paste in the repository URL: `https://github.com/rforbiodatascience24/group_X_package.git`. Remember to replace the **X**.
- Give the directory a fitting name and click `Create Project`.
2. Find your `PAT` key or create a new
<details> <summary>How to create a new one</summary>
<ul>
<li>Type in the console `usethis::create_github_token()`. You’re going to be redirected to GitHub website.
<li> In case you did it manually, remember to give permission to access your repositories.
<li>You need to type your password. Don’t change the default settings of creating the token except for the description – set ‘RStudio Server’ or something similar. Then hit Generate token.
<li>Copy the generated token (should start with ghp_) and store it securely (e.g. in password manager). Do not keep it in a plain file.
</ul></details>
3. Go back to the RStudio Server project
4. Type in the console `gitcreds::gitcreds_set()` and paste the `PAT` key
5. Still in the console, run the following commands. Replace your GitHub username and email.
```{r}
#| eval: false
usethis::use_git_config(user.name = "USERNAME", user.email = "[email protected]")
```
**If RStudio at any point asks you to log in to GitHub, redo step 4 and 5.**
Now you are ready to work on the R package.
### Task 2 - Build the package
----
Each team member should build and implement their own function. The code will be given, but you will be asked to come up with a name for your function and many of the variables, so discuss in the team what naming convention you want to use. Do you want to use `snake_case` (common in tidyverse) or `camelCase` (common in Bioconductor)? It doesn't really matter, but be consistent.
</br>
#### Task 2.1 - Incorporate data
In this task you will include the following codon table in your package.
**This task should be done by one team member**.
The rest should follow along.
1. Below is a standard codon table. Store it in an object with a name of your own choosing.
```{r}
#| eval: false
c("UUU" = "F", "UCU" = "S", "UAU" = "Y", "UGU" = "C",
"UUC" = "F", "UCC" = "S", "UAC" = "Y", "UGC" = "C",
"UUA" = "L", "UCA" = "S", "UAA" = "_", "UGA" = "_",
"UUG" = "L", "UCG" = "S", "UAG" = "_", "UGG" = "W",
"CUU" = "L", "CCU" = "P", "CAU" = "H", "CGU" = "R",
"CUC" = "L", "CCC" = "P", "CAC" = "H", "CGC" = "R",
"CUA" = "L", "CCA" = "P", "CAA" = "Q", "CGA" = "R",
"CUG" = "L", "CCG" = "P", "CAG" = "Q", "CGG" = "R",
"AUU" = "I", "ACU" = "T", "AAU" = "N", "AGU" = "S",
"AUC" = "I", "ACC" = "T", "AAC" = "N", "AGC" = "S",
"AUA" = "I", "ACA" = "T", "AAA" = "K", "AGA" = "R",
"AUG" = "M", "ACG" = "T", "AAG" = "K", "AGG" = "R",
"GUU" = "V", "GCU" = "A", "GAU" = "D", "GGU" = "G",
"GUC" = "V", "GCC" = "A", "GAC" = "D", "GGC" = "G",
"GUA" = "V", "GCA" = "A", "GAA" = "E", "GGA" = "G",
"GUG" = "V", "GCG" = "A", "GAG" = "E", "GGG" = "G")
```
2. Run `usethis::use_data(NAME_OF_YOUR_OBJECT, overwrite = TRUE, internal = TRUE)`
3. You have now made the data available to our functions, but we also want to make it visible for our users.
- Run `usethis::use_data(NAME_OF_YOUR_OBJECT, overwrite = TRUE)`.
4. Write a data manual (document the data).
- All non-internal data should be documented in a `data.R` file in the `R` folder. Create it with `usethis::use_r("data")`
- Add the following scaffold to `R/data.R` and write a *very* brief description of the data (see an example [here](https://r-pkgs.org/data.html#documenting-data)). Don't spend a lot of time here.
```{r}
#| eval: false
#' Title
#'
#' Description
#'
#'
#' @source \url{https://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi?chapter=tgencodes#SG1}
"NAME_OF_YOUR_OBJECT"
```
Normally, you should also describe how the raw data was cleaned. You would do that in the file that opens after running `usethis::use_data_raw( name = "NAME_OF_YOUR_OBJECT", open = TRUE )`, but that is less relevant here, so we will skip that part.
Your package now includes some data.
Restart R, clean your Environment (small broom in the Environment tab), run the three lines of code from [The Package Workflow](primer_on_r_packages.qmd#pkgwf) section:
```{r}
#| eval: false
rstudioapi::documentSaveAll() # Saves all you files
devtools::document() # Writes all your manuals for you
devtools::load_all() # Simulates library("your package"), allowing you to use your functions
```
And run `?NAME_OF_YOUR_OBJECT`.
**OBS** For some unknown reason, this does not work on the current R version (4.3.1). Instead, you can click the "Build" tab next to the "Git" tab and then "install". After installing and attaching,
your manual should pop up. Try printing the object as well to see what it looks like `print(NAME_OF_YOUR_OBJECT)`.
Push your changes to GitHub.
**The other team members** pull the changes to have the data available to you as well.
</br>
#### Task 2.2 - Implement functions
In this task you will be working individually to implement a function each.
If you are fewer in the team than the number of functions. The quickest to finish can implement the remaining, or separate them out as you see fit.
If you want an additional challenge, each team member can create their own Git branch `gert::git_branch_checkout("NAME_OF_BRANCH")` and work there. When done, merge your branch with the main branch.
That is a more clean and 'proper' workflow, but completely optional.
If you are in doubt what the underlying functions do, run `?function_name` to get a hint about their purpose. **OBS** For some unknown reason, this does not work on the current R version (4.3.1).
As mentioned with the data, you can install your package, and then load the manual.
Also, remember the functions are replicating the central dogma. Let that inspire you, when naming the functions and variables. If you get stuck, ask your teammates about their functions.
Function five is a bit more involved. Do it together or help your teammate out if it causes problems.
For each function (found below), complete the following steps:
1. Look carefully at your function. Choose a fitting name for it
2. Run `usethis::use_r("function_name")` to create an `.R` file for the function
3. Paste in the function and rename all places it says `name_me` with fitting names
4. Click somewhere in the function. In the toolbar at the very top of the page, choose `Code` and click `Insert Roxygen Skeleton`
5. Fill out the function documentation
- Give the function a title
- Do not write anything after `@export`
- The parameters should have the format: `@param param_name description of parameter`
- The return has the format: `@return description of function output`
- Examples can span multiple lines and what you write will be run, so make it runnable.
- **Important!** Either fill out everything or delete what you don't. Otherwise, the package check will fail (Do **not** delete `@export` for these functions).
6. Run the three lines of codes from [The Package Workflow](primer_on_r_packages.qmd#pkgwf) section
- If at this point, you get a warning that `NAMESPACE` already exists, delete it and try again.
```{r}
#| eval: false
rstudioapi::documentSaveAll() # Saves all you files
devtools::document() # Writes all your manuals for you
devtools::load_all() # Simulates library("your package"), allowing you to use your functions
```
7. View your function documentation with `?function_name`
8. Defining a test or a series of tests around your newly created function ensures future corrections will not yield undesired results. Create such a test for your function. Run `usethis::use_test("function_name")` and write a test. Draw some inspiration from [here](https://r-pkgs.org/tests.html#test-structure) or from running `?testthat::expect_equal`.
- **Skip this step for function five**. Instead, write inline code comments for each chunk of code. Press `Cmd+Shift+c` / `Ctrl+Shift+C` to comment your currently selected line.
9. Rerun the three lines from The Package Workflow section and check that the package works with `devtools::check()`
- <details> <summary> Briefly about check </summary>
`devtools::check()` will check every aspect of your package and run the tests you created.
You will likely get some warnings or notes like 'No visible binding for global variables'.
They are often harmless, but, if you like, you can get rid of them as described [here](https://www.r-bloggers.com/2019/08/no-visible-binding-for-global-variable/).
The check will tell you what might cause problems with your package and often also how to fix it.
If there are any errors, fix those. Warnings and notes are also good to address. Feel free to do that if any pops up.
</details>
10. If it succeeds without errors, push your changes to GitHub
- If you decided to create branches review [last week's exercises](lab07.qm#your-first-branch-merging) for guidance
- Always pull before pushing
- Use the RStudio GUI if you prefer
- Remember to pull first
- If you chose to create your own branch, [merge it with master/main](https://www.w3schools.com/git/git_branch_merge.asp).
<details> <summary>Function one</summary>
```{r}
#| eval: false
name_me1 <- function(name_me2){
name_me3 <- sample(c("A", "T", "G", "C"), size = name_me2, replace = TRUE)
name_me4 <- paste0(name_me3, collapse = "")
return(name_me4)
}
```
</details>
<details> <summary>Function two</summary>
```{r}
#| eval: false
name_me1 <- function(name_me2){
name_me3 <- gsub("T", "U", name_me2)
return(name_me3)
}
```
</details>
<details> <summary>Function three</summary>
```{r}
#| eval: false
name_me1 <- function(name_me2, start = 1){
name_me3 <- nchar(name_me2)
codons <- substring(name_me2,
first = seq(from = start, to = name_me3-3+1, by = 3),
last = seq(from = 3+start-1, to = name_me3, by = 3))
return(codons)
}
```
</details>
<details> <summary>Function four</summary>
`NAME_OF_YOUR_OBJECT` refers to the codon table you stored in Task 2.
```{r}
#| eval: false
name_me <- function(codons){
name_me2 <- paste0(NAME_OF_YOUR_OBJECT[codons], collapse = "")
return(name_me2)
}
```
</details>
<details> <summary>Function five</summary>
This function will be the first to use dependencies.
As a reminder, a dependency is a package that your package depends on.
In this case, it will be `stringr` and `ggplot2`. They are already installed, so you don't need to do that.
The best way to add these packages to your own is to first add them to your package dependencies with `usethis::use_package("package_name")`.
If you care about reproducibility, you can add `min_version = TRUE` to the function call to specify a required minimum package version of the dependency.
Run `usethis::use_package` for both dependencies.
For the `ggplot2` functions, we will use `ggplot2::function_name` everywhere a `ggplot2` function is used. Also add `@import ggplot2` to the function description (this is often done with `ggplot2` because it has a lot of useful plotting functions with only rare name overlaps).
If you want to be more precise, add `@importFrom ggplot2 ggplot aes geom_col theme_bw theme` instead.
The same applies for `stringr`, but since we only use a few functions, add `@importFrom stringr str_split boundary str_count` to the function description.
```{r}
#| eval: false
name_me1 <- function(name_me2){
name_me3 <- name_me2 |>
stringr::str_split(pattern = stringr::boundary("character"), simplify = TRUE) |>
as.character() |>
unique()
counts <- sapply(name_me3, function(amino_acid) stringr::str_count(string = name_me2, pattern = amino_acid)) |>
as.data.frame()
colnames(counts) <- c("Counts")
counts[["Name_me2"]] <- rownames(counts)
name_me4 <- counts |>
ggplot2::ggplot(ggplot2::aes(x = Name_me2, y = Counts, fill = Name_me2)) +
ggplot2::geom_col() +
ggplot2::theme_bw() +
ggplot2::theme(legend.position = "none")
return(name_me4)
}
```
</details>
</br>
### Task 3 - Group discussion
1. Describe each function to each other in order - both what it does and which names you gave them and their variables.
2. The person(s) responsible for function five, describe how you added the two packages as dependencies.
3. Discuss why it is a good idea to limit the number of dependencies your package has. When can't it always be avoided?
4. Discuss the difference between adding an `@importFrom package function` tag to a function description compared to using `package::function()`. Read [this section](https://r-pkgs.org/namespace.html#imports) if you are not sure or just want to learn more.
## <span style="color: red;">GROUP ASSIGNMENT</span>
----
For this week's group assignment, write a vignette (user guide) for your package (**max** 2 pages). The vignette should include a brief description of what the package is about and demonstrate how each function in the package is used (individually and in conjunction with each other).
As a final section, discuss use cases for the package and what other functions could be included.
Also include the main points from your discussion in Task 3.
Include a link to your group's GitHub repository at the top of the vignette. Hand it in as a pdf in DTU Learn.
1. Create a vignette with `usethis::use_vignette("your_package_name")`.
2. When you are done writing it, run `devtools::build_vignettes()`.
3. At the top line of your vignette, change `rmarkdown::html_vignette` to `rmarkdown::pdf_document`
4. Rerun `devtools::build_vignettes()` - the created pdf-file in the `doc` folder is the document to hand it.
5. Lastly, duplicate the vignette as the GitHub README
- Create a README with `usethis::use_readme_rmd( open = TRUE )`.
- Copy the content of the vignette Rmarkdown into the readme **Keep the top part of the README as is**. If you overwrote it anyway, change `rmarkdown::pdf_document` to `rmarkdown::github_document`.
- Run `devtools::build_readme()`
- Push the changes to GitHub.
### Last tip on packages
----
Next week, I will introduce the `golem` package for building [production-grade Shiny apps](https://engineering-shiny.org/index.html).
However, I personally also use it to quickly get going with packages.
When starting out with an R package, it may seem complicated with a lot of things to remember.
`golem` remembers these things for you.
When setting up your package / Shiny app with `golem::create_golem("Name of your awesome package/app")`, it creates a `dev` folder with a few files listing all you need to get started with an R package.
It does also give you a lot of other things that you will rarely use, and it also sets up some basic structures for Shiny apps. These can simply be deleted if you are not also building a Shiny app (which you will next week).