-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathcode_reviews.Rmd
101 lines (51 loc) · 7.21 KB
/
code_reviews.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
---
title: "Code reviews"
---
# Reproducible analytical pipelines (RAP)
As part of RAP development all code should be peer reviewed by another member of the Official Statistics team. There are a wide variety of skills and strengths within the team which may mean that a RAP project may not be able to be fully reviewed by a single team member, in these instances a review should take place under the guidance of a senior person and the opportunity be used to share knowledge.
This guidance is not intended to provide an in-depth overview of Git, GitLab, or version control software, but give a list of steps that should be followed during RAP development and improvement to quality assure our code. By following these practices we ensure that we are building in quality from the very initial stages of our statistics production, while also helping to build skills and share knowledge within the team.
For useful Git resources please see:
* [Happy Git for the useR](https://happygitwithr.com/)
* [intro to git with RStudio](https://r-bio.github.io/intro-git-rstudio/)
* [GitLab documentation](https://docs.gitlab.com/)
# Git and GitLab
All Official Statistics code projects should be stored and managed using version control software such as Git. The current NHSBSA platform of choice is GitLab and can be accessed on the AON VPN at https://gitlab.cloud.nhsbsa.nhs.uk/.
## Accounts
A GitLab account should automatically be created for you on joining the team, however if this has not been actioned a service request must be raised FAO of Platform Services. In your request ask to be made a `maintainer` of the `Insight > Statistics` group.
On first accessing the [NHSBSA private GitLab instance](https://gitlab.cloud.nhsbsa.nhs.uk/) you might be greeted with a warning from your browser that the connection is not secure. Ignore this warning by navigating to the advanced settings link and proceeding to website.
## Cloning a repository
Most Official Statistics code repositories will be mainly `R` based with `SQL` used to extract data from the Data Warehouse, therefore we can use the built in `Git` tools in R Studio to manage our projects. Currently, SSH keys cannot be used to clone from the NHSBSA GitLab, therefore we have to use HTTPS. Before we do anything in R Studio we have to change a config setting for `Git`. Open a command prompt (you can do this quickly by bringing up the windows menu by pressing the windows key, and then searching for 'cmd'), and enter the following line of code:
`git config --global http.sslVerify false`
This bypasses the SSL certificate warning that you may have seen by accessing the GitLab instance in your browser. You can check if this has been successful by entering `git config --list`.
You can now clone a repository by creating a 'new project' from the project selection drop down in the top right hand corner of R Studio. This will then open a dialogue box that will allow you to create a project from version control > Git and will ask you for the repository URL. This can be copied from the GitLab website. However it will be provided in the format:
https://dps-gitlab.service.nhsbsa/insight/statistics/prescription-cost-analysis-rap.git
You will notice that the domain name is **dps-gitlab.service.nhsbsa**. This portion of the URL will need changing to **gitlab.cloud.nhsbsa.nhs.uk** in order to be able to clone.
You can still also clone repositories using the Git command line or other software such as GitHub Desktop.
## Branches and merge requests
When working collaboratively on a project all code additions, changes, and improvements should be handled using feature branches and merge requests. For example, if a new function is being added to an existing RAP a new branch should be created where code is created and tested before a merge request is submitted to trigger a code review by another member of the team.
At the moment there are no tools to handle merge requests within R Studio, so these are all managed on the GitLab website. However, any code can still be viewed in the project by switching between the appropriate branches.
Merge requests should be used trigger code reviews by other members of the team, and no branch should be merged without approval from at least one other person. Any member of the team can perform a code review so long as they feel comfortable doing so. Code reviews should be performed collaboratively if there are some advanced techniques used in the code and approval sought from all involved.
# Code reviews
Code reviews are intended to improve code quality, catch bugs, make sure code is well documented, and provides opportunities to share skills and knowledge in the team.
There are lots of opinions in the wild about code reviews, there are no hard and fast rules for what a review should look like or a checklist to go through to make a good one. Therefore it is essential that they are used to be constructive and supportive, reviewers are there to critique the code, not the coder (**GOOD:** "good work", "perhaps you could try x", "what is the reasoning behind this choice?" **BAD:** "this doesn't work", "you should have done y", "poor choice, do what I would do instead").
Some resources that first time reviewers might find handy (although they're all geared towards software engineering rather than analytical code).
* [Google's guide to code review](https://google.github.io/eng-practices/review/reviewer/looking-for.html)
* [In-depth checklist and tips](https://www.codeproject.com/Articles/524235/Codeplusreviewplusguidelines)
* [Quick tips](https://github.com/thoughtbot/guides/tree/main/code-review)
* [Why review code?](https://smartbear.com/learn/code-review/why-review-code/)
## What to review
When developing a RAP, all code -- SQL and R -- should be subject to a review. If the RAP is too large to be handled in one review then it should be broken down into related chunks and assigned to multiple reviewers.
If making improvements to an existing RAP then only code that changes the behaviour of the RAP, or changes it's outputs should be reviewed. For example, making amendments to improve performance of code doesn't need to be reviewed. However, if changing a chart library or making changes to the data extracted from the data warehouse then a full review should be triggered.
## When to review
Reviews should only be triggered after all work is complete on a project or feature of the RAP that is being improved, and only after all automated checks (unit tests, style checks, etc.) have been completed and passed. The code should have been formally and informally tested by the developing statistician before it comes to a reviewer. That is, it will run without issue in the manner as intended. Don't submit for review if you don't know if the join you're using is faulty or not.
## Commits
As a developer, you should get into the habit of using commits little and often. Your commit messages can help a reviewer follow your logic for doing something or even how something works. They should always be written in the imperative as well -- fix bug, not fixed bug. e.g.
```
create visualisation total items
use NHSBSA highcharts functions to create line chart for total items
```
```
fix data manipulation bug
use group_map2() within mutate() in order to calculate rates
within groups correctly
```