# Be Demonstrably Correct {#demonstrably_correct}
(ref:demonstrablycorrect-intro)
**You Must** - (ref:demonstrablycorrect-must)
**You Should** - (ref:demonstrablycorrect-should)
**You Could** - (ref:demonstrablycorrect-could)
|Related Areas: | [Version Control](#version_control) <br> [Be Reproducible](#reproducible) |
|--------------- |------------------------------------------------------------|
## Quality Assurance Applies to Code {#qa_code_too}
Just because you have written code rather than building a spreadsheet does not mean your analysis is correct.
Code is not exempt from Quality Assurance processes.
As with any other analysis, you need to record evidence that your code is:
* doing the right thing
* using valid inputs
* producing a sensible answer.
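One lightweight way to record such checks in the code itself is with assertions that fail loudly on invalid input. A minimal Python sketch (the function name and the plausible-age range are illustrative, not part of any DHSC guidance):

```python
def check_inputs(ages: list[float]) -> None:
    """Fail loudly if the input data is clearly invalid."""
    assert len(ages) > 0, "no input data"
    assert all(0 <= a <= 120 for a in ages), "age outside plausible range"

# Passes silently on sensible data; invalid data raises AssertionError
# with a message explaining which check failed.
check_inputs([34, 67, 2])
```

Checks like this do not replace a QA process, but they turn "using valid inputs" from a one-off manual review into something the code re-verifies on every run.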
You should refer to the Analytical Modelling Oversight Committee (AMOC) [Quality Assurance (QA) materials](https://dhexchange.kahootz.com/connect.ti/Testanalysts/view?objectId=13590448) which contain useful prompts and frameworks for quality assuring analysis of different size, importance and complexity.
## Testing Frameworks {#unit_tests}
Your code and analysis will grow and evolve. You won't have time to QA every version, and it can be tricky to keep track of which bits of QA have been made obsolete by new or changed code.
There are frameworks which help you construct and run tests on units of your code. These can be a good way to demonstrate that code is working correctly as you update it.
See [R at DHSC](#r_tests) and [Python at DHSC](#py_tests) for more details.
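As a minimal sketch of how such a framework works, shown here with Python's `pytest` (R's `testthat` is analogous); `per_thousand` is a hypothetical analysis function used purely for illustration:

```python
# test_rates.py -- pytest discovers files and functions named test_*;
# run with `pytest` from the project root.

def per_thousand(cases: int, population: int) -> float:
    """Rate per 1,000 population (illustrative analysis function)."""
    if population <= 0:
        raise ValueError("population must be positive")
    return 1000 * cases / population

def test_simple_case():
    assert per_thousand(5, 1000) == 5.0

def test_rejects_empty_population():
    try:
        per_thousand(5, 0)
    except ValueError:
        return
    assert False, "expected ValueError"
```

Because the tests live alongside the code and run in seconds, you can re-run them after every change, so there is no stale QA to track by hand.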
## Version Control Integration
Having unit tests and QA is good. Ideally, however, you can tie a particular result to the particular version of the code that was QA'd and tested. You _could_ do this manually, by keeping a copy of the code for each set of outputs.
Using git for [version control](#version_control) makes this process easy. You can:
* Make a commit to the repository with a note like: `output for XYZ on dd/mm/yy` so you can identify the version used to produce outputs in future.
* Use tools such as [gitpython](https://gitpython.readthedocs.io/en/stable/) or [git2r](https://docs.ropensci.org/git2r/) to include the [git commit hash](https://git-scm.com/book/en/v2/Git-Basics-Viewing-the-Commit-History), which identifies the current version of the code, in the output. The matching code can then later be retrieved from your git repository.
## Reproducible Analytical Pipelines {#rap_demonstrable}
Once you have some QA'd, version-controlled and test-covered code, the biggest source of error will be the manual steps performed by the analyst running it.
You can eliminate much of this; see
[Reproducible Analytical Pipelines](#rap).