Computers are powerful tools for researchers in the life sciences; however, scientists are not usually trained in software development. To produce computational research that is reproducible, scientifically sound, and clear, scientists should follow established best practices.
The following standards draw heavily on these three articles:
- Best Practices for Scientific Computing [DOI: 10.1371/journal.pbio.1001745]
- Ten Simple Rules for Reproducible Computational Research [DOI: 10.1371/journal.pcbi.1003285]
- Ten Simple Rules for the Open Development of Scientific Software [DOI: 10.1371/journal.pcbi.1002802]
- Write programs for people, not computers. Always emphasize clarity and readability
- Use informative variable names; avoid names of existing functions (e.g. print) and keywords (e.g. import)
- Variables should be nouns, whereas functions should be verbs
- Be consistent with your indentation. Use spaces or tabs, but not both
- Use informative comments that explain what you are doing, but do not "state the obvious"
- Separate names with underscores instead of dots (variable_1 not variable.1). One can also use CamelCase
- Avoid writing long sections of code; separate the code into multiple blocks (each doing a specific task)
- Use meaningful file names and the correct extensions (.R not .r)
- Surround binary operators (=, ==, >, <, etc.) with whitespace (x = y not x=y)
- Always use a space after a comma, but never before one (colour = (green, red, blue) not colour = (green,red,blue) or colour = (green , red , blue))
- Do not "reinvent the wheel"
- Check if others have come up with a verified solution to your problem before writing your own scripts
- Use standard, well-known extensions/packages instead of obscure ones; standard packages are more likely to be tested and maintained
- Make incremental changes
- When writing long code, divide your task into smaller parts that can be performed by custom functions (restructuring existing code this way is known as refactoring). Create these functions and test them individually. This makes the whole program or script easier to debug and read
- Plan for mistakes
- Use assertion functions and internal tests. Do not expect the user to give you the right input
- Test your scripts with example data, and try to include positive and negative standards. Save the test files
- Optimize software only after it works correctly. Work on code readability first, then your algorithm, then optimize.
- Document your code's design and purpose, not its mechanics
- Back up your data! Store at minimum your raw data and the scripts used to generate the analysis results
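The "Plan for mistakes" rules above can be sketched in Python as follows. This is a minimal illustration, not a prescribed implementation: the function name gc_content, the test function, and the example sequences are all hypothetical and chosen only for demonstration. The function validates its input rather than trusting the caller, and the test includes both positive examples (known answers) and a negative example (input that must be rejected).

```python
def gc_content(sequence):
    """Return the fraction of G and C bases in a DNA sequence."""
    # Validate the input instead of expecting the user to get it right.
    if not isinstance(sequence, str):
        raise TypeError("sequence must be a string")
    if not sequence:
        raise ValueError("sequence must not be empty")
    sequence = sequence.upper()
    invalid_bases = set(sequence) - set("ACGT")
    if invalid_bases:
        raise ValueError(f"unexpected characters: {sorted(invalid_bases)}")

    gc_count = sequence.count("G") + sequence.count("C")
    gc_fraction = gc_count / len(sequence)
    # Internal sanity check: a fraction can never leave the range [0, 1].
    assert 0.0 <= gc_fraction <= 1.0
    return gc_fraction


def test_gc_content():
    # Positive examples: inputs with a known answer.
    assert gc_content("GGCC") == 1.0
    assert gc_content("atgc") == 0.5
    # Negative example: invalid input must be rejected, not silently accepted.
    try:
        gc_content("ATGX")
    except ValueError:
        pass
    else:
        raise AssertionError("invalid base was not rejected")


test_gc_content()
```

Keeping the test function and its example inputs alongside the script makes it possible to rerun the checks after every incremental change.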
For Python, follow the PEP 8 standard. For R, use this standard.
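As a rough illustration of the naming, spacing, and commenting rules in Python, the sketch below contrasts a hard-to-read function (shown commented out) with a clearer version. The names calculate_mean_expression, expression_values, and sample_means are hypothetical and serve only as examples.

```python
# Avoid: uninformative names, no spacing, and a comment that states the obvious.
# def f(a,b):
#     x=a/b  # divide a by b
#     return x


def calculate_mean_expression(expression_values):
    """Return the arithmetic mean of a list of expression values."""
    # An empty sample has no mean; fail early with a clear message.
    if len(expression_values) == 0:
        raise ValueError("expression_values must not be empty")
    total_expression = sum(expression_values)
    return total_expression / len(expression_values)


# The function name is a verb phrase; the variable names are nouns.
replicate_groups = ([1.0, 2.0, 3.0], [4.0, 6.0])
sample_means = [calculate_mean_expression(values) for values in replicate_groups]
```

The comment in the clearer version explains why the check exists rather than restating what the line does.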
Save scripts and sessions. If revising old results, open the saved session instead of running the script again; this saves time and avoids changing the results if your script involves a random step.
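Python has no direct equivalent of a saved R session, so the sketch below shows two related techniques instead, under that stated assumption: fixing the random seed so a random step gives the same answer on every run, and saving results to a file so they can be reloaded rather than recomputed. The function names, the seed value, and the file name are hypothetical.

```python
import json
import os
import random
import tempfile


def draw_random_subsample(sample_ids, subsample_size, seed):
    """Draw a reproducible random subsample using a fixed seed."""
    # A dedicated generator avoids depending on global random state.
    generator = random.Random(seed)
    return generator.sample(sample_ids, subsample_size)


def save_results(results, path):
    """Write results to a JSON file so they can be reopened later."""
    with open(path, "w") as handle:
        json.dump(results, handle)


def load_results(path):
    """Read previously saved results instead of recomputing them."""
    with open(path) as handle:
        return json.load(handle)


sample_ids = [f"sample_{index}" for index in range(1, 11)]
first_draw = draw_random_subsample(sample_ids, 3, seed=42)
second_draw = draw_random_subsample(sample_ids, 3, seed=42)

# A temporary directory keeps this example from leaving files behind.
with tempfile.TemporaryDirectory() as output_directory:
    results_path = os.path.join(output_directory, "subsample.json")
    save_results(first_draw, results_path)
    reloaded_draw = load_results(results_path)
```

With the same seed, both draws select the same samples, and the reloaded results match what was saved.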
At the end of your scripts add the following lines to obtain the version of R you are using and the versions of the libraries used:
R.Version()
sessionInfo()
You can save that information at the end of your final script, or send it directly to a file using:
writeLines(capture.output(R.Version()), "R_version_info.txt")
writeLines(capture.output(sessionInfo()), "R_session_info.txt")
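For Python scripts, a comparable record can be produced with the standard library. The following is a sketch rather than an established convention: the function names, the output file name, and the package names passed in are assumptions chosen for illustration.

```python
import platform
import sys
from importlib import metadata


def collect_version_info(package_names):
    """Return lines describing the Python version and package versions."""
    lines = [f"Python {platform.python_version()} ({sys.platform})"]
    for package_name in package_names:
        try:
            package_version = metadata.version(package_name)
        except metadata.PackageNotFoundError:
            # Record missing packages explicitly instead of failing.
            package_version = "not installed"
        lines.append(f"{package_name} {package_version}")
    return lines


def write_version_info(path, package_names):
    """Write the version report to a text file, one entry per line."""
    with open(path, "w") as handle:
        handle.write("\n".join(collect_version_info(package_names)) + "\n")


# Hypothetical package list; replace it with the packages your script imports.
version_lines = collect_version_info(["numpy", "pandas"])
print("\n".join(version_lines))
```

Calling write_version_info("python_version_info.txt", ["numpy", "pandas"]) at the end of a script would store the same report next to the analysis results.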
Create and use a GitHub repository to store your working code and keep track of its changes. This is helpful for reproducing past results and restoring functionality lost after drastic code changes, and it is essential if you are developing software as a team.
Note: Never push your raw data into a repository, only push test data and your scripts!
Software Carpentry has very clear and easy video tutorials on Unix, Python, and version control.
Notepad++
This Windows program is very useful for editing code because it uses colors to show you errors in indentation and unclosed brackets, parentheses, and braces. Download it here
TextWrangler
Rough equivalent of Notepad++ on Mac OS X. Download it here
RStudio
An integrated development environment for developing R scripts and interacting with the R statistical environment. Download it here