What are the advantages RDM brings to data analysis #26
Status: published
Date: 13.09.2018
Your name: Julien Colomb
Your orcid: NA
license to apply to your comment: CC0
your position at the time your life story happened: researcher
Input = RDM for data analysis
Four RDM actions to ease data analysis
1: Make your data computer readable
Digital data can be transformed and analysed quite easily using programming languages such as R and Python. While you do not have to learn these languages (yet), knowing what they require in terms of readability might save you time and effort.
2: Fit your data format to its analysis (during data collection)
The analysis you plan (both the statistics and the software you will use) may require your data to be in a certain format. It will probably affect how much data you need to reach a robust conclusion, and may even affect the number of variables you need to record.
This is especially true for metadata: using an existing standard from the start is easier than transforming what you are collecting into that standard afterwards.
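As a rough illustration of what "computer readable" can mean in practice, here is a minimal Python sketch. It assumes a plain CSV table with a single header row, one observation per row, and units written in the column names. The column names and values are invented for the example, not taken from a real dataset.

```python
import csv
import io

# Hypothetical tidy table: one header row, one observation per row,
# one variable per column, units stated in the column name.
RAW = """mouse_id,weight_g,nap_min
m01,24.1,35
m02,31.8,52
m03,27.5,41
"""


def read_table(text):
    """Parse CSV text into a list of dicts, converting numeric columns."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        rows.append({
            "mouse_id": row["mouse_id"],
            "weight_g": float(row["weight_g"]),
            "nap_min": float(row["nap_min"]),
        })
    return rows


table = read_table(RAW)
mean_weight = sum(r["weight_g"] for r in table) / len(table)
```

A table with merged cells, colour coding, or several values in one cell would need manual cleaning before a script like this could read it.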
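To make the point about standards concrete, take dates as one example (the choice of ISO 8601 here is an assumption for illustration, not a recommendation made in the original post). A date recorded directly as `YYYY-MM-DD` can be parsed without extra information, whereas a date written as `DD.MM.YYYY` needs a conversion step that someone has to know about and write.

```python
from datetime import date, datetime


def to_iso(day_month_year):
    """Convert a 'DD.MM.YYYY' string to ISO 8601 ('YYYY-MM-DD')."""
    return datetime.strptime(day_month_year, "%d.%m.%Y").date().isoformat()


# Conversion needed after the fact for a non-standard format.
iso = to_iso("13.09.2018")

# A date recorded in the standard format parses directly.
parsed = date.fromisoformat(iso)
```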
3: Plan for the unexpected
The data you collect today may be analysed in 2 years and published in 5. During that time, a lot can happen. People may show that the analysis you planned is not suited to your problem, or you may realise that a variable you did not plan to collect is crucial. Maybe a new dataset will appear that you will need to compare your own data to, or new people will help you with your project and need access to your data.
Plan for your data to be re-usable. Ideally, ask a colleague to look at your data and see whether they can understand it.
The unexpected may also be good: maybe halfway through your tedious manual analysis, you will discover a way to automate it. So keep track of the links between raw and derived data.
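One possible way to keep track of such links is a small provenance record written each time a derived file is produced. The sketch below is only an illustration: the file names, the record fields, and the helper `provenance_record` are hypothetical, and a checksum is just one way to tie a derived file to the exact raw file it came from.

```python
import hashlib
import json


def provenance_record(raw_name, raw_bytes, derived_name, step):
    """Describe which raw file a derived file came from and how."""
    return {
        "raw_file": raw_name,
        # The checksum identifies the exact version of the raw data used.
        "raw_sha256": hashlib.sha256(raw_bytes).hexdigest(),
        "derived_file": derived_name,
        "step": step,
    }


# Hypothetical raw content and file names, for illustration only.
raw = b"mouse_id,weight_g\nm01,24.1\n"
record = provenance_record(
    "raw/weights.csv", raw, "derived/weights_clean.csv", "removed empty rows"
)

# One JSON line per processing step can be appended to a log file.
line = json.dumps(record, sort_keys=True)
```

If the analysis is automated later, such a log shows which raw files to feed into the new script and which manual steps it replaces.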
4: Be specific: merging is easier than splitting
When recording variables, be as specific as you can. It is very easy to pool two categories into one but very difficult (and sometimes impossible) to separate a group during the analysis.
Similarly, quantitative variables are easier to analyse than qualitative ones. You can always create categories from quantitative measurements, but not the other way around.
As an example, if your question is "do obese mice take longer naps?", record each mouse's weight, not its weight category. Analysing the correlation between weight and nap length is more powerful than comparing two categories.
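The asymmetry can be sketched in a few lines of Python. The numbers below are invented for illustration, and the 30 g threshold is an arbitrary assumption, not a real definition of obesity in mice. With weights recorded, both the categories and the correlation are available; with only categories recorded, the correlation could not be computed.

```python
from math import sqrt

# Invented measurements, for illustration only.
weights_g = [22.0, 25.0, 28.0, 31.0, 34.0, 37.0]
nap_min = [30.0, 34.0, 33.0, 40.0, 45.0, 47.0]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def to_category(weight, threshold_g=30.0):
    """Derive a category from a weight (threshold is an arbitrary assumption)."""
    return "obese" if weight >= threshold_g else "lean"


# Going from quantitative to qualitative is a one-liner...
categories = [to_category(w) for w in weights_g]

# ...while the correlation needs the original weights.
r = pearson(weights_g, nap_min)
```

Changing the threshold later only means calling `to_category` again, which would be impossible if only "obese" or "lean" had been written down.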