What are the advantages RDM brings to data analysis #26

Open
jcolomb opened this issue Sep 13, 2018 · 1 comment
Labels
help wanted Extra attention is needed question Further information is requested rdm for data analysis

Comments

@jcolomb
Member

jcolomb commented Sep 13, 2018

Status: published
Date: 13.09.2018
Your name: julien colomb
Your orcid: NA
license to apply to your comment: CC0
your position at the time your life story happened: researcher
Input = RDM for data analysis

Four RDM actions to ease data analysis

1: Make your data computer readable

Digital data can be transformed and analysed quite easily using programming languages like R and Python. While you do not have to learn these languages (yet), knowing what they require in terms of readability might save you time and effort.

  • Tabular data and metadata should be tidy (one variable per column, one observation per row)
  • Keep your primary data (raw data) untouched (i.e. no copy/paste in raw data, NEVER)
  • If you have many datasets, make sure you are able to automate the file imports. An index of datasets may be a good practical solution.
  • Separate raw data, derived data, analysis and analysis results in different folders
  • Make sure to document each step of your analysis.
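As a rough sketch of the "index of datasets" idea, the snippet below reads an index file and imports every raw file it lists into one tidy table. The column names (`filename`, `animal_id`, `session_date`) and the folder layout are assumptions made for illustration, not part of any standard.

```python
import csv
from pathlib import Path


def load_index(index_path):
    """Read the dataset index: one row per raw file, with its metadata."""
    with open(index_path, newline="") as handle:
        return list(csv.DictReader(handle))


def import_all(index_path, raw_dir):
    """Import every raw file listed in the index into one tidy table.

    Each returned row is one observation: the metadata from the index
    plus the measurement columns found in the raw file.
    """
    tidy_rows = []
    for entry in load_index(index_path):
        raw_file = Path(raw_dir) / entry["filename"]
        # Raw files are only ever opened for reading, never modified.
        with open(raw_file, newline="") as handle:
            for measurement in csv.DictReader(handle):
                tidy_rows.append({
                    "animal_id": entry["animal_id"],        # hypothetical column
                    "session_date": entry["session_date"],  # hypothetical column
                    **measurement,
                })
    return tidy_rows
```

Adding a new dataset then only means dropping the file into the raw data folder and adding one line to the index; the import code itself does not change.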

2: Fit your data format to its analysis (during data collection)

The analysis you plan (the statistics as well as the software you will use) might require your data to be in a certain format. It will probably affect how much data you need to reach a robust conclusion, and may even affect the number of variables you actually need to record.

This is especially true for metadata: using an existing standard from the start is easier than transforming what you collected into that standard afterwards.
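One way to stick to a standard during collection is to check each metadata record against a template as soon as it is entered. The sketch below assumes a made-up template (`REQUIRED_FIELDS`) and ISO 8601 dates; in practice the field list would come from whatever standard your community uses.

```python
from datetime import date

# Hypothetical metadata template; replace it with the field list of the
# standard used in your field.
REQUIRED_FIELDS = ("animal_id", "session_date", "experimenter", "weight_g")


def check_record(record):
    """Return a list of problems found in one metadata record."""
    problems = [
        f"missing field: {name}"
        for name in REQUIRED_FIELDS
        if not str(record.get(name, "")).strip()
    ]
    session_date = record.get("session_date", "")
    if session_date:
        try:
            date.fromisoformat(session_date)  # expects YYYY-MM-DD
        except ValueError:
            problems.append(f"session_date is not YYYY-MM-DD: {session_date}")
    return problems
```

An empty list means the record fits the template. A date written as `13.09.2018`, for example, would be reported at once instead of being discovered during the analysis.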

3: Plan for the unexpected

The data you collect today may be analysed in 2 years and published in 5. During that time, a lot can happen. People may show that the analysis you planned is not suited to your problem, or you may realise that a variable you did not plan to collect is crucial. Maybe a new dataset will appear that you will need to compare your own data to, or new people will join your project and need access to your data.

Plan for your data to be re-usable. Ideally, ask a colleague to look at your data and see if they can understand it.
The unexpected may also be good: maybe halfway through your tedious manual analysis, you will discover a way to automate it. So keep track of the links between raw and derived data.
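Keeping track of the link between raw and derived data can be as simple as a small log file. The sketch below is one possible approach, not an established tool: each time a derived file is produced, it appends a line recording the source file, its checksum and the processing step. The function names and log format are invented for this example.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path):
    """Checksum of a file, so a later change to the raw data is detectable."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def record_provenance(raw_path, derived_path, step, log_path):
    """Append one JSON line linking a derived file to its raw source."""
    entry = {
        "raw_file": str(raw_path),
        "raw_sha256": sha256_of(raw_path),
        "derived_file": str(derived_path),
        "step": step,  # free-text description of what was done
    }
    with open(log_path, "a") as log:
        log.write(json.dumps(entry) + "\n")
    return entry
```

With such a log, replacing a manual step by a script later on only requires looking up which raw file each derived file came from.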

4: Be specific: merging is easier than splitting

When recording variables, be as specific as you can. It is very easy to pool two categories into one but very difficult (and sometimes impossible) to separate a group during the analysis.
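To illustrate why pooling is the easy direction, here is a minimal sketch with made-up behaviour labels: merging is a plain many-to-one lookup applied at analysis time.

```python
# Hypothetical fine-grained labels recorded during the experiment.
observations = ["sitting", "sleeping", "walking", "running", "sleeping"]

# Pooling is a many-to-one mapping that can be changed at any time.
POOLED = {
    "sitting": "resting",
    "sleeping": "resting",
    "walking": "moving",
    "running": "moving",
}

pooled = [POOLED[label] for label in observations]
```

No mapping can go the other way: once only "resting" has been recorded, nothing in the data tells you whether the animal was sitting or sleeping.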

Similarly, quantitative variables are easier to analyse than qualitative ones. You can always create categories from quantitative measurements, but not the other way around.

As an example, if your question is "do obese mice take longer naps", record each mouse's weight, not its weight category. Analysing the correlation between weight and nap length is more powerful than comparing the two categories.
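The mice example can be sketched as follows. The numbers are made up for illustration (they are not real measurements), and the 35 g threshold is an arbitrary assumption. With the weights recorded, both the category and the correlation are available; with only the category recorded, the correlation could not be computed.

```python
from math import sqrt

# Made-up illustrative values, not real measurements.
weights_g = [22.0, 25.0, 31.0, 38.0, 45.0]
nap_min = [12.0, 15.0, 14.0, 21.0, 24.0]


def categorise(weight_g, threshold_g=35.0):
    """Derive a category from a weight; the threshold is an arbitrary example."""
    return "obese" if weight_g >= threshold_g else "lean"


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Categories can always be derived from the weights after the fact...
categories = [categorise(w) for w in weights_g]
# ...while the correlation needs the quantitative values themselves.
r = pearson_r(weights_g, nap_min)
```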

@jcolomb jcolomb added good first issue Good for newcomers question Further information is requested rdm for data analysis labels Sep 13, 2018
@jcolomb
Member Author

jcolomb commented Sep 13, 2018

Please comment, review and expand!

@jcolomb jcolomb added help wanted Extra attention is needed and removed good first issue Good for newcomers labels Sep 13, 2018