New Feature POC: VALIDATE_MODEL #404

Merged
merged 14 commits into from
Dec 21, 2024

Conversation

alanmmobbs93
@alanmmobbs93 alanmmobbs93 commented Dec 18, 2024

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the pipeline conventions in the contribution docs
  • If necessary, also make a PR on the nf-core/differentialabundance branch on the nf-core/test-datasets repository.
  • Make sure your code lints (nf-core lint).
  • Ensure the test suite passes (nf-test test main.nf.test -profile test,docker).
  • Check for unexpected warnings in debug mode (nextflow run . -profile debug,test,docker --outdir <OUTDIR>).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

Related to: #371

NEW FEATURE: VALIDATE_MODEL

GOAL
Create a validation step that cross-checks the information in the YML (models and contrast information) against the sample sheet.

Features:

Inputs:

  • YML
  • Samplesheet
  • sample_id_col

Outputs:

  • Validated phenotypic table that contains only the columns that were required from the YML file
  • models.json that contains info about full rank models
  • Warnings JSON file, in case we'd like to report it in the workflow.

Functionalities:

  • Every variable referenced in the YML file must be present as a column in the sample sheet. The variable is currently taken from the first element of each contrast definition; a better solution would be to detect it from the formula, if we decide to keep the formula in the YML.
  • Blocking factors are also checked for existence.
  • All levels declared for the variables (extracted from the contrast field) must be present in the column. If the sample sheet contains additional levels, they are reported with a warning.
  • Check for special characters.
  • Check for missing values.
  • Models are constructed with base R functions, and contrasts are generated to check whether the models are full rank. If they are not, warnings are generated. We can decide whether to report this at the Nextflow level or not.
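
The column, level, and missing-value checks listed above can be illustrated with a minimal sketch. This is a standalone Python illustration, not the R code of validate_model.R. The inline sample sheet, the `validate` function, and the choice to treat missing values as errors are all assumptions made for the example.

```python
import csv
import io

# Hypothetical inline data standing in for the sample sheet and the parsed YML.
SAMPLESHEET = """sample,treatment,sample_number
s1,mCherry,1
s2,mCherry,2
s3,hND6,1
s4,hND6,2
s5,other,3
"""

CONTRASTS = [
    {
        "id": "treatment_mCherry_hND6_sample_number",
        "comparison": ["treatment", "mCherry", "hND6"],
        "blocking_factors": ["sample_number"],
    },
]


def validate(rows, contrasts):
    """Cross-check contrasts against sample sheet rows; return (errors, warnings)."""
    errors, warnings = [], []
    columns = set(rows[0].keys()) if rows else set()
    for contrast in contrasts:
        variable, *levels = contrast["comparison"]
        # The contrast variable and every blocking factor must exist as columns.
        for column in [variable] + contrast.get("blocking_factors", []):
            if column not in columns:
                errors.append(f"{contrast['id']}: column '{column}' not found")
            elif any(row[column].strip() == "" for row in rows):
                # Assumption: missing values are reported as errors.
                errors.append(f"{contrast['id']}: missing values in '{column}'")
        if variable not in columns:
            continue
        observed = {row[variable] for row in rows}
        # Every declared level must occur in the column.
        for level in levels:
            if level not in observed:
                errors.append(f"{contrast['id']}: level '{level}' not in '{variable}'")
        # Levels present in the sheet but not declared only trigger a warning.
        extra = sorted(observed - set(levels))
        if extra:
            warnings.append(f"{contrast['id']}: extra levels in '{variable}': {extra}")
    return errors, warnings


rows = list(csv.DictReader(io.StringIO(SAMPLESHEET)))
errors, warnings = validate(rows, CONTRASTS)
print("errors:", errors)
print("warnings:", warnings)
```

With this sample data the declared levels are all present, so no error is raised, while the undeclared level `other` produces a single warning.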

Perspectives:

  • Easy to include more fields if they are added to the yml.

Testing

  • I'd like to test it with real cases, and I invite everyone to do the same, to check how flexibly it reads variables from the YML file and whether it finds real errors.
  • Some errors should already be caught by the YML validation that runs before this step.
  • The local module and a basic nf-test were also added, to support future changes and easy comparison during development.

Test

The following example files are part of the nf-test that can be executed as declared below. They were obtained from the pipeline's test profile.

nf-test test modules/local/validatemodel/tests/main.nf.test --debug --profile docker

Example YML
This mock YML file was modeled on the reference contrast file. It is (temporarily) located in the tests/ folder of the module.

models:
  - formula: "~ treatment"
    contrasts:
      - id: "treatment_mCherry_hND6"
        comparison: ["treatment", "mCherry", "hND6"]

      - id: "treatment_mCherry_hND6_sample_number"
        comparison: ["treatment", "mCherry", "hND6"]
        blocking_factors: ["sample_number"]

      - id: "treatment234"
        comparison: ["treatment", "mCherry", "hND6"]

Note: I added the "formula" field, which is not part of @nschcolnicov's POC for the YML validation. The script iterates over it and adds the blocking factors when required, but this can be adjusted if we want to remove the field. If we decide to keep the formula, the comparison field can be simplified by dropping its first element.
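
To make the note concrete, here is a minimal Python sketch of how a formula could be extended with blocking factors and how variable names could be read back from it. It is not the logic of validate_model.R: the function names are hypothetical, only purely additive formulas such as `~ a + b` are handled, and appending the blocking factors after the existing terms is an assumption.

```python
def formula_variables(formula):
    """Return the term names of a purely additive formula such as '~ a + b'."""
    right_hand_side = formula.split("~", 1)[1]
    return [term.strip() for term in right_hand_side.split("+") if term.strip()]


def with_blocking_factors(formula, blocking_factors):
    """Append blocking factors that are not already terms of the formula."""
    terms = formula_variables(formula)
    for factor in blocking_factors:
        if factor not in terms:
            terms.append(factor)
    return "~ " + " + ".join(terms)


# Contrast entry taken from the example YML above.
contrast = {
    "id": "treatment_mCherry_hND6_sample_number",
    "comparison": ["treatment", "mCherry", "hND6"],
    "blocking_factors": ["sample_number"],
}

full_formula = with_blocking_factors("~ treatment", contrast.get("blocking_factors", []))
print(full_formula)
print(formula_variables(full_formula))
```

If the variable is read from the formula this way, the first element of `comparison` becomes redundant, which is the simplification mentioned in the note.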

Example sample sheet
The matching sample sheet can be found at:

https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/genomics/mus_musculus/rnaseq_expression/SRP254919.samplesheet.csv

Run script
Make the script executable and run it:

validate_model.R \
        --yml path/to/yml \
        --samplesheet path/to/samplesheet \
        --sample_id_col 'sample'

@nf-core-bot
Member

Warning

Newer version of the nf-core template is available.

Your pipeline is using an old version of the nf-core template: 3.0.2.
Please update your pipeline to the latest version.

For more documentation on how to update your pipeline, please see the nf-core documentation and Synchronisation documentation.

github-actions bot commented Dec 18, 2024

nf-core pipelines lint overall result: Passed ✅ ⚠️

Posted for pipeline commit 6ed0793

  • ✅ 301 tests passed
  • ❔ 6 tests were ignored
  • ❗ 4 tests had warnings

❗ Test warnings:

  • pipeline_todos - TODO string in main.nf: Optionally add in-text citation tools to this list.
  • pipeline_todos - TODO string in main.nf: Optionally add bibliographic entries to this list.
  • pipeline_todos - TODO string in main.nf: Only uncomment below if logic in toolCitationText/toolBibliographyText has been filled!
  • pipeline_todos - TODO string in base.config: Check the defaults for all processes

❔ Tests ignored:

✅ Tests passed:

Run details

  • nf-core/tools version 3.0.2
  • Run at 2024-12-21 16:08:10

Member

@grst grst left a comment

I like it a lot! Need to check the R code in more detail though.

@alanmmobbs93
Author

Update:

The code has now been updated to read a simpler YML file:

contrasts:
  - id: "treatment_mCherry_hND6"
    comparison: ["treatment", "mCherry", "hND6"]

  - id: "treatment_mCherry_hND6_sample_number"
    comparison: ["treatment", "mCherry", "hND6"]
    blocking_factors: ["sample_number"]

  - id: "treatment234"
    comparison: ["treatment", "mCherry", "hND6"]
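
With the formula field removed, the terms of each model have to come from the contrast itself. The sketch below assumes they are the comparison variable plus any blocking factors, and illustrates what a full-rank check on the resulting design could look like. It is a standalone Python illustration rather than the base R code used by the script; the treatment coding, the helper names, and the sample rows are assumptions.

```python
from fractions import Fraction


def design_matrix(rows, factors):
    """Intercept plus treatment-coded dummy columns (first level dropped)."""
    matrix = [[1] for _ in rows]
    for factor in factors:
        levels = sorted({row[factor] for row in rows})
        for level in levels[1:]:
            for line, row in zip(matrix, rows):
                line.append(1 if row[factor] == level else 0)
    return matrix


def matrix_rank(matrix):
    """Rank by Gauss-Jordan elimination with exact fractions."""
    m = [[Fraction(value) for value in row] for row in matrix]
    n_cols = len(m[0]) if m else 0
    rank = 0
    for col in range(n_cols):
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue  # no pivot: this column depends on earlier ones
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                scale = m[r][col] / m[rank][col]
                m[r] = [a - scale * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def is_full_rank(rows, contrast):
    """True if the design for variable + blocking factors has independent columns."""
    factors = [contrast["comparison"][0]] + contrast.get("blocking_factors", [])
    matrix = design_matrix(rows, factors)
    return matrix_rank(matrix) == len(matrix[0])


contrast = {
    "id": "treatment_mCherry_hND6_sample_number",
    "comparison": ["treatment", "mCherry", "hND6"],
    "blocking_factors": ["sample_number"],
}

# Hypothetical rows: a balanced design and one where the two factors are confounded.
balanced = [
    {"treatment": "mCherry", "sample_number": "1"},
    {"treatment": "mCherry", "sample_number": "2"},
    {"treatment": "hND6", "sample_number": "1"},
    {"treatment": "hND6", "sample_number": "2"},
]
confounded = [
    {"treatment": "mCherry", "sample_number": "1"},
    {"treatment": "mCherry", "sample_number": "1"},
    {"treatment": "hND6", "sample_number": "2"},
    {"treatment": "hND6", "sample_number": "2"},
]

print(is_full_rank(balanced, contrast))
print(is_full_rank(confounded, contrast))
```

In the confounded rows every `mCherry` sample has the same `sample_number`, so the treatment and blocking columns carry the same information and the design loses rank. This is the situation for which the PR description says a warning is generated.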

@alanmmobbs93 alanmmobbs93 changed the base branch from dev to dev_tmp December 20, 2024 13:15
@alanmmobbs93 alanmmobbs93 marked this pull request as ready for review December 20, 2024 13:25
@nschcolnicov nschcolnicov left a comment

Overall looks good. I have some comments that you can address or not; they are minor and don't affect functionality.

Review threads:

  • modules/local/validatemodel/tests/contrasts.yml (outdated, resolved)
  • workflows/differentialabundance.nf (resolved)
@alanmmobbs93 alanmmobbs93 merged commit b20337c into nf-core:dev_tmp Dec 21, 2024
15 checks passed
4 participants