[PRE REVIEW]: target-methylseq-qc: a lightweight pipeline for collecting metrics from targeted sequence mapping files. #7238

Closed
editorialbot opened this issue Sep 16, 2024 · 67 comments
Labels: Groovy, Nextflow, pre-review, TeX, Track: 2 (BCM) Biomedical Engineering, Biosciences, Chemistry, and Materials

Comments

@editorialbot
Collaborator

editorialbot commented Sep 16, 2024

Submitting author: @abhi18av (Abhinav Sharma)
Repository: https://github.com/wal-yan/target-methylseq-qc
Branch with paper.md (empty if default branch): master
Version: v2.0.0
Editor: @csoneson
Reviewers: @telatin, @sridhar0605
Managing EiC: Kevin M. Moerman

Status

Status badge code:

HTML: <a href="https://joss.theoj.org/papers/220574eb92e0a5c64f58eb092dfd399a"><img src="https://joss.theoj.org/papers/220574eb92e0a5c64f58eb092dfd399a/status.svg"></a>
Markdown: [![status](https://joss.theoj.org/papers/220574eb92e0a5c64f58eb092dfd399a/status.svg)](https://joss.theoj.org/papers/220574eb92e0a5c64f58eb092dfd399a)

Author instructions

Thanks for submitting your paper to JOSS @abhi18av. Currently, there isn't a JOSS editor assigned to your paper.

@abhi18av if you have any suggestions for potential reviewers then please mention them here in this thread (without tagging them with an @). You can search the list of people that have already agreed to review and may be suitable for this submission.

Editor instructions

The JOSS submission bot @editorialbot is here to help you find and assign reviewers and start the main review. To find out what @editorialbot can do for you type:

@editorialbot commands
editorialbot added the pre-review and Track: 2 (BCM) Biomedical Engineering, Biosciences, Chemistry, and Materials labels Sep 16, 2024
@editorialbot
Collaborator Author

Hello human, I'm @editorialbot, a robot that can help you with some common editorial tasks.

For a list of things I can do to help you, just type:

@editorialbot commands

For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:

@editorialbot generate pdf

@editorialbot
Collaborator Author

Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

✅ OK DOIs

- 10.1101/2023.04.29.23289314 is OK
- 10.1038/nbt.3820 is OK
- 10.1038/s41592-018-0046-7 is OK
- 10.1093/bioinformatics/btx192 is OK
- 10.5281/zenodo.10463781 is OK
- 10.1093/gigascience/giab008 is OK
- 10.1101/gr.107524.110 is OK
- 10.1038/nbt.3820 is OK
- 10.1093/bioinformatics/btw354 is OK
- 10.1038/s41587-020-0439-x is OK
- 10.1093/bioinformatics/btq033 is OK
- 10.5281/zenodo.13147688 is OK
- 10.5281/zenodo.13601364 is OK

🟡 SKIP DOIs

- No DOI given, and none found for title: The nf-core framework for community-curated bioinf...
- No DOI given, and none found for title: CreateSequenceDictionary (Picard)
- No DOI given, and none found for title: Picard toolkit
- No DOI given, and none found for title: CollectHsMetrics (Picard)
- No DOI given, and none found for title: CollectMultipleMetrics (Picard)
- No DOI given, and none found for title: HTS format specifications
- No DOI given, and none found for title: Babraham Bioinformatics - FastQC A Quality Control...
- No DOI given, and none found for title: Twist Methylome
- No DOI given, and none found for title: Twist Methylome
- No DOI given, and none found for title: target-methylseq-qc website

❌ MISSING DOIs

- None

❌ INVALID DOIs

- None

@editorialbot
Collaborator Author

Software report:

github.com/AlDanial/cloc v 1.90  T=0.05 s (1532.4 files/s, 244651.6 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
CSS                              5             39             20           2238
JavaScript                      11            235            226           2112
SVG                              3              3              3           2081
HTML                             4             53             10           1537
YAML                            27             74             30            905
JSON                             7              2              0            635
XML                              2              0              0            518
Markdown                         9            295              0            494
Groovy                           4             76            103            354
TeX                              1             31              0            339
Python                           2             61             90            183
CSV                              3              0              0             10
TOML                             1              1              2              7
Bourne Shell                     1              0              0              5
-------------------------------------------------------------------------------
SUM:                            80            870            484          11418
-------------------------------------------------------------------------------

Commit count by author:

   111	Abhinav Sharma
     1	Patricia
     1	t4ly4

@editorialbot
Collaborator Author

Paper file info:

📄 Wordcount for paper.md is 1753

✅ The paper includes a Statement of need section

@editorialbot
Collaborator Author

License info:

✅ License found: MIT License (Valid open source OSI approved license)

@editorialbot
Collaborator Author

👉📄 Download article proof 📄 View article proof on GitHub 📄 👈

@editorialbot
Collaborator Author

Five most similar historical JOSS papers:

Acanthophis: a comprehensive plant hologenomics pipeline
Submitting author: @kdm9
Handling editor: @marcosvital (Active)
Reviewers: @bricoletc, @gbouras13, @abhishektiwari
Similarity score: 0.7228

MetaGenePipe: An Automated, Portable Pipeline for Contig-based Functional and Taxonomic Analysis
Submitting author: @ParkvilleData
Handling editor: @jmschrei (Active)
Reviewers: @Ebedthan, @rjorton
Similarity score: 0.7062

nf-gwas-pipeline: A Nextflow Genome-Wide Association Study Pipeline
Submitting author: @ZeyuanSong
Handling editor: @lpantano (Active)
Reviewers: @preetida, @rspirgel
Similarity score: 0.6955

CheckQC: Quick quality control of Illumina sequencing runs
Submitting author: @johandahlberg
Handling editor: @pjotrp (Retired)
Reviewers: @brainstorm
Similarity score: 0.6866

Koverage: Read-coverage analysis for massive (meta)genomics datasets
Submitting author: @beardymcjohnface
Handling editor: @csoneson (Active)
Reviewers: @lparsons, @telatin
Similarity score: 0.6764

⚠️ Note to editors: If these papers look like they might be a good match, click through to the review issue for that paper and invite one or more of the authors before considering asking the reviewers of these papers to review again for JOSS.

@abhi18av

CC @agudeloromero @t4ly4

@Kevin-Mattheus-Moerman
Member

Kevin-Mattheus-Moerman commented Sep 20, 2024

@abhi18av Dear author, thanks for this submission. I am the AEiC on this track and here to help process the initial steps. Before we proceed, please can you have a look at the following points:

  • Please study the above reference check ☝️ and see if you can address any of the reported potential DOI issues. You can add/amend DOI entries in your .bib file, and call @editorialbot check references here to check them again.
  • I may have missed it, but can you confirm this project features (automated) testing? If so it may be good to link to this in the README.
  • Could you help me understand the above code report, and potentially add to it in terms of Nextflow contributions? What aspects of the report would you say are your core achievement/new contribution? In addition, can you help estimate the number of lines of code for the Nextflow work? We ask this because some tools exist that, for instance, automatically generate JavaScript GUI-related code. So any help to judge the "weight/size" of this submission would be appreciated.

@Kevin-Mattheus-Moerman
Member

@editorialbot invite @csoneson as editor

@editorialbot
Collaborator Author

Invitation to edit this submission sent!

@csoneson
Member

In principle I'm happy to edit this, but would like to first wait for the author's responses to @Kevin-Mattheus-Moerman's queries above.

@abhi18av

abhi18av commented Sep 23, 2024

@abhi18av Dear author, thanks for this submission. I am the AEiC on this track and here to help process the initial steps. Before we proceed, please can you have a look at the following points:

Dear @Kevin-Mattheus-Moerman and @csoneson, thank you for taking the time to evaluate this manuscript!

I have addressed the comments inline.

  • ✅ Please study the above reference check ☝️ and see if you can address any of the reported potential DOI issues. You can add/amend DOI entries in your .bib file, and call @editorialbot check references here to check them again.

Sure, I have updated the DOIs for a few more citations. However, as some entries don't have an associated publication (such as Picard) or there's no consensus on how to cite them (such as the HTS specification, samtools/hts-specs#179), I have simply used the @online bib entry type for those.

If there's a better way or a JOSS convention to address this, please let us know and we will be happy to accommodate.

  • ✅ I may have missed it, but can you confirm this project features (automated) testing? If so it may be good to link to this in the README.

Ah yes, there are a bunch of GitHub Actions workflows in the repo which are triggered upon relevant events.

In addition, I have added an explanation in the README for the bundled test dataset which we provide to users for quick testing: https://github.com/wal-yan/target-methylseq-qc?tab=readme-ov-file#testing

  • ✅ Could you help me understand the above code report, and potentially add to it in terms of Nextflow contributions? What aspects of the report would you say are your core achievement/new contribution? In addition, can you help estimate the number of lines of code for the Nextflow work? We ask this because some tools exist that, for instance, automatically generate JavaScript GUI-related code. So any help to judge the "weight/size" of this submission would be appreciated.

In terms of the cloc report from #7238 (comment), I must say that the numbers hide the bigger picture, but thank you for raising this.

The principal changes regarding the implementation logic are of course in the Nextflow/Groovy layer. However, as Nextflow is just the DSL for orchestrating tasks, we have worked on other layers/languages as well.

The samplesheet check (written in Python) is specific to this pipeline and validates the samplesheet as a pre-flight check. The test samplesheet files in CSV format (assets/test_samplesheet_bed_filter.csv and assets/test_samplesheet_picard_profiler.csv) are likewise specific to this pipeline.
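
As a rough illustration of what such a pre-flight check can look like, here is a minimal Python sketch. It is not the pipeline's actual script: the column names `sample` and `bam`, the rules applied to them, and the function name `check_samplesheet` are all hypothetical and chosen only for this example.

```python
import csv
import io

# Hypothetical required columns; the real samplesheet layout may differ.
REQUIRED_COLUMNS = ("sample", "bam")


def check_samplesheet(text):
    """Return a list of human-readable problems found in a CSV samplesheet."""
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        # Without the expected header there is no point in checking rows.
        return ["missing column(s): " + ", ".join(missing)]

    problems = []
    seen = set()
    # Data rows start on line 2 because line 1 holds the header.
    for line_no, row in enumerate(reader, start=2):
        sample = (row["sample"] or "").strip()
        bam = (row["bam"] or "").strip()
        if not sample:
            problems.append(f"line {line_no}: empty sample name")
        elif sample in seen:
            problems.append(f"line {line_no}: duplicate sample '{sample}'")
        seen.add(sample)
        if not bam.endswith(".bam"):
            problems.append(f"line {line_no}: '{bam}' is not a .bam path")
    return problems


if __name__ == "__main__":
    example = "sample,bam\nS1,s1.bam\nS1,s2.txt\n"
    for problem in check_samplesheet(example):
        print(problem)
```

Collecting all problems in a list, rather than stopping at the first one, lets a user fix the whole samplesheet in one pass before any pipeline task is launched.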

Furthermore, once the analysis is done, the generated results are merged and passed to MultiQC, which relies on a customized YAML file (assets/multiqc_config.yml) to present the principal summary report.

Finally, for the UI of Nextflow schema renderers, the JSON schema has been customized to reflect the principal parameters of the pipeline corresponding to the different modes.

I must also highlight that the test_* profiles are created in the conf/*.config scripts, which are also Groovy/Nextflow scripts but are not picked up by cloc as any language in the overall counts.

Therefore, kindly take this into consideration 🙏

image

EDIT (24-09-2024)

Actually, I have realized that programs like cloc, tokei or scc do NOT take Nextflow code into account. Here's a test:

image

Therefore, none of the Nextflow (or config) files from the core implementation show up:

image

@abhi18av

@editorialbot check references

@abhi18av

@editorialbot generate pdf

@editorialbot
Collaborator Author

Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

✅ OK DOIs

- 10.1101/2023.04.29.23289314 is OK
- 10.1038/nbt.3820 is OK
- 10.1038/s41587-020-0439-x is OK
- 10.1038/s41592-018-0046-7 is OK
- 10.1093/bioinformatics/btx192 is OK
- 10.5281/zenodo.10463781 is OK
- 10.1093/gigascience/giab008 is OK
- 10.1101/gr.107524.110 is OK
- 10.1038/nbt.3820 is OK
- 10.1093/bioinformatics/btw354 is OK
- 10.1038/s41587-020-0439-x is OK
- 10.1093/bioinformatics/btq033 is OK
- 10.5281/zenodo.8251379 is OK
- 10.5281/zenodo.13597863 is OK

🟡 SKIP DOIs

- No DOI given, and none found for title: CreateSequenceDictionary (Picard)
- No DOI given, and none found for title: Picard toolkit
- No DOI given, and none found for title: CollectHsMetrics (Picard)
- No DOI given, and none found for title: CollectMultipleMetrics (Picard)
- No DOI given, and none found for title: HTS format specifications
- No DOI given, and none found for title: Babraham Bioinformatics - FastQC A Quality Control...
- No DOI given, and none found for title: Twist Methylome
- No DOI given, and none found for title: Twist Methylome
- No DOI given, and none found for title: target-methylseq-qc website

❌ MISSING DOIs

- None

❌ INVALID DOIs

- None

@editorialbot
Collaborator Author

👉📄 Download article proof 📄 View article proof on GitHub 📄 👈

@editorialbot
Collaborator Author

Five most similar historical JOSS papers:

Acanthophis: a comprehensive plant hologenomics pipeline
Submitting author: @kdm9
Handling editor: @marcosvital (Active)
Reviewers: @bricoletc, @gbouras13, @abhishektiwari
Similarity score: 0.7247

MetaGenePipe: An Automated, Portable Pipeline for Contig-based Functional and Taxonomic Analysis
Submitting author: @ParkvilleData
Handling editor: @jmschrei (Active)
Reviewers: @Ebedthan, @rjorton
Similarity score: 0.7067

nf-gwas-pipeline: A Nextflow Genome-Wide Association Study Pipeline
Submitting author: @ZeyuanSong
Handling editor: @lpantano (Active)
Reviewers: @preetida, @rspirgel
Similarity score: 0.6974

CheckQC: Quick quality control of Illumina sequencing runs
Submitting author: @johandahlberg
Handling editor: @pjotrp (Retired)
Reviewers: @brainstorm
Similarity score: 0.6879

RNAsik: A Pipeline for complete and reproducible RNA-seq analysis that runs anywhere with speed and ease
Submitting author: @serine
Handling editor: @pjotrp (Retired)
Reviewers: @andrewyatz
Similarity score: 0.6763

⚠️ Note to editors: If these papers look like they might be a good match, click through to the review issue for that paper and invite one or more of the authors before considering asking the reviewers of these papers to review again for JOSS.

@Kevin-Mattheus-Moerman
Member

@editorialbot query scope

@editorialbot
Collaborator Author

Submission flagged for editorial review.

editorialbot added the query-scope (Submissions of uncertain scope for JOSS) label Sep 23, 2024
@Kevin-Mattheus-Moerman
Member

@abhi18av thanks for providing those additional details. I have just flagged this submission for a scope review by our editorial board. This is because I need some help to determine if this work is in scope, and if the pipeline/workflow you present meets our substantial scholarly effort criterion.

The scope review should take about 2 weeks to complete.

@abhi18av

Hi @Kevin-Mattheus-Moerman ,

I was trying to understand cloc better and I've edited my response here #7238 (comment) to note that cloc (or tokei etc.) does not take Nextflow files into account. Therefore, the principal implementation logic (in .nf and .config files) is unfortunately not included in the line count reports 🤷

@Kevin-Mattheus-Moerman
Member

@abhi18av yes, I'm aware cloc doesn't count Nextflow code lines. Hence I asked you to elaborate. If you can help count/estimate these lines yourself, that would be helpful.

@abhi18av

abhi18av commented Oct 1, 2024

Hi @Kevin-Mattheus-Moerman , apologies for the late response.

Sure, I am happy to provide the line counts for the Nextflow .nf and .config files.

The total number of lines of Nextflow-specific code is 1381.

Tools used

  1. PowerShell for scripting
  2. fd command
  3. wc -l command

Here's my method for counting the lines of Nextflow code:

  1. Find all .nf and .config files from the project root.

        wal-yan-target-methylseq-qc  🍣 master 🅒 base

        +  p$ fd "nf$|config$"  -t f
        conf/base.config
        conf/modules.config
        conf/test_bed_filter.config
        conf/test_picard_profiler.config
        main.nf
        modules/local/samplesheet_check.nf
        modules/nf-core/bedtools/intersect/main.nf
        modules/nf-core/custom/dumpsoftwareversions/main.nf
        modules/nf-core/fastqc/main.nf
        modules/nf-core/multiqc/main.nf
        modules/nf-core/picard/collecthsmetrics/main.nf
        modules/nf-core/picard/collectmultiplemetrics/main.nf
        modules/nf-core/picard/createsequencedictionary/main.nf
        modules/nf-core/samtools/faidx/main.nf
        modules/nf-core/samtools/index/main.nf
        nextflow.config
        subworkflows/local/input_check.nf
        workflows/bed_filter.nf
        workflows/picard_profiler.nf

  2. Save these files in a variable $nextflowSourceFiles.

        $nextflowSourceFiles = fd "nf$|config$"  -t f

  3. Iterate over this file list and execute wc -l.

        wal-yan-target-methylseq-qc  🍣 master 🅒 base
        +  p$ foreach ($f in $nextflowSourceFiles ) { wc -l $f }

        65 conf/base.config
        50 conf/modules.config
        30 conf/test_bed_filter.config
        32 conf/test_picard_profiler.config
        81 main.nf
        31 modules/local/samplesheet_check.nf
        39 modules/nf-core/bedtools/intersect/main.nf
        24 modules/nf-core/custom/dumpsoftwareversions/main.nf
        51 modules/nf-core/fastqc/main.nf
        53 modules/nf-core/multiqc/main.nf
        83 modules/nf-core/picard/collecthsmetrics/main.nf
        67 modules/nf-core/picard/collectmultiplemetrics/main.nf
        44 modules/nf-core/picard/createsequencedictionary/main.nf
        50 modules/nf-core/samtools/faidx/main.nf
        48 modules/nf-core/samtools/index/main.nf
        264 nextflow.config
        42 subworkflows/local/input_check.nf
        140 workflows/bed_filter.nf
        187 workflows/picard_profiler.nf

  4. Update the loop to add the line counts to a $sum variable.

        wal-yan-target-methylseq-qc  🍣 master 🅒 base
        +  p$ $sum=0; foreach ($f in $nextflowSourceFiles ) { $sum += $(wc -l $f).split(" ")[0] }; $sum

        1381
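
For anyone without fd or PowerShell at hand, the counting steps above can be approximated in Python. This is only a sketch under stated assumptions, not the commands that produced the numbers above: the function name `count_nextflow_lines` is made up for the illustration, files are matched by the `.nf` and `.config` suffixes rather than by the fd regex, and only hidden files and directories are skipped (fd additionally honours `.gitignore` by default), so totals can differ on a working copy that contains ignored files.

```python
from pathlib import Path

# Suffixes treated as Nextflow code in this sketch (assumption: .nf and .config only).
NEXTFLOW_SUFFIXES = {".nf", ".config"}


def count_nextflow_lines(root):
    """Map each .nf/.config file under root to its newline count, as wc -l would."""
    root = Path(root)
    counts = {}
    for path in sorted(root.rglob("*")):
        rel = path.relative_to(root)
        # Skip hidden files and anything inside hidden directories (.git, .nextflow, ...).
        if any(part.startswith(".") for part in rel.parts):
            continue
        if path.is_file() and path.suffix in NEXTFLOW_SUFFIXES:
            # wc -l counts newline characters, so do the same here.
            counts[rel.as_posix()] = path.read_bytes().count(b"\n")
    return counts
```

Calling `sum(count_nextflow_lines(".").values())` from the project root then gives the overall total, playing the role of the `$sum` loop in the last step.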

@editorialbot
Collaborator Author

Paper file info:

📄 Wordcount for paper.md is 1662

✅ The paper includes a Statement of need section

@editorialbot
Collaborator Author

License info:

✅ License found: MIT License (Valid open source OSI approved license)

@editorialbot
Collaborator Author

👉📄 Download article proof 📄 View article proof on GitHub 📄 👈

@editorialbot
Collaborator Author

Five most similar historical JOSS papers:

Acanthophis: a comprehensive plant hologenomics pipeline
Submitting author: @kdm9
Handling editor: @marcosvital (Active)
Reviewers: @bricoletc, @gbouras13, @abhishektiwari
Similarity score: 0.7200

MetaGenePipe: An Automated, Portable Pipeline for Contig-based Functional and Taxonomic Analysis
Submitting author: @ParkvilleData
Handling editor: @jmschrei (Active)
Reviewers: @Ebedthan, @rjorton
Similarity score: 0.7057

nf-gwas-pipeline: A Nextflow Genome-Wide Association Study Pipeline
Submitting author: @ZeyuanSong
Handling editor: @lpantano (Active)
Reviewers: @preetida, @rspirgel
Similarity score: 0.6922

CheckQC: Quick quality control of Illumina sequencing runs
Submitting author: @johandahlberg
Handling editor: @pjotrp (Retired)
Reviewers: @brainstorm
Similarity score: 0.6854

Koverage: Read-coverage analysis for massive (meta)genomics datasets
Submitting author: @beardymcjohnface
Handling editor: @csoneson (Active)
Reviewers: @lparsons, @telatin
Similarity score: 0.6730

⚠️ Note to editors: If these papers look like they might be a good match, click through to the review issue for that paper and invite one or more of the authors before considering asking the reviewers of these papers to review again for JOSS.

@abhi18av

⚠️ Note to editors: If these papers look like they might be a good match, click through to the review issue for that paper and invite one or more of the authors before considering asking the reviewers of these papers to review again for JOSS. (From #7238 (comment))

Hi @csoneson, just to confirm something regarding the above message: do we need to propose some reviewers ourselves, or is this already in progress?

@csoneson
Member

@abhi18av - apologies, it's been a busy week and I didn't get to this yet. You can suggest reviewers if you'd like, but it's not a requirement. I hope to start reaching out to potential reviewers very soon.

@abhi18av

No worries @csoneson, I just wanted to confirm that we are not blocking the process with respect to that requirement.

We are happy to follow your lead, thank you!

@csoneson
Member

👋🏻 @Juke34, @rcannood, @ZeyuanSong - would you be interested in reviewing this submission for the Journal of Open Source Software (JOSS)?

target-methylseq-qc: a lightweight pipeline for collecting metrics from targeted sequence mapping files
#7238

More information about the review process can be found here. Thanks in advance!

@rcannood

rcannood commented Dec 1, 2024

Would love to but I'm swamped at the moment.

@csoneson
Member

csoneson commented Dec 1, 2024

@rcannood No worries, thanks for your quick response!

@Juke34

Juke34 commented Dec 2, 2024

Swamped too, and already involved in a review process for another journal. I doubt I'll find any time before next year.

@csoneson
Member

csoneson commented Dec 2, 2024

@Juke34 No worries, thanks for letting me know!

@csoneson
Member

csoneson commented Dec 5, 2024

👋🏻 @sridhar0605, @beardymcjohnface, @lparsons, @telatin, @camillescott - would two of you be interested in reviewing this submission for the Journal of Open Source Software (JOSS)?

target-methylseq-qc: a lightweight pipeline for collecting metrics from targeted sequence mapping files
#7238

More information about the review process can be found here. Thanks in advance!

@telatin

telatin commented Dec 5, 2024

Hello @csoneson, I'm interested, but considering the holiday break, this might take 3 to 4 weeks to complete. Let me know if this works for you :)

@csoneson
Member

csoneson commented Dec 6, 2024

Thanks @telatin - that sounds great. I will assign you now, and start the review issue once we have secured one more reviewer.

@csoneson
Member

csoneson commented Dec 6, 2024

@editorialbot add @telatin as reviewer

@editorialbot
Collaborator Author

@telatin added to the reviewers list!

@csoneson
Member

👋🏻 @sridhar0605, @beardymcjohnface, @lparsons, @camillescott - would you be interested in reviewing this submission for the Journal of Open Source Software (JOSS)?

target-methylseq-qc: a lightweight pipeline for collecting metrics from targeted sequence mapping files
#7238

More information about the review process can be found here. Thanks in advance!

@csoneson
Member

👋🏻 @daissi, @kdm9, @johandahlberg - would one of you be interested in reviewing this submission for the Journal of Open Source Software (JOSS)?

target-methylseq-qc: a lightweight pipeline for collecting metrics from targeted sequence mapping files
#7238

More information about the review process can be found here. Thanks in advance!

@johandahlberg

@csoneson thank you for asking. I am unable to take this on at this time.

@sridhar0605

@csoneson happy to review this. Please expect some delays in responses.
Thanks!

@kdm9

kdm9 commented Dec 19, 2024

I would be happy to review; however, I will be away from the keyboard until mid-February 2025, which I assume is too late.

@csoneson
Member

@sridhar0605, @kdm9 - thank you for your responses!

@sridhar0605 - I will assign you and open the review issue.
@kdm9 - thanks a lot for your willingness to review! Since we now have two reviewers I think I will move ahead, but I hope we can come back to you for another review in the future 🙂

@csoneson
Member

@editorialbot add @sridhar0605 as reviewer

@editorialbot
Collaborator Author

@sridhar0605 added to the reviewers list!

@csoneson
Member

@editorialbot start review

@editorialbot
Collaborator Author

OK, I've started the review over in #7608.
