Implement SPONGE #182

Open · wants to merge 53 commits into dev

Changes from all commits (53 commits)
96863f2
removing creation of bindingsite matrix in majority vote
mweyrich28 Aug 28, 2024
a82d1f9
adding sponge for testing
mweyrich28 Aug 28, 2024
0548d99
changing process label for circ annotate to medium
mweyrich28 Aug 28, 2024
6fb5a6e
passing transcript counts through pipeline to sponge
mweyrich28 Aug 28, 2024
24f139b
adding sponge params
mweyrich28 Aug 28, 2024
e987ac1
adjusting output to match spongeEffects input
mweyrich28 Sep 5, 2024
77b52dc
adding sponge effects for testing
mweyrich28 Sep 5, 2024
e2aa28e
solving version conflict in container
mweyrich28 Sep 6, 2024
50628d5
removing deseq2 norm from sponge template
mweyrich28 Sep 11, 2024
d9144a3
adding additional deseq2 normalization process for tx counts
mweyrich28 Sep 11, 2024
ae37fb8
removing pacman call in sponge_effects
mweyrich28 Sep 11, 2024
20f8609
updating sponge_effects container to include gsva
mweyrich28 Sep 11, 2024
656c76e
adding and renaming deseq2_normalisation processes
mweyrich28 Sep 11, 2024
79bd298
removing redundant param
mweyrich28 Sep 12, 2024
d44046f
fixing input for tx_normalization
mweyrich28 Sep 12, 2024
8d685db
adding versions file to R script of tx_normalization
mweyrich28 Sep 14, 2024
aa2db97
formatting
mweyrich28 Sep 14, 2024
b04afc6
adding versions to output of sponge effects
mweyrich28 Sep 14, 2024
cb7f82f
adding tarpmir module
mweyrich28 Sep 14, 2024
9e9d380
including tarpmir unification in binding site prediction
mweyrich28 Sep 14, 2024
c2c64cf
adding tarpmir as tool in nf config
mweyrich28 Sep 14, 2024
f3aeba7
grouping mirna options together
mweyrich28 Sep 14, 2024
494dca3
fixing python version in majority_vote
mweyrich28 Sep 25, 2024
a7e0655
adding main pita process
mweyrich28 Sep 26, 2024
41d4413
adding adjusted pita template
mweyrich28 Sep 26, 2024
b7c9ea2
typo
mweyrich28 Sep 26, 2024
e321dc8
adding config for pita and adjusting config for tarpmir
mweyrich28 Sep 26, 2024
d57f234
adding pita libs and Bin
mweyrich28 Sep 26, 2024
41f29de
removing artifact in sponge.R
mweyrich28 Sep 26, 2024
1731391
adding tarpmir process
mweyrich28 Sep 26, 2024
b0fb25b
ordering imports
mweyrich28 Sep 26, 2024
25e71a8
adding pita and tarpmir to tool selection
mweyrich28 Sep 26, 2024
1373e39
reformatting
mweyrich28 Sep 26, 2024
e484058
adding SPONGE_EFFECTS versions
mweyrich28 Sep 26, 2024
7646471
adding pita in mirna_prediction for testing
mweyrich28 Sep 26, 2024
c07f510
adding pita params
mweyrich28 Sep 26, 2024
ad001a4
adding tarpmir params
mweyrich28 Sep 26, 2024
3c4acc9
adding missing ;
mweyrich28 Sep 26, 2024
533af26
fixing output of pita
mweyrich28 Sep 30, 2024
f79c6f4
adding versios.yml for pita_predict
mweyrich28 Sep 30, 2024
f7141df
adding unify pita in conf
mweyrich28 Sep 30, 2024
8412b1c
adding pita to mirna_bindingsites subworkflow
mweyrich28 Sep 30, 2024
8d0b8f2
formatting
mweyrich28 Oct 6, 2024
09b4d7e
adjusting mirna tool filter param
mweyrich28 Oct 6, 2024
cb2afba
removing outdated params
mweyrich28 Oct 6, 2024
2e66b06
Latest changes from Malte
nictru Jan 16, 2025
d1525e7
Re-enable everything in mirna_bindingsites subworkflow
nictru Jan 16, 2025
2075095
Improve mirna majority vote script
nictru Jan 16, 2025
bc7b7de
Improve SPONGE script
nictru Jan 16, 2025
705f2f4
Re-enable commented-out sections from mirna prediction
nictru Jan 16, 2025
7b93466
Improve mirna module structure
nictru Jan 19, 2025
c4b52a2
Add note on SPONGE execution
nictru Jan 19, 2025
5a35f99
Make ciriquant_de output optional
nictru Jan 19, 2025
44 changes: 42 additions & 2 deletions conf/modules.config
@@ -819,6 +819,26 @@ process {
]
}

withName: UNIFY_TARPMIR {
ext.args = "-v FS='\\t' -v OFS='\\t' 'NR>1 { split(\$3, arr, \",\"); print \$1, \$2, arr[1], arr[2], \"tarpmir\" }'"
ext.suffix = "tarpmir.tsv"
publishDir = [
path: { "${params.outdir}/mirna_prediction/binding_sites/tools/tarpmir/unified" },
mode: params.publish_dir_mode,
saveAs: { filename -> filename.equals('versions.yml') ? null : filename },
]
}

withName: UNIFY_PITA {
ext.args = "-v FS='\\t' -v OFS='\\t' 'NR>1 { print \$2, \$1, \$3, \$4, \"pita\" }'"
ext.suffix = "pita.tsv"
publishDir = [
path: { "${params.outdir}/mirna_prediction/binding_sites/tools/pita/unified" },
mode: params.publish_dir_mode,
saveAs: { filename -> filename.equals('versions.yml') ? null : filename },
]
}

withName: COMBINE_BINDINGSITES {
ext.prefix = "bindingsites.tsv"
}
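Both UNIFY_* ext.args above rewrite the tool-specific output tables into the five-column layout (mirna, target, start, end, tool) that COMBINE_BINDINGSITES and the downstream majority-vote template consume. A rough Python sketch of what the UNIFY_TARPMIR awk program does (file names are placeholders; the assumption that TarPmiR's first two columns are already miRNA and target is taken from the awk itself):

```python
import csv

# Sketch of the UNIFY_TARPMIR transform: skip the header row (NR>1), split the
# comma-separated coordinate column into start and end, and tag each row with
# the tool name. Input and output file names are placeholders.
with open("sample.tarpmir.bp") as src, open("sample.tarpmir.tsv", "w", newline="") as dst:
    reader = csv.reader(src, delimiter="\t")
    writer = csv.writer(dst, delimiter="\t")
    next(reader)  # NR>1: drop the header line
    for row in reader:
        start, end = row[2].split(",")[:2]  # split($3, arr, ","): arr[1], arr[2]
        writer.writerow([row[0], row[1], start, end, "tarpmir"])
```

UNIFY_PITA does the same reshaping but only reorders columns, printing $2, $1, $3, $4 plus the tool label, which suggests PITA reports the target before the miRNA.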
@@ -1042,6 +1062,26 @@ process {
]
}

withName: TARPMIR {
ext.prefix = { "${meta.id}.tarpmir" }
publishDir = [
path: { "${params.outdir}/mirna_prediction/binding_sites/tools/tarpmir/output" },
mode: params.publish_dir_mode,
saveAs: { filename -> filename.equals('versions.yml') ? null : filename },
pattern: "*.bp"
]
}

withName: PITA {
ext.prefix = { "${meta.id}.pita" }
publishDir = [
path: { "${params.outdir}/mirna_prediction/binding_sites/tools/pita/output" },
mode: params.publish_dir_mode,
saveAs: { filename -> filename.equals('versions.yml') ? null : filename },
pattern: "*.tab"
]
}

withName: MIRNA_TARGETS {
publishDir = [
path: { "${params.outdir}/mirna_prediction/binding_sites/targets" },
@@ -1059,15 +1099,15 @@ process {
]
}

withName: MAJORITY_VOTE {
withName: MIRNA_MAJORITYVOTE {
publishDir = [
path: { "${params.outdir}/mirna_prediction/binding_sites/majority_vote" },
mode: params.publish_dir_mode,
saveAs: { filename -> filename.equals('versions.yml') ? null : filename },
]
}

withName: '.*:MIRNA_PREDICTION:COMPUTE_CORRELATIONS' {
withName: '.*:MIRNA_PREDICTION:MIRNA_COMPUTECORRELATIONS' {
publishDir = [
path: { "${params.outdir}/mirna_prediction/correlation" },
mode: params.publish_dir_mode,
4 changes: 2 additions & 2 deletions modules/local/ciriquant/de/main.nf
@@ -8,8 +8,8 @@ process CIRIQUANT_DE {
tuple val(meta), path(library), path(expression), path(gene)

output:
tuple val(meta), path("${circ_path}"), emit: circ
tuple val(meta), path("${gene_path}"), emit: gene
tuple val(meta), path("${circ_path}"), emit: circ, optional: true
tuple val(meta), path("${gene_path}"), emit: gene, optional: true
path "versions.yml", emit: versions

when:
1 change: 1 addition & 0 deletions modules/local/combinebeds/filter/templates/filter.py
@@ -74,6 +74,7 @@ def format_yaml_like(data: dict, indent: int = 0) -> str:
continue
memberships = series.to_list()
dataset = upsetplot.from_memberships(memberships)
# TODO: Make this more robust for large datasets
upsetplot.plot(dataset,
orientation='horizontal',
show_counts=True,
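One hedged option for the TODO above is to guard the UpSet plot once the number of distinct tool combinations gets too large to render legibly. A self-contained sketch with an arbitrary cutoff; it is not part of the PR, and the toy memberships list stands in for the one built in filter.py:

```python
import upsetplot

# Toy memberships standing in for the list built from the per-tool overlaps.
memberships = [["miranda"], ["miranda", "targetscan"], ["pita", "tarpmir"]]
dataset = upsetplot.from_memberships(memberships)

MAX_INTERSECTIONS = 60  # arbitrary cutoff, not taken from the PR
if dataset.index.drop_duplicates().shape[0] > MAX_INTERSECTIONS:
    print("Too many intersections for a readable UpSet plot; skipping it.")
else:
    upsetplot.plot(dataset, orientation="horizontal", show_counts=True)
```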
32 changes: 32 additions & 0 deletions modules/local/deseq2/gene_normalization/main.nf
@@ -0,0 +1,32 @@
process GENE_NORMALIZATION {
tag "$meta.id"
label 'process_single'

conda "${moduleDir}/environment.yml"
container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
'https://depot.galaxyproject.org/singularity/bioconductor-deseq2:1.34.0--r41hc247a5b_3' :
'biocontainers/bioconductor-deseq2:1.34.0--r41hc247a5b_3' }"

input:
tuple val(meta), path(counts)

output:
tuple val(meta), path("${meta.id}.normalized_counts.tsv"), emit: normalized
path "versions.yml" , emit: versions

when:
task.ext.when == null || task.ext.when

script:
template 'gene_deseq_normalization.R'

stub:
"""
touch ${meta.id}.normalized_counts.tsv

cat <<-END_VERSIONS > versions.yml
"${task.process}":
bioconductor-deseq2: \$(Rscript -e "library(DESeq2); cat(as.character(packageVersion('DESeq2')))")
END_VERSIONS
"""
}
@@ -0,0 +1,54 @@
#!/usr/bin/env Rscript

library(DESeq2)

raw_counts <- read.table("$counts", sep = "\\t", header = TRUE, stringsAsFactors = FALSE, check.names = FALSE)
raw_counts <- raw_counts[ , -2] # drop gene ids
rownames(raw_counts) <- raw_counts\$tx
data <- round(raw_counts[, -1])

samples <- colnames(raw_counts)[-c(1)]


transcript_names <- data.frame(tx = raw_counts\$tx, order = seq_len(nrow(raw_counts)))

# normalize using DESeq2 library size estimation
meta_data <- data.frame(samples)
row.names(meta_data) <- meta_data\$samples
all(colnames(data) %in% rownames(meta_data))
all(colnames(data) == rownames(meta_data))

dds <- DESeqDataSetFromMatrix(countData = data, colData = meta_data, design = ~ 1)
dds <- estimateSizeFactors(dds)
sizeFactors(dds)
normalized_counts <- DESeq2::counts(dds, normalized = TRUE)

# add tx IDs back to counts table
merged_data <- merge(transcript_names, normalized_counts,
by.x = "tx", by.y = "row.names")

merged_data <- merged_data[order(merged_data\$order), ]

norm_data <- subset(merged_data, select = -c(order))

write.table(norm_data, paste0("${meta.id}.normalized_counts.tsv"), quote = FALSE, sep = "\\t", row.names = FALSE)

# TODO: (Can be done later) Add support for Samplesheet so that we can eliminate batch effects


################################################
################################################
## VERSIONS FILE ##
################################################
################################################

r.version <- strsplit(version[['version.string']], ' ')[[1]][3]
deseq2.version <- as.character(packageVersion('DESeq2'))

writeLines(
c(
'"${task.process}":',
paste(' r-base:', r.version),
paste(' bioconductor-deseq2:', deseq2.version)
),
'versions.yml')
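For context on what estimateSizeFactors() and counts(dds, normalized = TRUE) compute in the template above: DESeq2 divides each sample's counts by a median-of-ratios size factor. A minimal NumPy sketch of that idea, illustrative only (the pipeline uses the R code above):

```python
import numpy as np

def size_factors(counts: np.ndarray) -> np.ndarray:
    """Median-of-ratios size factors, one per sample (input: features x samples)."""
    counts = counts.astype(float)
    # The geometric-mean reference only uses features with non-zero counts in every sample.
    expressed = (counts > 0).all(axis=1)
    log_counts = np.log(counts[expressed])
    log_geo_means = log_counts.mean(axis=1, keepdims=True)  # per-feature reference
    return np.exp(np.median(log_counts - log_geo_means, axis=0))

# Normalized counts are raw counts divided by the per-sample size factor,
# mirroring counts(dds, normalized = TRUE).
raw = np.array([[100, 200], [50, 120], [10, 18]])
normalized = raw / size_factors(raw)
```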
@@ -1,4 +1,4 @@
process DESEQ2_NORMALIZATION {
process MIRNA_NORMALIZATION {
tag "$meta.id"
label 'process_single'

@@ -1,4 +1,4 @@
process COMPUTE_CORRELATIONS {
process MIRNA_COMPUTECORRELATIONS {
tag "$meta.id"
label 'process_single'

@@ -1,6 +1,6 @@
process MAJORITY_VOTE {
process MIRNA_MAJORITYVOTE {
tag "$meta.id"
label 'process_medium'
label 'process_high'

conda "${moduleDir}/environment.yml"
container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
@@ -11,9 +11,9 @@ process MAJORITY_VOTE {
tuple val(meta), path(bindingsites)

output:
tuple val(meta), path("${meta.id}.majority.tsv"), emit: tsv
tuple val(meta), path("${meta.id}.targets.tsv") , emit: targets
path "versions.yml" , emit: versions
tuple val(meta), path("${meta.id}.majority.tsv") , emit: tsv
tuple val(meta), path("${meta.id}.targets.tsv") , emit: targets
path "versions.yml" , emit: versions

when:
task.ext.when == null || task.ext.when
@@ -25,6 +25,7 @@ process MAJORITY_VOTE {
stub:
"""
touch ${meta.id}.majority.tsv
touch ${meta.id}.targets.tsv

cat <<-END_VERSIONS > versions.yml
"${task.process}":
@@ -1,32 +1,25 @@
#!/usr/bin/env python3

import platform

import polars as pl
import yaml

paths = "${bindingsites}".split(" ")

df = pl.scan_csv(paths,
separator="\\t",
has_header=False,
new_columns=['mirna', 'target', 'start', 'end', 'tool'])

df = df.select(["mirna", "target", "tool"])
df = pl.scan_csv("*.tsv",
separator="\\t",
has_header=False,
new_columns=['mirna', 'target', 'start', 'end', 'tool'])

df = df.group_by(['mirna', 'target']).agg(pl.col("tool").n_unique())
df = df.select("mirna", "target", "tool")
df = df.group_by('mirna', 'target').agg(pl.col("tool").n_unique())

df = df.filter(pl.col("tool") > int("${min_tools}")) \
.select(["mirna", "target"])
df = df.filter(pl.col("tool") >= int("${min_tools}"))
df = df.select("mirna", "target")

df = df.collect()

df.write_csv('${meta.id}.majority.tsv', separator='\\t', include_header=False)

# Create targets file

df = df.group_by('mirna').agg(pl.col("target").str.concat(","))

df.write_csv('${meta.id}.targets.tsv', separator='\\t', include_header=False)

# Create version file
@@ -39,3 +32,4 @@

with open("versions.yml", "w") as f:
f.write(yaml.dump(versions))

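One behavioural change in the majority-vote template above is easy to miss: the filter moved from a strict greater-than to >=, so a miRNA-target pair now passes with support from exactly min_tools tools rather than strictly more. A small self-contained polars example of the new grouping and filtering logic (values invented):

```python
import polars as pl

# Toy binding-site table in the unified five-column layout (values invented).
df = pl.DataFrame({
    "mirna":  ["miR-1", "miR-1", "miR-1", "miR-2"],
    "target": ["circA", "circA", "circA", "circB"],
    "tool":   ["miranda", "targetscan", "miranda", "pita"],
})

min_tools = 2
majority = (
    df.select("mirna", "target", "tool")
      .group_by("mirna", "target")
      .agg(pl.col("tool").n_unique())
      .filter(pl.col("tool") >= min_tools)  # >= : exactly min_tools now passes
      .select("mirna", "target")
)
# miR-1/circA is kept (2 distinct tools); miR-2/circB is dropped (1 tool).
print(majority)
```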
@@ -31,7 +31,7 @@ process MIRNA_TARGETS {
bedtools intersect -a targetscan.bed -b miranda.bed | awk '{print \$6}' > mirna_type

## remove duplicate miRNA entries at MRE sites.
## strategy: sory by circs, sort by start position, sort by site type - the goal is to take the best site type (i.e rank site type found at MRE site).
## strategy: sort by circs, sort by start position, sort by site type - the goal is to take the best site type (i.e rank site type found at MRE site).
paste ${prefix}.mirnas.tmp mirna_type | sort -k3n -k2n -k7r | awk -v OFS="\\t" '{print \$4,\$1,\$2,\$3,\$5,\$6,\$7}' | awk -F "\\t" '{if (!seen[\$1,\$2,\$3,\$4,\$5,\$6]++)print}' | sort -k1,1 -k3n > ${prefix}.mirna_targets.tmp
echo -e "circRNA\\tmiRNA\\tStart\\tEnd\\tScore\\tEnergy_KcalMol\\tSite_type" | cat - ${prefix}.mirna_targets.tmp > ${prefix}.mirna_targets.txt

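To spell out the dedup strategy described in the comment above: the rows are sorted so that the preferred site type comes first, and the awk then keeps only the first record per MRE key (every field except the site type). A rough Python equivalent of the !seen[...]++ idiom, using toy rows in the column order of the header echoed by the script:

```python
# Toy rows: circRNA, miRNA, Start, End, Score, Energy_KcalMol, Site_type
sorted_rows = [
    ["circA", "miR-1", "10", "31", "150", "-18.2", "8mer"],
    ["circA", "miR-1", "10", "31", "150", "-18.2", "7mer-m8"],  # same MRE, lower-ranked site type
]

seen = set()
deduplicated = []
for row in sorted_rows:      # assumed pre-sorted so the best site type comes first
    key = tuple(row[:6])     # mirrors awk's seen[$1,$2,$3,$4,$5,$6]
    if key not in seen:
        seen.add(key)
        deduplicated.append(row)
# Only the 8mer row survives for this MRE.
```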
34 changes: 34 additions & 0 deletions modules/local/pita/main.nf
@@ -0,0 +1,34 @@
process PITA {
tag "$meta.id"
label 'process_high'

conda "${moduleDir}/environment.yml"
container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
'https://depot.galaxyproject.org/singularity/ubuntu:20.04' :
'nf-core/ubuntu:20.04' }"

input:
tuple val(meta), path(fasta)
tuple val(meta2), path(mature)

output:
tuple val(meta), path("*.tab"), emit: tsv
path "versions.yml", emit: versions

when:
task.ext.when == null || task.ext.when

script:
template "pita_prediction.pl"

stub:
def prefix = task.ext.prefix ?: "${meta.id}"

"""
touch ${prefix}.tab

cat <<-END_VERSIONS > versions.yml
"${task.process}":
END_VERSIONS
"""
}
@@ -0,0 +1,4 @@
Ivo Hofacker (all parts)
Peter Stadler (design, documentation, cluster algorithms)
Walter Fontana (suboptimal folding, pre-historic implementation of mfe folding)
Stefan Wuchty (suboptimal folding)
19 changes: 19 additions & 0 deletions modules/local/pita/templates/Bin/ViennaRNA/ViennaRNA-1.6/COPYING
@@ -0,0 +1,19 @@
Disclaimer and Copyright

The programs, library and source code of the Vienna RNA Package are free
software. They are distributed in the hope that they will be useful
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Permission is granted for research, educational, and commercial use
and modification so long as 1) the package and any derived works are not
redistributed for any fee, other than media costs, 2) proper credit is
given to the authors and the Institute for Theoretical Chemistry of the
University of Vienna.

If you want to include this software in a commercial product, please contact
the authors.

Note that the file ./lib/naview.c has its own copyright attached.
The ./Readseq/ directory contains a modified version of Don Gilbert's
public domain readseq program.