
Create nf-core module for immunedeconv and add to pipeline #368

Open
Tracked by #361
grst opened this issue Dec 3, 2024 · 5 comments

@grst
Member

grst commented Dec 3, 2024

immunedeconv supports cell-type deconvolution of bulk RNA-seq data given predefined sets of immune cell signatures.

grst changed the title from "immunedeconv" to "Create nf-core module for immunedeconv" on Dec 3, 2024
grst changed the title from "Create nf-core module for immunedeconv" to "Create nf-core module for immunedeconv and add to pipeline" on Dec 3, 2024
nschcolnicov self-assigned this on Dec 10, 2024
@nschcolnicov

Hi @grst, I created a proof of concept (POC) for this module: #390
Let me know what you think.
If we provide a matrix file in exactly the right format, the module works. However, it has some specific requirements that the differentialabundance test matrices don't meet. These are the ones I spotted while building the POC:

  • Only the first column may be a string column, whereas the differentialabundance matrix files contain two.
  • Values must be TPM-transformed, not log-transformed.

I wanted to get your input on whether this should be handled by the module: we could add a step that validates the columns, plus a warning telling the user to make sure the data is TPM-transformed.
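A minimal sketch of the validation proposed above, written in Python rather than the module's R script and assuming the matrix arrives as rows of strings with the header removed. `check_expression_matrix` is a hypothetical helper, and its thresholds (a maximum below 50 suggesting log scale, column sums within 1% of one million for TPM) are illustrative heuristics, not values taken from immunedeconv. Column sums only reach one million if no genes were filtered out beforehand.

```python
from typing import List, Sequence

# Heuristic thresholds: illustrative assumptions, not values from immunedeconv.
LOG_MAX_THRESHOLD = 50.0   # a TPM matrix almost always has values well above this
TPM_TARGET = 1_000_000.0   # TPM values of an unfiltered sample sum to one million
TPM_REL_TOLERANCE = 0.01   # accept column sums within 1% of the target


def check_expression_matrix(rows: Sequence[Sequence[str]]) -> List[str]:
    """Validate a gene-by-sample matrix given as rows of strings (header removed).

    Column 0 must be the only string column (gene identifier); all other columns
    must be numeric. Layout problems raise ValueError, while suspicious scaling
    only returns warnings so that execution can continue.
    """
    if not rows:
        raise ValueError("matrix is empty")
    n_samples = len(rows[0]) - 1
    if n_samples < 1:
        raise ValueError("expected an identifier column plus at least one sample column")

    column_sums = [0.0] * n_samples
    max_value = float("-inf")
    has_negative = False
    for i, row in enumerate(rows):
        if len(row) != n_samples + 1:
            raise ValueError(f"row {i} has {len(row)} columns, expected {n_samples + 1}")
        try:
            values = [float(x) for x in row[1:]]
        except ValueError:
            raise ValueError(
                f"row {i}: only the first column may be non-numeric, got {list(row)}"
            ) from None
        for j, value in enumerate(values):
            column_sums[j] += value
            max_value = max(max_value, value)
            has_negative = has_negative or value < 0

    warnings: List[str] = []
    # Negative values or a small maximum suggest log-scaled data.
    if has_negative or max_value < LOG_MAX_THRESHOLD:
        warnings.append("values look log-transformed; TPM values are expected")
    # Each TPM sample column should sum to roughly one million.
    for j, total in enumerate(column_sums):
        if abs(total - TPM_TARGET) > TPM_REL_TOLERANCE * TPM_TARGET:
            warnings.append(
                f"sample column {j + 1} sums to {total:.0f}, not ~1e6; data may not be TPM"
            )
    return warnings
```

On a toy matrix with values like `3.2` and `4.1`, this returns a log warning plus one TPM warning per sample, while a row such as `["ENSG00000153563", "CD8A", "1.0"]` raises `ValueError` because it has a second string column. Raising on layout problems but only warning on scaling matches the idea of not transforming data inside the module.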

@grst
Member Author

grst commented Dec 11, 2024

Thanks!

I wouldn't do any transformations of the data inside the module, but having checks can't hurt.

> while differentialabundance matrix files contain two.

IIRC, it can have any number of string columns; they can be set by a pipeline parameter.

A completely different approach would be to skip the separate module and go with a custom Quarto report instead? Not sure whether that would be acceptable for this pipeline...

@nschcolnicov

@grst I added a check for log and TPM transformation. It won't stop execution, but it will warn the user if the data seems to be log-transformed or doesn't look TPM-transformed.
I also added a section to the script that lets the user filter the matrix based on the features_name_col parameter, which is already used for similar tasks.

PR for this: #390
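The filtering step described above could look roughly like this. It is a sketch assuming the matrix is held as a header list plus rows of strings; `select_feature_column` is a hypothetical helper, not the code in #390. It keeps the column named by `features_name_col` as the single identifier column and drops every other column that is not fully numeric, which yields the one-string-column layout described earlier in this thread.

```python
from typing import List, Sequence, Tuple


def _is_numeric(value: str) -> bool:
    """True if the string parses as a float."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def select_feature_column(
    header: Sequence[str],
    rows: Sequence[Sequence[str]],
    features_name_col: str,
) -> Tuple[List[str], List[List[str]]]:
    """Keep `features_name_col` as the only identifier column, followed by every
    column whose values are all numeric; other string columns are dropped."""
    if features_name_col not in header:
        raise KeyError(f"column {features_name_col!r} not found in matrix header")
    id_idx = list(header).index(features_name_col)
    numeric_idx = [
        j for j in range(len(header))
        if j != id_idx and all(_is_numeric(row[j]) for row in rows)
    ]
    keep = [id_idx] + numeric_idx
    new_header = [header[j] for j in keep]
    new_rows = [[row[j] for j in keep] for row in rows]
    return new_header, new_rows
```

A sample column containing missing values such as `NA` would also be dropped by this rule, so a real implementation may prefer an explicit list of sample columns (for example, from the sample sheet).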

The one issue I'm seeing is that, even though the module works for the immunedeconv test data, it doesn't work for any of the matrices in our test data. They all raise this error:

Error in check_signature(sig.mat, mix.mat) :
  No match found between signature genes and identifiers provided in the expression data!
Calls: <Anonymous> ... deconvolute_quantiseq -> <Anonymous> -> check_signature
Execution halted

This doesn't seem to be a problem with the script but rather with the data itself. Do you know what might be causing it?
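One quick way to confirm that the failure comes from the data is to count how many signature genes appear among the matrix identifiers. Zero matches is the situation the `check_signature` error reports. A minimal sketch, where `signature_overlap` is a hypothetical helper and the gene lists are toy values rather than a real immunedeconv signature:

```python
from typing import Iterable, List, Tuple


def signature_overlap(
    signature_genes: Iterable[str], matrix_ids: Iterable[str]
) -> Tuple[int, int, List[str]]:
    """Return (matched genes, total signature genes, a few example matrix IDs)."""
    signature = set(signature_genes)
    ids = set(matrix_ids)
    matched = signature & ids
    # Printing a handful of matrix IDs makes an identifier mismatch obvious.
    examples = sorted(ids)[:3]
    return len(matched), len(signature), examples


if __name__ == "__main__":
    matched, total, examples = signature_overlap(
        ["CD8A", "CD4", "MS4A1"], ["ENSG00000153563", "ENSG00000010610"]
    )
    print(f"{matched}/{total} signature genes found; matrix IDs look like {examples}")
```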

@grst
Member Author

grst commented Dec 20, 2024

I suspect that the data uses ENSG identifiers, while immunedeconv requires gene symbols.
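A rough way to detect this case automatically is to check whether most identifiers match the human Ensembl gene ID pattern: `ENSG` followed by 11 digits, optionally with a `.N` version suffix. The sketch below is a heuristic under stated assumptions. `looks_like_ensembl` is hypothetical, the 90% threshold is arbitrary, and non-human Ensembl prefixes such as `ENSMUSG` are not covered.

```python
import re
from typing import Iterable

# Human Ensembl gene IDs: "ENSG" + 11 digits, optionally followed by a version suffix.
ENSEMBL_HUMAN_GENE = re.compile(r"^ENSG\d{11}(\.\d+)?$")


def looks_like_ensembl(ids: Iterable[str], min_fraction: float = 0.9) -> bool:
    """Heuristic: True if at least `min_fraction` of non-empty IDs are Ensembl gene IDs."""
    non_empty = [i.strip() for i in ids if i and i.strip()]
    if not non_empty:
        return False
    hits = sum(1 for i in non_empty if ENSEMBL_HUMAN_GENE.match(i))
    return hits / len(non_empty) >= min_fraction
```

A module could run this on the identifier column and emit a clear message ("identifiers look like Ensembl IDs; gene symbols are required") instead of the generic `check_signature` error.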

In the differentialabundance pipeline, it's possible to specify a column that contains gene symbols when ENSG IDs are the primary identifiers. I see several options:

(1) Document that the module requires gene symbols and leave it to the pipeline to provide them
(2) Allow an additional column in the gene expression matrix that contains gene symbols (and provide an option to choose that column)
(3) Perform gene identifier remapping within the module.

I'd lean towards (1) or maybe (2), but I'll leave it to you to decide what makes the most sense, also from the perspective of integrating it into the pipeline.
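For option (2), the module would need to re-key the matrix by the symbol column before deconvolution. A minimal sketch, assuming rows laid out as identifier, symbol, then sample values; `collapse_to_symbols` is a hypothetical helper. Rows without a symbol are dropped, and rows that share a symbol are summed per sample. Summing is an assumption made here to avoid discarding expression that maps to the same symbol; keeping the row with the highest expression is a common alternative.

```python
from typing import Dict, List, Sequence, Tuple


def collapse_to_symbols(
    rows: Sequence[Sequence[str]],
    symbol_col: int = 1,
    first_value_col: int = 2,
) -> List[Tuple[str, List[float]]]:
    """Re-key a matrix by gene symbol.

    Rows without a symbol (empty or "NA") are dropped; rows sharing a symbol
    are summed sample by sample. Output keeps first-seen symbol order.
    """
    totals: Dict[str, List[float]] = {}
    for row in rows:
        symbol = row[symbol_col].strip()
        if not symbol or symbol.upper() == "NA":
            continue
        values = [float(v) for v in row[first_value_col:]]
        if symbol in totals:
            totals[symbol] = [a + b for a, b in zip(totals[symbol], values)]
        else:
            totals[symbol] = values
    return list(totals.items())
```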

@nschcolnicov

Pipeline PR: #390
Depends on modules PR: nf-core/modules#7262
