Integration of the SCPCP000006 Wilms tumor dataset #857
Comments
Dear DataLab team, I am re-opening this issue as I am working on the integration of the Wilms tumor dataset now 😄 Regarding Ideally, I would like to minimally modify the I saw that you used a Thank you in advance! |
Hi @maud-p! I have a few thoughts/questions for you in response -
|
@maud-p One other comment here in relation to setting up |
Dear @sjspielman , dear @jashapiro , Thank you so much for your replies and explanations! I don't want to add much more work and challenges to your plate, so I think I'll try to make it without The reason why I wanted to try it is that I was very surprised by the integrated data using either I will clean my code and open a PR related to this issue, so maybe we can improve things together. I also came up with another idea to test (and hopefully validate) the annotation workflow: running it on the Wilms tumor 14 dataset, which contains paired tumor samples and O-PDX. Under the hypothesis that O-PDX samples shouldn't contain any normal (human) cells from the patient, we could use this to check the annotated normal/cancer cells. I'll open a new issue to discuss it further! Thank you! |
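A minimal sketch of the O-PDX check described above, in plain Python. It assumes each cell has a sample group and an annotation label; the function name, group names, and labels below are hypothetical and are not taken from the module. Under the stated hypothesis, the fraction of cells annotated as normal should be close to zero in O-PDX samples.

```python
from collections import defaultdict


def normal_fraction_by_group(sample_groups, annotations, normal_labels):
    """Return the fraction of cells annotated as normal within each sample group."""
    totals = defaultdict(int)
    normals = defaultdict(int)
    for group, annotation in zip(sample_groups, annotations):
        totals[group] += 1
        if annotation in normal_labels:
            normals[group] += 1
    return {group: normals[group] / totals[group] for group in totals}


# Toy data (hypothetical): four tumor cells and four O-PDX cells.
groups = ["tumor"] * 4 + ["O-PDX"] * 4
annotations = [
    "cancer", "kidney", "cancer", "normal stroma",  # tumor sample
    "cancer", "cancer", "cancer", "kidney",         # O-PDX sample
]
fractions = normal_fraction_by_group(
    groups, annotations, normal_labels={"kidney", "normal stroma"}
)
print(fractions)
```

A clearly non-zero normal fraction in the O-PDX group would point to cells that the workflow may have mislabelled as normal.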
Hi @maud-p, I'm Ally, one of the other data scientists at ALSF. I just wanted to chime in a little since I did some work a while ago testing different integration methods.
What do you mean they overlap? I'm assuming this means that on the UMAP you see the cell types close to each other, but if you re-cluster the integrated results, do you see that cells of different types fall in different clusters, or do they belong to the same cluster? One good metric that we've used previously is the cLISI, which measures how close each cell is to other cells that belong to the same cell type in the PCA. We've used this before in our work, and here's a function we wrote a while ago that might help guide you if you did want to use this metric. Also happy to talk more about using integration metrics in general that could help in identifying if you have "good" integration. That being said, we've definitely seen that the Seurat methods can lead to over-integration! So this doesn't surprise me at all. I totally agree that |
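To make the idea behind cLISI concrete, here is a simplified sketch in plain Python. It is not the function linked above and not the reference LISI implementation, which weights neighbors with a perplexity-based kernel. This version assumes an unweighted k-nearest-neighbor neighborhood and computes the inverse Simpson's index of cell type labels around each cell. The function name and toy data are hypothetical. A score of 1 means all neighbors share one cell type; higher scores mean the neighborhood mixes cell types.

```python
import math
from collections import Counter


def simple_clisi(embedding, cell_types, k=3):
    """Unweighted kNN approximation of cLISI: one score per cell."""
    scores = []
    for i, point in enumerate(embedding):
        # Distances from this cell to every other cell in the embedding.
        dists = sorted(
            (math.dist(point, other), j)
            for j, other in enumerate(embedding)
            if j != i
        )
        neighbor_labels = [cell_types[j] for _, j in dists[:k]]
        n_neighbors = len(neighbor_labels)
        counts = Counter(neighbor_labels)
        # Simpson's index: probability that two random neighbors share a label.
        simpson = sum((n / n_neighbors) ** 2 for n in counts.values())
        scores.append(1.0 / simpson)
    return scores


# Toy 2D "PCA" coordinates: two well-separated groups of four cells each.
coords = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
separated_labels = ["a", "a", "a", "a", "b", "b", "b", "b"]
mixed_labels = ["a", "b", "a", "b", "a", "b", "a", "b"]

separated_scores = simple_clisi(coords, separated_labels, k=3)
mixed_scores = simple_clisi(coords, mixed_labels, k=3)
print(separated_scores)
print(mixed_scores)
```

In this toy example the separated labelling gives a score of 1 for every cell, while the mixed labelling gives scores above 1, which is the pattern one would expect when cell types are blended together after integration.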
To jump off of what @allyhawkins wrote, I'll also point out (though this may not make a huge difference) that the |
Hi @allyhawkins , @sjspielman , Thank you so much, it all makes a lot of sense! I'll try the To answer your question @allyhawkins , I have clusters mixed with different cell types. Maybe before the PR, just to illustrate how it looks with Globally I'd say it is not too bad, but the annotated normal cells (kidney in green and normal stroma in turquoise) are spread all over the UMAP without forming specific clusters. |
If you are filing this issue based on a specific GitHub Discussion, please link to the relevant Discussion.
This issue is related to steps 6 and 7 that I described in my proposed analysis #635 (comment)
Describe the goals of the changes to the analysis module.
Step 6 – validation by integration of the 40 samples
I would like to integrate the 40 snRNA-seq samples using scVI or harmony, then perform dimensionality reduction and clustering. This will allow us to validate our annotations, as cells of the same cell type should cluster together. At the sample level, normal and cancer cells from the same histology cluster together; this might not be the case in the integrated dataset (hopefully 🤞 ).
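One simple way to check whether cells of the same type cluster together is to tabulate annotations per cluster. The sketch below is plain Python and assumes two parallel lists, one with cluster assignments from the integrated data and one with cell type annotations; the function name and labels are hypothetical.

```python
from collections import Counter, defaultdict


def cluster_composition(clusters, cell_types):
    """For each cluster, return its dominant cell type and that type's fraction."""
    by_cluster = defaultdict(Counter)
    for cluster, cell_type in zip(clusters, cell_types):
        by_cluster[cluster][cell_type] += 1
    summary = {}
    for cluster, counts in by_cluster.items():
        top_type, top_n = counts.most_common(1)[0]
        summary[cluster] = (top_type, top_n / sum(counts.values()))
    return summary


# Toy data (hypothetical): seven cells in two clusters.
clusters = [0, 0, 0, 1, 1, 1, 1]
cell_types = ["blastema", "blastema", "stroma", "kidney", "kidney", "kidney", "kidney"]
composition = cluster_composition(clusters, cell_types)
print(composition)
```

Clusters whose dominant fraction is well below 1 are the ones where annotations and integrated clustering disagree and are worth inspecting.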
Step 7 – identification of marker genes for each cell subtype using differential expression analysis
Finally, we would like to provide the WT community with specific and universal marker genes for rapid identification of the different cell types found within the tumors. To do so, we will use pseudobulk differential expression analyses (DElegate package) to find markers of the different cell types using the function FindAllMarkers2 (default parameters, patient as replicate). We would also like to further validate candidate Wilms tumor marker genes in the Visium data and/or in FFPE samples (IHC) and in vitro models (IF).
Additionally, we could compare relapse and non-relapse samples per cell type using the function findDE (replicate_column = "patient", method = "edger") to evaluate whether a specific phenotype within the cancer cells or the microenvironment could indicate relapse in WT.
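For readers unfamiliar with pseudobulk analysis, the aggregation step can be sketched as follows. This is an illustration in plain Python, not DElegate's implementation: it assumes each cell is a dictionary of gene counts and sums counts per (patient, cell type) pair, so that each patient acts as a replicate. The function name and gene names are hypothetical.

```python
from collections import defaultdict


def pseudobulk(counts, patients, cell_types):
    """Sum per-cell gene counts into one profile per (patient, cell type)."""
    totals = defaultdict(lambda: defaultdict(int))
    for cell_counts, patient, cell_type in zip(counts, patients, cell_types):
        for gene, n in cell_counts.items():
            totals[(patient, cell_type)][gene] += n
    return {key: dict(genes) for key, genes in totals.items()}


# Toy data (hypothetical): four cells from two patients.
counts = [
    {"geneA": 3, "geneB": 0},
    {"geneA": 1, "geneB": 2},
    {"geneA": 0, "geneB": 5},
    {"geneA": 4, "geneB": 1},
]
patients = ["P1", "P1", "P1", "P2"]
cell_types = ["cancer", "cancer", "stroma", "cancer"]

profiles = pseudobulk(counts, patients, cell_types)
print(profiles)
```

The resulting profiles, one per patient and cell type, are what a bulk method such as edgeR would then compare.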
What will your pull request contain?
Will you require additional software beyond what is already in the analysis module?
scvi integration requires a conda environment.
Will you require different computational resources beyond what the analysis module already uses?
The integration of the 40 samples will require substantial resources and might not run in cli.
If known, when do you expect to file the pull request?
I have quite a lot of wet lab work pending and I am not sure when I'll be able to focus on the follow-up analysis described here, maybe sometime in December.
But I would like to do these analyses, which will hopefully allow improved marker identification for cancer versus normal cells and for specific histological subtypes (epithelial, blastemal, stromal). This is crucial for our future research and would be valuable for the Wilms tumor community.
Part of the work might be out of the scope of the Open-ScPCA project; I am happy to discuss with you if/how you would like to continue the analysis!