Feature request - Rhapsody whole genome sequences #106
Comments
Awesome! Regarding the actual feature request, could I ask for a bit more clarification? I'm not sure what the requested feature is exactly, or what the requirement would be for --Rob |
Hi Rob, what I am dreaming of is a way to use salmon (most likely) here: https://github.com/stela2502/Rustody/blob/8befa2e774caba0f5037b57ceb23bac7d18bac8d/src/bin/quantify_rhapsody.rs#L348. |
So, to sum up the long version once more: I need to somehow map the R2 read to a genome-wide index, and I do not have the time to implement that, as I fail at the most basic stuff. I am not a trained informatics guy after all :-( |
I just found this issue when investigating whether we would easily be able to adapt our pipeline to accommodate BD Rhapsody data. It looks like the R1 sequence structure (as described in their doc (pdf) could be accommodated by salmon |
Hi @jashapiro, Is your use-case for single-cell transcriptomics? We are currently working on a "general" solution to such problems — with increasingly complicated barcoding mechanisms. Currently the |
Hi, in the end I just (re-)implemented a whole-genome-enabled mapper. It uses a u16 representation of an 8 bp fragment of the read to identify the most likely region in a u16::MAX-long vector of 8 bp to 32 bp downstream mappers. This works for the targeted approach and should scale up to the whole genome. You can look at it here: https://github.com/stela2502/Rustody/blob/new_mapper/this/src/fast_mapper.rs. |
I normally see up to 80% PCR duplicates in the data, so I am not sure whether trying to catch each and every read is even worth it. I would not expect the final counts to change in a meaningful way. |
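For readers who want a concrete picture of the seed lookup described in the comment above, here is a minimal sketch. It is not the code in `fast_mapper.rs`: the names `encode_8mer` and `SeedIndex`, the bucket layout, and the matching rule are all assumptions made for illustration. It also allocates 65,536 buckets (one per possible 8-mer code) rather than `u16::MAX`, so that the code for `TTTTTTTT` has a slot.

```rust
/// Encode an 8 bp fragment as a u16, using 2 bits per base.
/// Returns None for a wrong length or any base other than A, C, G, T.
fn encode_8mer(seq: &[u8]) -> Option<u16> {
    if seq.len() != 8 {
        return None;
    }
    let mut code: u16 = 0;
    for &base in seq {
        let bits = match base {
            b'A' => 0u16,
            b'C' => 1,
            b'G' => 2,
            b'T' => 3,
            _ => return None,
        };
        code = (code << 2) | bits;
    }
    Some(code)
}

/// Hypothetical index: one bucket per 8-mer code. Each bucket holds the
/// 8 to 32 bases that follow the seed in a reference, plus a gene id.
struct SeedIndex {
    buckets: Vec<Vec<(Vec<u8>, usize)>>,
}

impl SeedIndex {
    fn new() -> Self {
        SeedIndex { buckets: vec![Vec::new(); 1 << 16] }
    }

    /// Index every seed position that has at least 8 downstream bases.
    fn add_reference(&mut self, reference: &[u8], gene_id: usize) {
        if reference.len() < 16 {
            return;
        }
        for start in 0..=(reference.len() - 16) {
            if let Some(code) = encode_8mer(&reference[start..start + 8]) {
                let end = (start + 8 + 32).min(reference.len());
                let ext = reference[start + 8..end].to_vec();
                self.buckets[code as usize].push((ext, gene_id));
            }
        }
    }

    /// Return the gene id of the first seed whose stored downstream bases
    /// agree with the read over their shared length (at least 8 bases).
    fn lookup(&self, read: &[u8]) -> Option<usize> {
        if read.len() < 16 {
            return None;
        }
        for start in 0..=(read.len() - 16) {
            let code = match encode_8mer(&read[start..start + 8]) {
                Some(c) => c,
                None => continue,
            };
            let tail = &read[start + 8..];
            for (ext, gene_id) in &self.buckets[code as usize] {
                let n = ext.len().min(tail.len());
                if ext[..n] == tail[..n] {
                    return Some(*gene_id);
                }
            }
        }
        None
    }
}
```

Calling `add_reference` once per target sequence and then `lookup` per R2 read gives the basic flow. A real mapper would also need to handle mismatches, the reverse strand, and seeds shared between genes, none of which this sketch attempts.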
@jashapiro I would be interested in any approach for dealing with variable bases or "diversity inserts" in modern (2023) BD Rhapsody data. The library structure is detailed here as well: https://teichlab.github.io/scg_lib_structs/methods_html/BD_Rhapsody.html The official CWL pipeline has not been too satisfactory for us. |
cc @noahcape & @Daniel-Liu-c0deb0t: Could this modern BD Rhapsody data be a usecase for |
Hi all. After some time developing a BAM output, and therefore also implementing a CIGAR class, my mapping speed dropped significantly, to the point where I dropped the project. |
Hi, with the help of Rob I managed to get a Rust BD Rhapsody analysis tool to work (https://github.com/stela2502/Rustody).
It supports all versions of BD's beads at the moment, but not the (new) whole genome part.
Rhapsody data consists of several kinds of data: antibody tags, which are encoded in R2, and sample tags, which are encoded there, too. A newer version also offers TCR and BCR sequencing options. I have coded the targeted sequence matching using 32 bp k-mers from the provided FASTA files (only ~500 FASTA sequences). But I am not very keen on scaling this up to whole-genome analysis.
Hence my question: Is there any way you could implement the BD cell IDs in alevin-fry? It could be as simple as using my cellids class. Or could I use your library to map all sequences my tool cannot handle?
The second option would of course be my favorite: using only your genome representation and the mapper. It would be absolutely epic if you could help me here!
Thank you!
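As an illustration of the targeted matching mentioned in the post above (32 bp k-mers taken from a small set of FASTA sequences), the following is a rough sketch only. It is not Rustody's implementation: `pack_32mer`, `build_index`, and `classify_read` are hypothetical names, and marking k-mers shared between targets as ambiguous is an assumed design choice.

```rust
use std::collections::HashMap;

const K: usize = 32;

/// Pack a 32 bp window into a u64, using 2 bits per base.
/// Returns None for a wrong length or any base other than A, C, G, T.
fn pack_32mer(window: &[u8]) -> Option<u64> {
    if window.len() != K {
        return None;
    }
    let mut code = 0u64;
    for &base in window {
        let bits = match base {
            b'A' => 0u64,
            b'C' => 1,
            b'G' => 2,
            b'T' => 3,
            _ => return None,
        };
        code = (code << 2) | bits;
    }
    Some(code)
}

/// Map every 32-mer to the index of the target it came from.
/// A k-mer seen in two different targets is stored as None (ambiguous).
fn build_index(targets: &[&[u8]]) -> HashMap<u64, Option<usize>> {
    let mut index = HashMap::new();
    for (id, seq) in targets.iter().enumerate() {
        for window in seq.windows(K) {
            if let Some(code) = pack_32mer(window) {
                index
                    .entry(code)
                    .and_modify(|hit: &mut Option<usize>| {
                        if *hit != Some(id) {
                            *hit = None;
                        }
                    })
                    .or_insert(Some(id));
            }
        }
    }
    index
}

/// Return the target of the first unambiguous 32-mer found in the read.
fn classify_read(index: &HashMap<u64, Option<usize>>, read: &[u8]) -> Option<usize> {
    read.windows(K)
        .filter_map(pack_32mer)
        .find_map(|code| index.get(&code).copied().flatten())
}
```

With only a few hundred target sequences, a hash map of this kind stays small. The memory cost of the same exact-match approach grows with the number of distinct k-mers, which is one reason a dedicated genome index is attractive for the whole-genome case.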