Questions on count distribution from *_dis files #41

Open
AlkaidWang opened this issue Aug 18, 2023 · 3 comments

Comments

@AlkaidWang

Dear maintainers,

I checked the count distributions in the *_dis files and compared them with the read counts from the BAM files. For a given MSI locus, I expected the sum of the count distribution to equal the total number of reads in the BAM file. However, when I check the read count in IGV, the sum from the *_dis file is always lower than the total read count that IGV shows for the BAM file. Is any filtering applied when the reads are counted?

For example:

The MONO27 MSI site:
chr2 39573062 GTCTC 27[A] GAGTG
T: 0 0 0 0 0 0 0 0 0 0 0 16 34 104 263 434 639 674 507 293 104 18 4 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

The counts sum to 3091, whereas IGV shows 3896 total reads at the start of the MONO27 site and 4341 at its end.

I look forward to your reply. Thanks.
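The sum quoted above can be reproduced with a short script. This is only a sketch: `parse_dis_line` is a hypothetical helper, not part of MSIsensor, and the trailing zero bins of the `T:` line are truncated here because they do not change the total.

```python
# Tumor ("T:") distribution line from the example above.
# Trailing zero bins are truncated; they do not affect the sum.
dis_line = (
    "T: 0 0 0 0 0 0 0 0 0 0 0 "
    "16 34 104 263 434 639 674 507 293 104 18 4 1 0 0 0"
)

def parse_dis_line(line):
    """Split a 'LABEL: n n n ...' line into its label and integer counts."""
    label, _, values = line.partition(":")
    return label.strip(), [int(v) for v in values.split()]

label, counts = parse_dis_line(dis_line)
total = sum(counts)

# Read depths read off IGV at the start and end of the site (from the post).
igv_start, igv_end = 3896, 4341

print(f"{label}: {total} reads counted in the distribution")
print(f"share of IGV depth at site start: {total / igv_start:.1%}")
print(f"share of IGV depth at site end:   {total / igv_end:.1%}")
```

Roughly a fifth to a quarter of the reads visible in IGV are therefore missing from the distribution at this locus.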

@ZhaoDanOnGitHub

ZhaoDanOnGitHub commented Aug 18, 2023 via email

@observer2735

Dear @AlkaidWang,

Thank you for your support and trust in our work. What you describe is a noteworthy issue. When MSIsensor was being developed, the TCGA groups could only obtain reads 100 bp long, so the longest span in the code is 110 bp. For example, the locus you mentioned, MONO27, looks like this:

-110bp GTCTC 27[A] GAGTG +73bp

The leading 110 bp comes from the MSIsensor setting of 110 bp (you can find it in the Ding lab code by searching for the keyword "MAX READ LENGTH"), and the trailing 73 bp is 110 bp - 5 bp - 5 bp - 27 bp (counted from the start of this locus). The code is set up in this complicated way because, early in MSIsensor's development, reads from the target region could not be obtained easily, and the only option was the "bam_fetch" function in the samtools library.
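The window arithmetic in this explanation can be checked directly. The sketch below only restates the numbers given in the reply; the variable names are illustrative and are not taken from the MSIsensor source.

```python
# Values quoted in the reply; names are illustrative, not MSIsensor identifiers.
max_read_length = 110           # span limit set in MSIsensor, in bp
left_flank = "GTCTC"            # 5 bp flank before the repeat
right_flank = "GAGTG"           # 5 bp flank after the repeat
repeat_unit, repeat_times = "A", 27

repeat_length = len(repeat_unit) * repeat_times

# Bases left over downstream once both flanks and the repeat are subtracted.
downstream = max_read_length - len(left_flank) - len(right_flank) - repeat_length

print(f"-{max_read_length}bp {left_flank} {repeat_times}[{repeat_unit}] "
      f"{right_flank} +{downstream}bp")
```

Running it prints the same layout as the line shown above, with 73 bp on the downstream side.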

Your maximum read length is probably longer than 110 bp, which is why the output looks confusing. When MSIsensor was first published this was not a problem, because every user had to compile it from source and could easily change the parameter. Now, however, you are using MSIsensor2, whose source code has not been published, so it may not cover all the reads that you see in IGV.

I apologize for the inconvenience. As far as I know, the mismatch between the read counts reported by MSIsensor2 and those displayed in IGV does not significantly affect MSIsensor2's detection results, because after you input the data, MSIsensor2 runs a step called "Normalization" that converts the read counts into read frequencies.

We will fix this bug in the next released version, and I hope it does not affect your experience in the meantime.
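As an illustration of why frequencies are less sensitive to the absolute read count, here is a minimal sketch. It assumes that "Normalization" means dividing each bin by the total, which the reply does not spell out, and `to_frequencies` is a hypothetical helper rather than an MSIsensor2 function.

```python
# Nonzero bins of the T: line from the original post.
counts = [16, 34, 104, 263, 434, 639, 674, 507, 293, 104, 18, 4, 1]

def to_frequencies(values):
    """Assumed normalization: each bin divided by the total count."""
    total = sum(values)
    if total == 0:
        return [0.0] * len(values)
    return [v / total for v in values]

observed = to_frequencies(counts)

# If every bin had been captured at twice the depth, the frequencies
# would be identical, so a uniform loss of reads leaves them unchanged.
doubled = to_frequencies([2 * v for v in counts])
max_diff = max(abs(a - b) for a, b in zip(observed, doubled))

print(f"peak frequency: {max(observed):.3f}")
print(f"largest change after doubling depth: {max_diff:.1e}")
```

This only holds exactly when the missing reads are spread evenly across the bins; reads dropped because of the length limit need not be, which is consistent with the reply saying the effect is not significant rather than zero.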

Thank you for your understanding. Wishing you a pleasant life and smooth work.

Yours,
Ji

@AlkaidWang
Author

Got you. Thank you so much! What a good team and a useful tool!
