-
Notifications
You must be signed in to change notification settings - Fork 45
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Number of sequences in BAM header mismatches reference #126
Comments
It seems that perhaps the BamHeader parser is treating the extra sequences
differently than the reference parser. What are the tags in the BamHeader
or are you able to post an example?
The other issue is that the reference is sorted with chr10 first, and the
header has chr2 first. Are you sure the order in the reference is the same
as in the BAM header? You’re right though, this warning is meant to get
users to align to the same reference, but you already did this.
…On Tue, Jul 25, 2023 at 7:11 PM tnnandi ***@***.***> wrote:
Hi,
When I am using SvABA with the hg38 (no alt) reference file and a BAM file
obtained by aligning a whole human genome to the same reference, I get the
following error (only the first few lines are shown):
!!!!!!!!!!! WARNING !!!!!!!!!!!
!!!!!! Number of sequences in BAM header mismatches reference
!!!!!! BAM: 3366 -- Ref: 194
!!!!!! BAM sequence id 1: "chr2" -- Ref sequence id 1: "chr10"
!!!!!! BAM sequence id 2: "chr3" -- Ref sequence id 2: "chr11"
!!!!!! BAM sequence id 3: "chr4" -- Ref sequence id 3: "chr11_KI270721v1_random"
!!!!!! BAM sequence id 4: "chr5" -- Ref sequence id 4: "chr12"
!!!!!! BAM sequence id 5: "chr6" -- Ref sequence id 5: "chr13"
!!!!!! BAM sequence id 6: "chr7" -- Ref sequence id 6: "chr14"
!!!!!! BAM sequence id 7: "chr8" -- Ref sequence id 7: "chr14_GL000009v2_random"
!!!!!! BAM sequence id 8: "chr9" -- Ref sequence id 8: "chr14_GL000225v1_random"
!!!!!! BAM sequence id 9: "chr10" -- Ref sequence id 9: "chr14_KI270722v1_random"
!!!!!! BAM sequence id 10: "chr11" -- Ref sequence id 10: "chr14_GL000194v1_random"
!!!!!! BAM sequence id 11: "chr12" -- Ref sequence id 11: "chr14_KI270723v1_random"
.....
This error arises from the following condition in
src/savaba/run_savaba.cpp:
if (b_header.NumSequences() != bwa_header.NumSequences()),
and when I do samtool view -H <bamfile> | wc -l I do indeed get a number
close to 3366 but that include lot of information in addition to that of
just the sequences (e.g., steps for alignment).
I have ensured that the BAM file has indeed been obtained by mapping to
the same reference genome (I re-did it myself), so I'm not sure how to get
around this issue.
Thanks,
Tarak
—
Reply to this email directly, view it on GitHub
<#126>, or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ABUZ7CETQJEOUTEGC3IHYMDXSBHAZANCNFSM6AAAAAA2XYA5J4>
.
You are receiving this because you are subscribed to this thread.Message
ID: ***@***.***>
|
Hi @walaj , thank you very much for the prompt response! Here's the link to a file that contains the header of the bam file + first few entries: https://drive.google.com/file/d/1i66328Q4bmjMAGhD_KF6Z4mQcWBVpjua/view?usp=drive_link The first line of the reference file looks like this (it starts with chr1):
Also, this seems to be more than just a warning as the code fails if Thanks, |
This BAM header has 3366 sequence tags as svaba reports, so I don't see an
issue here. You'll have to confirm how many sequences the reference fasta
has.
…On Tue, Jul 25, 2023 at 9:00 PM tnnandi ***@***.***> wrote:
Hi @walaj <https://github.com/walaj> , thank you very much for the prompt
response!
Here's the link to a file that contains the header of the bam file + first
few entries:
https://drive.google.com/file/d/1i66328Q4bmjMAGhD_KF6Z4mQcWBVpjua/view?usp=drive_link
The first line of the reference file looks like this (it starts with chr1):
chr1
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
Also, this seems to be more than just a warning (it is in fact an error
notification) as the code fails if trigger_explain is True that is caused
by
for (int i = 0; i < std::min(b_header.NumSequences(), bwa_header.NumSequences()); ++i) {
if (b_header.IDtoName(i) != bwa_header.IDtoName(i)) {
trigger_explain = true;
Thanks,
Tarak
—
Reply to this email directly, view it on GitHub
<#126 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/ABUZ7CEZ4ZPMI53GXBZ4TYLXSBT33ANCNFSM6AAAAAA2XYA5J4>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
Hi tnnandi, have you solved your issue? I'm also facing the same problem |
I still think the reference and the header are different. @zhaowsong can you confirm with |
Hi,
When I am using SvABA with the hg38 (no alt) reference file and a BAM file obtained by aligning a whole human genome to the same reference, I get the following error (only the first few lines are shown):
This error arises from the following condition in src/savaba/run_savaba.cpp:
if (b_header.NumSequences() != bwa_header.NumSequences())
,and when I do
samtools view -H <bamfile> | wc -l
I do indeed get a number close to 3366 but that includes lot of information in addition to that of just the sequences (e.g., steps for alignment).So, I don't see any reason why both headers (for the bam file and the reference file) should be of the same length if we are not limiting our comparison to only those lines that contain the sequence information.
I have ensured that the BAM file has indeed been obtained by mapping to the same reference genome (I re-did it myself), so I'm not sure how to get around this issue.
Thanks,
Tarak
The text was updated successfully, but these errors were encountered: