Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bcftools annotate can segfault when --remove and --annotations used at once #1414

Closed
freeseek opened this issue Feb 15, 2021 · 1 comment · Fixed by samtools/htslib#1234
Closed

Comments

@freeseek
Copy link
Contributor

The following example reproduces the problem. First create a sample VCF:

(echo "##fileformat=VCFv4.2"
echo "##contig=<ID=chr1,length=248956422>"
echo "##FORMAT=<ID=X,Number=1,Type=Integer,Description=\"X\">"
echo -e "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSM"
echo -e "chr1\t833602\t.\tG\tA\t.\t.\t.\tX\t1000000") | \
  bcftools view --no-version -Oz -o in.vcf.gz && \
  bcftools index -f in.vcf.gz

Then run:

$ valgrind bcftools annotate -c "FMT/X" -x FMT/X -a in.vcf.gz in.vcf.gz
==608290== Memcheck, a memory error detector
==608290== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==608290== Using Valgrind-3.16.1 and LibVEX; rerun with -h for copyright info
==608290== Command: /tmp/bcftools/bcftools annotate -c FMT/X -x FMT/X -a in.vcf.gz in.vcf.gz
==608290== 
==608290== Invalid write of size 1
==608290==    at 0x4843C33: memmove (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==608290==    by 0x215054: memcpy (string_fortified.h:34)
==608290==    by 0x215054: bcf_update_format (vcf.c:4330)
==608290==    by 0x16B861: annotate (vcfannotate.c:2955)
==608290==    by 0x16B861: main_vcfannotate (vcfannotate.c:3183)
==608290==    by 0x4AF8CB1: (below main) (libc-start.c:314)
==608290==  Address 0xfffffffffffffffd is not stack'd, malloc'd or (recently) free'd
==608290== 
==608290== 
==608290== Process terminating with default action of signal 11 (SIGSEGV)
==608290==  Access not within mapped region at address 0xFFFFFFFFFFFFFFFD
==608290==    at 0x4843C33: memmove (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==608290==    by 0x215054: memcpy (string_fortified.h:34)
==608290==    by 0x215054: bcf_update_format (vcf.c:4330)
==608290==    by 0x16B861: annotate (vcfannotate.c:2955)
==608290==    by 0x16B861: main_vcfannotate (vcfannotate.c:3183)
==608290==    by 0x4AF8CB1: (below main) (libc-start.c:314)
==608290==  If you believe this happened as a result of a stack
==608290==  overflow in your program's main thread (unlikely but
==608290==  possible), you can try to increase the size of the
==608290==  main thread stack using the --main-stacksize= flag.
==608290==  The main thread stack size used in this run was 8388608.
==608290== 
==608290== HEAP SUMMARY:
==608290==     in use at exit: 380,644 bytes in 1,032 blocks
==608290==   total heap usage: 1,919 allocs, 887 frees, 5,954,465 bytes allocated
==608290== 
==608290== LEAK SUMMARY:
==608290==    definitely lost: 0 bytes in 0 blocks
==608290==    indirectly lost: 0 bytes in 0 blocks
==608290==      possibly lost: 0 bytes in 0 blocks
==608290==    still reachable: 380,644 bytes in 1,032 blocks
==608290==         suppressed: 0 bytes in 0 blocks
==608290== Rerun with --leak-check=full to see details of leaked memory
==608290== 
==608290== For lists of detected and suppressed errors, rerun with: -s
==608290== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
Segmentation fault (core dumped)

What is going on is the bcftools annotate first calls vcf.c function bcf_update_format() to remove the X format with code:

    if ( !n )
    {
        if ( fmt )
        {
            // Mark the tag for removal, free existing memory if necessary
            if ( fmt->p_free )
            {
                free(fmt->p - fmt->p_off);
                fmt->p_free = 0;
            }
            line->d.indiv_dirty = 1;
            fmt->p = NULL;
        }
        return 0;
    }

And then it calls bcf_update_format() again when annotating and in doing so it tries to write in the same region of memory previously cleared:

        // The tag is already present, check if it is big enough to accommodate the new block
        if ( str.l <= fmt->p_len + fmt->p_off )
        {
            // good, the block is big enough
            if ( str.l != fmt->p_len + fmt->p_off ) line->d.indiv_dirty = 1;
            uint8_t *ptr = fmt->p - fmt->p_off;
            memcpy(ptr, str.s, str.l);
            free(str.s);
            int p_free = fmt->p_free;
            bcf_unpack_fmt_core1(ptr, line->n_sample, fmt);
            fmt->p_free = p_free;
        }

As fmt->p - fmt->p_off was previously freed, this causes the application to segfault. One possible solution would be to clear the VCF line structure before starting the annotation process in vcfannotate.c by adding:

        bcf_clear(line);

before

        annotate(args, line);

I don't understand well enough the way bcftools annotate works to know whether a more efficient solution is possible or whether HTSlib should change to avoid code that could end up in this scenario where a format field is first removed but, to maximize efficiency, it is still seen as available until the entire VCF line structure is cleared.

@pd3
Copy link
Member

pd3 commented Feb 16, 2021

This should be fixed by samtools/htslib#1234, once accepted.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants