Version 0.7.11

@etal etal released this 20 Apr 23:22
· 1031 commits to master since this release

New dependency on pyfaidx, a Python library for handling samtools-style FASTA indexes (.fai).
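For readers unfamiliar with the .fai format, here is a minimal sketch of how such an index allows random access into a FASTA file. It uses only the Python standard library and is not pyfaidx's implementation; the function names `read_fai` and `fetch` are hypothetical. It assumes the standard five tab-separated .fai columns (sequence name, length, byte offset, bases per line, bytes per line).

```python
import os
import tempfile


def read_fai(fai_path):
    """Parse a .fai index into {name: (length, offset, linebases, linewidth)}."""
    index = {}
    with open(fai_path) as handle:
        for line in handle:
            name, length, offset, linebases, linewidth = line.rstrip("\n").split("\t")[:5]
            index[name] = (int(length), int(offset), int(linebases), int(linewidth))
    return index


def fetch(fasta_path, index, name, start, end):
    """Return bases [start, end) (0-based, half-open) of sequence `name`."""
    length, offset, linebases, linewidth = index[name]
    end = min(end, length)
    if start >= end:
        return ""
    # Each full line holds `linebases` bases but occupies `linewidth` bytes
    # (the difference is the line terminator), so map base -> byte position.
    first = offset + (start // linebases) * linewidth + start % linebases
    last = offset + (end // linebases) * linewidth + end % linebases
    with open(fasta_path, "rb") as handle:
        handle.seek(first)
        raw = handle.read(last - first)
    # Strip the line terminators that fall inside the requested span.
    return raw.decode("ascii").replace("\n", "").replace("\r", "")


# Usage: a toy FASTA and a hand-written matching index in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    fasta_path = os.path.join(tmp, "toy.fa")
    with open(fasta_path, "w", newline="\n") as out:
        out.write(">chr1\nACGTACGT\nTTGGCCAA\nAC\n")
    with open(fasta_path + ".fai", "w", newline="\n") as out:
        out.write("chr1\t18\t6\t8\t9\n")
    index = read_fai(fasta_path + ".fai")
    first_four = fetch(fasta_path, index, "chr1", 0, 4)
    across_lines = fetch(fasta_path, index, "chr1", 6, 10)
```

Because the byte position is computed arithmetically, only the requested span is read from disk, which is why an index makes sequence extraction from a large genome cheap.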

export vcf:

  • Add CNVkit version and current date (i.e. local calendar date that the
    "cnvkit.py export vcf" command was run) to the VCF header.
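As an illustration, the header additions could look like the sketch below. The exact lines CNVkit writes are not stated in these notes; the `fileDate` and `source` keys are standard VCF meta-information lines, and the function name `vcf_header_lines` and the "CNVkit v" prefix are assumptions made for this example.

```python
import datetime


def vcf_header_lines(version, today=None):
    """Build VCF meta-information lines recording the tool version and run date.

    `version` is the tool's version string; `today` defaults to the local
    calendar date at the time of the call.
    """
    if today is None:
        today = datetime.date.today()
    return [
        "##fileformat=VCFv4.2",
        "##fileDate=" + today.strftime("%Y%m%d"),  # local date, YYYYMMDD
        "##source=CNVkit v" + version,  # assumed wording, for illustration
    ]


# Usage with a fixed, arbitrary date so the result is reproducible.
header = vcf_header_lines("0.7.11", datetime.date(2000, 1, 31))
```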

export theta:

  • Given a VCF of SNVs called jointly in paired tumor and normal samples,
    extract SNP allele counts to THetA2's custom input format
    ("snp_formatted.txt"). The two additional files CNVkit generates this way can
    be used with THetA2's "--TUMOR_SNP" and "--NORMAL_SNP" options to improve
    estimates of tumor purity and clonality.
  • Use CNVkit's segment weights and probe counts to estimate normal-sample read
    counts for each segment if no copy number reference profile (.cnn) or paired
    normal sample (.cnr) is given.
    The command's second argument is now optional and deprecated in favor of the
    -r/--reference option, which does the same thing.

import-theta:

  • Save integer copy number in the "cn" column of the output file(s) (CNVkit's
    .cns format).

call, export nexus-ogt:

  • When reading structural variants from a VCF file, interpret the END tag as the
    variant end position, not the length, per the VCF 4.2 specification.
    This bug could cause the b-allele frequencies calculated in call and export
    nexus-ogt to be erroneously repeated across many consecutive bins.
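The difference between the two interpretations can be sketched as follows. This is a simplified illustration, not CNVkit's parsing code: the function name `sv_span` is hypothetical, and it only handles a `key=value` END entry in the INFO column, falling back to POS when END is absent.

```python
def sv_span(pos, info):
    """Return (start, end) for a VCF record, reading END as an end position.

    `pos` is the record's POS value and `info` is its raw INFO string.
    """
    fields = dict(item.split("=", 1) for item in info.split(";") if "=" in item)
    if "END" in fields:
        end = int(fields["END"])  # END is a coordinate, not a length
    else:
        end = pos  # assumption for this sketch: no END means a point-like record
    return pos, end


info = "SVTYPE=DEL;END=321887;SVLEN=-205"
start, end = sv_span(321682, info)

# Misreading END as a length would instead put the end at POS + END,
# stretching the variant far beyond its true extent.
wrong_end = 321682 + 321887
```

With the misreading, a single variant appears to cover a much longer stretch of the chromosome, which is consistent with its allele frequency being assigned to many consecutive bins.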

scatter:

  • When loading CNVkit files (in any command), identify and drop rows with "NaN"
    log2 values. (CNVkit never emits these, but they could happen if a user
    generates .cnr files from Illumina CGH array data files using a custom
    script.) The other columns (spread, gc, rmask) can be NaN without a problem,
    but plotting with scatter would crash when adjusting the y-axis based on NaN
    log2 values. (#95)
  • Detect & warn if input .cnr/.cns/.vcf is not sorted by genomic coordinates.
    This could happen with an input VCF or a manually constructed .cnr/.cns file
    (not generated by CNVkit). Previously the resulting error message was
    cryptic, because some bins/segments/SNVs were selected successfully but
    plotting crashed when laying out the x-axis coordinates.
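The two input checks above can be sketched with the standard library alone. This is not CNVkit's code: the function names are hypothetical, rows are represented as plain dicts, and the column names "chromosome", "start" and "log2" are assumed. The sort check here only requires that each chromosome forms one contiguous block with non-decreasing start positions; it does not impose any particular chromosome order.

```python
import math
import warnings


def drop_nan_log2(rows):
    """Keep only rows whose log2 value is a real number."""
    return [row for row in rows if not math.isnan(row["log2"])]


def is_coordinate_sorted(rows):
    """True if each chromosome is contiguous and its starts never decrease."""
    seen = set()
    prev_chrom = None
    prev_start = None
    for row in rows:
        chrom, start = row["chromosome"], row["start"]
        if chrom != prev_chrom:
            if chrom in seen:
                return False  # chromosome reappears after another one
            seen.add(chrom)
        elif start < prev_start:
            return False  # position goes backwards within a chromosome
        prev_chrom, prev_start = chrom, start
    return True


rows = [
    {"chromosome": "chr1", "start": 100, "log2": 0.1},
    {"chromosome": "chr1", "start": 50, "log2": float("nan")},
    {"chromosome": "chr1", "start": 300, "log2": -0.4},
    {"chromosome": "chr2", "start": 10, "log2": 0.0},
]

clean = drop_nan_log2(rows)
if not is_coordinate_sorted(clean):
    warnings.warn("Input is not sorted by genomic coordinates")
```

In this toy input the out-of-order row is also the NaN row, so the cleaned table passes the sort check while the raw table does not.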

Internals & packaging:

  • Use the pyfaidx library to extract sequences from a genome FASTA file (used in
    the reference command), replacing some custom code in cnvlib. (#73; thanks
    @mdshw5)
  • Documentation updates.