NAnopore DAta Variant CAller
Run:
git clone [email protected]:fmfi-compbio/nadavca.git
cd nadavca
pip install .
Depending on your configuration, the last step might require root privileges.
After installing, you should be able to run
nadavca --help
and also import nadavca
package in python3:
python3
...
>>> import nadavca
If Nadavca runs, but crashes on unsuccessfully trying to import some package, please report this as a bug and we will try to fix it. In the meantime, just manually install that package and try again.
If Nadavca crashes for any other reason, please report that, too.
Your reads should be basecalled (and results save to fast5) using Guppy 6.0 or earlier.
Warning: does not support multiple contigs
nadavca.estimate_snps(reference_filename,
reads,
reference=None,
config='default/config.yaml',
kmer_model='default/kmer_model.hdf5',
bwa_executable='bwa',
independent=False,
group_name='Analyses/Basecall_1D_000')
where:
reference_filename
is name of a.fasta
file with the reference sequencereads
is either of:- a list of filenames of individual reads
- a name of a directory. In that case, all
.fast5
files in the directory will be processed - a list of
nadavca.Read
instances
config
is either of:- a name of a YAML file containing parameters for the SNP-calling algorithm (see default values)
- a
dict
containing parameters for the SNP-calling algorithm
kmer_model
is either of:- a name of a HDF5 file containing expected values of signal for individual k-mers
- a
nadavca.KmerModel
instance
bwa_executable
is command that runs BWA on your system- If
independent
is set toTrue
, each read will be treated separately (i.e. Nadavca assumes each read was from a different modification of the reference sequence). Otherwise, information from all reads is combined into single consensus score for each position in the sequence. group_name
is path to the group containing results of basecalling inside the.fast5
files of reads
The return value of nadavca.estimate_snps()
is a list of
nadavca.estimator.Chunk
s. Each Chunk
contains estimated SNP probabilities
for a contiguous segment of the reference sequence. Bounds of this segment are
indicated by Chunk
's .start
and .end
attributes.
Estimated probabilities themselves are in Chunk
's .values
attribute,
which is a 2D numpy
array of dimension (end - start, 4)
. For each position
in reference sequence between start
(inclusive) and end
(exclusive),
it contains 4 numbers: the estimated probability of A, C, G and T, respectively,
on this position.
If independent
was set to False
, some positions in the reference sequence may
be covered by multiple reads. This information is stored in .cover
attribute
of nadavca.estimator.Chunk
.
If independent
was set to True
, Nadavca produces a Chunk
for each read.
If you pass reads to nadavca.estimate_snps()
as a list, while using independent=True
,
the order of Chunk
s corresponds to the order of reads.
There is also a command-line interface for nadavca.estimate_snps()
:
nadavca snp reference reads
Run
nadavca --help
nadavca snp --help
for more details.
nadavca.align_signal(reference_filename,
reads,
config='default/config.yaml',
kmer_model='default/kmer_model.hdf5',
bwa_executable='bwa',
group_name='Analyses/Basecall_1D_000'):
where parameters have the same meaning as in nadavca.estimate_snps()
.
The return value of nadavca.align_signal()
is a list of results for each read,
in the same order as they appeared on the input. Thus, it is advisable to pass reads to nadavca.align_signal()
as a list.
If read was not aligned the result is None
.
Otherwise each results is a tuple of ApproximateAlignment and DTW alignment.
DTW alignment is an array with three columns:
the first column contains positions in the reference sequence,
the second and third column contain start and end of correspoinding event in the signal, respectively.
Positions in signal are in ascending order.
Positions in reference are in ascending or descending
order, for forward strands and reverse strands, respectively.
nadavca.align_signal()
also has a command-line interface:
nadavca align reference reads
Run
nadavca --help
nadavca align --help
for more details.