Releases: imallona/yamet
Library compatibility with CMake
- add export configuration
- move Boost linkage from library to executable
- separate header files into lib and cli
yamet (Yet Another Methylation Entropy Tool) is a powerful and efficient tool written in C++ for computing methylation entropy from genomic data. It aims to provide researchers with fast, accurate, and scalable entropy calculations while being easy to integrate into bioinformatics workflows.
Core Features
- sample entropy within cells - per search interval and also aggregated at a cell level
- average methylation within and across cells - per search interval across all cells and also aggregated at a cell level
- k-mer shannon entropy - per search interval across all cells
- cells files and reference files can be in gzipped format
- multithreaded
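As an illustration of the k-mer Shannon entropy feature, the following is a minimal sketch and not yamet's actual implementation. It assumes each cell's methylation states within an interval are encoded as a string of '0' (unmethylated) and '1' (methylated) characters, and that k-mers are overlapping windows of k consecutive covered sites pooled across cells. The function name `kmer_shannon_entropy` and this encoding are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Shannon entropy (in bits) of the k-mer distribution pooled across cells.
// Assumption: each string holds the binary methylation states ('0'/'1') of
// consecutive covered cytosines of one cell within one interval. yamet's own
// encoding and windowing may differ.
double kmer_shannon_entropy(const std::vector<std::string>& cells, std::size_t k) {
    std::unordered_map<std::string, std::size_t> counts;
    std::size_t total = 0;
    for (const std::string& states : cells) {
        if (k == 0 || states.size() < k) continue;  // too short to hold a k-mer
        for (std::size_t i = 0; i + k <= states.size(); ++i) {
            ++counts[states.substr(i, k)];  // overlapping windows of length k
            ++total;
        }
    }
    if (total == 0) return 0.0;  // no k-mers observed
    double entropy = 0.0;
    for (const auto& kv : counts) {
        const double p = static_cast<double>(kv.second) / static_cast<double>(total);
        entropy -= p * std::log2(p);
    }
    return entropy;
}
```

Under these assumptions, two cells with states "01" and "10" give two equally frequent 2-mers, so the entropy is 1 bit, whereas two identical cells give an entropy of 0.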
Primary Inputs
All files must be tab separated. The cell files are the cytosine reports for all covered positions. The reference file is a list of all positions in the genome. The intervals file is a list of regions of interest within each chromosome. We do not enforce particular file formats, but we do require the data to be presented in the following layout.
- cell files with 5 columns in the following order:
  - chromosome
  - position
  - number of methylated reads
  - total number of reads
  - rate (beta value)
- reference file with 2 columns in the following order:
  - chromosome
  - position
- target intervals with 3 columns in the following order:
  - chromosome
  - start position
  - end position
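To make the 5-column cell file layout concrete, here is a minimal sketch of parsing one tab separated line. The struct `CellRecord` and the function `parse_cell_line` are hypothetical names chosen for illustration and are not part of yamet's API. Rejecting lines where the methylated count exceeds the total is an assumption of this sketch.

```cpp
#include <cassert>
#include <exception>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

// One row of a cell file: chromosome, position, methylated reads,
// total reads, rate (beta value). Hypothetical type for illustration.
struct CellRecord {
    std::string chrom;
    unsigned long pos;
    unsigned long meth;
    unsigned long total;
    double rate;
};

// Parses a single tab separated line; returns std::nullopt when the line
// does not have exactly 5 fields or a numeric field cannot be converted.
std::optional<CellRecord> parse_cell_line(const std::string& line) {
    std::vector<std::string> fields;
    std::istringstream in(line);
    std::string field;
    while (std::getline(in, field, '\t')) fields.push_back(field);
    if (fields.size() != 5) return std::nullopt;
    try {
        CellRecord rec;
        rec.chrom = fields[0];
        rec.pos = std::stoul(fields[1]);
        rec.meth = std::stoul(fields[2]);
        rec.total = std::stoul(fields[3]);
        rec.rate = std::stod(fields[4]);
        // Assumption of this sketch: methylated reads cannot exceed total reads.
        if (rec.meth > rec.total) return std::nullopt;
        return rec;
    } catch (const std::exception&) {
        return std::nullopt;  // non-numeric or out-of-range field
    }
}
```

A caller would typically read the (possibly gzipped) file line by line through whatever decompression layer it uses and skip or report lines for which the function returns `std::nullopt`.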
This is the first release candidate of yamet v0.1.0 after refactoring and rewriting in C++.
- Offers sample entropy within cells, Shannon entropy across cells, and average methylation within and across cells, all implemented in C++
- Includes simple tests on simulated data, as well as valgrind and cppcheck profiling
Known issues
Requires GCC > 13.
cd method
- The CLI takes a reference file listing cytosine coordinates, one (covered) cytosine report per cell, and a BED file that selects the regions in which the metrics are calculated. Metrics are calculated per BED interval.
- CLI help:
  -t [ --tsv ] arg                      tab separated files for different cells
                                        in the following format
                                            1 5 0 2 0
                                            1 9 1 1 1
                                            2 2 3 4 1
                                        where the columns are the chromosome,
                                        position, number of methylated reads,
                                        total number of reads and the rate
  -r [ --ref ] arg                      tab separated file for reference sites
                                        in the following format
                                            1 5 7
                                            1 7 9
                                            1 9 11
                                            1 11 13
                                            2 2 4
                                            2 4 6
                                        where the columns are the chromosome,
                                        start position and the end position
  -b [ --bed ] arg                      path to bed file for regions of
                                        interest in the following format
                                            1 5 7
                                            1 10 30
                                            2 1 6
                                        where the columns are the chromosome,
                                        start position and the end position
  -d [ --det-out ] arg                  (optional) path to detailed output file
  -o [ --out ] arg                      (optional) path to simple output file

resource utilisation:
  --n-cores arg (=0)                    number of cores used for simultaneously
                                        parsing methylation files
  --n-threads-per-core arg (=1)         number of threads per core used for
                                        simultaneously parsing methylation

  --print-bed                           print parsed regions file
  --print-ref                           print parsed reference file
  --print-sampens [=arg(=true)] (=true) print computed sample entropies
  -h [ --help ]                         produce help message
  --version                             current version information
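The per-interval calculation described before the help text can be sketched as follows for the simplest metric, average methylation. This is only an illustration under stated assumptions: the types `Site` and `Interval` and the function `mean_methylation` are hypothetical, and intervals are treated as closed ranges [start, end], which may not match how yamet interprets BED coordinates.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <string>
#include <vector>

// Hypothetical types for illustration only.
struct Site {
    std::string chrom;
    unsigned long pos;
    double rate;  // methylation rate (beta value) at this position
};

struct Interval {
    std::string chrom;
    unsigned long start;
    unsigned long end;
};

// Mean methylation rate of the sites falling inside one interval.
// Assumption: the interval is closed, i.e. start <= pos <= end.
// Returns NaN when no covered site lies inside the interval.
double mean_methylation(const std::vector<Site>& sites, const Interval& iv) {
    double sum = 0.0;
    std::size_t n = 0;
    for (const Site& s : sites) {
        if (s.chrom == iv.chrom && s.pos >= iv.start && s.pos <= iv.end) {
            sum += s.rate;
            ++n;
        }
    }
    if (n == 0) return std::numeric_limits<double>::quiet_NaN();
    return sum / static_cast<double>(n);
}
```

With the example rows from the help text (sites at 1:5, 1:9 and 2:2 with rates 0, 1 and 1), the interval `1 5 7` contains only the site at position 5, and the interval `1 10 30` contains no covered site, so this sketch returns NaN for it.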