
Releases: imallona/yamet

v1.0.0-rc.1

02 Feb 07:33
8fccdf6
Library compatibility with CMake

- Add export configuration
- Move Boost linkage from the library to the executable
- Separate the lib and cli header files

v1.1.0-rc.1

21 Dec 19:27
fe6aa37

Overview

yamet (Yet Another Methylation Entropy Tool) is an efficient, multithreaded tool written in C++ for computing methylation entropy from genomic data. It aims to give researchers fast, accurate, and scalable entropy calculations while being easy to integrate into bioinformatics workflows.

Added

Core Features

  • Sample entropy within cells, per search interval and aggregated at the cell level
  • Average methylation within and across cells, per search interval across all cells and aggregated at the cell level
  • k-mer Shannon entropy, per search interval across all cells
  • Cell files and reference files may be gzip-compressed
  • Multithreaded
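The sample entropy feature can be illustrated with the standard definition SampEn(m) = -ln(A/B), where B counts pairs of identical length-m templates and A counts pairs of identical length-(m+1) templates, self-matches excluded. This is a minimal sketch, not yamet's implementation: it assumes one cell's methylation calls within one interval are encoded as a vector of 0/1 values and uses exact matching (tolerance r = 0), which suits binary data. The function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Counts pairs (i < j) among the first n_templates start positions whose
// length-len windows are identical. Exact matching corresponds to a
// tolerance r = 0, an assumption suited to binary methylation calls.
std::size_t count_template_matches(const std::vector<int>& x, std::size_t len,
                                   std::size_t n_templates) {
  std::size_t count = 0;
  for (std::size_t i = 0; i < n_templates; ++i) {
    for (std::size_t j = i + 1; j < n_templates; ++j) {
      bool match = true;
      for (std::size_t k = 0; k < len && match; ++k) {
        match = (x[i + k] == x[j + k]);
      }
      if (match) ++count;
    }
  }
  return count;
}

// Sample entropy SampEn(m) = -ln(A / B) for a 0/1 sequence x.
// B: matching template pairs of length m; A: of length m + 1.
// Both use the same N - m start positions so the counts are comparable.
// Returns NaN if B == 0 (undefined) and +infinity if A == 0.
double sample_entropy(const std::vector<int>& x, std::size_t m) {
  if (x.size() <= m) return std::numeric_limits<double>::quiet_NaN();
  const std::size_t n_templates = x.size() - m;
  const std::size_t b = count_template_matches(x, m, n_templates);
  if (b == 0) return std::numeric_limits<double>::quiet_NaN();
  const std::size_t a = count_template_matches(x, m + 1, n_templates);
  if (a == 0) return std::numeric_limits<double>::infinity();
  return -std::log(static_cast<double>(a) / static_cast<double>(b));
}
```

For example, sample_entropy({0, 1, 0, 1, 0, 1}, 2) is 0, because in a perfectly regular pattern every length-2 match extends to a length-3 match. By contrast, sample_entropy({0, 0, 1, 1, 0, 1}, 1) is ln 4, since B = 4 and A = 1.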

Primary Inputs

All files must be tab-separated. The cell files are cytosine reports listing all covered positions. The reference file lists all positions in the genome. The intervals file lists the regions of interest within each chromosome. We do not enforce a particular file format, but the data must be laid out as follows.

  • cell files with 5 columns in the following order:
    1. chromosome
    2. position
    3. number of methylated reads
    4. total number of reads
    5. rate (beta value)
  • reference file with 2 columns in the following order:
    1. chromosome
    2. position
  • target intervals with 3 columns in the following order:
    1. chromosome
    2. start position
    3. end position
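To illustrate the cell file layout, the sketch below parses one tab-separated row into a struct and averages the rate column over several rows. The struct, its field names, and both helpers are hypothetical. The unweighted mean of the rate column is only one possible definition of average methylation, and yamet's own parser and averaging may differ.

```cpp
#include <cassert>
#include <cmath>
#include <optional>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// One row of a cell file (hypothetical field names).
struct CellRecord {
  std::string chrom;   // 1. chromosome
  long position = 0;   // 2. position
  int methylated = 0;  // 3. number of methylated reads
  int total = 0;       // 4. total number of reads
  double rate = 0.0;   // 5. rate (beta value)
};

// Splits a line on tabs and converts the five columns. Returns
// std::nullopt if there are not exactly five fields or a numeric
// field cannot be converted.
std::optional<CellRecord> parse_cell_line(const std::string& line) {
  std::istringstream in(line);
  std::vector<std::string> fields;
  for (std::string field; std::getline(in, field, '\t');) {
    fields.push_back(field);
  }
  if (fields.size() != 5) return std::nullopt;
  try {
    CellRecord r;
    r.chrom = fields[0];
    r.position = std::stol(fields[1]);
    r.methylated = std::stoi(fields[2]);
    r.total = std::stoi(fields[3]);
    r.rate = std::stod(fields[4]);
    return r;
  } catch (const std::logic_error&) {
    // std::invalid_argument and std::out_of_range both derive from logic_error.
    return std::nullopt;
  }
}

// Unweighted mean of the rate column; NaN for an empty input.
double mean_rate(const std::vector<CellRecord>& records) {
  if (records.empty()) return std::nan("");
  double sum = 0.0;
  for (const auto& r : records) sum += r.rate;
  return sum / static_cast<double>(records.size());
}
```

For the rows `1 5 0 2 0` and `1 9 1 1 1`, mean_rate returns 0.5.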

v0.1.0-rc.1

17 Jan 14:09
c19cf11
Pre-release

Capabilities

This is the first release candidate of yamet v0.1.0, following the refactoring and rewrite of the tool in C++.

  • Computes sample entropy within cells, Shannon entropy across cells, and average methylation per cell and across cells, all in C++
  • Includes basic tests on simulated data, plus Valgrind and cppcheck analyses
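Shannon entropy across cells can be sketched as the entropy of the empirical distribution of methylation patterns pooled from all cells in an interval. This minimal sketch assumes each cell's calls within the interval are given as a string of '0'/'1' characters, extracts overlapping k-mers, and uses base-2 logarithms. The helper names, the pooling strategy, and the log base are all assumptions, not documented yamet behavior.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Appends all overlapping k-mers of one cell's 0/1 call string to out.
void append_kmers(const std::string& calls, std::size_t k,
                  std::vector<std::string>& out) {
  if (k == 0 || calls.size() < k) return;
  for (std::size_t i = 0; i + k <= calls.size(); ++i) {
    out.push_back(calls.substr(i, k));
  }
}

// Shannon entropy in bits, H = -sum p * log2(p), of the pattern frequencies.
double shannon_entropy(const std::vector<std::string>& patterns) {
  if (patterns.empty()) return 0.0;
  std::map<std::string, std::size_t> counts;
  for (const auto& p : patterns) ++counts[p];
  const double n = static_cast<double>(patterns.size());
  double h = 0.0;
  for (const auto& entry : counts) {
    const double prob = static_cast<double>(entry.second) / n;
    h -= prob * std::log2(prob);
  }
  return h;
}

// Pools k-mers from every cell in an interval, then computes their entropy.
double kmer_entropy_across_cells(const std::vector<std::string>& cells,
                                 std::size_t k) {
  std::vector<std::string> pooled;
  for (const auto& calls : cells) append_kmers(calls, k, pooled);
  return shannon_entropy(pooled);
}
```

With k = 2, a single cell reading "0101" yields the k-mers 01, 10, 01, giving H = log2(3) - 2/3 bits (about 0.918). Cells that are uniformly unmethylated give H = 0.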

Known issues

Requires GCC > 13.

Installation

cd method
bash build.sh

Usage

  • The CLI takes a reference file listing cytosine coordinates, one cytosine report (covered positions only) per cell, and a BED file restricting the regions from which the metrics are calculated. Metrics are calculated per BED interval.
  • CLI help:
input:
  -t [ --tsv ] arg                      tab separated files for different cells
                                        in the following format
                                        
                                         1    5    0    2    0
                                         1    9    1    1    1
                                         2    2    3    4    1
                                        
                                        where the columns are the chromosome, 
                                        position, number of methylated reads, 
                                        total number of reads and the rate 
                                        respectively
  -r [ --ref ] arg                      tab separated file for reference sites 
                                        in the following format
                                        
                                         1    5     7
                                         1    7     9
                                         1    9     11
                                         1    11    13
                                         2    2     4
                                         2    4     6
                                        
                                        where the columns are the chromosome, 
                                        start position and the end position 
                                        respectively
  -b [ --bed ] arg                      path to bed file for regions of 
                                        interest in the following format
                                        
                                         1    5     7
                                         1    10    30
                                         2    1     6
                                        
                                        where the columns are the chromosome, 
                                        start position and the end position 
                                        respectively

output:
  -d [ --det-out ] arg                  (optional) path to detailed output file
  -o [ --out ] arg                      (optional) path to simple output file

resource utilisation:
  --n-cores arg (=0)                    number of cores used for simultaneously
                                        parsing methylation files
  --n-threads-per-core arg (=1)         number of threads per core used for 
                                        simultaneously parsing methylation 
                                        files

verbose:
  --print-bed                           print parsed regions file
  --print-ref                           print parsed reference file
  --print-sampens [=arg(=true)] (=true) print computed sample entropies

misc:
  -h [ --help ]                         produce help message
  --version                             current version information
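Computing metrics per BED interval requires finding the reference positions that fall inside each interval. This minimal sketch assumes positions are kept sorted per chromosome and treats intervals as half-open [start, end), following the usual BED convention. The coordinate convention yamet itself uses is not stated in these notes, and the index type and function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical index: chromosome name -> sorted reference positions.
using PositionIndex = std::map<std::string, std::vector<long>>;

// Index range [first, last) of sorted_positions lying in [start, end).
std::pair<std::size_t, std::size_t> positions_in_interval(
    const std::vector<long>& sorted_positions, long start, long end) {
  const auto first = std::lower_bound(sorted_positions.begin(),
                                      sorted_positions.end(), start);
  const auto last = std::lower_bound(first, sorted_positions.end(), end);
  return {static_cast<std::size_t>(first - sorted_positions.begin()),
          static_cast<std::size_t>(last - sorted_positions.begin())};
}

// Number of indexed sites on chrom inside [start, end); 0 for unknown chromosomes.
std::size_t count_sites(const PositionIndex& index, const std::string& chrom,
                        long start, long end) {
  const auto it = index.find(chrom);
  if (it == index.end()) return 0;
  const auto range = positions_in_interval(it->second, start, end);
  return range.second - range.first;
}
```

Using the chromosome 1 reference starts from the help text (5, 7, 9, 11), the interval `1 10 30` contains one site (11), and `1 5 7` contains one site (5) under the half-open assumption. Binary search keeps each lookup logarithmic in the number of sites on a chromosome instead of scanning them all.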