This repository has been archived by the owner on Apr 21, 2022. It is now read-only.

Commit

Merge pull request #7 from a-slide/dev
Dev
a-slide authored Jan 13, 2020
2 parents 336819c + f4bbc3f commit 9cd41f3
Showing 9 changed files with 34 additions and 36 deletions.
2 changes: 1 addition & 1 deletion CONTRIBUTING.md
@@ -11,7 +11,7 @@ First of all, thanks for considering contributing to `pycoMeth`! 👍 It's peopl

## Code of conduct

Please note that this project is released with a [Contributor Code of Conduct][code_of_conduct.md]. By participating in this project you agree to abide by its terms.
Please note that this project is released with a [Contributor Code of Conduct][code_of_conduct]. By participating in this project you agree to abide by its terms.

## How you can contribute

4 changes: 2 additions & 2 deletions README.md
@@ -21,12 +21,12 @@

### pycoMeth workflow

![Workflow](pictures/pycoMeth_package.png)
![Workflow](docs/pictures/pycoMeth_package.png)


### pycoMeth example output IGV rendering

![](pictures/pycoMeth_all.png)
![](docs/pictures/pycoMeth_all.png)

### Authors

14 changes: 9 additions & 5 deletions docs/CGI_Finder/usage.md
@@ -7,9 +7,13 @@
* [Python API usage](https://a-slide.github.io/pycoMeth/CGI_Finder/API_usage/)
* [Shell CLI usage](https://a-slide.github.io/pycoMeth/CGI_Finder/CLI_usage/)

## Output format
## Input file

CGI_Finder can generates 2 files, a standard BED file and a tabulated file containing extra information
### Reference FASTA file

FASTA reference file containing sequences in which CpG islands needs to be found.

## Output files

### Tabulated TSV file

@@ -18,13 +22,13 @@ This tabulated file contains the following fields for each CpG island found:
* chromosome / start / end : Genomic coordinates
* length: Length of the interval
* num_CpG: Number of CpGs found
* CG_freq: G+C nucleotide frequency
* CG_freq: G+C nucleotide frequency
* obs_exp_freq: Observed versus expected CpG frequency

### BED file

Standard genomic BED3 (https://genome.ucsc.edu/FAQ/FAQformat.html#format1) format indicating the coordinates of putative CpG islands.
Minimal standard genomic [BED3](https://genome.ucsc.edu/FAQ/FAQformat.html#format1) format listing the coordinates of putative CpG islands.

The picture below shows the putative CpG islands found (grey boxes) in an example sequence overlayed with C+G frequency and observed versus expected CpG frequency
The picture below shows the putative CpG islands found (grey boxes) in an example sequence, overlaid with C+G frequency and observed/expected CpG frequency

![Example](../pictures/CGI_Finder.png)
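The `num_CpG`, `CG_freq` and `obs_exp_freq` fields in the TSV output can be illustrated with a small Python sketch. It assumes the widely used Gardiner-Garden and Frommer definition of the observed/expected CpG ratio, `num_CpG * length / (num_C * num_G)`; the function name `cpg_island_stats` is hypothetical, and this is not necessarily how `CGI_Finder` computes its values.

```python
def cpg_island_stats(seq: str) -> dict:
    """Compute simple CpG island statistics for one interval (illustrative sketch)."""
    seq = seq.upper()
    length = len(seq)
    num_c = seq.count("C")
    num_g = seq.count("G")
    # "CG" cannot overlap with itself, so str.count gives the exact dinucleotide count
    num_cpg = seq.count("CG")
    # G+C nucleotide frequency over the interval
    cg_freq = (num_c + num_g) / length if length else 0.0
    # Expected CpG count if C and G were placed independently: C * G / length
    expected = num_c * num_g / length if length else 0.0
    obs_exp_freq = num_cpg / expected if expected else 0.0
    return {
        "length": length,
        "num_CpG": num_cpg,
        "CG_freq": cg_freq,
        "obs_exp_freq": obs_exp_freq,
    }
```

For example, `cpg_island_stats("CGCG")` reports 2 CpGs, a G+C frequency of 1.0 and an observed/expected ratio of 2.0.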
14 changes: 6 additions & 8 deletions docs/CpG_Aggregate/usage.md
@@ -15,11 +15,9 @@

### Reference FASTA file

FASTA reference file used for read alignment and Nanopolish. This file is required and used to sort the CpG sites by coordinates
FASTA reference file used for read alignment and Nanopolish. This file is required and used to sort the CpG sites by coordinates

## Output format

CpG_Aggregate can generates 2 files, a standard BED file and a tabulated file containing extra information
## Output files

### Tabulated TSV file

@@ -33,15 +31,15 @@ This tabulated file contains the following fields:

### BED file

Standard genomic [BED6](https://genome.ucsc.edu/FAQ/FAQformat.html#format1). The score correspond to the median log likelyhood ratio.
Standard genomic [BED9 format](https://genome.ucsc.edu/FAQ/FAQformat.html#format1) including an RGB color field. The score correspond to the median log likelihood ratio.
The file is already sorted by coordinates and can be rendered with a genome browser such as IGV

The sites are color-coded as follow:

- Median log likelihood ratio higher than 2 (Methylated): Colorscale from orange (llr = 2) to deep red (llr >=6)
- Median log likelihood ratio lower than 2 (Unmethylated): Colorscale from green (llr = -2) to deep blue (llr <= -6)
- Median log likelihood ratio higher than 2 (Methylated): Colorscale from orange (llr = 2) to deep red (llr >=6)
- Median log likelihood ratio lower than 2 (Unmethylated): Colorscale from green (llr = -2) to deep blue (llr <= -6)
- Grey: Median log likelihood ration between -2 and 2 (ambiguous methylation status)

Here is an example of multiple methylation bed files rendered with IGV

![Example Bed Files](../pictures/CpG_Aggregate_2.png)
![Example Bed Files](../pictures/CpG_Aggregate_2.png)
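The llr color scale for the BED output can be sketched in Python as a linear interpolation between two anchor colors, with values between -2 and 2 shown in grey. The RGB values picked for "orange", "deep red", "green", "deep blue" and "grey", the use of linear interpolation, and the helper names `llr_to_item_rgb` and `bed9_line` are all assumptions for illustration, not `pycoMeth`'s own code.

```python
# Hypothetical RGB anchors for the color names used in the documentation
ORANGE, DEEP_RED = (255, 165, 0), (139, 0, 0)
GREEN, DEEP_BLUE = (0, 128, 0), (0, 0, 139)
GREY = (128, 128, 128)


def _lerp_rgb(start, end, t):
    """Linearly interpolate between two RGB tuples for t in [0, 1]."""
    return tuple(round(a + (b - a) * t) for a, b in zip(start, end))


def llr_to_item_rgb(med_llr, min_llr=2.0, max_llr=6.0):
    """Map a median log likelihood ratio to a BED itemRgb string such as '255,165,0'."""
    if med_llr >= min_llr:
        # Methylated: orange at llr = 2, deep red at llr >= 6
        t = min((med_llr - min_llr) / (max_llr - min_llr), 1.0)
        rgb = _lerp_rgb(ORANGE, DEEP_RED, t)
    elif med_llr <= -min_llr:
        # Unmethylated: green at llr = -2, deep blue at llr <= -6
        t = min((-med_llr - min_llr) / (max_llr - min_llr), 1.0)
        rgb = _lerp_rgb(GREEN, DEEP_BLUE, t)
    else:
        # Ambiguous methylation status
        rgb = GREY
    return ",".join(str(v) for v in rgb)


def bed9_line(chrom, start, end, med_llr):
    """Build one BED9 record whose score column holds the median llr."""
    # BED9 columns: chrom, start, end, name, score, strand, thickStart, thickEnd, itemRgb
    fields = [chrom, start, end, ".", f"{med_llr:.3f}", ".", start, end, llr_to_item_rgb(med_llr)]
    return "\t".join(str(f) for f in fields)
```

The UCSC BED specification describes the score column as a value between 0 and 1000; this sketch simply writes the median llr there, as the documentation above describes.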
14 changes: 6 additions & 8 deletions docs/Interval_Aggregate/usage.md
@@ -15,15 +15,13 @@

### Reference FASTA file

FASTA reference file used for read alignment and Nanopolish. This file is required and used to sort the CpG sites by coordinates
FASTA reference file used for read alignment and Nanopolish. This file is required and used to sort the CpG sites by coordinates

### BED file containing intervals

Optional **sorted** and BED file containing **non-overlapping** intervals to bin CpG data into. If this file is not provided, then the program use a sliding customizable window to bim data along the entire genome.
Optional **sorted** and BED file containing **non-overlapping** intervals to bin CpG data into. If this file is not provided, then the program use a sliding customizable window to bin data along the entire genome.

## Output format

CpG_Aggregate can generates 2 files, a standard BED file and a tabulated file containing extra information
## Output files

### Tabulated TSV file

@@ -36,13 +34,13 @@ This tabulated file contains the following fields:

### BED file

Standard genomic [BED6](https://genome.ucsc.edu/FAQ/FAQformat.html#format1). The score correspond to the median log likelyhood ratio.
Standard genomic [BED9 format](https://genome.ucsc.edu/FAQ/FAQformat.html#format1) including an RGB color field. The score correspond to the median log likelihood ratio.
The file is already sorted by coordinates and can be rendered with a genome browser such as IGV

The sites are color-coded as follow:

* Median log likelihood ratio higher than 2 (Methylated): Colorscale from orange (llr = 2) to deep red (llr >=6)
* Median log likelihood ratio lower than 2 (Unmethylated): Colorscale from green (llr = -2) to deep blue (llr <= -6)
* Median log likelihood ratio higher than 2 (Methylated): Colorscale from orange (llr = 2) to deep red (llr >=6)
* Median log likelihood ratio lower than 2 (Unmethylated): Colorscale from green (llr = -2) to deep blue (llr <= -6)
* Grey: Median log likelihood ration between -2 and 2 (ambiguous methylation status)

Here is an example of multiple methylation bed files rendered with IGV
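When no interval BED file is provided, `Interval_Aggregate` bins CpG data along the genome with a window. A minimal Python sketch of that idea follows; it assumes fixed-size, non-overlapping windows (a true sliding window would also need a step size), a hypothetical `window_size` parameter, and CpG records given as `(chrom, pos, llr)` tuples, then takes the median llr of each window.

```python
import statistics
from collections import defaultdict


def bin_cpg_sites(sites, window_size=1000):
    """Group (chrom, pos, llr) CpG records into fixed, non-overlapping windows."""
    bins = defaultdict(list)
    for chrom, pos, llr in sites:
        # 0-based window start, e.g. pos 1500 with a 1000 bp window falls in [1000, 2000)
        start = (pos // window_size) * window_size
        bins[(chrom, start, start + window_size)].append(llr)
    return dict(bins)


def window_medians(bins):
    """Median llr per window, the value the BED output uses as its score."""
    return {key: statistics.median(values) for key, values in bins.items()}
```

With `window_size=1000`, positions 5 and 999 share the window `("chr1", 0, 1000)`, while position 1000 starts the next one.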
10 changes: 4 additions & 6 deletions docs/Meth_Comp/usage.md
@@ -17,9 +17,7 @@ A list of `pycoMeth CpG_Aggregate` or `pycoMeth Interval_Aggregate` **tsv** outp

FASTA reference file used for read alignment and Nanopolish. This file is required and used to sort the CpG sites by coordinates.
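One way a reference FASTA can drive coordinate sorting is to take the chromosome order from the FASTA headers and then sort by position within each chromosome. This Python sketch assumes that ordering rule; the helper names `fasta_chrom_order` and `sort_sites` are hypothetical, and `pycoMeth` may sort differently.

```python
def fasta_chrom_order(fasta_lines):
    """Map each sequence name to its rank of appearance in the FASTA headers."""
    order = {}
    for line in fasta_lines:
        if line.startswith(">"):
            # The sequence name is the first whitespace-delimited token after '>'
            name = line[1:].split()[0]
            order.setdefault(name, len(order))
    return order


def sort_sites(sites, chrom_order):
    """Sort (chrom, pos, ...) records by FASTA chromosome order, then by position."""
    return sorted(sites, key=lambda s: (chrom_order[s[0]], s[1]))
```

Passing an open file handle to `fasta_chrom_order` also works, since iterating over a file yields its lines.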

## Output format

Meth_Comp can generates 2 files, a standard BED file and a tabulated file containing extra information.
## Output files

### Tabulated TSV file

@@ -29,18 +27,18 @@ This tabulated file contains the following fields:
* n_samples: Number of valid samples compared for position
* pvalue / statistic: pvalue /statistic for positions obtained by Kruskal Wallis or Mann_Withney test
* adj_pvalue: FDR adjusted pValue using the Benjamini & Hochberg procedure
* neg_med / pos_med / ambiguous_med: Number of samples with a median below the negative llr threshold / above the positive llr threshold or with and ambiguous median between the 2 thresholds
* neg_med / pos_med / ambiguous_med: Number of samples with a median below the negative llr threshold / above the positive llr threshold or with and ambiguous median between the 2 thresholds
* labels: labels of the samples tested, matching the order of values in med_llr_list and raw_llr_list
* med_llr_list: List of median llr values for each samples compared.
* raw_llr_list: List of the list of raw llr values for each samples compared
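The `adj_pvalue` field relies on the Benjamini & Hochberg procedure. Below is a minimal pure-Python sketch of that standard adjustment: sort the p-values, scale each by `n / rank`, then enforce monotonicity from the largest rank down. It is a generic implementation for illustration, not the code `Meth_Comp` actually calls.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(pvalues)
    # Indices of the p-values sorted from smallest to largest
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest rank down so adjusted values never decrease with rank
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted
```

Capping `running_min` at 1.0 keeps every adjusted value a valid probability.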

### BED file

Standard genomic BED6 (https://genome.ucsc.edu/FAQ/FAQformat.html#format1). The score correspond to the -log10(Adjusted Pvalue) capped to 1000. The file is sorted by coordinates and can be rendered with a genome browser such as IGV
Standard genomic [BED9 format](https://genome.ucsc.edu/FAQ/FAQformat.html#format1) including an RGB color field. The score correspond to the -log10(Adjusted Pvalue) capped to 1000. The file is sorted by coordinates and can be rendered with a genome browser such as IGV
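The BED score transformation, -log10 of the adjusted p-value capped at 1000, can be sketched as below. The function name is hypothetical, and since the text does not say whether `Meth_Comp` truncates or rounds the score to an integer, the sketch returns a float.

```python
import math


def pvalue_to_bed_score(adj_pvalue, cap=1000.0):
    """-log10(adjusted p-value), capped so that p = 0 does not yield infinity."""
    if adj_pvalue <= 0.0:
        # log10(0) is undefined; treat it as the most significant possible score
        return cap
    return min(-math.log10(adj_pvalue), cap)
```

In double precision the smallest positive p-value is about 5e-324, so `-log10` of any non-zero value stays below roughly 324; in this sketch the 1000 cap only applies to p-values that underflow to exactly 0.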

The sites are color-coded as follow:

* Significant differential methylation Adjusted pValue: Colorscale from orange (pValue=0.01) to deep purple (pValue<=0.000001)
* Significant differential methylation Adjusted pValue: Colorscale from orange (pValue=0.01) to deep purple (pValue<=0.000001)
* Non-significant: Grey
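The adjusted p-value color scale can be sketched by interpolating on the `-log10` scale between p = 0.01 (orange) and p = 0.000001 (deep purple), with anything above 0.01 drawn in grey. The RGB anchors, the assumption that 0.01 is the significance cutoff, and the function name `pvalue_to_item_rgb` are illustrative choices, not taken from `pycoMeth`.

```python
import math

# Hypothetical RGB anchors for the color names used in the documentation
ORANGE, DEEP_PURPLE, GREY = (255, 165, 0), (75, 0, 130), (128, 128, 128)


def pvalue_to_item_rgb(adj_pvalue, pvalue_threshold=0.01, min_pvalue=1e-6):
    """Map an adjusted p-value to a BED itemRgb string."""
    if adj_pvalue > pvalue_threshold:
        # Non-significant sites
        rgb = GREY
    else:
        lo = -math.log10(pvalue_threshold)
        hi = -math.log10(min_pvalue)
        score = hi if adj_pvalue <= 0.0 else -math.log10(adj_pvalue)
        # Fraction of the way from the threshold to the saturation point, clamped to 1
        t = min((score - lo) / (hi - lo), 1.0)
        rgb = tuple(round(a + (b - a) * t) for a, b in zip(ORANGE, DEEP_PURPLE))
    return ",".join(str(v) for v in rgb)
```

Interpolating on `-log10(p)` rather than on p itself spreads the colors evenly across orders of magnitude.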

Here is an example of multiple methylation bed files with rendered with IGV
6 changes: 3 additions & 3 deletions docs/installation.md
@@ -17,13 +17,13 @@ With [conda](https://conda.io/projects/conda/en/latest/user-guide/install/index.
conda create -n pycoMeth python=3.6
```

You might also want to install [Nanopolish](https://github.com/jts/nanopolish) in the same virtual environment so you can pipe nanopolish output directly into pycoMeth
You might also want to install [Nanopolish](https://github.com/jts/nanopolish) in the same virtual environment so you can pipe nanopolish output directly into `pycoMeth`

## Dependencies

[Nanopolish 0.10+](https://github.com/jts/nanopolish) is not a direct dependency but is required to generate the files used by several commands from this package

Nanocompore relies on a the following robustly maintained third party python libraries:
`pycoMeth` relies on a the following robustly maintained third party python libraries:

* numpy>=1.14.0
* tqdm>=4.23.4
@@ -43,7 +43,7 @@ pip install pycoMeth
pip install pycoMeth --upgrade
```

If you feel adventurous you can install the development version from test.pypi
If you feel more adventurous you can install the development version from test.pypi

```bash
pip install --index-url https://test.pypi.org/simple/ pycoMeth
2 changes: 1 addition & 1 deletion pycoMeth/__init__.py
@@ -1,5 +1,5 @@
# -*- coding: utf-8 -*-

# Define self package variable
__version__ = "0.2.6"
__version__ = "0.2.7"
__description__ = 'Python package for nanopore DNA methylation analysis downstream to Nanopolish'
4 changes: 2 additions & 2 deletions setup.py
@@ -5,7 +5,7 @@

# Define package info
name = "pycoMeth"
version = "0.2.6"
version = "0.2.7"
description = 'Python package for nanopore DNA methylation analysis downstream to Nanopolish'
with open("README.md", "r") as fh:
long_description = fh.read()
@@ -26,7 +26,7 @@
'Development Status :: 3 - Alpha',
'Intended Audience :: Science/Research',
'Topic :: Scientific/Engineering :: Bio-Informatics',
'License :: OSI Approved :: MIT License',
'License :: OSI Approved :: GNU General Public License v3 (GPLv3)',
'Programming Language :: Python :: 3'],
install_requires = [
'numpy>=1.14.0',
