Wang Yunfei edited this page Feb 8, 2017 · 4 revisions

NGSLib

  • ngslib is a Python-based package for Next-Generation Sequencing (NGS) analysis.
  • ngslib is used for manipulating genome annotation and sequence files in formats such as Fasta, Bed, GenePred, BAM, Wiggle and BigWig.
  • ngslib uses the "lib" and "inc" directories from Jim Kent's toolkit; users should read the README file in those directories and comply with its terms.
  • All files are copyrighted, but a license is hereby granted for personal, academic and non-profit use. Commercial users should contact Yunfei Wang for details.

Prerequisites

Python packages (installed automatically when ngslib is installed):

  1. numpy: package for scientific computing with Python.
  2. pysam: SAM/BAM file manipulation.
  3. argparse: parser for command-line options (needed separately on Python 2.6 only).

Other requirements, needed only in some rare cases (install these separately):

  1. python-dev, for the Python.h include file (Ubuntu: sudo apt-get install python-dev).
  2. libpng, for compiling Kentlib (Ubuntu: sudo apt-get install libpng-dev).
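To see which of the Python prerequisites are already importable before installing, a small check like the following can help. It is a minimal sketch, not part of ngslib; the helper name missing_modules is hypothetical, and it only tests whether each module can be imported by the current interpreter (it runs on Python 2.7 and 3).

```python
import sys

# Python packages listed above; argparse ships with Python 2.7+, so it only
# needs checking on Python 2.6.
REQUIRED = ["numpy", "pysam"]
if sys.version_info[:2] == (2, 6):
    REQUIRED.append("argparse")


def missing_modules(names):
    """Return the names from `names` that cannot be imported (hypothetical helper)."""
    missing = []
    for name in names:
        try:
            __import__(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing Python packages: " + ", ".join(missing))
    else:
        print("All Python prerequisites are importable.")
```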

Installation from PyPI

> easy_install --prefix=install_path ngslib

Installation from source code

This package requires Python 2.6 or 2.7. It has been tested on CentOS 6.4, Fedora 17, RedHat 5.5 and Ubuntu 12.04; other platforms may not work.

Download the source code:

> easy_install --editable --build-directory download_path ngslib

General installation instructions:

  1. Set the PYTHONPATH environment variable so that it includes the site-packages directory under install_path.

  2. Specify the install path with "--prefix=install_path". A typical choice is "--prefix=$HOME/local".

> cd ngslib
> python setup.py build
> python setup.py install --prefix=install_path
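Which directory to add to PYTHONPATH depends on the interpreter version. The sketch below asks Python's own sysconfig module where the standard "posix_prefix" scheme places pure and platform-specific packages for a given prefix (needs Python 2.7 or later). It is an illustration, not an ngslib tool; prefix_site_dirs is a hypothetical name, and some Linux distributions patch these paths, so check where setup.py actually put the files.

```python
import os
import sysconfig


def prefix_site_dirs(prefix):
    """Return the site-packages directories the posix_prefix install scheme
    uses for `prefix` (hypothetical helper)."""
    prefix = os.path.abspath(os.path.expanduser(prefix))
    dirs = []
    for key in ("purelib", "platlib"):
        # A fresh dict per call: sysconfig fills in missing config vars in place.
        path = sysconfig.get_path(key, "posix_prefix", {"base": prefix, "platbase": prefix})
        if path not in dirs:
            dirs.append(path)
    return dirs


if __name__ == "__main__":
    install_path = os.path.join("~", "local")  # the $HOME/local prefix suggested above
    dirs = prefix_site_dirs(install_path)
    print("export PYTHONPATH=" + os.pathsep.join(dirs + ["$PYTHONPATH"]))
```

Paste the printed export line into the shell (or into ~/.bashrc) before running the install commands.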

Contents

Data Structure

  • Seq(Seq): Sequence format.
  • Fasta(Fasta): Fasta format.
  • Bed: Genome interval format.
  • GeneBed: Gene annotation format.
  • BedList: List of Bed and its derived formats.
  • BedMap: Indexes Bed or GeneBed records with a binning scheme for fast overlap searches.
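The page does not say which binning scheme BedMap uses. Assuming it resembles the standard UCSC binning scheme from Jim Kent's toolkit (an assumption), the idea can be sketched in pure Python: each interval is stored in the smallest bin that contains it, and a query only scans bins that could hold overlapping intervals. SimpleBinMap, bin_from_range and overlapping_bins are hypothetical names, not ngslib's API.

```python
from collections import defaultdict

# UCSC-style binning: five levels of bins (128 kb, 1 Mb, 8 Mb, 64 Mb, 512 Mb),
# offsets listed from the smallest bins to the single 512 Mb bin.
BIN_OFFSETS = [512 + 64 + 8 + 1, 64 + 8 + 1, 8 + 1, 1, 0]
FIRST_SHIFT = 17  # 2**17 bases = 128 kb per smallest bin
NEXT_SHIFT = 3    # each level is 8x larger than the one below


def bin_from_range(start, end):
    """Smallest bin fully containing [start, end); assumes 0 <= start < end."""
    start_bin, end_bin = start >> FIRST_SHIFT, (end - 1) >> FIRST_SHIFT
    for offset in BIN_OFFSETS:
        if start_bin == end_bin:
            return offset + start_bin
        start_bin >>= NEXT_SHIFT
        end_bin >>= NEXT_SHIFT
    raise ValueError("interval [%d, %d) exceeds the 512 Mb binning range" % (start, end))


def overlapping_bins(start, end):
    """Yield every bin that may hold an interval overlapping [start, end)."""
    start_bin, end_bin = start >> FIRST_SHIFT, (end - 1) >> FIRST_SHIFT
    for offset in BIN_OFFSETS:
        for b in range(offset + start_bin, offset + end_bin + 1):
            yield b
        start_bin >>= NEXT_SHIFT
        end_bin >>= NEXT_SHIFT


class SimpleBinMap(object):
    """Hypothetical stand-in for BedMap: (chrom, start, end, name) records by bin."""

    def __init__(self):
        self._bins = defaultdict(list)  # (chrom, bin) -> list of records

    def add(self, chrom, start, end, name):
        self._bins[(chrom, bin_from_range(start, end))].append((chrom, start, end, name))

    def overlaps(self, chrom, start, end):
        """Names of stored records overlapping [start, end) on `chrom`."""
        hits = []
        for b in overlapping_bins(start, end):
            for _, s, e, name in self._bins.get((chrom, b), ()):
                if s < end and e > start:  # half-open overlap test
                    hits.append(name)
        return hits
```

A query therefore touches only a handful of bins per level instead of every record on the chromosome; the final coordinate check is still needed because a bin can contain non-overlapping intervals.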

Modules

  • BioReader: A general parser for Bed, Wiggle, Peak, GeneBed and other formats.
  • StringFile: Reads a string as a file.
  • DB: Builds indexes for fast access to biological files.
  • BigWigFile: Fast access to BigWig files.
  • FastaFile: Fast sequence retrieval from large genomes in Fasta format.
  • TwoBitFile: Fast sequence retrieval from large genomes in TwoBit format.
  • wRNA: RNA structure prediction and visualization.
  • Pipeline: Build pipeline using python wrapped shell commands and tools.
  • Utils: Utilities
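BioReader and StringFile are described only by name here. As a hedged illustration of the general idea (a format parser that accepts any file-like object, so a string can be parsed as if it were a file), the Python 3 sketch below reads BED lines with the standard library's io.StringIO. BedRecord, parse_bed_line and read_bed are hypothetical names, not ngslib's API, and ngslib's own Bed class may carry different fields.

```python
import io
from collections import namedtuple

# Hypothetical record type covering the first six BED columns.
BedRecord = namedtuple("BedRecord", "chrom start end name score strand")


def parse_bed_line(line):
    """Parse one tab-separated BED line; columns beyond the sixth are ignored."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) < 3:
        raise ValueError("BED lines need at least 3 columns: %r" % line)
    name = fields[3] if len(fields) > 3 else ""
    score = float(fields[4]) if len(fields) > 4 and fields[4] != "." else 0.0
    strand = fields[5] if len(fields) > 5 else "."
    return BedRecord(fields[0], int(fields[1]), int(fields[2]), name, score, strand)


def read_bed(handle):
    """Yield BedRecords from any file-like object, skipping blank, comment,
    'track' and 'browser' lines."""
    for line in handle:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        yield parse_bed_line(line)


text = "track name=demo\nchr1\t100\t200\tgeneA\t0\t+\nchr2\t5\t50\n"
# io.StringIO lets a plain string be read like a file, the role StringFile plays.
records = list(read_bed(io.StringIO(text)))
```

The same read_bed call works unchanged on an opened BED file, which is the benefit of writing parsers against file-like objects rather than file names.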

Scripts

  • wBedToFasta.py
  • wBedExtend.py
  • wBamToWig.py
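The scripts are listed without their options. As a hedged sketch of the operations the names of wBedExtend.py and wBedToFasta.py suggest (extending BED intervals relative to strand, and extracting interval sequences into FASTA with reverse complementing on the minus strand), here is a pure-Python 3 illustration. It is not the scripts' implementation or command-line interface; all function names are hypothetical, and the genome is assumed to be an in-memory dict of chromosome sequences.

```python
# Complement table for DNA, including N and lower-case (soft-masked) bases.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def extend_interval(start, end, upstream, downstream, strand="+", chrom_size=None):
    """Extend [start, end) by `upstream`/`downstream` bases relative to the strand,
    clipping to [0, chrom_size)."""
    if strand == "-":
        # On the minus strand, upstream (5') lies at higher coordinates.
        upstream, downstream = downstream, upstream
    new_start = max(0, start - upstream)
    new_end = end + downstream
    if chrom_size is not None:
        new_end = min(new_end, chrom_size)
    return new_start, new_end


def bed_to_fasta(genome, records):
    """genome: dict chrom -> sequence; records: (chrom, start, end, name, strand)."""
    lines = []
    for chrom, start, end, name, strand in records:
        seq = genome[chrom][start:end]  # BED is 0-based, half-open: a plain slice
        if strand == "-":
            seq = reverse_complement(seq)
        lines.append(">" + name)
        lines.append(seq)
    return "\n".join(lines)
```

For whole genomes, the FastaFile or TwoBitFile modules listed above would replace the in-memory dict, since they are described as providing fast retrieval from large genomes.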

Main features

  • Fast.
  • Uniform coding style and a universal interface.
  • Simplified.
  • Clarified.

Citation
