PIrANHA version 0.4-alpha-4

Pre-release
@justincbagley justincbagley released this 18 Dec 07:04
· 123 commits to master since this release

Scripts for file processing and analysis in phylogenomics & phylogeography

This is PIrANHA v0.4a4, a software package that provides utility functions and pipelines for file processing and analysis steps in the "phylo*" fields of phylogenomics and phylogeography (including population genomics). PIrANHA is fully command line-based and contains a series of functions for automating tasks during evolutionary analyses of genetic data.

PIrANHA v0.4a4 (=v0.4-alpha-4) is a new development pre-release, or 'minor version', that is greatly improved and ready for alpha testing! Development is ongoing and we need alpha testers, so feel free to download this release and try it out. Email all suggestions, feature requests, and bug fix requests to Justin directly at jbagley (at) jsu (dot) edu (also see the Contact page of the wiki), or submit them using the forms here.

See full description in the PIrANHA README, Quick Guide, and wiki pages.

What's new?

v0.4a4 (v0.4-alpha-4)

This update builds on the previous development release, v0.4a3, by adding minor and major bug fixes, new features and improvements, and new functions.

Phylogenomics

In this release, I have worked to further flesh out contributions of PIrANHA to phylogenomics workflows for analyzing targeted sequence capture data (e.g. from Hyb-Seq) by adding the new function assembleReads, a script that automates de novo assembly of cleaned sequence reads (short reads in FASTQ format) from targeted capture HTS experiments using the ABySS assembler. This is a companion script designed to be run before phaseAlleles and alignAlleles. The overall workflow now assembles HTS read data, and phases and aligns consensus sequences based on reads (re)mapped to a reference assembly FASTA file (i.e. following reference-based assembly). This combination of programs was designed to be run 1) in a custom target capture workflow (“Workflow 1” below) or 2) after first conducting cleaning, assembly, locus selection, and reference-based assembly in the SECAPR sequence capture pipeline (Andermann et al. 2018; “Workflow 2” below, tested using output from SECAPR as input for PIrANHA).

There are two recommended workflows:

Workflow 1 (Recommended, most stable):

  1. Cleaning reads using fastp (see here; or similar software).
  2. Read assembly using assembleReads, followed by sequence phasing (phaseAlleles) and alignment of allelic sequences (alignAlleles) in PIrANHA.
  3. Post-processing and phylogenetic inference.
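As a rough sketch of step 1 of Workflow 1, the snippet below builds a fastp read-cleaning command for each paired-end sample and prints it for inspection instead of running it. The fastp options used (-i/-I for input reads, -o/-O for cleaned output) are standard fastp flags; the sample names and the _R1/_R2 file-naming scheme are hypothetical placeholders. Options for the later assembleReads, phaseAlleles, and alignAlleles steps are not shown here and should be taken from each function's own help text.

```shell
# Build (but do not run) the fastp command for one paired-end sample.
# The _R1/_R2 naming scheme and the sample names are hypothetical.
fastp_cmd() {
    sample="$1"
    printf 'fastp -i %s_R1.fastq.gz -I %s_R2.fastq.gz -o %s_R1.clean.fastq.gz -O %s_R2.clean.fastq.gz\n' \
        "$sample" "$sample" "$sample" "$sample"
}

# Dry run: print one cleaning command per sample so it can be checked first.
for sample in sampleA sampleB; do
    fastp_cmd "$sample"
done
```

Once the printed commands look right, the loop output can be piped to sh to run the cleaning step before moving on to assembleReads.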

Workflow 2:

  1. Read cleaning, assembly, locus selection, and reference-based assembly, all performed with SECAPR (Andermann et al. 2018).
  2. Sequence phasing (phaseAlleles) and alignment of allelic sequences (alignAlleles) in PIrANHA.
  3. Post-processing and phylogenetic inference.

New features

  • Tab completion. The most important new feature added in this release is dynamic tab completion of function names after piranha -f (e.g. piranha -f <TAB>). See the GitHub repository README for a cool demonstration of this feature!!
  • Simplified Homebrew install (updated formula)
  • New single install_piranha installer script replaces previous system using two separate installer scripts.
  • New handling of large alignment files keeps dropRandomHap function from dying, while still reducing alignments to one phased allele per sample.
  • Added -t option for specifying number of threads when running batchRunFolders.
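For readers curious how tab completion after piranha -f can work in principle, here is a minimal sketch using bash programmable completion. It is not PIrANHA's actual implementation: the completion function name is made up, and the candidate list is hard-coded from function names mentioned in these notes, whereas the release describes its completion as dynamic.

```shell
#!/usr/bin/env bash
# Sketch only: offer function names when the previous word is "-f".
# _piranha_sketch_complete is a hypothetical name, not part of PIrANHA.
_piranha_sketch_complete() {
    local cur prev funcs
    cur="${COMP_WORDS[COMP_CWORD]}"       # word currently being typed
    prev="${COMP_WORDS[COMP_CWORD-1]}"    # word just before it
    # Hard-coded here for illustration; a dynamic version would discover these.
    funcs="assembleReads phaseAlleles alignAlleles dropRandomHap batchRunFolders trimSeqs"
    COMPREPLY=()
    if [[ "$prev" == "-f" ]]; then
        # compgen -W filters the word list down to entries starting with $cur.
        COMPREPLY=( $(compgen -W "$funcs" -- "$cur") )
    fi
}

# Register the sketch for the piranha command in the current shell.
complete -F _piranha_sketch_complete piranha
```

After sourcing this in an interactive bash session, typing piranha -f ph and pressing TAB would complete to phaseAlleles under this sketch.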

Bug fixes

  • Fixed version printing for the piranha main script and functions: piranha -V, piranha --version, piranha -f <function> -V, and piranha -f <function> --version each now yield the expected behavior (terse output).
  • Bug fixes for bad piping or other minor errors in batchRunFolders, FASTAsummary, and splitFile functions.
  • Bug fixes and updates for the assembleReads and phaseAlleles functions of piranha, fixing errors that caused the program to stop due to issues with, among other things, ls.
  • Bug fix for PHYLIP2NEXUS, whose regex test for hexadecimal characters (if any were produced) in the resulting output NEXUS files was failing. The problem was solved with a POSIX-compliant solution.
  • Bug fixes for FASTA2PHYLIP function, which in aggregate now completely fix previous issues with the single-FASTA, -f 1 option.
  • Updated trimSeqs function to improve performance after issue discussion with Juan Moreira. This update fixed a POSIX character-class bug: [:space:] should have been [[:space:]].
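The [:space:] versus [[:space:]] distinction in the last item is easy to trip over, so here is a small illustration. In a regular expression, a POSIX character class is only valid inside a bracket expression, hence the doubled brackets; a bare [:space:] is either read as a set of the literal characters ':', 's', 'p', 'a', 'c', and 'e' or rejected outright, depending on the tool. The helper name strip_whitespace is hypothetical and is not taken from trimSeqs.

```shell
# Remove all whitespace (spaces, tabs, etc.) from each input line.
# Note the doubled brackets: [[:space:]] is a bracket expression that
# contains the POSIX class [:space:].
strip_whitespace() {
    sed 's/[[:space:]]//g'
}

# Example: a sequence line containing a stray space and a tab.
printf 'ACGT \tACGT\n' | strip_whitespace
```

Run as written, the example prints ACGTACGT.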