PIrANHA version 0.4-alpha-4
Pre-release
Scripts for file processing and analysis in phylogenomics & phylogeography
This is PIrANHA v0.4a4, a software package that provides utility functions and pipelines for file processing and analysis steps in the phylo* fields of phylogenomics and phylogeography (including population genomics). PIrANHA is fully command-line based and contains a series of functions for automating tasks during evolutionary analyses of genetic data.
PIrANHA v0.4a4 (=v0.4-alpha-4) is a new development pre-release, or 'minor version', that is greatly improved and ready for alpha testing! Development is ongoing and we need alpha testers, so feel free to download this release and try it out. Email all suggestions, feature requests, and bug-fix requests to Justin directly at jbagley (at) jsu (dot) edu (also see the Contact page of the wiki) using forms here.
See full description in the PIrANHA README, Quick Guide, and wiki pages.
What's new?
v0.4a4 (v0.4-alpha-4)
This update builds on the previous development release, v0.4a3, by adding minor and major bug fixes, new features and improvements, and new functions.
Phylogenomics
In this release, I have worked to further flesh out contributions of PIrANHA to phylogenomics workflows for analyzing targeted sequence capture data (e.g. from Hyb-Seq) by adding the new function assembleReads, a script that automates de novo assembly of cleaned sequence reads (short reads in FASTQ format) from targeted capture HTS experiments using the ABySS assembler. This is a companion script designed to be run before phaseAlleles and alignAlleles. The overall workflow now assembles HTS read data, and phases and aligns consensus sequences based on reads (re)mapped to a reference assembly FASTA file (i.e. following reference-based assembly). This combination of programs was designed to be run 1) in a custom target capture workflow (“Workflow 1” below) or 2) after first conducting cleaning, assembly, locus selection, and reference-based assembly in the SECAPR sequence capture pipeline (Andermann et al. 2018; “Workflow 2” below, tested using output from SECAPR as input for PIrANHA).
There are two recommended workflows:
Workflow 1 (Recommended, most stable):
- Cleaning reads using fastp (see here; or similar software).
- Read assembly using assembleReads, followed by sequence phasing (phaseAlleles) and alignment of allelic sequences (alignAlleles) in PIrANHA.
- Post-processing and phylogenetic inference.
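Workflow 1 could be scripted roughly as follows. This is a dry-run sketch under stated assumptions: the sample name and FASTQ file names are hypothetical, the run helper only prints each command instead of executing it, and the options specific to each PIrANHA function are deliberately omitted because they are not given here (consult the README and wiki for them). The fastp flags -i/-I and -o/-O are its paired-end input and output options.

```shell
# Dry-run helper (hypothetical): print the command instead of executing it.
run() { printf '%s\n' "$*"; }

sample="sample1"   # hypothetical sample name

# Step 1: clean paired-end reads with fastp.
run fastp \
  -i "${sample}_R1.fastq.gz" -I "${sample}_R2.fastq.gz" \
  -o "${sample}_clean_R1.fastq.gz" -O "${sample}_clean_R2.fastq.gz"

# Steps 2-4: assembly, phasing, and alignment, each called through
# piranha -f. Function-specific options are intentionally left out.
for fn in assembleReads phaseAlleles alignAlleles; do
  run piranha -f "$fn"
done
```

Replacing the body of run with "$@" would execute the commands for real, once the function options required by each step have been filled in.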
Workflow 2:
- Read cleaning, assembly, locus selection, and reference-based assembly, specifically as performed with SECAPR (Andermann et al. 2018).
- Sequence phasing (phaseAlleles) and alignment of allelic sequences (alignAlleles) in PIrANHA.
- Post-processing and phylogenetic inference.
New features
- Tab completion. The most important new feature added in this release is dynamic tab completion of function names after piranha -f (e.g. piranha -f <TAB>). See the GitHub repository README for a cool demonstration of this feature!
- Simplified Homebrew install (updated formula).
- A new single install_piranha installer script replaces the previous system of two separate installer scripts.
- New handling of large alignment files keeps the dropRandomHap function from dying, while still reducing alignments to one phased allele per sample.
- Added the -t option for specifying the number of threads when running batchRunFolders.
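For readers unfamiliar with how tab completion after piranha -f can work, here is a minimal Bash sketch of the general technique (programmable completion with complete and compgen). It is not PIrANHA's actual completion code: the handler name _piranha_complete and the static function list are assumptions made for illustration, whereas the release describes the real completion as dynamic.

```shell
#!/usr/bin/env bash
# Hypothetical, static subset of function names. A real implementation
# would build this list dynamically.
_piranha_functions="alignAlleles assembleReads batchRunFolders dropRandomHap phaseAlleles trimSeqs"

# Completion handler: offer function names only when the previous word is -f.
_piranha_complete() {
  local cur prev
  cur="${COMP_WORDS[COMP_CWORD]}"
  prev="${COMP_WORDS[COMP_CWORD-1]}"
  COMPREPLY=()
  if [[ "$prev" == "-f" ]]; then
    # compgen -W keeps only the words that start with the current prefix.
    COMPREPLY=( $(compgen -W "$_piranha_functions" -- "$cur") )
  fi
}

# Register the handler for the piranha command.
complete -F _piranha_complete piranha
```

Sourcing a file like this from a Bash startup file would make piranha -f <TAB> list the names above, and piranha -f a<TAB> narrow them to those beginning with "a".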
Bug fixes
- Fixed version printing for the piranha main script and functions: piranha -V, piranha --version, piranha -f <function> -V, and piranha -f <function> --version each now yield the expected behavior (terse output).
- Bug fixes for bad piping or other minor errors in the batchRunFolders, FASTAsummary, and splitFile functions.
- Bug fixes and updates for the assembleReads and phaseAlleles functions of piranha, fixing errors that caused the program to stop due to issues with, among other things, ls.
- Bug fix for PHYLIP2NEXUS, which was failing a regex test for hexadecimal characters, if produced, in the resulting (output) NEXUS files. The problem was solved with a POSIX solution.
- Bug fixes for the FASTA2PHYLIP function, which in aggregate now completely fix previous issues with the single-FASTA (-f 1) option.
- Updated the trimSeqs function to improve performance after issue discussion with Juan Moreira. This update fixed a POSIX space bug: [:space:] should have been [[:space:]].