Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Merge pull request #8 from eslerm/master
PR for 1.4dev1 Bioconda recipe and Galaxy
Showing 17 changed files with 1,306 additions and 613 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,100 +1,131 @@ | ||
# VKMZ version 1.3.1 | ||
# vkmz v1.4dev1 | ||
|
||
VKMZ is a metabolomics prediction and visualization tool which creates van Krevelen diagrams from mass spectrometry data. A van Krevelen diagram (VKD) plots a molecule on a 2D scatterplot based on the molecule's oxygen to carbon ratio (O:C) against its hydrogen to carbon ratio (H:C). Classes of metabolites cluster together on a VKD [0]. Plotting a complex mixture of metabolites on a VKD can be used to briefly convey untargeted metabolomics data. | ||
vkmz predicts molecular formulas by searching a known mass-formula dictionary | ||
for each feature observed by a mass spectrometer. Elemental ratios of predicted features | ||
are calculated to create the oxygen-to-carbon and hydrogen-to-carbon axes of a | ||
van Krevelen diagram (VKD). VKDs are a convenient visualization tool for | ||
briefly conveying the constituents of a complex MS mixture (e.g., untargeted | ||
plant metabolomics). As output, predicted features are saved to a tabular file, | ||
an interactive VKD web page, and other optional formats. | ||
|
||
VKMZ attempts to predict a molecular formula for each feature in LC-MS data. Each feature's mass is compared to a database of known formula masses. A prediction is made when a known mass is within the mass error range of a feature's uncharged (neutral) mass. A binary search algorithm is used to quickly make matches. Heuristically generated databases for labeled and unlabeled metabolites are included [1]. VKMZ finds all predictions for an observed mass within the mass error. The prediction with the lowest delta (absolute difference between a feature's neutral mass and the predicted mass) is plotted. Features without predictions are discarded. Output is saved as a tabular file and an HTML file. | ||
## Installation | ||
|
||
This software works best with accurate, high resolution LC-MS data. A well calibrated LC-MS is essential for correct predictions. It is best to empirically derive the mass error either from the data or from data using the same methods and spiked standards. Using low resolution data will result in false positive predictions, especially for large mass metabolites. | ||
vkmz requires Python version 3.6 or greater. | ||
|
||
VKMZ can be used as a command line tool or on the Galaxy web platform [2]. A Galaxy wrapper for VKMZ is maintained in this repository. VKMZ was developed on the Workflow4Metabolomics version of Galaxy [3]. | ||
### setup.py | ||
|
||
## Using VKMZ command line | ||
|
||
### Input modes | ||
|
||
VKMZ has two input modes: | ||
1. `xcms` mode reads features from XCMS data | ||
2. `tsv` mode reads a specially formatted tabular file | ||
|
||
Select a mode by declaring it as the first argument to `vkmz.py`. | ||
|
||
> **Example:** | ||
> ``` | ||
> python vkmz.py xcms [other parameters] | ||
> ``` | ||
Different modes allow different parameters. | ||
### Required parameters | ||
#### xcms mode | ||
xcms mode requires three tabular files generated by XCMS: | ||
* `--data-matrix [XCMS_DATA_MATRIX_FILE]` | ||
* `--sample-metadata [XCMS_SAMPLE_METADATAFILE]` | ||
* `--variable-metadata [XCMS_VARIABLE_METADATAFILE]` | ||
##### xcms mode example: | ||
Clone or otherwise download the repo and go to the root directory. Install vkmz like any other | ||
Python package: | ||
``` | ||
python vkmz.py xcms --data-matrix test-data/datamatrix.tabular --sample-metadata test-data/sampleMetadata.tabular --variable-metadata test-data/variableMetadata.tabular [other parameters] | ||
python3 setup.py install | ||
``` | ||
|
||
#### tsv mode | ||
### Bioconda | ||
|
||
Version 1.4dev1 has a [conda recipe](https://github.com/bioconda/bioconda-recipes/tree/master/recipes/vkmz). | ||
|
||
tsv mode requires a tabular file of a specific format as input: | ||
* `--input [TSV_FILE]` | ||
### Galaxy | ||
|
||
The first five columns of the input tabular file must be: | ||
>| sample_id | polarity | mz | rt | intensity | | ||
>|-----------|----------|----|----|-----------| | ||
Version 1.4dev1 has a [Galaxy wrapper](https://toolshed.g2.bx.psu.edu/view/eslerm/vkmz/). | ||
|
||
## Input Data | ||
|
||
#### All modes | ||
vkmz can parse either a tabular file or Workflow4Metabolomics' XCMS tabular files as input. | ||
|
||
Mass error of LC-MS in parts-per-million: | ||
* `--error [PPM_ERROR_NUMBER]` | ||
* It is critical to set the mass error correctly | ||
Input MS data can be given in one of two "modes": (1) tabular or (2) Workflow4Metabolomics' | ||
XCMS for Galaxy (W4M-XCMS) files. | ||
|
||
Output name: | ||
* `--output [FILENAME]` | ||
* A `.tsv` and `.html` file will be generated by VKMZ with the given filename | ||
Tabular mode requires a single tabular file as input and must include the columns | ||
"sample_name", "polarity", "mz", "rt", and "intensity". Each row represents a | ||
feature. Optionally a "charge" column can exist. | ||
|
||
### Optional parameters | ||
W4M-XCMS mode requires the sample metadata, variable metadata, and data matrix | ||
files generated with W4M-XCMS. Feature charge information can be read from the | ||
variable metadata file if it has been annotated with CAMERA. | ||
|
||
Database: | ||
* `--database [DATABASE_FILE_PATH]` | ||
* Default is BMRB's monoisotopic heuristically generated database | ||
* Path is relative to `--directory` | ||
Polarity values should be either "positive" or "negative". | ||
|
||
Directory: | ||
* `--directory [TOOL_PATH]` | ||
* Explicitly define tool directory | ||
* Paths are relative if unset | ||
* Affects database and web page template paths | ||
If feature charge information is present, features without charge information | ||
will be removed. If CAMERA annotation is present, only monoisotopic features | ||
will be kept. An argument flag (`--impute-charge`) can be set to disable removing | ||
features without charge annotation. Users should be wary of false results when | ||
using this non-default option. **Currently W4M-XCMS mode imputes a charge of one | ||
for all annotated charges. This will be addressed in v1.4.** | ||
|
||
Forced Polarity: | ||
* `--polarity [positive|negative]` | ||
* Set all features to have either a positive or negative polarity | ||
* Overrides input files polarity information | ||
* Do not use this parameter on data containing both polarities | ||
## Output | ||
|
||
Neutral: | ||
* `--neutral` | ||
* Using this flag disables charged mass adjustment | ||
* Without this flag VKMZ adjusts a feature's mass by adding or removing the mass of a proton based on the feature's polarity | ||
vkmz always outputs tabular and html files. Optionally, vkmz can output JSON | ||
and SQL as well. | ||
|
||
Unique: | ||
* `--unique` | ||
* Remove features with multiple predictions from output | ||
## Command Line Interface | ||
|
||
## Special thanks to | ||
### Quick start | ||
|
||
Adrian, Art, Eric, Jerry, Kevin, Renata, Stephen, Tim, and Yuan. | ||
``` | ||
vkmz tabular --input test-data/tabular.tabular --output foo --error 10 | ||
vkmz xcms -xd test-data/datamatrix.tabular -xv test-data/variableMetadata.tabular -xs test-data/sampleMetadata.tabular -o foo -e 10 --impute | ||
``` | ||
|
||
## Citations | ||
### All Arguments | ||
|
||
0. Brockman et al. [doi:10.1007/s11306-018-1343-y](https://doi.org/10.1007/s11306-018-1343-y) | ||
1. Hegeman et al. [doi:10.1021/ac070346t](https://doi.org/10.1021/ac070346t) | ||
2. [Galaxy Project](https://galaxyproject.org/) | ||
3. [Workflow4Metabolomics](http://workflow4metabolomics.org/) | ||
4. Smith et al. [doi:10.1021/ac051437y](https://www.ncbi.nlm.nih.gov/pubmed/16448051) | ||
vkmz runs in two modes: "tabular" and "xcms". xcms refers to Workflow4Metabolomics' | ||
XCMS for Galaxy. | ||
|
||
``` | ||
tabular mode arguments: | ||
usage: vkmz tabular [-h] --input INPUT --output [OUTPUT] --error [ERROR] | ||
[--json] [--sql] [--metadata] [--database [DATABASE]] | ||
[--prefix [PREFIX]] [--polarity {positive,negative}] | ||
[--neutral] [--alternate] [--impute-charge] | ||
required arguments: | ||
--input [INPUT], -i [INPUT] | ||
Path to tabular file | ||
optional arguments: | ||
--help, -h tabular mode help message and exit | ||
xcms mode arguments: | ||
usage: vkmz xcms [-h] --data-matrix [DATA_MATRIX] --sample-metadata | ||
[SAMPLE_METADATA] --variable-metadata [VARIABLE_METADATA] | ||
--output [OUTPUT] --error [ERROR] [--json] [--sql] | ||
[--metadata] [--database [DATABASE]] [--prefix [PREFIX]] | ||
[--polarity {positive,negative}] [--neutral] [--alternate] | ||
[--impute-charge] | ||
required arguments: | ||
--data-matrix [DATA_MATRIX], -xd [DATA_MATRIX] | ||
Path to XCMS data matrix file | ||
--sample-metadata [SAMPLE_METADATA], -xs [SAMPLE_METADATA] | ||
Path to XCMS sample metadata file | ||
--variable-metadata [VARIABLE_METADATA], -xv [VARIABLE_METADATA] | ||
Path to XCMS variable metadata file | ||
optional arguments: | ||
--help, -h xcms mode help message and exit | ||
mode shared arguments: | ||
required arguments: | ||
--output [OUTPUT], -o [OUTPUT] | ||
Specify output file path. | ||
--error [ERROR], -e [ERROR] | ||
Mass error of MS data in parts-per-million. | ||
optional arguments: | ||
--json, -j Set JSON flag to save JSON output | ||
--sql, -s Set SQL flag to save SQL output | ||
--metadata, -m Set metadata flag to save argument metadata | ||
--database [DATABASE], -db [DATABASE] | ||
Define path to custom database of known formula-mass | ||
pairs | ||
--prefix [PREFIX] Define path prefix to support files ("d3.html" and | ||
database directory | ||
--polarity {positive,negative}, -p {positive,negative} | ||
Set flag to force polarity of all features to positive | ||
or negative | ||
--neutral, -n Set flag if input data contains neutral feature mass | ||
instead of mz | ||
--alternate, -a Set flag to keep features with multiple predictions | ||
--impute-charge, --impute | ||
Set flag to impute "1" for missing charge information | ||
``` |
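
The README text in this diff describes the prediction step only in prose: adjust a feature's mz to a neutral mass, binary-search a sorted database for known masses within the ppm error, keep the match with the lowest delta, and derive the H:C and O:C ratios for plotting. The following is a minimal sketch of that idea, not vkmz's actual implementation. The names `neutral_mass`, `predict`, `vk_ratios`, `DATABASE`, and `PROTON_MASS` are hypothetical, the three database entries are illustrative values, and a single charge is assumed for every feature.

```python
from bisect import bisect_left, bisect_right

# Approximate proton mass in daltons (assumed constant for this sketch).
PROTON_MASS = 1.007276

# Hypothetical mini database sorted by monoisotopic mass:
# (mass, formula, element counts). Values are illustrative only.
DATABASE = [
    (18.010565, "H2O", {"H": 2, "O": 1}),
    (180.063388, "C6H12O6", {"C": 6, "H": 12, "O": 6}),
    (342.116212, "C12H22O11", {"C": 12, "H": 22, "O": 11}),
]
MASSES = [entry[0] for entry in DATABASE]


def neutral_mass(mz, polarity):
    """Remove or add one proton mass, assuming a singly charged feature."""
    if polarity == "positive":
        return mz - PROTON_MASS
    if polarity == "negative":
        return mz + PROTON_MASS
    raise ValueError("polarity must be 'positive' or 'negative'")


def predict(neutral, ppm_error):
    """Return (formula, counts, delta) for every hit, lowest delta first."""
    window = neutral * ppm_error / 1e6
    # Binary search for the slice of masses inside the error window.
    lo = bisect_left(MASSES, neutral - window)
    hi = bisect_right(MASSES, neutral + window)
    hits = [(abs(MASSES[i] - neutral), DATABASE[i]) for i in range(lo, hi)]
    hits.sort(key=lambda hit: hit[0])
    return [(entry[1], entry[2], delta) for delta, entry in hits]


def vk_ratios(counts):
    """Return (H:C, O:C), or None when the formula contains no carbon."""
    carbon = counts.get("C", 0)
    if carbon == 0:
        return None
    return counts.get("H", 0) / carbon, counts.get("O", 0) / carbon


if __name__ == "__main__":
    mass = neutral_mass(181.070664, "positive")
    for formula, counts, delta in predict(mass, ppm_error=10):
        print(formula, vk_ratios(counts), delta)
```

Because the window scales with the observed mass, the same ppm setting admits more candidates for heavier features, which is one reason the old README warns about false positives for large metabolites.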
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,14 @@ | ||
import setuptools | ||
|
||
setuptools.setup( | ||
name="vkmz", | ||
version="1.4dev0", | ||
python_requires=">=3.6", | ||
description="metabolomics formula prediction and van Krevelen diagram generation", | ||
author="Mark Esler", | ||
author_email="[email protected]", | ||
url="https://github.com/HegemanLab/VKMZ", | ||
packages=setuptools.find_packages(), | ||
entry_points={"console_scripts": ["vkmz = vkmz.__main__:main"]}, | ||
package_data={"vkmz": ["d3.html", "databases/*"]}, | ||
) |
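
The `console_scripts` entry above points the `vkmz` command at `vkmz.__main__:main`, and the README usage text lists two modes with a shared set of flags. As an illustration only, here is one way such a command line could be wired with `argparse` subcommands. The function name `build_parser` and the `float` type for `--error` are assumptions, and the real `vkmz.__main__` in this commit may be organized differently. The option strings are copied from the usage text in the README diff.

```python
import argparse


def build_parser():
    """Build a two-mode parser mirroring the README usage text (sketch)."""
    # Arguments shared by both modes live on a parent parser.
    shared = argparse.ArgumentParser(add_help=False)
    shared.add_argument("--output", "-o", required=True)
    shared.add_argument("--error", "-e", type=float, required=True)
    shared.add_argument("--json", "-j", action="store_true")
    shared.add_argument("--sql", "-s", action="store_true")
    shared.add_argument("--metadata", "-m", action="store_true")
    shared.add_argument("--database", "-db")
    shared.add_argument("--prefix")
    shared.add_argument("--polarity", "-p", choices=["positive", "negative"])
    shared.add_argument("--neutral", "-n", action="store_true")
    shared.add_argument("--alternate", "-a", action="store_true")
    shared.add_argument("--impute-charge", "--impute", action="store_true")

    parser = argparse.ArgumentParser(prog="vkmz")
    modes = parser.add_subparsers(dest="mode")
    modes.required = True  # set as an attribute to stay compatible with Python 3.6

    tabular = modes.add_parser("tabular", parents=[shared])
    tabular.add_argument("--input", "-i", required=True)

    xcms = modes.add_parser("xcms", parents=[shared])
    xcms.add_argument("--data-matrix", "-xd", required=True)
    xcms.add_argument("--sample-metadata", "-xs", required=True)
    xcms.add_argument("--variable-metadata", "-xv", required=True)
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```

Putting the shared flags on a parent parser keeps the two modes in sync, so a flag added for `tabular` is automatically available to `xcms`.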