-
Notifications
You must be signed in to change notification settings - Fork 6
/
Copy pathREADME
298 lines (207 loc) · 9.42 KB
/
README
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
# Bio::ViennaNGS 0.19.2
Bio::ViennaNGS is a distribution of Perl modules and utilities for
building efficient Next-Generation Sequencing (NGS) analysis
pipelines. It covers various aspects of NGS data analysis, including
(but not limited to) conversion of sequence annotation, evaluation of
mapped data, expression quantification and visualization.
Bio::ViennaNGS is shipped with a complementary set of modules, classes and
roles:
- Bio::ViennaNGS::AnnoC
A Moose interface for storage and conversion of sequence annotation
data.
- Bio::ViennaNGS::Bam
Routines for high-level manipulation of BAM files.
- Bio::ViennaNGS::Bed
A Moose interface for manipulation of genomic interval data in BED
format.
- Bio::ViennaNGS::BedGraphEntry
A Moose interface for storing genomic cvoerage and interval data in
bedGraph format.
- Bio::ViennaNGS::Expression
A Moose interface for computation of normalized gene / transcript
expression based on read counts.
- Bio::ViennaNGS::ExtFeature
A Moose wrapper for extended BED6 elements.
- Bio::ViennaNGS::Fasta
Routines for accessing genomic sequences implemented through a Moose
interface to Bio::DB::Fasta.
- Bio::ViennaNGS::Feature
A Moose-based BED6 wrapper.
- Bio::ViennaNGS::FeatureChain
Yet another Moose class for chaining gene annotation features.
- Bio::ViennaNGS::FeatureInterval
A Moose interface for handling elementary genomic intervals,
corresponding to BED3.
- Bio::ViennaNGS::FeatureIO
A Moose interface for easy input/output operations on common feature
annotation file formats.
- Bio::ViennaNGS::FeatureLine
An abstract Moose class for combining several
Bio::ViennaNGS::FeatureChain objects.
- Bio::ViennaNGS::MinimalFeature
A Moose interface for handling elementary gene annotation, corresponding
to BED4.
- Bio::ViennaNGS::Peak
A Moose interface for identification and characterization of (coverage /
expression) peaks in RNA-seq data.
- Bio::ViennaNGS::SpliceJunc
A collection of routines for alternative splicing analysis.
- Bio::ViennaNGS::Subtypes
Moose subtypes for internal usage.
- Bio::ViennaNGS::Tutorial
A comprehensive tutorial of the Bio::ViennaNGS core routines with
real-world NGS data.
- Bio::ViennaNGS::UCSC
Routines for visualization of genomics data with the UCSC genome
browser.
- Bio::ViennaNGS::Util
A collection of wrapper routines for commonly used third-party NGS
utilities as well as a set of utility functions.
## UTILITIES
In addition, Bio::ViennaNGS comes with a collection of utility
programs for accomplishing routine tasks often required in NGS data
processing. These utilities serve as reference implementation of the
routines implemented in the (sub)modules:
assembly_hub_constructor.pl:
The UCSC genome browser offers the possibility to visualize any
organism (including organisms that are not included in the standard
UCSC browser bundle) through hso called 'Assembly Hubs'. This script
constructs Assembly Hubs from genomic sequence and annotation data.
bam_split.pl:
Split (paired-end and single-end) BAM alignment files by strand and compute
statistics. Optionally create BED output, as well as normalized bedGraph
and bigWig files for coverage visualization in genome browsers (see
dependencies on third-patry tools below).
bam_to_bigWig.pl:
Produce bigWig coverage profiles from (aligned) BAM files, explicitly
considering strandedness. The most natural use case of this tool is to
create strand-aware coverage profiles in bigWig format for genome browser
visualization.
bam_uniq.pl:
Extract unique and multi mapping reads from BAM alignment files and create
a separate BAM file for both uniqe (.uniq.) and multi (.mult.) mappers.
bed2bedGraph.pl:
Convert BED files to (strand specific) bedGraph files, allowing
additional annotation and automatic generation of bedGraph files which can
easily be converted to big-type files for easy UCSC visualization.
bed2nt2aa.pl:
Extract sequence intervals from Fasta files, provided as BED6
records. Optionally translate nucleotide into amino acid sequences.
bed62bed12.pl:
A simple BED6 -> BED12 converter
fasta_multigrep.pl:
Extract individual (whole) sequences from a multi Fasta file
fasta_subgrep.pl:
Extract subsequences from (multi) Fasta files.
fasta_regex.pl:
Extract sequence motifs from (multi) Fasta files provided as regular
expression.
gff2bed.pl:
Convert RefSeq GFF3 annotation files to BED12 format. Individual BED12
files are created for each feature type (CDS/tRNA/rRNA/etc.). Tested with
RefSeq bacterial GFF3 annotation.
kmer_analysis.pl:
Count k-mers of predefined length in FastQ and Fasta files
newUCSCdb.pl:
Create a new genome database to a locally installed instance of the UCSC
genome browser in order to add a novel organism for visualization.
normalize_multicov.pl:
Compute normalized expression data in TPM and RPKM from (raw) read counts
in bedtools multicov format. TPM reference: Wagner et al, Theory
Biosci. 131(4), pp 281-85 (2012).
rnaseq_peakfinder.pl:
Find and characterize peaks/enriched regions of certain size and coverage
in RNA-seq data.
sj_visualizer.pl:
Convert splice junctions from mapped RNA-seq data in segemehl BED6 splice
junction format to BED12 for easy visualization in genome browsers.
splice_site_summary.pl:
Identify and characterize splice junctions from RNA-seq data by
intersecting them with annotated splice junctions.
track_hub_constructor.pl:
Analogous to assembly_hub_constructor.pl, construct a Track Hub for an
organism listed in the UCSC Genome Browser.
trim_fastq.pl:
Trim sequence and quality string fields in a Fastq file by user
defined length.
## TUTORIAL
See Bio::ViennaNGS::Tutorial
## INSTALLATION
We have compiled a comprehensive `ViennaNGS Installation HOWTO` which
covers the setup of third party and Perl dependencies in detail:
http://rna-seq.at/blog/2015/03/03/viennangs-installation-howto/
If you have already installed all dependencies, the ViennaNGS distribution
can be installed with the following commands:
perl Makefile.PL
make
make test
make install
However, we highly recommend using *cpanminus* for installing ViennaNGS. If
you have cpanm available on your system, simply type
cpanm Bio::ViennaNGS
This command will grab the lastest ViennaNGS release from CPAN together
with all Perl dependencies and install the suite automatically (provided
you have samtools <= 0.1.19 installed, see the HOWTO link above).
## SUPPORT AND DOCUMENTATION
After installing, you can find documentation for this module with the
perldoc command.
perldoc Bio::ViennaNGS
You can also look for information at:
RT, CPAN's request tracker (report bugs here)
http://rt.cpan.org/NoAuth/Bugs.html?Dist=Bio-ViennaNGS
AnnoCPAN, Annotated CPAN documentation
http://annocpan.org/dist/Bio-ViennaNGS
CPAN Ratings
http://cpanratings.perl.org/d/Bio-ViennaNGS
Search CPAN
http://search.cpan.org/dist/Bio-ViennaNGS
## DEPENDENCIES
Bio::ViennaNGS depends on a set of third-party tools and libraries which
are required for specific filtering and file format conversion tasks as
well as for building internally used Perl modules:
* bedtools2 >= 2.17 (https://github.com/arq5x/bedtools2)
* bedGraphToBigWig, fetchChromSizes, faToTwoBit from the UCSC Genome
Browser applications (http://hgdownload.cse.ucsc.edu/admin/exe/)
* the R Statistics software (http://www.r-project.org/)
* samtools <= v0.1.19 for building Bio::DB::Sam. Please note that more
recent HTSlib-based versions of samtools will not work with Bio::DB::Sam
Please ensure that all third-party utilities are available on your system
and accessible to the Perl interpreter.
Moreover, Bio::ViennaNGS depends on a ste of Perl packages that are not
shipped with the Perl core distribution, most notably Bio::Perl >=
1.00690001 and Bio::DB::Sam >= 1.37.
## SOURCE AVAILABILITY
This source is available on Github:
https://github.com/mtw/Bio-ViennaNGS
## VIENNANGS PAPER
If the Bio::ViennaNGS suite is useful for your work or if you use any
component of Bio::ViennaNGS in a custom pipeline, please cite the following
publication:
"ViennaNGS - A toolbox for building efficient next-generation sequencing
analysis pipelines"
Michael T. Wolfinger, Joerg Fallmann, Florian Eggenhofer and Fabian Amman
F1000Research 4:50 (2015)
DOI: http://dx.doi.org/10.12688/f1000research.6157.2
## NOTES
The Bio::ViennaNGS suite is actively developed and tested on different
flavours of Linux and Mac OS X. We have taken care of writing
platform-independent code that should run out of the box on most UNIX-based
systems, however we do not have access to machines running Microsoft
Windows. As such Microsoft Windows compatibility is not tested.
## BUGS
Please report bugs through the Github issue tracker at
https://github.com/mtw/Bio-ViennaNGS/issues
## AUTHORS
Michael T. Wolfinger <[email protected]>
Joerg Fallmann <[email protected]>
Florian Eggenhofer <[email protected]>
Fabian Amman <[email protected]>
## COPYRIGHT AND LICENCE
Copyright (C) 2014-2017 Michael T. Wolfinger <[email protected]>
This library is free software; you can redistribute it and/or modify it
under the same terms as Perl itself, either Perl version 5.10.0 or, at your
option, any later version of Perl 5 you may have available.
This program is distributed in the hope that it will be useful, but WITHOUT
ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
more details.