1. Update training case study to use v0.10.0 and simplify the doc.
2. Update all scripts to v0.10.0.
3. Add Singularity instructions in Quick Start.
4. Updated deepvariant-vcf-stats-report.md and images to v0.10.0.
5. Update trio merging. (tedyun@: can you review trio-merge-case-study.md?)

PiperOrigin-RevId: 302530370
pichuan committed Mar 26, 2020
1 parent f7955ff commit e271f74
Showing 24 changed files with 262 additions and 306 deletions.
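
A version bump like this one is easy to leave incomplete, so a reviewer may want to look for leftover references to the previous release. The following is a minimal sketch and not part of the DeepVariant repository: the function name `find_stale_version` and the demo files are hypothetical, and the old version string is passed in as an argument.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the DeepVariant repo): list files under a
# directory that still mention an old version string after a version bump.
find_stale_version() {
  local old_version="$1"   # e.g. "0.9.0"
  local root="${2:-.}"     # directory to search; defaults to the current one
  # -r: recurse, -l: print only file names, -F: match the string literally
  # (so the dots are not treated as regex wildcards).
  # "|| true" keeps the function from failing when nothing matches.
  grep -rlF -- "${old_version}" "${root}" || true
}

# Demo on a throwaway directory so the sketch runs without a checkout.
demo_dir="$(mktemp -d)"
printf 'BIN_VERSION="0.9.0"\n'  > "${demo_dir}/stale.sh"
printf 'BIN_VERSION="0.10.0"\n' > "${demo_dir}/fresh.sh"
find_stale_version "0.9.0" "${demo_dir}"   # lists only stale.sh
rm -rf "${demo_dir}"
```

In a real checkout the call would be something like `find_stale_version "0.9.0" docs/`. Matches in historical notes or changelogs may be intentional, so the output is a list to review, not a list of errors.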
12 changes: 8 additions & 4 deletions README.md
@@ -1,6 +1,6 @@
# DeepVariant

[![release](https://img.shields.io/badge/release-v0.9.0-green?logo=github)](https://github.com/google/deepvariant/releases)
[![release](https://img.shields.io/badge/release-v0.10.0-green?logo=github)](https://github.com/google/deepvariant/releases)
[![announcements](https://img.shields.io/badge/announcements-blue)](https://groups.google.com/d/forum/deepvariant-announcements)
[![blog](https://img.shields.io/badge/blog-orange)](https://goo.gl/deepvariant)

@@ -16,7 +16,7 @@ designed for painless integration with the
We recommend using our Docker solution. The command will look like this:

```
BIN_VERSION="0.9.0"
BIN_VERSION="0.10.0"
sudo docker run \
-v "YOUR_INPUT_DIR":"/input" \
-v "YOUR_OUTPUT_DIR:/output" \
@@ -30,11 +30,15 @@ sudo docker run \
--num_shards=$(nproc) **This will use all your cores to run make_examples. Feel free to change.**
```

For more information, see:
If you're using GPUs, or want to use Singularity instead, see
[Quick Start](docs/deepvariant-quick-start.md) for more details.

For more information, also see:

* [Quick Start](docs/deepvariant-quick-start.md)
* [Full documentation list](docs/README.md)
* [Best practices for multi-sample variant calling with DeepVariant](docs/trio-merge-case-study.md)
* [(Advanced) Training tutorial](docs/deepvariant-training-case-study.md)


## How to cite

2 changes: 1 addition & 1 deletion docs/README.md
@@ -13,7 +13,7 @@
## (Advanced) Training

* [Advanced Case Study: Train a customized SNP and small indel variant caller
for BGISEQ-500 data](deepvariant-tpu-training-case-study.md)
for BGISEQ-500 data](deepvariant-training-case-study.md)
* [DeepVariant training data](deepvariant-details-training-data.md)

## More details
4 changes: 2 additions & 2 deletions docs/deepvariant-details.md
@@ -241,7 +241,7 @@ Cloud Platform. Specifying the CPU platform also allows us to report the runtime
more consistently.

```shell
gcloud beta compute instances create "${USER}-cpu" \
gcloud compute instances create "${USER}-cpu" \
--scopes "compute-rw,storage-full,cloud-platform" \
--image-family "ubuntu-1604-lts" \
--image-project "ubuntu-os-cloud" \
@@ -254,7 +254,7 @@ gcloud beta compute instances create "${USER}-cpu" \
### Command for a GPU machine on Google Cloud Platform

```shell
gcloud beta compute instances create "${USER}-gpu" \
gcloud compute instances create "${USER}-gpu" \
--scopes "compute-rw,storage-full,cloud-platform" \
--maintenance-policy "TERMINATE" \
--accelerator=type=nvidia-tesla-p100,count=1 \
13 changes: 7 additions & 6 deletions docs/deepvariant-pacbio-model-case-study.md
@@ -2,20 +2,21 @@

In this case study we describe applying DeepVariant to PacBio CCS reads to call
variants. We will call small variants from a publicly available whole genome
from PacBio.
CCS dataset from PacBio.

Starting from v0.10.0, our PacBio model is trained with additional amplified
library data, which provides a significant accuracy boost on amplified data.
Starting from v0.10.0, sequence from amplified libraries is included in our
PacBio CCS training set, providing a significant accuracy boost to variant
detection from amplified CCS data.

Case study is run on a standard Google Cloud instance. There are no special
hardware or software requirements for running this case study. For consistency
we use Google Cloud instance with 64 cores and 128 GB of memory. This is NOT the
fastest or cheapest configuration. For more scalable execution of DeepVariant
see the [External Solutions] section.

In v0.8 DeepVariant released a special model that works with PacBio data. In
this case study we will apply PacBio model by specifying `PACBIO` in
`model_type` parameter in the `run_pacbio_case_study_docker.sh` script.
In v0.8 DeepVariant released a model for PacBio CCS data. In this case study we
will apply PacBio model by specifying `PACBIO` in `model_type` parameter in the
`run_pacbio_case_study_docker.sh` script.

## Case study overview

51 changes: 50 additions & 1 deletion docs/deepvariant-quick-start.md
@@ -153,6 +153,55 @@ output.visual_report.html
For more information about `output.visual_report.html`, see the
[VCF stats report documentation](deepvariant-vcf-stats-report.md).

## Notes on GPU image

If you are using GPUs, you can pull the GPU version, and make sure to run with
`nvidia-docker`:

```
sudo nvidia-docker run \
-v "${INPUT_DIR}":"/input" \
-v "${OUTPUT_DIR}:/output" \
google/deepvariant:"${BIN_VERSION}-gpu" \
/opt/deepvariant/bin/run_deepvariant \
...
```

## Notes on Singularity

### CPU version

```
# Pull the image.
singularity pull docker://google/deepvariant:"${BIN_VERSION}"
# Run DeepVariant.
singularity run -B /usr/lib/locale/:/usr/lib/locale/ \
docker://google/deepvariant:"${BIN_VERSION}" \
/opt/deepvariant/bin/run_deepvariant \
--model_type=WGS \ **Replace this string with exactly one of the following [WGS,WES,PACBIO]**
--ref="${INPUT_DIR}"/ucsc.hg19.chr20.unittest.fasta \
--reads="${INPUT_DIR}"/NA12878_S1.chr20.10_10p1mb.bam \
--regions "chr20:10,000,000-10,010,000" \
--output_vcf="${OUTPUT_DIR}"/output.vcf.gz \
--output_gvcf="${OUTPUT_DIR}"/output.g.vcf.gz \
--num_shards=1 \ **How many cores the `make_examples` step uses. Change it to the number of CPU cores you have.**
```

### GPU version

```
# Pull the image.
singularity pull docker://google/deepvariant:"${BIN_VERSION}-gpu"
# Run DeepVariant.
# Using "--nv" and "${BIN_VERSION}-gpu" is important.
singularity run --nv -B /usr/lib/locale/:/usr/lib/locale/ \
docker://google/deepvariant:"${BIN_VERSION}-gpu" \
/opt/deepvariant/bin/run_deepvariant \
...
```

## Evaluating the results

Here we use the `hap.py`
@@ -189,7 +238,7 @@ Benchmarking Summary:
[BAM]: http://genome.sph.umich.edu/wiki/BAM
[BWA]: https://academic.oup.com/bioinformatics/article/25/14/1754/225615/Fast-and-accurate-short-read-alignment-with
[docker build]: https://docs.docker.com/engine/reference/commandline/build/
[Dockerfile]: https://github.com/google/deepvariant/blob/r0.9/Dockerfile
[Dockerfile]: https://github.com/google/deepvariant/blob/r0.10/Dockerfile
[External Solutions]: https://github.com/google/deepvariant#external-solutions
[FASTA]: https://en.wikipedia.org/wiki/FASTA_format
[Quick Start in r0.7]: https://github.com/google/deepvariant/blob/r0.7/docs/deepvariant-quick-start.md
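
The GPU and Singularity notes added to the Quick Start in this commit come down to four combinations of container runtime and image tag. The sketch below prints only the command prefix for each combination. The function name `deepvariant_prefix` is hypothetical, and the volume and bind-mount flags (`-v`, `-B`) from the documented commands are left out for brevity, so the output is a starting point, not a complete command.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the container command prefix for a given
# runtime ("docker" or "singularity") and device ("cpu" or "gpu").
# Mount flags (-v for Docker, -B for Singularity) are intentionally omitted.
deepvariant_prefix() {
  local runtime="$1"
  local device="$2"
  local version="$3"   # e.g. "0.10.0"

  # The GPU image uses the same version with a "-gpu" suffix.
  local tag="${version}"
  if [ "${device}" = "gpu" ]; then
    tag="${version}-gpu"
  fi

  case "${runtime}:${device}" in
    docker:cpu)      echo "sudo docker run google/deepvariant:${tag}" ;;
    docker:gpu)      echo "sudo nvidia-docker run google/deepvariant:${tag}" ;;
    singularity:cpu) echo "singularity run docker://google/deepvariant:${tag}" ;;
    singularity:gpu) echo "singularity run --nv docker://google/deepvariant:${tag}" ;;
    *)
      echo "unsupported combination: ${runtime} ${device}" >&2
      return 1
      ;;
  esac
}

deepvariant_prefix singularity gpu "0.10.0"
```

The GPU cases change two things at once: the image tag gains the `-gpu` suffix, and the runtime needs GPU access (`nvidia-docker` for Docker, `--nv` for Singularity).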