The table below shows the CPU system configuration and runtime (in seconds) for each kernel.
CPU | SIMD Flag | Operating System | Threads | BSW | Chain | PairHMM | POA |
---|---|---|---|---|---|---|---|
Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz | AVX512 | CentOS Linux 7 (Core) | 80 | 0.0504 | 0.306 | 0.587 | 16.6 |
Intel(R) Xeon(R) Gold 6326 CPU @ 2.90GHz | AVX512 | Ubuntu 20.04.5 LTS | 32 | 0.0984 | 0.473 | 0.792 | 34.3 |
Intel(R) Xeon(R) CPU E5-2697 v3 @ 2.60GHz | AVX2 | CentOS Linux 7 (Core) | 28 | 0.196 | 2.35 | 2.13 | 41.7 |
12th Gen Intel(R) Core(TM) i5-12600 | AVX2 | Ubuntu 22.04.2 LTS | 12 | 0.140 | 2.21 | 1.71 | 36.6 |
Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz | AVX2 | Ubuntu 20.04.5 LTS | 8 | 0.29 | 4.79 | 4.51 | 98.5 |
The table below shows the GPU system configuration and runtime (in seconds) for each kernel.
GPU | Arch Code | CUDA Version | BSW | Chain | PairHMM | POA |
---|---|---|---|---|---|---|
NVIDIA A100 | sm_80 | 11.2 | 0.012 | 0.155 | 0.597 | 2.53 |
NVIDIA RTX A6000 | sm_86 | 12.0 | 0.012 | 0.339 | 0.572 | 3.70 |
NVIDIA TITAN Xp | sm_61 | 10.2 | 0.020 | 0.747 | 0.915 | 11.2 |
The CPU baselines are obtained from the Intel Xeon Platinum 8380 CPU @ 2.30GHz using 80 threads in 1 socket with AVX512. The CPU die area is 600 mm². The GPU baselines are obtained from the NVIDIA A100, whose die area is 826 mm². In the Chain benchmark, GPU and GenDP throughputs are penalized by 3.72x because they use a reordered chaining algorithm that computes 3.72x more cells than the CPU implementation. The CPU baseline and GenDP throughputs are normalized to 7nm technology for a fair comparison with the GPU baselines. GenDP achieves an average (geometric mean) throughput/mm² speedup of 157.8x over the GPU. The metrics used in the table below are Giga Cell Updates per Second (GCUPS) and Mega Cell Updates per Second per mm² (MCUPS/mm²).
Metric | BSW | Chain | PairHMM | POA
---|---|---|---|---|
Total Cell Updates | 2431855834 | 20736142007 | 258363282803 | 6448581509 |
CPU Runtime (seconds) | 0.0504 | 0.306 | 0.587 | 16.6 |
CPU GCUPS | 44.91 | 19.61 | 32.88 | 14.51 |
CPU Normalized MCUPS/mm2 | 130.29 | 56.89 | 95.41 | 42.11 |
GPU Runtime (seconds) | 0.012 | 0.155 | 0.597 | 2.53 |
GPU GCUPS | 192.92 | 10.40 | 32.35 | 95.13 |
GPU MCUPS/mm2 | 239.16 | 12.89 | 40.11 | 117.94 |
ASIC Normalized MCUPS/mm2 | 118,950 | - | 51,867 | - |
GenDP Normalized MCUPS/mm2 | 47,574 | 3,626 | 17,681 | 2,965 |
GenDP Speedup over CPU | 365.1x | 63.7x | 185.3x | 70.4x |
GenDP Speedup over GPU | 198.9x | 281.4x | 440.8x | 25.1x |
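The 157.8x figure matches the geometric mean of the "GenDP Speedup over GPU" row (the arithmetic mean of the same row would be about 236.6x). A minimal sketch that recomputes it from the rounded table values with awk follows; `geomean` is a hypothetical helper written for this illustration, not part of GenDP.

```bash
#!/usr/bin/env bash
# geomean X1 X2 ...: geometric mean, computed as exp(mean(log(x))) in awk.
geomean() {
  printf '%s\n' "$@" | awk '
    { sum += log($1); n++ }                       # accumulate natural logs
    END { if (n > 0) printf "%.2f\n", exp(sum / n) }'
}

# GenDP speedup over GPU for BSW, Chain, PairHMM, POA (from the table above)
geomean 198.9 281.4 440.8 25.1   # prints 157.75
```

Because the per-kernel speedups in the table are already rounded to one decimal, the recomputed value (157.75) differs slightly from the reported 157.8x.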
- Intel CPU with 16 GB memory and 40 GB storage
- Linux OS
- NVIDIA GPU and CUDA >= 10.0
- gcc >= 8.3.1
- cmake >= 3.16.0
- OpenMP >= 201511
- Intel DPC++/C++ Compiler >= 2021.8.0
- ZLIB >= 1.2.8
- Python >= 3.7.9
- numactl >= 2.0.0
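Several of these version requirements can be checked before building. The sketch below assumes GNU coreutils (`sort -V` does the version comparison) and only checks gcc, cmake, Python and the CUDA toolkit (via `nvcc`); OpenMP, ICX, zlib and numactl are not covered. `version_ge` and `check_min` are hypothetical helpers, not part of GenDP.

```bash
#!/usr/bin/env bash
# version_ge A B: succeed if version A >= version B (relies on GNU `sort -V`).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# check_min NAME FOUND REQUIRED: report whether FOUND satisfies REQUIRED.
check_min() {
  if [ -z "$2" ]; then
    echo "missing: $1 (need >= $3)"
  elif version_ge "$2" "$3"; then
    echo "ok: $1 $2 (need >= $3)"
  else
    echo "too old: $1 $2 (need >= $3)"
  fi
}

# Query installed versions; an empty result means the tool was not found.
check_min gcc    "$(gcc -dumpfullversion -dumpversion 2>/dev/null)" 8.3.1
check_min cmake  "$(cmake --version 2>/dev/null | awk 'NR==1 {print $3}')" 3.16.0
check_min python "$(python3 -c 'import platform; print(platform.python_version())' 2>/dev/null)" 3.7.9
check_min cuda   "$(nvcc --version 2>/dev/null | sed -n 's/.*release \([0-9.]*\),.*/\1/p')" 10.0
```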
```bash
# Install Intel(R) oneAPI DPC++/C++ Compiler (ICX)
wget https://registrationcenter-download.intel.com/akdlm/irc_nas/19123/l_dpcpp-cpp-compiler_p_2023.0.0.25393_offline.sh
sudo sh ./l_dpcpp-cpp-compiler_p_2023.0.0.25393_offline.sh
# Activate the oneAPI Toolkit
source /opt/intel/oneapi/setvars.sh
# Clone GenDP
git clone --recursive https://github.com/Yufeng98/GenDP.git
cd GenDP
# Download Datasets
wget https://genomicsbench.eecs.umich.edu/gendp-datasets.tar.gz
tar -zxvf gendp-datasets.tar.gz
```
If you encounter errors while running the CPU baselines, see the scripts in cpu-baselines/README.md for debugging.
```bash
# Specify the workspace directory
export GenDP_WORK_DIR=$(pwd)
# Specify the SIMD flag and number of threads to use.
# Check SIMD compatibility with `lscpu | grep Flags`, e.g., sse, avx2, avx512
# sse4.1 is the default SIMD flag; avx2 or avx512 can also be chosen
bash run-cpu-baselines.sh <SIMD_FLAG> <NUM_THREADS> 2>&1 | tee cpu-baselines-log.txt
python3 "$GenDP_WORK_DIR"/profile-cpu-baselines-log.py cpu-baselines-log.txt
```
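Choosing `<SIMD_FLAG>` and `<NUM_THREADS>` can be scripted. This is a minimal sketch, assuming a Linux host where `/proc/cpuinfo` lists the same feature flags that `lscpu` shows, and assuming `avx512f` is a sufficient indicator for the `avx512` build; `pick_simd_flag` is a hypothetical helper. It only prints the command rather than running it.

```bash
#!/usr/bin/env bash
# pick_simd_flag FLAGS: map CPU feature flags to a SIMD argument for
# run-cpu-baselines.sh, preferring the widest supported extension.
# Assumption: avx512f marks AVX-512 support; sse4.1 is the fallback.
pick_simd_flag() {
  case " $1 " in
    *" avx512f "*) echo avx512 ;;
    *" avx2 "*)    echo avx2 ;;
    *)             echo sse4.1 ;;
  esac
}

# Feature flags of the first CPU as reported by the Linux kernel.
cpu_flags=$(grep -m1 '^flags' /proc/cpuinfo 2>/dev/null | cut -d: -f2)
simd=$(pick_simd_flag "$cpu_flags")
# nproc reports the number of logical CPUs available to this process.
threads=$(nproc)
echo "bash run-cpu-baselines.sh $simd $threads"
```

Using every logical CPU is only a starting point; the thread counts in the CPU table above were chosen per machine.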
If you encounter errors while running the GPU baselines, see the scripts in gpu-baselines/README.md for debugging.
```bash
export GenDP_WORK_DIR=$(pwd)
# The CUDA library path <CUDA_PATH> is usually /usr/local/cuda-xx
# The CUDA binary path <CUDA_BINARY_PATH> is usually /usr/local/cuda-xx/bin
# The <ARCH_CODE> can be derived from the GPU's compute capability listed at https://developer.nvidia.com/cuda-gpus
# E.g., the NVIDIA A100 has compute capability 8.0, so its ARCH_CODE is sm_80
bash run-gpu-baselines.sh <CUDA_PATH> <CUDA_BINARY_PATH> <ARCH_CODE> 2>&1 | tee gpu-baselines-log.txt
python3 "$GenDP_WORK_DIR"/profile-gpu-baselines-log.py gpu-baselines-log.txt
```
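The three arguments can often be derived automatically. The sketch below assumes `nvcc` sits in `<CUDA_PATH>/bin` (as in a standard toolkit install under /usr/local/cuda-xx) and that the driver is recent enough for `nvidia-smi` to support the `compute_cap` query field; older drivers do not, in which case the compute capability must be looked up manually. `cc_to_arch` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# cc_to_arch CC: turn a compute capability like "8.0" into "sm_80".
cc_to_arch() {
  printf 'sm_%s\n' "$(printf '%s' "$1" | tr -d '.')"
}

# Derive the CUDA paths from the nvcc found on PATH, if any
# (assumes nvcc lives in <CUDA_PATH>/bin).
if nvcc_bin=$(command -v nvcc); then
  CUDA_BINARY_PATH=$(dirname "$nvcc_bin")
  CUDA_PATH=$(dirname "$CUDA_BINARY_PATH")
  echo "CUDA_PATH=$CUDA_PATH CUDA_BINARY_PATH=$CUDA_BINARY_PATH"
fi

# Query the first GPU's compute capability; ignore errors or unexpected output.
cc=$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader 2>/dev/null | head -n1 | tr -d ' ')
case $cc in
  [0-9]*.[0-9]*) echo "ARCH_CODE=$(cc_to_arch "$cc")" ;;
esac
```

For the GPUs in the table above, this mapping gives sm_80 (A100), sm_86 (RTX A6000) and sm_61 (TITAN Xp).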
If you encounter errors while running the GenDP simulation, see the scripts in gendp/README.md for debugging. The simulation results may differ from, but should be comparable to, the reported table, because by default the script does not run the entire datasets. The script can be configured to run the entire datasets by setting the input sizes to -1, which reproduces the throughputs above but may take ~250 hours of simulation and ~2 TB of storage.
```bash
export GenDP_WORK_DIR=$(pwd)
# bash run-gendp-simulation.sh <Chain input size> <PairHMM input size> <POA input size>
# See the approximate runtime for different input sizes of each kernel in run-gendp-simulation.sh
# BSW simulation is fast and uses the entire dataset by default.
# Run ONE of the following (each run overwrites gendp-simulation-log.txt):
bash run-gendp-simulation.sh 500 100000 100 2>&1 | tee gendp-simulation-log.txt    # ~6 hours
bash run-gendp-simulation.sh 2000 500000 200 2>&1 | tee gendp-simulation-log.txt   # ~24 hours
bash run-gendp-simulation.sh -1 -1 -1 2>&1 | tee gendp-simulation-log.txt          # ~250 hours for the entire datasets
python3 "$GenDP_WORK_DIR"/profile-gendp-simulation-log.py gendp-simulation-log.txt
```
We appreciate any feedback and suggestions from the community. Feel free to raise an issue or submit a pull request on GitHub. For assistance in using GenDP, please contact: Yufeng Gu (yufenggu AT umich DOT edu).
If you use GenDP in your research, please cite the following reference:
Gu, Y., Subramaniyan, A., Dunn, T., Khadem, A., Chen, K.Y., Paul, S., Vasimuddin, M., Misra, S., Blaauw, D., Narayanasamy, S. and Das, R., 2023, June. GenDP: A Framework of Dynamic Programming Acceleration for Genome Sequencing Analysis. In Proceedings of the 50th Annual International Symposium on Computer Architecture (pp. 1-15).
```bibtex
@inproceedings{gu2023gendp,
  title={GenDP: A Framework of Dynamic Programming Acceleration for Genome Sequencing Analysis},
  author={Gu, Yufeng and Subramaniyan, Arun and Dunn, Tim and Khadem, Alireza and Chen, Kuan-Yu and Paul, Somnath and Vasimuddin, Md and Misra, Sanchit and Blaauw, David and Narayanasamy, Satish and others},
  booktitle={Proceedings of the 50th Annual International Symposium on Computer Architecture},
  pages={1--15},
  year={2023}
}
```