Before running KGDiff, follow the instructions below to build the conda environment.
```bash
conda create -n kgdiff python=3.9
conda activate kgdiff
conda install pytorch==1.11.0 torchvision==0.12.0 torchaudio==0.11.0 -c pytorch
conda install pytorch-scatter pytorch-cluster pytorch-sparse==0.6.13 pyg==2.0.4 -c pyg
pip install pyyaml easydict lmdb
pip install numpy==1.21.6 pandas==1.4.1 tensorboard==2.9.0 seaborn==0.11.2
pip install Pillow==9.0.1
conda install -c conda-forge openbabel
pip install meeko==0.1.dev3 vina==1.2.2 pdb2pqr rdkit
# =======================
# install autodocktools
# for linux
python -m pip install git+https://github.com/Valdes-Tresanco-MS/AutoDockTools_py3
# for windows
python.exe -m pip install git+https://github.com/Valdes-Tresanco-MS/AutoDockTools_py3
# =======================
pip install scipy==1.7.3
# Be careful with package versions; feel free to open an issue if you run into package conflicts.
```
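Because mismatched versions are the most common source of package conflicts, a quick check after installation can help. The following is a minimal sketch, not part of KGDiff, that compares installed versions with the pins above using Python's standard `importlib.metadata`. The distribution names (for example `torch` for the conda `pytorch` package and `torch-geometric` for `pyg`) are assumptions about how these packages register themselves. The helper `check_versions` is hypothetical.

```python
"""Hypothetical helper (not part of KGDiff): compare installed versions with the pins above."""
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the install commands. The distribution names are assumptions
# about how conda/pip register each package (e.g. "torch" for "pytorch").
PINNED = {
    "torch": "1.11.0",
    "torchvision": "0.12.0",
    "torchaudio": "0.11.0",
    "torch-sparse": "0.6.13",
    "torch-geometric": "2.0.4",
    "numpy": "1.21.6",
    "pandas": "1.4.1",
    "tensorboard": "2.9.0",
    "seaborn": "0.11.2",
    "Pillow": "9.0.1",
    "meeko": "0.1.dev3",
    "vina": "1.2.2",
    "scipy": "1.7.3",
}


def check_versions(pins):
    """Return {name: (expected, found)} for every package that does not match.

    `found` is None when the package is not installed. A local build tag such
    as "+cu113" is stripped before comparing.
    """
    mismatches = {}
    for name, expected in pins.items():
        try:
            found = version(name).split("+")[0]
        except PackageNotFoundError:
            found = None
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    problems = check_versions(PINNED)
    for name, (expected, found) in sorted(problems.items()):
        print(f"{name}: expected {expected}, found {found or 'not installed'}")
    print("all pinned versions match" if not problems else f"{len(problems)} mismatch(es)")
```

Run it inside the activated `kgdiff` environment; unpinned packages such as rdkit and openbabel are deliberately left out.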
We have uploaded the datasets (CrossDocked2020 and PDBBind2020) to . Before training or generation, download data.zip (datasets) and extract it into the project's root directory.
We have uploaded the pretrained model checkpoints as logs_diffusion.zip (model ckpts) on . Download the archive and place the checkpoints in the project's root directory.
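If you prefer to script the extraction step, the sketch below uses Python's standard `zipfile` module to unpack the archives into the project root. It assumes data.zip and logs_diffusion.zip have already been downloaded into that directory; the function name `extract_archives` is hypothetical and not part of KGDiff.

```python
import zipfile
from pathlib import Path


def extract_archives(archives, root="."):
    """Extract each archive found in `root` into `root`; return the extracted member names.

    Missing archives are reported and skipped rather than raising an error.
    """
    root = Path(root)
    extracted = []
    for name in archives:
        path = root / name
        if not path.is_file():
            print(f"skipping {name}: not found in {root.resolve()}")
            continue
        with zipfile.ZipFile(path) as zf:
            zf.extractall(root)  # members keep their relative paths, e.g. data/...
            extracted.extend(zf.namelist())
    return extracted


if __name__ == "__main__":
    # Run from the repository root after downloading the archives.
    extract_archives(["data.zip", "logs_diffusion.zip"])
```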
We recommend training the model on a single GPU with 40 GB of memory. The training parameters are defined in configs/training.yml.
```bash
git clone https://github.com/CMACH508/KGDiff.git
cd KGDiff
python scripts/train_diffusion.py
```
Here, we provide two examples of molecule generation.

- To sample molecules for the first protein in the CrossDocked2020 test set, run the following command. To sample from PDBBind2020 instead, replace the argument "--guide_mode joint" with "--guide_mode pdbbind_random".

  ```bash
  python scripts/sample_diffusion.py --config ./configs/sampling.yml -i 0 --guide_mode joint --type_grad_weight 100 --pos_grad_weight 25 --result_path ./cd2020_pro_0_res
  ```

- To sample molecules for the proteins listed in Table S4 of our paper, run the following command.

  ```bash
  python scripts/sample_for_pocket.py --pdb_idx 0 --protein_root ./data/extended_poc_proteins/ --guide_mode joint --type_grad_weight 100 --pos_grad_weight 25 --result_path ./extended_pro_0_res
  ```
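To repeat the first example over several test-set proteins, a small driver script can loop over `-i`. The sketch below reuses only the flags shown in the `sample_diffusion.py` command above; the helper names, the `dry_run` switch, and the `cd2020_pro_<idx>_res` naming for each result directory (extrapolated from the example) are illustrative assumptions. Run it from the repository root.

```python
import shlex
import subprocess
import sys


def build_sample_command(protein_idx, guide_mode="joint", type_grad_weight=100,
                         pos_grad_weight=25, config="./configs/sampling.yml",
                         result_prefix="./cd2020_pro"):
    """Assemble the sample_diffusion.py call from the example above for one protein index."""
    return [
        sys.executable, "scripts/sample_diffusion.py",
        "--config", config,
        "-i", str(protein_idx),
        "--guide_mode", guide_mode,
        "--type_grad_weight", str(type_grad_weight),
        "--pos_grad_weight", str(pos_grad_weight),
        "--result_path", f"{result_prefix}_{protein_idx}_res",
    ]


def run_batch(indices, dry_run=True, **kwargs):
    """Print each command; when dry_run is False, also run it and stop at the first failure."""
    commands = [build_sample_command(i, **kwargs) for i in indices]
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # Preview the commands for the first three test proteins; pass dry_run=False to execute.
    run_batch(range(3))
```

Passing `guide_mode="pdbbind_random"` switches to PDBBind2020 as described above; a different `result_prefix` keeps outputs from the two datasets apart. The same pattern could wrap `sample_for_pocket.py` by swapping `-i` for `--pdb_idx` and adding `--protein_root`.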
Here is an example of evaluating the generated molecules.

```bash
python scripts/evaluate_diffusion.py
```
Note: before installing the KGDiff package with pip, create the virtual environment described in the installation instructions above.
| Command | Executed file |
|---|---|
| kg_gen | scripts/sample_diffusion.py |
| kg_gen4poc | scripts/sample_for_pocket.py |
| kg_train | scripts/train_diffusion.py |
| kg_eval | scripts/evaluate_diffusion.py |
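After installing the package, you can check which target each command above resolves to by reading the installed entry points. The sketch below uses only the standard `importlib.metadata` API and handles both the dict returned on Python 3.9 and the `select()` interface of Python 3.10+. It assumes the commands are registered as `console_scripts` entry points, which is the usual mechanism for pip-installed commands but is not stated here; `console_entry_points` is a hypothetical helper.

```python
from importlib import metadata

# Command names taken from the table above.
KG_COMMANDS = ("kg_gen", "kg_gen4poc", "kg_train", "kg_eval")


def console_entry_points(names=KG_COMMANDS):
    """Map each installed console command in `names` to its "module:function" target."""
    eps = metadata.entry_points()
    if hasattr(eps, "select"):  # Python 3.10+
        scripts = eps.select(group="console_scripts")
    else:  # Python 3.9 returns a dict keyed by group name
        scripts = eps.get("console_scripts", ())
    return {ep.name: ep.value for ep in scripts if ep.name in names}


if __name__ == "__main__":
    found = console_entry_points()
    if not found:
        print("no KGDiff console commands found; is the package installed?")
    for name, target in sorted(found.items()):
        print(f"{name} -> {target}")
```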
Here is an example of training KGDiff with the installed package.

```bash
conda activate kgdiff
pip install KGDiff==0.1.2
kg_train --config your_config_path --ckpt your_ckpt_path --logdir your_ckpt_dirname
# Detailed arguments are given in scripts/train_diffusion.py
```
We provide the reproduction.ipynb notebook for reproducing the figures and benchmarks in our paper. Before running it, make sure you have downloaded misc_results.zip and benchmark.zip from  and extracted them into the root directory.
If you use the code or data in this package, please cite:
```bibtex
@article{10.1093/bib/bbad435,
    author = {Qian, Hao and Huang, Wenjing and Tu, Shikui and Xu, Lei},
    title = "{KGDiff: towards explainable target-aware molecule generation with knowledge guidance}",
    journal = {Briefings in Bioinformatics},
    volume = {25},
    number = {1},
    pages = {bbad435},
    year = {2023},
    month = {12},
    issn = {1477-4054},
    doi = {10.1093/bib/bbad435},
    url = {https://doi.org/10.1093/bib/bbad435},
    eprint = {https://academic.oup.com/bib/article-pdf/25/1/bbad435/55464687/bbad435.pdf},
}
```
The code in this package is licensed under the MIT License. We thank the authors of TargetDiff for open-sourcing their code.
If you have any questions, please contact us: [email protected].