Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add unambig-ti subcommand #32

Merged
merged 8 commits into from
Nov 21, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
213 changes: 92 additions & 121 deletions Cargo.lock

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion Cargo.toml
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
[package]
name = "haddock-restraints"
version = "0.6.2"
version = "0.7.0"
edition = "2021"
description = "Generate restraints to be used in HADDOCK"
license = "MIT"
Expand Down
20 changes: 11 additions & 9 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -18,6 +18,7 @@ A standalone command-line application to generate restraints to be used in HADDO
- [`restraint`: Generate Unambiguous restraints to keep molecules together during docking](https://www.bonvinlab.org/haddock-restraints/restraint.html)
- [`interface`: List residues in the interface](https://www.bonvinlab.org/haddock-restraints/interface.html)
- [`z`: Generate Z-restraints for a protein](https://www.bonvinlab.org/haddock-restraints/z.html)
- [`unambig-ti`: Generate unambiguous true-interface restraints from a PDB file](https://www.bonvinlab.org/haddock-restraints/unambig-ti.html)

## Usage

Expand All @@ -31,9 +32,9 @@ OR

- Install it with [`cargo`](https://www.rust-lang.org/tools/install)

```bash
cargo install haddock-restraints
```
```bash
cargo install haddock-restraints
```

## Execute

Expand All @@ -44,12 +45,13 @@ Generate restraints to be used in HADDOCK
Usage: haddock-restraints <COMMAND>

Commands:
tbl Generate TBL file from input file
ti Generate true-interface restraints from a PDB file
restraint Generate Unambiguous restraints to keep molecules together during docking
interface List residues in the interface
z Generate Z-restraints for a protein
help Print this message or the help of the given subcommand(s)
tbl Generate TBL file from input file
ti Generate true-interface restraints from a PDB file
unambig-ti Generate unambiguous true-interface restraints from a PDB file
restraint Generate unambiguous restraints to keep molecules together during docking
interface List residues in the interface
z Generate Z-restraints for a protein
help Print this message or the help of the given subcommand(s)

Options:
-h, --help Print help
Expand Down
19 changes: 19 additions & 0 deletions docs/src/unambig-ti.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,19 @@
# Generate _unambiguous true-interface_ restraints from a PDB file

This is a more strict version of the true-interface restraints that assigns
specific pairs to each residue in the interface.

For each atom in each of the chains, the algorithm defines the pairs based on
atomic interactions between any atom of a given residue in the interface and
any other atom in an interacting chain. The restraints are later defined
considering the CA-CA distances of such pairs; thus allowing for
side-chain flexibility.

## Usage

To run the `unambig-ti` subcommand, you just need to provide the path to the
PDB file and the cutoff distance. For example:

```bash
haddock-restraints unambig-ti path/to/complex.pdb 5.0 > unambig.tbl
```
5 changes: 4 additions & 1 deletion docs/src/usage.md
Original file line number Diff line number Diff line change
@@ -1,6 +1,7 @@
# Usage

`haddock-restraints` is a command-line tool that is organized with subcommands. Each subcommand has its own set of options and arguments. The main subcommands are:
`haddock-restraints` is a command-line tool that is organized with subcommands.
Each subcommand has its own set of options and arguments. The main subcommands are:

- [`tbl`: Generate .tbl file from a configuration file](./tbl.md)

Expand All @@ -11,3 +12,5 @@
- [`interface`: List residues in the interface](./interface.md)

- [`z`: Generate restraints to keep the molecule aligned to the Z-axis](./z.md)

- [`unambig-ti`: Generate unambiguous true-interface restraints](./unambig-ti.md)
90 changes: 88 additions & 2 deletions src/main.rs
Original file line number Diff line number Diff line change
Expand Up @@ -33,7 +33,14 @@ enum Commands {
#[arg(help = "Cutoff distance for interface residues")]
cutoff: f64,
},
#[command(about = "Generate Unambiguous restraints to keep molecules together during docking")]
#[command(about = "Generate unambiguous true-interface restraints from a PDB file")]
UnambigTi {
#[arg(help = "PDB file")]
input: String,
#[arg(help = "Cutoff distance for interface residues")]
cutoff: f64,
},
#[command(about = "Generate unambiguous restraints to keep molecules together during docking")]
Restraint {
#[arg(help = "PDB file")]
input: String,
Expand All @@ -45,7 +52,6 @@ enum Commands {
#[arg(help = "Cutoff distance for interface residues")]
cutoff: f64,
},

#[command(about = "Generate Z-restraints for a protein")]
Z {
#[arg(required = true, help = "Input file")]
Expand Down Expand Up @@ -89,6 +95,9 @@ fn main() -> Result<(), Box<dyn std::error::Error>> {
Commands::Ti { input, cutoff } => {
let _ = true_interface(input, cutoff);
}
Commands::UnambigTi { input, cutoff } => {
let _ = unambig_ti(input, cutoff);
}
Commands::Restraint { input } => {
let _ = restraint_bodies(input);
}
Expand Down Expand Up @@ -172,6 +181,74 @@ fn gen_tbl(input_file: &str) {
println!("{}", tbl);
}

/// Generates Unambiguous Topological Interactions (TIs) from a protein structure.
///
/// This function reads a PDB file, identifies the closest residue pairs based on a specified distance cutoff,
/// and creates unambiguous interactors for each residue pair.
///
/// # Arguments
///
/// * input_file - A string slice that holds the path to the input PDB file.
/// * cutoff - A reference to a f64 value specifying the distance cutoff (in Angstroms) for determining interactions.
///
/// # Returns
///
/// A Result<String, Box<dyn Error>> containing the generated TBL (Topological Restraints List) if successful.
///
/// # Functionality
///
/// 1. Loads the PDB file using the structure::load_pdb function.
/// 2. Finds the closest residue pairs within the specified distance cutoff.
/// 3. Creates Interactor instances for each residue pair.
/// 4. Assigns chains, active/passive residues, and atoms to the interactors.
/// 5. Generates the Topological Restraints List (TBL) using the created interactors.
/// 6. Prints the generated TBL to stdout.
///
/// # Panics
///
/// This function will panic if:
/// - The PDB file cannot be opened or parsed.
fn unambig_ti(input_file: &str, cutoff: &f64) -> Result<String, Box<dyn Error>> {
let pdb = match structure::load_pdb(input_file) {
Ok(pdb) => pdb,
Err(e) => {
panic!("Error opening PDB file: {:?}", e);
}
};
let pairs = structure::get_closest_residue_pairs(&pdb, *cutoff);

let mut interactors: Vec<Interactor> = Vec::new();
let mut counter = 0;
pairs.iter().for_each(|g| {
let mut interactor_i = Interactor::new(counter);
counter += 1;
let mut interactor_j = Interactor::new(counter);
interactor_j.add_target(counter - 1);
interactor_i.add_target(counter);
counter += 1;

interactor_i.set_chain(g.chain_i.as_str());
interactor_i.set_active(vec![g.res_i as i16]);
interactor_i.set_active_atoms(vec![g.atom_i.clone()]);
interactor_i.set_target_distance(g.distance);

interactor_j.set_chain(g.chain_j.as_str());
interactor_j.set_passive(vec![g.res_j as i16]);
interactor_j.set_passive_atoms(vec![g.atom_j.clone()]);

interactors.push(interactor_i);
interactors.push(interactor_j);
});

// Make the restraints
let air = Air::new(interactors);
let tbl = air.gen_tbl().unwrap();

println!("{}", tbl);

Ok(tbl)
}

/// Analyzes the true interface of a protein structure and generates Ambiguous Interaction Restraints (AIRs).
///
/// This function reads a PDB file, identifies the true interface between chains based on a distance cutoff,
Expand Down Expand Up @@ -645,4 +722,13 @@ assign ( resid 2 and segid A and name CA ) ( resid 8 and segid A and name CA ) 1
Err(_e) => (),
}
}
#[test]
fn test_unambigti() {
let expected_tbl = "assign ( resid 2 and segid A and name CA ) ( resid 10 and segid B and name CA ) 9.1 2.0 0.0\n\n";

match unambig_ti("tests/data/two_res.pdb", &5.0) {
Ok(tbl) => assert_eq!(tbl, expected_tbl),
Err(_e) => (),
}
}
}
116 changes: 116 additions & 0 deletions src/structure.rs
Original file line number Diff line number Diff line change
Expand Up @@ -18,6 +18,17 @@ pub struct Bead {
pub position: Vector3<f64>,
}

#[derive(Debug, Clone)]
pub struct Pair {
pub chain_i: String,
pub res_i: isize,
pub atom_i: String,
pub chain_j: String,
pub res_j: isize,
pub atom_j: String,
pub distance: f64,
}

/// Performs a neighbor search for residues within a specified radius of target residues.
///
/// This function uses a K-d tree to efficiently find residues that are within a given radius
Expand Down Expand Up @@ -623,6 +634,88 @@ pub fn process_pdb(input_pdb: &str) -> BufReader<Cursor<Vec<u8>>> {
pdb_handler::pad_lines(temp_path)
}

/// Finds the closest residue pairs between different protein chains within a specified distance cutoff.
///
/// This function analyzes inter-chain interactions by identifying the closest atom pairs between residues
/// from different chains, using the alpha carbon (CA) atoms for distance calculation.
///
/// # Arguments
///
/// * pdb - A reference to a parsed PDB structure.
/// * cutoff - A floating-point value specifying the maximum distance (in Angstroms) for considering residue pairs.
///
/// # Returns
///
/// A Vec<Pair> containing the closest residue pairs within the specified cutoff, sorted by distance.
///
/// # Functionality
///
/// 1. Iterates through all unique chain pairs in the PDB structure.
/// 2. For each chain pair, finds the closest atoms between residues.
/// 3. Selects pairs where the inter-residue distance is less than the specified cutoff.
/// 4. Uses alpha carbon (CA) atoms for distance calculation and pair representation.
/// 5. Sorts the resulting pairs by their inter-alpha carbon distance.
///
/// # Notes
///
/// - Only inter-chain interactions are considered.
/// - The distance is calculated based on the closest atom pairs between residues.
/// - The resulting pairs are sorted from shortest to longest distance.
pub fn get_closest_residue_pairs(pdb: &pdbtbx::PDB, cutoff: f64) -> Vec<Pair> {
let mut closest_pairs = Vec::new();

let chains: Vec<_> = pdb.chains().collect();

for i in 0..chains.len() {
for j in (i + 1)..chains.len() {
let chain_i = &chains[i];
let chain_j = &chains[j];

for res_i in chain_i.residues() {
let mut min_distance = f64::MAX;
let mut closest_pair = None;

for res_j in chain_j.residues() {
let mut atom_dist = f64::MAX;
for atom_i in res_i.atoms() {
for atom_j in res_j.atoms() {
let distance = atom_i.distance(atom_j);
if distance < atom_dist {
atom_dist = distance;
}
}
}
if atom_dist < min_distance {
min_distance = atom_dist;
closest_pair = Some((res_j, res_i));
}
}

if min_distance < cutoff {
if let Some((res_j, res_i)) = closest_pair {
let ca_i = res_i.atoms().find(|atom| atom.name() == "CA").unwrap();
let ca_j = res_j.atoms().find(|atom| atom.name() == "CA").unwrap();
let ca_ca_dist = ca_i.distance(ca_j);
closest_pairs.push(Pair {
chain_i: chain_i.id().to_string(),
res_i: res_i.serial_number(),
atom_i: ca_i.name().to_string(),
chain_j: chain_j.id().to_string(),
res_j: res_j.serial_number(),
atom_j: ca_j.name().to_string(),
distance: ca_ca_dist,
});
}
}
}
}
}

closest_pairs.sort_by(|a, b| a.distance.partial_cmp(&b.distance).unwrap());

closest_pairs
}

#[cfg(test)]
mod tests {

Expand Down Expand Up @@ -686,4 +779,27 @@ mod tests {
.filter(|line| line.starts_with("ATOM"))
.all(|line| line.len() == 80));
}
#[test]
fn test_get_closest_residue_pairs() {
let pdb = load_pdb("tests/data/two_res.pdb").unwrap();
let observed_pairs = get_closest_residue_pairs(&pdb, 5.0);
let expected_pairs = vec![Pair {
chain_i: "A".to_string(),
chain_j: "B".to_string(),
atom_i: "CA".to_string(),
atom_j: "CA".to_string(),
res_i: 2,
res_j: 10,
distance: 9.1,
}];
assert_eq!(observed_pairs.len(), expected_pairs.len());
let pair = &observed_pairs[0];
assert_eq!(pair.chain_i, expected_pairs[0].chain_i);
assert_eq!(pair.chain_j, expected_pairs[0].chain_j);
assert_eq!(pair.atom_i, expected_pairs[0].atom_i);
assert_eq!(pair.atom_j, expected_pairs[0].atom_j);
assert_eq!(pair.res_i, expected_pairs[0].res_i);
assert_eq!(pair.res_j, expected_pairs[0].res_j);
assert!((pair.distance - 9.1).abs() < 0.1);
}
}
18 changes: 18 additions & 0 deletions tests/data/two_res.pdb
Original file line number Diff line number Diff line change
@@ -0,0 +1,18 @@
ATOM 8 N THR A 2 15.115 11.555 5.265 1.00 7.81 N
ATOM 9 CA THR A 2 13.856 11.469 6.066 1.00 8.31 C
ATOM 10 C THR A 2 14.164 10.785 7.379 1.00 5.80 C
ATOM 11 O THR A 2 14.993 9.862 7.443 1.00 6.94 O
ATOM 12 CB THR A 2 12.732 10.711 5.261 1.00 10.32 C
ATOM 13 CG2 THR A 2 12.484 11.442 3.895 1.00 11.90 C
ATOM 14 OG1 THR A 2 13.308 9.439 4.926 1.00 12.81 O
ATOM 47 N ARG B 10 7.647 4.909 10.005 1.00 3.73 N
ATOM 48 CA ARG B 10 8.496 4.609 8.837 1.00 3.38 C
ATOM 49 C ARG B 10 7.798 3.609 7.876 1.00 3.47 C
ATOM 50 O ARG B 10 7.878 3.778 6.651 1.00 4.67 O
ATOM 51 CB ARG B 10 9.847 4.020 9.305 1.00 3.95 C
ATOM 52 CG ARG B 10 10.752 3.607 8.149 1.00 4.55 C
ATOM 53 CD ARG B 10 11.226 4.699 7.244 1.00 5.89 C
ATOM 54 NE ARG B 10 12.143 5.571 8.035 1.00 6.20 N
ATOM 55 CZ ARG B 10 12.758 6.609 7.443 1.00 7.52 C
ATOM 56 NH1 ARG B 10 12.539 6.932 6.158 1.00 10.68 N1+
ATOM 57 NH2 ARG B 10 13.601 7.322 8.202 1.00 9.48 N