Add Enhanced sampling simulation cookbooks #21
…-cookbook into add-enhanced-sampling
These aren't really cookbook entries as such. Take a look at the existing entries to get a sense for how they're written. A cookbook entry should be short, to the point, and completely self-contained. It should start by clearly stating the problem, then present the minimal amount of code needed to solve it, while clearly explaining the code so the user can adapt it to their own problem. These notebooks are more like tutorials based on their size and complexity. And they don't have much explanation, just large blocks of code.
These notebooks are self-contained and run using the I was basing it off a discussion in another thread where you mentioned:
So I wrote it assuming someone would already know the theory behind these enhanced sampling methods and would just need the code to run them on any system of their choice. I figure that the "tutorials" would have more in-depth discussion, but I'm happy to add more documentation/problem statements if that would be preferred. Admittedly these aren't super small notebooks, but I think they are still descriptively minimal examples? Partially because setting up and running Either way I definitely want to document these a lot better and flesh out the details more, but I just wanted to get something out there that "runs" in case y'all had any thoughts. Appreciate this discussion!
I'm also happy to migrate these to the
I picked one notebook and added detailed comments on it. I didn't go through the others, because very similar comments apply to them too. Does this make it clearer what we're looking for in a cookbook entry?
"from openmm.app import *\n",
"from openmm import * \n",
"from openmm.unit import *\n",
"from openmmtools import forces\n",
"import mdtraj as md\n",
"import numpy as np\n",
"import parmed as pmd\n",
"import bz2\n",
"import os\n",
"from openmm import CustomIntegrator\n",
"from openmm.unit import kilojoules_per_mole, is_quantity\n",
"from openmm.unit import *\n",
"import numpy as np\n",
"import pandas as pd \n",
"from matplotlib import pyplot as plt"
There are a few problems with these imports.
- Many of them are never used! So why import them?
- Cookbook entries need to run in CI. That means if an external package doesn't get automatically installed with OpenMM, you need to explicitly install it as part of the notebook.
- For that reason, we should be very sparing about using external packages in the cookbook. They add complexity and should only be used when there's a really good reason.
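One way to catch the first problem before submitting a notebook is to scan each cell for names that are imported but never referenced. The following is only a minimal sketch using the standard-library `ast` module; the helper name `unused_imports` and the sample cell are hypothetical, and the source is only parsed, never executed, so the imported packages do not need to be installed.

```python
import ast


def unused_imports(source):
    """Return the sorted names that are imported but never referenced.

    Wildcard imports (`from x import *`) are skipped because the names
    they bind cannot be determined without importing the module.
    """
    tree = ast.parse(source)
    imported = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # `import a.b` binds `a`; `import a.b as c` binds `c`.
                imported.add(alias.asname or alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            for alias in node.names:
                if alias.name != "*":
                    imported.add(alias.asname or alias.name)
    # Every bare identifier that appears anywhere in the code.
    used = {n.id for n in ast.walk(tree) if isinstance(n, ast.Name)}
    return sorted(imported - used)


# Hypothetical notebook cell, loosely modelled on the imports quoted above.
cell = """
from openmm.app import *
import numpy as np
import pandas as pd
import bz2
import os
r0_range = np.linspace(0.3, 2.0, 20, endpoint=False)
"""

print(unused_imports(cell))
```

Because wildcard imports are ignored, this check says nothing about whether `from openmm.app import *` is actually needed; that still has to be judged by reading the code.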
Makes sense - for clarity, I used the external packages pandas and matplotlib in other notebooks to demonstrate how one would choose hyperparameters for things like AMD or Metadynamics.
From a teaching standpoint do you think that's strictly necessary or would it be fine to just describe the process and skip ahead? If you think it's strictly necessary I can modify the notebook accordingly to install needed packages.
It comes down to what you're trying to teach. If pandas is central to what you're teaching, then it's necessary. But if it's just incidental, a way of doing some setup, it's better to omit it.
In the Gaussian AMD tutorial, you use pandas to load a log file from an earlier simulation, then compute a few statistics of the energy from it. For a cookbook entry, I think it's fine to skip that and just hardcode them. Something like this:
Before running a Gaussian AMD simulation, you first need to select a strength for the boost potential. This is usually done by running a short simulation, recording the potential energy, and computing some simple statistics of it. The following are the minimum, maximum, mean, and standard deviation of the energies observed during an earlier simulation.
And then just provide hardcoded values:
Vmin = -801890.7367725638
Vmax = -794737.8288112064
Vavg = -798419.040535957
Vstd = 904.4602237453533
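For readers who want to regenerate such values for their own system, the four statistics can be computed without pandas. This is only a sketch: the energies below are made-up placeholders rather than data from the PR, and using the sample standard deviation (the pandas default) is an assumption about how the original numbers were produced.

```python
import statistics

# Hypothetical potential energies (kJ/mol) recorded during a short,
# unbiased simulation. Replace these with your own recorded values.
energies = [-798400.0, -799100.0, -797650.0, -798900.0, -798050.0]

Vmin = min(energies)
Vmax = max(energies)
Vavg = statistics.fmean(energies)
# Sample standard deviation (ddof=1). Use statistics.pstdev instead if
# the population standard deviation is wanted.
Vstd = statistics.stdev(energies)

print(Vmin, Vmax, Vavg, Vstd)
```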
"# define a function for creating RMSD restraints \n", | ||
"def create_rmsd_restraint(positions, atom_indicies):\n", | ||
" rmsd_cv = RMSDForce(positions, atom_indicies)\n", | ||
" energy_expression = 'step(dRMSD) * (K_RMSD/2) * dRMSD^2; dRMSD = (RMSD-RMSD0);' \n", | ||
" energy_expression += 'K_RMSD = %f;' % spring_constant.value_in_unit_system(md_unit_system) \n", | ||
" energy_expression += 'RMSD0 = %f;' % restraint_distance.value_in_unit_system(md_unit_system) \n", | ||
" restraint_force = CustomCVForce(energy_expression) \n", | ||
" restraint_force.addCollectiveVariable('RMSD', rmsd_cv) \n", | ||
" return restraint_force \n", |
This is never used.
"# define a function to list force groups and modify system context \n",
"def forcegroupify(system): \n",
"    forcegroups = {} \n",
"    for i in range(system.getNumForces()): \n",
"        force = system.getForce(i) \n",
"        force.setForceGroup(i) \n",
"        forcegroups[force] = i \n",
"    return forcegroups"
This is never used.
"sim_temp = 300.0 * kelvin\n",
"H_mass = 4.0 * amu #Might need to be tuned to 3.5 amu \n",
"time_step = 0.002 * picosecond \n",
"nb_cutoff = 10.0 * angstrom \n",
"box_padding = 12.0 * angstrom\n",
"salt_conc = 0.15 * molar\n",
"receptor_path=\"../villin.pdb\"\n",
"current_file=\"villin-solvated\"\n",
"# Misc parameters \n",
"restraint_distance = 0.0 * angstroms \n",
"restart_freq = 10\n",
"log_freq = 1 \n",
"prd_steps = 100 "
Many of these parameters are never used. Most of the others are only used once, so you can just put the value at the appropriate place in the code without defining a variable for it. Remember, the goal of a cookbook entry is to show how to do one specific thing with the smallest amount of code possible. It doesn't need to be a complete, robust simulation code.
" constraints=HBonds, \n",
" rigidWater=True,\n",
" hydrogenMass=H_mass)\n",
"system.addForce(MonteCarloBarostat(1*bar, sim_temp))\n"
Adding a barostat is unnecessary. It's a fine thing to do in a real simulation, but it's unrelated to umbrella sampling. It doesn't contribute toward showing the user what we're trying to show them.
"simulation.reporters.append(DCDReporter('./'+current_file+''+ '.dcd', restart_freq))\n",
"simulation.reporters.append(CheckpointReporter('./'+current_file+''+ '.chk', min(prd_steps, 10*restart_freq)))\n",
"simulation.reporters.append(StateDataReporter(open('./log.' + current_file+'', 'w'), log_freq, step=True, potentialEnergy=True, kineticEnergy=True, totalEnergy=True, temperature=True, volume=True, density=True, speed=True))"
This is unnecessary. Unless you specifically refer to the output of a reporter later on, it's not relevant to what you're teaching.
"## set force constant K for the biasing potential.\n",
"## the unit here is kJ*mol^{-1}*nm^{-2}, which is the default unit used in OpenMM\n",
"K = 100\n",
"simulation.context.setParameter(\"k\", K)\n",
"\n",
"## M centers of harmonic biasing potentials\n",
"M = 20\n",
"r0_range = np.linspace(0.3, 2.0, M, endpoint = False)"
These parameters will already have been set in the calls to addBond() above.
"simulation.context.setParameter('r0_d1', r0_range[1])\n",
"simulation.context.setParameter('r0_d2', r0_range[2])"
Same here.
"simulation.context.setPositions(pdb.positions)\n",
"print(' initial : %s' % (simulation.context.getState(getEnergy=True).getPotentialEnergy()))\n",
"simulation.minimizeEnergy()\n",
"print(' final : %s' % (simulation.context.getState(getEnergy=True).getPotentialEnergy()))"
This has nothing to do with umbrella sampling.
"for i in range(5):\n",
"    simulation.step(10)"
Explain what's going on. We've set up a simulation with restraints. Now we run some dynamics and collect statistics. What statistics do we need to collect? How do we collect them? What do we do with them once we've collected them?
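As one illustration of the kind of quantities involved, the sketch below rebuilds the window centers from the quoted notebook values (K = 100 kJ/mol/nm^2, 20 centers from 0.3 to 2.0 nm) and evaluates a bias energy for a few recorded distances. The harmonic form 0.5 * K * (r - r0)^2, the helper name `harmonic_bias`, and the distance values are assumptions for illustration; the notebook's actual energy expression is not shown in this thread.

```python
K = 100.0            # force constant in kJ/mol/nm^2, as in the quoted notebook
M = 20               # number of window centers
START, STOP = 0.3, 2.0

# Pure-Python equivalent of np.linspace(0.3, 2.0, M, endpoint=False).
step = (STOP - START) / M
r0_range = [START + i * step for i in range(M)]


def harmonic_bias(r, r0, k=K):
    """Bias energy in kJ/mol, assuming the form 0.5 * k * (r - r0)**2."""
    return 0.5 * k * (r - r0) ** 2


# Hypothetical distances (nm) recorded while sampling the window at r0_range[1].
distances = [0.36, 0.40, 0.43, 0.39]
r0 = r0_range[1]
bias_energies = [harmonic_bias(r, r0) for r in distances]

for r, e in zip(distances, bias_energies):
    print(f"r = {r:.2f} nm, bias = {e:.4f} kJ/mol")
```

The recorded distances for each window are the statistics that a later free energy analysis would consume; how they are reweighted is outside the scope of this sketch.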
These comments are super helpful - thanks so much! In looking through the examples, I didn't appreciate that it didn't have to be a proper "robust" simulation code so I included everything (and it's from an older "chicken scratch" notebook).
Indeed - initially this was actually one bigger notebook with all four separate approaches applying to the same Thanks again! I'll work on resolving the comments you made, and apply them to the other notebooks as well. I'll try to ensure that these are much more lightweight examples.
@peastman Would you be willing to take a look at these cookbooks now and let me know if you think they've been sufficiently cleaned up? I've cleaned up all four notebooks so that each is as minimal a simulation as possible. For each notebook, I also tried to document how one could save outputs (without using reporters) in the way I like to do it, but I wasn't sure whether it added too much complexity. It doesn't involve too much overhead and it's a nice way to only save what you need to track the sampling of your simulation or generate the free energy projections. Any and all comments are welcome!
@sef43 also wanted to tag you in case you have any thoughts/edits here!
Sorry this PR got forgotten! I think it just got lost in everyone's inboxes. Let's get back to it and see if we can finish cleaning it up. What you have now is closer to what we're looking for, but I think we can improve it. If you read through the existing cookbook entries, you'll see they tend to follow a formula.
For comparison, look at your umbrella sampling entry again. Instead of explaining what problem we're solving, it goes straight into code. The only explanation is vague titles like, "Create system + CV Forces with harmonic restraints." When the reader finally gets to some text, they don't have the context needed to understand it. "Ultimately we want to track the value of these distances..." But you haven't told them what these distances are or why we would want to track them. My best suggestion is really just to read through the existing cookbook entries. I think that will make it clear. They tend to have at least as much text as code, and they never present code without first explaining what the code is for.
No worries @peastman - totally understand that things come up and/or get buried! Thanks for pointing out the additional notebooks and the guidance of the Four Parts. I've tried out an "example" of the documenting on the Umbrella sampling notebook, since you brought it up - Could you take a look at it and let me know if this is enough/needs editing/additional fleshing out still? I tried adding more context without adding more math or anything and kept the code to the minimum. Hopefully this provides sufficient context? If not I can definitely keep expanding! If it looks good here then I have a starting point to edit the other notebooks and can push changes to them accordingly - just want to make sure we converge on "one good notebook" first!
That's a lot better. Actually, what you have now is more of a tutorial than a cookbook recipe. But that's ok, let's just call it a tutorial instead. Umbrella sampling is a big enough topic to justify one. I'm not sure the One solution is to put the CustomCVForce into its own force group, then call But all of this may be unnecessary complexity. What about adding
I think there's already an Umbrella sampling tutorial that goes into a lot more of the theoretical details of both the running and the analysis. I can reference it more directly if that's preferred but figured that this would just act as a "minimal code" example (which is what I thought the cookbook was). What would be the best way to resolve this then? Should I remove the umbrella sampling as a cookbook or make any other changes to make it more cookbook-like?
Yea I was trying to figure out a way to do it while importing as few libraries as possible (maintaining the
I could try adding this into the notebook to help reduce computational cost! Would it just be as simple as
I think this is a great approach and was what I first thought of doing, but I think evaluating distances after the simulation has run (which would require a restart from the user) is just a different, but equally valid, approach to evaluating sampling quality. I'm happy to include a separate code block to show how one would evaluate the distances after the simulation if you think that would be helpful!
Good point. Do you think this one adds anything beyond what's in the existing tutorial? I generally think of the cookbook as reference material for answering very specific questions that wouldn't justify a whole tutorial. Things like, "How do I change the temperature during a simulation?" or, "How do I decompose the total energy into contributions from different interactions?" The user already knows exactly what they want to do. They just want to find out what code to write to do it.
I didn't mean to move the calculation to after the simulation is run. It would still be computed live, just by a different method. You would replace
d1, d2 = dist_measurer.getCollectiveVariableValues(simulation.context)
with
positions = state.getPositions(asNumpy=True)
d1 = norm(positions[d1_atom1_ind]-positions[d1_atom2_ind])
d2 = norm(positions[d2_atom1_ind]-positions[d2_atom2_ind])
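The distance calculation in that suggestion is just a Euclidean norm between two atom positions. The sketch below reproduces it with the standard library only, assuming `norm` above refers to something like `numpy.linalg.norm`; the coordinates and atom indices are made up, and plain tuples in nanometers stand in for the positions array, so OpenMM unit handling is not shown.

```python
import math

# Hypothetical coordinates in nm, standing in for the array returned by
# state.getPositions(asNumpy=True).
positions = [
    (0.0, 0.0, 0.0),
    (0.3, 0.4, 0.0),
    (1.0, 1.0, 1.0),
    (1.0, 1.0, 3.0),
]

# Hypothetical atom indices for the two tracked distances.
d1_atom1_ind, d1_atom2_ind = 0, 1
d2_atom1_ind, d2_atom2_ind = 2, 3

# math.dist computes the Euclidean distance between two points, which is
# the norm of their difference vector.
d1 = math.dist(positions[d1_atom1_ind], positions[d1_atom2_ind])
d2 = math.dist(positions[d2_atom1_ind], positions[d2_atom2_ind])

print(d1, d2)
```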
So I think mine is a lot more "straight to the point" as far as providing code to set up and run a single window for a user who knows umbrella sampling but just wants the OpenMM syntax, if that makes sense? The other notebook has a lot more theoretical detail on running a full umbrella sampling simulation and provides a lot of detail on system setup and analysis. This notebook would be much more "Assuming you know what you want, here is the code to apply and run a single window." I could link to specific code blocks and/or include more details/example code on how to run using multiple windows? Another option is I could remove this notebook but keep the other three notebooks (on accelerated MD, Gaussian accelerated MD, and Metadynamics)?
Oh I see! Sorry, I totally misinterpreted what you meant, but yes, the way you described doing it is equally good! I hadn't done it this way because I avoided importing numpy in the notebook (and by
Actually @sef43 do you have any thoughts on this, since you wrote the existing tutorial?
Oh! Phenomenal! I've gone ahead and updated the notebook to print it using
In a real simulation it would be recorded much less frequently than every 10 steps, but the sample cookbook loop I have is only running a 100 step simulation anyway. I've made a point of noting that the
Here's an initial draft of 4 cookbooks for running enhanced sampling methods (AMD, GAMD, MetaD, and Umbrella Sampling) using OpenMM. Each is located in a subdirectory inside cookbook/Enhanced sampling methods. A couple of sample log. files are included (which are used to set up hyperparameters for these simulations). Any and all feedback is welcome! The notebook code itself runs but it may need cleanup/additional documentation.
(This PR is a re-opening of PR #16 to deal with merge conflicts.)
This submission is in response to a larger effort to add more tutorials to the OpenMM-cookbook (#12).