-
Notifications
You must be signed in to change notification settings - Fork 71
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature normalize and standardize objects #287
Feature normalize and standardize objects #287
Conversation
e6e821d
to
71cb142
Compare
…ndardized input and result interface.
71cb142
to
199b324
Compare
A few wins for |
This pull request introduces 13 alerts when merging 819bb97 into d04a380 - view on LGTM.com new alerts:
|
Added Changelog description above to summarize PR. |
On the whole, looks good, thanks!
|
Thanks for the questions! A few thoughts.
You'd want to be as specific as possible in this case. The class Person(BaseModel):
string: str # probably not what you want
name: str # probably what you want; "name" is not intended to be generic, the type is generic and you ascribe the generic type (interface) to a specific attribute of the object I agree re: field names in general though. I stuck with some old names so it looks more familiar. I'd consider
100%. This is the intended design and the main benefit I'm trying to highlight. It follows from the logic of the class Person(BaseModel):
name: str # analogous to .molecule
best_friend: Person # analogous to .input_spec. This annotation means Person or any of its subclasses (YoungPerson, OldPerson) is valid
class OldPerson(Person):
favorite_newspaper: str
class YoungPerson(Person):
favorite_cartoon: str
# Examples omit everyone having a `.best_friend` for simplicity
colton = OldPerson(name="colton", best_friend=YoungPerson(name="greg", favorite_cartoon="Rick and Morty"), ...)
lori = YoungPerson(name="lori", best_friend=OldPerson(name="suzy", favorite_newspaper="Something old!"), ...)
This pattern makes programming much easier because like objects are alike ("I know how to make an input, I need a for result in results: # contains any result--Atomic, FailedOp, Opt, Torsion, ...,
if result.input_data.molecule.name == "water":
input_data = result.input_data
if input_data.specification.extras.get("specialkeyword"):
if input_data.specification.keywords.get("badkeyword"):
print("Found my mistake! This extra + keyword ruins everything. All these computations are junk!") With the current for result in results: # contains any kind of result
if isinstance(result, FailedOperation):
input_data = result.input_data # NOTE: input_data not typed on FailedOperation, so will be `dict`
if input_data.get("initial_molecule"): # Is OptimizationInput or TorsionDriveInput
molecule_dict = input_data.get("initial_molecule")
else: # is AtomicInput
molecule_dict = input_data.get("molecule")
molecule = Molecule(**molecule_dict) # Recast from dict -> Molecule
elif isinstance(result, TorsionDriveResult) or isinstance(result, OptimizationResult):
molecule = result.initial_molecule
else: # is AtomicResult
molecule = result.molecule
# TODO: Don't forget to add more cases in the future for all new Result object types!
if molecule.name == 'water':
if isinstance(result, FailedOperation):
my_special_keyword = result.input_data["extras"].get("specialkeyword") # keyword may have been added as part of output! Not strictly input data
else:
my_special_keyword = result.extras.get("specialkeyword")
if my_special_keyword:
if isinstance(result, FailedOperation):
keywords = result.input_data['keywords']
else:
keywords = result.keywords
if keywords.get("badkeyword"):
print("Found my mistake! This extra + keyword ruins everything. All these computations are junk!")
# TODO: Don't forget to add more cases for future Input types! The above code is the beauty of unified interfaces. Everything you need to do is generic, follows the same pattern, and you can quickly lock on to the patterns of a library (like
Correct. Since they seem to evolve together (everything is currently on v1) I've placed this attribute as a common value here. It can be changed on any child object if they don't evolve in lock step.
Agree! I like this piece a lot. Our "immutabiliy" is only as good as our perfect coding in
Would love to chat about these! I'd benefit from hearing more use cases to inform my perspectives :)
I'd be happy to help out on migrating
Let me know when you'd like this posted over there and I'll do it! Anything else let me know. Hope the explanations above aren't too pedantic (pydantic?!). I'm trying to be clear and detailed about the design pattern but hope I'm not communicating (or over-communicating) that poorly. Always happy to do a call for clarification and discussion :) |
One additional note on this:
This concept is really the big idea behind object oriented programming. It's often called "polymorphism." Not building interfaces this way misses out on the truly powerful syntax that OOP offers. Polymorphism is the reason to use OOP. You may have heard about "duck typing" in python. It's an even more flexible version of the same thing. So just throwing out a few key terms here so the interested can dig deeper on google :) |
My docs pass #289 messed this up -- sorry about that. Please rebase at your convenience. |
What do you mean by this? Not clear what got messed up. Thanks :) |
Sorry I was unclear. There's conflicts in five files that need to be resolved after I renovated docs. Any |
…m:coltonbh/QCElemental into feature-normalize-and-standardize-objects
This pull request introduces 3 alerts when merging e923f22 into 2956421 - view on LGTM.com new alerts:
|
…at exported models conform to pydantic's autogenerated schema, which is not necessary to tests. Also, issues arrise with jsonschema which are external to our concerns.
This pull request introduces 3 alerts when merging c4442de into 2956421 - view on LGTM.com new alerts:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm! Let's initialize next
with this and see if downstream will need any tweaks in practice. Thank you!
Hell yeah! I can start to work on cleaning up the |
* Refactored models. Moved common fields to parent classes. Created standardized input and result interface. * Basic models implemented; tests not passing yet * test_model_results passing * test_molutil passing * test_molparse_from_string_passing * test_molecule passing * Set test_molutil back to original state * blacken qcel * Skip --validate tests since they are circular in nature. They test that exported models conform to pydantic's autogenerated schema, which is not necessary to tests. Also, issues arrise with jsonschema which are external to our concerns.
* Refactored models. Moved common fields to parent classes. Created standardized input and result interface. * Basic models implemented; tests not passing yet * test_model_results passing * test_molutil passing * test_molparse_from_string_passing * test_molecule passing * Set test_molutil back to original state * blacken qcel * Skip --validate tests since they are circular in nature. They test that exported models conform to pydantic's autogenerated schema, which is not necessary to tests. Also, issues arrise with jsonschema which are external to our concerns.
Description
The purpose of this PR is to standardize the interface for input and result objects. It also removes duplicate field specifications by moving common attributes to parent classes. The general interface for most end user objects remains almost the same, with a few tweaks to create the commonalities.
Closes #264
Changelog description
Input
andResult
objects to conform to a standard interface for each object type. Inspiration was drawn from existing implementations inqcelemental
and extended to create consistency.AtomicInput
,OptimizationInput
,TorsionDriveInput
) conform to the basic interface of.molecule
, containing the initial molecule on which to perform/start the computation, and.specification
which defines thekeywords
,program
,extras
,protocols
, and other parameters to fully specify the computation.molecule
+specification
= complete computation. More sophisticated specifications, likeOptimizationSpecification
andTorsionDriveSpecification
can be composed by addingspecifications
objects that define the collection of computations needed to execute the procedure. SeeOptimizationSpecification
for an example that composesAtomicSpecification
into it.AtomicResult
,OptimizationResult
,TorsionDriveResult
,FailedOperation
) conform to the basic interface of.input_data
which is a perfect copy of the input data used to generate the computation,.success
boolean to define whether the operation succeeded or failed, and thestdout
andstderr
for the computation. Additionally, computation-specific output fields an be added as needed.id
,schema_name
etc) were unmodified in form but refactored to parent classes to reduce duplicate code.Status
Interface description
This PR makes all interfaces conform to the following basic specification:
Input Objects
All input objects (
AtomicInput
,OptimizationInput
,TorsionDriveInput
, future procedures...) follow the same pattern:.specification
specifies how that computation is to be performed. It contains all thekeywords
,extras
,protocols
, possiblemodel
/driver
, and/or other fields required to be sent directly to the computational chemistry program being called..molecule
attribute specifies on what molecule to perform (or begin) this operation.The
Specification
objects take their inspiration from theQCInputSpecification
object. Combining a specification with a molecule creates a computation as per Ben's thoughts.Increasingly sophisticated computations (like procedures) can now be generated systematically by composition of any number of required
Specification
objects. A simplifiedAtomicInput
,OptimizationInput
, andTorsionDriveInput
can be seen below that show how this pattern plays nicely at each level of abstraction.Result Objects
All result objects (
AtomicResult
,OptimizationResult
,TorsionDriveResult
,FailedOperation
) conform to the same interface. The previous split between inheritance (on successful results) and composition (on failed results) is resolved in favor of composition. The inheritance from input objects created difficulty withschema_name
that required special validation and the mixing of the.extras
dictionary that contained both input and output values. Additionally, creating new inputs from previous results was difficult because the input and output fields were mixed on the same object so things likeAtomicInput(**my_old_atomic_result.dict())
raise exceptions.AtomicInput(**my_old_atomic_result.input_data.dict()
would be nice. I preferred the ease of accessing the complete input data at.input_data
on aFailedOperation
object :) Grouping result objects via inheritance also eliminated field duplication. All result objects follow this interface:.success
: bool (True
on all successful results,False
onFailedOperation
).input_data
: The full input data object that created the computation.stdout
: Optional[str] of stdout data.stderr
: Optional[str] of stderrHere is a simplified hierarchy:
This plays nicely with your brain because you can interact with all inputs and outputs in the same way. Inputs always contain a molecule and a specifications. Specifications always contains
program
,keywords
for that program, and any additionalspecifications
required for sub-computations (like a gradient specification for an optimization routine). All result objects have a.success
value,input_data
value, and then any additional fields unique to their computation type. I think these changes can makeqcelemental
even easier to use and more consistent to program with.I consider this PR a starting point for community conversation! So I look forward to hearing feedback, thoughts, opinions, and concerns. Hope this can move
qcel
forward in a positive way. Thanks to all the maintainers for your work on the package.Simple Script
If you want to play with these objects and see how they feel just install
qcelemental
from this branch and then use this script (assuming you named ithacking.py
).Addendum
Because I was using
pyreverse
to model the various data relationships in the library I also took the liberty to delete the old models that were to be removed inv0.13.0
since this made it easier to visualize the code.