All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
- `mypy` for search space and objectives
- Class hierarchy for objectives
- Deserialization is now also possible from optional class name abbreviations
- `Kernel`, `MaternKernel`, `AdditiveKernel`, `ProductKernel` and `ScaleKernel` classes for specifying kernels
- `KernelFactory` protocol enabling context-dependent construction of kernels
- Preset mechanism for `GaussianProcessSurrogate`
- `hypothesis` strategies and roundtrip test for kernels, constraints, objectives, priors and acquisition functions
- New acquisition functions: `qSR`, `qNEI`, `LogEI`, `qLogEI`, `qLogNEI`
- Serialization user guide
- Basic deserialization tests using different class type specifiers
- `GammaPrior`, `HalfCauchyPrior`, `NormalPrior`, `HalfNormalPrior`, `LogNormalPrior` and `SmoothedBoxPrior` can now be chosen as lengthscale prior
- Reorganized acquisition.py into `acquisition` subpackage
- Reorganized simulation.py into `simulation` subpackage
- Reorganized gaussian_process.py into `gaussian_process` subpackage
- Acquisition functions are now their own objects
- `acquisition_function_cls` constructor parameter renamed to `acquisition_function`
- User guide now explains the new objective classes
- Telemetry deactivation warning is only shown to developers
- `torch`, `gpytorch` and `botorch` are lazy-loaded for improved startup time
- If an exception is encountered during simulation, incomplete results are returned with a warning instead of passing through the uncaught exception
- Environment variables `BAYBE_NUMPY_USE_SINGLE_PRECISION` and `BAYBE_TORCH_USE_SINGLE_PRECISION` to enforce single-precision usage
- `model_params` attribute from `Surrogate` base class, `GaussianProcessSurrogate` and `CustomONNXSurrogate`
- `n_task_params` now evaluates to 1 if `task_idx == 0`
- Simulation no longer fails in `ignore` mode when lookup dataframe contains duplicate parameter configurations
- Simulation no longer fails for targets in `MATCH` mode
- `closest_element` now works for array-like input of all kinds
- Structuring concrete subclasses no longer requires providing an explicit `type` field
- `_target(s)` attributes of `Objectives` are now de-/serialized without leading underscore to support user-friendly serialization strings
- Telemetry does not execute any code if it was disabled
- The former `baybe.objective.Objective` class has been replaced with `SingleTargetObjective` and `DesirabilityObjective`
- `acquisition_function_cls` constructor parameter for `BayesianRecommender`
- `VarUCB` and `qVarUCB` acquisition functions
- Simulation user guide
- Example for transfer learning backtesting utility
- `pyupgrade` pre-commit hook
- Better human readable `__str__` representation of objective and targets
- Alternative dataframe deserialization from `pd.DataFrame` constructors
- More detailed and sophisticated search space user guide
- Support for Python 3.12
- Upgraded syntax to Python 3.9
- Bumped `onnx` version to fix vulnerability
- Increased threshold for low-dimensional GP priors
- Replaced `fit_gpytorch_mll_torch` with `fit_gpytorch_mll`
- Use `tox-uv` in pipelines
- `telemetry` dependency is no longer a group (enables Poetry installation)
- Better human readable `__str__` representation of campaign
- README now contains an example on substance encoding results
- Transfer learning user guide
- `from_simplex` constructor now also takes and applies optional constraints
- Full lookup backtesting example now tests different substance encodings
- Replaced unmaintained `mordred` dependency by `mordredcommunity`
- `SearchSpace`s now use `ndarray` instead of `Tensor`
- `from_simplex` now efficiently validated in `Campaign.validate_config`
- BoTorch dependency bumped to `>=0.9.3`
- Workaround for BoTorch hybrid recommender data type
- Support for Python 3.8
- Subpackages for the available recommender types
- Multi-style plotting capabilities for generated example plots
- JSON file for plotting themes
- Smoke testing in relevant tox environments
- `ContinuousParameter` base class
- New environment variable `BAYBE_CACHE_DIR` that can customize the disk cache directory or turn off disk caching entirely
- Options to control the number of nonzero parameters in `SubspaceDiscrete.from_simplex`
- Temporarily ignore ONNX vulnerabilities
- Better human readable `__str__` representation of search spaces
- `pretty_print_df` function for printing shortened versions of dataframes
- Basic Transfer Learning example
- Repo now has reminders (https://github.com/marketplace/actions/issue-reminder) enabled
- `mypy` for recommenders
- `Recommender`s now share their core logic via their base class
- Remove progress bars in examples
- Strategies are now called `MetaRecommender`'s and part of the `recommenders.meta` module
- `Recommender`'s are now called `PureRecommender`'s and part of the `recommenders.pure` module
- `strategy` keyword of `Campaign` renamed to `recommender`
- `NaiveHybridRecommender` renamed to `NaiveHybridSpaceRecommender`
- Unhandled exception in telemetry when username could not be inferred on Windows
- Metadata is now correctly updated for hybrid spaces
- Unintended deactivation of telemetry due to import problem
- Line wrapping in examples
- `TwoPhaseStrategy`, `SequentialStrategy` and `StreamingSequentialStrategy` have been replaced with their new `MetaRecommender` versions
- Copy button for code blocks in documentation
- `mypy` for campaign, constraints and telemetry
- Top-level example summaries
- `RecommenderProtocol` as common interface for `Strategy` and `Recommender`
- `SubspaceDiscrete.from_simplex` convenience constructor
- Order of README sections
- Imports from top level `baybe.utils` no longer possible
- Renamed `utils.numeric` to `utils.numerical`
- Optional `chem` dependencies are lazily imported, improving startup time
- Several minor issues in documentation
- Visibility and constructor exposure of `Campaign` attributes that should be private
- `TaskParameter`s no longer disappear from computational representation when the search space contains only one task parameter value
- Failing `baybe` import from environments containing only core dependencies caused by eagerly loading `chem` dependencies
- `tox` `coretest` now uses correct environment and skips unavailable tests
- Basic serialization example no longer requires optional `chem` dependencies
- Detailed headings in table of contents of examples
- Passing `numerical_measurements_must_be_within_tolerance` to the `Campaign` constructor is no longer supported. Instead, `Campaign.add_measurements` now takes an additional parameter to control the behavior.
- `batch_quantity` replaced with `batch_size`
- `allow_repeated_recommendations` and `allow_recommending_already_measured` are now attributes of `Recommender` and no longer attributes of `Strategy`
- Target enums
- `mypy` for targets and intervals
- Tests for code blocks in README and user guides
- `hypothesis` strategies and roundtrip tests for targets, intervals, and dataframes
- De-/serialization of target subclasses via base class
- Docs building check now part of CI
- Automatic formatting checks for code examples in documentation
- Deserialization of classes with classmethod constructors can now be customized by providing an optional `constructor` field
- `SearchSpace.from_dataframe` convenience constructor
- Renamed `bounds_transform_func` target attribute to `transformation`
- `Interval.is_bounded` now implements the mathematical definition of boundedness
- Moved and renamed target transform utility functions
- Examples have two levels of headings in the table of contents
- Fix order of examples in table of contents
- `DiscreteCustomConstraint` validator now expects dataframe instead of series
- `ignore_example` flag builds but does not execute examples when building documentation
- New user guide versions for campaigns, targets and objectives
- Binarization of dataframes now happens via pickling
- Wrong use of `tolerance` argument in constraints user guide
- Errors with generics and type aliases in documentation
- Deduplication bug in substance_data `hypothesis` strategy
- Use pydoclint as flake8 plugin and not as a stand-alone linter
- Margins in documentation for desktop and mobile version
- `Interval`s can now also be deserialized from a bounds iterable
- `SubspaceDiscrete` and `SubspaceContinuous` now have de-/serialization methods
- Conda install instructions and version badge
- Early fail for different Python versions in regular pipeline
- `Interval.is_finite` replaced with `Interval.is_bounded`
- Specifying target configs without explicit type information is deprecated
- Specifying parameters/constraints at the top level of a campaign configuration JSON is deprecated. Instead, an explicit `searchspace` field must be provided with an optional `constructor` entry
- Release pipeline now also publishes source distributions
- `hypothesis` strategies and tests for parameters package
- Reworked validation tests for parameters package
- `SubstanceParameter` now collects inconsistent user input in an `ExceptionGroup`
- Link handling in documentation
- GitHub CI pipelines
- GitHub documentation pipeline
- Optional `--force` option for building the documentation despite errors
- Enabled passing optional arguments to `tox -e docs` calls
- Logo and banner images
- Project metadata for pyproject.toml
- PyPI release pipeline
- Favicon for homepage
- More literature references
- First drafts of first user guides
- Reworked README for GitHub landing page
- Now has concise contribution guidelines
- Use Furo theme for documentation
- `--debug` flag for documentation building
- Script for building HTML documentation and corresponding `tox` environment
- Linter `typos` for spellchecking
- Parameter encoding enums
- `mypy` for parameters package
- `tox` environments for `mypy`
- Replacing `pylint`, `flake8`, `µfmt` and `usort` with `ruff`
- Markdown based documentation replaced with HTML based documentation
- `encoding` is no longer a class variable
- Now installed with correct `pandas` dependency flag
- `comp_df` column names for `CustomDiscreteParameter` are now safe
- `Raises` section for validators and corresponding contributing guideline
- Bring your own model: surrogate classes for custom model architectures and pre-trained ONNX models
- Test module for deprecation warnings
- Option to control the switching point of `TwoPhaseStrategy` (former `Strategy`)
- `SequentialStrategy` and `StreamingSequentialStrategy` classes
- Telemetry env variable `BAYBE_TELEMETRY_VPN_CHECK` turning the initial connectivity check on/off
- Telemetry env variable `BAYBE_TELEMETRY_VPN_CHECK_TIMEOUT` for setting the connectivity check timeout
- Reorganized modules into subpackages
- Serialization no longer relies on cattrs' global converter
- Refined (un-)structuring logic
- Telemetry env variable `BAYBE_TELEMETRY_HOST` renamed to `BAYBE_TELEMETRY_ENDPOINT`
- Telemetry env variable `BAYBE_DEBUG_FAKE_USERHASH` renamed to `BAYBE_TELEMETRY_USERNAME`
- Telemetry env variable `BAYBE_DEBUG_FAKE_HOSTHASH` renamed to `BAYBE_TELEMETRY_HOSTNAME`
- Bumped cattrs version
- Now supports Python 3.11
- Removed `pyarrow` version pin
- `TaskParameter` added to serialization test
- Deserialization (e.g. from config) no longer silently drops unknown arguments
- `BayBE` class replaced with `Campaign`
- `baybe.surrogate` replaced with `baybe.surrogates`
- `baybe.targets.Objective` replaced with `baybe.objective.Objective`
- `baybe.strategies.Strategy` replaced with `baybe.strategies.TwoPhaseStrategy`
- Linear in-/equality constraints over continuous parameters
- Constrained optimization for `SequentialGreedyRecommender`
- `RandomRecommender` now supports linear in-/equality constraints via polytope sampling
- Include linting for all functions
- Rewrite functions to distinguish between private and public ones
- Unreachable telemetry endpoints now automatically disable telemetry and no longer cause any data submission loops
- `add_fake_results` utility now considers potential target bounds
- Constraint names have been refactored to indicate whether they operate on discrete or continuous parameters
- Random recommendation failing for small discrete (sub-)spaces
- Deserialization issue with `TaskParameter`
- `TaskParameter` for multitask modelling
- Basic transfer learning capability using multitask kernels
- Advanced simulation mechanisms for transfer learning and search space partitioning
- Extensive docstring documentation in all files
- Autodoc using sphinx
- Script for automatic code documentation
- New `tox` environments for a full and a core-only pytest run
- Discrete subspaces require unique indices
- Simulation function signatures are redesigned (but largely backwards compatible)
- Docstring contents and style (numpy -> google)
- Regrouped additional dependencies
- Test environments for multiple Python versions via `tox`
- Removed `environment.yml`
- Telemetry host endpoint is now flexible via the environment variable `BAYBE_TELEMETRY_HOST`
- Inference for `__version__`
- Vulnerability check via `pip-audit`
- `tests` dependency group
- Removed no longer required `fsspec` dependency
- Scipy vulnerability by bumping version to 1.10.1
- Missing `pyarrow` dependency
- `from_dataframe` convenience constructors for discrete and continuous subspaces
- `from_bounds` convenience constructor for continuous subspaces
- `empty` convenience constructors for discrete and continuous subspaces
- `baybe`, `strategies` and `utils` namespace for convenient imports
- Simple test for config validation
- `VarUCB` and `qVarUCB` acquisition functions emulating maximum variance for active learning
- Surrogate model serialization
- Surrogate model parameter passing
- Renamed `create` constructors to `from_product`
- Renamed `empty` checks for subspaces to `is_empty`
- Fixed inconsistent class names in surrogate.py
- Fixed inconsistent class names in parameters.py
- Cached recommendations are now private
- Parameters, targets and objectives are now immutable
- Adjusted comments in example files
- Accelerated the slowest tests
- Removed try blocks from config examples
- Upgraded numpy requirement to >= 1.24.1
- Requires `protobuf<=3.20.3`
- `SearchSpace` parameters in surrogate models are now handled in `fit`
- Dataframes are encoded in binary for serialization
- `comp_rep` is loaded directly from the serialization string
- Include scaling in FPS recommender
- Support for pandas>=2.0.0
- Constraints serialization
- A maximum of one `DependenciesConstraint` is allowed
- Bumped numpy and matplotlib versions
- Code coverage check with pytest-cov
- Hybrid mode for `SequentialGreedyRecommender`
- Removed support for infinite parameter bounds
- Removed not yet implemented MULTI objective mode
- Changelog assert in Azure pipeline
- Bug: telemetry could not be fully deactivated
- `Interval` class for representing parameter/target bounds
- Activated mypy for the first few modules and fixed their type issues
- Automatic (de-)serialization and `SerialMixin` class
- Basic serialization example, demo and tests
- Mechanisms for loading and validating config files
- Telemetry via OpenTelemetry
- More detailed package installation info
- Fallback mechanism for `NonPredictiveRecommender`
- Introduce naive hybrid recommender
- Switched from pydantic to attrs in all modules except constraints.py
- Removed subclass initialization hooks and `type` attribute
- Refactored class attributes and their conversion/validation/initialization
- Removed no longer needed `HashableDict` class
- Refactored strategy and recommendation module structures
- Replaced dict-based configuration logic with object-based logic
- Overall versioning scheme and version inference for telemetry
- No longer using private telemetry imports
- Fixed package versions for dev tools
- Revised "Getting Started" section in README.md
- Revised examples
- Telemetry no longer crashing when package was not installed
- Tests for different search space types and their compatible recommenders
- Initial strategies converted to recommenders
- Config keyword `initial_strategy` replaced by `initial_recommender_cls`
- Config keywords for the clustering recommenders changed from `x` to `CLUSTERING_x`
- scikit-learn-extra is now an optional dependency in the [extra] group
- Type identifiers of greedy recommenders changed to 'SEQUENTIAL_GREEDY_x'
- Parameter bounds now only contain dimensions that actually appear in the search space
- Parsing for continuous parameters
- Caching of recommendations to avoid unnecessary computations
- Strategy support for hybrid spaces
- Custom discrete constraint with user-provided validator
- Parameter class hierarchy
- `SearchSpace` now has a discrete and a continuous subspace
- Model fit now done upon requesting recommendations
- Updated BoTorch and GPyTorch versions are also used in pyproject.toml
- `SearchSpace` class
- Code testing with pytest
- Option to specify initial data for backtesting simulations
- SequentialGreedyRecommender class
- Switched from miniconda to micromamba in Azure pipeline
- BoTorch version upgrade to fix critical bug (pytorch/botorch#1454)
- Parameters cannot be initialized with duplicate values
- Initial strategy: Farthest Point Sampling
- Initial strategy: Partitioning Around Medoids
- Initial strategy: K-means
- Initial strategy: Gaussian Mixture Model
- Constraints and conditions for discrete parameters
- Data scaling functionality
- Decorator for automatic model scaling
- Decorator for handling constant targets
- Decorator for handling batched model input
- Surrogate model: Mean prediction
- Surrogate model: Random forest
- Surrogate model: NGBoost
- Surrogate model: Bayesian linear
- Save/load functionality for BayBE objects
- UCB now usable as acquisition function, hard-set beta parameter to 1.0
- Temporary GP priors now exactly reproduce EDBO setting
- Code skeleton with a central object to access functionality
- Basic parser for categorical parameters with one-hot encoding
- Basic parser for discrete numerical parameters
- Azure pipeline for code formatting and linting
- Single-task Gaussian process strategy
- Streamlit dashboard for comparing single-task strategies
- Input functionality to read measurements including automatic matching to search space
- Integer encoding for categorical parameters
- Parser for numerical discrete parameters
- Single numerical target with Min and Max mode
- Recommendation functionality
- Parameter scaling depending on parameter types and user-chosen scalers
- Noise and fake-measurement utilities
- Internal metadata storing various info about datapoints in the search space
- BayBE options controlling recommendation and data addition behavior
- Config parsing and validation using pydantic
- Global random seed control
- Strategy connection with BayBE object
- Custom parameters as labels with user-provided encodings
- Substance parameters which are encoded via cheminformatics descriptors
- Data cleaning utilities useful for descriptors
- Simulation capabilities for testing the package on existing data
- Parsing and preprocessing for multiple targets / desirability ansatz
- Basic README file
- Automatic publishing of tagged versions
- Caching of experimental parameters and chemical descriptors
- Choices for acquisition functions and their usage with arbitrary surrogate models
- Temporary logic for selecting GP priors