“To understand a system, you must perturb it.”
– George Box
BoxingGym is a benchmarking framework designed to evaluate the capabilities of language-based agents in experimental design and model discovery. The framework consists of several simulated environments where agents can perform experiments, propose models, and refine them based on collected data.
To install BoxingGym, clone the repository and install the dependencies:
git clone https://github.com/kanishkg/boxing-gym.git
cd boxing-gym
pip install -e .
You should now be able to import the BoxingGym package in your Python environment.
import boxing_gym
Environments in BoxingGym simulate models of the world across different domains. You can interact with an environment using predefined methods to conduct experiments, collect data, and test hypotheses.
Example code to interact with an environment (see run_experiment.py for a complete example):
from boxing_gym.envs import SomeEnvironment

env = SomeEnvironment()
env.reset()                          # sample a fresh world model
action = env.sample_random_input()   # pick a valid experiment at random
observation = env.step(action)       # run the experiment and observe the outcome
Agents in BoxingGym can perform experimental design, run simulations, and propose models. The framework includes pre-built agents like Box's Apprentice and the LLM Agent.
Example pseudo-code to interact with an agent (see run_experiment.py for a complete example):
from boxing_gym.agents import LLMAgent
from boxing_gym.envs.some_env import SomeEnv, SomeGoal

env = SomeEnv()
goal = SomeGoal(env, include_prior=True)
agent = LLMAgent()

goal_description = goal.get_system_message(include_prior=True)
agent.set_goal(goal_description)

observation = env.reset()
action = agent.act(observation)
next_observation = env.step(action)
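Putting these pieces together, a single experiment loop might look like the following sketch; num_experiments is a placeholder budget, and the interfaces are the pseudo-code ones shown above rather than the exact API:

num_experiments = 5                     # placeholder experiment budget
observation = env.reset()
for _ in range(num_experiments):
    action = agent.act(observation)     # agent proposes the next experiment
    observation = env.step(action)      # environment returns the outcome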
Environments in BoxingGym define the simulated world model and the interactions an agent can have with it. To create a new environment, subclass the Environment class and implement the necessary methods:
class CustomEnvironment:
    def __init__(self, param1, param2, param3):
        super().__init__()
        self.param1 = param1
        self.param2 = param2
        self.param3 = param3
        self.reset()
        self.env_name = "custom_environment"

    def reset(self):
        # Initialize or reset the environment to a starting state:
        # sample the parameters of the world model
        self.model_params = sample...
        self.data = []

    def get_system_message(self, include_prior=True, goal=None):
        # Return the system message describing the environment,
        # optionally including prior knowledge and the goal description

    def step(self, action):
        # Pass the input through the world model to produce the next observation
        result = ...
        self.data.append(result)  # store the result if necessary
        return result

    def validate_input(self, action):
        # Validate the input action; return an error message (a string)
        # if it is invalid so that the agent can correct it
        return action

    def sample_random_input(self):
        # Sample a random valid action for the environment

    def run_experiment(self, action):
        validated_input = self.validate_input(action)
        if isinstance(validated_input, str):
            return validated_input, False
        result = self.step(validated_input)
        return result, True
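As a concrete illustration only, here is a minimal environment that follows this template: a noisy linear model whose slope is resampled on every reset. The class name, parameters, and the use of numpy are invented for this sketch and are not part of BoxingGym:

import numpy as np

class NoisyLineEnvironment:
    """Toy environment: y = slope * x + noise, following the template above."""
    def __init__(self, noise_std=0.1, lower_limit=0.0, upper_limit=10.0):
        super().__init__()
        self.noise_std = noise_std
        self.lower_limit = lower_limit
        self.upper_limit = upper_limit
        self.reset()
        self.env_name = "noisy_line"

    def reset(self):
        # Sample the hidden parameter of the world model and clear the data.
        self.slope = np.random.normal(0.0, 1.0)
        self.data = []

    def get_system_message(self, include_prior=True, goal=None):
        prior = "You are measuring a noisy linear system. " if include_prior else ""
        return f"{prior}{goal or ''}"

    def step(self, action):
        # Pass the input through the world model to get the next observation.
        result = self.slope * action + np.random.normal(0.0, self.noise_std)
        self.data.append((action, result))
        return result

    def validate_input(self, action):
        # Return an error message (a string) if the input is invalid.
        try:
            value = float(action)
        except (TypeError, ValueError):
            return "Input must be a number."
        if not (self.lower_limit <= value <= self.upper_limit):
            return f"Input must be between {self.lower_limit} and {self.upper_limit}."
        return value

    def sample_random_input(self):
        return np.random.uniform(self.lower_limit, self.upper_limit)

    def run_experiment(self, action):
        validated_input = self.validate_input(action)
        if isinstance(validated_input, str):
            return validated_input, False
        result = self.step(validated_input)
        return result, True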
Goals in BoxingGym define the objectives for an agent within an environment. To create a new goal, subclass the Goal class and implement the necessary methods:
from boxing_gym.goals import Goal

class MyCustomGoal(Goal):
    def __init__(self, env):
        super().__init__(env)   # initialize with the environment
        self.eval_points = []   # store evaluation points
        self.eval_pointer = 0   # pointer into the evaluation points
        # other initialization code

    def get_system_message(self, include_prior):
        # Generate the goal description, optionally including prior knowledge
        goal_description = "Your goal is to ..."  # describe the goal
        return self.env.get_system_message(include_prior, goal_description)

    def get_goal_eval_question(self, include_prior):
        # Generate or retrieve a question for evaluation
        if self.eval_pointer >= len(self.eval_points):
            x = ...                          # generate a new input
            y = self.env.step(x)             # get the corresponding output
            self.eval_points.append((x, y))  # store the evaluation point
        else:
            x, y = self.eval_points[self.eval_pointer]  # retrieve a stored evaluation point
        self.eval_pointer += 1
        question = f"What is the output y for input {x}?"
        return question, y

    def evaluate_predictions(self, predictions, measurements):
        # Evaluate the predictions made by the agent,
        # e.g. return the mean squared error and its standard deviation
        return mse, std

    def expected_information_gain(self, query_point, num_outer_samples=1000, num_inner_samples=10):
        # Calculate the EIG for a new query point using a Bayesian approach;
        # see the existing environments for details on how to implement this method
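For illustration, a minimal goal matching the toy environment sketched earlier might look like the following; the class name, the question wording, and the MSE-based scoring are assumptions for this example rather than the framework's canonical implementation, and it assumes the Goal base class only needs the environment, as in the template above:

import numpy as np
from boxing_gym.goals import Goal

class PredictOutputGoal(Goal):
    """Toy goal: predict the environment's output for held-out inputs."""
    def __init__(self, env):
        super().__init__(env)
        self.eval_points = []
        self.eval_pointer = 0

    def get_system_message(self, include_prior):
        goal_description = "Your goal is to predict the output y for a given input x."
        return self.env.get_system_message(include_prior, goal_description)

    def get_goal_eval_question(self, include_prior):
        if self.eval_pointer >= len(self.eval_points):
            x = self.env.sample_random_input()   # generate a new evaluation input
            y = self.env.step(x)                 # query the world model for its output
            self.eval_points.append((x, y))
        else:
            x, y = self.eval_points[self.eval_pointer]
        self.eval_pointer += 1
        question = f"What is the output y for input {x}?"
        return question, y

    def evaluate_predictions(self, predictions, measurements):
        # Simple illustrative scoring: mean squared error and its standard deviation.
        errors = [(float(p) - float(m)) ** 2 for p, m in zip(predictions, measurements)]
        return float(np.mean(errors)), float(np.std(errors))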
An agent in BoxingGym is an entity that interacts with environments to perform experiments, propose models, and refine them based on collected data. To create a new agent, subclass the Agent class and implement the necessary methods:
class MyCustomAgent:
    def __init__(self, param1, param2, param3):
        super().__init__()
        self.param1 = param1
        self.param2 = param2
        self.param3 = param3
        self.reset()
        self.agent_name = "custom_agent"

    def generate_actions(self, observation):
        # Generate the next action based on the past observation
        return action

    def generate_predictions(self, query):
        # Generate a prediction for the current query
        return prediction
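As an illustration, here is a minimal random-guessing agent following this template; the class and its behaviour are invented for this sketch and are not one of the framework's built-in agents:

import random

class RandomAgent:
    """Toy agent: proposes random experiments and a constant prediction."""
    def __init__(self, lower_limit=0.0, upper_limit=10.0, default_prediction=0.0):
        super().__init__()
        self.lower_limit = lower_limit
        self.upper_limit = upper_limit
        self.default_prediction = default_prediction
        self.reset()
        self.agent_name = "random_agent"

    def reset(self):
        self.history = []  # past (observation, action) pairs

    def generate_actions(self, observation):
        # Ignore the observation and propose a uniformly random experiment.
        action = random.uniform(self.lower_limit, self.upper_limit)
        self.history.append((observation, action))
        return action

    def generate_predictions(self, query):
        # Always answer with the same prediction, regardless of the query.
        return self.default_prediction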
To run an experiment in BoxingGym, use the run_experiment.py script. The script lets you specify the environment, agent, and goal for the experiment, and you can configure experiments with Hydra configuration files.
Example configuration files for running an experiment:

example.yaml

defaults:
  - _self_
  - llms: openai          # LLM config to use
  - exp: oed              # experiment type: discovery or oed (optimal experimental design)
  - envs: custom_env      # environment config to use

include_prior: true       # include prior knowledge in the goal description

openai.yaml

model_name: "gpt-4o"
temperature: 0.0
max_tokens: 512

oed.yaml

num_experiments: [0, 1, 3, 5, 7, 10]
experiment_type: "oed"

custom_env.yaml

env_name: "custom_environment"
goal_name: "custom_goal"
num_evals: 10
env_params:
  param1: 1
  param2: 2
  param3: 3
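With configuration files like these in place, an experiment is typically launched from the command line. The following invocation is a sketch using Hydra's standard override syntax; it assumes the config groups are named llms, exp, and envs as in the example above, so adjust it to match your conf/ layout:

python run_experiment.py llms=openai exp=oed envs=custom_env include_prior=true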
The codebase is organized as follows:
- Environments: src/boxing_gym/envs/
- Agents: src/boxing_gym/agents/
- LLM Agent: src/boxing_gym/agents/agent.py
- Box's Apprentice: Model Criticism LLM in src/boxing_gym/agents/llm.py
- Configurations for experiments: we use Hydra for configs, conf/
- Running experiments: run_experiment.py
- Scripts for running experiments: scripts/
- Analysis of experiments: analysis/
Box's Apprentice is an agent that combines language model capabilities with statistical modeling to perform experimental design and model discovery. It can build explicit generative models to improve predictions.
Box's Apprentice: Model Criticism LLM in src/boxing_gym/agents/llm.py
The LLM Agent is a language-based agent that interacts with environments purely through natural language. It can propose and test scientific theories but relies solely on its language-processing abilities, without building explicit statistical models.
LLM Agent: src/boxing_gym/agents/agent.py
We welcome contributions to BoxingGym, especially for new environments and agents. If you want to contribute, please follow these steps:
- Fork the repository.
- Create a new branch (git checkout -b feature-branch).
- Make your changes.
- Commit your changes (git commit -am 'Add new feature').
- Push to the branch (git push origin feature-branch).
- Create a new Pull Request.
- For major changes, please open an issue first to discuss what you would like to change.