This is the official repository for the 2019 CAADRIA workshop that took place at the Victoria University of Wellington, New Zealand, on the 13th and 14th of April: "WS.4 Deep Reinforcement Learning in Grasshopper - Using Deep Q-Networks to Train an Intelligent Agent to Act in a Grasshopper Environment".
- Nariddh Khean, Computational Design, University of New South Wales
- Alessandra Fabbri, Computational Design, University of New South Wales
- M. Hank Haeusler, Computational Design, University of New South Wales
Deep reinforcement learning (DRL), a subset of machine learning (ML), has seen incredible successes in game playing problems – most notable was the seminal triumph in 2015, where it exhibited beyond human-level performance playing Atari games, and two years later when defeating the human world-champion in the board game, Go. Since then, DRL has become increasingly applied to more impactful applications, such as news recommendations, real-time advertisement, and drug design. As more DRL applications are being discovered, DRL has been identified as one of the most prominent and potentially disruptive ML trends for 2019.
A look at the built environment research landscape reveals a growing, yet small body of ML-related publications. Interrogating the CumInCAD database, a repository of over 9600 conference papers within the computer-aided design (CAD) field, a search for "reinforcement learning" returned only 5 papers: one of which was published in 1995, two were from the same authors, and only one that could technically be considered as deep RL. The lack of DRL in CAD research is not due to a lack of research interest, as Google's Brain Robotics engineer Alex Irpan remarks that "[DRL has] attracted some of the strongest research interest I've seen”; rather, it is the combination of two factors specific to the field of the built environment:
- a lack of conceptual understanding, not just surrounding the mathematics and computation of DRL, but also how to assess the parameters of a given problem and identify if DRL is a suitable method, and
- a lack of readily available avenues to implement DRL in CAD software native to the built environment.
The goals of the workshop can be dichotomously categorised as theoretical and practical.
To gain a theoretical understanding of:
- how to identify when DRL is an appropriate method to solve a problem,
- how to frame a problem so that DRL can be suitably applied, and
- the nuances of adjusting inputs, reward schemas, and hyperparameters.
To gain practical, hands-on experience with:
- framing a problem as a DRL scenario in the parametric modelling environment Grasshopper,
- writing a novel python framework that uses a DRL algorithm known as 'deep Q-learning', and
- training the Grasshopper agent and evaluating its performance.
The problem at hand is to create an artificially intelligent agent to steer a car along a road network. The agent is not given any information about the road itself. Instead, all it sees are the parameters at which its "sightlines" intersect with the edges of the road. As such, a robustly trained agent should be able to navigate any reasonable road network. At the end of the workshop, participants were given an hour to train their own agent. After, collating everyone's best models, we had a race on a road I called "the Gauntlet"!
Grasshopper is a visual scripting language for the 3D modelling software, Rhino, which comes standard in Rhino 6. There are two Grasshopper plugins that are needed:
- Hoopsnake which is used to get around Grasshopper's recursive loop avoidance check, and
- as the in-built
GH_Python
component uses Iron Python (yuck!), we used GH_CPython, which allows us to implement CPython in Grasshopper.
The training of the DQN will be run on a local server built in python 3.5/3.6, which will communicate with Grasshopper through sockets. Two Python libraries are used:
- the scientific computing library, NumPy, and
- the machine learning library, TensorFlow.
Before every training run starts, there are some pre-flight checks that need to be ticked off.
- First, have the
training.gh
file in Grasshopper. Within that file, find the Hoopsnake component, right-click the component, and click 'reset'. This needs to be done every time the training process is restarted. - Open
training.py
in a text editor and make sure the two variablesINPUT_DIM
andOUTPUT_DIM
are equivalent to the sliderSight Line Count
and the number of items in theAction
list in Grasshopper respectively. - Find the two GH_CPython components in the
training.gh
file, double click them, and make sure thePORT
variable is one that is not being used already. Further, make sure the two port numbers are the same as those in thetraining.py
file.
Once all those are square, we can start training! To start training the DQN, have Rhino and Grasshopper open with the training.gh
file, and run training.py
in a terminal. If everything is set up correctly, the terminal should output the following...
> python training.py
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) (None, 16) 0
_________________________________________________________________
dense (Dense) (None, 64) 1088
_________________________________________________________________
dense_1 (Dense) (None, 32) 2080
_________________________________________________________________
dense_2 (Dense) (None, 16) 528
_________________________________________________________________
dense_3 (Dense) (None, 3) 51
=================================================================
Total params: 3,747
Trainable params: 3,747
Non-trainable params: 0
_________________________________________________________________
None
Model Initialised.
Start Loop in GH Client...
At this stage, find the HoopSnake component in the Grasshopper script, right-click it, and select 'Loop'. This should then start the training loop!
The terminal should output something along the lines of...
... connected.
ITERATION: 1
state = [0.643712854056232, 0.696352339453148, 0.784533288702983, ...
q-estimates = [0.043717995285987854, -0.17392410337924957, 0.1835426241159439]
action = 2 (epsilon)
reward = 0.2
epsilon = 0.796
ITERATION: 2
state = [0.484029628340757, 0.524381259674292, 0.580696676398673, ...
q-estimates = [0.13394702970981598, -0.1825016885995865, 0.27398931980133057]
... and so on. By default, every 50 iterations, a .h5
file will be saved to a directory assigned in the parameters, which will be indicated in the terminal...
-- MODEL SAVED (50.h5) --
These .h5
files are what we care about. Within are the weights and biases of the neural network at that iteration of training. These .h5
files will be used later during the deployment of the model, so don't worry if something goes wrong with the training process.
Note: Resetting the training process and running it again will save
.h5
files with the same name, in the same directory, effectively overwriting previous models. If you want to preserve models, move the.h5
files to a different directory.
By default, I've added a 10 second timeout for when the socket is waiting for data from Grasshopper, so that it doesn't hang indefinitely. If you receive the socket.timeout: timed out
error, simply rerun the training.py
file and start the loop in Grasshopper a tad quicker (or change the TIMEOUT
parameter in the training.py
file to something more manageable).
An interesting thing to note can be seen when observing the performance of the DQN during training. Since I've made training steps occur after every iteration, as opposed to after each episode, the ability for long-term planning becomes unstable, as its "brain" is changing at every step. The reason I've made it this way is due to the comparatively slow iteration time that we have to deal with in Grasshopper. The point is, once you save a model, and let that model alone traverse a road, it should exhibit more stable behaviour.
Note: After every training run, don't forget the reset the hoopsnake component in Grasshopper!
Once you've trained a DQN, and you have some models saved in your designated model directory, you can load one of those models and see how it performs on a new road.
Similar set-up steps as before:
- Have the
deploy.gh
file in Grasshopper. Again, Hoopsnake should be reset before each run. - Open
deploy.py
in a text editor and make sure theINPUT_DIM
variable is the same as theSight Line Count
slider in Grasshopper. - Find the GH_CPython component in the
deploy.gh
file, and make sure the port number is the same as in thedeploy.py
file.
After you see Start Loop in GH Client...
in the terminal, start the Hoopsnake component and the terminal should output something like:
... connected.
ITERATION: 1
state = [0.643712854056232, 0.696352339453148, 0.784533288702983, ...
q-estimates = [0.043717995285987854, -0.17392410337924957, 0.1835426241159439]
action = 2
Since we are deploying a trained model, not training it, we won't allow it to make any random actions based on the e-greedy policy, nor are we providing the DQL algorithm with a reward. The former is a method to address the exploration vs exploitation problem, and the latter is used as a reward/punishment metric, both of which are only needed when training.
Note: If the car collides with the edge of the road, the visualisation will stop the car, but the loops will continue. Be sure to kill the python loop, or let it timeout after you stop the Hoopsnake loop.
Parameter Name | Data Type | Default | Range | Description |
---|---|---|---|---|
INPUT_DIM |
integer | 16 | > 0 | The number of input neurons in the neural network. This value should be the same as the Sight Line Count slider in Grasshopper. |
OUTPUT_DIM |
integer | 3 | > 0 | The number of output neurons in the neural network. This value should be the same as the number of actions in the Actions panel in Grasshopper. |
Parameter Name | Data Type | Default | Range | Description |
---|---|---|---|---|
ALPHA |
float | 1 | > 0 | Effectively, learning rate. |
GAMMA |
float | 0.5 | > 0, < 1 |
Discount factor. The factor, which is consecutively multiplied by the highest predicted q-value of the next state (q_s_a_d ), that determines the worth the policy places on future rewards. The larger the GAMMA , the more the agent will favour long-term rewards. |
LAMBDA |
float | 0.005 | > 0, < 1 |
The rate at which epsilon decays. epsilon , used in the ε-greedy policy, allows the agent to conduct random actions, to balance exploitation with exploration. The larger the number, the faster epsilon decays. |
INITIAL_EPSILON |
float | 0.8 | > 0, < 1 |
The value of epsilon at iteration 0. |
FINAL_EPSILON |
float | 0.05 | > 0, < 1 |
The minimum value of epsilon that it decays to. |
MAX_MEMORY |
integer | 10000 | > 0 | The maximum number of (prev_state, action, reward, state_in) tuples that are stored in memory before the oldest ones are removed. |
BATCH_SIZE |
integer | 64 | > 0 | The number of (prev_state, action, reward, state_in) tuples that are sampled from memory to undergo experience-replay. |
Parameter Name | Data Type | Default | Range | Description |
---|---|---|---|---|
ITERATIONS |
integer | 2000 | > 0 | The number of iterations you would like your model to train for. |
TIMEOUT |
integer | 10 | > 0 | The time in seconds after the terminal outputs Start Loop in GH Client... to receive data from Grasshopper, before it times out. |
MODEL_SAVE_FREQ |
integer | 50 | > 0 | The number of iterations between each time a model is saved to the model directory. |
MODEL_SAVE_PATH |
string | 'D:\DRL\models' | - | The path to the model directory, where all .h5 files are saved. |
MODEL_NAME |
string | '2500.h5' | - | The name of the model in the model directory to deploy. |
As always, thank you to my supervisors, Alessandra Fabbri and M. Hank Haeusler, who's constant guidance and encouragement made this workshop happen. Thank you to the CAADRIA team and the Victoria University of Wellington for holding a wonderful conference. And, a big thank you to the sixteen participants who came, got involved, and helped me find my love of teaching. Hope to see you in the next one!