Commit

Simplify docs and add prelimenary equation
StephanDeHoop committed Jan 21, 2025
1 parent 5cfe15b commit 0682e1b
Showing 4 changed files with 59 additions and 54 deletions.
113 changes: 59 additions & 54 deletions docs/everest/development.rst
@@ -56,66 +56,71 @@ long as the optimization process is running.
- Signal everest optimization run termination. It will be called by the client when the optimization needs to be terminated in the middle of the run


Everest vs. Ert data models
EVEREST vs. ERT data models
===========================
Everest uses Ert for running an experiment, but instead of submitting an `ensemble` (i.e., as in Ert) to the queue we submit
a `batch` in Everest. `Batches` are in principle very similar to `ensembles`, but they have some key differences.
The purpose of this section is to explain these key differences from a developer point-of-view.

In Ert, an `ensemble` contains a number of `realizations` which are a set of `model parameters` which Ert attempts to history match to some data.
These `model parameters` are sampled from a certain distribution after creating the `ensemble`. In Everest we are not history matching our `model parameters`,
but instead try to find an optimal strategy on how to operate the particular model(s) in order to maximize some objective (i.e., we optimize for a set of `controls`).
Simply put, the optimization algorithm is iteratively updating our controls until we reach some convergence criteria (i.e., have obtained the optimal strategy).

In order to perform the optimization we need to get the sensitivity of the `objective function` to these `controls` or `optimization variables`.
It is important to understand how our current controls are performing: did we improve our strategy, and therefore our objective function value?
Note: `updated controls` after one optimization iteration become `current controls` for the next iteration. This means that in Everest there is a
distinction between running a forward model for `current controls` and for `perturbed controls`. The forward model doesn’t care which type of controls
it is currently running, but the optimizer handles the results slightly differently.

If we perform robust optimization (i.e., don’t have a single deterministic underlying model) a `batch` contains a certain number of `realizations`
(similar to `realizations` in Ert except static and denoted with `<GEO_ID>`) and each `realization` contains a certain number of `simulations`
(i.e., forward model runs). The `simulations` evaluate the objective function value for `current controls`, for
each set of `perturbed controls`, or for both. This means that the output hierarchies of Everest and Ert differ (Fig 2).
EVEREST uses ERT for running an experiment, but instead of submitting an `ensemble` (as in ERT) to the queue we submit
a `batch` in EVEREST. `Batches` are in principle very similar to `ensembles`, and the ERT queue system doesn't treat them differently,
but they differ hierarchically in terms of the meaning behind the data.
ERT history matches `realizations` (i.e., `model parameters`) to data, hence an `ensemble` contains a number of `realizations`.
EVEREST optimizes a set of `controls` and assumes static (i.e., unchanging) `realizations`.
In terms of collecting the results of forward model runs, there is a distinction between `unperturbed controls`
(i.e., current `objective function` value) and `perturbed controls` (i.e., required to calculate the `gradient`).
Furthermore, when performing robust optimization (i.e., multiple static `realizations`) a `batch` contains a
certain number of `realizations` (denoted by `<GEO_ID>`) and each `realization` contains a number of `simulations`
(i.e., forward model runs). These `simulations` are forward model runs for `unperturbed controls`, for
`perturbed controls`, or for both. This is the key difference between the hierarchical data models of EVEREST and ERT (Fig 3).
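
The hierarchy described above (a `batch` holds static `realizations`, each of which holds `simulations`) can be sketched as follows. This is only an illustration under stated assumptions: the names `Simulation` and `build_batch` are hypothetical and are not part of the EVEREST or ERT code base.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Simulation:
    """One forward model run (hypothetical container, for illustration only)."""

    geo_id: int
    perturbation: Optional[int]  # None marks the unperturbed controls


def build_batch(num_geo, num_perturbations, include_unperturbed=True):
    """Group the simulations of one batch by their static realization (GEO_ID)."""
    batch: Dict[int, List[Simulation]] = {}
    for geo_id in range(num_geo):
        sims = []
        if include_unperturbed:
            # Forward model run for the unperturbed controls.
            sims.append(Simulation(geo_id, None))
        # One forward model run per set of perturbed controls.
        sims.extend(Simulation(geo_id, p) for p in range(num_perturbations))
        batch[geo_id] = sims
    return batch


# Two static realizations with three perturbations each.
batch = build_batch(num_geo=2, num_perturbations=3)
```

In this sketch the static `realizations` are the dictionary keys, which stay the same from one `batch` to the next, while the list of `simulations` under each key depends on what the optimizer requests.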

.. figure:: images/Everest_vs_Ert_01.png
:align: center
:width: 700px
:alt: Everest vs. Ert data models

Difference between `ensemble` in Ert and `batch` in Everest.
A `realization` in Everest refers to a static model configuration which doesn’t change during the optimization,
but differs from model to model, while a `realization` in Ert means a set of model parameters which are going to be history
matched to certain data. In particular, in Ert the `model parameters` or `realizations` are actually the objective of the optimization,
while in Everest the optimization objective is finding the correct controls such that a certain objective is maximized or minimized.
Since Everest still uses Ert to submit the `batch` (i.e., `ensemble`) to the queue, the Everest runs (i.e., for a <GEO_ID> we run a set of controls)
are mapped to Ert `realizations` accordingly. After collecting the results for each Ert `realization` they are mapped back to Everest structure.

The mapping from data models in Everest to Ert is the same as flattening a 2D array (i.e., from an index based on `<GEO_ID>` and `simulation` in Everest to
an index based on `realization` in Ert). Ultimately, Ert is submitting the forward model runs to the queue and is agnostic about the meaning of each run.
Only when the data is collected back in Everest is the meaning of each run attributed.

In Ert each `ensemble` is exactly one step in the history matching algorithm and `realizations` have continuity from one iteration to the next.
For example, `model parameter set 0` are smoothly(?) changing from prior to posterior over the course of the history matching.
This is not the case for `simulations` in Everest and highlights another key difference. A `batch` can contain several different configurations (Fig 3)
and `simulation 0` for `<GEO_ID> = 0` can be either `current` or a `perturbed control` hence there is no continuity from one `batch` to the next. `<GEO_ID>`
is continuous from one `batch` to the next since they are not changing at all over the course of the optimization.
:alt: EVEREST vs. ERT data models

Difference between `ensemble` in ERT and `batch` in EVEREST.

.. figure:: images/Everest_vs_Ert_02.png
:align: center
:width: 700px
:alt: Other `batch` configurations Everest

Two other possible configurations of Everest `batches` in the context of gradient-based optimization algorithms (i.e., `optpp_q_newton`).
Option (A) can occur when the option `speculative` is set to False. This means that, before proceeding with the optimization, the updated set of controls is
first evaluated to check whether it actually improves the objective function value (hence the `batch` contains only `current controls` for each forward model run for each
`<GEO_ID>`). Option (B) can occur when the `current controls` have already been evaluated in a previous `batch` and no update to the current controls has occurred yet.
In the context of gradient-free optimization methods a `batch` can also contain multiple current control forward model runs (ask Pieter how batches change
depending on the optimization algorithm, I wrote this now based on his old figure, but pretty sure the terminology is wrong here).

Perhaps an attempt could be made to improve the terminology on the Everest side, at least from a developer point of view. `Simulation`
is meaningful to the user and perhaps should be kept as is, but `simulation` to an Everest developer is not helpful since in the code `realization`
is mapped to `simulation` and vice versa. Also, `<GEO_ID>` is misleading in several ways, it’s not the same as a `realization` in Ert since it’s static
and furthermore we could be optimizing anything which has nothing to do with “GEO”. Perhaps we could change `<GEO_ID>` to `<STATIC_MODEL_ID>`
to emphasize the fact these model parameters are not changing over the course of the optimization. And for developers it would be much clearer to
change `simulation` to `ert_realization` or `forward_model_evaluation`.
:alt: Additional explanation of Fig 3

Different meaning of `realization` and `simulation`.

As is evident from the image above, in terms of execution in the queue, `realization` (ERT) and `simulation` (EVEREST) are synonymous.
This means that the ERT queue system is agnostic about the meaning of each run; only when the data is collected back in EVEREST (`GEN_DATA`) is the meaning
of each run attributed.
The mapping from the EVEREST data model to the ERT data model is the same as flattening a 2D array (i.e., from an index based on `<GEO_ID>` and `perturbation` in EVEREST to
an index based on `realization` in ERT).

Explicitly this means:

.. math::

    r(g, p) = g,

if the `batch` only has `unperturbed controls`,

.. math::

    r(g, p) = p + g P,

if the `batch` only has `perturbed controls`, and

.. math::

    r(g, p) =
    \begin{cases}
    g, & p < 0, \\
    p + g P + G, & p \geq 0,
    \end{cases}

if the `batch` has both `unperturbed` and `perturbed controls`. Here `r` is the ERT `realization_id` (0, ..., `R` - 1), `g` is the `<GEO_ID>` (0, ..., `G` - 1), `p` is the `pertubation_id` (-1, 0, ..., `P` - 1), `R`
is the total number of ERT `realizations`, `G` is the total number of static `model_realizations`, and `P` is the total number of perturbations.
NOTE: `p = -1` for `unperturbed controls`, and `p = 0, ..., P - 1` for `perturbed controls`.
**THIS IS MY SUGGESTION AND CURRENTLY NOT HOW IT WORKS AND ONLY VALID FOR GRADIENT BASED OPTIMIZATION ALGORITHMS I GUESS?
If we don't want `p` to be negative we need to use a flag (e.g., `is_pertubation`)**
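
The proposed mapping can be sketched in Python as below. This is a minimal illustration of the suggestion in this section, not the current EVEREST behaviour; the function names `to_ert_realization` and `from_ert_realization` and the `has_unperturbed` / `has_perturbed` flags are hypothetical and were chosen only for this example.

```python
def to_ert_realization(g, p, G, P, has_unperturbed=True, has_perturbed=True):
    """Flatten (GEO_ID g, perturbation id p) into an ERT realization id.

    p == -1 denotes the unperturbed controls; p in 0..P-1 a perturbation.
    """
    if not 0 <= g < G:
        raise ValueError(f"GEO_ID {g} outside 0..{G - 1}")
    if p == -1:
        if not has_unperturbed:
            raise ValueError("batch holds no unperturbed controls")
        return g
    if not has_perturbed or not 0 <= p < P:
        raise ValueError(f"perturbation id {p} not available in this batch")
    # Perturbed runs come after the G unperturbed runs, if those are present.
    offset = G if has_unperturbed else 0
    return offset + p + g * P


def from_ert_realization(r, G, P, has_unperturbed=True, has_perturbed=True):
    """Inverse mapping: recover (g, p) from an ERT realization id."""
    total = (G if has_unperturbed else 0) + (G * P if has_perturbed else 0)
    if not 0 <= r < total:
        raise ValueError(f"realization id {r} outside 0..{total - 1}")
    if has_unperturbed and r < G:
        return r, -1
    offset = G if has_unperturbed else 0
    g, p = divmod(r - offset, P)
    return g, p


# With G = 2 and P = 3 a mixed batch has R = G + G * P = 8 ERT realizations.
G, P = 2, 3
ids = [to_ert_realization(g, p, G, P) for g in range(G) for p in range(-1, P)]
assert sorted(ids) == list(range(G + G * P))
```

The two flags select which of the three cases applies, and the round trip through `from_ert_realization` shows how the meaning of each run could be recovered once the results are collected back.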

Another thing to note is that continuity exists for `realizations` between `ensembles`; however, this is not the case for `simulations` in `batches`.
A `batch` can contain several different configurations (Fig 5) and `simulation 0` for `<GEO_ID> = 0` can be for either `unperturbed`
or `perturbed controls`. `<GEO_ID>` is continuous from one `batch` to the next since the static `realizations` do not change at all over the course of the optimization.

.. figure:: images/Everest_vs_Ert_03.png
:align: center
:width: 700px
:alt: Other `batch` configurations EVEREST

Two other possible configurations of EVEREST `batches` in the context of gradient-based optimization algorithms (i.e., `optpp_q_newton`).
Binary file modified docs/everest/images/Everest_vs_Ert_01.png
Binary file modified docs/everest/images/Everest_vs_Ert_02.png
Binary file added docs/everest/images/Everest_vs_Ert_03.png
