modestpy API (outdated)
Users are supposed to use only the modestpy.Estimation class and its two methods, estimate() and validate(). The class defines a single interface for different optimization algorithms. Currently, the available algorithms are: genetic algorithm (GA), pattern search (PS) and sequential quadratic programming (SQP). The methods can be used in a sequence, e.g. GA+PS (default) or GA+SQP, using the methods argument. All estimation settings are set during instantiation. Results of estimation and validation are saved in the working directory workdir (it must exist).
First define the following variables:
- workdir (str) - path to the working directory (it must exist)
- fmu_path (str) - path to the FMU compiled for your platform
- inp (pandas.DataFrame) - inputs, index given in seconds and named time
- est (dict(str : tuple(float, float, float))) - dictionary mapping parameter names to tuples (initial guess, lower bound, upper bound)
- known (dict(str : float)) - dictionary mapping parameter names to known values
- ideal (pandas.DataFrame) - ideal solution (usually measurements), index given in seconds and named time
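A minimal sketch of how these variables might be prepared is shown below. The FMU path, the column names (Tamb, Qin, T) and the parameter names (R, C, Vi) are hypothetical placeholders, not names from a real model; replace them with those of your own FMU.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Working directory: must already exist, so a temporary one is created here
workdir = tempfile.mkdtemp()

# Hypothetical path to an FMU compiled for your platform
fmu_path = "/path/to/model.fmu"

# Shared index in seconds (hourly samples over one day), named 'time'
time = pd.Index(np.arange(0, 86400, 3600, dtype=float), name="time")

# Hypothetical model inputs: column names must match the FMU input names
inp = pd.DataFrame(
    {
        "Tamb": 10.0 + 5.0 * np.sin(2 * np.pi * time.values / 86400.0),
        "Qin": 500.0,
    },
    index=time,
)

# Hypothetical measured output: column name must match an FMU output name
ideal = pd.DataFrame(
    {"T": 20.0 + 0.5 * np.cos(2 * np.pi * time.values / 86400.0)},
    index=time,
)

# Parameters to estimate: name -> (initial guess, lower bound, upper bound)
est = {"R": (0.1, 0.001, 1.0), "C": (1e6, 1e4, 1e8)}

# Parameters with known, fixed values
known = {"Vi": 140.0}

# Sanity checks on the requirements stated in the list above
assert os.path.isdir(workdir)
assert inp.index.name == "time" and ideal.index.name == "time"
```

Building both DataFrames from one shared index object is a simple way to guarantee that their indexes match.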
The indexes of inp and ideal must be equal, i.e. inp.index == ideal.index must be True for every element.
Columns in inp and ideal must have the same names as the model inputs and outputs, respectively. All model inputs must be present in inp, but ideal may include only a chosen subset of the outputs. Data for each variable present in ideal are used to calculate the error function that is minimized by modestpy.
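To illustrate how such an error function could be computed from ideal and a simulation result, here is a minimal sketch. It is not modestpy's implementation: the helper calc_error, the normalization used for NRMSE (range of the measured signal) and the summation of per-variable errors into a 'tot' entry are assumptions made for illustration. Only the names RMSE, NRMSE and tot come from the docstrings at the end of this page.

```python
import numpy as np
import pandas as pd


def calc_error(result, ideal, ftype="RMSE"):
    """Return per-variable errors plus their sum under the key 'tot'.

    Illustrative sketch only; the normalization and the 'tot' aggregation
    are assumptions, not modestpy's documented behavior.
    """
    if ftype not in ("RMSE", "NRMSE"):
        raise ValueError("ftype must be 'RMSE' or 'NRMSE'")
    error = {}
    for var in ideal.columns:
        # Root mean square error between simulation and measurement
        rmse = float(np.sqrt(((result[var] - ideal[var]) ** 2).mean()))
        if ftype == "NRMSE":
            # Assumed normalization: range of the measured signal
            span = float(ideal[var].max() - ideal[var].min())
            if span > 0:
                rmse = rmse / span
        error[var] = rmse
    error["tot"] = sum(error.values())
    return error


time = pd.Index([0.0, 3600.0, 7200.0], name="time")
ideal = pd.DataFrame({"T": [20.0, 21.0, 22.0]}, index=time)
result = pd.DataFrame({"T": [21.0, 22.0, 23.0]}, index=time)

print(calc_error(result, ideal, ftype="RMSE"))
print(calc_error(result, ideal, ftype="NRMSE"))
```

Only the columns of ideal are looped over, so outputs that are simulated but not measured do not contribute to the error.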
Now the parameters can be estimated using default settings:
>>> from modestpy import Estimation
>>> session = Estimation(workdir, fmu_path, inp, known, est, ideal)
>>> estimates = session.estimate() # Returns dict(str: float)
>>> err, res = session.validate() # Returns tuple(dict(str: float), pandas.DataFrame)
All results are also saved in workdir.
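Which estimates estimate() returns depends on its optional get argument ('best' by default, or 'avg'), described in the docstring at the end of this page. When several learning periods are used, the difference between the two could be sketched as follows. The table layout and the column name error are hypothetical and do not reflect the actual contents of the saved csv files.

```python
import pandas as pd

# Hypothetical per-period results: one row per learning period, holding
# the estimated parameters and the error reached in that period
runs = pd.DataFrame(
    {
        "R": [0.10, 0.12, 0.11],
        "C": [1.0e6, 1.2e6, 0.9e6],
        "error": [0.50, 0.30, 0.40],
    }
)
params = ["R", "C"]

# 'avg': arithmetic mean of each parameter over all learning periods
avg = runs[params].mean().to_dict()

# 'best': parameters from the learning period with the lowest error
best = runs.loc[runs["error"].idxmin(), params].to_dict()

print("avg: ", avg)
print("best:", best)
```

Averaging can yield a parameter combination that was never actually evaluated, which may be one reason the docstring advises using 'best'.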
By default, all data from inp and ideal are used in both estimation and validation. In other words, the data set taken into account always spans from inp.index[0] to ideal.index[-1]. To slice the data into separate learning and validation periods, additional arguments need to be defined:
- lp_n (int) - number of learning periods, randomly selected within lp_frame
- lp_len (float) - length of a single learning period
- lp_frame (tuple(float, float)) - beginning and end of the learning time frame
- vp (tuple(float, float)) - validation period
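The random selection of learning periods described in the list above can be pictured with the sketch below. The helper sample_learning_periods is hypothetical and only illustrates the idea of drawing lp_n windows of length lp_len inside lp_frame; modestpy's own selection logic may differ (for example in how it handles overlaps or missing data).

```python
import random


def sample_learning_periods(lp_n, lp_len, lp_frame, seed=None):
    """Return lp_n (start, stop) tuples of length lp_len inside lp_frame.

    Hypothetical helper for illustration; the periods may overlap.
    """
    t0, t1 = lp_frame
    if lp_len > t1 - t0:
        raise ValueError("lp_len is longer than lp_frame")
    rng = random.Random(seed)
    periods = []
    for _ in range(lp_n):
        # The latest admissible start keeps the whole window inside the frame
        start = rng.uniform(t0, t1 - lp_len)
        periods.append((start, start + lp_len))
    return periods


periods = sample_learning_periods(
    lp_n=5, lp_len=25000, lp_frame=(0, 150000), seed=1
)
for start, stop in periods:
    print(f"{start:.0f} - {stop:.0f}")
```

Choosing vp outside lp_frame, as in the customized example later on this page, keeps the validation data separate from the data used for learning.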
Model parameters are often used to define initial conditions in the model, for example the initial temperature. The initial value has to be read from the measured data stored in ideal. You can do this with the optional argument ic_param:
- ic_param (dict(str : str)) - maps model parameters to column names in ideal
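As a rough illustration of what this mapping means, the sketch below reads the initial value of each mapped parameter from the first sample of a period. The helper initial_values is hypothetical and the column name T and parameter name Tstart are example names; this is not modestpy's internal code.

```python
import pandas as pd


def initial_values(ideal, ic_param, start):
    """Read initial-condition parameters from the first sample at or after start.

    Hypothetical helper for illustration only.
    """
    period = ideal.loc[start:]
    if period.empty:
        raise ValueError("no data at or after the requested start time")
    first = period.iloc[0]
    return {par: float(first[col]) for par, col in ic_param.items()}


ideal = pd.DataFrame(
    {"T": [20.0, 20.5, 21.0, 21.4]},
    index=pd.Index([0.0, 3600.0, 7200.0, 10800.0], name="time"),
)

# The model parameter 'Tstart' takes its value from the measured column 'T'
print(initial_values(ideal, {"Tstart": "T"}, start=3600.0))  # {'Tstart': 20.5}
```

With several learning periods, each period starts at a different time, so the initial value has to be looked up separately for each of them.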
Estimation algorithms (GA, PS, SQP) can be tuned by overwriting specific keys in ga_opts, ps_opts and sqp_opts. The default options are:
- ga_opts = {'maxiter': 50, 'pop_size': max(4 * n_parameters, 20), 'tol': 1e-6, 'mut': 0.10, 'mut_inc': 0.33, 'uniformity': 0.5, 'look_back': 50, 'lhs': False}
- ps_opts = {'maxiter': 500, 'rel_step': 0.02, 'tol': 1e-11, 'try_lim': 1000}
- sqp_opts = {'scipy_opts': {'disp': True, 'iprint': 2, 'maxiter': 150, 'full_output': True}}
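Overwriting specific keys means that only the keys you pass replace the defaults, while all other defaults stay in effect. A minimal sketch of that behavior, using the GA defaults listed above, follows. The helpers default_ga_opts and merge_opts are hypothetical, and rejecting unknown keys is an assumption of this sketch rather than documented modestpy behavior.

```python
def default_ga_opts(n_parameters):
    """Default GA options; pop_size depends on the number of parameters."""
    return {
        "maxiter": 50,
        "pop_size": max(4 * n_parameters, 20),
        "tol": 1e-6,
        "mut": 0.10,
        "mut_inc": 0.33,
        "uniformity": 0.5,
        "look_back": 50,
        "lhs": False,
    }


def merge_opts(defaults, user_opts):
    """Return a copy of defaults with user-supplied keys overwritten.

    Hypothetical helper; unknown keys are rejected here by assumption.
    """
    unknown = set(user_opts) - set(defaults)
    if unknown:
        raise KeyError(f"Unknown option(s): {sorted(unknown)}")
    merged = dict(defaults)
    merged.update(user_opts)
    return merged


ga_opts = merge_opts(default_ga_opts(n_parameters=3), {"maxiter": 5, "lhs": True})
print(ga_opts["maxiter"], ga_opts["pop_size"], ga_opts["lhs"])  # 5 20 True
```

With three estimated parameters, 4 * n_parameters is 12, so the default population size falls back to the minimum of 20.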
Example estimation using customized settings:
>>> session = Estimation(workdir, fmu_path, inp, known, est, ideal,
...                      lp_n=5, lp_len=25000, lp_frame=(0, 150000),
...                      vp=(150000, 215940), ic_param={'Tstart': 'T'},
...                      methods=('GA', 'PS'),
...                      ga_opts={'maxiter': 5, 'tol': 0.001, 'lhs': True},
...                      ps_opts={'maxiter': 500, 'tol': 1e-6})
>>> estimates = session.estimate()
>>> err, res = session.validate()
NOTES
- inp and ideal DataFrames must have an index named time. This is to avoid a common user mistake of loading a DataFrame from a csv and forgetting to set the right index. The index should be in seconds.
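The mistake mentioned in the note, and its fix, can be reproduced with plain pandas. In the sketch below an in-memory string stands in for a csv file on disk, and the column names Tamb and Qin are hypothetical.

```python
import io

import pandas as pd

# Stand-in for a csv file on disk; the column names are hypothetical
csv_text = "time,Tamb,Qin\n0,10.0,500\n3600,10.5,480\n7200,11.2,450\n"

# Without index_col the index is a plain 0, 1, 2, ... range with no name
wrong = pd.read_csv(io.StringIO(csv_text))
print(wrong.index.name)  # None

# Selecting the 'time' column as the index gives it the required name
inp = pd.read_csv(io.StringIO(csv_text), index_col="time")
print(inp.index.name)  # time

# Equivalent fix for a DataFrame that is already loaded
also_ok = wrong.set_index("time")
```

In the first case time is just another data column, which is exactly the situation the index name check is meant to catch.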
def __init__(self, workdir, fmu_path, inp, known, est, ideal,
             lp_n=None, lp_len=None, lp_frame=None, vp=None,
             ic_param=None, methods=('GA', 'PS'), ga_opts={}, ps_opts={}, sqp_opts={},
             fmi_opts={}, ftype='RMSE', seed=None,
             default_log=True, logfile='modestpy.log'):
    """
    Index in DataFrames ``inp`` and ``ideal`` must be named 'time'
    and given in seconds. The index name assertion check is
    implemented to avoid situations in which a user reads DataFrame
    from a csv and forgets to use ``DataFrame.set_index(column_name)``
    (it happens quite often...). TODO: Check index name assertion.

    Currently available estimation methods:
        - GA  - genetic algorithm
        - PS  - pattern search (Hooke-Jeeves)
        - SQP - sequential quadratic programming (SciPy implementation)

    Parameters:
    -----------
    workdir: str
        Output directory, must exist
    fmu_path: str
        Absolute path to the FMU
    inp: pandas.DataFrame
        Input data, index given in seconds and named ``time``
    known: dict(str: float)
        Dictionary with known parameters (``parameter_name: value``)
    est: dict(str: tuple(float, float, float))
        Dictionary defining estimated parameters,
        (``par_name: (guess value, lo limit, hi limit)``)
    ideal: pandas.DataFrame
        Ideal solution (usually measurements),
        index in seconds and named ``time``
    lp_n: int or None
        Number of learning periods, one if ``None``
    lp_len: float or None
        Length of a single learning period, entire ``lp_frame`` if ``None``
    lp_frame: tuple of floats or None
        Learning period time frame, entire data set if ``None``
    vp: tuple(float, float) or None
        Validation period, entire data set if ``None``
    ic_param: dict(str, str) or None
        Mapping between model parameters used for IC and variables from ``ideal``
    methods: tuple(str, str)
        List of methods to be used in the pipeline
    ga_opts: dict
        Genetic algorithm options
    ps_opts: dict
        Pattern search options
    sqp_opts: dict
        SQP solver options
    fmi_opts: dict
        Additional options to be passed to the FMI model (e.g. solver tolerance)
    ftype: string
        Cost function type. Currently 'NRMSE' (advised for multi-objective estimation) or 'RMSE'.
    seed: None or int
        Random number seed. If None, current time or OS specific randomness is used.
    default_log: bool
        If true, use default logging settings. Use false if you want to use own logging.
    logfile: str
        If default_log=True, this argument can be used to specify the log file name
    """
def estimate(self, get='best'):
    """
    Estimates parameters.

    Returns average or best estimates depending on ``get``.
    Average parameters are calculated as arithmetic average
    from all learning periods. Best parameters are those which
    resulted in the lowest error during respective learning period.
    It is advised to use 'best' parameters.

    The chosen estimates ('avg' or 'best') are saved
    in a csv file ``final.csv`` in the working directory.
    In addition estimates and errors from all learning periods
    are saved in ``best_per_run.csv``.

    Parameters
    ----------
    get: str, default 'best'
        Type of returned estimates: 'avg' or 'best'

    Returns
    -------
    dict(str: float)
    """
def validate(self, vp=None):
    """
    Performs a simulation with estimated parameters
    for the previously selected validation period. Another period
    can be chosen with the `vp` argument. User chosen `vp` in this method
    does not override the validation period chosen during instantiation
    of this class.

    Parameters
    ----------
    vp: tuple or None
        Validation period given as a tuple of start and stop time in seconds.

    Returns
    -------
    dict
        Validation error, keys: 'tot', '<var1>', '<var2>', ...
    pandas.DataFrame
        Simulation result
    """