modestpy API (outdated)

Krzysztof Arendt edited this page Aug 27, 2020 · 1 revision

Introduction

Use only the modestpy.Estimation class and its two methods, estimate() and validate(). The class provides a single interface to several optimization algorithms. The available algorithms are genetic algorithm (GA), pattern search (PS) and sequential quadratic programming (SQP). They can be run in sequence, e.g. GA+PS (default) or GA+SQP, by setting the argument methods. All estimation settings are passed at instantiation. Estimation and validation results are saved in the working directory workdir, which must already exist.

Learn by examples

First define the following variables:

  • workdir (str) - path to the working directory (it must exist)
  • fmu_path (str) - path to the FMU compiled for your platform
  • inp (pandas.DataFrame) - inputs, index given in seconds and named time
  • est (dict(str : tuple(float, float, float))) - dictionary mapping parameter names to tuples (initial guess, lower bound, upper bound)
  • known (dict(str : float)) - dictionary mapping parameter names to known values
  • ideal (pandas.DataFrame) - ideal solution (usually measurements), index given in seconds and named time

Indexes of inp and ideal must be equal, i.e. inp.index.equals(ideal.index) must be True. Columns in inp and ideal must have the same names as the model inputs and outputs, respectively. All model inputs must be present in inp, but ideal may contain only a chosen subset of the outputs. The data for each variable present in ideal are used to calculate the error function that modestpy minimizes.
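
As a minimal sketch of the data preparation described above, the snippet below builds toy inp and ideal frames and checks the stated index requirements before the data would be passed to Estimation. The column names (Tout, T), the parameter names and values, and the helper check_inputs are hypothetical and serve only as illustration.

```python
import pandas as pd

# Hypothetical time grid in seconds (one sample per hour).
time = pd.Index([0, 3600, 7200, 10800], name="time")

# Hypothetical model input 'Tout' and measured output 'T'.
inp = pd.DataFrame({"Tout": [5.0, 4.5, 4.0, 4.2]}, index=time)
ideal = pd.DataFrame({"T": [20.0, 19.8, 19.5, 19.4]}, index=time)

# Hypothetical parameters: (initial guess, lower bound, upper bound).
est = {"R": (0.1, 0.001, 1.0), "C": (1e6, 1e5, 1e7)}
known = {"Vol": 50.0}


def check_inputs(inp, ideal):
    """Raise ValueError if the frames violate the index requirements."""
    for name, df in (("inp", inp), ("ideal", ideal)):
        if df.index.name != "time":
            raise ValueError(f"{name} index must be named 'time'")
    if not inp.index.equals(ideal.index):
        raise ValueError("inp and ideal must share the same index")


check_inputs(inp, ideal)  # passes silently for the toy data
```

Running such a check before instantiation gives an early, readable error instead of a failure deep inside the estimation run.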

Now the parameters can be estimated using default settings:

>>> from modestpy import Estimation
>>> session = Estimation(workdir, fmu_path, inp, known, est, ideal)
>>> estimates = session.estimate()  # Returns dict(str: float)
>>> err, res = session.validate()   # Returns tuple(dict(str: float), pandas.DataFrame)

All results are also saved in workdir.

By default all data from inp and ideal are used in both estimation and validation. In other words, the data set taken into account is always from inp.index[0] to ideal.index[-1]. To slice the data into separate learning and validation periods, additional arguments need to be defined:

  • lp_n (int) - number of learning periods, randomly selected within lp_frame
  • lp_len (float) - length of single learning period
  • lp_frame (tuple(float, float)) - beginning and end of learning time frame
  • vp (tuple(float, float)) - validation period
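
To illustrate how these arguments relate to each other, here is a minimal sketch that draws lp_n random windows of length lp_len inside lp_frame. It is not modestpy's internal code: the function draw_learning_periods is hypothetical, and whether modestpy allows overlapping periods or aligns them to the data index is not stated here.

```python
import random


def draw_learning_periods(lp_n, lp_len, lp_frame, seed=None):
    """Return lp_n (start, stop) windows of length lp_len inside lp_frame."""
    t0, t1 = lp_frame
    if lp_len > t1 - t0:
        raise ValueError("lp_len is longer than lp_frame")
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    periods = []
    for _ in range(lp_n):
        # Assumption: windows are drawn independently and may overlap.
        start = rng.uniform(t0, t1 - lp_len)
        periods.append((start, start + lp_len))
    return periods


periods = draw_learning_periods(lp_n=5, lp_len=25000, lp_frame=(0, 150000), seed=1)
```

With the values above, every window lies within 0 to 150000 s, which leaves the data after 150000 s free for a separate validation period vp.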

Model parameters are often used to define the initial condition of the model, for example the initial temperature. The initial value then has to be read from the measured data stored in ideal. You can do this with the optional argument ic_param:

  • ic_param (dict(str : str)) - maps model parameters to column names in ideal
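
A minimal sketch of what such a mapping implies, assuming the initial value is taken from the first sample of the mapped ideal column within the simulated period. The helper initial_values and the data are hypothetical; modestpy's exact lookup is not shown on this page.

```python
import pandas as pd

# Hypothetical measurements with the required index named 'time'.
ideal = pd.DataFrame(
    {"T": [20.0, 19.8, 19.5, 19.4]},
    index=pd.Index([0, 3600, 7200, 10800], name="time"),
)
ic_param = {"Tstart": "T"}  # model parameter -> column in ideal


def initial_values(ideal, ic_param, start):
    """Map each IC parameter to the first ideal value at or after `start`."""
    window = ideal.loc[start:]  # label-based slice, start included
    return {par: float(window[col].iloc[0]) for par, col in ic_param.items()}


ic_at_start = initial_values(ideal, ic_param, start=0)
ic_at_hour_one = initial_values(ideal, ic_param, start=3600)
```

Under this assumption the same parameter gets a different initial value for each learning period, depending on where the period starts.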

Estimation algorithms (GA, PS, SQP) can be tuned by overwriting specific keys in ga_opts, ps_opts and sqp_opts. The default options are:

  • ga_opts = {'maxiter': 50, 'pop_size': max(4 * n_parameters, 20), 'tol': 1e-6, 'mut': 0.10, 'mut_inc': 0.33, 'uniformity': 0.5, 'look_back': 50, 'lhs': False}

  • ps_opts = {'maxiter': 500, 'rel_step':0.02, 'tol': 1e-11, 'try_lim': 1000}

  • sqp_opts = {'scipy_opts': {'disp': True, 'iprint': 2, 'maxiter': 150, 'full_output': True}}
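
"Overwriting specific keys" can be read as a dictionary merge in which user-supplied keys replace the defaults and all other keys keep their default values. The sketch below illustrates that reading for the GA options. The helper merged_ga_opts is hypothetical and not part of modestpy, and rejecting unknown keys is an added assumption.

```python
def merged_ga_opts(user_opts, n_parameters):
    """Return the documented GA defaults updated with user-supplied keys."""
    defaults = {
        "maxiter": 50,
        "pop_size": max(4 * n_parameters, 20),
        "tol": 1e-6,
        "mut": 0.10,
        "mut_inc": 0.33,
        "uniformity": 0.5,
        "look_back": 50,
        "lhs": False,
    }
    # Assumption: unknown keys are treated as an error to catch typos early.
    unknown = set(user_opts) - set(defaults)
    if unknown:
        raise KeyError(f"Unknown GA options: {sorted(unknown)}")
    return {**defaults, **user_opts}


opts = merged_ga_opts({"maxiter": 5, "tol": 0.001, "lhs": True}, n_parameters=3)
```

Only maxiter, tol and lhs change here; pop_size stays at its default, which is 20 for three parameters because max(4 * 3, 20) is 20.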

Example estimation with customized settings:

>>> session = Estimation(workdir, fmu_path, inp, known, est, ideal,
...                      lp_n=5, lp_len=25000, lp_frame=(0, 150000),
...                      vp=(150000, 215940), ic_param={'Tstart': 'T'},
...                      methods=('GA', 'PS'),
...                      ga_opts={'maxiter': 5, 'tol': 0.001, 'lhs': True},
...                      ps_opts={'maxiter': 500, 'tol': 1e-6})

>>> estimates = session.estimate()
>>> err, res = session.validate()

NOTES

  • The inp and ideal DataFrames must have an index named time. This requirement guards against a common mistake: loading a DataFrame from a CSV file and forgetting to set the right index. The index should be given in seconds.

Docstrings

Instantiation

    def __init__(self, workdir, fmu_path, inp, known, est, ideal,
                 lp_n=None, lp_len=None, lp_frame=None, vp=None,
                 ic_param=None, methods=('GA', 'PS'), ga_opts={}, ps_opts={}, sqp_opts={},
                 fmi_opts={}, ftype='RMSE', seed=None,
                 default_log=True, logfile='modestpy.log'):
        """
        Index in DataFrames ``inp`` and ``ideal`` must be named 'time'
        and given in seconds. The index name assertion check is
        implemented to avoid situations in which a user reads DataFrame
        from a csv and forgets to use ``DataFrame.set_index(column_name)``
        (it happens quite often...). TODO: Check index name assertion.

        Currently available estimation methods:
            - GA - genetic algorithm
            - PS - pattern search (Hooke-Jeeves)
            - SQP - sequential quadratic programming (SciPy implementation)

        Parameters:
        -----------
        workdir: str
            Output directory, must exist
        fmu_path: str
            Absolute path to the FMU
        inp: pandas.DataFrame
            Input data, index given in seconds and named ``time``
        known: dict(str: float)
            Dictionary with known parameters (``parameter_name: value``)
        est: dict(str: tuple(float, float, float))
            Dictionary defining estimated parameters,
            (``par_name: (guess value, lo limit, hi limit)``)
        ideal: pandas.DataFrame
            Ideal solution (usually measurements), 
            index in seconds and named ``time``
        lp_n: int or None
            Number of learning periods, one if ``None``
        lp_len: float or None
            Length of a single learning period, entire ``lp_frame`` if ``None``
        lp_frame: tuple of floats or None
            Learning period time frame, entire data set if ``None``
        vp: tuple(float, float) or None
            Validation period, entire data set if ``None``
        ic_param: dict(str, str) or None
            Mapping between model parameters used for IC and variables from ``ideal``
        methods: tuple(str, str)
            List of methods to be used in the pipeline
        ga_opts: dict
            Genetic algorithm options
        ps_opts: dict
            Pattern search options
        sqp_opts: dict
            SQP solver options
        fmi_opts: dict
            Additional options to be passed to the FMI model (e.g. solver tolerance)
        ftype: string
            Cost function type. Currently 'NRMSE' (advised for multi-objective estimation) or 'RMSE'.
        seed: None or int
            Random number seed. If None, current time or OS specific randomness is used.
        default_log: bool
            If true, use default logging settings. Use false if you want to use own logging.
        logfile: str
            If default_log=True, this argument can be used to specify the log file name
        """

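The ftype argument selects the cost function. As a rough illustration of why a normalized error is advised when several outputs are fitted together, the sketch below computes per-variable errors and a total, using the key layout ('tot' plus one key per variable) that validate() is documented to return. Two points are assumptions, not taken from this page: the NRMSE is normalized by the range of the reference signal, and the total is the sum over variables. The function names and data are hypothetical.

```python
import math


def rmse(sim, ref):
    """Root mean square error between two equally long sequences."""
    return math.sqrt(sum((s - r) ** 2 for s, r in zip(sim, ref)) / len(ref))


def error_dict(sim, ideal, ftype="RMSE"):
    """sim, ideal: dicts mapping a variable name to a list of values."""
    err = {}
    for var, ref in ideal.items():
        e = rmse(sim[var], ref)
        if ftype == "NRMSE":
            # Assumption: normalize by the range of the reference signal.
            e /= max(ref) - min(ref)
        err[var] = e
    # Assumption: the total error is the sum over all variables.
    err["tot"] = sum(err.values())
    return err


# Hypothetical outputs on very different scales (temperature, heat flow).
ideal = {"T": [20.0, 21.0, 22.0], "q": [100.0, 200.0, 300.0]}
sim = {"T": [21.0, 22.0, 23.0], "q": [110.0, 210.0, 310.0]}

err_rmse = error_dict(sim, ideal, "RMSE")
err_nrmse = error_dict(sim, ideal, "NRMSE")
```

With plain RMSE the total is dominated by q simply because its values are larger; after normalization the two variables contribute on a comparable scale.
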
Estimation

    def estimate(self, get='best'):
        """
        Estimates parameters.

        Returns average or best estimates depending on ``get``.
        Average parameters are calculated as arithmetic average
        from all learning periods. Best parameters are those which
        resulted in the lowest error during respective learning period.
        It is advised to use 'best' parameters.

        The chosen estimates ('avg' or 'best') are saved
        in a csv file ``final.csv`` in the working directory.
        In addition estimates and errors from all learning periods 
        are saved in ``best_per_run.csv``.

        Parameters
        ----------
        get: str, default 'best'
            Type of returned estimates: 'avg' or 'best'

        Returns
        -------
        dict(str: float)
        """

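The difference between the 'avg' and 'best' estimates described in the docstring can be sketched as follows. The function select_estimates and the numbers are hypothetical; the sketch only mirrors the documented rule (arithmetic mean over learning periods versus the run with the lowest error) and is not modestpy's code.

```python
def select_estimates(runs, get="best"):
    """runs: list of (estimates dict, error) tuples, one per learning period."""
    if get == "best":
        # Pick the parameter set from the run with the lowest error.
        estimates, _ = min(runs, key=lambda run: run[1])
        return dict(estimates)
    if get == "avg":
        # Arithmetic mean of each parameter over all learning periods.
        names = runs[0][0].keys()
        return {n: sum(est[n] for est, _ in runs) / len(runs) for n in names}
    raise ValueError("get must be 'avg' or 'best'")


# Hypothetical results from three learning periods.
runs = [
    ({"R": 0.10, "C": 1.0e6}, 0.42),
    ({"R": 0.14, "C": 1.4e6}, 0.31),
    ({"R": 0.12, "C": 1.2e6}, 0.55),
]
best = select_estimates(runs, "best")
avg = select_estimates(runs, "avg")
```

Averaged parameters need not correspond to any parameter set that was actually simulated, which is one reason to prefer 'best'.
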
Validation

    def validate(self, vp=None):
        """
        Performs a simulation with estimated parameters
        for the previously selected validation period. A different period
        can be chosen with the `vp` argument. A `vp` passed to this method
        does not override the validation period chosen during instantiation
        of this class.

        Parameters
        ----------
        vp: tuple or None
            Validation period given as a tuple of start and stop time in seconds.

        Returns
        -------
        dict
            Validation error, keys: 'tot', '<var1>', '<var2>', ...
        pandas.DataFrame
            Simulation result
        """