From 359232a1a4d562dcb3db03601a3807506a726673 Mon Sep 17 00:00:00 2001
From: Jeff Newman
Date: Fri, 27 Dec 2024 10:56:28 -0600
Subject: [PATCH] Scaling Data Tables

---
 docs/content/running.md | 109 +++++++++++++++++++++++++++++++++++++++-
 1 file changed, 107 insertions(+), 2 deletions(-)

diff --git a/docs/content/running.md b/docs/content/running.md
index 3819274..2926418 100644
--- a/docs/content/running.md
+++ b/docs/content/running.md
@@ -49,6 +49,10 @@ class VEModel(FilesCoreModel): # (1)!
 You can see some examples of the `FilesCoreModel` interface
 [here](https://github.com/tmip-emat/tmip-emat-ve/blob/1f62205652d26389d1767a2cdb0ba1fabe057dea/emat_verspm.py#L36)
 and [here](https://github.com/tmip-emat/ve-integration/blob/ff9e83fdc5adf4414dd9454dd980d6f32464bf09/emat_ve_wrapper.py#L40).
 
+The process for creating a new analysis with EMAT and VisionEval involves creating
+a similar subclass of `FilesCoreModel`, and then defining the specific methods
+needed to carry out the steps of the integration. This can be done from scratch,
+or by copying and modifying an existing example.
 
 ## Setting Up an Experiment
 
@@ -101,6 +105,11 @@ based on the scope:
 - [Direct Injection](#direct-injection)
 - [Custom Methods](#custom-methods)
 
+Each of these methods can be implemented in a bespoke manner for each specific
+input parameter (both policy levers and exogenous uncertainties), or you can use
+generic methods that can be applied to a wide range of input parameters. The
+generic approach is shown in the example repositories.
+
 ### Categorical Drop-In
 
 Many of the input files for VisionEval are in the form of CSV files. The simplest way
@@ -397,8 +406,104 @@ Scenario-Inputs/
 
 ### Scaling Data Tables
 
-Forthcoming: documentation of the `_manipulate_by_scale` method, which allows for
-scaling of single input files.
+The scaling data tables method is much like the mixture of data tables method,
+but instead of linearly interpolating between two input files, the scaling data
+tables method scales all the values in selected columns of an input file up or
+down based on the value of the policy lever or exogenous uncertainty. This is
+useful for continuous inputs that are best represented as a single table, but
+where the values in that table can be scaled up or down. For example, you may
+have a population projection that represents a "baseline" scenario, and you want
+to explore the effects of different levels of population growth.
+
+The `_manipulate_by_scale` method shown below can be included in an integration's
+subclass of `FilesCoreModel`, and used to scale the values in the input files
+based on the value of the policy lever or exogenous uncertainty. It assumes that
+`os` and `pandas` (as `pd`) have been imported, and that the `scenario_input` and
+`join_norm` path helpers are defined elsewhere in the integration, as in the
+example repositories.
+
+``` python
+def _manipulate_by_scale(
+    self,
+    params,  # (1)!
+    param_map,  # (2)!
+    ve_scenario_dir,  # (3)!
+    max_thresh=1e9,  # (4)!
+):
+    # Gather the list of all files in the scenario input directory.
+    filenames = []
+    for i in os.scandir(scenario_input(ve_scenario_dir)):
+        if i.is_file():
+            filenames.append(i.name)
+
+    for filename in filenames:
+        df1 = pd.read_csv(scenario_input(ve_scenario_dir, filename))
+        # Scale each targeted column by the corresponding parameter value.
+        for param_name, column_names in param_map.items():
+            if isinstance(column_names, str):
+                column_names = [column_names]
+            for column_name in column_names:
+                df1[[column_name]] = (
+                    df1[[column_name]]
+                    * params.get(param_name)
+                ).clip(lower=-max_thresh, upper=max_thresh)  # (5)!
+
+        # Write the scaled table into the model's run-time inputs directory.
+        out_filename = join_norm(
+            self.resolved_model_path, 'inputs', filename
+        )
+        df1.to_csv(out_filename, index=False, float_format="%.5f", na_rep='NA')
+```
+
+1. The `params` dictionary is passed through from the `setup` method to the
+   `_manipulate_by_scale` method.
+2. The `param_map` argument is a dictionary that maps the parameter names in the
+   `params` dictionary to the column names in the input file that should be scaled.
+3. The `ve_scenario_dir` argument is the directory where the input files for the
+   scaling are stored.
+4. The `max_thresh` argument is the maximum absolute magnitude that any scaled
+   value can take, which prevents scaled values from becoming unreasonably large
+   or small.
+5. The `clip` method enforces the `max_thresh` limit, keeping the scaled values
+   within the range of values that VisionEval is expecting.
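+To connect this method to a specific input parameter, call it from the
+integration's `setup` method, typically via a small per-parameter dispatcher.
+The sketch below is a minimal, hypothetical example of such a dispatcher for the
+`LUDENSITYMIX` uncertainty used in the scope example that follows; the
+`UrbanMixProp` column name is an assumption for illustration, and should be
+replaced with the actual column(s) to be scaled in the target CSV file.
+
+``` python
+def _manipulate_ludensitymix(self, params):
+    # Hypothetical dispatcher, called from `setup`. It scales the assumed
+    # `UrbanMixProp` column of each CSV file in the `LUDENSITYMIX`
+    # scenario input directory by the value of the `LUDENSITYMIX`
+    # parameter drawn for this experiment.
+    return self._manipulate_by_scale(
+        params,
+        param_map={'LUDENSITYMIX': 'UrbanMixProp'},
+        ve_scenario_dir='LUDENSITYMIX',
+    )
+```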
+
+If you use this approach, you would not set the `min` and `max` values for the
+relevant parameter in the exploratory scope definition to 0 and 1, as you would
+for the mixture of data tables method. Instead, set those limits to the minimum
+and maximum values that you want to allow for the scaling factor. The upper and
+lower limits need not be symmetric around 1.0, and the scaling factor can allow
+values to be scaled up, down, or both.
+
+``` yaml
+    LUDENSITYMIX:
+      shortname: Urban Mix Prop
+      address: LUDENSITYMIX
+      ptype: exogenous uncertainty
+      dtype: float # (1)!
+      desc: Urban proportion for each marea by year
+      default: 1
+      min: 0.75 # (2)!
+      max: 1.5 # (3)!
+```
+
+1. The `dtype` is set to `float` to indicate that this is a continuous input,
+   which can take on a range of values.
+2. The `min` value for the scaling factor can be any value. Positive values less
+   than or equal to 1.0 are most common, but negative values are also allowable
+   if the signs of the targeted values might be inverted.
+3. The `max` value for the scaling factor can be any value. Positive values
+   greater than or equal to 1.0 are most common.
+
+Note that the `default` value is 1, so that the default experiment applies no
+scaling and reproduces the baseline inputs.
+
+The `_manipulate_by_scale` method as written above also implies that the input
+files to which the scaling factor is applied are located directly in the scenario
+directory, not in a subdirectory of the scenario directory, as was the case for
+the mixture of data tables method. This is because the scaling factor method is
+applied to a single set of inputs, so there is no need for multiple
+subdirectories of input files.
+
+``` tree
+Scenario-Inputs/
+    OTP/
+    ANOTHER_PARAMETER/
+    LUDENSITYMIX/
+        marea_mix_targets.csv
+    OTHER_PARAMETER/
+```
 
 ### Additive Data Tables
 