Parameters with more than one index (e.g. region, region) cause a read error #153

Open
willu47 opened this issue Apr 10, 2023 · 2 comments

willu47 commented Apr 10, 2023

If a config defines a parameter that lists the same index more than once, such as:

TradeRoute:
    indices: [REGION,REGION,FUEL,YEAR]
    type: param
    dtype: float
    default: 0

then an error is raised when reading the corresponding CSV file:

NotImplementedError                       Traceback (most recent call last)
Cell In[2], line 24
     20 validate_config(config)
     22 read_strategy = ReadCsv(user_config=config)
---> 24 model, defaults = read_strategy.read(folder_path)
     25 logging.debug(model.keys())

File ~/miniconda3/envs/linopy/lib/python3.11/site-packages/otoole/read_strategies.py:209, in ReadCsv.read(self, filepath, **kwargs)
    207 if entity_type == "param":
    208     df = self._get_input_data(filepath, parameter, details, converter)
--> 209     narrow = self._check_parameter(df, details["indices"], parameter)
    210     if not narrow.empty:
    211         narrow_checked = check_datatypes(
    212             narrow, self.user_config, parameter
    213         )

File ~/miniconda3/envs/linopy/lib/python3.11/site-packages/otoole/read_strategies.py:91, in _ReadTabular._check_parameter(self, df, expected_headers, name)
     87         logger.warning("%s not in header of %s", column, name)
     89 logger.debug("Final all headers for %s: %s", name, all_headers)
---> 91 return narrow[all_headers].set_index(expected_headers)

File ~/miniconda3/envs/linopy/lib/python3.11/site-packages/pandas/util/_decorators.py:331, in deprecate_nonkeyword_arguments.<locals>.decorate.<locals>.wrapper(*args, **kwargs)
    325 if len(args) > num_allow_args:
    326     warnings.warn(
...
    420     values = sanitize_array(values, None)
    421 else:
    422     # i.e. must be a list

NotImplementedError: > 1 ndim Categorical are not supported at this time

I'm using pandas v1.5.3 and otoole v1.0
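
One way to surface this earlier would be to scan the config for repeated index names before any CSV file is read. The following is only a minimal sketch: `find_repeated_indices` is a hypothetical helper, not part of otoole, and it assumes the parsed config is a plain dict keyed by parameter name with an `indices` list, as the YAML above suggests.

```python
from collections import Counter

# Stand-in for the parsed user config shown above (a plain dict; not loaded via otoole)
config = {
    "TradeRoute": {
        "indices": ["REGION", "REGION", "FUEL", "YEAR"],
        "type": "param",
        "dtype": "float",
        "default": 0,
    }
}


def find_repeated_indices(user_config):
    """Map each parameter name to the index names it lists more than once.

    Hypothetical helper for illustration only.
    """
    repeated = {}
    for name, details in user_config.items():
        counts = Counter(details.get("indices", []))
        dupes = [index for index, n in counts.items() if n > 1]
        if dupes:
            repeated[name] = dupes
    return repeated


repeated = find_repeated_indices(config)
print(repeated)
```

A check like this could be used to raise a clearer error than the pandas one, though where it would belong in otoole is an open question.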

@trevorb1

Also relates to issue #130

willu47 commented Apr 14, 2023

To reproduce this:

import pandas as pd

data = [
    ['REGIONA', 'REGIONB', 2010, 1],
    ['REGIONA', 'REGIONB', 2020, 2],
    ['REGIONB', 'REGIONA', 2010, 2],
    ['REGIONB', 'REGIONA', 2020, 2],
]
df = pd.DataFrame(data, columns=['REGION', 'REGION', 'YEAR', 'VALUE'])

df.set_index(['REGION', 'REGION', 'YEAR'])

returning the error:

NotImplementedError                       Traceback (most recent call last)
/Users/wusher/repository/otoole/Categorical Index.ipynb Cell 25 in ()
----> 1 df.set_index(['REGION', 'REGION', 'YEAR'])

File ~/miniconda3/envs/otoole38/lib/python3.9/site-packages/pandas/util/_decorators.py:311, in deprecate_nonkeyword_arguments.<locals>.decorate.<locals>.wrapper(*args, **kwargs)
    305 if len(args) > num_allow_args:
    306     warnings.warn(
    307         msg.format(arguments=arguments),
    308         FutureWarning,
    309         stacklevel=stacklevel,
    310     )
--> 311 return func(*args, **kwargs)

File ~/miniconda3/envs/otoole38/lib/python3.9/site-packages/pandas/core/frame.py:5555, in DataFrame.set_index(self, keys, drop, append, inplace, verify_integrity)
   5547     if len(arrays[-1]) != len(self):
   5548         # check newest element against length of calling frame, since
   5549         # ensure_index_from_sequences would not raise for append=False.
   5550         raise ValueError(
   5551             f"Length mismatch: Expected {len(self)} rows, "
   5552             f"received array of length {len(arrays[-1])}"
   5553         )
-> 5555 index = ensure_index_from_sequences(arrays, names)
   5557 if verify_integrity and not index.is_unique:
   5558     duplicates = index[index.duplicated()].unique()
...
    417     values = sanitize_array(values, None)
    418 else:
    419     # i.e. must be a list

NotImplementedError: > 1 ndim Categorical are not supported at this time

We can check for duplicate columns using df.columns.is_unique.
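
As a rough sketch of that check (the `find_duplicate_columns` helper is a hypothetical name for illustration, not something in otoole), `Index.is_unique` flags the problem and `Index.duplicated()` identifies which labels are repeated:

```python
import pandas as pd

data = [
    ["REGIONA", "REGIONB", 2010, 1],
    ["REGIONB", "REGIONA", 2010, 2],
]
df = pd.DataFrame(data, columns=["REGION", "REGION", "YEAR", "VALUE"])


def find_duplicate_columns(frame):
    """Return the column labels that appear more than once (hypothetical helper)."""
    return frame.columns[frame.columns.duplicated()].unique().tolist()


# False here, because 'REGION' appears twice in the header
has_unique_columns = df.columns.is_unique
duplicates = find_duplicate_columns(df)

if not has_unique_columns:
    print(f"Duplicate column names found: {duplicates}")
```

Running a check like this before `set_index` would let the reader fail with a message that names the offending columns instead of the Categorical error.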

Some relevant reading material from the pandas docs:
