Update evaluation logic for dashboard support #62

Open: wants to merge 1 commit into base: master

prateekdesai04 (Collaborator) commented Oct 23, 2023

Description of changes:
This PR handles the case where multiple cleaned CSVs, produced by runs with different numbers of folds, are evaluated together.
Previously, evaluation was only possible if all of them used the same number of folds.
This change sets the number of folds to the smallest fold count among the cleaned CSVs being evaluated.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Comment on lines +160 to +168
dataframes = []
for path in paths:
    path = path if is_s3_url(path) else os.path.join(self.results_dir_input, path)
    dataframe = pd.read_csv(path)
    dataframes.append(dataframe)
# Discarding extra folds
min_num_rows = min(len(df) for df in dataframes)
trimmed_dataframes = [df[:min_num_rows] for df in dataframes]
return pd.concat(trimmed_dataframes, ignore_index=True, sort=True)

Innixma (Contributor) commented Oct 23, 2023

This will not discard extra folds properly. Please add a unit test and separate out the filtering logic so it is not hard-coded into the load_results_raw method.

  1. Not all DataFrames loaded will have the same number of methods or datasets, so trimming by length of rows will not work.
  2. We don't want to always filter extra folds. This should be a post-load operation that is optional.
  3. You are assuming the input file is sorted by fold. This is not a valid assumption.
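
One way to address the three points above is a standalone, optional post-load step that selects rows by fold value instead of by row position. The following is only a sketch under stated assumptions, not the project's implementation: the function name `filter_to_common_folds` and the column names `dataset`, `framework`, and `fold` are hypothetical and would need to match the actual results schema.

```python
import pandas as pd


def filter_to_common_folds(
    results: pd.DataFrame,
    dataset_col: str = "dataset",      # assumed column name
    framework_col: str = "framework",  # assumed column name
    fold_col: str = "fold",            # assumed column name
) -> pd.DataFrame:
    """Keep only folds that every framework present on a dataset has results for."""
    # For each row: how many distinct frameworks have any result on this dataset.
    n_frameworks = results.groupby(dataset_col)[framework_col].transform("nunique")
    # For each row: how many distinct frameworks have this (dataset, fold) pair.
    n_with_fold = results.groupby([dataset_col, fold_col])[framework_col].transform("nunique")
    # A fold is shared when both counts match. Rows are selected by value,
    # so the input does not need to be sorted by fold.
    return results[n_with_fold == n_frameworks].reset_index(drop=True)


# Made-up, unsorted input: framework A ran 3 folds on d1, framework B only 2.
raw = pd.DataFrame(
    {
        "dataset": ["d1", "d1", "d1", "d1", "d1", "d2"],
        "framework": ["A", "A", "A", "B", "B", "A"],
        "fold": [2, 0, 1, 1, 0, 0],
    }
)
filtered = filter_to_common_folds(raw)  # drops only the (d1, A, fold 2) row
```

Because the function takes an already loaded DataFrame, callers can decide whether to apply it, and it can be unit tested without touching `load_results_raw`.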

Innixma (Contributor) left a comment

Refer to above comment

    dataframe = pd.read_csv(path)
    dataframes.append(dataframe)
# Discarding extra folds
min_num_rows = min(len(df) for df in dataframes)

Collaborator

What if there are multiple datasets in the results file? min() will not do what is intended, right?
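
A small made-up example illustrates the concern. The column names (`dataset`, `fold`) and the data are assumptions for illustration only; the trimming step mirrors the one in the diff.

```python
import pandas as pd

# Hypothetical results: file A covers two datasets with 2 folds each,
# file B covers one dataset with 3 folds.
df_a = pd.DataFrame({"dataset": ["d1", "d1", "d2", "d2"], "fold": [0, 1, 0, 1]})
df_b = pd.DataFrame({"dataset": ["d1", "d1", "d1"], "fold": [0, 1, 2]})

# Trimming as in the diff: cut every frame to the shortest row count.
min_num_rows = min(len(df) for df in (df_a, df_b))
trimmed_a = df_a[:min_num_rows]
trimmed_b = df_b[:min_num_rows]

# File A loses a valid d2 row, while file B keeps its extra fold 2.
print(trimmed_a)
print(trimmed_b)
```

Here the shortest frame is the one with the extra fold, so nothing is removed from it, and a row is dropped from the other file purely because of its position.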

4 participants