-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update evaluation logic for dashboard support #62
base: master
Are you sure you want to change the base?
Conversation
dataframes = [] | ||
for path in paths: | ||
path = path if is_s3_url(path) else os.path.join(self.results_dir_input, path) | ||
dataframe = pd.read_csv(path) | ||
dataframes.append(dataframe) | ||
# Discarding extra folds | ||
min_num_rows = min(len(df) for df in dataframes) | ||
trimmed_dataframes = [df[:min_num_rows] for df in dataframes] | ||
return pd.concat(trimmed_dataframes, ignore_index=True, sort=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will not discard extra folds properly. Please add a unit test and separate out the filtering logic so it is not hard-coded into the load_results_raw
method.
- Not all DataFrames loaded will have the same number of methods or datasets, so trimming by length of rows will not work.
- We don't want to always filter extra folds. This should be a post-load operation that is optional.
- You are assuming the input file is sorted by fold. This is not a valid assumption.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Refer to above comment
dataframe = pd.read_csv(path) | ||
dataframes.append(dataframe) | ||
# Discarding extra folds | ||
min_num_rows = min(len(df) for df in dataframes) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if there are multiple datasets in results file? min()
will not do what it's intended right?
Description of changes:
This PR handles the case where if multiple cleaned CSVs having been run on different folds are being evaluated.
Initially evaluation was only possible if all were using same number of folds.
This sets the folds to the least of all the cleaned CSVs being evaluated.
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.