-
Notifications
You must be signed in to change notification settings - Fork 19
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
output folder DNE fix and audit log addition #847
Conversation
…output-folder-dne
@@ -152,8 +152,16 @@ boolean convertToCdf() { | |||
LoadedCvrData parseAndCountCastVoteRecords(BiConsumer<Double, Double> progressUpdate) | |||
throws CastVoteRecordGenericParseException { | |||
ContestConfig config = ContestConfig.loadContestConfig(configPath); | |||
boolean setUpLoggingSuccess = setUpLogging(config.getOutputDirectory(), "check-ballot-counts"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All the stuff in these logs going to be duplicated in the "tabulate" log so, I'd argue that we shouldn't log in this case at all (which was an open question in #846). This would also allow us to remove the extra prefix
field, which is a bonus IMO. Reasons:
- No tabulation actually occurs / no files are written
- Adds clutter (HD space and number of files)
- The prefixes make it so log files are no longer at the top of the directory (since "audit" comes before most other words), and that they're potentially split in the file list, so harder to find all relevant logs
- Simpler code
CC: @yezr to comment if necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with all your points, happy to defer to @yezr.
One benefit is that it may be confusing to have things outputted to the GUI log that are not in the audit log. But it's essentially duplicated / irrelevant info.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Speaking with @yezr:
- Don't include Chehck Ballot Counts audit log
- Do check checksums to ensure no change between Check and Tabulate
I'll get to this!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm looking into this and it's not as straightforward as I had hoped because some CVRs are directories, not files.
Some considerations:
- I could hash every file in a directory, but that several downsides, including having to recursively search a potentially very big directory with unnecessary files. We don't mandate that the CVR directory is free of clutter today, and I don't know if we should add that restriction. What if the Dominion CVR directory is just C:/ ? It will also fail if files are opened and temp hidden auto-save files are created, or .DS_STORE on mac.
- I could enumerate every file we read in the directory, but this feels error-prone: what if a future release reads a new file from the directory, and we forget to add it to the list of files to hash? That feels like an easy-to-miss bug.
(1) is easier, but (2) seems safer. Still, it makes me want to move away from file hashing, and towards something a little cleaner.
- I could hash each CVR, but I run into the same issue as (2): what if, in the future, a new field is added? This feels like a hard-to-catch bug once again, requiring manually spotting it via a PR.
- I could hash the entire RCTab CVR, since we have plans to test and mandate that this file contains all information needed to exactly reproduce the election results.
So I'm leaning towards the latter: a checksum validation of the RCTab CVR. The downside is: (1) that means the file can never include timestamps, and (2) we can't inform the user which of their CVR sources has changed.
To sum up the four options:
Checksum Of | Downside |
---|---|
Every file in Directory | Clutter in directory |
Only relevant files in directory | Easy to miss a file in the future |
Every field in the CastVoteRecord data structure | Easy to miss a field in the future |
Entire RCTab CVR | Lost mapping from changed field to CVR that changed; requires generating this file twice |
Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What a can of worms! I think we should kick the can of worms down the road and break this out into a separate issue, just submitting this PR as-is with the counting logging removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
created #853 to address stricter CVR comparisons outside of this PR
…output-folder-dne Also remove double tabulation logging
closes #846
Open questions to be discussed in the Issue: should we include the audit log for the Check Ballot Counts step?
This PR does two things: