Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Standardize CVR identification across audit logs and RCTab output #913

Draft
wants to merge 8 commits into
base: develop
Choose a base branch
from

Conversation

nurse-the-code
Copy link
Collaborator

@nurse-the-code nurse-the-code commented Jan 9, 2025

Key Changes:

  • Fixes getId method to use logic that prints "RCTab CVR Id" to audit log.
    • Uses computedId as primary identifier with fallback to suppliedId
  • Added getComputeId method to CVR to print computeId to rctab_cvr.csv as `"Vendor Id"
  • Updates both audit logs and rctab_cvr.csv to use the getId method
  • Changes Dominion computedId delimiter from pipe (|) to dash (-) for improved readability
  • Adds column "RCTab CVR Id" to RCTab CVR output

Benefits:

  • Same ballot maintains the same identifier across multiple tabulations by using centralized getId logic
  • Easier to track specific ballots across audit logs and RCTab CVR output
  • More intuitive ID format using dashes instead of pipes
  • Moving toward consistent identification approach throughout the application

Known Issues:

  • See TODOs in code asking if we can to use the updated getId method, the getComputedId, or something else.

Testing Needed:

  1. Verify ID consistency:
    • ✅ Compare IDs in audit log with rctab_cvr.csv output
    • ✅ Ensure id returned from getId method is consistent across multiple tabulations
    • ✅ Check ID format follows tabulator-batch-record pattern for records from Dominion CVR sources
  2. Test fallback behavior:
    • Cases where computedId is null
    • Cases where computedId is empty or whitespace
    • Cases where only suppliedId is available
  3. Test with each CVR provider
    • ✅ CDF/Unisyn
    • ✅ Clear Ballot
    • ✅ Dominion (tested with 2024 Alaska US House race)
    • ✅ ES&S (in test suite)
    • ✅ Hart
    • ✅ generic CSV
  4. Review and update failing tests:
    • ✅ Done for now

Feature Request: Add Standardized Unique Identifiers for Cast Vote Records
#911
Replace UUID-based identification with a more consistent ID system that
uses the same identifier across audit logs and RCTab CVR output. The ID
is primarily derived from the computedId (tabulator-batch-record) with
fallback to suppliedId when computedId is unavailable.

Unlike UUIDs which change between tabulations, this approach ensures the
same ballot maintains the same identifier across multiple tabulations,
making it easier to track specific ballots across different runs.

Changes:
- Remove UUID generation for CVR identification
- Add getPrimaryId() method to centralize ID logic
- Update audit log and RCTab CVR to use the same ID format
- Change ID delimiter from pipe to dash for better readability
- Add TODO to review legacy getId() usage

Note: This change ensures that CVR IDs are consistent between audit logs,
RCTab CVR output, and repeated tabulations, making it easier to track
individual ballots across different output formats and multiple runs.
Comment on lines 132 to 134
String getPrimaryId() {
return !isNullOrBlank(computedId) ? computedId : suppliedId;
}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a refactoring of the computedId and suppliedId logic used below in the logStringBuilder (which should print a unique CVR Id to the audit log.

Comment on lines 137 to 139
String getId() {
return suppliedId != null ? suppliedId : computedId;
}
Copy link
Collaborator Author

@nurse-the-code nurse-the-code Jan 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the logic that is used to generate the "Record Id" that gets printed out in the RCTab CVR. It is not clear to me if this was intended to be a unique CVR identifier, because it uses the same computedId and suppliedId that my getPrimaryId uses.

But this value returned by getId() is not unique to every CVR (tested on 2024 Alaska US House race).

Additionally, we should research if we still need to print the "Record Id" to the RCTab CVR (since it is not unique).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yezr thinks we don't need this any more.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either fix this with the new logic or remove this method.

Copy link
Collaborator Author

@nurse-the-code nurse-the-code Jan 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to fix/change this line:
csvPrinter.print(castVoteRecord.getId());
in OutputWriter.java if we refactor, remove or change this method. Currently csvPrinter.print(castVoteRecord.getId()); prints the recordId (at least in Dominion elections).

Copy link
Collaborator Author

@nurse-the-code nurse-the-code Jan 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move the code in here to a getSuppliedId method and rename the "Record ID" header in the RCTab CVR to "Vendor Id"

if (outcomeType == VoteOutcomeType.IGNORED) {
logStringBuilder.append(" [was ignored] ");
logStringBuilder.append(" [ was ignored] ");
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove this change

// rank 2 selection, ... rank maxRanks selection
// RCTab CVR Id, ContestId, TabulatorId, BatchId, RecordId, Precinct, Precinct Portion,
// rank 1 selection, rank 2 selection, ... rank maxRanks selection
csvPrinter.print("RCTab CVR Id");
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move this down a couple lines to be after the "Contest Id"

@@ -737,6 +738,7 @@ String writeRcTabCvrCsv(
}

CastVoteRecord castVoteRecord = castVoteRecords.get(i);
csvPrinter.print(castVoteRecord.getPrimaryId());
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move this down a couple lines to be after the "Contest Id"

@@ -981,6 +984,7 @@ private Map<String, Object> generateCvrSnapshotMap(
entry("@type", "CVR.CVRContest"));

return Map.ofEntries(
// TODO: Do we want this to use the getSuppliedId or the getId method?
entry("@id", generateCvrSnapshotId(sanitizeStringForOutput(cvr.getId()), round)),
Copy link
Collaborator Author

@nurse-the-code nurse-the-code Jan 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our choice here should be the same as in the TODO comment added before the sanitizedId definition, so long as we are offering a unique CVR id to include in @id.

@@ -875,6 +877,7 @@ private List<Map<String, Object>> generateCdfMapForCvrs(List<CastVoteRecord> cas
for (CastVoteRecord cvr : castVoteRecords) {
List<Map<String, Object>> cvrSnapshots = new LinkedList<>();
cvrSnapshots.add(generateCvrSnapshotMap(cvr, null, null));
// TODO: Do we want this to use the getSuppliedId or the getId method?
Copy link
Collaborator Author

@nurse-the-code nurse-the-code Jan 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In researching this question, I looked at the NIST Cast Vote Records Common Data Format Specification v 1.0 document. Here are some relevant parts I found.

  • on page 18:
    Screenshot 2025-01-22 at 05 52 02
  • on page 31:
    Screenshot 2025-01-22 at 05 53 46

This prints out the BallotPrePrintedId from the CDF's CVR class. It would seem to me that this should be a vendor-supplied id (e.g. suppliedId). I am not sure that we were using this correctly prior to this PR, because we would sometimes print the computedId (which was never pre-printed on any ballot). I am also not sure what this should look like if there is no vender-supplied id.

Comment on lines -1 to +11
Source Filepath,CVR Provider,Contest Id,Tabulator Id,Batch Id,Record Id,Precinct,Precinct Portion,Rank 1,Rank 2,Rank 3
../_shared/simple_ess_cvr.xlsx,ess,,,,simple_ess_cvr.xlsx-1,,,Mookie Blaylock,undervote,undervote
../_shared/simple_ess_cvr.xlsx,ess,,,,simple_ess_cvr.xlsx-2,,,Mookie Blaylock,undervote,undervote
../_shared/simple_ess_cvr.xlsx,ess,,,,simple_ess_cvr.xlsx-3,,,Mookie Blaylock,undervote,undervote
../_shared/simple_ess_cvr.xlsx,ess,,,,simple_ess_cvr.xlsx-4,,,Mookie Blaylock,undervote,undervote
../_shared/simple_ess_cvr.xlsx,ess,,,,simple_ess_cvr.xlsx-5,,,Yinka Dare,undervote,undervote
../_shared/simple_ess_cvr.xlsx,ess,,,,simple_ess_cvr.xlsx-6,,,Yinka Dare,undervote,undervote
../_shared/simple_ess_cvr.xlsx,ess,,,,simple_ess_cvr.xlsx-7,,,Yinka Dare,undervote,undervote
../_shared/simple_ess_cvr.xlsx,ess,,,,simple_ess_cvr.xlsx-8,,,George Gervin,undervote,undervote
../_shared/simple_ess_cvr.xlsx,ess,,,,simple_ess_cvr.xlsx-9,,,George Gervin,Yinka Dare,undervote
../_shared/simple_ess_cvr.xlsx,ess,,,,simple_ess_cvr.xlsx-10,,,Sedale Threatt,George Gervin,undervote
Source Filepath,CVR Provider,Contest Id,RCTab CVR Id,Tabulator Id,Batch Id,Vendor Id,Precinct,Precinct Portion,Rank 1,Rank 2,Rank 3
../_shared/simple_ess_cvr.xlsx,ess,,simple_ess_cvr.xlsx-1,,,,,,Mookie Blaylock,undervote,undervote
../_shared/simple_ess_cvr.xlsx,ess,,simple_ess_cvr.xlsx-2,,,,,,Mookie Blaylock,undervote,undervote
../_shared/simple_ess_cvr.xlsx,ess,,simple_ess_cvr.xlsx-3,,,,,,Mookie Blaylock,undervote,undervote
../_shared/simple_ess_cvr.xlsx,ess,,simple_ess_cvr.xlsx-4,,,,,,Mookie Blaylock,undervote,undervote
../_shared/simple_ess_cvr.xlsx,ess,,simple_ess_cvr.xlsx-5,,,,,,Yinka Dare,undervote,undervote
../_shared/simple_ess_cvr.xlsx,ess,,simple_ess_cvr.xlsx-6,,,,,,Yinka Dare,undervote,undervote
../_shared/simple_ess_cvr.xlsx,ess,,simple_ess_cvr.xlsx-7,,,,,,Yinka Dare,undervote,undervote
../_shared/simple_ess_cvr.xlsx,ess,,simple_ess_cvr.xlsx-8,,,,,,George Gervin,undervote,undervote
../_shared/simple_ess_cvr.xlsx,ess,,simple_ess_cvr.xlsx-9,,,,,,George Gervin,Yinka Dare,undervote
../_shared/simple_ess_cvr.xlsx,ess,,simple_ess_cvr.xlsx-10,,,,,,Sedale Threatt,George Gervin,undervote
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For ES&S CVRs, it looks like our new getSuppliedId method (slightly changed from the original getId method results in no "Vendor Id" (formerly "Record Id") getting printed to the rctab_cvr.csv. Instead, that same value is printed as the "RCTab CVR Id" (using the new getId method).

This is because the old getId method (and the new getId method as well) returns a computedIdfor ES&S CVRs. The newgetSuppliedIdreturns an empty string, because ES&S CVRs don't havesuppliedId` associated with them in RCTab.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Feature Request: Add common id to both audit log and rctab_cvr.csv
1 participant