-
Notifications
You must be signed in to change notification settings - Fork 19
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Standardize CVR identification across audit logs and RCTab output #913
base: develop
Are you sure you want to change the base?
Conversation
Feature Request: Add Standardized Unique Identifiers for Cast Vote Records #911
Replace UUID-based identification with a more consistent ID system that uses the same identifier across audit logs and RCTab CVR output. The ID is primarily derived from the computedId (tabulator-batch-record) with fallback to suppliedId when computedId is unavailable. Unlike UUIDs which change between tabulations, this approach ensures the same ballot maintains the same identifier across multiple tabulations, making it easier to track specific ballots across different runs. Changes: - Remove UUID generation for CVR identification - Add getPrimaryId() method to centralize ID logic - Update audit log and RCTab CVR to use the same ID format - Change ID delimiter from pipe to dash for better readability - Add TODO to review legacy getId() usage Note: This change ensures that CVR IDs are consistent between audit logs, RCTab CVR output, and repeated tabulations, making it easier to track individual ballots across different output formats and multiple runs.
String getPrimaryId() { | ||
return !isNullOrBlank(computedId) ? computedId : suppliedId; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a refactoring of the computedId
and suppliedId
logic used below in the logStringBuilder
(which should print a unique CVR Id to the audit log.
String getId() { | ||
return suppliedId != null ? suppliedId : computedId; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the logic that is used to generate the "Record Id" that gets printed out in the RCTab CVR. It is not clear to me if this was intended to be a unique CVR identifier, because it uses the same computedId
and suppliedId
that my getPrimaryId
uses.
But this value returned by getId()
is not unique to every CVR (tested on 2024 Alaska US House race).
Additionally, we should research if we still need to print the "Record Id" to the RCTab CVR (since it is not unique).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yezr thinks we don't need this any more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Either fix this with the new logic or remove this method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to fix/change this line:
csvPrinter.print(castVoteRecord.getId());
in OutputWriter.java
if we refactor, remove or change this method. Currently csvPrinter.print(castVoteRecord.getId());
prints the recordId
(at least in Dominion elections).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move the code in here to a getSuppliedId
method and rename the "Record ID" header in the RCTab CVR to "Vendor Id"
…ture/issue-911-rctab-cvr-primary-id
if (outcomeType == VoteOutcomeType.IGNORED) { | ||
logStringBuilder.append(" [was ignored] "); | ||
logStringBuilder.append(" [ was ignored] "); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove this change
// rank 2 selection, ... rank maxRanks selection | ||
// RCTab CVR Id, ContestId, TabulatorId, BatchId, RecordId, Precinct, Precinct Portion, | ||
// rank 1 selection, rank 2 selection, ... rank maxRanks selection | ||
csvPrinter.print("RCTab CVR Id"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move this down a couple lines to be after the "Contest Id"
@@ -737,6 +738,7 @@ String writeRcTabCvrCsv( | |||
} | |||
|
|||
CastVoteRecord castVoteRecord = castVoteRecords.get(i); | |||
csvPrinter.print(castVoteRecord.getPrimaryId()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move this down a couple lines to be after the "Contest Id"
@@ -981,6 +984,7 @@ private Map<String, Object> generateCvrSnapshotMap( | |||
entry("@type", "CVR.CVRContest")); | |||
|
|||
return Map.ofEntries( | |||
// TODO: Do we want this to use the getSuppliedId or the getId method? | |||
entry("@id", generateCvrSnapshotId(sanitizeStringForOutput(cvr.getId()), round)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Our choice here should be the same as in the TODO
comment added before the sanitizedId
definition, so long as we are offering a unique CVR id to include in @id
.
@@ -875,6 +877,7 @@ private List<Map<String, Object>> generateCdfMapForCvrs(List<CastVoteRecord> cas | |||
for (CastVoteRecord cvr : castVoteRecords) { | |||
List<Map<String, Object>> cvrSnapshots = new LinkedList<>(); | |||
cvrSnapshots.add(generateCvrSnapshotMap(cvr, null, null)); | |||
// TODO: Do we want this to use the getSuppliedId or the getId method? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In researching this question, I looked at the NIST Cast Vote Records Common Data Format Specification v 1.0 document. Here are some relevant parts I found.
This prints out the BallotPrePrintedId
from the CDF's CVR class. It would seem to me that this should be a vendor-supplied id (e.g. suppliedId
). I am not sure that we were using this correctly prior to this PR, because we would sometimes print the computedId
(which was never pre-printed on any ballot). I am also not sure what this should look like if there is no vender-supplied id.
Source Filepath,CVR Provider,Contest Id,Tabulator Id,Batch Id,Record Id,Precinct,Precinct Portion,Rank 1,Rank 2,Rank 3 | ||
../_shared/simple_ess_cvr.xlsx,ess,,,,simple_ess_cvr.xlsx-1,,,Mookie Blaylock,undervote,undervote | ||
../_shared/simple_ess_cvr.xlsx,ess,,,,simple_ess_cvr.xlsx-2,,,Mookie Blaylock,undervote,undervote | ||
../_shared/simple_ess_cvr.xlsx,ess,,,,simple_ess_cvr.xlsx-3,,,Mookie Blaylock,undervote,undervote | ||
../_shared/simple_ess_cvr.xlsx,ess,,,,simple_ess_cvr.xlsx-4,,,Mookie Blaylock,undervote,undervote | ||
../_shared/simple_ess_cvr.xlsx,ess,,,,simple_ess_cvr.xlsx-5,,,Yinka Dare,undervote,undervote | ||
../_shared/simple_ess_cvr.xlsx,ess,,,,simple_ess_cvr.xlsx-6,,,Yinka Dare,undervote,undervote | ||
../_shared/simple_ess_cvr.xlsx,ess,,,,simple_ess_cvr.xlsx-7,,,Yinka Dare,undervote,undervote | ||
../_shared/simple_ess_cvr.xlsx,ess,,,,simple_ess_cvr.xlsx-8,,,George Gervin,undervote,undervote | ||
../_shared/simple_ess_cvr.xlsx,ess,,,,simple_ess_cvr.xlsx-9,,,George Gervin,Yinka Dare,undervote | ||
../_shared/simple_ess_cvr.xlsx,ess,,,,simple_ess_cvr.xlsx-10,,,Sedale Threatt,George Gervin,undervote | ||
Source Filepath,CVR Provider,Contest Id,RCTab CVR Id,Tabulator Id,Batch Id,Vendor Id,Precinct,Precinct Portion,Rank 1,Rank 2,Rank 3 | ||
../_shared/simple_ess_cvr.xlsx,ess,,simple_ess_cvr.xlsx-1,,,,,,Mookie Blaylock,undervote,undervote | ||
../_shared/simple_ess_cvr.xlsx,ess,,simple_ess_cvr.xlsx-2,,,,,,Mookie Blaylock,undervote,undervote | ||
../_shared/simple_ess_cvr.xlsx,ess,,simple_ess_cvr.xlsx-3,,,,,,Mookie Blaylock,undervote,undervote | ||
../_shared/simple_ess_cvr.xlsx,ess,,simple_ess_cvr.xlsx-4,,,,,,Mookie Blaylock,undervote,undervote | ||
../_shared/simple_ess_cvr.xlsx,ess,,simple_ess_cvr.xlsx-5,,,,,,Yinka Dare,undervote,undervote | ||
../_shared/simple_ess_cvr.xlsx,ess,,simple_ess_cvr.xlsx-6,,,,,,Yinka Dare,undervote,undervote | ||
../_shared/simple_ess_cvr.xlsx,ess,,simple_ess_cvr.xlsx-7,,,,,,Yinka Dare,undervote,undervote | ||
../_shared/simple_ess_cvr.xlsx,ess,,simple_ess_cvr.xlsx-8,,,,,,George Gervin,undervote,undervote | ||
../_shared/simple_ess_cvr.xlsx,ess,,simple_ess_cvr.xlsx-9,,,,,,George Gervin,Yinka Dare,undervote | ||
../_shared/simple_ess_cvr.xlsx,ess,,simple_ess_cvr.xlsx-10,,,,,,Sedale Threatt,George Gervin,undervote |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For ES&S CVRs, it looks like our new getSuppliedId
method (slightly changed from the original getId
method results in no "Vendor Id"
(formerly "Record Id"
) getting printed to the rctab_cvr.csv
. Instead, that same value is printed as the "RCTab CVR Id"
(using the new getId
method).
This is because the old getId
method (and the new getId method as well) returns a
computedIdfor ES&S CVRs. The new
getSuppliedIdreturns an empty string, because ES&S CVRs don't have
suppliedId` associated with them in RCTab.
Key Changes:
getId
method to use logic that prints"RCTab CVR Id"
to audit log.computedId
as primary identifier with fallback to suppliedIdgetComputeId
method to CVR to printcomputeId
torctab_cvr.csv
as `"Vendor Id"rctab_cvr.csv
to use thegetId
methodcomputedId
delimiter from pipe (|
) to dash (-
) for improved readability"RCTab CVR Id"
to RCTab CVR outputBenefits:
getId
logicKnown Issues:
getId
method, thegetComputedId
, or something else.Testing Needed:
rctab_cvr.csv
outputgetId
method is consistent across multiple tabulations