Parsing error for citations with defendant 'Thompson' #174

ERosendo · 2024-03-28T15:18:11Z

In issue #3924, we identified a bug in Eyecite's parsing method when the defendant's last name is 'Thompson'.

For example, for the citation 'Shapiro v. Thompson, 394 U. S. 618':

Expected output: volume: 394, reporter: 'U.S.', page: '618'
Actual output: volume: None, reporter: 'Thompson', page: '394'

Other examples of inputs that are incorrectly parsed are: Adams v. Thompson, 560 F. Supp. 894 and Mozena v. Thompson, 44 A.2d 276.

I've been using the first example to debug this issue, and noticed that Eyecite identifies two tokens within the input string: "Thompson's Unreported Cases (TN)" and "United States Supreme Court Reports.". The problem arises because these tokens overlap (both include "394") and Eyecite's tokenize method prioritizes the rightmost token when encountering overlaps, leading to this result.
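To make the overlap concrete, here is a minimal sketch using two simplified regexes. These patterns are assumptions written for illustration only; they are not the actual reporters-db regexes, and the code does not call Eyecite itself.

```python
import re

# Hypothetical, simplified stand-ins for the two reporter patterns.
# The real reporters-db regexes are more involved.
NOMINATIVE = re.compile(r"Thompson,? (?P<page>\d+)")
FULL = re.compile(r"(?P<volume>\d+) (?P<reporter>U\. ?S\.) (?P<page>\d+)")

text = "Shapiro v. Thompson, 394 U. S. 618"

nominative = NOMINATIVE.search(text)
full = FULL.search(text)


def spans_overlap(a, b):
    """Return True if two (start, end) half-open spans share any character."""
    return a[0] < b[1] and b[0] < a[1]


# Both patterns claim the characters "394", so the spans overlap.
print(nominative.group(0), nominative.span())
print(full.group(0), full.span())
print(spans_overlap(nominative.span(), full.span()))
```

Both matches are individually plausible, which is why picking a winner by position alone can select the wrong one.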

mlissner · 2024-03-28T17:11:40Z

Any idea how easy this is to solve so that it identifies each?

mlissner · 2024-04-01T20:54:38Z

Per discussion today, seems to be happening when citations appear to overlap. The simple solution here is to find both citations that overlap and then filter out the one that's incomplete.

quevon24 · 2025-01-07T16:55:56Z

The problem here is that we have Thompson as a nominative reporter (optional volume), which is why it is detected as a citation.

We could detect the overlap and see which one is incomplete, but according to the reporters-db Thompson regex, "Thompson , 394" is a valid citation, so we get two valid citations: "Thompson , 394" and "394 U. S. 618".

This can be replicated with other nominative reporters like:

Foo v. Cooke (Tennessee Reports, Cooke)
Foo v. Holmes (Holmes Circuit Court Reports (US))
Foo v. McCahon (Kansas Reports, McCahon)
Foo v. Olcott (Olcott's Admiralty Reports)
Foo v. Taney (Taney's United States (US) Circuit Court Reports)

mlissner · 2025-01-07T18:07:44Z

Seems like some post processing could detect and remedy this. The Thompson one may be technically complete, but it's the same data. So maybe: If there's overlap and the data overlaps too, then....

flooie · 2025-01-07T20:22:10Z

@mlissner I like your solution -

if there is an overlapping span - we can check if one is a complete citation and if one is nominative and choose the complete citation.
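A rough sketch of that post-processing step follows. The `Candidate` class and the `drop_overlapping_nominatives` helper are hypothetical names made up for this example, not Eyecite's API, and the rule encoded here (drop a nominative match when it overlaps a non-nominative match that has a volume) is only one possible reading of the proposal.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Candidate:
    """Hypothetical stand-in for a parsed citation candidate."""
    start: int
    end: int
    volume: Optional[str]
    reporter: str
    page: str
    nominative: bool


def overlaps(a: Candidate, b: Candidate) -> bool:
    """True if the two half-open character spans share any character."""
    return a.start < b.end and b.start < a.end


def drop_overlapping_nominatives(candidates: List[Candidate]) -> List[Candidate]:
    """Drop nominative candidates that overlap a complete, non-nominative one."""
    kept = []
    for cand in candidates:
        shadowed = cand.nominative and any(
            other is not cand
            and not other.nominative
            and other.volume is not None
            and overlaps(cand, other)
            for other in candidates
        )
        if not shadowed:
            kept.append(cand)
    return kept


# Spans are for "Shapiro v. Thompson, 394 U. S. 618".
thompson = Candidate(11, 24, None, "Thompson", "394", nominative=True)
us = Candidate(21, 34, "394", "U.S.", "618", nominative=False)

result = drop_overlapping_nominatives([thompson, us])
print([c.reporter for c in result])
```

A nominative citation that does not overlap anything is left alone, so genuine Thompson reporter citations would still be returned under this rule.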

ERosendo changed the title ~~parsing error for citations with defendant 'Thompson'~~ Parsing error for citations with defendant 'Thompson' Mar 28, 2024

mlissner added this to @erosendo's backlog Apr 1, 2024

mlissner moved this to Main Backlog in @erosendo's backlog Apr 1, 2024

ERosendo moved this from Main Backlog to Bots Backlog in @erosendo's backlog Apr 15, 2024

github-project-automation bot added this to Citator Oct 12, 2024

github-project-automation bot added this to Case Law Sprint Nov 15, 2024

flooie moved this to General Backlog in Case Law Sprint Nov 19, 2024

flooie moved this from General Backlog to To Do in Case Law Sprint Dec 17, 2024

quevon24 linked a pull request Jan 9, 2025 that will close this issue

Fix bad parsing for citations with defendant similar to nominative reporter #190

Open
