Enhance Juriscraper to Support Bundling of Separate Opinions #883
Comments
To be clear here, what you're proposing is upgrading Juriscraper to return multiple opinion objects under one key, like we have with clusters/opinions in CL itself, right? Assuming so, can you provide a link or screenshot or something as an example?
Yes. I was working this through in my head before I laid out my vision.
I fixed and rewrote part of Connecticut to take advantage of the We can take these results and either call a method to combine the multiple opinions here into clusters and only slightly modify CL to save each opinion together with the cluster
I'd expect this to mirror the fields in CL pretty closely. Why not do the joining in JS so that CL has a nice JSON object of clusters with nested opinions?
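To make the "clusters with nested opinions" idea concrete, here is a minimal sketch of joining flat per-opinion records into clusters on the Juriscraper side. Everything in it is an assumption for illustration: the function name bundle_opinions, the grouping key (docket number plus filing date), and the field names are hypothetical and are not Juriscraper's or CourtListener's actual API.

```python
import json


def bundle_opinions(flat_items):
    """Group flat opinion records into clusters with nested opinions.

    Hypothetical helper: records sharing a docket number and filing
    date are assumed to belong to the same cluster.
    """
    clusters = {}
    for item in flat_items:
        key = (item["docket_number"], item["date_filed"])
        # Create the cluster the first time its key is seen.
        cluster = clusters.setdefault(
            key,
            {
                "case_name": item["case_name"],
                "docket_number": item["docket_number"],
                "date_filed": item["date_filed"],
                "opinions": [],
            },
        )
        # Opinion-level fields are nested under the cluster.
        cluster["opinions"].append(
            {
                "type": item["type"],
                "download_url": item["download_url"],
                "author": item.get("author", ""),
            }
        )
    # Dicts preserve insertion order, so clusters come back in the
    # order in which they were first encountered.
    return list(clusters.values())


def to_json(clusters):
    """Serialize the clusters as a JSON string for the consumer."""
    return json.dumps(clusters, indent=2)
```

Grouping on the scraper side means the consumer receives one object per cluster and can save the cluster and its opinions together, instead of re-deriving the grouping from a flat list.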
Supports new juriscraper scraper class and returned objects, and also keeps legacy interface - Supports: freelawproject/juriscraper#883 - Supports: freelawproject/juriscraper#889
I checked the changes required on CourtListener to support this new paradigm while still supporting the legacy scrapers. I found the following:
Even if we return objects of the following shape, we would have to return an item for each opinion (because of dup checking), causing a somewhat ugly duplication.
Here is a branch where I show the changes needed in CL, which turned out to be rather small. This is still a concept and would have to be tested and improved.
Gianfranco, it's very OK to change CL as part of this, if it means making the interface better while hitting our design requirements. I'd rather do this now and have something we like instead of being stuck with half measures. Does that change your thinking about approach?
It took quite some time, but I have a draft working on integration with CourtListener (which will be another parallel PR).
Results
I used
How it currently looks on CourtListener
Also, the scraper captures
Implementation details
It's better to look at the code, even if there is still pending work. I have written comments extensively.
On CourtListener: Besides the "code" code review, I will need some "data" code review, to see if I am properly using the
Of note, I found a way to keep tests of secondary/deferred page's examples. For
Pending work
I still have a bunch of bugs to solve and tests to write for this to be mergeable.
Further work
There is a clear opportunity to scrape
Some bugs found on the way
Bugs on OpinionSite[Linear] integration with CL: Attributes that we can return but are never picked up in CL (defined on OpinionSite class)
These are actually used by some sources, so we are not inserting data that we do collect. For example, lower_courts is used in
Issue Description:
A handful of courts list separate opinions individually in their opinion lists, which juriscraper and CourtListener (CL) do not currently support. Without support for bundling separate opinions, case information can be scraped and processed in an incomplete or fragmented form.
Suggested Enhancement:
I propose updating juriscraper to allow for the bundling of separate opinions. This enhancement would ensure that all opinions related to a case are collected and processed together, giving a more complete view of the case proceedings and decisions.
Courts: (in progress list)