feat(pa): dynamic backscraper and update to new source #968

Merged on Aug 6, 2024 (8 commits)

Conversation

Contributor

@grossir grossir commented Mar 26, 2024

Implemented a new scraper that targets the API instead of the RSS feed. Because we need to backscrape, the API lets us target custom date ranges. Also updated the example files.

Helps solve #967
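
A minimal sketch of how a date-range request to the API could be built for backscraping. The endpoint and parameter names are taken from the example URL quoted in the review below; the helper name `build_opinion_url`, the fixed `-05:00` offset, and the assumption that the `undefined` parameters can be omitted are illustrative, not necessarily what the merged scraper does.

```python
from datetime import date
from urllib.parse import urlencode

# Endpoint as it appears in the example URL from the review discussion.
BASE_URL = "https://www.pacourts.us/api/opinion"


def build_opinion_url(start: date, end: date, court_type: str = "SUPREME") -> str:
    """Build a query URL for opinions disposed between `start` and `end`.

    Hypothetical helper: only the parameters that carry real values in the
    example URL are sent; the ones set to "undefined" there are left out.
    """
    params = {
        "startDate": f"{start.isoformat()}T00:00:00-05:00",
        "endDate": f"{end.isoformat()}T00:00:00-05:00",
        "courtType": court_type,
        "postTypes": "cd,co,do,mo,oaj",
        "sortDirection": "-1",
    }
    # urlencode percent-encodes ":" and "," the same way the example URL does.
    return f"{BASE_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_opinion_url(date(2021, 5, 27), date(2021, 7, 15)))
```

A backscraper could call such a helper once per date interval, which is what makes custom dates possible where the RSS feed only exposed recent items.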

@grossir grossir requested a review from quevon24 May 8, 2024 15:26
Member

@quevon24 quevon24 left a comment


@grossir @flooie I think it works great, but I have a question: many results contain several opinions for the same case, and I was wondering whether each case should be returned as a single item (a case with multiple opinions) or as separate items, as it is now.

For example, here: https://www.pacourts.us/api/opinion?startDate=2021-05-27T00%3A00%3A00-05%3A00&endDate=2021-07-15T00%3A00%3A00-05%3A00&courtType=SUPREME&boardDocketNumber=undefined&courtDocketNumber=undefined&keywords=undefined&postTypes=cd%2Cco%2Cdo%2Cmo%2Coaj&publicationType=undefined&sortDirection=-1

We have a majority opinion, a concurring and dissenting opinion, and a dissenting opinion. I think the management command in CL only creates one opinion, so these would be created as independent opinions instead of being grouped in one cluster. How is this handled for these cases?


{
  "Author": null,
  "BoardDocketNumber": null,
  "Caption": "Commonwealth v. Cosby, Jr., W., Aplt. - No. 39 MAP 2020",
  "CourtDocketNumber": null,
  "CourtType": 3,
  "DispositionDate": "2021-06-30T00:00:00",
  "Keywords": null,
  "UserIdentifier": "M.D. Prothonotary",
  "UploadDate": "0001-01-01T00:00:00",
  "PostedToday": false,
  "Postings": [
    {
      "Id": 82106,
      "AuthorId": "Wecht, David N.",
      "OpinionId": 74452,
      "FileName": "J-100-2020mo - 104821740139246918.pdf",
      "ProcessedDate": "2021-06-30T00:00:00",
      "PostingTypeId": "mo",
      "PublicationTypeId": null,
      "RenderedDate": "2021-06-30T00:00:00",
      "SortOrder": 0,
      "FileVersion": 3,
      "Author": {
        "Id": 0,
        "AuthorName": "Justice David Wecht",
        "AuthorCode": "Wecht, David N.",
        "Selectable": true,
        "SortOrder": 1450
      },
      "PostType": {
        "Id": 0,
        "PostingTypeCode": "mo",
        "PostingTypeId": "Majority Opinion",
        "SortOrder": null
      },
      "PublicationType": null
    },
    {
      "Id": 82107,
      "AuthorId": "Dougherty, Kevin M.",
      "OpinionId": 74452,
      "FileName": "J-100-2020cdo - 104821740139246932.pdf",
      "ProcessedDate": "2021-06-30T00:00:00",
      "PostingTypeId": "cd",
      "PublicationTypeId": null,
      "RenderedDate": "2021-06-30T00:00:00",
      "SortOrder": 0,
      "FileVersion": 1,
      "Author": {
        "Id": 0,
        "AuthorName": "Justice Kevin Dougherty",
        "AuthorCode": "Dougherty, Kevin M.",
        "Selectable": true,
        "SortOrder": 1440
      },
      "PostType": {
        "Id": 0,
        "PostingTypeCode": "cd",
        "PostingTypeId": "Concurring and Dissenting Opinion",
        "SortOrder": null
      },
      "PublicationType": null
    },
    {
      "Id": 82108,
      "AuthorId": "Saylor, Thomas G.",
      "OpinionId": 74452,
      "FileName": "J-100-2020do - 104821740139246963.pdf",
      "ProcessedDate": "2021-06-30T00:00:00",
      "PostingTypeId": "do",
      "PublicationTypeId": null,
      "RenderedDate": "2021-06-30T00:00:00",
      "SortOrder": 0,
      "FileVersion": 1,
      "Author": {
        "Id": 0,
        "AuthorName": "Justice Thomas G. Saylor",
        "AuthorCode": "Saylor, Thomas G.",
        "Selectable": false,
        "SortOrder": 0
      },
      "PostType": {
        "Id": 0,
        "PostingTypeCode": "do",
        "PostingTypeId": "Dissenting Opinion",
        "SortOrder": null
      },
      "PublicationType": null
    }
  ],
  "CreatedById": null,
  "DeletedById": null,
  "UpdatedById": null,
  "CreatedOn": null,
  "DeletedOn": null,
  "UpdatedOn": null,
  "CreatedBy": null,
  "DeletedBy": null,
  "UpdatedBy": null,
  "Id": 74452
},
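
To make the "separate items" behavior concrete, here is a rough sketch of how one case object shaped like the JSON above could be flattened into one item per posting. The function name `postings_to_items` and the output keys are made up for illustration; the actual scraper's field names and parsing may differ.

```python
from typing import Any


def postings_to_items(case: dict[str, Any]) -> list[dict[str, Any]]:
    """Flatten one API case object into one item per entry in "Postings".

    Hypothetical helper: every posting (majority, concurring, dissenting...)
    becomes its own item, which mirrors the "separate items" behavior
    described in the comment above.
    """
    items = []
    for posting in case.get("Postings") or []:
        # Nested objects may be null in the API response, so fall back to {}.
        author = posting.get("Author") or {}
        post_type = posting.get("PostType") or {}
        items.append(
            {
                "case_id": case["Id"],
                "caption": case["Caption"],
                # Keep only the date part of "2021-06-30T00:00:00".
                "date": case["DispositionDate"][:10],
                "author": author.get("AuthorName"),
                "type": post_type.get("PostingTypeId"),
                "file_name": posting["FileName"],
            }
        )
    return items
```

With the case shown above, this would yield three items that share the same `case_id` (74452) but differ in author, type, and file name, which is exactly the situation the question is about.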


@grossir
Contributor Author

grossir commented May 14, 2024

This is indeed an enhancement we've been tracking for months. The OpinionCluster model has a one-to-many relation to the Opinion model, so that is what we should be doing. However, we would need to change both Juriscraper's OpinionSite(Linear) and cl_scrape_opinions for it to work. I bundled this and other changes into a proposed new scraper/site class some time ago, but it is still pending review. You can read more about it here: #883 (comment)
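
As an illustration of the one-to-many idea only: the sketch below groups flat per-opinion rows into clusters keyed by case. The dataclasses are simplified stand-ins, not the real CourtListener `OpinionCluster`/`Opinion` Django models, and `group_into_clusters` plus the row keys are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Opinion:
    # Simplified stand-in for a single opinion document.
    author: str
    type: str
    file_name: str


@dataclass
class OpinionCluster:
    # Simplified stand-in: one cluster (case) holds many opinions.
    case_id: int
    caption: str
    opinions: list[Opinion] = field(default_factory=list)


def group_into_clusters(rows: list[dict]) -> list[OpinionCluster]:
    """Group flat per-opinion rows into one cluster per `case_id`."""
    clusters: dict[int, OpinionCluster] = {}
    for row in rows:
        cluster = clusters.get(row["case_id"])
        if cluster is None:
            cluster = OpinionCluster(row["case_id"], row["caption"])
            clusters[row["case_id"]] = cluster
        cluster.opinions.append(
            Opinion(row["author"], row["type"], row["file_name"])
        )
    # Dicts preserve insertion order, so clusters come out in first-seen order.
    return list(clusters.values())
```

Under this shape, the majority, concurring and dissenting, and dissenting opinions from the example would end up in a single cluster with three opinions rather than as three unrelated items.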

@flooie
Contributor

flooie commented Jul 17, 2024

@grossir can we figure out the conflicts here? Also, does this need the enhanced v3 juriscraper?

@grossir
Contributor Author

grossir commented Jul 17, 2024

To actually return and use OpinionClusters, we will indeed need a different approach in both juriscraper and courtlistener, and that's not ready yet.

It seems there were no conflicts; I just merged main.

Contributor

@flooie flooie left a comment


Looks good but needs a few changes.

@grossir
Contributor Author

grossir commented Aug 6, 2024

@flooie I updated the PR with the suggested changes; also added support for getting "per_curiam"

@grossir grossir requested a review from flooie August 6, 2024 00:55
@flooie
Contributor

flooie commented Aug 6, 2024

This is great, thanks @grossir

@flooie flooie merged commit 404a66b into freelawproject:main Aug 6, 2024
8 checks passed