feat(pa): dynamic backscraper and update to new source #968
Implemented a new scraper that targets the API instead of the RSS feed. Because we need to backscrape, the API lets us target custom dates. Also updated the example files. Helps solve freelawproject#967
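To illustrate what "targeting custom dates" can look like in a backscraper, here is a minimal sketch that splits a date range into consecutive windows, each of which could become one API request. The function names and the `startDate`/`endDate` query parameters are assumptions made for illustration; they are not taken from this PR or from the court's API.

```python
from datetime import date, timedelta
from typing import Dict, Iterator, Tuple


def date_windows(
    start: date, end: date, days: int = 7
) -> Iterator[Tuple[date, date]]:
    """Yield consecutive (window_start, window_end) pairs covering [start, end]."""
    if days < 1:
        raise ValueError("days must be at least 1")
    current = start
    while current <= end:
        # Clamp the last window so it never runs past the requested end date.
        window_end = min(current + timedelta(days=days - 1), end)
        yield current, window_end
        current = window_end + timedelta(days=1)


def build_query(window_start: date, window_end: date) -> Dict[str, str]:
    """Build query parameters for one window.

    The parameter names are hypothetical; the real API may use different ones.
    """
    return {
        "startDate": window_start.isoformat(),
        "endDate": window_end.isoformat(),
    }


if __name__ == "__main__":
    for window in date_windows(date(2021, 6, 1), date(2021, 6, 15)):
        print(build_query(*window))
```

Smaller windows mean more requests but smaller responses; the 7-day default here is arbitrary.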
@grossir @flooie I think it works great, but I have a question: many results contain several opinions for the same case, and I was wondering whether each case should be returned as a single item (a case with multiple opinions) or as separate items, as it is now.
For example, the result below has a majority opinion, a concurring and dissenting opinion, and a dissenting opinion. I think the management command in CL only creates one opinion per item, so these would be created as independent opinions instead of being grouped in one cluster. How is this handled in such cases?
{
"Author": null,
"BoardDocketNumber": null,
"Caption": "Commonwealth v. Cosby, Jr., W., Aplt. - No. 39 MAP 2020",
"CourtDocketNumber": null,
"CourtType": 3,
"DispositionDate": "2021-06-30T00:00:00",
"Keywords": null,
"UserIdentifier": "M.D. Prothonotary",
"UploadDate": "0001-01-01T00:00:00",
"PostedToday": false,
"Postings": [
{
"Id": 82106,
"AuthorId": "Wecht, David N.",
"OpinionId": 74452,
"FileName": "J-100-2020mo - 104821740139246918.pdf",
"ProcessedDate": "2021-06-30T00:00:00",
"PostingTypeId": "mo",
"PublicationTypeId": null,
"RenderedDate": "2021-06-30T00:00:00",
"SortOrder": 0,
"FileVersion": 3,
"Author": {
"Id": 0,
"AuthorName": "Justice David Wecht",
"AuthorCode": "Wecht, David N.",
"Selectable": true,
"SortOrder": 1450
},
"PostType": {
"Id": 0,
"PostingTypeCode": "mo",
"PostingTypeId": "Majority Opinion",
"SortOrder": null
},
"PublicationType": null
},
{
"Id": 82107,
"AuthorId": "Dougherty, Kevin M.",
"OpinionId": 74452,
"FileName": "J-100-2020cdo - 104821740139246932.pdf",
"ProcessedDate": "2021-06-30T00:00:00",
"PostingTypeId": "cd",
"PublicationTypeId": null,
"RenderedDate": "2021-06-30T00:00:00",
"SortOrder": 0,
"FileVersion": 1,
"Author": {
"Id": 0,
"AuthorName": "Justice Kevin Dougherty",
"AuthorCode": "Dougherty, Kevin M.",
"Selectable": true,
"SortOrder": 1440
},
"PostType": {
"Id": 0,
"PostingTypeCode": "cd",
"PostingTypeId": "Concurring and Dissenting Opinion",
"SortOrder": null
},
"PublicationType": null
},
{
"Id": 82108,
"AuthorId": "Saylor, Thomas G.",
"OpinionId": 74452,
"FileName": "J-100-2020do - 104821740139246963.pdf",
"ProcessedDate": "2021-06-30T00:00:00",
"PostingTypeId": "do",
"PublicationTypeId": null,
"RenderedDate": "2021-06-30T00:00:00",
"SortOrder": 0,
"FileVersion": 1,
"Author": {
"Id": 0,
"AuthorName": "Justice Thomas G. Saylor",
"AuthorCode": "Saylor, Thomas G.",
"Selectable": false,
"SortOrder": 0
},
"PostType": {
"Id": 0,
"PostingTypeCode": "do",
"PostingTypeId": "Dissenting Opinion",
"SortOrder": null
},
"PublicationType": null
}
],
"CreatedById": null,
"DeletedById": null,
"UpdatedById": null,
"CreatedOn": null,
"DeletedOn": null,
"UpdatedOn": null,
"CreatedBy": null,
"DeletedBy": null,
"UpdatedBy": null,
"Id": 74452
},
This is indeed an enhancement we've been tracking for months. The OpinionCluster model has a one-to-many relation to the Opinion model, so grouping the opinions is what we should be doing. However, we would need to change both Juriscraper's OpinionSite(Linear) and cl_scrape_opinions for it to work. I bundled this and other changes into a proposed new scraper/site class some time ago, but it is still pending review. You can read more about it here: #883 (comment)
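The two shapes under discussion (separate items versus one cluster with several opinions) can be sketched with the Cosby result from the question. This is only an illustration: the output keys (`name`, `date`, `author`, `type`, `file`, `sub_opinions`) and both function names are hypothetical, and neither function reflects how Juriscraper or CourtListener actually structures its data.

```python
from typing import Any, Dict, List

# Trimmed copy of the API result quoted in the question; only the fields
# used below are kept.
result: Dict[str, Any] = {
    "Caption": "Commonwealth v. Cosby, Jr., W., Aplt. - No. 39 MAP 2020",
    "DispositionDate": "2021-06-30T00:00:00",
    "Id": 74452,
    "Postings": [
        {
            "FileName": "J-100-2020mo - 104821740139246918.pdf",
            "Author": {"AuthorName": "Justice David Wecht"},
            "PostType": {"PostingTypeId": "Majority Opinion"},
        },
        {
            "FileName": "J-100-2020cdo - 104821740139246932.pdf",
            "Author": {"AuthorName": "Justice Kevin Dougherty"},
            "PostType": {"PostingTypeId": "Concurring and Dissenting Opinion"},
        },
        {
            "FileName": "J-100-2020do - 104821740139246963.pdf",
            "Author": {"AuthorName": "Justice Thomas G. Saylor"},
            "PostType": {"PostingTypeId": "Dissenting Opinion"},
        },
    ],
}


def as_separate_items(result: Dict[str, Any]) -> List[Dict[str, str]]:
    """One flat item per posting: every opinion repeats the case metadata."""
    case_date = result["DispositionDate"].split("T")[0]
    return [
        {
            "name": result["Caption"],
            "date": case_date,
            "author": posting["Author"]["AuthorName"],
            "type": posting["PostType"]["PostingTypeId"],
            "file": posting["FileName"],
        }
        for posting in result["Postings"]
    ]


def as_cluster(result: Dict[str, Any]) -> Dict[str, Any]:
    """One item per case: case metadata once, opinions nested underneath."""
    items = as_separate_items(result)
    return {
        "name": result["Caption"],
        "date": result["DispositionDate"].split("T")[0],
        "sub_opinions": [
            {key: item[key] for key in ("author", "type", "file")}
            for item in items
        ],
    }


if __name__ == "__main__":
    print(len(as_separate_items(result)), "separate items")
    print(len(as_cluster(result)["sub_opinions"]), "opinions in one cluster")
```

In the flat shape, nothing ties the three items together except the repeated caption and date; the nested shape keeps that link explicit, which is what a one-to-many cluster relation needs.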
@grossir can we figure out the conflicts here? And does this need the enhanced v3 juriscraper?
To actually return and use OpinionClusters, we will indeed need a different approach in both juriscraper and courtlistener, and that's not ready yet. It seems there were no conflicts; I just merged main.
Looks good but needs a few changes.
…on types Also, update example files
@flooie I updated the PR with the suggested changes; I also added support for getting "per_curiam".
This is great, thanks @grossir