Add support for the Croissant metadata specification #328
Hi @amercader - I'm working with MLCommons on a project being run by @benjelloun at Google, aiming to integrate support for the Croissant metadata specification as widely as possible. Over the last few weeks we've been exploring CKAN and its extension ecosystem, and found that the existing ckanext-dcat extension already provides the needed base functionality. I've created a new branch with a single commit showing all required adjustments. The only changes to existing files are to the list of namespaces and to the list and import of profiles. The new schema and profile live in new, separate files isolated from the existing logic, and have been written according to the documentation (schemas, profiles), using the existing schemaorg.py profile as a basis. You can find some further conversation on this topic here. I'd be grateful if you could review the PR and consider it for integration into master. Please let me and @benjelloun know if you have any questions. Many thanks for your help.
@Reikyo this is great and extremely timely, as I had started working on a croissant profile myself these last few days. I think this is an extremely valuable feature for sites, and one that we can get ready relatively soon. I'll review asap and get back to you. Regarding where this functionality should live, I think it wouldn't be a big task to separate the schema.org/croissant profiles into their own extension, but let's focus on getting this ready for now. Thanks, this is really exciting.
@amercader Thanks for the quick reply, much appreciated. You can find some further info about the work done on this here. In particular, see the spreadsheet for an indication of the properties and values being considered, and how they appear in the Croissant RDF graph when no scheming schema is applied to modify the UI (column O) and when the new Croissant scheming schema is applied (column S). In the latter case there are, of course, many more output properties, which go up to (but do not include) the RecordSet section of the Croissant specification. You'll find a number of notes throughout the new schema (schemas/croissant.yaml) and profile (profiles/croissant.py) to clarify the details and reasoning. Also note that in the schema I allowed users to define their own "@id"s where applicable, and you can see these used in the profile wherever "id_given" is present. In the above folder I've also included images and files that show what I'm thinking of as a three-step process. There are files showing all three steps both when no scheming schema is applied and when the Croissant scheming schema is applied.
Hopefully this makes it clear exactly what the changes do.
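The "@id" handling described above can be illustrated with a minimal sketch. It is not the code from profiles/croissant.py: the helper names `resolve_node_id` and `to_file_object`, the `base_uri` default, and the fallback to the resource's own `id` are all assumptions made for illustration. Only the `id_given` field name and the idea of user-defined "@id"s come from the comment above.

```python
import uuid


def resolve_node_id(resource, base_uri="https://example.org/resource/"):
    """Pick the "@id" for a resource node (hypothetical helper).

    Prefers a user-supplied "id_given" value; otherwise falls back to a
    URI built from the resource's own id (an assumed fallback, not
    necessarily what the real profile does).
    """
    given = (resource.get("id_given") or "").strip()
    if given:
        return given
    # No user-defined id: derive one from the record id, or mint a new one.
    record_id = resource.get("id") or str(uuid.uuid4())
    return base_uri + record_id


def to_file_object(resource):
    """Build a JSON-LD style dict for a Croissant FileObject (sketch only)."""
    return {
        "@type": "cr:FileObject",
        "@id": resolve_node_id(resource),
        "name": resource.get("name", ""),
        "contentUrl": resource.get("url", ""),
    }
```

With this sketch, a resource carrying `id_given` keeps the identifier the user chose, while one without it gets a generated URI, so other nodes in the graph can always reference it.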
@Reikyo I had a first pass and it looks like all this is going in the right direction. Some points in no particular order:
Let me know if all this makes sense. I can help with any of the points above, in fact I just pushed a new I'll be mostly off the next couple of weeks though, but can pick it up again in January.
@amercader Firstly, happy new year, I hope you had a good break. I've also been off over the last couple of weeks, but I'm looking forward to progressing this issue now. Many thanks for the quick evaluation of the proposed changes, I appreciate the detailed comments. Addressing the comments in turn:
Finally, I've changed the target branch to
@Reikyo Happy new year! I messed up and pushed some commits to make things easier for you to the See comments below:
This is a bit of confusing CKAN terminology so bear with me :) Extensions (e.g. ckanext-dcat) are Python packages that contain one or more plugins (e.g. In 4be90d3 I added a new You will need to re-run
Commit 104b1ae adds a new endpoint at
Whenever possible I'd try to follow existing names for internal field names in the DCAT profiles, e.g. I'm not an expert but
I created a stub for where the docs should live here: 675c484. I can do the same for tests, etc. Let me know if all this makes sense, and sorry again for merging the PR.
Moving the conversation from here to #330.
Added namespace, schema and profile for the Croissant metadata specification (https://docs.mlcommons.org/croissant/docs/croissant-spec.html)
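
For orientation, here is a minimal sketch of the kind of JSON-LD document a Croissant serialization produces, showing why an extra namespace is needed alongside schema.org. It is not the output of the profile in this commit: the dataset values are made up, the `build_minimal_croissant` helper is hypothetical, and the context is deliberately reduced to three prefixes rather than the full context used in the specification.

```python
import json

# Namespace prefixes used by Croissant documents (reduced set for illustration).
CONTEXT = {
    "sc": "https://schema.org/",
    "cr": "http://mlcommons.org/croissant/",
    "dct": "http://purl.org/dc/terms/",
}


def build_minimal_croissant(name, description, url):
    """Return a small Croissant-style dataset description (hypothetical helper)."""
    return {
        "@context": CONTEXT,
        "@type": "sc:Dataset",
        "sc:name": name,
        "sc:description": description,
        "sc:url": url,
        # Declares which version of the Croissant spec the document follows.
        "dct:conformsTo": "http://mlcommons.org/croissant/1.0",
        # Resources would be listed here as cr:FileObject / cr:FileSet nodes.
        "sc:distribution": [],
    }


if __name__ == "__main__":
    doc = build_minimal_croissant(
        "example-dataset", "An illustrative dataset", "https://example.org/dataset/example"
    )
    print(json.dumps(doc, indent=2))
```

The `cr` prefix is the piece that schema.org alone does not cover, which is consistent with the commit adding a namespace in addition to the schema and profile.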