Implement Support for Relation-Based Properties for DCAT profiles #331

Open
jadzlnds opened this issue Jan 24, 2025 · 4 comments

@jadzlnds

In the files dcat_us_full.yaml and dcat_ap_full.yaml there are TODO comments at line 356 and line 299:

# TODO: relation-based properties are not yet included (e.g. is_version_of, source, sample, etc)

These properties are essential for establishing relationships between datasets, such as:

  • Specifying that a dataset is a version of another dataset (dct:isVersionOf).
  • Indicating that a dataset has a newer version (dct:hasVersion).
  • Linking parts of datasets (dct:hasPart, dct:isPartOf).
  • Referencing source datasets (dct:source).
  • And more (e.g., dct:relation, dct:replaces).

Proposed Solution

Add support for the following relation-based properties to the schema, dataset harvesting, and DCAT export functionality:

Relation-Based Properties to Include

  1. dct:hasVersion
  2. dct:isVersionOf
  3. dct:source
  4. dct:relation
  5. dct:hasPart
  6. dct:isPartOf
  7. dct:conformsTo
  8. dct:hasFormat
  9. dct:isFormatOf
  10. dct:isReferencedBy
  11. dct:references
  12. dct:isReplacedBy
  13. dct:replaces
  14. dct:isRequiredBy
  15. dct:requires

Functionality Overview

  • Schema Update: Extend the dcat_us_full.yaml and dcat_ap_full.yaml schemas to support these properties. Each property should allow a list of URIs to accommodate multiple relationships of the same type.
  • RDF Harvesting: Update the RDF parsing logic to extract these properties from incoming RDF metadata.
  • Storage: Save these relationships as CKAN dataset extras (additional attributes) so that they are accessible within CKAN.
  • DCAT Export: Include these relationships in the DCAT RDF export, allowing users to share relationship metadata via RDF.
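
The harvesting and export steps above can be sketched as a round trip between RDF triples and list-valued dataset fields. This is a minimal, library-free sketch: triples are plain `(subject, predicate, object)` tuples, whereas a real implementation would go through rdflib in the ckanext-dcat profiles. Only `is_version_of` and `source` appear in the TODO comment; the other field names, `RELATION_FIELDS`, `parse_relations` and `serialize_relations` are hypothetical names chosen for illustration.

```python
DCT = "http://purl.org/dc/terms/"

# Assumed mapping from dataset-dict field names to Dublin Core Terms URIs.
RELATION_FIELDS = {
    "has_version": DCT + "hasVersion",
    "is_version_of": DCT + "isVersionOf",
    "source": DCT + "source",
    "relation": DCT + "relation",
    "has_part": DCT + "hasPart",
    "is_part_of": DCT + "isPartOf",
    "conforms_to": DCT + "conformsTo",
    "has_format": DCT + "hasFormat",
    "is_format_of": DCT + "isFormatOf",
    "is_referenced_by": DCT + "isReferencedBy",
    "references": DCT + "references",
    "is_replaced_by": DCT + "isReplacedBy",
    "replaces": DCT + "replaces",
    "is_required_by": DCT + "isRequiredBy",
    "requires": DCT + "requires",
}


def parse_relations(triples, dataset_uri):
    """Harvesting direction: collect relation URIs for one dataset."""
    field_by_predicate = {uri: field for field, uri in RELATION_FIELDS.items()}
    dataset_dict = {}
    for subject, predicate, obj in triples:
        field = field_by_predicate.get(predicate)
        if subject == dataset_uri and field is not None:
            # Each field holds a list, so repeated predicates accumulate.
            dataset_dict.setdefault(field, []).append(obj)
    return dataset_dict


def serialize_relations(dataset_dict, dataset_uri):
    """Export direction: emit one triple per stored relation value."""
    triples = []
    for field, predicate in RELATION_FIELDS.items():
        for target in dataset_dict.get(field, []):
            triples.append((dataset_uri, predicate, target))
    return triples


# Example input using placeholder URIs.
dataset_uri = "https://example.org/dataset/a"
example_triples = [
    (dataset_uri, DCT + "isVersionOf", "https://example.org/dataset/b"),
    (dataset_uri, DCT + "source", "https://example.org/dataset/c"),
    (dataset_uri, DCT + "source", "https://example.org/dataset/d"),
    (dataset_uri, DCT + "title", "Not a relation, ignored"),
]
parsed = parse_relations(example_triples, dataset_uri)
exported = serialize_relations(parsed, dataset_uri)
```

Keeping the mapping in one place means the parser and the serializer cannot drift apart when a property is added or renamed.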
@brunopacheco1

Hello @amercader, I hope you still remember this topic.

@amercader
Member

I do remember it, @brunopacheco1. I have had many open fronts, but next week I'll be able to focus on this and we can work out an initial spec. Thanks.

@brunopacheco1

Great, thanks for your prompt reply, @amercader. Sure, no problem; please let us know when the best moment is.

@amercader changed the title from "Implement Support for Relation-Based Properties in dcat_us_full.yaml" to "Implement Support for Relation-Based Properties for DCAT profiles" on Jan 31, 2025
@amercader
Member

Thanks for writing the initial issue details, @jadzlnds. I've been thinking about this but have sadly also been pulled in different directions, so I haven't managed to finish a proper spec. I think most of the functionality and changes are clear: once we have relations encoded in some form in the dataset dict, the DCAT serializer can expose them as the DCAT spec mandates (and conversely, the parser used in the harvesters can create dataset dicts with the necessary fields).

What I'm still not sure about is how to store the relation information internally, namely:

  1. Store it as dataset fields
  2. Store it in a dedicated table and include the information somehow in the dataset dict (essentially resurrecting the old package relationships feature, see "Fix, document and rewrite tests for dataset relationships (or remove)", ckan#4212)

Having a dedicated custom field per relation (e.g. source, has_part, etc.) that stores multiple string values (either a dataset id or an external URI) is tempting because of its simplicity, but I think it might make it difficult to extend the feature in the future, for things like generating the inverse properties, updating all linked datasets when a dataset is deleted, etc.

On the other hand, having a table is obviously more complex, as we need to deal with model setup, some CRUD actions for relations, etc., but we have tons of examples of similar features, so it shouldn't take long to develop. It might be difficult to support the case of linking to an external URI, though.
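
To make the trade-off concrete, here is a rough sketch of the dedicated-table option using an in-memory SQLite database instead of CKAN's model layer. Everything in it is an assumption for illustration: the table name `dataset_relation`, the helper functions, and the choice to keep external URIs in the same text column as dataset ids (one possible answer to the external URI case, not a settled design). The inverse pairs are the ones Dublin Core Terms defines among the properties listed in this issue.

```python
import sqlite3

# Inverse pairs among the listed properties (field names are assumed).
INVERSES = {
    "has_version": "is_version_of",
    "has_part": "is_part_of",
    "has_format": "is_format_of",
    "references": "is_referenced_by",
    "replaces": "is_replaced_by",
    "requires": "is_required_by",
}
# Make the lookup work in both directions.
INVERSES.update({v: k for k, v in list(INVERSES.items())})


def create_store():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE dataset_relation ("
        " subject_id TEXT NOT NULL,"
        " relation TEXT NOT NULL,"
        " object TEXT NOT NULL,"  # a dataset id or an external URI
        " PRIMARY KEY (subject_id, relation, object))"
    )
    return conn


def add_relation(conn, subject_id, relation, obj):
    conn.execute(
        "INSERT OR IGNORE INTO dataset_relation VALUES (?, ?, ?)",
        (subject_id, relation, obj),
    )


def relations_for(conn, dataset_id):
    """Stored relations plus inverses derived from rows pointing here."""
    result = {}
    stored = conn.execute(
        "SELECT relation, object FROM dataset_relation"
        " WHERE subject_id = ? ORDER BY relation, object",
        (dataset_id,),
    )
    for relation, obj in stored:
        result.setdefault(relation, []).append(obj)
    incoming = conn.execute(
        "SELECT subject_id, relation FROM dataset_relation"
        " WHERE object = ? ORDER BY relation, subject_id",
        (dataset_id,),
    )
    for subject_id, relation in incoming:
        inverse = INVERSES.get(relation)
        if inverse and subject_id not in result.get(inverse, []):
            result.setdefault(inverse, []).append(subject_id)
    return result


def delete_dataset(conn, dataset_id):
    """Remove every relation that mentions the dataset, in either role."""
    conn.execute(
        "DELETE FROM dataset_relation WHERE subject_id = ? OR object = ?",
        (dataset_id, dataset_id),
    )


conn = create_store()
add_relation(conn, "dataset-v2", "is_version_of", "dataset-v1")
add_relation(conn, "dataset-v2", "source", "https://example.org/raw-data")
v1_relations = relations_for(conn, "dataset-v1")
```

With per-dataset fields, the same inverse lookup and delete cleanup would need a search across all datasets and a rewrite of each one that mentions the deleted id, which is the extensibility concern raised above.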

@wardi @smotornyuk do you have any thoughts here?
