The Geological Survey of Queensland (GSQ) publishes vocabularies: a way to describe things and the relationships between them.
A vocabulary is a set of agreed terms:
- In GSQ, a vocabulary defines the terms used to describe and represent things in the domain of science and data management.
- Vocabularies align information within a business area or across systems.
- Vocabularies can be very complex (with thousands of terms) or very simple (describing one or two concepts only).
Read Why Vocabularies? and more subjects in the Vocabularies Wiki.
Fig. 1: Vocabulary context diagram
- We use tools such as Vocbench or Excel to create the vocabulary using SKOS (Simple Knowledge Organization System). See also the SKOS Primer for the basics.
- The native format for a vocabulary is a TTL (Turtle) file. This file contains RDF triples, which are statements of the form subject > predicate > object.
- We use Github (where you are now) to store and manage versions of vocabulary TTL files. Github also provides workflow functionality to approve vocabularies. Read the Github getting started guide.
- We import the TTL files into GraphDB to create a triple store. GraphDB lets us query the triples.
- VocPrez presents our vocabs on the web for people and computers to read. VocPrez pulls the triples from GraphDB to create a cache of the vocabularies.
- CKAN drop-down form fields pull their values from VocPrez. This ensures that the attributes used to describe a dataset come from the controlled vocabulary.
Fig. 2: Vocabulary build and pull process
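To make the triple structure concrete, here is a minimal sketch in Python that writes three SKOS statements as subject, predicate, object lines. It uses N-Triples syntax, which is also valid Turtle. The concept and scheme URIs are hypothetical placeholders, not real GSQ identifiers, and the helper names are made up for this illustration.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical URIs, used for illustration only.
SCHEME = "http://example.org/vocab/rock-type"
CONCEPT = "http://example.org/vocab/rock-type/granite"


def uri(value):
    """Wrap a URI in angle brackets, as N-Triples and Turtle require."""
    return f"<{value}>"


def literal(text, lang="en"):
    """Format a language-tagged string literal, escaping backslashes and quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


def to_ntriples(triples):
    """Serialise (subject, predicate, object) tuples, one statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


triples = [
    (uri(CONCEPT), uri(RDF_TYPE), uri(SKOS + "Concept")),
    (uri(CONCEPT), uri(SKOS + "prefLabel"), literal("granite")),
    (uri(CONCEPT), uri(SKOS + "inScheme"), uri(SCHEME)),
]

print(to_ntriples(triples))
```

Each printed line is one triple. A real vocabulary file would normally use Turtle prefixes such as `skos:` to shorten the URIs, but the statements are the same.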
- Select the vocabulary editor of your choice.
- Create the vocabulary using SKOS (Simple Knowledge Organization System). See also the SKOS Primer for the basics. NOTE: Always check first whether an international or national vocabulary already exists (see below for links).
a. Use Vocbench to create the vocabulary.
b. Use the Excel template to create the vocabulary - download Excel SKOS Vocabulary Builder.
c. Edit the vocab TTL file in Visual Studio Code. Use the extension Language Support for RDF related language syntax for formatting support.
- Export the vocabulary to a TTL file. If using Vocbench, it is easier to export the TTL from the Build repository in GraphDB. Follow the instructions here.
- Validate the TTL file using the online Skosify tool. Tick the checkboxes "Keep skos:related relationships within the same hierarchy" and "Include skos:narrower relations in output".
- Import the TTL file into a development branch in Github. Name your branch dev-yourGithubusername. See how-to instructions here.
- When you're ready to publish your vocabulary into Test, submit a Pull Request to the TEST branch. See how-to instructions here.
- A member of the Data Integrity Team will review your vocabulary and either Approve or Request Changes. See how-to instructions here.
- If the pull request is approved, the vocab will now be in the TEST branch.
- Import the vocabulary TTL file into the Core Repository in the Test GraphDB at https://test.graphdb.gsq.digital using the instructions here.
- Restart the Test VocPrez to refresh the VocPrez cache (we will automate this step).
- The vocabulary is now published in the Test VocPrez at https://test.vocabs.gsq.digital
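The linked instructions remain the authoritative procedure for the import step. As an illustration of what that step amounts to, the sketch below builds an HTTP request that would add a Turtle file to a GraphDB repository through the RDF4J-style REST endpoint (`/repositories/<id>/statements`). The repository ID `core` and the function name are assumptions; check the actual repository ID in GraphDB, and note that authentication is not shown.

```python
import urllib.request


def build_upload_request(base_url, repository_id, ttl_bytes):
    """Build a POST request that adds Turtle statements to a repository."""
    url = f"{base_url.rstrip('/')}/repositories/{repository_id}/statements"
    return urllib.request.Request(
        url,
        data=ttl_bytes,
        method="POST",
        headers={"Content-Type": "text/turtle"},
    )


request = build_upload_request(
    "https://test.graphdb.gsq.digital",
    "core",  # hypothetical repository ID
    b"@prefix skos: <http://www.w3.org/2004/02/skos/core#> .\n",
)
print(request.get_method(), request.full_url)

# Sending the request needs network access and valid credentials:
# urllib.request.urlopen(request)
```

The request is only built here, not sent, so the sketch can be run safely without touching the Test GraphDB.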
See the instructions at Vocabulary Review Process
- Follow the PID URI Allocations process detailed on the Linked Data Working Group webpage.
- Perform a Pull Request from the DEV branch in Github to the MASTER branch.
- A member of the Data Integrity Team will review your vocabulary and either Approve or Request Changes.
- Import the vocabulary TTL file into the Core Repository in the Production GraphDB at https://graphdb.gsq.digital using the instructions here.
- Restart the Production VocPrez to refresh the VocPrez cache (we will automate this step).
- The vocabulary is now published in the Production VocPrez at https://vocabs.gsq.digital. Please note that the vocab will not display in VocPrez until the URI registration at http://linked.data.gov.au is approved.
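One way to confirm that a vocabulary's triples are present after an import is to run a SPARQL query against the GraphDB repository. The following is a minimal sketch that builds such a request using the standard SPARQL protocol (`query` parameter, JSON results). The repository ID `core` and the function name are assumptions, and the query simply lists up to ten SKOS concepts with their preferred labels.

```python
import urllib.parse
import urllib.request

QUERY = """
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?concept ?label WHERE {
  ?concept a skos:Concept ;
           skos:prefLabel ?label .
} LIMIT 10
"""


def build_query_request(base_url, repository_id, query):
    """Build a GET request for a SPARQL SELECT query with JSON results."""
    params = urllib.parse.urlencode({"query": query})
    url = f"{base_url.rstrip('/')}/repositories/{repository_id}?{params}"
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


request = build_query_request("https://graphdb.gsq.digital", "core", QUERY)
print(request.get_method(), request.full_url)

# Sending the request needs network access (and credentials if required):
# import json
# with urllib.request.urlopen(request) as response:
#     for row in json.load(response)["results"]["bindings"]:
#         print(row["concept"]["value"], row["label"]["value"])
```

If the query returns no rows for a vocabulary you have just imported, the import most likely went to a different repository or did not complete.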
- Research Vocabularies Australia https://vocabs.ands.org.au/
- Basel Register of Thesauri, Ontologies & Classifications https://bartoc.org/
- Best practice in formalizing a SKOS vocabulary https://confluence.csiro.au/public/VOCAB/vocabulary-services/publishing-vocabularies/best-practice-in-formalizing-a-skos-vocabulary
- ontologies/*.ttl - background ontologies needed for vocab inferencing
- gsq-*.ttl - GSQ vocab files
- vocabs_load.py - a Python script to load a GraphDB instance with the background ontologies and GSQ vocab files
- scripts/ - Python scripts to dump/load a GraphDB instance with these vocab files
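The layout above suggests a natural load order: background ontologies first, then the GSQ vocab files. The sketch below shows one way to collect the files in that order. It is not the repository's `vocabs_load.py`, whose actual behaviour may differ, and the function name `files_to_load` is hypothetical.

```python
from pathlib import Path


def files_to_load(repo_root):
    """Return ontology files first, then GSQ vocab files, each group sorted by name."""
    root = Path(repo_root)
    ontologies = sorted((root / "ontologies").glob("*.ttl"))
    vocabs = sorted(root.glob("gsq-*.ttl"))
    return ontologies + vocabs


# List the files found in the current directory, in load order.
for path in files_to_load("."):
    print(path)
```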
This code repository's content is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) licence, the deed of which is stored in this repository here: LICENSE.
Vocabularies owner:
Mark Gordon
Geological Survey of Queensland
Department of Natural Resources, Mines and Energy
Brisbane, QLD, Australia
[email protected]
Technical contact:
Vance Kelly
Geological Survey of Queensland
Department of Natural Resources, Mines and Energy
Brisbane, QLD, Australia
[email protected]
Author:
David Crosswell
Enterprise Architect
Cross-Lateral Enterprises
https://crosslateral.com.au