A package for ontology engineering with deep learning.

News

- Add the deeponto.onto.taxonomy module; add the structural reasoner type. (v0.8.8)
- Add deeponto.align.oaei for scripts at the sub-repository OAEI-Bio-ML as well as bug fixing. (v0.8.4)
- Add deeponto.onto.OntologyNormaliser and deeponto.onto.OntologyProjector (v0.8.0).
- Add deeponto.subs.bertsubs and deeponto.onto.pruning modules (v0.7.0).
- Add deeponto.probe.ontolama and deeponto.onto.verbalisation modules (v0.6.0).

Check the complete changelog and FAQs. The FAQs page does not contain much information now but will be updated according to feedback.
"},{"location":"#about","title":"About","text":"\\(\\textsf{DeepOnto}\\) aims to provide building blocks for implementing deep learning models, constructing resources, and conducting evaluation for various ontology engineering purposes.
\\(\\textsf{DeepOnto}\\) relies on OWLAPI version 4 (written in Java) for ontologies.
We follow what has been implemented in mOWL that uses JPype to bridge Python and Java Virtual Machine (JVM). Please check JPype's installation page for successful JVM initialisation.
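A quick way to verify, from a Python session, that a JVM has been initialised through JPype (a minimal sketch using JPype's public API; it does not start a JVM by itself):
import jpype\n\n# True once DeepOnto (or any other JPype-based code) has started the JVM in this session\nprint(jpype.isJVMStarted())\n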
"},{"location":"#pytorch","title":"Pytorch","text":"\\(\\textsf{DeepOnto}\\) relies on Pytorch for deep learning framework.
We recommend installing Pytorch prior to installing DeepOnto, following the commands listed on the Pytorch webpage. Note that users can choose either the GPU (with CUDA) or the CPU version of Pytorch.
In case the most recent Pytorch version causes any incompatibility issues, use the following command (with CUDA 11.6), which is known to work:
pip3 install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu116\n
Basic usage of DeepOnto does not rely on GPUs, but for efficient deep learning model training, please make sure torch.cuda.is_available()
returns True
.
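A minimal check, assuming Pytorch is already installed:
import torch\n\nprint(torch.cuda.is_available())  # True if a CUDA-enabled GPU is visible to Pytorch\n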
Other dependencies are specified in setup.cfg
and requirements.txt
which are supposed to be installed along with deeponto
.
# requiring Python>=3.8\npip install deeponto\n
"},{"location":"#install-from-git-repository","title":"Install from Git Repository","text":"To install the latest, probably unreleased version of deeponto, you can directly install from the repository.
pip install git+https://github.com/KRR-Oxford/DeepOnto.git\n
"},{"location":"#main-features","title":"Main Features","text":"
Figure: Illustration of DeepOnto's architecture.
"},{"location":"#ontology-processing","title":"Ontology Processing","text":"The base class of \\(\\textsf{DeepOnto}\\) is Ontology
, which serves as the main entry point to the OWLAPI's features, such as accessing ontology entities, querying for ancestor/descendant (and parent/child) concepts, deleting entities, modifying axioms, and retrieving annotations. See quick usage at load an ontology; a minimal usage sketch is also given after the list below. Along with these basic functionalities, several essential sub-modules are built to enhance the core module, including the following:
Ontology Reasoning (OntologyReasoner
): Each Ontology instance has a reasoner as its attribute. It is used for conducting reasoning activities, such as obtaining inferred subsumers and subsumees, as well as checking entailment and consistency.
Ontology Pruning (OntologyPruner
): This sub-module aims to incorporate pruning algorithms for extracting a sub-ontology from an input ontology. We currently implement the one proposed in [2], which introduces subsumption axioms between the asserted (atomic or complex) parents and children of the class targeted for removal.
Ontology Verbalisation (OntologyVerbaliser
): The recursive concept verbaliser proposed in [4] is implemented here, which can automatically transform a complex logical expression into a textual sentence based on entity names or labels available in the ontology. See verbalising ontology concepts.
Ontology Projection (OntologyProjector
): The projection algorithm adopted in the OWL2Vec* ontology embeddings is implemented here; it transforms an ontology's TBox into a set of RDF triples. The relevant code is modified from the mOWL library.
Ontology Normalisation (OntologyNormaliser
): The implemented \\(\\mathcal{EL}\\) normalisation is also modified from the mOWL library, which is used to transform TBox axioms into normalised forms to support, e.g., geometric ontology embeddings.
Ontology Taxonomy (OntologyTaxonomy
): The taxonomy extracted from an ontology is a directed acyclic graph for the subsumption hierarchy, which is often used to support graph-based deep learning applications.
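A minimal usage sketch of the core Ontology class and its reasoner is given below. The ontology path and class IRI are placeholders, the method names follow the current API (see the changelog), and the reasoner is assumed to be exposed via the .reasoner attribute:
from deeponto.onto import Ontology\n\n# load an ontology (a JVM is initialised on first use)\nonto = Ontology(\"path_to_ontology.owl\")\n\n# retrieve a class by its IRI (placeholder IRI) and query its asserted parents/children\nowl_class = onto.get_owl_object(\"http://purl.obolibrary.org/obo/DOID_0060158\")\nparents = onto.get_asserted_parents(owl_class)\nchildren = onto.get_asserted_children(owl_class)\n\n# retrieve annotations (e.g., labels and synonyms) of the class\nannotations = onto.get_annotations(owl_class)\n\n# use the attached reasoner to obtain inferred subsumers and subsumees\nsupers = onto.reasoner.get_inferred_super_entities(owl_class)\nsubs = onto.reasoner.get_inferred_sub_entities(owl_class)\n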
Individual tools and resources are implemented based on the core ontology processing module. Currently, \\(\\textsf{DeepOnto}\\) supports the following:
BERTMap [1] is a BERT-based ontology matching (OM) system originally developed in repo but is now maintained in \\(\\textsf{DeepOnto}\\). See Ontology Matching with BERTMap & BERTMapLt.
Bio-ML [2] is an OM resource that has been used in the Bio-ML track of the OAEI. See Bio-ML: A Comprehensive Documentation.
BERTSubs [3] is a system for ontology subsumption prediction. We have transformed its original experimental code into this project. See Subsumption Inference with BERTSubs.
OntoLAMA [4] is a collection of language model probing datasets for ontology subsumption inference. See OntoLAMA: Dataset Overview & Usage Guide for the use of the datasets and the prompt-based probing approach.
License
Copyright 2021-2023 Yuan He. Copyright 2023 Yuan He, Jiaoyan Chen. All rights reserved.
Licensed under the Apache License, Version 2.0 (the \"License\"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
"},{"location":"#citation","title":"Citation","text":"The preprint of our system paper for \\(\\textsf{DeepOnto}\\) is currently available at arxiv.
Yuan He, Jiaoyan Chen, Hang Dong, Ian Horrocks, Carlo Allocca, Taehun Kim, and Brahmananda Sapkota. DeepOnto: A Python Package for Ontology Engineering with Deep Learning. arXiv preprint arXiv:2307.03067 (2023).
@article{he2023deeponto,\n title={DeepOnto: A Python Package for Ontology Engineering with Deep Learning},\n author={He, Yuan and Chen, Jiaoyan and Dong, Hang and Horrocks, Ian and Allocca, Carlo and Kim, Taehun and Sapkota, Brahmananda},\n journal={arXiv preprint arXiv:2307.03067},\n year={2023}\n}\n
"},{"location":"#relevant-publications","title":"Relevant Publications","text":"Please report any bugs or queries by raising a GitHub issue or sending emails to the maintainers (Yuan He or Jiaoyan Chen) through:
first_name.last_name@cs.ox.ac.uk
"},{"location":"bertmap/","title":"Ontology Matching with BERTMap and BERTMapLt","text":"Paper
Paper for BERTMap: BERTMap: A BERT-based Ontology Alignment System (AAAI-2022).
@inproceedings{he2022bertmap,\n title={BERTMap: a BERT-based ontology alignment system},\n author={He, Yuan and Chen, Jiaoyan and Antonyrajah, Denvar and Horrocks, Ian},\n booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},\n volume={36},\n number={5},\n pages={5684--5691},\n year={2022}\n}\n
This page gives the tutorial for the \\(\\textsf{BERTMap}\\) family, including a summary of the models and how to use them.
Figure 1. Pipeline illustration of BERTMap.
The ontology matching (OM) pipeline of \\(\\textsf{BERTMap}\\) consists of the following steps:
(src_annotation, tgt_annotation, synonym_label)
into training and validation sets.
Predict mappings for each class \\(c\\) of the source ontology \\(\\mathcal{O}\\) by:
Extend the raw predictions using an iterative algorithm based on the locality principle. To be specific, if \\(c\\) and \\(c'\\) are matched with a relatively high mapping score (\\(\\geq \\kappa\\)), then search for plausible mappings between the parents (resp. children) of \\(c\\) and the parents (resp. children) of \\(c'\\). This process is iterative because new highly scored mappings may be found at each round. Mapping extension terminates when no new mapping with a score \\(\\geq \\kappa\\) is found or the maximum number of iterations is exceeded. Note that \\(\\kappa\\) is set to \\(0.9\\) by default, as in the original paper.
Truncate the extended mappings by preserving only those with scores \\(\\geq \\lambda\\). In the original paper, \\(\\lambda\\) is supposed to be tuned on validation mappings, which are often not available. Moreover, \\(\\lambda\\) is not a sensitive hyperparameter in practice. Therefore, we manually set \\(\\lambda\\) to \\(0.9995\\) as a default value, which usually yields a higher F1 score. Note that both \\(\\kappa\\) and \\(\\lambda\\) are made available in the configuration file.
Repair the remaining mappings with the repair module built into LogMap (BERTMap does not focus on mapping repair). In short, a minimal set of inconsistent mappings is removed to further improve precision.
Steps 5-8 are referred to as the global matching process, which computes OM mappings from two input ontologies. \\(\\textsf{BERTMapLt}\\) is the lightweight version without BERT training and mapping refinement. The mapping filtering threshold for \\(\\textsf{BERTMapLt}\\) is \\(1.0\\) (i.e., string-matched).
In addition to the traditional OM procedure, the scoring modules of \\(\\textsf{BERTMap}\\) and \\(\\textsf{BERTMapLt}\\) can be used to evaluate any class pair given their selected annotations. This is useful in ranking-based evaluation.
Warning
The \\(\\textsf{BERTMap}\\) family relies on sufficient class annotations for constructing the training corpora of the BERT synonym classifier, especially under the unsupervised setting where there are no input mappings and/or external resources. It is very important to specify the correct annotation properties in the configuration file.
"},{"location":"bertmap/#usage","title":"Usage","text":"To use \\(\\textsf{BERTMap}\\), a configuration file and two input ontologies to be matched should be imported.
from deeponto.onto import Ontology\nfrom deeponto.align.bertmap import BERTMapPipeline\n\nconfig_file = \"path_to_config.yaml\"\nsrc_onto_file = \"path_to_the_source_ontology.owl\" \ntgt_onto_file = \"path_to_the_target_ontology.owl\" \n\nconfig = BERTMapPipeline.load_bertmap_config(config_file)\nsrc_onto = Ontology(src_onto_file)\ntgt_onto = Ontology(tgt_onto_file)\n\nBERTMapPipeline(src_onto, tgt_onto, config)\n
The default configuration file can be loaded as:
from deeponto.align.bertmap import BERTMapPipeline, DEFAULT_CONFIG_FILE\n\nconfig = BERTMapPipeline.load_bertmap_config(DEFAULT_CONFIG_FILE)\n
The loaded configuration is a CfgNode
object supporting attribute access of dictionary keys. To customise the configuration, users can either copy the DEFAULT_CONFIG_FILE
, save it locally using BERTMapPipeline.save_bertmap_config
method, and modify it accordingly; or change it at runtime.
from deeponto.align.bertmap import BERTMapPipeline, DEFAULT_CONFIG_FILE\n\nconfig = BERTMapPipeline.load_bertmap_config(DEFAULT_CONFIG_FILE)\n\n# save the configuration file\nBERTMapPipeline.save_bertmap_config(config, \"path_to_saved_config.yaml\")\n\n# modify it in the run time\n# for example, add more annotation properties for synonyms\nconfig.annotation_property_iris.append(\"http://...\") \n
If using \\(\\textsf{BERTMap}\\) for scoring class pairs instead of global matching, disable automatic global matching and load class pairs to be scored.
from deeponto.onto import Ontology\nfrom deeponto.align.bertmap import BERTMapPipeline\n\nconfig_file = \"path_to_config.yaml\"\nsrc_onto_file = \"path_to_the_source_ontology.owl\" \ntgt_onto_file = \"path_to_the_target_ontology.owl\" \n\nconfig = BERTMapPipeline.load_bertmap_config(config_file)\nconfig.global_matching.enabled = False\nsrc_onto = Ontology(src_onto_file)\ntgt_onto = Ontology(tgt_onto_file)\n\nbertmap = BERTMapPipeline(src_onto, tgt_onto, config)\n\nclass_pairs_to_be_scored = [...] # (src_class_iri, tgt_class_iri)\nfor src_class_iri, tgt_class_iri in class_pairs_to_be_scored:\n # retrieve class annotations\n src_class_annotations = bertmap.src_annotation_index[src_class_iri]\n tgt_class_annotations = bertmap.tgt_annotation_index[tgt_class_iri]\n # the bertmap score\n bertmap_score = bertmap.mapping_predictor.bert_mapping_score(\n src_class_annotations, tgt_class_annotations\n )\n # the bertmaplt score\n bertmaplt_score = bertmap.mapping_predictor.edit_similarity_mapping_score(\n src_class_annotations, tgt_class_annotations\n )\n ...\n
Tip
By default, the implemented \\(\\textsf{BERTMap}\\) searches, for each source ontology class, a set of possibly matched target ontology classes. Because of this, it is recommended to set the source ontology as the one with the smaller number of classes for efficiency.
Note that in the original paper, the model is expected to match for both directions src2tgt
and tgt2src
, and also consider the combination of both results. However, this does not usually bring better performance and consumes significantly more time. Therefore, this feature is discarded and the users can choose which direction to match.
Warning
Occasionally, the fine-tuning loss may not converge and the validation accuracy may not improve; in that case, setting a different random seed usually fixes the problem.
"},{"location":"bertmap/#configuration","title":"Configuration","text":"The default configuration file looks like:
model: bertmap # bertmap or bertmaplt\n\noutput_path: null # if not provided, the current path \".\" is used\n\nannotation_property_iris:\n- http://www.w3.org/2000/01/rdf-schema#label # rdfs:label\n- http://www.geneontology.org/formats/oboInOwl#hasSynonym\n- http://www.geneontology.org/formats/oboInOwl#hasExactSynonym\n- http://www.w3.org/2004/02/skos/core#exactMatch\n- http://www.ebi.ac.uk/efo/alternative_term\n- http://www.orpha.net/ORDO/Orphanet_#symbol\n- http://purl.org/sig/ont/fma/synonym\n- http://www.w3.org/2004/02/skos/core#prefLabel\n- http://www.w3.org/2004/02/skos/core#altLabel\n- http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#P108\n- http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#P90\n\n# additional corpora \nknown_mappings: null # cross-ontology corpus\nauxiliary_ontos: [] # auxiliary corpus\n\n# bert config\nbert:\n  pretrained_path: emilyalsentzer/Bio_ClinicalBERT\n  max_length_for_input: 128\n  num_epochs_for_training: 3.0\n  batch_size_for_training: 32\n  batch_size_for_prediction: 128\n  resume_training: null\n\n# global matching config\nglobal_matching:\n  enabled: true\n  num_raw_candidates: 200\n  num_best_predictions: 10\n  mapping_extension_threshold: 0.9\n  mapping_filtered_threshold: 0.9995\n  for_oaei: false\n
"},{"location":"bertmap/#bertmap-or-bertmaplt","title":"BERTMap or BERTMapLt","text":"config.model
By changing this parameter to bertmap
or bertmaplt
, users can switch between \\(\\textsf{BERTMap}\\) and \\(\\textsf{BERTMapLt}\\). Note that \\(\\textsf{BERTMapLt}\\) does not use any training and mapping refinement parameters."},{"location":"bertmap/#annotation-properties","title":"Annotation Properties","text":"config.annotation_property_iris
The IRIs stored in this parameter refer to annotation properties with literal values that define the synonyms of an ontology class. Many ontology matching systems rely on synonyms for good performance, including the \\(\\textsf{BERTMap}\\) family. The default config.annotation_property_iris
are in line with the Bio-ML dataset, which will be constantly updated. Users can append or delete IRIs for specific input ontologies. Note that it is safe to specify all possible annotation properties regardless of input ontologies because the ones that are not used will be ignored.
"},{"location":"bertmap/#additional-training-data","title":"Additional Training Data","text":"The text semantics corpora by default (unsupervised setting) will consist of two intra-ontology sub-corpora built from two input ontologies (based on the specified annotation properties). To add more training data, users can opt to feed input mappings (cross-ontology sub-corpus) and/or a list of auxiliary ontologies (auxiliary sub-corpora).
config.known_mappings
Specify the path to input mapping file here; the input mapping file should be a .tsv
or .csv
file with three columns with headings: [\"SrcEntity\", \"TgtEntity\", \"Score\"]
. Each row corresponds to a triple \\((c, c', s(c, c'))\\) where \\(c\\) is a source ontology class, \\(c'\\) is a target ontology class, and \\(s(c, c')\\) is the matching score. Note that in the BERTMap context, input mappings are assumed to be gold standard (reference) mappings with scores equal to \\(1.0\\). Regardless of scores specified in the mapping file, the scores of the input mappings will be adjusted to \\(1.0\\) automatically. config.auxiliary_ontos
Specify a list of paths to auxiliary ontology files here. For each auxiliary ontology, a corresponding intra-ontology corpus will be created and thus produce more synonym and non-synonym samples."},{"location":"bertmap/#bert-settings","title":"BERT Settings","text":"config.bert.pretrained_path
\\(\\textsf{BERTMap}\\) uses the pre-trained Bio-Clinical BERT as specified in this parameter because it was originally applied to biomedical ontologies. For general-purpose ontology matching, users can use pre-trained variants such as bert-base-uncased
. config.bert.batch_size_for_training
Batch size for BERT fine-tuning. config.bert.batch_size_for_prediction
Batch size for BERT validation and mapping prediction. Adjust these two parameters if the model does not fit into GPU memory.
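For example, these batch sizes can be lowered at runtime if GPU memory is limited (the values below are only illustrative):
from deeponto.align.bertmap import BERTMapPipeline, DEFAULT_CONFIG_FILE\n\nconfig = BERTMapPipeline.load_bertmap_config(DEFAULT_CONFIG_FILE)\nconfig.bert.batch_size_for_training = 16   # halve again if CUDA still runs out of memory\nconfig.bert.batch_size_for_prediction = 64\n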
config.bert.resume_training
Set to true
if the BERT training process is somehow interrupted and users wish to continue training."},{"location":"bertmap/#global-matching-settings","title":"Global Matching Settings","text":"config.global_matching.enabled
As mentioned in usage, users can disable automatic global matching by setting this parameter to false
if they wish to use the mapping scoring module only. config.global_matching.num_raw_candidates
Set the number of raw candidates selected in the mapping prediction phase. config.global_matching.num_best_predictions
Set the number of best scored mappings preserved in the mapping prediction phase. The default value 10
is often more than enough. config.global_matching.mapping_extension_threshold
Set the score threshold of mappings used in the iterative mapping extension process. A higher value shortens the runtime but reduces recall. config.global_matching.mapping_filtered_threshold
The score threshold of mappings preserved for final mapping refinement. config.global_matching.for_oaei
Set to false
for normal use and set to true
for the OAEI 2023 Bio-ML Track such that entities that are annotated as not used in alignment will be ignored during global matching."},{"location":"bertmap/#output-format","title":"Output Format","text":"Running \\(\\textsf{BERTMap}\\) will create a directory named bertmap
or bertmaplt
in the specified output path. The file structure of this directory is as follows:
bertmap\n├── data\n│   ├── fine-tune.data.json\n│   └── text-semantics.corpora.json\n├── bert\n│   ├── tensorboard\n│   ├── checkpoint-{some_number}\n│   └── checkpoint-{some_number}\n├── match\n│   ├── logmap-repair\n│   ├── raw_mappings.json\n│   ├── repaired_mappings.tsv\n│   ├── raw_mappings.tsv\n│   ├── extended_mappings.tsv\n│   └── filtered_mappings.tsv\n├── bertmap.log\n└── config.yaml\n
It is worth mentioning that the match
sub-directory contains all the global matching files:
raw_mappings.tsv
The raw mapping predictions before mapping refinement. The .json
one is used internally to prevent accidental interruption. Note that bertmaplt
only produces raw mapping predictions (no mapping refinement). extended_mappings.tsv
The output mappings after applying mapping extension. filtered_mappings.tsv
The output mappings after mapping extension and threshold filtering. logmap-repair
A folder containing intermediate files needed for applying LogMap's debugger. repaired_mappings.tsv
The final output mappings after mapping repair."},{"location":"bertsubs/","title":"Subsumption Prediction with BERTSubs","text":"Paper
Paper for BERTSubs: Contextual Semantic Embeddings for Ontology Subsumption Prediction (accepted by WWW Journal in 2023).
@article{chen2023contextual,\n title={Contextual semantic embeddings for ontology subsumption prediction},\n author={Chen, Jiaoyan and He, Yuan and Geng, Yuxia and Jim{\\'e}nez-Ruiz, Ernesto and Dong, Hang and Horrocks, Ian},\n journal={World Wide Web},\n pages={1--23},\n year={2023},\n publisher={Springer}\n}\n
This page gives the tutorial for \\(\\textsf{BERTSubs}\\), including its functions, a summary of the models, and usage instructions.
The current version of \\(\\textsf{BERTSubs}\\) is able to predict:
Figure 1. Pipeline illustration of BERTSubs.
The pipeline of \\(\\textsf{BERTSubs}\\) consists of the following steps.
Corpus Construction: extracting a set of sentence pairs from positive and negative subsumptions from the target ontology (or ontologies), with one of three templates (isolated, traversal, or path) used for transforming each class into a sentence.
Model Fine-tuning: fine-tuning a language model such as BERT with the above sentence pairs.
Note that the optionally given subsumptions via a train subsumption file can also be used for fine-tuning. Please see more technical details in the paper.
"},{"location":"bertsubs/#evaluation-case-and-dataset-ontology-completion","title":"Evaluation Case and Dataset (Ontology Completion)","text":"The evaluation is implemented scripts/bertsubs_intra_evaluate.py. Download an ontology (e.g., FoodOn) and run:
python bertsubs_intra_evaluate.py --onto_file ./foodon-merged.0.4.8.owl\n
The parameter --subsumption_type can be set to 'restriction' for complex class subsumptions, and 'named_class' for named class subsumptions. Please see the script for more parameters and their meanings.
It executes the following procedure:
The named class or complex class subsumption axioms of an ontology are partitioned into a training set, a validation set, and a test set. They are saved as train, valid, and test files, respectively.
The test and the valid subsumption axioms are removed from the original ontology, and a new ontology is saved.
Notice: for a named class test/valid subsumption, a set of negative candidate super classes are extracted from the ground truth super class's neighbourhood. For a complex class test/valid subsumption, a set of negative candidate super classes are randomly extracted from all the complex classes in the ontology.
"},{"location":"bertsubs/#usage","title":"Usage","text":"To run \\(\\textsf{BERTSubs}\\), a configuration file and one input ontology (or two ontologies) are mandatory. If candidate class pairs are given, a fine-tuned language model and a file with predicted scores of the candidate class pairs in the test file are output; otherwise, only the fine-grained language model is output. The test metrics (MRR and Hits@K) can also be output if the ground truth and a set of negative candidate super classes are given for the subclass of each valid/test subsumption.
The following code is for intra-ontology subsumption.
from yacs.config import CfgNode\nfrom deeponto.complete.bertsubs import BERTSubsIntraPipeline, DEFAULT_CONFIG_FILE_INTRA\nfrom deeponto.utils import load_file\nfrom deeponto.onto import Ontology\n\nconfig = CfgNode(load_file(DEFAULT_CONFIG_FILE_INTRA)) # Load default configuration file\nconfig.onto_file = './foodon.owl'\nconfig.train_subsumption_file = './train_subsumptions.csv' # optional\nconfig.valid_subsumption_file = './valid_subsumptions.csv' # optional\nconfig.test_subsumption_file = './test_subsumptions.csv' #optional\nconfig.test_type = 'evaluation' #'evaluation': calculate metrics with ground truths given in the test_subsumption_file; 'prediction': predict scores for candidate subsumptions given in test_submission_file\nconfig.subsumption_type = 'named_class' # 'named_class' or 'restriction' \nconfig.prompt.prompt_type = 'isolated' # 'isolated', 'traversal', 'path' (three templates)\n\nonto = Ontology(owl_path=config.onto_file)\nintra_pipeline = BERTSubsIntraPipeline(onto=onto, config=config)\n
The following code is for inter-ontology subsumption.
from yacs.config import CfgNode\nfrom deeponto.complete.bertsubs import BERTSubsInterPipeline, DEFAULT_CONFIG_FILE_INTER\nfrom deeponto.utils import load_file\nfrom deeponto.onto import Ontology\n\nconfig = CfgNode(load_file(DEFAULT_CONFIG_FILE_INTER)) # Load default configuration file\nconfig.src_onto_file = './helis2foodon/helis_v1.00.owl'\nconfig.tgt_onto_file = './helis2foodon/foodon-merged.0.4.8.subs.owl'\nconfig.train_subsumption_file = './helis2foodon/train_subsumptions.csv' # optional\nconfig.valid_subsumption_file = './helis2foodon/valid_subsumptions.csv' # optional\nconfig.test_subsumption_file = './helis2foodon/test_subsumptions.csv' # optional\nconfig.test_type = 'evaluation' # 'evaluation', or 'prediction'\nconfig.subsumption_type = 'named_class' # 'named_class', or 'restriction'\nconfig.prompt.prompt_type = 'path' # 'isolated', 'traversal', 'path' (three templates)\n\nsrc_onto = Ontology(owl_path=config.src_onto_file)\ntgt_onto = Ontology(owl_path=config.tgt_onto_file)\ninter_pipeline = BERTSubsInterPipeline(src_onto=src_onto, tgt_onto=tgt_onto, config=config)\n
For more details on the configuration, please see the comment in the default configuration files default_config_intra.yaml and default_config_inter.yaml.
"},{"location":"bio-ml/","title":"Bio-ML: A Comprehensive Documentation","text":"paper
Paper for Bio-ML: Machine Learning-Friendly Biomedical Datasets for Equivalence and Subsumption Ontology Matching (ISWC 2022). It was nominated as the best resource paper candidate at ISWC 2022.
@inproceedings{he2022machine,\n title={Machine Learning-Friendly Biomedical Datasets for Equivalence and Subsumption Ontology Matching},\n author={He, Yuan and Chen, Jiaoyan and Dong, Hang and Jim{\\'e}nez-Ruiz, Ernesto and Hadian, Ali and Horrocks, Ian},\n booktitle={The Semantic Web--ISWC 2022: 21st International Semantic Web Conference, Virtual Event, October 23--27, 2022, Proceedings},\n pages={575--591},\n year={2022},\n organization={Springer}\n}\n
"},{"location":"bio-ml/#overview","title":"Overview","text":"\\(\\textsf{Bio-ML}\\) is a comprehensive ontology matching (OM) dataset that includes five ontology pairs for both equivalence and subsumption ontology matching. Two of these pairs are based on the Mondo ontology, and the remaining three are based on the UMLS ontology. The construction of these datasets encompasses several steps:
Dataset Download (License: CC BY 4.0 International):
Complete Documentation: https://krr-oxford.github.io/DeepOnto/bio-ml/ (this page).
In order to derive scalable Ontology Matching (OM) pairs, the ontology pruning algorithm proposed in the \\(\\textsf{Bio-ML}\\) paper can be utilised. This algorithm is designed to trim a large-scale ontology based on certain criteria, such as involvement in a reference mapping or association with a particular semantic type (see UMLS data scripts). The primary goal of the pruning function is to discard irrelevant ontology classes whilst preserving the relevant hierarchical structure.
More specifically, for each class, denoted as \\(c\\), that needs to be removed, subsumption axioms are created between the parent and child elements of \\(c\\). This step is followed by the removal of all axioms related to the unwanted classes.
Once a list of class IRIs to be removed has been compiled, the ontology pruning can be executed using the following code:
from deeponto.onto import Ontology, OntologyPruner\n\n# Load the DOID ontology\ndoid = Ontology(\"doid.owl\")\n\n# Initialise the ontology pruner\npruner = OntologyPruner(doid)\n\n# Specify the classes to be removed\nto_be_removed_class_iris = [\n \"http://purl.obolibrary.org/obo/DOID_0060158\",\n \"http://purl.obolibrary.org/obo/DOID_9969\"\n]\n\n# Perform the pruning operation\npruner.prune(to_be_removed_class_iris)\n\n# Save the pruned ontology locally\npruner.save_onto(\"doid.pruned.owl\") \n
"},{"location":"bio-ml/#subsumption-mapping-construction","title":"Subsumption Mapping Construction","text":"Ontology Matching (OM) datasets often include equivalence matching, but not subsumption matching. However, it is feasible to create a subsumption matching task from an equivalence matching task. Given a list of reference equivalence mappings, which take the form of \\({(c, c') | c \\equiv c' }\\), one can construct reference subsumption mappings by identifying the subsumers of \\(c'\\) and producing \\({(c, c'') | c \\equiv c', c' \\sqsubseteq c'' }\\). We have developed a subsumption mapping generator for this purpose.
from deeponto.onto import Ontology\nfrom deeponto.align.mapping import SubsFromEquivMappingGenerator, ReferenceMapping\n\n# Load the NCIT and DOID ontologies\nncit = Ontology(\"ncit.owl\")\ndoid = Ontology(\"doid.owl\")\n\n# Load the equivalence mappings\nncit2doid_equiv_mappings = ReferenceMapping.read_table_mappings(\"ncit2doid_equiv_mappings.tsv\") # The headings are [\"SrcEntity\", \"TgtEntity\", \"Score\"]\n\n# Initialise the subsumption mapping generator \n# and the mapping construction is automatically done\nsubs_generator = SubsFromEquivMappingGenerator(\n ncit, doid, ncit2doid_equiv_mappings, \n subs_generation_ratio=1, delete_used_equiv_tgt_class=True\n)\n
Output:
3299/4686 are used for creating at least one subsumption mapping.\n3305 subsumption mappings are created in the end.\n
Retrieve the generated subsumption mappings with:
subs_generator.subs_from_equivs\n
Output:
[('http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#C9311',\n 'http://purl.obolibrary.org/obo/DOID_120',\n 1.0),\n ('http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#C8410',\n 'http://purl.obolibrary.org/obo/DOID_1612',\n 1.0), ...]\n
See a concrete data script for this process at OAEI-Bio-ML/data_scripts/generate_subs_maps.py
.
The subs_generation_ratio
parameter determines at most how many subsumption mappings can be generated from an equivalence mapping. The delete_used_equiv_tgt_class
parameter determines whether or not to invalidate equivalence mappings that have been used for creating at least one subsumption mapping. If it is set to True
, then the target side of a used equivalence mapping will be marked as deleted from the target ontology. Ontology pruning can then be applied to the list of to-be-deleted target ontology classes:
from deeponto.onto import OntologyPruner\n\npruner = OntologyPruner(doid)\npruner.prune(subs_generator.used_equiv_tgt_class_iris)\npruner.save_onto(\"doid.subs.owl\")\n
See a concrete data script for this process at OAEI-Bio-ML/data_scripts/generate_cand_maps.py
.
Note
In the OAEI 2023 version, the target class deletion is disabled as modularisation counteracts the effects of such deletion. For more details, refer to OAEI Bio-ML 2023.
"},{"location":"bio-ml/#candidate-mapping-generation","title":"Candidate Mapping Generation","text":"To evaluate an Ontology Matching (OM) model's capacity to identify correct mappings amid a pool of challenging negative candidates, we utilise the negative candidate mapping generation algorithm as proposed in the Bio-ML paper. This algorithm uses idf_sample
to generate candidates that are textually ambiguous (i.e., with similar naming), and neighbour_sample
to generate candidates that are structurally ambiguous (e.g., siblings). The algorithm ensures that none of the reference mappings are added as negative candidates. Additionally, for subsumption cases, the algorithm carefully excludes ancestors as they are technically correct subsumptions.
Use the following Python code to perform this operation:
import pandas as pd\n\nfrom deeponto.onto import Ontology\nfrom deeponto.align.mapping import NegativeCandidateMappingGenerator, ReferenceMapping\nfrom deeponto.align.bertmap import BERTMapPipeline\nfrom deeponto.utils import Tokenizer  # sub-word tokenizer wrapper (import path assumed)\n\n# Load the NCIT and DOID ontologies\nncit = Ontology(\"ncit.owl\")\ndoid = Ontology(\"doid.owl\")\n\n# Load the equivalence mappings\nncit2doid_equiv_mappings = ReferenceMapping.read_table_mappings(\"ncit2doid_equiv_mappings.tsv\") # The headings are [\"SrcEntity\", \"TgtEntity\", \"Score\"]\n\n# Load default config in BERTMap\nconfig = BERTMapPipeline.load_bertmap_config()\n\n# Initialise the negative candidate mapping generator\ncand_generator = NegativeCandidateMappingGenerator(\n    ncit, doid, ncit2doid_equiv_mappings, \n    annotation_property_iris = config.annotation_property_iris, # Used for idf sample\n    tokenizer=Tokenizer.from_pretrained(config.bert.pretrained_path), # Used for idf sample\n    max_hops=5, # Used for neighbour sample\n    for_subsumptions=False, # Set to False because the input mappings in this example are equivalence mappings\n)\n\n# Sample candidate mappings for each reference equivalence mapping\nresults = []\nfor test_map in ncit2doid_equiv_mappings:\n    valid_tgts, stats = cand_generator.mixed_sample(test_map, idf=50, neighbour=50)\n    print(f\"STATS for {test_map}:\\n{stats}\")\n    results.append((test_map.head, test_map.tail, valid_tgts))\nresults = pd.DataFrame(results, columns=[\"SrcEntity\", \"TgtEntity\", \"TgtCandidates\"])\nresults.to_csv(\"test.cands.tsv\", sep=\"\\t\", index=False)  # a concrete file name replaces the undefined result_path placeholder\n
See a concrete data script for this process at OAEI-Bio-ML/data_scripts/generate_cand_maps.py
.
The process of sampling using idf scores was originally proposed in the BERTMap paper. The annotation_property_iris
parameter specifies the list of annotation properties used to extract the names or aliases of an ontology class. The tokenizer
parameter refers to a pre-trained sub-word level tokenizer used to build the inverted annotation index. These aspects are thoroughly explained in the BERTMap tutorial.
Our evaluation protocol concerns two scenarios for OM: global matching for overall assessment and local ranking for partial assessment.
"},{"location":"bio-ml/#global-matching","title":"Global Matching","text":"As an overall assessment, given a complete set of reference mappings, an OM system is expected to compute a set of true mappings and compare against the reference mappings using Precision, Recall, and F-score metrics. With \\(\\textsf{DeepOnto}\\), the evaluation can be performed using the following code.
Matching Result
Download an example of matching result file. The three columns, \"SrcEntity\"
, \"TgtEntity\"
, and \"Score\"
refer to the source class IRI, the target class IRI, and the matching score.
from deeponto.align.evaluation import AlignmentEvaluator\nfrom deeponto.align.mapping import ReferenceMapping, EntityMapping\n\n# load prediction mappings and reference mappings\npreds = EntityMapping.read_table_mappings(f\"{experiment_dir}/bertmap/match/repaired_mappings.tsv\")\nrefs = ReferenceMapping.read_table_mappings(f\"{data_dir}/refs_equiv/full.tsv\")\n\n# compute the precision, recall and F-score metrics\nresults = AlignmentEvaluator.f1(preds, refs)\nprint(results)\n
The associated formulas for Precision, Recall and F-score are:
\\[P = \\frac{|\\mathcal{M}_{pred} \\cap \\mathcal{M}_{ref}|}{|\\mathcal{M}_{pred}|}, \\ \\ R = \\frac{|\\mathcal{M}_{pred} \\cap \\mathcal{M}_{ref}|}{|\\mathcal{M}_{ref}|}, \\ \\ F_1 = \\frac{2 P R}{P + R}\\]where \\(\\mathcal{M}_{pred}\\) and \\(\\mathcal{M}_{ref}\\) denote the prediction mappings and reference mappings, respectively.
Output:
{'P': 0.887, 'R': 0.879, 'F1': 0.883}\n
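As a plain illustration of the formulas above (independent of DeepOnto), treating mappings as (source, target) pairs:
preds = {(\"a\", \"x\"), (\"b\", \"y\"), (\"c\", \"z\")}   # prediction mappings (illustrative)\nrefs = {(\"a\", \"x\"), (\"b\", \"y\"), (\"d\", \"w\")}    # reference mappings\n\nprecision = len(preds & refs) / len(preds)      # 2/3\nrecall = len(preds & refs) / len(refs)          # 2/3\nf1 = 2 * precision * recall / (precision + recall)\nprint(round(precision, 3), round(recall, 3), round(f1, 3))  # 0.667 0.667 0.667\n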
For the semi-supervised setting where a small set of training mappings is provided, the training set should also be loaded and set as null (neither positive nor negative) with null_reference_mappings
during evaluation:
train_refs = ReferenceMapping.read_table_mappings(f\"{data_dir}/refs_equiv/train.tsv\")\nresults = AlignmentEvaluator.f1(preds, refs, null_reference_mappings=train_refs)\n
When null reference mappings are involved, the formulas of Precision and Recall become:
\\[P = \\frac{|(\\mathcal{M}_{pred} \\cap \\mathcal{M}_{ref}) - \\mathcal{M}_{null}|}{|\\mathcal{M}_{pred} - \\mathcal{M}_{null} |}, \\ \\ R = \\frac{|(\\mathcal{M}_{pred} \\cap \\mathcal{M}_{ref}) - \\mathcal{M}_{null}|}{|\\mathcal{M}_{ref} - \\mathcal{M}_{null}|}\\]As for the OAEI 2023 version, some prediction mappings could involve classes that are marked as not used in alignment. Therefore, we need to filter out those mappings before evaluation.
from deeponto.onto import Ontology\nfrom deeponto.align.oaei import *\n\n# load the source and target ontologies and \n# extract classes that are marked as not used in alignment\nsrc_onto = Ontology(\"src_onto_file\")\ntgt_onto = Ontology(\"tgt_onto_file\")\nignored_class_index = get_ignored_class_index(src_onto)\nignored_class_index.update(get_ignored_class_index(tgt_onto))\n\n# filter the prediction mappings\npreds = remove_ignored_mappings(preds, ignored_class_index)\n\n# then compute the results\nresults = AlignmentEvaluator.f1(preds, refs, ...)\n
Tip
We have encapsulated above features in the matching_eval
function in the OAEI utilities.
However,
Therefore, the ranking-based evaluation protocol is presented as follows.
"},{"location":"bio-ml/#local-ranking","title":"Local Ranking","text":"An OM system is also expected to distinguish the reference mapping among a set of candidate mappings and the performance can be reflected in Hits@K and MRR metrics.
Warning
The reference subsumption mappings are inherently incomplete, so only the ranking metrics are adopted in evaluating system performance in subsumption matching.
Ranking Result
Download an example of raw (unscored) candidate mapping file and an example of scored candidate mapping file. The \"SrcEntity\"
and \"TgtEntity\"
columns refer to the source class IRI and the target class IRI involved in a reference mapping. The \"TgtCandidates\"
column stores a sequence of tgt_cand_iri
in the unscored file and a list of tuples (tgt_cand_iri, score)
in the scored file, which can be accessed by the built-in Python function eval
.
With \\(\\textsf{DeepOnto}\\), the evaluation can be performed as follows. First, an OM system needs to assign a score to each target candidate class and save the results as a list of tuples (tgt_cand_class_iri, matching_score)
.
from deeponto.utils import read_table\nimport pandas as pd\n\ntest_candidate_mappings = read_table(\"test.cands.tsv\").values.tolist()\nranking_results = []\nfor src_ref_class, tgt_ref_class, tgt_cands in test_candidate_mappings:\n    tgt_cands = eval(tgt_cands) # transform string into list or sequence\n    scored_cands = []\n    for tgt_cand in tgt_cands:\n        # assign a score to each candidate with an OM system\n        ...\n        scored_cands.append((tgt_cand, matching_score))\n    ranking_results.append((src_ref_class, tgt_ref_class, scored_cands))\n# save the scored candidate mappings in the same format as the original `test.cands.tsv`\npd.DataFrame(ranking_results, columns=[\"SrcEntity\", \"TgtEntity\", \"TgtCandidates\"]).to_csv(\"scored.test.cands.tsv\", sep=\"\\t\", index=False)\n
Then, the ranking evaluation results can be obtained by:
from deeponto.align.oaei import *\n\n# If `has_score` is False, assume default ranking (see tips below)\nranking_eval(\"scored.test.cands.tsv\", has_score=True, Ks=[1, 5, 10])\n
Output:
{'MRR': 0.9586373098280843,\n 'Hits@1': 0.9371951219512196,\n 'Hits@5': 0.9820121951219513,\n 'Hits@10': 0.9878048780487805}\n
The associated formulas for MRR and Hits@K are:
\\[MRR = \\sum_i^N rank_i^{-1} / N, \\ \\ Hits@K = \\sum_i^N \\mathbb{I}_{rank_i \\leq K} / N\\]where \\(N\\) is the number of reference mappings used for testing, and \\(rank_i\\) is the relative rank of the \\(i\\)-th reference mapping among its candidate mappings.
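As a plain illustration of these formulas (independent of DeepOnto), given the 1-based ranks of the reference mappings among their candidates:
def mrr_and_hits(ranks, ks=(1, 5, 10)):\n    # compute MRR and Hits@K from the 1-based ranks of reference mappings\n    n = len(ranks)\n    metrics = {\"MRR\": sum(1.0 / r for r in ranks) / n}\n    for k in ks:\n        metrics[f\"Hits@{k}\"] = sum(r <= k for r in ranks) / n\n    return metrics\n\nprint(mrr_and_hits([1, 2, 1, 4]))  # {'MRR': 0.6875, 'Hits@1': 0.5, 'Hits@5': 1.0, 'Hits@10': 1.0}\n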
Tip
If matching scores are not available, the target candidate classes should be sorted in descending order and saved in a list, the ranking_eval
function will compute scores according to the sorted list.
Below is a table showing the data statistics for the original Bio-ML used in OAEI 2022. In the Category column, \"Disease\" indicates that the data from Mondo mainly covers disease concepts, while \"Body\", \"Pharm\", and \"Neoplas\" denote semantic types of \"Body Part, Organ, or Organ Components\", \"Pharmacologic Substance\", and \"Neoplastic Process\" in UMLS, respectively.
Note that each subsumption matching task is constructed from an equivalence matching task subject to target ontology class deletion, therefore #TgtCls (subs)
differs from #TgtCls
.
| Source | Task | Category | #SrcCls | #TgtCls | #TgtCls(\\(\\sqsubseteq\\)) | #Ref(\\(\\equiv\\)) | #Ref(\\(\\sqsubseteq\\)) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Mondo | OMIM-ORDO | Disease | 9,642 | 8,838 | 8,735 | 3,721 | 103 |
| Mondo | NCIT-DOID | Disease | 6,835 | 8,448 | 5,113 | 4,686 | 3,339 |
| UMLS | SNOMED-FMA | Body | 24,182 | 64,726 | 59,567 | 7,256 | 5,506 |
| UMLS | SNOMED-NCIT | Pharm | 16,045 | 15,250 | 12,462 | 5,803 | 4,225 |
| UMLS | SNOMED-NCIT | Neoplas | 11,271 | 13,956 | 13,790 | 3,804 | 213 |
The datasets, which can be downloaded from Zenodo, include Mondo.zip
and UMLS.zip
for resources constructed from Mondo and UMLS, respectively. Each .zip
file contains three folders: raw_data
, equiv_match
, and subs_match
, corresponding to the raw source ontologies, data for equivalence matching, and data for subsumption matching, respectively. The detailed file structure is illustrated in the figure below.
"},{"location":"bio-ml/#oaei-bio-ml-2023","title":"OAEI Bio-ML 2023","text":"
For the OAEI 2023 version, we implemented several updates, including:
Locality Module Enrichment: In response to the loss of ontology context due to pruning, we used the locality module technique (access the code) to enrich pruned ontologies with logical modules that provide context for existing classes. To ensure the completeness of reference mappings, the new classes added are annotated as not used in alignment with the annotation property use_in_alignment
set to false
. While these supplemental classes can be used by OM systems as auxiliary information, they can be excluded from the alignment process. Even if they are considered in the final output mappings, our evaluation will ensure that they are excluded from the metric computation (see Evaluation Framework).
Simplified Task Settings: For each of the five OM pairs, we simplified the task settings to the following:
{task_name}/refs_equiv/full.tsv
is used for global matching evaluation.{task_name}/refs_equiv/test.tsv
is used for global matching evaluation.{task_name}/refs_equiv/test.cands.tsv
for local ranking evaluation.Subsumption Matching:
{task_name}/refs_subs/test.cands.tsv
. Bio-LLM: A Special Sub-Track for Large Language Models: We introduced a unique sub-track for Large Language Model (LLM)-based OM systems. We extracted small but challenging subsets from the NCIT-DOID and SNOMED-FMA (Body) datasets for this purpose (refer to OAEI Bio-LLM 2023).
The table below shows the data statistics for the OAEI 2023 version of Bio-ML, where the input ontologies are enriched with locality modules compared to the pruned versions used in OAEI 2022. The augmented structural and logical contexts make these ontologies more similar to their original, unprocessed versions (available at raw_data
). The changes compared to the previous version (see Bio-ML OAEI 2022) are reflected in the +
numbers of ontology classes.
In the Category column, \"Disease\" indicates that the Mondo data are mainly about disease concepts, while \"Body\", \"Pharm\", and \"Neoplas\" denote semantic types of \"Body Part, Organ, or Organ Components\", \"Pharmacologic Substance\", and \"Neoplastic Process\" in UMLS, respectively.
| Source | Task | Category | #SrcCls | #TgtCls | #Ref(\\(\\equiv\\)) | #Ref(\\(\\sqsubseteq\\)) |
| --- | --- | --- | --- | --- | --- | --- |
| Mondo | OMIM-ORDO | Disease | 9,648 (+6) | 9,275 (+437) | 3,721 | 103 |
| Mondo | NCIT-DOID | Disease | 15,762 (+8,927) | 8,465 (+17) | 4,686 | 3,339 |
| UMLS | SNOMED-FMA | Body | 34,418 (+10,236) | 88,955 (+24,229) | 7,256 | 5,506 |
| UMLS | SNOMED-NCIT | Pharm | 29,500 (+13,455) | 22,136 (+6,886) | 5,803 | 4,225 |
| UMLS | SNOMED-NCIT | Neoplas | 22,971 (+11,700) | 20,247 (+6,291) | 3,804 | 213 |
The file structure for the download datasets (from Zenodo) is also simplified this year to accommodate the changes. Detailed structure is presented in the following figure.
Remarks on this figure:
refs_equiv/full.tsv
in the unsupervised setting, and on refs_equiv/test.tsv
(with refs_equiv/train.tsv
set to null reference mappings) in the semi-supervised setting. Testing of the local ranking evaluation should be performed on refs_equiv/test.cands.tsv
for both settings.refs_equiv/test.cands.tsv
and the training mapping set refs_subs/train.tsv
is optional.test.cands.tsv
file in the Bio-LLM sub-track is different from the main Bio-LM track ones. See OAEI Bio-LLM 2023 for more information and how to evaluate on it.As Large Language Models (LLMs) are trending in the AI community, we formulate a special sub-track for evaluating LLM-based OM systems. However, evaluating LLMs with the current OM datasets can be time and resource intensive. To yield insightful results prior to full implementation, we leverage two challenging subsets extracted from the NCIT-DOID and the SNOMED-FMA (Body) equivalence matching datasets.
For each original dataset, we first randomly select 50 matched class pairs from ground truth mappings, but excluding pairs that can be aligned with direct string matching (i.e., having at least one shared label) to restrict the efficacy of conventional lexical matching. Next, with a fixed source ontology class, we further select 99 negative target ontology classes, thus forming a total of 100 candidate mappings (inclusive of the ground truth mapping). This selection is guided by the sub-word inverted index-based idf scores as in the BERTMap paper (see BERTMap tutorial for more details), which are capable of producing target ontology classes lexically akin to the fixed source class. We finally randomly choose 50 source classes that do not have a matched target class according to the ground truth mappings, and create 100 candidate mappings using the inverted index for each. Therefore, each subset comprises 50 source ontology classes with a match and 50 without. Each class is associated with 100 candidate mappings, culminating in a total extraction of 10,000, i.e., (50+50)*100, class pairs.
"},{"location":"bio-ml/#evaluation","title":"Evaluation","text":""},{"location":"bio-ml/#matching","title":"Matching","text":"From all the 10,000 class pairs in a given subset, the OM system is expected to predict the true mappings among them, which can be compared against the 50 available ground truth mappings using Precision, Recall, and F-score.
We use the same formulas in the main track evaluation framework to calculate Precision, Recall, and F-score. The prediction mappings \\(\\mathcal{M}_{pred}\\) are the class pairs an OM system predicts as true mappings, and the reference mappings \\(\\mathcal{M}_{ref}\\) refers to the 50 matched pairs.
\\[P = \\frac{|\\mathcal{M}_{pred} \\cap \\mathcal{M}_{ref}|}{|\\mathcal{M}_{pred}|}, \\ \\ R = \\frac{|\\mathcal{M}_{pred} \\cap \\mathcal{M}_{ref}|}{|\\mathcal{M}_{ref}|}, \\ \\ F_1 = \\frac{2 P R}{P + R}\\]"},{"location":"bio-ml/#ranking","title":"Ranking","text":"Given that each source class is associated with 100 candidate mappings, we can compute ranking-based metrics based on their scores. Specifically, we calculate:
\\(Hits@1\\) for the 50 matched source classes, counting a hit when the top-ranked candidate mapping is a ground truth mapping. The corresponding formula is:
\\[ Hits@K = \\sum_{(c, c') \\in \\mathcal{M}_{ref}} \\mathbb{I}_{rank_{c'} \\leq K} / |\\mathcal{M}_{ref}| \\]where \\(rank_{c'}\\) is the predicted relative rank of \\(c'\\) among its candidates, \\(\\mathbb{I}_{rank_{c'} \\leq K}\\) is a binary indicator function that outputs 1 if the rank is less than or equal to \\(K\\) and outputs 0 otherwise.
The \\(MRR\\) score is also computed for these matched source classes, summing the inverses of the ground truth mappings' relative ranks among candidate mappings. The corresponding formula is:
\\[ MRR = \\sum_{(c, c') \\in \\mathcal{M}_{ref}} rank_{c'}^{-1} / |\\mathcal{M}_{ref}| \\]For the 50 unmatched source classes, we compute the rejection rate (denoted as \\(RR\\)), counting a successful rejection when all the candidate mappings are predicted as false mappings. We assign each unmatched source class with a null class \\(c_{null}\\), which refers to any target class that does not have a match with the source class, and denote this set of unreferenced mappings as \\(\\mathcal{M}_{unref}\\).
\\[ RR = \\sum_{(c, c_{null}) \\in \\mathcal{M}_{unref}} \\prod_{d \\in \\mathcal{T}_c} (1 - \\mathbb{I}_{c \\equiv d}) / |\\mathcal{M}_{unref}| \\]where \\(\\mathcal{T}_c\\) is the set of target candidate classes for \\(c\\), and \\(\\mathbb{I}_{c \\equiv d}\\) is a binary indicator that outputs 0 if the OM system predicts a false mapping between \\(c\\) and \\(d\\), and outputs 1 otherwise. The product term in this equation returns 1 if all target candidate classes are predicted as unmatched, i.e., \\(\\forall d \\in \\mathcal{T}_c.\\mathbb{I}_{c \\equiv d}=0\\).
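As a plain illustration of the rejection rate (independent of DeepOnto): for each unmatched source class, record one boolean per target candidate indicating whether the system predicts it as a match; a rejection only succeeds when every candidate is predicted as false.
def rejection_rate(predictions):\n    # predictions: one list of booleans per unmatched source class,\n    # where True means the system predicts the candidate as a match\n    rejected = sum(1 for cand_answers in predictions if not any(cand_answers))\n    return rejected / len(predictions)\n\n# the first source class is fully rejected, the second is not\nprint(rejection_rate([[False, False, False], [False, True, False]]))  # 0.5\n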
To summarise, the Bio-LLM sub-track provides two representative OM subsets and adopts a range of evaluation metrics to gain meaningful insights from this partial assessment, thus promoting robust and efficient development of LLM-based OM systems.
"},{"location":"bio-ml/#oaei-participation","title":"OAEI Participation","text":"To participate in the OAEI track, please visit the OAEI Bio-ML website for more information, especially on the instructions of system submission or direct result submission. In the following, we present the formats of result files we expect participants to submit.
"},{"location":"bio-ml/#result-submission-format","title":"Result Submission Format","text":"For the main Bio-ML track, we expect two result files for each setting:
(1) A prediction mapping file named match.result.tsv
in the same format as the reference mapping file (e.g., task_name/refs_equiv/full.tsv
).
Matching Result
Download an example of mapping file. The three columns, \"SrcEntity\"
, \"TgtEntity\"
, and \"Score\"
refer to the source class IRI, the target class IRI, and the matching score.
(2) A scored or ranked candidate mapping file named rank.result.tsv
in the same format as the test candidate mapping file (e.g., task_name/refs_equiv/test.cands.tsv
).
Ranking Result
Download an example of raw (unscored) candidate mapping file and an example of scored candidate mapping file. The \"SrcEntity\"
and \"TgtEntity\"
columns refer to the source class IRI and the target class IRI involved in a reference mapping. The \"TgtCandidates\"
column stores a sequence of tgt_cand_iri
in the unscored file and a list of tuples (tgt_cand_iri, score)
in the scored file, which can be accessed by the built-in Python function eval
.
We also accept a result file without scores and in that case we assume the list of tgt_cand_iri
has been sorted in descending order.
Note that each OM pair is accompanied with an unsupervised and a semi-supervised setting and thus separate sets of result files should be submitted. Moreover, for subsumption matching, only the ranking result file in (2) is required.
For the Bio-LLM sub-track, we expect one result file (similar to (2) but requiring a list of triples) for the task:
(3) A scored or ranked (with answers) candidate mapping file named biollm.result.tsv
in the same format as the test candidate mapping file (i.e., task_name/test.cands.tsv
).
Bio-LLM Result
Download an example of bio-llm mapping file. The \"SrcEntity\"
and \"TgtEntity\"
columns refer to the source class IRI and the target class IRI involved in a reference mapping. The \"TgtCandidates\"
column stores a sequence of a list of triples (tgt_cand_iri, score, answer)
in the scored file, which can be accessed by the built-in Python function eval
. The additional answer
values are True
or False
indicating whether the OM system predicts (src_class_iri, tgt_cand_iri)
as a true mapping.
It is important to notice that the answer
values are necessary for the matching evaluation of P, R, F-score, and the computation of the rejection rate, while the score
values are used for ranking evaluation of MRR and Hits@1.
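For instance, a single \"TgtCandidates\" cell could contain a string like the one below (the IRIs and values are purely illustrative), which eval parses back into a list of (tgt_cand_iri, score, answer) triples:
tgt_cands = '[(\"http://purl.obolibrary.org/obo/DOID_120\", 0.97, True), (\"http://purl.obolibrary.org/obo/DOID_1612\", 0.31, False)]'\nparsed = eval(tgt_cands)  # a list of (tgt_cand_iri, score, answer) tuples\n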
deeponto.complete
.check_consistency()
at deeponto.onto.Ontology
.deeponto.onto.OntologyVerbaliser
.deeponto.subs
to deeponto.complete
.deeponto.probe.ontolama
into deeponto.complete
....
"},{"location":"changelog/#v088-2023-october","title":"v0.8.8 (2023 October)","text":""},{"location":"changelog/#added_1","title":"Added","text":"deeponto.onto.OntologyVerbaliser
.\"struct\"
(Structural Reasoner) at deeponto.onto.OntologyReasoner
.load_reasoner()
method at deeponto.onto.OntologyReasoner
for convenience of changing the reasoner type and remove reload_reasoner()
method as it is a special case of load_reasoner()
.rdflib
into the dependencies for building graph-related features.deeponto.onto.taxonomy
for building the taxonomy over ontologies and potentially other structured data.read_table_mappings()
method at deeponto.align.mapping
from using dataframe.iterrows()
to dataframe.itertuples()
which is much more efficient.deeponto.utils.process_annotation_literal()
to False
.slf4j
to warn
to prevent tons of printing at ELK (issue (#13)[https://github.com/KRR-Oxford/DeepOnto/issues/13]).deeponto.align.oaei
.reasoner_type
argument at deeponto.onto.OntologyReasoner
, now supporting hermit
(default) and elk
.get_all_axioms()
method at deeponto.onto.Ontology
. Add get_iri()
method at deeponto.onto.Ontology
.
Add new features into deeponto.onto.OntologyVerbaliser
including:
verbalise_object_property_subsumption()
for object property subsumption axioms.
verbalise_class_expression()
.verbalise_class_subsumption()
for class subsumption axioms;verbalise_class_equivalence()
for class equivalence axioms;verbalise_class_assertion()
for class assertion axioms;verbalise_relation_assertion()
for relation assertion axioms;auto-correction
option for fixing entity names.keep_iri
option for keeping entity IRIs.add_quantifier_word
option for adding quantifier words as in the Manchester syntax.
Add get_assertion_axioms()
method at deeponto.onto.Ontology
.
get_axiom_type()
method at deeponto.onto.Ontology
.owl_individuals
attribute at deeponto.onto.Ontology
.get_owl_objects()
method to be anonymous as it is only used for creating pre-processed entity index at deeponto.onto.Ontology
.get_owl_object_from_iri()
method to get_owl_object()
at deeponto.onto.Ontology
.ERROR
.set_seed()
method at deeponto.utils
..verbalise_class_expression()
method by adding an option to keep entity IRIs without verbalising them using .vocabs
at deeponto.onto.OntologyVerbaliser
.apply_lowercasing
value to False
for both .get_annotations()
and .build_annotation_index()
methods at deeponto.onto.Ontology
..get_owl_object_annotations()
to .get_annotations()
at deeponto.onto.Ontology
.use_in_alignment
annotation in BERTMap for the OAEI.deeponto.align.oaei
.read_table_mappings
method to allow None
for threshold.deeponto.onto.OntologyPruner
.f1
and MRR
method in deeponto.align.evaluation.AlignmentEvaluator
.deeponto.onto.OntologyNormaliser
.deeponto.onto.OntologyProjector
.transformers
to transformers[torch]
.lib
from mowl to direct import.get_owl_object_annotations
by adding uniqify
at the end to preserve the order.deeponto.subs.bertsubs
; its inter-ontology setting is also imported at deeponto.align.bertsubs
.deeponto.onto.OntologyPruner
as a separate module.deeponto.onto.Ontology
; if started already, skip this step.get_owl_object_annotations
at deeponto.onto.Ontology
by preserving the relative order of annotation retrieval, i.e., create set
first and use the .add()
function instead of casting the list
into set
in the end.check_deprecated
at deeponto.onto.Ontology
by adding a check for the \\(\\texttt{owl:deprecated}\\) annotation property -- if this property does not exist in the current ontology, return False
(not deprecated).remove_axiom
for removing an axiom from the ontology at deeponto.onto.Ontology
(note that the counterpart add_axiom
has already been available).check_named_entity
for checking if an entity is named at deeponto.onto.Ontology
.get_subsumption_axioms
for getting subsumption axioms subject to different entity types at deeponto.onto.Ontology
.get_asserted_complex_classes
for getting all complex classes that occur in ontology (subsumption and/or equivalence) axioms at deeponto.onto.Ontology
.get_asserted_parents
and get_asserted_children
for getting the asserted parents and children of a given entity at deeponto.onto.Ontology
.check_deprecation
for checking an owl object's deprecation (annotated) at deeponto.onto.Ontology
.en_core_web_sm
download into the initialisation of OntologyVerbaliser
.deeponto.onto.Ontology
.deeponto.onto.OntologyReasoner
:super_entities_of
\\(\\rightarrow\\) get_inferred_super_entities
sub_entities_of
\\(\\rightarrow\\) get_inferred_sub_entities
deeponto.onto.Ontology
.deeponto.lama
.deeponto.onto.verbalisation
.deeponto.onto.verbalisation
.src/
layout. The code before v0.5.0 is no longer available.
"},{"location":"faqs/","title":"FAQs","text":"Q1: System compatibility?
Q2: Encountering issues with the JPype installation?
Q3: Missing system-level dependencies on Linux?
g++
and python-dev
need to be installed.paper
Paper for OntoLAMA: Language Model Analysis for Ontology Subsumption Inference (Findings of ACL 2023).
@inproceedings{he-etal-2023-language,\n title = \"Language Model Analysis for Ontology Subsumption Inference\",\n author = \"He, Yuan and\n Chen, Jiaoyan and\n Jimenez-Ruiz, Ernesto and\n Dong, Hang and\n Horrocks, Ian\",\n booktitle = \"Findings of the Association for Computational Linguistics: ACL 2023\",\n month = jul,\n year = \"2023\",\n address = \"Toronto, Canada\",\n publisher = \"Association for Computational Linguistics\",\n url = \"https://aclanthology.org/2023.findings-acl.213\",\n doi = \"10.18653/v1/2023.findings-acl.213\",\n pages = \"3439--3453\"\n}\n
This page provides an overview of the \\(\\textsf{OntoLAMA}\\) datasets, how to use them, and the related probing approach introduced in the research paper.
"},{"location":"ontolama/#overview","title":"Overview","text":"\\(\\textsf{OntoLAMA}\\) is a set of language model (LM) probing datasets and a prompt-based probing method for ontology subsumption inference or ontology completion. The work follows the \"LMs-as-KBs\" literature but focuses on conceptualised knowledge extracted from formalised KBs such as the OWL ontologies. Specifically, the subsumption inference (SI) task is introduced and formulated in the Natural Language Inference (NLI) style, where the sub-concept and the super-concept involved in a subsumption axiom are verbalised and fitted into a template to form the premise and hypothesis, respectively. The sampled axioms are verified through ontology reasoning. The SI task is further divided into Atomic SI and Complex SI where the former involves only atomic named concepts and the latter involves both atomic and complex concepts. Real-world ontologies of different scales and domains are used for constructing OntoLAMA and in total there are four Atomic SI datasets and two Complex SI datasets.
"},{"location":"ontolama/#useful-links","title":"Useful Links","text":"Source #NamedConcepts #EquivAxioms #Dataset (Train/Dev/Test) Schema.org 894 - Atomic SI: 808/404/2,830 DOID 11,157 - Atomic SI: 90,500/11,312/11,314 FoodOn 30,995 2,383 Atomic SI: 768,486/96,060/96,062 Complex SI: 3,754/1,850/13,080 GO 43,303 11,456 Atomic SI: 772,870/96,608/96,610 Complex SI: 72,318/9,040/9,040 MNLI - - biMNLI: 235,622/26,180/12,906
"},{"location":"ontolama/#usage","title":"Usage","text":"Users have two options for accessing the OntoLAMA datasets. They can either download the datasets directly from Zenodo or use the Huggingface Datasets platform.
If using Huggingface, users should first install the datasets
package:
pip install datasets\n
Then, a dataset can be accessed by:
from datasets import load_dataset\n# dataset = load_dataset(\"krr-oxford/OntoLAMA\", dataset_name)\n# for example, loading the Complex SI dataset of Go\ndataset = load_dataset(\"krr-oxford/OntoLAMA\", \"go-complex-SI\") \n
Options of dataset_name
include:
\"bimnli\"
(from MNLI)\"schemaorg-atomic-SI\"
(from Schema.org)\"doid-atomic-SI\"
(from DOID)\"foodon-atomic-SI\"
, \"foodon-complex-SI\"
(from FoodOn)\"go-atomic-SI\"
, \"go-complex-SI\"
(from GO)After loading the dataset, a particular data split can be accessed by:
dataset[split_name] # split_name = \"train\", \"validation\", or \"test\"\n
Please refer to the Huggingface page for examples of data points and explanations of data fields.
If downloading from Zenodo, users can directly work with the specific .jsonl files they need.
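For example, a downloaded split can be read with Python's standard library (the file name below is a placeholder for whichever split was downloaded):

```python
import json

# read one of the downloaded OntoLAMA .jsonl files line by line;
# "train.jsonl" is a placeholder file name
examples = []
with open("train.jsonl") as f:
    for line in f:
        examples.append(json.loads(line))
```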
\\(\\textsf{OntoLAMA}\\) adopts the prompt-based probing approach to examine an LM's knowledge. Specifically, it wraps the verbalised sub-concept and super-concept into a template with a masked position; the LM is expected to predict the masked token and determine whether there exists a subsumption relationship between the two concepts.
The verbalisation algorithm has been implemented as a separate ontology processing module, see verbalise ontology concepts.
To conduct probing, users can write the following code into a script, e.g., probing.py
:
from openprompt.config import get_config\nfrom deeponto.complete.ontolama import run_inference\n\nconfig, args = get_config()\n# you can then manipulate the configuration before running the inference\nconfig.learning_setting = \"few_shot\" # zero_shot, full\nconfig.manual_template.choice = 0 # using the first template in the template file\n...\n\n# run the subsumption inference\nrun_inference(config, args)\n
Then, run the script with the following command:
python probing.py --config_yaml config.yaml\n
See an example of config.yaml
at DeepOnto/scripts/ontolama/config.yaml
The template file for the SI task (two templates) is located in DeepOnto/scripts/ontolama/si_templates.txt
.
The template file for the biMNLI task (two templates) is located in DeepOnto/scripts/ontolama/nli_templates.txt
.
The label word file for both SI and biMNLI tasks is located in DeepOnto/scripts/ontolama/label_words.jsonl
.
\\(\\textsf{DeepOnto}\\) extends from the OWLAPI and implements many useful methods for ontology processing and reasoning, integrated in the base class Ontology
.
This page gives typical examples of how to use Ontology
. For other, more specific usages, please refer to the documentation by clicking Ontology
.
Ontology
can be easily loaded from a local ontology file by its path:
from deeponto.onto import Ontology\n
Importing Ontology
will require JVM memory allocation (defaults to 8g
; if nohup
is used to run the program in the background, use nohup echo \"8g\" | python command
):
Please enter the maximum memory located to JVM: [8g]: 16g\n\n16g maximum memory allocated to JVM.\nJVM started successfully.\n
Loading an ontology from a local file:
onto = Ontology(\"path_to_ontology.owl\")\n
It is also possible to choose which reasoner to use:
onto = Ontology(\"path_to_ontology.owl\", \"hermit\")\n
Tip
For faster (but incomplete) reasoning over larger ontologies, choose a reasoner like \"elk\"
.
The most fundamental feature of Ontology
is to access entities in the ontology such as classes (or concepts) and properties (object, data, and annotation properties). To get an entity by its IRI, do the following:
from deeponto.onto import Ontology\n# e.g., load the disease ontology\ndoid = Ontology(\"doid.owl\")\n# class or property IRI as input\ndoid.get_owl_object(\"http://purl.obolibrary.org/obo/DOID_9969\")\n
To get the asserted parents or children of a given class or property, do the following:
doid.get_asserted_parents(doid.get_owl_object(\"http://purl.obolibrary.org/obo/DOID_9969\"))\ndoid.get_asserted_children(doid.get_owl_object(\"http://purl.obolibrary.org/obo/DOID_9969\"))\n
To obtain the literal values (as Set[str]
) of an annotation property (such as \\(\\texttt{rdfs:label}\\)) for an entity:
# note that annotations with no language tags are deemed as in English (\"en\")\ndoid.get_annotations(\n doid.get_owl_object(\"http://purl.obolibrary.org/obo/DOID_9969\"),\n annotation_property_iri='http://www.w3.org/2000/01/rdf-schema#label',\n annotation_language_tag=None,\n apply_lowercasing=False,\n normalise_identifiers=False\n)\n
Output:
{'carotenemia'}\n
To get the special entities related to top (\\(\\top\\)) and bottom (\\(\\bot\\)), for example, to get \\(\\texttt{owl:Thing}\\):
doid.OWLThing\n
"},{"location":"ontology/#ontology-reasoning","title":"Ontology Reasoning","text":"Ontology
has an important attribute .reasoner
for conducting reasoning activities. Currently, two types of reasoners are supported, i.e., HermiT and ELK.
To get the super-entities (a super-class, or a super-property) of an entity, do the following:
doid_class = doid.get_owl_object(\"http://purl.obolibrary.org/obo/DOID_9969\")\ndoid.reasoner.get_inferred_super_entities(doid_class, direct=False) \n
Output:
['http://purl.obolibrary.org/obo/DOID_0014667',\n'http://purl.obolibrary.org/obo/DOID_0060158',\n'http://purl.obolibrary.org/obo/DOID_4']\n
The outputs are IRIs of the corresponding super-entities. direct
is a boolean value indicating whether the returned entities are parents (direct=True
) or ancestors (direct=False
).
To get the sub-entities, simply replace the method name with get_inferred_sub_entities
.
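For example, reusing the doid_class obtained above, the direct children can be retrieved with:

```python
# direct=True returns only the direct sub-entities (children);
# direct=False returns all descendants
doid.reasoner.get_inferred_sub_entities(doid_class, direct=True)
```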
To retrieve the entailed instances of a class:
doid.reasoner.instances_of(doid_class)\n
"},{"location":"ontology/#checking-entailment","title":"Checking Entailment","text":"The implemented reasoner also supports several entailment checks for subsumption, disjointness, and so on. For example:
doid.reasoner.check_subsumption(doid_potential_sub_entity, doid_potential_super_entity)\n
"},{"location":"ontology/#feature-requests","title":"Feature Requests","text":"Should you have any feature requests (such as those commonly used in the OWLAPI), please raise a ticket in the \\(\\textsf{DeepOnto}\\) GitHub repository.
"},{"location":"verbaliser/","title":"Verbalise Ontology Concepts","text":"Verbalising concept expressions is very useful for models that take textual inputs. While the named concepts can be verbalised simply using their names (or labels), complex concepts that involve logical operators require a more sophisticated algorithm. In \\(\\textsf{DeepOnto}\\), we have implemented the recursive concept verbaliser originally proposed in the OntoLAMA paper to address the need.
Paper
The recursive concept verbaliser is proposed in the paper: Language Model Analysis for Ontology Subsumption Inference (Findings of ACL 2023).
@inproceedings{he-etal-2023-language,\n title = \"Language Model Analysis for Ontology Subsumption Inference\",\n author = \"He, Yuan and\n Chen, Jiaoyan and\n Jimenez-Ruiz, Ernesto and\n Dong, Hang and\n Horrocks, Ian\",\n booktitle = \"Findings of the Association for Computational Linguistics: ACL 2023\",\n month = jul,\n year = \"2023\",\n address = \"Toronto, Canada\",\n publisher = \"Association for Computational Linguistics\",\n url = \"https://aclanthology.org/2023.findings-acl.213\",\n doi = \"10.18653/v1/2023.findings-acl.213\",\n pages = \"3439--3453\"\n}\n
This rule-based verbaliser (found in OntologyVerbaliser
) first parses a complex concept expression into a sub-formula tree (with OntologySyntaxParser
). Each intermediate node within the tree represents the decomposition of a specific logical operator, while the leaf nodes are named concepts or properties. The verbaliser then recursively merges the verbalisations in a bottom-to-top manner, creating the overall textual representation of the complex concept. An example of this process is shown in the following figure:
Figure 1. Verbalising a complex concept recursively.
To use the verbaliser, do the following:
from deeponto.onto import Ontology, OntologyVerbaliser\n\n# load an ontology and init the verbaliser\nonto = Ontology(\"some_ontology_file.owl\")\nverbaliser = OntologyVerbaliser(onto)\n
To verbalise a complex concept expression:
# get complex concepts asserted in the ontology\ncomplex_concepts = list(onto.get_asserted_complex_classes())\n\n# verbalise the first complex concept\nv_concept = verbaliser.verbalise_class_expression(complex_concepts[0])\n
To verbalise a class subsumption axiom:
# get subsumption axioms from the ontology\nsubsumption_axioms = onto.get_subsumption_axioms(entity_type=\"Classes\")\n\n# verbalise the first subsumption axiom\nv_sub, v_super = verbaliser.verbalise_class_subsumption_axiom(subsumption_axioms[0])\n
Tip
The concept verbaliser is under development to incorporate the parsing of various axiom types. Please check the existing functions of OntologyVerbaliser
for specific usage.
Notice that the verbalised result is a CfgNode
object which keeps track of the recursive process. Users can access the final verbalisation by:
result.verbal\n
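For instance, with the results obtained above:

```python
# v_concept and (v_sub, v_super) are the CfgNode results returned earlier
print(v_concept.verbal)              # verbalised complex concept
print(v_sub.verbal, v_super.verbal)  # verbalised sub- and super-class of the axiom
```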
Users can also manually update the vocabulary for named entities by:
verbaliser.update_entity_name(entity_iri, entity_name)\n
This is useful when the entity labels are not naturally fitted into the verbalised sentence.
Moreover, users can see the parsed sub-formula tree using:
tree = verbaliser.parser.parse(str(subsumption_axioms[0]))\ntree.render_image()\n
Note that rendering the image requires graphviz to be installed. Check this link for installing graphviz
.
See an example with image at OntologySyntaxParser
.
AlignmentEvaluator()
","text":"Class that provides evaluation metrics for alignment.
Source code insrc/deeponto/align/evaluation.py
def __init__(self):\n pass\n
"},{"location":"deeponto/align/evaluation/#deeponto.align.evaluation.AlignmentEvaluator.precision","title":"precision(prediction_mappings, reference_mappings)
staticmethod
","text":"The percentage of correct predictions.
\\[P = \\frac{|\\mathcal{M}_{pred} \\cap \\mathcal{M}_{ref}|}{|\\mathcal{M}_{pred}|}\\] Source code insrc/deeponto/align/evaluation.py
@staticmethod\ndef precision(prediction_mappings: List[EntityMapping], reference_mappings: List[ReferenceMapping]) -> float:\nr\"\"\"The percentage of correct predictions.\n\n $$P = \\frac{|\\mathcal{M}_{pred} \\cap \\mathcal{M}_{ref}|}{|\\mathcal{M}_{pred}|}$$\n \"\"\"\n preds = [p.to_tuple() for p in prediction_mappings]\n refs = [r.to_tuple() for r in reference_mappings]\n return len(set(preds).intersection(set(refs))) / len(set(preds))\n
"},{"location":"deeponto/align/evaluation/#deeponto.align.evaluation.AlignmentEvaluator.recall","title":"recall(prediction_mappings, reference_mappings)
staticmethod
","text":"The percentage of correct retrievals.
\\[R = \\frac{|\\mathcal{M}_{pred} \\cap \\mathcal{M}_{ref}|}{|\\mathcal{M}_{ref}|}\\] Source code insrc/deeponto/align/evaluation.py
@staticmethod\ndef recall(prediction_mappings: List[EntityMapping], reference_mappings: List[ReferenceMapping]) -> float:\nr\"\"\"The percentage of correct retrievals.\n\n $$R = \\frac{|\\mathcal{M}_{pred} \\cap \\mathcal{M}_{ref}|}{|\\mathcal{M}_{ref}|}$$\n \"\"\"\n preds = [p.to_tuple() for p in prediction_mappings]\n refs = [r.to_tuple() for r in reference_mappings]\n return len(set(preds).intersection(set(refs))) / len(set(refs))\n
"},{"location":"deeponto/align/evaluation/#deeponto.align.evaluation.AlignmentEvaluator.f1","title":"f1(prediction_mappings, reference_mappings, null_reference_mappings=[])
staticmethod
","text":"Compute the F1 score given the prediction and reference mappings.
\\[F_1 = \\frac{2 P R}{P + R}\\]null_reference_mappings
is an additional set whose elements should be ignored in the calculation, i.e., neither positive nor negative. Specifically, both \\(\\mathcal{M}_{pred}\\) and \\(\\mathcal{M}_{ref}\\) will substract \\(\\mathcal{M}_{null}\\) from them.
src/deeponto/align/evaluation.py
@staticmethod\ndef f1(\n prediction_mappings: List[EntityMapping],\n reference_mappings: List[ReferenceMapping],\n null_reference_mappings: List[ReferenceMapping] = [],\n):\nr\"\"\"Compute the F1 score given the prediction and reference mappings.\n\n $$F_1 = \\frac{2 P R}{P + R}$$\n\n `null_reference_mappings` is an additional set whose elements\n should be **ignored** in the calculation, i.e., **neither positive nor negative**.\n Specifically, both $\\mathcal{M}_{pred}$ and $\\mathcal{M}_{ref}$ will **substract**\n $\\mathcal{M}_{null}$ from them.\n \"\"\"\n preds = [p.to_tuple() for p in prediction_mappings]\n refs = [r.to_tuple() for r in reference_mappings]\n null_refs = [n.to_tuple() for n in null_reference_mappings]\n # elements in the {null_set} are removed from both {pred} and {ref} (ignored)\n if null_refs:\n preds = set(preds) - set(null_refs)\n refs = set(refs) - set(null_refs)\n P = len(set(preds).intersection(set(refs))) / len(set(preds))\n R = len(set(preds).intersection(set(refs))) / len(set(refs))\n F1 = 2 * P * R / (P + R)\n\n return {\"P\": round(P, 3), \"R\": round(R, 3), \"F1\": round(F1, 3)}\n
"},{"location":"deeponto/align/evaluation/#deeponto.align.evaluation.AlignmentEvaluator.hits_at_K","title":"hits_at_K(reference_and_candidates, K)
staticmethod
","text":"Compute \\(Hits@K\\) for a list of (reference_mapping, candidate_mappings)
pair.
It is computed as the number of a reference_mapping
existed in the first \\(K\\) ranked candidate_mappings
, divided by the total number of input pairs.
src/deeponto/align/evaluation.py
@staticmethod\ndef hits_at_K(reference_and_candidates: List[Tuple[ReferenceMapping, List[EntityMapping]]], K: int):\nr\"\"\"Compute $Hits@K$ for a list of `(reference_mapping, candidate_mappings)` pair.\n\n It is computed as the number of a `reference_mapping` existed in the first $K$ ranked `candidate_mappings`,\n divided by the total number of input pairs.\n\n $$Hits@K = \\sum_i^N \\mathbb{I}_{rank_i \\leq k} / N$$\n \"\"\"\n n_hits = 0\n for pred, cands in reference_and_candidates:\n ordered_candidates = [c.to_tuple() for c in EntityMapping.sort_entity_mappings_by_score(cands, k=K)]\n if pred.to_tuple() in ordered_candidates:\n n_hits += 1\n return n_hits / len(reference_and_candidates)\n
"},{"location":"deeponto/align/evaluation/#deeponto.align.evaluation.AlignmentEvaluator.mean_reciprocal_rank","title":"mean_reciprocal_rank(reference_and_candidates)
staticmethod
","text":"Compute \\(MRR\\) for a list of (reference_mapping, candidate_mappings)
pair.
src/deeponto/align/evaluation.py
@staticmethod\ndef mean_reciprocal_rank(reference_and_candidates: List[Tuple[ReferenceMapping, List[EntityMapping]]]):\nr\"\"\"Compute $MRR$ for a list of `(reference_mapping, candidate_mappings)` pair.\n\n $$MRR = \\sum_i^N rank_i^{-1} / N$$\n \"\"\"\n sum_inverted_ranks = 0\n for pred, cands in reference_and_candidates:\n ordered_candidates = [c.to_tuple() for c in EntityMapping.sort_entity_mappings_by_score(cands)]\n if pred.to_tuple() in ordered_candidates:\n rank = ordered_candidates.index(pred.to_tuple()) + 1\n else:\n rank = math.inf\n sum_inverted_ranks += 1 / rank\n return sum_inverted_ranks / len(reference_and_candidates)\n
"},{"location":"deeponto/align/mapping/","title":"Mapping","text":""},{"location":"deeponto/align/mapping/#deeponto.align.mapping.EntityMapping","title":"EntityMapping(src_entity_iri, tgt_entity_iri, relation=DEFAULT_REL, score=0.0)
","text":"A datastructure for entity mapping.
Such entities should be named and have an IRI.
Attributes:
Name Type Descriptionsrc_entity_iri
str
The IRI of the source entity, usually its IRI if available.
tgt_entity_iri
str
The IRI of the target entity, usually its IRI if available.
relation
str
A symbol that represents what semantic relation this mapping stands for. Defaults to <?rel>
which means unspecified. Suggested inputs are \"<EquivalentTo>\"
and \"<SubsumedBy>\"
.
score
float
The score that indicates the confidence of this mapping. Defaults to 0.0
.
Parameters:
Name Type Description Defaultsrc_entity_iri
str
The IRI of the source entity, usually its IRI if available.
requiredtgt_entity_iri
str
The IRI of the target entity, usually its IRI if available.
requiredrelation
str
A symbol that represents what semantic relation this mapping stands for. Defaults to <?rel>
which means unspecified. Suggested inputs are \"<EquivalentTo>\"
and \"<SubsumedBy>\"
.
DEFAULT_REL
score
float
The score that indicates the confidence of this mapping. Defaults to 0.0
.
0.0
Source code in src/deeponto/align/mapping.py
def __init__(self, src_entity_iri: str, tgt_entity_iri: str, relation: str = DEFAULT_REL, score: float = 0.0):\n\"\"\"Intialise an entity mapping.\n\n Args:\n src_entity_iri (str): The IRI of the source entity, usually its IRI if available.\n tgt_entity_iri (str): The IRI of the target entity, usually its IRI if available.\n relation (str, optional): A symbol that represents what semantic relation this mapping stands for. Defaults to `<?rel>` which means unspecified.\n Suggested inputs are `\"<EquivalentTo>\"` and `\"<SubsumedBy>\"`.\n score (float, optional): The score that indicates the confidence of this mapping. Defaults to `0.0`.\n \"\"\"\n self.head = src_entity_iri\n self.tail = tgt_entity_iri\n self.relation = relation\n self.score = score\n
"},{"location":"deeponto/align/mapping/#deeponto.align.mapping.EntityMapping.from_owl_objects","title":"from_owl_objects(src_entity, tgt_entity, relation=DEFAULT_REL, score=0.0)
classmethod
","text":"Create an entity mapping from two OWLObject
entities which have an IRI.
Parameters:
Name Type Description Defaultsrc_entity
OWLObject
The source entity in OWLObject
.
tgt_entity
OWLObject
The target entity in OWLObject
.
relation
str
A symbol that represents what semantic relation this mapping stands for. Defaults to <?rel>
which means unspecified. Suggested inputs are \"<EquivalentTo>\"
and \"<SubsumedBy>\"
.
DEFAULT_REL
score
float
The score that indicates the confidence of this mapping. Defaults to 0.0
.
0.0
Returns:
Type DescriptionEntityMapping
The entity mapping created from the source and target entities.
Source code insrc/deeponto/align/mapping.py
@classmethod\ndef from_owl_objects(\n cls, src_entity: OWLObject, tgt_entity: OWLObject, relation: str = DEFAULT_REL, score: float = 0.0\n):\n\"\"\"Create an entity mapping from two `OWLObject` entities which have an IRI.\n\n Args:\n src_entity (OWLObject): The source entity in `OWLObject`.\n tgt_entity (OWLObject): The target entity in `OWLObject`.\n relation (str, optional): A symbol that represents what semantic relation this mapping stands for. Defaults to `<?rel>` which means unspecified.\n Suggested inputs are `\"<EquivalentTo>\"` and `\"<SubsumedBy>\"`.\n score (float, optional): The score that indicates the confidence of this mapping. Defaults to `0.0`.\n Returns:\n (EntityMapping): The entity mapping created from the source and target entities.\n \"\"\"\n return cls(str(src_entity.getIRI()), str(tgt_entity.getIRI()), relation, score)\n
"},{"location":"deeponto/align/mapping/#deeponto.align.mapping.EntityMapping.to_tuple","title":"to_tuple(with_score=False)
","text":"Transform an entity mapping (self
) to a tuple representation
Note that relation
is discarded and score
is optionally preserved).
src/deeponto/align/mapping.py
def to_tuple(self, with_score: bool = False):\n\"\"\"Transform an entity mapping (`self`) to a tuple representation\n\n Note that `relation` is discarded and `score` is optionally preserved).\n \"\"\"\n if with_score:\n return (self.head, self.tail, self.score)\n else:\n return (self.head, self.tail)\n
"},{"location":"deeponto/align/mapping/#deeponto.align.mapping.EntityMapping.as_tuples","title":"as_tuples(entity_mappings, with_score=False)
staticmethod
","text":"Transform a list of entity mappings to their tuple representations.
Note that relation
is discarded and score
is optionally preserved).
src/deeponto/align/mapping.py
@staticmethod\ndef as_tuples(entity_mappings: List[EntityMapping], with_score: bool = False):\n\"\"\"Transform a list of entity mappings to their tuple representations.\n\n Note that `relation` is discarded and `score` is optionally preserved).\n \"\"\"\n return [m.to_tuple(with_score=with_score) for m in entity_mappings]\n
"},{"location":"deeponto/align/mapping/#deeponto.align.mapping.EntityMapping.sort_entity_mappings_by_score","title":"sort_entity_mappings_by_score(entity_mappings, k=None)
staticmethod
","text":"Sort the entity mappings in a list by their scores in descending order.
Parameters:
Name Type Description Defaultentity_mappings
List[EntityMapping]
A list entity mappings to sort.
requiredk
int
The number of top \\(k\\) scored entities preserved if specified. Defaults to None
which means to return all entity mappings.
None
Returns:
Type DescriptionList[EntityMapping]
A list of sorted entity mappings.
Source code insrc/deeponto/align/mapping.py
@staticmethod\ndef sort_entity_mappings_by_score(entity_mappings: List[EntityMapping], k: Optional[int] = None):\nr\"\"\"Sort the entity mappings in a list by their scores in descending order.\n\n Args:\n entity_mappings (List[EntityMapping]): A list entity mappings to sort.\n k (int, optional): The number of top $k$ scored entities preserved if specified. Defaults to `None` which\n means to return **all** entity mappings.\n\n Returns:\n (List[EntityMapping]): A list of sorted entity mappings.\n \"\"\"\n return list(sorted(entity_mappings, key=lambda x: x.score, reverse=True))[:k]\n
"},{"location":"deeponto/align/mapping/#deeponto.align.mapping.EntityMapping.read_table_mappings","title":"read_table_mappings(table_of_mappings_file, threshold=None, relation=DEFAULT_REL, is_reference=False)
staticmethod
","text":"Read entity mappings from .csv
or .tsv
files.
Mapping Table Format
The columns of the mapping table must have the headings: \"SrcEntity\"
, \"TgtEntity\"
, and \"Score\"
.
Parameters:
Name Type Description Defaulttable_of_mappings_file
str
The path to the table (.csv
or .tsv
) of mappings.
threshold
Optional[float]
Mappings with scores less than threshold
will not be loaded. Defaults to 0.0.
None
relation
str
A symbol that represents what semantic relation this mapping stands for. Defaults to <?rel>
which means unspecified. Suggested inputs are \"<EquivalentTo>\"
and \"<SubsumedBy>\"
.
DEFAULT_REL
is_reference
bool
Whether the loaded mappings are reference mappigns; if so, threshold
is disabled and mapping scores are all set to \\(1.0\\). Defaults to False
.
False
Returns:
Type DescriptionList[EntityMapping]
A list of entity mappings loaded from the table file.
Source code insrc/deeponto/align/mapping.py
@staticmethod\ndef read_table_mappings(\n table_of_mappings_file: str,\n threshold: Optional[float] = None,\n relation: str = DEFAULT_REL,\n is_reference: bool = False,\n) -> List[EntityMapping]:\nr\"\"\"Read entity mappings from `.csv` or `.tsv` files.\n\n !!! note \"Mapping Table Format\"\n\n The columns of the mapping table must have the headings: `\"SrcEntity\"`, `\"TgtEntity\"`, and `\"Score\"`.\n\n Args:\n table_of_mappings_file (str): The path to the table (`.csv` or `.tsv`) of mappings.\n threshold (Optional[float], optional): Mappings with scores less than `threshold` will not be loaded. Defaults to 0.0.\n relation (str, optional): A symbol that represents what semantic relation this mapping stands for. Defaults to `<?rel>` which means unspecified.\n Suggested inputs are `\"<EquivalentTo>\"` and `\"<SubsumedBy>\"`.\n is_reference (bool): Whether the loaded mappings are reference mappigns; if so, `threshold` is disabled and mapping scores\n are all set to $1.0$. Defaults to `False`.\n\n Returns:\n (List[EntityMapping]): A list of entity mappings loaded from the table file.\n \"\"\"\n df = read_table(table_of_mappings_file)\n entity_mappings = []\n for dp in df.itertuples():\n if is_reference:\n entity_mappings.append(ReferenceMapping(dp.SrcEntity, dp.TgtEntity, relation))\n else:\n # allow `None` for threshold\n if not threshold or dp[\"Score\"] >= threshold:\n entity_mappings.append(EntityMapping(dp.SrcEntity, dp.TgtEntity, relation, dp.Score))\n return entity_mappings\n
"},{"location":"deeponto/align/mapping/#deeponto.align.mapping.ReferenceMapping","title":"ReferenceMapping(src_entity_iri, tgt_entity_iri, relation=DEFAULT_REL, candidate_mappings=[])
","text":" Bases: EntityMapping
A datastructure for entity mapping that acts as a reference mapping.
A reference mapppings is a ground truth entity mapping (with \\(score = 1.0\\)) and can have several entity mappings as candidates. These candidate mappings should have the same head
(i.e., source entity) as the reference mapping.
Attributes:
Name Type Descriptionsrc_entity_iri
str
The IRI of the source entity, usually its IRI if available.
tgt_entity_iri
str
The IRI of the target entity, usually its IRI if available.
relation
str
A symbol that represents what semantic relation this mapping stands for. Defaults to <?rel>
which means unspecified. Suggested inputs are \"<EquivalentTo>\"
and \"<SubsumedBy>\"
.
Parameters:
Name Type Description Defaultsrc_entity_iri
str
The IRI of the source entity, usually its IRI if available.
requiredtgt_entity_iri
str
The IRI of the target entity, usually its IRI if available.
requiredrelation
str
A symbol that represents what semantic relation this mapping stands for. Defaults to <?rel>
which means unspecified. Suggested inputs are \"<EquivalentTo>\"
and \"<SubsumedBy>\"
.
DEFAULT_REL
candidate_mappings
List[EntityMapping]
A list of entity mappings that are candidates for this reference mapping. Defaults to []
.
[]
Source code in src/deeponto/align/mapping.py
def __init__(\n self,\n src_entity_iri: str,\n tgt_entity_iri: str,\n relation: str = DEFAULT_REL,\n candidate_mappings: Optional[List[EntityMapping]] = [],\n):\nr\"\"\"Intialise a reference mapping.\n\n Args:\n src_entity_iri (str): The IRI of the source entity, usually its IRI if available.\n tgt_entity_iri (str): The IRI of the target entity, usually its IRI if available.\n relation (str, optional): A symbol that represents what semantic relation this mapping stands for. Defaults to `<?rel>` which means unspecified.\n Suggested inputs are `\"<EquivalentTo>\"` and `\"<SubsumedBy>\"`.\n candidate_mappings (List[EntityMapping], optional): A list of entity mappings that are candidates for this reference mapping. Defaults to `[]`.\n \"\"\"\n super().__init__(src_entity_iri, tgt_entity_iri, relation, 1.0)\n self.candidates = []\n for candidate in candidate_mappings:\n self.add_candidate(candidate)\n
"},{"location":"deeponto/align/mapping/#deeponto.align.mapping.ReferenceMapping.add_candidate","title":"add_candidate(candidate_mapping)
","text":"Add a candidate mapping whose relation and head entity are the same as the reference mapping's.
Source code insrc/deeponto/align/mapping.py
def add_candidate(self, candidate_mapping: EntityMapping):\n\"\"\"Add a candidate mapping whose relation and head entity are the\n same as the reference mapping's.\n \"\"\"\n if self.relation != candidate_mapping.relation:\n raise ValueError(\n f\"Expect relation of candidate mapping to be {self.relation} but got {candidate_mapping.relation}\"\n )\n if self.head != candidate_mapping.head:\n raise ValueError(\"Candidate mapping does not have the same head entity as the anchor mapping.\")\n self.candidates.append(candidate_mapping)\n
"},{"location":"deeponto/align/mapping/#deeponto.align.mapping.ReferenceMapping.read_table_mappings","title":"read_table_mappings(table_of_mappings_file, relation=DEFAULT_REL)
staticmethod
","text":"Read reference mappings from .csv
or .tsv
files.
Mapping Table Format
The columns of the mapping table must have the headings: \"SrcEntity\"
, \"TgtEntity\"
, and \"Score\"
.
Parameters:
Name Type Description Defaulttable_of_mappings_file
str
The path to the table (.csv
or .tsv
) of mappings.
relation
str
A symbol that represents what semantic relation this mapping stands for. Defaults to <?rel>
which means unspecified. Suggested inputs are \"<EquivalentTo>\"
and \"<SubsumedBy>\"
.
DEFAULT_REL
Returns:
Type DescriptionList[ReferenceMapping]
A list of reference mappings loaded from the table file.
Source code insrc/deeponto/align/mapping.py
@staticmethod\ndef read_table_mappings(table_of_mappings_file: str, relation: str = DEFAULT_REL):\nr\"\"\"Read reference mappings from `.csv` or `.tsv` files.\n\n !!! note \"Mapping Table Format\"\n\n The columns of the mapping table must have the headings: `\"SrcEntity\"`, `\"TgtEntity\"`, and `\"Score\"`.\n\n Args:\n table_of_mappings_file (str): The path to the table (`.csv` or `.tsv`) of mappings.\n relation (str, optional): A symbol that represents what semantic relation this mapping stands for. Defaults to `<?rel>` which means unspecified.\n Suggested inputs are `\"<EquivalentTo>\"` and `\"<SubsumedBy>\"`.\n\n Returns:\n (List[ReferenceMapping]): A list of reference mappings loaded from the table file.\n \"\"\"\n return EntityMapping.read_table_mappings(table_of_mappings_file, relation=relation, is_reference=True)\n
"},{"location":"deeponto/align/mapping/#deeponto.align.mapping.SubsFromEquivMappingGenerator","title":"SubsFromEquivMappingGenerator(src_onto, tgt_onto, equiv_mappings, subs_generation_ratio=None, delete_used_equiv_tgt_class=True)
","text":"Generating subsumption mappings from gold standard equivalence mappings.
paper
The online subsumption mapping construction algorithm is proposed in the paper: Machine Learning-Friendly Biomedical Datasets for Equivalence and Subsumption Ontology Matching (ISWC 2022).
This generator has an attribute delete_used_equiv_tgt_class
for determining whether or not to sabotage the equivalence mappings used to create \\(\\geq 1\\) subsumption mappings. The reason is that, if the equivalence mapping is broken, then the OM tool is expected to predict subsumption mappings directly without relying on the equivalence mappings as an intermediate.
Attributes:
Name Type Descriptionsrc_onto
Ontology
The source ontology.
tgt_onto
Ontology
The target ontology.
equiv_class_pairs
List[Tuple[str, str]]
A list of class pairs (in IRIs) that are equivalent according to the input equivalence mappings.
subs_generation_ratio
int
The maximum number of subsumption mappings generated from each equivalence mapping. Defaults to None
which means there is no limit on the number of subsumption mappings.
delete_used_equiv_tgt_class
bool
Whether to mark the target side of an equivalence mapping used for creating at least one subsumption mappings as \"deleted\". Defaults to True
.
src/deeponto/align/mapping.py
def __init__(\n self,\n src_onto: Ontology,\n tgt_onto: Ontology,\n equiv_mappings: List[ReferenceMapping],\n subs_generation_ratio: Optional[int] = None,\n delete_used_equiv_tgt_class: bool = True,\n):\n self.src_onto = src_onto\n self.tgt_onto = tgt_onto\n self.equiv_class_pairs = [m.to_tuple() for m in equiv_mappings]\n self.subs_generation_ratio = subs_generation_ratio\n self.delete_used_equiv_tgt_class = delete_used_equiv_tgt_class\n\n subs_from_equivs, self.used_equiv_tgt_class_iris = self.online_construction()\n # turn into triples with scores 1.0\n self.subs_from_equivs = [(c, p, 1.0) for c, p in subs_from_equivs]\n
"},{"location":"deeponto/align/mapping/#deeponto.align.mapping.SubsFromEquivMappingGenerator.online_construction","title":"online_construction()
","text":"An online algorithm for constructing subsumption mappings from gold standard equivalence mappings.
Let \\(t\\) denote the boolean value that indicates if the target class involved in an equivalence mapping will be deleted. If \\(t\\) is true, then for each equivalent class pair \\((c, c')\\), do the following:
Steps 1 and 2 ensure that target classes that have been involved in a subsumption mapping have no conflicts with target classes that have been used to create a subsumption mapping.
This algorithm is online because the construction and deletion depend on the order of the input equivalent class pairs.
Source code insrc/deeponto/align/mapping.py
def online_construction(self):\nr\"\"\"An **online** algorithm for constructing subsumption mappings from gold standard equivalence mappings.\n\n Let $t$ denote the boolean value that indicates if the target class involved in an equivalence mapping\n will be deleted. If $t$ is true, then for each equivalent class pair $(c, c')$, do the following:\n\n 1. If $c'$ has been inolved in a subsumption mapping, skip this pair as otherwise $c'$ will need to be deleted.\n 2. For each parent class of $c'$, skip it if it has been marked deleted (i.e., involved in an equivalence mapping that has been used to create a subsumption mapping).\n 3. If any subsumption mapping has been created from $(c, c')$, mark $c'$ as deleted.\n\n Steps 1 and 2 ensure that target classes that have been **involved in a subsumption mapping** have **no conflicts** with\n target classes that have been **used to create a subsumption mapping**.\n\n This algorithm is *online* because the construction and deletion depend on the order of the input equivalent class pairs.\n \"\"\"\n subs_class_pairs = []\n in_subs = defaultdict(lambda: False) # in a subsumption mapping\n used_equivs = defaultdict(lambda: False) # in a used equivalence mapping\n\n for src_class_iri, tgt_class_iri in self.equiv_class_pairs:\n\n cur_subs_pairs = []\n\n # NOTE (1) an equiv pair is skipped if the target side is marked constructed\n if self.delete_used_equiv_tgt_class and in_subs[tgt_class_iri]:\n continue\n\n # construct subsumption pairs by matching the source class and the target class's parents\n tgt_class = self.tgt_onto.get_owl_object(tgt_class_iri)\n # tgt_class_parent_iris = self.tgt_onto.reasoner.get_inferred_super_entities(tgt_class, direct=True)\n tgt_class_parent_iris = [str(p.getIRI()) for p in self.tgt_onto.get_asserted_parents(tgt_class, named_only=True)]\n for parent_iri in tgt_class_parent_iris:\n # skip this parent if it is marked as \"used\"\n if self.delete_used_equiv_tgt_class and used_equivs[parent_iri]:\n continue\n cur_subs_pairs.append((src_class_iri, parent_iri))\n # if successfully created, mark this parent as \"in\"\n if self.delete_used_equiv_tgt_class:\n in_subs[parent_iri] = True\n\n # mark the target class as \"used\" because it has been used for creating a subsumption mapping\n if self.delete_used_equiv_tgt_class and cur_subs_pairs:\n used_equivs[tgt_class_iri] = True\n\n if self.subs_generation_ratio and len(cur_subs_pairs) > self.subs_generation_ratio:\n cur_subs_pairs = random.sample(cur_subs_pairs, self.subs_generation_ratio)\n subs_class_pairs += cur_subs_pairs\n\n used_equiv_tgt_class_iris = None\n if self.delete_used_equiv_tgt_class:\n used_equiv_tgt_class_iris = [iri for iri, used in used_equivs.items() if used is True]\n logger.info(\n f\"{len(used_equiv_tgt_class_iris)}/{len(self.equiv_class_pairs)} are used for creating at least one subsumption mapping.\"\n )\n\n subs_class_pairs = uniqify(subs_class_pairs)\n logger.info(f\"{len(subs_class_pairs)} subsumption mappings are created in the end.\")\n\n return subs_class_pairs, used_equiv_tgt_class_iris\n
"},{"location":"deeponto/align/mapping/#deeponto.align.mapping.SubsFromEquivMappingGenerator.save_subs","title":"save_subs(save_path)
","text":"Save the constructed subsumption mappings (in tuples) to a local .tsv
file.
src/deeponto/align/mapping.py
def save_subs(self, save_path: str):\n\"\"\"Save the constructed subsumption mappings (in tuples) to a local `.tsv` file.\"\"\"\n subs_df = pd.DataFrame(self.subs_from_equivs, columns=[\"SrcEntity\", \"TgtEntity\", \"Score\"])\n subs_df.to_csv(save_path, sep=\"\\t\", index=False)\n
"},{"location":"deeponto/align/mapping/#deeponto.align.mapping.NegativeCandidateMappingGenerator","title":"NegativeCandidateMappingGenerator(src_onto, tgt_onto, reference_class_mappings, annotation_property_iris, tokenizer, max_hops=5, for_subsumption=False)
","text":"Generating negative candidate mappings for each gold standard mapping.
Note that the source side of the golden standard mapping is fixed, i.e., candidate mappings are generated according to the target side.
paper
The candidate mapping generation algorithm is proposed in the paper: Machine Learning-Friendly Biomedical Datasets for Equivalence and Subsumption Ontology Matching (ISWC 2022).
Source code insrc/deeponto/align/mapping.py
def __init__(\n self,\n src_onto: Ontology,\n tgt_onto: Ontology,\n reference_class_mappings: List[ReferenceMapping], # equivalence or subsumption\n annotation_property_iris: List[str], # for text-based candidates\n tokenizer: Tokenizer, # for text-based candidates\n max_hops: int = 5, # for graph-based candidates\n for_subsumption: bool = False, # if for subsumption, avoid adding ancestors as candidates\n):\n\n self.src_onto = src_onto\n self.tgt_onto = tgt_onto\n self.reference_class_mappings = reference_class_mappings\n self.reference_class_dict = defaultdict(list) # to prevent wrongly adding negative candidates\n for m in self.reference_class_mappings:\n src_class_iri, tgt_class_iri = m.to_tuple()\n self.reference_class_dict[src_class_iri].append(tgt_class_iri)\n\n # for IDF sample\n self.tgt_annotation_index, self.annotation_property_iris = self.tgt_onto.build_annotation_index(\n annotation_property_iris, apply_lowercasing=True\n )\n self.tokenizer = tokenizer\n self.tgt_inverted_annotation_index = self.tgt_onto.build_inverted_annotation_index(\n self.tgt_annotation_index, self.tokenizer\n )\n\n # for neighbour sample\n self.max_hops = max_hops\n\n # if for subsumption, avoid adding ancestors as candidates\n self.for_subsumption = for_subsumption\n # if for subsumption, add (src_class, tgt_class_ancestor) into the reference mappings\n if self.for_subsumption:\n for m in self.reference_class_mappings:\n src_class_iri, tgt_class_iri = m.to_tuple()\n tgt_class = self.tgt_onto.get_owl_object(tgt_class_iri)\n tgt_class_ancestors = self.tgt_onto.reasoner.get_inferred_super_entities(tgt_class)\n for tgt_ancestor_iri in tgt_class_ancestors:\n self.reference_class_dict[src_class_iri].append(tgt_ancestor_iri)\n
"},{"location":"deeponto/align/mapping/#deeponto.align.mapping.NegativeCandidateMappingGenerator.mixed_sample","title":"mixed_sample(reference_class_mapping, **strategy2nums)
","text":"A mixed sampling approach that combines several sampling strategies.
As introduced in the Bio-ML paper, this mixed approach guarantees that the number of samples for each strategy is either the maximum that can be sampled or the required number.
Specifically, at each sampling iteration, the number of candidates is first increased by the number of previously sampled candidates, as in the worst case, all the candidates sampled at this iteration will be duplicated with the previous.
The random sampling is used as the amending strategy, i.e., if other sampling strategies cannot retrieve the specified number of samples, then use random sampling to amend the number.
Parameters:
Name Type Description Defaultreference_class_mapping
ReferenceMapping
The reference class mapping for generating the candidate mappings.
required**strategy2nums
int
The keyword arguments that specify the expected number of candidates for each sampling strategy.
{}
Source code in src/deeponto/align/mapping.py
def mixed_sample(self, reference_class_mapping: ReferenceMapping, **strategy2nums):\n\"\"\"A mixed sampling approach that combines several sampling strategies.\n\n As introduced in the Bio-ML paper, this mixed approach guarantees that the number of samples for each\n strategy is either the **maximum that can be sampled** or the required number.\n\n Specifically, at each sampling iteration, the number of candidates is **first increased by the number of \n previously sampled candidates**, as in the worst case, all the candidates sampled at this iteration\n will be duplicated with the previous. \n\n The random sampling is used as the amending strategy, i.e., if other sampling strategies cannot retrieve\n the specified number of samples, then use random sampling to amend the number.\n\n Args:\n reference_class_mapping (ReferenceMapping): The reference class mapping for generating the candidate mappings.\n **strategy2nums (int): The keyword arguments that specify the expected number of candidates for each\n sampling strategy.\n \"\"\"\n\n valid_tgt_candidate_iris = []\n sample_stats = defaultdict(lambda: 0)\n i = 0\n total_num_candidates = 0\n for strategy, num_canddiates in strategy2nums.items():\n i += 1\n if strategy in SAMPLING_OPTIONS:\n sampler = getattr(self, f\"{strategy}_sample\")\n # for ith iteration, the worst case is when all n_cands are duplicated\n # or should be excluded from other reference targets so we generate\n # NOTE: total_num_candidates + num_candidates + len(excluded_tgt_class_iris)\n # candidates first and prune the rest; another edge case is when sampled\n # candidates are not sufficient and we use random sample to meet n_cands\n cur_valid_tgt_candidate_iris = sampler(\n reference_class_mapping, total_num_candidates + num_canddiates\n )\n # remove the duplicated candidates (and excluded refs) and prune the tail\n cur_valid_tgt_candidate_iris = list(\n set(cur_valid_tgt_candidate_iris) - set(valid_tgt_candidate_iris)\n )[:num_canddiates]\n sample_stats[strategy] += len(cur_valid_tgt_candidate_iris)\n # use random samples for complementation if not enough\n while len(cur_valid_tgt_candidate_iris) < num_canddiates:\n amend_candidate_iris = self.random_sample(\n reference_class_mapping, num_canddiates - len(cur_valid_tgt_candidate_iris)\n )\n amend_candidate_iris = list(\n set(amend_candidate_iris)\n - set(valid_tgt_candidate_iris)\n - set(cur_valid_tgt_candidate_iris)\n )\n cur_valid_tgt_candidate_iris += amend_candidate_iris\n assert len(cur_valid_tgt_candidate_iris) == num_canddiates\n # record how many random samples to amend\n if strategy != \"random\":\n sample_stats[\"random\"] += num_canddiates - sample_stats[strategy]\n valid_tgt_candidate_iris += cur_valid_tgt_candidate_iris\n total_num_candidates += num_canddiates\n else:\n raise ValueError(f\"Invalid sampling trategy: {strategy}.\")\n assert len(valid_tgt_candidate_iris) == total_num_candidates\n\n # TODO: add the candidate mappings into the reference mapping \n\n return valid_tgt_candidate_iris, sample_stats\n
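Assuming a NegativeCandidateMappingGenerator has already been constructed as above, a sampling call might look like the following; the keyword names mirror the *_sample methods documented below and the numbers are arbitrary:

```python
# reference_mapping is one of the loaded ReferenceMapping objects
candidate_iris, stats = generator.mixed_sample(
    reference_mapping, idf=50, neighbour=30, random=20
)
print(stats)  # how many candidates each strategy actually contributed
```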
"},{"location":"deeponto/align/mapping/#deeponto.align.mapping.NegativeCandidateMappingGenerator.random_sample","title":"random_sample(reference_class_mapping, num_candidates)
","text":"Randomly sample a set of target class candidates \\(c'_{cand}\\) for a given reference mapping \\((c, c')\\).
The sampled candidate classes will be combined with the source reference class \\(c\\) to get a set of candidate mappings \\(\\{(c, c'_{cand})\\}\\).
Parameters:
Name Type Description Defaultreference_class_mapping
ReferenceMapping
The reference class mapping for generating the candidate mappings.
requirednum_candidates
int
The expected number of candidate mappings to generate.
required Source code insrc/deeponto/align/mapping.py
def random_sample(self, reference_class_mapping: ReferenceMapping, num_candidates: int):\nr\"\"\"**Randomly** sample a set of target class candidates $c'_{cand}$ for a given reference mapping $(c, c')$.\n\n The sampled candidate classes will be combined with the source reference class $c$ to get a set of\n candidate mappings $\\{(c, c'_{cand})\\}$.\n\n Args:\n reference_class_mapping (ReferenceMapping): The reference class mapping for generating the candidate mappings.\n num_candidates (int): The expected number of candidate mappings to generate.\n \"\"\"\n ref_src_class_iri, ref_tgt_class_iri = reference_class_mapping.to_tuple()\n all_tgt_class_iris = set(self.tgt_onto.owl_classes.keys())\n valid_tgt_class_iris = all_tgt_class_iris - set(\n self.reference_class_dict[ref_src_class_iri]\n ) # exclude gold standards\n assert not ref_tgt_class_iri in valid_tgt_class_iris\n return random.sample(valid_tgt_class_iris, num_candidates)\n
"},{"location":"deeponto/align/mapping/#deeponto.align.mapping.NegativeCandidateMappingGenerator.idf_sample","title":"idf_sample(reference_class_mapping, num_candidates)
","text":"Sample a set of target class candidates \\(c'_{cand}\\) for a given reference mapping \\((c, c')\\) based on the \\(idf\\) scores w.r.t. the inverted annotation index (sub-word level).
Candidate classes with higher \\(idf\\) scores will be considered first, and then combined with the source reference class \\(c\\) to get a set of candidate mappings \\(\\{(c, c'_{cand})\\}\\).
Parameters:
Name Type Description Defaultreference_class_mapping
ReferenceMapping
The reference class mapping for generating the candidate mappings.
requirednum_candidates
int
The expected number of candidate mappings to generate.
required Source code insrc/deeponto/align/mapping.py
def idf_sample(self, reference_class_mapping: ReferenceMapping, num_candidates: int):\nr\"\"\"Sample a set of target class candidates $c'_{cand}$ for a given reference mapping $(c, c')$ based on the $idf$ scores\n w.r.t. the inverted annotation index (sub-word level).\n\n Candidate classes with higher $idf$ scores will be considered first, and then combined with the source reference class $c$\n to get a set of candidate mappings $\\{(c, c'_{cand})\\}$.\n\n Args:\n reference_class_mapping (ReferenceMapping): The reference class mapping for generating the candidate mappings.\n num_candidates (int): The expected number of candidate mappings to generate.\n \"\"\"\n ref_src_class_iri, ref_tgt_class_iri = reference_class_mapping.to_tuple()\n\n tgt_candidates = self.tgt_inverted_annotation_index.idf_select(\n self.tgt_annotation_index[ref_tgt_class_iri]\n ) # select all non-trivial candidates first\n valid_tgt_class_iris = []\n for tgt_candidate_iri, _ in tgt_candidates:\n # valid as long as it is not one of the reference target\n if tgt_candidate_iri not in self.reference_class_dict[ref_src_class_iri]:\n valid_tgt_class_iris.append(tgt_candidate_iri)\n if len(valid_tgt_class_iris) == num_candidates:\n break\n assert not ref_tgt_class_iri in valid_tgt_class_iris\n return valid_tgt_class_iris\n
"},{"location":"deeponto/align/mapping/#deeponto.align.mapping.NegativeCandidateMappingGenerator.neighbour_sample","title":"neighbour_sample(reference_class_mapping, num_candidates)
","text":"Sample a set of target class candidates \\(c'_{cand}\\) for a given reference mapping \\((c, c')\\) based on the subsumption hierarchy.
Define one-hop as one edge derived from an asserted subsumption axiom, i.e., to the parent class or the child class. Candidates classes with nearer hops will be considered first, and then combined with the source reference class \\(c\\) to get a set of candidate mappings \\(\\{(c, c'_{cand})\\}\\).
Parameters:
Name Type Description Defaultreference_class_mapping
ReferenceMapping
The reference class mapping for generating the candidate mappings.
requirednum_candidates
int
The expected number of candidate mappings to generate.
required Source code insrc/deeponto/align/mapping.py
def neighbour_sample(self, reference_class_mapping: ReferenceMapping, num_candidates: int):\nr\"\"\"Sample a set of target class candidates $c'_{cand}$ for a given reference mapping $(c, c')$ based on the **subsumption\n hierarchy**.\n\n Define one-hop as one edge derived from an **asserted** subsumption axiom, i.e., to the parent class or the child class.\n Candidates classes with nearer hops will be considered first, and then combined with the source reference class $c$\n to get a set of candidate mappings $\\{(c, c'_{cand})\\}$.\n\n Args:\n reference_class_mapping (ReferenceMapping): The reference class mapping for generating the candidate mappings.\n num_candidates (int): The expected number of candidate mappings to generate.\n \"\"\"\n ref_src_class_iri, ref_tgt_class_iri = reference_class_mapping.to_tuple()\n\n valid_tgt_class_iris = set()\n cur_hop = 1\n frontier = [ref_tgt_class_iri]\n # extract from the nearest neighbours until enough candidates or max hop\n while len(valid_tgt_class_iris) < num_candidates and cur_hop <= self.max_hops:\n\n neighbours_of_cur_hop = []\n for tgt_class_iri in frontier:\n tgt_class = self.tgt_onto.get_owl_object(tgt_class_iri)\n parents = self.tgt_onto.reasoner.get_inferred_super_entities(tgt_class, direct=True)\n children = self.tgt_onto.reasoner.get_inferred_sub_entities(tgt_class, direct=True)\n neighbours_of_cur_hop += parents + children # used for further hop expansion\n\n valid_neighbours_of_cur_hop = set(neighbours_of_cur_hop) - set(self.reference_class_dict[ref_src_class_iri])\n # print(valid_neighbours_of_cur_hop)\n\n # NOTE if by adding neighbours of current hop the require number will be met\n # we randomly pick among them\n if len(valid_neighbours_of_cur_hop) > num_candidates - len(valid_tgt_class_iris):\n valid_neighbours_of_cur_hop = random.sample(\n valid_neighbours_of_cur_hop, num_candidates - len(valid_tgt_class_iris)\n )\n valid_tgt_class_iris.update(valid_neighbours_of_cur_hop)\n\n frontier = neighbours_of_cur_hop # update the frontier with all possible neighbors\n cur_hop += 1\n\n assert not ref_tgt_class_iri in valid_tgt_class_iris\n return list(valid_tgt_class_iris)\n
"},{"location":"deeponto/align/oaei/","title":"OAEI Utilities","text":"This page concerns utility functions used in the OAEI.
"},{"location":"deeponto/align/oaei/#deeponto.align.oaei.get_ignored_class_index","title":"get_ignored_class_index(onto)
","text":"Get an index for filtering classes that are marked as not used in alignment.
This is indicated by the special class annotation use_in_alignment
with the following IRI: http://oaei.ontologymatching.org/bio-ml/ann/use_in_alignment
src/deeponto/align/oaei.py
def get_ignored_class_index(onto: Ontology):\n\"\"\"Get an index for filtering classes that are marked as not used in alignment.\n\n This is indicated by the special class annotation `use_in_alignment` with the following IRI:\n http://oaei.ontologymatching.org/bio-ml/ann/use_in_alignment\n \"\"\"\n ignored_class_index = defaultdict(lambda: False)\n for class_iri, class_obj in onto.owl_classes.items():\n use_in_alignment = onto.get_annotations(\n class_obj, \"http://oaei.ontologymatching.org/bio-ml/ann/use_in_alignment\"\n )\n if use_in_alignment and str(use_in_alignment[0]).lower() == \"false\":\n ignored_class_index[class_iri] = True\n return ignored_class_index\n
"},{"location":"deeponto/align/oaei/#deeponto.align.oaei.remove_ignored_mappings","title":"remove_ignored_mappings(mappings, ignored_class_index)
","text":"Filter prediction mappings that involve classes to be ignored.
Source code insrc/deeponto/align/oaei.py
def remove_ignored_mappings(mappings: List[EntityMapping], ignored_class_index: dict):\n\"\"\"Filter prediction mappings that involve classes to be ignored.\"\"\"\n results = []\n for m in mappings:\n if ignored_class_index[m.head] or ignored_class_index[m.tail]:\n continue\n results.append(m)\n return results\n
"},{"location":"deeponto/align/oaei/#deeponto.align.oaei.matching_eval","title":"matching_eval(pred_maps_file, ref_maps_file, null_ref_maps_file=None, ignored_class_index=None, pred_maps_threshold=None)
","text":"Conduct global matching evaluation for the prediction mappings against the reference mappings.
The prediction mappings are formatted the same as full.tsv
(the full reference mappings), with three columns: \"SrcEntity\"
, \"TgtEntity\"
, and \"Score\"
, indicating the source class IRI, the target class IRI, and the corresponding mapping score.
An ignored_class_index
needs to be constructed for omitting prediction mappings that involve a class marked as not used in alignment.
Use the following code to obtain such index for both the source and target ontologies:
ignored_class_index = get_ignored_class_index(src_onto)\nignored_class_index.update(get_ignored_class_index(tgt_onto))\n
Source code in src/deeponto/align/oaei.py
def matching_eval(\n pred_maps_file: str,\n ref_maps_file: str,\n null_ref_maps_file: Optional[str] = None,\n ignored_class_index: Optional[dict] = None,\n pred_maps_threshold: Optional[float] = None,\n):\nr\"\"\"Conduct **global matching** evaluation for the prediction mappings against the\n reference mappings.\n\n The prediction mappings are formatted the same as `full.tsv` (the full reference mappings),\n with three columns: `\"SrcEntity\"`, `\"TgtEntity\"`, and `\"Score\"`, indicating the source\n class IRI, the target class IRI, and the corresponding mapping score.\n\n An `ignored_class_index` needs to be constructed for omitting prediction mappings\n that involve a class marked as **not used in alignment**.\n\n Use the following code to obtain such index for both the source and target ontologies:\n\n ```python\n ignored_class_index = get_ignored_class_index(src_onto)\n ignored_class_index.update(get_ignored_class_index(tgt_onto))\n ```\n \"\"\"\n refs = ReferenceMapping.read_table_mappings(ref_maps_file, relation=\"=\")\n preds = EntityMapping.read_table_mappings(pred_maps_file, relation=\"=\", threshold=pred_maps_threshold)\n if ignored_class_index:\n preds = remove_ignored_mappings(preds, ignored_class_index)\n null_refs = ReferenceMapping.read_table_mappings(null_ref_maps_file, relation=\"=\") if null_ref_maps_file else []\n results = AlignmentEvaluator.f1(preds, refs, null_reference_mappings=null_refs)\n return results\n
"},{"location":"deeponto/align/oaei/#deeponto.align.oaei.read_candidate_mappings","title":"read_candidate_mappings(cand_maps_file, for_biollm=False, threshold=0.0)
","text":"Load scored or already ranked candidate mappings.
The predicted candidate mappings are formatted the same as test.cands.tsv
, with three columns: \"SrcEntity\"
, \"TgtEntity\"
, and \"TgtCandidates\"
, indicating the source reference class IRI, the target reference class IRI, and a list of tuples in the form of (target_candidate_class_IRI, score)
where score
is optional if the candidate mappings have been ranked. For the Bio-LLM special sub-track, \"TgtCandidates\"
refers to a list of triples in the form of (target_candidate_class_IRI, score, answer)
where the answer
is required for computing matching scores.
This method loads the candidate mappings in this format and parses them into the inputs of mean_reciprocal_rank and hits_at_K.
For Bio-LLM, the true prediction mappings and reference mappings will also be generated for the matching evaluation, i.e., the inputs of f1
.
src/deeponto/align/oaei.py
def read_candidate_mappings(cand_maps_file: str, for_biollm: bool = False, threshold: float = 0.0):\nr\"\"\"Load scored or already ranked candidate mappings.\n\n The predicted candidate mappings are formatted the same as `test.cands.tsv`, with three columns:\n `\"SrcEntity\"`, `\"TgtEntity\"`, and `\"TgtCandidates\"`, indicating the source reference class IRI, the\n target reference class IRI, and a list of **tuples** in the form of `(target_candidate_class_IRI, score)` where\n `score` is optional if the candidate mappings have been ranked. For the Bio-LLM special sub-track, `\"TgtCandidates\"`\n refers to a list of **triples** in the form of `(target_candidate_class_IRI, score, answer)` where the `answer` is\n required for computing matching scores.\n\n This method loads the candidate mappings in this format and parse them into the inputs of [`mean_reciprocal_rank`][deeponto.align.evaluation.AlignmentEvaluator.mean_reciprocal_rank]\n and [`hits_at_K`][[`mean_reciprocal_rank`][deeponto.align.evaluation.AlignmentEvaluator.hits_at_K].\n\n For Bio-LLM, the true prediction mappings and reference mappings will also be generated for the matching evaluation, i.e., the inputs of [`f1`][deeponto.align.evaluation.AlignmentEvaluator.f1].\n \"\"\"\n\n all_cand_maps = read_table(cand_maps_file).values.tolist()\n cands = []\n unmatched_cands = []\n preds = [] # only used for bio-llm\n refs = [] # only used for bio-llm\n\n for src_ref_class, tgt_ref_class, tgt_cands in all_cand_maps:\n ref_map = ReferenceMapping(src_ref_class, tgt_ref_class, \"=\")\n tgt_cands = eval(tgt_cands)\n has_score = True if all([not isinstance(x, str) for x in tgt_cands]) else False\n cand_maps = []\n refs.append(ref_map) if tgt_ref_class != \"UnMatched\" else None\n if for_biollm:\n for t, s, a in tgt_cands:\n m = EntityMapping(src_ref_class, t, \"=\", s)\n cand_maps.append(m)\n if a is True and s >= threshold: # only keep first one\n preds.append(m)\n elif has_score:\n cand_maps = [EntityMapping(src_ref_class, t, \"=\", s) for t, s in tgt_cands]\n else:\n warnings.warn(\"Input candidate mappings do not have a score, assume default rank in descending order.\")\n cand_maps = [\n EntityMapping(src_ref_class, t, \"=\", (len(tgt_cands) - i) / len(tgt_cands))\n for i, t in enumerate(tgt_cands)\n ]\n cand_maps = EntityMapping.sort_entity_mappings_by_score(cand_maps)\n if for_biollm and tgt_ref_class == \"UnMatched\":\n unmatched_cands.append((ref_map, cand_maps))\n else:\n cands.append((ref_map, cand_maps))\n\n if for_biollm:\n return cands, unmatched_cands, preds, refs\n else:\n return cands\n
"},{"location":"deeponto/align/oaei/#deeponto.align.oaei.ranking_result_file_check","title":"ranking_result_file_check(cand_maps_file, ref_cand_maps_file)
","text":"Check if the ranking result file is formatted correctly as the original test.cands.tsv
file provided in the dataset.
src/deeponto/align/oaei.py
def ranking_result_file_check(cand_maps_file: str, ref_cand_maps_file: str):\nr\"\"\"Check if the ranking result file is formatted correctly as the original\n `test.cands.tsv` file provided in the dataset.\n \"\"\"\n formatted_cand_maps = read_candidate_mappings(cand_maps_file)\n formatted_ref_cand_maps = read_candidate_mappings(ref_cand_maps_file)\n assert len(formatted_cand_maps) == len(\n formatted_ref_cand_maps\n ), f\"Mismatched number of reference mappings: {len(formatted_cand_maps)}; should be {len(formatted_ref_cand_maps)}.\"\n for i in range(len(formatted_cand_maps)):\n anchor, cands = formatted_cand_maps[i]\n ref_anchor, ref_cands = formatted_ref_cand_maps[i]\n assert (\n anchor.to_tuple() == ref_anchor.to_tuple()\n ), f\"Mismatched reference mapping: {anchor}; should be {ref_anchor}.\"\n cands = [c.to_tuple() for c in cands]\n ref_cands = [rc.to_tuple() for rc in ref_cands]\n assert not (\n set(cands) - set(ref_cands)\n ), f\"Mismatch set of candidate mappings for the reference mapping: {anchor}.\"\n
"},{"location":"deeponto/align/oaei/#deeponto.align.oaei.ranking_eval","title":"ranking_eval(cand_maps_file, Ks=[1, 5, 10])
","text":"Conduct local ranking evaluation for the scored or ranked candidate mappings.
See read_candidate_mappings
for the file format and loading.
src/deeponto/align/oaei.py
def ranking_eval(cand_maps_file: str, Ks=[1, 5, 10]):\nr\"\"\"Conduct **local ranking** evaluation for the scored or ranked candidate mappings.\n\n See [`read_candidate_mappings`][deeponto.align.oaei.read_candidate_mappings] for the file format and loading.\n \"\"\"\n formatted_cand_maps = read_candidate_mappings(cand_maps_file)\n results = {\"MRR\": AlignmentEvaluator.mean_reciprocal_rank(formatted_cand_maps)}\n for K in Ks:\n results[f\"Hits@{K}\"] = AlignmentEvaluator.hits_at_K(formatted_cand_maps, K=K)\n return results\n
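A minimal usage sketch (the file name is a placeholder for a scored or ranked candidate mapping file in the test.cands.tsv format):
from deeponto.align.oaei import ranking_eval\nranking_results = ranking_eval('test.cands.tsv', Ks=[1, 5, 10])\nprint(ranking_results)  # {'MRR': ..., 'Hits@1': ..., 'Hits@5': ..., 'Hits@10': ...}\n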
"},{"location":"deeponto/align/oaei/#deeponto.align.oaei.is_rejection","title":"is_rejection(preds, cands)
","text":"A successful rejection means none of the candidate mappings are predicted as true mappings.
Source code insrc/deeponto/align/oaei.py
def is_rejection(preds: List[EntityMapping], cands: List[EntityMapping]):\n\"\"\"A successful rejection means none of the candidate mappings are predicted as true mappings.\"\"\"\n return set([p.to_tuple() for p in preds]).intersection(set([c.to_tuple() for c in cands])) == set()\n
"},{"location":"deeponto/align/oaei/#deeponto.align.oaei.biollm_eval","title":"biollm_eval(cand_maps_file, Ks=[1], threshold=0.0)
","text":"Conduct Bio-LLM evaluation for the Bio-LLM formatted candidate mappings.
See read_candidate_mappings
for the file format and loading.
src/deeponto/align/oaei.py
def biollm_eval(cand_maps_file, Ks=[1], threshold: float = 0.0):\nr\"\"\"Conduct Bio-LLM evaluation for the Bio-LLM formatted candidate mappings.\n\n See [`read_candidate_mappings`][deeponto.align.oaei.read_candidate_mappings] for the file format and loading.\n \"\"\"\n matched_cand_maps, unmatched_cand_maps, preds, refs = read_candidate_mappings(\n cand_maps_file, for_biollm=True, threshold=threshold\n )\n\n results = AlignmentEvaluator.f1(preds, refs)\n for K in Ks:\n results[f\"Hits@{K}\"] = AlignmentEvaluator.hits_at_K(matched_cand_maps, K=K)\n results[\"MRR\"] = AlignmentEvaluator.mean_reciprocal_rank(matched_cand_maps)\n rej = 0\n for _, cs in unmatched_cand_maps:\n rej += int(is_rejection(preds, cs))\n results[\"RR\"] = rej / len(unmatched_cand_maps)\n return results\n
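A minimal usage sketch for the Bio-LLM sub-track (the file name is a placeholder):
from deeponto.align.oaei import biollm_eval\nbiollm_results = biollm_eval('test.cands.tsv', Ks=[1], threshold=0.0)\nprint(biollm_results)  # matching scores plus Hits@1, MRR, and RR (rejection rate)\n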
"},{"location":"deeponto/align/bertmap/","title":"BERTMap","text":"Paper
\\(\\textsf{BERTMap}\\) is proposed in the paper: BERTMap: A BERT-based Ontology Alignment System (AAAI-2022).
@inproceedings{he2022bertmap,\n title={BERTMap: a BERT-based ontology alignment system},\n author={He, Yuan and Chen, Jiaoyan and Antonyrajah, Denvar and Horrocks, Ian},\n booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},\n volume={36},\n number={5},\n pages={5684--5691},\n year={2022}\n}\n
\\(\\textsf{BERTMap}\\) is a BERT-based ontology matching (OM) system consisting of following components:
\\(\\textsf{BERTMapLt}\\) is a light-weight version of \\(\\textsf{BERTMap}\\) without the BERT module and mapping refiner.
See the tutorial for \\(\\textsf{BERTMap}\\) here.
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.pipeline.BERTMapPipeline","title":"BERTMapPipeline(src_onto, tgt_onto, config)
","text":"Class for the whole ontology alignment pipeline of \\(\\textsf{BERTMap}\\) and \\(\\textsf{BERTMapLt}\\) models.
Note
Parameters related to BERT training are None
by default. They will be constructed for \\(\\textsf{BERTMap}\\) and stay as None
for \\(\\textsf{BERTMapLt}\\).
Attributes:
Name Type Descriptionconfig
CfgNode
The configuration for BERTMap or BERTMapLt.
name
str
The name of the model, either bertmap
or bertmaplt
.
output_path
str
The path to the output directory.
src_onto
Ontology
The source ontology to be matched.
tgt_onto
Ontology
The target ontology to be matched.
annotation_property_iris
List[str]
The annotation property IRIs used for extracting synonyms and nonsynonyms.
src_annotation_index
dict
A dictionary that stores the (class_iri, class_annotations)
pairs from src_onto
according to annotation_property_iris
.
tgt_annotation_index
dict
A dictionary that stores the (class_iri, class_annotations)
pairs from tgt_onto
according to annotation_property_iris
.
known_mappings
List[ReferenceMapping]
List of known mappings for constructing the cross-ontology corpus.
auxiliary_ontos
List[Ontology]
List of auxiliary ontologies for constructing any auxiliary corpus.
corpora
dict
A dictionary that stores the summary
of built text semantics corpora and the sampled synonyms
and nonsynonyms
.
finetune_data
dict
A dictionary that stores the training
and validation
splits of samples from corpora
.
bert
BERTSynonymClassifier
A BERT model for synonym classification and mapping prediction.
best_checkpoint
str
The path to the best BERT checkpoint which will be loaded after training.
mapping_predictor
MappingPredictor
The predictor function based on class annotations, used for global matching or mapping scoring.
Parameters:
Name Type Description Defaultsrc_onto
Ontology
The source ontology for alignment.
requiredtgt_onto
Ontology
The target ontology for alignment.
requiredconfig
CfgNode
The configuration for BERTMap or BERTMapLt.
required Source code insrc/deeponto/align/bertmap/pipeline.py
def __init__(self, src_onto: Ontology, tgt_onto: Ontology, config: CfgNode):\n\"\"\"Initialise the BERTMap or BERTMapLt model.\n\n Args:\n src_onto (Ontology): The source ontology for alignment.\n tgt_onto (Ontology): The target ontology for alignment.\n config (CfgNode): The configuration for BERTMap or BERTMapLt.\n \"\"\"\n # load the configuration and confirm model name is valid\n self.config = config\n self.name = self.config.model\n if not self.name in MODEL_OPTIONS.keys():\n raise RuntimeError(f\"`model` {self.name} in the config file is not one of the supported.\")\n\n # create the output directory, e.g., experiments/bertmap\n self.config.output_path = \".\" if not self.config.output_path else self.config.output_path\n self.config.output_path = os.path.abspath(self.config.output_path)\n self.output_path = os.path.join(self.config.output_path, self.name)\n create_path(self.output_path)\n\n # create logger and progress manager (hidden attribute) \n self.logger = create_logger(self.name, self.output_path)\n self.enlighten_manager = enlighten.get_manager()\n\n # ontology\n self.src_onto = src_onto\n self.tgt_onto = tgt_onto\n self.annotation_property_iris = self.config.annotation_property_iris\n self.logger.info(f\"Load the following configurations:\\n{print_dict(self.config)}\")\n config_path = os.path.join(self.output_path, \"config.yaml\")\n self.logger.info(f\"Save the configuration file at {config_path}.\")\n self.save_bertmap_config(self.config, config_path)\n\n # build the annotation thesaurus\n self.src_annotation_index, _ = self.src_onto.build_annotation_index(self.annotation_property_iris, apply_lowercasing=True)\n self.tgt_annotation_index, _ = self.tgt_onto.build_annotation_index(self.annotation_property_iris, apply_lowercasing=True)\n if (not self.src_annotation_index) or (not self.tgt_annotation_index):\n raise RuntimeError(\"No class annotations found in input ontologies; unable to produce alignment.\")\n\n # provided mappings if any\n self.known_mappings = self.config.known_mappings\n if self.known_mappings:\n self.known_mappings = ReferenceMapping.read_table_mappings(self.known_mappings)\n\n # auxiliary ontologies if any\n self.auxiliary_ontos = self.config.auxiliary_ontos\n if self.auxiliary_ontos:\n self.auxiliary_ontos = [Ontology(ao) for ao in self.auxiliary_ontos]\n\n self.data_path = os.path.join(self.output_path, \"data\")\n # load or construct the corpora\n self.corpora_path = os.path.join(self.data_path, \"text-semantics.corpora.json\")\n self.corpora = self.load_text_semantics_corpora()\n\n # load or construct fine-tune data\n self.finetune_data_path = os.path.join(self.data_path, \"fine-tune.data.json\")\n self.finetune_data = self.load_finetune_data()\n\n # load the bert model and train\n self.bert_config = self.config.bert\n self.bert_pretrained_path = self.bert_config.pretrained_path\n self.bert_finetuned_path = os.path.join(self.output_path, \"bert\")\n self.bert_resume_training = self.bert_config.resume_training\n self.bert_synonym_classifier = None\n self.best_checkpoint = None\n if self.name == \"bertmap\":\n self.bert_synonym_classifier = self.load_bert_synonym_classifier()\n # train if the loaded classifier is not in eval mode\n if self.bert_synonym_classifier.eval_mode == False:\n self.logger.info(\n f\"Data statistics:\\n \\\n{print_dict(self.bert_synonym_classifier.data_stat)}\"\n )\n self.bert_synonym_classifier.train(self.bert_resume_training)\n # turn on eval mode after training\n self.bert_synonym_classifier.eval()\n # NOTE potential 
redundancy here: after training, load the best checkpoint\n self.best_checkpoint = self.load_best_checkpoint()\n if not self.best_checkpoint:\n raise RuntimeError(f\"No best checkpoint found for the BERT synonym classifier model.\")\n self.logger.info(f\"Fine-tuning finished, found best checkpoint at {self.best_checkpoint}.\")\n else:\n self.logger.info(f\"No training needed; skip BERT fine-tuning.\")\n\n # pretty progress bar tracking\n self.enlighten_status = self.enlighten_manager.status_bar(\n status_format=u'Global Matching{fill}Stage: {demo}{fill}{elapsed}',\n color='bold_underline_bright_white_on_lightslategray',\n justify=enlighten.Justify.CENTER, demo='Initializing',\n autorefresh=True, min_delta=0.5\n )\n\n # mapping predictions\n self.global_matching_config = self.config.global_matching\n\n # build ignored class index for OAEI\n self.ignored_class_index = None \n if self.global_matching_config.for_oaei:\n self.ignored_class_index = defaultdict(lambda: False)\n for src_class_iri, src_class in self.src_onto.owl_classes.items():\n use_in_alignment = self.src_onto.get_annotations(src_class, \"http://oaei.ontologymatching.org/bio-ml/ann/use_in_alignment\")\n if use_in_alignment and str(use_in_alignment[0]).lower() == \"false\":\n self.ignored_class_index[src_class_iri] = True\n for tgt_class_iri, tgt_class in self.tgt_onto.owl_classes.items():\n use_in_alignment = self.tgt_onto.get_annotations(tgt_class, \"http://oaei.ontologymatching.org/bio-ml/ann/use_in_alignment\")\n if use_in_alignment and str(use_in_alignment[0]).lower() == \"false\":\n self.ignored_class_index[tgt_class_iri] = True\n\n self.mapping_predictor = MappingPredictor(\n output_path=self.output_path,\n tokenizer_path=self.bert_config.pretrained_path,\n src_annotation_index=self.src_annotation_index,\n tgt_annotation_index=self.tgt_annotation_index,\n bert_synonym_classifier=self.bert_synonym_classifier,\n num_raw_candidates=self.global_matching_config.num_raw_candidates,\n num_best_predictions=self.global_matching_config.num_best_predictions,\n batch_size_for_prediction=self.bert_config.batch_size_for_prediction,\n logger=self.logger,\n enlighten_manager=self.enlighten_manager,\n enlighten_status=self.enlighten_status,\n ignored_class_index=self.ignored_class_index,\n )\n self.mapping_refiner = None\n\n # if global matching is disabled (potentially used for class pair scoring)\n if self.config.global_matching.enabled:\n self.mapping_predictor.mapping_prediction() # mapping prediction\n if self.name == \"bertmap\":\n self.mapping_refiner = MappingRefiner(\n output_path=self.output_path,\n src_onto=self.src_onto,\n tgt_onto=self.tgt_onto,\n mapping_predictor=self.mapping_predictor,\n mapping_extension_threshold=self.global_matching_config.mapping_extension_threshold,\n mapping_filtered_threshold=self.global_matching_config.mapping_filtered_threshold,\n logger=self.logger,\n enlighten_manager=self.enlighten_manager,\n enlighten_status=self.enlighten_status\n )\n self.mapping_refiner.mapping_extension() # mapping extension\n self.mapping_refiner.mapping_repair() # mapping repair\n self.enlighten_status.update(demo=\"Finished\") \n else:\n self.enlighten_status.update(demo=\"Skipped\") \n\n self.enlighten_status.close()\n
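A minimal sketch of running the whole pipeline (the ontology and configuration file paths are placeholders and the JVM memory setting is illustrative):
from deeponto import init_jvm\ninit_jvm('8g')  # start the JVM before loading any ontology\nfrom deeponto.onto import Ontology\nfrom deeponto.align.bertmap import BERTMapPipeline\n\nconfig = BERTMapPipeline.load_bertmap_config('config.yaml')  # or load_bertmap_config() for the default\nsrc_onto = Ontology('src_onto.owl')\ntgt_onto = Ontology('tgt_onto.owl')\nBERTMapPipeline(src_onto, tgt_onto, config)  # corpus construction, fine-tuning, and matching all run inside __init__\n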
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.pipeline.BERTMapPipeline.load_or_construct","title":"load_or_construct(data_file, data_name, construct_func, *args, **kwargs)
","text":"Load existing data or construct a new one.
An auxiliary function that checks the existence of a data file and loads it if it exists. Otherwise, it constructs new data with the input construct_func
, which is supposed to generate a local data file.
src/deeponto/align/bertmap/pipeline.py
def load_or_construct(self, data_file: str, data_name: str, construct_func: Callable, *args, **kwargs):\n\"\"\"Load existing data or construct a new one.\n\n An auxlirary function that checks the existence of a data file and loads it if it exists.\n Otherwise, construct new data with the input `construct_func` which is supported generate\n a local data file.\n \"\"\"\n if os.path.exists(data_file):\n self.logger.info(f\"Load existing {data_name} from {data_file}.\")\n else:\n self.logger.info(f\"Construct new {data_name} and save at {data_file}.\")\n construct_func(*args, **kwargs)\n # load the data file that is supposed to be saved locally\n return load_file(data_file)\n
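For instance, a simplified sketch of this pattern (bertmap denotes a hypothetical BERTMapPipeline instance and construct is a hypothetical closure that builds and saves the data file):
def construct():\n    ...  # build the data and save it to 'data.json'\n\ndata = bertmap.load_or_construct('data.json', 'my data', construct)\n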
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.pipeline.BERTMapPipeline.load_text_semantics_corpora","title":"load_text_semantics_corpora()
","text":"Load or construct text semantics corpora.
See TextSemanticsCorpora
.
src/deeponto/align/bertmap/pipeline.py
def load_text_semantics_corpora(self):\n\"\"\"Load or construct text semantics corpora.\n\n See [`TextSemanticsCorpora`][deeponto.align.bertmap.text_semantics.TextSemanticsCorpora].\n \"\"\"\n data_name = \"text semantics corpora\"\n\n if self.name == \"bertmap\":\n\n def construct():\n corpora = TextSemanticsCorpora(\n src_onto=self.src_onto,\n tgt_onto=self.tgt_onto,\n annotation_property_iris=self.annotation_property_iris,\n class_mappings=self.known_mappings,\n auxiliary_ontos=self.auxiliary_ontos,\n )\n self.logger.info(str(corpora))\n corpora.save(self.data_path)\n\n return self.load_or_construct(self.corpora_path, data_name, construct)\n\n self.logger.info(f\"No training needed; skip the construction of {data_name}.\")\n return None\n
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.pipeline.BERTMapPipeline.load_finetune_data","title":"load_finetune_data()
","text":"Load or construct fine-tuning data from text semantics corpora.
Steps of constructing fine-tuning data from text semantics:
src/deeponto/align/bertmap/pipeline.py
def load_finetune_data(self):\nr\"\"\"Load or construct fine-tuning data from text semantics corpora.\n\n Steps of constructing fine-tuning data from text semantics:\n\n 1. Mix synonym and nonsynonym data.\n 2. Randomly sample 90% as training samples and 10% as validation.\n \"\"\"\n data_name = \"fine-tuning data\"\n\n if self.name == \"bertmap\":\n\n def construct():\n finetune_data = dict()\n samples = self.corpora[\"synonyms\"] + self.corpora[\"nonsynonyms\"]\n random.shuffle(samples)\n split_index = int(0.9 * len(samples)) # split at 90%\n finetune_data[\"training\"] = samples[:split_index]\n finetune_data[\"validation\"] = samples[split_index:]\n save_file(finetune_data, self.finetune_data_path)\n\n return self.load_or_construct(self.finetune_data_path, data_name, construct)\n\n self.logger.info(f\"No training needed; skip the construction of {data_name}.\")\n return None\n
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.pipeline.BERTMapPipeline.load_bert_synonym_classifier","title":"load_bert_synonym_classifier()
","text":"Load the BERT model from a pre-trained or a local checkpoint.
bert-uncased
.eval
mode for mapping predictions.self.bert_resume_training
is True
, it will be loaded from the latest saved checkpoint.src/deeponto/align/bertmap/pipeline.py
def load_bert_synonym_classifier(self):\n\"\"\"Load the BERT model from a pre-trained or a local checkpoint.\n\n - If loaded from pre-trained, it means to start training from a pre-trained model such as `bert-uncased`.\n - If loaded from local, turn on the `eval` mode for mapping predictions.\n - If `self.bert_resume_training` is `True`, it will be loaded from the latest saved checkpoint.\n \"\"\"\n checkpoint = self.load_best_checkpoint() # load the best checkpoint or nothing\n eval_mode = True\n # if no checkpoint has been found, start training from scratch OR resume training\n # no point to load the best checkpoint if resume training (will automatically search for the latest checkpoint)\n if not checkpoint or self.bert_resume_training:\n checkpoint = self.bert_pretrained_path\n eval_mode = False # since it is for training now\n\n return BERTSynonymClassifier(\n loaded_path=checkpoint,\n output_path=self.bert_finetuned_path,\n eval_mode=eval_mode,\n max_length_for_input=self.bert_config.max_length_for_input,\n num_epochs_for_training=self.bert_config.num_epochs_for_training,\n batch_size_for_training=self.bert_config.batch_size_for_training,\n batch_size_for_prediction=self.bert_config.batch_size_for_prediction,\n training_data=self.finetune_data[\"training\"],\n validation_data=self.finetune_data[\"validation\"],\n )\n
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.pipeline.BERTMapPipeline.load_best_checkpoint","title":"load_best_checkpoint()
","text":"Find the best checkpoint by searching for trainer states in each checkpoint file.
Source code insrc/deeponto/align/bertmap/pipeline.py
def load_best_checkpoint(self) -> Optional[str]:\n\"\"\"Find the best checkpoint by searching for trainer states in each checkpoint file.\"\"\"\n best_checkpoint = -1\n\n if os.path.exists(self.bert_finetuned_path):\n for file in os.listdir(self.bert_finetuned_path):\n # load trainer states from each checkpoint file\n if file.startswith(\"checkpoint\"):\n trainer_state = load_file(\n os.path.join(self.bert_finetuned_path, file, \"trainer_state.json\")\n )\n checkpoint = int(trainer_state[\"best_model_checkpoint\"].split(\"/\")[-1].split(\"-\")[-1])\n # find the latest best checkpoint\n if checkpoint > best_checkpoint:\n best_checkpoint = checkpoint\n\n if best_checkpoint == -1:\n best_checkpoint = None\n else:\n best_checkpoint = os.path.join(self.bert_finetuned_path, f\"checkpoint-{best_checkpoint}\")\n\n return best_checkpoint\n
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.pipeline.BERTMapPipeline.load_bertmap_config","title":"load_bertmap_config(config_file=None)
staticmethod
","text":"Load the BERTMap configuration in .yaml
. If the file is not provided, use the default configuration.
src/deeponto/align/bertmap/pipeline.py
@staticmethod\ndef load_bertmap_config(config_file: Optional[str] = None):\n\"\"\"Load the BERTMap configuration in `.yaml`. If the file\n is not provided, use the default configuration.\n \"\"\"\n if not config_file:\n config_file = DEFAULT_CONFIG_FILE\n print(f\"Use the default configuration at {DEFAULT_CONFIG_FILE}.\") \n if not config_file.endswith(\".yaml\"):\n raise RuntimeError(\"Configuration file should be in `yaml` format.\")\n return CfgNode(load_file(config_file))\n
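For example, one may start from the default configuration and override a few fields before running the pipeline (a sketch; the overridden values are illustrative):
from deeponto.align.bertmap import BERTMapPipeline\n\nconfig = BERTMapPipeline.load_bertmap_config()  # default configuration\nconfig.output_path = './bertmap_exp'\nconfig.annotation_property_iris.append('http://www.w3.org/2004/02/skos/core#prefLabel')\nBERTMapPipeline.save_bertmap_config(config, './config.saved.yaml')\n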
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.pipeline.BERTMapPipeline.save_bertmap_config","title":"save_bertmap_config(config, config_file)
staticmethod
","text":"Save the BERTMap configuration in .yaml
.
src/deeponto/align/bertmap/pipeline.py
@staticmethod\ndef save_bertmap_config(config: CfgNode, config_file: str):\n\"\"\"Save the BERTMap configuration in `.yaml`.\"\"\"\n with open(config_file, \"w\") as c:\n config.dump(stream=c, sort_keys=False, default_flow_style=False)\n
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.text_semantics.AnnotationThesaurus","title":"AnnotationThesaurus(onto, annotation_property_iris, apply_transitivity=False)
","text":"A thesaurus class for synonyms and non-synonyms extracted from an ontology.
Some related definitions of arguments here:
synonym_group
is a set of annotation phrases that are synonymous to each other;transitivity
of synonyms means if A and B are synonymous and B and C are synonymous, then A and C are synonymous. This is achieved by a connected graph-based algorithm.synonym_pair
is a pair of synonymous annotation phrases which can be extracted from the cartesian product of a synonym_group
and itself. NOTE that reflexivity and symmetry are preserved meaning that (i) every phrase A is a synonym of itself and (ii) if (A, B) is a synonym pair then (B, A) is a synonym pair, too.Attributes:
Name Type Descriptiononto
Ontology
An ontology to construct the annotation thesaurus from.
annotation_index
Dict[str, Set[str]]
An index of the class annotations with (class_iri, annotations)
pairs.
annotation_property_iris
List[str]
A list of annotation property IRIs used to extract the annotations.
average_number_of_annotations_per_class
int
The average number of (extracted) annotations per ontology class.
apply_transitivity
bool
Apply synonym transitivity to merge synonym groups or not.
synonym_groups
List[Set[str]]
The list of synonym groups extracted from the ontology according to specified annotation properties.
Parameters:
Name Type Description Defaultonto
Ontology
The input ontology to extract annotations from.
requiredannotation_property_iris
List[str]
Specify which annotation properties to be used.
requiredapply_transitivity
bool
Apply synonym transitivity to merge synonym groups or not. Defaults to False
.
False
Source code in src/deeponto/align/bertmap/text_semantics.py
def __init__(self, onto: Ontology, annotation_property_iris: List[str], apply_transitivity: bool = False):\nr\"\"\"Initialise a thesaurus for ontology class annotations.\n\n Args:\n onto (Ontology): The input ontology to extract annotations from.\n annotation_property_iris (List[str]): Specify which annotation properties to be used.\n apply_transitivity (bool, optional): Apply synonym transitivity to merge synonym groups or not. Defaults to `False`.\n \"\"\"\n\n self.onto = onto\n # build the annotation index to extract synonyms from `onto`\n # the input property iris may not exist in this ontology\n # the output property iris will be truncated to the existing ones\n index, iris = self.onto.build_annotation_index(\n annotation_property_iris=annotation_property_iris,\n entity_type=\"Classes\",\n apply_lowercasing=True,\n )\n self.annotation_index = index\n self.annotation_property_iris = iris\n total_number_of_annotations = sum([len(v) for v in self.annotation_index.values()])\n self.average_number_of_annotations_per_class = total_number_of_annotations / len(self.annotation_index)\n\n # synonym groups\n self.apply_transitivity = apply_transitivity\n self.synonym_groups = list(self.annotation_index.values())\n if self.apply_transitivity:\n self.synonym_groups = self.merge_synonym_groups_by_transitivity(self.synonym_groups)\n\n # summary\n self.info = {\n type(self).__name__: {\n \"ontology\": self.onto.info[type(self.onto).__name__],\n \"average_number_of_annotations_per_class\": round(self.average_number_of_annotations_per_class, 3),\n \"number_of_synonym_groups\": len(self.synonym_groups),\n }\n }\n
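A minimal sketch of building a thesaurus and sampling positive and negative pairs (the ontology path is a placeholder and rdfs:label is used as the only annotation property):
from deeponto.onto import Ontology\nfrom deeponto.align.bertmap.text_semantics import AnnotationThesaurus\n\nonto = Ontology('onto.owl')\nthesaurus = AnnotationThesaurus(onto, ['http://www.w3.org/2000/01/rdf-schema#label'])\nsynonyms = thesaurus.synonym_sampling()  # all unique synonym pairs\nsoft_negatives = thesaurus.soft_nonsynonym_sampling(2 * len(synonyms))\nhard_negatives = thesaurus.hard_nonsynonym_sampling(2 * len(synonyms))\n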
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.text_semantics.AnnotationThesaurus.get_synonym_pairs","title":"get_synonym_pairs(synonym_group, remove_duplicates=True)
staticmethod
","text":"Get synonym pairs from a synonym group through a cartesian product.
Parameters:
Name Type Description Defaultsynonym_group
Set[str]
A set of annotation phrases that are synonymous to each other.
requiredReturns:
Type DescriptionList[Tuple[str, str]]
A list of synonym pairs.
Source code insrc/deeponto/align/bertmap/text_semantics.py
@staticmethod\ndef get_synonym_pairs(synonym_group: Set[str], remove_duplicates: bool = True):\n\"\"\"Get synonym pairs from a synonym group through a cartesian product.\n\n Args:\n synonym_group (Set[str]): A set of annotation phrases that are synonymous to each other.\n\n Returns:\n (List[Tuple[str, str]]): A list of synonym pairs.\n \"\"\"\n synonym_pairs = list(itertools.product(synonym_group, synonym_group))\n if remove_duplicates:\n return uniqify(synonym_pairs)\n else:\n return synonym_pairs\n
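For example (a toy illustration; the order of pairs may differ because the input is a set):
from deeponto.align.bertmap.text_semantics import AnnotationThesaurus\n\npairs = AnnotationThesaurus.get_synonym_pairs({'hand', 'manus'})\n# e.g., [('hand', 'hand'), ('hand', 'manus'), ('manus', 'hand'), ('manus', 'manus')]\n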
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.text_semantics.AnnotationThesaurus.merge_synonym_groups_by_transitivity","title":"merge_synonym_groups_by_transitivity(synonym_groups)
staticmethod
","text":"Merge synonym groups by transitivity.
Synonym groups that share a common annotation phrase will be merged. NOTE that for multiple ontologies, we can merge their synonym groups by first concatenating them and then using this function.
Note
In \\(\\textsf{BERTMap}\\) experiments we have considered this as a data augmentation approach but it does not bring a significant performance improvement. However, if the overall number of annotations is not large enough then this could be a good option.
Parameters:
Name Type Description Defaultsynonym_groups
List[Set[str]]
A sequence of synonym groups to be merged.
requiredReturns:
Type DescriptionList[Set[str]]
A list of merged synonym groups.
Source code insrc/deeponto/align/bertmap/text_semantics.py
@staticmethod\ndef merge_synonym_groups_by_transitivity(synonym_groups: List[Set[str]]):\nr\"\"\"Merge synonym groups by transitivity.\n\n Synonym groups that share a common annotation phrase will be merged. NOTE that for\n multiple ontologies, we can merge their synonym groups by first concatenating them\n then use this function.\n\n !!! note\n\n In $\\textsf{BERTMap}$ experiments we have considered this as a data augmentation approach\n but it does not bring a significant performance improvement. However, if the\n overall number of annotations is not large enough then this could be a good option.\n\n Args:\n synonym_groups (List[Set[str]]): A sequence of synonym groups to be merged.\n\n Returns:\n (List[Set[str]]): A list of merged synonym groups.\n \"\"\"\n synonym_pairs = []\n for synonym_group in synonym_groups:\n # gather synonym pairs from the self-product of a synonym group\n synonym_pairs += AnnotationThesaurus.get_synonym_pairs(synonym_group, remove_duplicates=False)\n synonym_pairs = uniqify(synonym_pairs)\n merged_grouped_synonyms = AnnotationThesaurus.connected_labels(synonym_pairs)\n return merged_grouped_synonyms\n
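A toy illustration of the intended behaviour: groups sharing a common phrase collapse into one, while disjoint groups stay separate.
from deeponto.align.bertmap.text_semantics import AnnotationThesaurus\n\ngroups = [{'hand', 'manus'}, {'manus', 'paw'}, {'tail'}]\nmerged = AnnotationThesaurus.merge_synonym_groups_by_transitivity(groups)\n# expected: one group {'hand', 'manus', 'paw'} and one group {'tail'}\n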
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.text_semantics.AnnotationThesaurus.connected_annotations","title":"connected_annotations(synonym_pairs)
staticmethod
","text":"Build a graph for adjacency among the class annotations (labels) such that the transitivity of synonyms is ensured.
Auxiliary function for merge_synonym_groups_by_transitivity
.
Parameters:
Name Type Description Defaultsynonym_pairs
List[Tuple[str, str]]
List of pairs of phrases that are synonymous.
requiredReturns:
Type DescriptionList[Set[str]]
A list of synonym groups.
Source code insrc/deeponto/align/bertmap/text_semantics.py
@staticmethod\ndef connected_annotations(synonym_pairs: List[Tuple[str, str]]):\n\"\"\"Build a graph for adjacency among the class annotations (labels) such that\n the **transitivity** of synonyms is ensured.\n\n Auxiliary function for [`merge_synonym_groups_by_transitivity`][deeponto.align.bertmap.text_semantics.AnnotationThesaurus.merge_synonym_groups_by_transitivity].\n\n Args:\n synonym_pairs (List[Tuple[str, str]]): List of pairs of phrases that are synonymous.\n\n Returns:\n (List[Set[str]]): A list of synonym groups.\n \"\"\"\n graph = nx.Graph()\n graph.add_edges_from(synonym_pairs)\n # nx.draw(G, with_labels = True)\n connected = list(nx.connected_components(graph))\n return connected\n
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.text_semantics.AnnotationThesaurus.synonym_sampling","title":"synonym_sampling(num_samples=None)
","text":"Sample synonym pairs from a list of synonym groups extracted from the input ontology.
According to the \\(\\textsf{BERTMap}\\) paper, synonyms are defined as label pairs that belong to the same ontology class.
NOTE this has been validated for getting the same results as in the original \\(\\textsf{BERTMap}\\) repository.
Parameters:
Name Type Description Defaultnum_samples
int
The (maximum) number of unique samples extracted. Defaults to None
.
None
Returns:
Type DescriptionList[Tuple[str, str]]
A list of unique synonym pair samples.
Source code insrc/deeponto/align/bertmap/text_semantics.py
def synonym_sampling(self, num_samples: Optional[int] = None):\nr\"\"\"Sample synonym pairs from a list of synonym groups extracted from the input ontology.\n\n According to the $\\textsf{BERTMap}$ paper, **synonyms** are defined as label pairs that belong\n to the same ontology class.\n\n NOTE this has been validated for getting the same results as in the original $\\textsf{BERTMap}$ repository.\n\n Args:\n num_samples (int, optional): The (maximum) number of **unique** samples extracted. Defaults to `None`.\n\n Returns:\n (List[Tuple[str, str]]): A list of unique synonym pair samples.\n \"\"\"\n synonym_pool = []\n for synonym_group in self.synonym_groups:\n # do not remove duplicates in the loop to save time\n synonym_pairs = self.get_synonym_pairs(synonym_group, remove_duplicates=False)\n synonym_pool += synonym_pairs\n # remove duplicates afer the loop\n synonym_pool = uniqify(synonym_pool)\n\n if (not num_samples) or (num_samples >= len(synonym_pool)):\n # print(\"Return all synonym pairs without downsampling.\")\n return synonym_pool\n else:\n return random.sample(synonym_pool, num_samples)\n
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.text_semantics.AnnotationThesaurus.soft_nonsynonym_sampling","title":"soft_nonsynonym_sampling(num_samples, max_iter=5)
","text":"Sample soft non-synonyms from a list of synonym groups extracted from the input ontology.
According to the \\(\\textsf{BERTMap}\\) paper, soft non-synonyms are defined as label pairs from two different synonym groups that are randomly selected.
Parameters:
Name Type Description Defaultnum_samples
int
The (maximum) number of unique samples extracted; this is required unlike for synonym sampling because the non-synonym pool is significantly larger (considering random combinations of different synonym groups).
requiredmax_iter
int
The maximum number of iterations for conducting sampling. Defaults to 5
.
5
Returns:
Type DescriptionList[Tuple[str, str]]
A list of unique (soft) non-synonym pair samples.
Source code insrc/deeponto/align/bertmap/text_semantics.py
def soft_nonsynonym_sampling(self, num_samples: int, max_iter: int = 5):\nr\"\"\"Sample **soft** non-synonyms from a list of synonym groups extracted from the input ontology.\n\n According to the $\\textsf{BERTMap}$ paper, **soft non-synonyms** are defined as label pairs\n from two *different* synonym groups that are **randomly** selected.\n\n Args:\n num_samples (int): The (maximum) number of **unique** samples extracted; this is\n required **unlike for synonym sampling** because the non-synonym pool is **significantly\n larger** (considering random combinations of different synonym groups).\n max_iter (int): The maximum number of iterations for conducting sampling. Defaults to `5`.\n\n Returns:\n (List[Tuple[str, str]]): A list of unique (soft) non-synonym pair samples.\n \"\"\"\n nonsyonym_pool = []\n # randomly select disjoint synonym group pairs from all\n for _ in range(num_samples):\n left_synonym_group, right_synonym_group = tuple(random.sample(self.synonym_groups, 2))\n try:\n # randomly choose one label from a synonym group\n left_label = random.choice(list(left_synonym_group))\n right_label = random.choice(list(right_synonym_group))\n nonsyonym_pool.append((left_label, right_label))\n except:\n # skip if there are no class labels\n continue\n\n # DataUtils.uniqify is too slow so we should avoid operating it too often\n nonsyonym_pool = uniqify(nonsyonym_pool)\n\n while len(nonsyonym_pool) < num_samples and max_iter > 0:\n max_iter = max_iter - 1 # reduce the iteration to prevent exhausting loop\n nonsyonym_pool += self.soft_nonsynonym_sampling(num_samples - len(nonsyonym_pool), max_iter)\n nonsyonym_pool = uniqify(nonsyonym_pool)\n\n return nonsyonym_pool\n
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.text_semantics.AnnotationThesaurus.weighted_random_choices_of_sibling_groups","title":"weighted_random_choices_of_sibling_groups(k=1)
","text":"Randomly (weighted) select a number of sibling class groups.
The weights are computed according to the sizes of the sibling class groups.
Source code insrc/deeponto/align/bertmap/text_semantics.py
def weighted_random_choices_of_sibling_groups(self, k: int = 1):\n\"\"\"Randomly (weighted) select a number of sibling class groups.\n\n The weights are computed according to the sizes of the sibling class groups.\n \"\"\"\n weights = [len(s) for s in self.onto.sibling_class_groups]\n weights = [w / sum(weights) for w in weights] # normalised\n return random.choices(self.onto.sibling_class_groups, weights=weights, k=k)\n
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.text_semantics.AnnotationThesaurus.hard_nonsynonym_sampling","title":"hard_nonsynonym_sampling(num_samples, max_iter=5)
","text":"Sample hard non-synonyms from sibling classes of the input ontology.
According to the \\(\\textsf{BERTMap}\\) paper, hard non-synonyms are defined as label pairs that belong to two disjoint ontology classes. For practical reason, the condition is eased to two sibling ontology classes.
Parameters:
Name Type Description Defaultnum_samples
int
The (maximum) number of unique samples extracted; this is required unlike for synonym sampling because the non-synonym pool is significantly larger (considering random combinations of different synonym groups).
requiredmax_iter
int
The maximum number of iterations for conducting sampling. Defaults to 5
.
5
Returns:
Type DescriptionList[Tuple[str, str]]
A list of unique (hard) non-synonym pair samples.
Source code insrc/deeponto/align/bertmap/text_semantics.py
def hard_nonsynonym_sampling(self, num_samples: int, max_iter: int = 5):\nr\"\"\"Sample **hard** non-synonyms from sibling classes of the input ontology.\n\n According to the $\\textsf{BERTMap}$ paper, **hard non-synonyms** are defined as label pairs\n that belong to two **disjoint** ontology classes. For practical reason, the condition\n is eased to two **sibling** ontology classes.\n\n Args:\n num_samples (int): The (maximum) number of **unique** samples extracted; this is\n required **unlike for synonym sampling** because the non-synonym pool is **significantly\n larger** (considering random combinations of different synonym groups).\n max_iter (int): The maximum number of iterations for conducting sampling. Defaults to `5`.\n\n Returns:\n (List[Tuple[str, str]]): A list of unique (hard) non-synonym pair samples.\n \"\"\"\n # intialise the sibling class groups\n self.onto.sibling_class_groups\n\n if not self.onto.sibling_class_groups:\n warnings.warn(\"Skip hard negative sampling as no sibling class groups are defined.\")\n return []\n\n # flatten the disjointness groups into all pairs of hard neagtives\n nonsynonym_pool = []\n # randomly (weighted) select a number of sibling class groups with replacement\n sibling_class_groups = self.weighted_random_choices_of_sibling_groups(k=num_samples)\n\n for sibling_class_group in sibling_class_groups:\n # random select two sibling classes; no weights this time\n left_class_iri, right_class_iri = tuple(random.sample(sibling_class_group, 2))\n try:\n # random select a label for each of them\n left_label = random.choice(list(self.annotation_index[left_class_iri]))\n right_label = random.choice(list(self.annotation_index[right_class_iri]))\n # add the label pair to the pool\n nonsynonym_pool.append((left_label, right_label))\n except:\n # skip them if there are no class labels\n continue\n\n # DataUtils.uniqify is too slow so we should avoid operating it too often\n nonsynonym_pool = uniqify(nonsynonym_pool)\n\n while len(nonsynonym_pool) < num_samples and max_iter > 0:\n max_iter = max_iter - 1 # reduce the iteration to prevent exhausting loop\n nonsynonym_pool += self.hard_nonsynonym_sampling(num_samples - len(nonsynonym_pool), max_iter)\n nonsynonym_pool = uniqify(nonsynonym_pool)\n\n return nonsynonym_pool\n
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.text_semantics.IntraOntologyTextSemanticsCorpus","title":"IntraOntologyTextSemanticsCorpus(onto, annotation_property_iris, soft_negative_ratio=2, hard_negative_ratio=2)
","text":"Class for creating the intra-ontology text semantics corpus from an ontology.
As defined in the \\(\\textsf{BERTMap}\\) paper, the intra-ontology text semantics corpus consists of synonym and non-synonym pairs extracted from the ontology class annotations.
Attributes:
Name Type Descriptiononto
Ontology
An ontology to construct the intra-ontology text semantics corpus from.
annotation_property_iris
List[str]
Specify which annotation properties to be used.
soft_negative_ratio
int
The expected negative sample ratio of the soft non-synonyms to the extracted synonyms. Defaults to 2
.
hard_negative_ratio
int
The expected negative sample ratio of the hard non-synonyms to the extracted synonyms. Defaults to 2
. However, hard non-synonyms are sometimes insufficient given an ontology's hierarchy, the soft ones are used to compensate the number in this case.
src/deeponto/align/bertmap/text_semantics.py
def __init__(\n self,\n onto: Ontology,\n annotation_property_iris: List[str],\n soft_negative_ratio: int = 2,\n hard_negative_ratio: int = 2,\n):\n self.onto = onto\n # $\\textsf{BERTMap}$ does not apply synonym transitivity\n self.thesaurus = AnnotationThesaurus(onto, annotation_property_iris, apply_transitivity=False)\n\n self.synonyms = self.thesaurus.synonym_sampling()\n # sample hard negatives first as they might not be enough\n num_hard = hard_negative_ratio * len(self.synonyms)\n self.hard_nonsynonyms = self.thesaurus.hard_nonsynonym_sampling(num_hard)\n # compensate the number of hard negatives as soft negatives are almost always available\n num_soft = (soft_negative_ratio + hard_negative_ratio) * len(self.synonyms) - len(self.hard_nonsynonyms)\n self.soft_nonsynonyms = self.thesaurus.soft_nonsynonym_sampling(num_soft)\n\n self.info = {\n type(self).__name__: {\n \"num_synonyms\": len(self.synonyms),\n \"num_nonsynonyms\": len(self.soft_nonsynonyms) + len(self.hard_nonsynonyms),\n \"num_soft_nonsynonyms\": len(self.soft_nonsynonyms),\n \"num_hard_nonsynonyms\": len(self.hard_nonsynonyms),\n \"annotation_thesaurus\": self.thesaurus.info[\"AnnotationThesaurus\"],\n }\n }\n
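A minimal usage sketch (onto is assumed to be a loaded Ontology and the annotation property IRI is illustrative):
from deeponto.align.bertmap.text_semantics import IntraOntologyTextSemanticsCorpus\n\ncorpus = IntraOntologyTextSemanticsCorpus(onto, ['http://www.w3.org/2000/01/rdf-schema#label'])\ncorpus.save('./data')  # writes intra-onto.corpus.json under ./data\nprint(corpus.info)  # summary of sampled synonyms and non-synonyms\n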
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.text_semantics.IntraOntologyTextSemanticsCorpus.save","title":"save(save_path)
","text":"Save the intra-ontology corpus (a .json
file for label pairs and its summary) in the specified directory.
src/deeponto/align/bertmap/text_semantics.py
def save(self, save_path: str):\n\"\"\"Save the intra-ontology corpus (a `.json` file for label pairs\n and its summary) in the specified directory.\n \"\"\"\n create_path(save_path)\n save_json = {\n \"summary\": self.info,\n \"synonyms\": [(pos[0], pos[1], 1) for pos in self.synonyms],\n \"nonsynonyms\": [(neg[0], neg[1], 0) for neg in self.soft_nonsynonyms + self.hard_nonsynonyms],\n }\n save_file(save_json, os.path.join(save_path, \"intra-onto.corpus.json\"))\n
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.text_semantics.CrossOntologyTextSemanticsCorpus","title":"CrossOntologyTextSemanticsCorpus(class_mappings, src_onto, tgt_onto, annotation_property_iris, negative_ratio=4)
","text":"Class for creating the cross-ontology text semantics corpus from two ontologies and provided mappings between them.
As defined in the \\(\\textsf{BERTMap}\\) paper, the cross-ontology text semantics corpus consists of synonym and non-synonym pairs extracted from the annotations/labels of class pairs involved in the provided cross-ontology mappigns.
Attributes:
Name Type Descriptionclass_mappings
List[ReferenceMapping]
A list of cross-ontology class mappings.
src_onto
Ontology
The source ontology whose class IRIs are heads of the class_mappings
.
tgt_onto
Ontology
The target ontology whose class IRIs are tails of the class_mappings
.
annotation_property_iris
List[str]
A list of annotation property IRIs used to extract the annotations.
negative_ratio
int
The expected negative sample ratio of the non-synonyms to the extracted synonyms. Defaults to 4
. NOTE that we do not have hard non-synonyms at the cross-ontology level.
src/deeponto/align/bertmap/text_semantics.py
def __init__(\n self,\n class_mappings: List[ReferenceMapping],\n src_onto: Ontology,\n tgt_onto: Ontology,\n annotation_property_iris: List[str],\n negative_ratio: int = 4,\n):\n self.class_mappings = class_mappings\n self.src_onto = src_onto\n self.tgt_onto = tgt_onto\n # build the annotation thesaurus for each ontology\n self.src_thesaurus = AnnotationThesaurus(src_onto, annotation_property_iris)\n self.tgt_thesaurus = AnnotationThesaurus(tgt_onto, annotation_property_iris)\n self.negative_ratio = negative_ratio\n\n self.synonyms = self.synonym_sampling_from_mappings()\n num_negative = negative_ratio * len(self.synonyms)\n self.nonsynonyms = self.nonsynonym_sampling_from_mappings(num_negative)\n\n self.info = {\n type(self).__name__: {\n \"num_synonyms\": len(self.synonyms),\n \"num_nonsynonyms\": len(self.nonsynonyms),\n \"num_mappings\": len(self.class_mappings),\n \"src_annotation_thesaurus\": self.src_thesaurus.info[\"AnnotationThesaurus\"],\n \"tgt_annotation_thesaurus\": self.tgt_thesaurus.info[\"AnnotationThesaurus\"],\n }\n }\n
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.text_semantics.CrossOntologyTextSemanticsCorpus.save","title":"save(save_path)
","text":"Save the cross-ontology corpus (a .json
file for label pairs and its summary) in the specified directory.
src/deeponto/align/bertmap/text_semantics.py
def save(self, save_path: str):\n\"\"\"Save the cross-ontology corpus (a `.json` file for label pairs\n and its summary) in the specified directory.\n \"\"\"\n create_path(save_path)\n save_json = {\n \"summary\": self.info,\n \"synonyms\": [(pos[0], pos[1], 1) for pos in self.synonyms],\n \"nonsynonyms\": [(neg[0], neg[1], 0) for neg in self.nonsynonyms],\n }\n save_file(save_json, os.path.join(save_path, \"cross-onto.corpus.json\"))\n
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.text_semantics.CrossOntologyTextSemanticsCorpus.synonym_sampling_from_mappings","title":"synonym_sampling_from_mappings()
","text":"Sample synonyms from cross-ontology class mappings.
Arguments of this method are all class attributes. See CrossOntologyTextSemanticsCorpus
.
According to the \\(\\textsf{BERTMap}\\) paper, cross-ontology synonyms are defined as label pairs that belong to two matched classes. Suppose the class \\(C\\) from the source ontology and the class \\(D\\) from the target ontology are matched according to one of the class_mappings
, then the cartesian product of labels of \\(C\\) and labels of \\(D\\) form cross-ontology synonyms. Note that identity synonyms in the form of \\((a, a)\\) are removed because they have been covered in the intra-ontology case.
Returns:
Type DescriptionList[Tuple[str, str]]
A list of unique synonym pair samples from ontology class mappings.
Source code insrc/deeponto/align/bertmap/text_semantics.py
def synonym_sampling_from_mappings(self):\nr\"\"\"Sample synonyms from cross-ontology class mappings.\n\n Arguments of this method are all class attributes.\n See [`CrossOntologyTextSemanticsCorpus`][deeponto.align.bertmap.text_semantics.CrossOntologyTextSemanticsCorpus].\n\n According to the $\\textsf{BERTMap}$ paper, **cross-ontology synonyms** are defined as label pairs\n that belong to two **matched** classes. Suppose the class $C$ from the source ontology\n and the class $D$ from the target ontology are matched according to one of the `class_mappings`,\n then the cartesian product of labels of $C$ and labels of $D$ form cross-ontology synonyms.\n Note that **identity synonyms** in the form of $(a, a)$ are removed because they have been covered\n in the intra-ontology case.\n\n Returns:\n (List[Tuple[str, str]]): A list of unique synonym pair samples from ontology class mappings.\n \"\"\"\n synonym_pool = []\n\n for class_mapping in self.class_mappings:\n src_class_iri, tgt_class_iri = class_mapping.to_tuple()\n src_class_annotations = self.src_thesaurus.annotation_index[src_class_iri]\n tgt_class_annotations = self.tgt_thesaurus.annotation_index[tgt_class_iri]\n synonym_pairs = list(itertools.product(src_class_annotations, tgt_class_annotations))\n # remove the identity synonyms as the have been covered in the intra-ontology case\n synonym_pairs = [(l, r) for l, r in synonym_pairs if l != r]\n backward_synonym_pairs = [(r, l) for l, r in synonym_pairs]\n synonym_pool += synonym_pairs + backward_synonym_pairs\n\n synonym_pool = uniqify(synonym_pool)\n return synonym_pool\n
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.text_semantics.CrossOntologyTextSemanticsCorpus.nonsynonym_sampling_from_mappings","title":"nonsynonym_sampling_from_mappings(num_samples, max_iter=5)
","text":"Sample non-synonyms from cross-ontology class mappings.
Arguments of this method are all class attributes. See CrossOntologyTextSemanticsCorpus
.
According to the \\(\\textsf{BERTMap}\\) paper, cross-ontology non-synonyms are defined as label pairs that belong to two unmatched classes. Assume that the provided class mappings are self-contained in the sense that they are complete for the classes involved in them, then we can randomly sample two cross-ontology classes that are not matched according to the mappings and take their labels as nonsynonyms. In practice, it is quite unlikely to obtain false negatives since the number of incorrect mappings is much larger than the number of correct ones.
Returns:
Type DescriptionList[Tuple[str, str]]
A list of unique nonsynonym pair samples from ontology class mappings.
Source code insrc/deeponto/align/bertmap/text_semantics.py
def nonsynonym_sampling_from_mappings(self, num_samples: int, max_iter: int = 5):\nr\"\"\"Sample non-synonyms from cross-ontology class mappings.\n\n Arguments of this method are all class attributes.\n See [`CrossOntologyTextSemanticsCorpus`][deeponto.align.bertmap.text_semantics.CrossOntologyTextSemanticsCorpus].\n\n According to the $\\textsf{BERTMap}$ paper, **cross-ontology non-synonyms** are defined as label pairs\n that belong to two **unmatched** classes. Assume that the provided class mappings are self-contained\n in the sense that they are complete for the classes involved in them, then we can randomly\n sample two cross-ontology classes that are not matched according to the mappings and take\n their labels as nonsynonyms. In practice, it is quite unlikely to obtain false negatives since\n the number of incorrect mappings is much larger than the number of correct ones.\n\n Returns:\n (List[Tuple[str, str]]): A list of unique nonsynonym pair samples from ontology class mappings.\n \"\"\"\n nonsynonym_pool = []\n\n # form cross-ontology synonym groups\n cross_onto_synonym_group_pair = []\n for class_mapping in self.class_mappings:\n src_class_iri, tgt_class_iri = class_mapping.to_tuple()\n src_class_annotations = self.src_thesaurus.annotation_index[src_class_iri]\n tgt_class_annotations = self.tgt_thesaurus.annotation_index[tgt_class_iri]\n # let each matched class pair's annotations form a synonym group_pair\n cross_onto_synonym_group_pair.append((src_class_annotations, tgt_class_annotations))\n\n # randomly select disjoint synonym group pairs from all\n for _ in range(num_samples):\n left_class_pair, right_class_pair = tuple(random.sample(cross_onto_synonym_group_pair, 2))\n try:\n # randomly choose one label from a synonym group\n left_label = random.choice(list(left_class_pair[0])) # choosing the src side by [0]\n right_label = random.choice(list(right_class_pair[1])) # choosing the tgt side by [1]\n nonsynonym_pool.append((left_label, right_label))\n except:\n # skip if there are no class labels\n continue\n\n # DataUtils.uniqify is too slow so we should avoid operating it too often\n nonsynonym_pool = uniqify(nonsynonym_pool)\n while len(nonsynonym_pool) < num_samples and max_iter > 0:\n max_iter = max_iter - 1 # reduce the iteration to prevent exhausting loop\n nonsynonym_pool += self.nonsynonym_sampling_from_mappings(num_samples - len(nonsynonym_pool), max_iter)\n nonsynonym_pool = uniqify(nonsynonym_pool)\n return nonsynonym_pool\n
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.text_semantics.TextSemanticsCorpora","title":"TextSemanticsCorpora(src_onto, tgt_onto, annotation_property_iris, class_mappings=None, auxiliary_ontos=None)
","text":"Class for creating the collection text semantics corpora.
As defined in the \\(\\textsf{BERTMap}\\) paper, the collection of text semantics corpora contains at least two intra-ontology sub-corpora from the source and target ontologies, respectively. If some class mappings are provided, then a cross-ontology sub-corpus will be created. If some additional auxiliary ontologies are provided, the intra-ontology corpora created from them will serve as the auxiliary sub-corpora.
Attributes:
Name Type Descriptionsrc_onto
Ontology
The source ontology to be matched or aligned.
tgt_onto
Ontology
The target ontology to be matched or aligned.
annotation_property_iris
List[str]
A list of annotation property IRIs used to extract the annotations.
class_mappings
List[ReferenceMapping]
A list of cross-ontology class mappings between the source and the target ontologies. Defaults to None
.
auxiliary_ontos
List[Ontology]
A list of auxiliary ontologies for augmenting more synonym/non-synonym samples. Defaults to None
.
src/deeponto/align/bertmap/text_semantics.py
def __init__(\n self,\n src_onto: Ontology,\n tgt_onto: Ontology,\n annotation_property_iris: List[str],\n class_mappings: Optional[List[ReferenceMapping]] = None,\n auxiliary_ontos: Optional[List[Ontology]] = None,\n):\n self.synonyms = []\n self.nonsynonyms = []\n\n # build intra-ontology corpora\n # negative sample ratios are by default\n self.intra_src_onto_corpus = IntraOntologyTextSemanticsCorpus(src_onto, annotation_property_iris)\n self.add_samples_from_sub_corpus(self.intra_src_onto_corpus)\n self.intra_tgt_onto_corpus = IntraOntologyTextSemanticsCorpus(tgt_onto, annotation_property_iris)\n self.add_samples_from_sub_corpus(self.intra_tgt_onto_corpus)\n\n # build cross-ontolgoy corpora\n self.class_mappings = class_mappings\n self.cross_onto_corpus = None\n if self.class_mappings:\n self.cross_onto_corpus = CrossOntologyTextSemanticsCorpus(\n class_mappings, src_onto, tgt_onto, annotation_property_iris\n )\n self.add_samples_from_sub_corpus(self.cross_onto_corpus)\n\n # build auxiliary ontology corpora (same as intra-ontology)\n self.auxiliary_ontos = auxiliary_ontos\n self.auxiliary_onto_corpora = []\n if self.auxiliary_ontos:\n for auxiliary_onto in self.auxiliary_ontos:\n self.auxiliary_onto_corpora.append(\n IntraOntologyTextSemanticsCorpus(auxiliary_onto, annotation_property_iris)\n )\n for auxiliary_onto_corpus in self.auxiliary_onto_corpora:\n self.add_samples_from_sub_corpus(auxiliary_onto_corpus)\n\n # DataUtils.uniqify the samples\n self.synonyms = uniqify(self.synonyms)\n self.nonsynonyms = uniqify(self.nonsynonyms)\n # remove invalid nonsynonyms\n self.nonsynonyms = list(set(self.nonsynonyms) - set(self.synonyms))\n\n # summary\n self.info = {\n type(self).__name__: {\n \"num_synonyms\": len(self.synonyms),\n \"num_nonsynonyms\": len(self.nonsynonyms),\n \"intra_src_onto_corpus\": self.intra_src_onto_corpus.info[\"IntraOntologyTextSemanticsCorpus\"],\n \"intra_tgt_onto_corpus\": self.intra_tgt_onto_corpus.info[\"IntraOntologyTextSemanticsCorpus\"],\n \"cross_onto_corpus\": self.cross_onto_corpus.info[\"CrossOntologyTextSemanticsCorpus\"]\n if self.cross_onto_corpus\n else None,\n \"auxiliary_onto_corpora\": [\n a.info[\"IntraOntologyTextSemanticsCorpus\"] for a in self.auxiliary_onto_corpora\n ],\n }\n }\n
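A minimal usage sketch combining the sub-corpora for two loaded ontologies (class_mappings and auxiliary_ontos are optional and omitted here; the annotation property IRI is illustrative):
from deeponto.align.bertmap.text_semantics import TextSemanticsCorpora\n\ncorpora = TextSemanticsCorpora(src_onto, tgt_onto, ['http://www.w3.org/2000/01/rdf-schema#label'])\ncorpora.save('./data')  # writes text-semantics.corpora.json under ./data\nprint(len(corpora.synonyms), len(corpora.nonsynonyms))\n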
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.text_semantics.TextSemanticsCorpora.save","title":"save(save_path)
","text":"Save the overall text semantics corpora (a .json
file for label pairs and its summary) in the specified directory.
src/deeponto/align/bertmap/text_semantics.py
def save(self, save_path: str):\n\"\"\"Save the overall text semantics corpora (a `.json` file for label pairs\n and its summary) in the specified directory.\n \"\"\"\n create_path(save_path)\n save_json = {\n \"summary\": self.info,\n \"synonyms\": [(pos[0], pos[1], 1) for pos in self.synonyms],\n \"nonsynonyms\": [(neg[0], neg[1], 0) for neg in self.nonsynonyms],\n }\n save_file(save_json, os.path.join(save_path, \"text-semantics.corpora.json\"))\n
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.text_semantics.TextSemanticsCorpora.add_samples_from_sub_corpus","title":"add_samples_from_sub_corpus(sub_corpus)
","text":"Add synonyms and non-synonyms from each sub-corpus to the overall collection.
Source code insrc/deeponto/align/bertmap/text_semantics.py
def add_samples_from_sub_corpus(\n self, sub_corpus: Union[IntraOntologyTextSemanticsCorpus, CrossOntologyTextSemanticsCorpus]\n):\n\"\"\"Add synonyms and non-synonyms from each sub-corpus to the overall collection.\"\"\"\n self.synonyms += sub_corpus.synonyms\n if isinstance(sub_corpus, IntraOntologyTextSemanticsCorpus):\n self.nonsynonyms += sub_corpus.soft_nonsynonyms + sub_corpus.hard_nonsynonyms\n else:\n self.nonsynonyms += sub_corpus.nonsynonyms\n
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.bert_classifier.BERTSynonymClassifier","title":"BERTSynonymClassifier(loaded_path, output_path, eval_mode, max_length_for_input, num_epochs_for_training=None, batch_size_for_training=None, batch_size_for_prediction=None, training_data=None, validation_data=None)
","text":"Class for BERT synonym classifier.
The main scoring module of \\(\\textsf{BERTMap}\\) consisting of a BERT model and a binary synonym classifier.
Attributes:
Name Type Descriptionloaded_path
str
The path to the checkpoint of a pre-trained BERT model.
output_path
str
The path to the output BERT model (usually fine-tuned).
eval_mode
bool
Set to False
if the model is loaded for training.
max_length_for_input
int
The maximum length of an input sequence.
num_epochs_for_training
int
The number of epochs for training a BERT model.
batch_size_for_training
int
The batch size for training a BERT model.
batch_size_for_prediction
int
The batch size for making predictions.
training_data
Dataset
Data for training the model if eval_mode
is set to False
. Defaults to None
.
validation_data
Dataset
Data for validating the model if eval_mode
is set to False
. Defaults to None
.
training_args
TrainingArguments
Training arguments for training the model if eval_mode
is set to False
. Defaults to None
.
trainer
Trainer
The model trainer fed with training_args
and data samples. Defaults to None
.
softmax
torch.nn.Softmax
The softmax layer used for normalising synonym scores. Defaults to None
.
src/deeponto/align/bertmap/bert_classifier.py
def __init__(\n self,\n loaded_path: str,\n output_path: str,\n eval_mode: bool,\n max_length_for_input: int,\n num_epochs_for_training: Optional[float] = None,\n batch_size_for_training: Optional[int] = None,\n batch_size_for_prediction: Optional[int] = None,\n training_data: Optional[List[Tuple[str, str, int]]] = None, # (sentence1, sentence2, label)\n validation_data: Optional[List[Tuple[str, str, int]]] = None,\n):\n # Load the pretrained BERT model from the given path\n self.loaded_path = loaded_path\n print(f\"Loading a BERT model from: {self.loaded_path}.\")\n self.model = AutoModelForSequenceClassification.from_pretrained(\n self.loaded_path, output_hidden_states=eval_mode\n )\n self.tokenizer = Tokenizer.from_pretrained(loaded_path)\n\n self.output_path = output_path\n self.eval_mode = eval_mode\n self.max_length_for_input = max_length_for_input\n self.num_epochs_for_training = num_epochs_for_training\n self.batch_size_for_training = batch_size_for_training\n self.batch_size_for_prediction = batch_size_for_prediction\n self.training_data = None\n self.validation_data = None\n self.data_stat = {}\n self.training_args = None\n self.trainer = None\n self.softmax = None\n\n # load the pre-trained BERT model and set it to eval mode (static)\n if self.eval_mode:\n self.eval()\n # load the pre-trained BERT model for fine-tuning\n else:\n if not training_data:\n raise RuntimeError(\"Training data should be provided when `for_training` is `True`.\")\n if not validation_data:\n raise RuntimeError(\"Validation data should be provided when `for_training` is `True`.\")\n # load data (max_length is used for truncation)\n self.training_data = self.load_dataset(training_data, \"training\")\n self.validation_data = self.load_dataset(validation_data, \"validation\")\n self.data_stat = {\n \"num_training\": len(self.training_data),\n \"num_validation\": len(self.validation_data),\n }\n\n # generate training arguments\n epoch_steps = len(self.training_data) // self.batch_size_for_training # total steps of an epoch\n if torch.cuda.device_count() > 0:\n epoch_steps = epoch_steps // torch.cuda.device_count() # to deal with multi-gpus case\n # keep logging steps consisitent even for small batch size\n # report logging on every 0.02 epoch\n logging_steps = int(epoch_steps * 0.02)\n # eval on every 0.2 epoch\n eval_steps = 10 * logging_steps\n # generate the training arguments\n self.training_args = TrainingArguments(\n output_dir=self.output_path,\n num_train_epochs=self.num_epochs_for_training,\n per_device_train_batch_size=self.batch_size_for_training,\n per_device_eval_batch_size=self.batch_size_for_training,\n warmup_ratio=0.0,\n weight_decay=0.01,\n logging_steps=logging_steps,\n logging_dir=f\"{self.output_path}/tensorboard\",\n eval_steps=eval_steps,\n evaluation_strategy=\"steps\",\n do_train=True,\n do_eval=True,\n save_steps=eval_steps,\n save_total_limit=2,\n load_best_model_at_end=True,\n )\n # build the trainer\n self.trainer = Trainer(\n model=self.model,\n args=self.training_args,\n train_dataset=self.training_data,\n eval_dataset=self.validation_data,\n compute_metrics=self.compute_metrics,\n tokenizer=self.tokenizer._tokenizer,\n )\n
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.bert_classifier.BERTSynonymClassifier.train","title":"train(resume_from_checkpoint=None)
","text":"Start training the BERT model.
Source code insrc/deeponto/align/bertmap/bert_classifier.py
def train(self, resume_from_checkpoint: Optional[Union[bool, str]] = None):\n\"\"\"Start training the BERT model.\"\"\"\n if self.eval_mode:\n raise RuntimeError(\"Training cannot be started in `eval` mode.\")\n self.trainer.train(resume_from_checkpoint=resume_from_checkpoint)\n
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.bert_classifier.BERTSynonymClassifier.eval","title":"eval()
","text":"To eval mode.
Source code insrc/deeponto/align/bertmap/bert_classifier.py
def eval(self):\n\"\"\"To eval mode.\"\"\"\n print(\"The BERT model is set to eval mode for making predictions.\")\n self.model.eval()\n # TODO: to implement multi-gpus for inference\n self.device = self.get_device(device_num=0)\n self.model.to(self.device)\n self.softmax = torch.nn.Softmax(dim=1).to(self.device)\n
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.bert_classifier.BERTSynonymClassifier.predict","title":"predict(sent_pairs)
","text":"Run prediction pipeline for synonym classification.
Return the softmax
probabilities of predicting pairs as synonyms (index=1
).
src/deeponto/align/bertmap/bert_classifier.py
def predict(self, sent_pairs: List[Tuple[str, str]]):\nr\"\"\"Run prediction pipeline for synonym classification.\n\n Return the `softmax` probailities of predicting pairs as synonyms (`index=1`).\n \"\"\"\n inputs = self.process_inputs(sent_pairs)\n with torch.no_grad():\n return self.softmax(self.model(**inputs).logits)[:, 1]\n
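A hedged usage sketch of the classifier in eval mode (the checkpoint path is illustrative; any fine-tuned BERT sequence-classification checkpoint saved by the pipeline would do):

from deeponto.align.bertmap.bert_classifier import BERTSynonymClassifier

classifier = BERTSynonymClassifier(
    loaded_path="./bertmap/bert/fine-tuned-checkpoint",  # hypothetical checkpoint directory
    output_path="./bertmap/bert",
    eval_mode=True,               # static model, used only for scoring
    max_length_for_input=128,
    batch_size_for_prediction=32,
)
scores = classifier.predict([
    ("heart attack", "myocardial infarction"),
    ("heart attack", "femoral fracture"),
])
print(scores)  # one synonym probability per input pair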
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.bert_classifier.BERTSynonymClassifier.load_dataset","title":"load_dataset(data, split)
","text":"Load the list of (annotation1, annotation2, label)
samples into a datasets.Dataset
.
src/deeponto/align/bertmap/bert_classifier.py
def load_dataset(self, data: List[Tuple[str, str, int]], split: str) -> Dataset:\nr\"\"\"Load the list of `(annotation1, annotation2, label)` samples into a `datasets.Dataset`.\"\"\"\n\n def iterate():\n for sample in data:\n yield {\"annotation1\": sample[0], \"annotation2\": sample[1], \"labels\": sample[2]}\n\n dataset = Dataset.from_generator(iterate)\n # NOTE: no padding here because the Trainer class supports dynamic padding\n dataset = dataset.map(\n lambda examples: self.tokenizer._tokenizer(\n examples[\"annotation1\"], examples[\"annotation2\"], max_length=self.max_length_for_input, truncation=True\n ),\n batched=True,\n desc=f\"Load {split} data:\",\n )\n return dataset\n
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.bert_classifier.BERTSynonymClassifier.process_inputs","title":"process_inputs(sent_pairs)
","text":"Process input sentence pairs for the BERT model.
Transform the sentences into BERT input embeddings and load them into the device. This function is called only when the BERT model is about to make predictions (eval
mode).
src/deeponto/align/bertmap/bert_classifier.py
def process_inputs(self, sent_pairs: List[Tuple[str, str]]):\nr\"\"\"Process input sentence pairs for the BERT model.\n\n Transform the sentences into BERT input embeddings and load them into the device.\n This function is called only when the BERT model is about to make predictions (`eval` mode).\n \"\"\"\n return self.tokenizer._tokenizer(\n sent_pairs,\n return_tensors=\"pt\",\n max_length=self.max_length_for_input,\n padding=True,\n truncation=True,\n ).to(self.device)\n
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.bert_classifier.BERTSynonymClassifier.compute_metrics","title":"compute_metrics(pred)
staticmethod
","text":"Add more evaluation metrics into the training log.
Source code insrc/deeponto/align/bertmap/bert_classifier.py
@staticmethod\ndef compute_metrics(pred):\n\"\"\"Add more evaluation metrics into the training log.\"\"\"\n # TODO: currently only accuracy is added, will expect more in the future if needed\n labels = pred.label_ids\n preds = pred.predictions.argmax(-1)\n acc = accuracy_score(labels, preds)\n return {\"accuracy\": acc}\n
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.bert_classifier.BERTSynonymClassifier.get_device","title":"get_device(device_num=0)
staticmethod
","text":"Get a device (GPU or CPU) for the torch model
Source code insrc/deeponto/align/bertmap/bert_classifier.py
@staticmethod\ndef get_device(device_num: int = 0):\n\"\"\"Get a device (GPU or CPU) for the torch model\"\"\"\n # If there's a GPU available...\n if torch.cuda.is_available():\n # Tell PyTorch to use the GPU.\n device = torch.device(f\"cuda:{device_num}\")\n print(\"There are %d GPU(s) available.\" % torch.cuda.device_count())\n print(\"We will use the GPU:\", torch.cuda.get_device_name(device_num))\n # If not...\n else:\n print(\"No GPU available, using the CPU instead.\")\n device = torch.device(\"cpu\")\n return device\n
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.bert_classifier.BERTSynonymClassifier.set_seed","title":"set_seed(seed_val=888)
staticmethod
","text":"Set random seed for reproducible results.
Source code insrc/deeponto/align/bertmap/bert_classifier.py
@staticmethod\ndef set_seed(seed_val: int = 888):\n\"\"\"Set random seed for reproducible results.\"\"\"\n random.seed(seed_val)\n np.random.seed(seed_val)\n torch.manual_seed(seed_val)\n torch.cuda.manual_seed_all(seed_val)\n
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.mapping_prediction.MappingPredictor","title":"MappingPredictor(output_path, tokenizer_path, src_annotation_index, tgt_annotation_index, bert_synonym_classifier, num_raw_candidates, num_best_predictions, batch_size_for_prediction, logger, enlighten_manager, enlighten_status, ignored_class_index=None)
","text":"Class for the mapping prediction module of \\(\\textsf{BERTMap}\\) and \\(\\textsf{BERTMapLt}\\) models.
Attributes:
Name Type Descriptiontokenizer
Tokenizer
The tokenizer used for constructing the inverted annotation index and candidate selection.
src_annotation_index
dict
A dictionary that stores the (class_iri, class_annotations)
pairs from src_onto
according to annotation_property_iris
.
tgt_annotation_index
dict
A dictionary that stores the (class_iri, class_annotations)
pairs from tgt_onto
according to annotation_property_iris
.
tgt_inverted_annotation_index
InvertedIndex
The inverted index built from tgt_annotation_index
used for target class candidate selection.
bert_synonym_classifier
BERTSynonymClassifier
The BERT synonym classifier fine-tuned on text semantics corpora.
num_raw_candidates
int
The maximum number of selected target class candidates for a source class.
num_best_predictions
int
The maximum number of best scored mappings preserved for a source class.
batch_size_for_prediction
int
The batch size of class annotation pairs for computing synonym scores.
ignored_class_index
dict
OAEI argument, a dictionary that stores the (class_iri, used_in_alignment)
pairs.
src/deeponto/align/bertmap/mapping_prediction.py
def __init__(\n self,\n output_path: str,\n tokenizer_path: str,\n src_annotation_index: dict,\n tgt_annotation_index: dict,\n bert_synonym_classifier: Optional[BERTSynonymClassifier],\n num_raw_candidates: Optional[int],\n num_best_predictions: Optional[int],\n batch_size_for_prediction: int,\n logger: Logger,\n enlighten_manager: enlighten.Manager,\n enlighten_status: enlighten.StatusBar,\n ignored_class_index: Optional[dict] = None,\n):\n self.logger = logger\n self.enlighten_manager = enlighten_manager\n self.enlighten_status = enlighten_status\n\n self.tokenizer = Tokenizer.from_pretrained(tokenizer_path)\n\n self.logger.info(\"Build inverted annotation index for candidate selection.\")\n self.src_annotation_index = src_annotation_index\n self.tgt_annotation_index = tgt_annotation_index\n self.tgt_inverted_annotation_index = Ontology.build_inverted_annotation_index(\n tgt_annotation_index, self.tokenizer\n )\n # the fundamental judgement for whether bertmap or bertmaplt is loaded\n self.bert_synonym_classifier = bert_synonym_classifier\n self.num_raw_candidates = num_raw_candidates\n self.num_best_predictions = num_best_predictions\n self.batch_size_for_prediction = batch_size_for_prediction\n self.output_path = output_path\n\n # for the OAEI, adding in check for classes that are not used in alignment\n self.ignored_class_index = ignored_class_index\n\n self.init_class_mapping = lambda head, tail, score: EntityMapping(head, tail, \"<EquivalentTo>\", score)\n
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.mapping_prediction.MappingPredictor.bert_mapping_score","title":"bert_mapping_score(src_class_annotations, tgt_class_annotations)
","text":"\\(\\textsf{BERTMap}\\)'s main mapping score module which utilises the fine-tuned BERT synonym classifier.
Compute the synonym score for each pair of src-tgt class annotations, and return the average score as the mapping score. Apply string matching before applying the BERT module to filter easy mappings (with scores \\(1.0\\)).
Source code insrc/deeponto/align/bertmap/mapping_prediction.py
def bert_mapping_score(\n self,\n src_class_annotations: Set[str],\n tgt_class_annotations: Set[str],\n):\nr\"\"\"$\\textsf{BERTMap}$'s main mapping score module which utilises the fine-tuned BERT synonym\n classifier.\n\n Compute the **synonym score** for each pair of src-tgt class annotations, and return\n the **average** score as the mapping score. Apply string matching before applying the\n BERT module to filter easy mappings (with scores $1.0$).\n \"\"\"\n\n if not src_class_annotations or not tgt_class_annotations:\n warnings.warn(\"Return zero score due to empty input class annotations...\")\n return 0.0\n\n # apply string matching before applying the bert module\n prelim_score = self.edit_similarity_mapping_score(\n src_class_annotations,\n tgt_class_annotations,\n string_match_only=True,\n )\n if prelim_score == 1.0:\n return prelim_score\n # apply BERT classifier and define mapping score := Average(SynonymScores)\n class_annotation_pairs = list(itertools.product(src_class_annotations, tgt_class_annotations))\n synonym_scores = self.bert_synonym_classifier.predict(class_annotation_pairs)\n # only one element tensor is able to be extracted as a scalar by .item()\n return float(torch.mean(synonym_scores).item())\n
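In other words, the mapping score is the mean of the pairwise synonym scores over the cross product of the two annotation sets. A small sketch with a stand-in scorer (toy_scorer below is hypothetical; in BERTMap the scorer is BERTSynonymClassifier.predict):

import itertools
import torch

def mapping_score(src_annotations, tgt_annotations, synonym_scorer):
    # average the pairwise synonym scores over the annotation cross product
    pairs = list(itertools.product(src_annotations, tgt_annotations))
    return float(torch.mean(synonym_scorer(pairs)).item())

# toy stand-in scorer: identical strings count as synonyms
toy_scorer = lambda pairs: torch.tensor([1.0 if a == b else 0.2 for a, b in pairs])
print(mapping_score({"heart attack"}, {"heart attack", "cardiac arrest"}, toy_scorer))  # ~0.6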
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.mapping_prediction.MappingPredictor.edit_similarity_mapping_score","title":"edit_similarity_mapping_score(src_class_annotations, tgt_class_annotations, string_match_only=False)
staticmethod
","text":"\\(\\textsf{BERTMap}\\)'s string match module and \\(\\textsf{BERTMapLt}\\)'s mapping prediction function.
Compute the normalised edit similarity (1 - normalised edit distance)
for each pair of src-tgt class annotations, and return the maximum score as the mapping score.
src/deeponto/align/bertmap/mapping_prediction.py
@staticmethod\ndef edit_similarity_mapping_score(\n src_class_annotations: Set[str],\n tgt_class_annotations: Set[str],\n string_match_only: bool = False,\n):\nr\"\"\"$\\textsf{BERTMap}$'s string match module and $\\textsf{BERTMapLt}$'s mapping prediction function.\n\n Compute the **normalised edit similarity** `(1 - normalised edit distance)` for each pair\n of src-tgt class annotations, and return the **maximum** score as the mapping score.\n \"\"\"\n\n if not src_class_annotations or not tgt_class_annotations:\n warnings.warn(\"Return zero score due to empty input class annotations...\")\n return 0.0\n\n # edge case when src and tgt classes have an exact match of annotation\n if len(src_class_annotations.intersection(tgt_class_annotations)) > 0:\n return 1.0\n # a shortcut to save time for $\\textsf{BERTMap}$\n if string_match_only:\n return 0.0\n annotation_pairs = itertools.product(src_class_annotations, tgt_class_annotations)\n sim_scores = [levenshtein.normalized_similarity(src, tgt) for src, tgt in annotation_pairs]\n return max(sim_scores)\n
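A self-contained sketch of this scoring rule, using a plain dynamic-programming edit distance as a stand-in for the levenshtein.normalized_similarity call in the source:

import itertools

def levenshtein_distance(a: str, b: str) -> int:
    # classic dynamic-programming edit distance
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]

def edit_similarity_score(src_annotations: set, tgt_annotations: set) -> float:
    # maximum (1 - normalised edit distance) over all annotation pairs
    if src_annotations & tgt_annotations:   # exact string match shortcut
        return 1.0
    return max(
        1 - levenshtein_distance(s, t) / max(len(s), len(t))
        for s, t in itertools.product(src_annotations, tgt_annotations)
    )

print(edit_similarity_score({"heart attack"}, {"heart attacks"}))  # ~0.923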
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.mapping_prediction.MappingPredictor.mapping_prediction_for_src_class","title":"mapping_prediction_for_src_class(src_class_iri)
","text":"Predict \\(N\\) best scored mappings for a source ontology class, where \\(N\\) is specified in self.num_best_predictions
.
1. Apply the string matching module to compute \"easy\" mappings.
2. Return these mappings if any are found, or if there is no BERT synonym classifier as in \(\textsf{BERTMapLt}\).
3. If using the BERT synonym classifier module: generate batches of class annotation pairs, where each batch combines the source class annotations with the annotations of \(M\) target candidate classes; \(M\) is determined by batch_size_for_prediction
, i.e., stop adding annotations of a target class candidate into the current batch if this operation will cause the size of current batch to exceed the limit.src/deeponto/align/bertmap/mapping_prediction.py
def mapping_prediction_for_src_class(self, src_class_iri: str) -> List[EntityMapping]:\nr\"\"\"Predict $N$ best scored mappings for a source ontology class, where\n $N$ is specified in `self.num_best_predictions`.\n\n 1. Apply the **string matching** module to compute \"easy\" mappings.\n 2. Return the mappings if found any, or if there is no BERT synonym classifier\n as in $\\textsf{BERTMapLt}$.\n 3. If using the BERT synonym classifier module:\n\n - Generate batches for class annotation pairs. Each batch contains the combinations of the\n source class annotations and $M$ target candidate classes' annotations. $M$ is determined\n by `batch_size_for_prediction`, i.e., stop adding annotations of a target class candidate into\n the current batch if this operation will cause the size of current batch to exceed the limit.\n - Compute the synonym scores for each batch and aggregate them into mapping scores; preserve\n $N$ best scored candidates and update them in the next batch. By this dynamic process, we eventually\n get $N$ best scored mappings for a source ontology class.\n \"\"\"\n\n src_class_annotations = self.src_annotation_index[src_class_iri]\n # previously wrongly put tokenizer again !!!\n tgt_class_candidates = self.tgt_inverted_annotation_index.idf_select(\n list(src_class_annotations), pool_size=len(self.tgt_annotation_index.keys())\n ) # [(tgt_class_iri, idf_score)]\n # if some classes are set to be ignored, remove them from the candidates\n if self.ignored_class_index:\n tgt_class_candidates = [(iri, idf_score) for iri, idf_score in tgt_class_candidates if not self.ignored_class_index[iri]]\n # select a truncated number of candidates\n tgt_class_candidates = tgt_class_candidates[:self.num_raw_candidates]\n best_scored_mappings = []\n\n # for string matching: save time if already found string-matched candidates\n def string_match():\n\"\"\"Compute string-matched mappings.\"\"\"\n string_matched_mappings = []\n for tgt_candidate_iri, _ in tgt_class_candidates:\n tgt_candidate_annotations = self.tgt_annotation_index[tgt_candidate_iri]\n prelim_score = self.edit_similarity_mapping_score(\n src_class_annotations,\n tgt_candidate_annotations,\n string_match_only=True,\n )\n if prelim_score > 0.0:\n # if src_class_annotations.intersection(tgt_candidate_annotations):\n string_matched_mappings.append(\n self.init_class_mapping(src_class_iri, tgt_candidate_iri, prelim_score)\n )\n\n return string_matched_mappings\n\n best_scored_mappings += string_match()\n # return string-matched mappings if found or if there is no bert module (bertmaplt)\n if best_scored_mappings or not self.bert_synonym_classifier:\n self.logger.info(f\"The best scored class mappings for {src_class_iri} are\\n{best_scored_mappings}\")\n return best_scored_mappings\n\n def generate_batched_annotations(batch_size: int):\n\"\"\"Generate batches of class annotations for the input source class and its\n target candidates.\n \"\"\"\n batches = []\n # the `nums`` parameter determines how the annotations are grouped\n current_batch = CfgNode({\"annotations\": [], \"nums\": []})\n for i, (tgt_candidate_iri, _) in enumerate(tgt_class_candidates):\n tgt_candidate_annotations = self.tgt_annotation_index[tgt_candidate_iri]\n annotation_pairs = list(itertools.product(src_class_annotations, tgt_candidate_annotations))\n current_batch.annotations += annotation_pairs\n num_annotation_pairs = len(annotation_pairs)\n current_batch.nums.append(num_annotation_pairs)\n # collect when the batch is full or for the last target class 
candidate\n if sum(current_batch.nums) > batch_size or i == len(tgt_class_candidates) - 1:\n batches.append(current_batch)\n current_batch = CfgNode({\"annotations\": [], \"nums\": []})\n return batches\n\n def bert_match():\n\"\"\"Compute mappings with fine-tuned BERT synonym classifier.\"\"\"\n bert_matched_mappings = []\n class_annotation_batches = generate_batched_annotations(self.batch_size_for_prediction)\n batch_base_candidate_idx = (\n 0 # after each batch, the base index will be increased by # of covered target candidates\n )\n device = self.bert_synonym_classifier.device\n\n # intialize N prediction scores and N corresponding indices w.r.t `tgt_class_candidates`\n final_best_scores = torch.tensor([-1] * self.num_best_predictions).to(device)\n final_best_idxs = torch.tensor([-1] * self.num_best_predictions).to(device)\n\n for annotation_batch in class_annotation_batches:\n\n synonym_scores = self.bert_synonym_classifier.predict(annotation_batch.annotations)\n # aggregating to mappings cores\n grouped_synonym_scores = torch.split(\n synonym_scores,\n split_size_or_sections=annotation_batch.nums,\n )\n mapping_scores = torch.stack([torch.mean(chunk) for chunk in grouped_synonym_scores])\n assert len(mapping_scores) == len(annotation_batch.nums)\n\n # preserve N best scored mappings\n # scale N in case there are less than N tgt candidates in this batch\n N = min(len(mapping_scores), self.num_best_predictions)\n batch_best_scores, batch_best_idxs = torch.topk(mapping_scores, k=N)\n batch_best_idxs += batch_base_candidate_idx\n\n # we do the substitution for every batch to prevent from memory overflow\n final_best_scores, _idxs = torch.topk(\n torch.cat([batch_best_scores, final_best_scores]),\n k=self.num_best_predictions,\n )\n final_best_idxs = torch.cat([batch_best_idxs, final_best_idxs])[_idxs]\n\n # update the index for target candidate classes\n batch_base_candidate_idx += len(annotation_batch.nums)\n\n for candidate_idx, mapping_score in zip(final_best_idxs, final_best_scores):\n # ignore intial values (-1.0) for dummy mappings\n # the threshold 0.9 is for mapping extension\n if mapping_score.item() >= 0.9:\n tgt_candidate_iri = tgt_class_candidates[candidate_idx.item()][0]\n bert_matched_mappings.append(\n self.init_class_mapping(\n src_class_iri,\n tgt_candidate_iri,\n mapping_score.item(),\n )\n )\n\n assert len(bert_matched_mappings) <= self.num_best_predictions\n self.logger.info(f\"The best scored class mappings for {src_class_iri} are\\n{bert_matched_mappings}\")\n return bert_matched_mappings\n\n return bert_match()\n
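Step 3 ends with a running top-\(N\) selection: the synonym scores of each batch are aggregated into mapping scores and then merged with the best mappings found so far. A compact sketch of that bookkeeping with torch.topk, assuming the per-batch mapping scores are already computed:

import torch

num_best = 3
final_best_scores = torch.full((num_best,), -1.0)
final_best_idxs = torch.full((num_best,), -1, dtype=torch.long)
base_idx = 0  # offset of the current batch within the candidate list

batched_mapping_scores = [torch.tensor([0.2, 0.95]), torch.tensor([0.4, 0.99, 0.1])]
for batch_scores in batched_mapping_scores:
    k = min(len(batch_scores), num_best)
    batch_best_scores, batch_best_idxs = torch.topk(batch_scores, k=k)
    batch_best_idxs = batch_best_idxs + base_idx
    # merge with the running top-N and keep the overall best num_best
    final_best_scores, merge_idxs = torch.topk(
        torch.cat([batch_best_scores, final_best_scores]), k=num_best
    )
    final_best_idxs = torch.cat([batch_best_idxs, final_best_idxs])[merge_idxs]
    base_idx += len(batch_scores)

print(final_best_idxs.tolist(), final_best_scores.tolist())
# best candidate indices [3, 1, 2] with scores ~[0.99, 0.95, 0.4]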
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.mapping_prediction.MappingPredictor.mapping_prediction","title":"mapping_prediction()
","text":"Apply global matching for each class in the source ontology.
See mapping_prediction_for_src_class
.
If this process is accidentally stopped, it can be resumed from already saved predictions. The progress bar keeps track of the number of source ontology classes that have been matched.
Source code insrc/deeponto/align/bertmap/mapping_prediction.py
def mapping_prediction(self):\nr\"\"\"Apply global matching for each class in the source ontology.\n\n See [`mapping_prediction_for_src_class`][deeponto.align.bertmap.mapping_prediction.MappingPredictor.mapping_prediction_for_src_class].\n\n If this process is accidentally stopped, it can be resumed from already saved predictions. The progress\n bar keeps track of the number of source ontology classes that have been matched.\n \"\"\"\n self.logger.info(\"Start global matching for each class in the source ontology.\")\n\n match_dir = os.path.join(self.output_path, \"match\")\n try:\n mapping_index = load_file(os.path.join(match_dir, \"raw_mappings.json\"))\n self.logger.info(\"Load the existing mapping prediction file.\")\n except:\n mapping_index = dict()\n create_path(match_dir)\n\n progress_bar = self.enlighten_manager.counter(\n total=len(self.src_annotation_index), desc=\"Mapping Prediction\", unit=\"per src class\"\n )\n self.enlighten_status.update(demo=\"Mapping Prediction\")\n\n for i, src_class_iri in enumerate(self.src_annotation_index.keys()):\n # skip computed classes\n if src_class_iri in mapping_index.keys():\n self.logger.info(f\"[Class {i}] Skip matching {src_class_iri} as already computed.\")\n progress_bar.update()\n continue\n # for OAEI\n if self.ignored_class_index and self.ignored_class_index[src_class_iri]:\n self.logger.info(f\"[Class {i}] Skip matching {src_class_iri} as marked as not used in alignment.\")\n progress_bar.update()\n continue\n mappings = self.mapping_prediction_for_src_class(src_class_iri)\n mapping_index[src_class_iri] = [m.to_tuple(with_score=True) for m in mappings]\n\n if i % 100 == 0 or i == len(self.src_annotation_index) - 1:\n save_file(mapping_index, os.path.join(match_dir, \"raw_mappings.json\"))\n # also save a .tsv version\n mapping_in_tuples = list(itertools.chain.from_iterable(mapping_index.values()))\n mapping_df = pd.DataFrame(mapping_in_tuples, columns=[\"SrcEntity\", \"TgtEntity\", \"Score\"])\n mapping_df.to_csv(os.path.join(match_dir, \"raw_mappings.tsv\"), sep=\"\\t\", index=False)\n self.logger.info(\"Save currently computed mappings to prevent undesirable loss.\")\n\n progress_bar.update()\n\n self.logger.info(\"Finished mapping prediction for each class in the source ontology.\")\n progress_bar.close()\n
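The intermediate predictions are persisted as both raw_mappings.json and raw_mappings.tsv under the match sub-folder of the output directory. A small sketch of inspecting the .tsv file with pandas (the output directory name is illustrative):

import pandas as pd

raw = pd.read_csv("./bertmap/match/raw_mappings.tsv", sep="\t")
print(raw.columns.tolist())                              # ['SrcEntity', 'TgtEntity', 'Score']
print(raw.sort_values("Score", ascending=False).head())  # highest-scored class mappings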
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.mapping_refinement.MappingRefiner","title":"MappingRefiner(output_path, src_onto, tgt_onto, mapping_predictor, mapping_extension_threshold, mapping_filtered_threshold, logger, enlighten_manager, enlighten_status)
","text":"Class for the mapping refinement module of \\(\\textsf{BERTMap}\\).
\\(\\textsf{BERTMapLt}\\) does not go through mapping refinement for its being \"light\". All the attributes of this class are supposed to be passed from BERTMapPipeline
.
Attributes:
Name Type Descriptionsrc_onto
Ontology
The source ontology to be matched.
tgt_onto
Ontology
The target ontology to be matched.
mapping_predictor
MappingPredictor
The mapping prediction module of BERTMap.
mapping_extension_threshold
float
Mappings with scores \\(\\geq\\) this value will be considered in the iterative mapping extension process.
raw_mappings
List[EntityMapping]
List of raw class mappings predicted in the global matching phase.
mapping_score_dict
dict
A dynamic dictionary that keeps track of mappings (with scores) that have already been computed.
mapping_filtered_threshold
float
Mappings with scores \\(\\geq\\) this value will be preserved for the final mapping repairing.
Source code insrc/deeponto/align/bertmap/mapping_refinement.py
def __init__(\n self,\n output_path: str,\n src_onto: Ontology,\n tgt_onto: Ontology,\n mapping_predictor: MappingPredictor,\n mapping_extension_threshold: float,\n mapping_filtered_threshold: float,\n logger: Logger,\n enlighten_manager: enlighten.Manager,\n enlighten_status: enlighten.StatusBar\n):\n self.output_path = output_path\n self.logger = logger\n self.enlighten_manager = enlighten_manager\n self.enlighten_status = enlighten_status\n\n self.src_onto = src_onto\n self.tgt_onto = tgt_onto\n\n # iterative mapping extension\n self.mapping_predictor = mapping_predictor\n self.mapping_extension_threshold = mapping_extension_threshold # \\kappa\n self.raw_mappings = EntityMapping.read_table_mappings(\n os.path.join(self.output_path, \"match\", \"raw_mappings.tsv\"),\n threshold=self.mapping_extension_threshold,\n relation=\"<EquivalentTo>\",\n )\n # keep track of already scored mappings to prevent duplicated predictions\n self.mapping_score_dict = dict()\n for m in self.raw_mappings:\n src_class_iri, tgt_class_iri, score = m.to_tuple(with_score=True)\n self.mapping_score_dict[(src_class_iri, tgt_class_iri)] = score\n\n # the threshold for final filtering the extended mappings\n self.mapping_filtered_threshold = mapping_filtered_threshold # \\lambda\n\n # logmap mapping repair folder\n self.logmap_repair_path = os.path.join(self.output_path, \"match\", \"logmap-repair\")\n\n # paths for mapping extension and repair\n self.extended_mapping_path = os.path.join(self.output_path, \"match\", \"extended_mappings.tsv\")\n self.filtered_mapping_path = os.path.join(self.output_path, \"match\", \"filtered_mappings.tsv\")\n self.repaired_mapping_path = os.path.join(self.output_path, \"match\", \"repaired_mappings.tsv\")\n
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.mapping_refinement.MappingRefiner.mapping_extension","title":"mapping_extension(max_iter=10)
","text":"Iterative mapping extension based on the locality principle.
For each class pair \\((c, c')\\) (scored in the global matching phase) with score \\(\\geq \\kappa\\), search for plausible mappings between the parents of \\(c\\) and \\(c'\\), and between the children of \\(c\\) and \\(c'\\). This is an iterative process as the set newly discovered mappings can act renew the frontier for searching. Terminate if no new mappings with score \\(\\geq \\kappa\\) can be found or the limit max_iter
has been reached. Note that \\(\\kappa\\) is set to \\(0.9\\) by default (can be altered in the configuration file). The mapping extension progress bar keeps track of the total number of extended mappings (including the previously predicted ones).
A further filtering step preserves only mappings with score \(\geq \lambda\). In the original BERTMap paper, \(\lambda\) is determined by the validation mappings, but in practice \(\lambda\) is not a sensitive hyperparameter and validation mappings are often not available. Therefore, we manually set \(\lambda\) to \(0.9995\) by default (this can be altered in the configuration file). The mapping filtering progress bar keeps track of the total number of filtered mappings (this bar is purely for logging purposes).
Parameters:
Name Type Description Defaultmax_iter
int
The maximum number of mapping extension iterations. Defaults to 10
.
10
Source code in src/deeponto/align/bertmap/mapping_refinement.py
def mapping_extension(self, max_iter: int = 10):\nr\"\"\"Iterative mapping extension based on the locality principle.\n\n For each class pair $(c, c')$ (scored in the global matching phase) with score \n $\\geq \\kappa$, search for plausible mappings between the parents of $c$ and $c'$,\n and between the children of $c$ and $c'$. This is an iterative process as the set \n newly discovered mappings can act renew the frontier for searching. Terminate if\n no new mappings with score $\\geq \\kappa$ can be found or the limit `max_iter` has \n been reached. Note that $\\kappa$ is set to $0.9$ by default (can be altered\n in the configuration file). The mapping extension progress bar keeps track of the \n total number of extended mappings (including the previously predicted ones).\n\n A further filtering will be performed by only preserving mappings with score $\\geq \\lambda$,\n in the original BERTMap paper, $\\lambda$ is determined by the validation mappings, but\n in practice $\\lambda$ is not a sensitive hyperparameter and validation mappings are often\n not available. Therefore, we manually set $\\lambda$ to $0.9995$ by default (can be altered\n in the configuration file). The mapping filtering progress bar keeps track of the \n total number of filtered mappings (this bar is purely for logging purpose).\n\n Args:\n max_iter (int, optional): The maximum number of mapping extension iterations. Defaults to `10`.\n \"\"\"\n\n num_iter = 0\n self.enlighten_status.update(demo=\"Mapping Extension\")\n extension_progress_bar = self.enlighten_manager.counter(\n desc=f\"Mapping Extension [Iteration #{num_iter}]\", unit=\"mapping\"\n )\n filtering_progress_bar = self.enlighten_manager.counter(\n desc=f\"Mapping Filtering\", unit=\"mapping\"\n )\n\n if os.path.exists(self.extended_mapping_path) and os.path.exists(self.filtered_mapping_path):\n self.logger.info(\n f\"Found extended and filtered mapping files at {self.extended_mapping_path}\"\n + f\" and {self.filtered_mapping_path}.\\nPlease check file integrity; if incomplete, \"\n + \"delete them and re-run the program.\"\n )\n\n # for animation purposes\n extension_progress_bar.desc = f\"Mapping Extension\"\n for _ in EntityMapping.read_table_mappings(self.extended_mapping_path):\n extension_progress_bar.update()\n\n self.enlighten_status.update(demo=\"Mapping Filtering\")\n for _ in EntityMapping.read_table_mappings(self.filtered_mapping_path):\n filtering_progress_bar.update()\n\n extension_progress_bar.close()\n filtering_progress_bar.close()\n\n return\n # intialise the frontier, explored, final expansion sets with the raw mappings\n # NOTE be careful of address pointers\n frontier = [m.to_tuple() for m in self.raw_mappings]\n expansion = [m.to_tuple(with_score=True) for m in self.raw_mappings]\n # for animation purposes\n for _ in range(len(expansion)):\n extension_progress_bar.update()\n\n self.logger.info(\n f\"Start mapping extension for each class pair with score >= {self.mapping_extension_threshold}.\"\n )\n while frontier and num_iter < max_iter:\n new_mappings = []\n for src_class_iri, tgt_class_iri in frontier:\n # one hop extension makes sure new mappings are really \"new\"\n cur_new_mappings = self.one_hop_extend(src_class_iri, tgt_class_iri)\n extension_progress_bar.update(len(cur_new_mappings))\n new_mappings += cur_new_mappings\n # add new mappings to the expansion set\n expansion += new_mappings\n # renew frontier with the newly discovered mappings\n frontier = [(x, y) for x, y, _ in new_mappings]\n\n self.logger.info(f\"Add 
{len(new_mappings)} mappings at iteration #{num_iter}.\")\n num_iter += 1\n extension_progress_bar.desc = f\"Mapping Extension [Iteration #{num_iter}]\"\n\n num_extended = len(expansion) - len(self.raw_mappings)\n self.logger.info(\n f\"Finished iterative mapping extension with {num_extended} new mappings and in total {len(expansion)} extended mappings.\"\n )\n\n extended_mapping_df = pd.DataFrame(expansion, columns=[\"SrcEntity\", \"TgtEntity\", \"Score\"])\n extended_mapping_df.to_csv(self.extended_mapping_path, sep=\"\\t\", index=False)\n\n self.enlighten_status.update(demo=\"Mapping Filtering\")\n\n filtered_expansion = [\n (src, tgt, score) for src, tgt, score in expansion if score >= self.mapping_filtered_threshold\n ]\n self.logger.info(\n f\"Filtered the extended mappings by a threshold of {self.mapping_filtered_threshold}.\"\n + f\"There are {len(filtered_expansion)} mappings left for mapping repair.\"\n )\n\n for _ in range(len(filtered_expansion)):\n filtering_progress_bar.update()\n\n filtered_mapping_df = pd.DataFrame(filtered_expansion, columns=[\"SrcEntity\", \"TgtEntity\", \"Score\"])\n filtered_mapping_df.to_csv(self.filtered_mapping_path, sep=\"\\t\", index=False)\n\n extension_progress_bar.close()\n filtering_progress_bar.close()\n return filtered_expansion\n
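Stripped of logging and progress bars, the extension loop is a frontier-based expansion over scored class pairs. A simplified sketch, with one_hop_extend treated as a black box that returns only previously unseen scored pairs:

def iterative_extension(raw_mappings, one_hop_extend, max_iter=10):
    # raw_mappings: list of (src_iri, tgt_iri, score) triples above the kappa threshold
    frontier = [(src, tgt) for src, tgt, _ in raw_mappings]
    expansion = list(raw_mappings)
    num_iter = 0
    while frontier and num_iter < max_iter:
        new_mappings = []
        for src, tgt in frontier:
            new_mappings += one_hop_extend(src, tgt)
        expansion += new_mappings
        frontier = [(src, tgt) for src, tgt, _ in new_mappings]  # renew the search frontier
        num_iter += 1
    return expansion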
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.mapping_refinement.MappingRefiner.one_hop_extend","title":"one_hop_extend(src_class_iri, tgt_class_iri, pool_size=200)
","text":"Extend mappings from a scored class pair \\((c, c')\\) by searching from one-hop neighbors.
Search for plausible mappings between the parents of \\(c\\) and \\(c'\\), and between the children of \\(c\\) and \\(c'\\). Mappings that are not already computed (recorded in self.mapping_score_dict
) and have a score \\(\\geq\\) self.mapping_extension_threshold
will be returned as new mappings.
Parameters:
Name Type Description Defaultsrc_class_iri
str
The IRI of the source ontology class \\(c\\).
requiredtgt_class_iri
str
The IRI of the target ontology class \\(c'\\).
requiredpool_size
int
The maximum number of plausible mappings to be extended. Defaults to 200.
200
Returns:
Type DescriptionList[EntityMapping]
A list of one-hop extended mappings.
Source code insrc/deeponto/align/bertmap/mapping_refinement.py
def one_hop_extend(self, src_class_iri: str, tgt_class_iri: str, pool_size: int = 200):\nr\"\"\"Extend mappings from a scored class pair $(c, c')$ by\n searching from one-hop neighbors.\n\n Search for plausible mappings between the parents of $c$ and $c'$,\n and between the children of $c$ and $c'$. Mappings that are not\n already computed (recorded in `self.mapping_score_dict`) and have\n a score $\\geq$ `self.mapping_extension_threshold` will be returned as\n **new** mappings.\n\n Args:\n src_class_iri (str): The IRI of the source ontology class $c$.\n tgt_class_iri (str): The IRI of the target ontology class $c'$.\n pool_size (int, optional): The maximum number of plausible mappings to be extended. Defaults to 200.\n\n Returns:\n (List[EntityMapping]): A list of one-hop extended mappings.\n \"\"\"\n\n def get_iris(owl_objects):\n return [str(x.getIRI()) for x in owl_objects]\n\n src_class = self.src_onto.get_owl_object(src_class_iri)\n src_class_parent_iris = get_iris(self.src_onto.get_asserted_parents(src_class, named_only=True))\n src_class_children_iris = get_iris(self.src_onto.get_asserted_children(src_class, named_only=True))\n\n tgt_class = self.tgt_onto.get_owl_object(tgt_class_iri)\n tgt_class_parent_iris = get_iris(self.tgt_onto.get_asserted_parents(tgt_class, named_only=True))\n tgt_class_children_iris = get_iris(self.tgt_onto.get_asserted_children(tgt_class, named_only=True))\n\n # pair up parents and children, respectively; NOTE set() might not be necessary\n parent_pairs = list(set(itertools.product(src_class_parent_iris, tgt_class_parent_iris)))\n children_pairs = list(set(itertools.product(src_class_children_iris, tgt_class_children_iris)))\n\n candidate_pairs = parent_pairs + children_pairs\n # downsample if the number of candidates is too large\n if len(candidate_pairs) > pool_size:\n candidate_pairs = random.sample(candidate_pairs, pool_size)\n\n extended_mappings = []\n for src_candidate_iri, tgt_candidate_iri in parent_pairs + children_pairs:\n\n # if already computed meaning that it is not a new mapping\n if (src_candidate_iri, tgt_candidate_iri) in self.mapping_score_dict:\n continue\n\n src_candidate_annotations = self.mapping_predictor.src_annotation_index[src_candidate_iri]\n tgt_candidate_annotations = self.mapping_predictor.tgt_annotation_index[tgt_candidate_iri]\n score = self.mapping_predictor.bert_mapping_score(src_candidate_annotations, tgt_candidate_annotations)\n # add to already scored collection\n self.mapping_score_dict[(src_candidate_iri, tgt_candidate_iri)] = score\n\n # skip mappings with low scores\n if score < self.mapping_extension_threshold:\n continue\n\n extended_mappings.append((src_candidate_iri, tgt_candidate_iri, score))\n\n self.logger.info(\n f\"New mappings (in tuples) extended from {(src_class_iri, tgt_class_iri)} are:\\n\" + f\"{extended_mappings}\"\n )\n\n return extended_mappings\n
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.mapping_refinement.MappingRefiner.mapping_repair","title":"mapping_repair()
","text":"Repair the filtered mappings with LogMap's debugger.
Note
A sub-folder under match
named logmap-repair
contains LogMap-related intermediate files.
src/deeponto/align/bertmap/mapping_refinement.py
def mapping_repair(self):\n\"\"\"Repair the filtered mappings with LogMap's debugger.\n\n !!! note\n\n A sub-folder under `match` named `logmap-repair` contains LogMap-related intermediate files.\n \"\"\"\n\n # progress bar for animation purposes\n self.enlighten_status.update(demo=\"Mapping Repairing\")\n repair_progress_bar = self.enlighten_manager.counter(\n desc=f\"Mapping Repairing\", unit=\"mapping\"\n )\n\n # skip repairing if already found the file\n if os.path.exists(self.repaired_mapping_path):\n self.logger.info(\n f\"Found the repaired mapping file at {self.repaired_mapping_path}.\"\n + \"\\nPlease check file integrity; if incomplete, \"\n + \"delete it and re-run the program.\"\n )\n # update progress bar for animation purposes\n for _ in EntityMapping.read_table_mappings(self.repaired_mapping_path):\n repair_progress_bar.update()\n repair_progress_bar.close()\n return \n\n # start mapping repair\n self.logger.info(\"Repair the filtered mappings with LogMap debugger.\")\n # formatting the filtered mappings\n self.logmap_repair_formatting()\n\n # run the LogMap repair module on the extended mappings\n run_logmap_repair(\n self.src_onto.owl_path,\n self.tgt_onto.owl_path,\n os.path.join(self.logmap_repair_path, f\"filtered_mappings_for_LogMap_repair.txt\"),\n self.logmap_repair_path,\n Ontology.get_max_jvm_memory()\n )\n\n # create table mappings from LogMap repair outputs\n with open(os.path.join(self.logmap_repair_path, \"mappings_repaired_with_LogMap.tsv\"), \"r\") as f:\n lines = f.readlines()\n with open(os.path.join(self.output_path, \"match\", \"repaired_mappings.tsv\"), \"w+\") as f:\n f.write(\"SrcEntity\\tTgtEntity\\tScore\\n\")\n for line in lines:\n src_ent_iri, tgt_ent_iri, score = line.split(\"\\t\")\n f.write(f\"{src_ent_iri}\\t{tgt_ent_iri}\\t{score}\")\n repair_progress_bar.update()\n\n self.logger.info(\"Mapping repair finished.\")\n repair_progress_bar.close()\n
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.mapping_refinement.MappingRefiner.logmap_repair_formatting","title":"logmap_repair_formatting()
","text":"Transform the filtered mapping file into the LogMap format.
An auxiliary function of the mapping repair module which requires mappings to be formatted as LogMap's input format.
Source code insrc/deeponto/align/bertmap/mapping_refinement.py
def logmap_repair_formatting(self):\n\"\"\"Transform the filtered mapping file into the LogMap format.\n\n An auxiliary function of the mapping repair module which requires mappings\n to be formatted as LogMap's input format.\n \"\"\"\n # read the filtered mapping file and convert to tuples\n filtered_mappings = EntityMapping.read_table_mappings(self.filtered_mapping_path)\n filtered_mappings_in_tuples = [m.to_tuple(with_score=True) for m in filtered_mappings]\n\n # write the mappings into logmap format\n lines = []\n for src_class_iri, tgt_class_iri, score in filtered_mappings_in_tuples:\n lines.append(f\"{src_class_iri}|{tgt_class_iri}|=|{score}|CLS\\n\")\n\n # create a path to prevent error\n create_path(self.logmap_repair_path)\n formatted_file = os.path.join(self.logmap_repair_path, f\"filtered_mappings_for_LogMap_repair.txt\")\n with open(formatted_file, \"w\") as f:\n f.writelines(lines)\n\n return lines\n
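The LogMap debugger expects one pipe-separated line per mapping. A tiny sketch of the format produced above (the IRIs are illustrative):

mappings = [
    ("http://example.org/src#ClassA", "http://example.org/tgt#ClassB", 0.9997),
]
lines = [f"{src_iri}|{tgt_iri}|=|{score}|CLS\n" for src_iri, tgt_iri, score in mappings]
print(lines[0])  # http://example.org/src#ClassA|http://example.org/tgt#ClassB|=|0.9997|CLS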
"},{"location":"deeponto/align/bertsubs/","title":"BERTSubs (Inter)","text":""},{"location":"deeponto/align/bertsubs/#deeponto.complete.bertsubs.pipeline_inter.BERTSubsInterPipeline","title":"BERTSubsInterPipeline(src_onto, tgt_onto, config)
","text":"Class for the model training and prediction/validation pipeline of inter-ontology subsumption of BERTSubs.
Attributes:
Name Type Descriptionsrc_onto
Ontology
Source ontology (the sub-class side).
tgt_onto
Ontology
Target ontology (the super-class side).
config
CfgNode
Configuration.
src_sampler
SubsumptionSampler
Object for sampling-related functions of the source ontology.
tgt_sampler
SubsumptionSampler
Object for sampling-related functions of the target ontology.
Source code insrc/deeponto/complete/bertsubs/pipeline_inter.py
def __init__(self, src_onto: Ontology, tgt_onto: Ontology, config: CfgNode):\n self.src_onto = src_onto\n self.tgt_onto = tgt_onto\n self.config = config\n self.config.label_property = self.config.src_label_property\n self.src_sampler = SubsumptionSampler(onto=self.src_onto, config=self.config)\n self.config.label_property = self.config.tgt_label_property\n self.tgt_sampler = SubsumptionSampler(onto=self.tgt_onto, config=self.config)\n start_time = datetime.datetime.now()\n\n read_subsumptions = lambda file_name: [line.strip().split(',') for line in open(file_name).readlines()]\n test_subsumptions = None if config.test_subsumption_file is None or config.test_subsumption_file == 'None' \\\n else read_subsumptions(config.test_subsumption_file)\n valid_subsumptions = None if config.valid_subsumption_file is None or config.valid_subsumption_file == 'None' \\\n else read_subsumptions(config.valid_subsumption_file)\n\n if config.use_ontology_subsumptions_training:\n src_subsumptions = BERTSubsIntraPipeline.extract_subsumptions_from_ontology(onto=self.src_onto,\n subsumption_type=config.subsumption_type)\n tgt_subsumptions = BERTSubsIntraPipeline.extract_subsumptions_from_ontology(onto=self.tgt_onto,\n subsumption_type=config.subsumption_type)\n src_subsumptions0, tgt_subsumptions0 = [], []\n if config.subsumption_type == 'named_class':\n for subs in src_subsumptions:\n c1, c2 = subs.getSubClass(), subs.getSuperClass()\n src_subsumptions0.append([str(c1.getIRI()), str(c2.getIRI())])\n for subs in tgt_subsumptions:\n c1, c2 = subs.getSubClass(), subs.getSuperClass()\n tgt_subsumptions0.append([str(c1.getIRI()), str(c2.getIRI())])\n elif config.subsumption_type == 'restriction':\n for subs in src_subsumptions:\n c1, c2 = subs.getSubClass(), subs.getSuperClass()\n src_subsumptions0.append([str(c1.getIRI()), str(c2)])\n for subs in tgt_subsumptions:\n c1, c2 = subs.getSubClass(), subs.getSuperClass()\n tgt_subsumptions0.append([str(c1.getIRI()), str(c2)])\n restrictions = BERTSubsIntraPipeline.extract_restrictions_from_ontology(onto=self.tgt_onto)\n print('restrictions in the target ontology: %d' % len(restrictions))\n else:\n warnings.warn('Unknown subsumption type %s' % config.subsumption_type)\n sys.exit(0)\n print('Positive train subsumptions from the source/target ontology: %d/%d' % (\n len(src_subsumptions0), len(tgt_subsumptions0)))\n\n src_tr = self.src_sampler.generate_samples(subsumptions=src_subsumptions0)\n tgt_tr = self.tgt_sampler.generate_samples(subsumptions=tgt_subsumptions0)\n else:\n src_tr, tgt_tr = [], []\n\n if config.train_subsumption_file is None or config.train_subsumption_file == 'None':\n tr = src_tr + tgt_tr\n else:\n train_subsumptions = read_subsumptions(config.train_subsumption_file)\n tr = self.inter_ontology_sampling(subsumptions=train_subsumptions, pos_dup=config.fine_tune.train_pos_dup,\n neg_dup=config.fine_tune.train_neg_dup)\n tr = tr + src_tr + tgt_tr\n\n if len(tr) == 0:\n warnings.warn('No training samples extracted')\n if config.fine_tune.do_fine_tune:\n sys.exit(0)\n\n end_time = datetime.datetime.now()\n print('data pre-processing costs %.1f minutes' % ((end_time - start_time).seconds / 60))\n\n start_time = datetime.datetime.now()\n torch.cuda.empty_cache()\n bert_trainer = BERTSubsumptionClassifierTrainer(config.fine_tune.pretrained, train_data=tr,\n val_data=tr[0:int(len(tr) / 5)],\n max_length=config.prompt.max_length,\n early_stop=config.fine_tune.early_stop)\n\n epoch_steps = len(bert_trainer.tra) // config.fine_tune.batch_size # total steps of an 
epoch\n logging_steps = int(epoch_steps * 0.02) if int(epoch_steps * 0.02) > 0 else 5\n eval_steps = 5 * logging_steps\n training_args = TrainingArguments(\n output_dir=config.fine_tune.output_dir,\n num_train_epochs=config.fine_tune.num_epochs,\n per_device_train_batch_size=config.fine_tune.batch_size,\n per_device_eval_batch_size=config.fine_tune.batch_size,\n warmup_ratio=config.fine_tune.warm_up_ratio,\n weight_decay=0.01,\n logging_steps=logging_steps,\n logging_dir=f\"{config.fine_tune.output_dir}/tb\",\n eval_steps=eval_steps,\n evaluation_strategy=\"steps\",\n do_train=True,\n do_eval=True,\n save_steps=eval_steps,\n load_best_model_at_end=True,\n save_total_limit=1,\n metric_for_best_model=\"accuracy\",\n greater_is_better=True\n )\n if config.fine_tune.do_fine_tune and (config.prompt.prompt_type == 'traversal' or (\n config.prompt.prompt_type == 'path' and config.prompt.use_sub_special_token)):\n bert_trainer.add_special_tokens(['<SUB>'])\n\n bert_trainer.train(train_args=training_args, do_fine_tune=config.fine_tune.do_fine_tune)\n if config.fine_tune.do_fine_tune:\n bert_trainer.trainer.save_model(\n output_dir=os.path.join(config.fine_tune.output_dir, 'fine-tuned-checkpoint'))\n print('fine-tuning done, fine-tuned model saved')\n else:\n print('pretrained or fine-tuned model loaded.')\n end_time = datetime.datetime.now()\n print('Fine-tuning costs %.1f minutes' % ((end_time - start_time).seconds / 60))\n\n bert_trainer.model.eval()\n self.device = torch.device(f\"cuda\") if torch.cuda.is_available() else torch.device(\"cpu\")\n bert_trainer.model.to(self.device)\n self.tokenize = lambda x: bert_trainer.tokenizer(x, max_length=config.prompt.max_length, truncation=True,\n padding=True, return_tensors=\"pt\")\n softmax = torch.nn.Softmax(dim=1)\n self.classifier = lambda x: softmax(bert_trainer.model(**x).logits)[:, 1]\n\n if valid_subsumptions is not None:\n self.evaluate(target_subsumptions=valid_subsumptions, test_type='valid')\n\n if test_subsumptions is not None:\n if config.test_type == 'evaluation':\n self.evaluate(target_subsumptions=test_subsumptions, test_type='test')\n elif config.test_type == 'prediction':\n self.predict(target_subsumptions=test_subsumptions)\n else:\n warnings.warn(\"Unknown test_type: %s\" % config.test_type)\n print('\\n ------------------------- done! ---------------------------\\n\\n\\n')\n
"},{"location":"deeponto/align/bertsubs/#deeponto.complete.bertsubs.pipeline_inter.BERTSubsInterPipeline.inter_ontology_sampling","title":"inter_ontology_sampling(subsumptions, pos_dup=1, neg_dup=1)
","text":"Transform inter-ontology subsumptions to two-string samples
Parameters:
Name Type Description Defaultsubsumptions
List[List]
A list of subsumptions; each subsumption is composed of two IRIs.
requiredpos_dup
int
Positive sample duplication.
1
neg_dup
int
Negative sample duplication.
1
Source code in src/deeponto/complete/bertsubs/pipeline_inter.py
def inter_ontology_sampling(self, subsumptions: List[List], pos_dup: int = 1, neg_dup: int = 1):\nr\"\"\"Transform inter-ontology subsumptions to two-string samples\n Args:\n subsumptions (List[List]): A list of subsumptions; each subsumption is composed of two IRIs.\n pos_dup (int): Positive sample duplication.\n neg_dup (int): Negative sample duplication.\n \"\"\"\n pos_samples = list()\n for subs in subsumptions:\n sub_strs = self.src_sampler.subclass_to_strings(subcls=subs[0])\n sup_strs = self.tgt_sampler.supclass_to_strings(supcls=subs[1],\n subsumption_type=self.config.subsumption_type)\n for sub_str in sub_strs:\n for sup_str in sup_strs:\n pos_samples.append([sub_str, sup_str, 1])\n pos_samples = pos_dup * pos_samples\n\n neg_subsumptions = list()\n for subs in subsumptions:\n for _ in range(neg_dup):\n neg_c = self.tgt_sampler.get_negative_sample(subclass_iri=subs[1],\n subsumption_type=self.config.subsumption_type)\n neg_subsumptions.append([subs[0], neg_c])\n\n neg_samples = list()\n for subs in neg_subsumptions:\n sub_strs = self.src_sampler.subclass_to_strings(subcls=subs[0])\n sup_strs = self.tgt_sampler.supclass_to_strings(supcls=subs[1],\n subsumption_type=self.config.subsumption_type)\n for sub_str in sub_strs:\n for sup_str in sup_strs:\n neg_samples.append([sub_str, sup_str, 0])\n\n if len(neg_samples) < len(pos_samples):\n neg_samples = neg_samples + [random.choice(neg_samples) for _ in range(len(pos_samples) - len(neg_samples))]\n if len(neg_samples) > len(pos_samples):\n pos_samples = pos_samples + [random.choice(pos_samples) for _ in range(len(neg_samples) - len(pos_samples))]\n print('training mappings, pos_samples: %d, neg_samples: %d' % (len(pos_samples), len(neg_samples)))\n all_samples = [s for s in pos_samples + neg_samples if s[0] != '' and s[1] != '']\n return all_samples\n
"},{"location":"deeponto/align/bertsubs/#deeponto.complete.bertsubs.pipeline_inter.BERTSubsInterPipeline.inter_ontology_subsumption_to_sample","title":"inter_ontology_subsumption_to_sample(subsumption)
","text":"Transform an inter ontology subsumption into a sample (a two-string list).
Parameters:
Name Type Description Defaultsubsumption
List
a subsumption composed of two IRIs.
required Source code insrc/deeponto/complete/bertsubs/pipeline_inter.py
def inter_ontology_subsumption_to_sample(self, subsumption: List):\nr\"\"\"Transform an inter ontology subsumption into a sample (a two-string list).\n\n Args:\n subsumption (List): a subsumption composed of two IRIs.\n \"\"\"\n subcls, supcls = subsumption[0], subsumption[1]\n substrs = self.src_sampler.subclass_to_strings(subcls=subcls)\n supstrs = self.tgt_sampler.supclass_to_strings(supcls=supcls, subsumption_type='named_class')\n samples = list()\n for substr in substrs:\n for supstr in supstrs:\n samples.append([substr, supstr])\n return samples\n
"},{"location":"deeponto/align/bertsubs/#deeponto.complete.bertsubs.pipeline_inter.BERTSubsInterPipeline.score","title":"score(samples)
","text":"Score the samples with the classifier.
Parameters:
Name Type Description Defaultsamples
List[List]
Each item is a list with two strings (input).
required Source code insrc/deeponto/complete/bertsubs/pipeline_inter.py
def score(self, samples):\nr\"\"\"Score the samples with the classifier.\n\n Args:\n samples (List[List]): Each item is a list with two strings (input).\n \"\"\"\n sample_size = len(samples)\n scores = np.zeros(sample_size)\n batch_num = math.ceil(sample_size / self.config.evaluation.batch_size)\n for i in range(batch_num):\n j = (i + 1) * self.config.evaluation.batch_size \\\n if (i + 1) * self.config.evaluation.batch_size <= sample_size else sample_size\n inputs = self.tokenize(samples[i * self.config.evaluation.batch_size:j])\n inputs.to(self.device)\n with torch.no_grad():\n batch_scores = self.classifier(inputs)\n scores[i * self.config.evaluation.batch_size:j] = batch_scores.cpu().numpy()\n return scores\n
"},{"location":"deeponto/align/bertsubs/#deeponto.complete.bertsubs.pipeline_inter.BERTSubsInterPipeline.evaluate","title":"evaluate(target_subsumptions, test_type='test')
","text":"Test and calculate the metrics according to a given list of subsumptions.
Parameters:
Name Type Description Defaulttarget_subsumptions
List[List]
A list of subsumptions, each of which is a two-component list (subclass_iri, super_class_iri_or_str)
.
test_type
str
\"test\"
or \"valid\"
.
'test'
Source code in src/deeponto/complete/bertsubs/pipeline_inter.py
def evaluate(self, target_subsumptions: List[List], test_type: str = 'test'):\nr\"\"\"Test and calculate the metrics according to a given list of subsumptions.\n\n Args:\n target_subsumptions (List[List]): A list of subsumptions, each of which of is a two-component list `(subclass_iri, super_class_iri_or_str)`.\n test_type (str): `\"test\"` or `\"valid\"`.\n \"\"\"\n MRR_sum, hits1_sum, hits5_sum, hits10_sum = 0, 0, 0, 0\n MRR, Hits1, Hits5, Hits10 = 0, 0, 0, 0\n size_sum, size_n = 0, 0\n for k0, test in enumerate(target_subsumptions):\n subcls, gt = test[0], test[1]\n candidates = test[1:]\n candidate_subsumptions = [[subcls, c] for c in candidates]\n candidate_scores = np.zeros(len(candidate_subsumptions))\n for k1, candidate_subsumption in enumerate(candidate_subsumptions):\n samples = self.inter_ontology_subsumption_to_sample(subsumption=candidate_subsumption)\n size_sum += len(samples)\n size_n += 1\n scores = self.score(samples=samples)\n candidate_scores[k1] = np.average(scores)\n\n sorted_indexes = np.argsort(candidate_scores)[::-1]\n sorted_classes = [candidates[i] for i in sorted_indexes]\n rank = sorted_classes.index(gt) + 1\n MRR_sum += 1.0 / rank\n hits1_sum += 1 if gt in sorted_classes[:1] else 0\n hits5_sum += 1 if gt in sorted_classes[:5] else 0\n hits10_sum += 1 if gt in sorted_classes[:10] else 0\n num = k0 + 1\n MRR, Hits1, Hits5, Hits10 = MRR_sum / num, hits1_sum / num, hits5_sum / num, hits10_sum / num\n if num % 500 == 0:\n print('\\n%d tested, MRR: %.3f, Hits@1: %.3f, Hits@5: %.3f, Hits@10: %.3f\\n' % (\n num, MRR, Hits1, Hits5, Hits10))\n print('\\n[%s], MRR: %.3f, Hits@1: %.3f, Hits@5: %.3f, Hits@10: %.3f\\n' % (test_type, MRR, Hits1, Hits5, Hits10))\n print('%.2f samples per testing subsumption' % (size_sum / size_n))\n
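The evaluation ranks each ground-truth super-class among its candidates and accumulates MRR and Hits@k over the test cases. A self-contained sketch of the per-case metrics:

import numpy as np

def rank_metrics(candidate_scores, gt_index):
    # returns (reciprocal rank, hits@1, hits@5, hits@10) for one test case
    order = np.argsort(candidate_scores)[::-1]          # candidate indices by descending score
    rank = int(np.where(order == gt_index)[0][0]) + 1   # 1-based rank of the ground truth
    return 1.0 / rank, int(rank <= 1), int(rank <= 5), int(rank <= 10)

# the ground truth (candidate 2) receives the second-highest score -> rank 2
print(rank_metrics(np.array([0.1, 0.8, 0.5, 0.2]), gt_index=2))  # (0.5, 0, 1, 1)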
"},{"location":"deeponto/align/bertsubs/#deeponto.complete.bertsubs.pipeline_inter.BERTSubsInterPipeline.predict","title":"predict(target_subsumptions)
","text":"Predict a score for each given subsumption.
The scores will be saved in test_subsumption_scores.csv
.
Parameters:
Name Type Description Defaulttarget_subsumptions
List[List]
Each item is a list whose first element is the sub-class IRI and whose remaining elements are the candidate super-classes to be scored.
required Source code insrc/deeponto/complete/bertsubs/pipeline_inter.py
def predict(self, target_subsumptions: List[List]):\nr\"\"\"Predict a score for each given subsumption. \n\n The scores will be saved in `test_subsumption_scores.csv`.\n\n Args:\n target_subsumptions (List[List]): Each item is a list with the first element as the sub-class,\n and the remaining elements as n candidate super-classes.\n \"\"\"\n out_lines = []\n for test in target_subsumptions:\n subcls, candidates = test[0], test[1:]\n candidate_subsumptions = [[subcls, c] for c in candidates]\n candidate_scores = []\n\n for candidate_subsumption in candidate_subsumptions:\n samples = self.inter_ontology_subsumption_to_sample(subsumption=candidate_subsumption)\n scores = self.score(samples=samples)\n candidate_scores.append(np.average(scores))\n out_lines.append(','.join([str(i) for i in candidate_scores]))\n\n out_file = 'test_subsumption_scores.csv'\n with open(out_file, 'w') as f:\n for line in out_lines:\n f.write('%s\\n' % line)\n print('Predicted subsumption scores are saved to %s' % out_file)\n
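A minimal sketch of the expected input, assuming pipeline is an already constructed BERTSubsInterPipeline and the IRIs below are placeholders:
target_subsumptions = [\n    # sub-class IRI followed by its candidate super-class IRIs (placeholders)\n    [\"http://example.org/src#A\", \"http://example.org/tgt#B\", \"http://example.org/tgt#C\"],\n    [\"http://example.org/src#D\", \"http://example.org/tgt#E\", \"http://example.org/tgt#F\"],\n]\npipeline.predict(target_subsumptions)  # writes one comma-separated row of scores per item to test_subsumption_scores.csv\n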
"},{"location":"deeponto/align/logmap/","title":"LogMap","text":"Run LogMap matcher 4.0 in a jar
command.
Credit
See LogMap repository at: https://github.com/ernestojimenezruiz/logmap-matcher.
"},{"location":"deeponto/align/logmap/#deeponto.align.logmap.run_logmap_repair","title":"run_logmap_repair(src_onto_path, tgt_onto_path, mapping_file_path, output_path, max_jvm_memory='10g')
","text":"Run the repair module of LogMap with java -jar
.
src/deeponto/align/logmap/__init__.py
def run_logmap_repair(\n src_onto_path: str, tgt_onto_path: str, mapping_file_path: str, output_path: str, max_jvm_memory: str = \"10g\"\n):\n\"\"\"Run the repair module of LogMap with `java -jar`.\"\"\"\n\n # find logmap directory\n logmap_path = os.path.dirname(__file__)\n\n # obtain absolute paths\n src_onto_path = os.path.abspath(src_onto_path)\n tgt_onto_path = os.path.abspath(tgt_onto_path)\n mapping_file_path = os.path.abspath(mapping_file_path)\n output_path = os.path.abspath(output_path)\n\n # run jar command\n print(f\"Run the repair module of LogMap from {logmap_path}.\")\n repair_command = (\n f\"java -Xms500m -Xmx{max_jvm_memory} -DentityExpansionLimit=100000000 -jar {logmap_path}/logmap-matcher-4.0.jar DEBUGGER \"\n + f\"file:{src_onto_path} file:{tgt_onto_path} TXT {mapping_file_path}\"\n + f\" {output_path} false false\"\n )\n print(f\"The jar command is:\\n{repair_command}.\")\n run_jar(repair_command)\n
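A minimal usage sketch (the file paths below are placeholders):
from deeponto.align.logmap import run_logmap_repair\n\n# repair a set of candidate mappings with LogMap's DEBUGGER mode (placeholder paths)\nrun_logmap_repair(\n    src_onto_path=\"src_onto.owl\",\n    tgt_onto_path=\"tgt_onto.owl\",\n    mapping_file_path=\"candidate_mappings.txt\",\n    output_path=\"logmap_repair_output\",\n    max_jvm_memory=\"10g\",\n)\n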
"},{"location":"deeponto/complete/ontolama/","title":"OntoLAMA","text":""},{"location":"deeponto/complete/ontolama/#deeponto.complete.ontolama.inference.run_inference","title":"run_inference(config, args)
","text":"Main entry for running the OpenPrompt script.
Source code insrc/deeponto/complete/ontolama/inference.py
def run_inference(config, args):\n\"\"\"Main entry for running the OpenPrompt script.\n \"\"\"\n global CUR_TEMPLATE, CUR_VERBALIZER\n # exit()\n # init logger, create log dir and set log level, etc.\n if args.resume and args.test:\n raise Exception(\"cannot use flag --resume and --test together\")\n if args.resume or args.test:\n config.logging.path = EXP_PATH = args.resume or args.test\n else:\n EXP_PATH = config_experiment_dir(config)\n init_logger(\n os.path.join(EXP_PATH, \"log.txt\"),\n config.logging.file_level,\n config.logging.console_level,\n )\n # save config to the logger directory\n save_config_to_yaml(config)\n\n # load dataset. The valid_dataset can be None\n train_dataset, valid_dataset, test_dataset, Processor = OntoLAMADataProcessor.load_inference_dataset(\n config, test=args.test is not None or config.learning_setting == \"zero_shot\"\n )\n\n # main\n if config.learning_setting == \"full\":\n res = trainer(\n EXP_PATH,\n config,\n Processor,\n resume=args.resume,\n test=args.test,\n train_dataset=train_dataset,\n valid_dataset=valid_dataset,\n test_dataset=test_dataset,\n )\n elif config.learning_setting == \"few_shot\":\n if config.few_shot.few_shot_sampling is None:\n raise ValueError(\"use few_shot setting but config.few_shot.few_shot_sampling is not specified\")\n seeds = config.sampling_from_train.seed\n res = 0\n for seed in seeds:\n if not args.test:\n sampler = FewShotSampler(\n num_examples_per_label=config.sampling_from_train.num_examples_per_label,\n also_sample_dev=config.sampling_from_train.also_sample_dev,\n num_examples_per_label_dev=config.sampling_from_train.num_examples_per_label_dev,\n )\n train_sampled_dataset, valid_sampled_dataset = sampler(\n train_dataset=train_dataset, valid_dataset=valid_dataset, seed=seed\n )\n result = trainer(\n os.path.join(EXP_PATH, f\"seed-{seed}\"),\n config,\n Processor,\n resume=args.resume,\n test=args.test,\n train_dataset=train_sampled_dataset,\n valid_dataset=valid_sampled_dataset,\n test_dataset=test_dataset,\n )\n else:\n result = trainer(\n os.path.join(EXP_PATH, f\"seed-{seed}\"),\n config,\n Processor,\n test=args.test,\n test_dataset=test_dataset,\n )\n res += result\n res /= len(seeds)\n elif config.learning_setting == \"zero_shot\":\n res = trainer(\n EXP_PATH,\n config,\n Processor,\n zero=True,\n train_dataset=train_dataset,\n valid_dataset=valid_dataset,\n test_dataset=test_dataset,\n )\n\n return config, CUR_TEMPLATE, CUR_VERBALIZER\n
"},{"location":"deeponto/complete/ontolama/#deeponto.complete.ontolama.subsumption_sampler.SubsumptionSamplerBase","title":"SubsumptionSamplerBase(onto)
","text":"Base Class for Sampling Subsumption Pairs.
Source code insrc/deeponto/complete/ontolama/subsumption_sampler.py
def __init__(self, onto: Ontology):\n self.onto = onto\n self.progress_manager = enlighten.get_manager()\n\n # for faster sampling\n self.concept_iris = list(self.onto.owl_classes.keys())\n self.object_property_iris = list(self.onto.owl_object_properties.keys())\n self.sibling_concept_groups = self.onto.sibling_class_groups\n self.sibling_auxiliary_dict = defaultdict(list)\n for i, sib_group in enumerate(self.sibling_concept_groups):\n for sib in sib_group:\n self.sibling_auxiliary_dict[sib].append(i)\n
"},{"location":"deeponto/complete/ontolama/#deeponto.complete.ontolama.subsumption_sampler.SubsumptionSamplerBase.random_named_concept","title":"random_named_concept()
","text":"Randomly draw a named concept's IRI.
Source code insrc/deeponto/complete/ontolama/subsumption_sampler.py
def random_named_concept(self) -> str:\n\"\"\"Randomly draw a named concept's IRI.\"\"\"\n return random.choice(self.concept_iris)\n
"},{"location":"deeponto/complete/ontolama/#deeponto.complete.ontolama.subsumption_sampler.SubsumptionSamplerBase.random_object_property","title":"random_object_property()
","text":"Randomly draw a object property's IRI.
Source code insrc/deeponto/complete/ontolama/subsumption_sampler.py
def random_object_property(self) -> str:\n\"\"\"Randomly draw an object property's IRI.\"\"\"\n    return random.choice(self.object_property_iris)\n
"},{"location":"deeponto/complete/ontolama/#deeponto.complete.ontolama.subsumption_sampler.SubsumptionSamplerBase.get_siblings","title":"get_siblings(concept_iri)
","text":"Get the sibling concepts of the given concept.
Source code insrc/deeponto/complete/ontolama/subsumption_sampler.py
def get_siblings(self, concept_iri: str):\n\"\"\"Get the sibling concepts of the given concept.\"\"\"\n sibling_group = self.sibling_auxiliary_dict[concept_iri]\n sibling_group = [self.sibling_concept_groups[i] for i in sibling_group]\n sibling_group = list(itertools.chain.from_iterable(sibling_group))\n return sibling_group\n
"},{"location":"deeponto/complete/ontolama/#deeponto.complete.ontolama.subsumption_sampler.SubsumptionSamplerBase.random_sibling","title":"random_sibling(concept_iri)
","text":"Randomly draw a sibling concept for a given concept.
Source code insrc/deeponto/complete/ontolama/subsumption_sampler.py
def random_sibling(self, concept_iri: str) -> str:\n\"\"\"Randomly draw a sibling concept for a given concept.\"\"\"\n sibling_group = self.get_siblings(concept_iri)\n if sibling_group:\n return random.choice(sibling_group)\n else:\n # not every concept has a sibling concept\n return None\n
"},{"location":"deeponto/complete/ontolama/#deeponto.complete.ontolama.subsumption_sampler.AtomicSubsumptionSampler","title":"AtomicSubsumptionSampler(onto)
","text":" Bases: SubsumptionSamplerBase
Sampler for constructing the Atomic Subsumption Inference (SI) dataset.
Positive samples come from the entailed subsumptions.
Soft negative samples come from the pairs of randomly selected concepts, subject to passing the assumed disjointness check.
Hard negative samples come from the pairs of randomly selected sibling concepts, subject to passing the assumed disjointness check.
Source code insrc/deeponto/complete/ontolama/subsumption_sampler.py
def __init__(self, onto: Ontology):\n super().__init__(onto)\n\n # compute the sibling concept pairs for faster hard negative sampling\n self.sibling_pairs = []\n for sib_group in self.sibling_concept_groups:\n self.sibling_pairs += [(x, y) for x, y in itertools.product(sib_group, sib_group) if x != y]\n self.maximum_num_hard_negatives = len(self.sibling_pairs)\n
"},{"location":"deeponto/complete/ontolama/#deeponto.complete.ontolama.subsumption_sampler.AtomicSubsumptionSampler.positive_sampling","title":"positive_sampling(num_samples=None)
","text":"Sample named concept pairs that are involved in a subsumption axiom.
An extracted pair \\((C, D)\\) indicates \\(\\mathcal{O} \\models C \\sqsubseteq D\\) where \\(\\mathcal{O}\\) is the input ontology.
Source code insrc/deeponto/complete/ontolama/subsumption_sampler.py
def positive_sampling(self, num_samples: Optional[int] = None):\nr\"\"\"Sample named concept pairs that are involved in a subsumption axiom.\n\n An extracted pair $(C, D)$ indicates $\\mathcal{O} \\models C \\sqsubseteq D$ where\n $\\mathcal{O}$ is the input ontology.\n \"\"\"\n pbar = self.progress_manager.counter(desc=\"Sample Positive Subsumptions\", unit=\"pair\")\n positives = []\n for concept_iri in self.concept_iris:\n owl_concept = self.onto.owl_classes[concept_iri]\n for subsumer_iri in self.onto.reasoner.get_inferred_super_entities(owl_concept, direct=False):\n positives.append((concept_iri, subsumer_iri))\n pbar.update()\n positives = list(set(sorted(positives)))\n if num_samples:\n positives = random.sample(positives, num_samples)\n print(f\"Sample {len(positives)} unique positive subsumption pairs.\")\n return positives\n
"},{"location":"deeponto/complete/ontolama/#deeponto.complete.ontolama.subsumption_sampler.AtomicSubsumptionSampler.negative_sampling","title":"negative_sampling(negative_sample_type, num_samples, apply_assumed_disjointness_alternative=True)
","text":"Sample named concept pairs that are involved in a disjoiness (assumed) axiom, which then implies non-subsumption.
Source code insrc/deeponto/complete/ontolama/subsumption_sampler.py
def negative_sampling(\n self,\n negative_sample_type: str,\n num_samples: int,\n apply_assumed_disjointness_alternative: bool = True,\n):\nr\"\"\"Sample named concept pairs that are involved in a disjoiness (assumed) axiom, which then\n implies non-subsumption.\n \"\"\"\n if negative_sample_type == \"soft\":\n draw_one = lambda: tuple(random.sample(self.concept_iris, k=2))\n elif negative_sample_type == \"hard\":\n draw_one = lambda: random.choice(self.sibling_pairs)\n else:\n raise RuntimeError(f\"{negative_sample_type} not supported.\")\n\n negatives = []\n max_iter = 2 * num_samples\n\n # which method to validate the negative sample\n valid_negative = self.onto.reasoner.check_assumed_disjoint\n if apply_assumed_disjointness_alternative:\n valid_negative = self.onto.reasoner.check_assumed_disjoint_alternative\n\n print(f\"Sample {negative_sample_type} negative subsumption pairs.\")\n # create two bars for process tracking\n added_bar = self.progress_manager.counter(total=num_samples, desc=\"Sample Negative Subsumptions\", unit=\"pair\")\n iter_bar = self.progress_manager.counter(total=max_iter, desc=\"#Iteration\", unit=\"it\")\n i = 0\n added = 0\n while added < num_samples and i < max_iter:\n sub_concept_iri, super_concept_iri = draw_one()\n sub_concept = self.onto.get_owl_object(sub_concept_iri)\n super_concept = self.onto.get_owl_object(super_concept_iri)\n # collect class iri if accepted\n if valid_negative(sub_concept, super_concept):\n neg = (sub_concept_iri, super_concept_iri)\n negatives.append(neg)\n added += 1\n added_bar.update(1)\n if added == num_samples:\n negatives = list(set(sorted(negatives)))\n added = len(negatives)\n added_bar.count = added\n i += 1\n iter_bar.update(1)\n negatives = list(set(sorted(negatives)))\n print(f\"Sample {len(negatives)} unique positive subsumption pairs.\")\n return negatives\n
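A minimal sketch of building an atomic SI sample set with this class (the ontology file is a placeholder, and loading follows the library's quick-usage pattern for Ontology):
from deeponto.onto import Ontology\nfrom deeponto.complete.ontolama.subsumption_sampler import AtomicSubsumptionSampler\n\nonto = Ontology(\"example.owl\")  # placeholder ontology file\nsampler = AtomicSubsumptionSampler(onto)\npositives = sampler.positive_sampling(num_samples=1000)\nsoft_negatives = sampler.negative_sampling(negative_sample_type=\"soft\", num_samples=1000)\nhard_negatives = sampler.negative_sampling(negative_sample_type=\"hard\", num_samples=1000)\n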
"},{"location":"deeponto/complete/ontolama/#deeponto.complete.ontolama.subsumption_sampler.ComplexSubsumptionSampler","title":"ComplexSubsumptionSampler(onto)
","text":" Bases: SubsumptionSamplerBase
Sampler for constructing the Complex Subsumption Inference (SI) dataset.
To obtain complex concept expressions on both sides of the subsumption relationship (as a sub-concept or a super-concept), this sampler utilises the equivalence axioms in the form of \\(C \\equiv C_{comp}\\) where \\(C\\) is atomic and \\(C_{comp}\\) is complex.
An equivalence axiom like \\(C \\equiv C_{comp}\\) is deemed as an anchor axiom.
Positive samples take the form \\(C_{sub} \\sqsubseteq C_{comp}\\) or \\(C_{comp} \\sqsubseteq C_{super}\\), where \\(C_{sub}\\) is an entailed sub-concept of \\(C\\) (and hence of \\(C_{comp}\\)) and \\(C_{super}\\) is an entailed super-concept of \\(C\\) (and hence of \\(C_{comp}\\)).
Negative samples are formed by replacing one of the named entities in the anchor axiom; the modified sub-concept and super-concept must pass the assumed disjointness check to be accepted as a valid negative sample. Without loss of generality, suppose we take \\(C \\sqsubseteq C_{comp}\\) and replace a named entity in \\(C_{comp}\\) to form \\(C \\sqsubseteq C_{comp}'\\); then \\((C, C_{comp}')\\) is a valid negative only if it satisfies the assumed disjointness check.
Source code insrc/deeponto/complete/ontolama/subsumption_sampler.py
def __init__(self, onto: Ontology):\n super().__init__(onto)\n self.anchor_axioms = self.onto.get_equivalence_axioms(\"Classes\")\n
"},{"location":"deeponto/complete/ontolama/#deeponto.complete.ontolama.subsumption_sampler.ComplexSubsumptionSampler.positive_sampling_from_anchor","title":"positive_sampling_from_anchor(anchor_axiom)
","text":"Returns all positive subsumption pairs extracted from an anchor equivalence axiom.
Source code insrc/deeponto/complete/ontolama/subsumption_sampler.py
def positive_sampling_from_anchor(self, anchor_axiom: OWLAxiom):\n\"\"\"Returns all positive subsumption pairs extracted from an anchor equivalence axiom.\"\"\"\n sub_axiom = list(anchor_axiom.asOWLSubClassOfAxioms())[0]\n atomic_concept, complex_concept = sub_axiom.getSubClass(), sub_axiom.getSuperClass()\n # determine which is the atomic concept\n if complex_concept.isClassExpressionLiteral():\n atomic_concept, complex_concept = complex_concept, atomic_concept\n\n # intialise the positive samples from the anchor equivalence axiom\n positives = list(anchor_axiom.asOWLSubClassOfAxioms())\n for super_concept_iri in self.onto.reasoner.get_inferred_super_entities(atomic_concept, direct=False):\n positives.append(\n self.onto.owl_data_factory.getOWLSubClassOfAxiom(\n complex_concept, self.onto.get_owl_object(super_concept_iri)\n )\n )\n for sub_concept_iri in self.onto.reasoner.get_inferred_sub_entities(atomic_concept, direct=False):\n positives.append(\n self.onto.owl_data_factory.getOWLSubClassOfAxiom(\n self.onto.get_owl_object(sub_concept_iri), complex_concept\n )\n )\n\n # TESTING\n # for p in positives:\n # assert self.onto.reasoner.owl_reasoner.isEntailed(p) \n\n return list(set(sorted(positives)))\n
"},{"location":"deeponto/complete/ontolama/#deeponto.complete.ontolama.subsumption_sampler.ComplexSubsumptionSampler.positive_sampling","title":"positive_sampling(num_samples_per_anchor=10)
","text":"Sample positive subsumption axioms that involve one atomic and one complex concepts.
An extracted pair \\((C, D)\\) indicates \\(\\mathcal{O} \\models C \\sqsubseteq D\\) where \\(\\mathcal{O}\\) is the input ontology.
Source code insrc/deeponto/complete/ontolama/subsumption_sampler.py
def positive_sampling(self, num_samples_per_anchor: Optional[int] = 10):\nr\"\"\"Sample positive subsumption axioms that involve one atomic and one complex concepts.\n\n An extracted pair $(C, D)$ indicates $\\mathcal{O} \\models C \\sqsubseteq D$ where\n $\\mathcal{O}$ is the input ontology.\n \"\"\"\n print(f\"Maximum number of positive samples for each anchor is set to {num_samples_per_anchor}.\")\n pbar = self.progress_manager.counter(desc=\"Sample Positive Subsumptions from\", unit=\"anchor axiom\")\n positives = dict()\n for anchor in self.anchor_axioms:\n positives_from_anchor = self.positive_sampling_from_anchor(anchor)\n if num_samples_per_anchor and num_samples_per_anchor < len(positives_from_anchor):\n positives_from_anchor = random.sample(positives_from_anchor, k = num_samples_per_anchor)\n positives[str(anchor)] = positives_from_anchor\n pbar.update()\n # positives = list(set(sorted(positives)))\n print(f\"Sample {sum([len(v) for v in positives.values()])} unique positive subsumption pairs.\")\n return positives\n
"},{"location":"deeponto/complete/ontolama/#deeponto.complete.ontolama.subsumption_sampler.ComplexSubsumptionSampler.negative_sampling","title":"negative_sampling(num_samples_per_anchor=10)
","text":"Sample negative subsumption axioms that involve one atomic and one complex concepts.
An extracted pair \\((C, D)\\) indicates that \\(C\\) and \\(D\\) pass the assumed disjointness check.
Source code insrc/deeponto/complete/ontolama/subsumption_sampler.py
def negative_sampling(self, num_samples_per_anchor: Optional[int] = 10):\nr\"\"\"Sample negative subsumption axioms that involve one atomic and one complex concepts.\n\n An extracted pair $(C, D)$ indicates $C$ and $D$ pass the [assumed disjointness check][deeponto.onto.OntologyReasoner.check_assumed_disjoint].\n \"\"\"\n print(f\"Maximum number of negative samples for each anchor is set to {num_samples_per_anchor}.\")\n pbar = self.progress_manager.counter(desc=\"Sample Negative Subsumptions from\", unit=\"anchor axiom\")\n negatives = dict()\n for anchor in self.anchor_axioms:\n negatives_from_anchor = []\n i, max_iter = 0, num_samples_per_anchor + 2\n while i < max_iter and len(negatives_from_anchor) < num_samples_per_anchor:\n corrupted_anchor = self.random_corrupt(anchor)\n corrupted_sub_axiom = random.choice(list(corrupted_anchor.asOWLSubClassOfAxioms()))\n sub_concept, super_concept = corrupted_sub_axiom.getSubClass(), corrupted_sub_axiom.getSuperClass()\n if self.onto.reasoner.check_assumed_disjoint_alternative(sub_concept, super_concept):\n negatives_from_anchor.append(corrupted_sub_axiom)\n i += 1\n negatives[str(anchor)] = list(set(sorted(negatives_from_anchor)))\n pbar.update()\n # negatives = list(set(sorted(negatives)))\n print(f\"Sample {sum([len(v) for v in negatives.values()])} unique positive subsumption pairs.\")\n return negatives\n
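An analogous sketch for the complex case; both sampling methods return dictionaries keyed by the string form of each anchor equivalence axiom (the ontology file is a placeholder):
from deeponto.onto import Ontology\nfrom deeponto.complete.ontolama.subsumption_sampler import ComplexSubsumptionSampler\n\nonto = Ontology(\"example.owl\")  # placeholder ontology file\nsampler = ComplexSubsumptionSampler(onto)\npositives = sampler.positive_sampling(num_samples_per_anchor=10)  # anchor axiom string -> positive subsumption axioms\nnegatives = sampler.negative_sampling(num_samples_per_anchor=10)  # anchor axiom string -> corrupted (negative) axioms\n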
"},{"location":"deeponto/complete/ontolama/#deeponto.complete.ontolama.subsumption_sampler.ComplexSubsumptionSampler.random_corrupt","title":"random_corrupt(axiom)
","text":"Randomly change an IRI in the input axiom and return a new one.
Source code insrc/deeponto/complete/ontolama/subsumption_sampler.py
def random_corrupt(self, axiom: OWLAxiom):\n\"\"\"Randomly change an IRI in the input axiom and return a new one.\n \"\"\"\n replaced_iri = random.choice(re.findall(IRI, str(axiom)))[1:-1]\n replaced_entity = self.onto.get_owl_object(replaced_iri)\n replacement_iri = None\n if self.onto.get_entity_type(replaced_entity) == \"Classes\":\n replacement_iri = self.random_named_concept()\n elif self.onto.get_entity_type(replaced_entity) == \"ObjectProperties\":\n replacement_iri = self.random_object_property()\n else:\n # NOTE: to extend to other types of entities in future\n raise RuntimeError(\"Unknown type of axiom.\")\n return self.onto.replace_entity(axiom, replaced_iri, replacement_iri)\n
"},{"location":"deeponto/complete/ontolama/#deeponto.complete.ontolama.data_processor.OntoLAMADataProcessor","title":"OntoLAMADataProcessor()
","text":" Bases: DataProcessor
Class for processing the OntoLAMA data points.
Source code insrc/deeponto/complete/ontolama/data_processor.py
def __init__(self):\n super().__init__()\n self.labels = [\"negative\", \"positive\"]\n
"},{"location":"deeponto/complete/ontolama/#deeponto.complete.ontolama.data_processor.OntoLAMADataProcessor.load_dataset","title":"load_dataset(task_name, split)
staticmethod
","text":"Load a specific OntoLAMA dataset from huggingface dataset hub.
Source code insrc/deeponto/complete/ontolama/data_processor.py
@staticmethod\ndef load_dataset(task_name: str, split: str):\n\"\"\"Load a specific OntoLAMA dataset from huggingface dataset hub.\"\"\"\n # TODO: remove use_auth_token after going to public\n return load_dataset(\"krr-oxford/OntoLAMA\", task_name, split=split, use_auth_token=True)\n
"},{"location":"deeponto/complete/ontolama/#deeponto.complete.ontolama.data_processor.OntoLAMADataProcessor.get_examples","title":"get_examples(task_name, split)
","text":"Load a specific OntoLAMA dataset and transform the data points into input examples for prompt-based inference.
Source code insrc/deeponto/complete/ontolama/data_processor.py
def get_examples(self, task_name, split):\n\"\"\"Load a specific OntoLAMA dataset and transform the data points into\n input examples for prompt-based inference.\n \"\"\"\n\n dataset = self.load_dataset(task_name, split)\n\n premise_name = \"v_sub_concept\"\n hypothesis_name = \"v_super_concept\"\n # different data fields for the bimnli dataset\n if \"bimnli\" in task_name:\n premise_name = \"premise\"\n hypothesis_name = \"hypothesis\"\n\n prompt_samples = []\n for samp in dataset:\n inp = InputExample(text_a=samp[premise_name], text_b=samp[hypothesis_name], label=samp[\"label\"])\n prompt_samples.append(inp)\n\n return prompt_samples\n
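A small sketch of turning an OntoLAMA split into OpenPrompt input examples; \"bimnli\" is the task name referenced in the code above, and access to the Hugging Face dataset may require authentication:
from deeponto.complete.ontolama.data_processor import OntoLAMADataProcessor\n\nprocessor = OntoLAMADataProcessor()\nexamples = processor.get_examples(task_name=\"bimnli\", split=\"test\")\nprint(examples[0].text_a, examples[0].text_b, examples[0].label)\n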
"},{"location":"deeponto/complete/ontolama/#deeponto.complete.ontolama.data_processor.OntoLAMADataProcessor.load_inference_dataset","title":"load_inference_dataset(config, return_class=True, test=False)
classmethod
","text":"A plm loader using a global config. It will load the train, valid, and test set (if exists) simulatenously.
Parameters:
Name Type Description Defaultconfig
CfgNode
The global config from the CfgNode.
requiredreturn_class
bool
Whether to also return the data processor object for future usage.
True
Returns:
Type DescriptionOptional[List[InputExample]]
The train dataset.
Optional[List[InputExample]]
The valid dataset.
Optional[List[InputExample]]
The test dataset.
Optional[OntoLAMADataProcessor]
The data processor object.
Source code insrc/deeponto/complete/ontolama/data_processor.py
@classmethod\ndef load_inference_dataset(cls, config: CfgNode, return_class=True, test=False):\nr\"\"\"A plm loader using a global config.\n It will load the train, valid, and test set (if exists) simulatenously.\n\n Args:\n config (CfgNode): The global config from the CfgNode.\n return_class (bool): Whether return the data processor class for future usage.\n\n Returns:\n (Optional[List[InputExample]]): The train dataset.\n (Optional[List[InputExample]]): The valid dataset.\n (Optional[List[InputExample]]): The test dataset.\n (Optional[OntoLAMADataProcessor]): The data processor object.\n \"\"\"\n dataset_config = config.dataset\n\n processor = cls()\n\n train_dataset = None\n valid_dataset = None\n if not test:\n try:\n train_dataset = processor.get_examples(dataset_config.task_name, \"train\")\n except FileNotFoundError:\n logger.warning(f\"Has no training dataset in krr-oxford/OntoLAMA/{dataset_config.task_name}.\")\n try:\n valid_dataset = processor.get_examples(dataset_config.task_name, \"validation\")\n except FileNotFoundError:\n logger.warning(f\"Has no validation dataset in krr-oxford/OntoLAMA/{dataset_config.task_name}.\")\n\n test_dataset = None\n try:\n test_dataset = processor.get_examples(dataset_config.task_name, \"test\")\n except FileNotFoundError:\n logger.warning(f\"Has no test dataset in krr-oxford/OntoLAMA/{dataset_config.task_name}.\")\n # checking whether donwloaded.\n if (train_dataset is None) and (valid_dataset is None) and (test_dataset is None):\n logger.error(\n \"Dataset is empty. Either there is no download or the path is wrong. \"\n + \"If not downloaded, please `cd datasets/` and `bash download_xxx.sh`\"\n )\n exit()\n if return_class:\n return train_dataset, valid_dataset, test_dataset, processor\n else:\n return train_dataset, valid_dataset, test_dataset\n
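Assuming config is the global CfgNode described above (with config.dataset.task_name set to an OntoLAMA task), the loader can be used as follows; any missing split comes back as None:
train_dataset, valid_dataset, test_dataset, processor = OntoLAMADataProcessor.load_inference_dataset(\n    config, return_class=True, test=False\n)\n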
"},{"location":"deeponto/complete/bertsubs/","title":"BERTSubs (Intra)","text":""},{"location":"deeponto/complete/bertsubs/#deeponto.complete.bertsubs.pipeline_intra.BERTSubsIntraPipeline","title":"BERTSubsIntraPipeline(onto, config)
","text":"Class for the intra-ontology subsumption prediction setting of BERTSubs.
Attributes:
Name Type Descriptiononto
Ontology
The target ontology.
config
CfgNode
The configuration for BERTSubs.
sampler
SubsumptionSampler
The subsumption sampler for BERTSubs.
Source code insrc/deeponto/complete/bertsubs/pipeline_intra.py
def __init__(self, onto: Ontology, config: CfgNode):\n self.onto = onto\n self.config = config\n self.sampler = SubsumptionSampler(onto=onto, config=config)\n start_time = datetime.datetime.now()\n\n n = 0\n for k in self.sampler.named_classes:\n n += len(self.sampler.iri_label[k])\n print(\n \"%d named classes, %.1f labels per class\"\n % (len(self.sampler.named_classes), n / len(self.sampler.named_classes))\n )\n\n read_subsumptions = lambda file_name: [line.strip().split(\",\") for line in open(file_name).readlines()]\n test_subsumptions = (\n None\n if config.test_subsumption_file is None or config.test_subsumption_file == \"None\"\n else read_subsumptions(config.test_subsumption_file)\n )\n\n # The train/valid subsumptions are not given. They will be extracted from the given ontology:\n if config.train_subsumption_file is None or config.train_subsumption_file == \"None\":\n subsumptions0 = self.extract_subsumptions_from_ontology(\n onto=onto, subsumption_type=config.subsumption_type\n )\n random.shuffle(subsumptions0)\n valid_size = int(len(subsumptions0) * config.valid.valid_ratio)\n train_subsumptions0, valid_subsumptions0 = subsumptions0[valid_size:], subsumptions0[0:valid_size]\n train_subsumptions, valid_subsumptions = [], []\n if config.subsumption_type == \"named_class\":\n for subs in train_subsumptions0:\n c1, c2 = subs.getSubClass(), subs.getSuperClass()\n train_subsumptions.append([str(c1.getIRI()), str(c2.getIRI())])\n\n size_sum = 0\n for subs in valid_subsumptions0:\n c1, c2 = subs.getSubClass(), subs.getSuperClass()\n neg_candidates = BERTSubsIntraPipeline.get_test_neg_candidates_named_class(\n subclass=c1, gt=c2, max_neg_size=config.valid.max_neg_size, onto=onto\n )\n size = len(neg_candidates)\n size_sum += size\n if size > 0:\n item = [str(c1.getIRI()), str(c2.getIRI())] + [str(c.getIRI()) for c in neg_candidates]\n valid_subsumptions.append(item)\n print(\"\\t average neg candidate size in validation: %.2f\" % (size_sum / len(valid_subsumptions)))\n\n elif config.subsumption_type == \"restriction\":\n for subs in train_subsumptions0:\n c1, c2 = subs.getSubClass(), subs.getSuperClass()\n train_subsumptions.append([str(c1.getIRI()), str(c2)])\n\n restrictions = BERTSubsIntraPipeline.extract_restrictions_from_ontology(onto=onto)\n print(\"restrictions: %d\" % len(restrictions))\n size_sum = 0\n for subs in valid_subsumptions0:\n c1, c2 = subs.getSubClass(), subs.getSuperClass()\n c2_neg = BERTSubsIntraPipeline.get_test_neg_candidates_restriction(\n subcls=c1, max_neg_size=config.valid.max_neg_size, restrictions=restrictions, onto=onto\n )\n size_sum += len(c2_neg)\n item = [str(c1.getIRI()), str(c2)] + [str(r) for r in c2_neg]\n valid_subsumptions.append(item)\n print(\"valid candidate negative avg. 
size: %.1f\" % (size_sum / len(valid_subsumptions)))\n else:\n warnings.warn(\"Unknown subsumption type %s\" % config.subsumption_type)\n sys.exit(0)\n\n # The train/valid subsumptions are given:\n else:\n train_subsumptions = read_subsumptions(config.train_subsumption_file)\n valid_subsumptions = read_subsumptions(config.valid_subsumption_file)\n\n print(\"Positive train/valid subsumptions: %d/%d\" % (len(train_subsumptions), len(valid_subsumptions)))\n tr = self.sampler.generate_samples(subsumptions=train_subsumptions)\n va = self.sampler.generate_samples(subsumptions=valid_subsumptions, duplicate=False)\n\n end_time = datetime.datetime.now()\n print(\"data pre-processing costs %.1f minutes\" % ((end_time - start_time).seconds / 60))\n\n start_time = datetime.datetime.now()\n torch.cuda.empty_cache()\n bert_trainer = BERTSubsumptionClassifierTrainer(\n config.fine_tune.pretrained,\n train_data=tr,\n val_data=va,\n max_length=config.prompt.max_length,\n early_stop=config.fine_tune.early_stop,\n )\n\n epoch_steps = len(bert_trainer.tra) // config.fine_tune.batch_size # total steps of an epoch\n logging_steps = int(epoch_steps * 0.02) if int(epoch_steps * 0.02) > 0 else 5\n eval_steps = 5 * logging_steps\n training_args = TrainingArguments(\n output_dir=config.fine_tune.output_dir,\n num_train_epochs=config.fine_tune.num_epochs,\n per_device_train_batch_size=config.fine_tune.batch_size,\n per_device_eval_batch_size=config.fine_tune.batch_size,\n warmup_ratio=config.fine_tune.warm_up_ratio,\n weight_decay=0.01,\n logging_steps=logging_steps,\n logging_dir=f\"{config.fine_tune.output_dir}/tb\",\n eval_steps=eval_steps,\n evaluation_strategy=\"steps\",\n do_train=True,\n do_eval=True,\n save_steps=eval_steps,\n load_best_model_at_end=True,\n save_total_limit=1,\n metric_for_best_model=\"accuracy\",\n greater_is_better=True,\n )\n if config.fine_tune.do_fine_tune and (\n config.prompt.prompt_type == \"traversal\"\n or (config.prompt.prompt_type == \"path\" and config.prompt.use_sub_special_token)\n ):\n bert_trainer.add_special_tokens([\"<SUB>\"])\n\n bert_trainer.train(train_args=training_args, do_fine_tune=config.fine_tune.do_fine_tune)\n if config.fine_tune.do_fine_tune:\n bert_trainer.trainer.save_model(\n output_dir=os.path.join(config.fine_tune.output_dir, \"fine-tuned-checkpoint\")\n )\n print(\"fine-tuning done, fine-tuned model saved\")\n else:\n print(\"pretrained or fine-tuned model loaded.\")\n end_time = datetime.datetime.now()\n print(\"Fine-tuning costs %.1f minutes\" % ((end_time - start_time).seconds / 60))\n\n bert_trainer.model.eval()\n self.device = torch.device(f\"cuda\") if torch.cuda.is_available() else torch.device(\"cpu\")\n bert_trainer.model.to(self.device)\n self.tokenize = lambda x: bert_trainer.tokenizer(\n x, max_length=config.prompt.max_length, truncation=True, padding=True, return_tensors=\"pt\"\n )\n softmax = torch.nn.Softmax(dim=1)\n self.classifier = lambda x: softmax(bert_trainer.model(**x).logits)[:, 1]\n\n self.evaluate(target_subsumptions=valid_subsumptions, test_type=\"valid\")\n if test_subsumptions is not None:\n if config.test_type == \"evaluation\":\n self.evaluate(target_subsumptions=test_subsumptions, test_type=\"test\")\n elif config.test_type == \"prediction\":\n self.predict(target_subsumptions=test_subsumptions)\n else:\n warnings.warn(\"Unknown test_type: %s\" % config.test_type)\n print(\"\\n ------------------------- done! ---------------------------\\n\\n\\n\")\n
"},{"location":"deeponto/complete/bertsubs/#deeponto.complete.bertsubs.pipeline_intra.BERTSubsIntraPipeline.score","title":"score(samples)
","text":"The scoring function based on the fine-tuned BERT classifier.
Parameters:
Name Type Description Defaultsamples
List[List]
A list of input sentence pairs to be scored.
required Source code insrc/deeponto/complete/bertsubs/pipeline_intra.py
def score(self, samples: List[List]):\nr\"\"\"The scoring function based on the fine-tuned BERT classifier.\n\n Args:\n samples (List[Tuple]): A list of input sentence pairs to be scored.\n \"\"\"\n sample_size = len(samples)\n scores = np.zeros(sample_size)\n batch_num = math.ceil(sample_size / self.config.evaluation.batch_size)\n for i in range(batch_num):\n j = (\n (i + 1) * self.config.evaluation.batch_size\n if (i + 1) * self.config.evaluation.batch_size <= sample_size\n else sample_size\n )\n inputs = self.tokenize(samples[i * self.config.evaluation.batch_size : j])\n inputs.to(self.device)\n with torch.no_grad():\n batch_scores = self.classifier(inputs)\n scores[i * self.config.evaluation.batch_size : j] = batch_scores.cpu().numpy()\n return scores\n
"},{"location":"deeponto/complete/bertsubs/#deeponto.complete.bertsubs.pipeline_intra.BERTSubsIntraPipeline.evaluate","title":"evaluate(target_subsumptions, test_type='test')
","text":"Test and calculate the metrics for a given list of subsumption pairs.
Parameters:
Name Type Description Defaulttarget_subsumptions
List[Tuple]
A list of test items; each is a list containing the sub-class IRI, the ground-truth super-class, and the remaining candidate super-classes.
requiredtest_type
str
test
for testing or valid
for validation.
'test'
Source code in src/deeponto/complete/bertsubs/pipeline_intra.py
def evaluate(self, target_subsumptions: List[List], test_type: str = \"test\"):\nr\"\"\"Test and calculate the metrics for a given list of subsumption pairs.\n\n Args:\n target_subsumptions (List[Tuple]): A list of subsumption pairs.\n test_type (str): `test` for testing or `valid` for validation.\n \"\"\"\n\n MRR_sum, hits1_sum, hits5_sum, hits10_sum = 0, 0, 0, 0\n MRR, Hits1, Hits5, Hits10 = 0, 0, 0, 0\n size_sum, size_n = 0, 0\n for k0, test in enumerate(target_subsumptions):\n subcls, gt = test[0], test[1]\n candidates = test[1:]\n\n candidate_subsumptions = [[subcls, c] for c in candidates]\n candidate_scores = np.zeros(len(candidate_subsumptions))\n for k1, candidate_subsumption in enumerate(candidate_subsumptions):\n samples = self.sampler.subsumptions_to_samples(subsumptions=[candidate_subsumption], sample_label=None)\n size_sum += len(samples)\n size_n += 1\n scores = self.score(samples=samples)\n candidate_scores[k1] = np.average(scores)\n\n sorted_indexes = np.argsort(candidate_scores)[::-1]\n sorted_classes = [candidates[i] for i in sorted_indexes]\n\n rank = sorted_classes.index(gt) + 1\n MRR_sum += 1.0 / rank\n hits1_sum += 1 if gt in sorted_classes[:1] else 0\n hits5_sum += 1 if gt in sorted_classes[:5] else 0\n hits10_sum += 1 if gt in sorted_classes[:10] else 0\n num = k0 + 1\n MRR, Hits1, Hits5, Hits10 = MRR_sum / num, hits1_sum / num, hits5_sum / num, hits10_sum / num\n if num % 500 == 0:\n print(\n \"\\n%d tested, MRR: %.3f, Hits@1: %.3f, Hits@5: %.3f, Hits@10: %.3f\\n\"\n % (num, MRR, Hits1, Hits5, Hits10)\n )\n print(\n \"\\n[%s], MRR: %.3f, Hits@1: %.3f, Hits@5: %.3f, Hits@10: %.3f\\n\" % (test_type, MRR, Hits1, Hits5, Hits10)\n )\n print(\"%.2f samples per testing subsumption\" % (size_sum / size_n))\n
"},{"location":"deeponto/complete/bertsubs/#deeponto.complete.bertsubs.pipeline_intra.BERTSubsIntraPipeline.predict","title":"predict(target_subsumptions)
","text":"Predict a score for each given subsumption in the list.
The scores will be saved in test_subsumption_scores.csv
.
Parameters:
Name Type Description Defaulttarget_subsumptions
List[List]
Each item is a list where the first element is a fixed ontology class \\(C\\), and the remaining elements are potential (candidate) super-classes of \\(C\\).
required Source code insrc/deeponto/complete/bertsubs/pipeline_intra.py
def predict(self, target_subsumptions: List[List]):\nr\"\"\"Predict a score for each given subsumption in the list.\n\n The scores will be saved in `test_subsumption_scores.csv`.\n\n Args:\n target_subsumptions (List[List]): Each item is a list where the first element is a fixed ontology class $C$,\n and the remaining elements are potential (candidate) super-classes of $C$.\n \"\"\"\n out_lines = []\n for test in target_subsumptions:\n subcls, candidates = test[0], test[1:]\n candidate_subsumptions = [[subcls, c] for c in candidates]\n candidate_scores = []\n\n for candidate_subsumption in candidate_subsumptions:\n samples = self.sampler.subsumptions_to_samples(subsumptions=[candidate_subsumption], sample_label=None)\n scores = self.score(samples=samples)\n candidate_scores.append(np.average(scores))\n\n out_lines.append(\",\".join([str(i) for i in candidate_scores]))\n\n out_file = \"test_subsumption_scores.csv\"\n with open(out_file, \"w\") as f:\n for line in out_lines:\n f.write(\"%s\\n\" % line)\n print(\"Predicted subsumption scores are saved to %s\" % out_file)\n
"},{"location":"deeponto/complete/bertsubs/#deeponto.complete.bertsubs.pipeline_intra.BERTSubsIntraPipeline.extract_subsumptions_from_ontology","title":"extract_subsumptions_from_ontology(onto, subsumption_type)
staticmethod
","text":"Extract target subsumptions from a given ontology.
Parameters:
Name Type Description Defaultonto
Ontology
The target ontology.
requiredsubsumption_type
str
the type of subsumptions, options are \"named_class\"
or \"restriction\"
.
src/deeponto/complete/bertsubs/pipeline_intra.py
@staticmethod\ndef extract_subsumptions_from_ontology(onto: Ontology, subsumption_type: str):\nr\"\"\"Extract target subsumptions from a given ontology.\n\n Args:\n onto (Ontology): The target ontology.\n subsumption_type (str): the type of subsumptions, options are `\"named_class\"` or `\"restriction\"`.\n\n \"\"\"\n all_subsumptions = onto.get_subsumption_axioms(entity_type=\"Classes\")\n subsumptions = []\n if subsumption_type == \"restriction\":\n for subs in all_subsumptions:\n if (\n not onto.check_deprecated(owl_object=subs.getSubClass())\n and not onto.check_named_entity(owl_object=subs.getSuperClass())\n and SubsumptionSampler.is_basic_existential_restriction(\n complex_class_str=str(subs.getSuperClass())\n )\n ):\n subsumptions.append(subs)\n elif subsumption_type == \"named_class\":\n for subs in all_subsumptions:\n c1, c2 = subs.getSubClass(), subs.getSuperClass()\n if (\n onto.check_named_entity(owl_object=c1)\n and not onto.check_deprecated(owl_object=c1)\n and onto.check_named_entity(owl_object=c2)\n and not onto.check_deprecated(owl_object=c2)\n ):\n subsumptions.append(subs)\n else:\n warnings.warn(\"\\nUnknown subsumption type: %s\\n\" % subsumption_type)\n return subsumptions\n
"},{"location":"deeponto/complete/bertsubs/#deeponto.complete.bertsubs.pipeline_intra.BERTSubsIntraPipeline.extract_restrictions_from_ontology","title":"extract_restrictions_from_ontology(onto)
staticmethod
","text":"Extract basic existential restriction from an ontology.
Parameters:
Name Type Description Defaultonto
Ontology
The target ontology.
requiredReturns:
Name Type Descriptionrestrictions
List
a list of existential restrictions.
Source code insrc/deeponto/complete/bertsubs/pipeline_intra.py
@staticmethod\ndef extract_restrictions_from_ontology(onto: Ontology):\nr\"\"\"Extract basic existential restriction from an ontology.\n\n Args:\n onto (Ontology): The target ontology.\n Returns:\n restrictions (List): a list of existential restrictions.\n \"\"\"\n restrictions = []\n for complexC in onto.get_asserted_complex_classes():\n if SubsumptionSampler.is_basic_existential_restriction(complex_class_str=str(complexC)):\n restrictions.append(complexC)\n return restrictions\n
"},{"location":"deeponto/complete/bertsubs/#deeponto.complete.bertsubs.pipeline_intra.BERTSubsIntraPipeline.get_test_neg_candidates_restriction","title":"get_test_neg_candidates_restriction(subcls, max_neg_size, restrictions, onto)
staticmethod
","text":"Get a list of negative candidate class restrictions for testing.
Source code insrc/deeponto/complete/bertsubs/pipeline_intra.py
@staticmethod\ndef get_test_neg_candidates_restriction(subcls, max_neg_size, restrictions, onto):\n\"\"\"Get a list of negative candidate class restrictions for testing.\"\"\"\n neg_restrictions = list()\n n = max_neg_size * 2 if max_neg_size * 2 <= len(restrictions) else len(restrictions)\n for r in random.sample(restrictions, n):\n if not onto.reasoner.check_subsumption(sub_entity=subcls, super_entity=r):\n neg_restrictions.append(r)\n if len(neg_restrictions) >= max_neg_size:\n break\n return neg_restrictions\n
"},{"location":"deeponto/complete/bertsubs/#deeponto.complete.bertsubs.pipeline_intra.BERTSubsIntraPipeline.get_test_neg_candidates_named_class","title":"get_test_neg_candidates_named_class(subclass, gt, max_neg_size, onto, max_depth=3, max_width=8)
staticmethod
","text":"Get a list of negative candidate named classes for testing.
Source code insrc/deeponto/complete/bertsubs/pipeline_intra.py
@staticmethod\ndef get_test_neg_candidates_named_class(subclass, gt, max_neg_size, onto, max_depth=3, max_width=8):\n\"\"\"Get a list of negative candidate named classes for testing.\"\"\"\n all_nebs, seeds = set(), [gt]\n depth = 1\n while depth <= max_depth:\n new_seeds = set()\n for seed in seeds:\n nebs = set()\n for nc_iri in onto.reasoner.get_inferred_sub_entities(\n seed, direct=True\n ) + onto.reasoner.get_inferred_super_entities(seed, direct=True):\n nc = onto.owl_classes[nc_iri]\n if onto.check_named_entity(owl_object=nc) and not onto.check_deprecated(owl_object=nc):\n nebs.add(nc)\n new_seeds = new_seeds.union(nebs)\n all_nebs = all_nebs.union(nebs)\n depth += 1\n seeds = random.sample(new_seeds, max_width) if len(new_seeds) > max_width else new_seeds\n all_nebs = (\n all_nebs\n - {onto.owl_classes[iri] for iri in onto.reasoner.get_inferred_super_entities(subclass, direct=False)}\n - {subclass}\n )\n if len(all_nebs) > max_neg_size:\n return random.sample(all_nebs, max_neg_size)\n else:\n return list(all_nebs)\n
"},{"location":"deeponto/complete/bertsubs/#deeponto.complete.bertsubs.text_semantics.SubsumptionSampler","title":"SubsumptionSampler(onto, config)
","text":"Class for sampling functions for training the subsumption prediction model.
Attributes:
Name Type Descriptiononto
Ontology
The target ontology.
config
CfgNode
The loaded configuration.
named_classes
Set[str]
IRIs of named classes that are not deprecated.
iri_label
Dict[str, List]
key -- class iris from named_classes
, value -- a list of labels.
restrictionObjects
Set[OWLClassExpression]
Basic existential restrictions that appear in the ontology.
restrictions
set[str]
Strings of basic existential restrictions corresponding to restrictionObjects
.
restriction_label
Dict[str, List]
key -- existential restriction string, value -- a list of existential restriction labels.
verb
OntologyVerbaliser
object for verbalisation.
Source code insrc/deeponto/complete/bertsubs/text_semantics.py
def __init__(self, onto: Ontology, config: CfgNode):\n self.onto = onto\n self.config = config\n self.named_classes = self.extract_named_classes(onto=onto)\n self.iri_label = dict()\n for iri in self.named_classes:\n self.iri_label[iri] = []\n for p in config.label_property:\n strings = onto.get_annotations(\n owl_object=onto.get_owl_object(iri),\n annotation_property_iri=p,\n annotation_language_tag=None,\n apply_lowercasing=False,\n normalise_identifiers=False,\n )\n for s in strings:\n if s not in self.iri_label[iri]:\n self.iri_label[iri].append(s)\n\n self.restrictionObjects = set()\n self.restrictions = set()\n self.restriction_label = dict()\n self.verb = OntologyVerbaliser(onto=onto)\n for complexC in onto.get_asserted_complex_classes():\n s = str(complexC)\n self.restriction_label[s] = []\n if self.is_basic_existential_restriction(complex_class_str=s):\n self.restrictionObjects.add(complexC)\n self.restrictions.add(s)\n self.restriction_label[s].append(self.verb.verbalise_class_expression(complexC).verbal)\n
"},{"location":"deeponto/complete/bertsubs/#deeponto.complete.bertsubs.text_semantics.SubsumptionSampler.is_basic_existential_restriction","title":"is_basic_existential_restriction(complex_class_str)
staticmethod
","text":"Determine if a complex class expression is a basic existential restriction.
Source code insrc/deeponto/complete/bertsubs/text_semantics.py
@staticmethod\ndef is_basic_existential_restriction(complex_class_str: str):\n\"\"\"Determine if a complex class expression is a basic existential restriction.\"\"\"\n IRI = \"<https?:\\\\/\\\\/(?:www\\\\.)?[-a-zA-Z0-9@:%._\\\\+~#=]{1,256}\\\\.[a-zA-Z0-9()]{1,6}\\\\b(?:[-a-zA-Z0-9()@:%_\\\\+.~#?&\\\\/=]*)>\"\n p = rf\"ObjectSomeValuesFrom\\({IRI}\\s{IRI}\\)\"\n if re.match(p, complex_class_str):\n return True\n else:\n return False\n
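For instance, the check accepts a single existential restriction between two named entities and rejects anything else (the IRIs below are placeholders):
from deeponto.complete.bertsubs.text_semantics import SubsumptionSampler\n\nSubsumptionSampler.is_basic_existential_restriction(\n    \"ObjectSomeValuesFrom(<http://example.org/hasPart> <http://example.org/Wheel>)\"\n)  # True\nSubsumptionSampler.is_basic_existential_restriction(\n    \"ObjectIntersectionOf(<http://example.org/Car> <http://example.org/Vehicle>)\"\n)  # False\n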
"},{"location":"deeponto/complete/bertsubs/#deeponto.complete.bertsubs.text_semantics.SubsumptionSampler.generate_samples","title":"generate_samples(subsumptions, duplicate=True)
","text":"Generate text samples from subsumptions.
Parameters:
Name Type Description Defaultsubsumptions
List[List]
A list of subsumptions, each of which is a two-component list (sub_class_iri, super_class_iri_or_str).
duplicate
bool
True
-- duplicate the positive and negative samples, False
-- do not duplicate.
True
Returns:
Type DescriptionList[List]
A list of samples, each element is a triple in the form of (sub_class_string, super_class_string, label_index)
.
src/deeponto/complete/bertsubs/text_semantics.py
def generate_samples(self, subsumptions: List[List], duplicate: bool = True):\nr\"\"\"Generate text samples from subsumptions.\n\n Args:\n subsumptions (List[List]): A list of subsumptions, each of which of is a two-component list `(sub_class_iri, super_class_iri_or_str)`.\n duplicate (bool): `True` -- duplicate the positive and negative samples, `False` -- do not duplicate.\n\n Returns:\n (List[List]): A list of samples, each element is a triple\n in the form of `(sub_class_string, super_class_string, label_index)`.\n \"\"\"\n if duplicate:\n pos_dup, neg_dup = self.config.fine_tune.train_pos_dup, self.config.fine_tune.train_neg_dup\n else:\n pos_dup, neg_dup = 1, 1\n neg_subsumptions = list()\n for subs in subsumptions:\n c1 = subs[0]\n for _ in range(neg_dup):\n neg_c = self.get_negative_sample(subclass_iri=c1, subsumption_type=self.config.subsumption_type)\n if neg_c is not None:\n neg_subsumptions.append([c1, neg_c])\n pos_samples = self.subsumptions_to_samples(subsumptions=subsumptions, sample_label=1)\n pos_samples = pos_dup * pos_samples\n neg_samples = self.subsumptions_to_samples(subsumptions=neg_subsumptions, sample_label=0)\n if len(neg_samples) < len(pos_samples):\n neg_samples = neg_samples + [\n random.choice(neg_samples) for _ in range(len(pos_samples) - len(neg_samples))\n ]\n if len(neg_samples) > len(pos_samples):\n pos_samples = pos_samples + [\n random.choice(pos_samples) for _ in range(len(neg_samples) - len(pos_samples))\n ]\n print(\"pos_samples: %d, neg_samples: %d\" % (len(pos_samples), len(neg_samples)))\n all_samples = [s for s in pos_samples + neg_samples if s[0] != \"\" and s[1] != \"\"]\n random.shuffle(all_samples)\n return all_samples\n
"},{"location":"deeponto/complete/bertsubs/#deeponto.complete.bertsubs.text_semantics.SubsumptionSampler.subsumptions_to_samples","title":"subsumptions_to_samples(subsumptions, sample_label)
","text":"Transform subsumptions into samples of strings.
Parameters:
Name Type Description Defaultsubsumptions
List[List]
The given subsumptions.
requiredsample_label
Union[int, None]
1
(positive), 0
(negative), None
(no label).
Returns:
Type DescriptionList[List]
A list of samples, each element is a triple in the form of (sub_class_string, super_class_string, label_index)
.
src/deeponto/complete/bertsubs/text_semantics.py
def subsumptions_to_samples(self, subsumptions: List[List], sample_label: Union[int, None]):\nr\"\"\"Transform subsumptions into samples of strings.\n\n Args:\n subsumptions (List[List]): The given subsumptions.\n sample_label (Union[int, None]): `1` (positive), `0` (negative), `None` (no label).\n\n Returns:\n (List[List]): A list of samples, each element is a triple\n in the form of `(sub_class_string, super_class_string, label_index)`.\n\n \"\"\"\n local_samples = list()\n for subs in subsumptions:\n subcls, supcls = subs[0], subs[1]\n substrs = self.iri_label[subcls] if subcls in self.iri_label and len(self.iri_label[subcls]) > 0 else [\"\"]\n\n if self.config.subsumption_type == \"named_class\":\n supstrs = self.iri_label[supcls] if supcls in self.iri_label and len(self.iri_label[supcls]) else [\"\"]\n else:\n if supcls in self.restriction_label and len(self.restriction_label[supcls]) > 0:\n supstrs = self.restriction_label[supcls]\n else:\n supstrs = [self.verb.verbalise_class_expression(supcls).verbal]\n\n if self.config.use_one_label:\n substrs, supstrs = substrs[0:1], supstrs[0:1]\n\n if self.config.prompt.prompt_type == \"isolated\":\n for substr in substrs:\n for supstr in supstrs:\n local_samples.append([substr, supstr])\n\n elif self.config.prompt.prompt_type == \"traversal\":\n subs_list_strs = set()\n for _ in range(self.config.prompt.context_dup):\n context_sub, no_duplicate = self.traversal_subsumptions(\n cls=subcls,\n hop=self.config.prompt.prompt_hop,\n direction=\"subclass\",\n max_subsumptions=self.config.prompt.prompt_max_subsumptions,\n )\n subs_list = [self.named_subsumption_to_str(subsum) for subsum in context_sub]\n subs_list_str = \" <SEP> \".join(subs_list)\n subs_list_strs.add(subs_list_str)\n if no_duplicate:\n break\n\n if self.config.subsumption_type == \"named_class\":\n sups_list_strs = set()\n for _ in range(self.config.prompt.context_dup):\n context_sup, no_duplicate = self.traversal_subsumptions(\n cls=supcls,\n hop=self.config.prompt.prompt_hop,\n direction=\"supclass\",\n max_subsumptions=self.config.prompt.prompt_max_subsumptions,\n )\n sups_list = [self.named_subsumption_to_str(subsum) for subsum in context_sup]\n sups_list_str = \" <SEP> \".join(sups_list)\n sups_list_strs.add(sups_list_str)\n if no_duplicate:\n break\n else:\n sups_list_strs = set(supstrs)\n\n for subs_list_str in subs_list_strs:\n for substr in substrs:\n s1 = substr + \" <SEP> \" + subs_list_str\n for sups_list_str in sups_list_strs:\n for supstr in supstrs:\n s2 = supstr + \" <SEP> \" + sups_list_str\n local_samples.append([s1, s2])\n\n elif self.config.prompt.prompt_type == \"path\":\n sep_token = \"<SUB>\" if self.config.prompt.use_sub_special_token else \"<SEP>\"\n\n s1_set = set()\n for _ in range(self.config.prompt.context_dup):\n context_sub, no_duplicate = self.path_subsumptions(\n cls=subcls, hop=self.config.prompt.prompt_hop, direction=\"subclass\"\n )\n if len(context_sub) > 0:\n s1 = \"\"\n for i in range(len(context_sub)):\n subsum = context_sub[len(context_sub) - i - 1]\n subc = subsum[0]\n s1 += \"%s %s \" % (\n self.iri_label[subc][0]\n if subc in self.iri_label and len(self.iri_label[subc]) > 0\n else \"\",\n sep_token,\n )\n for substr in substrs:\n s1_set.add(s1 + substr)\n else:\n for substr in substrs:\n s1_set.add(\"%s %s\" % (sep_token, substr))\n\n if no_duplicate:\n break\n\n if self.config.subsumption_type == \"named_class\":\n s2_set = set()\n for _ in range(self.config.prompt.context_dup):\n context_sup, no_duplicate = self.path_subsumptions(\n 
cls=supcls, hop=self.config.prompt.prompt_hop, direction=\"supclass\"\n )\n if len(context_sup) > 0:\n s2 = \"\"\n for subsum in context_sup:\n supc = subsum[1]\n s2 += \" %s %s\" % (\n sep_token,\n self.iri_label[supc][0]\n if supc in self.iri_label and len(self.iri_label[supc]) > 0\n else \"\",\n )\n for supstr in supstrs:\n s2_set.add(supstr + s2)\n else:\n for supstr in supstrs:\n s2_set.add(\"%s %s\" % (supstr, sep_token))\n\n if no_duplicate:\n break\n else:\n s2_set = set(supstrs)\n\n for s1 in s1_set:\n for s2 in s2_set:\n local_samples.append([s1, s2])\n\n else:\n print(f\"unknown context type {self.config.prompt.prompt_type}\")\n sys.exit(0)\n\n if sample_label is not None:\n for i in range(len(local_samples)):\n local_samples[i].append(sample_label)\n\n return local_samples\n
"},{"location":"deeponto/complete/bertsubs/#deeponto.complete.bertsubs.text_semantics.SubsumptionSampler.get_negative_sample","title":"get_negative_sample(subclass_iri, subsumption_type='named_class')
","text":"Given a named subclass, get a negative class for a negative subsumption.
Parameters:
Name Type Description Defaultsubclass_iri
str
IRI of a given sub-class.
requiredsubsumption_type
str
named_class
or restriction
.
'named_class'
Source code in src/deeponto/complete/bertsubs/text_semantics.py
def get_negative_sample(self, subclass_iri: str, subsumption_type: str = \"named_class\"):\nr\"\"\"Given a named subclass, get a negative class for a negative subsumption.\n\n Args:\n subclass_iri (str): IRI of a given sub-class.\n subsumption_type (str): `named_class` or `restriction`.\n \"\"\"\n subclass = self.onto.get_owl_object(iri=subclass_iri)\n if subsumption_type == \"named_class\":\n if self.config.no_reasoning:\n parents = self.onto.get_asserted_parents(owl_object=subclass, named_only=True)\n ancestors = set([str(item.getIRI()) for item in parents])\n else:\n ancestors = set(self.onto.reasoner.get_inferred_super_entities(subclass, direct=False))\n neg_c = random.sample(self.named_classes - ancestors, 1)[0]\n return neg_c\n else:\n for neg_c in random.sample(self.restrictionObjects, 5):\n if self.config.no_reasoning:\n return str(neg_c)\n else:\n if not self.onto.reasoner.check_subsumption(sub_entity=subclass, super_entity=neg_c):\n return str(neg_c)\n return None\n
"},{"location":"deeponto/complete/bertsubs/#deeponto.complete.bertsubs.text_semantics.SubsumptionSampler.named_subsumption_to_str","title":"named_subsumption_to_str(subsum)
","text":"Transform a named subsumption into string with <SUB>
and classes' labels.
Parameters:
Name Type Description Defaultsubsum
List[Tuple]
A subsumption pair in the form of (sub_class_iri, super_class_iri)
.
src/deeponto/complete/bertsubs/text_semantics.py
def named_subsumption_to_str(self, subsum: List):\nr\"\"\"Transform a named subsumption into string with `<SUB>` and classes' labels.\n\n Args:\n subsum (List[Tuple]): A list of subsumption pairs in the form of `(sub_class_iri, super_class_iri)`.\n \"\"\"\n subc, supc = subsum[0], subsum[1]\n subs = self.iri_label[subc][0] if subc in self.iri_label and len(self.iri_label[subc]) > 0 else \"\"\n sups = self.iri_label[supc][0] if supc in self.iri_label and len(self.iri_label[supc]) > 0 else \"\"\n return \"%s <SUB> %s\" % (subs, sups)\n
"},{"location":"deeponto/complete/bertsubs/#deeponto.complete.bertsubs.text_semantics.SubsumptionSampler.subclass_to_strings","title":"subclass_to_strings(subcls)
","text":"Transform a sub-class into strings (with the path or traversal context template).
Parameters:
Name Type Description Defaultsubcls
str
IRI of the sub-class.
required Source code insrc/deeponto/complete/bertsubs/text_semantics.py
def subclass_to_strings(self, subcls):\nr\"\"\"Transform a sub-class into strings (with the path or traversal context template).\n\n Args:\n subcls (str): IRI of the sub-class.\n \"\"\"\n substrs = self.iri_label[subcls] if subcls in self.iri_label and len(self.iri_label[subcls]) > 0 else [\"\"]\n\n if self.config.use_one_label:\n substrs = substrs[0:1]\n\n if self.config.prompt.prompt_type == \"isolated\":\n return substrs\n\n elif self.config.prompt.prompt_type == \"traversal\":\n subs_list_strs = set()\n for _ in range(self.config.prompt.context_dup):\n context_sub, no_duplicate = self.traversal_subsumptions(\n cls=subcls,\n hop=self.config.prompt.prompt_hop,\n direction=\"subclass\",\n max_subsumptions=self.config.prompt.prompt_max_subsumptions,\n )\n subs_list = [self.named_subsumption_to_str(subsum) for subsum in context_sub]\n subs_list_str = \" <SEP> \".join(subs_list)\n subs_list_strs.add(subs_list_str)\n if no_duplicate:\n break\n\n strs = list()\n for subs_list_str in subs_list_strs:\n for substr in substrs:\n s1 = substr + \" <SEP> \" + subs_list_str\n strs.append(s1)\n return strs\n\n elif self.config.prompt.prompt_type == \"path\":\n sep_token = \"<SUB>\" if self.config.prompt.use_sub_special_token else \"<SEP>\"\n\n s1_set = set()\n for _ in range(self.config.prompt.context_dup):\n context_sub, no_duplicate = self.path_subsumptions(\n cls=subcls, hop=self.config.prompt.prompt_hop, direction=\"subclass\"\n )\n if len(context_sub) > 0:\n s1 = \"\"\n for i in range(len(context_sub)):\n subsum = context_sub[len(context_sub) - i - 1]\n subc = subsum[0]\n s1 += \"%s %s \" % (\n self.iri_label[subc][0]\n if subc in self.iri_label and len(self.iri_label[subc]) > 0\n else \"\",\n sep_token,\n )\n for substr in substrs:\n s1_set.add(s1 + substr)\n else:\n for substr in substrs:\n s1_set.add(\"%s %s\" % (sep_token, substr))\n if no_duplicate:\n break\n\n return list(s1_set)\n
"},{"location":"deeponto/complete/bertsubs/#deeponto.complete.bertsubs.text_semantics.SubsumptionSampler.supclass_to_strings","title":"supclass_to_strings(supcls, subsumption_type='named_class')
","text":"Transform a super-class into strings (with the path or traversal context template if the subsumption type is \"named_class\"
).
Parameters:
Name Type Description Defaultsupcls
str
IRI of the super-class.
requiredsubsumption_type
str
The type of the subsumption.
'named_class'
Source code in src/deeponto/complete/bertsubs/text_semantics.py
def supclass_to_strings(self, supcls: str, subsumption_type: str = \"named_class\"):\nr\"\"\"Transform a super-class into strings (with the path or traversal context template if the subsumption type is `\"named_class\"`).\n\n Args:\n supcls (str): IRI of the super-class.\n subsumption_type (str): The type of the subsumption.\n \"\"\"\n\n if subsumption_type == \"named_class\":\n supstrs = self.iri_label[supcls] if supcls in self.iri_label and len(self.iri_label[supcls]) else [\"\"]\n else:\n if supcls in self.restriction_label and len(self.restriction_label[supcls]) > 0:\n supstrs = self.restriction_label[supcls]\n else:\n warnings.warn(\"Warning: %s has no descriptions\" % supcls)\n supstrs = [\"\"]\n\n if self.config.use_one_label:\n if subsumption_type == \"named_class\":\n supstrs = supstrs[0:1]\n\n if self.config.prompt.prompt_type == \"isolated\":\n return supstrs\n\n elif self.config.prompt.prompt_type == \"traversal\":\n if subsumption_type == \"named_class\":\n sups_list_strs = set()\n for _ in range(self.config.prompt.context_dup):\n context_sup, no_duplicate = self.traversal_subsumptions(\n cls=supcls,\n hop=self.config.prompt.prompt_hop,\n direction=\"supclass\",\n max_subsumptions=self.config.prompt.prompt_max_subsumptions,\n )\n sups_list = [self.named_subsumption_to_str(subsum) for subsum in context_sup]\n sups_list_str = \" <SEP> \".join(sups_list)\n sups_list_strs.add(sups_list_str)\n if no_duplicate:\n break\n\n else:\n sups_list_strs = set(supstrs)\n\n strs = list()\n for sups_list_str in sups_list_strs:\n for supstr in supstrs:\n s2 = supstr + \" <SEP> \" + sups_list_str\n strs.append(s2)\n return strs\n\n elif self.config.prompt.prompt_type == \"path\":\n sep_token = \"<SUB>\" if self.config.prompt.use_sub_special_token else \"<SEP>\"\n\n if subsumption_type == \"named_class\":\n s2_set = set()\n for _ in range(self.config.prompt.context_dup):\n context_sup, no_duplicate = self.path_subsumptions(\n cls=supcls, hop=self.config.prompt.prompt_hop, direction=\"supclass\"\n )\n if len(context_sup) > 0:\n s2 = \"\"\n for subsum in context_sup:\n supc = subsum[1]\n s2 += \" %s %s\" % (\n sep_token,\n self.iri_label[supc][0]\n if supc in self.iri_label and len(self.iri_label[supc]) > 0\n else \"\",\n )\n for supstr in supstrs:\n s2_set.add(supstr + s2)\n else:\n for supstr in supstrs:\n s2_set.add(\"%s %s\" % (supstr, sep_token))\n\n if no_duplicate:\n break\n else:\n s2_set = set(supstrs)\n\n return list(s2_set)\n\n else:\n print(\"unknown context type %s\" % self.config.prompt.prompt_type)\n sys.exit(0)\n
"},{"location":"deeponto/complete/bertsubs/#deeponto.complete.bertsubs.text_semantics.SubsumptionSampler.traversal_subsumptions","title":"traversal_subsumptions(cls, hop=1, direction='subclass', max_subsumptions=5)
","text":"Given a class, get its subsumptions by traversing the class hierarchy.
If the class is a sub-class in the subsumption axiom, subsumptions are collected downwards; if it is a super-class, subsumptions are collected upwards.
Parameters:
Name Type Description Defaultcls
str
IRI of a named class.
requiredhop
int
The depth of the path.
1
direction
str
subclass
(downside path) or supclass
(upside path).
'subclass'
max_subsumptions
int
The maximum number of subsumptions to consider.
5
Source code in src/deeponto/complete/bertsubs/text_semantics.py
def traversal_subsumptions(self, cls: str, hop: int = 1, direction: str = \"subclass\", max_subsumptions: int = 5):\nr\"\"\"Given a class, get its subsumptions by traversing the class hierarchy.\n\n If the class is a sub-class in the subsumption axiom, get subsumptions from downside.\n If the class is a super-class in the subsumption axiom, get subsumptions from upside.\n\n Args:\n cls (str): IRI of a named class.\n hop (int): The depth of the path.\n direction (str): `subclass` (downside path) or `supclass` (upside path).\n max_subsumptions (int): The maximum number of subsumptions to consider.\n \"\"\"\n subsumptions = list()\n seeds = [cls]\n d = 1\n no_duplicate = True\n while d <= hop:\n new_seeds = list()\n for s in seeds:\n if direction == \"subclass\":\n tmp = self.onto.reasoner.get_inferred_sub_entities(\n self.onto.get_owl_object(iri=s), direct=True\n )\n if len(tmp) > 1:\n no_duplicate = False\n random.shuffle(tmp)\n for c in tmp:\n if not self.onto.check_deprecated(owl_object=self.onto.get_owl_object(iri=c)):\n subsumptions.append([c, s])\n if c not in new_seeds:\n new_seeds.append(c)\n elif direction == \"supclass\":\n tmp = self.onto.reasoner.get_inferred_super_entities(\n self.onto.get_owl_object(iri=s), direct=True\n )\n if len(tmp) > 1:\n no_duplicate = False\n random.shuffle(tmp)\n for c in tmp:\n if not self.onto.check_deprecated(owl_object=self.onto.get_owl_object(iri=c)):\n subsumptions.append([s, c])\n if c not in new_seeds:\n new_seeds.append(c)\n else:\n warnings.warn(\"Unknown direction: %s\" % direction)\n if len(subsumptions) >= max_subsumptions:\n subsumptions = random.sample(subsumptions, max_subsumptions)\n break\n else:\n seeds = new_seeds\n random.shuffle(seeds)\n d += 1\n return subsumptions, no_duplicate\n
"},{"location":"deeponto/complete/bertsubs/#deeponto.complete.bertsubs.text_semantics.SubsumptionSampler.path_subsumptions","title":"path_subsumptions(cls, hop=1, direction='subclass')
","text":"Given a class, get its path subsumptions.
If the class is a sub-class in the subsumption axiom, subsumptions are collected downwards; if it is a super-class, subsumptions are collected upwards.
Parameters:
Name Type Description Defaultcls
str
IRI of a named class.
requiredhop
int
The depth of the path.
1
direction
str
subclass
(downside path) or supclass
(upside path).
'subclass'
Source code in src/deeponto/complete/bertsubs/text_semantics.py
def path_subsumptions(self, cls: str, hop: int = 1, direction: str = \"subclass\"):\nr\"\"\"Given a class, get its path subsumptions.\n\n If the class is a sub-class in the subsumption axiom, get subsumptions from downside.\n If the class is a super-class in the subsumption axiom, get subsumptions from upside.\n\n Args:\n cls (str): IRI of a named class.\n hop (int): The depth of the path.\n direction (str): `subclass` (downside path) or `supclass` (upside path).\n \"\"\"\n subsumptions = list()\n seed = cls\n d = 1\n no_duplicate = True\n while d <= hop:\n if direction == \"subclass\":\n tmp = self.onto.reasoner.get_inferred_sub_entities(\n self.onto.get_owl_object(iri=seed), direct=True\n )\n if len(tmp) > 1:\n no_duplicate = False\n end = True\n if len(tmp) > 0:\n random.shuffle(tmp)\n for c in tmp:\n if not self.onto.check_deprecated(owl_object=self.onto.get_owl_object(iri=c)):\n subsumptions.append([c, seed])\n seed = c\n end = False\n break\n if end:\n break\n elif direction == \"supclass\":\n tmp = self.onto.reasoner.get_inferred_super_entities(\n self.onto.get_owl_object(iri=seed), direct=True\n )\n if len(tmp) > 1:\n no_duplicate = False\n end = True\n if len(tmp) > 0:\n random.shuffle(tmp)\n for c in tmp:\n if not self.onto.check_deprecated(owl_object=self.onto.get_owl_object(iri=c)):\n subsumptions.append([seed, c])\n seed = c\n end = False\n break\n if end:\n break\n else:\n warnings.warn(\"Unknown direction: %s\" % direction)\n\n d += 1\n return subsumptions, no_duplicate\n
"},{"location":"deeponto/complete/bertsubs/#deeponto.complete.bertsubs.bert_classifier.BERTSubsumptionClassifierTrainer","title":"BERTSubsumptionClassifierTrainer(bert_checkpoint, train_data, val_data, max_length=128, early_stop=False, early_stop_patience=10)
","text":"Source code in src/deeponto/complete/bertsubs/bert_classifier.py
def __init__(\n self,\n bert_checkpoint: str,\n train_data: List,\n val_data: List,\n max_length: int = 128,\n early_stop: bool = False,\n early_stop_patience: int = 10,\n):\n print(f\"initialize BERT for Binary Classification from the Pretrained BERT model at: {bert_checkpoint} ...\")\n\n # BERT\n self.model = AutoModelForSequenceClassification.from_pretrained(bert_checkpoint)\n self.tokenizer = AutoTokenizer.from_pretrained(bert_checkpoint)\n self.trainer = None\n\n self.max_length = max_length\n self.tra = self.load_dataset(train_data, max_length=self.max_length, count_token_size=True)\n self.val = self.load_dataset(val_data, max_length=self.max_length, count_token_size=True)\n print(f\"text max length: {self.max_length}\")\n print(f\"data files loaded with sizes:\")\n print(f\"\\t[# Train]: {len(self.tra)}, [# Val]: {len(self.val)}\")\n\n # early stopping\n self.early_stop = early_stop\n self.early_stop_patience = early_stop_patience\n
"},{"location":"deeponto/complete/bertsubs/#deeponto.complete.bertsubs.bert_classifier.BERTSubsumptionClassifierTrainer.add_special_tokens","title":"add_special_tokens(tokens)
","text":"Add additional special tokens into the tokenizer's vocab.
Parameters:
Name Type Description Defaulttokens
List[str]
additional tokens to add, e.g., [\"<SUB>\",\"<EOA>\",\"<EOC>\"]
src/deeponto/complete/bertsubs/bert_classifier.py
def add_special_tokens(self, tokens: List):\nr\"\"\"Add additional special tokens into the tokenizer's vocab.\n Args:\n tokens (List[str]): additional tokens to add, e.g., `[\"<SUB>\",\"<EOA>\",\"<EOC>\"]`\n \"\"\"\n special_tokens_dict = {\"additional_special_tokens\": tokens}\n self.tokenizer.add_special_tokens(special_tokens_dict)\n self.model.resize_token_embeddings(len(self.tokenizer))\n
"},{"location":"deeponto/complete/bertsubs/#deeponto.complete.bertsubs.bert_classifier.BERTSubsumptionClassifierTrainer.train","title":"train(train_args, do_fine_tune=True)
","text":"Initiate the Huggingface trainer with input arguments and start training.
Parameters:
Name Type Description Defaulttrain_args
TrainingArguments
Arguments for training.
requireddo_fine_tune
bool
False
means loading the checkpoint without training. Defaults to True
.
True
Source code in src/deeponto/complete/bertsubs/bert_classifier.py
def train(self, train_args: TrainingArguments, do_fine_tune: bool = True):\nr\"\"\"Initiate the Huggingface trainer with input arguments and start training.\n Args:\n train_args (TrainingArguments): Arguments for training.\n do_fine_tune (bool): `False` means loading the checkpoint without training. Defaults to `True`.\n \"\"\"\n self.trainer = Trainer(\n model=self.model,\n args=train_args,\n train_dataset=self.tra,\n eval_dataset=self.val,\n compute_metrics=self.compute_metrics,\n tokenizer=self.tokenizer,\n )\n if self.early_stop:\n self.trainer.add_callback(EarlyStoppingCallback(early_stopping_patience=self.early_stop_patience))\n if do_fine_tune:\n self.trainer.train()\n
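A hedged end-to-end sketch; the checkpoint name, toy samples, and output directory are illustrative assumptions rather than part of the library:
from transformers import TrainingArguments\nfrom deeponto.complete.bertsubs.bert_classifier import BERTSubsumptionClassifierTrainer\n\n# toy (sentence_1, sentence_2, label) samples; in practice they come from the text templates above\ntrain_data = [(\"lymphocyte <SUB> cell\", \"leukocyte\", 1), (\"lymphocyte <SUB> cell\", \"organ\", 0)]\nval_data = [(\"neuron <SUB> cell\", \"cell\", 1)]\ntrainer = BERTSubsumptionClassifierTrainer(\"bert-base-uncased\", train_data, val_data, max_length=128)\ntrainer.add_special_tokens([\"<SUB>\"])  # register the template token so the tokenizer does not split it\nargs = TrainingArguments(output_dir=\"./bertsubs_out\", num_train_epochs=1, per_device_train_batch_size=2)\ntrainer.train(train_args=args, do_fine_tune=True)\n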
"},{"location":"deeponto/complete/bertsubs/#deeponto.complete.bertsubs.bert_classifier.BERTSubsumptionClassifierTrainer.compute_metrics","title":"compute_metrics(pred)
staticmethod
","text":"Auxiliary function to add accurate metric into evaluation.
Source code insrc/deeponto/complete/bertsubs/bert_classifier.py
@staticmethod\ndef compute_metrics(pred):\n\"\"\"Auxiliary function to add accurate metric into evaluation.\n \"\"\"\n labels = pred.label_ids\n preds = pred.predictions.argmax(-1)\n acc = accuracy_score(labels, preds)\n return {\"accuracy\": acc}\n
"},{"location":"deeponto/complete/bertsubs/#deeponto.complete.bertsubs.bert_classifier.BERTSubsumptionClassifierTrainer.load_dataset","title":"load_dataset(data, max_length=512, count_token_size=False)
","text":"Load a Huggingface dataset from a list of samples.
Parameters:
Name Type Description Defaultdata
List[Tuple]
Data samples in a list.
requiredmax_length
int
Maximum length of the input sequence.
512
count_token_size
bool
Whether or not to count the token sizes of the data. Defaults to False
.
False
Source code in src/deeponto/complete/bertsubs/bert_classifier.py
def load_dataset(self, data: List, max_length: int = 512, count_token_size: bool = False) -> Dataset:\nr\"\"\"Load a Huggingface dataset from a list of samples.\n Args:\n data (List[Tuple]): Data samples in a list.\n max_length (int): Maximum length of the input sequence.\n count_token_size (bool): Whether or not to count the token sizes of the data. Defaults to `False`.\n \"\"\"\n # data_df = pd.DataFrame(data, columns=[\"sent1\", \"sent2\", \"labels\"])\n # dataset = Dataset.from_pandas(data_df)\n\n def iterate():\n for sample in data:\n yield {\"sent1\": sample[0], \"sent2\": sample[1], \"labels\": sample[2]}\n\n dataset = Dataset.from_generator(iterate)\n\n if count_token_size:\n tokens = self.tokenizer(dataset[\"sent1\"], dataset[\"sent2\"])\n l_sum, num_128, num_256, num_512, l_max = 0, 0, 0, 0, 0\n for item in tokens[\"input_ids\"]:\n l = len(item)\n l_sum += l\n if l <= 128:\n num_128 += 1\n if l <= 256:\n num_256 += 1\n if l <= 512:\n num_512 += 1\n if l > l_max:\n l_max = l\n print(\"average token size: %.2f\" % (l_sum / len(tokens[\"input_ids\"])))\n print(\"ratio of token size <= 128: %.3f\" % (num_128 / len(tokens[\"input_ids\"])))\n print(\"ratio of token size <= 256: %.3f\" % (num_256 / len(tokens[\"input_ids\"])))\n print(\"ratio of token size <= 512: %.3f\" % (num_512 / len(tokens[\"input_ids\"])))\n print(\"max token size: %d\" % l_max)\n dataset = dataset.map(\n lambda examples: self.tokenizer(\n examples[\"sent1\"], examples[\"sent2\"], max_length=max_length, truncation=True\n ),\n batched=True,\n num_proc=1,\n )\n return dataset\n
"},{"location":"deeponto/onto/normalisation/","title":"Ontology Pruning","text":""},{"location":"deeponto/onto/normalisation/#deeponto.onto.normalisation.OntologyNormaliser","title":"OntologyNormaliser()
","text":"Class for ontology normalisation.
Credit
The code of this class originates from the mOWL library, which utilises the normalisation functionality from the Java library Jcel
.
The normalisation process transforms ontology axioms into normal forms in the Description Logic \(\mathcal{EL}\), including \(C \sqsubseteq D\), \(C \sqcap C' \sqsubseteq D\), \(C \sqsubseteq \exists r.D\), and \(\exists r.C \sqsubseteq D\),
where \\(C\\) and \\(C'\\) can be named concepts or \\(\\top\\), \\(D\\) is a named concept or \\(\\bot\\), \\(r\\) is a role (property).
Attributes:
Name Type Descriptiononto
Ontology
The input ontology to be normalised.
temp_super_class_index
Dict[OWLClassExpression, OWLClass]
A dictionary in the form of {complex_sub_class: temp_super_class}
, which means temp_super_class
is created during the normalisation of a complex subsumption axiom that has complex_sub_class
as the sub-class.
src/deeponto/onto/normalisation.py
def __init__(self):\n return\n
"},{"location":"deeponto/onto/normalisation/#deeponto.onto.normalisation.OntologyNormaliser.normalise","title":"normalise(ontology)
","text":"Performs the \\(\\mathcal{EL}\\) normalisation.
Parameters:
Name Type Description Defaultontology
Ontology
An ontology to be normalised.
requiredReturns:
Type Descriptionlist[OWLAxiom]
A list of normalised TBox axioms.
Source code insrc/deeponto/onto/normalisation.py
def normalise(self, ontology: Ontology):\nr\"\"\"Performs the $\\mathcal{EL}$ normalisation.\n\n Args:\n ontology (Ontology): An ontology to be normalised.\n\n Returns:\n (list[OWLAxiom]): A list of normalised TBox axioms.\n \"\"\"\n\n processed_owl_onto = self.preprocess_ontology(ontology)\n root_ont = processed_owl_onto\n translator = Translator(\n processed_owl_onto.getOWLOntologyManager().getOWLDataFactory(), IntegerOntologyObjectFactoryImpl()\n )\n axioms = HashSet()\n axioms.addAll(root_ont.getAxioms())\n translator.getTranslationRepository().addAxiomEntities(root_ont)\n\n for ont in root_ont.getImportsClosure():\n axioms.addAll(ont.getAxioms())\n translator.getTranslationRepository().addAxiomEntities(ont)\n\n intAxioms = translator.translateSA(axioms)\n\n normaliser = OntologyNormalizer()\n\n factory = IntegerOntologyObjectFactoryImpl()\n normalised_ontology = normaliser.normalize(intAxioms, factory)\n self.rTranslator = ReverseAxiomTranslator(translator, processed_owl_onto)\n\n normalised_axioms = []\n # revert the jcel axioms to the original OWLAxioms\n for ax in normalised_ontology:\n try:\n axiom = self.rTranslator.visit(ax)\n normalised_axioms.append(axiom)\n except Exception as e:\n logging.info(\"Reverse translation. Ignoring axiom: %s\", ax)\n logging.info(e)\n\n return list(set(axioms))\n
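A minimal usage sketch (the ontology file path is hypothetical):
from deeponto.onto import Ontology\nfrom deeponto.onto.normalisation import OntologyNormaliser\n\nonto = Ontology(\"example.owl\")  # hypothetical path\nnormaliser = OntologyNormaliser()\nnormalised_axioms = normaliser.normalise(onto)\n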
"},{"location":"deeponto/onto/normalisation/#deeponto.onto.normalisation.OntologyNormaliser.preprocess_ontology","title":"preprocess_ontology(ontology)
","text":"Preprocess the ontology to remove axioms that are not supported by the normalisation process.
Source code insrc/deeponto/onto/normalisation.py
def preprocess_ontology(self, ontology: Ontology):\n\"\"\"Preprocess the ontology to remove axioms that are not supported by the normalisation process.\"\"\"\n\n tbox_axioms = ontology.owl_onto.getTBoxAxioms(Imports.fromBoolean(True))\n new_tbox_axioms = HashSet()\n\n for axiom in tbox_axioms:\n axiom_as_str = axiom.toString()\n\n if \"UnionOf\" in axiom_as_str:\n continue\n elif \"MinCardinality\" in axiom_as_str:\n continue\n elif \"ComplementOf\" in axiom_as_str:\n continue\n elif \"AllValuesFrom\" in axiom_as_str:\n continue\n elif \"MaxCardinality\" in axiom_as_str:\n continue\n elif \"ExactCardinality\" in axiom_as_str:\n continue\n elif \"Annotation\" in axiom_as_str:\n continue\n elif \"ObjectHasSelf\" in axiom_as_str:\n continue\n elif \"urn:swrl\" in axiom_as_str:\n continue\n elif \"EquivalentObjectProperties\" in axiom_as_str:\n continue\n elif \"SymmetricObjectProperty\" in axiom_as_str:\n continue\n elif \"AsymmetricObjectProperty\" in axiom_as_str:\n continue\n elif \"ObjectOneOf\" in axiom_as_str:\n continue\n else:\n new_tbox_axioms.add(axiom)\n\n processed_owl_onto = ontology.owl_manager.createOntology(new_tbox_axioms)\n # NOTE: the returned object is `owlapi.OWLOntology` not `deeponto.onto.Ontology`\n return processed_owl_onto\n
"},{"location":"deeponto/onto/ontology/","title":"Ontology","text":"Python classes in this page are strongly dependent on the OWLAPI library. The base class Ontology
extends several features including convenient access to specially defined entities (e.g., owl:Thing
and owl:Nothing
), indexing of entities in the signature with their IRIs as keys, and some other customised functions for specific ontology engineering purposes. Ontology
also has an OntologyReasoner
attribute which provides reasoning facilities such as classifying entities, checking entailment, and so on. Users who are familiar with the OWLAPI should find it relatively easy to extend the Python classes here.
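For example, loading a local ontology file is a one-liner (the file path below is hypothetical):
from deeponto.onto import Ontology\n\nonto = Ontology(\"example.owl\", reasoner_type=\"hermit\")  # hypothetical path\nprint(onto.info)  # summary of the loaded classes, properties, and individuals\n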
Ontology(owl_path, reasoner_type='hermit')
","text":"Ontology class that extends from the Java library OWLAPI.
Typing from OWLAPI
Types with OWL
prefix are mostly imported from the OWLAPI library by, for example, from org.semanticweb.owlapi.model import OWLObject
.
Attributes:
Name Type Descriptionowl_path
str
The path to the OWL ontology file.
owl_manager
OWLOntologyManager
An ontology manager for creating OWLOntology
.
owl_onto
OWLOntology
An OWLOntology
created by owl_manager
from owl_path
.
owl_iri
str
The IRI of the owl_onto
.
owl_classes
dict[str, OWLClass]
A dictionary that stores the (iri, ontology_class)
pairs.
owl_object_properties
dict[str, OWLObjectProperty]
A dictionary that stores the (iri, ontology_object_property)
pairs.
owl_data_properties
dict[str, OWLDataProperty]
A dictionary that stores the (iri, ontology_data_property)
pairs.
owl_annotation_properties
dict[str, OWLAnnotationProperty]
A dictionary that stores the (iri, ontology_annotation_property)
pairs.
owl_individuals
dict[str, OWLIndividual]
A dictionary that stores the (iri, ontology_individual)
pairs.
owl_data_factory
OWLDataFactory
A data factory for manipulating axioms.
reasoner_type
str
The type of reasoner used. Defaults to \"hermit\"
. Options are [\"hermit\", \"elk\", \"struct\"]
.
reasoner
OntologyReasoner
A reasoner for ontology inference.
Parameters:
Name Type Description Defaultowl_path
str
The path to the OWL ontology file.
requiredreasoner_type
str
The type of reasoner used. Defaults to \"hermit\"
. Options are [\"hermit\", \"elk\", \"struct\"]
.
'hermit'
Source code in src/deeponto/onto/ontology.py
def __init__(self, owl_path: str, reasoner_type: str = \"hermit\"):\n\"\"\"Initialise a new ontology.\n\n Args:\n owl_path (str): The path to the OWL ontology file.\n reasoner_type (str): The type of reasoner used. Defaults to `\"hermit\"`. Options are `[\"hermit\", \"elk\", \"struct\"]`.\n \"\"\"\n self.owl_path = os.path.abspath(owl_path)\n self.owl_manager = OWLManager.createOWLOntologyManager()\n self.owl_onto = self.owl_manager.loadOntologyFromOntologyDocument(IRI.create(File(self.owl_path)))\n self.owl_iri = str(self.owl_onto.getOntologyID().getOntologyIRI().get())\n self.owl_classes = self._get_owl_objects(\"Classes\")\n self.owl_object_properties = self._get_owl_objects(\"ObjectProperties\")\n # for some reason the top object property is included\n if OWL_TOP_OBJECT_PROPERTY in self.owl_object_properties.keys():\n del self.owl_object_properties[OWL_TOP_OBJECT_PROPERTY]\n self.owl_data_properties = self._get_owl_objects(\"DataProperties\")\n self.owl_data_factory = self.owl_manager.getOWLDataFactory()\n self.owl_annotation_properties = self._get_owl_objects(\"AnnotationProperties\")\n self.owl_individuals = self._get_owl_objects(\"Individuals\")\n\n # reasoning\n self.reasoner_type = reasoner_type\n self.reasoner = OntologyReasoner(self, self.reasoner_type)\n\n # hidden attributes\n self._multi_children_classes = None\n self._sibling_class_groups = None\n self._axiom_type = AxiomType # for development use\n\n # summary\n self.info = {\n type(self).__name__: {\n \"loaded_from\": os.path.basename(self.owl_path),\n \"num_classes\": len(self.owl_classes),\n \"num_object_properties\": len(self.owl_object_properties),\n \"num_data_properties\": len(self.owl_data_properties),\n \"num_annotation_properties\": len(self.owl_annotation_properties),\n \"num_individuals\": len(self.owl_individuals),\n \"reasoner_type\": self.reasoner_type,\n }\n }\n
"},{"location":"deeponto/onto/ontology/#deeponto.onto.ontology.Ontology.name","title":"name
property
","text":"Return the name of the ontology file.
"},{"location":"deeponto/onto/ontology/#deeponto.onto.ontology.Ontology.OWLThing","title":"OWLThing
property
","text":"Return OWLThing
.
OWLNothing
property
","text":"Return OWLNoThing
.
OWLTopObjectProperty
property
","text":"Return OWLTopObjectProperty
.
OWLBottomObjectProperty
property
","text":"Return OWLBottomObjectProperty
.
OWLTopDataProperty
property
","text":"Return OWLTopDataProperty
.
OWLBottomDataProperty
property
","text":"Return OWLBottomDataProperty
.
sibling_class_groups: List[List[str]]
property
","text":"Return grouped sibling classes (with a common direct parent);
NOTE that only groups with size > 1 will be considered
"},{"location":"deeponto/onto/ontology/#deeponto.onto.ontology.Ontology.get_entity_type","title":"get_entity_type(entity, is_singular=False)
staticmethod
","text":"A handy method to get the type
of an OWLObject
entity.
src/deeponto/onto/ontology.py
@staticmethod\ndef get_entity_type(entity: OWLObject, is_singular: bool = False):\n\"\"\"A handy method to get the `type` of an `OWLObject` entity.\"\"\"\n if isinstance(entity, OWLClassExpression):\n return \"Classes\" if not is_singular else \"Class\"\n elif isinstance(entity, OWLObjectPropertyExpression):\n return \"ObjectProperties\" if not is_singular else \"ObjectProperty\"\n elif isinstance(entity, OWLDataPropertyExpression):\n return \"DataProperties\" if not is_singular else \"DataProperty\"\n elif isinstance(entity, OWLIndividual):\n return \"Individuals\" if not is_singular else \"Individual\"\n else:\n # NOTE: add further options in future\n pass\n
"},{"location":"deeponto/onto/ontology/#deeponto.onto.ontology.Ontology.get_max_jvm_memory","title":"get_max_jvm_memory()
staticmethod
","text":"Get the maximum heap size assigned to the JVM.
Source code insrc/deeponto/onto/ontology.py
@staticmethod\ndef get_max_jvm_memory():\n\"\"\"Get the maximum heap size assigned to the JVM.\"\"\"\n if jpype.isJVMStarted():\n return int(Runtime.getRuntime().maxMemory())\n else:\n raise RuntimeError(\"Cannot retrieve JVM memory as it is not started.\")\n
"},{"location":"deeponto/onto/ontology/#deeponto.onto.ontology.Ontology.get_owl_object","title":"get_owl_object(iri)
","text":"Get an OWLObject
given its IRI.
src/deeponto/onto/ontology.py
def get_owl_object(self, iri: str):\n\"\"\"Get an `OWLObject` given its IRI.\"\"\"\n if iri in self.owl_classes.keys():\n return self.owl_classes[iri]\n elif iri in self.owl_object_properties.keys():\n return self.owl_object_properties[iri]\n elif iri in self.owl_data_properties.keys():\n return self.owl_data_properties[iri]\n elif iri in self.owl_annotation_properties.keys():\n return self.owl_annotation_properties[iri]\n elif iri in self.owl_individuals.keys():\n return self.owl_individuals[iri]\n else:\n raise KeyError(f\"Cannot retrieve unknown IRI: {iri}.\")\n
"},{"location":"deeponto/onto/ontology/#deeponto.onto.ontology.Ontology.get_iri","title":"get_iri(owl_object)
","text":"Get the IRI of an OWLObject
. Raises an exception if there is no associated IRI.
src/deeponto/onto/ontology.py
def get_iri(self, owl_object: OWLObject):\n\"\"\"Get the IRI of an `OWLObject`. Raises an exception if there is no associated IRI.\"\"\"\n try:\n return str(owl_object.getIRI())\n except:\n raise RuntimeError(\"Input owl object does not have IRI.\")\n
"},{"location":"deeponto/onto/ontology/#deeponto.onto.ontology.Ontology.get_axiom_type","title":"get_axiom_type(axiom)
staticmethod
","text":"Get the axiom type (in str
) for the given axiom.
Check full list at: http://owlcs.github.io/owlapi/apidocs_5/org/semanticweb/owlapi/model/AxiomType.html.
Source code insrc/deeponto/onto/ontology.py
@staticmethod\ndef get_axiom_type(axiom: OWLAxiom):\nr\"\"\"Get the axiom type (in `str`) for the given axiom.\n\n Check full list at: <http://owlcs.github.io/owlapi/apidocs_5/org/semanticweb/owlapi/model/AxiomType.html>.\n \"\"\"\n return str(axiom.getAxiomType())\n
"},{"location":"deeponto/onto/ontology/#deeponto.onto.ontology.Ontology.get_all_axioms","title":"get_all_axioms()
","text":"Return all axioms (in a list) asserted in the ontology.
Source code insrc/deeponto/onto/ontology.py
def get_all_axioms(self):\n\"\"\"Return all axioms (in a list) asserted in the ontology.\"\"\"\n return list(self.owl_onto.getAxioms())\n
"},{"location":"deeponto/onto/ontology/#deeponto.onto.ontology.Ontology.get_subsumption_axioms","title":"get_subsumption_axioms(entity_type='Classes')
","text":"Return subsumption axioms (subject to input entity type) asserted in the ontology.
Parameters:
Name Type Description Defaultentity_type
str
The entity type to be considered. Defaults to \"Classes\"
. Options are \"Classes\"
, \"ObjectProperties\"
, \"DataProperties\"
, and \"AnnotationProperties\"
.
'Classes'
Returns:
Type DescriptionList[OWLAxiom]
A list of subsumption axioms subject to the input entity type.
Source code insrc/deeponto/onto/ontology.py
def get_subsumption_axioms(self, entity_type: str = \"Classes\"):\n\"\"\"Return subsumption axioms (subject to input entity type) asserted in the ontology.\n\n Args:\n entity_type (str, optional): The entity type to be considered. Defaults to `\"Classes\"`.\n Options are `\"Classes\"`, `\"ObjectProperties\"`, `\"DataProperties\"`, and `\"AnnotationProperties\"`.\n Returns:\n (List[OWLAxiom]): A list of equivalence axioms subject to input entity type.\n \"\"\"\n if entity_type == \"Classes\":\n return list(self.owl_onto.getAxioms(AxiomType.SUBCLASS_OF))\n elif entity_type == \"ObjectProperties\":\n return list(self.owl_onto.getAxioms(AxiomType.SUB_OBJECT_PROPERTY))\n elif entity_type == \"DataProperties\":\n return list(self.owl_onto.getAxioms(AxiomType.SUB_DATA_PROPERTY))\n elif entity_type == \"AnnotationProperties\":\n return list(self.owl_onto.getAxioms(AxiomType.SUB_ANNOTATION_PROPERTY_OF))\n else:\n raise ValueError(f\"Unknown entity type {entity_type}.\")\n
"},{"location":"deeponto/onto/ontology/#deeponto.onto.ontology.Ontology.get_equivalence_axioms","title":"get_equivalence_axioms(entity_type='Classes')
","text":"Return equivalence axioms (subject to input entity type) asserted in the ontology.
Parameters:
Name Type Description Defaultentity_type
str
The entity type to be considered. Defaults to \"Classes\"
. Options are \"Classes\"
, \"ObjectProperties\"
, and \"DataProperties\"
.
'Classes'
Returns:
Type Descriptionlist[OWLAxiom]
A list of equivalence axioms subject to input entity type.
Source code insrc/deeponto/onto/ontology.py
def get_equivalence_axioms(self, entity_type: str = \"Classes\"):\n\"\"\"Return equivalence axioms (subject to input entity type) asserted in the ontology.\n\n Args:\n entity_type (str, optional): The entity type to be considered. Defaults to `\"Classes\"`.\n Options are `\"Classes\"`, `\"ObjectProperties\"`, and `\"DataProperties\"`.\n Returns:\n (list[OWLAxiom]): A list of equivalence axioms subject to input entity type.\n \"\"\"\n if entity_type == \"Classes\":\n return list(self.owl_onto.getAxioms(AxiomType.EQUIVALENT_CLASSES))\n elif entity_type == \"ObjectProperties\":\n return list(self.owl_onto.getAxioms(AxiomType.EQUIVALENT_OBJECT_PROPERTIES))\n elif entity_type == \"DataProperties\":\n return list(self.owl_onto.getAxioms(AxiomType.EQUIVALENT_DATA_PROPERTIES))\n else:\n raise ValueError(f\"Unknown entity type {entity_type}.\")\n
"},{"location":"deeponto/onto/ontology/#deeponto.onto.ontology.Ontology.get_assertion_axioms","title":"get_assertion_axioms(entity_type='Classes')
","text":"Return assertion (ABox) axioms (subject to input entity type) asserted in the ontology.
Parameters:
Name Type Description Defaultentity_type
str
The entity type to be considered. Defaults to \"Classes\"
. Options are \"Classes\"
, \"ObjectProperties\"
, and \"DataProperties\"
.
'Classes'
Returns:
Type Descriptionlist[OWLAxiom]
A list of assertion axioms subject to input entity type.
Source code insrc/deeponto/onto/ontology.py
def get_assertion_axioms(self, entity_type: str = \"Classes\"):\n\"\"\"Return assertion (ABox) axioms (subject to input entity type) asserted in the ontology.\n\n Args:\n entity_type (str, optional): The entity type to be considered. Defaults to `\"Classes\"`.\n Options are `\"Classes\"`, `\"ObjectProperties\"`, and `\"DataProperties\"`.\n Returns:\n (list[OWLAxiom]): A list of assertion axioms subject to input entity type.\n \"\"\"\n if entity_type == \"Classes\":\n return list(self.owl_onto.getAxioms(AxiomType.CLASS_ASSERTION))\n elif entity_type == \"ObjectProperties\":\n return list(self.owl_onto.getAxioms(AxiomType.OBJECT_PROPERTY_ASSERTION))\n elif entity_type == \"DataProperties\":\n return list(self.owl_onto.getAxioms(AxiomType.DATA_PROPERTY_ASSERTION))\n elif entity_type == \"Annotations\":\n return list(self.owl_onto.getAxioms(AxiomType.ANNOTATION_ASSERTION))\n else:\n raise ValueError(f\"Unknown entity type {entity_type}.\")\n
"},{"location":"deeponto/onto/ontology/#deeponto.onto.ontology.Ontology.get_asserted_parents","title":"get_asserted_parents(owl_object, named_only=False)
","text":"Get all the asserted parents of a given owl object.
Parameters:
Name Type Description Defaultowl_object
OWLObject
An owl object that could have a parent.
requirednamed_only
bool
If True
, return parents that are named classes.
False
Returns:
Type Descriptionset[OWLObject]
The parent set of the given owl object.
Source code insrc/deeponto/onto/ontology.py
def get_asserted_parents(self, owl_object: OWLObject, named_only: bool = False):\nr\"\"\"Get all the asserted parents of a given owl object.\n\n Args:\n owl_object (OWLObject): An owl object that could have a parent.\n named_only (bool): If `True`, return parents that are named classes.\n Returns:\n (set[OWLObject]): The parent set of the given owl object.\n \"\"\"\n entity_type = self.get_entity_type(owl_object)\n if entity_type == \"Classes\":\n parents = set(EntitySearcher.getSuperClasses(owl_object, self.owl_onto))\n elif entity_type.endswith(\"Properties\"):\n parents = set(EntitySearcher.getSuperProperties(owl_object, self.owl_onto))\n else:\n raise ValueError(f\"Unsupported entity type {entity_type}.\")\n if named_only:\n parents = set([p for p in parents if self.check_named_entity(p)])\n return parents\n
"},{"location":"deeponto/onto/ontology/#deeponto.onto.ontology.Ontology.get_asserted_children","title":"get_asserted_children(owl_object, named_only=False)
","text":"Get all the asserted children of a given owl object.
Parameters:
Name Type Description Defaultowl_object
OWLObject
An owl object that could have a child.
requirednamed_only
bool
If True
, return children that are named classes.
False
Returns:
Type Descriptionset[OWLObject]
The children set of the given owl object.
Source code insrc/deeponto/onto/ontology.py
def get_asserted_children(self, owl_object: OWLObject, named_only: bool = False):\nr\"\"\"Get all the asserted children of a given owl object.\n\n Args:\n owl_object (OWLObject): An owl object that could have a child.\n named_only (bool): If `True`, return children that are named classes.\n Returns:\n (set[OWLObject]): The children set of the given owl object.\n \"\"\"\n entity_type = self.get_entity_type(owl_object)\n if entity_type == \"Classes\":\n children = set(EntitySearcher.getSubClasses(owl_object, self.owl_onto))\n elif entity_type.endswith(\"Properties\"):\n children = set(EntitySearcher.getSubProperties(owl_object, self.owl_onto))\n else:\n raise ValueError(f\"Unsupported entity type {entity_type}.\")\n if named_only:\n children = set([c for c in children if self.check_named_entity(c)])\n return children\n
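A small sketch combining the two methods above (the class IRI is hypothetical):
cls = onto.get_owl_object(\"http://example.org/onto#Leukocyte\")  # hypothetical IRI\nparents = onto.get_asserted_parents(cls, named_only=True)\nchildren = onto.get_asserted_children(cls, named_only=True)\n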
"},{"location":"deeponto/onto/ontology/#deeponto.onto.ontology.Ontology.get_asserted_complex_classes","title":"get_asserted_complex_classes(gci_only=False)
","text":"Get complex classes that occur in at least one of the ontology axioms.
Parameters:
Name Type Description Defaultgci_only
bool
If True
, consider complex classes that occur in GCIs only; otherwise consider those that occur in equivalence axioms as well.
False
Returns:
Type Descriptionset[OWLClassExpression]
A set of complex classes.
Source code insrc/deeponto/onto/ontology.py
def get_asserted_complex_classes(self, gci_only: bool = False):\n\"\"\"Get complex classes that occur in at least one of the ontology axioms.\n\n Args:\n gci_only (bool): If `True`, consider complex classes that occur in GCIs only; otherwise consider\n those that occur in equivalence axioms as well.\n Returns:\n (set[OWLClassExpression]): A set of complex classes.\n \"\"\"\n complex_classes = []\n\n for gci in self.get_subsumption_axioms(\"Classes\"):\n super_class = gci.getSuperClass()\n sub_class = gci.getSubClass()\n if not OntologyReasoner.has_iri(super_class):\n complex_classes.append(super_class)\n if not OntologyReasoner.has_iri(sub_class):\n complex_classes.append(sub_class)\n\n # also considering equivalence axioms\n if not gci_only:\n for eq in self.get_equivalence_axioms(\"Classes\"):\n gci = list(eq.asOWLSubClassOfAxioms())[0]\n super_class = gci.getSuperClass()\n sub_class = gci.getSubClass()\n if not OntologyReasoner.has_iri(super_class):\n complex_classes.append(super_class)\n if not OntologyReasoner.has_iri(sub_class):\n complex_classes.append(sub_class)\n\n return set(complex_classes)\n
"},{"location":"deeponto/onto/ontology/#deeponto.onto.ontology.Ontology.get_annotations","title":"get_annotations(owl_object, annotation_property_iri=None, annotation_language_tag=None, apply_lowercasing=False, normalise_identifiers=False)
","text":"Get the annotation literals of the given OWLObject
.
Parameters:
Name Type Description Defaultowl_object
Union[OWLObject, str]
An OWLObject
or its IRI.
annotation_property_iri
str
Any particular annotation property IRI of interest. Defaults to None
.
None
annotation_language_tag
str
Any particular annotation language tag of interest; NOTE that not every annotation has a language tag, in which case it is assumed to be in English. Defaults to None
. Options are \"en\"
, \"ge\"
etc.
None
apply_lowercasing
bool
Whether or not to apply lowercasing to annotation literals. Defaults to False
.
False
normalise_identifiers
bool
Whether to normalise annotation text that is in the Java identifier format. Defaults to False
.
False
Returns:
Type Descriptionset[str]
A set of annotation literals of the given OWLObject
.
src/deeponto/onto/ontology.py
def get_annotations(\n self,\n owl_object: Union[OWLObject, str],\n annotation_property_iri: Optional[str] = None,\n annotation_language_tag: Optional[str] = None,\n apply_lowercasing: bool = False,\n normalise_identifiers: bool = False,\n):\n\"\"\"Get the annotation literals of the given `OWLObject`.\n\n Args:\n owl_object (Union[OWLObject, str]): An `OWLObject` or its IRI.\n annotation_property_iri (str, optional):\n Any particular annotation property IRI of interest. Defaults to `None`.\n annotation_language_tag (str, optional):\n Any particular annotation language tag of interest; NOTE that not every\n annotation has a language tag, in this case assume it is in English.\n Defaults to `None`. Options are `\"en\"`, `\"ge\"` etc.\n apply_lowercasing (bool): Whether or not to apply lowercasing to annotation literals.\n Defaults to `False`.\n normalise_identifiers (bool): Whether to normalise annotation text that is in the Java identifier format.\n Defaults to `False`.\n Returns:\n (set[str]): A set of annotation literals of the given `OWLObject`.\n \"\"\"\n if isinstance(owl_object, str):\n owl_object = self.get_owl_object(owl_object)\n\n annotation_property = None\n if annotation_property_iri:\n # return an empty list if `annotation_property_iri` does not exist in this OWLOntology`\n annotation_property = self.get_owl_object(annotation_property_iri)\n\n annotations = []\n for annotation in EntitySearcher.getAnnotations(owl_object, self.owl_onto, annotation_property):\n annotation = annotation.getValue()\n # boolean that indicates whether the annotation's language is of interest\n fit_language = False\n if not annotation_language_tag:\n # it is set to `True` if `annotation_langauge` is not specified\n fit_language = True\n else:\n # restrict the annotations to a language if specified\n try:\n # NOTE: not every annotation has a language attribute\n fit_language = annotation.getLang() == annotation_language_tag\n except:\n # in the case when this annotation has no language tag\n # we assume it is in English\n if annotation_language_tag == \"en\":\n fit_language = True\n\n if fit_language:\n # only get annotations that have a literal value\n if annotation.isLiteral():\n annotations.append(\n process_annotation_literal(\n str(annotation.getLiteral()), apply_lowercasing, normalise_identifiers\n )\n )\n\n return uniqify(annotations)\n
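A usage sketch for retrieving English rdfs:label literals (the class IRI is hypothetical):
labels = onto.get_annotations(\n \"http://example.org/onto#Leukocyte\",  # hypothetical IRI\n annotation_property_iri=\"http://www.w3.org/2000/01/rdf-schema#label\",\n annotation_language_tag=\"en\",\n)\n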
"},{"location":"deeponto/onto/ontology/#deeponto.onto.ontology.Ontology.check_consistency","title":"check_consistency()
","text":"Check if the ontology is consistent according to the pre-loaded reasoner.
Source code insrc/deeponto/onto/ontology.py
def check_consistency(self):\n\"\"\"Check if the ontology is consistent according to the pre-loaded reasoner.\n \"\"\"\n logging.info(f\"Checking consistency with `{self.reasoner_type}` reasoner.\")\n return self.reasoner.owl_reasoner.isConsistent()\n
"},{"location":"deeponto/onto/ontology/#deeponto.onto.ontology.Ontology.check_named_entity","title":"check_named_entity(owl_object)
","text":"Check if the input entity is a named atomic entity. That is, it is not a complex entity, \\(\\top\\), or \\(\\bot\\).
Source code insrc/deeponto/onto/ontology.py
def check_named_entity(self, owl_object: OWLObject):\nr\"\"\"Check if the input entity is a named atomic entity. That is,\n it is not a complex entity, $\\top$, or $\\bot$.\n \"\"\"\n entity_type = self.get_entity_type(owl_object)\n top = TOP_BOTTOMS[entity_type].TOP\n bottom = TOP_BOTTOMS[entity_type].BOTTOM\n if OntologyReasoner.has_iri(owl_object):\n iri = str(owl_object.getIRI())\n # check if the entity is TOP or BOTTOM\n return iri != top and iri != bottom\n return False\n
"},{"location":"deeponto/onto/ontology/#deeponto.onto.ontology.Ontology.check_deprecated","title":"check_deprecated(owl_object)
","text":"Check if the given OWL object is marked as deprecated according to \\(\\texttt{owl:deprecated}\\).
NOTE: the string literal indicating deprecation is either 'true'
or 'True'
. Also, if \\(\\texttt{owl:deprecated}\\) is not defined in this ontology, return False
by default.
src/deeponto/onto/ontology.py
def check_deprecated(self, owl_object: OWLObject):\nr\"\"\"Check if the given OWL object is marked as deprecated according to $\\texttt{owl:deprecated}$.\n\n NOTE: the string literal indicating deprecation is either `'true'` or `'True'`. Also, if $\\texttt{owl:deprecated}$\n is not defined in this ontology, return `False` by default.\n \"\"\"\n if not OWL_DEPRECATED in self.owl_annotation_properties.keys():\n # return False if owl:deprecated is not defined in this ontology\n return False\n\n deprecated = self.get_annotations(owl_object, annotation_property_iri=OWL_DEPRECATED)\n if deprecated and (list(deprecated)[0] == \"true\" or list(deprecated)[0] == \"True\"):\n return True\n else:\n return False\n
"},{"location":"deeponto/onto/ontology/#deeponto.onto.ontology.Ontology.save_onto","title":"save_onto(save_path)
","text":"Save the ontology file to the given path.
Source code insrc/deeponto/onto/ontology.py
def save_onto(self, save_path: str):\n\"\"\"Save the ontology file to the given path.\"\"\"\n self.owl_onto.saveOntology(IRI.create(File(save_path).toURI()))\n
"},{"location":"deeponto/onto/ontology/#deeponto.onto.ontology.Ontology.build_annotation_index","title":"build_annotation_index(annotation_property_iris=[RDFS_LABEL], entity_type='Classes', apply_lowercasing=False, normalise_identifiers=False)
","text":"Build an annotation index for a given type of entities.
Parameters:
Name Type Description Defaultannotation_property_iris
list[str]
A list of annotation property IRIs (it is possible that not every annotation property IRI is in use); if not provided, the built-in rdfs:label
is considered. Defaults to [RDFS_LABEL]
.
[RDFS_LABEL]
entity_type
str
The entity type to be considered. Defaults to \"Classes\"
. Options are \"Classes\"
, \"ObjectProperties\"
, \"DataProperties\"
, etc.
'Classes'
apply_lowercasing
bool
Whether or not to apply lowercasing to annotation literals. Defaults to False
.
False
normalise_identifiers
bool
Whether to normalise annotation text that is in the Java identifier format. Defaults to False
.
False
Returns:
Type DescriptionTuple[dict, list[str]]
The built annotation index, and the list of annotation property IRIs that are in use.
Source code insrc/deeponto/onto/ontology.py
def build_annotation_index(\n self,\n annotation_property_iris: List[str] = [RDFS_LABEL],\n entity_type: str = \"Classes\",\n apply_lowercasing: bool = False,\n normalise_identifiers: bool = False,\n):\n\"\"\"Build an annotation index for a given type of entities.\n\n Args:\n annotation_property_iris (list[str]): A list of annotation property IRIs (it is possible\n that not every annotation property IRI is in use); if not provided, the built-in\n `rdfs:label` is considered. Defaults to `[RDFS_LABEL]`.\n entity_type (str, optional): The entity type to be considered. Defaults to `\"Classes\"`.\n Options are `\"Classes\"`, `\"ObjectProperties\"`, `\"DataProperties\"`, etc.\n apply_lowercasing (bool): Whether or not to apply lowercasing to annotation literals.\n Defaults to `True`.\n normalise_identifiers (bool): Whether to normalise annotation text that is in the Java identifier format.\n Defaults to `False`.\n\n Returns:\n (Tuple[dict, list[str]]): The built annotation index, and the list of annotation property IRIs that are in use.\n \"\"\"\n\n annotation_index = defaultdict(set)\n # example: Classes => owl_classes; ObjectProperties => owl_object_properties\n entity_type = \"owl_\" + split_java_identifier(entity_type).replace(\" \", \"_\").lower()\n entity_index = getattr(self, entity_type)\n\n # preserve available annotation properties\n annotation_property_iris = [\n airi for airi in annotation_property_iris if airi in self.owl_annotation_properties.keys()\n ]\n\n # build the annotation index without duplicated literals\n for airi in annotation_property_iris:\n for iri, entity in entity_index.items():\n annotation_index[iri].update(\n self.get_annotations(\n owl_object=entity,\n annotation_property_iri=airi,\n annotation_language_tag=None,\n apply_lowercasing=apply_lowercasing,\n normalise_identifiers=normalise_identifiers,\n )\n )\n\n return annotation_index, annotation_property_iris\n
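A usage sketch that builds a lowercased class label index keyed by IRI:
annotation_index, used_property_iris = onto.build_annotation_index(\n annotation_property_iris=[\"http://www.w3.org/2000/01/rdf-schema#label\"],\n entity_type=\"Classes\",\n apply_lowercasing=True,\n)\n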
"},{"location":"deeponto/onto/ontology/#deeponto.onto.ontology.Ontology.build_inverted_annotation_index","title":"build_inverted_annotation_index(annotation_index, tokenizer)
staticmethod
","text":"Build an inverted annotation index given an annotation index and a tokenizer.
Source code insrc/deeponto/onto/ontology.py
@staticmethod\ndef build_inverted_annotation_index(annotation_index: dict, tokenizer: Tokenizer):\n\"\"\"Build an inverted annotation index given an annotation index and a tokenizer.\"\"\"\n return InvertedIndex(annotation_index, tokenizer)\n
"},{"location":"deeponto/onto/ontology/#deeponto.onto.ontology.Ontology.add_axiom","title":"add_axiom(owl_axiom, return_undo=True)
","text":"Add an axiom into the current ontology.
Parameters:
Name Type Description Defaultowl_axiom
OWLAxiom
An axiom to be added.
requiredreturn_undo
bool
Returning the undo operation or not. Defaults to True
.
True
Source code in src/deeponto/onto/ontology.py
def add_axiom(self, owl_axiom: OWLAxiom, return_undo: bool = True):\n\"\"\"Add an axiom into the current ontology.\n\n Args:\n owl_axiom (OWLAxiom): An axiom to be added.\n return_undo (bool, optional): Returning the undo operation or not. Defaults to `True`.\n \"\"\"\n change = AddAxiom(self.owl_onto, owl_axiom)\n result = self.owl_onto.applyChange(change)\n logger.info(f\"[{str(result)}] Adding the axiom {str(owl_axiom)} into the ontology.\")\n if return_undo:\n return change.reverseChange()\n
"},{"location":"deeponto/onto/ontology/#deeponto.onto.ontology.Ontology.remove_axiom","title":"remove_axiom(owl_axiom, return_undo=True)
","text":"Remove an axiom from the current ontology.
Parameters:
Name Type Description Defaultowl_axiom
OWLAxiom
An axiom to be removed.
requiredreturn_undo
bool
Returning the undo operation or not. Defaults to True
.
True
Source code in src/deeponto/onto/ontology.py
def remove_axiom(self, owl_axiom: OWLAxiom, return_undo: bool = True):\n\"\"\"Remove an axiom from the current ontology.\n\n Args:\n owl_axiom (OWLAxiom): An axiom to be removed.\n return_undo (bool, optional): Returning the undo operation or not. Defaults to `True`.\n \"\"\"\n change = RemoveAxiom(self.owl_onto, owl_axiom)\n result = self.owl_onto.applyChange(change)\n logger.info(f\"[{str(result)}] Removing the axiom {str(owl_axiom)} from the ontology.\")\n if return_undo:\n return change.reverseChange()\n
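A sketch of asserting and then retracting a subsumption axiom (the class IRIs are hypothetical):
sub = onto.get_owl_object(\"http://example.org/onto#Leukocyte\")  # hypothetical IRI\nsup = onto.get_owl_object(\"http://example.org/onto#Cell\")  # hypothetical IRI\naxiom = onto.owl_data_factory.getOWLSubClassOfAxiom(sub, sup)\nonto.add_axiom(axiom)\nonto.remove_axiom(axiom)\n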
"},{"location":"deeponto/onto/ontology/#deeponto.onto.ontology.Ontology.replace_entity","title":"replace_entity(owl_object, entity_iri, replacement_iri)
","text":"Replace an entity in a class expression with another entity.
Parameters:
Name Type Description Defaultowl_object
OWLObject
An OWLObject
entity to be manipulated.
entity_iri
str
IRI of the entity to be replaced.
requiredreplacement_iri
str
IRI of the entity to replace.
requiredReturns:
Type DescriptionOWLObject
The changed OWLObject
entity.
src/deeponto/onto/ontology.py
def replace_entity(self, owl_object: OWLObject, entity_iri: str, replacement_iri: str):\n\"\"\"Replace an entity in a class expression with another entity.\n\n Args:\n owl_object (OWLObject): An `OWLObject` entity to be manipulated.\n entity_iri (str): IRI of the entity to be replaced.\n replacement_iri (str): IRI of the entity to replace.\n\n Returns:\n (OWLObject): The changed `OWLObject` entity.\n \"\"\"\n iri_dict = {IRI.create(entity_iri): IRI.create(replacement_iri)}\n replacer = OWLObjectDuplicator(self.owl_data_factory, iri_dict)\n return replacer.duplicateObject(owl_object)\n
"},{"location":"deeponto/onto/projection/","title":"Ontology Projection","text":""},{"location":"deeponto/onto/projection/#deeponto.onto.projection.OntologyProjector","title":"OntologyProjector(bidirectional_taxonomy=False, only_taxonomy=False, include_literals=False)
","text":"Class for ontology projection -- transforming ontology axioms into triples.
Credit
The code of this class originates from the mOWL library.
Attributes:
Name Type Descriptionbidirectional_taxonomy
bool
If True
then per each SubClass
edge one SuperClass
edge will be generated. Defaults to False
.
only_taxonomy
bool
If True
, then projection will only include subClass
edges. Defaults to False
.
include_literals
bool
If True
the projection will also include triples involving data property assertions and annotations. Defaults to False
.
Parameters:
Name Type Description Defaultbidirectional_taxonomy
bool
If True
then per each SubClass
edge one SuperClass
edge will be generated. Defaults to False
.
False
only_taxonomy
bool
If True
, then projection will only include subClass
edges. Defaults to False
.
False
include_literals
bool
If True,
the projection will also include triples involving data property assertions and annotations. Defaults to False
.
False
Source code in src/deeponto/onto/projection.py
def __init__(self, bidirectional_taxonomy: bool=False, only_taxonomy: bool=False, include_literals: bool=False):\n\"\"\"Initialise an ontology projector.\n\n Args:\n bidirectional_taxonomy (bool, optional): _description_. If `True` then per each `SubClass` edge one `SuperClass` edge will\n be generated. Defaults to `False`.\n only_taxonomy (bool, optional): If `True`, then projection will only include `subClass` edges. Defaults to `False`.\n include_literals (bool, optional): _description_. If `True` the projection will also include triples involving data property\n assertions and annotations. Defaults to `False`.\n \"\"\"\n self.bidirectional_taxonomy = bidirectional_taxonomy\n self.include_literals = include_literals\n self.only_taxonomy = only_taxonomy\n self.projector = Projector(self.bidirectional_taxonomy, self.only_taxonomy,\n self.include_literals)\n
"},{"location":"deeponto/onto/projection/#deeponto.onto.projection.OntologyProjector.project","title":"project(ontology)
","text":"The projection algorithm implemented in OWL2Vec*.
Parameters:
Name Type Description Defaultontology
Ontology
An ontology to be processed.
requiredReturns:
Type Descriptionset
Set of triples after projection.
Source code insrc/deeponto/onto/projection.py
def project(self, ontology: Ontology):\n\"\"\"The projection algorithm implemented in OWL2Vec*.\n\n Args:\n ontology (Ontology): An ontology to be processed.\n\n Returns:\n (set): Set of triples after projection.\n \"\"\"\n ontology = ontology.owl_onto\n if not isinstance(ontology, OWLOntology):\n raise TypeError(\n \"Input ontology must be of type `org.semanticweb.owlapi.model.OWLOntology`.\")\n edges = self.projector.project(ontology)\n triples = []\n for e in edges:\n s, r, o = str(e.src()), str(e.rel()), str(e.dst())\n if o != \"\":\n if r == \"http://subclassof\":\n r = str(RDFS.subClassOf)\n triples.append((s, r, o))\n return set(triples)\n
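A minimal usage sketch (onto is a loaded Ontology instance):
from deeponto.onto.projection import OntologyProjector\n\nprojector = OntologyProjector(bidirectional_taxonomy=True, only_taxonomy=False, include_literals=True)\ntriples = projector.project(onto)  # a set of (subject, relation, object) string triples\n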
"},{"location":"deeponto/onto/pruning/","title":"Ontology Pruning","text":""},{"location":"deeponto/onto/pruning/#deeponto.onto.pruning.OntologyPruner","title":"OntologyPruner(onto)
","text":"Class for in-place ontology pruning.
Attributes:
Name Type Descriptiononto
Ontology
The input ontology to be pruned. Note that the pruning process is in-place.
Parameters:
Name Type Description Defaultonto
Ontology
The input ontology to be pruned. Note that the pruning process is in-place.
required Source code insrc/deeponto/onto/pruning.py
def __init__(self, onto: Ontology):\n\"\"\"Initialise an ontology pruner.\n\n Args:\n onto (Ontology): The input ontology to be pruned. Note that the pruning process is in-place.\n \"\"\"\n self.onto = onto\n self._pruning_applied = None\n
"},{"location":"deeponto/onto/pruning/#deeponto.onto.pruning.OntologyPruner.save_onto","title":"save_onto(save_path)
","text":"Save the pruned ontology file to the given path.
Source code insrc/deeponto/onto/pruning.py
def save_onto(self, save_path: str):\n\"\"\"Save the pruned ontology file to the given path.\"\"\"\n logging.info(f\"{self._pruning_applied} pruning algorithm has been applied.\")\n logging.info(f\"Save the pruned ontology file to {save_path}.\")\n return self.onto.save_onto(save_path)\n
"},{"location":"deeponto/onto/pruning/#deeponto.onto.pruning.OntologyPruner.prune","title":"prune(class_iris_to_be_removed)
","text":"Apply ontology pruning while preserving the relevant hierarchy.
paper
This refers to the ontology pruning algorithm introduced in the paper: Machine Learning-Friendly Biomedical Datasets for Equivalence and Subsumption Ontology Matching (ISWC 2022).
For each class \\(c\\) to be pruned, subsumption axioms will be created between \\(c\\)'s parents and children so as to preserve the relevant hierarchy.
Parameters:
Name Type Description Defaultclass_iris_to_be_removed
list[str]
Classes with IRIs in this list will be pruned and the relevant hierarchy will be repaired.
required Source code insrc/deeponto/onto/pruning.py
def prune(self, class_iris_to_be_removed: List[str]):\nr\"\"\"Apply ontology pruning while preserving the relevant hierarchy.\n\n !!! credit \"paper\"\n\n This refers to the ontology pruning algorithm introduced in the paper:\n [*Machine Learning-Friendly Biomedical Datasets for Equivalence and Subsumption Ontology Matching (ISWC 2022)*](https://link.springer.com/chapter/10.1007/978-3-031-19433-7_33).\n\n For each class $c$ to be pruned, subsumption axioms will be created between $c$'s parents and children so as to preserve the\n relevant hierarchy.\n\n Args:\n class_iris_to_be_removed (list[str]): Classes with IRIs in this list will be pruned and the relevant hierarchy will be repaired.\n \"\"\"\n\n # create the subsumption axioms first\n for cl_iri in class_iris_to_be_removed:\n cl = self.onto.get_owl_object(cl_iri)\n cl_parents = self.onto.get_asserted_parents(cl)\n cl_children = self.onto.get_asserted_children(cl)\n for parent, child in itertools.product(cl_parents, cl_children):\n sub_axiom = self.onto.owl_data_factory.getOWLSubClassOfAxiom(child, parent)\n self.onto.add_axiom(sub_axiom)\n\n # apply pruning\n class_remover = OWLEntityRemover(Collections.singleton(self.onto.owl_onto))\n for cl_iri in class_iris_to_be_removed:\n cl = self.onto.get_owl_object(cl_iri)\n cl.accept(class_remover)\n self.onto.owl_manager.applyChanges(class_remover.getChanges())\n
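A minimal usage sketch (the class IRI and save path are hypothetical):
from deeponto.onto.pruning import OntologyPruner\n\npruner = OntologyPruner(onto)\npruner.prune([\"http://example.org/onto#ObsoleteClass\"])  # hypothetical IRI of the class to remove\npruner.save_onto(\"example.pruned.owl\")  # hypothetical save path\n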
"},{"location":"deeponto/onto/reasoning/","title":"Ontology Reasoning","text":""},{"location":"deeponto/onto/reasoning/#deeponto.onto.ontology.OntologyReasoner","title":"OntologyReasoner(onto, reasoner_type)
","text":"Ontology reasoner class that extends from the Java library OWLAPI.
Attributes:
Name Type Descriptiononto
Ontology
The input deeponto
ontology.
owl_reasoner_factory
OWLReasonerFactory
A reasoner factory for creating a reasoner.
owl_reasoner
OWLReasoner
The created reasoner.
owl_data_factory
OWLDataFactory
A data factory (inherited from onto
) for manipulating axioms.
Parameters:
Name Type Description Defaultonto
Ontology
The input ontology to conduct reasoning on.
requiredreasoner_type
str
The type of reasoner used. Options are [\"hermit\", \"elk\", \"struct\"]
.
src/deeponto/onto/ontology.py
def __init__(self, onto: Ontology, reasoner_type: str):\n\"\"\"Initialise an ontology reasoner.\n\n Args:\n onto (Ontology): The input ontology to conduct reasoning on.\n reasoner_type (str): The type of reasoner used. Options are `[\"hermit\", \"elk\", \"struct\"]`.\n \"\"\"\n self.onto = onto\n self.owl_reasoner_factory = None\n self.owl_reasoner = None\n self.reasoner_type = reasoner_type\n self.load_reasoner(self.reasoner_type)\n self.owl_data_factory = self.onto.owl_data_factory\n
"},{"location":"deeponto/onto/reasoning/#deeponto.onto.ontology.OntologyReasoner.load_reasoner","title":"load_reasoner(reasoner_type)
","text":"Load a new reaonser and dispose the old one if existed.
Source code insrc/deeponto/onto/ontology.py
def load_reasoner(self, reasoner_type: str):\n\"\"\"Load a new reaonser and dispose the old one if existed.\"\"\"\n assert reasoner_type in REASONER_DICT.keys(), f\"Unknown or unsupported reasoner type: {reasoner_type}.\"\n\n if self.owl_reasoner:\n self.owl_reasoner.dispose()\n\n self.reasoner_type = reasoner_type\n self.owl_reasoner_factory = REASONER_DICT[self.reasoner_type]()\n # TODO: remove ELK message\n # somehow Level.ERROR does not prevent the INFO message from ELK\n # Logger.getLogger(\"org.semanticweb.elk\").setLevel(Level.OFF)\n\n self.owl_reasoner = self.owl_reasoner_factory.createReasoner(self.onto.owl_onto)\n
"},{"location":"deeponto/onto/reasoning/#deeponto.onto.ontology.OntologyReasoner.get_entity_type","title":"get_entity_type(entity, is_singular=False)
staticmethod
","text":"A handy method to get the type of an entity (OWLObject
).
NOTE: This method is inherited from the Ontology Class.
Source code insrc/deeponto/onto/ontology.py
@staticmethod\ndef get_entity_type(entity: OWLObject, is_singular: bool = False):\n\"\"\"A handy method to get the type of an entity (`OWLObject`).\n\n NOTE: This method is inherited from the Ontology Class.\n \"\"\"\n return Ontology.get_entity_type(entity, is_singular)\n
"},{"location":"deeponto/onto/reasoning/#deeponto.onto.ontology.OntologyReasoner.has_iri","title":"has_iri(entity)
staticmethod
","text":"Check if an entity has an IRI.
Source code insrc/deeponto/onto/ontology.py
@staticmethod\ndef has_iri(entity: OWLObject):\n\"\"\"Check if an entity has an IRI.\"\"\"\n try:\n entity.getIRI()\n return True\n except:\n return False\n
"},{"location":"deeponto/onto/reasoning/#deeponto.onto.ontology.OntologyReasoner.get_inferred_super_entities","title":"get_inferred_super_entities(entity, direct=False)
","text":"Return the IRIs of named super-entities of a given OWLObject
according to the reasoner.
A mixture of getSuperClasses
, getSuperObjectProperties
, getSuperDataProperties
functions imported from the OWLAPI reasoner. The type of input entity will be automatically determined. The top entity such as owl:Thing
is ignored.
Parameters:
Name Type Description Defaultentity
OWLObject
An OWLObject
entity of interest.
direct
bool
Return parents (direct=True
) or ancestors (direct=False
). Defaults to False
.
False
Returns:
Type Descriptionlist[str]
A list of IRIs of the super-entities of the given OWLObject
entity.
src/deeponto/onto/ontology.py
def get_inferred_super_entities(self, entity: OWLObject, direct: bool = False):\nr\"\"\"Return the IRIs of named super-entities of a given `OWLObject` according to the reasoner.\n\n A mixture of `getSuperClasses`, `getSuperObjectProperties`, `getSuperDataProperties`\n functions imported from the OWLAPI reasoner. The type of input entity will be\n automatically determined. The top entity such as `owl:Thing` is ignored.\n\n\n Args:\n entity (OWLObject): An `OWLObject` entity of interest.\n direct (bool, optional): Return parents (`direct=True`) or\n ancestors (`direct=False`). Defaults to `False`.\n\n Returns:\n (list[str]): A list of IRIs of the super-entities of the given `OWLObject` entity.\n \"\"\"\n entity_type = self.get_entity_type(entity)\n get_super = f\"getSuper{entity_type}\"\n TOP = TOP_BOTTOMS[entity_type].TOP # get the corresponding TOP entity\n super_entities = getattr(self.owl_reasoner, get_super)(entity, direct).getFlattened()\n super_entity_iris = [str(s.getIRI()) for s in super_entities]\n # the root node is owl#Thing\n if TOP in super_entity_iris:\n super_entity_iris.remove(TOP)\n return super_entity_iris\n
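For example (a sketch continuing from a loaded ontology onto; the class IRI is a placeholder):
c = onto.get_owl_object_from_iri("http://purl.obolibrary.org/obo/DOID_4058")  # placeholder IRI
parent_iris = onto.reasoner.get_inferred_super_entities(c, direct=True)    # named parents only
ancestor_iris = onto.reasoner.get_inferred_super_entities(c, direct=False) # all named ancestors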
"},{"location":"deeponto/onto/reasoning/#deeponto.onto.ontology.OntologyReasoner.get_inferred_sub_entities","title":"get_inferred_sub_entities(entity, direct=False)
","text":"Return the IRIs of named sub-entities of a given OWLObject
according to the reasoner.
A mixture of getSubClasses
, getSubObjectProperties
, getSubDataProperties
functions imported from the OWLAPI reasoner. The type of input entity will be automatically determined. The bottom entity such as owl:Nothing
is ignored.
Parameters:
Name Type Description Defaultentity
OWLObject
An OWLObject
entity of interest.
direct
bool
Return children (direct=True
) or descendants (direct=False
). Defaults to False
.
False
Returns:
Type Descriptionlist[str]
A list of IRIs of the sub-entities of the given OWLObject
entity.
src/deeponto/onto/ontology.py
def get_inferred_sub_entities(self, entity: OWLObject, direct: bool = False):\n\"\"\"Return the IRIs of named sub-entities of a given `OWLObject` according to the reasoner.\n\n A mixture of `getSubClasses`, `getSubObjectProperties`, `getSubDataProperties`\n functions imported from the OWLAPI reasoner. The type of input entity will be\n automatically determined. The bottom entity such as `owl:Nothing` is ignored.\n\n Args:\n entity (OWLObject): An `OWLObject` entity of interest.\n direct (bool, optional): Return children (`direct=True`) or\n descendants (`direct=False`). Defaults to `False`.\n\n Returns:\n (list[str]): A list of IRIs of the sub-entities of the given `OWLObject` entity.\n \"\"\"\n entity_type = self.get_entity_type(entity)\n get_sub = f\"getSub{entity_type}\"\n BOTTOM = TOP_BOTTOMS[entity_type].BOTTOM\n sub_entities = getattr(self.owl_reasoner, get_sub)(entity, direct).getFlattened()\n sub_entity_iris = [str(s.getIRI()) for s in sub_entities]\n # the bottom node is owl:Nothing\n if BOTTOM in sub_entity_iris:\n sub_entity_iris.remove(BOTTOM)\n return sub_entity_iris\n
"},{"location":"deeponto/onto/reasoning/#deeponto.onto.ontology.OntologyReasoner.check_subsumption","title":"check_subsumption(sub_entity, super_entity)
","text":"Check if the first entity is subsumed by the second entity according to the reasoner.
Source code insrc/deeponto/onto/ontology.py
def check_subsumption(self, sub_entity: OWLObject, super_entity: OWLObject):\n\"\"\"Check if the first entity is subsumed by the second entity according to the reasoner.\"\"\"\n entity_type = self.get_entity_type(sub_entity, is_singular=True)\n assert entity_type == self.get_entity_type(super_entity, is_singular=True)\n\n sub_axiom = getattr(self.owl_data_factory, f\"getOWLSub{entity_type}OfAxiom\")(sub_entity, super_entity)\n\n return self.owl_reasoner.isEntailed(sub_axiom)\n
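For example (a sketch; both entities must be of the same type, here two classes with placeholder IRIs):
sub_class = onto.get_owl_object_from_iri("http://purl.obolibrary.org/obo/DOID_0001816")  # placeholder
super_class = onto.get_owl_object_from_iri("http://purl.obolibrary.org/obo/DOID_4058")   # placeholder
is_entailed = onto.reasoner.check_subsumption(sub_class, super_class)  # True iff the SubClassOf axiom is entailed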
"},{"location":"deeponto/onto/reasoning/#deeponto.onto.ontology.OntologyReasoner.check_disjoint","title":"check_disjoint(entity1, entity2)
","text":"Check if two entities are disjoint according to the reasoner.
Source code insrc/deeponto/onto/ontology.py
def check_disjoint(self, entity1: OWLObject, entity2: OWLObject):\n\"\"\"Check if two entities are disjoint according to the reasoner.\"\"\"\n entity_type = self.get_entity_type(entity1)\n assert entity_type == self.get_entity_type(entity2)\n\n disjoint_axiom = getattr(self.owl_data_factory, f\"getOWLDisjoint{entity_type}Axiom\")([entity1, entity2])\n\n return self.owl_reasoner.isEntailed(disjoint_axiom)\n
"},{"location":"deeponto/onto/reasoning/#deeponto.onto.ontology.OntologyReasoner.check_common_descendants","title":"check_common_descendants(entity1, entity2)
","text":"Check if two entities have a common decendant.
Entities can be OWL class or property expressions, and can be either atomic or complex. It takes longer computation time for the complex ones. Complex entities do not have an IRI. This method is optimised in the way that if there exists an atomic entity A
, we compute descendants for A
and compare them against the other entity which could be complex.
src/deeponto/onto/ontology.py
def check_common_descendants(self, entity1: OWLObject, entity2: OWLObject):\n\"\"\"Check if two entities have a common decendant.\n\n Entities can be **OWL class or property expressions**, and can be either **atomic\n or complex**. It takes longer computation time for the complex ones. Complex\n entities do not have an IRI. This method is optimised in the way that if\n there exists an atomic entity `A`, we compute descendants for `A` and\n compare them against the other entity which could be complex.\n \"\"\"\n entity_type = self.get_entity_type(entity1)\n assert entity_type == self.get_entity_type(entity2)\n\n if not self.has_iri(entity1) and not self.has_iri(entity2):\n logger.warn(\"Computing descendants for two complex entities is not efficient.\")\n\n # `computed` is the one we compute the descendants\n # `compared` is the one we compare `computed`'s descendant one-by-one\n # we set the atomic entity as `computed` for efficiency if there is one\n computed, compared = entity1, entity2\n if not self.has_iri(entity1) and self.has_iri(entity2):\n computed, compared = entity2, entity1\n\n # for every inferred child of `computed`, check if it is subsumed by `compared``\n for descendant_iri in self.get_inferred_sub_entities(computed, direct=False):\n # print(\"check a subsumption\")\n if self.check_subsumption(self.onto.get_owl_object(descendant_iri), compared):\n return True\n return False\n
"},{"location":"deeponto/onto/reasoning/#deeponto.onto.ontology.OntologyReasoner.instances_of","title":"instances_of(owl_class, direct=False)
","text":"Return the list of named individuals that are instances of a given OWL class expression.
Parameters:
Name Type Description Defaultowl_class
OWLClassExpression
An ontology class of interest.
requireddirect
bool
Return direct instances (direct=True
) or also include the sub-classes' instances (direct=False
). Defaults to False
.
False
Returns:
Type Descriptionlist[OWLIndividual]
A list of named individuals that are instances of owl_class
.
src/deeponto/onto/ontology.py
def instances_of(self, owl_class: OWLClassExpression, direct: bool = False):\n\"\"\"Return the list of named individuals that are instances of a given OWL class expression.\n\n Args:\n owl_class (OWLClassExpression): An ontology class of interest.\n direct (bool, optional): Return direct instances (`direct=True`) or\n also include the sub-classes' instances (`direct=False`). Defaults to `False`.\n\n Returns:\n (list[OWLIndividual]): A list of named individuals that are instances of `owl_class`.\n \"\"\"\n return list(self.owl_reasoner.getInstances(owl_class, direct).getFlattened())\n
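For example (a sketch; meaningful only if the loaded ontology has an ABox, and the class IRI is a placeholder):
c = onto.get_owl_object_from_iri("http://purl.obolibrary.org/obo/DOID_4058")  # placeholder IRI
individuals = onto.reasoner.instances_of(c, direct=False)  # named individuals of the class and its sub-classes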
"},{"location":"deeponto/onto/reasoning/#deeponto.onto.ontology.OntologyReasoner.check_instance","title":"check_instance(owl_instance, owl_class)
","text":"Check if a named individual is an instance of an OWL class.
Source code insrc/deeponto/onto/ontology.py
def check_instance(self, owl_instance: OWLIndividual, owl_class: OWLClassExpression):\n\"\"\"Check if a named individual is an instance of an OWL class.\"\"\"\n assertion_axiom = self.owl_data_factory.getOWLClassAssertionAxiom(owl_class, owl_instance)\n return self.owl_reasoner.isEntailed(assertion_axiom)\n
"},{"location":"deeponto/onto/reasoning/#deeponto.onto.ontology.OntologyReasoner.check_common_instances","title":"check_common_instances(owl_class1, owl_class2)
","text":"Check if two OWL class expressions have a common instance.
Class expressions can be atomic or complex, and it takes longer computation time for the complex ones. Complex classes do not have an IRI. This method is optimised in the way that if there exists an atomic class A
, we compute instances for A
and compare them against the other class which could be complex.
Difference with check_common_descendants
The inputs of this function are restricted to OWL class expressions. This is because descendant
is related to hierarchy and both class and property expressions have a hierarchy, but instance
is restricted to classes.
src/deeponto/onto/ontology.py
def check_common_instances(self, owl_class1: OWLClassExpression, owl_class2: OWLClassExpression):\n\"\"\"Check if two OWL class expressions have a common instance.\n\n Class expressions can be **atomic or complex**, and it takes longer computation time\n for the complex ones. Complex classes do not have an IRI. This method is optimised\n in the way that if there exists an atomic class `A`, we compute instances for `A` and\n compare them against the other class which could be complex.\n\n !!! note \"Difference with [`check_common_descendants`][deeponto.onto.OntologyReasoner.check_common_descendants]\"\n The inputs of this function are restricted to **OWL class expressions**. This is because\n `descendant` is related to hierarchy and both class and property expressions have a hierarchy,\n but `instance` is restricted to classes.\n \"\"\"\n\n if not self.has_iri(owl_class1) and not self.has_iri(owl_class2):\n logger.warn(\"Computing instances for two complex classes is not efficient.\")\n\n # `computed` is the one we compute the instances\n # `compared` is the one we compare `computed`'s instances one-by-one\n # we set the atomic entity as `computed` for efficiency if there is one\n computed, compared = owl_class1, owl_class2\n if not self.has_iri(owl_class1) and self.has_iri(owl_class2):\n computed, compared = owl_class2, owl_class1\n\n # for every inferred instance of `computed`, check if it is an instance of `compared`\n for instance in self.instances_of(computed, direct=False):\n if self.check_instance(instance, compared):\n return True\n return False\n
"},{"location":"deeponto/onto/reasoning/#deeponto.onto.ontology.OntologyReasoner.check_assumed_disjoint","title":"check_assumed_disjoint(owl_class1, owl_class2)
","text":"Check if two OWL class expressions satisfy the Assumed Disjointness.
Paper
The definition of Assumed Disjointness comes from the paper: Language Model Analysis for Ontology Subsumption Inference.
Assumed Disjointness (Definition)
Two class expressions \\(C\\) and \\(D\\) are assumed to be disjoint if they meet the following two conditions: (1) after adding the disjointness axiom over them into the ontology, \\(C\\) and \\(D\\) are still satisfiable; (2) \\(C\\) and \\(D\\) do not have a common descendant (otherwise \\(C\\) and \\(D\\) can be satisfiable but their common descendants become the bottom \\(\\bot\\)).
Note that the special case where \\(C\\) and \\(D\\) are already disjoint is covered by the first check. The paper also proposed a practical alternative to decide Assumed Disjointness. See check_assumed_disjoint_alternative
.
Examples:
Suppose we have pre-loaded an ontology onto
from the disease ontology file doid.owl
.
>>> c1 = onto.get_owl_object_from_iri(\"http://purl.obolibrary.org/obo/DOID_4058\")\n>>> c2 = onto.get_owl_object_from_iri(\"http://purl.obolibrary.org/obo/DOID_0001816\")\n>>> onto.reasoner.check_assumed_disjoint(c1, c2)\n[SUCCESSFULLY] Adding the axiom DisjointClasses(<http://purl.obolibrary.org/obo/DOID_0001816> <http://purl.obolibrary.org/obo/DOID_4058>) into the ontology.\n[CHECK1 True] input classes are still satisfiable;\n[SUCCESSFULLY] Removing the axiom from the ontology.\n[CHECK2 False] input classes have NO common descendant.\n[PASSED False] assumed disjointness check done.\nFalse\n
Source code in src/deeponto/onto/ontology.py
def check_assumed_disjoint(self, owl_class1: OWLClassExpression, owl_class2: OWLClassExpression):\nr\"\"\"Check if two OWL class expressions satisfy the Assumed Disjointness.\n\n !!! credit \"Paper\"\n\n The definition of **Assumed Disjointness** comes from the paper:\n [Language Model Analysis for Ontology Subsumption Inference](https://aclanthology.org/2023.findings-acl.213).\n\n !!! note \"Assumed Disjointness (Definition)\"\n Two class expressions $C$ and $D$ are assumed to be disjoint if they meet the followings:\n\n 1. By adding the disjointness axiom of them into the ontology, $C$ and $D$ are **still satisfiable**.\n 2. $C$ and $D$ **do not have a common descendant** (otherwise $C$ and $D$ can be satisfiable but their\n common descendants become the bottom $\\bot$.)\n\n Note that the special case where $C$ and $D$ are already disjoint is covered by the first check.\n The paper also proposed a practical alternative to decide Assumed Disjointness.\n See [`check_assumed_disjoint_alternative`][deeponto.onto.OntologyReasoner.check_assumed_disjoint_alternative].\n\n Examples:\n Suppose pre-load an ontology `onto` from the disease ontology file `doid.owl`.\n\n ```python\n >>> c1 = onto.get_owl_object_from_iri(\"http://purl.obolibrary.org/obo/DOID_4058\")\n >>> c2 = onto.get_owl_object_from_iri(\"http://purl.obolibrary.org/obo/DOID_0001816\")\n >>> onto.reasoner.check_assumed_disjoint(c1, c2)\n [SUCCESSFULLY] Adding the axiom DisjointClasses(<http://purl.obolibrary.org/obo/DOID_0001816> <http://purl.obolibrary.org/obo/DOID_4058>) into the ontology.\n [CHECK1 True] input classes are still satisfiable;\n [SUCCESSFULLY] Removing the axiom from the ontology.\n [CHECK2 False] input classes have NO common descendant.\n [PASSED False] assumed disjointness check done.\n False\n ```\n \"\"\"\n # banner_message(\"Check Asssumed Disjointness\")\n\n entity_type = self.get_entity_type(owl_class1)\n assert entity_type == self.get_entity_type(owl_class2)\n\n # adding the disjointness axiom of `class1`` and `class2``\n disjoint_axiom = getattr(self.owl_data_factory, f\"getOWLDisjoint{entity_type}Axiom\")([owl_class1, owl_class2])\n undo_change = self.onto.add_axiom(disjoint_axiom, return_undo=True)\n self.load_reasoner(self.reasoner_type)\n\n # check if they are still satisfiable\n still_satisfiable = self.owl_reasoner.isSatisfiable(owl_class1)\n still_satisfiable = still_satisfiable and self.owl_reasoner.isSatisfiable(owl_class2)\n logger.info(f\"[CHECK1 {still_satisfiable}] input classes are still satisfiable;\")\n\n # remove the axiom and re-construct the reasoner\n undo_change_result = self.onto.owl_onto.applyChange(undo_change)\n logger.info(f\"[{str(undo_change_result)}] Removing the axiom from the ontology.\")\n self.load_reasoner(self.reasoner_type)\n\n # failing first check, there is no need to do the second.\n if not still_satisfiable:\n logger.info(\"Failed `satisfiability check`, skip the `common descendant` check.\")\n logger.info(f\"[PASSED {still_satisfiable}] assumed disjointness check done.\")\n return False\n\n # otherwise, the classes are still satisfiable and we should conduct the second check\n has_common_descendants = self.check_common_descendants(owl_class1, owl_class2)\n logger.info(f\"[CHECK2 {not has_common_descendants}] input classes have NO common descendant.\")\n logger.info(f\"[PASSED {not has_common_descendants}] assumed disjointness check done.\")\n return not has_common_descendants\n
"},{"location":"deeponto/onto/reasoning/#deeponto.onto.ontology.OntologyReasoner.check_assumed_disjoint_alternative","title":"check_assumed_disjoint_alternative(owl_class1, owl_class2, verbose=False)
","text":"Check if two OWL class expressions satisfy the Assumed Disjointness.
Paper
The definition of Assumed Disjointness comes from the paper: Language Model Analysis for Ontology Subsumption Inference.
The practical alternative version of check_assumed_disjoint
with the following conditions:
Assumed Disjointness (Practical Alternative)
Two class expressions \\(C\\) and \\(D\\) are assumed to be disjoint if they (1) do not have a subsumption relationship between them, (2) do not have a common descendant (in the TBox), and (3) do not have a common instance (in the ABox).
If all the conditions have been met, then we assume class1
and class2
as disjoint.
Examples:
Suppose we have pre-loaded an ontology onto
from the disease ontology file doid.owl
.
>>> c1 = onto.get_owl_object_from_iri(\"http://purl.obolibrary.org/obo/DOID_4058\")\n>>> c2 = onto.get_owl_object_from_iri(\"http://purl.obolibrary.org/obo/DOID_0001816\")\n>>> onto.reasoner.check_assumed_disjoint(c1, c2, verbose=True)\n[CHECK1 True] input classes have NO subsumption relationship;\n[CHECK2 False] input classes have NO common descendant;\nFailed the `common descendant check`, skip the `common instance` check.\n[PASSED False] assumed disjointness check done.\nFalse\n
In this alternative implementation, we do not need to add and remove axioms, which saves time. Source code in src/deeponto/onto/ontology.py
def check_assumed_disjoint_alternative(\n self, owl_class1: OWLClassExpression, owl_class2: OWLClassExpression, verbose: bool = False\n):\nr\"\"\"Check if two OWL class expressions satisfy the Assumed Disjointness.\n\n !!! credit \"Paper\"\n\n The definition of **Assumed Disjointness** comes from the paper:\n [Language Model Analysis for Ontology Subsumption Inference](https://aclanthology.org/2023.findings-acl.213).\n\n The practical alternative version of [`check_assumed_disjoint`][deeponto.onto.OntologyReasoner.check_assumed_disjoint]\n with following conditions:\n\n\n !!! note \"Assumed Disjointness (Practical Alternative)\"\n Two class expressions $C$ and $D$ are assumed to be disjoint if they\n\n 1. **do not** have a **subsumption relationship** between them,\n 2. **do not** have a **common descendant** (in TBox),\n 3. **do not** have a **common instance** (in ABox).\n\n If all the conditions have been met, then we assume `class1` and `class2` as disjoint.\n\n Examples:\n Suppose pre-load an ontology `onto` from the disease ontology file `doid.owl`.\n\n ```python\n >>> c1 = onto.get_owl_object_from_iri(\"http://purl.obolibrary.org/obo/DOID_4058\")\n >>> c2 = onto.get_owl_object_from_iri(\"http://purl.obolibrary.org/obo/DOID_0001816\")\n >>> onto.reasoner.check_assumed_disjoint(c1, c2, verbose=True)\n [CHECK1 True] input classes have NO subsumption relationship;\n [CHECK2 False] input classes have NO common descendant;\n Failed the `common descendant check`, skip the `common instance` check.\n [PASSED False] assumed disjointness check done.\n False\n ```\n In this alternative implementation, we do no need to add and remove axioms which will then\n be time-saving.\n \"\"\"\n # banner_message(\"Check Asssumed Disjointness (Alternative)\")\n\n # # Check for entailed disjointness (short-cut)\n # if self.check_disjoint(owl_class1, owl_class2):\n # print(f\"Input classes are already entailed as disjoint.\")\n # return True\n\n # Check for entailed subsumption,\n # common descendants and common instances\n\n has_subsumption = self.check_subsumption(owl_class1, owl_class2)\n has_subsumption = has_subsumption or self.check_subsumption(owl_class2, owl_class1)\n if verbose:\n logger.info(f\"[CHECK1 {not has_subsumption}] input classes have NO subsumption relationship;\")\n if has_subsumption:\n if verbose:\n logger.info(\"Failed the `subsumption check`, skip the `common descendant` check.\")\n logger.info(f\"[PASSED {not has_subsumption}] assumed disjointness check done.\")\n return False\n\n has_common_descendants = self.check_common_descendants(owl_class1, owl_class2)\n if verbose:\n logger.info(f\"[CHECK2 {not has_common_descendants}] input classes have NO common descendant;\")\n if has_common_descendants:\n if verbose:\n logger.info(\"Failed the `common descendant check`, skip the `common instance` check.\")\n logger.info(f\"[PASSED {not has_common_descendants}] assumed disjointness check done.\")\n return False\n\n # TODO: `check_common_instances` is still experimental because we have not tested it with ontologies of rich ABox.\n has_common_instances = self.check_common_instances(owl_class1, owl_class2)\n if verbose:\n logger.info(f\"[CHECK3 {not has_common_instances}] input classes have NO common instance;\")\n logger.info(f\"[PASSED {not has_common_instances}] assumed disjointness check done.\")\n return not has_common_instances\n
"},{"location":"deeponto/onto/taxonomy/","title":"Ontology Taxonomy","text":"Extracting the taxonomy from an ontology often comes in handy for graph-based machine learning techniques. Here we provide a basic Taxonomy
class built upon networkx.DiGraph
where nodes represent entities and edges represent subsumptions. We then provide the OntologyTaxonomy
class that extends the basic Taxonomy
. It utilises the simple structural reasoner to enrich the ontology subsumptions beyond asserted ones, and build the taxonomy over the expanded subsumptions. Each node represents a named class and has a label (rdfs:label
) attribute. The root node owl:Thing
is also specified for functions like counting the node depths, etc. Moreover, we provide the WordnetTaxonomy
class that wraps the WordNet knowledge graph for easier access.
Note
It is also possible to use OntologyProjector
to extract triples from the ontology as edges of the taxonomy. We will consider this feature in the future.
Taxonomy(edges, root_node=None)
","text":"Class for building the taxonomy over structured data.
Attributes:
Name Type Descriptionnodes
list
A list of entity ids.
edges
list
A list of (parent, child)
pairs.
graph
networkx.DiGraph
A directed graph that represents the taxonomy.
root_node
Optional[str]
Optional root node id. Defaults to None
.
src/deeponto/onto/taxonomy.py
def __init__(self, edges: list, root_node: Optional[str] = None):\n self.edges = edges\n self.graph = nx.DiGraph(self.edges)\n self.nodes = list(self.graph.nodes)\n self.root_node = root_node\n
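A toy taxonomy can be built directly from (parent, child) edges, for example (a self-contained sketch):
from deeponto.onto.taxonomy import Taxonomy

edges = [("animal", "mammal"), ("mammal", "dog"), ("mammal", "cat")]
taxo = Taxonomy(edges=edges, root_node="animal")
taxo.get_children("mammal")                        # {"dog", "cat"}
taxo.get_parents("dog", apply_transitivity=True)   # {"mammal", "animal"}
taxo.get_shortest_node_depth("dog")                # 2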
"},{"location":"deeponto/onto/taxonomy/#deeponto.onto.taxonomy.Taxonomy.get_node_attributes","title":"get_node_attributes(entity_id)
","text":"Get the attributes of the given entity.
Source code insrc/deeponto/onto/taxonomy.py
def get_node_attributes(self, entity_id: str):\n\"\"\"Get the attributes of the given entity.\"\"\"\n return self.graph.nodes[entity_id]\n
"},{"location":"deeponto/onto/taxonomy/#deeponto.onto.taxonomy.Taxonomy.get_children","title":"get_children(entity_id, apply_transitivity=False)
","text":"Get the set of children for a given entity.
Source code insrc/deeponto/onto/taxonomy.py
def get_children(self, entity_id: str, apply_transitivity: bool = False):\nr\"\"\"Get the set of children for a given entity.\"\"\"\n if not apply_transitivity:\n return set(self.graph.successors(entity_id))\n else:\n return set(itertools.chain.from_iterable(nx.dfs_successors(self.graph, entity_id).values()))\n
"},{"location":"deeponto/onto/taxonomy/#deeponto.onto.taxonomy.Taxonomy.get_parents","title":"get_parents(entity_id, apply_transitivity=False)
","text":"Get the set of parents for a given entity.
Source code insrc/deeponto/onto/taxonomy.py
def get_parents(self, entity_id: str, apply_transitivity: bool = False):\nr\"\"\"Get the set of parents for a given entity.\"\"\"\n if not apply_transitivity:\n return set(self.graph.predecessors(entity_id))\n else:\n # NOTE: the nx.dfs_predecessors does not give desirable results\n frontier = list(self.get_parents(entity_id))\n explored = set()\n descendants = frontier\n while frontier:\n for candidate in frontier:\n descendants += list(self.get_parents(candidate))\n explored.update(frontier)\n frontier = set(descendants) - explored\n return set(descendants)\n
"},{"location":"deeponto/onto/taxonomy/#deeponto.onto.taxonomy.Taxonomy.get_descendant_graph","title":"get_descendant_graph(entity_id)
","text":"Create a descendant graph (networkx.DiGraph
) for a given entity.
src/deeponto/onto/taxonomy.py
def get_descendant_graph(self, entity_id: str):\nr\"\"\"Create a descendant graph (`networkx.DiGraph`) for a given entity.\"\"\"\n descendants = self.get_children(entity_id, apply_transitivity=True)\n return self.graph.subgraph(list(descendants))\n
"},{"location":"deeponto/onto/taxonomy/#deeponto.onto.taxonomy.Taxonomy.get_shortest_node_depth","title":"get_shortest_node_depth(entity_id)
","text":"Get the shortest depth of the given entity in the taxonomy.
Source code insrc/deeponto/onto/taxonomy.py
def get_shortest_node_depth(self, entity_id: str):\n\"\"\"Get the shortest depth of the given entity in the taxonomy.\"\"\"\n if not self.root_node:\n raise RuntimeError(\"No root node specified.\")\n return nx.shortest_path_length(self.graph, self.root_node, entity_id)\n
"},{"location":"deeponto/onto/taxonomy/#deeponto.onto.taxonomy.Taxonomy.get_longest_node_depth","title":"get_longest_node_depth(entity_id)
","text":"Get the longest depth of the given entity in the taxonomy.
Source code insrc/deeponto/onto/taxonomy.py
def get_longest_node_depth(self, entity_id: str):\n\"\"\"Get the longest depth of the given entity in the taxonomy.\"\"\"\n if not self.root_node:\n raise RuntimeError(\"No root node specified.\")\n return max([len(p) for p in nx.all_simple_paths(self.graph, self.root_node, entity_id)])\n
"},{"location":"deeponto/onto/taxonomy/#deeponto.onto.taxonomy.Taxonomy.get_lowest_common_ancestor","title":"get_lowest_common_ancestor(entity_id1, entity_id2)
","text":"Get the lowest common ancestor of the given two entities.
Source code insrc/deeponto/onto/taxonomy.py
def get_lowest_common_ancestor(self, entity_id1: str, entity_id2: str):\n\"\"\"Get the lowest common ancestor of the given two entities.\"\"\"\n return nx.lowest_common_ancestor(self.graph, entity_id1, entity_id2)\n
"},{"location":"deeponto/onto/taxonomy/#deeponto.onto.taxonomy.OntologyTaxonomy","title":"OntologyTaxonomy(onto, reasoner_type='struct')
","text":" Bases: Taxonomy
Class for building the taxonomy (top-down subsumption graph) from an ontology.
The nodes of this graph are named classes only, but the hierarchy is enriched (beyond asserted axioms) by an ontology reasoner.
Attributes:
Name Type Descriptiononto
Ontology
The input ontology to build the taxonomy.
reasoner_type
str
The type of reasoner used. Defaults to \"struct\"
. Options are [\"hermit\", \"elk\", \"struct\"]
.
reasoner
OntologyReasoner
An ontology reasoner used for completing the hierarchy. If the reasoner_type
is the same as onto.reasoner_type
, then re-use onto.reasoner
; otherwise, create a new one.
root_node
str
The root node that represents owl:Thing
.
nodes
list
A list of named class IRIs.
edges
list
A list of (parent, child)
class pairs. That is, if \\(C \\sqsubseteq D\\), then \\((D, C)\\) will be added as an edge.
graph
networkx.DiGraph
A directed subsumption graph.
Source code insrc/deeponto/onto/taxonomy.py
def __init__(self, onto: Ontology, reasoner_type: str = \"struct\"):\n self.onto = onto\n # the reasoner is used for completing the hierarchy\n self.reasoner_type = reasoner_type\n # re-use onto.reasoner if the reasoner type is the same; otherwise create a new one\n self.reasoner = (\n self.onto.reasoner\n if reasoner_type == self.onto.reasoner_type\n else OntologyReasoner(self.onto, reasoner_type)\n )\n root_node = \"owl:Thing\"\n subsumption_pairs = []\n for cl_iri, cl in self.onto.owl_classes.items():\n # NOTE: this is different from using self.onto.get_asserted_parents which does not conduct simple reasoning\n named_parents = self.reasoner.get_inferred_super_entities(cl, direct=True)\n if not named_parents:\n # if no parents then add root node as the parent\n named_parents.append(root_node)\n for named_parent in named_parents:\n subsumption_pairs.append((named_parent, cl_iri))\n super().__init__(edges=subsumption_pairs, root_node=root_node)\n\n # set node annotations (rdfs:label)\n for class_iri in self.nodes:\n if class_iri == self.root_node:\n self.graph.nodes[class_iri][\"label\"] = \"Thing\"\n else:\n owl_class = self.onto.get_owl_object(class_iri)\n self.graph.nodes[class_iri][\"label\"] = self.onto.get_annotations(owl_class, RDFS_LABEL)\n
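For example (a sketch; the ontology file path and class IRI are placeholders):
from deeponto.onto import Ontology
from deeponto.onto.taxonomy import OntologyTaxonomy

onto = Ontology("doid.owl")  # placeholder path to a local OWL file
onto_taxo = OntologyTaxonomy(onto, reasoner_type="struct")
onto_taxo.get_parents("http://purl.obolibrary.org/obo/DOID_4058")             # enriched named parent IRIs
onto_taxo.get_shortest_node_depth("http://purl.obolibrary.org/obo/DOID_4058") # depth measured from owl:Thing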
"},{"location":"deeponto/onto/taxonomy/#deeponto.onto.taxonomy.OntologyTaxonomy.get_parents","title":"get_parents(class_iri, apply_transitivity=False)
","text":"Get the set of parents for a given class.
It is worth noting that this method with transitivity applied can be deemed as simple structural reasoning. For more advanced logical reasoning, use the DL reasoner self.onto.reasoner
instead.
src/deeponto/onto/taxonomy.py
def get_parents(self, class_iri: str, apply_transitivity: bool = False):\nr\"\"\"Get the set of parents for a given class.\n\n It is worth noting that this method with transitivity applied can be deemed as simple structural reasoning.\n For more advanced logical reasoning, use the DL reasoner `self.onto.reasoner` instead.\n \"\"\"\n return super().get_parents(class_iri, apply_transitivity)\n
"},{"location":"deeponto/onto/taxonomy/#deeponto.onto.taxonomy.OntologyTaxonomy.get_children","title":"get_children(class_iri, apply_transitivity=False)
","text":"Get the set of children for a given class.
It is worth noting that this method with transitivity applied can be deemed as simple structural reasoning. For more advanced logical reasoning, use the DL reasoner self.onto.reasoner
instead.
src/deeponto/onto/taxonomy.py
def get_children(self, class_iri: str, apply_transitivity: bool = False):\nr\"\"\"Get the set of children for a given class.\n\n It is worth noting that this method with transitivity applied can be deemed as simple structural reasoning.\n For more advanced logical reasoning, use the DL reasoner `self.onto.reasoner` instead.\n \"\"\"\n return super().get_children(class_iri, apply_transitivity)\n
"},{"location":"deeponto/onto/taxonomy/#deeponto.onto.taxonomy.OntologyTaxonomy.get_descendant_graph","title":"get_descendant_graph(class_iri)
","text":"Create a descendant graph (networkx.DiGraph
) for a given ontology class.
src/deeponto/onto/taxonomy.py
def get_descendant_graph(self, class_iri: str):\nr\"\"\"Create a descendant graph (`networkx.DiGraph`) for a given ontology class.\"\"\"\n return super().get_descendant_graph(class_iri)\n
"},{"location":"deeponto/onto/taxonomy/#deeponto.onto.taxonomy.OntologyTaxonomy.get_shortest_node_depth","title":"get_shortest_node_depth(class_iri)
","text":"Get the shortest depth of the given named class in the taxonomy.
Source code insrc/deeponto/onto/taxonomy.py
def get_shortest_node_depth(self, class_iri: str):\n\"\"\"Get the shortest depth of the given named class in the taxonomy.\"\"\"\n return nx.shortest_path_length(self.graph, self.root_node, class_iri)\n
"},{"location":"deeponto/onto/taxonomy/#deeponto.onto.taxonomy.OntologyTaxonomy.get_longest_node_depth","title":"get_longest_node_depth(class_iri)
","text":"Get the longest depth of the given named class in the taxonomy.
Source code insrc/deeponto/onto/taxonomy.py
def get_longest_node_depth(self, class_iri: str):\n\"\"\"Get the longest depth of the given named class in the taxonomy.\"\"\"\n return max([len(p) for p in nx.all_simple_paths(self.graph, self.root_node, class_iri)])\n
"},{"location":"deeponto/onto/taxonomy/#deeponto.onto.taxonomy.OntologyTaxonomy.get_lowest_common_ancestor","title":"get_lowest_common_ancestor(class_iri1, class_iri2)
","text":"Get the lowest common ancestor of the given two named classes.
Source code insrc/deeponto/onto/taxonomy.py
def get_lowest_common_ancestor(self, class_iri1: str, class_iri2: str):\n\"\"\"Get the lowest common ancestor of the given two named classes.\"\"\"\n return super().get_lowest_common_ancestor(class_iri1, class_iri2)\n
"},{"location":"deeponto/onto/taxonomy/#deeponto.onto.taxonomy.WordnetTaxonomy","title":"WordnetTaxonomy(pos='n', include_membership=False)
","text":" Bases: Taxonomy
Class for the building the taxonomy (hypernym graph) from wordnet.
Attributes:
Name Type Descriptionpos
str
The pos-tag of entities to be extracted from wordnet.
nodes
list
A list of entity ids extracted from wordnet.
edges
list
A list of hyponym-hypernym pairs.
graph
networkx.DiGraph
A directed hypernym graph.
Parameters:
Name Type Description Defaultpos
str
The pos-tag of entities to be extracted from wordnet.
'n'
include_membership
bool
Whether to include instance_hypernyms
or not (e.g., London is an instance of City). Defaults to False
.
False
Source code in src/deeponto/onto/taxonomy.py
def __init__(self, pos: str = \"n\", include_membership: bool = False):\nr\"\"\"Initialise the wordnet taxonomy.\n\n Args:\n pos (str): The pos-tag of entities to be extracted from wordnet.\n include_membership (bool): Whether to include `instance_hypernyms` or not (e.g., London is an instance of City). Defaults to `False`.\n \"\"\"\n\n self.pos = pos\n synsets = self.fetch_synsets(pos=pos)\n hypernym_pairs = self.fetch_hypernyms(synsets, include_membership)\n super().__init__(edges=hypernym_pairs)\n\n # set node annotations\n for synset in synsets:\n try:\n self.graph.nodes[synset.name()][\"name\"] = synset.name().split(\".\")[0].replace(\"_\", \" \")\n self.graph.nodes[synset.name()][\"definition\"] = synset.definition()\n except:\n continue\n
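For example (a sketch; requires the NLTK WordNet corpus to be available locally, and "dog.n.01" is just an illustrative synset id):
from deeponto.onto.taxonomy import WordnetTaxonomy

wn_taxo = WordnetTaxonomy(pos="n", include_membership=False)
wn_taxo.get_parents("dog.n.01")          # direct hypernym synset ids
wn_taxo.get_node_attributes("dog.n.01")  # e.g., {"name": "dog", "definition": "..."}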
"},{"location":"deeponto/onto/taxonomy/#deeponto.onto.taxonomy.WordnetTaxonomy.fetch_synsets","title":"fetch_synsets(pos='n')
staticmethod
","text":"Get synsets of certain pos-tag from wordnet.
Source code insrc/deeponto/onto/taxonomy.py
@staticmethod\ndef fetch_synsets(pos: str = \"n\"):\n\"\"\"Get synsets of certain pos-tag from wordnet.\"\"\"\n words = wn.words()\n synsets = set()\n for word in words:\n synsets.update(wn.synsets(word, pos=pos))\n logger.info(f'{len(synsets)} synsets (pos=\"{pos}\") fetched.')\n return synsets\n
"},{"location":"deeponto/onto/taxonomy/#deeponto.onto.taxonomy.WordnetTaxonomy.fetch_hypernyms","title":"fetch_hypernyms(synsets, include_membership=False)
staticmethod
","text":"Get hypernym-hyponym pairs from a given set of wordnet synsets.
Source code insrc/deeponto/onto/taxonomy.py
@staticmethod\ndef fetch_hypernyms(synsets: set, include_membership: bool = False):\n\"\"\"Get hypernym-hyponym pairs from a given set of wordnet synsets.\"\"\"\n hypernym_hyponym_pairs = []\n for synset in synsets:\n for h_synset in synset.hypernyms():\n hypernym_hyponym_pairs.append((h_synset.name(), synset.name()))\n if include_membership:\n for h_synset in synset.instance_hypernyms():\n hypernym_hyponym_pairs.append((h_synset.name(), synset.name()))\n logger.info(f\"{len(hypernym_hyponym_pairs)} hypernym-hyponym pairs fetched.\")\n return hypernym_hyponym_pairs\n
"},{"location":"deeponto/onto/taxonomy/#deeponto.onto.taxonomy.TaxonomyNegativeSampler","title":"TaxonomyNegativeSampler(taxonomy, entity_weights=None)
","text":"Class for the efficient negative sampling with buffer over the taxonomy.
Attributes:
Name Type Descriptiontaxonomy
str
The taxonomy for negative sampling.
entity_weights
Optional[dict]
A dictionary with the taxonomy entities as keys and their corresponding weights as values. Defaults to None
.
src/deeponto/onto/taxonomy.py
def __init__(self, taxonomy: Taxonomy, entity_weights: Optional[dict] = None):\n self.taxonomy = taxonomy\n self.entities = self.taxonomy.nodes\n # uniform distribution if weights not provided\n self.entity_weights = entity_weights\n\n self._entity_probs = None\n if self.entity_weights:\n self._entity_probs = np.array([self.entity_weights[e] for e in self.entities])\n self._entity_probs = self._entity_probs / self._entity_probs.sum()\n self._buffer = []\n self._default_buffer_size = 10000\n
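For example (a self-contained sketch; without entity weights the sampling distribution is uniform, and "dog.n.01" is an illustrative entity id):
from deeponto.onto.taxonomy import WordnetTaxonomy, TaxonomyNegativeSampler

wn_taxo = WordnetTaxonomy(pos="n")
sampler = TaxonomyNegativeSampler(wn_taxo)
negatives = sampler.sample("dog.n.01", n_samples=5)  # 5 entities that are not ancestors of "dog.n.01"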
"},{"location":"deeponto/onto/taxonomy/#deeponto.onto.taxonomy.TaxonomyNegativeSampler.fill","title":"fill(buffer_size=None)
","text":"Buffer a large collection of entities sampled with replacement for faster negative sampling.
Source code insrc/deeponto/onto/taxonomy.py
def fill(self, buffer_size: Optional[int] = None):\n\"\"\"Buffer a large collection of entities sampled with replacement for faster negative sampling.\"\"\"\n buffer_size = buffer_size if buffer_size else self._default_buffer_size\n if self._entity_probs is not None:\n self._buffer = np.random.choice(self.entities, size=buffer_size, p=self._entity_probs)\n else:\n self._buffer = np.random.choice(self.entities, size=buffer_size)\n
"},{"location":"deeponto/onto/taxonomy/#deeponto.onto.taxonomy.TaxonomyNegativeSampler.sample","title":"sample(entity_id, n_samples, buffer_size=None)
","text":"Sample N negative samples for a given entity with replacement.
Source code insrc/deeponto/onto/taxonomy.py
def sample(self, entity_id: str, n_samples: int, buffer_size: Optional[int] = None):\n\"\"\"Sample N negative samples for a given entity with replacement.\"\"\"\n negative_samples = []\n positive_samples = self.taxonomy.get_parents(entity_id, True)\n while len(negative_samples) < n_samples:\n if len(self._buffer) < n_samples:\n self.fill(buffer_size)\n negative_samples += list(filter(lambda x: x not in positive_samples, self._buffer[:n_samples]))\n self._buffer = self._buffer[n_samples:] # remove the samples from the buffer\n return negative_samples[:n_samples]\n
"},{"location":"deeponto/onto/verbalisation/","title":"Ontology Verbalisation","text":"Verbalising an ontology into natural language texts is a challenging task. \\(\\textsf{DeepOnto}\\) provides some basic building blocks for achieving this goal. The implemented OntologyVerbaliser
is essentially a recursive concept verbaliser that first splits a complex concept \\(C\\) into a sub-formula tree, then verbalises the leaf nodes (atomic concepts or object properties) by their names, and finally merges the verbalised child nodes according to the logical pattern at their parent node.
Please cite the following paper if you consider using our verbaliser.
Paper
The recursive concept verbaliser is proposed in the paper: Language Model Analysis for Ontology Subsumption Inference (Findings of ACL 2023).
@inproceedings{he-etal-2023-language,\n title = \"Language Model Analysis for Ontology Subsumption Inference\",\n author = \"He, Yuan and\n Chen, Jiaoyan and\n Jimenez-Ruiz, Ernesto and\n Dong, Hang and\n Horrocks, Ian\",\n booktitle = \"Findings of the Association for Computational Linguistics: ACL 2023\",\n month = jul,\n year = \"2023\",\n address = \"Toronto, Canada\",\n publisher = \"Association for Computational Linguistics\",\n url = \"https://aclanthology.org/2023.findings-acl.213\",\n doi = \"10.18653/v1/2023.findings-acl.213\",\n pages = \"3439--3453\"\n}\n
"},{"location":"deeponto/onto/verbalisation/#deeponto.onto.verbalisation.OntologyVerbaliser","title":"OntologyVerbaliser(onto, apply_lowercasing=False, keep_iri=False, apply_auto_correction=False, add_quantifier_word=False)
","text":"A recursive natural language verbaliser for the OWL logical expressions, e.g., OWLAxiom
and OWLClassExpression
.
The concept patterns supported by this verbaliser are shown below:
Pattern Verbalisation (\\(\\mathcal{V}\\)) \\(A\\) (atomic) the name (\\(\\texttt{rdfs:label}\\)) of \\(A\\) (auto-correction is optional) \\(r\\) (property) the name (\\(\\texttt{rdfs:label}\\)) of \\(r\\) (auto-correction is optional) \\(\\neg C\\) \"not \\(\\mathcal{V}(C)\\)\" \\(\\exists r.C\\) \"something that \\(\\mathcal{V}(r)\\) some \\(\\mathcal{V}(C)\\)\" (the quantifier word \"some\" is optional) \\(\\forall r.C\\) \"something that \\(\\mathcal{V}(r)\\) only \\(\\mathcal{V}(C)\\)\" (the quantifier word \"only\" is optional) \\(C_1 \\sqcap ... \\sqcap C_n\\) if \\(C_i = \\exists/\\forall r.D_i\\) and \\(C_j = \\exists/\\forall r.D_j\\), they will be re-written into \\(\\exists/\\forall r.(D_i \\sqcap D_j)\\) before verbalisation; suppose after re-writing the new expression is \\(C_1 \\sqcap ... \\sqcap C_{n'}\\)(a) if all \\(C_i\\)s (for \\(i = 1, ..., n'\\)) are restrictions, in the form of \\(\\exists/\\forall r_i.D_i\\): \"something that \\(\\mathcal{V}(r_1)\\) some/only \\(V(D_1)\\) and ... and \\(\\mathcal{V}(r_{n'})\\) some/only \\(V(D_{n'})\\)\" (b) if some \\(C_i\\)s (for \\(i = m+1, ..., n'\\)) are restrictions, in the form of \\(\\exists/\\forall r_i.D_i\\): \"\\(\\mathcal{V}(C_{1})\\) and ... and \\(\\mathcal{V}(C_{m})\\) that \\(\\mathcal{V}(r_{m+1})\\) some/only \\(V(D_{m+1})\\) and ... and \\(\\mathcal{V}(r_{n'})\\) some/only \\(V(D_{n'})\\)\" (c) if no \\(C_i\\) is a restriction: \"\\(\\mathcal{V}(C_{1})\\) and ... and \\(\\mathcal{V}(C_{n'})\\)\" \\(C_1 \\sqcup ... \\sqcup C_n\\) similar to verbalising \\(C_1 \\sqcap ... \\sqcap C_n\\) except that \"and\" is replaced by \"or\" and case (b) uses the same verbalisation as case (c) \\(r_1 \\cdot r_2\\) (property chain) \\(\\mathcal{V}(r_1)\\) something that \\(\\mathcal{V}(r_2)\\)
With this concept verbaliser, a range of OWL axioms are supported, including class subsumption, class equivalence, class assertion, object property subsumption, object property assertion, and object property domain axioms (see the verbalise_*_axiom methods below).
The verbaliser operates at the concept level, and an additional template is needed to integrate the verbalised components of an axiom.
Warning
This verbaliser utilises spacy for POS tagging used in the auto-correction of property names. Automatic download of the rule-based library en_core_web_sm
is attempted in the init function. However, if it somehow cannot be found, please download it manually using python -m spacy download en_core_web_sm
.
Attributes:
Name Type Descriptiononto
Ontology
An ontology whose entities and axioms are to be verbalised.
parser
OntologySyntaxParser
A syntax parser for the string representation of an OWLObject
.
vocab
dict[str, list[str]]
A dictionary with (entity_iri, entity_name)
pairs, by default the names are retrieved from \\(\\texttt{rdfs:label}\\).
apply_lowercasing
bool
Whether to apply lowercasing to the entity names. Defaults to False
.
keep_iri
bool
Whether to keep the IRIs of entities without verbalising them using self.vocab
. Defaults to False
.
apply_auto_correction
bool
Whether to automatically apply rule-based auto-correction to entity names. Defaults to False
.
add_quantifier_word
bool
Whether to add quantifier words (\"some\"/\"only\") as in the Manchester syntax. Defaults to False
.
Parameters:
Name Type Description Defaultonto
Ontology
An ontology whose entities and axioms are to be verbalised.
requiredapply_lowercasing
bool
Whether to apply lowercasing to the entity names. Defaults to False
.
False
keep_iri
bool
Whether to keep the IRIs of entities without verbalising them using self.vocab
. Defaults to False
.
False
apply_auto_correction
bool
Whether to automatically apply rule-based auto-correction to entity names. Defaults to False
.
False
add_quantifier_word
bool
Whether to add quantifier words (\"some\"/\"only\") as in the Manchester syntax. Defaults to False
.
False
Source code in src/deeponto/onto/verbalisation.py
def __init__(\n self,\n onto: Ontology,\n apply_lowercasing: bool = False,\n keep_iri: bool = False,\n apply_auto_correction: bool = False,\n add_quantifier_word: bool = False,\n):\n\"\"\"Initialise an ontology verbaliser.\n\n Args:\n onto (Ontology): An ontology whose entities and axioms are to be verbalised.\n apply_lowercasing (bool, optional): Whether to apply lowercasing to the entity names. Defaults to `False`.\n keep_iri (bool, optional): Whether to keep the IRIs of entities without verbalising them using `self.vocab`. Defaults to `False`.\n apply_auto_correction (bool, optional): Whether to automatically apply rule-based auto-correction to entity names. Defaults to `False`.\n add_quantifier_word (bool, optional): Whether to add quantifier words (\"some\"/\"only\") as in the Manchester syntax. Defaults to `False`.\n \"\"\"\n self.onto = onto\n self.parser = OntologySyntaxParser()\n\n # download en_core_web_sm for object property\n try:\n spacy.load(\"en_core_web_sm\")\n except:\n print(\"Download `en_core_web_sm` for pos tagger.\")\n os.system(\"python -m spacy download en_core_web_sm\")\n\n self.nlp = spacy.load(\"en_core_web_sm\")\n\n # build the default vocabulary for entities\n self.apply_lowercasing_to_vocab = apply_lowercasing\n self.vocab = dict()\n for entity_type in [\"Classes\", \"ObjectProperties\", \"DataProperties\", \"Individuals\"]:\n entity_annotations, _ = self.onto.build_annotation_index(\n entity_type=entity_type, apply_lowercasing=self.apply_lowercasing_to_vocab\n )\n self.vocab.update(**entity_annotations)\n literal_or_iri = lambda k, v: list(v)[0] if v else k # set vocab to IRI if no string available\n self.vocab = {k: literal_or_iri(k, v) for k, v in self.vocab.items()} # only set one name for each entity\n\n self.keep_iri = keep_iri\n self.apply_auto_correction = apply_auto_correction\n self.add_quantifier_word = add_quantifier_word\n
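For example (a sketch; the ontology file path is a placeholder and the keyword choices below are just one possible configuration):
from deeponto.onto import Ontology
from deeponto.onto.verbalisation import OntologyVerbaliser

onto = Ontology("doid.owl")  # placeholder path to a local OWL file
verbaliser = OntologyVerbaliser(onto, apply_lowercasing=True, add_quantifier_word=True)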
"},{"location":"deeponto/onto/verbalisation/#deeponto.onto.verbalisation.OntologyVerbaliser.update_entity_name","title":"update_entity_name(entity_iri, entity_name)
","text":"Update the name of an entity in self.vocab
.
If you want to change the name of a specific entity, you should call this function before applying verbalisation.
Source code insrc/deeponto/onto/verbalisation.py
def update_entity_name(self, entity_iri: str, entity_name: str):\n\"\"\"Update the name of an entity in `self.vocab`.\n\n If you want to change the name of a specific entity, you should call this\n function before applying verbalisation.\n \"\"\"\n self.vocab[entity_iri] = entity_name\n
"},{"location":"deeponto/onto/verbalisation/#deeponto.onto.verbalisation.OntologyVerbaliser.verbalise_class_expression","title":"verbalise_class_expression(class_expression)
","text":"Verbalise a class expression (OWLClassExpression
) or its parsed form (in RangeNode
).
See currently supported types of class (or concept) expressions here.
Parameters:
Name Type Description Defaultclass_expression
Union[OWLClassExpression, str, RangeNode]
A class expression to be verbalised.
requiredRaises:
Type DescriptionRuntimeError
Occurs when the class expression is not in one of the supported types.
Returns:
Type DescriptionCfgNode
A nested dictionary that presents the recursive results of verbalisation. The verbalised string can be accessed with the key [\"verbal\"]
or with the attribute .verbal
.
src/deeponto/onto/verbalisation.py
def verbalise_class_expression(self, class_expression: Union[OWLClassExpression, str, RangeNode]):\nr\"\"\"Verbalise a class expression (`OWLClassExpression`) or its parsed form (in `RangeNode`).\n\n See currently supported types of class (or concept) expressions [here][deeponto.onto.verbalisation.OntologyVerbaliser].\n\n\n Args:\n class_expression (Union[OWLClassExpression, str, RangeNode]): A class expression to be verbalised.\n\n Raises:\n RuntimeError: Occurs when the class expression is not in one of the supported types.\n\n Returns:\n (CfgNode): A nested dictionary that presents the recursive results of verbalisation. The verbalised string\n can be accessed with the key `[\"verbal\"]` or with the attribute `.verbal`.\n \"\"\"\n\n if not isinstance(class_expression, RangeNode):\n parsed_class_expression = self.parser.parse(class_expression).children[0] # skip the root node\n else:\n parsed_class_expression = class_expression\n\n # for a singleton IRI\n if parsed_class_expression.is_iri:\n return self._verbalise_iri(parsed_class_expression)\n\n if parsed_class_expression.name.startswith(\"NEG\"):\n # negation only has one child\n cl = self.verbalise_class_expression(parsed_class_expression.children[0])\n return CfgNode({\"verbal\": \"not \" + cl.verbal, \"class\": cl, \"type\": \"NEG\"})\n\n # for existential and universal restrictions\n if parsed_class_expression.name.startswith(\"EX.\") or parsed_class_expression.name.startswith(\"ALL\"):\n return self._verbalise_restriction(parsed_class_expression)\n\n # for conjunction and disjunction\n if parsed_class_expression.name.startswith(\"AND\") or parsed_class_expression.name.startswith(\"OR\"):\n return self._verbalise_junction(parsed_class_expression)\n\n # for a property chain\n if parsed_class_expression.name.startswith(\"OPC\"):\n return self._verbalise_property(parsed_class_expression)\n\n raise RuntimeError(f\"Input class expression `{str(class_expression)}` is not in one of the supported types.\")\n
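For example (a sketch continuing from the verbaliser above; the class IRI is a placeholder, and an atomic class is simply verbalised by its name):
c = onto.get_owl_object_from_iri("http://purl.obolibrary.org/obo/DOID_4058")  # placeholder IRI
result = verbaliser.verbalise_class_expression(c)
print(result.verbal)  # the natural-language rendering; the nested structure is kept in `result`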
"},{"location":"deeponto/onto/verbalisation/#deeponto.onto.verbalisation.OntologyVerbaliser.verbalise_class_subsumption_axiom","title":"verbalise_class_subsumption_axiom(class_subsumption_axiom)
","text":"Verbalise a class subsumption axiom.
The subsumption axiom can have two forms:
SubClassOf
axiom;SuperClassOf
axiom.Parameters:
Name Type Description Defaultclass_subsumption_axiom
OWLAxiom
The class subsumption axiom to be verbalised.
requiredReturns:
Type DescriptionTuple[CfgNode, CfgNode]
The verbalised sub-concept \\(\\mathcal{V}(C_{sub})\\) and super-concept \\(\\mathcal{V}(C_{super})\\) (order matters).
Source code insrc/deeponto/onto/verbalisation.py
def verbalise_class_subsumption_axiom(self, class_subsumption_axiom: OWLAxiom):\nr\"\"\"Verbalise a class subsumption axiom.\n\n The subsumption axiom can have two forms:\n\n - $C_{sub} \\sqsubseteq C_{super}$, the `SubClassOf` axiom;\n - $C_{super} \\sqsupseteq C_{sub}$, the `SuperClassOf` axiom.\n\n Args:\n class_subsumption_axiom (OWLAxiom): Then class subsumption axiom to be verbalised.\n\n Returns:\n (Tuple[CfgNode, CfgNode]): The verbalised sub-concept $\\mathcal{V}(C_{sub})$ and super-concept $\\mathcal{V}(C_{super})$ (order matters).\n \"\"\"\n\n # input check\n self._axiom_input_check(class_subsumption_axiom, \"SubClassOf\", \"SuperClassOf\")\n\n parsed_subsumption_axiom = self.parser.parse(class_subsumption_axiom).children[0] # skip the root node\n if str(class_subsumption_axiom).startswith(\"SubClassOf\"):\n parsed_sub_class, parsed_super_class = parsed_subsumption_axiom.children\n elif str(class_subsumption_axiom).startswith(\"SuperClassOf\"):\n parsed_super_class, parsed_sub_class = parsed_subsumption_axiom.children\n\n verbalised_sub_class = self.verbalise_class_expression(parsed_sub_class)\n verbalised_super_class = self.verbalise_class_expression(parsed_super_class)\n return verbalised_sub_class, verbalised_super_class\n
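For example (a sketch continuing from the verbaliser above; how the axiom is retrieved is left open here):
# `sub_axiom` is assumed to be a SubClassOf OWLAxiom already obtained from the ontology (retrieval omitted)
v_sub, v_super = verbaliser.verbalise_class_subsumption_axiom(sub_axiom)
print(v_sub.verbal, "is subsumed by", v_super.verbal)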
"},{"location":"deeponto/onto/verbalisation/#deeponto.onto.verbalisation.OntologyVerbaliser.verbalise_class_equivalence_axiom","title":"verbalise_class_equivalence_axiom(class_equivalence_axiom)
","text":"Verbalise a class equivalence axiom.
The equivalence axiom has the form \\(C \\equiv D\\).
Parameters:
Name Type Description Defaultclass_equivalence_axiom
OWLAxiom
The class equivalence axiom to be verbalised.
requiredReturns:
Type DescriptionTuple[CfgNode, CfgNode]
The verbalised concept \\(\\mathcal{V}(C)\\) and its equivalent concept \\(\\mathcal{V}(D)\\) (order matters).
Source code insrc/deeponto/onto/verbalisation.py
def verbalise_class_equivalence_axiom(self, class_equivalence_axiom: OWLAxiom):\nr\"\"\"Verbalise a class equivalence axiom.\n\n The equivalence axiom has the form $C \\equiv D$.\n\n Args:\n class_equivalence_axiom (OWLAxiom): The class equivalence axiom to be verbalised.\n\n Returns:\n (Tuple[CfgNode, CfgNode]): The verbalised concept $\\mathcal{V}(C)$ and its equivalent concept $\\mathcal{V}(D)$ (order matters).\n \"\"\"\n\n # input check\n self._axiom_input_check(class_equivalence_axiom, \"EquivalentClasses\")\n\n parsed_equivalence_axiom = self.parser.parse(class_equivalence_axiom).children[0] # skip the root node\n parsed_class_left, parsed_class_right = parsed_equivalence_axiom.children\n\n verbalised_left_class = self.verbalise_class_expression(parsed_class_left)\n verbalised_right_class = self.verbalise_class_expression(parsed_class_right)\n return verbalised_left_class, verbalised_right_class\n
"},{"location":"deeponto/onto/verbalisation/#deeponto.onto.verbalisation.OntologyVerbaliser.verbalise_class_assertion_axiom","title":"verbalise_class_assertion_axiom(class_assertion_axiom)
","text":"Verbalise a class assertion axiom.
The class assertion axiom has the form \\(C(x)\\).
Parameters:
Name Type Description Defaultclass_assertion_axiom
OWLAxiom
The class assertion axiom to be verbalised.
requiredReturns:
Type DescriptionTuple[CfgNode, CfgNode]
The verbalised class \\(\\mathcal{V}(C)\\) and individual \\(\\mathcal{V}(x)\\) (order matters).
Source code insrc/deeponto/onto/verbalisation.py
def verbalise_class_assertion_axiom(self, class_assertion_axiom: OWLAxiom):\nr\"\"\"Verbalise a class assertion axiom.\n\n The class assertion axiom has the form $C(x)$.\n\n Args:\n class_assertion_axiom (OWLAxiom): The class assertion axiom to be verbalised.\n\n Returns:\n (Tuple[CfgNode, CfgNode]): The verbalised class $\\mathcal{V}(C)$ and individual $\\mathcal{V}(x)$ (order matters).\n \"\"\"\n\n # input check\n self._axiom_input_check(class_assertion_axiom, \"ClassAssertion\")\n\n parsed_equivalence_axiom = self.parser.parse(class_assertion_axiom).children[0] # skip the root node\n parsed_class, parsed_individual = parsed_equivalence_axiom.children\n\n verbalised_class = self.verbalise_class_expression(parsed_class)\n verbalised_individual = self._verbalise_iri(parsed_individual)\n return verbalised_class, verbalised_individual\n
"},{"location":"deeponto/onto/verbalisation/#deeponto.onto.verbalisation.OntologyVerbaliser.verbalise_object_property_subsumption_axiom","title":"verbalise_object_property_subsumption_axiom(object_property_subsumption_axiom)
","text":"Verbalise an object property subsumption axiom.
The subsumption axiom can have two forms:
SubObjectPropertyOf
axiom;SuperObjectPropertyOf
axiom.Parameters:
Name Type Description Defaultobject_property_subsumption_axiom
OWLAxiom
The object property subsumption axiom to be verbalised.
requiredReturns:
Type DescriptionTuple[CfgNode, CfgNode]
The verbalised sub-property \\(\\mathcal{V}(r_{sub})\\) and super-property \\(\\mathcal{V}(r_{super})\\) (order matters).
Source code insrc/deeponto/onto/verbalisation.py
def verbalise_object_property_subsumption_axiom(self, object_property_subsumption_axiom: OWLAxiom):\nr\"\"\"Verbalise an object property subsumption axiom.\n\n The subsumption axiom can have two forms:\n\n - $r_{sub} \\sqsubseteq r_{super}$, the `SubObjectPropertyOf` axiom;\n - $r_{super} \\sqsupseteq r_{sub}$, the `SuperObjectPropertyOf` axiom.\n\n Args:\n object_property_subsumption_axiom (OWLAxiom): The object property subsumption axiom to be verbalised.\n\n Returns:\n (Tuple[CfgNode, CfgNode]): The verbalised sub-property $\\mathcal{V}(r_{sub})$ and super-property $\\mathcal{V}(r_{super})$ (order matters).\n \"\"\"\n\n # input check\n self._axiom_input_check(\n object_property_subsumption_axiom,\n \"SubObjectPropertyOf\",\n \"SuperObjectPropertyOf\",\n \"SubPropertyChainOf\",\n \"SuperPropertyChainOf\",\n )\n\n parsed_subsumption_axiom = self.parser.parse(object_property_subsumption_axiom).children[\n 0\n ] # skip the root node\n if str(object_property_subsumption_axiom).startswith(\"SubObjectPropertyOf\"):\n parsed_sub_property, parsed_super_property = parsed_subsumption_axiom.children\n elif str(object_property_subsumption_axiom).startswith(\"SuperObjectPropertyOf\"):\n parsed_super_property, parsed_sub_property = parsed_subsumption_axiom.children\n\n verbalised_sub_property = self._verbalise_property(parsed_sub_property)\n verbalised_super_property = self._verbalise_property(parsed_super_property)\n return verbalised_sub_property, verbalised_super_property\n
"},{"location":"deeponto/onto/verbalisation/#deeponto.onto.verbalisation.OntologyVerbaliser.verbalise_object_property_assertion_axiom","title":"verbalise_object_property_assertion_axiom(object_property_assertion_axiom)
","text":"Verbalise an object property assertion axiom.
The object property assertion axiom has the form \\(r(x, y)\\).
Parameters:
Name Type Description Defaultobject_property_assertion_axiom
OWLAxiom
The object property assertion axiom to be verbalised.
requiredReturns:
Type DescriptionTuple[CfgNode, CfgNode]
The verbalised object property \\(\\mathcal{V}(r)\\) and two individuals \\(\\mathcal{V}(x)\\) and \\(\\mathcal{V}(y)\\) (order matters).
Source code insrc/deeponto/onto/verbalisation.py
def verbalise_object_property_assertion_axiom(self, object_property_assertion_axiom: OWLAxiom):\nr\"\"\"Verbalise an object property assertion axiom.\n\n The object property assertion axiom has the form $r(x, y)$.\n\n Args:\n object_property_assertion_axiom (OWLAxiom): The object property assertion axiom to be verbalised.\n\n Returns:\n (Tuple[CfgNode, CfgNode]): The verbalised object property $\\mathcal{V}(r)$ and two individuals $\\mathcal{V}(x)$ and $\\mathcal{V}(y)$ (order matters).\n \"\"\"\n\n # input check\n self._axiom_input_check(object_property_assertion_axiom, \"ObjectPropertyAssertion\")\n\n # skip the root node\n parsed_object_property_assertion_axiom = self.parser.parse(object_property_assertion_axiom).children[0]\n parsed_obj_prop, parsed_indiv_x, parsed_indiv_y = parsed_object_property_assertion_axiom.children\n\n verbalised_object_property = self._verbalise_iri(parsed_obj_prop, is_property=True)\n verbalised_individual_x = self._verbalise_iri(parsed_indiv_x)\n verbalised_individual_y = self._verbalise_iri(parsed_indiv_y)\n return verbalised_object_property, verbalised_individual_x, verbalised_individual_y\n
"},{"location":"deeponto/onto/verbalisation/#deeponto.onto.verbalisation.OntologyVerbaliser.verbalise_object_property_domain_axiom","title":"verbalise_object_property_domain_axiom(object_property_domain_axiom)
","text":"Verbalise an object property domain axiom.
The domain of a property \\(r: X \\rightarrow Y\\) specifies the concept expression \\(X\\) of its subject.
Parameters:
Name Type Description Defaultobject_property_domain_axiom
OWLAxiom
The object property domain axiom to be verbalised.
requiredReturns:
Type DescriptionTuple[CfgNode, CfgNode]
The verbalised object property \\(\\mathcal{V}(r)\\) and its domain \\(\\mathcal{V}(X)\\) (order matters).
Source code insrc/deeponto/onto/verbalisation.py
def verbalise_object_property_domain_axiom(self, object_property_domain_axiom: OWLAxiom):\nr\"\"\"Verbalise an object property domain axiom.\n\n The domain of a property $r: X \\rightarrow Y$ specifies the concept expression $X$ of its subject.\n\n Args:\n object_property_domain_axiom (OWLAxiom): The object property domain axiom to be verbalised.\n\n Returns:\n (Tuple[CfgNode, CfgNode]): The verbalised object property $\\mathcal{V}(r)$ and its domain $\\mathcal{V}(X)$ (order matters).\n \"\"\"\n\n # input check\n self._axiom_input_check(object_property_domain_axiom, \"ObjectPropertyDomain\")\n\n # skip the root node\n parsed_object_property_domain_axiom = self.parser.parse(object_property_domain_axiom).children[0]\n parsed_obj_prop, parsed_obj_prop_domain = parsed_object_property_domain_axiom.children\n\n verbalised_object_property = self._verbalise_iri(parsed_obj_prop, is_property=True)\n verbalised_object_property_domain = self.verbalise_class_expression(parsed_obj_prop_domain)\n\n return verbalised_object_property, verbalised_object_property_domain\n
"},{"location":"deeponto/onto/verbalisation/#deeponto.onto.verbalisation.OntologyVerbaliser.verbalise_object_property_range_axiom","title":"verbalise_object_property_range_axiom(object_property_range_axiom)
","text":"Verbalise an object property range axiom.
The range of a property \\(r: X \\rightarrow Y\\) specifies the concept expression \\(Y\\) of its object.
Parameters:
Name Type Description Defaultobject_property_range_axiom
OWLAxiom
The object property range axiom to be verbalised.
requiredReturns:
Type DescriptionTuple[CfgNode, CfgNode]
The verbalised object property \\(\\mathcal{V}(r)\\) and its range \\(\\mathcal{V}(Y)\\) (order matters).
Source code insrc/deeponto/onto/verbalisation.py
def verbalise_object_property_range_axiom(self, object_property_range_axiom: OWLAxiom):\nr\"\"\"Verbalise an object property range axiom.\n\n The range of a property $r: X \\rightarrow Y$ specifies the concept expression $Y$ of its object.\n\n Args:\n object_property_range_axiom (OWLAxiom): The object property range axiom to be verbalised.\n\n Returns:\n (Tuple[CfgNode, CfgNode]): The verbalised object property $\\mathcal{V}(r)$ and its range $\\mathcal{V}(Y)$ (order matters).\n \"\"\"\n\n # input check\n self._axiom_input_check(object_property_range_axiom, \"ObjectPropertyRange\")\n\n # skip the root node\n parsed_object_property_range_axiom = self.parser.parse(object_property_range_axiom).children[0]\n parsed_obj_prop, parsed_obj_prop_range = parsed_object_property_range_axiom.children\n\n verbalised_object_property = self._verbalise_iri(parsed_obj_prop, is_property=True)\n verbalised_object_property_range = self.verbalise_class_expression(parsed_obj_prop_range)\n\n return verbalised_object_property, verbalised_object_property_range\n
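Similarly, a hedged sketch for the domain and range methods, under the same assumption about verbaliser and with domain_axiom and range_axiom being an ObjectPropertyDomain and an ObjectPropertyRange axiom, respectively:
v_property, v_domain = verbaliser.verbalise_object_property_domain_axiom(domain_axiom)\nv_property, v_range = verbaliser.verbalise_object_property_range_axiom(range_axiom)\n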
"},{"location":"deeponto/onto/verbalisation/#deeponto.onto.verbalisation.OntologySyntaxParser","title":"OntologySyntaxParser()
","text":"A syntax parser for the OWL logical expressions, e.g., OWLAxiom
and OWLClassExpression
.
It makes use of the string representation (based on Manchester Syntax) defined in the OWLAPI. In Python, such a string can be accessed by simply using str(some_owl_object)
.
To keep the Java import in the main Ontology
class, this parser does not deal with OWLAxiom
directly but instead its string representation.
Due to the OWLObject
syntax, this parser relies on two components:
RangeNode
).As a result, it will return a RangeNode
that specifies the sub-formulas (and their respective positions in the string representation) in a tree structure.
Examples:
Suppose the input is an OWLAxiom
that has the string representation:
>>> str(owl_axiom)\n>>> 'EquivalentClasses(<http://purl.obolibrary.org/obo/FOODON_00001707> ObjectIntersectionOf(<http://purl.obolibrary.org/obo/FOODON_00002044> ObjectSomeValuesFrom(<http://purl.obolibrary.org/obo/RO_0001000> <http://purl.obolibrary.org/obo/FOODON_03412116>)) )'\n
This corresponds to the following logical expression:
\\[ CephalopodFoodProduct \\equiv MolluskFoodProduct \\sqcap \\exists derivesFrom.Cephalopod \\]After applying the parser, a RangeNode
will be returned, which can be rendered as:
axiom_parser = OntologySyntaxParser()\nprint(axiom_parser.parse(str(owl_axiom)).render_tree())\n
Output:
Root@[0:inf]\n\u2514\u2500\u2500 EQV@[0:212]\n \u251c\u2500\u2500 FOODON_00001707@[6:54]\n \u2514\u2500\u2500 AND@[55:210]\n \u251c\u2500\u2500 FOODON_00002044@[61:109]\n \u2514\u2500\u2500 EX.@[110:209]\n \u251c\u2500\u2500 RO_0001000@[116:159]\n \u2514\u2500\u2500 FOODON_03412116@[160:208]\n
Or, if graphviz
(installed by e.g., sudo apt install graphviz
) is available, you can visualise the tree as an image by:
axiom_parser.parse(str(owl_axiom)).render_image()\n
Output:
The name for each node has the form {node_type}@[{start}:{end}]
, which means a node of the type {node_type}
is located at the range [{start}:{end}]
in the abbreviated expression (see abbreviate_owl_expression
below).
The leaf nodes are IRIs and they are represented by the last segment (split by \"/\"
) of the whole IRI.
Child nodes can be accessed by .children
, and the string representation of the sub-formula at this node can be accessed by .text
. For example:
parser.parse(str(owl_axiom)).children[0].children[1].text\n
Output:
'[AND](<http://purl.obolibrary.org/obo/FOODON_00002044> [EX.](<http://purl.obolibrary.org/obo/RO_0001000> <http://purl.obolibrary.org/obo/FOODON_03412116>))'\n
Source code in src/deeponto/onto/verbalisation.py
def __init__(self):\n pass\n
"},{"location":"deeponto/onto/verbalisation/#deeponto.onto.verbalisation.OntologySyntaxParser.abbreviate_owl_expression","title":"abbreviate_owl_expression(owl_expression)
","text":"Abbreviate the string representations of logical operators to a fixed length (easier for parsing).
The abbreviations are specified at deeponto.onto.verbalisation.ABBREVIATION_DICT
.
Parameters:
Name Type Description Defaultowl_expression
str
The string representation of an OWLObject
.
Returns:
Type Descriptionstr
The modified string representation of this OWLObject
where the logical operators are abbreviated.
src/deeponto/onto/verbalisation.py
def abbreviate_owl_expression(self, owl_expression: str):\nr\"\"\"Abbreviate the string representations of logical operators to a\n fixed length (easier for parsing).\n\n The abbreviations are specified at `deeponto.onto.verbalisation.ABBREVIATION_DICT`.\n\n Args:\n owl_expression (str): The string representation of an `OWLObject`.\n\n Returns:\n (str): The modified string representation of this `OWLObject` where the logical operators are abbreviated.\n \"\"\"\n for k, v in ABBREVIATION_DICT.items():\n owl_expression = owl_expression.replace(k, v)\n return owl_expression\n
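For instance, based on the abbreviations visible in the parsing example above (e.g., ObjectIntersectionOf becomes [AND]), a call could look like the following sketch, where the IRIs <A> and <B> are placeholders:
parser = OntologySyntaxParser()\nparser.abbreviate_owl_expression(\"ObjectIntersectionOf(<A> <B>)\")  # e.g., '[AND](<A> <B>)'\n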
"},{"location":"deeponto/onto/verbalisation/#deeponto.onto.verbalisation.OntologySyntaxParser.parse","title":"parse(owl_expression)
","text":"Parse an OWLAxiom
into a RangeNode
.
This is the main entry for using the parser, which relies on the parse_by_parentheses
method below.
Parameters:
Name Type Description Defaultowl_expression
Union[str, OWLObject]
The string representation of an OWLObject
or the OWLObject
itself.
Returns:
Type DescriptionRangeNode
A parsed syntactic tree based on the matched parentheses.
Source code insrc/deeponto/onto/verbalisation.py
def parse(self, owl_expression: Union[str, OWLObject]) -> RangeNode:\nr\"\"\"Parse an `OWLAxiom` into a [`RangeNode`][deeponto.onto.verbalisation.RangeNode].\n\n This is the main entry for using the parser, which relies on the [`parse_by_parentheses`][deeponto.onto.verbalisation.OntologySyntaxParser.parse_by_parentheses]\n method below.\n\n Args:\n owl_expression (Union[str, OWLObject]): The string representation of an `OWLObject` or the `OWLObject` itself.\n\n Returns:\n (RangeNode): A parsed syntactic tree given what parentheses to be matched.\n \"\"\"\n if not isinstance(owl_expression, str):\n owl_expression = str(owl_expression)\n owl_expression = self.abbreviate_owl_expression(owl_expression)\n # print(\"To parse the following (transformed) axiom text:\\n\", owl_expression)\n # parse complex patterns first\n cur_parsed = self.parse_by_parentheses(owl_expression)\n # parse the IRI patterns latter\n return self.parse_by_parentheses(owl_expression, cur_parsed, for_iri=True)\n
"},{"location":"deeponto/onto/verbalisation/#deeponto.onto.verbalisation.OntologySyntaxParser.parse_by_parentheses","title":"parse_by_parentheses(owl_expression, already_parsed=None, for_iri=False)
classmethod
","text":"Parse an OWLAxiom
based on parentheses matching into a RangeNode
.
This function needs to be applied twice to get a fully parsed RangeNode
because IRIs have a different parenthesis pattern.
Parameters:
Name Type Description Defaultowl_expression
str
The string representation of an OWLObject
.
already_parsed
RangeNode
A partially parsed RangeNode
to continue with. Defaults to None
.
None
for_iri
bool
Parentheses are by default ()
but will be changed to <>
for IRIs. Defaults to False
.
False
Raises:
Type DescriptionRuntimeError
Raised when the input axiom text is not properly formatted.
Returns:
Type DescriptionRangeNode
A parsed syntactic tree based on the matched parentheses.
Source code insrc/deeponto/onto/verbalisation.py
@classmethod\ndef parse_by_parentheses(\n cls, owl_expression: str, already_parsed: RangeNode = None, for_iri: bool = False\n) -> RangeNode:\nr\"\"\"Parse an `OWLAxiom` based on parentheses matching into a [`RangeNode`][deeponto.onto.verbalisation.RangeNode].\n\n This function needs to be applied twice to get a fully parsed [`RangeNode`][deeponto.onto.verbalisation.RangeNode] because IRIs have\n a different parenthesis pattern.\n\n Args:\n owl_expression (str): The string representation of an `OWLObject`.\n already_parsed (RangeNode, optional): A partially parsed [`RangeNode`][deeponto.onto.verbalisation.RangeNode] to continue with. Defaults to `None`.\n for_iri (bool, optional): Parentheses are by default `()` but will be changed to `<>` for IRIs. Defaults to `False`.\n\n Raises:\n RuntimeError: Raised when the input axiom text is nor properly formatted.\n\n Returns:\n (RangeNode): A parsed syntactic tree given what parentheses to be matched.\n \"\"\"\n if not already_parsed:\n # a root node that covers the entire sentence\n parsed = RangeNode(0, math.inf, name=f\"Root\", text=owl_expression, is_iri=False)\n else:\n parsed = already_parsed\n stack = []\n left_par = \"(\"\n right_par = \")\"\n if for_iri:\n left_par = \"<\"\n right_par = \">\"\n\n for i, c in enumerate(owl_expression):\n if c == left_par:\n stack.append(i)\n if c == right_par:\n try:\n start = stack.pop()\n end = i\n if not for_iri:\n # the first character is actually \"[\"\n real_start = start - 5\n axiom_type = owl_expression[real_start + 1 : start - 1]\n node = RangeNode(\n real_start,\n end + 1,\n name=f\"{axiom_type}\",\n text=owl_expression[real_start : end + 1],\n is_iri=False,\n )\n parsed.insert_child(node)\n else:\n # no preceding characters for just atomic class (IRI)\n abbr_iri = owl_expression[start : end + 1].split(\"/\")[-1].rstrip(\">\")\n node = RangeNode(\n start, end + 1, name=abbr_iri, text=owl_expression[start : end + 1], is_iri=True\n )\n parsed.insert_child(node)\n except IndexError:\n print(\"Too many closing parentheses\")\n\n if stack: # check if stack is empty afterwards\n raise RuntimeError(\"Too many opening parentheses\")\n\n return parsed\n
"},{"location":"deeponto/onto/verbalisation/#deeponto.onto.verbalisation.RangeNode","title":"RangeNode(start, end, name=None, **kwargs)
","text":" Bases: NodeMixin
A tree implementation for ranges (without partial overlap).
[1, 10]
is a parent of [2, 5]
; partially overlapping ranges such as [2, 4]
and [3, 5]
cannot appear in the same RangeNodeTree
.src/deeponto/onto/verbalisation.py
def __init__(self, start, end, name=None, **kwargs):\n if start >= end:\n raise RuntimeError(\"invalid start and end positions ...\")\n self.start = start\n self.end = end\n self.name = \"Root\" if not name else name\n self.name = f\"{self.name}@[{self.start}:{self.end}]\" # add start and ent to the name\n for k, v in kwargs.items():\n setattr(self, k, v)\n super().__init__()\n
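A minimal construction sketch (the node names and ranges here are arbitrary):
import math\nfrom deeponto.onto.verbalisation import RangeNode\n\nroot = RangeNode(0, math.inf, name=\"Root\")\nchild = RangeNode(2, 5, name=\"AND\")  # a sub-range fully contained in the root range\nroot.insert_child(child)\nprint(root.render_tree())  # renders the two-node tree\n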
"},{"location":"deeponto/onto/verbalisation/#deeponto.onto.verbalisation.RangeNode.__gt__","title":"__gt__(other)
","text":"Compare two ranges if they have a different start
and/or a different end
.
\"irrelevant\"
: if range \\(R_1\\) and range \\(R_2\\) have no overlap.Warning
Partial overlap is not allowed.
Source code insrc/deeponto/onto/verbalisation.py
def __gt__(self, other: RangeNode):\nr\"\"\"Compare two ranges if they have a different `start` and/or a different `end`.\n\n - $R_1 \\lt R_2$: if range $R_1$ is completely contained in range $R_2$, and $R_1 \\neq R_2$.\n - $R_1 \\gt R_2$: if range $R_2$ is completely contained in range $R_1$, and $R_1 \\neq R_2$.\n - `\"irrelevant\"`: if range $R_1$ and range $R_2$ have no overlap.\n\n !!! warning\n\n Partial overlap is not allowed.\n \"\"\"\n # ranges inside\n if self.start <= other.start and other.end <= self.end:\n return True\n\n # ranges outside\n if other.start <= self.start and self.end <= other.end:\n return False\n\n if other.end < self.start or self.end < other.start:\n return \"irrelevant\"\n\n raise RuntimeError(\"Compared ranges have a partial overlap.\")\n
"},{"location":"deeponto/onto/verbalisation/#deeponto.onto.verbalisation.RangeNode.sort_by_start","title":"sort_by_start(nodes)
staticmethod
","text":"A sorting function that sorts the nodes by their starting positions.
Source code insrc/deeponto/onto/verbalisation.py
@staticmethod\ndef sort_by_start(nodes: List[RangeNode]):\n\"\"\"A sorting function that sorts the nodes by their starting positions.\"\"\"\n temp = {sib: sib.start for sib in nodes}\n return list(dict(sorted(temp.items(), key=lambda item: item[1])).keys())\n
"},{"location":"deeponto/onto/verbalisation/#deeponto.onto.verbalisation.RangeNode.insert_child","title":"insert_child(node)
","text":"Inserting a child RangeNode
.
Child nodes have a smaller (inclusive) range, e.g., [2, 5]
is a child of [1, 6]
.
src/deeponto/onto/verbalisation.py
def insert_child(self, node: RangeNode):\nr\"\"\"Inserting a child [`RangeNode`][deeponto.onto.verbalisation.RangeNode].\n\n Child nodes have a smaller (inclusive) range, e.g., `[2, 5]` is a child of `[1, 6]`.\n \"\"\"\n if node > self:\n raise RuntimeError(\"invalid child node\")\n if node.start == self.start and node.end == self.end:\n # duplicated node\n return\n # print(self.children)\n if self.children:\n inserted = False\n for ch in self.children:\n if (node < ch) is True:\n # print(\"further down\")\n ch.insert_child(node)\n inserted = True\n break\n elif (node > ch) is True:\n # print(\"insert in between\")\n ch.parent = node\n # NOTE: should not break here as it could be parent of multiple children !\n # break\n # NOTE: the equal case is when two nodes are exactly the same, no operation needed\n if not inserted:\n self.children = list(self.children) + [node]\n self.children = self.sort_by_start(self.children)\n else:\n node.parent = self\n self.children = [node]\n
"},{"location":"deeponto/onto/verbalisation/#deeponto.onto.verbalisation.RangeNode.render_tree","title":"render_tree()
","text":"Render the whole tree.
Source code insrc/deeponto/onto/verbalisation.py
def render_tree(self):\n\"\"\"Render the whole tree.\"\"\"\n return RenderTree(self)\n
"},{"location":"deeponto/onto/verbalisation/#deeponto.onto.verbalisation.RangeNode.render_image","title":"render_image()
","text":"Calling this function will generate a temporary range_node.png
file which will be displayed.
To make this visualisation work, you need to install graphviz
by, e.g.,
sudo apt install graphviz\n
Source code in src/deeponto/onto/verbalisation.py
def render_image(self):\n\"\"\"Calling this function will generate a temporary `range_node.png` file\n which will be displayed.\n\n To make this visualisation work, you need to install `graphviz` by, e.g.,\n\n ```bash\n sudo apt install graphviz\n ```\n \"\"\"\n RenderTreeGraph(self).to_picture(\"range_node.png\")\n return Image(\"range_node.png\")\n
"},{"location":"deeponto/utils/data_utils/","title":"Data Utilities","text":""},{"location":"deeponto/utils/data_utils/#deeponto.utils.data_utils.set_seed","title":"set_seed(seed)
","text":"Set seed function imported from transformers.
Source code insrc/deeponto/utils/data_utils.py
def set_seed(seed):\n\"\"\"Set seed function imported from transformers.\"\"\"\n t_set_seed(seed)\n
"},{"location":"deeponto/utils/data_utils/#deeponto.utils.data_utils.sort_dict_by_values","title":"sort_dict_by_values(dic, desc=True, k=None)
","text":"Return a sorted dict by values with first k reserved if provided.
Source code insrc/deeponto/utils/data_utils.py
def sort_dict_by_values(dic: dict, desc: bool = True, k: Optional[int] = None):\n\"\"\"Return a sorted dict by values with first k reserved if provided.\"\"\"\n sorted_items = list(sorted(dic.items(), key=lambda item: item[1], reverse=desc))\n return dict(sorted_items[:k])\n
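A small usage sketch with its expected results shown as comments:
from deeponto.utils.data_utils import sort_dict_by_values\n\nscores = {\"a\": 0.1, \"b\": 0.9, \"c\": 0.5}\nsort_dict_by_values(scores)       # {\"b\": 0.9, \"c\": 0.5, \"a\": 0.1}\nsort_dict_by_values(scores, k=1)  # {\"b\": 0.9}\n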
"},{"location":"deeponto/utils/data_utils/#deeponto.utils.data_utils.uniqify","title":"uniqify(ls)
","text":"Return a list of unique elements without messing around the order
Source code insrc/deeponto/utils/data_utils.py
def uniqify(ls):\n\"\"\"Return a list of unique elements without messing around the order\"\"\"\n non_empty_ls = list(filter(lambda x: x != \"\", ls))\n return list(dict.fromkeys(non_empty_ls))\n
"},{"location":"deeponto/utils/data_utils/#deeponto.utils.data_utils.print_dict","title":"print_dict(dic)
","text":"Pretty print a dictionary.
Source code insrc/deeponto/utils/data_utils.py
def print_dict(dic: dict):\n\"\"\"Pretty print a dictionary.\"\"\"\n pretty_print = json.dumps(dic, indent=4, separators=(\",\", \": \"))\n # print(pretty_print)\n return pretty_print\n
"},{"location":"deeponto/utils/decorators/","title":"Decorators","text":""},{"location":"deeponto/utils/decorators/#deeponto.utils.decorators.timer","title":"timer(function)
","text":"Print the runtime of the decorated function.
Source code insrc/deeponto/utils/decorators.py
def timer(function):\n\"\"\"Print the runtime of the decorated function.\"\"\"\n\n @wraps(function)\n def wrapper_timer(*args, **kwargs):\n start_time = time.perf_counter() # 1\n value = function(*args, **kwargs)\n end_time = time.perf_counter() # 2\n run_time = end_time - start_time # 3\n print(f\"Finished {function.__name__!r} in {run_time:.4f} secs.\")\n return value\n\n return wrapper_timer\n
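A usage sketch of the decorator (the function name and the printed runtime are illustrative):
from deeponto.utils.decorators import timer\n\n@timer\ndef train():\n    ...  # some long-running computation\n\ntrain()  # prints, e.g., Finished 'train' in 0.0001 secs.\n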
"},{"location":"deeponto/utils/decorators/#deeponto.utils.decorators.debug","title":"debug(function)
","text":"Print the function signature and return value.
Source code insrc/deeponto/utils/decorators.py
def debug(function):\n\"\"\"Print the function signature and return value.\"\"\"\n\n @wraps(function)\n def wrapper_debug(*args, **kwargs):\n args_repr = [repr(a) for a in args]\n kwargs_repr = [f\"{k}={v!r}\" for k, v in kwargs.items()]\n signature = \", \".join(args_repr + kwargs_repr)\n print(f\"Calling {function.__name__}({signature})\")\n value = function(*args, **kwargs)\n print(f\"{function.__name__!r} returned {value!r}.\")\n return value\n\n return wrapper_debug\n
"},{"location":"deeponto/utils/decorators/#deeponto.utils.decorators.paper","title":"paper(title, link)
","text":"Add paper tagger for methods.
Source code insrc/deeponto/utils/decorators.py
def paper(title: str, link: str):\n\"\"\"Add paper tagger for methods.\"\"\"\n # Define a new decorator, named \"decorator\", to return\n def decorator(func):\n # Ensure the decorated function keeps its metadata\n @wraps(func)\n def wrapper(*args, **kwargs):\n # Call the function being decorated and return the result\n return func(*args, **kwargs)\n\n wrapper.paper_title = f'This method is associated with tha paper of title: \"{title}\".'\n wrapper.paper_link = f\"This method is associated with the paper with link: {link}.\"\n return wrapper\n\n # Return the new decorator\n return decorator\n
"},{"location":"deeponto/utils/file_utils/","title":"File Utilities","text":""},{"location":"deeponto/utils/file_utils/#deeponto.utils.file_utils.create_path","title":"create_path(path)
","text":"Create a path recursively.
Source code insrc/deeponto/utils/file_utils.py
def create_path(path: str):\n\"\"\"Create a path recursively.\"\"\"\n Path(path).mkdir(parents=True, exist_ok=True)\n
"},{"location":"deeponto/utils/file_utils/#deeponto.utils.file_utils.save_file","title":"save_file(obj, save_path, sort_keys=False)
","text":"Save an object to a certain format.
Source code insrc/deeponto/utils/file_utils.py
def save_file(obj, save_path: str, sort_keys: bool = False):\n\"\"\"Save an object to a certain format.\"\"\"\n if save_path.endswith(\".json\"):\n with open(save_path, \"w\") as output:\n json.dump(obj, output, indent=4, separators=(\",\", \": \"), sort_keys=sort_keys)\n elif save_path.endswith(\".pkl\"):\n with open(save_path, \"wb\") as output:\n pickle.dump(obj, output, -1)\n elif save_path.endswith(\".yaml\"):\n with open(save_path, \"w\") as output:\n yaml.dump(obj, output, default_flow_style=False, allow_unicode=True)\n else:\n raise RuntimeError(f\"Unsupported saving format: {save_path}\")\n
"},{"location":"deeponto/utils/file_utils/#deeponto.utils.file_utils.load_file","title":"load_file(save_path)
","text":"Load an object of a certain format.
Source code insrc/deeponto/utils/file_utils.py
def load_file(save_path: str):\n\"\"\"Load an object of a certain format.\"\"\"\n if save_path.endswith(\".json\"):\n with open(save_path, \"r\") as input:\n return json.load(input)\n elif save_path.endswith(\".pkl\"):\n with open(save_path, \"rb\") as input:\n return pickle.load(input)\n elif save_path.endswith(\".yaml\"):\n with open(save_path, \"r\") as input:\n return yaml.safe_load(input)\n else:\n raise RuntimeError(f\"Unsupported loading format: {save_path}\")\n
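A round-trip sketch combining save_file and load_file (the file name is arbitrary; the format is inferred from the extension):
from deeponto.utils.file_utils import save_file, load_file\n\nstats = {\"num_mappings\": 100}\nsave_file(stats, \"./stats.json\")\nassert load_file(\"./stats.json\") == stats\n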
"},{"location":"deeponto/utils/file_utils/#deeponto.utils.file_utils.copy2","title":"copy2(source, destination)
","text":"Copy a file from source to destination.
Source code insrc/deeponto/utils/file_utils.py
def copy2(source: str, destination: str):\n\"\"\"Copy a file from source to destination.\"\"\"\n try:\n shutil.copy2(source, destination)\n print(f\"copied successfully FROM {source} TO {destination}\")\n except shutil.SameFileError:\n print(f\"same file exists at {destination}\")\n
"},{"location":"deeponto/utils/file_utils/#deeponto.utils.file_utils.read_table","title":"read_table(table_file_path)
","text":"Read csv
or tsv
file as pandas dataframe without treating \"NULL\"
, \"null\"
, and \"n/a\"
as an empty string.
src/deeponto/utils/file_utils.py
def read_table(table_file_path: str):\nr\"\"\"Read `csv` or `tsv` file as pandas dataframe without treating `\"NULL\"`, `\"null\"`, and `\"n/a\"` as an empty string.\"\"\"\n # TODO: this might change with the version of pandas\n na_vals = pd.io.parsers.readers.STR_NA_VALUES.difference({\"NULL\", \"null\", \"n/a\"})\n sep = \"\\t\" if table_file_path.endswith(\".tsv\") else \",\"\n return pd.read_csv(table_file_path, sep=sep, na_values=na_vals, keep_default_na=False)\n
"},{"location":"deeponto/utils/file_utils/#deeponto.utils.file_utils.read_jsonl","title":"read_jsonl(file_path)
","text":"Read .jsonl
file (list of json) introduced in the BLINK project.
src/deeponto/utils/file_utils.py
def read_jsonl(file_path: str):\n\"\"\"Read `.jsonl` file (list of json) introduced in the BLINK project.\"\"\"\n results = []\n key_set = []\n with open(file_path, \"r\", encoding=\"utf-8-sig\") as f:\n lines = f.readlines()\n for line in lines:\n record = json.loads(line)\n results.append(record)\n key_set += list(record.keys())\n print(f\"all available keys: {set(key_set)}\")\n return results\n
"},{"location":"deeponto/utils/file_utils/#deeponto.utils.file_utils.read_oaei_mappings","title":"read_oaei_mappings(rdf_file)
","text":"To read mapping files in the OAEI rdf format.
Source code insrc/deeponto/utils/file_utils.py
def read_oaei_mappings(rdf_file: str):\n\"\"\"To read mapping files in the OAEI rdf format.\"\"\"\n xml_root = ET.parse(rdf_file).getroot()\n ref_mappings = [] # where relation is \"=\"\n ignored_mappings = [] # where relation is \"?\"\n\n for elem in xml_root.iter():\n # every Cell contains a mapping of en1 -rel(some value)-> en2\n if \"Cell\" in elem.tag:\n en1, en2, rel, measure = None, None, None, None\n for sub_elem in elem:\n if \"entity1\" in sub_elem.tag:\n en1 = list(sub_elem.attrib.values())[0]\n elif \"entity2\" in sub_elem.tag:\n en2 = list(sub_elem.attrib.values())[0]\n elif \"relation\" in sub_elem.tag:\n rel = sub_elem.text\n elif \"measure\" in sub_elem.tag:\n measure = sub_elem.text\n row = (en1, en2, measure)\n # =: equivalent; > superset of; < subset of.\n if rel == \"=\" or rel == \">\" or rel == \"<\":\n # rel.replace(\">\", \">\").replace(\"<\", \"<\")\n ref_mappings.append(row)\n elif rel == \"?\":\n ignored_mappings.append(row)\n else:\n print(\"Unknown Relation Warning: \", rel)\n\n print('#Maps (\"=\"):', len(ref_mappings))\n print('#Maps (\"?\"):', len(ignored_mappings))\n\n return ref_mappings, ignored_mappings\n
"},{"location":"deeponto/utils/file_utils/#deeponto.utils.file_utils.run_jar","title":"run_jar(jar_command, timeout=3600)
","text":"Run jar command using subprocess.
Source code insrc/deeponto/utils/file_utils.py
def run_jar(jar_command: str, timeout=3600):\n\"\"\"Run jar command using subprocess.\"\"\"\n print(f\"Run jar command with timeout: {timeout}s.\")\n proc = subprocess.Popen(jar_command.split(\" \"))\n try:\n _, _ = proc.communicate(timeout=timeout)\n except subprocess.TimeoutExpired:\n warnings.warn(\"kill the jar process as timed out\")\n proc.kill()\n _, _ = proc.communicate()\n
"},{"location":"deeponto/utils/logging/","title":"Logging","text":""},{"location":"deeponto/utils/logging/#deeponto.utils.logging.RuntimeFormatter","title":"RuntimeFormatter(*args, **kwargs)
","text":" Bases: logging.Formatter
Auxiliary class for runtime formatting in the logger.
Source code insrc/deeponto/utils/logging.py
def __init__(self, *args, **kwargs):\n super().__init__(*args, **kwargs)\n self.start_time = time.time()\n
"},{"location":"deeponto/utils/logging/#deeponto.utils.logging.RuntimeFormatter.formatTime","title":"formatTime(record, datefmt=None)
","text":"Record relative runtime in hr:min:sec format\u3002
Source code insrc/deeponto/utils/logging.py
def formatTime(self, record, datefmt=None):\n\"\"\"Record relative runtime in hr:min:sec format.\"\"\"\n duration = datetime.datetime.utcfromtimestamp(record.created - self.start_time)\n elapsed = duration.strftime(\"%H:%M:%S\")\n return \"{}\".format(elapsed)\n
"},{"location":"deeponto/utils/logging/#deeponto.utils.logging.create_logger","title":"create_logger(model_name, saved_path)
","text":"Create logger for both console info and saved info.
The pre-existing log file will be cleared before new messages are written.
Source code insrc/deeponto/utils/logging.py
def create_logger(model_name: str, saved_path: str):\n\"\"\"Create logger for both console info and saved info.\n\n The pre-existed log file will be cleared before writing into new messages.\n \"\"\"\n logger = logging.getLogger(model_name)\n logger.setLevel(logging.DEBUG)\n # create file handler which logs even debug messages\n fh = logging.FileHandler(f\"{saved_path}/{model_name}.log\", mode=\"w\") # \"w\" means clear the log file before writing\n fh.setLevel(logging.DEBUG)\n # create console handler with a higher log level\n ch = logging.StreamHandler()\n ch.setLevel(logging.INFO)\n # create formatter and add it to the handlers\n formatter = RuntimeFormatter(\"[Time: %(asctime)s] - [PID: %(process)d] - [Model: %(name)s] \\n%(message)s\")\n fh.setFormatter(formatter)\n ch.setFormatter(formatter)\n # add the handlers to the logger\n logger.addHandler(fh)\n logger.addHandler(ch)\n logger.propagate = False\n return logger\n
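A minimal sketch (the model name and path are arbitrary; the log file is written to {saved_path}/{model_name}.log):
from deeponto.utils.logging import create_logger\n\nlogger = create_logger(model_name=\"bertmap\", saved_path=\".\")  # writes ./bertmap.log\nlogger.info(\"Training started.\")\n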
"},{"location":"deeponto/utils/logging/#deeponto.utils.logging.banner_message","title":"banner_message(message, sym='^')
","text":"Print a banner message surrounded by special symbols.
Source code insrc/deeponto/utils/logging.py
def banner_message(message: str, sym=\"^\"):\n\"\"\"Print a banner message surrounded by special symbols.\"\"\"\n print()\n message = message.upper()\n banner_len = len(message) + 4\n message = \" \" * ((banner_len - len(message)) // 2) + message\n message = message + \" \" * (banner_len - len(message))\n print(message)\n print(sym * banner_len)\n print()\n
"},{"location":"deeponto/utils/text_utils/","title":"Text Utilities","text":""},{"location":"deeponto/utils/text_utils/#deeponto.utils.text_utils.Tokenizer","title":"Tokenizer(tokenizer_type)
","text":"A Tokenizer class for both sub-word (pre-trained) and word (rule-based) level tokenization.
Source code insrc/deeponto/utils/text_utils.py
def __init__(self, tokenizer_type: str):\n self.type = tokenizer_type\n self._tokenizer = None # hidden tokenizer\n self.tokenize = None # the tokenization method\n
"},{"location":"deeponto/utils/text_utils/#deeponto.utils.text_utils.Tokenizer.from_pretrained","title":"from_pretrained(pretrained_path='bert-base-uncased')
classmethod
","text":"(Based on transformers) Load a sub-word level tokenizer from pre-trained model.
Source code insrc/deeponto/utils/text_utils.py
@classmethod\ndef from_pretrained(cls, pretrained_path: str = \"bert-base-uncased\"):\n\"\"\"(Based on **transformers**) Load a sub-word level tokenizer from pre-trained model.\"\"\"\n instance = cls(\"pre-trained\")\n instance._tokenizer = AutoTokenizer.from_pretrained(pretrained_path)\n instance.tokenize = instance._tokenizer.tokenize\n return instance\n
"},{"location":"deeponto/utils/text_utils/#deeponto.utils.text_utils.Tokenizer.from_rule_based","title":"from_rule_based()
classmethod
","text":"(Based on spacy) Load a word-level (rule-based) tokenizer.
Source code insrc/deeponto/utils/text_utils.py
@classmethod\ndef from_rule_based(cls):\n\"\"\"(Based on **spacy**) Load a word-level (rule-based) tokenizer.\"\"\"\n spacy.prefer_gpu()\n instance = cls(\"rule-based\")\n instance._tokenizer = English()\n instance.tokenize = lambda texts: [word.text for word in instance._tokenizer(texts).doc]\n return instance\n
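A usage sketch of both constructors (the input text is arbitrary):
from deeponto.utils.text_utils import Tokenizer\n\nsub_word_tokenizer = Tokenizer.from_pretrained(\"bert-base-uncased\")  # sub-word level\nsub_word_tokenizer.tokenize(\"ontology matching\")\n\nword_tokenizer = Tokenizer.from_rule_based()  # word level (rule-based)\nword_tokenizer.tokenize(\"ontology matching\")\n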
"},{"location":"deeponto/utils/text_utils/#deeponto.utils.text_utils.InvertedIndex","title":"InvertedIndex(index, tokenizer)
","text":"Inverted index built from a text index.
Attributes:
Name Type Descriptiontokenizer
Tokenizer
A tokenizer instance to be used.
original_index
defaultdict
A dictionary where the values are text strings to be tokenized.
constructed_index
defaultdict
A dictionary that acts as the inverted index of original_index
.
src/deeponto/utils/text_utils.py
def __init__(self, index: defaultdict, tokenizer: Tokenizer):\n self.tokenizer = tokenizer\n self.original_index = index\n self.constructed_index = defaultdict(list)\n for k, v in self.original_index.items():\n # value is a list of strings\n for token in self.tokenizer(v):\n self.constructed_index[token].append(k)\n
"},{"location":"deeponto/utils/text_utils/#deeponto.utils.text_utils.InvertedIndex.idf_select","title":"idf_select(texts, pool_size=200)
","text":"Given a list of tokens, select a set candidates based on the inverted document frequency (idf) scores.
We use idf
instead of tf
because labels have different lengths and thus tf is not a fair measure.
src/deeponto/utils/text_utils.py
def idf_select(self, texts: Union[str, List[str]], pool_size: int = 200):\n\"\"\"Given a list of tokens, select a set candidates based on the inverted document frequency (idf) scores.\n\n We use `idf` instead of `tf` because labels have different lengths and thus tf is not a fair measure.\n \"\"\"\n candidate_pool = defaultdict(lambda: 0)\n # D := number of \"documents\", i.e., number of \"keys\" in the original index\n D = len(self.original_index)\n for token in self.tokenizer(texts):\n # each token is associated with some classes\n potential_candidates = self.constructed_index[token]\n if not potential_candidates:\n continue\n # We use idf instead of tf because the text for each class is of different length, tf is not a fair measure\n # inverse document frequency: with more classes to have the current token tk, the score decreases\n idf = math.log10(D / len(potential_candidates))\n for candidate in potential_candidates:\n # each candidate class is scored by sum(idf)\n candidate_pool[candidate] += idf\n candidate_pool = list(sorted(candidate_pool.items(), key=lambda item: item[1], reverse=True))\n # print(f\"Select {min(len(candidate_pool), pool_size)} candidates.\")\n # select the first K ranked\n return candidate_pool[:pool_size]\n
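A sketch with a hypothetical two-class annotation index (the IRIs and labels are made up for illustration):
from collections import defaultdict\nfrom deeponto.utils.text_utils import InvertedIndex, Tokenizer\n\nannotation_index = defaultdict(list)\nannotation_index[\"http://example.org/ClassA\"] = [\"heart disease\"]\nannotation_index[\"http://example.org/ClassB\"] = [\"lung disease\"]\n\ninverted_index = InvertedIndex(annotation_index, Tokenizer.from_rule_based())\ninverted_index.idf_select(\"disease of the heart\", pool_size=10)  # ranked (class, idf score) pairs\n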
"},{"location":"deeponto/utils/text_utils/#deeponto.utils.text_utils.process_annotation_literal","title":"process_annotation_literal(annotation_literal, apply_lowercasing=False, normalise_identifiers=False)
","text":"Pre-process an annotation literal string.
Parameters:
Name Type Description Defaultannotation_literal
str
A literal string of an entity's annotation.
requiredapply_lowercasing
bool
A boolean that determines lowercasing or not. Defaults to False
.
False
normalise_identifiers
bool
Whether to normalise annotation text that is in the Java identifier format. Defaults to False
.
False
Returns:
Type Descriptionstr
the processed annotation literal string.
Source code insrc/deeponto/utils/text_utils.py
def process_annotation_literal(\n annotation_literal: str, apply_lowercasing: bool = False, normalise_identifiers: bool = False\n):\n\"\"\"Pre-process an annotation literal string.\n\n Args:\n annotation_literal (str): A literal string of an entity's annotation.\n apply_lowercasing (bool): A boolean that determines lowercasing or not. Defaults to `False`.\n normalise_identifiers (bool): Whether to normalise annotation text that is in the Java identifier format. Defaults to `False`.\n\n Returns:\n (str): the processed annotation literal string.\n \"\"\"\n\n # replace the underscores with spaces\n annotation_literal = annotation_literal.replace(\"_\", \" \")\n\n # if the annotation literal is a valid identifier with first letter capitalised\n # we suspect that it could be a Java style identifier that needs to be split\n if normalise_identifiers and annotation_literal[0].isupper() and annotation_literal.isidentifier():\n annotation_literal = split_java_identifier(annotation_literal)\n\n # lowercase the annotation literal if specfied\n if apply_lowercasing:\n annotation_literal = annotation_literal.lower()\n\n return annotation_literal\n
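For example, combining both options on a Java-style identifier:
from deeponto.utils.text_utils import process_annotation_literal\n\nprocess_annotation_literal(\"APIReference\", apply_lowercasing=True, normalise_identifiers=True)\n# 'api reference'\n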
"},{"location":"deeponto/utils/text_utils/#deeponto.utils.text_utils.split_java_identifier","title":"split_java_identifier(java_style_identifier)
","text":"Split words in java's identifier style into natural language phrase.
Examples:
\"SuperNaturalPower\"
\\(\\rightarrow\\) \"Super Natural Power\"
\"APIReference\"
\\(\\rightarrow\\) \"API Reference\"
\"Covid19\"
\\(\\rightarrow\\) \"Covid 19\"
src/deeponto/utils/text_utils.py
def split_java_identifier(java_style_identifier: str):\nr\"\"\"Split words in java's identifier style into natural language phrase.\n\n Examples:\n - `\"SuperNaturalPower\"` $\\rightarrow$ `\"Super Natural Power\"`\n - `\"APIReference\"` $\\rightarrow$ `\"API Reference\"`\n - `\"Covid19\"` $\\rightarrow$ `\"Covid 19\"`\n \"\"\"\n # split at every capital letter or number (numbers are treated as capital letters)\n raw_words = re.findall(\"([0-9A-Z][a-z]*)\", java_style_identifier)\n words = []\n capitalized_word = \"\"\n for i, w in enumerate(raw_words):\n # the above regex pattern will split at capitals\n # so the capitalized words are split into characters\n # i.e., (len(w) == 1)\n if len(w) == 1:\n capitalized_word += w\n # edge case for the last word\n if i == len(raw_words) - 1:\n words.append(capitalized_word)\n\n # if the the current w is a full word, save the previous\n # cached capitalized_word and also save current full word\n elif capitalized_word:\n words.append(capitalized_word)\n words.append(w)\n capitalized_word = \"\"\n\n # just save the current full word otherwise\n else:\n words.append(w)\n\n return \" \".join(words)\n
"}]}
\ No newline at end of file
+{"config":{"lang":["en"],"separator":"[\\s\\-]+","pipeline":["stopWordFilter"]},"docs":[{"location":"","title":"DeepOnto","text":"A package for ontology engineering with deep learning.
News
deeponto.onto.taxonomy
; add the structural reasoner type. (v0.8.8)deeponto.align.oaei
for scripts at the sub-repository OAEI-Bio-ML as well as bug fixing. (v0.8.4)deeponto.onto.OntologyNormaliser
and deeponto.onto.OntologyProjector
(v0.8.0).deeponto.subs.bertsubs
and deeponto.onto.pruning
modules (v0.7.0).deeponto.probe.ontolama
and deeponto.onto.verbalisation
modules (v0.6.0). Check the complete changelog and FAQs. The FAQs page does not contain much information now but will be updated according to feedback.
"},{"location":"#about","title":"About","text":"\\(\\textsf{DeepOnto}\\) aims to provide building blocks for implementing deep learning models, constructing resources, and conducting evaluation for various ontology engineering purposes.
\\(\\textsf{DeepOnto}\\) relies on OWLAPI version 4 (written in Java) for ontologies.
We follow what has been implemented in mOWL that uses JPype to bridge Python and Java Virtual Machine (JVM). Please check JPype's installation page for successful JVM initialisation.
"},{"location":"#pytorch","title":"Pytorch","text":"\\(\\textsf{DeepOnto}\\) relies on Pytorch for deep learning framework.
We recommend installing Pytorch prior to installing DeepOnto following the commands listed on the Pytorch webpage. Notice that users can choose either GPU (with CUDA) or CPU version of Pytorch.
In case the most recent Pytorch version causes any incompatibility issues, use the following command (with CUDA 11.6
) known to work:
pip3 install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu116\n
Basic usage of DeepOnto does not rely on GPUs, but for efficient deep learning model training, please make sure torch.cuda.is_available()
returns True
.
Other dependencies are specified in setup.cfg
and requirements.txt
which are supposed to be installed along with deeponto
.
# requiring Python>=3.8\npip install deeponto\n
"},{"location":"#install-from-git-repository","title":"Install from Git Repository","text":"To install the latest, probably unreleased version of deeponto, you can directly install from the repository.
pip install git+https://github.com/KRR-Oxford/DeepOnto.git\n
"},{"location":"#main-features","title":"Main Features","text":"
Figure: Illustration of DeepOnto's architecture.
"},{"location":"#ontology-processing","title":"Ontology Processing","text":"The base class of \\(\\textsf{DeepOnto}\\) is Ontology
, which serves as the main entry point for introducing the OWLAPI's features, such as accessing ontology entities, querying for ancestor/descendent (and parent/child) concepts, deleting entities, modifying axioms, and retrieving annotations. See quick usage at load an ontology. Along with these basic functionalities, several essential sub-modules are built to enhance the core module, including the following:
Ontology Reasoning (OntologyReasoner
): Each instance of \\(\\textsf{DeepOnto}\\) has a reasoner as its attribute. It is used for conducting reasoning activities, such as obtaining inferred subsumers and subsumees, as well as checking entailment and consistency.
Ontology Pruning (OntologyPruner
): This sub-module aims to incorporate pruning algorithms for extracting a sub-ontology from an input ontology. We currently implement the one proposed in [2], which introduces subsumption axioms between the asserted (atomic or complex) parents and children of the class targeted for removal.
Ontology Verbalisation (OntologyVerbaliser
): The recursive concept verbaliser proposed in [4] is implemented here, which can automatically transform a complex logical expression into a textual sentence based on entity names or labels available in the ontology. See verbalising ontology concepts.
Ontology Projection (OntologyProjector
): The projection algorithm adopted in the OWL2Vec* ontology embeddings is implemented here, which is to transform an ontology's TBox into a set of RDF triples. The relevant code is modified from the mOWL library.
Ontology Normalisation (OntologyNormaliser
): The implemented \\(\\mathcal{EL}\\) normalisation is also modified from the mOWL library, which is used to transform TBox axioms into normalised forms to support, e.g., geometric ontology embeddings.
Ontology Taxonomy (OntologyTaxonomy
): The taxonomy extracted from an ontology is a directed acyclic graph for the subsumption hierarchy, which is often used to support graph-based deep learning applications.
Individual tools and resources are implemented based on the core ontology processing module. Currently, \\(\\textsf{DeepOnto}\\) supports the following:
BERTMap [1] is a BERT-based ontology matching (OM) system originally developed in repo but is now maintained in \\(\\textsf{DeepOnto}\\). See Ontology Matching with BERTMap & BERTMapLt.
Bio-ML [2] is an OM resource that has been used in the Bio-ML track of the OAEI. See Bio-ML: A Comprehensive Documentation.
BERTSubs [3] is a system for ontology subsumption prediction. We have transformed its original experimental code into this project. See Subsumption Inference with BERTSubs.
OntoLAMA [4] is an evaluation of language model probing datasets for ontology subsumption inference. See OntoLAMA: Dataset Overview & Usage Guide for the use of the datasets and the prompt-based probing approach.
License
Copyright 2021-2023 Yuan He. Copyright 2023 Yuan He, Jiaoyan Chen. All rights reserved.
Licensed under the Apache License, Version 2.0 (the \"License\"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
"},{"location":"#citation","title":"Citation","text":"The preprint of our system paper for \\(\\textsf{DeepOnto}\\) is currently available at arxiv.
Yuan He, Jiaoyan Chen, Hang Dong, Ian Horrocks, Carlo Allocca, Taehun Kim, and Brahmananda Sapkota. DeepOnto: A Python Package for Ontology Engineering with Deep Learning. arXiv preprint arXiv:2307.03067 (2023).
@article{he2023deeponto,\n title={DeepOnto: A Python Package for Ontology Engineering with Deep Learning},\n author={He, Yuan and Chen, Jiaoyan and Dong, Hang and Horrocks, Ian and Allocca, Carlo and Kim, Taehun and Sapkota, Brahmananda},\n journal={arXiv preprint arXiv:2307.03067},\n year={2023}\n}\n
"},{"location":"#relevant-publications","title":"Relevant Publications","text":"Please report any bugs or queries by raising a GitHub issue or sending emails to the maintainers (Yuan He or Jiaoyan Chen) through:
first_name.last_name@cs.ox.ac.uk
"},{"location":"bertmap/","title":"Ontology Matching with BERTMap and BERTMapLt","text":"Paper
Paper for BERTMap: BERTMap: A BERT-based Ontology Alignment System (AAAI-2022).
@inproceedings{he2022bertmap,\n title={BERTMap: a BERT-based ontology alignment system},\n author={He, Yuan and Chen, Jiaoyan and Antonyrajah, Denvar and Horrocks, Ian},\n booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},\n volume={36},\n number={5},\n pages={5684--5691},\n year={2022}\n}\n
This page gives the tutorial for \\(\\textsf{BERTMap}\\) family including the summary of the models and how to use them.
Figure 1. Pipeline illustration of BERTMap.
The ontology matching (OM) pipeline of \\(\\textsf{BERTMap}\\) consists of following steps:
(src_annotation, tgt_annotation, synonym_label)
into training and validation sets.Predict mappings for each class \\(c\\) of the source ontology \\(\\mathcal{O}\\) by:
Extend the raw predictions using an iterative algorithm based on the locality principle. To be specific, if \\(c\\) and \\(c'\\) are matched with a relatively high mapping score (\\(\\geq \\kappa\\)), then search for plausible mappings between the parents (resp. children) of \\(c\\) and the parents (resp. children) of \\(c'\\). This process is iterative because there would be new highly scored mappings at each round. Terminate mapping extension when there is no new mapping with score \\(\\geq \\kappa\\) found or it exceeds the maximum number of iterations. Note that \\(\\kappa\\) is set to \\(0.9\\) by default, as in the original paper.
Truncate the extended mappings by preserving only those with scores \\(\\geq \\lambda\\). In the original paper, \\(\\lambda\\) is supposed to be tuned on validation mappings \u2013 which are often not available. Also, \\(\\lambda\\) is not a sensitive hyperparameter in practice. Therefore, we manually set \\(\\lambda\\) to \\(0.9995\\) as a default value which usually yields a higher F1 score. Note that both \\(\\kappa\\) and \\(\\lambda\\) are made available in the configuration file.
Repair the rest of the mappings with the repair module built in LogMap (BERTMap does not focus on mapping repair). In short, a minimum set of inconsistent mappings will be removed (further improve precision).
Steps 5-8 are referred to as the global matching process which computes OM mappings from two input ontologies. \\(\\textsf{BERTMapLt}\\) is the light-weight version without BERT training and mapping refinement. The mapping filtering threshold for \\(\\textsf{BERTMapLt}\\) is \\(1.0\\) (i.e., string-matched).
In addition to the traditional OM procedure, the scoring modules of \\(\\textsf{BERTMap}\\) and \\(\\textsf{BERTMapLt}\\) can be used to evaluate any class pair given their selected annotations. This is useful in ranking-based evaluation.
Warning
The \\(\\textsf{BERTMap}\\) family rely on sufficient class annotations for constructing training corpora of the BERT synonym classifier, especially under the unsupervised setting where there are no input mappings and/or external resources. It is very important to specify correct annotation properties in the configuration file.
"},{"location":"bertmap/#usage","title":"Usage","text":"To use \\(\\textsf{BERTMap}\\), a configuration file and two input ontologies to be matched should be imported.
from deeponto.onto import Ontology\nfrom deeponto.align.bertmap import BERTMapPipeline\n\nconfig_file = \"path_to_config.yaml\"\nsrc_onto_file = \"path_to_the_source_ontology.owl\" \ntgt_onto_file = \"path_to_the_target_ontology.owl\" \n\nconfig = BERTMapPipeline.load_bertmap_config(config_file)\nsrc_onto = Ontology(src_onto_file)\ntgt_onto = Ontology(tgt_onto_file)\n\nBERTMapPipeline(src_onto, tgt_onto, config)\n
The default configuration file can be loaded as:
from deeponto.align.bertmap import BERTMapPipeline, DEFAULT_CONFIG_FILE\n\nconfig = BERTMapPipeline.load_bertmap_config(DEFAULT_CONFIG_FILE)\n
The loaded configuration is a CfgNode
object supporting attribute access of dictionary keys. To customise the configuration, users can either copy the DEFAULT_CONFIG_FILE
, save it locally using BERTMapPipeline.save_bertmap_config
method, and modify it accordingly; or change it in the run time.
from deeponto.align.bertmap import BERTMapPipeline, DEFAULT_CONFIG_FILE\n\nconfig = BERTMapPipeline.load_bertmap_config(DEFAULT_CONFIG_FILE)\n\n# save the configuration file\nBERTMapPipeline.save_bertmap_config(config, \"path_to_saved_config.yaml\")\n\n# modify it in the run time\n# for example, add more annotation properties for synonyms\nconfig.annotation_property_iris.append(\"http://...\") \n
If using \\(\\textsf{BERTMap}\\) for scoring class pairs instead of global matching, disable automatic global matching and load class pairs to be scored.
from deeponto.onto import Ontology\nfrom deeponto.align.bertmap import BERTMapPipeline\n\nconfig_file = \"path_to_config.yaml\"\nsrc_onto_file = \"path_to_the_source_ontology.owl\" \ntgt_onto_file = \"path_to_the_target_ontology.owl\" \n\nconfig = BERTMapPipeline.load_bertmap_config(config_file)\nconfig.global_matching.enabled = False\nsrc_onto = Ontology(src_onto_file)\ntgt_onto = Ontology(tgt_onto_file)\n\nbertmap = BERTMapPipeline(src_onto, tgt_onto, config)\n\nclass_pairs_to_be_scored = [...] # (src_class_iri, tgt_class_iri)\nfor src_class_iri, tgt_class_iri in class_pairs_to_be_scored:\n # retrieve class annotations\n src_class_annotations = bertmap.src_annotation_index[src_class_iri]\n tgt_class_annotations = bertmap.tgt_annotation_index[tgt_class_iri]\n # the bertmap score\n bertmap_score = bertmap.mapping_predictor.bert_mapping_score(\n src_class_annotations, tgt_class_annotations\n )\n # the bertmaplt score\n bertmaplt_score = bertmap.mapping_predictor.edit_similarity_mapping_score(\n src_class_annotations, tgt_class_annotations\n )\n ...\n
Tip
The implemented \\(\\textsf{BERTMap}\\) by default searches for each source ontology class a set of possible matched target ontology classes. Because of this, it is recommended to set the source ontology as the one with a smaller number of classes for efficiency.
Note that in the original paper, the model is expected to match for both directions src2tgt
and tgt2src
, and also consider the combination of both results. However, this does not usually bring better performance and consumes significantly more time. Therefore, this feature is discarded and the users can choose which direction to match.
Warning
Occasionally, the fine-tuning loss may not be converging and the validation accuracy is not improving; in that case, set to a different random seed can usually fix the problem.
"},{"location":"bertmap/#configuration","title":"Configuration","text":"The default configuration file looks like:
model: bertmap # bertmap or bertmaplt\n\noutput_path: null # if not provided, the current path \".\" is used\n\nannotation_property_iris:\n- http://www.w3.org/2000/01/rdf-schema#label # rdfs:label\n- http://www.geneontology.org/formats/oboInOwl#hasSynonym\n- http://www.geneontology.org/formats/oboInOwl#hasExactSynonym\n- http://www.w3.org/2004/02/skos/core#exactMatch\n- http://www.ebi.ac.uk/efo/alternative_term\n- http://www.orpha.net/ORDO/Orphanet_#symbol\n- http://purl.org/sig/ont/fma/synonym\n- http://www.w3.org/2004/02/skos/core#prefLabel\n- http://www.w3.org/2004/02/skos/core#altLabel\n- http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#P108\n- http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#P90\n\n# additional corpora \nknown_mappings: null # cross-ontology corpus\nauxiliary_ontos: [] # auxiliary corpus\n\n# bert config\nbert: pretrained_path: emilyalsentzer/Bio_ClinicalBERT max_length_for_input: 128 num_epochs_for_training: 3.0\nbatch_size_for_training: 32\nbatch_size_for_prediction: 128\nresume_training: null\n\n# global matching config\nglobal_matching:\nenabled: true\nnum_raw_candidates: 200 num_best_predictions: 10 mapping_extension_threshold: 0.9 mapping_filtered_threshold: 0.9995 for_oaei: false\n
"},{"location":"bertmap/#bertmap-or-bertmaplt","title":"BERTMap or BERTMapLt","text":"config.model
By changing this parameter to bertmap
or bertmaplt
, users can switch between \\(\\textsf{BERTMap}\\) and \\(\\textsf{BERTMapLt}\\). Note that \\(\\textsf{BERTMapLt}\\) does not use any training and mapping refinement parameters."},{"location":"bertmap/#annotation-properties","title":"Annotation Properties","text":"config.annotation_property_iris
The IRIs stored in this parameter refer to annotation properties with literal values that define the synonyms of an ontology class. Many ontology matching systems rely on synonyms for good performance, including the \\(\\textsf{BERTMap}\\) family. The default config.annotation_property_iris
are in line with the Bio-ML dataset, which will be constantly updated. Users can append or delete IRIs for specific input ontologies. Note that it is safe to specify all possible annotation properties regardless of input ontologies because the ones that are not used will be ignored.
"},{"location":"bertmap/#additional-training-data","title":"Additional Training Data","text":"The text semantics corpora by default (unsupervised setting) will consist of two intra-ontology sub-corpora built from two input ontologies (based on the specified annotation properties). To add more training data, users can opt to feed input mappings (cross-ontology sub-corpus) and/or a list of auxiliary ontologies (auxiliary sub-corpora).
config.known_mappings
Specify the path to input mapping file here; the input mapping file should be a .tsv
or .csv
file with three columns with headings: [\"SrcEntity\", \"TgtEntity\", \"Score\"]
. Each row corresponds to a triple \\((c, c', s(c, c'))\\) where \\(c\\) is a source ontology class, \\(c'\\) is a target ontology class, and \\(s(c, c')\\) is the matching score. Note that in the BERTMap context, input mapppings are assumed to be gold standard (reference) mappings with scores equal to \\(1.0\\). Regardless of scores specified in the mapping file, the scores of the input mapppings will be adjusted to \\(1.0\\) automatically. config.auxiliary_ontos
Specify a list of paths to auxiliary ontology files here. For each auxiliary ontology, a corresponding intra-ontology corpus will be created and thus produce more synonym and non-synonym samples."},{"location":"bertmap/#bert-settings","title":"BERT Settings","text":"config.bert.pretrained_path
\\(\\textsf{BERTMap}\\) uses the pre-trained Bio-Clincal BERT as specified in this parameter because it was originally applied on biomedical ontologies. For general purpose ontology matching, users can use pre-trained variants such as bert-base-uncased
. config.bert.batch_size_for_training
Batch size for BERT fine-tuning. config.bert.batch_size_for_prediction
Batch size for BERT validation and mapping prediction. Adjust these two parameters if users found an inappropriate GPU memory fit.
config.bert.resume_training
Set to true
if the BERT training process is somehow interrupted and users wish to continue training."},{"location":"bertmap/#global-matching-settings","title":"Global Matching Settings","text":"config.global_matching.enabled
As mentioned in usage, users can disable automatic global matching by setting this parameter to false
if they wish to use the mapping scoring module only. config.global_matching.num_raw_candidates
Set the number of raw candidates selected in the mapping prediction phase. config.global_matching.num_best_predictions
Set the number of best scored mappings preserved in the mapping prediction phase. The default value 10
is often more than enough. config.global_matching.mapping_extension_threshold
Set the score threshold of mappings used in the iterative mapping extension process. A higher value shortens the extension time but may reduce recall. config.global_matching.mapping_filtered_threshold
The score threshold of mappings preserved for final mapping refinement. config.global_matching.for_oaei
Set to false
for normal use and set to true
for the OAEI 2023 Bio-ML Track such that entities that are annotated as not used in alignment will be ignored during global matching."},{"location":"bertmap/#output-format","title":"Output Format","text":"Running \\(\\textsf{BERTMap}\\) will create a directory named bertmap
or bertmaplt
in the specified output path. The file structure of this directory is as follows:
bertmap\n\u251c\u2500\u2500 data\n\u2502 \u251c\u2500\u2500 fine-tune.data.json\n\u2502 \u2514\u2500\u2500 text-semantics.corpora.json\n\u251c\u2500\u2500 bert\n\u2502 \u251c\u2500\u2500 tensorboard\n\u2502 \u251c\u2500\u2500 checkpoint-{some_number}\n\u2502 \u2514\u2500\u2500 checkpoint-{some_number}\n\u251c\u2500\u2500 match\n\u2502 \u251c\u2500\u2500 logmap-repair\n\u2502 \u251c\u2500\u2500 raw_mappings.json\n\u2502 \u251c\u2500\u2500 repaired_mappings.tsv \n\u2502 \u251c\u2500\u2500 raw_mappings.tsv\n\u2502 \u251c\u2500\u2500 extended_mappings.tsv\n\u2502 \u2514\u2500\u2500 filtered_mappings.tsv\n\u251c\u2500\u2500 bertmap.log\n\u2514\u2500\u2500 config.yaml\n
It is worth mentioning that the match
sub-directory contains all the global matching files:
raw_mappings.tsv
The raw mapping predictions before mapping refinement. The .json
one is used internally to guard against losing results upon accidental interruption. Note that bertmaplt
only produces raw mapping predictions (no mapping refinement). extended_mappings.tsv
The output mappings after applying mapping extension. filtered_mappings.tsv
The output mappings after mapping extension and threshold filtering. logmap-repair
A folder containing intermediate files needed for applying LogMap's debugger. repaired_mappings.tsv
The final output mappings after mapping repair."},{"location":"bertsubs/","title":"Subsumption Prediction with BERTSubs","text":"Paper
Paper for BERTSubs: Contextual Semantic Embeddings for Ontology Subsumption Prediction (accepted by WWW Journal in 2023).
@article{chen2023contextual,\n title={Contextual semantic embeddings for ontology subsumption prediction},\n author={Chen, Jiaoyan and He, Yuan and Geng, Yuxia and Jim{\\'e}nez-Ruiz, Ernesto and Dong, Hang and Horrocks, Ian},\n journal={World Wide Web},\n pages={1--23},\n year={2023},\n publisher={Springer}\n}\n
This page gives a tutorial for \(\textsf{BERTSubs}\), including its functions, a summary of the models, and usage instructions.
The current version of \\(\\textsf{BERTSubs}\\) is able to predict:
Figure 1. Pipeline illustration of BERTSubs.
The pipeline of \(\textsf{BERTSubs}\) consists of the following steps.
Corpus Construction: extracting a set of sentence pairs from positive and negative subsumptions from the target ontology (or ontologies), with one of the following three templates used for transforming each class into a sentence,
Model Fine-tuning: fine-tuning a language model such as BERT with the above sentence pairs.
Note that subsumptions optionally provided via a train subsumption file can also be used for fine-tuning. Please see the paper for more technical details.
"},{"location":"bertsubs/#evaluation-case-and-dataset-ontology-completion","title":"Evaluation Case and Dataset (Ontology Completion)","text":"The evaluation is implemented scripts/bertsubs_intra_evaluate.py. Download an ontology (e.g., FoodOn) and run:
python bertsubs_intra_evaluate.py --onto_file ./foodon-merged.0.4.8.owl\n
The parameter --subsumption_type can be set to 'restriction' for complex class subsumptions, and 'named_class' for named class subsumptions. Please see the script for more parameters and their meanings.
It executes the following procedure:
The named class or complex class subsumption axioms of an ontology are partitioned into a train set, a valid set and a test set. They are saved as train, valid and test files, respectively.
The test and the valid subsumption axioms are removed from the original ontology, and a new ontology is saved.
Notice: for a named class test/valid subsumption, a set of negative candidate super classes are extracted from the ground truth super class's neighbourhood. For a complex class test/valid subsumption, a set of negative candidate super classes are randomly extracted from all the complex classes in the ontology.
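For example, to build the complex class (restriction) subsumption data instead of the default named class data, the script could be run as follows (a sketch using only the parameter mentioned above):
python bertsubs_intra_evaluate.py --onto_file ./foodon-merged.0.4.8.owl --subsumption_type restriction\n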
"},{"location":"bertsubs/#usage","title":"Usage","text":"To run \\(\\textsf{BERTSubs}\\), a configuration file and one input ontology (or two ontologies) are mandatory. If candidate class pairs are given, a fine-tuned language model and a file with predicted scores of the candidate class pairs in the test file are output; otherwise, only the fine-grained language model is output. The test metrics (MRR and Hits@K) can also be output if the ground truth and a set of negative candidate super classes are given for the subclass of each valid/test subsumption.
The following code is for intra-ontology subsumption.
from yacs.config import CfgNode\nfrom deeponto.complete.bertsubs import BERTSubsIntraPipeline, DEFAULT_CONFIG_FILE_INTRA\nfrom deeponto.utils import load_file\nfrom deeponto.onto import Ontology\n\nconfig = CfgNode(load_file(DEFAULT_CONFIG_FILE_INTRA)) # Load default configuration file\nconfig.onto_file = './foodon.owl'\nconfig.train_subsumption_file = './train_subsumptions.csv' # optional\nconfig.valid_subsumption_file = './valid_subsumptions.csv' # optional\nconfig.test_subsumption_file = './test_subsumptions.csv' #optional\nconfig.test_type = 'evaluation' #'evaluation': calculate metrics with ground truths given in the test_subsumption_file; 'prediction': predict scores for candidate subsumptions given in test_submission_file\nconfig.subsumption_type = 'named_class' # 'named_class' or 'restriction' \nconfig.prompt.prompt_type = 'isolated' # 'isolated', 'traversal', 'path' (three templates)\n\nonto = Ontology(owl_path=config.onto_file)\nintra_pipeline = BERTSubsIntraPipeline(onto=onto, config=config)\n
The following code is for inter-ontology subsumption.
from yacs.config import CfgNode\nfrom deeponto.complete.bertsubs import BERTSubsInterPipeline, DEFAULT_CONFIG_FILE_INTER\nfrom deeponto.utils import load_file\nfrom deeponto.onto import Ontology\n\nconfig = CfgNode(load_file(DEFAULT_CONFIG_FILE_INTER)) # Load default configuration file\nconfig.src_onto_file = './helis2foodon/helis_v1.00.owl'\nconfig.tgt_onto_file = './helis2foodon/foodon-merged.0.4.8.subs.owl'\nconfig.train_subsumption_file = './helis2foodon/train_subsumptions.csv' # optional\nconfig.valid_subsumption_file = './helis2foodon/valid_subsumptions.csv' # optional\nconfig.test_subsumption_file = './helis2foodon/test_subsumptions.csv' # optional\nconfig.test_type = 'evaluation' # 'evaluation', or 'prediction'\nconfig.subsumption_type = 'named_class' # 'named_class', or 'restriction'\nconfig.prompt.prompt_type = 'path' # 'isolated', 'traversal', 'path' (three templates)\n\nsrc_onto = Ontology(owl_path=config.src_onto_file)\ntgt_onto = Ontology(owl_path=config.tgt_onto_file)\ninter_pipeline = BERTSubsInterPipeline(src_onto=src_onto, tgt_onto=tgt_onto, config=config)\n
For more details on the configuration, please see the comments in the default configuration files default_config_intra.yaml and default_config_inter.yaml.
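To inspect all available options before editing them, one can simply load and print the default configuration (a minimal sketch using the same imports as above):
from yacs.config import CfgNode\nfrom deeponto.complete.bertsubs import DEFAULT_CONFIG_FILE_INTRA\nfrom deeponto.utils import load_file\n\n# load and print the default intra-ontology configuration to see all available options\nconfig = CfgNode(load_file(DEFAULT_CONFIG_FILE_INTRA))\nprint(config)\n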
"},{"location":"bio-ml/","title":"Bio-ML: A Comprehensive Documentation","text":"paper
Paper for Bio-ML: Machine Learning-Friendly Biomedical Datasets for Equivalence and Subsumption Ontology Matching (ISWC 2022). It was nominated as the best resource paper candidate at ISWC 2022.
@inproceedings{he2022machine,\n title={Machine Learning-Friendly Biomedical Datasets for Equivalence and Subsumption Ontology Matching},\n author={He, Yuan and Chen, Jiaoyan and Dong, Hang and Jim{\\'e}nez-Ruiz, Ernesto and Hadian, Ali and Horrocks, Ian},\n booktitle={The Semantic Web--ISWC 2022: 21st International Semantic Web Conference, Virtual Event, October 23--27, 2022, Proceedings},\n pages={575--591},\n year={2022},\n organization={Springer}\n}\n
"},{"location":"bio-ml/#overview","title":"Overview","text":"\\(\\textsf{Bio-ML}\\) is a comprehensive ontology matching (OM) dataset that includes five ontology pairs for both equivalence and subsumption ontology matching. Two of these pairs are based on the Mondo ontology, and the remaining three are based on the UMLS ontology. The construction of these datasets encompasses several steps:
Dataset Download (License: CC BY 4.0 International):
Complete Documentation: https://krr-oxford.github.io/DeepOnto/bio-ml/ (this page).
In order to derive scalable Ontology Matching (OM) pairs, the ontology pruning algorithm proposed in the \(\textsf{Bio-ML}\) paper can be utilised. This algorithm is designed to trim a large-scale ontology based on certain criteria, such as involvement in a reference mapping or association with a particular semantic type (see UMLS data scripts). The primary goal of the pruning function is to discard irrelevant ontology classes whilst preserving the relevant hierarchical structure.
More specifically, for each class, denoted as \\(c\\), that needs to be removed, subsumption axioms are created between the parent and child elements of \\(c\\). This step is followed by the removal of all axioms related to the unwanted classes.
Once a list of class IRIs to be removed has been compiled, the ontology pruning can be executed using the following code:
from deeponto.onto import Ontology, OntologyPruner\n\n# Load the DOID ontology\ndoid = Ontology(\"doid.owl\")\n\n# Initialise the ontology pruner\npruner = OntologyPruner(doid)\n\n# Specify the classes to be removed\nto_be_removed_class_iris = [\n \"http://purl.obolibrary.org/obo/DOID_0060158\",\n \"http://purl.obolibrary.org/obo/DOID_9969\"\n]\n\n# Perform the pruning operation\npruner.prune(to_be_removed_class_iris)\n\n# Save the pruned ontology locally\npruner.save_onto(\"doid.pruned.owl\") \n
"},{"location":"bio-ml/#subsumption-mapping-construction","title":"Subsumption Mapping Construction","text":"Ontology Matching (OM) datasets often include equivalence matching, but not subsumption matching. However, it is feasible to create a subsumption matching task from an equivalence matching task. Given a list of reference equivalence mappings, which take the form of \\({(c, c') | c \\equiv c' }\\), one can construct reference subsumption mappings by identifying the subsumers of \\(c'\\) and producing \\({(c, c'') | c \\equiv c', c' \\sqsubseteq c'' }\\). We have developed a subsumption mapping generator for this purpose.
from deeponto.onto import Ontology\nfrom deeponto.align.mapping import SubsFromEquivMappingGenerator, ReferenceMapping\n\n# Load the NCIT and DOID ontologies\nncit = Ontology(\"ncit.owl\")\ndoid = Ontology(\"doid.owl\")\n\n# Load the equivalence mappings\nncit2doid_equiv_mappings = ReferenceMapping.read_table_mappings(\"ncit2doid_equiv_mappings.tsv\") # The headings are [\"SrcEntity\", \"TgtEntity\", \"Score\"]\n\n# Initialise the subsumption mapping generator \n# and the mapping construction is automatically done\nsubs_generator = SubsFromEquivMappingGenerator(\n ncit, doid, ncit2doid_equiv_mappings, \n subs_generation_ratio=1, delete_used_equiv_tgt_class=True\n)\n
Output:
3299/4686 are used for creating at least one subsumption mapping.\n3305 subsumption mappings are created in the end.\n
Retrieve the generated subsumption mappings with:
subs_generator.subs_from_equivs\n
Output:
[('http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#C9311',\n 'http://purl.obolibrary.org/obo/DOID_120',\n 1.0),\n ('http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#C8410',\n 'http://purl.obolibrary.org/obo/DOID_1612',\n 1.0), ...]\n
See a concrete data script for this process at OAEI-Bio-ML/data_scripts/generate_subs_maps.py
.
The subs_generation_ratio
parameter determines at most how many subsumption mappings can be generated from an equivalence mapping. The delete_used_equiv_tgt_class
determines whether to invalidate equivalence mappings that have been used for creating at least one subsumption mapping. If it is set to True
, then the target class of a used equivalence mapping will be marked as deleted from the target ontology. Then, apply ontology pruning to the list of to-be-deleted target ontology classes:
from deeponto.onto import OntologyPruner\n\npruner = OntologyPruner(doid)\npruner.prune(subs_generator.used_equiv_tgt_class_iris)\npruner.save_onto(\"doid.subs.owl\")\n
See a concrete data script for this process at OAEI-Bio-ML/data_scripts/generate_cand_maps.py
.
Note
In the OAEI 2023 version, the target class deletion is disabled as modularisation counteracts the effects of such deletion. For more details, refer to OAEI Bio-ML 2023.
"},{"location":"bio-ml/#candidate-mapping-generation","title":"Candidate Mapping Generation","text":"To evaluate an Ontology Matching (OM) model's capacity to identify correct mappings amid a pool of challenging negative candidates, we utilise the negative candidate mapping generation algorithm as proposed in the Bio-ML paper. This algorithm uses idf_sample
to generate candidates that are textually ambiguous (i.e., with similar naming), and neighbour_sample
to generate candidates that are structurally ambiguous (e.g., siblings). The algorithm ensures that none of the reference mappings are added as negative candidates. Additionally, for subsumption cases, the algorithm carefully excludes ancestors as they are technically correct subsumptions.
Use the following Python code to perform this operation:
from deeponto.onto import Ontology\nfrom deeponto.align.mapping import NegativeCandidateMappingGenerator, ReferenceMapping\nfrom deeponto.align.bertmap import BERTMapPipeline\nfrom deeponto.utils import Tokenizer  # sub-word tokenizer used for the idf sample\nimport pandas as pd\n\n# Load the NCIT and DOID ontologies\nncit = Ontology(\"ncit.owl\")\ndoid = Ontology(\"doid.owl\")\n\n# Load the equivalence mappings\nncit2doid_equiv_mappings = ReferenceMapping.read_table_mappings(\"ncit2doid_equiv_mappings.tsv\") # The headings are [\"SrcEntity\", \"TgtEntity\", \"Score\"]\n\n# Load default config in BERTMap\nconfig = BERTMapPipeline.load_bertmap_config()\n\n# Initialise the candidate mapping generator\ncand_generator = NegativeCandidateMappingGenerator(\n    ncit, doid, ncit2doid_equiv_mappings, \n    annotation_property_iris=config.annotation_property_iris, # Used for idf sample\n    tokenizer=Tokenizer.from_pretrained(config.bert.pretrained_path), # Used for idf sample\n    max_hops=5, # Used for neighbour sample\n    for_subsumptions=False, # Set to False because the input mappings in this example are equivalence mappings\n)\n\n# Sample negative candidate mappings for each reference equivalence mapping\nresults = []\nfor test_map in ncit2doid_equiv_mappings:\n    valid_tgts, stats = cand_generator.mixed_sample(test_map, idf=50, neighbour=50)\n    print(f\"STATS for {test_map}:\\n{stats}\")\n    results.append((test_map.head, test_map.tail, valid_tgts))\n\n# Save the sampled candidate mappings to a .tsv file (output path is a placeholder)\nresult_path = \"ncit2doid.test.cands.tsv\"\nresults = pd.DataFrame(results, columns=[\"SrcEntity\", \"TgtEntity\", \"TgtCandidates\"])\nresults.to_csv(result_path, sep=\"\\t\", index=False)\n
See a concrete data script for this process at OAEI-Bio-ML/data_scripts/generate_cand_maps.py
.
The process of sampling using idf scores was originally proposed in the BERTMap paper. The annotation_property_iris
parameter specifies the list of annotation properties used to extract the names or aliases of an ontology class. The tokenizer
parameter refers to a pre-trained sub-word level tokenizer used to build the inverted annotation index. These aspects are thoroughly explained in the BERTMap tutorial.
Our evaluation protocol concerns two scenarios for OM: global matching for overall assessment and local ranking for partial assessment.
"},{"location":"bio-ml/#global-matching","title":"Global Matching","text":"As an overall assessment, given a complete set of reference mappings, an OM system is expected to compute a set of true mappings and compare against the reference mappings using Precision, Recall, and F-score metrics. With \\(\\textsf{DeepOnto}\\), the evaluation can be performed using the following code.
Matching Result
Download an example of matching result file. The three columns, \"SrcEntity\"
, \"TgtEntity\"
, and \"Score\"
refer to the source class IRI, the target class IRI, and the matching score.
from deeponto.align.evaluation import AlignmentEvaluator\nfrom deeponto.align.mapping import ReferenceMapping, EntityMapping\n\n# load prediction mappings and reference mappings\npreds = EntityMapping.read_table_mappings(f\"{experiment_dir}/bertmap/match/repaired_mappings.tsv\")\nrefs = ReferenceMapping.read_table_mappings(f\"{data_dir}/refs_equiv/full.tsv\")\n\n# compute the precision, recall and F-score metrics\nresults = AlignmentEvaluator.f1(preds, refs)\nprint(results)\n
The associated formulas for Precision, Recall and F-score are:
\\[P = \\frac{|\\mathcal{M}_{pred} \\cap \\mathcal{M}_{ref}|}{|\\mathcal{M}_{pred}|}, \\ \\ R = \\frac{|\\mathcal{M}_{pred} \\cap \\mathcal{M}_{ref}|}{|\\mathcal{M}_{ref}|}, \\ \\ F_1 = \\frac{2 P R}{P + R}\\]where \\(\\mathcal{M}_{pred}\\) and \\(\\mathcal{M}_{ref}\\) denote the prediction mappings and reference mappings, respectively.
Output:
{'P': 0.887, 'R': 0.879, 'F1': 0.883}\n
For the semi-supervised setting where a small set of training mappings is provided, the training set should also be loaded and set as null (neither positive nor negative) with null_reference_mappings
during evaluation:
train_refs = ReferenceMapping.read_table_mappings(f\"{data_dir}/refs_equiv/train.tsv\")\nresults = AlignmentEvaluator.f1(preds, refs, null_reference_mappings=train_refs)\n
When null reference mappings are involved, the formulas of Precision and Recall become:
\\[P = \\frac{|(\\mathcal{M}_{pred} \\cap \\mathcal{M}_{ref}) - \\mathcal{M}_{null}|}{|\\mathcal{M}_{pred} - \\mathcal{M}_{null} |}, \\ \\ R = \\frac{|(\\mathcal{M}_{pred} \\cap \\mathcal{M}_{ref}) - \\mathcal{M}_{null}|}{|\\mathcal{M}_{ref} - \\mathcal{M}_{null}|}\\]As for the OAEI 2023 version, some prediction mappings could involve classes that are marked as not used in alignment. Therefore, we need to filter out those mappings before evaluation.
from deeponto.onto import Ontology\nfrom deeponto.align.oaei import *\n\n# load the source and target ontologies and \n# extract classes that are marked as not used in alignment\nsrc_onto = Ontology(\"src_onto_file\")\ntgt_onto = Ontology(\"tgt_onto_file\")\nignored_class_index = get_ignored_class_index(src_onto)\nignored_class_index.update(get_ignored_class_index(tgt_onto))\n\n# filter the prediction mappings\npreds = remove_ignored_mappings(preds, ignored_class_index)\n\n# then compute the results\nresults = AlignmentEvaluator.f1(preds, refs, ...)\n
Tip
We have encapsulated above features in the matching_eval
function in the OAEI utilities.
However,
Therefore, the ranking-based evaluation protocol is presented as follows.
"},{"location":"bio-ml/#local-ranking","title":"Local Ranking","text":"An OM system is also expected to distinguish the reference mapping among a set of candidate mappings and the performance can be reflected in Hits@K and MRR metrics.
Warning
The reference subsumption mappings are inherently incomplete, so only the ranking metrics are adopted in evaluating system performance in subsumption matching.
Ranking Result
Download an example of raw (unscored) candidate mapping file and an example of scored candidate mapping file. The \"SrcEntity\"
and \"TgtEntity\"
columns refer to the source class IRI and the target class IRI involved in a reference mapping. The \"TgtCandidates\"
column stores a sequence of tgt_cand_iri
in the unscored file and a list of tuples (tgt_cand_iri, score)
in the scored file, which can be accessed by the built-in Python function eval
.
With \\(\\textsf{DeepOnto}\\), the evaluation can be performed as follows. First, an OM system needs to assign a score to each target candidate class and save the results as a list of tuples (tgt_cand_class_iri, matching_score)
.
from deeponto.utils import read_table\nimport pandas as pd\n\ntest_candidate_mappings = read_table(\"test.cands.tsv\").values.tolist()\nranking_results = []\nfor src_ref_class, tgt_ref_class, tgt_cands in test_candidate_mappings:\n    tgt_cands = eval(tgt_cands) # transform string into list or sequence\n    scored_cands = []\n    for tgt_cand in tgt_cands:\n        # assign a score to each candidate with an OM system\n        ...\n        scored_cands.append((tgt_cand, matching_score))\n    ranking_results.append((src_ref_class, tgt_ref_class, scored_cands))\n# save the scored candidate mappings in the same format as the original `test.cands.tsv`\npd.DataFrame(ranking_results, columns=[\"SrcEntity\", \"TgtEntity\", \"TgtCandidates\"]).to_csv(\"scored.test.cands.tsv\", sep=\"\\t\", index=False)\n
Then, the ranking evaluation results can be obtained by:
from deeponto.align.oaei import *\n\n# If `has_score` is False, assume default ranking (see tips below)\nranking_eval(\"scored.test.cands.tsv\", has_score=True, Ks=[1, 5, 10])\n
Output:
{'MRR': 0.9586373098280843,\n 'Hits@1': 0.9371951219512196,\n 'Hits@5': 0.9820121951219513,\n 'Hits@10': 0.9878048780487805}\n
The associated formulas for MRR and Hits@K are:
\\[MRR = \\sum_i^N rank_i^{-1} / N, \\ \\ Hits@K = \\sum_i^N \\mathbb{I}_{rank_i \\leq k} / N\\]where \\(N\\) is the number of reference mappings used for testing, \\(rank_i\\) is the relative rank of the reference mapping among its candidate mappings.
Tip
If matching scores are not available, the target candidate classes should be sorted in descending order and saved in a list, the ranking_eval
function will compute scores according to the sorted list.
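For instance, if the candidate file stores only sorted (best-first) target candidates without scores, the same function can be called as follows (the file name is a placeholder):
from deeponto.align.oaei import *\n\n# candidates are assumed to be already sorted in descending order of preference\nranking_eval(\"sorted.test.cands.tsv\", has_score=False, Ks=[1, 5, 10])\n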
Below is a table showing the data statistics for the original Bio-ML used in OAEI 2022. In the Category column, \"Disease\" indicates that the data from Mondo mainly covers disease concepts, while \"Body\", \"Pharm\", and \"Neoplas\" denote semantic types of \"Body Part, Organ, or Organ Components\", \"Pharmacologic Substance\", and \"Neoplastic Process\" in UMLS, respectively.
Note that each subsumption matching task is constructed from an equivalence matching task subject to target ontology class deletion, therefore #TgtCls (subs)
differs from #TgtCls
.
| Source | Task | Category | #SrcCls | #TgtCls | #TgtCls(\(\sqsubseteq\)) | #Ref(\(\equiv\)) | #Ref(\(\sqsubseteq\)) |
|---|---|---|---|---|---|---|---|
| Mondo | OMIM-ORDO | Disease | 9,642 | 8,838 | 8,735 | 3,721 | 103 |
| Mondo | NCIT-DOID | Disease | 6,835 | 8,448 | 5,113 | 4,686 | 3,339 |
| UMLS | SNOMED-FMA | Body | 24,182 | 64,726 | 59,567 | 7,256 | 5,506 |
| UMLS | SNOMED-NCIT | Pharm | 16,045 | 15,250 | 12,462 | 5,803 | 4,225 |
| UMLS | SNOMED-NCIT | Neoplas | 11,271 | 13,956 | 13,790 | 3,804 | 213 |
The datasets, which can be downloaded from Zenodo, include Mondo.zip
and UMLS.zip
for resources constructed from Mondo and UMLS, respectively. Each .zip
file contains three folders: raw_data
, equiv_match
, and subs_match
, corresponding to the raw source ontologies, data for equivalence matching, and data for subsumption matching, respectively. The detailed file structure is illustrated in the figure below.
"},{"location":"bio-ml/#oaei-bio-ml-2023","title":"OAEI Bio-ML 2023","text":"
For the OAEI 2023 version, we implemented several updates, including:
Locality Module Enrichment: In response to the loss of ontology context due to pruning, we used the locality module technique (access the code) to enrich pruned ontologies with logical modules that provide context for existing classes. To ensure the completeness of reference mappings, the new classes added are annotated as not used in alignment with the annotation property use_in_alignment
set to false
. While these supplemental classes can be used by OM systems as auxiliary information, they can be excluded from the alignment process. Even if they are considered in the final output mappings, our evaluation will ensure that they are excluded from the metric computation (see Evaluation Framework).
Simplified Task Settings: For each of the five OM pairs, we simplified the task settings to the following:
{task_name}/refs_equiv/full.tsv
is used for global matching evaluation.{task_name}/refs_equiv/test.tsv
is used for global matching evaluation.{task_name}/refs_equiv/test.cands.tsv
for local ranking evaluation.Subsumption Matching:
{task_name}/refs_subs/test.cands.tsv
. Bio-LLM: A Special Sub-Track for Large Language Models: We introduced a unique sub-track for Large Language Model (LLM)-based OM systems. We extracted small but challenging subsets from the NCIT-DOID and SNOMED-FMA (Body) datasets for this purpose (refer to OAEI Bio-LLM 2023).
The table below shows the data statistics for the OAEI 2023 version of Bio-ML, where the input ontologies are enriched with locality modules compared to the pruned versions used in OAEI 2022. The augmented structural and logical contexts make these ontologies more similar to their original versions without any processing (available at raw_data
). The changes compared to the previous version (see Bio-ML OAEI 2022) are reflected in the +
numbers of ontology classes.
In the Category column, \"Disease\" indicates that the Mondo data are mainly about disease concepts, while \"Body\", \"Pharm\", and \"Neoplas\" denote semantic types of \"Body Part, Organ, or Organ Components\", \"Pharmacologic Substance\", and \"Neoplastic Process\" in UMLS, respectively.
| Source | Task | Category | #SrcCls | #TgtCls | #Ref(\(\equiv\)) | #Ref(\(\sqsubseteq\)) |
|---|---|---|---|---|---|---|
| Mondo | OMIM-ORDO | Disease | 9,648 (+6) | 9,275 (+437) | 3,721 | 103 |
| Mondo | NCIT-DOID | Disease | 15,762 (+8,927) | 8,465 (+17) | 4,686 | 3,339 |
| UMLS | SNOMED-FMA | Body | 34,418 (+10,236) | 88,955 (+24,229) | 7,256 | 5,506 |
| UMLS | SNOMED-NCIT | Pharm | 29,500 (+13,455) | 22,136 (+6,886) | 5,803 | 4,225 |
| UMLS | SNOMED-NCIT | Neoplas | 22,971 (+11,700) | 20,247 (+6291) | 3,804 | 213 |
The file structure for the download datasets (from Zenodo) is also simplified this year to accommodate the changes. Detailed structure is presented in the following figure.
Remarks on this figure:
refs_equiv/full.tsv
in the unsupervised setting, and on refs_equiv/test.tsv
(with refs_equiv/train.tsv
set to null reference mappings) in the semi-supervised setting. Testing of the local ranking evaluation should be performed on refs_equiv/test.cands.tsv
for both settings.refs_equiv/test.cands.tsv
and the training mapping set refs_subs/train.tsv
is optional.test.cands.tsv
file in the Bio-LLM sub-track is different from the main Bio-LM track ones. See OAEI Bio-LLM 2023 for more information and how to evaluate on it.As Large Language Models (LLMs) are trending in the AI community, we formulate a special sub-track for evaluating LLM-based OM systems. However, evaluating LLMs with the current OM datasets can be time and resource intensive. To yield insightful results prior to full implementation, we leverage two challenging subsets extracted from the NCIT-DOID and the SNOMED-FMA (Body) equivalence matching datasets.
For each original dataset, we first randomly select 50 matched class pairs from ground truth mappings, but excluding pairs that can be aligned with direct string matching (i.e., having at least one shared label) to restrict the efficacy of conventional lexical matching. Next, with a fixed source ontology class, we further select 99 negative target ontology classes, thus forming a total of 100 candidate mappings (inclusive of the ground truth mapping). This selection is guided by the sub-word inverted index-based idf scores as in the BERTMap paper (see BERTMap tutorial for more details), which are capable of producing target ontology classes lexically akin to the fixed source class. We finally randomly choose 50 source classes that do not have a matched target class according to the ground truth mappings, and create 100 candidate mappings using the inverted index for each. Therefore, each subset comprises 50 source ontology classes with a match and 50 without. Each class is associated with 100 candidate mappings, culminating in a total extraction of 10,000, i.e., (50+50)*100, class pairs.
"},{"location":"bio-ml/#evaluation","title":"Evaluation","text":""},{"location":"bio-ml/#matching","title":"Matching","text":"From all the 10,000 class pairs in a given subset, the OM system is expected to predict the true mappings among them, which can be compared against the 50 available ground truth mappings using Precision, Recall, and F-score.
We use the same formulas in the main track evaluation framework to calculate Precision, Recall, and F-score. The prediction mappings \\(\\mathcal{M}_{pred}\\) are the class pairs an OM system predicts as true mappings, and the reference mappings \\(\\mathcal{M}_{ref}\\) refers to the 50 matched pairs.
\\[P = \\frac{|\\mathcal{M}_{pred} \\cap \\mathcal{M}_{ref}|}{|\\mathcal{M}_{pred}|}, \\ \\ R = \\frac{|\\mathcal{M}_{pred} \\cap \\mathcal{M}_{ref}|}{|\\mathcal{M}_{ref}|}, \\ \\ F_1 = \\frac{2 P R}{P + R}\\]"},{"location":"bio-ml/#ranking","title":"Ranking","text":"Given that each source class is associated with 100 candidate mappings, we can compute ranking-based metrics based on their scores. Specifically, we calculate:
\\(Hits@1\\) for the 50 matched source classes, counting a hit when the top-ranked candidate mapping is a ground truth mapping. The corresponding formula is:
\\[ Hits@K = \\sum_{(c, c') \\in \\mathcal{M}_{ref}} \\mathbb{I}_{rank_{c'} \\leq K} / |\\mathcal{M}_{ref}| \\]where \\(rank_{c'}\\) is the predicted relative rank of \\(c'\\) among its candidates, \\(\\mathbb{I}_{rank_{c'} \\leq K}\\) is a binary indicator function that outputs 1 if the rank is less than or equal to \\(K\\) and outputs 0 otherwise.
The \\(MRR\\) score is also computed for these matched source classes, summing the inverses of the ground truth mappings' relative ranks among candidate mappings. The corresponding formula is:
\\[ MRR = \\sum_{(c, c') \\in \\mathcal{M}_{ref}} rank_{c'}^{-1} / |\\mathcal{M}_{ref}| \\]For the 50 unmatched source classes, we compute the rejection rate (denoted as \\(RR\\)), counting a successful rejection when all the candidate mappings are predicted as false mappings. We assign each unmatched source class with a null class \\(c_{null}\\), which refers to any target class that does not have a match with the source class, and denote this set of unreferenced mappings as \\(\\mathcal{M}_{unref}\\).
\\[ RR = \\sum_{(c, c_{null}) \\in \\mathcal{M}_{unref}} \\prod_{d \\in \\mathcal{T}_c} (1 - \\mathbb{I}_{c \\equiv d}) / |\\mathcal{M}_{unref}| \\]where \\(\\mathcal{T}_c\\) is the set of target candidate classes for \\(c\\), and \\(\\mathbb{I}_{c \\equiv d}\\) is a binary indicator that outputs 0 if the OM system predicts a false mapping between \\(c\\) and \\(d\\), and outputs 1 otherwise. The product term in this equation returns 1 if all target candidate classes are predicted as unmatched, i.e., \\(\\forall d \\in \\mathcal{T}_c.\\mathbb{I}_{c \\equiv d}=0\\).
To summarise, the Bio-LLM sub-track provides two representative OM subsets and adopts a range of evaluation metrics to gain meaningful insights from this partial assessment, thus promoting robust and efficient development of LLM-based OM systems.
"},{"location":"bio-ml/#oaei-participation","title":"OAEI Participation","text":"To participate in the OAEI track, please visit the OAEI Bio-ML website for more information, especially on the instructions of system submission or direct result submission. In the following, we present the formats of result files we expect participants to submit.
"},{"location":"bio-ml/#result-submission-format","title":"Result Submission Format","text":"For the main Bio-ML track, we expect two result files for each setting:
(1) A prediction mapping file named match.result.tsv
in the same format as the reference mapping file (e.g., task_name/refs_equiv/full.tsv
).
Matching Result
Download an example of mapping file. The three columns, \"SrcEntity\"
, \"TgtEntity\"
, and \"Score\"
refer to the source class IRI, the target class IRI, and the matching score.
(2) A scored or ranked candidate mapping file named rank.result.tsv
in the same format as the test candidate mapping file (e.g., task_name/refs_equiv/test.cands.tsv
).
Ranking Result
Download an example of raw (unscored) candidate mapping file and an example of scored candidate mapping file. The \"SrcEntity\"
and \"TgtEntity\"
columns refer to the source class IRI and the target class IRI involved in a reference mapping. The \"TgtCandidates\"
column stores a sequence of tgt_cand_iri
in the unscored file and a list of tuples (tgt_cand_iri, score)
in the scored file, which can be accessed by the built-in Python function eval
.
We also accept a result file without scores and in that case we assume the list of tgt_cand_iri
has been sorted in descending order.
Note that each OM pair is accompanied by an unsupervised and a semi-supervised setting, and thus separate sets of result files should be submitted. Moreover, for subsumption matching, only the ranking result file in (2) is required.
For the Bio-LLM sub-track, we expect one result file (similar to (2) but requiring a list of triples) for the task:
(3) A scored or ranked (with answers) candidate mapping file named biollm.result.tsv
in the same format as the test candidate mapping file (i.e., task_name/test.cands.tsv
).
Bio-LLM Result
Download an example of bio-llm mapping file. The \"SrcEntity\"
and \"TgtEntity\"
columns refer to the source class IRI and the target class IRI involved in a reference mapping. The \"TgtCandidates\"
column stores a list of triples (tgt_cand_iri, score, answer)
in the scored file, which can be accessed by the built-in Python function eval
. The additional answer
values are True
or False
indicating whether the OM system predicts (src_class_iri, tgt_cand_iri)
as a true mapping.
It is important to notice that the answer
values are necessary for the matching evaluation of P, R, F-score and the computation of the rejection rate, while the score
values are used for ranking evaluation of MRR and Hits@1.
deeponto.complete
.check_consistency()
at deeponto.onto.Ontology
.deeponto.onto.OntologyVerbaliser
.deeponto.subs
to deeponto.complete
.deeponto.probe.ontolama
into deeponto.complete
....
"},{"location":"changelog/#v088-2023-october","title":"v0.8.8 (2023 October)","text":""},{"location":"changelog/#added_1","title":"Added","text":"deeponto.onto.OntologyVerbaliser
.\"struct\"
(Structural Reasoner) at deeponto.onto.OntologyReasoner
.load_reasoner()
method at deeponto.onto.OntologyReasoner
for convenience of changing the reasoner type and remove reload_reasoner()
method as it is a special case of load_reasoner()
.rdflib
into the dependencies for building graph-related features.deeponto.onto.taxonomy
for building the taxonomy over ontologies and potentially other structured data.read_table_mappings()
method at deeponto.align.mapping
from using dataframe.iterrows()
to dataframe.itertuples()
which is much more efficient.deeponto.utils.process_annotation_literal()
to False
.slf4j
to warn
to prevent tons of printing at ELK (issue (#13)[https://github.com/KRR-Oxford/DeepOnto/issues/13]).deeponto.align.oaei
.reasoner_type
argument at deeponto.onto.OntologyReasoner
, now supporting hermit
(default) and elk
.get_all_axioms()
method at deeponto.onto.Ontology
. Add get_iri()
method at deeponto.onto.Ontology
.
Add new features into deeponto.onto.OntologyVerbaliser
including:
verbalise_object_property_subsumption()
for object property subsumption axioms.
verbalise_class_expression()
.verbalise_class_subsumption()
for class subsumption axioms;verbalise_class_equivalence()
for class equivalence axioms;verbalise_class_assertion()
for class assertion axioms;verbalise_relation_assertion()
for relation assertion axioms;auto-correction
option for fixing entity names.keep_iri
option for keeping entity IRIs.add_quantifier_word
option for adding quantifier words as in the Manchester syntax.
Add get_assertion_axioms()
method at deeponto.onto.Ontology
.
get_axiom_type()
method at deeponto.onto.Ontology
.owl_individuals
attribute at deeponto.onto.Ontology
.get_owl_objects()
method to be anonymous as it is only used for creating pre-processed entity index at deeponto.onto.Ontology
.get_owl_object_from_iri()
method to get_owl_object()
at deeponto.onto.Ontology
.ERROR
.set_seed()
method at deeponto.utils
..verbalise_class_expression()
method by adding an option to keep entity IRIs without verbalising them using .vocabs
at deeponto.onto.OntologyVerbaliser
.apply_lowercasing
value to False
for both .get_annotations()
and .build_annotation_index()
methods at deeponto.onto.Ontology
..get_owl_object_annotations()
to .get_annotations()
at deeponto.onto.Ontology
.use_in_alignment
annotation in BERTMap for the OAEI.deeponto.align.oaei
.read_table_mappings
method to allow None
for threshold.deeponto.onto.OntologyPruner
.f1
and MRR
method in deeponto.align.evaluation.AlignmentEvaluator
.deeponto.onto.OntologyNormaliser
.deeponto.onto.OntologyProjector
.transformers
to transformers[torch]
.lib
from mowl to direct import.get_owl_object_annotations
by adding uniqify
at the end to preserve the order.deeponto.subs.bertsubs
; its inter-ontology setting is also imported at deeponto.align.bertsubs
.deeponto.onto.OntologyPruner
as a separate module.deeponto.onto.Ontology
; if started already, skip this step.get_owl_object_annotations
at deeponto.onto.Ontology
by preserving the relative order of annotation retrieval, i.e., create set
first and use the .add()
function instead of casting the list
into set
in the end.check_deprecated
at deeponto.onto.Ontology
by adding a check for the \\(\\texttt{owl:deprecated}\\) annotation property -- if this property does not exist in the current ontology, return False
(not deprecated).remove_axiom
for removing an axiom from the ontology at deeponto.onto.Ontology
(note that the counterpart add_axiom
has already been available).check_named_entity
for checking if an entity is named at deeponto.onto.Ontology
.get_subsumption_axioms
for getting subsumption axioms subject to different entity types at deeponto.onto.Ontology
.get_asserted_complex_classes
for getting all complex classes that occur in ontology (subsumption and/or equivalence) axioms at deeponto.onto.Ontology
.get_asserted_parents
and get_asserted_children
for getting asserted parent and children for a given entity at deeponto.onto.Ontology
.check_deprecation
for checking an owl object's deprecation (annotated) at deeponto.onto.Ontology
.en_core_web_sm
download into the initialisation of OntologyVerbaliser
.deeponto.onto.Ontology
.deeponto.onto.OntologyReasoner
:super_entities_of
\\(\\rightarrow\\) get_inferred_super_entities
sub_entities_of
\\(\\rightarrow\\) get_inferred_sub_entities
deeponto.onto.Ontology
.deeponto.lama
.deeponto.onto.verbalisation
.deeponto.onto.verbalisation
.src/
layout.The code before v0.5.0 is no longer available.
"},{"location":"faqs/","title":"FAQs","text":"Q1: System compatibility?
Q2: Encountering issues with the JPype installation?
Q3: Missing system-level dependencies on Linux?
g++
and python-dev
need to be installed.paper
Paper for OntoLAMA: Language Model Analysis for Ontology Subsumption Inference (Findings of ACL 2023).
@inproceedings{he-etal-2023-language,\n title = \"Language Model Analysis for Ontology Subsumption Inference\",\n author = \"He, Yuan and\n Chen, Jiaoyan and\n Jimenez-Ruiz, Ernesto and\n Dong, Hang and\n Horrocks, Ian\",\n booktitle = \"Findings of the Association for Computational Linguistics: ACL 2023\",\n month = jul,\n year = \"2023\",\n address = \"Toronto, Canada\",\n publisher = \"Association for Computational Linguistics\",\n url = \"https://aclanthology.org/2023.findings-acl.213\",\n doi = \"10.18653/v1/2023.findings-acl.213\",\n pages = \"3439--3453\"\n}\n
This page provides an overview of the \\(\\textsf{OntoLAMA}\\) datasets, how to use them, and the related probing approach introduced in the research paper.
"},{"location":"ontolama/#overview","title":"Overview","text":"\\(\\textsf{OntoLAMA}\\) is a set of language model (LM) probing datasets and a prompt-based probing method for ontology subsumption inference or ontology completion. The work follows the \"LMs-as-KBs\" literature but focuses on conceptualised knowledge extracted from formalised KBs such as the OWL ontologies. Specifically, the subsumption inference (SI) task is introduced and formulated in the Natural Language Inference (NLI) style, where the sub-concept and the super-concept involved in a subsumption axiom are verbalised and fitted into a template to form the premise and hypothesis, respectively. The sampled axioms are verified through ontology reasoning. The SI task is further divided into Atomic SI and Complex SI where the former involves only atomic named concepts and the latter involves both atomic and complex concepts. Real-world ontologies of different scales and domains are used for constructing OntoLAMA and in total there are four Atomic SI datasets and two Complex SI datasets.
"},{"location":"ontolama/#useful-links","title":"Useful Links","text":"Source #NamedConcepts #EquivAxioms #Dataset (Train/Dev/Test) Schema.org 894 - Atomic SI: 808/404/2,830 DOID 11,157 - Atomic SI: 90,500/11,312/11,314 FoodOn 30,995 2,383 Atomic SI: 768,486/96,060/96,062 Complex SI: 3,754/1,850/13,080 GO 43,303 11,456 Atomic SI: 772,870/96,608/96,610 Complex SI: 72,318/9,040/9,040 MNLI - - biMNLI: 235,622/26,180/12,906
"},{"location":"ontolama/#usage","title":"Usage","text":"Users have two options for accessing the OntoLAMA datasets. They can either download the datasets directly from Zenodo or use the Huggingface Datasets platform.
If using Huggingface, users should first install the dataset
package:
pip install datasets\n
Then, a dataset can be accessed by:
from datasets import load_dataset\n# dataset = load_dataset(\"krr-oxford/OntoLAMA\", dataset_name)\n# for example, loading the Complex SI dataset of Go\ndataset = load_dataset(\"krr-oxford/OntoLAMA\", \"go-complex-SI\") \n
Options of dataset_name
include:
\"bimnli\"
(from MNLI)\"schemaorg-atomic-SI\"
(from Schema.org)\"doid-atomic-SI\"
(from DOID)\"foodon-atomic-SI\"
, \"foodon-complex-SI\"
(from FoodOn)\"go-atomic-SI\"
, \"go-complex-SI\"
(from Go)After loading the dataset, a particular data split can be accessed by:
dataset[split_name] # split_name = \"train\", \"validation\", or \"test\"\n
Please refer to the Huggingface page for examples of data points and explanations of data fields.
If downloading from Zenodo, users can simply target on specific .jsonl
files.
\\(\\textsf{OntoLAMA}\\) adopts the prompt-based probing approach to examine an LM's knowledge. Specifically, it wraps the verbalised sub-concept and super-concept into a template with a masked position; the LM is expected to predict the masked token and determine whether there exists a subsumption relationship between the two concepts.
The verbalisation algorithm has been implemented as a separate ontology processing module, see verbalise ontology concepts.
To conduct probing, users can write the following code into a script, e.g., probing.py
:
from openprompt.config import get_config\nfrom deeponto.complete.ontolama import run_inference\n\nconfig, args = get_config()\n# you can then manipulate the configuration before running the inference\nconfig.learning_setting = \"few_shot\" # zero_shot, full\nconfig.manual_template.choice = 0 # using the first template in the template file\n...\n\n# run the subsumption inference\nrun_inference(config, args)\n
Then, run the script with the following command:
python probing.py --config_yaml config.yaml\n
See an example of config.yaml
at DeepOnto/scripts/ontolama/config.yaml
The template file for the SI task (two templates) is located in DeepOnto/scripts/ontolama/si_templates.txt
.
The template file for the biMNLI task (two templates) is located in DeepOnto/scripts/ontolama/nli_templates.txt
.
The label word file for both SI and biMNLI tasks is located in DeepOnto/scripts/ontolama/label_words.jsonl
.
\\(\\textsf{DeepOnto}\\) extends from the OWLAPI and implements many useful methods for ontology processing and reasoning, integrated in the base class Ontology
.
This page gives typical examples of how to use Ontology
. There are other more specific usages, please refer to the documentation by clicking Ontology
.
Ontology
can be easily loaded from a local ontology file by its path:
from deeponto.onto import Ontology\n
Importing Ontology
will require JVM memory allocation (defaults to 8g
; if nohup
is used to run the program in the backend, use nohup echo \"8g\" | python command
):
Please enter the maximum memory located to JVM: [8g]: 16g\n\n16g maximum memory allocated to JVM.\nJVM started successfully.\n
Loading an ontology from a local file:
onto = Ontology(\"path_to_ontology.owl\")\n
It also possible to choose a reasoner to be used:
onto = Ontology(\"path_to_ontology.owl\", \"hermit\")\n
Tip
For faster (but incomplete) reasoning over larger ontologies, choose a reasoner like \"elk\"
.
The most fundamental feature of Ontology
is to access entities in the ontology such as classes (or concepts) and properties (object, data, and annotation properties). To get an entity by its IRI, do the following:
from deeponto.onto import Ontology\n# e.g., load the disease ontology\ndoid = Ontology(\"doid.owl\")\n# class or property IRI as input\ndoid.get_owl_object(\"http://purl.obolibrary.org/obo/DOID_9969\")\n
To get the asserted parents or children of a given class or property, do the following:
doid.get_asserted_parents(doid.get_owl_object(\"http://purl.obolibrary.org/obo/DOID_9969\"))\ndoid.get_asserted_children(doid.get_owl_object(\"http://purl.obolibrary.org/obo/DOID_9969\"))\n
To obtain the literal values (as Set[str]
) of an annotation property (such as \\(\\texttt{rdfs:label}\\)) for an entity:
# note that annotations with no language tags are deemed as in English (\"en\")\ndoid.get_annotations(\n doid.get_owl_object(\"http://purl.obolibrary.org/obo/DOID_9969\"),\n annotation_property_iri='http://www.w3.org/2000/01/rdf-schema#label',\n annotation_language_tag=None,\n apply_lowercasing=False,\n normalise_identifiers=False\n)\n
Output:
{'carotenemia'}\n
To get the special entities related to top (\\(\\top\\)) and bottom (\\(\\bot\\)), for example, to get \\(\\texttt{owl:Thing}\\):
doid.OWLThing\n
"},{"location":"ontology/#ontology-reasoning","title":"Ontology Reasoning","text":"Ontology
has an important attribute .reasoner
for conducting reasoning activities. Currently, two types of reasoners are supported, i.e., HermitT and ELK.
To get the super-entities (a super-class, or a super-propety) of an entity, do the following:
doid_class = doid.get_owl_object(\"http://purl.obolibrary.org/obo/DOID_9969\")\ndoid.reasoner.get_inferred_super_entities(doid_class, direct=False) \n
Output:
['http://purl.obolibrary.org/obo/DOID_0014667',\n'http://purl.obolibrary.org/obo/DOID_0060158',\n'http://purl.obolibrary.org/obo/DOID_4']\n
The outputs are IRIs of the corresponding super-entities. direct
is a boolean value indicating whether the returned entities are parents (direct=True
) or ancestors (direct=False
).
To get the sub-entities, simply replace the method name with sub_entities_of
.
To retrieve the entailed instances of a class:
doid.reasoner.instances_of(doid_class)\n
"},{"location":"ontology/#checking-entailment","title":"Checking Entailment","text":"The implemented reasoner also supports several entailment checks for subsumption, disjointness, and so on. For example:
doid.reasoner.check_subsumption(doid_potential_sub_entity, doid_potential_super_entity)\n
"},{"location":"ontology/#feature-requests","title":"Feature Requests","text":"Should you have any feature requests (such as those commonly used in the OWLAPI), please raise a ticket in the \\(\\textsf{DeepOnto}\\) GitHub repository.
"},{"location":"verbaliser/","title":"Verbalise Ontology Concepts","text":"Verbalising concept expressions is very useful for models that take textual inputs. While the named concepts can be verbalised simply using their names (or labels), complex concepts that involve logical operators require a more sophisticated algorithm. In \\(\\textsf{DeepOnto}\\), we have implemented the recursive concept verbaliser originally proposed in the OntoLAMA paper to address the need.
Paper
The recursive concept verbaliser is proposed in the paper: Language Model Analysis for Ontology Subsumption Inference (Findings of ACL 2023).
@inproceedings{he-etal-2023-language,\n title = \"Language Model Analysis for Ontology Subsumption Inference\",\n author = \"He, Yuan and\n Chen, Jiaoyan and\n Jimenez-Ruiz, Ernesto and\n Dong, Hang and\n Horrocks, Ian\",\n booktitle = \"Findings of the Association for Computational Linguistics: ACL 2023\",\n month = jul,\n year = \"2023\",\n address = \"Toronto, Canada\",\n publisher = \"Association for Computational Linguistics\",\n url = \"https://aclanthology.org/2023.findings-acl.213\",\n doi = \"10.18653/v1/2023.findings-acl.213\",\n pages = \"3439--3453\"\n}\n
This rule-based verbaliser (found in OntologyVerbaliser
) first parses a complex concept expression into a sub-formula tree (with OntologySyntaxParser
). Each intermediate node within the tree represents the decomposition of a specific logical operator, while the leaf nodes are named concepts or properties. The verbaliser then recursively merges the verbalisations in a bottom-to-top manner, creating the overall textual representation of the complex concept. An example of this process is shown in the following figure:
Figure 1. Verbalising a complex concept recursively.
To use the verbaliser, do the following:
from deeponto.onto import Ontology, OntologyVerbaliser\n\n# load an ontology and init the verbaliser\nonto = Ontology(\"some_ontology_file.owl\")\nverbaliser = OntologyVerbaliser(onto)\n
To verbalise a complex concept expression:
# get complex concepts asserted in the ontology\ncomplex_concepts = list(onto.get_asserted_complex_classes())\n\n# verbalise the first complex concept\nv_concept = verbaliser.verbalise_class_expression(complex_concepts[0])\n
To verbaliser a class subsumption axiom:
# get subsumption axioms from the ontology\nsubsumption_axioms = onto.get_subsumption_axioms(entity_type=\"Classes\")\n\n# verbalise the first subsumption axiom\nv_sub, v_super = verbaliser.verbalise_class_subsumption_axiom(subsumption_axioms[0])\n
Tip
The concept verbaliser is under development to incorporate the parsing of various axiom types. Please check the existing functions of OntologyVerbaliser
for specific usage.
Notice that the verbalised result is a CfgNode
object which keeps track of the recursive process. Users can access the final verbalisation by:
result.verbal\n
Users can also manually update the vocabulary for named entities by:
verbaliser.update_entity_name(entity_iri, entity_name)\n
This is useful when the entity labels are not naturally fitted into the verbalised sentence.
Moreover, users can see the parsed sub-formula tree using:
tree = verbaliser.parser.parse(str(subsumption_axioms[0]))\ntree.render_image()\n
Note that rendering the image requires graphiviz
to be installed. Check this link for installing graphiviz
.
See an example with image at OntologySyntaxParser
.
AlignmentEvaluator()
","text":"Class that provides evaluation metrics for alignment.
Source code insrc/deeponto/align/evaluation.py
def __init__(self):\n pass\n
"},{"location":"deeponto/align/evaluation/#deeponto.align.evaluation.AlignmentEvaluator.precision","title":"precision(prediction_mappings, reference_mappings)
staticmethod
","text":"The percentage of correct predictions.
\\[P = \\frac{|\\mathcal{M}_{pred} \\cap \\mathcal{M}_{ref}|}{|\\mathcal{M}_{pred}|}\\] Source code insrc/deeponto/align/evaluation.py
@staticmethod\ndef precision(prediction_mappings: List[EntityMapping], reference_mappings: List[ReferenceMapping]) -> float:\nr\"\"\"The percentage of correct predictions.\n\n $$P = \\frac{|\\mathcal{M}_{pred} \\cap \\mathcal{M}_{ref}|}{|\\mathcal{M}_{pred}|}$$\n \"\"\"\n preds = [p.to_tuple() for p in prediction_mappings]\n refs = [r.to_tuple() for r in reference_mappings]\n return len(set(preds).intersection(set(refs))) / len(set(preds))\n
"},{"location":"deeponto/align/evaluation/#deeponto.align.evaluation.AlignmentEvaluator.recall","title":"recall(prediction_mappings, reference_mappings)
staticmethod
","text":"The percentage of correct retrievals.
\\[R = \\frac{|\\mathcal{M}_{pred} \\cap \\mathcal{M}_{ref}|}{|\\mathcal{M}_{ref}|}\\] Source code insrc/deeponto/align/evaluation.py
@staticmethod\ndef recall(prediction_mappings: List[EntityMapping], reference_mappings: List[ReferenceMapping]) -> float:\nr\"\"\"The percentage of correct retrievals.\n\n $$R = \\frac{|\\mathcal{M}_{pred} \\cap \\mathcal{M}_{ref}|}{|\\mathcal{M}_{ref}|}$$\n \"\"\"\n preds = [p.to_tuple() for p in prediction_mappings]\n refs = [r.to_tuple() for r in reference_mappings]\n return len(set(preds).intersection(set(refs))) / len(set(refs))\n
"},{"location":"deeponto/align/evaluation/#deeponto.align.evaluation.AlignmentEvaluator.f1","title":"f1(prediction_mappings, reference_mappings, null_reference_mappings=[])
staticmethod
","text":"Compute the F1 score given the prediction and reference mappings.
\\[F_1 = \\frac{2 P R}{P + R}\\]null_reference_mappings
is an additional set whose elements should be ignored in the calculation, i.e., neither positive nor negative. Specifically, both \\(\\mathcal{M}_{pred}\\) and \\(\\mathcal{M}_{ref}\\) will substract \\(\\mathcal{M}_{null}\\) from them.
src/deeponto/align/evaluation.py
@staticmethod\ndef f1(\n prediction_mappings: List[EntityMapping],\n reference_mappings: List[ReferenceMapping],\n null_reference_mappings: List[ReferenceMapping] = [],\n):\nr\"\"\"Compute the F1 score given the prediction and reference mappings.\n\n $$F_1 = \\frac{2 P R}{P + R}$$\n\n `null_reference_mappings` is an additional set whose elements\n should be **ignored** in the calculation, i.e., **neither positive nor negative**.\n Specifically, both $\\mathcal{M}_{pred}$ and $\\mathcal{M}_{ref}$ will **substract**\n $\\mathcal{M}_{null}$ from them.\n \"\"\"\n preds = [p.to_tuple() for p in prediction_mappings]\n refs = [r.to_tuple() for r in reference_mappings]\n null_refs = [n.to_tuple() for n in null_reference_mappings]\n # elements in the {null_set} are removed from both {pred} and {ref} (ignored)\n if null_refs:\n preds = set(preds) - set(null_refs)\n refs = set(refs) - set(null_refs)\n P = len(set(preds).intersection(set(refs))) / len(set(preds))\n R = len(set(preds).intersection(set(refs))) / len(set(refs))\n F1 = 2 * P * R / (P + R)\n\n return {\"P\": round(P, 3), \"R\": round(R, 3), \"F1\": round(F1, 3)}\n
"},{"location":"deeponto/align/evaluation/#deeponto.align.evaluation.AlignmentEvaluator.hits_at_K","title":"hits_at_K(reference_and_candidates, K)
staticmethod
","text":"Compute \\(Hits@K\\) for a list of (reference_mapping, candidate_mappings)
pair.
It is computed as the number of a reference_mapping
existed in the first \\(K\\) ranked candidate_mappings
, divided by the total number of input pairs.
src/deeponto/align/evaluation.py
@staticmethod\ndef hits_at_K(reference_and_candidates: List[Tuple[ReferenceMapping, List[EntityMapping]]], K: int):\nr\"\"\"Compute $Hits@K$ for a list of `(reference_mapping, candidate_mappings)` pair.\n\n It is computed as the number of a `reference_mapping` existed in the first $K$ ranked `candidate_mappings`,\n divided by the total number of input pairs.\n\n $$Hits@K = \\sum_i^N \\mathbb{I}_{rank_i \\leq k} / N$$\n \"\"\"\n n_hits = 0\n for pred, cands in reference_and_candidates:\n ordered_candidates = [c.to_tuple() for c in EntityMapping.sort_entity_mappings_by_score(cands, k=K)]\n if pred.to_tuple() in ordered_candidates:\n n_hits += 1\n return n_hits / len(reference_and_candidates)\n
"},{"location":"deeponto/align/evaluation/#deeponto.align.evaluation.AlignmentEvaluator.mean_reciprocal_rank","title":"mean_reciprocal_rank(reference_and_candidates)
staticmethod
","text":"Compute \\(MRR\\) for a list of (reference_mapping, candidate_mappings)
pair.
src/deeponto/align/evaluation.py
@staticmethod\ndef mean_reciprocal_rank(reference_and_candidates: List[Tuple[ReferenceMapping, List[EntityMapping]]]):\nr\"\"\"Compute $MRR$ for a list of `(reference_mapping, candidate_mappings)` pair.\n\n $$MRR = \\sum_i^N rank_i^{-1} / N$$\n \"\"\"\n sum_inverted_ranks = 0\n for pred, cands in reference_and_candidates:\n ordered_candidates = [c.to_tuple() for c in EntityMapping.sort_entity_mappings_by_score(cands)]\n if pred.to_tuple() in ordered_candidates:\n rank = ordered_candidates.index(pred.to_tuple()) + 1\n else:\n rank = math.inf\n sum_inverted_ranks += 1 / rank\n return sum_inverted_ranks / len(reference_and_candidates)\n
"},{"location":"deeponto/align/mapping/","title":"Mapping","text":""},{"location":"deeponto/align/mapping/#deeponto.align.mapping.EntityMapping","title":"EntityMapping(src_entity_iri, tgt_entity_iri, relation=DEFAULT_REL, score=0.0)
","text":"A datastructure for entity mapping.
Such entities should be named and have an IRI.
Attributes:
Name Type Descriptionsrc_entity_iri
str
The IRI of the source entity, usually its IRI if available.
tgt_entity_iri
str
The IRI of the target entity, usually its IRI if available.
relation
str
A symbol that represents what semantic relation this mapping stands for. Defaults to <?rel>
which means unspecified. Suggested inputs are \"<EquivalentTo>\"
and \"<SubsumedBy>\"
.
score
float
The score that indicates the confidence of this mapping. Defaults to 0.0
.
Parameters:
Name Type Description Defaultsrc_entity_iri
str
The IRI of the source entity, usually its IRI if available.
requiredtgt_entity_iri
str
The IRI of the target entity, usually its IRI if available.
requiredrelation
str
A symbol that represents what semantic relation this mapping stands for. Defaults to <?rel>
which means unspecified. Suggested inputs are \"<EquivalentTo>\"
and \"<SubsumedBy>\"
.
DEFAULT_REL
score
float
The score that indicates the confidence of this mapping. Defaults to 0.0
.
0.0
Source code in src/deeponto/align/mapping.py
def __init__(self, src_entity_iri: str, tgt_entity_iri: str, relation: str = DEFAULT_REL, score: float = 0.0):\n\"\"\"Intialise an entity mapping.\n\n Args:\n src_entity_iri (str): The IRI of the source entity, usually its IRI if available.\n tgt_entity_iri (str): The IRI of the target entity, usually its IRI if available.\n relation (str, optional): A symbol that represents what semantic relation this mapping stands for. Defaults to `<?rel>` which means unspecified.\n Suggested inputs are `\"<EquivalentTo>\"` and `\"<SubsumedBy>\"`.\n score (float, optional): The score that indicates the confidence of this mapping. Defaults to `0.0`.\n \"\"\"\n self.head = src_entity_iri\n self.tail = tgt_entity_iri\n self.relation = relation\n self.score = score\n
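A brief sketch of creating an entity mapping with one of the suggested relation symbols (the IRIs are placeholders):
from deeponto.align.mapping import EntityMapping

mapping = EntityMapping(
    src_entity_iri='http://onto1#Heart',
    tgt_entity_iri='http://onto2#Heart',
    relation='<EquivalentTo>',
    score=0.98,
)
print(mapping.head, mapping.tail, mapping.relation, mapping.score)
print(mapping.to_tuple(with_score=True))  # ('http://onto1#Heart', 'http://onto2#Heart', 0.98)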
"},{"location":"deeponto/align/mapping/#deeponto.align.mapping.EntityMapping.from_owl_objects","title":"from_owl_objects(src_entity, tgt_entity, relation=DEFAULT_REL, score=0.0)
classmethod
","text":"Create an entity mapping from two OWLObject
entities which have an IRI.
Parameters:
Name Type Description Defaultsrc_entity
OWLObject
The source entity in OWLObject
.
tgt_entity
OWLObject
The target entity in OWLObject
.
relation
str
A symbol that represents what semantic relation this mapping stands for. Defaults to <?rel>
which means unspecified. Suggested inputs are \"<EquivalentTo>\"
and \"<SubsumedBy>\"
.
DEFAULT_REL
score
float
The score that indicates the confidence of this mapping. Defaults to 0.0
.
0.0
Returns:
Type DescriptionEntityMapping
The entity mapping created from the source and target entities.
Source code insrc/deeponto/align/mapping.py
@classmethod\ndef from_owl_objects(\n cls, src_entity: OWLObject, tgt_entity: OWLObject, relation: str = DEFAULT_REL, score: float = 0.0\n):\n\"\"\"Create an entity mapping from two `OWLObject` entities which have an IRI.\n\n Args:\n src_entity (OWLObject): The source entity in `OWLObject`.\n tgt_entity (OWLObject): The target entity in `OWLObject`.\n relation (str, optional): A symbol that represents what semantic relation this mapping stands for. Defaults to `<?rel>` which means unspecified.\n Suggested inputs are `\"<EquivalentTo>\"` and `\"<SubsumedBy>\"`.\n score (float, optional): The score that indicates the confidence of this mapping. Defaults to `0.0`.\n Returns:\n (EntityMapping): The entity mapping created from the source and target entities.\n \"\"\"\n return cls(str(src_entity.getIRI()), str(tgt_entity.getIRI()), relation, score)\n
"},{"location":"deeponto/align/mapping/#deeponto.align.mapping.EntityMapping.to_tuple","title":"to_tuple(with_score=False)
","text":"Transform an entity mapping (self
) to a tuple representation.
Note that relation
is discarded and score
is optionally preserved.
src/deeponto/align/mapping.py
def to_tuple(self, with_score: bool = False):\n\"\"\"Transform an entity mapping (`self`) to a tuple representation\n\n Note that `relation` is discarded and `score` is optionally preserved).\n \"\"\"\n if with_score:\n return (self.head, self.tail, self.score)\n else:\n return (self.head, self.tail)\n
"},{"location":"deeponto/align/mapping/#deeponto.align.mapping.EntityMapping.as_tuples","title":"as_tuples(entity_mappings, with_score=False)
staticmethod
","text":"Transform a list of entity mappings to their tuple representations.
Note that relation
is discarded and score
is optionally preserved.
src/deeponto/align/mapping.py
@staticmethod\ndef as_tuples(entity_mappings: List[EntityMapping], with_score: bool = False):\n\"\"\"Transform a list of entity mappings to their tuple representations.\n\n Note that `relation` is discarded and `score` is optionally preserved).\n \"\"\"\n return [m.to_tuple(with_score=with_score) for m in entity_mappings]\n
"},{"location":"deeponto/align/mapping/#deeponto.align.mapping.EntityMapping.sort_entity_mappings_by_score","title":"sort_entity_mappings_by_score(entity_mappings, k=None)
staticmethod
","text":"Sort the entity mappings in a list by their scores in descending order.
Parameters:
Name Type Description Defaultentity_mappings
List[EntityMapping]
A list of entity mappings to sort.
requiredk
int
The number of top \\(k\\) scored entities preserved if specified. Defaults to None
which means to return all entity mappings.
None
Returns:
Type DescriptionList[EntityMapping]
A list of sorted entity mappings.
Source code insrc/deeponto/align/mapping.py
@staticmethod\ndef sort_entity_mappings_by_score(entity_mappings: List[EntityMapping], k: Optional[int] = None):\nr\"\"\"Sort the entity mappings in a list by their scores in descending order.\n\n Args:\n entity_mappings (List[EntityMapping]): A list entity mappings to sort.\n k (int, optional): The number of top $k$ scored entities preserved if specified. Defaults to `None` which\n means to return **all** entity mappings.\n\n Returns:\n (List[EntityMapping]): A list of sorted entity mappings.\n \"\"\"\n return list(sorted(entity_mappings, key=lambda x: x.score, reverse=True))[:k]\n
"},{"location":"deeponto/align/mapping/#deeponto.align.mapping.EntityMapping.read_table_mappings","title":"read_table_mappings(table_of_mappings_file, threshold=None, relation=DEFAULT_REL, is_reference=False)
staticmethod
","text":"Read entity mappings from .csv
or .tsv
files.
Mapping Table Format
The columns of the mapping table must have the headings: \"SrcEntity\"
, \"TgtEntity\"
, and \"Score\"
.
Parameters:
Name Type Description Defaulttable_of_mappings_file
str
The path to the table (.csv
or .tsv
) of mappings.
threshold
Optional[float]
Mappings with scores less than threshold
will not be loaded. Defaults to None, i.e., no filtering.
None
relation
str
A symbol that represents what semantic relation this mapping stands for. Defaults to <?rel>
which means unspecified. Suggested inputs are \"<EquivalentTo>\"
and \"<SubsumedBy>\"
.
DEFAULT_REL
is_reference
bool
Whether the loaded mappings are reference mappings; if so, threshold
is disabled and mapping scores are all set to \\(1.0\\). Defaults to False
.
False
Returns:
Type DescriptionList[EntityMapping]
A list of entity mappings loaded from the table file.
Source code insrc/deeponto/align/mapping.py
@staticmethod\ndef read_table_mappings(\n table_of_mappings_file: str,\n threshold: Optional[float] = None,\n relation: str = DEFAULT_REL,\n is_reference: bool = False,\n) -> List[EntityMapping]:\nr\"\"\"Read entity mappings from `.csv` or `.tsv` files.\n\n !!! note \"Mapping Table Format\"\n\n The columns of the mapping table must have the headings: `\"SrcEntity\"`, `\"TgtEntity\"`, and `\"Score\"`.\n\n Args:\n table_of_mappings_file (str): The path to the table (`.csv` or `.tsv`) of mappings.\n threshold (Optional[float], optional): Mappings with scores less than `threshold` will not be loaded. Defaults to 0.0.\n relation (str, optional): A symbol that represents what semantic relation this mapping stands for. Defaults to `<?rel>` which means unspecified.\n Suggested inputs are `\"<EquivalentTo>\"` and `\"<SubsumedBy>\"`.\n is_reference (bool): Whether the loaded mappings are reference mappigns; if so, `threshold` is disabled and mapping scores\n are all set to $1.0$. Defaults to `False`.\n\n Returns:\n (List[EntityMapping]): A list of entity mappings loaded from the table file.\n \"\"\"\n df = read_table(table_of_mappings_file)\n entity_mappings = []\n for dp in df.itertuples():\n if is_reference:\n entity_mappings.append(ReferenceMapping(dp.SrcEntity, dp.TgtEntity, relation))\n else:\n # allow `None` for threshold\n if not threshold or dp[\"Score\"] >= threshold:\n entity_mappings.append(EntityMapping(dp.SrcEntity, dp.TgtEntity, relation, dp.Score))\n return entity_mappings\n
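An illustrative sketch of the expected table format and loading (the file name is a placeholder); reference mappings can be loaded from the same kind of file via ReferenceMapping.read_table_mappings:
import pandas as pd
from deeponto.align.mapping import EntityMapping, ReferenceMapping

# write a tiny mapping table with the required headings
pd.DataFrame(
    [
        ['http://onto1#A', 'http://onto2#A', 0.95],
        ['http://onto1#B', 'http://onto2#C', 0.42],
    ],
    columns=['SrcEntity', 'TgtEntity', 'Score'],
).to_csv('pred_maps.tsv', sep='\t', index=False)

pred_maps = EntityMapping.read_table_mappings('pred_maps.tsv')    # scored entity mappings
ref_maps = ReferenceMapping.read_table_mappings('pred_maps.tsv')  # scores forced to 1.0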
"},{"location":"deeponto/align/mapping/#deeponto.align.mapping.ReferenceMapping","title":"ReferenceMapping(src_entity_iri, tgt_entity_iri, relation=DEFAULT_REL, candidate_mappings=[])
","text":" Bases: EntityMapping
A data structure for entity mapping that acts as a reference mapping.
A reference mapping is a ground truth entity mapping (with \(score = 1.0\)) and can have several entity mappings as candidates. These candidate mappings should have the same head
(i.e., source entity) as the reference mapping.
Attributes:
Name Type Descriptionsrc_entity_iri
str
The IRI of the source entity, usually its IRI if available.
tgt_entity_iri
str
The IRI of the target entity, usually its IRI if available.
relation
str
A symbol that represents what semantic relation this mapping stands for. Defaults to <?rel>
which means unspecified. Suggested inputs are \"<EquivalentTo>\"
and \"<SubsumedBy>\"
.
Parameters:
Name Type Description Defaultsrc_entity_iri
str
The IRI of the source entity, usually its IRI if available.
requiredtgt_entity_iri
str
The IRI of the target entity, usually its IRI if available.
requiredrelation
str
A symbol that represents what semantic relation this mapping stands for. Defaults to <?rel>
which means unspecified. Suggested inputs are \"<EquivalentTo>\"
and \"<SubsumedBy>\"
.
DEFAULT_REL
candidate_mappings
List[EntityMapping]
A list of entity mappings that are candidates for this reference mapping. Defaults to []
.
[]
Source code in src/deeponto/align/mapping.py
def __init__(\n self,\n src_entity_iri: str,\n tgt_entity_iri: str,\n relation: str = DEFAULT_REL,\n candidate_mappings: Optional[List[EntityMapping]] = [],\n):\nr\"\"\"Intialise a reference mapping.\n\n Args:\n src_entity_iri (str): The IRI of the source entity, usually its IRI if available.\n tgt_entity_iri (str): The IRI of the target entity, usually its IRI if available.\n relation (str, optional): A symbol that represents what semantic relation this mapping stands for. Defaults to `<?rel>` which means unspecified.\n Suggested inputs are `\"<EquivalentTo>\"` and `\"<SubsumedBy>\"`.\n candidate_mappings (List[EntityMapping], optional): A list of entity mappings that are candidates for this reference mapping. Defaults to `[]`.\n \"\"\"\n super().__init__(src_entity_iri, tgt_entity_iri, relation, 1.0)\n self.candidates = []\n for candidate in candidate_mappings:\n self.add_candidate(candidate)\n
"},{"location":"deeponto/align/mapping/#deeponto.align.mapping.ReferenceMapping.add_candidate","title":"add_candidate(candidate_mapping)
","text":"Add a candidate mapping whose relation and head entity are the same as the reference mapping's.
Source code insrc/deeponto/align/mapping.py
def add_candidate(self, candidate_mapping: EntityMapping):\n\"\"\"Add a candidate mapping whose relation and head entity are the\n same as the reference mapping's.\n \"\"\"\n if self.relation != candidate_mapping.relation:\n raise ValueError(\n f\"Expect relation of candidate mapping to be {self.relation} but got {candidate_mapping.relation}\"\n )\n if self.head != candidate_mapping.head:\n raise ValueError(\"Candidate mapping does not have the same head entity as the anchor mapping.\")\n self.candidates.append(candidate_mapping)\n
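For example (placeholder IRIs), candidate mappings sharing the reference mapping's head entity and relation can be attached as follows:
from deeponto.align.mapping import EntityMapping, ReferenceMapping

ref = ReferenceMapping('http://onto1#A', 'http://onto2#A', '<EquivalentTo>')
ref.add_candidate(EntityMapping('http://onto1#A', 'http://onto2#B', '<EquivalentTo>', 0.63))
ref.add_candidate(EntityMapping('http://onto1#A', 'http://onto2#C', '<EquivalentTo>', 0.21))
print(len(ref.candidates))  # 2
# a candidate with a different head entity (or relation) raises a ValueError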
"},{"location":"deeponto/align/mapping/#deeponto.align.mapping.ReferenceMapping.read_table_mappings","title":"read_table_mappings(table_of_mappings_file, relation=DEFAULT_REL)
staticmethod
","text":"Read reference mappings from .csv
or .tsv
files.
Mapping Table Format
The columns of the mapping table must have the headings: \"SrcEntity\"
, \"TgtEntity\"
, and \"Score\"
.
Parameters:
Name Type Description Defaulttable_of_mappings_file
str
The path to the table (.csv
or .tsv
) of mappings.
relation
str
A symbol that represents what semantic relation this mapping stands for. Defaults to <?rel>
which means unspecified. Suggested inputs are \"<EquivalentTo>\"
and \"<SubsumedBy>\"
.
DEFAULT_REL
Returns:
Type DescriptionList[ReferenceMapping]
A list of reference mappings loaded from the table file.
Source code insrc/deeponto/align/mapping.py
@staticmethod\ndef read_table_mappings(table_of_mappings_file: str, relation: str = DEFAULT_REL):\nr\"\"\"Read reference mappings from `.csv` or `.tsv` files.\n\n !!! note \"Mapping Table Format\"\n\n The columns of the mapping table must have the headings: `\"SrcEntity\"`, `\"TgtEntity\"`, and `\"Score\"`.\n\n Args:\n table_of_mappings_file (str): The path to the table (`.csv` or `.tsv`) of mappings.\n relation (str, optional): A symbol that represents what semantic relation this mapping stands for. Defaults to `<?rel>` which means unspecified.\n Suggested inputs are `\"<EquivalentTo>\"` and `\"<SubsumedBy>\"`.\n\n Returns:\n (List[ReferenceMapping]): A list of reference mappings loaded from the table file.\n \"\"\"\n return EntityMapping.read_table_mappings(table_of_mappings_file, relation=relation, is_reference=True)\n
"},{"location":"deeponto/align/mapping/#deeponto.align.mapping.SubsFromEquivMappingGenerator","title":"SubsFromEquivMappingGenerator(src_onto, tgt_onto, equiv_mappings, subs_generation_ratio=None, delete_used_equiv_tgt_class=True)
","text":"Generating subsumption mappings from gold standard equivalence mappings.
paper
The online subsumption mapping construction algorithm is proposed in the paper: Machine Learning-Friendly Biomedical Datasets for Equivalence and Subsumption Ontology Matching (ISWC 2022).
This generator has an attribute delete_used_equiv_tgt_class
for determining whether or not to sabotage the equivalence mappings used to create \\(\\geq 1\\) subsumption mappings. The reason is that, if the equivalence mapping is broken, then the OM tool is expected to predict subsumption mappings directly without relying on the equivalence mappings as an intermediate.
Attributes:
Name Type Descriptionsrc_onto
Ontology
The source ontology.
tgt_onto
Ontology
The target ontology.
equiv_class_pairs
List[Tuple[str, str]]
A list of class pairs (in IRIs) that are equivalent according to the input equivalence mappings.
subs_generation_ratio
int
The maximum number of subsumption mappings generated from each equivalence mapping. Defaults to None
which means there is no limit on the number of subsumption mappings.
delete_used_equiv_tgt_class
bool
Whether to mark the target side of an equivalence mapping used for creating at least one subsumption mapping as \"deleted\". Defaults to True
.
src/deeponto/align/mapping.py
def __init__(\n self,\n src_onto: Ontology,\n tgt_onto: Ontology,\n equiv_mappings: List[ReferenceMapping],\n subs_generation_ratio: Optional[int] = None,\n delete_used_equiv_tgt_class: bool = True,\n):\n self.src_onto = src_onto\n self.tgt_onto = tgt_onto\n self.equiv_class_pairs = [m.to_tuple() for m in equiv_mappings]\n self.subs_generation_ratio = subs_generation_ratio\n self.delete_used_equiv_tgt_class = delete_used_equiv_tgt_class\n\n subs_from_equivs, self.used_equiv_tgt_class_iris = self.online_construction()\n # turn into triples with scores 1.0\n self.subs_from_equivs = [(c, p, 1.0) for c, p in subs_from_equivs]\n
"},{"location":"deeponto/align/mapping/#deeponto.align.mapping.SubsFromEquivMappingGenerator.online_construction","title":"online_construction()
","text":"An online algorithm for constructing subsumption mappings from gold standard equivalence mappings.
Let \\(t\\) denote the boolean value that indicates if the target class involved in an equivalence mapping will be deleted. If \\(t\\) is true, then for each equivalent class pair \\((c, c')\\), do the following:
Steps 1 and 2 ensure that target classes that have been involved in a subsumption mapping have no conflicts with target classes that have been used to create a subsumption mapping.
This algorithm is online because the construction and deletion depend on the order of the input equivalent class pairs.
Source code insrc/deeponto/align/mapping.py
def online_construction(self):\nr\"\"\"An **online** algorithm for constructing subsumption mappings from gold standard equivalence mappings.\n\n Let $t$ denote the boolean value that indicates if the target class involved in an equivalence mapping\n will be deleted. If $t$ is true, then for each equivalent class pair $(c, c')$, do the following:\n\n 1. If $c'$ has been inolved in a subsumption mapping, skip this pair as otherwise $c'$ will need to be deleted.\n 2. For each parent class of $c'$, skip it if it has been marked deleted (i.e., involved in an equivalence mapping that has been used to create a subsumption mapping).\n 3. If any subsumption mapping has been created from $(c, c')$, mark $c'$ as deleted.\n\n Steps 1 and 2 ensure that target classes that have been **involved in a subsumption mapping** have **no conflicts** with\n target classes that have been **used to create a subsumption mapping**.\n\n This algorithm is *online* because the construction and deletion depend on the order of the input equivalent class pairs.\n \"\"\"\n subs_class_pairs = []\n in_subs = defaultdict(lambda: False) # in a subsumption mapping\n used_equivs = defaultdict(lambda: False) # in a used equivalence mapping\n\n for src_class_iri, tgt_class_iri in self.equiv_class_pairs:\n\n cur_subs_pairs = []\n\n # NOTE (1) an equiv pair is skipped if the target side is marked constructed\n if self.delete_used_equiv_tgt_class and in_subs[tgt_class_iri]:\n continue\n\n # construct subsumption pairs by matching the source class and the target class's parents\n tgt_class = self.tgt_onto.get_owl_object(tgt_class_iri)\n # tgt_class_parent_iris = self.tgt_onto.reasoner.get_inferred_super_entities(tgt_class, direct=True)\n tgt_class_parent_iris = [str(p.getIRI()) for p in self.tgt_onto.get_asserted_parents(tgt_class, named_only=True)]\n for parent_iri in tgt_class_parent_iris:\n # skip this parent if it is marked as \"used\"\n if self.delete_used_equiv_tgt_class and used_equivs[parent_iri]:\n continue\n cur_subs_pairs.append((src_class_iri, parent_iri))\n # if successfully created, mark this parent as \"in\"\n if self.delete_used_equiv_tgt_class:\n in_subs[parent_iri] = True\n\n # mark the target class as \"used\" because it has been used for creating a subsumption mapping\n if self.delete_used_equiv_tgt_class and cur_subs_pairs:\n used_equivs[tgt_class_iri] = True\n\n if self.subs_generation_ratio and len(cur_subs_pairs) > self.subs_generation_ratio:\n cur_subs_pairs = random.sample(cur_subs_pairs, self.subs_generation_ratio)\n subs_class_pairs += cur_subs_pairs\n\n used_equiv_tgt_class_iris = None\n if self.delete_used_equiv_tgt_class:\n used_equiv_tgt_class_iris = [iri for iri, used in used_equivs.items() if used is True]\n logger.info(\n f\"{len(used_equiv_tgt_class_iris)}/{len(self.equiv_class_pairs)} are used for creating at least one subsumption mapping.\"\n )\n\n subs_class_pairs = uniqify(subs_class_pairs)\n logger.info(f\"{len(subs_class_pairs)} subsumption mappings are created in the end.\")\n\n return subs_class_pairs, used_equiv_tgt_class_iris\n
"},{"location":"deeponto/align/mapping/#deeponto.align.mapping.SubsFromEquivMappingGenerator.save_subs","title":"save_subs(save_path)
","text":"Save the constructed subsumption mappings (in tuples) to a local .tsv
file.
src/deeponto/align/mapping.py
def save_subs(self, save_path: str):\n\"\"\"Save the constructed subsumption mappings (in tuples) to a local `.tsv` file.\"\"\"\n subs_df = pd.DataFrame(self.subs_from_equivs, columns=[\"SrcEntity\", \"TgtEntity\", \"Score\"])\n subs_df.to_csv(save_path, sep=\"\\t\", index=False)\n
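A usage sketch (ontology and mapping file names are placeholders) of generating and saving subsumption mappings:
from deeponto.onto import Ontology
from deeponto.align.mapping import ReferenceMapping, SubsFromEquivMappingGenerator

src_onto = Ontology('src_onto.owl')
tgt_onto = Ontology('tgt_onto.owl')
equiv_maps = ReferenceMapping.read_table_mappings('equiv_maps.tsv')

subs_generator = SubsFromEquivMappingGenerator(
    src_onto, tgt_onto, equiv_maps,
    subs_generation_ratio=1,           # at most one subsumption mapping per equivalence mapping
    delete_used_equiv_tgt_class=True,  # mark used target classes as "deleted"
)
subs_generator.save_subs('subs_maps.tsv')  # columns: SrcEntity, TgtEntity, Score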
"},{"location":"deeponto/align/mapping/#deeponto.align.mapping.NegativeCandidateMappingGenerator","title":"NegativeCandidateMappingGenerator(src_onto, tgt_onto, reference_class_mappings, annotation_property_iris, tokenizer, max_hops=5, for_subsumption=False)
","text":"Generating negative candidate mappings for each gold standard mapping.
Note that the source side of the gold standard mapping is fixed, i.e., candidate mappings are generated according to the target side.
paper
The candidate mapping generation algorithm is proposed in the paper: Machine Learning-Friendly Biomedical Datasets for Equivalence and Subsumption Ontology Matching (ISWC 2022).
Source code insrc/deeponto/align/mapping.py
def __init__(\n self,\n src_onto: Ontology,\n tgt_onto: Ontology,\n reference_class_mappings: List[ReferenceMapping], # equivalence or subsumption\n annotation_property_iris: List[str], # for text-based candidates\n tokenizer: Tokenizer, # for text-based candidates\n max_hops: int = 5, # for graph-based candidates\n for_subsumption: bool = False, # if for subsumption, avoid adding ancestors as candidates\n):\n\n self.src_onto = src_onto\n self.tgt_onto = tgt_onto\n self.reference_class_mappings = reference_class_mappings\n self.reference_class_dict = defaultdict(list) # to prevent wrongly adding negative candidates\n for m in self.reference_class_mappings:\n src_class_iri, tgt_class_iri = m.to_tuple()\n self.reference_class_dict[src_class_iri].append(tgt_class_iri)\n\n # for IDF sample\n self.tgt_annotation_index, self.annotation_property_iris = self.tgt_onto.build_annotation_index(\n annotation_property_iris, apply_lowercasing=True\n )\n self.tokenizer = tokenizer\n self.tgt_inverted_annotation_index = self.tgt_onto.build_inverted_annotation_index(\n self.tgt_annotation_index, self.tokenizer\n )\n\n # for neighbour sample\n self.max_hops = max_hops\n\n # if for subsumption, avoid adding ancestors as candidates\n self.for_subsumption = for_subsumption\n # if for subsumption, add (src_class, tgt_class_ancestor) into the reference mappings\n if self.for_subsumption:\n for m in self.reference_class_mappings:\n src_class_iri, tgt_class_iri = m.to_tuple()\n tgt_class = self.tgt_onto.get_owl_object(tgt_class_iri)\n tgt_class_ancestors = self.tgt_onto.reasoner.get_inferred_super_entities(tgt_class)\n for tgt_ancestor_iri in tgt_class_ancestors:\n self.reference_class_dict[src_class_iri].append(tgt_ancestor_iri)\n
"},{"location":"deeponto/align/mapping/#deeponto.align.mapping.NegativeCandidateMappingGenerator.mixed_sample","title":"mixed_sample(reference_class_mapping, **strategy2nums)
","text":"A mixed sampling approach that combines several sampling strategies.
As introduced in the Bio-ML paper, this mixed approach guarantees that the number of samples for each strategy is either the maximum that can be sampled or the required number.
Specifically, at each sampling iteration, the number of candidates is first increased by the number of previously sampled candidates, since in the worst case all the candidates sampled at this iteration will duplicate the previous ones.
The random sampling is used as the amending strategy, i.e., if other sampling strategies cannot retrieve the specified number of samples, then use random sampling to amend the number.
Parameters:
Name Type Description Defaultreference_class_mapping
ReferenceMapping
The reference class mapping for generating the candidate mappings.
required**strategy2nums
int
The keyword arguments that specify the expected number of candidates for each sampling strategy.
{}
Source code in src/deeponto/align/mapping.py
def mixed_sample(self, reference_class_mapping: ReferenceMapping, **strategy2nums):\n\"\"\"A mixed sampling approach that combines several sampling strategies.\n\n As introduced in the Bio-ML paper, this mixed approach guarantees that the number of samples for each\n strategy is either the **maximum that can be sampled** or the required number.\n\n Specifically, at each sampling iteration, the number of candidates is **first increased by the number of \n previously sampled candidates**, as in the worst case, all the candidates sampled at this iteration\n will be duplicated with the previous. \n\n The random sampling is used as the amending strategy, i.e., if other sampling strategies cannot retrieve\n the specified number of samples, then use random sampling to amend the number.\n\n Args:\n reference_class_mapping (ReferenceMapping): The reference class mapping for generating the candidate mappings.\n **strategy2nums (int): The keyword arguments that specify the expected number of candidates for each\n sampling strategy.\n \"\"\"\n\n valid_tgt_candidate_iris = []\n sample_stats = defaultdict(lambda: 0)\n i = 0\n total_num_candidates = 0\n for strategy, num_canddiates in strategy2nums.items():\n i += 1\n if strategy in SAMPLING_OPTIONS:\n sampler = getattr(self, f\"{strategy}_sample\")\n # for ith iteration, the worst case is when all n_cands are duplicated\n # or should be excluded from other reference targets so we generate\n # NOTE: total_num_candidates + num_candidates + len(excluded_tgt_class_iris)\n # candidates first and prune the rest; another edge case is when sampled\n # candidates are not sufficient and we use random sample to meet n_cands\n cur_valid_tgt_candidate_iris = sampler(\n reference_class_mapping, total_num_candidates + num_canddiates\n )\n # remove the duplicated candidates (and excluded refs) and prune the tail\n cur_valid_tgt_candidate_iris = list(\n set(cur_valid_tgt_candidate_iris) - set(valid_tgt_candidate_iris)\n )[:num_canddiates]\n sample_stats[strategy] += len(cur_valid_tgt_candidate_iris)\n # use random samples for complementation if not enough\n while len(cur_valid_tgt_candidate_iris) < num_canddiates:\n amend_candidate_iris = self.random_sample(\n reference_class_mapping, num_canddiates - len(cur_valid_tgt_candidate_iris)\n )\n amend_candidate_iris = list(\n set(amend_candidate_iris)\n - set(valid_tgt_candidate_iris)\n - set(cur_valid_tgt_candidate_iris)\n )\n cur_valid_tgt_candidate_iris += amend_candidate_iris\n assert len(cur_valid_tgt_candidate_iris) == num_canddiates\n # record how many random samples to amend\n if strategy != \"random\":\n sample_stats[\"random\"] += num_canddiates - sample_stats[strategy]\n valid_tgt_candidate_iris += cur_valid_tgt_candidate_iris\n total_num_candidates += num_canddiates\n else:\n raise ValueError(f\"Invalid sampling trategy: {strategy}.\")\n assert len(valid_tgt_candidate_iris) == total_num_candidates\n\n # TODO: add the candidate mappings into the reference mapping \n\n return valid_tgt_candidate_iris, sample_stats\n
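An illustrative call (the per-strategy numbers are arbitrary, and generator stands for an already initialised NegativeCandidateMappingGenerator); the keyword names are assumed to match the samplers defined below (idf, neighbour, random):
for ref_map in generator.reference_class_mappings:
    # sample negative target candidates for this reference mapping
    candidate_iris, stats = generator.mixed_sample(ref_map, idf=50, neighbour=40, random=10)
    # `stats` records how many candidates each strategy actually contributed
    print(ref_map.to_tuple(), dict(stats))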
"},{"location":"deeponto/align/mapping/#deeponto.align.mapping.NegativeCandidateMappingGenerator.random_sample","title":"random_sample(reference_class_mapping, num_candidates)
","text":"Randomly sample a set of target class candidates \\(c'_{cand}\\) for a given reference mapping \\((c, c')\\).
The sampled candidate classes will be combined with the source reference class \\(c\\) to get a set of candidate mappings \\(\\{(c, c'_{cand})\\}\\).
Parameters:
Name Type Description Defaultreference_class_mapping
ReferenceMapping
The reference class mapping for generating the candidate mappings.
requirednum_candidates
int
The expected number of candidate mappings to generate.
required Source code insrc/deeponto/align/mapping.py
def random_sample(self, reference_class_mapping: ReferenceMapping, num_candidates: int):\nr\"\"\"**Randomly** sample a set of target class candidates $c'_{cand}$ for a given reference mapping $(c, c')$.\n\n The sampled candidate classes will be combined with the source reference class $c$ to get a set of\n candidate mappings $\\{(c, c'_{cand})\\}$.\n\n Args:\n reference_class_mapping (ReferenceMapping): The reference class mapping for generating the candidate mappings.\n num_candidates (int): The expected number of candidate mappings to generate.\n \"\"\"\n ref_src_class_iri, ref_tgt_class_iri = reference_class_mapping.to_tuple()\n all_tgt_class_iris = set(self.tgt_onto.owl_classes.keys())\n valid_tgt_class_iris = all_tgt_class_iris - set(\n self.reference_class_dict[ref_src_class_iri]\n ) # exclude gold standards\n assert not ref_tgt_class_iri in valid_tgt_class_iris\n return random.sample(valid_tgt_class_iris, num_candidates)\n
"},{"location":"deeponto/align/mapping/#deeponto.align.mapping.NegativeCandidateMappingGenerator.idf_sample","title":"idf_sample(reference_class_mapping, num_candidates)
","text":"Sample a set of target class candidates \\(c'_{cand}\\) for a given reference mapping \\((c, c')\\) based on the \\(idf\\) scores w.r.t. the inverted annotation index (sub-word level).
Candidate classes with higher \\(idf\\) scores will be considered first, and then combined with the source reference class \\(c\\) to get a set of candidate mappings \\(\\{(c, c'_{cand})\\}\\).
Parameters:
Name Type Description Defaultreference_class_mapping
ReferenceMapping
The reference class mapping for generating the candidate mappings.
requirednum_candidates
int
The expected number of candidate mappings to generate.
required Source code insrc/deeponto/align/mapping.py
def idf_sample(self, reference_class_mapping: ReferenceMapping, num_candidates: int):\nr\"\"\"Sample a set of target class candidates $c'_{cand}$ for a given reference mapping $(c, c')$ based on the $idf$ scores\n w.r.t. the inverted annotation index (sub-word level).\n\n Candidate classes with higher $idf$ scores will be considered first, and then combined with the source reference class $c$\n to get a set of candidate mappings $\\{(c, c'_{cand})\\}$.\n\n Args:\n reference_class_mapping (ReferenceMapping): The reference class mapping for generating the candidate mappings.\n num_candidates (int): The expected number of candidate mappings to generate.\n \"\"\"\n ref_src_class_iri, ref_tgt_class_iri = reference_class_mapping.to_tuple()\n\n tgt_candidates = self.tgt_inverted_annotation_index.idf_select(\n self.tgt_annotation_index[ref_tgt_class_iri]\n ) # select all non-trivial candidates first\n valid_tgt_class_iris = []\n for tgt_candidate_iri, _ in tgt_candidates:\n # valid as long as it is not one of the reference target\n if tgt_candidate_iri not in self.reference_class_dict[ref_src_class_iri]:\n valid_tgt_class_iris.append(tgt_candidate_iri)\n if len(valid_tgt_class_iris) == num_candidates:\n break\n assert not ref_tgt_class_iri in valid_tgt_class_iris\n return valid_tgt_class_iris\n
"},{"location":"deeponto/align/mapping/#deeponto.align.mapping.NegativeCandidateMappingGenerator.neighbour_sample","title":"neighbour_sample(reference_class_mapping, num_candidates)
","text":"Sample a set of target class candidates \\(c'_{cand}\\) for a given reference mapping \\((c, c')\\) based on the subsumption hierarchy.
Define one-hop as one edge derived from an asserted subsumption axiom, i.e., to the parent class or the child class. Candidate classes with nearer hops will be considered first, and then combined with the source reference class \(c\) to get a set of candidate mappings \(\{(c, c'_{cand})\}\).
Parameters:
Name Type Description Defaultreference_class_mapping
ReferenceMapping
The reference class mapping for generating the candidate mappings.
requirednum_candidates
int
The expected number of candidate mappings to generate.
required Source code insrc/deeponto/align/mapping.py
def neighbour_sample(self, reference_class_mapping: ReferenceMapping, num_candidates: int):\nr\"\"\"Sample a set of target class candidates $c'_{cand}$ for a given reference mapping $(c, c')$ based on the **subsumption\n hierarchy**.\n\n Define one-hop as one edge derived from an **asserted** subsumption axiom, i.e., to the parent class or the child class.\n Candidates classes with nearer hops will be considered first, and then combined with the source reference class $c$\n to get a set of candidate mappings $\\{(c, c'_{cand})\\}$.\n\n Args:\n reference_class_mapping (ReferenceMapping): The reference class mapping for generating the candidate mappings.\n num_candidates (int): The expected number of candidate mappings to generate.\n \"\"\"\n ref_src_class_iri, ref_tgt_class_iri = reference_class_mapping.to_tuple()\n\n valid_tgt_class_iris = set()\n cur_hop = 1\n frontier = [ref_tgt_class_iri]\n # extract from the nearest neighbours until enough candidates or max hop\n while len(valid_tgt_class_iris) < num_candidates and cur_hop <= self.max_hops:\n\n neighbours_of_cur_hop = []\n for tgt_class_iri in frontier:\n tgt_class = self.tgt_onto.get_owl_object(tgt_class_iri)\n parents = self.tgt_onto.reasoner.get_inferred_super_entities(tgt_class, direct=True)\n children = self.tgt_onto.reasoner.get_inferred_sub_entities(tgt_class, direct=True)\n neighbours_of_cur_hop += parents + children # used for further hop expansion\n\n valid_neighbours_of_cur_hop = set(neighbours_of_cur_hop) - set(self.reference_class_dict[ref_src_class_iri])\n # print(valid_neighbours_of_cur_hop)\n\n # NOTE if by adding neighbours of current hop the require number will be met\n # we randomly pick among them\n if len(valid_neighbours_of_cur_hop) > num_candidates - len(valid_tgt_class_iris):\n valid_neighbours_of_cur_hop = random.sample(\n valid_neighbours_of_cur_hop, num_candidates - len(valid_tgt_class_iris)\n )\n valid_tgt_class_iris.update(valid_neighbours_of_cur_hop)\n\n frontier = neighbours_of_cur_hop # update the frontier with all possible neighbors\n cur_hop += 1\n\n assert not ref_tgt_class_iri in valid_tgt_class_iris\n return list(valid_tgt_class_iris)\n
"},{"location":"deeponto/align/oaei/","title":"OAEI Utilities","text":"This page concerns utility functions used in the OAEI.
"},{"location":"deeponto/align/oaei/#deeponto.align.oaei.get_ignored_class_index","title":"get_ignored_class_index(onto)
","text":"Get an index for filtering classes that are marked as not used in alignment.
This is indicated by the special class annotation use_in_alignment
with the following IRI: http://oaei.ontologymatching.org/bio-ml/ann/use_in_alignment
src/deeponto/align/oaei.py
def get_ignored_class_index(onto: Ontology):\n\"\"\"Get an index for filtering classes that are marked as not used in alignment.\n\n This is indicated by the special class annotation `use_in_alignment` with the following IRI:\n http://oaei.ontologymatching.org/bio-ml/ann/use_in_alignment\n \"\"\"\n ignored_class_index = defaultdict(lambda: False)\n for class_iri, class_obj in onto.owl_classes.items():\n use_in_alignment = onto.get_annotations(\n class_obj, \"http://oaei.ontologymatching.org/bio-ml/ann/use_in_alignment\"\n )\n if use_in_alignment and str(use_in_alignment[0]).lower() == \"false\":\n ignored_class_index[class_iri] = True\n return ignored_class_index\n
"},{"location":"deeponto/align/oaei/#deeponto.align.oaei.remove_ignored_mappings","title":"remove_ignored_mappings(mappings, ignored_class_index)
","text":"Filter prediction mappings that involve classes to be ignored.
Source code insrc/deeponto/align/oaei.py
def remove_ignored_mappings(mappings: List[EntityMapping], ignored_class_index: dict):\n\"\"\"Filter prediction mappings that involve classes to be ignored.\"\"\"\n results = []\n for m in mappings:\n if ignored_class_index[m.head] or ignored_class_index[m.tail]:\n continue\n results.append(m)\n return results\n
"},{"location":"deeponto/align/oaei/#deeponto.align.oaei.matching_eval","title":"matching_eval(pred_maps_file, ref_maps_file, null_ref_maps_file=None, ignored_class_index=None, pred_maps_threshold=None)
","text":"Conduct global matching evaluation for the prediction mappings against the reference mappings.
The prediction mappings are formatted the same as full.tsv
(the full reference mappings), with three columns: \"SrcEntity\"
, \"TgtEntity\"
, and \"Score\"
, indicating the source class IRI, the target class IRI, and the corresponding mapping score.
An ignored_class_index
needs to be constructed for omitting prediction mappings that involve a class marked as not used in alignment.
Use the following code to obtain such index for both the source and target ontologies:
ignored_class_index = get_ignored_class_index(src_onto)\nignored_class_index.update(get_ignored_class_index(tgt_onto))\n
Source code in src/deeponto/align/oaei.py
def matching_eval(\n pred_maps_file: str,\n ref_maps_file: str,\n null_ref_maps_file: Optional[str] = None,\n ignored_class_index: Optional[dict] = None,\n pred_maps_threshold: Optional[float] = None,\n):\nr\"\"\"Conduct **global matching** evaluation for the prediction mappings against the\n reference mappings.\n\n The prediction mappings are formatted the same as `full.tsv` (the full reference mappings),\n with three columns: `\"SrcEntity\"`, `\"TgtEntity\"`, and `\"Score\"`, indicating the source\n class IRI, the target class IRI, and the corresponding mapping score.\n\n An `ignored_class_index` needs to be constructed for omitting prediction mappings\n that involve a class marked as **not used in alignment**.\n\n Use the following code to obtain such index for both the source and target ontologies:\n\n ```python\n ignored_class_index = get_ignored_class_index(src_onto)\n ignored_class_index.update(get_ignored_class_index(tgt_onto))\n ```\n \"\"\"\n refs = ReferenceMapping.read_table_mappings(ref_maps_file, relation=\"=\")\n preds = EntityMapping.read_table_mappings(pred_maps_file, relation=\"=\", threshold=pred_maps_threshold)\n if ignored_class_index:\n preds = remove_ignored_mappings(preds, ignored_class_index)\n null_refs = ReferenceMapping.read_table_mappings(null_ref_maps_file, relation=\"=\") if null_ref_maps_file else []\n results = AlignmentEvaluator.f1(preds, refs, null_reference_mappings=null_refs)\n return results\n
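A hedged end-to-end sketch of the global matching evaluation (all file names are placeholders, e.g., Bio-ML-style reference splits):
from deeponto.onto import Ontology
from deeponto.align.oaei import get_ignored_class_index, matching_eval

src_onto = Ontology('src_onto.owl')
tgt_onto = Ontology('tgt_onto.owl')
ignored_class_index = get_ignored_class_index(src_onto)
ignored_class_index.update(get_ignored_class_index(tgt_onto))

results = matching_eval(
    pred_maps_file='pred_maps.tsv',  # prediction mappings in the full.tsv format
    ref_maps_file='test.tsv',        # reference mappings to score against
    null_ref_maps_file='train.tsv',  # mappings ignored in scoring (neither positive nor negative)
    ignored_class_index=ignored_class_index,
)
print(results)  # {'P': ..., 'R': ..., 'F1': ...}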
"},{"location":"deeponto/align/oaei/#deeponto.align.oaei.read_candidate_mappings","title":"read_candidate_mappings(cand_maps_file, for_biollm=False, threshold=0.0)
","text":"Load scored or already ranked candidate mappings.
The predicted candidate mappings are formatted the same as test.cands.tsv
, with three columns: \"SrcEntity\"
, \"TgtEntity\"
, and \"TgtCandidates\"
, indicating the source reference class IRI, the target reference class IRI, and a list of tuples in the form of (target_candidate_class_IRI, score)
where score
is optional if the candidate mappings have been ranked. For the Bio-LLM special sub-track, \"TgtCandidates\"
refers to a list of triples in the form of (target_candidate_class_IRI, score, answer)
where the answer
is required for computing matching scores.
This method loads the candidate mappings in this format and parses them into the inputs of mean_reciprocal_rank
and hits_at_K
.
For Bio-LLM, the true prediction mappings and reference mappings will also be generated for the matching evaluation, i.e., the inputs of f1
.
src/deeponto/align/oaei.py
def read_candidate_mappings(cand_maps_file: str, for_biollm: bool = False, threshold: float = 0.0):\nr\"\"\"Load scored or already ranked candidate mappings.\n\n The predicted candidate mappings are formatted the same as `test.cands.tsv`, with three columns:\n `\"SrcEntity\"`, `\"TgtEntity\"`, and `\"TgtCandidates\"`, indicating the source reference class IRI, the\n target reference class IRI, and a list of **tuples** in the form of `(target_candidate_class_IRI, score)` where\n `score` is optional if the candidate mappings have been ranked. For the Bio-LLM special sub-track, `\"TgtCandidates\"`\n refers to a list of **triples** in the form of `(target_candidate_class_IRI, score, answer)` where the `answer` is\n required for computing matching scores.\n\n This method loads the candidate mappings in this format and parse them into the inputs of [`mean_reciprocal_rank`][deeponto.align.evaluation.AlignmentEvaluator.mean_reciprocal_rank]\n and [`hits_at_K`][[`mean_reciprocal_rank`][deeponto.align.evaluation.AlignmentEvaluator.hits_at_K].\n\n For Bio-LLM, the true prediction mappings and reference mappings will also be generated for the matching evaluation, i.e., the inputs of [`f1`][deeponto.align.evaluation.AlignmentEvaluator.f1].\n \"\"\"\n\n all_cand_maps = read_table(cand_maps_file).values.tolist()\n cands = []\n unmatched_cands = []\n preds = [] # only used for bio-llm\n refs = [] # only used for bio-llm\n\n for src_ref_class, tgt_ref_class, tgt_cands in all_cand_maps:\n ref_map = ReferenceMapping(src_ref_class, tgt_ref_class, \"=\")\n tgt_cands = eval(tgt_cands)\n has_score = True if all([not isinstance(x, str) for x in tgt_cands]) else False\n cand_maps = []\n refs.append(ref_map) if tgt_ref_class != \"UnMatched\" else None\n if for_biollm:\n for t, s, a in tgt_cands:\n m = EntityMapping(src_ref_class, t, \"=\", s)\n cand_maps.append(m)\n if a is True and s >= threshold: # only keep first one\n preds.append(m)\n elif has_score:\n cand_maps = [EntityMapping(src_ref_class, t, \"=\", s) for t, s in tgt_cands]\n else:\n warnings.warn(\"Input candidate mappings do not have a score, assume default rank in descending order.\")\n cand_maps = [\n EntityMapping(src_ref_class, t, \"=\", (len(tgt_cands) - i) / len(tgt_cands))\n for i, t in enumerate(tgt_cands)\n ]\n cand_maps = EntityMapping.sort_entity_mappings_by_score(cand_maps)\n if for_biollm and tgt_ref_class == \"UnMatched\":\n unmatched_cands.append((ref_map, cand_maps))\n else:\n cands.append((ref_map, cand_maps))\n\n if for_biollm:\n return cands, unmatched_cands, preds, refs\n else:\n return cands\n
"},{"location":"deeponto/align/oaei/#deeponto.align.oaei.ranking_result_file_check","title":"ranking_result_file_check(cand_maps_file, ref_cand_maps_file)
","text":"Check if the ranking result file is formatted correctly as the original test.cands.tsv
file provided in the dataset.
src/deeponto/align/oaei.py
def ranking_result_file_check(cand_maps_file: str, ref_cand_maps_file: str):\nr\"\"\"Check if the ranking result file is formatted correctly as the original\n `test.cands.tsv` file provided in the dataset.\n \"\"\"\n formatted_cand_maps = read_candidate_mappings(cand_maps_file)\n formatted_ref_cand_maps = read_candidate_mappings(ref_cand_maps_file)\n assert len(formatted_cand_maps) == len(\n formatted_ref_cand_maps\n ), f\"Mismatched number of reference mappings: {len(formatted_cand_maps)}; should be {len(formatted_ref_cand_maps)}.\"\n for i in range(len(formatted_cand_maps)):\n anchor, cands = formatted_cand_maps[i]\n ref_anchor, ref_cands = formatted_ref_cand_maps[i]\n assert (\n anchor.to_tuple() == ref_anchor.to_tuple()\n ), f\"Mismatched reference mapping: {anchor}; should be {ref_anchor}.\"\n cands = [c.to_tuple() for c in cands]\n ref_cands = [rc.to_tuple() for rc in ref_cands]\n assert not (\n set(cands) - set(ref_cands)\n ), f\"Mismatch set of candidate mappings for the reference mapping: {anchor}.\"\n
"},{"location":"deeponto/align/oaei/#deeponto.align.oaei.ranking_eval","title":"ranking_eval(cand_maps_file, Ks=[1, 5, 10])
","text":"Conduct local ranking evaluation for the scored or ranked candidate mappings.
See read_candidate_mappings
for the file format and loading.
src/deeponto/align/oaei.py
def ranking_eval(cand_maps_file: str, Ks=[1, 5, 10]):\nr\"\"\"Conduct **local ranking** evaluation for the scored or ranked candidate mappings.\n\n See [`read_candidate_mappings`][deeponto.align.oaei.read_candidate_mappings] for the file format and loading.\n \"\"\"\n formatted_cand_maps = read_candidate_mappings(cand_maps_file)\n results = {\"MRR\": AlignmentEvaluator.mean_reciprocal_rank(formatted_cand_maps)}\n for K in Ks:\n results[f\"Hits@{K}\"] = AlignmentEvaluator.hits_at_K(formatted_cand_maps, K=K)\n return results\n
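For example (the file name is a placeholder following the test.cands.tsv format):
from deeponto.align.oaei import ranking_eval

results = ranking_eval('scored.test.cands.tsv', Ks=[1, 5, 10])
print(results)  # {'MRR': ..., 'Hits@1': ..., 'Hits@5': ..., 'Hits@10': ...}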
"},{"location":"deeponto/align/oaei/#deeponto.align.oaei.is_rejection","title":"is_rejection(preds, cands)
","text":"A successful rejection means none of the candidate mappings are predicted as true mappings.
Source code insrc/deeponto/align/oaei.py
def is_rejection(preds: List[EntityMapping], cands: List[EntityMapping]):\n\"\"\"A successful rejection means none of the candidate mappings are predicted as true mappings.\"\"\"\n return set([p.to_tuple() for p in preds]).intersection(set([c.to_tuple() for c in cands])) == set()\n
"},{"location":"deeponto/align/oaei/#deeponto.align.oaei.biollm_eval","title":"biollm_eval(cand_maps_file, Ks=[1], threshold=0.0)
","text":"Conduct Bio-LLM evaluation for the Bio-LLM formatted candidate mappings.
See read_candidate_mappings
for the file format and loading.
src/deeponto/align/oaei.py
def biollm_eval(cand_maps_file, Ks=[1], threshold: float = 0.0):\nr\"\"\"Conduct Bio-LLM evaluation for the Bio-LLM formatted candidate mappings.\n\n See [`read_candidate_mappings`][deeponto.align.oaei.read_candidate_mappings] for the file format and loading.\n \"\"\"\n matched_cand_maps, unmatched_cand_maps, preds, refs = read_candidate_mappings(\n cand_maps_file, for_biollm=True, threshold=threshold\n )\n\n results = AlignmentEvaluator.f1(preds, refs)\n for K in Ks:\n results[f\"Hits@{K}\"] = AlignmentEvaluator.hits_at_K(matched_cand_maps, K=K)\n results[\"MRR\"] = AlignmentEvaluator.mean_reciprocal_rank(matched_cand_maps)\n rej = 0\n for _, cs in unmatched_cand_maps:\n rej += int(is_rejection(preds, cs))\n results[\"RR\"] = rej / len(unmatched_cand_maps)\n return results\n
"},{"location":"deeponto/align/bertmap/","title":"BERTMap","text":"Paper
\\(\\textsf{BERTMap}\\) is proposed in the paper: BERTMap: A BERT-based Ontology Alignment System (AAAI-2022).
@inproceedings{he2022bertmap,\n title={BERTMap: a BERT-based ontology alignment system},\n author={He, Yuan and Chen, Jiaoyan and Antonyrajah, Denvar and Horrocks, Ian},\n booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},\n volume={36},\n number={5},\n pages={5684--5691},\n year={2022}\n}\n
\\(\\textsf{BERTMap}\\) is a BERT-based ontology matching (OM) system consisting of following components:
\\(\\textsf{BERTMapLt}\\) is a light-weight version of \\(\\textsf{BERTMap}\\) without the BERT module and mapping refiner.
See the tutorial for \\(\\textsf{BERTMap}\\) here.
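A usage sketch following the tutorial (the ontology file names are placeholders; load_bertmap_config and DEFAULT_CONFIG_FILE are assumed to be available from deeponto.align.bertmap as in the tutorial):
from deeponto.onto import Ontology
from deeponto.align.bertmap import BERTMapPipeline, DEFAULT_CONFIG_FILE

config = BERTMapPipeline.load_bertmap_config(DEFAULT_CONFIG_FILE)
config.output_path = './bertmap_experiment'  # where corpora, checkpoints, and mappings are written
# config.model = 'bertmaplt'                 # switch to the light-weight variant if desired

src_onto = Ontology('src_onto.owl')
tgt_onto = Ontology('tgt_onto.owl')
BERTMapPipeline(src_onto, tgt_onto, config)  # runs fine-tuning and global matching as configured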
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.pipeline.BERTMapPipeline","title":"BERTMapPipeline(src_onto, tgt_onto, config)
","text":"Class for the whole ontology alignment pipeline of \\(\\textsf{BERTMap}\\) and \\(\\textsf{BERTMapLt}\\) models.
Note
Parameters related to BERT training are None
by default. They will be constructed for \\(\\textsf{BERTMap}\\) and stay as None
for \\(\\textsf{BERTMapLt}\\).
Attributes:
Name Type Descriptionconfig
CfgNode
The configuration for BERTMap or BERTMapLt.
name
str
The name of the model, either bertmap
or bertmaplt
.
output_path
str
The path to the output directory.
src_onto
Ontology
The source ontology to be matched.
tgt_onto
Ontology
The target ontology to be matched.
annotation_property_iris
List[str]
The annotation property IRIs used for extracting synonyms and nonsynonyms.
src_annotation_index
dict
A dictionary that stores the (class_iri, class_annotations)
pairs from src_onto
according to annotation_property_iris
.
tgt_annotation_index
dict
A dictionary that stores the (class_iri, class_annotations)
pairs from tgt_onto
according to annotation_property_iris
.
known_mappings
List[ReferenceMapping]
List of known mappings for constructing the cross-ontology corpus.
auxiliary_ontos
List[Ontology]
List of auxiliary ontologies for constructing any auxiliary corpus.
corpora
dict
A dictionary that stores the summary
of built text semantics corpora and the sampled synonyms
and nonsynonyms
.
finetune_data
dict
A dictionary that stores the training
and validation
splits of samples from corpora
.
bert
BERTSynonymClassifier
A BERT model for synonym classification and mapping prediction.
best_checkpoint
str
The path to the best BERT checkpoint which will be loaded after training.
mapping_predictor
MappingPredictor
The predictor function based on class annotations, used for global matching or mapping scoring.
Parameters:
Name Type Description Defaultsrc_onto
Ontology
The source ontology for alignment.
requiredtgt_onto
Ontology
The target ontology for alignment.
requiredconfig
CfgNode
The configuration for BERTMap or BERTMapLt.
required Source code insrc/deeponto/align/bertmap/pipeline.py
def __init__(self, src_onto: Ontology, tgt_onto: Ontology, config: CfgNode):\n\"\"\"Initialise the BERTMap or BERTMapLt model.\n\n Args:\n src_onto (Ontology): The source ontology for alignment.\n tgt_onto (Ontology): The target ontology for alignment.\n config (CfgNode): The configuration for BERTMap or BERTMapLt.\n \"\"\"\n # load the configuration and confirm model name is valid\n self.config = config\n self.name = self.config.model\n if not self.name in MODEL_OPTIONS.keys():\n raise RuntimeError(f\"`model` {self.name} in the config file is not one of the supported.\")\n\n # create the output directory, e.g., experiments/bertmap\n self.config.output_path = \".\" if not self.config.output_path else self.config.output_path\n self.config.output_path = os.path.abspath(self.config.output_path)\n self.output_path = os.path.join(self.config.output_path, self.name)\n create_path(self.output_path)\n\n # create logger and progress manager (hidden attribute) \n self.logger = create_logger(self.name, self.output_path)\n self.enlighten_manager = enlighten.get_manager()\n\n # ontology\n self.src_onto = src_onto\n self.tgt_onto = tgt_onto\n self.annotation_property_iris = self.config.annotation_property_iris\n self.logger.info(f\"Load the following configurations:\\n{print_dict(self.config)}\")\n config_path = os.path.join(self.output_path, \"config.yaml\")\n self.logger.info(f\"Save the configuration file at {config_path}.\")\n self.save_bertmap_config(self.config, config_path)\n\n # build the annotation thesaurus\n self.src_annotation_index, _ = self.src_onto.build_annotation_index(self.annotation_property_iris, apply_lowercasing=True)\n self.tgt_annotation_index, _ = self.tgt_onto.build_annotation_index(self.annotation_property_iris, apply_lowercasing=True)\n if (not self.src_annotation_index) or (not self.tgt_annotation_index):\n raise RuntimeError(\"No class annotations found in input ontologies; unable to produce alignment.\")\n\n # provided mappings if any\n self.known_mappings = self.config.known_mappings\n if self.known_mappings:\n self.known_mappings = ReferenceMapping.read_table_mappings(self.known_mappings)\n\n # auxiliary ontologies if any\n self.auxiliary_ontos = self.config.auxiliary_ontos\n if self.auxiliary_ontos:\n self.auxiliary_ontos = [Ontology(ao) for ao in self.auxiliary_ontos]\n\n self.data_path = os.path.join(self.output_path, \"data\")\n # load or construct the corpora\n self.corpora_path = os.path.join(self.data_path, \"text-semantics.corpora.json\")\n self.corpora = self.load_text_semantics_corpora()\n\n # load or construct fine-tune data\n self.finetune_data_path = os.path.join(self.data_path, \"fine-tune.data.json\")\n self.finetune_data = self.load_finetune_data()\n\n # load the bert model and train\n self.bert_config = self.config.bert\n self.bert_pretrained_path = self.bert_config.pretrained_path\n self.bert_finetuned_path = os.path.join(self.output_path, \"bert\")\n self.bert_resume_training = self.bert_config.resume_training\n self.bert_synonym_classifier = None\n self.best_checkpoint = None\n if self.name == \"bertmap\":\n self.bert_synonym_classifier = self.load_bert_synonym_classifier()\n # train if the loaded classifier is not in eval mode\n if self.bert_synonym_classifier.eval_mode == False:\n self.logger.info(\n f\"Data statistics:\\n \\\n{print_dict(self.bert_synonym_classifier.data_stat)}\"\n )\n self.bert_synonym_classifier.train(self.bert_resume_training)\n # turn on eval mode after training\n self.bert_synonym_classifier.eval()\n # NOTE potential 
redundancy here: after training, load the best checkpoint\n self.best_checkpoint = self.load_best_checkpoint()\n if not self.best_checkpoint:\n raise RuntimeError(f\"No best checkpoint found for the BERT synonym classifier model.\")\n self.logger.info(f\"Fine-tuning finished, found best checkpoint at {self.best_checkpoint}.\")\n else:\n self.logger.info(f\"No training needed; skip BERT fine-tuning.\")\n\n # pretty progress bar tracking\n self.enlighten_status = self.enlighten_manager.status_bar(\n status_format=u'Global Matching{fill}Stage: {demo}{fill}{elapsed}',\n color='bold_underline_bright_white_on_lightslategray',\n justify=enlighten.Justify.CENTER, demo='Initializing',\n autorefresh=True, min_delta=0.5\n )\n\n # mapping predictions\n self.global_matching_config = self.config.global_matching\n\n # build ignored class index for OAEI\n self.ignored_class_index = None \n if self.global_matching_config.for_oaei:\n self.ignored_class_index = defaultdict(lambda: False)\n for src_class_iri, src_class in self.src_onto.owl_classes.items():\n use_in_alignment = self.src_onto.get_annotations(src_class, \"http://oaei.ontologymatching.org/bio-ml/ann/use_in_alignment\")\n if use_in_alignment and str(use_in_alignment[0]).lower() == \"false\":\n self.ignored_class_index[src_class_iri] = True\n for tgt_class_iri, tgt_class in self.tgt_onto.owl_classes.items():\n use_in_alignment = self.tgt_onto.get_annotations(tgt_class, \"http://oaei.ontologymatching.org/bio-ml/ann/use_in_alignment\")\n if use_in_alignment and str(use_in_alignment[0]).lower() == \"false\":\n self.ignored_class_index[tgt_class_iri] = True\n\n self.mapping_predictor = MappingPredictor(\n output_path=self.output_path,\n tokenizer_path=self.bert_config.pretrained_path,\n src_annotation_index=self.src_annotation_index,\n tgt_annotation_index=self.tgt_annotation_index,\n bert_synonym_classifier=self.bert_synonym_classifier,\n num_raw_candidates=self.global_matching_config.num_raw_candidates,\n num_best_predictions=self.global_matching_config.num_best_predictions,\n batch_size_for_prediction=self.bert_config.batch_size_for_prediction,\n logger=self.logger,\n enlighten_manager=self.enlighten_manager,\n enlighten_status=self.enlighten_status,\n ignored_class_index=self.ignored_class_index,\n )\n self.mapping_refiner = None\n\n # if global matching is disabled (potentially used for class pair scoring)\n if self.config.global_matching.enabled:\n self.mapping_predictor.mapping_prediction() # mapping prediction\n if self.name == \"bertmap\":\n self.mapping_refiner = MappingRefiner(\n output_path=self.output_path,\n src_onto=self.src_onto,\n tgt_onto=self.tgt_onto,\n mapping_predictor=self.mapping_predictor,\n mapping_extension_threshold=self.global_matching_config.mapping_extension_threshold,\n mapping_filtered_threshold=self.global_matching_config.mapping_filtered_threshold,\n logger=self.logger,\n enlighten_manager=self.enlighten_manager,\n enlighten_status=self.enlighten_status\n )\n self.mapping_refiner.mapping_extension() # mapping extension\n self.mapping_refiner.mapping_repair() # mapping repair\n self.enlighten_status.update(demo=\"Finished\") \n else:\n self.enlighten_status.update(demo=\"Skipped\") \n\n self.enlighten_status.close()\n
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.pipeline.BERTMapPipeline.load_or_construct","title":"load_or_construct(data_file, data_name, construct_func, *args, **kwargs)
","text":"Load existing data or construct a new one.
An auxiliary function that checks the existence of a data file and loads it if it exists. Otherwise, it constructs new data with the input construct_func
, which is expected to generate a local data file.
src/deeponto/align/bertmap/pipeline.py
def load_or_construct(self, data_file: str, data_name: str, construct_func: Callable, *args, **kwargs):\n\"\"\"Load existing data or construct a new one.\n\n An auxlirary function that checks the existence of a data file and loads it if it exists.\n Otherwise, construct new data with the input `construct_func` which is supported generate\n a local data file.\n \"\"\"\n if os.path.exists(data_file):\n self.logger.info(f\"Load existing {data_name} from {data_file}.\")\n else:\n self.logger.info(f\"Construct new {data_name} and save at {data_file}.\")\n construct_func(*args, **kwargs)\n # load the data file that is supposed to be saved locally\n return load_file(data_file)\n
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.pipeline.BERTMapPipeline.load_text_semantics_corpora","title":"load_text_semantics_corpora()
","text":"Load or construct text semantics corpora.
See TextSemanticsCorpora
.
src/deeponto/align/bertmap/pipeline.py
def load_text_semantics_corpora(self):\n\"\"\"Load or construct text semantics corpora.\n\n See [`TextSemanticsCorpora`][deeponto.align.bertmap.text_semantics.TextSemanticsCorpora].\n \"\"\"\n data_name = \"text semantics corpora\"\n\n if self.name == \"bertmap\":\n\n def construct():\n corpora = TextSemanticsCorpora(\n src_onto=self.src_onto,\n tgt_onto=self.tgt_onto,\n annotation_property_iris=self.annotation_property_iris,\n class_mappings=self.known_mappings,\n auxiliary_ontos=self.auxiliary_ontos,\n )\n self.logger.info(str(corpora))\n corpora.save(self.data_path)\n\n return self.load_or_construct(self.corpora_path, data_name, construct)\n\n self.logger.info(f\"No training needed; skip the construction of {data_name}.\")\n return None\n
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.pipeline.BERTMapPipeline.load_finetune_data","title":"load_finetune_data()
","text":"Load or construct fine-tuning data from text semantics corpora.
Steps of constructing fine-tuning data from text semantics corpora: (1) mix the synonym and non-synonym samples; (2) randomly sample 90% as training data and 10% as validation data.
src/deeponto/align/bertmap/pipeline.py
def load_finetune_data(self):\nr\"\"\"Load or construct fine-tuning data from text semantics corpora.\n\n Steps of constructing fine-tuning data from text semantics:\n\n 1. Mix synonym and nonsynonym data.\n 2. Randomly sample 90% as training samples and 10% as validation.\n \"\"\"\n data_name = \"fine-tuning data\"\n\n if self.name == \"bertmap\":\n\n def construct():\n finetune_data = dict()\n samples = self.corpora[\"synonyms\"] + self.corpora[\"nonsynonyms\"]\n random.shuffle(samples)\n split_index = int(0.9 * len(samples)) # split at 90%\n finetune_data[\"training\"] = samples[:split_index]\n finetune_data[\"validation\"] = samples[split_index:]\n save_file(finetune_data, self.finetune_data_path)\n\n return self.load_or_construct(self.finetune_data_path, data_name, construct)\n\n self.logger.info(f\"No training needed; skip the construction of {data_name}.\")\n return None\n
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.pipeline.BERTMapPipeline.load_bert_synonym_classifier","title":"load_bert_synonym_classifier()
","text":"Load the BERT model from a pre-trained or a local checkpoint.
If loaded from pre-trained, it means to start training from a pre-trained model such as bert-uncased.
If loaded from local, turn on the eval mode for mapping predictions.
If self.bert_resume_training is True
, it will be loaded from the latest saved checkpoint.src/deeponto/align/bertmap/pipeline.py
def load_bert_synonym_classifier(self):\n\"\"\"Load the BERT model from a pre-trained or a local checkpoint.\n\n - If loaded from pre-trained, it means to start training from a pre-trained model such as `bert-uncased`.\n - If loaded from local, turn on the `eval` mode for mapping predictions.\n - If `self.bert_resume_training` is `True`, it will be loaded from the latest saved checkpoint.\n \"\"\"\n checkpoint = self.load_best_checkpoint() # load the best checkpoint or nothing\n eval_mode = True\n # if no checkpoint has been found, start training from scratch OR resume training\n # no point to load the best checkpoint if resume training (will automatically search for the latest checkpoint)\n if not checkpoint or self.bert_resume_training:\n checkpoint = self.bert_pretrained_path\n eval_mode = False # since it is for training now\n\n return BERTSynonymClassifier(\n loaded_path=checkpoint,\n output_path=self.bert_finetuned_path,\n eval_mode=eval_mode,\n max_length_for_input=self.bert_config.max_length_for_input,\n num_epochs_for_training=self.bert_config.num_epochs_for_training,\n batch_size_for_training=self.bert_config.batch_size_for_training,\n batch_size_for_prediction=self.bert_config.batch_size_for_prediction,\n training_data=self.finetune_data[\"training\"],\n validation_data=self.finetune_data[\"validation\"],\n )\n
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.pipeline.BERTMapPipeline.load_best_checkpoint","title":"load_best_checkpoint()
","text":"Find the best checkpoint by searching for trainer states in each checkpoint file.
Source code insrc/deeponto/align/bertmap/pipeline.py
def load_best_checkpoint(self) -> Optional[str]:\n\"\"\"Find the best checkpoint by searching for trainer states in each checkpoint file.\"\"\"\n best_checkpoint = -1\n\n if os.path.exists(self.bert_finetuned_path):\n for file in os.listdir(self.bert_finetuned_path):\n # load trainer states from each checkpoint file\n if file.startswith(\"checkpoint\"):\n trainer_state = load_file(\n os.path.join(self.bert_finetuned_path, file, \"trainer_state.json\")\n )\n checkpoint = int(trainer_state[\"best_model_checkpoint\"].split(\"/\")[-1].split(\"-\")[-1])\n # find the latest best checkpoint\n if checkpoint > best_checkpoint:\n best_checkpoint = checkpoint\n\n if best_checkpoint == -1:\n best_checkpoint = None\n else:\n best_checkpoint = os.path.join(self.bert_finetuned_path, f\"checkpoint-{best_checkpoint}\")\n\n return best_checkpoint\n
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.pipeline.BERTMapPipeline.load_bertmap_config","title":"load_bertmap_config(config_file=None)
staticmethod
","text":"Load the BERTMap configuration in .yaml
. If the file is not provided, use the default configuration.
src/deeponto/align/bertmap/pipeline.py
@staticmethod\ndef load_bertmap_config(config_file: Optional[str] = None):\n\"\"\"Load the BERTMap configuration in `.yaml`. If the file\n is not provided, use the default configuration.\n \"\"\"\n if not config_file:\n config_file = DEFAULT_CONFIG_FILE\n print(f\"Use the default configuration at {DEFAULT_CONFIG_FILE}.\") \n if not config_file.endswith(\".yaml\"):\n raise RuntimeError(\"Configuration file should be in `yaml` format.\")\n return CfgNode(load_file(config_file))\n
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.pipeline.BERTMapPipeline.save_bertmap_config","title":"save_bertmap_config(config, config_file)
staticmethod
","text":"Save the BERTMap configuration in .yaml
.
src/deeponto/align/bertmap/pipeline.py
@staticmethod\ndef save_bertmap_config(config: CfgNode, config_file: str):\n\"\"\"Save the BERTMap configuration in `.yaml`.\"\"\"\n with open(config_file, \"w\") as c:\n config.dump(stream=c, sort_keys=False, default_flow_style=False)\n
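A short sketch of the typical configuration round-trip (the paths are placeholders, and the overridden fields follow the names referenced in the pipeline source above): load the default configuration, adjust a few fields, and save the result for reproducibility.

from deeponto.align.bertmap import BERTMapPipeline

# load the default configuration shipped with DeepOnto
config = BERTMapPipeline.load_bertmap_config()

# override a few fields before running the pipeline (CfgNode supports attribute access)
config.output_path = "./bertmap_experiment"            # placeholder output directory
config.bert.num_epochs_for_training = 1.0              # shorter fine-tuning for a quick test
config.global_matching.num_best_predictions = 5

# persist the modified configuration for the record
BERTMapPipeline.save_bertmap_config(config, "./bertmap_config.yaml")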
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.text_semantics.AnnotationThesaurus","title":"AnnotationThesaurus(onto, annotation_property_iris, apply_transitivity=False)
","text":"A thesaurus class for synonyms and non-synonyms extracted from an ontology.
Some related definitions of arguments here:
synonym_group
is a set of annotation phrases that are synonymous to each other;transitivity
of synonyms means if A and B are synonymous and B and C are synonymous, then A and C are synonymous. This is achieved by a connected graph-based algorithm.synonym_pair
is a pair of synonymous annotation phrases which can be extracted from the cartesian product of a synonym_group
and itself. NOTE that reflexivity and symmetry are preserved meaning that (i) every phrase A is a synonym of itself and (ii) if (A, B) is a synonym pair then (B, A) is a synonym pair, too.Attributes:
Name Type Descriptiononto
Ontology
An ontology to construct the annotation thesaurus from.
annotation_index
Dict[str, Set[str]]
An index of the class annotations with (class_iri, annotations)
pairs.
annotation_property_iris
List[str]
A list of annotation property IRIs used to extract the annotations.
average_number_of_annotations_per_class
int
The average number of (extracted) annotations per ontology class.
apply_transitivity
bool
Apply synonym transitivity to merge synonym groups or not.
synonym_groups
List[Set[str]]
The list of synonym groups extracted from the ontology according to specified annotation properties.
Parameters:
Name Type Description Defaultonto
Ontology
The input ontology to extract annotations from.
requiredannotation_property_iris
List[str]
Specify which annotation properties to be used.
requiredapply_transitivity
bool
Apply synonym transitivity to merge synonym groups or not. Defaults to False
.
False
Source code in src/deeponto/align/bertmap/text_semantics.py
def __init__(self, onto: Ontology, annotation_property_iris: List[str], apply_transitivity: bool = False):\nr\"\"\"Initialise a thesaurus for ontology class annotations.\n\n Args:\n onto (Ontology): The input ontology to extract annotations from.\n annotation_property_iris (List[str]): Specify which annotation properties to be used.\n apply_transitivity (bool, optional): Apply synonym transitivity to merge synonym groups or not. Defaults to `False`.\n \"\"\"\n\n self.onto = onto\n # build the annotation index to extract synonyms from `onto`\n # the input property iris may not exist in this ontology\n # the output property iris will be truncated to the existing ones\n index, iris = self.onto.build_annotation_index(\n annotation_property_iris=annotation_property_iris,\n entity_type=\"Classes\",\n apply_lowercasing=True,\n )\n self.annotation_index = index\n self.annotation_property_iris = iris\n total_number_of_annotations = sum([len(v) for v in self.annotation_index.values()])\n self.average_number_of_annotations_per_class = total_number_of_annotations / len(self.annotation_index)\n\n # synonym groups\n self.apply_transitivity = apply_transitivity\n self.synonym_groups = list(self.annotation_index.values())\n if self.apply_transitivity:\n self.synonym_groups = self.merge_synonym_groups_by_transitivity(self.synonym_groups)\n\n # summary\n self.info = {\n type(self).__name__: {\n \"ontology\": self.onto.info[type(self.onto).__name__],\n \"average_number_of_annotations_per_class\": round(self.average_number_of_annotations_per_class, 3),\n \"number_of_synonym_groups\": len(self.synonym_groups),\n }\n }\n
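A brief usage sketch (the ontology file path and the annotation property IRI are placeholders) for building a thesaurus and inspecting the extracted synonym groups:

from deeponto.onto import Ontology
from deeponto.align.bertmap.text_semantics import AnnotationThesaurus

onto = Ontology("path/to/ontology.owl")  # placeholder ontology file
thesaurus = AnnotationThesaurus(
    onto,
    annotation_property_iris=["http://www.w3.org/2000/01/rdf-schema#label"],
    apply_transitivity=False,
)
print(thesaurus.info)                 # summary statistics of the thesaurus
print(len(thesaurus.synonym_groups))  # one synonym group per class when transitivity is off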
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.text_semantics.AnnotationThesaurus.get_synonym_pairs","title":"get_synonym_pairs(synonym_group, remove_duplicates=True)
staticmethod
","text":"Get synonym pairs from a synonym group through a cartesian product.
Parameters:
Name Type Description Defaultsynonym_group
Set[str]
A set of annotation phrases that are synonymous to each other.
requiredReturns:
Type DescriptionList[Tuple[str, str]]
A list of synonym pairs.
Source code insrc/deeponto/align/bertmap/text_semantics.py
@staticmethod\ndef get_synonym_pairs(synonym_group: Set[str], remove_duplicates: bool = True):\n\"\"\"Get synonym pairs from a synonym group through a cartesian product.\n\n Args:\n synonym_group (Set[str]): A set of annotation phrases that are synonymous to each other.\n\n Returns:\n (List[Tuple[str, str]]): A list of synonym pairs.\n \"\"\"\n synonym_pairs = list(itertools.product(synonym_group, synonym_group))\n if remove_duplicates:\n return uniqify(synonym_pairs)\n else:\n return synonym_pairs\n
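As a quick illustration, a synonym group of size n yields n² ordered pairs from the cartesian product with itself (including reflexive pairs), before any de-duplication:

from deeponto.align.bertmap.text_semantics import AnnotationThesaurus

pairs = AnnotationThesaurus.get_synonym_pairs({"hand", "manus"})
# 2 * 2 = 4 ordered pairs, including the reflexive ("hand", "hand") and ("manus", "manus")
print(len(pairs))  # 4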
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.text_semantics.AnnotationThesaurus.merge_synonym_groups_by_transitivity","title":"merge_synonym_groups_by_transitivity(synonym_groups)
staticmethod
","text":"Merge synonym groups by transitivity.
Synonym groups that share a common annotation phrase will be merged. NOTE that for multiple ontologies, we can merge their synonym groups by first concatenating them and then applying this function.
Note
In \\(\\textsf{BERTMap}\\) experiments we have considered this as a data augmentation approach but it does not bring a significant performance improvement. However, if the overall number of annotations is not large enough then this could be a good option.
Parameters:
Name Type Description Defaultsynonym_groups
List[Set[str]]
A sequence of synonym groups to be merged.
requiredReturns:
Type DescriptionList[Set[str]]
A list of merged synonym groups.
Source code insrc/deeponto/align/bertmap/text_semantics.py
@staticmethod\ndef merge_synonym_groups_by_transitivity(synonym_groups: List[Set[str]]):\nr\"\"\"Merge synonym groups by transitivity.\n\n Synonym groups that share a common annotation phrase will be merged. NOTE that for\n multiple ontologies, we can merge their synonym groups by first concatenating them\n then use this function.\n\n !!! note\n\n In $\\textsf{BERTMap}$ experiments we have considered this as a data augmentation approach\n but it does not bring a significant performance improvement. However, if the\n overall number of annotations is not large enough then this could be a good option.\n\n Args:\n synonym_groups (List[Set[str]]): A sequence of synonym groups to be merged.\n\n Returns:\n (List[Set[str]]): A list of merged synonym groups.\n \"\"\"\n synonym_pairs = []\n for synonym_group in synonym_groups:\n # gather synonym pairs from the self-product of a synonym group\n synonym_pairs += AnnotationThesaurus.get_synonym_pairs(synonym_group, remove_duplicates=False)\n synonym_pairs = uniqify(synonym_pairs)\n merged_grouped_synonyms = AnnotationThesaurus.connected_labels(synonym_pairs)\n return merged_grouped_synonyms\n
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.text_semantics.AnnotationThesaurus.connected_annotations","title":"connected_annotations(synonym_pairs)
staticmethod
","text":"Build a graph for adjacency among the class annotations (labels) such that the transitivity of synonyms is ensured.
Auxiliary function for merge_synonym_groups_by_transitivity
.
Parameters:
Name Type Description Defaultsynonym_pairs
List[Tuple[str, str]]
List of pairs of phrases that are synonymous.
requiredReturns:
Type DescriptionList[Set[str]]
A list of synonym groups.
Source code insrc/deeponto/align/bertmap/text_semantics.py
@staticmethod\ndef connected_annotations(synonym_pairs: List[Tuple[str, str]]):\n\"\"\"Build a graph for adjacency among the class annotations (labels) such that\n the **transitivity** of synonyms is ensured.\n\n Auxiliary function for [`merge_synonym_groups_by_transitivity`][deeponto.align.bertmap.text_semantics.AnnotationThesaurus.merge_synonym_groups_by_transitivity].\n\n Args:\n synonym_pairs (List[Tuple[str, str]]): List of pairs of phrases that are synonymous.\n\n Returns:\n (List[Set[str]]): A list of synonym groups.\n \"\"\"\n graph = nx.Graph()\n graph.add_edges_from(synonym_pairs)\n # nx.draw(G, with_labels = True)\n connected = list(nx.connected_components(graph))\n return connected\n
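A toy example of the transitivity construction: two pairs that share the phrase "muscle cell" collapse into a single connected component.

from deeponto.align.bertmap.text_semantics import AnnotationThesaurus

pairs = [("myocyte", "muscle cell"), ("muscle cell", "muscle fibre")]
groups = AnnotationThesaurus.connected_annotations(pairs)
print(groups)  # [{"myocyte", "muscle cell", "muscle fibre"}]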
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.text_semantics.AnnotationThesaurus.synonym_sampling","title":"synonym_sampling(num_samples=None)
","text":"Sample synonym pairs from a list of synonym groups extracted from the input ontology.
According to the \\(\\textsf{BERTMap}\\) paper, synonyms are defined as label pairs that belong to the same ontology class.
NOTE this has been validated for getting the same results as in the original \\(\\textsf{BERTMap}\\) repository.
Parameters:
Name Type Description Defaultnum_samples
int
The (maximum) number of unique samples extracted. Defaults to None
.
None
Returns:
Type DescriptionList[Tuple[str, str]]
A list of unique synonym pair samples.
Source code insrc/deeponto/align/bertmap/text_semantics.py
def synonym_sampling(self, num_samples: Optional[int] = None):\nr\"\"\"Sample synonym pairs from a list of synonym groups extracted from the input ontology.\n\n According to the $\\textsf{BERTMap}$ paper, **synonyms** are defined as label pairs that belong\n to the same ontology class.\n\n NOTE this has been validated for getting the same results as in the original $\\textsf{BERTMap}$ repository.\n\n Args:\n num_samples (int, optional): The (maximum) number of **unique** samples extracted. Defaults to `None`.\n\n Returns:\n (List[Tuple[str, str]]): A list of unique synonym pair samples.\n \"\"\"\n synonym_pool = []\n for synonym_group in self.synonym_groups:\n # do not remove duplicates in the loop to save time\n synonym_pairs = self.get_synonym_pairs(synonym_group, remove_duplicates=False)\n synonym_pool += synonym_pairs\n # remove duplicates afer the loop\n synonym_pool = uniqify(synonym_pool)\n\n if (not num_samples) or (num_samples >= len(synonym_pool)):\n # print(\"Return all synonym pairs without downsampling.\")\n return synonym_pool\n else:\n return random.sample(synonym_pool, num_samples)\n
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.text_semantics.AnnotationThesaurus.soft_nonsynonym_sampling","title":"soft_nonsynonym_sampling(num_samples, max_iter=5)
","text":"Sample soft non-synonyms from a list of synonym groups extracted from the input ontology.
According to the \\(\\textsf{BERTMap}\\) paper, soft non-synonyms are defined as label pairs from two different synonym groups that are randomly selected.
Parameters:
Name Type Description Defaultnum_samples
int
The (maximum) number of unique samples extracted; this is required unlike for synonym sampling because the non-synonym pool is significantly larger (considering random combinations of different synonym groups).
requiredmax_iter
int
The maximum number of iterations for conducting sampling. Defaults to 5
.
5
Returns:
Type DescriptionList[Tuple[str, str]]
A list of unique (soft) non-synonym pair samples.
Source code insrc/deeponto/align/bertmap/text_semantics.py
def soft_nonsynonym_sampling(self, num_samples: int, max_iter: int = 5):\nr\"\"\"Sample **soft** non-synonyms from a list of synonym groups extracted from the input ontology.\n\n According to the $\\textsf{BERTMap}$ paper, **soft non-synonyms** are defined as label pairs\n from two *different* synonym groups that are **randomly** selected.\n\n Args:\n num_samples (int): The (maximum) number of **unique** samples extracted; this is\n required **unlike for synonym sampling** because the non-synonym pool is **significantly\n larger** (considering random combinations of different synonym groups).\n max_iter (int): The maximum number of iterations for conducting sampling. Defaults to `5`.\n\n Returns:\n (List[Tuple[str, str]]): A list of unique (soft) non-synonym pair samples.\n \"\"\"\n nonsyonym_pool = []\n # randomly select disjoint synonym group pairs from all\n for _ in range(num_samples):\n left_synonym_group, right_synonym_group = tuple(random.sample(self.synonym_groups, 2))\n try:\n # randomly choose one label from a synonym group\n left_label = random.choice(list(left_synonym_group))\n right_label = random.choice(list(right_synonym_group))\n nonsyonym_pool.append((left_label, right_label))\n except:\n # skip if there are no class labels\n continue\n\n # DataUtils.uniqify is too slow so we should avoid operating it too often\n nonsyonym_pool = uniqify(nonsyonym_pool)\n\n while len(nonsyonym_pool) < num_samples and max_iter > 0:\n max_iter = max_iter - 1 # reduce the iteration to prevent exhausting loop\n nonsyonym_pool += self.soft_nonsynonym_sampling(num_samples - len(nonsyonym_pool), max_iter)\n nonsyonym_pool = uniqify(nonsyonym_pool)\n\n return nonsyonym_pool\n
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.text_semantics.AnnotationThesaurus.weighted_random_choices_of_sibling_groups","title":"weighted_random_choices_of_sibling_groups(k=1)
","text":"Randomly (weighted) select a number of sibling class groups.
The weights are computed according to the sizes of the sibling class groups.
Source code insrc/deeponto/align/bertmap/text_semantics.py
def weighted_random_choices_of_sibling_groups(self, k: int = 1):\n\"\"\"Randomly (weighted) select a number of sibling class groups.\n\n The weights are computed according to the sizes of the sibling class groups.\n \"\"\"\n weights = [len(s) for s in self.onto.sibling_class_groups]\n weights = [w / sum(weights) for w in weights] # normalised\n return random.choices(self.onto.sibling_class_groups, weights=weights, k=k)\n
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.text_semantics.AnnotationThesaurus.hard_nonsynonym_sampling","title":"hard_nonsynonym_sampling(num_samples, max_iter=5)
","text":"Sample hard non-synonyms from sibling classes of the input ontology.
According to the \\(\\textsf{BERTMap}\\) paper, hard non-synonyms are defined as label pairs that belong to two disjoint ontology classes. For practical reason, the condition is eased to two sibling ontology classes.
Parameters:
Name Type Description Defaultnum_samples
int
The (maximum) number of unique samples extracted; this is required unlike for synonym sampling because the non-synonym pool is significantly larger (considering random combinations of different synonym groups).
requiredmax_iter
int
The maximum number of iterations for conducting sampling. Defaults to 5
.
5
Returns:
Type DescriptionList[Tuple[str, str]]
A list of unique (hard) non-synonym pair samples.
Source code insrc/deeponto/align/bertmap/text_semantics.py
def hard_nonsynonym_sampling(self, num_samples: int, max_iter: int = 5):\nr\"\"\"Sample **hard** non-synonyms from sibling classes of the input ontology.\n\n According to the $\\textsf{BERTMap}$ paper, **hard non-synonyms** are defined as label pairs\n that belong to two **disjoint** ontology classes. For practical reason, the condition\n is eased to two **sibling** ontology classes.\n\n Args:\n num_samples (int): The (maximum) number of **unique** samples extracted; this is\n required **unlike for synonym sampling** because the non-synonym pool is **significantly\n larger** (considering random combinations of different synonym groups).\n max_iter (int): The maximum number of iterations for conducting sampling. Defaults to `5`.\n\n Returns:\n (List[Tuple[str, str]]): A list of unique (hard) non-synonym pair samples.\n \"\"\"\n # intialise the sibling class groups\n self.onto.sibling_class_groups\n\n if not self.onto.sibling_class_groups:\n warnings.warn(\"Skip hard negative sampling as no sibling class groups are defined.\")\n return []\n\n # flatten the disjointness groups into all pairs of hard neagtives\n nonsynonym_pool = []\n # randomly (weighted) select a number of sibling class groups with replacement\n sibling_class_groups = self.weighted_random_choices_of_sibling_groups(k=num_samples)\n\n for sibling_class_group in sibling_class_groups:\n # random select two sibling classes; no weights this time\n left_class_iri, right_class_iri = tuple(random.sample(sibling_class_group, 2))\n try:\n # random select a label for each of them\n left_label = random.choice(list(self.annotation_index[left_class_iri]))\n right_label = random.choice(list(self.annotation_index[right_class_iri]))\n # add the label pair to the pool\n nonsynonym_pool.append((left_label, right_label))\n except:\n # skip them if there are no class labels\n continue\n\n # DataUtils.uniqify is too slow so we should avoid operating it too often\n nonsynonym_pool = uniqify(nonsynonym_pool)\n\n while len(nonsynonym_pool) < num_samples and max_iter > 0:\n max_iter = max_iter - 1 # reduce the iteration to prevent exhausting loop\n nonsynonym_pool += self.hard_nonsynonym_sampling(num_samples - len(nonsynonym_pool), max_iter)\n nonsynonym_pool = uniqify(nonsynonym_pool)\n\n return nonsynonym_pool\n
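Continuing the AnnotationThesaurus sketch above (where thesaurus was built from an ontology), the three samplers can be combined to assemble a labelled pool in the spirit of the intra-ontology corpus; the ratios here are illustrative only.

synonyms = thesaurus.synonym_sampling()                          # all unique synonym pairs
hard_negatives = thesaurus.hard_nonsynonym_sampling(2 * len(synonyms))
soft_negatives = thesaurus.soft_nonsynonym_sampling(2 * len(synonyms))
print(len(synonyms), len(hard_negatives), len(soft_negatives))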
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.text_semantics.IntraOntologyTextSemanticsCorpus","title":"IntraOntologyTextSemanticsCorpus(onto, annotation_property_iris, soft_negative_ratio=2, hard_negative_ratio=2)
","text":"Class for creating the intra-ontology text semantics corpus from an ontology.
As defined in the \\(\\textsf{BERTMap}\\) paper, the intra-ontology text semantics corpus consists of synonym and non-synonym pairs extracted from the ontology class annotations.
Attributes:
Name Type Descriptiononto
Ontology
An ontology to construct the intra-ontology text semantics corpus from.
annotation_property_iris
List[str]
Specify which annotation properties to be used.
soft_negative_ratio
int
The expected negative sample ratio of the soft non-synonyms to the extracted synonyms. Defaults to 2
.
hard_negative_ratio
int
The expected negative sample ratio of the hard non-synonyms to the extracted synonyms. Defaults to 2
. However, hard non-synonyms are sometimes insufficient given an ontology's hierarchy, so soft non-synonyms are used to make up the required number in this case.
src/deeponto/align/bertmap/text_semantics.py
def __init__(\n self,\n onto: Ontology,\n annotation_property_iris: List[str],\n soft_negative_ratio: int = 2,\n hard_negative_ratio: int = 2,\n):\n self.onto = onto\n # $\\textsf{BERTMap}$ does not apply synonym transitivity\n self.thesaurus = AnnotationThesaurus(onto, annotation_property_iris, apply_transitivity=False)\n\n self.synonyms = self.thesaurus.synonym_sampling()\n # sample hard negatives first as they might not be enough\n num_hard = hard_negative_ratio * len(self.synonyms)\n self.hard_nonsynonyms = self.thesaurus.hard_nonsynonym_sampling(num_hard)\n # compensate the number of hard negatives as soft negatives are almost always available\n num_soft = (soft_negative_ratio + hard_negative_ratio) * len(self.synonyms) - len(self.hard_nonsynonyms)\n self.soft_nonsynonyms = self.thesaurus.soft_nonsynonym_sampling(num_soft)\n\n self.info = {\n type(self).__name__: {\n \"num_synonyms\": len(self.synonyms),\n \"num_nonsynonyms\": len(self.soft_nonsynonyms) + len(self.hard_nonsynonyms),\n \"num_soft_nonsynonyms\": len(self.soft_nonsynonyms),\n \"num_hard_nonsynonyms\": len(self.hard_nonsynonyms),\n \"annotation_thesaurus\": self.thesaurus.info[\"AnnotationThesaurus\"],\n }\n }\n
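The compensation logic in the constructor above can be made concrete with some made-up numbers:

# illustrative arithmetic only (made-up numbers)
num_synonyms = 1000
soft_negative_ratio, hard_negative_ratio = 2, 2
num_hard_requested = hard_negative_ratio * num_synonyms       # 2000 hard negatives requested
num_hard_found = 1500                                         # the hierarchy may not yield enough siblings
num_soft = (soft_negative_ratio + hard_negative_ratio) * num_synonyms - num_hard_found
print(num_soft)  # 2500 soft negatives keep the total at 4 * num_synonyms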
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.text_semantics.IntraOntologyTextSemanticsCorpus.save","title":"save(save_path)
","text":"Save the intra-ontology corpus (a .json
file for label pairs and its summary) in the specified directory.
src/deeponto/align/bertmap/text_semantics.py
def save(self, save_path: str):\n\"\"\"Save the intra-ontology corpus (a `.json` file for label pairs\n and its summary) in the specified directory.\n \"\"\"\n create_path(save_path)\n save_json = {\n \"summary\": self.info,\n \"synonyms\": [(pos[0], pos[1], 1) for pos in self.synonyms],\n \"nonsynonyms\": [(neg[0], neg[1], 0) for neg in self.soft_nonsynonyms + self.hard_nonsynonyms],\n }\n save_file(save_json, os.path.join(save_path, \"intra-onto.corpus.json\"))\n
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.text_semantics.CrossOntologyTextSemanticsCorpus","title":"CrossOntologyTextSemanticsCorpus(class_mappings, src_onto, tgt_onto, annotation_property_iris, negative_ratio=4)
","text":"Class for creating the cross-ontology text semantics corpus from two ontologies and provided mappings between them.
As defined in the \\(\\textsf{BERTMap}\\) paper, the cross-ontology text semantics corpus consists of synonym and non-synonym pairs extracted from the annotations/labels of class pairs involved in the provided cross-ontology mappigns.
Attributes:
Name Type Descriptionclass_mappings
List[ReferenceMapping]
A list of cross-ontology class mappings.
src_onto
Ontology
The source ontology whose class IRIs are heads of the class_mappings
.
tgt_onto
Ontology
The target ontology whose class IRIs are tails of the class_mappings
.
annotation_property_iris
List[str]
A list of annotation property IRIs used to extract the annotations.
negative_ratio
int
The expected negative sample ratio of the non-synonyms to the extracted synonyms. Defaults to 4
. NOTE that we do not have hard non-synonyms at the cross-ontology level.
src/deeponto/align/bertmap/text_semantics.py
def __init__(\n self,\n class_mappings: List[ReferenceMapping],\n src_onto: Ontology,\n tgt_onto: Ontology,\n annotation_property_iris: List[str],\n negative_ratio: int = 4,\n):\n self.class_mappings = class_mappings\n self.src_onto = src_onto\n self.tgt_onto = tgt_onto\n # build the annotation thesaurus for each ontology\n self.src_thesaurus = AnnotationThesaurus(src_onto, annotation_property_iris)\n self.tgt_thesaurus = AnnotationThesaurus(tgt_onto, annotation_property_iris)\n self.negative_ratio = negative_ratio\n\n self.synonyms = self.synonym_sampling_from_mappings()\n num_negative = negative_ratio * len(self.synonyms)\n self.nonsynonyms = self.nonsynonym_sampling_from_mappings(num_negative)\n\n self.info = {\n type(self).__name__: {\n \"num_synonyms\": len(self.synonyms),\n \"num_nonsynonyms\": len(self.nonsynonyms),\n \"num_mappings\": len(self.class_mappings),\n \"src_annotation_thesaurus\": self.src_thesaurus.info[\"AnnotationThesaurus\"],\n \"tgt_annotation_thesaurus\": self.tgt_thesaurus.info[\"AnnotationThesaurus\"],\n }\n }\n
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.text_semantics.CrossOntologyTextSemanticsCorpus.save","title":"save(save_path)
","text":"Save the cross-ontology corpus (a .json
file for label pairs and its summary) in the specified directory.
src/deeponto/align/bertmap/text_semantics.py
def save(self, save_path: str):\n\"\"\"Save the cross-ontology corpus (a `.json` file for label pairs\n and its summary) in the specified directory.\n \"\"\"\n create_path(save_path)\n save_json = {\n \"summary\": self.info,\n \"synonyms\": [(pos[0], pos[1], 1) for pos in self.synonyms],\n \"nonsynonyms\": [(neg[0], neg[1], 0) for neg in self.nonsynonyms],\n }\n save_file(save_json, os.path.join(save_path, \"cross-onto.corpus.json\"))\n
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.text_semantics.CrossOntologyTextSemanticsCorpus.synonym_sampling_from_mappings","title":"synonym_sampling_from_mappings()
","text":"Sample synonyms from cross-ontology class mappings.
Arguments of this method are all class attributes. See CrossOntologyTextSemanticsCorpus
.
According to the \\(\\textsf{BERTMap}\\) paper, cross-ontology synonyms are defined as label pairs that belong to two matched classes. Suppose the class \\(C\\) from the source ontology and the class \\(D\\) from the target ontology are matched according to one of the class_mappings
, then the cartesian product of labels of \\(C\\) and labels of \\(D\\) form cross-ontology synonyms. Note that identity synonyms in the form of \\((a, a)\\) are removed because they have been covered in the intra-ontology case.
Returns:
Type DescriptionList[Tuple[str, str]]
A list of unique synonym pair samples from ontology class mappings.
Source code insrc/deeponto/align/bertmap/text_semantics.py
def synonym_sampling_from_mappings(self):\nr\"\"\"Sample synonyms from cross-ontology class mappings.\n\n Arguments of this method are all class attributes.\n See [`CrossOntologyTextSemanticsCorpus`][deeponto.align.bertmap.text_semantics.CrossOntologyTextSemanticsCorpus].\n\n According to the $\\textsf{BERTMap}$ paper, **cross-ontology synonyms** are defined as label pairs\n that belong to two **matched** classes. Suppose the class $C$ from the source ontology\n and the class $D$ from the target ontology are matched according to one of the `class_mappings`,\n then the cartesian product of labels of $C$ and labels of $D$ form cross-ontology synonyms.\n Note that **identity synonyms** in the form of $(a, a)$ are removed because they have been covered\n in the intra-ontology case.\n\n Returns:\n (List[Tuple[str, str]]): A list of unique synonym pair samples from ontology class mappings.\n \"\"\"\n synonym_pool = []\n\n for class_mapping in self.class_mappings:\n src_class_iri, tgt_class_iri = class_mapping.to_tuple()\n src_class_annotations = self.src_thesaurus.annotation_index[src_class_iri]\n tgt_class_annotations = self.tgt_thesaurus.annotation_index[tgt_class_iri]\n synonym_pairs = list(itertools.product(src_class_annotations, tgt_class_annotations))\n # remove the identity synonyms as the have been covered in the intra-ontology case\n synonym_pairs = [(l, r) for l, r in synonym_pairs if l != r]\n backward_synonym_pairs = [(r, l) for l, r in synonym_pairs]\n synonym_pool += synonym_pairs + backward_synonym_pairs\n\n synonym_pool = uniqify(synonym_pool)\n return synonym_pool\n
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.text_semantics.CrossOntologyTextSemanticsCorpus.nonsynonym_sampling_from_mappings","title":"nonsynonym_sampling_from_mappings(num_samples, max_iter=5)
","text":"Sample non-synonyms from cross-ontology class mappings.
Arguments of this method are all class attributes. See CrossOntologyTextSemanticsCorpus
.
According to the \\(\\textsf{BERTMap}\\) paper, cross-ontology non-synonyms are defined as label pairs that belong to two unmatched classes. Assume that the provided class mappings are self-contained in the sense that they are complete for the classes involved in them, then we can randomly sample two cross-ontology classes that are not matched according to the mappings and take their labels as nonsynonyms. In practice, it is quite unlikely to obtain false negatives since the number of incorrect mappings is much larger than the number of correct ones.
Returns:
Type DescriptionList[Tuple[str, str]]
A list of unique nonsynonym pair samples from ontology class mappings.
Source code insrc/deeponto/align/bertmap/text_semantics.py
def nonsynonym_sampling_from_mappings(self, num_samples: int, max_iter: int = 5):\nr\"\"\"Sample non-synonyms from cross-ontology class mappings.\n\n Arguments of this method are all class attributes.\n See [`CrossOntologyTextSemanticsCorpus`][deeponto.align.bertmap.text_semantics.CrossOntologyTextSemanticsCorpus].\n\n According to the $\\textsf{BERTMap}$ paper, **cross-ontology non-synonyms** are defined as label pairs\n that belong to two **unmatched** classes. Assume that the provided class mappings are self-contained\n in the sense that they are complete for the classes involved in them, then we can randomly\n sample two cross-ontology classes that are not matched according to the mappings and take\n their labels as nonsynonyms. In practice, it is quite unlikely to obtain false negatives since\n the number of incorrect mappings is much larger than the number of correct ones.\n\n Returns:\n (List[Tuple[str, str]]): A list of unique nonsynonym pair samples from ontology class mappings.\n \"\"\"\n nonsynonym_pool = []\n\n # form cross-ontology synonym groups\n cross_onto_synonym_group_pair = []\n for class_mapping in self.class_mappings:\n src_class_iri, tgt_class_iri = class_mapping.to_tuple()\n src_class_annotations = self.src_thesaurus.annotation_index[src_class_iri]\n tgt_class_annotations = self.tgt_thesaurus.annotation_index[tgt_class_iri]\n # let each matched class pair's annotations form a synonym group_pair\n cross_onto_synonym_group_pair.append((src_class_annotations, tgt_class_annotations))\n\n # randomly select disjoint synonym group pairs from all\n for _ in range(num_samples):\n left_class_pair, right_class_pair = tuple(random.sample(cross_onto_synonym_group_pair, 2))\n try:\n # randomly choose one label from a synonym group\n left_label = random.choice(list(left_class_pair[0])) # choosing the src side by [0]\n right_label = random.choice(list(right_class_pair[1])) # choosing the tgt side by [1]\n nonsynonym_pool.append((left_label, right_label))\n except:\n # skip if there are no class labels\n continue\n\n # DataUtils.uniqify is too slow so we should avoid operating it too often\n nonsynonym_pool = uniqify(nonsynonym_pool)\n while len(nonsynonym_pool) < num_samples and max_iter > 0:\n max_iter = max_iter - 1 # reduce the iteration to prevent exhausting loop\n nonsynonym_pool += self.nonsynonym_sampling_from_mappings(num_samples - len(nonsynonym_pool), max_iter)\n nonsynonym_pool = uniqify(nonsynonym_pool)\n return nonsynonym_pool\n
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.text_semantics.TextSemanticsCorpora","title":"TextSemanticsCorpora(src_onto, tgt_onto, annotation_property_iris, class_mappings=None, auxiliary_ontos=None)
","text":"Class for creating the collection text semantics corpora.
As defined in the \\(\\textsf{BERTMap}\\) paper, the collection of text semantics corpora contains at least two intra-ontology sub-corpora from the source and target ontologies, respectively. If some class mappings are provided, then a cross-ontology sub-corpus will be created. If some additional auxiliary ontologies are provided, the intra-ontology corpora created from them will serve as the auxiliary sub-corpora.
Attributes:
Name Type Descriptionsrc_onto
Ontology
The source ontology to be matched or aligned.
tgt_onto
Ontology
The target ontology to be matched or aligned.
annotation_property_iris
List[str]
A list of annotation property IRIs used to extract the annotations.
class_mappings
List[ReferenceMapping]
A list of cross-ontology class mappings between the source and the target ontologies. Defaults to None
.
auxiliary_ontos
List[Ontology]
A list of auxiliary ontologies for augmenting more synonym/non-synonym samples. Defaults to None
.
src/deeponto/align/bertmap/text_semantics.py
def __init__(\n self,\n src_onto: Ontology,\n tgt_onto: Ontology,\n annotation_property_iris: List[str],\n class_mappings: Optional[List[ReferenceMapping]] = None,\n auxiliary_ontos: Optional[List[Ontology]] = None,\n):\n self.synonyms = []\n self.nonsynonyms = []\n\n # build intra-ontology corpora\n # negative sample ratios are by default\n self.intra_src_onto_corpus = IntraOntologyTextSemanticsCorpus(src_onto, annotation_property_iris)\n self.add_samples_from_sub_corpus(self.intra_src_onto_corpus)\n self.intra_tgt_onto_corpus = IntraOntologyTextSemanticsCorpus(tgt_onto, annotation_property_iris)\n self.add_samples_from_sub_corpus(self.intra_tgt_onto_corpus)\n\n # build cross-ontolgoy corpora\n self.class_mappings = class_mappings\n self.cross_onto_corpus = None\n if self.class_mappings:\n self.cross_onto_corpus = CrossOntologyTextSemanticsCorpus(\n class_mappings, src_onto, tgt_onto, annotation_property_iris\n )\n self.add_samples_from_sub_corpus(self.cross_onto_corpus)\n\n # build auxiliary ontology corpora (same as intra-ontology)\n self.auxiliary_ontos = auxiliary_ontos\n self.auxiliary_onto_corpora = []\n if self.auxiliary_ontos:\n for auxiliary_onto in self.auxiliary_ontos:\n self.auxiliary_onto_corpora.append(\n IntraOntologyTextSemanticsCorpus(auxiliary_onto, annotation_property_iris)\n )\n for auxiliary_onto_corpus in self.auxiliary_onto_corpora:\n self.add_samples_from_sub_corpus(auxiliary_onto_corpus)\n\n # DataUtils.uniqify the samples\n self.synonyms = uniqify(self.synonyms)\n self.nonsynonyms = uniqify(self.nonsynonyms)\n # remove invalid nonsynonyms\n self.nonsynonyms = list(set(self.nonsynonyms) - set(self.synonyms))\n\n # summary\n self.info = {\n type(self).__name__: {\n \"num_synonyms\": len(self.synonyms),\n \"num_nonsynonyms\": len(self.nonsynonyms),\n \"intra_src_onto_corpus\": self.intra_src_onto_corpus.info[\"IntraOntologyTextSemanticsCorpus\"],\n \"intra_tgt_onto_corpus\": self.intra_tgt_onto_corpus.info[\"IntraOntologyTextSemanticsCorpus\"],\n \"cross_onto_corpus\": self.cross_onto_corpus.info[\"CrossOntologyTextSemanticsCorpus\"]\n if self.cross_onto_corpus\n else None,\n \"auxiliary_onto_corpora\": [\n a.info[\"IntraOntologyTextSemanticsCorpus\"] for a in self.auxiliary_onto_corpora\n ],\n }\n }\n
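An end-to-end sketch (the ontology paths, the annotation property IRI, and the save directory are placeholders) of building and saving the full corpora collection:

from deeponto.onto import Ontology
from deeponto.align.bertmap.text_semantics import TextSemanticsCorpora

src_onto = Ontology("path/to/source.owl")  # placeholder paths
tgt_onto = Ontology("path/to/target.owl")

corpora = TextSemanticsCorpora(
    src_onto=src_onto,
    tgt_onto=tgt_onto,
    annotation_property_iris=["http://www.w3.org/2000/01/rdf-schema#label"],
    class_mappings=None,    # no cross-ontology sub-corpus
    auxiliary_ontos=None,   # no auxiliary sub-corpora
)
print(corpora.info)
corpora.save("outputs/corpora")  # writes text-semantics.corpora.json in that directory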
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.text_semantics.TextSemanticsCorpora.save","title":"save(save_path)
","text":"Save the overall text semantics corpora (a .json
file for label pairs and its summary) in the specified directory.
src/deeponto/align/bertmap/text_semantics.py
def save(self, save_path: str):\n\"\"\"Save the overall text semantics corpora (a `.json` file for label pairs\n and its summary) in the specified directory.\n \"\"\"\n create_path(save_path)\n save_json = {\n \"summary\": self.info,\n \"synonyms\": [(pos[0], pos[1], 1) for pos in self.synonyms],\n \"nonsynonyms\": [(neg[0], neg[1], 0) for neg in self.nonsynonyms],\n }\n save_file(save_json, os.path.join(save_path, \"text-semantics.corpora.json\"))\n
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.text_semantics.TextSemanticsCorpora.add_samples_from_sub_corpus","title":"add_samples_from_sub_corpus(sub_corpus)
","text":"Add synonyms and non-synonyms from each sub-corpus to the overall collection.
Source code insrc/deeponto/align/bertmap/text_semantics.py
def add_samples_from_sub_corpus(\n self, sub_corpus: Union[IntraOntologyTextSemanticsCorpus, CrossOntologyTextSemanticsCorpus]\n):\n\"\"\"Add synonyms and non-synonyms from each sub-corpus to the overall collection.\"\"\"\n self.synonyms += sub_corpus.synonyms\n if isinstance(sub_corpus, IntraOntologyTextSemanticsCorpus):\n self.nonsynonyms += sub_corpus.soft_nonsynonyms + sub_corpus.hard_nonsynonyms\n else:\n self.nonsynonyms += sub_corpus.nonsynonyms\n
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.bert_classifier.BERTSynonymClassifier","title":"BERTSynonymClassifier(loaded_path, output_path, eval_mode, max_length_for_input, num_epochs_for_training=None, batch_size_for_training=None, batch_size_for_prediction=None, training_data=None, validation_data=None)
","text":"Class for BERT synonym classifier.
The main scoring module of \\(\\textsf{BERTMap}\\) consisting of a BERT model and a binary synonym classifier.
Attributes:
Name Type Descriptionloaded_path
str
The path to the checkpoint of a pre-trained BERT model.
output_path
str
The path to the output BERT model (usually fine-tuned).
eval_mode
bool
Set to False
if the model is loaded for training.
max_length_for_input
int
The maximum length of an input sequence.
num_epochs_for_training
int
The number of epochs for training a BERT model.
batch_size_for_training
int
The batch size for training a BERT model.
batch_size_for_prediction
int
The batch size for making predictions.
training_data
Dataset
Data for training the model if for_training
is set to True
. Defaults to None
.
validation_data
Dataset
Data for validating the model if for_training
is set to True
. Defaults to None
.
training_args
TrainingArguments
Training arguments for training the model if for_training
is set to True
. Defaults to None
.
trainer
Trainer
The model trainer fed with training_args
and data samples. Defaults to None
.
softmax
torch.nn.SoftMax
The softmax layer used for normalising synonym scores. Defaults to None
.
src/deeponto/align/bertmap/bert_classifier.py
def __init__(\n self,\n loaded_path: str,\n output_path: str,\n eval_mode: bool,\n max_length_for_input: int,\n num_epochs_for_training: Optional[float] = None,\n batch_size_for_training: Optional[int] = None,\n batch_size_for_prediction: Optional[int] = None,\n training_data: Optional[List[Tuple[str, str, int]]] = None, # (sentence1, sentence2, label)\n validation_data: Optional[List[Tuple[str, str, int]]] = None,\n):\n # Load the pretrained BERT model from the given path\n self.loaded_path = loaded_path\n print(f\"Loading a BERT model from: {self.loaded_path}.\")\n self.model = AutoModelForSequenceClassification.from_pretrained(\n self.loaded_path, output_hidden_states=eval_mode\n )\n self.tokenizer = Tokenizer.from_pretrained(loaded_path)\n\n self.output_path = output_path\n self.eval_mode = eval_mode\n self.max_length_for_input = max_length_for_input\n self.num_epochs_for_training = num_epochs_for_training\n self.batch_size_for_training = batch_size_for_training\n self.batch_size_for_prediction = batch_size_for_prediction\n self.training_data = None\n self.validation_data = None\n self.data_stat = {}\n self.training_args = None\n self.trainer = None\n self.softmax = None\n\n # load the pre-trained BERT model and set it to eval mode (static)\n if self.eval_mode:\n self.eval()\n # load the pre-trained BERT model for fine-tuning\n else:\n if not training_data:\n raise RuntimeError(\"Training data should be provided when `for_training` is `True`.\")\n if not validation_data:\n raise RuntimeError(\"Validation data should be provided when `for_training` is `True`.\")\n # load data (max_length is used for truncation)\n self.training_data = self.load_dataset(training_data, \"training\")\n self.validation_data = self.load_dataset(validation_data, \"validation\")\n self.data_stat = {\n \"num_training\": len(self.training_data),\n \"num_validation\": len(self.validation_data),\n }\n\n # generate training arguments\n epoch_steps = len(self.training_data) // self.batch_size_for_training # total steps of an epoch\n if torch.cuda.device_count() > 0:\n epoch_steps = epoch_steps // torch.cuda.device_count() # to deal with multi-gpus case\n # keep logging steps consisitent even for small batch size\n # report logging on every 0.02 epoch\n logging_steps = int(epoch_steps * 0.02)\n # eval on every 0.2 epoch\n eval_steps = 10 * logging_steps\n # generate the training arguments\n self.training_args = TrainingArguments(\n output_dir=self.output_path,\n num_train_epochs=self.num_epochs_for_training,\n per_device_train_batch_size=self.batch_size_for_training,\n per_device_eval_batch_size=self.batch_size_for_training,\n warmup_ratio=0.0,\n weight_decay=0.01,\n logging_steps=logging_steps,\n logging_dir=f\"{self.output_path}/tensorboard\",\n eval_steps=eval_steps,\n evaluation_strategy=\"steps\",\n do_train=True,\n do_eval=True,\n save_steps=eval_steps,\n save_total_limit=2,\n load_best_model_at_end=True,\n )\n # build the trainer\n self.trainer = Trainer(\n model=self.model,\n args=self.training_args,\n train_dataset=self.training_data,\n eval_dataset=self.validation_data,\n compute_metrics=self.compute_metrics,\n tokenizer=self.tokenizer._tokenizer,\n )\n
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.bert_classifier.BERTSynonymClassifier.train","title":"train(resume_from_checkpoint=None)
","text":"Start training the BERT model.
Source code insrc/deeponto/align/bertmap/bert_classifier.py
def train(self, resume_from_checkpoint: Optional[Union[bool, str]] = None):\n\"\"\"Start training the BERT model.\"\"\"\n if self.eval_mode:\n raise RuntimeError(\"Training cannot be started in `eval` mode.\")\n self.trainer.train(resume_from_checkpoint=resume_from_checkpoint)\n
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.bert_classifier.BERTSynonymClassifier.eval","title":"eval()
","text":"To eval mode.
Source code insrc/deeponto/align/bertmap/bert_classifier.py
def eval(self):\n\"\"\"To eval mode.\"\"\"\n print(\"The BERT model is set to eval mode for making predictions.\")\n self.model.eval()\n # TODO: to implement multi-gpus for inference\n self.device = self.get_device(device_num=0)\n self.model.to(self.device)\n self.softmax = torch.nn.Softmax(dim=1).to(self.device)\n
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.bert_classifier.BERTSynonymClassifier.predict","title":"predict(sent_pairs)
","text":"Run prediction pipeline for synonym classification.
Return the softmax
probabilities of predicting pairs as synonyms (index=1
).
src/deeponto/align/bertmap/bert_classifier.py
def predict(self, sent_pairs: List[Tuple[str, str]]):\nr\"\"\"Run prediction pipeline for synonym classification.\n\n Return the `softmax` probailities of predicting pairs as synonyms (`index=1`).\n \"\"\"\n inputs = self.process_inputs(sent_pairs)\n with torch.no_grad():\n return self.softmax(self.model(**inputs).logits)[:, 1]\n
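A small sketch (the checkpoint directory is a placeholder and must contain a fine-tuned model) of loading the classifier in eval mode and scoring a few annotation pairs:

from deeponto.align.bertmap.bert_classifier import BERTSynonymClassifier

classifier = BERTSynonymClassifier(
    loaded_path="bertmap_output/bert/checkpoint-1000",  # placeholder fine-tuned checkpoint
    output_path="bertmap_output/bert",
    eval_mode=True,              # no training/validation data needed in eval mode
    max_length_for_input=128,
    batch_size_for_prediction=32,
)
scores = classifier.predict([
    ("hand", "manus"),
    ("hand", "kidney"),
])
print(scores)  # a tensor of synonym probabilities, one per input pair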
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.bert_classifier.BERTSynonymClassifier.load_dataset","title":"load_dataset(data, split)
","text":"Load the list of (annotation1, annotation2, label)
samples into a datasets.Dataset
.
src/deeponto/align/bertmap/bert_classifier.py
def load_dataset(self, data: List[Tuple[str, str, int]], split: str) -> Dataset:\nr\"\"\"Load the list of `(annotation1, annotation2, label)` samples into a `datasets.Dataset`.\"\"\"\n\n def iterate():\n for sample in data:\n yield {\"annotation1\": sample[0], \"annotation2\": sample[1], \"labels\": sample[2]}\n\n dataset = Dataset.from_generator(iterate)\n # NOTE: no padding here because the Trainer class supports dynamic padding\n dataset = dataset.map(\n lambda examples: self.tokenizer._tokenizer(\n examples[\"annotation1\"], examples[\"annotation2\"], max_length=self.max_length_for_input, truncation=True\n ),\n batched=True,\n desc=f\"Load {split} data:\",\n )\n return dataset\n
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.bert_classifier.BERTSynonymClassifier.process_inputs","title":"process_inputs(sent_pairs)
","text":"Process input sentence pairs for the BERT model.
Transform the sentences into BERT input embeddings and load them into the device. This function is called only when the BERT model is about to make predictions (eval
mode).
src/deeponto/align/bertmap/bert_classifier.py
def process_inputs(self, sent_pairs: List[Tuple[str, str]]):\nr\"\"\"Process input sentence pairs for the BERT model.\n\n Transform the sentences into BERT input embeddings and load them into the device.\n This function is called only when the BERT model is about to make predictions (`eval` mode).\n \"\"\"\n return self.tokenizer._tokenizer(\n sent_pairs,\n return_tensors=\"pt\",\n max_length=self.max_length_for_input,\n padding=True,\n truncation=True,\n ).to(self.device)\n
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.bert_classifier.BERTSynonymClassifier.compute_metrics","title":"compute_metrics(pred)
staticmethod
","text":"Add more evaluation metrics into the training log.
Source code insrc/deeponto/align/bertmap/bert_classifier.py
@staticmethod\ndef compute_metrics(pred):\n\"\"\"Add more evaluation metrics into the training log.\"\"\"\n # TODO: currently only accuracy is added, will expect more in the future if needed\n labels = pred.label_ids\n preds = pred.predictions.argmax(-1)\n acc = accuracy_score(labels, preds)\n return {\"accuracy\": acc}\n
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.bert_classifier.BERTSynonymClassifier.get_device","title":"get_device(device_num=0)
staticmethod
","text":"Get a device (GPU or CPU) for the torch model
Source code insrc/deeponto/align/bertmap/bert_classifier.py
@staticmethod\ndef get_device(device_num: int = 0):\n\"\"\"Get a device (GPU or CPU) for the torch model\"\"\"\n # If there's a GPU available...\n if torch.cuda.is_available():\n # Tell PyTorch to use the GPU.\n device = torch.device(f\"cuda:{device_num}\")\n print(\"There are %d GPU(s) available.\" % torch.cuda.device_count())\n print(\"We will use the GPU:\", torch.cuda.get_device_name(device_num))\n # If not...\n else:\n print(\"No GPU available, using the CPU instead.\")\n device = torch.device(\"cpu\")\n return device\n
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.bert_classifier.BERTSynonymClassifier.set_seed","title":"set_seed(seed_val=888)
staticmethod
","text":"Set random seed for reproducible results.
Source code insrc/deeponto/align/bertmap/bert_classifier.py
@staticmethod\ndef set_seed(seed_val: int = 888):\n\"\"\"Set random seed for reproducible results.\"\"\"\n random.seed(seed_val)\n np.random.seed(seed_val)\n torch.manual_seed(seed_val)\n torch.cuda.manual_seed_all(seed_val)\n
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.mapping_prediction.MappingPredictor","title":"MappingPredictor(output_path, tokenizer_path, src_annotation_index, tgt_annotation_index, bert_synonym_classifier, num_raw_candidates, num_best_predictions, batch_size_for_prediction, logger, enlighten_manager, enlighten_status, ignored_class_index=None)
","text":"Class for the mapping prediction module of \\(\\textsf{BERTMap}\\) and \\(\\textsf{BERTMapLt}\\) models.
Attributes:
Name Type Descriptiontokenizer
Tokenizer
The tokenizer used for constructing the inverted annotation index and candidate selection.
src_annotation_index
dict
A dictionary that stores the (class_iri, class_annotations)
pairs from src_onto
according to annotation_property_iris
.
tgt_annotation_index
dict
A dictionary that stores the (class_iri, class_annotations)
pairs from tgt_onto
according to annotation_property_iris
.
tgt_inverted_annotation_index
InvertedIndex
The inverted index built from tgt_annotation_index
used for target class candidate selection.
bert_synonym_classifier
BERTSynonymClassifier
The BERT synonym classifier fine-tuned on text semantics corpora.
num_raw_candidates
int
The maximum number of selected target class candidates for a source class.
num_best_predictions
int
The maximum number of best scored mappings preserved for a source class.
batch_size_for_prediction
int
The batch size of class annotation pairs for computing synonym scores.
ignored_class_index
dict
OAEI argument, a dictionary that stores the (class_iri, used_in_alignment)
pairs.
src/deeponto/align/bertmap/mapping_prediction.py
def __init__(\n self,\n output_path: str,\n tokenizer_path: str,\n src_annotation_index: dict,\n tgt_annotation_index: dict,\n bert_synonym_classifier: Optional[BERTSynonymClassifier],\n num_raw_candidates: Optional[int],\n num_best_predictions: Optional[int],\n batch_size_for_prediction: int,\n logger: Logger,\n enlighten_manager: enlighten.Manager,\n enlighten_status: enlighten.StatusBar,\n ignored_class_index: Optional[dict] = None,\n):\n self.logger = logger\n self.enlighten_manager = enlighten_manager\n self.enlighten_status = enlighten_status\n\n self.tokenizer = Tokenizer.from_pretrained(tokenizer_path)\n\n self.logger.info(\"Build inverted annotation index for candidate selection.\")\n self.src_annotation_index = src_annotation_index\n self.tgt_annotation_index = tgt_annotation_index\n self.tgt_inverted_annotation_index = Ontology.build_inverted_annotation_index(\n tgt_annotation_index, self.tokenizer\n )\n # the fundamental judgement for whether bertmap or bertmaplt is loaded\n self.bert_synonym_classifier = bert_synonym_classifier\n self.num_raw_candidates = num_raw_candidates\n self.num_best_predictions = num_best_predictions\n self.batch_size_for_prediction = batch_size_for_prediction\n self.output_path = output_path\n\n # for the OAEI, adding in check for classes that are not used in alignment\n self.ignored_class_index = ignored_class_index\n\n self.init_class_mapping = lambda head, tail, score: EntityMapping(head, tail, \"<EquivalentTo>\", score)\n
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.mapping_prediction.MappingPredictor.bert_mapping_score","title":"bert_mapping_score(src_class_annotations, tgt_class_annotations)
","text":"\\(\\textsf{BERTMap}\\)'s main mapping score module which utilises the fine-tuned BERT synonym classifier.
Compute the synonym score for each pair of src-tgt class annotations, and return the average score as the mapping score. Apply string matching before applying the BERT module to filter easy mappings (with scores \\(1.0\\)).
Source code insrc/deeponto/align/bertmap/mapping_prediction.py
def bert_mapping_score(\n self,\n src_class_annotations: Set[str],\n tgt_class_annotations: Set[str],\n):\nr\"\"\"$\\textsf{BERTMap}$'s main mapping score module which utilises the fine-tuned BERT synonym\n classifier.\n\n Compute the **synonym score** for each pair of src-tgt class annotations, and return\n the **average** score as the mapping score. Apply string matching before applying the\n BERT module to filter easy mappings (with scores $1.0$).\n \"\"\"\n\n if not src_class_annotations or not tgt_class_annotations:\n warnings.warn(\"Return zero score due to empty input class annotations...\")\n return 0.0\n\n # apply string matching before applying the bert module\n prelim_score = self.edit_similarity_mapping_score(\n src_class_annotations,\n tgt_class_annotations,\n string_match_only=True,\n )\n if prelim_score == 1.0:\n return prelim_score\n # apply BERT classifier and define mapping score := Average(SynonymScores)\n class_annotation_pairs = list(itertools.product(src_class_annotations, tgt_class_annotations))\n synonym_scores = self.bert_synonym_classifier.predict(class_annotation_pairs)\n # only one element tensor is able to be extracted as a scalar by .item()\n return float(torch.mean(synonym_scores).item())\n
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.mapping_prediction.MappingPredictor.edit_similarity_mapping_score","title":"edit_similarity_mapping_score(src_class_annotations, tgt_class_annotations, string_match_only=False)
staticmethod
","text":"\\(\\textsf{BERTMap}\\)'s string match module and \\(\\textsf{BERTMapLt}\\)'s mapping prediction function.
Compute the normalised edit similarity (1 - normalised edit distance)
for each pair of src-tgt class annotations, and return the maximum score as the mapping score.
src/deeponto/align/bertmap/mapping_prediction.py
@staticmethod\ndef edit_similarity_mapping_score(\n src_class_annotations: Set[str],\n tgt_class_annotations: Set[str],\n string_match_only: bool = False,\n):\nr\"\"\"$\\textsf{BERTMap}$'s string match module and $\\textsf{BERTMapLt}$'s mapping prediction function.\n\n Compute the **normalised edit similarity** `(1 - normalised edit distance)` for each pair\n of src-tgt class annotations, and return the **maximum** score as the mapping score.\n \"\"\"\n\n if not src_class_annotations or not tgt_class_annotations:\n warnings.warn(\"Return zero score due to empty input class annotations...\")\n return 0.0\n\n # edge case when src and tgt classes have an exact match of annotation\n if len(src_class_annotations.intersection(tgt_class_annotations)) > 0:\n return 1.0\n # a shortcut to save time for $\\textsf{BERTMap}$\n if string_match_only:\n return 0.0\n annotation_pairs = itertools.product(src_class_annotations, tgt_class_annotations)\n sim_scores = [levenshtein.normalized_similarity(src, tgt) for src, tgt in annotation_pairs]\n return max(sim_scores)\n
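For intuition, the same maximum-over-pairs rule can be reproduced directly, assuming the rapidfuzz implementation of normalised Levenshtein similarity; the annotation sets are made up:

from itertools import product
from rapidfuzz.distance import Levenshtein

src_class_annotations = {"heart muscle", "cardiac muscle"}   # made-up annotation sets
tgt_class_annotations = {"cardiac muscle tissue"}

scores = [
    Levenshtein.normalized_similarity(src, tgt)
    for src, tgt in product(src_class_annotations, tgt_class_annotations)
]
print(max(scores))  # the mapping score is the best pairwise similarity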
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.mapping_prediction.MappingPredictor.mapping_prediction_for_src_class","title":"mapping_prediction_for_src_class(src_class_iri)
","text":"Predict \\(N\\) best scored mappings for a source ontology class, where \\(N\\) is specified in self.num_best_predictions
.
If using the BERT synonym classifier module:
batch_size_for_prediction
, i.e., stop adding annotations of a target class candidate into the current batch if doing so would cause the size of the current batch to exceed the limit.Compute the synonym scores for each batch and aggregate them into mapping scores; preserve the \(N\) best scored candidates and update them with each new batch. Through this dynamic process, we eventually obtain the \(N\) best scored mappings for a source ontology class (see the sketch after the source code below).Source code insrc/deeponto/align/bertmap/mapping_prediction.py
def mapping_prediction_for_src_class(self, src_class_iri: str) -> List[EntityMapping]:\nr\"\"\"Predict $N$ best scored mappings for a source ontology class, where\n $N$ is specified in `self.num_best_predictions`.\n\n 1. Apply the **string matching** module to compute \"easy\" mappings.\n 2. Return the mappings if found any, or if there is no BERT synonym classifier\n as in $\\textsf{BERTMapLt}$.\n 3. If using the BERT synonym classifier module:\n\n - Generate batches for class annotation pairs. Each batch contains the combinations of the\n source class annotations and $M$ target candidate classes' annotations. $M$ is determined\n by `batch_size_for_prediction`, i.e., stop adding annotations of a target class candidate into\n the current batch if this operation will cause the size of current batch to exceed the limit.\n - Compute the synonym scores for each batch and aggregate them into mapping scores; preserve\n $N$ best scored candidates and update them in the next batch. By this dynamic process, we eventually\n get $N$ best scored mappings for a source ontology class.\n \"\"\"\n\n src_class_annotations = self.src_annotation_index[src_class_iri]\n # previously wrongly put tokenizer again !!!\n tgt_class_candidates = self.tgt_inverted_annotation_index.idf_select(\n list(src_class_annotations), pool_size=len(self.tgt_annotation_index.keys())\n ) # [(tgt_class_iri, idf_score)]\n # if some classes are set to be ignored, remove them from the candidates\n if self.ignored_class_index:\n tgt_class_candidates = [(iri, idf_score) for iri, idf_score in tgt_class_candidates if not self.ignored_class_index[iri]]\n # select a truncated number of candidates\n tgt_class_candidates = tgt_class_candidates[:self.num_raw_candidates]\n best_scored_mappings = []\n\n # for string matching: save time if already found string-matched candidates\n def string_match():\n\"\"\"Compute string-matched mappings.\"\"\"\n string_matched_mappings = []\n for tgt_candidate_iri, _ in tgt_class_candidates:\n tgt_candidate_annotations = self.tgt_annotation_index[tgt_candidate_iri]\n prelim_score = self.edit_similarity_mapping_score(\n src_class_annotations,\n tgt_candidate_annotations,\n string_match_only=True,\n )\n if prelim_score > 0.0:\n # if src_class_annotations.intersection(tgt_candidate_annotations):\n string_matched_mappings.append(\n self.init_class_mapping(src_class_iri, tgt_candidate_iri, prelim_score)\n )\n\n return string_matched_mappings\n\n best_scored_mappings += string_match()\n # return string-matched mappings if found or if there is no bert module (bertmaplt)\n if best_scored_mappings or not self.bert_synonym_classifier:\n self.logger.info(f\"The best scored class mappings for {src_class_iri} are\\n{best_scored_mappings}\")\n return best_scored_mappings\n\n def generate_batched_annotations(batch_size: int):\n\"\"\"Generate batches of class annotations for the input source class and its\n target candidates.\n \"\"\"\n batches = []\n # the `nums`` parameter determines how the annotations are grouped\n current_batch = CfgNode({\"annotations\": [], \"nums\": []})\n for i, (tgt_candidate_iri, _) in enumerate(tgt_class_candidates):\n tgt_candidate_annotations = self.tgt_annotation_index[tgt_candidate_iri]\n annotation_pairs = list(itertools.product(src_class_annotations, tgt_candidate_annotations))\n current_batch.annotations += annotation_pairs\n num_annotation_pairs = len(annotation_pairs)\n current_batch.nums.append(num_annotation_pairs)\n # collect when the batch is full or for the last target class 
candidate\n if sum(current_batch.nums) > batch_size or i == len(tgt_class_candidates) - 1:\n batches.append(current_batch)\n current_batch = CfgNode({\"annotations\": [], \"nums\": []})\n return batches\n\n def bert_match():\n\"\"\"Compute mappings with fine-tuned BERT synonym classifier.\"\"\"\n bert_matched_mappings = []\n class_annotation_batches = generate_batched_annotations(self.batch_size_for_prediction)\n batch_base_candidate_idx = (\n 0 # after each batch, the base index will be increased by # of covered target candidates\n )\n device = self.bert_synonym_classifier.device\n\n # intialize N prediction scores and N corresponding indices w.r.t `tgt_class_candidates`\n final_best_scores = torch.tensor([-1] * self.num_best_predictions).to(device)\n final_best_idxs = torch.tensor([-1] * self.num_best_predictions).to(device)\n\n for annotation_batch in class_annotation_batches:\n\n synonym_scores = self.bert_synonym_classifier.predict(annotation_batch.annotations)\n # aggregating to mappings cores\n grouped_synonym_scores = torch.split(\n synonym_scores,\n split_size_or_sections=annotation_batch.nums,\n )\n mapping_scores = torch.stack([torch.mean(chunk) for chunk in grouped_synonym_scores])\n assert len(mapping_scores) == len(annotation_batch.nums)\n\n # preserve N best scored mappings\n # scale N in case there are less than N tgt candidates in this batch\n N = min(len(mapping_scores), self.num_best_predictions)\n batch_best_scores, batch_best_idxs = torch.topk(mapping_scores, k=N)\n batch_best_idxs += batch_base_candidate_idx\n\n # we do the substitution for every batch to prevent from memory overflow\n final_best_scores, _idxs = torch.topk(\n torch.cat([batch_best_scores, final_best_scores]),\n k=self.num_best_predictions,\n )\n final_best_idxs = torch.cat([batch_best_idxs, final_best_idxs])[_idxs]\n\n # update the index for target candidate classes\n batch_base_candidate_idx += len(annotation_batch.nums)\n\n for candidate_idx, mapping_score in zip(final_best_idxs, final_best_scores):\n # ignore intial values (-1.0) for dummy mappings\n # the threshold 0.9 is for mapping extension\n if mapping_score.item() >= 0.9:\n tgt_candidate_iri = tgt_class_candidates[candidate_idx.item()][0]\n bert_matched_mappings.append(\n self.init_class_mapping(\n src_class_iri,\n tgt_candidate_iri,\n mapping_score.item(),\n )\n )\n\n assert len(bert_matched_mappings) <= self.num_best_predictions\n self.logger.info(f\"The best scored class mappings for {src_class_iri} are\\n{bert_matched_mappings}\")\n return bert_matched_mappings\n\n return bert_match()\n
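The per-batch top-\(N\) bookkeeping used in `bert_match` above (dummy scores of \(-1\) merged with each batch's `torch.topk` result) can be isolated into a few lines. The sketch below reproduces just that bookkeeping on toy scores; the batch contents and candidate indices are made up for illustration.

```python
import torch

num_best = 3
# running best scores/indices, initialised with dummy -1 values (as in the source above)
best_scores = torch.full((num_best,), -1.0)
best_idxs = torch.full((num_best,), -1, dtype=torch.long)

base_idx = 0  # index of the first target candidate covered by the current batch
for batch_scores in [torch.tensor([0.2, 0.95, 0.4]), torch.tensor([0.99, 0.1])]:
    k = min(len(batch_scores), num_best)
    top_scores, top_idxs = torch.topk(batch_scores, k=k)
    top_idxs = top_idxs + base_idx  # shift to global candidate indices
    # merge with the running best and keep only the overall top-N
    best_scores, keep = torch.topk(torch.cat([top_scores, best_scores]), k=num_best)
    best_idxs = torch.cat([top_idxs, best_idxs])[keep]
    base_idx += len(batch_scores)

print(best_scores)  # tensor([0.9900, 0.9500, 0.4000])
print(best_idxs)    # tensor([3, 1, 2])
```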
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.mapping_prediction.MappingPredictor.mapping_prediction","title":"mapping_prediction()
","text":"Apply global matching for each class in the source ontology.
See mapping_prediction_for_src_class
.
If this process is accidentally stopped, it can be resumed from already saved predictions. The progress bar keeps track of the number of source ontology classes that have been matched.
Source code insrc/deeponto/align/bertmap/mapping_prediction.py
def mapping_prediction(self):\nr\"\"\"Apply global matching for each class in the source ontology.\n\n See [`mapping_prediction_for_src_class`][deeponto.align.bertmap.mapping_prediction.MappingPredictor.mapping_prediction_for_src_class].\n\n If this process is accidentally stopped, it can be resumed from already saved predictions. The progress\n bar keeps track of the number of source ontology classes that have been matched.\n \"\"\"\n self.logger.info(\"Start global matching for each class in the source ontology.\")\n\n match_dir = os.path.join(self.output_path, \"match\")\n try:\n mapping_index = load_file(os.path.join(match_dir, \"raw_mappings.json\"))\n self.logger.info(\"Load the existing mapping prediction file.\")\n except:\n mapping_index = dict()\n create_path(match_dir)\n\n progress_bar = self.enlighten_manager.counter(\n total=len(self.src_annotation_index), desc=\"Mapping Prediction\", unit=\"per src class\"\n )\n self.enlighten_status.update(demo=\"Mapping Prediction\")\n\n for i, src_class_iri in enumerate(self.src_annotation_index.keys()):\n # skip computed classes\n if src_class_iri in mapping_index.keys():\n self.logger.info(f\"[Class {i}] Skip matching {src_class_iri} as already computed.\")\n progress_bar.update()\n continue\n # for OAEI\n if self.ignored_class_index and self.ignored_class_index[src_class_iri]:\n self.logger.info(f\"[Class {i}] Skip matching {src_class_iri} as marked as not used in alignment.\")\n progress_bar.update()\n continue\n mappings = self.mapping_prediction_for_src_class(src_class_iri)\n mapping_index[src_class_iri] = [m.to_tuple(with_score=True) for m in mappings]\n\n if i % 100 == 0 or i == len(self.src_annotation_index) - 1:\n save_file(mapping_index, os.path.join(match_dir, \"raw_mappings.json\"))\n # also save a .tsv version\n mapping_in_tuples = list(itertools.chain.from_iterable(mapping_index.values()))\n mapping_df = pd.DataFrame(mapping_in_tuples, columns=[\"SrcEntity\", \"TgtEntity\", \"Score\"])\n mapping_df.to_csv(os.path.join(match_dir, \"raw_mappings.tsv\"), sep=\"\\t\", index=False)\n self.logger.info(\"Save currently computed mappings to prevent undesirable loss.\")\n\n progress_bar.update()\n\n self.logger.info(\"Finished mapping prediction for each class in the source ontology.\")\n progress_bar.close()\n
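A minimal sketch of the resume-and-checkpoint pattern described above, with a dummy predictor and a made-up file name; the real method additionally writes a `.tsv` copy of the mapping index and respects the ignored-class index.

```python
import json
import os

def resumable_matching(src_class_iris, predict_fn, out_file, save_every=100):
    """Skip classes already present in the saved index and checkpoint periodically (sketch)."""
    mapping_index = {}
    if os.path.exists(out_file):  # resume from a previous, interrupted run
        with open(out_file) as f:
            mapping_index = json.load(f)
    for i, iri in enumerate(src_class_iris):
        if iri in mapping_index:
            continue  # already computed in an earlier run
        mapping_index[iri] = predict_fn(iri)
        # save periodically (and at the end) to prevent undesirable loss
        if i % save_every == 0 or i == len(src_class_iris) - 1:
            with open(out_file, "w") as f:
                json.dump(mapping_index, f)
    return mapping_index

# toy usage with a dummy predictor
print(resumable_matching(["a", "b"], lambda iri: [[iri, "x", 0.9]], "raw_mappings_demo.json"))
```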
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.mapping_refinement.MappingRefiner","title":"MappingRefiner(output_path, src_onto, tgt_onto, mapping_predictor, mapping_extension_threshold, mapping_filtered_threshold, logger, enlighten_manager, enlighten_status)
","text":"Class for the mapping refinement module of \\(\\textsf{BERTMap}\\).
\\(\\textsf{BERTMapLt}\\) does not go through mapping refinement for its being \"light\". All the attributes of this class are supposed to be passed from BERTMapPipeline
.
Attributes:
Name Type Descriptionsrc_onto
Ontology
The source ontology to be matched.
tgt_onto
Ontology
The target ontology to be matched.
mapping_predictor
MappingPredictor
The mapping prediction module of BERTMap.
mapping_extension_threshold
float
Mappings with scores \\(\\geq\\) this value will be considered in the iterative mapping extension process.
raw_mappings
List[EntityMapping]
List of raw class mappings predicted in the global matching phase.
mapping_score_dict
dict
A dynamic dictionary that keeps track of mappings (with scores) that have already been computed.
mapping_filtered_threshold
float
Mappings with scores \\(\\geq\\) this value will be preserved for the final mapping repairing.
Source code insrc/deeponto/align/bertmap/mapping_refinement.py
def __init__(\n self,\n output_path: str,\n src_onto: Ontology,\n tgt_onto: Ontology,\n mapping_predictor: MappingPredictor,\n mapping_extension_threshold: float,\n mapping_filtered_threshold: float,\n logger: Logger,\n enlighten_manager: enlighten.Manager,\n enlighten_status: enlighten.StatusBar\n):\n self.output_path = output_path\n self.logger = logger\n self.enlighten_manager = enlighten_manager\n self.enlighten_status = enlighten_status\n\n self.src_onto = src_onto\n self.tgt_onto = tgt_onto\n\n # iterative mapping extension\n self.mapping_predictor = mapping_predictor\n self.mapping_extension_threshold = mapping_extension_threshold # \\kappa\n self.raw_mappings = EntityMapping.read_table_mappings(\n os.path.join(self.output_path, \"match\", \"raw_mappings.tsv\"),\n threshold=self.mapping_extension_threshold,\n relation=\"<EquivalentTo>\",\n )\n # keep track of already scored mappings to prevent duplicated predictions\n self.mapping_score_dict = dict()\n for m in self.raw_mappings:\n src_class_iri, tgt_class_iri, score = m.to_tuple(with_score=True)\n self.mapping_score_dict[(src_class_iri, tgt_class_iri)] = score\n\n # the threshold for final filtering the extended mappings\n self.mapping_filtered_threshold = mapping_filtered_threshold # \\lambda\n\n # logmap mapping repair folder\n self.logmap_repair_path = os.path.join(self.output_path, \"match\", \"logmap-repair\")\n\n # paths for mapping extension and repair\n self.extended_mapping_path = os.path.join(self.output_path, \"match\", \"extended_mappings.tsv\")\n self.filtered_mapping_path = os.path.join(self.output_path, \"match\", \"filtered_mappings.tsv\")\n self.repaired_mapping_path = os.path.join(self.output_path, \"match\", \"repaired_mappings.tsv\")\n
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.mapping_refinement.MappingRefiner.mapping_extension","title":"mapping_extension(max_iter=10)
","text":"Iterative mapping extension based on the locality principle.
For each class pair \((c, c')\) (scored in the global matching phase) with score \(\geq \kappa\), search for plausible mappings between the parents of \(c\) and \(c'\), and between the children of \(c\) and \(c'\). This is an iterative process, as the set of newly discovered mappings renews the frontier for searching. Terminate if no new mappings with score \(\geq \kappa\) can be found or the limit max_iter
has been reached. Note that \\(\\kappa\\) is set to \\(0.9\\) by default (can be altered in the configuration file). The mapping extension progress bar keeps track of the total number of extended mappings (including the previously predicted ones).
A further filtering step preserves only mappings with score \(\geq \lambda\). In the original BERTMap paper, \(\lambda\) is determined using the validation mappings; in practice, \(\lambda\) is not a sensitive hyperparameter and validation mappings are often not available, so we manually set \(\lambda\) to \(0.9995\) by default (this can be altered in the configuration file). The mapping filtering progress bar keeps track of the total number of filtered mappings (this bar is purely for logging purposes).
Parameters:
Name Type Description Defaultmax_iter
int
The maximum number of mapping extension iterations. Defaults to 10
.
10
Source code in src/deeponto/align/bertmap/mapping_refinement.py
def mapping_extension(self, max_iter: int = 10):\nr\"\"\"Iterative mapping extension based on the locality principle.\n\n For each class pair $(c, c')$ (scored in the global matching phase) with score \n $\\geq \\kappa$, search for plausible mappings between the parents of $c$ and $c'$,\n and between the children of $c$ and $c'$. This is an iterative process as the set \n newly discovered mappings can act renew the frontier for searching. Terminate if\n no new mappings with score $\\geq \\kappa$ can be found or the limit `max_iter` has \n been reached. Note that $\\kappa$ is set to $0.9$ by default (can be altered\n in the configuration file). The mapping extension progress bar keeps track of the \n total number of extended mappings (including the previously predicted ones).\n\n A further filtering will be performed by only preserving mappings with score $\\geq \\lambda$,\n in the original BERTMap paper, $\\lambda$ is determined by the validation mappings, but\n in practice $\\lambda$ is not a sensitive hyperparameter and validation mappings are often\n not available. Therefore, we manually set $\\lambda$ to $0.9995$ by default (can be altered\n in the configuration file). The mapping filtering progress bar keeps track of the \n total number of filtered mappings (this bar is purely for logging purpose).\n\n Args:\n max_iter (int, optional): The maximum number of mapping extension iterations. Defaults to `10`.\n \"\"\"\n\n num_iter = 0\n self.enlighten_status.update(demo=\"Mapping Extension\")\n extension_progress_bar = self.enlighten_manager.counter(\n desc=f\"Mapping Extension [Iteration #{num_iter}]\", unit=\"mapping\"\n )\n filtering_progress_bar = self.enlighten_manager.counter(\n desc=f\"Mapping Filtering\", unit=\"mapping\"\n )\n\n if os.path.exists(self.extended_mapping_path) and os.path.exists(self.filtered_mapping_path):\n self.logger.info(\n f\"Found extended and filtered mapping files at {self.extended_mapping_path}\"\n + f\" and {self.filtered_mapping_path}.\\nPlease check file integrity; if incomplete, \"\n + \"delete them and re-run the program.\"\n )\n\n # for animation purposes\n extension_progress_bar.desc = f\"Mapping Extension\"\n for _ in EntityMapping.read_table_mappings(self.extended_mapping_path):\n extension_progress_bar.update()\n\n self.enlighten_status.update(demo=\"Mapping Filtering\")\n for _ in EntityMapping.read_table_mappings(self.filtered_mapping_path):\n filtering_progress_bar.update()\n\n extension_progress_bar.close()\n filtering_progress_bar.close()\n\n return\n # intialise the frontier, explored, final expansion sets with the raw mappings\n # NOTE be careful of address pointers\n frontier = [m.to_tuple() for m in self.raw_mappings]\n expansion = [m.to_tuple(with_score=True) for m in self.raw_mappings]\n # for animation purposes\n for _ in range(len(expansion)):\n extension_progress_bar.update()\n\n self.logger.info(\n f\"Start mapping extension for each class pair with score >= {self.mapping_extension_threshold}.\"\n )\n while frontier and num_iter < max_iter:\n new_mappings = []\n for src_class_iri, tgt_class_iri in frontier:\n # one hop extension makes sure new mappings are really \"new\"\n cur_new_mappings = self.one_hop_extend(src_class_iri, tgt_class_iri)\n extension_progress_bar.update(len(cur_new_mappings))\n new_mappings += cur_new_mappings\n # add new mappings to the expansion set\n expansion += new_mappings\n # renew frontier with the newly discovered mappings\n frontier = [(x, y) for x, y, _ in new_mappings]\n\n self.logger.info(f\"Add 
{len(new_mappings)} mappings at iteration #{num_iter}.\")\n num_iter += 1\n extension_progress_bar.desc = f\"Mapping Extension [Iteration #{num_iter}]\"\n\n num_extended = len(expansion) - len(self.raw_mappings)\n self.logger.info(\n f\"Finished iterative mapping extension with {num_extended} new mappings and in total {len(expansion)} extended mappings.\"\n )\n\n extended_mapping_df = pd.DataFrame(expansion, columns=[\"SrcEntity\", \"TgtEntity\", \"Score\"])\n extended_mapping_df.to_csv(self.extended_mapping_path, sep=\"\\t\", index=False)\n\n self.enlighten_status.update(demo=\"Mapping Filtering\")\n\n filtered_expansion = [\n (src, tgt, score) for src, tgt, score in expansion if score >= self.mapping_filtered_threshold\n ]\n self.logger.info(\n f\"Filtered the extended mappings by a threshold of {self.mapping_filtered_threshold}.\"\n + f\"There are {len(filtered_expansion)} mappings left for mapping repair.\"\n )\n\n for _ in range(len(filtered_expansion)):\n filtering_progress_bar.update()\n\n filtered_mapping_df = pd.DataFrame(filtered_expansion, columns=[\"SrcEntity\", \"TgtEntity\", \"Score\"])\n filtered_mapping_df.to_csv(self.filtered_mapping_path, sep=\"\\t\", index=False)\n\n extension_progress_bar.close()\n filtering_progress_bar.close()\n return filtered_expansion\n
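Stripped of ontology access and BERT scoring, the extension loop is a frontier-based expansion: score one-hop neighbour pairs of every mapping in the frontier, keep those with score \(\geq \kappa\), and let the newly accepted mappings form the next frontier. The sketch below runs that loop on a toy `one_hop` function with made-up class pairs and scores.

```python
def iterative_extension(seed_mappings, one_hop, threshold=0.9, max_iter=10):
    """Frontier-based mapping extension (sketch of the loop structure only)."""
    scored = {(s, t): score for s, t, score in seed_mappings}  # already-computed pairs
    frontier = [(s, t) for s, t, _ in seed_mappings]
    expansion = list(seed_mappings)
    for _ in range(max_iter):
        if not frontier:
            break
        new_mappings = []
        for src, tgt in frontier:
            for cand_src, cand_tgt, score in one_hop(src, tgt):
                if (cand_src, cand_tgt) in scored:
                    continue  # only genuinely new mappings renew the frontier
                scored[(cand_src, cand_tgt)] = score
                if score >= threshold:
                    new_mappings.append((cand_src, cand_tgt, score))
        expansion += new_mappings
        frontier = [(s, t) for s, t, _ in new_mappings]
    return expansion

# toy one-hop neighbourhood: only ("a", "x") has a confident neighbour pair
toy_one_hop = lambda s, t: [("a1", "x1", 0.95)] if (s, t) == ("a", "x") else []
print(iterative_extension([("a", "x", 0.97)], toy_one_hop))
```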
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.mapping_refinement.MappingRefiner.one_hop_extend","title":"one_hop_extend(src_class_iri, tgt_class_iri, pool_size=200)
","text":"Extend mappings from a scored class pair \\((c, c')\\) by searching from one-hop neighbors.
Search for plausible mappings between the parents of \\(c\\) and \\(c'\\), and between the children of \\(c\\) and \\(c'\\). Mappings that are not already computed (recorded in self.mapping_score_dict
) and have a score \\(\\geq\\) self.mapping_extension_threshold
will be returned as new mappings.
Parameters:
Name Type Description Defaultsrc_class_iri
str
The IRI of the source ontology class \\(c\\).
requiredtgt_class_iri
str
The IRI of the target ontology class \\(c'\\).
requiredpool_size
int
The maximum number of plausible mappings to be extended. Defaults to 200.
200
Returns:
Type DescriptionList[EntityMapping]
A list of one-hop extended mappings.
Source code insrc/deeponto/align/bertmap/mapping_refinement.py
def one_hop_extend(self, src_class_iri: str, tgt_class_iri: str, pool_size: int = 200):\nr\"\"\"Extend mappings from a scored class pair $(c, c')$ by\n searching from one-hop neighbors.\n\n Search for plausible mappings between the parents of $c$ and $c'$,\n and between the children of $c$ and $c'$. Mappings that are not\n already computed (recorded in `self.mapping_score_dict`) and have\n a score $\\geq$ `self.mapping_extension_threshold` will be returned as\n **new** mappings.\n\n Args:\n src_class_iri (str): The IRI of the source ontology class $c$.\n tgt_class_iri (str): The IRI of the target ontology class $c'$.\n pool_size (int, optional): The maximum number of plausible mappings to be extended. Defaults to 200.\n\n Returns:\n (List[EntityMapping]): A list of one-hop extended mappings.\n \"\"\"\n\n def get_iris(owl_objects):\n return [str(x.getIRI()) for x in owl_objects]\n\n src_class = self.src_onto.get_owl_object(src_class_iri)\n src_class_parent_iris = get_iris(self.src_onto.get_asserted_parents(src_class, named_only=True))\n src_class_children_iris = get_iris(self.src_onto.get_asserted_children(src_class, named_only=True))\n\n tgt_class = self.tgt_onto.get_owl_object(tgt_class_iri)\n tgt_class_parent_iris = get_iris(self.tgt_onto.get_asserted_parents(tgt_class, named_only=True))\n tgt_class_children_iris = get_iris(self.tgt_onto.get_asserted_children(tgt_class, named_only=True))\n\n # pair up parents and children, respectively; NOTE set() might not be necessary\n parent_pairs = list(set(itertools.product(src_class_parent_iris, tgt_class_parent_iris)))\n children_pairs = list(set(itertools.product(src_class_children_iris, tgt_class_children_iris)))\n\n candidate_pairs = parent_pairs + children_pairs\n # downsample if the number of candidates is too large\n if len(candidate_pairs) > pool_size:\n candidate_pairs = random.sample(candidate_pairs, pool_size)\n\n extended_mappings = []\n for src_candidate_iri, tgt_candidate_iri in parent_pairs + children_pairs:\n\n # if already computed meaning that it is not a new mapping\n if (src_candidate_iri, tgt_candidate_iri) in self.mapping_score_dict:\n continue\n\n src_candidate_annotations = self.mapping_predictor.src_annotation_index[src_candidate_iri]\n tgt_candidate_annotations = self.mapping_predictor.tgt_annotation_index[tgt_candidate_iri]\n score = self.mapping_predictor.bert_mapping_score(src_candidate_annotations, tgt_candidate_annotations)\n # add to already scored collection\n self.mapping_score_dict[(src_candidate_iri, tgt_candidate_iri)] = score\n\n # skip mappings with low scores\n if score < self.mapping_extension_threshold:\n continue\n\n extended_mappings.append((src_candidate_iri, tgt_candidate_iri, score))\n\n self.logger.info(\n f\"New mappings (in tuples) extended from {(src_class_iri, tgt_class_iri)} are:\\n\" + f\"{extended_mappings}\"\n )\n\n return extended_mappings\n
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.mapping_refinement.MappingRefiner.mapping_repair","title":"mapping_repair()
","text":"Repair the filtered mappings with LogMap's debugger.
Note
A sub-folder under match
named logmap-repair
contains LogMap-related intermediate files.
src/deeponto/align/bertmap/mapping_refinement.py
def mapping_repair(self):\n\"\"\"Repair the filtered mappings with LogMap's debugger.\n\n !!! note\n\n A sub-folder under `match` named `logmap-repair` contains LogMap-related intermediate files.\n \"\"\"\n\n # progress bar for animation purposes\n self.enlighten_status.update(demo=\"Mapping Repairing\")\n repair_progress_bar = self.enlighten_manager.counter(\n desc=f\"Mapping Repairing\", unit=\"mapping\"\n )\n\n # skip repairing if already found the file\n if os.path.exists(self.repaired_mapping_path):\n self.logger.info(\n f\"Found the repaired mapping file at {self.repaired_mapping_path}.\"\n + \"\\nPlease check file integrity; if incomplete, \"\n + \"delete it and re-run the program.\"\n )\n # update progress bar for animation purposes\n for _ in EntityMapping.read_table_mappings(self.repaired_mapping_path):\n repair_progress_bar.update()\n repair_progress_bar.close()\n return \n\n # start mapping repair\n self.logger.info(\"Repair the filtered mappings with LogMap debugger.\")\n # formatting the filtered mappings\n self.logmap_repair_formatting()\n\n # run the LogMap repair module on the extended mappings\n run_logmap_repair(\n self.src_onto.owl_path,\n self.tgt_onto.owl_path,\n os.path.join(self.logmap_repair_path, f\"filtered_mappings_for_LogMap_repair.txt\"),\n self.logmap_repair_path,\n Ontology.get_max_jvm_memory()\n )\n\n # create table mappings from LogMap repair outputs\n with open(os.path.join(self.logmap_repair_path, \"mappings_repaired_with_LogMap.tsv\"), \"r\") as f:\n lines = f.readlines()\n with open(os.path.join(self.output_path, \"match\", \"repaired_mappings.tsv\"), \"w+\") as f:\n f.write(\"SrcEntity\\tTgtEntity\\tScore\\n\")\n for line in lines:\n src_ent_iri, tgt_ent_iri, score = line.split(\"\\t\")\n f.write(f\"{src_ent_iri}\\t{tgt_ent_iri}\\t{score}\")\n repair_progress_bar.update()\n\n self.logger.info(\"Mapping repair finished.\")\n repair_progress_bar.close()\n
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.mapping_refinement.MappingRefiner.logmap_repair_formatting","title":"logmap_repair_formatting()
","text":"Transform the filtered mapping file into the LogMap format.
An auxiliary function of the mapping repair module which requires mappings to be formatted as LogMap's input format.
Source code insrc/deeponto/align/bertmap/mapping_refinement.py
def logmap_repair_formatting(self):\n\"\"\"Transform the filtered mapping file into the LogMap format.\n\n An auxiliary function of the mapping repair module which requires mappings\n to be formatted as LogMap's input format.\n \"\"\"\n # read the filtered mapping file and convert to tuples\n filtered_mappings = EntityMapping.read_table_mappings(self.filtered_mapping_path)\n filtered_mappings_in_tuples = [m.to_tuple(with_score=True) for m in filtered_mappings]\n\n # write the mappings into logmap format\n lines = []\n for src_class_iri, tgt_class_iri, score in filtered_mappings_in_tuples:\n lines.append(f\"{src_class_iri}|{tgt_class_iri}|=|{score}|CLS\\n\")\n\n # create a path to prevent error\n create_path(self.logmap_repair_path)\n formatted_file = os.path.join(self.logmap_repair_path, f\"filtered_mappings_for_LogMap_repair.txt\")\n with open(formatted_file, \"w\") as f:\n f.writelines(lines)\n\n return lines\n
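The LogMap input format produced here is one mapping per line with pipe-separated fields. A minimal sketch with a made-up IRI pair:

```python
filtered_mappings = [
    ("http://onto1.owl#HeartDisease", "http://onto2.owl#CardiacDisorder", 0.9997),
]
# one line per mapping: source IRI | target IRI | relation | score | entity kind
lines = [f"{src}|{tgt}|=|{score}|CLS\n" for src, tgt, score in filtered_mappings]
with open("filtered_mappings_for_LogMap_repair.txt", "w") as f:
    f.writelines(lines)
print(lines[0], end="")
```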
"},{"location":"deeponto/align/bertsubs/","title":"BERTSubs (Inter)","text":""},{"location":"deeponto/align/bertsubs/#deeponto.complete.bertsubs.pipeline_inter.BERTSubsInterPipeline","title":"BERTSubsInterPipeline(src_onto, tgt_onto, config)
","text":"Class for the model training and prediction/validation pipeline of inter-ontology subsumption of BERTSubs.
Attributes:
Name Type Descriptionsrc_onto
Ontology
Source ontology (the sub-class side).
tgt_onto
Ontology
Target ontology (the super-class side).
config
CfgNode
Configuration.
src_sampler
SubsumptionSampler
Object for sampling-related functions of the source ontology.
tgt_sampler
SubsumptionSampler
Object for sampling-related functions of the target ontology.
Source code insrc/deeponto/complete/bertsubs/pipeline_inter.py
def __init__(self, src_onto: Ontology, tgt_onto: Ontology, config: CfgNode):\n self.src_onto = src_onto\n self.tgt_onto = tgt_onto\n self.config = config\n self.config.label_property = self.config.src_label_property\n self.src_sampler = SubsumptionSampler(onto=self.src_onto, config=self.config)\n self.config.label_property = self.config.tgt_label_property\n self.tgt_sampler = SubsumptionSampler(onto=self.tgt_onto, config=self.config)\n start_time = datetime.datetime.now()\n\n read_subsumptions = lambda file_name: [line.strip().split(',') for line in open(file_name).readlines()]\n test_subsumptions = None if config.test_subsumption_file is None or config.test_subsumption_file == 'None' \\\n else read_subsumptions(config.test_subsumption_file)\n valid_subsumptions = None if config.valid_subsumption_file is None or config.valid_subsumption_file == 'None' \\\n else read_subsumptions(config.valid_subsumption_file)\n\n if config.use_ontology_subsumptions_training:\n src_subsumptions = BERTSubsIntraPipeline.extract_subsumptions_from_ontology(onto=self.src_onto,\n subsumption_type=config.subsumption_type)\n tgt_subsumptions = BERTSubsIntraPipeline.extract_subsumptions_from_ontology(onto=self.tgt_onto,\n subsumption_type=config.subsumption_type)\n src_subsumptions0, tgt_subsumptions0 = [], []\n if config.subsumption_type == 'named_class':\n for subs in src_subsumptions:\n c1, c2 = subs.getSubClass(), subs.getSuperClass()\n src_subsumptions0.append([str(c1.getIRI()), str(c2.getIRI())])\n for subs in tgt_subsumptions:\n c1, c2 = subs.getSubClass(), subs.getSuperClass()\n tgt_subsumptions0.append([str(c1.getIRI()), str(c2.getIRI())])\n elif config.subsumption_type == 'restriction':\n for subs in src_subsumptions:\n c1, c2 = subs.getSubClass(), subs.getSuperClass()\n src_subsumptions0.append([str(c1.getIRI()), str(c2)])\n for subs in tgt_subsumptions:\n c1, c2 = subs.getSubClass(), subs.getSuperClass()\n tgt_subsumptions0.append([str(c1.getIRI()), str(c2)])\n restrictions = BERTSubsIntraPipeline.extract_restrictions_from_ontology(onto=self.tgt_onto)\n print('restrictions in the target ontology: %d' % len(restrictions))\n else:\n warnings.warn('Unknown subsumption type %s' % config.subsumption_type)\n sys.exit(0)\n print('Positive train subsumptions from the source/target ontology: %d/%d' % (\n len(src_subsumptions0), len(tgt_subsumptions0)))\n\n src_tr = self.src_sampler.generate_samples(subsumptions=src_subsumptions0)\n tgt_tr = self.tgt_sampler.generate_samples(subsumptions=tgt_subsumptions0)\n else:\n src_tr, tgt_tr = [], []\n\n if config.train_subsumption_file is None or config.train_subsumption_file == 'None':\n tr = src_tr + tgt_tr\n else:\n train_subsumptions = read_subsumptions(config.train_subsumption_file)\n tr = self.inter_ontology_sampling(subsumptions=train_subsumptions, pos_dup=config.fine_tune.train_pos_dup,\n neg_dup=config.fine_tune.train_neg_dup)\n tr = tr + src_tr + tgt_tr\n\n if len(tr) == 0:\n warnings.warn('No training samples extracted')\n if config.fine_tune.do_fine_tune:\n sys.exit(0)\n\n end_time = datetime.datetime.now()\n print('data pre-processing costs %.1f minutes' % ((end_time - start_time).seconds / 60))\n\n start_time = datetime.datetime.now()\n torch.cuda.empty_cache()\n bert_trainer = BERTSubsumptionClassifierTrainer(config.fine_tune.pretrained, train_data=tr,\n val_data=tr[0:int(len(tr) / 5)],\n max_length=config.prompt.max_length,\n early_stop=config.fine_tune.early_stop)\n\n epoch_steps = len(bert_trainer.tra) // config.fine_tune.batch_size # total steps of an 
epoch\n logging_steps = int(epoch_steps * 0.02) if int(epoch_steps * 0.02) > 0 else 5\n eval_steps = 5 * logging_steps\n training_args = TrainingArguments(\n output_dir=config.fine_tune.output_dir,\n num_train_epochs=config.fine_tune.num_epochs,\n per_device_train_batch_size=config.fine_tune.batch_size,\n per_device_eval_batch_size=config.fine_tune.batch_size,\n warmup_ratio=config.fine_tune.warm_up_ratio,\n weight_decay=0.01,\n logging_steps=logging_steps,\n logging_dir=f\"{config.fine_tune.output_dir}/tb\",\n eval_steps=eval_steps,\n evaluation_strategy=\"steps\",\n do_train=True,\n do_eval=True,\n save_steps=eval_steps,\n load_best_model_at_end=True,\n save_total_limit=1,\n metric_for_best_model=\"accuracy\",\n greater_is_better=True\n )\n if config.fine_tune.do_fine_tune and (config.prompt.prompt_type == 'traversal' or (\n config.prompt.prompt_type == 'path' and config.prompt.use_sub_special_token)):\n bert_trainer.add_special_tokens(['<SUB>'])\n\n bert_trainer.train(train_args=training_args, do_fine_tune=config.fine_tune.do_fine_tune)\n if config.fine_tune.do_fine_tune:\n bert_trainer.trainer.save_model(\n output_dir=os.path.join(config.fine_tune.output_dir, 'fine-tuned-checkpoint'))\n print('fine-tuning done, fine-tuned model saved')\n else:\n print('pretrained or fine-tuned model loaded.')\n end_time = datetime.datetime.now()\n print('Fine-tuning costs %.1f minutes' % ((end_time - start_time).seconds / 60))\n\n bert_trainer.model.eval()\n self.device = torch.device(f\"cuda\") if torch.cuda.is_available() else torch.device(\"cpu\")\n bert_trainer.model.to(self.device)\n self.tokenize = lambda x: bert_trainer.tokenizer(x, max_length=config.prompt.max_length, truncation=True,\n padding=True, return_tensors=\"pt\")\n softmax = torch.nn.Softmax(dim=1)\n self.classifier = lambda x: softmax(bert_trainer.model(**x).logits)[:, 1]\n\n if valid_subsumptions is not None:\n self.evaluate(target_subsumptions=valid_subsumptions, test_type='valid')\n\n if test_subsumptions is not None:\n if config.test_type == 'evaluation':\n self.evaluate(target_subsumptions=test_subsumptions, test_type='test')\n elif config.test_type == 'prediction':\n self.predict(target_subsumptions=test_subsumptions)\n else:\n warnings.warn(\"Unknown test_type: %s\" % config.test_type)\n print('\\n ------------------------- done! ---------------------------\\n\\n\\n')\n
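The train/valid/test subsumption files consumed by this pipeline are plain text with one subsumption per line and comma-separated IRIs; for evaluation files, the first IRI is the sub-class and the remaining IRIs are the candidate super-classes (the first of which is the ground truth). A minimal sketch with a made-up file:

```python
# write a tiny example file: sub-class IRI followed by super-class IRI(s)
with open("valid_subsumptions_demo.csv", "w") as f:
    f.write("http://onto1.owl#A,http://onto2.owl#B,http://onto2.owl#C\n")

# same parsing as the read_subsumptions lambda in the source above
read_subsumptions = lambda file_name: [
    line.strip().split(",") for line in open(file_name).readlines()
]
print(read_subsumptions("valid_subsumptions_demo.csv"))
# [['http://onto1.owl#A', 'http://onto2.owl#B', 'http://onto2.owl#C']]
```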
"},{"location":"deeponto/align/bertsubs/#deeponto.complete.bertsubs.pipeline_inter.BERTSubsInterPipeline.inter_ontology_sampling","title":"inter_ontology_sampling(subsumptions, pos_dup=1, neg_dup=1)
","text":"Transform inter-ontology subsumptions to two-string samples
Parameters:
Name Type Description Defaultsubsumptions
List[List]
A list of subsumptions; each subsumption is composed of two IRIs.
requiredpos_dup
int
Positive sample duplication.
1
neg_dup
int
Negative sample duplication.
1
Source code in src/deeponto/complete/bertsubs/pipeline_inter.py
def inter_ontology_sampling(self, subsumptions: List[List], pos_dup: int = 1, neg_dup: int = 1):\nr\"\"\"Transform inter-ontology subsumptions to two-string samples\n Args:\n subsumptions (List[List]): A list of subsumptions; each subsumption is composed of two IRIs.\n pos_dup (int): Positive sample duplication.\n neg_dup (int): Negative sample duplication.\n \"\"\"\n pos_samples = list()\n for subs in subsumptions:\n sub_strs = self.src_sampler.subclass_to_strings(subcls=subs[0])\n sup_strs = self.tgt_sampler.supclass_to_strings(supcls=subs[1],\n subsumption_type=self.config.subsumption_type)\n for sub_str in sub_strs:\n for sup_str in sup_strs:\n pos_samples.append([sub_str, sup_str, 1])\n pos_samples = pos_dup * pos_samples\n\n neg_subsumptions = list()\n for subs in subsumptions:\n for _ in range(neg_dup):\n neg_c = self.tgt_sampler.get_negative_sample(subclass_iri=subs[1],\n subsumption_type=self.config.subsumption_type)\n neg_subsumptions.append([subs[0], neg_c])\n\n neg_samples = list()\n for subs in neg_subsumptions:\n sub_strs = self.src_sampler.subclass_to_strings(subcls=subs[0])\n sup_strs = self.tgt_sampler.supclass_to_strings(supcls=subs[1],\n subsumption_type=self.config.subsumption_type)\n for sub_str in sub_strs:\n for sup_str in sup_strs:\n neg_samples.append([sub_str, sup_str, 0])\n\n if len(neg_samples) < len(pos_samples):\n neg_samples = neg_samples + [random.choice(neg_samples) for _ in range(len(pos_samples) - len(neg_samples))]\n if len(neg_samples) > len(pos_samples):\n pos_samples = pos_samples + [random.choice(pos_samples) for _ in range(len(neg_samples) - len(pos_samples))]\n print('training mappings, pos_samples: %d, neg_samples: %d' % (len(pos_samples), len(neg_samples)))\n all_samples = [s for s in pos_samples + neg_samples if s[0] != '' and s[1] != '']\n return all_samples\n
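One detail worth noting in the sampling above is how the positive and negative sample lists are balanced at the end: whichever list is shorter is padded by resampling its own elements. A minimal sketch of just that balancing step, with toy samples:

```python
import random

pos_samples = [["a", "b", 1]] * 5
neg_samples = [["a", "c", 0]] * 2

# pad the shorter list by resampling from itself until the sizes match
if len(neg_samples) < len(pos_samples):
    neg_samples = neg_samples + [random.choice(neg_samples)
                                 for _ in range(len(pos_samples) - len(neg_samples))]
elif len(neg_samples) > len(pos_samples):
    pos_samples = pos_samples + [random.choice(pos_samples)
                                 for _ in range(len(neg_samples) - len(pos_samples))]

print(len(pos_samples), len(neg_samples))  # 5 5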
"},{"location":"deeponto/align/bertsubs/#deeponto.complete.bertsubs.pipeline_inter.BERTSubsInterPipeline.inter_ontology_subsumption_to_sample","title":"inter_ontology_subsumption_to_sample(subsumption)
","text":"Transform an inter ontology subsumption into a sample (a two-string list).
Parameters:
Name Type Description Defaultsubsumption
List
a subsumption composed of two IRIs.
required Source code insrc/deeponto/complete/bertsubs/pipeline_inter.py
def inter_ontology_subsumption_to_sample(self, subsumption: List):\nr\"\"\"Transform an inter ontology subsumption into a sample (a two-string list).\n\n Args:\n subsumption (List): a subsumption composed of two IRIs.\n \"\"\"\n subcls, supcls = subsumption[0], subsumption[1]\n substrs = self.src_sampler.subclass_to_strings(subcls=subcls)\n supstrs = self.tgt_sampler.supclass_to_strings(supcls=supcls, subsumption_type='named_class')\n samples = list()\n for substr in substrs:\n for supstr in supstrs:\n samples.append([substr, supstr])\n return samples\n
"},{"location":"deeponto/align/bertsubs/#deeponto.complete.bertsubs.pipeline_inter.BERTSubsInterPipeline.score","title":"score(samples)
","text":"Score the samples with the classifier.
Parameters:
Name Type Description Defaultsamples
List[List]
Each item is a list with two strings (input).
required Source code insrc/deeponto/complete/bertsubs/pipeline_inter.py
def score(self, samples):\nr\"\"\"Score the samples with the classifier.\n\n Args:\n samples (List[List]): Each item is a list with two strings (input).\n \"\"\"\n sample_size = len(samples)\n scores = np.zeros(sample_size)\n batch_num = math.ceil(sample_size / self.config.evaluation.batch_size)\n for i in range(batch_num):\n j = (i + 1) * self.config.evaluation.batch_size \\\n if (i + 1) * self.config.evaluation.batch_size <= sample_size else sample_size\n inputs = self.tokenize(samples[i * self.config.evaluation.batch_size:j])\n inputs.to(self.device)\n with torch.no_grad():\n batch_scores = self.classifier(inputs)\n scores[i * self.config.evaluation.batch_size:j] = batch_scores.cpu().numpy()\n return scores\n
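`score` walks over the samples in evaluation-size batches and records one positive-class probability per pair. The sketch below reproduces the batching and aggregation with a dummy scoring function standing in for the tokeniser plus fine-tuned classifier:

```python
import math
import numpy as np
import torch

def batched_scores(samples, classify, batch_size=4):
    """Score (sub, super) string pairs batch by batch (sketch)."""
    scores = np.zeros(len(samples))
    for i in range(math.ceil(len(samples) / batch_size)):
        start, end = i * batch_size, min((i + 1) * batch_size, len(samples))
        with torch.no_grad():
            scores[start:end] = classify(samples[start:end]).cpu().numpy()
    return scores

# dummy classifier returning one probability per pair
dummy_classify = lambda batch: torch.rand(len(batch))
samples = [["disease of heart", "cardiovascular disease"]] * 10
print(batched_scores(samples, dummy_classify))
```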
"},{"location":"deeponto/align/bertsubs/#deeponto.complete.bertsubs.pipeline_inter.BERTSubsInterPipeline.evaluate","title":"evaluate(target_subsumptions, test_type='test')
","text":"Test and calculate the metrics according to a given list of subsumptions.
Parameters:
Name Type Description Defaulttarget_subsumptions
List[List]
A list of subsumptions, each of which of is a two-component list (subclass_iri, super_class_iri_or_str)
.
test_type
str
\"test\"
or \"valid\"
.
'test'
Source code in src/deeponto/complete/bertsubs/pipeline_inter.py
def evaluate(self, target_subsumptions: List[List], test_type: str = 'test'):\nr\"\"\"Test and calculate the metrics according to a given list of subsumptions.\n\n Args:\n target_subsumptions (List[List]): A list of subsumptions, each of which of is a two-component list `(subclass_iri, super_class_iri_or_str)`.\n test_type (str): `\"test\"` or `\"valid\"`.\n \"\"\"\n MRR_sum, hits1_sum, hits5_sum, hits10_sum = 0, 0, 0, 0\n MRR, Hits1, Hits5, Hits10 = 0, 0, 0, 0\n size_sum, size_n = 0, 0\n for k0, test in enumerate(target_subsumptions):\n subcls, gt = test[0], test[1]\n candidates = test[1:]\n candidate_subsumptions = [[subcls, c] for c in candidates]\n candidate_scores = np.zeros(len(candidate_subsumptions))\n for k1, candidate_subsumption in enumerate(candidate_subsumptions):\n samples = self.inter_ontology_subsumption_to_sample(subsumption=candidate_subsumption)\n size_sum += len(samples)\n size_n += 1\n scores = self.score(samples=samples)\n candidate_scores[k1] = np.average(scores)\n\n sorted_indexes = np.argsort(candidate_scores)[::-1]\n sorted_classes = [candidates[i] for i in sorted_indexes]\n rank = sorted_classes.index(gt) + 1\n MRR_sum += 1.0 / rank\n hits1_sum += 1 if gt in sorted_classes[:1] else 0\n hits5_sum += 1 if gt in sorted_classes[:5] else 0\n hits10_sum += 1 if gt in sorted_classes[:10] else 0\n num = k0 + 1\n MRR, Hits1, Hits5, Hits10 = MRR_sum / num, hits1_sum / num, hits5_sum / num, hits10_sum / num\n if num % 500 == 0:\n print('\\n%d tested, MRR: %.3f, Hits@1: %.3f, Hits@5: %.3f, Hits@10: %.3f\\n' % (\n num, MRR, Hits1, Hits5, Hits10))\n print('\\n[%s], MRR: %.3f, Hits@1: %.3f, Hits@5: %.3f, Hits@10: %.3f\\n' % (test_type, MRR, Hits1, Hits5, Hits10))\n print('%.2f samples per testing subsumption' % (size_sum / size_n))\n
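The reported metrics are standard ranking metrics: each test case ranks the ground-truth super-class among its candidate super-classes by predicted score. The sketch below computes the rank, MRR contribution, and Hits@k for a single made-up test case in the same way as `evaluate` (descending argsort of the candidate scores, then locate the ground truth):

```python
import numpy as np

candidates = ["http://t.owl#B", "http://t.owl#C", "http://t.owl#D"]
gt = "http://t.owl#C"                      # ground-truth super-class
candidate_scores = np.array([0.30, 0.85, 0.10])

# rank candidates by descending score and find where the ground truth lands
sorted_classes = [candidates[i] for i in np.argsort(candidate_scores)[::-1]]
rank = sorted_classes.index(gt) + 1

mrr = 1.0 / rank
hits_at_1 = int(gt in sorted_classes[:1])
hits_at_5 = int(gt in sorted_classes[:5])
print(rank, mrr, hits_at_1, hits_at_5)  # 1 1.0 1 1
```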
"},{"location":"deeponto/align/bertsubs/#deeponto.complete.bertsubs.pipeline_inter.BERTSubsInterPipeline.predict","title":"predict(target_subsumptions)
","text":"Predict a score for each given subsumption.
The scores will be saved in test_subsumption_scores.csv
.
Parameters:
Name Type Description Defaulttarget_subsumptions
List[List]
Each item is a list with the first element as the sub-class, and the remaining elements as n candidate super-classes.
required Source code insrc/deeponto/complete/bertsubs/pipeline_inter.py
def predict(self, target_subsumptions: List[List]):\nr\"\"\"Predict a score for each given subsumption. \n\n The scores will be saved in `test_subsumption_scores.csv`.\n\n Args:\n target_subsumptions (List[List]): Each item is a list with the first element as the sub-class,\n and the remaining elements as n candidate super-classes.\n \"\"\"\n out_lines = []\n for test in target_subsumptions:\n subcls, candidates = test[0], test[1:]\n candidate_subsumptions = [[subcls, c] for c in candidates]\n candidate_scores = []\n\n for candidate_subsumption in candidate_subsumptions:\n samples = self.inter_ontology_subsumption_to_sample(subsumption=candidate_subsumption)\n scores = self.score(samples=samples)\n candidate_scores.append(np.average(scores))\n out_lines.append(','.join([str(i) for i in candidate_scores]))\n\n out_file = 'test_subsumption_scores.csv'\n with open(out_file, 'w') as f:\n for line in out_lines:\n f.write('%s\\n' % line)\n print('Predicted subsumption scores are saved to %s' % out_file)\n
"},{"location":"deeponto/align/logmap/","title":"LogMap","text":"Run LogMap matcher 4.0 in a jar
command.
Credit
See LogMap repository at: https://github.com/ernestojimenezruiz/logmap-matcher.
"},{"location":"deeponto/align/logmap/#deeponto.align.logmap.run_logmap_repair","title":"run_logmap_repair(src_onto_path, tgt_onto_path, mapping_file_path, output_path, max_jvm_memory='10g')
","text":"Run the repair module of LogMap with java -jar
.
src/deeponto/align/logmap/__init__.py
def run_logmap_repair(\n src_onto_path: str, tgt_onto_path: str, mapping_file_path: str, output_path: str, max_jvm_memory: str = \"10g\"\n):\n\"\"\"Run the repair module of LogMap with `java -jar`.\"\"\"\n\n # find logmap directory\n logmap_path = os.path.dirname(__file__)\n\n # obtain absolute paths\n src_onto_path = os.path.abspath(src_onto_path)\n tgt_onto_path = os.path.abspath(tgt_onto_path)\n mapping_file_path = os.path.abspath(mapping_file_path)\n output_path = os.path.abspath(output_path)\n\n # run jar command\n print(f\"Run the repair module of LogMap from {logmap_path}.\")\n repair_command = (\n f\"java -Xms500m -Xmx{max_jvm_memory} -DentityExpansionLimit=100000000 -jar {logmap_path}/logmap-matcher-4.0.jar DEBUGGER \"\n + f\"file:{src_onto_path} file:{tgt_onto_path} TXT {mapping_file_path}\"\n + f\" {output_path} false false\"\n )\n print(f\"The jar command is:\\n{repair_command}.\")\n run_jar(repair_command)\n
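A hedged usage sketch of this function; the ontology and mapping paths below are placeholders, and a working Java installation plus the bundled `logmap-matcher-4.0.jar` are required:

```python
from deeponto.align.logmap import run_logmap_repair

run_logmap_repair(
    src_onto_path="data/src.owl",        # placeholder paths
    tgt_onto_path="data/tgt.owl",
    mapping_file_path="match/logmap-repair/filtered_mappings_for_LogMap_repair.txt",
    output_path="match/logmap-repair",
    max_jvm_memory="10g",
)
```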
"},{"location":"deeponto/complete/ontolama/","title":"OntoLAMA","text":""},{"location":"deeponto/complete/ontolama/#deeponto.complete.ontolama.inference.run_inference","title":"run_inference(config, args)
","text":"Main entry for running the OpenPrompt script.
Source code insrc/deeponto/complete/ontolama/inference.py
def run_inference(config, args):\n\"\"\"Main entry for running the OpenPrompt script.\n \"\"\"\n global CUR_TEMPLATE, CUR_VERBALIZER\n # exit()\n # init logger, create log dir and set log level, etc.\n if args.resume and args.test:\n raise Exception(\"cannot use flag --resume and --test together\")\n if args.resume or args.test:\n config.logging.path = EXP_PATH = args.resume or args.test\n else:\n EXP_PATH = config_experiment_dir(config)\n init_logger(\n os.path.join(EXP_PATH, \"log.txt\"),\n config.logging.file_level,\n config.logging.console_level,\n )\n # save config to the logger directory\n save_config_to_yaml(config)\n\n # load dataset. The valid_dataset can be None\n train_dataset, valid_dataset, test_dataset, Processor = OntoLAMADataProcessor.load_inference_dataset(\n config, test=args.test is not None or config.learning_setting == \"zero_shot\"\n )\n\n # main\n if config.learning_setting == \"full\":\n res = trainer(\n EXP_PATH,\n config,\n Processor,\n resume=args.resume,\n test=args.test,\n train_dataset=train_dataset,\n valid_dataset=valid_dataset,\n test_dataset=test_dataset,\n )\n elif config.learning_setting == \"few_shot\":\n if config.few_shot.few_shot_sampling is None:\n raise ValueError(\"use few_shot setting but config.few_shot.few_shot_sampling is not specified\")\n seeds = config.sampling_from_train.seed\n res = 0\n for seed in seeds:\n if not args.test:\n sampler = FewShotSampler(\n num_examples_per_label=config.sampling_from_train.num_examples_per_label,\n also_sample_dev=config.sampling_from_train.also_sample_dev,\n num_examples_per_label_dev=config.sampling_from_train.num_examples_per_label_dev,\n )\n train_sampled_dataset, valid_sampled_dataset = sampler(\n train_dataset=train_dataset, valid_dataset=valid_dataset, seed=seed\n )\n result = trainer(\n os.path.join(EXP_PATH, f\"seed-{seed}\"),\n config,\n Processor,\n resume=args.resume,\n test=args.test,\n train_dataset=train_sampled_dataset,\n valid_dataset=valid_sampled_dataset,\n test_dataset=test_dataset,\n )\n else:\n result = trainer(\n os.path.join(EXP_PATH, f\"seed-{seed}\"),\n config,\n Processor,\n test=args.test,\n test_dataset=test_dataset,\n )\n res += result\n res /= len(seeds)\n elif config.learning_setting == \"zero_shot\":\n res = trainer(\n EXP_PATH,\n config,\n Processor,\n zero=True,\n train_dataset=train_dataset,\n valid_dataset=valid_dataset,\n test_dataset=test_dataset,\n )\n\n return config, CUR_TEMPLATE, CUR_VERBALIZER\n
"},{"location":"deeponto/complete/ontolama/#deeponto.complete.ontolama.subsumption_sampler.SubsumptionSamplerBase","title":"SubsumptionSamplerBase(onto)
","text":"Base Class for Sampling Subsumption Pairs.
Source code insrc/deeponto/complete/ontolama/subsumption_sampler.py
def __init__(self, onto: Ontology):\n self.onto = onto\n self.progress_manager = enlighten.get_manager()\n\n # for faster sampling\n self.concept_iris = list(self.onto.owl_classes.keys())\n self.object_property_iris = list(self.onto.owl_object_properties.keys())\n self.sibling_concept_groups = self.onto.sibling_class_groups\n self.sibling_auxiliary_dict = defaultdict(list)\n for i, sib_group in enumerate(self.sibling_concept_groups):\n for sib in sib_group:\n self.sibling_auxiliary_dict[sib].append(i)\n
"},{"location":"deeponto/complete/ontolama/#deeponto.complete.ontolama.subsumption_sampler.SubsumptionSamplerBase.random_named_concept","title":"random_named_concept()
","text":"Randomly draw a named concept's IRI.
Source code insrc/deeponto/complete/ontolama/subsumption_sampler.py
def random_named_concept(self) -> str:\n\"\"\"Randomly draw a named concept's IRI.\"\"\"\n return random.choice(self.concept_iris)\n
"},{"location":"deeponto/complete/ontolama/#deeponto.complete.ontolama.subsumption_sampler.SubsumptionSamplerBase.random_object_property","title":"random_object_property()
","text":"Randomly draw a object property's IRI.
Source code insrc/deeponto/complete/ontolama/subsumption_sampler.py
def random_object_property(self) -> str:\n\"\"\"Randomly draw a object property's IRI.\"\"\"\n return random.choice(self.object_property_iris)\n
"},{"location":"deeponto/complete/ontolama/#deeponto.complete.ontolama.subsumption_sampler.SubsumptionSamplerBase.get_siblings","title":"get_siblings(concept_iri)
","text":"Get the sibling concepts of the given concept.
Source code insrc/deeponto/complete/ontolama/subsumption_sampler.py
def get_siblings(self, concept_iri: str):\n\"\"\"Get the sibling concepts of the given concept.\"\"\"\n sibling_group = self.sibling_auxiliary_dict[concept_iri]\n sibling_group = [self.sibling_concept_groups[i] for i in sibling_group]\n sibling_group = list(itertools.chain.from_iterable(sibling_group))\n return sibling_group\n
"},{"location":"deeponto/complete/ontolama/#deeponto.complete.ontolama.subsumption_sampler.SubsumptionSamplerBase.random_sibling","title":"random_sibling(concept_iri)
","text":"Randomly draw a sibling concept for a given concept.
Source code insrc/deeponto/complete/ontolama/subsumption_sampler.py
def random_sibling(self, concept_iri: str) -> str:\n\"\"\"Randomly draw a sibling concept for a given concept.\"\"\"\n sibling_group = self.get_siblings(concept_iri)\n if sibling_group:\n return random.choice(sibling_group)\n else:\n # not every concept has a sibling concept\n return None\n
"},{"location":"deeponto/complete/ontolama/#deeponto.complete.ontolama.subsumption_sampler.AtomicSubsumptionSampler","title":"AtomicSubsumptionSampler(onto)
","text":" Bases: SubsumptionSamplerBase
Sampler for constructing the Atomic Subsumption Inference (SI) dataset.
Positive samples come from the entailed subsumptions.
Soft negative samples come from the pairs of randomly selected concepts, subject to passing the assumed disjointness check.
Hard negative samples come from the pairs of randomly selected sibling concepts, subject to passing the assumed disjointness check.
Source code insrc/deeponto/complete/ontolama/subsumption_sampler.py
def __init__(self, onto: Ontology):\n super().__init__(onto)\n\n # compute the sibling concept pairs for faster hard negative sampling\n self.sibling_pairs = []\n for sib_group in self.sibling_concept_groups:\n self.sibling_pairs += [(x, y) for x, y in itertools.product(sib_group, sib_group) if x != y]\n self.maximum_num_hard_negatives = len(self.sibling_pairs)\n
"},{"location":"deeponto/complete/ontolama/#deeponto.complete.ontolama.subsumption_sampler.AtomicSubsumptionSampler.positive_sampling","title":"positive_sampling(num_samples=None)
","text":"Sample named concept pairs that are involved in a subsumption axiom.
An extracted pair \\((C, D)\\) indicates \\(\\mathcal{O} \\models C \\sqsubseteq D\\) where \\(\\mathcal{O}\\) is the input ontology.
Source code insrc/deeponto/complete/ontolama/subsumption_sampler.py
def positive_sampling(self, num_samples: Optional[int] = None):\nr\"\"\"Sample named concept pairs that are involved in a subsumption axiom.\n\n An extracted pair $(C, D)$ indicates $\\mathcal{O} \\models C \\sqsubseteq D$ where\n $\\mathcal{O}$ is the input ontology.\n \"\"\"\n pbar = self.progress_manager.counter(desc=\"Sample Positive Subsumptions\", unit=\"pair\")\n positives = []\n for concept_iri in self.concept_iris:\n owl_concept = self.onto.owl_classes[concept_iri]\n for subsumer_iri in self.onto.reasoner.get_inferred_super_entities(owl_concept, direct=False):\n positives.append((concept_iri, subsumer_iri))\n pbar.update()\n positives = list(set(sorted(positives)))\n if num_samples:\n positives = random.sample(positives, num_samples)\n print(f\"Sample {len(positives)} unique positive subsumption pairs.\")\n return positives\n
"},{"location":"deeponto/complete/ontolama/#deeponto.complete.ontolama.subsumption_sampler.AtomicSubsumptionSampler.negative_sampling","title":"negative_sampling(negative_sample_type, num_samples, apply_assumed_disjointness_alternative=True)
","text":"Sample named concept pairs that are involved in a disjoiness (assumed) axiom, which then implies non-subsumption.
Source code insrc/deeponto/complete/ontolama/subsumption_sampler.py
def negative_sampling(\n self,\n negative_sample_type: str,\n num_samples: int,\n apply_assumed_disjointness_alternative: bool = True,\n):\nr\"\"\"Sample named concept pairs that are involved in a disjoiness (assumed) axiom, which then\n implies non-subsumption.\n \"\"\"\n if negative_sample_type == \"soft\":\n draw_one = lambda: tuple(random.sample(self.concept_iris, k=2))\n elif negative_sample_type == \"hard\":\n draw_one = lambda: random.choice(self.sibling_pairs)\n else:\n raise RuntimeError(f\"{negative_sample_type} not supported.\")\n\n negatives = []\n max_iter = 2 * num_samples\n\n # which method to validate the negative sample\n valid_negative = self.onto.reasoner.check_assumed_disjoint\n if apply_assumed_disjointness_alternative:\n valid_negative = self.onto.reasoner.check_assumed_disjoint_alternative\n\n print(f\"Sample {negative_sample_type} negative subsumption pairs.\")\n # create two bars for process tracking\n added_bar = self.progress_manager.counter(total=num_samples, desc=\"Sample Negative Subsumptions\", unit=\"pair\")\n iter_bar = self.progress_manager.counter(total=max_iter, desc=\"#Iteration\", unit=\"it\")\n i = 0\n added = 0\n while added < num_samples and i < max_iter:\n sub_concept_iri, super_concept_iri = draw_one()\n sub_concept = self.onto.get_owl_object(sub_concept_iri)\n super_concept = self.onto.get_owl_object(super_concept_iri)\n # collect class iri if accepted\n if valid_negative(sub_concept, super_concept):\n neg = (sub_concept_iri, super_concept_iri)\n negatives.append(neg)\n added += 1\n added_bar.update(1)\n if added == num_samples:\n negatives = list(set(sorted(negatives)))\n added = len(negatives)\n added_bar.count = added\n i += 1\n iter_bar.update(1)\n negatives = list(set(sorted(negatives)))\n print(f\"Sample {len(negatives)} unique positive subsumption pairs.\")\n return negatives\n
"},{"location":"deeponto/complete/ontolama/#deeponto.complete.ontolama.subsumption_sampler.ComplexSubsumptionSampler","title":"ComplexSubsumptionSampler(onto)
","text":" Bases: SubsumptionSamplerBase
Sampler for constructing the Complex Subsumption Inference (SI) dataset.
To obtain complex concept expressions on either side of the subsumption relationship (as the sub-concept or the super-concept), this sampler utilises the equivalence axioms in the form of \(C \equiv C_{comp}\) where \(C\) is atomic and \(C_{comp}\) is complex.
An equivalence axiom like \(C \equiv C_{comp}\) is deemed an anchor axiom.
Positive samples are in the form of \(C_{sub} \sqsubseteq C_{comp}\) or \(C_{comp} \sqsubseteq C_{super}\), where \(C_{sub}\) is an entailed sub-concept of \(C\) and \(C_{comp}\), and \(C_{super}\) is an entailed super-concept of \(C\) and \(C_{comp}\).
Negative samples are formed by replacing one of the named entities in the anchor axiom; the modified sub-concept and super-concept need to pass the assumed disjointness check to be accepted as a valid negative sample. Without loss of generality, suppose we choose \(C \sqsubseteq C_{comp}\) and replace a named entity in \(C_{comp}\) to form \(C \sqsubseteq C_{comp}'\); then \(C\) and \(C_{comp}'\) form a valid negative only if they satisfy the assumed disjointness check.
Source code insrc/deeponto/complete/ontolama/subsumption_sampler.py
def __init__(self, onto: Ontology):\n super().__init__(onto)\n self.anchor_axioms = self.onto.get_equivalence_axioms(\"Classes\")\n
"},{"location":"deeponto/complete/ontolama/#deeponto.complete.ontolama.subsumption_sampler.ComplexSubsumptionSampler.positive_sampling_from_anchor","title":"positive_sampling_from_anchor(anchor_axiom)
","text":"Returns all positive subsumption pairs extracted from an anchor equivalence axiom.
Source code insrc/deeponto/complete/ontolama/subsumption_sampler.py
def positive_sampling_from_anchor(self, anchor_axiom: OWLAxiom):\n\"\"\"Returns all positive subsumption pairs extracted from an anchor equivalence axiom.\"\"\"\n sub_axiom = list(anchor_axiom.asOWLSubClassOfAxioms())[0]\n atomic_concept, complex_concept = sub_axiom.getSubClass(), sub_axiom.getSuperClass()\n # determine which is the atomic concept\n if complex_concept.isClassExpressionLiteral():\n atomic_concept, complex_concept = complex_concept, atomic_concept\n\n # intialise the positive samples from the anchor equivalence axiom\n positives = list(anchor_axiom.asOWLSubClassOfAxioms())\n for super_concept_iri in self.onto.reasoner.get_inferred_super_entities(atomic_concept, direct=False):\n positives.append(\n self.onto.owl_data_factory.getOWLSubClassOfAxiom(\n complex_concept, self.onto.get_owl_object(super_concept_iri)\n )\n )\n for sub_concept_iri in self.onto.reasoner.get_inferred_sub_entities(atomic_concept, direct=False):\n positives.append(\n self.onto.owl_data_factory.getOWLSubClassOfAxiom(\n self.onto.get_owl_object(sub_concept_iri), complex_concept\n )\n )\n\n # TESTING\n # for p in positives:\n # assert self.onto.reasoner.owl_reasoner.isEntailed(p) \n\n return list(set(sorted(positives)))\n
"},{"location":"deeponto/complete/ontolama/#deeponto.complete.ontolama.subsumption_sampler.ComplexSubsumptionSampler.positive_sampling","title":"positive_sampling(num_samples_per_anchor=10)
","text":"Sample positive subsumption axioms that involve one atomic and one complex concepts.
An extracted pair \\((C, D)\\) indicates \\(\\mathcal{O} \\models C \\sqsubseteq D\\) where \\(\\mathcal{O}\\) is the input ontology.
Source code insrc/deeponto/complete/ontolama/subsumption_sampler.py
def positive_sampling(self, num_samples_per_anchor: Optional[int] = 10):\nr\"\"\"Sample positive subsumption axioms that involve one atomic and one complex concepts.\n\n An extracted pair $(C, D)$ indicates $\\mathcal{O} \\models C \\sqsubseteq D$ where\n $\\mathcal{O}$ is the input ontology.\n \"\"\"\n print(f\"Maximum number of positive samples for each anchor is set to {num_samples_per_anchor}.\")\n pbar = self.progress_manager.counter(desc=\"Sample Positive Subsumptions from\", unit=\"anchor axiom\")\n positives = dict()\n for anchor in self.anchor_axioms:\n positives_from_anchor = self.positive_sampling_from_anchor(anchor)\n if num_samples_per_anchor and num_samples_per_anchor < len(positives_from_anchor):\n positives_from_anchor = random.sample(positives_from_anchor, k = num_samples_per_anchor)\n positives[str(anchor)] = positives_from_anchor\n pbar.update()\n # positives = list(set(sorted(positives)))\n print(f\"Sample {sum([len(v) for v in positives.values()])} unique positive subsumption pairs.\")\n return positives\n
"},{"location":"deeponto/complete/ontolama/#deeponto.complete.ontolama.subsumption_sampler.ComplexSubsumptionSampler.negative_sampling","title":"negative_sampling(num_samples_per_anchor=10)
","text":"Sample negative subsumption axioms that involve one atomic and one complex concepts.
An extracted pair \\((C, D)\\) indicates \\(C\\) and \\(D\\) pass the assumed disjointness check.
Source code insrc/deeponto/complete/ontolama/subsumption_sampler.py
def negative_sampling(self, num_samples_per_anchor: Optional[int] = 10):\nr\"\"\"Sample negative subsumption axioms that involve one atomic and one complex concepts.\n\n An extracted pair $(C, D)$ indicates $C$ and $D$ pass the [assumed disjointness check][deeponto.onto.OntologyReasoner.check_assumed_disjoint].\n \"\"\"\n print(f\"Maximum number of negative samples for each anchor is set to {num_samples_per_anchor}.\")\n pbar = self.progress_manager.counter(desc=\"Sample Negative Subsumptions from\", unit=\"anchor axiom\")\n negatives = dict()\n for anchor in self.anchor_axioms:\n negatives_from_anchor = []\n i, max_iter = 0, num_samples_per_anchor + 2\n while i < max_iter and len(negatives_from_anchor) < num_samples_per_anchor:\n corrupted_anchor = self.random_corrupt(anchor)\n corrupted_sub_axiom = random.choice(list(corrupted_anchor.asOWLSubClassOfAxioms()))\n sub_concept, super_concept = corrupted_sub_axiom.getSubClass(), corrupted_sub_axiom.getSuperClass()\n if self.onto.reasoner.check_assumed_disjoint_alternative(sub_concept, super_concept):\n negatives_from_anchor.append(corrupted_sub_axiom)\n i += 1\n negatives[str(anchor)] = list(set(sorted(negatives_from_anchor)))\n pbar.update()\n # negatives = list(set(sorted(negatives)))\n print(f\"Sample {sum([len(v) for v in negatives.values()])} unique positive subsumption pairs.\")\n return negatives\n
"},{"location":"deeponto/complete/ontolama/#deeponto.complete.ontolama.subsumption_sampler.ComplexSubsumptionSampler.random_corrupt","title":"random_corrupt(axiom)
","text":"Randomly change an IRI in the input axiom and return a new one.
Source code insrc/deeponto/complete/ontolama/subsumption_sampler.py
def random_corrupt(self, axiom: OWLAxiom):\n\"\"\"Randomly change an IRI in the input axiom and return a new one.\n \"\"\"\n replaced_iri = random.choice(re.findall(IRI, str(axiom)))[1:-1]\n replaced_entity = self.onto.get_owl_object(replaced_iri)\n replacement_iri = None\n if self.onto.get_entity_type(replaced_entity) == \"Classes\":\n replacement_iri = self.random_named_concept()\n elif self.onto.get_entity_type(replaced_entity) == \"ObjectProperties\":\n replacement_iri = self.random_object_property()\n else:\n # NOTE: to extend to other types of entities in future\n raise RuntimeError(\"Unknown type of axiom.\")\n return self.onto.replace_entity(axiom, replaced_iri, replacement_iri)\n
"},{"location":"deeponto/complete/ontolama/#deeponto.complete.ontolama.data_processor.OntoLAMADataProcessor","title":"OntoLAMADataProcessor()
","text":" Bases: DataProcessor
Class for processing the OntoLAMA data points.
Source code insrc/deeponto/complete/ontolama/data_processor.py
def __init__(self):\n super().__init__()\n self.labels = [\"negative\", \"positive\"]\n
"},{"location":"deeponto/complete/ontolama/#deeponto.complete.ontolama.data_processor.OntoLAMADataProcessor.load_dataset","title":"load_dataset(task_name, split)
staticmethod
","text":"Load a specific OntoLAMA dataset from huggingface dataset hub.
Source code insrc/deeponto/complete/ontolama/data_processor.py
@staticmethod\ndef load_dataset(task_name: str, split: str):\n\"\"\"Load a specific OntoLAMA dataset from huggingface dataset hub.\"\"\"\n # TODO: remove use_auth_token after going to public\n return load_dataset(\"krr-oxford/OntoLAMA\", task_name, split=split, use_auth_token=True)\n
"},{"location":"deeponto/complete/ontolama/#deeponto.complete.ontolama.data_processor.OntoLAMADataProcessor.get_examples","title":"get_examples(task_name, split)
","text":"Load a specific OntoLAMA dataset and transform the data points into input examples for prompt-based inference.
Source code insrc/deeponto/complete/ontolama/data_processor.py
def get_examples(self, task_name, split):\n\"\"\"Load a specific OntoLAMA dataset and transform the data points into\n input examples for prompt-based inference.\n \"\"\"\n\n dataset = self.load_dataset(task_name, split)\n\n premise_name = \"v_sub_concept\"\n hypothesis_name = \"v_super_concept\"\n # different data fields for the bimnli dataset\n if \"bimnli\" in task_name:\n premise_name = \"premise\"\n hypothesis_name = \"hypothesis\"\n\n prompt_samples = []\n for samp in dataset:\n inp = InputExample(text_a=samp[premise_name], text_b=samp[hypothesis_name], label=samp[\"label\"])\n prompt_samples.append(inp)\n\n return prompt_samples\n
"},{"location":"deeponto/complete/ontolama/#deeponto.complete.ontolama.data_processor.OntoLAMADataProcessor.load_inference_dataset","title":"load_inference_dataset(config, return_class=True, test=False)
classmethod
","text":"A plm loader using a global config. It will load the train, valid, and test set (if exists) simulatenously.
Parameters:
Name Type Description Defaultconfig
CfgNode
The global config from the CfgNode.
requiredreturn_class
bool
Whether to return the data processor object for future use.
True
Returns:
Type DescriptionOptional[List[InputExample]]
The train dataset.
Optional[List[InputExample]]
The valid dataset.
Optional[List[InputExample]]
The test dataset.
Optional[OntoLAMADataProcessor]
The data processor object.
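A minimal sketch, assuming a yacs-style CfgNode and a task name taken from the code below; a real config typically contains more fields:
from yacs.config import CfgNode\n\n# only the dataset.task_name field is shown here\nconfig = CfgNode({\"dataset\": {\"task_name\": \"bimnli\"}})\ntrain, valid, test, processor = OntoLAMADataProcessor.load_inference_dataset(config, return_class=True)\n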
Source code insrc/deeponto/complete/ontolama/data_processor.py
@classmethod\ndef load_inference_dataset(cls, config: CfgNode, return_class=True, test=False):\nr\"\"\"A plm loader using a global config.\n It will load the train, valid, and test set (if exists) simulatenously.\n\n Args:\n config (CfgNode): The global config from the CfgNode.\n return_class (bool): Whether return the data processor class for future usage.\n\n Returns:\n (Optional[List[InputExample]]): The train dataset.\n (Optional[List[InputExample]]): The valid dataset.\n (Optional[List[InputExample]]): The test dataset.\n (Optional[OntoLAMADataProcessor]): The data processor object.\n \"\"\"\n dataset_config = config.dataset\n\n processor = cls()\n\n train_dataset = None\n valid_dataset = None\n if not test:\n try:\n train_dataset = processor.get_examples(dataset_config.task_name, \"train\")\n except FileNotFoundError:\n logger.warning(f\"Has no training dataset in krr-oxford/OntoLAMA/{dataset_config.task_name}.\")\n try:\n valid_dataset = processor.get_examples(dataset_config.task_name, \"validation\")\n except FileNotFoundError:\n logger.warning(f\"Has no validation dataset in krr-oxford/OntoLAMA/{dataset_config.task_name}.\")\n\n test_dataset = None\n try:\n test_dataset = processor.get_examples(dataset_config.task_name, \"test\")\n except FileNotFoundError:\n logger.warning(f\"Has no test dataset in krr-oxford/OntoLAMA/{dataset_config.task_name}.\")\n # checking whether donwloaded.\n if (train_dataset is None) and (valid_dataset is None) and (test_dataset is None):\n logger.error(\n \"Dataset is empty. Either there is no download or the path is wrong. \"\n + \"If not downloaded, please `cd datasets/` and `bash download_xxx.sh`\"\n )\n exit()\n if return_class:\n return train_dataset, valid_dataset, test_dataset, processor\n else:\n return train_dataset, valid_dataset, test_dataset\n
"},{"location":"deeponto/complete/bertsubs/","title":"BERTSubs (Intra)","text":""},{"location":"deeponto/complete/bertsubs/#deeponto.complete.bertsubs.pipeline_intra.BERTSubsIntraPipeline","title":"BERTSubsIntraPipeline(onto, config)
","text":"Class for the intra-ontology subsumption prediction setting of BERTSubs.
Attributes:
Name Type Descriptiononto
Ontology
The target ontology.
config
CfgNode
The configuration for BERTSubs.
sampler
SubsumptionSampler
The subsumption sampler for BERTSubs.
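A minimal usage sketch, assuming the class is exported from deeponto.complete.bertsubs and that a YAML file provides the expected configuration fields (the file paths and the yacs-based CfgNode are assumptions):
from yacs.config import CfgNode\nfrom deeponto.onto import Ontology\nfrom deeponto.complete.bertsubs import BERTSubsIntraPipeline\n\nonto = Ontology(\"./ontology.owl\")  # placeholder path\nconfig = CfgNode(new_allowed=True)\nconfig.merge_from_file(\"./bertsubs_intra_config.yaml\")  # placeholder config file\n\n# constructing the pipeline samples the data, fine-tunes the BERT classifier, and runs validation/testing\npipeline = BERTSubsIntraPipeline(onto=onto, config=config)\n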
Source code insrc/deeponto/complete/bertsubs/pipeline_intra.py
def __init__(self, onto: Ontology, config: CfgNode):\n self.onto = onto\n self.config = config\n self.sampler = SubsumptionSampler(onto=onto, config=config)\n start_time = datetime.datetime.now()\n\n n = 0\n for k in self.sampler.named_classes:\n n += len(self.sampler.iri_label[k])\n print(\n \"%d named classes, %.1f labels per class\"\n % (len(self.sampler.named_classes), n / len(self.sampler.named_classes))\n )\n\n read_subsumptions = lambda file_name: [line.strip().split(\",\") for line in open(file_name).readlines()]\n test_subsumptions = (\n None\n if config.test_subsumption_file is None or config.test_subsumption_file == \"None\"\n else read_subsumptions(config.test_subsumption_file)\n )\n\n # The train/valid subsumptions are not given. They will be extracted from the given ontology:\n if config.train_subsumption_file is None or config.train_subsumption_file == \"None\":\n subsumptions0 = self.extract_subsumptions_from_ontology(\n onto=onto, subsumption_type=config.subsumption_type\n )\n random.shuffle(subsumptions0)\n valid_size = int(len(subsumptions0) * config.valid.valid_ratio)\n train_subsumptions0, valid_subsumptions0 = subsumptions0[valid_size:], subsumptions0[0:valid_size]\n train_subsumptions, valid_subsumptions = [], []\n if config.subsumption_type == \"named_class\":\n for subs in train_subsumptions0:\n c1, c2 = subs.getSubClass(), subs.getSuperClass()\n train_subsumptions.append([str(c1.getIRI()), str(c2.getIRI())])\n\n size_sum = 0\n for subs in valid_subsumptions0:\n c1, c2 = subs.getSubClass(), subs.getSuperClass()\n neg_candidates = BERTSubsIntraPipeline.get_test_neg_candidates_named_class(\n subclass=c1, gt=c2, max_neg_size=config.valid.max_neg_size, onto=onto\n )\n size = len(neg_candidates)\n size_sum += size\n if size > 0:\n item = [str(c1.getIRI()), str(c2.getIRI())] + [str(c.getIRI()) for c in neg_candidates]\n valid_subsumptions.append(item)\n print(\"\\t average neg candidate size in validation: %.2f\" % (size_sum / len(valid_subsumptions)))\n\n elif config.subsumption_type == \"restriction\":\n for subs in train_subsumptions0:\n c1, c2 = subs.getSubClass(), subs.getSuperClass()\n train_subsumptions.append([str(c1.getIRI()), str(c2)])\n\n restrictions = BERTSubsIntraPipeline.extract_restrictions_from_ontology(onto=onto)\n print(\"restrictions: %d\" % len(restrictions))\n size_sum = 0\n for subs in valid_subsumptions0:\n c1, c2 = subs.getSubClass(), subs.getSuperClass()\n c2_neg = BERTSubsIntraPipeline.get_test_neg_candidates_restriction(\n subcls=c1, max_neg_size=config.valid.max_neg_size, restrictions=restrictions, onto=onto\n )\n size_sum += len(c2_neg)\n item = [str(c1.getIRI()), str(c2)] + [str(r) for r in c2_neg]\n valid_subsumptions.append(item)\n print(\"valid candidate negative avg. 
size: %.1f\" % (size_sum / len(valid_subsumptions)))\n else:\n warnings.warn(\"Unknown subsumption type %s\" % config.subsumption_type)\n sys.exit(0)\n\n # The train/valid subsumptions are given:\n else:\n train_subsumptions = read_subsumptions(config.train_subsumption_file)\n valid_subsumptions = read_subsumptions(config.valid_subsumption_file)\n\n print(\"Positive train/valid subsumptions: %d/%d\" % (len(train_subsumptions), len(valid_subsumptions)))\n tr = self.sampler.generate_samples(subsumptions=train_subsumptions)\n va = self.sampler.generate_samples(subsumptions=valid_subsumptions, duplicate=False)\n\n end_time = datetime.datetime.now()\n print(\"data pre-processing costs %.1f minutes\" % ((end_time - start_time).seconds / 60))\n\n start_time = datetime.datetime.now()\n torch.cuda.empty_cache()\n bert_trainer = BERTSubsumptionClassifierTrainer(\n config.fine_tune.pretrained,\n train_data=tr,\n val_data=va,\n max_length=config.prompt.max_length,\n early_stop=config.fine_tune.early_stop,\n )\n\n epoch_steps = len(bert_trainer.tra) // config.fine_tune.batch_size # total steps of an epoch\n logging_steps = int(epoch_steps * 0.02) if int(epoch_steps * 0.02) > 0 else 5\n eval_steps = 5 * logging_steps\n training_args = TrainingArguments(\n output_dir=config.fine_tune.output_dir,\n num_train_epochs=config.fine_tune.num_epochs,\n per_device_train_batch_size=config.fine_tune.batch_size,\n per_device_eval_batch_size=config.fine_tune.batch_size,\n warmup_ratio=config.fine_tune.warm_up_ratio,\n weight_decay=0.01,\n logging_steps=logging_steps,\n logging_dir=f\"{config.fine_tune.output_dir}/tb\",\n eval_steps=eval_steps,\n evaluation_strategy=\"steps\",\n do_train=True,\n do_eval=True,\n save_steps=eval_steps,\n load_best_model_at_end=True,\n save_total_limit=1,\n metric_for_best_model=\"accuracy\",\n greater_is_better=True,\n )\n if config.fine_tune.do_fine_tune and (\n config.prompt.prompt_type == \"traversal\"\n or (config.prompt.prompt_type == \"path\" and config.prompt.use_sub_special_token)\n ):\n bert_trainer.add_special_tokens([\"<SUB>\"])\n\n bert_trainer.train(train_args=training_args, do_fine_tune=config.fine_tune.do_fine_tune)\n if config.fine_tune.do_fine_tune:\n bert_trainer.trainer.save_model(\n output_dir=os.path.join(config.fine_tune.output_dir, \"fine-tuned-checkpoint\")\n )\n print(\"fine-tuning done, fine-tuned model saved\")\n else:\n print(\"pretrained or fine-tuned model loaded.\")\n end_time = datetime.datetime.now()\n print(\"Fine-tuning costs %.1f minutes\" % ((end_time - start_time).seconds / 60))\n\n bert_trainer.model.eval()\n self.device = torch.device(f\"cuda\") if torch.cuda.is_available() else torch.device(\"cpu\")\n bert_trainer.model.to(self.device)\n self.tokenize = lambda x: bert_trainer.tokenizer(\n x, max_length=config.prompt.max_length, truncation=True, padding=True, return_tensors=\"pt\"\n )\n softmax = torch.nn.Softmax(dim=1)\n self.classifier = lambda x: softmax(bert_trainer.model(**x).logits)[:, 1]\n\n self.evaluate(target_subsumptions=valid_subsumptions, test_type=\"valid\")\n if test_subsumptions is not None:\n if config.test_type == \"evaluation\":\n self.evaluate(target_subsumptions=test_subsumptions, test_type=\"test\")\n elif config.test_type == \"prediction\":\n self.predict(target_subsumptions=test_subsumptions)\n else:\n warnings.warn(\"Unknown test_type: %s\" % config.test_type)\n print(\"\\n ------------------------- done! ---------------------------\\n\\n\\n\")\n
"},{"location":"deeponto/complete/bertsubs/#deeponto.complete.bertsubs.pipeline_intra.BERTSubsIntraPipeline.score","title":"score(samples)
","text":"The scoring function based on the fine-tuned BERT classifier.
Parameters:
Name Type Description Defaultsamples
List[Tuple]
A list of input sentence pairs to be scored.
required Source code insrc/deeponto/complete/bertsubs/pipeline_intra.py
def score(self, samples: List[List]):\nr\"\"\"The scoring function based on the fine-tuned BERT classifier.\n\n Args:\n samples (List[Tuple]): A list of input sentence pairs to be scored.\n \"\"\"\n sample_size = len(samples)\n scores = np.zeros(sample_size)\n batch_num = math.ceil(sample_size / self.config.evaluation.batch_size)\n for i in range(batch_num):\n j = (\n (i + 1) * self.config.evaluation.batch_size\n if (i + 1) * self.config.evaluation.batch_size <= sample_size\n else sample_size\n )\n inputs = self.tokenize(samples[i * self.config.evaluation.batch_size : j])\n inputs.to(self.device)\n with torch.no_grad():\n batch_scores = self.classifier(inputs)\n scores[i * self.config.evaluation.batch_size : j] = batch_scores.cpu().numpy()\n return scores\n
"},{"location":"deeponto/complete/bertsubs/#deeponto.complete.bertsubs.pipeline_intra.BERTSubsIntraPipeline.evaluate","title":"evaluate(target_subsumptions, test_type='test')
","text":"Test and calculate the metrics for a given list of subsumption pairs.
Parameters:
Name Type Description Defaulttarget_subsumptions
List[Tuple]
A list of subsumption pairs.
requiredtest_type
str
test
for testing or valid
for validation.
'test'
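A sketch of the expected input format with placeholder IRIs (assuming pipeline is a constructed BERTSubsIntraPipeline): each item starts with the sub-class, followed by the ground-truth super-class and the remaining candidate classes against which the ground truth is ranked.
test_pairs = [\n    [\"http://example.org/A\", \"http://example.org/B\", \"http://example.org/C\", \"http://example.org/D\"],\n]\npipeline.evaluate(target_subsumptions=test_pairs, test_type=\"test\")\n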
Source code in src/deeponto/complete/bertsubs/pipeline_intra.py
def evaluate(self, target_subsumptions: List[List], test_type: str = \"test\"):\nr\"\"\"Test and calculate the metrics for a given list of subsumption pairs.\n\n Args:\n target_subsumptions (List[Tuple]): A list of subsumption pairs.\n test_type (str): `test` for testing or `valid` for validation.\n \"\"\"\n\n MRR_sum, hits1_sum, hits5_sum, hits10_sum = 0, 0, 0, 0\n MRR, Hits1, Hits5, Hits10 = 0, 0, 0, 0\n size_sum, size_n = 0, 0\n for k0, test in enumerate(target_subsumptions):\n subcls, gt = test[0], test[1]\n candidates = test[1:]\n\n candidate_subsumptions = [[subcls, c] for c in candidates]\n candidate_scores = np.zeros(len(candidate_subsumptions))\n for k1, candidate_subsumption in enumerate(candidate_subsumptions):\n samples = self.sampler.subsumptions_to_samples(subsumptions=[candidate_subsumption], sample_label=None)\n size_sum += len(samples)\n size_n += 1\n scores = self.score(samples=samples)\n candidate_scores[k1] = np.average(scores)\n\n sorted_indexes = np.argsort(candidate_scores)[::-1]\n sorted_classes = [candidates[i] for i in sorted_indexes]\n\n rank = sorted_classes.index(gt) + 1\n MRR_sum += 1.0 / rank\n hits1_sum += 1 if gt in sorted_classes[:1] else 0\n hits5_sum += 1 if gt in sorted_classes[:5] else 0\n hits10_sum += 1 if gt in sorted_classes[:10] else 0\n num = k0 + 1\n MRR, Hits1, Hits5, Hits10 = MRR_sum / num, hits1_sum / num, hits5_sum / num, hits10_sum / num\n if num % 500 == 0:\n print(\n \"\\n%d tested, MRR: %.3f, Hits@1: %.3f, Hits@5: %.3f, Hits@10: %.3f\\n\"\n % (num, MRR, Hits1, Hits5, Hits10)\n )\n print(\n \"\\n[%s], MRR: %.3f, Hits@1: %.3f, Hits@5: %.3f, Hits@10: %.3f\\n\" % (test_type, MRR, Hits1, Hits5, Hits10)\n )\n print(\"%.2f samples per testing subsumption\" % (size_sum / size_n))\n
"},{"location":"deeponto/complete/bertsubs/#deeponto.complete.bertsubs.pipeline_intra.BERTSubsIntraPipeline.predict","title":"predict(target_subsumptions)
","text":"Predict a score for each given subsumption in the list.
The scores will be saved in test_subsumption_scores.csv
.
Parameters:
Name Type Description Defaulttarget_subsumptions
List[List]
Each item is a list where the first element is a fixed ontology class \\(C\\), and the remaining elements are potential (candidate) super-classes of \\(C\\).
required Source code insrc/deeponto/complete/bertsubs/pipeline_intra.py
def predict(self, target_subsumptions: List[List]):\nr\"\"\"Predict a score for each given subsumption in the list.\n\n The scores will be saved in `test_subsumption_scores.csv`.\n\n Args:\n target_subsumptions (List[List]): Each item is a list where the first element is a fixed ontology class $C$,\n and the remaining elements are potential (candidate) super-classes of $C$.\n \"\"\"\n out_lines = []\n for test in target_subsumptions:\n subcls, candidates = test[0], test[1:]\n candidate_subsumptions = [[subcls, c] for c in candidates]\n candidate_scores = []\n\n for candidate_subsumption in candidate_subsumptions:\n samples = self.sampler.subsumptions_to_samples(subsumptions=[candidate_subsumption], sample_label=None)\n scores = self.score(samples=samples)\n candidate_scores.append(np.average(scores))\n\n out_lines.append(\",\".join([str(i) for i in candidate_scores]))\n\n out_file = \"test_subsumption_scores.csv\"\n with open(out_file, \"w\") as f:\n for line in out_lines:\n f.write(\"%s\\n\" % line)\n print(\"Predicted subsumption scores are saved to %s\" % out_file)\n
"},{"location":"deeponto/complete/bertsubs/#deeponto.complete.bertsubs.pipeline_intra.BERTSubsIntraPipeline.extract_subsumptions_from_ontology","title":"extract_subsumptions_from_ontology(onto, subsumption_type)
staticmethod
","text":"Extract target subsumptions from a given ontology.
Parameters:
Name Type Description Defaultonto
Ontology
The target ontology.
requiredsubsumption_type
str
The type of subsumptions; options are \"named_class\"
or \"restriction\"
.
src/deeponto/complete/bertsubs/pipeline_intra.py
@staticmethod\ndef extract_subsumptions_from_ontology(onto: Ontology, subsumption_type: str):\nr\"\"\"Extract target subsumptions from a given ontology.\n\n Args:\n onto (Ontology): The target ontology.\n subsumption_type (str): the type of subsumptions, options are `\"named_class\"` or `\"restriction\"`.\n\n \"\"\"\n all_subsumptions = onto.get_subsumption_axioms(entity_type=\"Classes\")\n subsumptions = []\n if subsumption_type == \"restriction\":\n for subs in all_subsumptions:\n if (\n not onto.check_deprecated(owl_object=subs.getSubClass())\n and not onto.check_named_entity(owl_object=subs.getSuperClass())\n and SubsumptionSampler.is_basic_existential_restriction(\n complex_class_str=str(subs.getSuperClass())\n )\n ):\n subsumptions.append(subs)\n elif subsumption_type == \"named_class\":\n for subs in all_subsumptions:\n c1, c2 = subs.getSubClass(), subs.getSuperClass()\n if (\n onto.check_named_entity(owl_object=c1)\n and not onto.check_deprecated(owl_object=c1)\n and onto.check_named_entity(owl_object=c2)\n and not onto.check_deprecated(owl_object=c2)\n ):\n subsumptions.append(subs)\n else:\n warnings.warn(\"\\nUnknown subsumption type: %s\\n\" % subsumption_type)\n return subsumptions\n
"},{"location":"deeponto/complete/bertsubs/#deeponto.complete.bertsubs.pipeline_intra.BERTSubsIntraPipeline.extract_restrictions_from_ontology","title":"extract_restrictions_from_ontology(onto)
staticmethod
","text":"Extract basic existential restriction from an ontology.
Parameters:
Name Type Description Defaultonto
Ontology
The target ontology.
requiredReturns:
Name Type Descriptionrestrictions
List
A list of existential restrictions.
Source code insrc/deeponto/complete/bertsubs/pipeline_intra.py
@staticmethod\ndef extract_restrictions_from_ontology(onto: Ontology):\nr\"\"\"Extract basic existential restriction from an ontology.\n\n Args:\n onto (Ontology): The target ontology.\n Returns:\n restrictions (List): a list of existential restrictions.\n \"\"\"\n restrictions = []\n for complexC in onto.get_asserted_complex_classes():\n if SubsumptionSampler.is_basic_existential_restriction(complex_class_str=str(complexC)):\n restrictions.append(complexC)\n return restrictions\n
"},{"location":"deeponto/complete/bertsubs/#deeponto.complete.bertsubs.pipeline_intra.BERTSubsIntraPipeline.get_test_neg_candidates_restriction","title":"get_test_neg_candidates_restriction(subcls, max_neg_size, restrictions, onto)
staticmethod
","text":"Get a list of negative candidate class restrictions for testing.
Source code insrc/deeponto/complete/bertsubs/pipeline_intra.py
@staticmethod\ndef get_test_neg_candidates_restriction(subcls, max_neg_size, restrictions, onto):\n\"\"\"Get a list of negative candidate class restrictions for testing.\"\"\"\n neg_restrictions = list()\n n = max_neg_size * 2 if max_neg_size * 2 <= len(restrictions) else len(restrictions)\n for r in random.sample(restrictions, n):\n if not onto.reasoner.check_subsumption(sub_entity=subcls, super_entity=r):\n neg_restrictions.append(r)\n if len(neg_restrictions) >= max_neg_size:\n break\n return neg_restrictions\n
"},{"location":"deeponto/complete/bertsubs/#deeponto.complete.bertsubs.pipeline_intra.BERTSubsIntraPipeline.get_test_neg_candidates_named_class","title":"get_test_neg_candidates_named_class(subclass, gt, max_neg_size, onto, max_depth=3, max_width=8)
staticmethod
","text":"Get a list of negative candidate named classes for testing.
Source code insrc/deeponto/complete/bertsubs/pipeline_intra.py
@staticmethod\ndef get_test_neg_candidates_named_class(subclass, gt, max_neg_size, onto, max_depth=3, max_width=8):\n\"\"\"Get a list of negative candidate named classes for testing.\"\"\"\n all_nebs, seeds = set(), [gt]\n depth = 1\n while depth <= max_depth:\n new_seeds = set()\n for seed in seeds:\n nebs = set()\n for nc_iri in onto.reasoner.get_inferred_sub_entities(\n seed, direct=True\n ) + onto.reasoner.get_inferred_super_entities(seed, direct=True):\n nc = onto.owl_classes[nc_iri]\n if onto.check_named_entity(owl_object=nc) and not onto.check_deprecated(owl_object=nc):\n nebs.add(nc)\n new_seeds = new_seeds.union(nebs)\n all_nebs = all_nebs.union(nebs)\n depth += 1\n seeds = random.sample(new_seeds, max_width) if len(new_seeds) > max_width else new_seeds\n all_nebs = (\n all_nebs\n - {onto.owl_classes[iri] for iri in onto.reasoner.get_inferred_super_entities(subclass, direct=False)}\n - {subclass}\n )\n if len(all_nebs) > max_neg_size:\n return random.sample(all_nebs, max_neg_size)\n else:\n return list(all_nebs)\n
"},{"location":"deeponto/complete/bertsubs/#deeponto.complete.bertsubs.text_semantics.SubsumptionSampler","title":"SubsumptionSampler(onto, config)
","text":"Class for sampling functions for training the subsumption prediction model.
Attributes:
Name Type Descriptiononto
Ontology
The target ontology.
config
CfgNode
The loaded configuration.
named_classes
Set[str]
IRIs of named classes that are not deprecated.
iri_label
Dict[str, List]
key -- class IRIs from named_classes
, value -- a list of labels.
restrictionObjects
Set[OWLClassExpression]
Basic existential restrictions that appear in the ontology.
restrictions
set[str]
Strings of basic existential restrictions corresponding to restrictionObjects
.
restriction_label
Dict[str, List]
key -- existential restriction string, value -- a list of existential restriction labels.
verb
OntologyVerbaliser
An object for verbalisation.
Source code insrc/deeponto/complete/bertsubs/text_semantics.py
def __init__(self, onto: Ontology, config: CfgNode):\n self.onto = onto\n self.config = config\n self.named_classes = self.extract_named_classes(onto=onto)\n self.iri_label = dict()\n for iri in self.named_classes:\n self.iri_label[iri] = []\n for p in config.label_property:\n strings = onto.get_annotations(\n owl_object=onto.get_owl_object(iri),\n annotation_property_iri=p,\n annotation_language_tag=None,\n apply_lowercasing=False,\n normalise_identifiers=False,\n )\n for s in strings:\n if s not in self.iri_label[iri]:\n self.iri_label[iri].append(s)\n\n self.restrictionObjects = set()\n self.restrictions = set()\n self.restriction_label = dict()\n self.verb = OntologyVerbaliser(onto=onto)\n for complexC in onto.get_asserted_complex_classes():\n s = str(complexC)\n self.restriction_label[s] = []\n if self.is_basic_existential_restriction(complex_class_str=s):\n self.restrictionObjects.add(complexC)\n self.restrictions.add(s)\n self.restriction_label[s].append(self.verb.verbalise_class_expression(complexC).verbal)\n
"},{"location":"deeponto/complete/bertsubs/#deeponto.complete.bertsubs.text_semantics.SubsumptionSampler.is_basic_existential_restriction","title":"is_basic_existential_restriction(complex_class_str)
staticmethod
","text":"Determine if a complex class expression is a basic existential restriction.
Source code insrc/deeponto/complete/bertsubs/text_semantics.py
@staticmethod\ndef is_basic_existential_restriction(complex_class_str: str):\n\"\"\"Determine if a complex class expression is a basic existential restriction.\"\"\"\n IRI = \"<https?:\\\\/\\\\/(?:www\\\\.)?[-a-zA-Z0-9@:%._\\\\+~#=]{1,256}\\\\.[a-zA-Z0-9()]{1,6}\\\\b(?:[-a-zA-Z0-9()@:%_\\\\+.~#?&\\\\/=]*)>\"\n p = rf\"ObjectSomeValuesFrom\\({IRI}\\s{IRI}\\)\"\n if re.match(p, complex_class_str):\n return True\n else:\n return False\n
"},{"location":"deeponto/complete/bertsubs/#deeponto.complete.bertsubs.text_semantics.SubsumptionSampler.generate_samples","title":"generate_samples(subsumptions, duplicate=True)
","text":"Generate text samples from subsumptions.
Parameters:
Name Type Description Defaultsubsumptions
List[List]
A list of subsumptions, each of which is a two-component list (sub_class_iri, super_class_iri_or_str)
.
duplicate
bool
True
-- duplicate the positive and negative samples, False
-- do not duplicate.
True
Returns:
Type DescriptionList[List]
A list of samples, each element is a triple in the form of (sub_class_string, super_class_string, label_index)
.
src/deeponto/complete/bertsubs/text_semantics.py
def generate_samples(self, subsumptions: List[List], duplicate: bool = True):\nr\"\"\"Generate text samples from subsumptions.\n\n Args:\n subsumptions (List[List]): A list of subsumptions, each of which of is a two-component list `(sub_class_iri, super_class_iri_or_str)`.\n duplicate (bool): `True` -- duplicate the positive and negative samples, `False` -- do not duplicate.\n\n Returns:\n (List[List]): A list of samples, each element is a triple\n in the form of `(sub_class_string, super_class_string, label_index)`.\n \"\"\"\n if duplicate:\n pos_dup, neg_dup = self.config.fine_tune.train_pos_dup, self.config.fine_tune.train_neg_dup\n else:\n pos_dup, neg_dup = 1, 1\n neg_subsumptions = list()\n for subs in subsumptions:\n c1 = subs[0]\n for _ in range(neg_dup):\n neg_c = self.get_negative_sample(subclass_iri=c1, subsumption_type=self.config.subsumption_type)\n if neg_c is not None:\n neg_subsumptions.append([c1, neg_c])\n pos_samples = self.subsumptions_to_samples(subsumptions=subsumptions, sample_label=1)\n pos_samples = pos_dup * pos_samples\n neg_samples = self.subsumptions_to_samples(subsumptions=neg_subsumptions, sample_label=0)\n if len(neg_samples) < len(pos_samples):\n neg_samples = neg_samples + [\n random.choice(neg_samples) for _ in range(len(pos_samples) - len(neg_samples))\n ]\n if len(neg_samples) > len(pos_samples):\n pos_samples = pos_samples + [\n random.choice(pos_samples) for _ in range(len(neg_samples) - len(pos_samples))\n ]\n print(\"pos_samples: %d, neg_samples: %d\" % (len(pos_samples), len(neg_samples)))\n all_samples = [s for s in pos_samples + neg_samples if s[0] != \"\" and s[1] != \"\"]\n random.shuffle(all_samples)\n return all_samples\n
"},{"location":"deeponto/complete/bertsubs/#deeponto.complete.bertsubs.text_semantics.SubsumptionSampler.subsumptions_to_samples","title":"subsumptions_to_samples(subsumptions, sample_label)
","text":"Transform subsumptions into samples of strings.
Parameters:
Name Type Description Defaultsubsumptions
List[List]
The given subsumptions.
requiredsample_label
Union[int, None]
1
(positive), 0
(negative), None
(no label).
Returns:
Type DescriptionList[List]
A list of samples, each element is a triple in the form of (sub_class_string, super_class_string, label_index)
.
src/deeponto/complete/bertsubs/text_semantics.py
def subsumptions_to_samples(self, subsumptions: List[List], sample_label: Union[int, None]):\nr\"\"\"Transform subsumptions into samples of strings.\n\n Args:\n subsumptions (List[List]): The given subsumptions.\n sample_label (Union[int, None]): `1` (positive), `0` (negative), `None` (no label).\n\n Returns:\n (List[List]): A list of samples, each element is a triple\n in the form of `(sub_class_string, super_class_string, label_index)`.\n\n \"\"\"\n local_samples = list()\n for subs in subsumptions:\n subcls, supcls = subs[0], subs[1]\n substrs = self.iri_label[subcls] if subcls in self.iri_label and len(self.iri_label[subcls]) > 0 else [\"\"]\n\n if self.config.subsumption_type == \"named_class\":\n supstrs = self.iri_label[supcls] if supcls in self.iri_label and len(self.iri_label[supcls]) else [\"\"]\n else:\n if supcls in self.restriction_label and len(self.restriction_label[supcls]) > 0:\n supstrs = self.restriction_label[supcls]\n else:\n supstrs = [self.verb.verbalise_class_expression(supcls).verbal]\n\n if self.config.use_one_label:\n substrs, supstrs = substrs[0:1], supstrs[0:1]\n\n if self.config.prompt.prompt_type == \"isolated\":\n for substr in substrs:\n for supstr in supstrs:\n local_samples.append([substr, supstr])\n\n elif self.config.prompt.prompt_type == \"traversal\":\n subs_list_strs = set()\n for _ in range(self.config.prompt.context_dup):\n context_sub, no_duplicate = self.traversal_subsumptions(\n cls=subcls,\n hop=self.config.prompt.prompt_hop,\n direction=\"subclass\",\n max_subsumptions=self.config.prompt.prompt_max_subsumptions,\n )\n subs_list = [self.named_subsumption_to_str(subsum) for subsum in context_sub]\n subs_list_str = \" <SEP> \".join(subs_list)\n subs_list_strs.add(subs_list_str)\n if no_duplicate:\n break\n\n if self.config.subsumption_type == \"named_class\":\n sups_list_strs = set()\n for _ in range(self.config.prompt.context_dup):\n context_sup, no_duplicate = self.traversal_subsumptions(\n cls=supcls,\n hop=self.config.prompt.prompt_hop,\n direction=\"supclass\",\n max_subsumptions=self.config.prompt.prompt_max_subsumptions,\n )\n sups_list = [self.named_subsumption_to_str(subsum) for subsum in context_sup]\n sups_list_str = \" <SEP> \".join(sups_list)\n sups_list_strs.add(sups_list_str)\n if no_duplicate:\n break\n else:\n sups_list_strs = set(supstrs)\n\n for subs_list_str in subs_list_strs:\n for substr in substrs:\n s1 = substr + \" <SEP> \" + subs_list_str\n for sups_list_str in sups_list_strs:\n for supstr in supstrs:\n s2 = supstr + \" <SEP> \" + sups_list_str\n local_samples.append([s1, s2])\n\n elif self.config.prompt.prompt_type == \"path\":\n sep_token = \"<SUB>\" if self.config.prompt.use_sub_special_token else \"<SEP>\"\n\n s1_set = set()\n for _ in range(self.config.prompt.context_dup):\n context_sub, no_duplicate = self.path_subsumptions(\n cls=subcls, hop=self.config.prompt.prompt_hop, direction=\"subclass\"\n )\n if len(context_sub) > 0:\n s1 = \"\"\n for i in range(len(context_sub)):\n subsum = context_sub[len(context_sub) - i - 1]\n subc = subsum[0]\n s1 += \"%s %s \" % (\n self.iri_label[subc][0]\n if subc in self.iri_label and len(self.iri_label[subc]) > 0\n else \"\",\n sep_token,\n )\n for substr in substrs:\n s1_set.add(s1 + substr)\n else:\n for substr in substrs:\n s1_set.add(\"%s %s\" % (sep_token, substr))\n\n if no_duplicate:\n break\n\n if self.config.subsumption_type == \"named_class\":\n s2_set = set()\n for _ in range(self.config.prompt.context_dup):\n context_sup, no_duplicate = self.path_subsumptions(\n 
cls=supcls, hop=self.config.prompt.prompt_hop, direction=\"supclass\"\n )\n if len(context_sup) > 0:\n s2 = \"\"\n for subsum in context_sup:\n supc = subsum[1]\n s2 += \" %s %s\" % (\n sep_token,\n self.iri_label[supc][0]\n if supc in self.iri_label and len(self.iri_label[supc]) > 0\n else \"\",\n )\n for supstr in supstrs:\n s2_set.add(supstr + s2)\n else:\n for supstr in supstrs:\n s2_set.add(\"%s %s\" % (supstr, sep_token))\n\n if no_duplicate:\n break\n else:\n s2_set = set(supstrs)\n\n for s1 in s1_set:\n for s2 in s2_set:\n local_samples.append([s1, s2])\n\n else:\n print(f\"unknown context type {self.config.prompt.prompt_type}\")\n sys.exit(0)\n\n if sample_label is not None:\n for i in range(len(local_samples)):\n local_samples[i].append(sample_label)\n\n return local_samples\n
"},{"location":"deeponto/complete/bertsubs/#deeponto.complete.bertsubs.text_semantics.SubsumptionSampler.get_negative_sample","title":"get_negative_sample(subclass_iri, subsumption_type='named_class')
","text":"Given a named subclass, get a negative class for a negative subsumption.
Parameters:
Name Type Description Defaultsubclass_iri
str
IRI of a given sub-class.
requiredsubsumption_type
str
named_class
or restriction
.
'named_class'
Source code in src/deeponto/complete/bertsubs/text_semantics.py
def get_negative_sample(self, subclass_iri: str, subsumption_type: str = \"named_class\"):\nr\"\"\"Given a named subclass, get a negative class for a negative subsumption.\n\n Args:\n subclass_iri (str): IRI of a given sub-class.\n subsumption_type (str): `named_class` or `restriction`.\n \"\"\"\n subclass = self.onto.get_owl_object(iri=subclass_iri)\n if subsumption_type == \"named_class\":\n if self.config.no_reasoning:\n parents = self.onto.get_asserted_parents(owl_object=subclass, named_only=True)\n ancestors = set([str(item.getIRI()) for item in parents])\n else:\n ancestors = set(self.onto.reasoner.get_inferred_super_entities(subclass, direct=False))\n neg_c = random.sample(self.named_classes - ancestors, 1)[0]\n return neg_c\n else:\n for neg_c in random.sample(self.restrictionObjects, 5):\n if self.config.no_reasoning:\n return str(neg_c)\n else:\n if not self.onto.reasoner.check_subsumption(sub_entity=subclass, super_entity=neg_c):\n return str(neg_c)\n return None\n
"},{"location":"deeponto/complete/bertsubs/#deeponto.complete.bertsubs.text_semantics.SubsumptionSampler.named_subsumption_to_str","title":"named_subsumption_to_str(subsum)
","text":"Transform a named subsumption into string with <SUB>
and classes' labels.
Parameters:
Name Type Description Defaultsubsum
List[Tuple]
A list of subsumption pairs in the form of (sub_class_iri, super_class_iri)
.
src/deeponto/complete/bertsubs/text_semantics.py
def named_subsumption_to_str(self, subsum: List):\nr\"\"\"Transform a named subsumption into string with `<SUB>` and classes' labels.\n\n Args:\n subsum (List[Tuple]): A list of subsumption pairs in the form of `(sub_class_iri, super_class_iri)`.\n \"\"\"\n subc, supc = subsum[0], subsum[1]\n subs = self.iri_label[subc][0] if subc in self.iri_label and len(self.iri_label[subc]) > 0 else \"\"\n sups = self.iri_label[supc][0] if supc in self.iri_label and len(self.iri_label[supc]) > 0 else \"\"\n return \"%s <SUB> %s\" % (subs, sups)\n
"},{"location":"deeponto/complete/bertsubs/#deeponto.complete.bertsubs.text_semantics.SubsumptionSampler.subclass_to_strings","title":"subclass_to_strings(subcls)
","text":"Transform a sub-class into strings (with the path or traversal context template).
Parameters:
Name Type Description Defaultsubcls
str
IRI of the sub-class.
required Source code insrc/deeponto/complete/bertsubs/text_semantics.py
def subclass_to_strings(self, subcls):\nr\"\"\"Transform a sub-class into strings (with the path or traversal context template).\n\n Args:\n subcls (str): IRI of the sub-class.\n \"\"\"\n substrs = self.iri_label[subcls] if subcls in self.iri_label and len(self.iri_label[subcls]) > 0 else [\"\"]\n\n if self.config.use_one_label:\n substrs = substrs[0:1]\n\n if self.config.prompt.prompt_type == \"isolated\":\n return substrs\n\n elif self.config.prompt.prompt_type == \"traversal\":\n subs_list_strs = set()\n for _ in range(self.config.prompt.context_dup):\n context_sub, no_duplicate = self.traversal_subsumptions(\n cls=subcls,\n hop=self.config.prompt.prompt_hop,\n direction=\"subclass\",\n max_subsumptions=self.config.prompt.prompt_max_subsumptions,\n )\n subs_list = [self.named_subsumption_to_str(subsum) for subsum in context_sub]\n subs_list_str = \" <SEP> \".join(subs_list)\n subs_list_strs.add(subs_list_str)\n if no_duplicate:\n break\n\n strs = list()\n for subs_list_str in subs_list_strs:\n for substr in substrs:\n s1 = substr + \" <SEP> \" + subs_list_str\n strs.append(s1)\n return strs\n\n elif self.config.prompt.prompt_type == \"path\":\n sep_token = \"<SUB>\" if self.config.prompt.use_sub_special_token else \"<SEP>\"\n\n s1_set = set()\n for _ in range(self.config.prompt.context_dup):\n context_sub, no_duplicate = self.path_subsumptions(\n cls=subcls, hop=self.config.prompt.prompt_hop, direction=\"subclass\"\n )\n if len(context_sub) > 0:\n s1 = \"\"\n for i in range(len(context_sub)):\n subsum = context_sub[len(context_sub) - i - 1]\n subc = subsum[0]\n s1 += \"%s %s \" % (\n self.iri_label[subc][0]\n if subc in self.iri_label and len(self.iri_label[subc]) > 0\n else \"\",\n sep_token,\n )\n for substr in substrs:\n s1_set.add(s1 + substr)\n else:\n for substr in substrs:\n s1_set.add(\"%s %s\" % (sep_token, substr))\n if no_duplicate:\n break\n\n return list(s1_set)\n
"},{"location":"deeponto/complete/bertsubs/#deeponto.complete.bertsubs.text_semantics.SubsumptionSampler.supclass_to_strings","title":"supclass_to_strings(supcls, subsumption_type='named_class')
","text":"Transform a super-class into strings (with the path or traversal context template if the subsumption type is \"named_class\"
).
Parameters:
Name Type Description Defaultsupcls
str
IRI of the super-class.
requiredsubsumption_type
str
The type of the subsumption.
'named_class'
Source code in src/deeponto/complete/bertsubs/text_semantics.py
def supclass_to_strings(self, supcls: str, subsumption_type: str = \"named_class\"):\nr\"\"\"Transform a super-class into strings (with the path or traversal context template if the subsumption type is `\"named_class\"`).\n\n Args:\n supcls (str): IRI of the super-class.\n subsumption_type (str): The type of the subsumption.\n \"\"\"\n\n if subsumption_type == \"named_class\":\n supstrs = self.iri_label[supcls] if supcls in self.iri_label and len(self.iri_label[supcls]) else [\"\"]\n else:\n if supcls in self.restriction_label and len(self.restriction_label[supcls]) > 0:\n supstrs = self.restriction_label[supcls]\n else:\n warnings.warn(\"Warning: %s has no descriptions\" % supcls)\n supstrs = [\"\"]\n\n if self.config.use_one_label:\n if subsumption_type == \"named_class\":\n supstrs = supstrs[0:1]\n\n if self.config.prompt.prompt_type == \"isolated\":\n return supstrs\n\n elif self.config.prompt.prompt_type == \"traversal\":\n if subsumption_type == \"named_class\":\n sups_list_strs = set()\n for _ in range(self.config.prompt.context_dup):\n context_sup, no_duplicate = self.traversal_subsumptions(\n cls=supcls,\n hop=self.config.prompt.prompt_hop,\n direction=\"supclass\",\n max_subsumptions=self.config.prompt.prompt_max_subsumptions,\n )\n sups_list = [self.named_subsumption_to_str(subsum) for subsum in context_sup]\n sups_list_str = \" <SEP> \".join(sups_list)\n sups_list_strs.add(sups_list_str)\n if no_duplicate:\n break\n\n else:\n sups_list_strs = set(supstrs)\n\n strs = list()\n for sups_list_str in sups_list_strs:\n for supstr in supstrs:\n s2 = supstr + \" <SEP> \" + sups_list_str\n strs.append(s2)\n return strs\n\n elif self.config.prompt.prompt_type == \"path\":\n sep_token = \"<SUB>\" if self.config.prompt.use_sub_special_token else \"<SEP>\"\n\n if subsumption_type == \"named_class\":\n s2_set = set()\n for _ in range(self.config.prompt.context_dup):\n context_sup, no_duplicate = self.path_subsumptions(\n cls=supcls, hop=self.config.prompt.prompt_hop, direction=\"supclass\"\n )\n if len(context_sup) > 0:\n s2 = \"\"\n for subsum in context_sup:\n supc = subsum[1]\n s2 += \" %s %s\" % (\n sep_token,\n self.iri_label[supc][0]\n if supc in self.iri_label and len(self.iri_label[supc]) > 0\n else \"\",\n )\n for supstr in supstrs:\n s2_set.add(supstr + s2)\n else:\n for supstr in supstrs:\n s2_set.add(\"%s %s\" % (supstr, sep_token))\n\n if no_duplicate:\n break\n else:\n s2_set = set(supstrs)\n\n return list(s2_set)\n\n else:\n print(\"unknown context type %s\" % self.config.prompt.prompt_type)\n sys.exit(0)\n
"},{"location":"deeponto/complete/bertsubs/#deeponto.complete.bertsubs.text_semantics.SubsumptionSampler.traversal_subsumptions","title":"traversal_subsumptions(cls, hop=1, direction='subclass', max_subsumptions=5)
","text":"Given a class, get its subsumptions by traversing the class hierarchy.
If the class acts as the sub-class of the target subsumption, context subsumptions are collected from below (its descendants).\nIf the class acts as the super-class of the target subsumption, context subsumptions are collected from above (its ancestors).\n
Parameters:
Name Type Description Defaultcls
str
IRI of a named class.
requiredhop
int
The depth of the path.
1
direction
str
subclass
(downside path) or supclass
(upside path).
'subclass'
max_subsumptions
int
The maximum number of subsumptions to consider.
5
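A small sketch (assuming sampler is a SubsumptionSampler instance and the class IRI is a placeholder):
# collect up to 5 one-hop context subsumptions below the given class\ncontext, no_duplicate = sampler.traversal_subsumptions(cls=\"http://example.org/C\", hop=1, direction=\"subclass\", max_subsumptions=5)\n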
Source code in src/deeponto/complete/bertsubs/text_semantics.py
def traversal_subsumptions(self, cls: str, hop: int = 1, direction: str = \"subclass\", max_subsumptions: int = 5):\nr\"\"\"Given a class, get its subsumptions by traversing the class hierarchy.\n\n If the class is a sub-class in the subsumption axiom, get subsumptions from downside.\n If the class is a super-class in the subsumption axiom, get subsumptions from upside.\n\n Args:\n cls (str): IRI of a named class.\n hop (int): The depth of the path.\n direction (str): `subclass` (downside path) or `supclass` (upside path).\n max_subsumptions (int): The maximum number of subsumptions to consider.\n \"\"\"\n subsumptions = list()\n seeds = [cls]\n d = 1\n no_duplicate = True\n while d <= hop:\n new_seeds = list()\n for s in seeds:\n if direction == \"subclass\":\n tmp = self.onto.reasoner.get_inferred_sub_entities(\n self.onto.get_owl_object(iri=s), direct=True\n )\n if len(tmp) > 1:\n no_duplicate = False\n random.shuffle(tmp)\n for c in tmp:\n if not self.onto.check_deprecated(owl_object=self.onto.get_owl_object(iri=c)):\n subsumptions.append([c, s])\n if c not in new_seeds:\n new_seeds.append(c)\n elif direction == \"supclass\":\n tmp = self.onto.reasoner.get_inferred_super_entities(\n self.onto.get_owl_object(iri=s), direct=True\n )\n if len(tmp) > 1:\n no_duplicate = False\n random.shuffle(tmp)\n for c in tmp:\n if not self.onto.check_deprecated(owl_object=self.onto.get_owl_object(iri=c)):\n subsumptions.append([s, c])\n if c not in new_seeds:\n new_seeds.append(c)\n else:\n warnings.warn(\"Unknown direction: %s\" % direction)\n if len(subsumptions) >= max_subsumptions:\n subsumptions = random.sample(subsumptions, max_subsumptions)\n break\n else:\n seeds = new_seeds\n random.shuffle(seeds)\n d += 1\n return subsumptions, no_duplicate\n
"},{"location":"deeponto/complete/bertsubs/#deeponto.complete.bertsubs.text_semantics.SubsumptionSampler.path_subsumptions","title":"path_subsumptions(cls, hop=1, direction='subclass')
","text":"Given a class, get its path subsumptions.
If the class is a sub-class in the subsumption axiom, get subsumptions from downside.\nIf the class is a super-class in the subsumption axiom, get subsumptions from upside.\n
Parameters:
Name Type Description Defaultcls
str
IRI of a named class.
requiredhop
int
The depth of the path.
1
direction
str
subclass
(downside path) or supclass
(upside path).
'subclass'
Source code in src/deeponto/complete/bertsubs/text_semantics.py
def path_subsumptions(self, cls: str, hop: int = 1, direction: str = \"subclass\"):\nr\"\"\"Given a class, get its path subsumptions.\n\n If the class is a sub-class in the subsumption axiom, get subsumptions from downside.\n If the class is a super-class in the subsumption axiom, get subsumptions from upside.\n\n Args:\n cls (str): IRI of a named class.\n hop (int): The depth of the path.\n direction (str): `subclass` (downside path) or `supclass` (upside path).\n \"\"\"\n subsumptions = list()\n seed = cls\n d = 1\n no_duplicate = True\n while d <= hop:\n if direction == \"subclass\":\n tmp = self.onto.reasoner.get_inferred_sub_entities(\n self.onto.get_owl_object(iri=seed), direct=True\n )\n if len(tmp) > 1:\n no_duplicate = False\n end = True\n if len(tmp) > 0:\n random.shuffle(tmp)\n for c in tmp:\n if not self.onto.check_deprecated(owl_object=self.onto.get_owl_object(iri=c)):\n subsumptions.append([c, seed])\n seed = c\n end = False\n break\n if end:\n break\n elif direction == \"supclass\":\n tmp = self.onto.reasoner.get_inferred_super_entities(\n self.onto.get_owl_object(iri=seed), direct=True\n )\n if len(tmp) > 1:\n no_duplicate = False\n end = True\n if len(tmp) > 0:\n random.shuffle(tmp)\n for c in tmp:\n if not self.onto.check_deprecated(owl_object=self.onto.get_owl_object(iri=c)):\n subsumptions.append([seed, c])\n seed = c\n end = False\n break\n if end:\n break\n else:\n warnings.warn(\"Unknown direction: %s\" % direction)\n\n d += 1\n return subsumptions, no_duplicate\n
"},{"location":"deeponto/complete/bertsubs/#deeponto.complete.bertsubs.bert_classifier.BERTSubsumptionClassifierTrainer","title":"BERTSubsumptionClassifierTrainer(bert_checkpoint, train_data, val_data, max_length=128, early_stop=False, early_stop_patience=10)
","text":"Source code in src/deeponto/complete/bertsubs/bert_classifier.py
def __init__(\n self,\n bert_checkpoint: str,\n train_data: List,\n val_data: List,\n max_length: int = 128,\n early_stop: bool = False,\n early_stop_patience: int = 10,\n):\n print(f\"initialize BERT for Binary Classification from the Pretrained BERT model at: {bert_checkpoint} ...\")\n\n # BERT\n self.model = AutoModelForSequenceClassification.from_pretrained(bert_checkpoint)\n self.tokenizer = AutoTokenizer.from_pretrained(bert_checkpoint)\n self.trainer = None\n\n self.max_length = max_length\n self.tra = self.load_dataset(train_data, max_length=self.max_length, count_token_size=True)\n self.val = self.load_dataset(val_data, max_length=self.max_length, count_token_size=True)\n print(f\"text max length: {self.max_length}\")\n print(f\"data files loaded with sizes:\")\n print(f\"\\t[# Train]: {len(self.tra)}, [# Val]: {len(self.val)}\")\n\n # early stopping\n self.early_stop = early_stop\n self.early_stop_patience = early_stop_patience\n
"},{"location":"deeponto/complete/bertsubs/#deeponto.complete.bertsubs.bert_classifier.BERTSubsumptionClassifierTrainer.add_special_tokens","title":"add_special_tokens(tokens)
","text":"Add additional special tokens into the tokenizer's vocab.
Parameters:
Name Type Description Defaulttokens
List[str]
additional tokens to add, e.g., [\"<SUB>\",\"<EOA>\",\"<EOC>\"]
src/deeponto/complete/bertsubs/bert_classifier.py
def add_special_tokens(self, tokens: List):\nr\"\"\"Add additional special tokens into the tokenizer's vocab.\n Args:\n tokens (List[str]): additional tokens to add, e.g., `[\"<SUB>\",\"<EOA>\",\"<EOC>\"]`\n \"\"\"\n special_tokens_dict = {\"additional_special_tokens\": tokens}\n self.tokenizer.add_special_tokens(special_tokens_dict)\n self.model.resize_token_embeddings(len(self.tokenizer))\n
"},{"location":"deeponto/complete/bertsubs/#deeponto.complete.bertsubs.bert_classifier.BERTSubsumptionClassifierTrainer.train","title":"train(train_args, do_fine_tune=True)
","text":"Initiate the Huggingface trainer with input arguments and start training.
Parameters:
Name Type Description Defaulttrain_args
TrainingArguments
Arguments for training.
requireddo_fine_tune
bool
False
means loading the checkpoint without training. Defaults to True
.
True
Source code in src/deeponto/complete/bertsubs/bert_classifier.py
def train(self, train_args: TrainingArguments, do_fine_tune: bool = True):\nr\"\"\"Initiate the Huggingface trainer with input arguments and start training.\n Args:\n train_args (TrainingArguments): Arguments for training.\n do_fine_tune (bool): `False` means loading the checkpoint without training. Defaults to `True`.\n \"\"\"\n self.trainer = Trainer(\n model=self.model,\n args=train_args,\n train_dataset=self.tra,\n eval_dataset=self.val,\n compute_metrics=self.compute_metrics,\n tokenizer=self.tokenizer,\n )\n if self.early_stop:\n self.trainer.add_callback(EarlyStoppingCallback(early_stopping_patience=self.early_stop_patience))\n if do_fine_tune:\n self.trainer.train()\n
"},{"location":"deeponto/complete/bertsubs/#deeponto.complete.bertsubs.bert_classifier.BERTSubsumptionClassifierTrainer.compute_metrics","title":"compute_metrics(pred)
staticmethod
","text":"Auxiliary function to add accurate metric into evaluation.
Source code insrc/deeponto/complete/bertsubs/bert_classifier.py
@staticmethod\ndef compute_metrics(pred):\n\"\"\"Auxiliary function to add accurate metric into evaluation.\n \"\"\"\n labels = pred.label_ids\n preds = pred.predictions.argmax(-1)\n acc = accuracy_score(labels, preds)\n return {\"accuracy\": acc}\n
"},{"location":"deeponto/complete/bertsubs/#deeponto.complete.bertsubs.bert_classifier.BERTSubsumptionClassifierTrainer.load_dataset","title":"load_dataset(data, max_length=512, count_token_size=False)
","text":"Load a Huggingface dataset from a list of samples.
Parameters:
Name Type Description Defaultdata
List[Tuple]
Data samples in a list.
requiredmax_length
int
Maximum length of the input sequence.
512
count_token_size
bool
Whether or not to count the token sizes of the data. Defaults to False
.
False
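A minimal sketch of the expected input format (hypothetical label strings; trainer is assumed to be a BERTSubsumptionClassifierTrainer instance):
samples = [\n    [\"lung disease\", \"disease\", 1],  # a positive subsumption sample\n    [\"lung disease\", \"enzyme\", 0],   # a negative subsumption sample\n]\ndataset = trainer.load_dataset(samples, max_length=128, count_token_size=False)\n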
Source code in src/deeponto/complete/bertsubs/bert_classifier.py
def load_dataset(self, data: List, max_length: int = 512, count_token_size: bool = False) -> Dataset:\nr\"\"\"Load a Huggingface dataset from a list of samples.\n Args:\n data (List[Tuple]): Data samples in a list.\n max_length (int): Maximum length of the input sequence.\n count_token_size (bool): Whether or not to count the token sizes of the data. Defaults to `False`.\n \"\"\"\n # data_df = pd.DataFrame(data, columns=[\"sent1\", \"sent2\", \"labels\"])\n # dataset = Dataset.from_pandas(data_df)\n\n def iterate():\n for sample in data:\n yield {\"sent1\": sample[0], \"sent2\": sample[1], \"labels\": sample[2]}\n\n dataset = Dataset.from_generator(iterate)\n\n if count_token_size:\n tokens = self.tokenizer(dataset[\"sent1\"], dataset[\"sent2\"])\n l_sum, num_128, num_256, num_512, l_max = 0, 0, 0, 0, 0\n for item in tokens[\"input_ids\"]:\n l = len(item)\n l_sum += l\n if l <= 128:\n num_128 += 1\n if l <= 256:\n num_256 += 1\n if l <= 512:\n num_512 += 1\n if l > l_max:\n l_max = l\n print(\"average token size: %.2f\" % (l_sum / len(tokens[\"input_ids\"])))\n print(\"ratio of token size <= 128: %.3f\" % (num_128 / len(tokens[\"input_ids\"])))\n print(\"ratio of token size <= 256: %.3f\" % (num_256 / len(tokens[\"input_ids\"])))\n print(\"ratio of token size <= 512: %.3f\" % (num_512 / len(tokens[\"input_ids\"])))\n print(\"max token size: %d\" % l_max)\n dataset = dataset.map(\n lambda examples: self.tokenizer(\n examples[\"sent1\"], examples[\"sent2\"], max_length=max_length, truncation=True\n ),\n batched=True,\n num_proc=1,\n )\n return dataset\n
"},{"location":"deeponto/onto/normalisation/","title":"Ontology Pruning","text":""},{"location":"deeponto/onto/normalisation/#deeponto.onto.normalisation.OntologyNormaliser","title":"OntologyNormaliser()
","text":"Class for ontology normalisation.
Credit
The code of this class originates from the mOWL library, which utilises the normalisation functionality from the Java library Jcel
.
The normalisation process transforms ontology axioms into normal forms in the Description Logic \\(\\mathcal{EL}\\), including \\(C \\sqsubseteq D\\), \\(C \\sqcap C' \\sqsubseteq D\\), \\(C \\sqsubseteq \\exists r.C'\\), and \\(\\exists r.C' \\sqsubseteq D\\),
where \\(C\\) and \\(C'\\) can be named concepts or \\(\\top\\), \\(D\\) is a named concept or \\(\\bot\\), and \\(r\\) is a role (property).
Attributes:
Name Type Descriptiononto
Ontology
The input ontology to be normalised.
temp_super_class_index
Dict[OWLCLassExpression, OWLClass]
A dictionary in the form of {complex_sub_class: temp_super_class}
, which means temp_super_class
is created during the normalisation of a complex subsumption axiom that has complex_sub_class
as the sub-class.
src/deeponto/onto/normalisation.py
def __init__(self):\n return\n
"},{"location":"deeponto/onto/normalisation/#deeponto.onto.normalisation.OntologyNormaliser.normalise","title":"normalise(ontology)
","text":"Performs the \\(\\mathcal{EL}\\) normalisation.
Parameters:
Name Type Description Defaultontology
Ontology
An ontology to be normalised.
requiredReturns:
Type Descriptionlist[OWLAxiom]
A list of normalised TBox axioms.
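A minimal usage sketch (the ontology path is a placeholder):
from deeponto.onto import Ontology, OntologyNormaliser\n\nonto = Ontology(\"./ontology.owl\")  # placeholder path\nnormaliser = OntologyNormaliser()\nnormalised_axioms = normaliser.normalise(onto)\n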
Source code insrc/deeponto/onto/normalisation.py
def normalise(self, ontology: Ontology):\nr\"\"\"Performs the $\\mathcal{EL}$ normalisation.\n\n Args:\n ontology (Ontology): An ontology to be normalised.\n\n Returns:\n (list[OWLAxiom]): A list of normalised TBox axioms.\n \"\"\"\n\n processed_owl_onto = self.preprocess_ontology(ontology)\n root_ont = processed_owl_onto\n translator = Translator(\n processed_owl_onto.getOWLOntologyManager().getOWLDataFactory(), IntegerOntologyObjectFactoryImpl()\n )\n axioms = HashSet()\n axioms.addAll(root_ont.getAxioms())\n translator.getTranslationRepository().addAxiomEntities(root_ont)\n\n for ont in root_ont.getImportsClosure():\n axioms.addAll(ont.getAxioms())\n translator.getTranslationRepository().addAxiomEntities(ont)\n\n intAxioms = translator.translateSA(axioms)\n\n normaliser = OntologyNormalizer()\n\n factory = IntegerOntologyObjectFactoryImpl()\n normalised_ontology = normaliser.normalize(intAxioms, factory)\n self.rTranslator = ReverseAxiomTranslator(translator, processed_owl_onto)\n\n normalised_axioms = []\n # revert the jcel axioms to the original OWLAxioms\n for ax in normalised_ontology:\n try:\n axiom = self.rTranslator.visit(ax)\n normalised_axioms.append(axiom)\n except Exception as e:\n logging.info(\"Reverse translation. Ignoring axiom: %s\", ax)\n logging.info(e)\n\n return list(set(axioms))\n
"},{"location":"deeponto/onto/normalisation/#deeponto.onto.normalisation.OntologyNormaliser.preprocess_ontology","title":"preprocess_ontology(ontology)
","text":"Preprocess the ontology to remove axioms that are not supported by the normalisation process.
Source code insrc/deeponto/onto/normalisation.py
def preprocess_ontology(self, ontology: Ontology):\n\"\"\"Preprocess the ontology to remove axioms that are not supported by the normalisation process.\"\"\"\n\n tbox_axioms = ontology.owl_onto.getTBoxAxioms(Imports.fromBoolean(True))\n new_tbox_axioms = HashSet()\n\n for axiom in tbox_axioms:\n axiom_as_str = axiom.toString()\n\n if \"UnionOf\" in axiom_as_str:\n continue\n elif \"MinCardinality\" in axiom_as_str:\n continue\n elif \"ComplementOf\" in axiom_as_str:\n continue\n elif \"AllValuesFrom\" in axiom_as_str:\n continue\n elif \"MaxCardinality\" in axiom_as_str:\n continue\n elif \"ExactCardinality\" in axiom_as_str:\n continue\n elif \"Annotation\" in axiom_as_str:\n continue\n elif \"ObjectHasSelf\" in axiom_as_str:\n continue\n elif \"urn:swrl\" in axiom_as_str:\n continue\n elif \"EquivalentObjectProperties\" in axiom_as_str:\n continue\n elif \"SymmetricObjectProperty\" in axiom_as_str:\n continue\n elif \"AsymmetricObjectProperty\" in axiom_as_str:\n continue\n elif \"ObjectOneOf\" in axiom_as_str:\n continue\n else:\n new_tbox_axioms.add(axiom)\n\n processed_owl_onto = ontology.owl_manager.createOntology(new_tbox_axioms)\n # NOTE: the returned object is `owlapi.OWLOntology` not `deeponto.onto.Ontology`\n return processed_owl_onto\n
"},{"location":"deeponto/onto/ontology/","title":"Ontology","text":"Python classes in this page are strongly dependent on the OWLAPI library. The base class Ontology
extends several features including convenient access to specially defined entities (e.g., owl:Thing
and owl:Nothing
), indexing of entities in the signature with their IRIs as keys, and some other customised functions for specific ontology engineering purposes. Ontology
also has an OntologyReasoner
attribute which provides reasoning facilities such as classifying entities, checking entailment, and so on. Users who are familiar with the OWLAPI should find it relatively easy to extend the Python classes here.
Ontology(owl_path, reasoner_type='hermit')
","text":"Ontology class that extends from the Java library OWLAPI.
Typing from OWLAPI
Types with OWL
prefix are mostly imported from the OWLAPI library by, for example, from org.semanticweb.owlapi.model import OWLObject
.
Attributes:
Name Type Descriptionowl_path
str
The path to the OWL ontology file.
owl_manager
OWLOntologyManager
An ontology manager for creating OWLOntology
.
owl_onto
OWLOntology
An OWLOntology
created by owl_manager
from owl_path
.
owl_iri
str
The IRI of the owl_onto
.
owl_classes
dict[str, OWLClass]
A dictionary that stores the (iri, ontology_class)
pairs.
owl_object_properties
dict[str, OWLObjectProperty]
A dictionary that stores the (iri, ontology_object_property)
pairs.
owl_data_properties
dict[str, OWLDataProperty]
A dictionary that stores the (iri, ontology_data_property)
pairs.
owl_annotation_properties
dict[str, OWLAnnotationProperty]
A dictionary that stores the (iri, ontology_annotation_property)
pairs.
owl_individuals
dict[str, OWLIndividual]
A dictionary that stores the (iri, ontology_individual)
pairs.
owl_data_factory
OWLDataFactory
A data factory for manipulating axioms.
reasoner_type
str
The type of reasoner used. Defaults to \"hermit\"
. Options are [\"hermit\", \"elk\", \"struct\"]
.
reasoner
OntologyReasoner
A reasoner for ontology inference.
Parameters:
Name Type Description Defaultowl_path
str
The path to the OWL ontology file.
requiredreasoner_type
str
The type of reasoner used. Defaults to \"hermit\"
. Options are [\"hermit\", \"elk\", \"struct\"]
.
'hermit'
Source code in src/deeponto/onto/ontology.py
def __init__(self, owl_path: str, reasoner_type: str = \"hermit\"):\n\"\"\"Initialise a new ontology.\n\n Args:\n owl_path (str): The path to the OWL ontology file.\n reasoner_type (str): The type of reasoner used. Defaults to `\"hermit\"`. Options are `[\"hermit\", \"elk\", \"struct\"]`.\n \"\"\"\n self.owl_path = os.path.abspath(owl_path)\n self.owl_manager = OWLManager.createOWLOntologyManager()\n self.owl_onto = self.owl_manager.loadOntologyFromOntologyDocument(IRI.create(File(self.owl_path)))\n self.owl_iri = str(self.owl_onto.getOntologyID().getOntologyIRI().get())\n self.owl_classes = self._get_owl_objects(\"Classes\")\n self.owl_object_properties = self._get_owl_objects(\"ObjectProperties\")\n # for some reason the top object property is included\n if OWL_TOP_OBJECT_PROPERTY in self.owl_object_properties.keys():\n del self.owl_object_properties[OWL_TOP_OBJECT_PROPERTY]\n self.owl_data_properties = self._get_owl_objects(\"DataProperties\")\n self.owl_data_factory = self.owl_manager.getOWLDataFactory()\n self.owl_annotation_properties = self._get_owl_objects(\"AnnotationProperties\")\n self.owl_individuals = self._get_owl_objects(\"Individuals\")\n\n # reasoning\n self.reasoner_type = reasoner_type\n self.reasoner = OntologyReasoner(self, self.reasoner_type)\n\n # hidden attributes\n self._multi_children_classes = None\n self._sibling_class_groups = None\n self._axiom_type = AxiomType # for development use\n\n # summary\n self.info = {\n type(self).__name__: {\n \"loaded_from\": os.path.basename(self.owl_path),\n \"num_classes\": len(self.owl_classes),\n \"num_object_properties\": len(self.owl_object_properties),\n \"num_data_properties\": len(self.owl_data_properties),\n \"num_annotation_properties\": len(self.owl_annotation_properties),\n \"num_individuals\": len(self.owl_individuals),\n \"reasoner_type\": self.reasoner_type,\n }\n }\n
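As a quick illustration, a minimal loading sketch under the assumption that a local OWL file (here the placeholder doid.owl) is available:
from deeponto.onto.ontology import Ontology

# load an ontology with the lightweight ELK reasoner
onto = Ontology("doid.owl", reasoner_type="elk")

# the entity indices are plain dictionaries keyed by IRI
print(len(onto.owl_classes), "named classes")
print(len(onto.owl_object_properties), "object properties")

# summary information collected at construction time
print(onto.info)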
"},{"location":"deeponto/onto/ontology/#deeponto.onto.ontology.Ontology.name","title":"name
property
","text":"Return the name of the ontology file.
"},{"location":"deeponto/onto/ontology/#deeponto.onto.ontology.Ontology.OWLThing","title":"OWLThing
property
","text":"Return OWLThing
.
OWLNothing
property
","text":"Return OWLNoThing
.
OWLTopObjectProperty
property
","text":"Return OWLTopObjectProperty
.
OWLBottomObjectProperty
property
","text":"Return OWLBottomObjectProperty
.
OWLTopDataProperty
property
","text":"Return OWLTopDataProperty
.
OWLBottomDataProperty
property
","text":"Return OWLBottomDataProperty
.
sibling_class_groups: List[List[str]]
property
","text":"Return grouped sibling classes (with a common direct parent);
NOTE that only groups with size > 1 will be considered
"},{"location":"deeponto/onto/ontology/#deeponto.onto.ontology.Ontology.get_entity_type","title":"get_entity_type(entity, is_singular=False)
staticmethod
","text":"A handy method to get the type
of an OWLObject
entity.
src/deeponto/onto/ontology.py
@staticmethod\ndef get_entity_type(entity: OWLObject, is_singular: bool = False):\n\"\"\"A handy method to get the `type` of an `OWLObject` entity.\"\"\"\n if isinstance(entity, OWLClassExpression):\n return \"Classes\" if not is_singular else \"Class\"\n elif isinstance(entity, OWLObjectPropertyExpression):\n return \"ObjectProperties\" if not is_singular else \"ObjectProperty\"\n elif isinstance(entity, OWLDataPropertyExpression):\n return \"DataProperties\" if not is_singular else \"DataProperty\"\n elif isinstance(entity, OWLIndividual):\n return \"Individuals\" if not is_singular else \"Individual\"\n else:\n # NOTE: add further options in future\n pass\n
"},{"location":"deeponto/onto/ontology/#deeponto.onto.ontology.Ontology.get_max_jvm_memory","title":"get_max_jvm_memory()
staticmethod
","text":"Get the maximum heap size assigned to the JVM.
Source code insrc/deeponto/onto/ontology.py
@staticmethod\ndef get_max_jvm_memory():\n\"\"\"Get the maximum heap size assigned to the JVM.\"\"\"\n if jpype.isJVMStarted():\n return int(Runtime.getRuntime().maxMemory())\n else:\n raise RuntimeError(\"Cannot retrieve JVM memory as it is not started.\")\n
"},{"location":"deeponto/onto/ontology/#deeponto.onto.ontology.Ontology.get_owl_object","title":"get_owl_object(iri)
","text":"Get an OWLObject
given its IRI.
src/deeponto/onto/ontology.py
def get_owl_object(self, iri: str):\n\"\"\"Get an `OWLObject` given its IRI.\"\"\"\n if iri in self.owl_classes.keys():\n return self.owl_classes[iri]\n elif iri in self.owl_object_properties.keys():\n return self.owl_object_properties[iri]\n elif iri in self.owl_data_properties.keys():\n return self.owl_data_properties[iri]\n elif iri in self.owl_annotation_properties.keys():\n return self.owl_annotation_properties[iri]\n elif iri in self.owl_individuals.keys():\n return self.owl_individuals[iri]\n else:\n raise KeyError(f\"Cannot retrieve unknown IRI: {iri}.\")\n
"},{"location":"deeponto/onto/ontology/#deeponto.onto.ontology.Ontology.get_iri","title":"get_iri(owl_object)
","text":"Get the IRI of an OWLObject
. Raises an exception if there is no associated IRI.
src/deeponto/onto/ontology.py
def get_iri(self, owl_object: OWLObject):\n\"\"\"Get the IRI of an `OWLObject`. Raises an exception if there is no associated IRI.\"\"\"\n try:\n return str(owl_object.getIRI())\n except:\n raise RuntimeError(\"Input owl object does not have IRI.\")\n
"},{"location":"deeponto/onto/ontology/#deeponto.onto.ontology.Ontology.get_axiom_type","title":"get_axiom_type(axiom)
staticmethod
","text":"Get the axiom type (in str
) for the given axiom.
Check full list at: http://owlcs.github.io/owlapi/apidocs_5/org/semanticweb/owlapi/model/AxiomType.html.
Source code insrc/deeponto/onto/ontology.py
@staticmethod\ndef get_axiom_type(axiom: OWLAxiom):\nr\"\"\"Get the axiom type (in `str`) for the given axiom.\n\n Check full list at: <http://owlcs.github.io/owlapi/apidocs_5/org/semanticweb/owlapi/model/AxiomType.html>.\n \"\"\"\n return str(axiom.getAxiomType())\n
"},{"location":"deeponto/onto/ontology/#deeponto.onto.ontology.Ontology.get_all_axioms","title":"get_all_axioms()
","text":"Return all axioms (in a list) asserted in the ontology.
Source code insrc/deeponto/onto/ontology.py
def get_all_axioms(self):\n\"\"\"Return all axioms (in a list) asserted in the ontology.\"\"\"\n return list(self.owl_onto.getAxioms())\n
"},{"location":"deeponto/onto/ontology/#deeponto.onto.ontology.Ontology.get_subsumption_axioms","title":"get_subsumption_axioms(entity_type='Classes')
","text":"Return subsumption axioms (subject to input entity type) asserted in the ontology.
Parameters:
Name Type Description Defaultentity_type
str
The entity type to be considered. Defaults to \"Classes\"
. Options are \"Classes\"
, \"ObjectProperties\"
, \"DataProperties\"
, and \"AnnotationProperties\"
.
'Classes'
Returns:
Type DescriptionList[OWLAxiom]
A list of equivalence axioms subject to input entity type.
Source code insrc/deeponto/onto/ontology.py
def get_subsumption_axioms(self, entity_type: str = \"Classes\"):\n\"\"\"Return subsumption axioms (subject to input entity type) asserted in the ontology.\n\n Args:\n entity_type (str, optional): The entity type to be considered. Defaults to `\"Classes\"`.\n Options are `\"Classes\"`, `\"ObjectProperties\"`, `\"DataProperties\"`, and `\"AnnotationProperties\"`.\n Returns:\n (List[OWLAxiom]): A list of equivalence axioms subject to input entity type.\n \"\"\"\n if entity_type == \"Classes\":\n return list(self.owl_onto.getAxioms(AxiomType.SUBCLASS_OF))\n elif entity_type == \"ObjectProperties\":\n return list(self.owl_onto.getAxioms(AxiomType.SUB_OBJECT_PROPERTY))\n elif entity_type == \"DataProperties\":\n return list(self.owl_onto.getAxioms(AxiomType.SUB_DATA_PROPERTY))\n elif entity_type == \"AnnotationProperties\":\n return list(self.owl_onto.getAxioms(AxiomType.SUB_ANNOTATION_PROPERTY_OF))\n else:\n raise ValueError(f\"Unknown entity type {entity_type}.\")\n
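A short, hedged sketch of retrieving asserted axioms with the method above (doid.owl is a placeholder file):
from deeponto.onto.ontology import Ontology

onto = Ontology("doid.owl")  # placeholder file

# asserted SubClassOf axioms between (possibly complex) class expressions
subsumptions = onto.get_subsumption_axioms(entity_type="Classes")
print(len(subsumptions), "class subsumption axioms")

# inspect the axiom type of the first one, e.g. "SubClassOf"
if subsumptions:
    print(onto.get_axiom_type(subsumptions[0]))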
"},{"location":"deeponto/onto/ontology/#deeponto.onto.ontology.Ontology.get_equivalence_axioms","title":"get_equivalence_axioms(entity_type='Classes')
","text":"Return equivalence axioms (subject to input entity type) asserted in the ontology.
Parameters:
Name Type Description Defaultentity_type
str
The entity type to be considered. Defaults to \"Classes\"
. Options are \"Classes\"
, \"ObjectProperties\"
, and \"DataProperties\"
.
'Classes'
Returns:
Type Descriptionlist[OWLAxiom]
A list of equivalence axioms subject to input entity type.
Source code insrc/deeponto/onto/ontology.py
def get_equivalence_axioms(self, entity_type: str = \"Classes\"):\n\"\"\"Return equivalence axioms (subject to input entity type) asserted in the ontology.\n\n Args:\n entity_type (str, optional): The entity type to be considered. Defaults to `\"Classes\"`.\n Options are `\"Classes\"`, `\"ObjectProperties\"`, and `\"DataProperties\"`.\n Returns:\n (list[OWLAxiom]): A list of equivalence axioms subject to input entity type.\n \"\"\"\n if entity_type == \"Classes\":\n return list(self.owl_onto.getAxioms(AxiomType.EQUIVALENT_CLASSES))\n elif entity_type == \"ObjectProperties\":\n return list(self.owl_onto.getAxioms(AxiomType.EQUIVALENT_OBJECT_PROPERTIES))\n elif entity_type == \"DataProperties\":\n return list(self.owl_onto.getAxioms(AxiomType.EQUIVALENT_DATA_PROPERTIES))\n else:\n raise ValueError(f\"Unknown entity type {entity_type}.\")\n
"},{"location":"deeponto/onto/ontology/#deeponto.onto.ontology.Ontology.get_assertion_axioms","title":"get_assertion_axioms(entity_type='Classes')
","text":"Return assertion (ABox) axioms (subject to input entity type) asserted in the ontology.
Parameters:
Name Type Description Defaultentity_type
str
The entity type to be considered. Defaults to \"Classes\"
. Options are \"Classes\"
, \"ObjectProperties\"
, and \"DataProperties\"
.
'Classes'
Returns:
Type Descriptionlist[OWLAxiom]
A list of assertion axioms subject to input entity type.
Source code insrc/deeponto/onto/ontology.py
def get_assertion_axioms(self, entity_type: str = \"Classes\"):\n\"\"\"Return assertion (ABox) axioms (subject to input entity type) asserted in the ontology.\n\n Args:\n entity_type (str, optional): The entity type to be considered. Defaults to `\"Classes\"`.\n Options are `\"Classes\"`, `\"ObjectProperties\"`, and `\"DataProperties\"`.\n Returns:\n (list[OWLAxiom]): A list of assertion axioms subject to input entity type.\n \"\"\"\n if entity_type == \"Classes\":\n return list(self.owl_onto.getAxioms(AxiomType.CLASS_ASSERTION))\n elif entity_type == \"ObjectProperties\":\n return list(self.owl_onto.getAxioms(AxiomType.OBJECT_PROPERTY_ASSERTION))\n elif entity_type == \"DataProperties\":\n return list(self.owl_onto.getAxioms(AxiomType.DATA_PROPERTY_ASSERTION))\n elif entity_type == \"Annotations\":\n return list(self.owl_onto.getAxioms(AxiomType.ANNOTATION_ASSERTION))\n else:\n raise ValueError(f\"Unknown entity type {entity_type}.\")\n
"},{"location":"deeponto/onto/ontology/#deeponto.onto.ontology.Ontology.get_asserted_parents","title":"get_asserted_parents(owl_object, named_only=False)
","text":"Get all the asserted parents of a given owl object.
Parameters:
Name Type Description Defaultowl_object
OWLObject
An owl object that could have a parent.
requirednamed_only
bool
If True
, return parents that are named classes.
False
Returns:
Type Descriptionset[OWLObject]
The parent set of the given owl object.
Source code insrc/deeponto/onto/ontology.py
def get_asserted_parents(self, owl_object: OWLObject, named_only: bool = False):\nr\"\"\"Get all the asserted parents of a given owl object.\n\n Args:\n owl_object (OWLObject): An owl object that could have a parent.\n named_only (bool): If `True`, return parents that are named classes.\n Returns:\n (set[OWLObject]): The parent set of the given owl object.\n \"\"\"\n entity_type = self.get_entity_type(owl_object)\n if entity_type == \"Classes\":\n parents = set(EntitySearcher.getSuperClasses(owl_object, self.owl_onto))\n elif entity_type.endswith(\"Properties\"):\n parents = set(EntitySearcher.getSuperProperties(owl_object, self.owl_onto))\n else:\n raise ValueError(f\"Unsupported entity type {entity_type}.\")\n if named_only:\n parents = set([p for p in parents if self.check_named_entity(p)])\n return parents\n
"},{"location":"deeponto/onto/ontology/#deeponto.onto.ontology.Ontology.get_asserted_children","title":"get_asserted_children(owl_object, named_only=False)
","text":"Get all the asserted children of a given owl object.
Parameters:
Name Type Description Defaultowl_object
OWLObject
An owl object that could have a child.
requirednamed_only
bool
If True
, return children that are named classes.
False
Returns:
Type Descriptionset[OWLObject]
The children set of the given owl object.
Source code insrc/deeponto/onto/ontology.py
def get_asserted_children(self, owl_object: OWLObject, named_only: bool = False):\nr\"\"\"Get all the asserted children of a given owl object.\n\n Args:\n owl_object (OWLObject): An owl object that could have a child.\n named_only (bool): If `True`, return children that are named classes.\n Returns:\n (set[OWLObject]): The children set of the given owl object.\n \"\"\"\n entity_type = self.get_entity_type(owl_object)\n if entity_type == \"Classes\":\n children = set(EntitySearcher.getSubClasses(owl_object, self.owl_onto))\n elif entity_type.endswith(\"Properties\"):\n children = set(EntitySearcher.getSubProperties(owl_object, self.owl_onto))\n else:\n raise ValueError(f\"Unsupported entity type {entity_type}.\")\n if named_only:\n children = set([c for c in children if self.check_named_entity(c)])\n return children\n
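For example, a minimal sketch of walking the asserted hierarchy (the DOID IRI is a placeholder borrowed from the examples further down this page):
from deeponto.onto.ontology import Ontology

onto = Ontology("doid.owl")  # placeholder file
doid_class = onto.get_owl_object("http://purl.obolibrary.org/obo/DOID_4058")

# named asserted parents and children of the class
parents = onto.get_asserted_parents(doid_class, named_only=True)
children = onto.get_asserted_children(doid_class, named_only=True)
print(len(parents), "named parents;", len(children), "named children")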
"},{"location":"deeponto/onto/ontology/#deeponto.onto.ontology.Ontology.get_asserted_complex_classes","title":"get_asserted_complex_classes(gci_only=False)
","text":"Get complex classes that occur in at least one of the ontology axioms.
Parameters:
Name Type Description Defaultgci_only
bool
If True
, consider complex classes that occur in GCIs only; otherwise consider those that occur in equivalence axioms as well.
False
Returns:
Type Descriptionset[OWLClassExpression]
A set of complex classes.
Source code insrc/deeponto/onto/ontology.py
def get_asserted_complex_classes(self, gci_only: bool = False):\n\"\"\"Get complex classes that occur in at least one of the ontology axioms.\n\n Args:\n gci_only (bool): If `True`, consider complex classes that occur in GCIs only; otherwise consider\n those that occur in equivalence axioms as well.\n Returns:\n (set[OWLClassExpression]): A set of complex classes.\n \"\"\"\n complex_classes = []\n\n for gci in self.get_subsumption_axioms(\"Classes\"):\n super_class = gci.getSuperClass()\n sub_class = gci.getSubClass()\n if not OntologyReasoner.has_iri(super_class):\n complex_classes.append(super_class)\n if not OntologyReasoner.has_iri(sub_class):\n complex_classes.append(sub_class)\n\n # also considering equivalence axioms\n if not gci_only:\n for eq in self.get_equivalence_axioms(\"Classes\"):\n gci = list(eq.asOWLSubClassOfAxioms())[0]\n super_class = gci.getSuperClass()\n sub_class = gci.getSubClass()\n if not OntologyReasoner.has_iri(super_class):\n complex_classes.append(super_class)\n if not OntologyReasoner.has_iri(sub_class):\n complex_classes.append(sub_class)\n\n return set(complex_classes)\n
"},{"location":"deeponto/onto/ontology/#deeponto.onto.ontology.Ontology.get_annotations","title":"get_annotations(owl_object, annotation_property_iri=None, annotation_language_tag=None, apply_lowercasing=False, normalise_identifiers=False)
","text":"Get the annotation literals of the given OWLObject
.
Parameters:
Name Type Description Defaultowl_object
Union[OWLObject, str]
An OWLObject
or its IRI.
annotation_property_iri
str
Any particular annotation property IRI of interest. Defaults to None
.
None
annotation_language_tag
str
Any particular annotation language tag of interest; NOTE that not every annotation has a language tag, in this case assume it is in English. Defaults to None
. Options are \"en\"
, \"ge\"
etc.
None
apply_lowercasing
bool
Whether or not to apply lowercasing to annotation literals. Defaults to False
.
False
normalise_identifiers
bool
Whether to normalise annotation text that is in the Java identifier format. Defaults to False
.
False
Returns:
Type Descriptionset[str]
A set of annotation literals of the given OWLObject
.
src/deeponto/onto/ontology.py
def get_annotations(\n self,\n owl_object: Union[OWLObject, str],\n annotation_property_iri: Optional[str] = None,\n annotation_language_tag: Optional[str] = None,\n apply_lowercasing: bool = False,\n normalise_identifiers: bool = False,\n):\n\"\"\"Get the annotation literals of the given `OWLObject`.\n\n Args:\n owl_object (Union[OWLObject, str]): An `OWLObject` or its IRI.\n annotation_property_iri (str, optional):\n Any particular annotation property IRI of interest. Defaults to `None`.\n annotation_language_tag (str, optional):\n Any particular annotation language tag of interest; NOTE that not every\n annotation has a language tag, in this case assume it is in English.\n Defaults to `None`. Options are `\"en\"`, `\"ge\"` etc.\n apply_lowercasing (bool): Whether or not to apply lowercasing to annotation literals.\n Defaults to `False`.\n normalise_identifiers (bool): Whether to normalise annotation text that is in the Java identifier format.\n Defaults to `False`.\n Returns:\n (set[str]): A set of annotation literals of the given `OWLObject`.\n \"\"\"\n if isinstance(owl_object, str):\n owl_object = self.get_owl_object(owl_object)\n\n annotation_property = None\n if annotation_property_iri:\n # return an empty list if `annotation_property_iri` does not exist in this OWLOntology`\n annotation_property = self.get_owl_object(annotation_property_iri)\n\n annotations = []\n for annotation in EntitySearcher.getAnnotations(owl_object, self.owl_onto, annotation_property):\n annotation = annotation.getValue()\n # boolean that indicates whether the annotation's language is of interest\n fit_language = False\n if not annotation_language_tag:\n # it is set to `True` if `annotation_langauge` is not specified\n fit_language = True\n else:\n # restrict the annotations to a language if specified\n try:\n # NOTE: not every annotation has a language attribute\n fit_language = annotation.getLang() == annotation_language_tag\n except:\n # in the case when this annotation has no language tag\n # we assume it is in English\n if annotation_language_tag == \"en\":\n fit_language = True\n\n if fit_language:\n # only get annotations that have a literal value\n if annotation.isLiteral():\n annotations.append(\n process_annotation_literal(\n str(annotation.getLiteral()), apply_lowercasing, normalise_identifiers\n )\n )\n\n return uniqify(annotations)\n
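A hedged usage sketch for label retrieval (rdfs:label is used here as the annotation property; the class IRI and file name are placeholders):
from deeponto.onto.ontology import Ontology

onto = Ontology("doid.owl")  # placeholder file

# English rdfs:label literals of a class, lowercased
labels = onto.get_annotations(
    "http://purl.obolibrary.org/obo/DOID_4058",
    annotation_property_iri="http://www.w3.org/2000/01/rdf-schema#label",
    annotation_language_tag="en",
    apply_lowercasing=True,
)
print(labels)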
"},{"location":"deeponto/onto/ontology/#deeponto.onto.ontology.Ontology.check_consistency","title":"check_consistency()
","text":"Check if the ontology is consistent according to the pre-loaded reasoner.
Source code insrc/deeponto/onto/ontology.py
def check_consistency(self):\n\"\"\"Check if the ontology is consistent according to the pre-loaded reasoner.\n \"\"\"\n logging.info(f\"Checking consistency with `{self.reasoner_type}` reasoner.\")\n return self.reasoner.owl_reasoner.isConsistent()\n
"},{"location":"deeponto/onto/ontology/#deeponto.onto.ontology.Ontology.check_named_entity","title":"check_named_entity(owl_object)
","text":"Check if the input entity is a named atomic entity. That is, it is not a complex entity, \\(\\top\\), or \\(\\bot\\).
Source code insrc/deeponto/onto/ontology.py
def check_named_entity(self, owl_object: OWLObject):\nr\"\"\"Check if the input entity is a named atomic entity. That is,\n it is not a complex entity, $\\top$, or $\\bot$.\n \"\"\"\n entity_type = self.get_entity_type(owl_object)\n top = TOP_BOTTOMS[entity_type].TOP\n bottom = TOP_BOTTOMS[entity_type].BOTTOM\n if OntologyReasoner.has_iri(owl_object):\n iri = str(owl_object.getIRI())\n # check if the entity is TOP or BOTTOM\n return iri != top and iri != bottom\n return False\n
"},{"location":"deeponto/onto/ontology/#deeponto.onto.ontology.Ontology.check_deprecated","title":"check_deprecated(owl_object)
","text":"Check if the given OWL object is marked as deprecated according to \\(\\texttt{owl:deprecated}\\).
NOTE: the string literal indicating deprecation is either 'true'
or 'True'
. Also, if \\(\\texttt{owl:deprecated}\\) is not defined in this ontology, return False
by default.
src/deeponto/onto/ontology.py
def check_deprecated(self, owl_object: OWLObject):\nr\"\"\"Check if the given OWL object is marked as deprecated according to $\\texttt{owl:deprecated}$.\n\n NOTE: the string literal indicating deprecation is either `'true'` or `'True'`. Also, if $\\texttt{owl:deprecated}$\n is not defined in this ontology, return `False` by default.\n \"\"\"\n if not OWL_DEPRECATED in self.owl_annotation_properties.keys():\n # return False if owl:deprecated is not defined in this ontology\n return False\n\n deprecated = self.get_annotations(owl_object, annotation_property_iri=OWL_DEPRECATED)\n if deprecated and (list(deprecated)[0] == \"true\" or list(deprecated)[0] == \"True\"):\n return True\n else:\n return False\n
"},{"location":"deeponto/onto/ontology/#deeponto.onto.ontology.Ontology.save_onto","title":"save_onto(save_path)
","text":"Save the ontology file to the given path.
Source code insrc/deeponto/onto/ontology.py
def save_onto(self, save_path: str):\n\"\"\"Save the ontology file to the given path.\"\"\"\n self.owl_onto.saveOntology(IRI.create(File(save_path).toURI()))\n
"},{"location":"deeponto/onto/ontology/#deeponto.onto.ontology.Ontology.build_annotation_index","title":"build_annotation_index(annotation_property_iris=[RDFS_LABEL], entity_type='Classes', apply_lowercasing=False, normalise_identifiers=False)
","text":"Build an annotation index for a given type of entities.
Parameters:
Name Type Description Defaultannotation_property_iris
list[str]
A list of annotation property IRIs (it is possible that not every annotation property IRI is in use); if not provided, the built-in rdfs:label
is considered. Defaults to [RDFS_LABEL]
.
[RDFS_LABEL]
entity_type
str
The entity type to be considered. Defaults to \"Classes\"
. Options are \"Classes\"
, \"ObjectProperties\"
, \"DataProperties\"
, etc.
'Classes'
apply_lowercasing
bool
Whether or not to apply lowercasing to annotation literals. Defaults to True
.
False
normalise_identifiers
bool
Whether to normalise annotation text that is in the Java identifier format. Defaults to False
.
False
Returns:
Type DescriptionTuple[dict, list[str]]
The built annotation index, and the list of annotation property IRIs that are in use.
Source code insrc/deeponto/onto/ontology.py
def build_annotation_index(\n self,\n annotation_property_iris: List[str] = [RDFS_LABEL],\n entity_type: str = \"Classes\",\n apply_lowercasing: bool = False,\n normalise_identifiers: bool = False,\n):\n\"\"\"Build an annotation index for a given type of entities.\n\n Args:\n annotation_property_iris (list[str]): A list of annotation property IRIs (it is possible\n that not every annotation property IRI is in use); if not provided, the built-in\n `rdfs:label` is considered. Defaults to `[RDFS_LABEL]`.\n entity_type (str, optional): The entity type to be considered. Defaults to `\"Classes\"`.\n Options are `\"Classes\"`, `\"ObjectProperties\"`, `\"DataProperties\"`, etc.\n apply_lowercasing (bool): Whether or not to apply lowercasing to annotation literals.\n Defaults to `True`.\n normalise_identifiers (bool): Whether to normalise annotation text that is in the Java identifier format.\n Defaults to `False`.\n\n Returns:\n (Tuple[dict, list[str]]): The built annotation index, and the list of annotation property IRIs that are in use.\n \"\"\"\n\n annotation_index = defaultdict(set)\n # example: Classes => owl_classes; ObjectProperties => owl_object_properties\n entity_type = \"owl_\" + split_java_identifier(entity_type).replace(\" \", \"_\").lower()\n entity_index = getattr(self, entity_type)\n\n # preserve available annotation properties\n annotation_property_iris = [\n airi for airi in annotation_property_iris if airi in self.owl_annotation_properties.keys()\n ]\n\n # build the annotation index without duplicated literals\n for airi in annotation_property_iris:\n for iri, entity in entity_index.items():\n annotation_index[iri].update(\n self.get_annotations(\n owl_object=entity,\n annotation_property_iri=airi,\n annotation_language_tag=None,\n apply_lowercasing=apply_lowercasing,\n normalise_identifiers=normalise_identifiers,\n )\n )\n\n return annotation_index, annotation_property_iris\n
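A minimal sketch of building an annotation index over class labels (missing annotation property IRIs are simply filtered out, as described above; the file name is a placeholder):
from deeponto.onto.ontology import Ontology

onto = Ontology("doid.owl")  # placeholder file
annotation_index, used_iris = onto.build_annotation_index(
    annotation_property_iris=["http://www.w3.org/2000/01/rdf-schema#label"],
    entity_type="Classes",
    apply_lowercasing=True,
)
print(len(annotation_index), "classes indexed using", used_iris)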
"},{"location":"deeponto/onto/ontology/#deeponto.onto.ontology.Ontology.build_inverted_annotation_index","title":"build_inverted_annotation_index(annotation_index, tokenizer)
staticmethod
","text":"Build an inverted annotation index given an annotation index and a tokenizer.
Source code insrc/deeponto/onto/ontology.py
@staticmethod\ndef build_inverted_annotation_index(annotation_index: dict, tokenizer: Tokenizer):\n\"\"\"Build an inverted annotation index given an annotation index and a tokenizer.\"\"\"\n return InvertedIndex(annotation_index, tokenizer)\n
"},{"location":"deeponto/onto/ontology/#deeponto.onto.ontology.Ontology.add_axiom","title":"add_axiom(owl_axiom, return_undo=True)
","text":"Add an axiom into the current ontology.
Parameters:
Name Type Description Defaultowl_axiom
OWLAxiom
An axiom to be added.
requiredreturn_undo
bool
Returning the undo operation or not. Defaults to True
.
True
Source code in src/deeponto/onto/ontology.py
def add_axiom(self, owl_axiom: OWLAxiom, return_undo: bool = True):\n\"\"\"Add an axiom into the current ontology.\n\n Args:\n owl_axiom (OWLAxiom): An axiom to be added.\n return_undo (bool, optional): Returning the undo operation or not. Defaults to `True`.\n \"\"\"\n change = AddAxiom(self.owl_onto, owl_axiom)\n result = self.owl_onto.applyChange(change)\n logger.info(f\"[{str(result)}] Adding the axiom {str(owl_axiom)} into the ontology.\")\n if return_undo:\n return change.reverseChange()\n
"},{"location":"deeponto/onto/ontology/#deeponto.onto.ontology.Ontology.remove_axiom","title":"remove_axiom(owl_axiom, return_undo=True)
","text":"Remove an axiom from the current ontology.
Parameters:
Name Type Description Defaultowl_axiom
OWLAxiom
An axiom to be removed.
requiredreturn_undo
bool
Returning the undo operation or not. Defaults to True
.
True
Source code in src/deeponto/onto/ontology.py
def remove_axiom(self, owl_axiom: OWLAxiom, return_undo: bool = True):\n\"\"\"Remove an axiom from the current ontology.\n\n Args:\n owl_axiom (OWLAxiom): An axiom to be removed.\n return_undo (bool, optional): Returning the undo operation or not. Defaults to `True`.\n \"\"\"\n change = RemoveAxiom(self.owl_onto, owl_axiom)\n result = self.owl_onto.applyChange(change)\n logger.info(f\"[{str(result)}] Removing the axiom {str(owl_axiom)} from the ontology.\")\n if return_undo:\n return change.reverseChange()\n
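A small sketch of adding an axiom and reverting it with the returned undo operation (class IRIs and the file name are placeholders; applying the undo through owl_onto.applyChange mirrors what the reasoner utilities below do):
from deeponto.onto.ontology import Ontology

onto = Ontology("doid.owl")  # placeholder file
sub = onto.get_owl_object("http://purl.obolibrary.org/obo/DOID_4058")
sup = onto.get_owl_object("http://purl.obolibrary.org/obo/DOID_0001816")

# build and add a SubClassOf axiom, keeping the undo operation
axiom = onto.owl_data_factory.getOWLSubClassOfAxiom(sub, sup)
undo = onto.add_axiom(axiom, return_undo=True)

# revert the change
onto.owl_onto.applyChange(undo)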
"},{"location":"deeponto/onto/ontology/#deeponto.onto.ontology.Ontology.replace_entity","title":"replace_entity(owl_object, entity_iri, replacement_iri)
","text":"Replace an entity in a class expression with another entity.
Parameters:
Name Type Description Defaultowl_object
OWLObject
An OWLObject
entity to be manipulated.
entity_iri
str
IRI of the entity to be replaced.
requiredreplacement_iri
str
IRI of the entity to replace.
requiredReturns:
Type DescriptionOWLObject
The changed OWLObject
entity.
src/deeponto/onto/ontology.py
def replace_entity(self, owl_object: OWLObject, entity_iri: str, replacement_iri: str):\n\"\"\"Replace an entity in a class expression with another entity.\n\n Args:\n owl_object (OWLObject): An `OWLObject` entity to be manipulated.\n entity_iri (str): IRI of the entity to be replaced.\n replacement_iri (str): IRI of the entity to replace.\n\n Returns:\n (OWLObject): The changed `OWLObject` entity.\n \"\"\"\n iri_dict = {IRI.create(entity_iri): IRI.create(replacement_iri)}\n replacer = OWLObjectDuplicator(self.owl_data_factory, iri_dict)\n return replacer.duplicateObject(owl_object)\n
"},{"location":"deeponto/onto/projection/","title":"Ontology Projection","text":""},{"location":"deeponto/onto/projection/#deeponto.onto.projection.OntologyProjector","title":"OntologyProjector(bidirectional_taxonomy=False, only_taxonomy=False, include_literals=False)
","text":"Class for ontology projection -- transforming ontology axioms into triples.
Credit
The code of this class originates from the mOWL library.
Attributes:
Name Type Descriptionbidirectional_taxonomy
bool
If True
then per each SubClass
edge one SuperClass
edge will be generated. Defaults to False
.
only_taxonomy
bool
If True
, then projection will only include subClass
edges. Defaults to False
.
include_literals
bool
If True
the projection will also include triples involving data property assertions and annotations. Defaults to False
.
Parameters:
Name Type Description Defaultbidirectional_taxonomy
bool
If True
then per each SubClass
edge one SuperClass
edge will be generated. Defaults to False
.
False
only_taxonomy
bool
If True
, then projection will only include subClass
edges. Defaults to False
.
False
include_literals
bool
If True
the projection will also include triples involving data property assertions and annotations. Defaults to False
.
False
Source code in src/deeponto/onto/projection.py
def __init__(self, bidirectional_taxonomy: bool=False, only_taxonomy: bool=False, include_literals: bool=False):\n\"\"\"Initialise an ontology projector.\n\n Args:\n bidirectional_taxonomy (bool, optional): If `True` then per each `SubClass` edge one `SuperClass` edge will\n be generated. Defaults to `False`.\n only_taxonomy (bool, optional): If `True`, then projection will only include `subClass` edges. Defaults to `False`.\n include_literals (bool, optional): If `True` the projection will also include triples involving data property\n assertions and annotations. Defaults to `False`.\n \"\"\"\n self.bidirectional_taxonomy = bidirectional_taxonomy\n self.include_literals = include_literals\n self.only_taxonomy = only_taxonomy\n self.projector = Projector(self.bidirectional_taxonomy, self.only_taxonomy,\n self.include_literals)\n
"},{"location":"deeponto/onto/projection/#deeponto.onto.projection.OntologyProjector.project","title":"project(ontology)
","text":"The projection algorithm implemented in OWL2Vec*.
Parameters:
Name Type Description Defaultontology
Ontology
An ontology to be processed.
requiredReturns:
Type Descriptionset
Set of triples after projection.
Source code insrc/deeponto/onto/projection.py
def project(self, ontology: Ontology):\n\"\"\"The projection algorithm implemented in OWL2Vec*.\n\n Args:\n ontology (Ontology): An ontology to be processed.\n\n Returns:\n (set): Set of triples after projection.\n \"\"\"\n ontology = ontology.owl_onto\n if not isinstance(ontology, OWLOntology):\n raise TypeError(\n \"Input ontology must be of type `org.semanticweb.owlapi.model.OWLOntology`.\")\n edges = self.projector.project(ontology)\n triples = []\n for e in edges:\n s, r, o = str(e.src()), str(e.rel()), str(e.dst())\n if o != \"\":\n if r == \"http://subclassof\":\n r = str(RDFS.subClassOf)\n triples.append((s, r, o))\n return set(triples)\n
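For illustration, a minimal projection sketch (placeholder file name; the resulting triples are plain string tuples):
from deeponto.onto.ontology import Ontology
from deeponto.onto.projection import OntologyProjector

onto = Ontology("doid.owl")  # placeholder file
projector = OntologyProjector(
    bidirectional_taxonomy=True,  # also emit SuperClass edges
    only_taxonomy=False,
    include_literals=False,
)
triples = projector.project(onto)
print(len(triples), "projected triples")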
"},{"location":"deeponto/onto/pruning/","title":"Ontology Pruning","text":""},{"location":"deeponto/onto/pruning/#deeponto.onto.pruning.OntologyPruner","title":"OntologyPruner(onto)
","text":"Class for in-place ontology pruning.
Attributes:
Name Type Descriptiononto
Ontology
The input ontology to be pruned. Note that the pruning process is in-place.
Parameters:
Name Type Description Defaultonto
Ontology
The input ontology to be pruned. Note that the pruning process is in-place.
required Source code insrc/deeponto/onto/pruning.py
def __init__(self, onto: Ontology):\n\"\"\"Initialise an ontology pruner.\n\n Args:\n onto (Ontology): The input ontology to be pruned. Note that the pruning process is in-place.\n \"\"\"\n self.onto = onto\n self._pruning_applied = None\n
"},{"location":"deeponto/onto/pruning/#deeponto.onto.pruning.OntologyPruner.save_onto","title":"save_onto(save_path)
","text":"Save the pruned ontology file to the given path.
Source code insrc/deeponto/onto/pruning.py
def save_onto(self, save_path: str):\n\"\"\"Save the pruned ontology file to the given path.\"\"\"\n logging.info(f\"{self._pruning_applied} pruning algorithm has been applied.\")\n logging.info(f\"Save the pruned ontology file to {save_path}.\")\n return self.onto.save_onto(save_path)\n
"},{"location":"deeponto/onto/pruning/#deeponto.onto.pruning.OntologyPruner.prune","title":"prune(class_iris_to_be_removed)
","text":"Apply ontology pruning while preserving the relevant hierarchy.
paper
This refers to the ontology pruning algorithm introduced in the paper: Machine Learning-Friendly Biomedical Datasets for Equivalence and Subsumption Ontology Matching (ISWC 2022).
For each class \\(c\\) to be pruned, subsumption axioms will be created between \\(c\\)'s parents and children so as to preserve the relevant hierarchy.
Parameters:
Name Type Description Defaultclass_iris_to_be_removed
list[str]
Classes with IRIs in this list will be pruned and the relevant hierarchy will be repaired.
required Source code insrc/deeponto/onto/pruning.py
def prune(self, class_iris_to_be_removed: List[str]):\nr\"\"\"Apply ontology pruning while preserving the relevant hierarchy.\n\n !!! credit \"paper\"\n\n This refers to the ontology pruning algorithm introduced in the paper:\n [*Machine Learning-Friendly Biomedical Datasets for Equivalence and Subsumption Ontology Matching (ISWC 2022)*](https://link.springer.com/chapter/10.1007/978-3-031-19433-7_33).\n\n For each class $c$ to be pruned, subsumption axioms will be created between $c$'s parents and children so as to preserve the\n relevant hierarchy.\n\n Args:\n class_iris_to_be_removed (list[str]): Classes with IRIs in this list will be pruned and the relevant hierarchy will be repaired.\n \"\"\"\n\n # create the subsumption axioms first\n for cl_iri in class_iris_to_be_removed:\n cl = self.onto.get_owl_object(cl_iri)\n cl_parents = self.onto.get_asserted_parents(cl)\n cl_children = self.onto.get_asserted_children(cl)\n for parent, child in itertools.product(cl_parents, cl_children):\n sub_axiom = self.onto.owl_data_factory.getOWLSubClassOfAxiom(child, parent)\n self.onto.add_axiom(sub_axiom)\n\n # apply pruning\n class_remover = OWLEntityRemover(Collections.singleton(self.onto.owl_onto))\n for cl_iri in class_iris_to_be_removed:\n cl = self.onto.get_owl_object(cl_iri)\n cl.accept(class_remover)\n self.onto.owl_manager.applyChanges(class_remover.getChanges())\n
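A short pruning sketch (the IRI list and file names are placeholders; remember that pruning modifies the ontology in place):
from deeponto.onto.ontology import Ontology
from deeponto.onto.pruning import OntologyPruner

onto = Ontology("doid.owl")  # placeholder file
pruner = OntologyPruner(onto)

# remove these classes while repairing the surrounding subsumption hierarchy
pruner.prune(["http://purl.obolibrary.org/obo/DOID_4058"])
pruner.save_onto("doid.pruned.owl")  # placeholder output path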
"},{"location":"deeponto/onto/reasoning/","title":"Ontology Reasoning","text":""},{"location":"deeponto/onto/reasoning/#deeponto.onto.ontology.OntologyReasoner","title":"OntologyReasoner(onto, reasoner_type)
","text":"Ontology reasoner class that extends from the Java library OWLAPI.
Attributes:
Name Type Descriptiononto
Ontology
The input deeponto
ontology.
owl_reasoner_factory
OWLReasonerFactory
A reasoner factory for creating a reasoner.
owl_reasoner
OWLReasoner
The created reasoner.
owl_data_factory
OWLDataFactory
A data factory (inherited from onto
) for manipulating axioms.
Parameters:
Name Type Description Defaultonto
Ontology
The input ontology to conduct reasoning on.
requiredreasoner_type
str
The type of reasoner used. Options are [\"hermit\", \"elk\", \"struct\"]
.
src/deeponto/onto/ontology.py
def __init__(self, onto: Ontology, reasoner_type: str):\n\"\"\"Initialise an ontology reasoner.\n\n Args:\n onto (Ontology): The input ontology to conduct reasoning on.\n reasoner_type (str): The type of reasoner used. Options are `[\"hermit\", \"elk\", \"struct\"]`.\n \"\"\"\n self.onto = onto\n self.owl_reasoner_factory = None\n self.owl_reasoner = None\n self.reasoner_type = reasoner_type\n self.load_reasoner(self.reasoner_type)\n self.owl_data_factory = self.onto.owl_data_factory\n
"},{"location":"deeponto/onto/reasoning/#deeponto.onto.ontology.OntologyReasoner.load_reasoner","title":"load_reasoner(reasoner_type)
","text":"Load a new reaonser and dispose the old one if existed.
Source code insrc/deeponto/onto/ontology.py
def load_reasoner(self, reasoner_type: str):\n\"\"\"Load a new reaonser and dispose the old one if existed.\"\"\"\n assert reasoner_type in REASONER_DICT.keys(), f\"Unknown or unsupported reasoner type: {reasoner_type}.\"\n\n if self.owl_reasoner:\n self.owl_reasoner.dispose()\n\n self.reasoner_type = reasoner_type\n self.owl_reasoner_factory = REASONER_DICT[self.reasoner_type]()\n # TODO: remove ELK message\n # somehow Level.ERROR does not prevent the INFO message from ELK\n # Logger.getLogger(\"org.semanticweb.elk\").setLevel(Level.OFF)\n\n self.owl_reasoner = self.owl_reasoner_factory.createReasoner(self.onto.owl_onto)\n
"},{"location":"deeponto/onto/reasoning/#deeponto.onto.ontology.OntologyReasoner.get_entity_type","title":"get_entity_type(entity, is_singular=False)
staticmethod
","text":"A handy method to get the type of an entity (OWLObject
).
NOTE: This method is inherited from the Ontology Class.
Source code insrc/deeponto/onto/ontology.py
@staticmethod\ndef get_entity_type(entity: OWLObject, is_singular: bool = False):\n\"\"\"A handy method to get the type of an entity (`OWLObject`).\n\n NOTE: This method is inherited from the Ontology Class.\n \"\"\"\n return Ontology.get_entity_type(entity, is_singular)\n
"},{"location":"deeponto/onto/reasoning/#deeponto.onto.ontology.OntologyReasoner.has_iri","title":"has_iri(entity)
staticmethod
","text":"Check if an entity has an IRI.
Source code insrc/deeponto/onto/ontology.py
@staticmethod\ndef has_iri(entity: OWLObject):\n\"\"\"Check if an entity has an IRI.\"\"\"\n try:\n entity.getIRI()\n return True\n except:\n return False\n
"},{"location":"deeponto/onto/reasoning/#deeponto.onto.ontology.OntologyReasoner.get_inferred_super_entities","title":"get_inferred_super_entities(entity, direct=False)
","text":"Return the IRIs of named super-entities of a given OWLObject
according to the reasoner.
A mixture of getSuperClasses
, getSuperObjectProperties
, getSuperDataProperties
functions imported from the OWLAPI reasoner. The type of input entity will be automatically determined. The top entity such as owl:Thing
is ignored.
Parameters:
Name Type Description Defaultentity
OWLObject
An OWLObject
entity of interest.
direct
bool
Return parents (direct=True
) or ancestors (direct=False
). Defaults to False
.
False
Returns:
Type Descriptionlist[str]
A list of IRIs of the super-entities of the given OWLObject
entity.
src/deeponto/onto/ontology.py
def get_inferred_super_entities(self, entity: OWLObject, direct: bool = False):\nr\"\"\"Return the IRIs of named super-entities of a given `OWLObject` according to the reasoner.\n\n A mixture of `getSuperClasses`, `getSuperObjectProperties`, `getSuperDataProperties`\n functions imported from the OWLAPI reasoner. The type of input entity will be\n automatically determined. The top entity such as `owl:Thing` is ignored.\n\n\n Args:\n entity (OWLObject): An `OWLObject` entity of interest.\n direct (bool, optional): Return parents (`direct=True`) or\n ancestors (`direct=False`). Defaults to `False`.\n\n Returns:\n (list[str]): A list of IRIs of the super-entities of the given `OWLObject` entity.\n \"\"\"\n entity_type = self.get_entity_type(entity)\n get_super = f\"getSuper{entity_type}\"\n TOP = TOP_BOTTOMS[entity_type].TOP # get the corresponding TOP entity\n super_entities = getattr(self.owl_reasoner, get_super)(entity, direct).getFlattened()\n super_entity_iris = [str(s.getIRI()) for s in super_entities]\n # the root node is owl#Thing\n if TOP in super_entity_iris:\n super_entity_iris.remove(TOP)\n return super_entity_iris\n
"},{"location":"deeponto/onto/reasoning/#deeponto.onto.ontology.OntologyReasoner.get_inferred_sub_entities","title":"get_inferred_sub_entities(entity, direct=False)
","text":"Return the IRIs of named sub-entities of a given OWLObject
according to the reasoner.
A mixture of getSubClasses
, getSubObjectProperties
, getSubDataProperties
functions imported from the OWLAPI reasoner. The type of input entity will be automatically determined. The bottom entity such as owl:Nothing
is ignored.
Parameters:
Name Type Description Defaultentity
OWLObject
An OWLObject
entity of interest.
direct
bool
Return parents (direct=True
) or ancestors (direct=False
). Defaults to False
.
False
Returns:
Type Descriptionlist[str]
A list of IRIs of the sub-entities of the given OWLObject
entity.
src/deeponto/onto/ontology.py
def get_inferred_sub_entities(self, entity: OWLObject, direct: bool = False):\n\"\"\"Return the IRIs of named sub-entities of a given `OWLObject` according to the reasoner.\n\n A mixture of `getSubClasses`, `getSubObjectProperties`, `getSubDataProperties`\n functions imported from the OWLAPI reasoner. The type of input entity will be\n automatically determined. The bottom entity such as `owl:Nothing` is ignored.\n\n Args:\n entity (OWLObject): An `OWLObject` entity of interest.\n direct (bool, optional): Return parents (`direct=True`) or\n ancestors (`direct=False`). Defaults to `False`.\n\n Returns:\n (list[str]): A list of IRIs of the sub-entities of the given `OWLObject` entity.\n \"\"\"\n entity_type = self.get_entity_type(entity)\n get_sub = f\"getSub{entity_type}\"\n BOTTOM = TOP_BOTTOMS[entity_type].BOTTOM\n sub_entities = getattr(self.owl_reasoner, get_sub)(entity, direct).getFlattened()\n sub_entity_iris = [str(s.getIRI()) for s in sub_entities]\n # the root node is owl#Thing\n if BOTTOM in sub_entity_iris:\n sub_entity_iris.remove(BOTTOM)\n return sub_entity_iris\n
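A minimal reasoning sketch using the reasoner attached to a loaded ontology (placeholder file and IRIs):
from deeponto.onto.ontology import Ontology

onto = Ontology("doid.owl", reasoner_type="elk")  # placeholder file
c1 = onto.get_owl_object("http://purl.obolibrary.org/obo/DOID_4058")
c2 = onto.get_owl_object("http://purl.obolibrary.org/obo/DOID_0001816")

# inferred ancestors (direct=False) versus direct parents (direct=True)
ancestors = onto.reasoner.get_inferred_super_entities(c1, direct=False)
parents = onto.reasoner.get_inferred_super_entities(c1, direct=True)
print(len(ancestors), "ancestors;", len(parents), "direct parents")

# entailment check between the two classes
print(onto.reasoner.check_subsumption(c1, c2))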
"},{"location":"deeponto/onto/reasoning/#deeponto.onto.ontology.OntologyReasoner.check_subsumption","title":"check_subsumption(sub_entity, super_entity)
","text":"Check if the first entity is subsumed by the second entity according to the reasoner.
Source code insrc/deeponto/onto/ontology.py
def check_subsumption(self, sub_entity: OWLObject, super_entity: OWLObject):\n\"\"\"Check if the first entity is subsumed by the second entity according to the reasoner.\"\"\"\n entity_type = self.get_entity_type(sub_entity, is_singular=True)\n assert entity_type == self.get_entity_type(super_entity, is_singular=True)\n\n sub_axiom = getattr(self.owl_data_factory, f\"getOWLSub{entity_type}OfAxiom\")(sub_entity, super_entity)\n\n return self.owl_reasoner.isEntailed(sub_axiom)\n
"},{"location":"deeponto/onto/reasoning/#deeponto.onto.ontology.OntologyReasoner.check_disjoint","title":"check_disjoint(entity1, entity2)
","text":"Check if two entities are disjoint according to the reasoner.
Source code insrc/deeponto/onto/ontology.py
def check_disjoint(self, entity1: OWLObject, entity2: OWLObject):\n\"\"\"Check if two entities are disjoint according to the reasoner.\"\"\"\n entity_type = self.get_entity_type(entity1)\n assert entity_type == self.get_entity_type(entity2)\n\n disjoint_axiom = getattr(self.owl_data_factory, f\"getOWLDisjoint{entity_type}Axiom\")([entity1, entity2])\n\n return self.owl_reasoner.isEntailed(disjoint_axiom)\n
"},{"location":"deeponto/onto/reasoning/#deeponto.onto.ontology.OntologyReasoner.check_common_descendants","title":"check_common_descendants(entity1, entity2)
","text":"Check if two entities have a common decendant.
Entities can be OWL class or property expressions, and can be either atomic or complex. It takes longer computation time for the complex ones. Complex entities do not have an IRI. This method is optimised in the way that if there exists an atomic entity A
, we compute descendants for A
and compare them against the other entity which could be complex.
src/deeponto/onto/ontology.py
def check_common_descendants(self, entity1: OWLObject, entity2: OWLObject):\n\"\"\"Check if two entities have a common decendant.\n\n Entities can be **OWL class or property expressions**, and can be either **atomic\n or complex**. It takes longer computation time for the complex ones. Complex\n entities do not have an IRI. This method is optimised in the way that if\n there exists an atomic entity `A`, we compute descendants for `A` and\n compare them against the other entity which could be complex.\n \"\"\"\n entity_type = self.get_entity_type(entity1)\n assert entity_type == self.get_entity_type(entity2)\n\n if not self.has_iri(entity1) and not self.has_iri(entity2):\n logger.warn(\"Computing descendants for two complex entities is not efficient.\")\n\n # `computed` is the one we compute the descendants\n # `compared` is the one we compare `computed`'s descendant one-by-one\n # we set the atomic entity as `computed` for efficiency if there is one\n computed, compared = entity1, entity2\n if not self.has_iri(entity1) and self.has_iri(entity2):\n computed, compared = entity2, entity1\n\n # for every inferred child of `computed`, check if it is subsumed by `compared``\n for descendant_iri in self.get_inferred_sub_entities(computed, direct=False):\n # print(\"check a subsumption\")\n if self.check_subsumption(self.onto.get_owl_object(descendant_iri), compared):\n return True\n return False\n
"},{"location":"deeponto/onto/reasoning/#deeponto.onto.ontology.OntologyReasoner.instances_of","title":"instances_of(owl_class, direct=False)
","text":"Return the list of named individuals that are instances of a given OWL class expression.
Parameters:
Name Type Description Defaultowl_class
OWLClassExpression
An ontology class of interest.
requireddirect
bool
Return direct instances (direct=True
) or also include the sub-classes' instances (direct=False
). Defaults to False
.
False
Returns:
Type Descriptionlist[OWLIndividual]
A list of named individuals that are instances of owl_class
.
src/deeponto/onto/ontology.py
def instances_of(self, owl_class: OWLClassExpression, direct: bool = False):\n\"\"\"Return the list of named individuals that are instances of a given OWL class expression.\n\n Args:\n owl_class (OWLClassExpression): An ontology class of interest.\n direct (bool, optional): Return direct instances (`direct=True`) or\n also include the sub-classes' instances (`direct=False`). Defaults to `False`.\n\n Returns:\n (list[OWLIndividual]): A list of named individuals that are instances of `owl_class`.\n \"\"\"\n return list(self.owl_reasoner.getInstances(owl_class, direct).getFlattened())\n
"},{"location":"deeponto/onto/reasoning/#deeponto.onto.ontology.OntologyReasoner.check_instance","title":"check_instance(owl_instance, owl_class)
","text":"Check if a named individual is an instance of an OWL class.
Source code insrc/deeponto/onto/ontology.py
def check_instance(self, owl_instance: OWLIndividual, owl_class: OWLClassExpression):\n\"\"\"Check if a named individual is an instance of an OWL class.\"\"\"\n assertion_axiom = self.owl_data_factory.getOWLClassAssertionAxiom(owl_class, owl_instance)\n return self.owl_reasoner.isEntailed(assertion_axiom)\n
"},{"location":"deeponto/onto/reasoning/#deeponto.onto.ontology.OntologyReasoner.check_common_instances","title":"check_common_instances(owl_class1, owl_class2)
","text":"Check if two OWL class expressions have a common instance.
Class expressions can be atomic or complex, and it takes longer computation time for the complex ones. Complex classes do not have an IRI. This method is optimised in the way that if there exists an atomic class A
, we compute instances for A
and compare them against the other class which could be complex.
Difference with check_common_descendants
The inputs of this function are restricted to OWL class expressions. This is because descendant
is related to hierarchy and both class and property expressions have a hierarchy, but instance
is restricted to classes.
src/deeponto/onto/ontology.py
def check_common_instances(self, owl_class1: OWLClassExpression, owl_class2: OWLClassExpression):\n\"\"\"Check if two OWL class expressions have a common instance.\n\n Class expressions can be **atomic or complex**, and it takes longer computation time\n for the complex ones. Complex classes do not have an IRI. This method is optimised\n in the way that if there exists an atomic class `A`, we compute instances for `A` and\n compare them against the other class which could be complex.\n\n !!! note \"Difference with [`check_common_descendants`][deeponto.onto.OntologyReasoner.check_common_descendants]\"\n The inputs of this function are restricted to **OWL class expressions**. This is because\n `descendant` is related to hierarchy and both class and property expressions have a hierarchy,\n but `instance` is restricted to classes.\n \"\"\"\n\n if not self.has_iri(owl_class1) and not self.has_iri(owl_class2):\n logger.warn(\"Computing instances for two complex classes is not efficient.\")\n\n # `computed` is the one we compute the instances\n # `compared` is the one we compare `computed`'s instances one-by-one\n # we set the atomic entity as `computed` for efficiency if there is one\n computed, compared = owl_class1, owl_class2\n if not self.has_iri(owl_class1) and self.has_iri(owl_class2):\n # swap so that the atomic class is `computed` and the complex one is `compared`\n computed, compared = owl_class2, owl_class1\n\n # for every inferred instance of `computed`, check if it is an instance of `compared`\n for instance in self.instances_of(computed, direct=False):\n if self.check_instance(instance, compared):\n return True\n return False\n
"},{"location":"deeponto/onto/reasoning/#deeponto.onto.ontology.OntologyReasoner.check_assumed_disjoint","title":"check_assumed_disjoint(owl_class1, owl_class2)
","text":"Check if two OWL class expressions satisfy the Assumed Disjointness.
Paper
The definition of Assumed Disjointness comes from the paper: Language Model Analysis for Ontology Subsumption Inference.
Assumed Disjointness (Definition)
Two class expressions \\(C\\) and \\(D\\) are assumed to be disjoint if they meet the followings:
Note that the special case where \\(C\\) and \\(D\\) are already disjoint is covered by the first check. The paper also proposed a practical alternative to decide Assumed Disjointness. See check_assumed_disjoint_alternative
.
Examples:
Suppose we pre-load an ontology onto
from the disease ontology file doid.owl
.
>>> c1 = onto.get_owl_object_from_iri(\"http://purl.obolibrary.org/obo/DOID_4058\")\n>>> c2 = onto.get_owl_object_from_iri(\"http://purl.obolibrary.org/obo/DOID_0001816\")\n>>> onto.reasoner.check_assumed_disjoint(c1, c2)\n[SUCCESSFULLY] Adding the axiom DisjointClasses(<http://purl.obolibrary.org/obo/DOID_0001816> <http://purl.obolibrary.org/obo/DOID_4058>) into the ontology.\n[CHECK1 True] input classes are still satisfiable;\n[SUCCESSFULLY] Removing the axiom from the ontology.\n[CHECK2 False] input classes have NO common descendant.\n[PASSED False] assumed disjointness check done.\nFalse\n
Source code in src/deeponto/onto/ontology.py
def check_assumed_disjoint(self, owl_class1: OWLClassExpression, owl_class2: OWLClassExpression):\nr\"\"\"Check if two OWL class expressions satisfy the Assumed Disjointness.\n\n !!! credit \"Paper\"\n\n The definition of **Assumed Disjointness** comes from the paper:\n [Language Model Analysis for Ontology Subsumption Inference](https://aclanthology.org/2023.findings-acl.213).\n\n !!! note \"Assumed Disjointness (Definition)\"\n Two class expressions $C$ and $D$ are assumed to be disjoint if they meet the followings:\n\n 1. By adding the disjointness axiom of them into the ontology, $C$ and $D$ are **still satisfiable**.\n 2. $C$ and $D$ **do not have a common descendant** (otherwise $C$ and $D$ can be satisfiable but their\n common descendants become the bottom $\\bot$.)\n\n Note that the special case where $C$ and $D$ are already disjoint is covered by the first check.\n The paper also proposed a practical alternative to decide Assumed Disjointness.\n See [`check_assumed_disjoint_alternative`][deeponto.onto.OntologyReasoner.check_assumed_disjoint_alternative].\n\n Examples:\n Suppose pre-load an ontology `onto` from the disease ontology file `doid.owl`.\n\n ```python\n >>> c1 = onto.get_owl_object_from_iri(\"http://purl.obolibrary.org/obo/DOID_4058\")\n >>> c2 = onto.get_owl_object_from_iri(\"http://purl.obolibrary.org/obo/DOID_0001816\")\n >>> onto.reasoner.check_assumed_disjoint(c1, c2)\n [SUCCESSFULLY] Adding the axiom DisjointClasses(<http://purl.obolibrary.org/obo/DOID_0001816> <http://purl.obolibrary.org/obo/DOID_4058>) into the ontology.\n [CHECK1 True] input classes are still satisfiable;\n [SUCCESSFULLY] Removing the axiom from the ontology.\n [CHECK2 False] input classes have NO common descendant.\n [PASSED False] assumed disjointness check done.\n False\n ```\n \"\"\"\n # banner_message(\"Check Asssumed Disjointness\")\n\n entity_type = self.get_entity_type(owl_class1)\n assert entity_type == self.get_entity_type(owl_class2)\n\n # adding the disjointness axiom of `class1`` and `class2``\n disjoint_axiom = getattr(self.owl_data_factory, f\"getOWLDisjoint{entity_type}Axiom\")([owl_class1, owl_class2])\n undo_change = self.onto.add_axiom(disjoint_axiom, return_undo=True)\n self.load_reasoner(self.reasoner_type)\n\n # check if they are still satisfiable\n still_satisfiable = self.owl_reasoner.isSatisfiable(owl_class1)\n still_satisfiable = still_satisfiable and self.owl_reasoner.isSatisfiable(owl_class2)\n logger.info(f\"[CHECK1 {still_satisfiable}] input classes are still satisfiable;\")\n\n # remove the axiom and re-construct the reasoner\n undo_change_result = self.onto.owl_onto.applyChange(undo_change)\n logger.info(f\"[{str(undo_change_result)}] Removing the axiom from the ontology.\")\n self.load_reasoner(self.reasoner_type)\n\n # failing first check, there is no need to do the second.\n if not still_satisfiable:\n logger.info(\"Failed `satisfiability check`, skip the `common descendant` check.\")\n logger.info(f\"[PASSED {still_satisfiable}] assumed disjointness check done.\")\n return False\n\n # otherwise, the classes are still satisfiable and we should conduct the second check\n has_common_descendants = self.check_common_descendants(owl_class1, owl_class2)\n logger.info(f\"[CHECK2 {not has_common_descendants}] input classes have NO common descendant.\")\n logger.info(f\"[PASSED {not has_common_descendants}] assumed disjointness check done.\")\n return not has_common_descendants\n
"},{"location":"deeponto/onto/reasoning/#deeponto.onto.ontology.OntologyReasoner.check_assumed_disjoint_alternative","title":"check_assumed_disjoint_alternative(owl_class1, owl_class2, verbose=False)
","text":"Check if two OWL class expressions satisfy the Assumed Disjointness.
Paper
The definition of Assumed Disjointness comes from the paper: Language Model Analysis for Ontology Subsumption Inference.
This is the practical alternative version of check_assumed_disjoint
, which relies on the following conditions:
Assumed Disjointness (Practical Alternative)
Two class expressions \\(C\\) and \\(D\\) are assumed to be disjoint if they
If all the conditions have been met, then we assume class1
and class2
to be disjoint.
Examples:
Suppose an ontology onto
has been pre-loaded from the disease ontology file doid.owl
.
>>> c1 = onto.get_owl_object_from_iri(\"http://purl.obolibrary.org/obo/DOID_4058\")\n>>> c2 = onto.get_owl_object_from_iri(\"http://purl.obolibrary.org/obo/DOID_0001816\")\n>>> onto.reasoner.check_assumed_disjoint_alternative(c1, c2, verbose=True)\n[CHECK1 True] input classes have NO subsumption relationship;\n[CHECK2 False] input classes have NO common descendant;\nFailed the `common descendant check`, skip the `common instance` check.\n[PASSED False] assumed disjointness check done.\nFalse\n
In this alternative implementation, we do not need to add and remove axioms, which saves time. Source code in src/deeponto/onto/ontology.py
def check_assumed_disjoint_alternative(\n self, owl_class1: OWLClassExpression, owl_class2: OWLClassExpression, verbose: bool = False\n):\nr\"\"\"Check if two OWL class expressions satisfy the Assumed Disjointness.\n\n !!! credit \"Paper\"\n\n The definition of **Assumed Disjointness** comes from the paper:\n [Language Model Analysis for Ontology Subsumption Inference](https://aclanthology.org/2023.findings-acl.213).\n\n The practical alternative version of [`check_assumed_disjoint`][deeponto.onto.OntologyReasoner.check_assumed_disjoint]\n with following conditions:\n\n\n !!! note \"Assumed Disjointness (Practical Alternative)\"\n Two class expressions $C$ and $D$ are assumed to be disjoint if they\n\n 1. **do not** have a **subsumption relationship** between them,\n 2. **do not** have a **common descendant** (in TBox),\n 3. **do not** have a **common instance** (in ABox).\n\n If all the conditions have been met, then we assume `class1` and `class2` as disjoint.\n\n Examples:\n Suppose pre-load an ontology `onto` from the disease ontology file `doid.owl`.\n\n ```python\n >>> c1 = onto.get_owl_object_from_iri(\"http://purl.obolibrary.org/obo/DOID_4058\")\n >>> c2 = onto.get_owl_object_from_iri(\"http://purl.obolibrary.org/obo/DOID_0001816\")\n >>> onto.reasoner.check_assumed_disjoint(c1, c2, verbose=True)\n [CHECK1 True] input classes have NO subsumption relationship;\n [CHECK2 False] input classes have NO common descendant;\n Failed the `common descendant check`, skip the `common instance` check.\n [PASSED False] assumed disjointness check done.\n False\n ```\n In this alternative implementation, we do no need to add and remove axioms which will then\n be time-saving.\n \"\"\"\n # banner_message(\"Check Asssumed Disjointness (Alternative)\")\n\n # # Check for entailed disjointness (short-cut)\n # if self.check_disjoint(owl_class1, owl_class2):\n # print(f\"Input classes are already entailed as disjoint.\")\n # return True\n\n # Check for entailed subsumption,\n # common descendants and common instances\n\n has_subsumption = self.check_subsumption(owl_class1, owl_class2)\n has_subsumption = has_subsumption or self.check_subsumption(owl_class2, owl_class1)\n if verbose:\n logger.info(f\"[CHECK1 {not has_subsumption}] input classes have NO subsumption relationship;\")\n if has_subsumption:\n if verbose:\n logger.info(\"Failed the `subsumption check`, skip the `common descendant` check.\")\n logger.info(f\"[PASSED {not has_subsumption}] assumed disjointness check done.\")\n return False\n\n has_common_descendants = self.check_common_descendants(owl_class1, owl_class2)\n if verbose:\n logger.info(f\"[CHECK2 {not has_common_descendants}] input classes have NO common descendant;\")\n if has_common_descendants:\n if verbose:\n logger.info(\"Failed the `common descendant check`, skip the `common instance` check.\")\n logger.info(f\"[PASSED {not has_common_descendants}] assumed disjointness check done.\")\n return False\n\n # TODO: `check_common_instances` is still experimental because we have not tested it with ontologies of rich ABox.\n has_common_instances = self.check_common_instances(owl_class1, owl_class2)\n if verbose:\n logger.info(f\"[CHECK3 {not has_common_instances}] input classes have NO common instance;\")\n logger.info(f\"[PASSED {not has_common_instances}] assumed disjointness check done.\")\n return not has_common_instances\n
"},{"location":"deeponto/onto/taxonomy/","title":"Ontology Taxonomy","text":"Extracting the taxonomy from an ontology often comes in handy for graph-based machine learning techniques. Here we provide a basic Taxonomy
class built upon networkx.DiGraph
where nodes represent entities and edges represent subsumptions. We then provide the OntologyTaxonomy
class that extends the basic Taxonomy
. It utilises the simple structural reasoner to enrich the ontology subsumptions beyond asserted ones, and builds the taxonomy over the expanded subsumptions. Each node represents a named class and has a label (rdfs:label
) attribute. The root node owl:Thing
is also specified for functions like counting the node depths, etc. Moreover, we provide the WordnetTaxonomy
class that wraps the WordNet knowledge graph for easier access.
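For a quick, illustrative sketch of how these classes are meant to be used together (the OWL file path and class IRI are placeholders, constructing Ontology directly from a file path is an assumption, and the import paths follow the module locations documented on this page):
from deeponto.onto import Ontology\nfrom deeponto.onto.taxonomy import OntologyTaxonomy\n\n# load an ontology from a local OWL file (the path here is illustrative)\nonto = Ontology(\"doid.owl\")\n# build the class taxonomy; subsumptions are enriched by the simple structural reasoner\ntaxonomy = OntologyTaxonomy(onto, reasoner_type=\"struct\")\n\nclass_iri = \"http://purl.obolibrary.org/obo/DOID_4058\"    # illustrative class IRI\ntaxonomy.get_parents(class_iri)                            # direct named super-classes\ntaxonomy.get_parents(class_iri, apply_transitivity=True)   # all named ancestors\ntaxonomy.get_shortest_node_depth(class_iri)                # depth counted from owl:Thing\n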
Note
It is also possible to use OntologyProjector
to extract triples from the ontology as edges of the taxonomy. We will consider this feature in the future.
Taxonomy(edges, root_node=None)
","text":"Class for building the taxonomy over structured data.
Attributes:
Name Type Descriptionnodes
list
A list of entity ids.
edges
list
A list of (parent, child)
pairs.
graph
networkx.DiGraph
A directed graph that represents the taxonomy.
root_node
Optional[str]
Optional root node id. Defaults to None
.
src/deeponto/onto/taxonomy.py
def __init__(self, edges: list, root_node: Optional[str] = None):\n self.edges = edges\n self.graph = nx.DiGraph(self.edges)\n self.nodes = list(self.graph.nodes)\n self.root_node = root_node\n
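A minimal sketch with made-up entity ids, showing how the (parent, child) edge list is interpreted:
from deeponto.onto.taxonomy import Taxonomy\n\n# (parent, child) pairs; the entity ids are made up for illustration\nedges = [(\"animal\", \"mammal\"), (\"mammal\", \"dog\"), (\"mammal\", \"cat\")]\ntaxonomy = Taxonomy(edges=edges, root_node=\"animal\")\n\ntaxonomy.get_children(\"mammal\")                            # {'dog', 'cat'}\ntaxonomy.get_children(\"animal\", apply_transitivity=True)   # {'mammal', 'dog', 'cat'}\ntaxonomy.get_parents(\"dog\")                                # {'mammal'}\ntaxonomy.get_shortest_node_depth(\"dog\")                    # 2\n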
"},{"location":"deeponto/onto/taxonomy/#deeponto.onto.taxonomy.Taxonomy.get_node_attributes","title":"get_node_attributes(entity_id)
","text":"Get the attributes of the given entity.
Source code insrc/deeponto/onto/taxonomy.py
def get_node_attributes(self, entity_id: str):\n\"\"\"Get the attributes of the given entity.\"\"\"\n return self.graph.nodes[entity_id]\n
"},{"location":"deeponto/onto/taxonomy/#deeponto.onto.taxonomy.Taxonomy.get_children","title":"get_children(entity_id, apply_transitivity=False)
","text":"Get the set of children for a given entity.
Source code insrc/deeponto/onto/taxonomy.py
def get_children(self, entity_id: str, apply_transitivity: bool = False):\nr\"\"\"Get the set of children for a given entity.\"\"\"\n if not apply_transitivity:\n return set(self.graph.successors(entity_id))\n else:\n return set(itertools.chain.from_iterable(nx.dfs_successors(self.graph, entity_id).values()))\n
"},{"location":"deeponto/onto/taxonomy/#deeponto.onto.taxonomy.Taxonomy.get_parents","title":"get_parents(entity_id, apply_transitivity=False)
","text":"Get the set of parents for a given entity.
Source code insrc/deeponto/onto/taxonomy.py
def get_parents(self, entity_id: str, apply_transitivity: bool = False):\nr\"\"\"Get the set of parents for a given entity.\"\"\"\n if not apply_transitivity:\n return set(self.graph.predecessors(entity_id))\n else:\n # NOTE: the nx.dfs_predecessors does not give desirable results\n frontier = list(self.get_parents(entity_id))\n explored = set()\n descendants = frontier\n while frontier:\n for candidate in frontier:\n descendants += list(self.get_parents(candidate))\n explored.update(frontier)\n frontier = set(descendants) - explored\n return set(descendants)\n
"},{"location":"deeponto/onto/taxonomy/#deeponto.onto.taxonomy.Taxonomy.get_descendant_graph","title":"get_descendant_graph(entity_id)
","text":"Create a descendant graph (networkx.DiGraph
) for a given entity.
src/deeponto/onto/taxonomy.py
def get_descendant_graph(self, entity_id: str):\nr\"\"\"Create a descendant graph (`networkx.DiGraph`) for a given entity.\"\"\"\n descendants = self.get_children(entity_id, apply_transitivity=True)\n return self.graph.subgraph(list(descendants))\n
"},{"location":"deeponto/onto/taxonomy/#deeponto.onto.taxonomy.Taxonomy.get_shortest_node_depth","title":"get_shortest_node_depth(entity_id)
","text":"Get the shortest depth of the given entity in the taxonomy.
Source code insrc/deeponto/onto/taxonomy.py
def get_shortest_node_depth(self, entity_id: str):\n\"\"\"Get the shortest depth of the given entity in the taxonomy.\"\"\"\n if not self.root_node:\n raise RuntimeError(\"No root node specified.\")\n return nx.shortest_path_length(self.graph, self.root_node, entity_id)\n
"},{"location":"deeponto/onto/taxonomy/#deeponto.onto.taxonomy.Taxonomy.get_longest_node_depth","title":"get_longest_node_depth(entity_id)
","text":"Get the longest depth of the given entity in the taxonomy.
Source code insrc/deeponto/onto/taxonomy.py
def get_longest_node_depth(self, entity_id: str):\n\"\"\"Get the longest depth of the given entity in the taxonomy.\"\"\"\n if not self.root_node:\n raise RuntimeError(\"No root node specified.\")\n return max([len(p) for p in nx.all_simple_paths(self.graph, self.root_node, entity_id)])\n
"},{"location":"deeponto/onto/taxonomy/#deeponto.onto.taxonomy.Taxonomy.get_lowest_common_ancestor","title":"get_lowest_common_ancestor(entity_id1, entity_id2)
","text":"Get the lowest common ancestor of the given two entities.
Source code insrc/deeponto/onto/taxonomy.py
def get_lowest_common_ancestor(self, entity_id1: str, entity_id2: str):\n\"\"\"Get the lowest common ancestor of the given two entities.\"\"\"\n return nx.lowest_common_ancestor(self.graph, entity_id1, entity_id2)\n
"},{"location":"deeponto/onto/taxonomy/#deeponto.onto.taxonomy.OntologyTaxonomy","title":"OntologyTaxonomy(onto, reasoner_type='struct')
","text":" Bases: Taxonomy
Class for building the taxonomy (top-down subsumption graph) from an ontology.
The nodes of this graph are named classes only, but the hierarchy is enriched (beyond asserted axioms) by an ontology reasoner.
Attributes:
Name Type Descriptiononto
Ontology
The input ontology to build the taxonomy.
reasoner_type
str
The type of reasoner used. Defaults to \"struct\"
. Options are [\"hermit\", \"elk\", \"struct\"]
.
reasoner
OntologyReasoner
An ontology reasoner used for completing the hierarchy. If the reasoner_type
is the same as onto.reasoner_type
, then re-use onto.reasoner
; otherwise, create a new one.
root_node
str
The root node that represents owl:Thing
.
nodes
list
A list of named class IRIs.
edges
list
A list of (parent, child)
class pairs. That is, if \\(C \\sqsubseteq D\\), then \\((D, C)\\) will be added as an edge.
graph
networkx.DiGraph
A directed subsumption graph.
Source code insrc/deeponto/onto/taxonomy.py
def __init__(self, onto: Ontology, reasoner_type: str = \"struct\"):\n self.onto = onto\n # the reasoner is used for completing the hierarchy\n self.reasoner_type = reasoner_type\n # re-use onto.reasoner if the reasoner type is the same; otherwise create a new one\n self.reasoner = (\n self.onto.reasoner\n if reasoner_type == self.onto.reasoner_type\n else OntologyReasoner(self.onto, reasoner_type)\n )\n root_node = \"owl:Thing\"\n subsumption_pairs = []\n for cl_iri, cl in self.onto.owl_classes.items():\n # NOTE: this is different from using self.onto.get_asserted_parents which does not conduct simple reasoning\n named_parents = self.reasoner.get_inferred_super_entities(cl, direct=True)\n if not named_parents:\n # if no parents then add root node as the parent\n named_parents.append(root_node)\n for named_parent in named_parents:\n subsumption_pairs.append((named_parent, cl_iri))\n super().__init__(edges=subsumption_pairs, root_node=root_node)\n\n # set node annotations (rdfs:label)\n for class_iri in self.nodes:\n if class_iri == self.root_node:\n self.graph.nodes[class_iri][\"label\"] = \"Thing\"\n else:\n owl_class = self.onto.get_owl_object(class_iri)\n self.graph.nodes[class_iri][\"label\"] = self.onto.get_annotations(owl_class, RDFS_LABEL)\n
"},{"location":"deeponto/onto/taxonomy/#deeponto.onto.taxonomy.OntologyTaxonomy.get_parents","title":"get_parents(class_iri, apply_transitivity=False)
","text":"Get the set of parents for a given class.
It is worth noting that this method with transitivity applied can be deemed as simple structural reasoning. For more advanced logical reasoning, use the DL reasoner self.onto.reasoner
instead.
src/deeponto/onto/taxonomy.py
def get_parents(self, class_iri: str, apply_transitivity: bool = False):\nr\"\"\"Get the set of parents for a given class.\n\n It is worth noting that this method with transitivity applied can be deemed as simple structural reasoning.\n For more advanced logical reasoning, use the DL reasoner `self.onto.reasoner` instead.\n \"\"\"\n return super().get_parents(class_iri, apply_transitivity)\n
"},{"location":"deeponto/onto/taxonomy/#deeponto.onto.taxonomy.OntologyTaxonomy.get_children","title":"get_children(class_iri, apply_transitivity=False)
","text":"Get the set of children for a given class.
It is worth noting that this method with transitivity applied can be deemed as simple structural reasoning. For more advanced logical reasoning, use the DL reasoner self.onto.reasoner
instead.
src/deeponto/onto/taxonomy.py
def get_children(self, class_iri: str, apply_transitivity: bool = False):\nr\"\"\"Get the set of children for a given class.\n\n It is worth noting that this method with transitivity applied can be deemed as simple structural reasoning.\n For more advanced logical reasoning, use the DL reasoner `self.onto.reasoner` instead.\n \"\"\"\n return super().get_children(class_iri, apply_transitivity)\n
"},{"location":"deeponto/onto/taxonomy/#deeponto.onto.taxonomy.OntologyTaxonomy.get_descendant_graph","title":"get_descendant_graph(class_iri)
","text":"Create a descendant graph (networkx.DiGraph
) for a given ontology class.
src/deeponto/onto/taxonomy.py
def get_descendant_graph(self, class_iri: str):\nr\"\"\"Create a descendant graph (`networkx.DiGraph`) for a given ontology class.\"\"\"\n return super().get_descendant_graph(class_iri)\n
"},{"location":"deeponto/onto/taxonomy/#deeponto.onto.taxonomy.OntologyTaxonomy.get_shortest_node_depth","title":"get_shortest_node_depth(class_iri)
","text":"Get the shortest depth of the given named class in the taxonomy.
Source code insrc/deeponto/onto/taxonomy.py
def get_shortest_node_depth(self, class_iri: str):\n\"\"\"Get the shortest depth of the given named class in the taxonomy.\"\"\"\n return nx.shortest_path_length(self.graph, self.root_node, class_iri)\n
"},{"location":"deeponto/onto/taxonomy/#deeponto.onto.taxonomy.OntologyTaxonomy.get_longest_node_depth","title":"get_longest_node_depth(class_iri)
","text":"Get the longest depth of the given named class in the taxonomy.
Source code insrc/deeponto/onto/taxonomy.py
def get_longest_node_depth(self, class_iri: str):\n\"\"\"Get the longest depth of the given named class in the taxonomy.\"\"\"\n return max([len(p) for p in nx.all_simple_paths(self.graph, self.root_node, class_iri)])\n
"},{"location":"deeponto/onto/taxonomy/#deeponto.onto.taxonomy.OntologyTaxonomy.get_lowest_common_ancestor","title":"get_lowest_common_ancestor(class_iri1, class_iri2)
","text":"Get the lowest common ancestor of the given two named classes.
Source code insrc/deeponto/onto/taxonomy.py
def get_lowest_common_ancestor(self, class_iri1: str, class_iri2: str):\n\"\"\"Get the lowest common ancestor of the given two named classes.\"\"\"\n return super().get_lowest_common_ancestor(class_iri1, class_iri2)\n
"},{"location":"deeponto/onto/taxonomy/#deeponto.onto.taxonomy.WordnetTaxonomy","title":"WordnetTaxonomy(pos='n', include_membership=False)
","text":" Bases: Taxonomy
Class for building the taxonomy (hypernym graph) from WordNet.
Attributes:
Name Type Descriptionpos
str
The pos-tag of entities to be extracted from wordnet.
nodes
list
A list of entity ids extracted from wordnet.
edges
list
A list of hyponym-hypernym pairs.
graph
networkx.DiGraph
A directed hypernym graph.
Parameters:
Name Type Description Defaultpos
str
The pos-tag of entities to be extracted from wordnet.
'n'
include_membership
bool
Whether to include instance_hypernyms
or not (e.g., London is an instance of City). Defaults to False
.
False
Source code in src/deeponto/onto/taxonomy.py
def __init__(self, pos: str = \"n\", include_membership: bool = False):\nr\"\"\"Initialise the wordnet taxonomy.\n\n Args:\n pos (str): The pos-tag of entities to be extracted from wordnet.\n include_membership (bool): Whether to include `instance_hypernyms` or not (e.g., London is an instance of City). Defaults to `False`.\n \"\"\"\n\n self.pos = pos\n synsets = self.fetch_synsets(pos=pos)\n hypernym_pairs = self.fetch_hypernyms(synsets, include_membership)\n super().__init__(edges=hypernym_pairs)\n\n # set node annotations\n for synset in synsets:\n try:\n self.graph.nodes[synset.name()][\"name\"] = synset.name().split(\".\")[0].replace(\"_\", \" \")\n self.graph.nodes[synset.name()][\"definition\"] = synset.definition()\n except:\n continue\n
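A minimal usage sketch; it assumes the WordNet data for nltk has already been downloaded (e.g., via nltk.download(\"wordnet\")), and the synset id is illustrative:
from deeponto.onto.taxonomy import WordnetTaxonomy\n\n# build the noun hypernym graph from WordNet\nwt = WordnetTaxonomy(pos=\"n\", include_membership=False)\n\n# synset names such as 'dog.n.01' are used as node ids\nwt.get_parents(\"dog.n.01\")          # direct hypernyms\nwt.get_node_attributes(\"dog.n.01\")  # {'name': 'dog', 'definition': '...'}\n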
"},{"location":"deeponto/onto/taxonomy/#deeponto.onto.taxonomy.WordnetTaxonomy.fetch_synsets","title":"fetch_synsets(pos='n')
staticmethod
","text":"Get synsets of certain pos-tag from wordnet.
Source code insrc/deeponto/onto/taxonomy.py
@staticmethod\ndef fetch_synsets(pos: str = \"n\"):\n\"\"\"Get synsets of certain pos-tag from wordnet.\"\"\"\n words = wn.words()\n synsets = set()\n for word in words:\n synsets.update(wn.synsets(word, pos=pos))\n logger.info(f'{len(synsets)} synsets (pos=\"{pos}\") fetched.')\n return synsets\n
"},{"location":"deeponto/onto/taxonomy/#deeponto.onto.taxonomy.WordnetTaxonomy.fetch_hypernyms","title":"fetch_hypernyms(synsets, include_membership=False)
staticmethod
","text":"Get hypernym-hyponym pairs from a given set of wordnet synsets.
Source code insrc/deeponto/onto/taxonomy.py
@staticmethod\ndef fetch_hypernyms(synsets: set, include_membership: bool = False):\n\"\"\"Get hypernym-hyponym pairs from a given set of wordnet synsets.\"\"\"\n hypernym_hyponym_pairs = []\n for synset in synsets:\n for h_synset in synset.hypernyms():\n hypernym_hyponym_pairs.append((h_synset.name(), synset.name()))\n if include_membership:\n for h_synset in synset.instance_hypernyms():\n hypernym_hyponym_pairs.append((h_synset.name(), synset.name()))\n logger.info(f\"{len(hypernym_hyponym_pairs)} hypernym-hyponym pairs fetched.\")\n return hypernym_hyponym_pairs\n
"},{"location":"deeponto/onto/taxonomy/#deeponto.onto.taxonomy.TaxonomyNegativeSampler","title":"TaxonomyNegativeSampler(taxonomy, entity_weights=None)
","text":"Class for the efficient negative sampling with buffer over the taxonomy.
Attributes:
Name Type Descriptiontaxonomy
str
The taxonomy for negative sampling.
entity_weights
Optional[dict]
A dictionary with the taxonomy entities as keys and their corresponding weights as values. Defaults to None
.
src/deeponto/onto/taxonomy.py
def __init__(self, taxonomy: Taxonomy, entity_weights: Optional[dict] = None):\n self.taxonomy = taxonomy\n self.entities = self.taxonomy.nodes\n # uniform distribution if weights not provided\n self.entity_weights = entity_weights\n\n self._entity_probs = None\n if self.entity_weights:\n self._entity_probs = np.array([self.entity_weights[e] for e in self.entities])\n self._entity_probs = self._entity_probs / self._entity_probs.sum()\n self._buffer = []\n self._default_buffer_size = 10000\n
"},{"location":"deeponto/onto/taxonomy/#deeponto.onto.taxonomy.TaxonomyNegativeSampler.fill","title":"fill(buffer_size=None)
","text":"Buffer a large collection of entities sampled with replacement for faster negative sampling.
Source code insrc/deeponto/onto/taxonomy.py
def fill(self, buffer_size: Optional[int] = None):\n\"\"\"Buffer a large collection of entities sampled with replacement for faster negative sampling.\"\"\"\n buffer_size = buffer_size if buffer_size else self._default_buffer_size\n if self._entity_probs:\n self._buffer = np.random.choice(self.entities, size=buffer_size, p=self._entity_probs)\n else:\n self._buffer = np.random.choice(self.entities, size=buffer_size)\n
"},{"location":"deeponto/onto/taxonomy/#deeponto.onto.taxonomy.TaxonomyNegativeSampler.sample","title":"sample(entity_id, n_samples, buffer_size=None)
","text":"Sample N negative samples for a given entity with replacement.
Source code insrc/deeponto/onto/taxonomy.py
def sample(self, entity_id: str, n_samples: int, buffer_size: Optional[int] = None):\n\"\"\"Sample N negative samples for a given entity with replacement.\"\"\"\n negative_samples = []\n positive_samples = self.taxonomy.get_parents(entity_id, True)\n while len(negative_samples) < n_samples:\n if len(self._buffer) < n_samples:\n self.fill(buffer_size)\n negative_samples += list(filter(lambda x: x not in positive_samples, self._buffer[:n_samples]))\n self._buffer = self._buffer[n_samples:] # remove the samples from the buffer\n return negative_samples[:n_samples]\n
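A minimal sketch of drawing negatives for one entity; here taxonomy is any Taxonomy instance (e.g., the small one built earlier), and no entity weights are given, so sampling is uniform:
from deeponto.onto.taxonomy import TaxonomyNegativeSampler\n\n# weights are omitted, so entities are sampled uniformly\nsampler = TaxonomyNegativeSampler(taxonomy)\n# draw 10 entities that are not (transitive) parents of the given entity\nnegatives = sampler.sample(\"dog\", n_samples=10)\n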
"},{"location":"deeponto/onto/verbalisation/","title":"Ontology Verbalisation","text":"Verbalising an ontology into natural language texts is a challenging task. \\(\\textsf{DeepOnto}\\) provides some basic building blocks for achieving this goal. The implemented OntologyVerbaliser
is essentially a recursive concept verbaliser that first splits a complex concept \(C\) into a sub-formula tree, then verbalises the leaf nodes (atomic concepts or object properties) by their names, and finally merges the verbalised child nodes according to the logical pattern at their parent node.
Please cite the following paper if you consider using our verbaliser.
Paper
The recursive concept verbaliser is proposed in the paper: Language Model Analysis for Ontology Subsumption Inference (Findings of ACL 2023).
@inproceedings{he-etal-2023-language,\n title = \"Language Model Analysis for Ontology Subsumption Inference\",\n author = \"He, Yuan and\n Chen, Jiaoyan and\n Jimenez-Ruiz, Ernesto and\n Dong, Hang and\n Horrocks, Ian\",\n booktitle = \"Findings of the Association for Computational Linguistics: ACL 2023\",\n month = jul,\n year = \"2023\",\n address = \"Toronto, Canada\",\n publisher = \"Association for Computational Linguistics\",\n url = \"https://aclanthology.org/2023.findings-acl.213\",\n doi = \"10.18653/v1/2023.findings-acl.213\",\n pages = \"3439--3453\"\n}\n
"},{"location":"deeponto/onto/verbalisation/#deeponto.onto.verbalisation.OntologyVerbaliser","title":"OntologyVerbaliser(onto, apply_lowercasing=False, keep_iri=False, apply_auto_correction=False, add_quantifier_word=False)
","text":"A recursive natural language verbaliser for the OWL logical expressions, e.g., OWLAxiom
and OWLClassExpression
.
The concept patterns supported by this verbaliser are shown below:
Pattern Verbalisation (\\(\\mathcal{V}\\)) \\(A\\) (atomic) the name (\\(\\texttt{rdfs:label}\\)) of \\(A\\) (auto-correction is optional) \\(r\\) (property) the name (\\(\\texttt{rdfs:label}\\)) of \\(r\\) (auto-correction is optional) \\(\\neg C\\) \"not \\(\\mathcal{V}(C)\\)\" \\(\\exists r.C\\) \"something that \\(\\mathcal{V}(r)\\) some \\(\\mathcal{V}(C)\\)\" (the quantifier word \"some\" is optional) \\(\\forall r.C\\) \"something that \\(\\mathcal{V}(r)\\) only \\(\\mathcal{V}(C)\\)\" (the quantifier word \"only\" is optional) \\(C_1 \\sqcap ... \\sqcap C_n\\) if \\(C_i = \\exists/\\forall r.D_i\\) and \\(C_j = \\exists/\\forall r.D_j\\), they will be re-written into \\(\\exists/\\forall r.(D_i \\sqcap D_j)\\) before verbalisation; suppose after re-writing the new expression is \\(C_1 \\sqcap ... \\sqcap C_{n'}\\)(a) if all \\(C_i\\)s (for \\(i = 1, ..., n'\\)) are restrictions, in the form of \\(\\exists/\\forall r_i.D_i\\): \"something that \\(\\mathcal{V}(r_1)\\) some/only \\(V(D_1)\\) and ... and \\(\\mathcal{V}(r_{n'})\\) some/only \\(V(D_{n'})\\)\" (b) if some \\(C_i\\)s (for \\(i = m+1, ..., n'\\)) are restrictions, in the form of \\(\\exists/\\forall r_i.D_i\\): \"\\(\\mathcal{V}(C_{1})\\) and ... and \\(\\mathcal{V}(C_{m})\\) that \\(\\mathcal{V}(r_{m+1})\\) some/only \\(V(D_{m+1})\\) and ... and \\(\\mathcal{V}(r_{n'})\\) some/only \\(V(D_{n'})\\)\" (c) if no \\(C_i\\) is a restriction: \"\\(\\mathcal{V}(C_{1})\\) and ... and \\(\\mathcal{V}(C_{n'})\\)\" \\(C_1 \\sqcup ... \\sqcup C_n\\) similar to verbalising \\(C_1 \\sqcap ... \\sqcap C_n\\) except that \"and\" is replaced by \"or\" and case (b) uses the same verbalisation as case (c) \\(r_1 \\cdot r_2\\) (property chain) \\(\\mathcal{V}(r_1)\\) something that \\(\\mathcal{V}(r_2)\\)
With this concept verbaliser, a range of OWL axiom types is supported; see the verbalise_*_axiom methods documented below.
The verbaliser operates at the concept level, and an additional template is needed to integrate the verbalised components of an axiom.
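As an illustrative sketch (not the definitive recipe), one can verbalise both sides of a class subsumption axiom and glue them together with a hand-written template. The ontology file path is a placeholder, and using get_subsumption_axioms to pick an axiom is an assumption here; any OWLAxiom of the right type works.
from deeponto.onto import Ontology\nfrom deeponto.onto.verbalisation import OntologyVerbaliser\n\nonto = Ontology(\"foodon.owl\")  # illustrative file path\nverbaliser = OntologyVerbaliser(onto)\n\n# pick some SubClassOf axiom from the ontology (axiom selection here is an assumption)\naxiom = onto.get_subsumption_axioms(entity_type=\"Classes\")[0]\nv_sub, v_super = verbaliser.verbalise_class_subsumption_axiom(axiom)\n\n# an additional template integrates the verbalised components into a sentence\nsentence = f\"{v_sub.verbal} is a kind of {v_super.verbal}\"\n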
Warning
This verbaliser utilises spacy for POS tagging used in the auto-correction of property names. Automatic download of the rule-based library en_core_web_sm
is attempted in the init function. However, if the automatic download fails, please download it manually using python -m spacy download en_core_web_sm
.
Attributes:
Name Type Descriptiononto
Ontology
An ontology whose entities and axioms are to be verbalised.
parser
OntologySyntaxParser
A syntax parser for the string representation of an OWLObject
.
vocab
dict[str, list[str]]
A dictionary with (entity_iri, entity_name)
pairs, by default the names are retrieved from \\(\\texttt{rdfs:label}\\).
apply_lowercasing
bool
Whether to apply lowercasing to the entity names. Defaults to False
.
keep_iri
bool
Whether to keep the IRIs of entities without verbalising them using self.vocab
. Defaults to False
.
apply_auto_correction
bool
Whether to automatically apply rule-based auto-correction to entity names. Defaults to False
.
add_quantifier_word
bool
Whether to add quantifier words (\"some\"/\"only\") as in the Manchester syntax. Defaults to False
.
Parameters:
Name Type Description Defaultonto
Ontology
An ontology whose entities and axioms are to be verbalised.
requiredapply_lowercasing
bool
Whether to apply lowercasing to the entity names. Defaults to False
.
False
keep_iri
bool
Whether to keep the IRIs of entities without verbalising them using self.vocab
. Defaults to False
.
False
apply_auto_correction
bool
Whether to automatically apply rule-based auto-correction to entity names. Defaults to False
.
False
add_quantifier_word
bool
Whether to add quantifier words (\"some\"/\"only\") as in the Manchester syntax. Defaults to False
.
False
Source code in src/deeponto/onto/verbalisation.py
def __init__(\n self,\n onto: Ontology,\n apply_lowercasing: bool = False,\n keep_iri: bool = False,\n apply_auto_correction: bool = False,\n add_quantifier_word: bool = False,\n):\n\"\"\"Initialise an ontology verbaliser.\n\n Args:\n onto (Ontology): An ontology whose entities and axioms are to be verbalised.\n apply_lowercasing (bool, optional): Whether to apply lowercasing to the entity names. Defaults to `False`.\n keep_iri (bool, optional): Whether to keep the IRIs of entities without verbalising them using `self.vocab`. Defaults to `False`.\n apply_auto_correction (bool, optional): Whether to automatically apply rule-based auto-correction to entity names. Defaults to `False`.\n add_quantifier_word (bool, optional): Whether to add quantifier words (\"some\"/\"only\") as in the Manchester syntax. Defaults to `False`.\n \"\"\"\n self.onto = onto\n self.parser = OntologySyntaxParser()\n\n # download en_core_web_sm for object property\n try:\n spacy.load(\"en_core_web_sm\")\n except:\n print(\"Download `en_core_web_sm` for pos tagger.\")\n os.system(\"python -m spacy download en_core_web_sm\")\n\n self.nlp = spacy.load(\"en_core_web_sm\")\n\n # build the default vocabulary for entities\n self.apply_lowercasing_to_vocab = apply_lowercasing\n self.vocab = dict()\n for entity_type in [\"Classes\", \"ObjectProperties\", \"DataProperties\", \"Individuals\"]:\n entity_annotations, _ = self.onto.build_annotation_index(\n entity_type=entity_type, apply_lowercasing=self.apply_lowercasing_to_vocab\n )\n self.vocab.update(**entity_annotations)\n literal_or_iri = lambda k, v: list(v)[0] if v else k # set vocab to IRI if no string available\n self.vocab = {k: literal_or_iri(k, v) for k, v in self.vocab.items()} # only set one name for each entity\n\n self.keep_iri = keep_iri\n self.apply_auto_correction = apply_auto_correction\n self.add_quantifier_word = add_quantifier_word\n
"},{"location":"deeponto/onto/verbalisation/#deeponto.onto.verbalisation.OntologyVerbaliser.update_entity_name","title":"update_entity_name(entity_iri, entity_name)
","text":"Update the name of an entity in self.vocab
.
If you want to change the name of a specific entity, you should call this function before applying verbalisation.
Source code insrc/deeponto/onto/verbalisation.py
def update_entity_name(self, entity_iri: str, entity_name: str):\n\"\"\"Update the name of an entity in `self.vocab`.\n\n If you want to change the name of a specific entity, you should call this\n function before applying verbalisation.\n \"\"\"\n self.vocab[entity_iri] = entity_name\n
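For example (the entity IRI and the replacement name below are illustrative):
# override the default rdfs:label-based name for one entity before verbalising\nverbaliser.update_entity_name(\"http://purl.obolibrary.org/obo/FOODON_00001707\", \"cephalopod food product\")\n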
"},{"location":"deeponto/onto/verbalisation/#deeponto.onto.verbalisation.OntologyVerbaliser.verbalise_class_expression","title":"verbalise_class_expression(class_expression)
","text":"Verbalise a class expression (OWLClassExpression
) or its parsed form (in RangeNode
).
See currently supported types of class (or concept) expressions here.
Parameters:
Name Type Description Defaultclass_expression
Union[OWLClassExpression, str, RangeNode]
A class expression to be verbalised.
requiredRaises:
Type DescriptionRuntimeError
Occurs when the class expression is not in one of the supported types.
Returns:
Type DescriptionCfgNode
A nested dictionary that presents the recursive results of verbalisation. The verbalised string can be accessed with the key [\"verbal\"]
or with the attribute .verbal
.
src/deeponto/onto/verbalisation.py
def verbalise_class_expression(self, class_expression: Union[OWLClassExpression, str, RangeNode]):\nr\"\"\"Verbalise a class expression (`OWLClassExpression`) or its parsed form (in `RangeNode`).\n\n See currently supported types of class (or concept) expressions [here][deeponto.onto.verbalisation.OntologyVerbaliser].\n\n\n Args:\n class_expression (Union[OWLClassExpression, str, RangeNode]): A class expression to be verbalised.\n\n Raises:\n RuntimeError: Occurs when the class expression is not in one of the supported types.\n\n Returns:\n (CfgNode): A nested dictionary that presents the recursive results of verbalisation. The verbalised string\n can be accessed with the key `[\"verbal\"]` or with the attribute `.verbal`.\n \"\"\"\n\n if not isinstance(class_expression, RangeNode):\n parsed_class_expression = self.parser.parse(class_expression).children[0] # skip the root node\n else:\n parsed_class_expression = class_expression\n\n # for a singleton IRI\n if parsed_class_expression.is_iri:\n return self._verbalise_iri(parsed_class_expression)\n\n if parsed_class_expression.name.startswith(\"NEG\"):\n # negation only has one child\n cl = self.verbalise_class_expression(parsed_class_expression.children[0])\n return CfgNode({\"verbal\": \"not \" + cl.verbal, \"class\": cl, \"type\": \"NEG\"})\n\n # for existential and universal restrictions\n if parsed_class_expression.name.startswith(\"EX.\") or parsed_class_expression.name.startswith(\"ALL\"):\n return self._verbalise_restriction(parsed_class_expression)\n\n # for conjunction and disjunction\n if parsed_class_expression.name.startswith(\"AND\") or parsed_class_expression.name.startswith(\"OR\"):\n return self._verbalise_junction(parsed_class_expression)\n\n # for a property chain\n if parsed_class_expression.name.startswith(\"OPC\"):\n return self._verbalise_property(parsed_class_expression)\n\n raise RuntimeError(f\"Input class expression `{str(class_expression)}` is not in one of the supported types.\")\n
"},{"location":"deeponto/onto/verbalisation/#deeponto.onto.verbalisation.OntologyVerbaliser.verbalise_class_subsumption_axiom","title":"verbalise_class_subsumption_axiom(class_subsumption_axiom)
","text":"Verbalise a class subsumption axiom.
The subsumption axiom can have two forms:
SubClassOf
axiom;SuperClassOf
axiom.Parameters:
Name Type Description Defaultclass_subsumption_axiom
OWLAxiom
Then class subsumption axiom to be verbalised.
requiredReturns:
Type DescriptionTuple[CfgNode, CfgNode]
The verbalised sub-concept \\(\\mathcal{V}(C_{sub})\\) and super-concept \\(\\mathcal{V}(C_{super})\\) (order matters).
Source code insrc/deeponto/onto/verbalisation.py
def verbalise_class_subsumption_axiom(self, class_subsumption_axiom: OWLAxiom):\nr\"\"\"Verbalise a class subsumption axiom.\n\n The subsumption axiom can have two forms:\n\n - $C_{sub} \\sqsubseteq C_{super}$, the `SubClassOf` axiom;\n - $C_{super} \\sqsupseteq C_{sub}$, the `SuperClassOf` axiom.\n\n Args:\n class_subsumption_axiom (OWLAxiom): Then class subsumption axiom to be verbalised.\n\n Returns:\n (Tuple[CfgNode, CfgNode]): The verbalised sub-concept $\\mathcal{V}(C_{sub})$ and super-concept $\\mathcal{V}(C_{super})$ (order matters).\n \"\"\"\n\n # input check\n self._axiom_input_check(class_subsumption_axiom, \"SubClassOf\", \"SuperClassOf\")\n\n parsed_subsumption_axiom = self.parser.parse(class_subsumption_axiom).children[0] # skip the root node\n if str(class_subsumption_axiom).startswith(\"SubClassOf\"):\n parsed_sub_class, parsed_super_class = parsed_subsumption_axiom.children\n elif str(class_subsumption_axiom).startswith(\"SuperClassOf\"):\n parsed_super_class, parsed_sub_class = parsed_subsumption_axiom.children\n\n verbalised_sub_class = self.verbalise_class_expression(parsed_sub_class)\n verbalised_super_class = self.verbalise_class_expression(parsed_super_class)\n return verbalised_sub_class, verbalised_super_class\n
"},{"location":"deeponto/onto/verbalisation/#deeponto.onto.verbalisation.OntologyVerbaliser.verbalise_class_equivalence_axiom","title":"verbalise_class_equivalence_axiom(class_equivalence_axiom)
","text":"Verbalise a class equivalence axiom.
The equivalence axiom has the form \\(C \\equiv D\\).
Parameters:
Name Type Description Defaultclass_equivalence_axiom
OWLAxiom
The class equivalence axiom to be verbalised.
requiredReturns:
Type DescriptionTuple[CfgNode, CfgNode]
The verbalised concept \\(\\mathcal{V}(C)\\) and its equivalent concept \\(\\mathcal{V}(D)\\) (order matters).
Source code insrc/deeponto/onto/verbalisation.py
def verbalise_class_equivalence_axiom(self, class_equivalence_axiom: OWLAxiom):\nr\"\"\"Verbalise a class equivalence axiom.\n\n The equivalence axiom has the form $C \\equiv D$.\n\n Args:\n class_equivalence_axiom (OWLAxiom): The class equivalence axiom to be verbalised.\n\n Returns:\n (Tuple[CfgNode, CfgNode]): The verbalised concept $\\mathcal{V}(C)$ and its equivalent concept $\\mathcal{V}(D)$ (order matters).\n \"\"\"\n\n # input check\n self._axiom_input_check(class_equivalence_axiom, \"EquivalentClasses\")\n\n parsed_equivalence_axiom = self.parser.parse(class_equivalence_axiom).children[0] # skip the root node\n parsed_class_left, parsed_class_right = parsed_equivalence_axiom.children\n\n verbalised_left_class = self.verbalise_class_expression(parsed_class_left)\n verbalised_right_class = self.verbalise_class_expression(parsed_class_right)\n return verbalised_left_class, verbalised_right_class\n
"},{"location":"deeponto/onto/verbalisation/#deeponto.onto.verbalisation.OntologyVerbaliser.verbalise_class_assertion_axiom","title":"verbalise_class_assertion_axiom(class_assertion_axiom)
","text":"Verbalise a class assertion axiom.
The class assertion axiom has the form \\(C(x)\\).
Parameters:
Name Type Description Defaultclass_assertion_axiom
OWLAxiom
The class assertion axiom to be verbalised.
requiredReturns:
Type DescriptionTuple[CfgNode, CfgNode]
The verbalised class \\(\\mathcal{V}(C)\\) and individual \\(\\mathcal{V}(x)\\) (order matters).
Source code insrc/deeponto/onto/verbalisation.py
def verbalise_class_assertion_axiom(self, class_assertion_axiom: OWLAxiom):\nr\"\"\"Verbalise a class assertion axiom.\n\n The class assertion axiom has the form $C(x)$.\n\n Args:\n class_assertion_axiom (OWLAxiom): The class assertion axiom to be verbalised.\n\n Returns:\n (Tuple[CfgNode, CfgNode]): The verbalised class $\\mathcal{V}(C)$ and individual $\\mathcal{V}(x)$ (order matters).\n \"\"\"\n\n # input check\n self._axiom_input_check(class_assertion_axiom, \"ClassAssertion\")\n\n parsed_equivalence_axiom = self.parser.parse(class_assertion_axiom).children[0] # skip the root node\n parsed_class, parsed_individual = parsed_equivalence_axiom.children\n\n verbalised_class = self.verbalise_class_expression(parsed_class)\n verbalised_individual = self._verbalise_iri(parsed_individual)\n return verbalised_class, verbalised_individual\n
"},{"location":"deeponto/onto/verbalisation/#deeponto.onto.verbalisation.OntologyVerbaliser.verbalise_object_property_subsumption_axiom","title":"verbalise_object_property_subsumption_axiom(object_property_subsumption_axiom)
","text":"Verbalise an object property subsumption axiom.
The subsumption axiom can have two forms:
SubObjectPropertyOf
axiom;SuperObjectPropertyOf
axiom.Parameters:
Name Type Description Defaultobject_property_subsumption_axiom
OWLAxiom
The object property subsumption axiom to be verbalised.
requiredReturns:
Type DescriptionTuple[CfgNode, CfgNode]
The verbalised sub-property \\(\\mathcal{V}(r_{sub})\\) and super-property \\(\\mathcal{V}(r_{super})\\) (order matters).
Source code insrc/deeponto/onto/verbalisation.py
def verbalise_object_property_subsumption_axiom(self, object_property_subsumption_axiom: OWLAxiom):\nr\"\"\"Verbalise an object property subsumption axiom.\n\n The subsumption axiom can have two forms:\n\n - $r_{sub} \\sqsubseteq r_{super}$, the `SubObjectPropertyOf` axiom;\n - $r_{super} \\sqsupseteq r_{sub}$, the `SuperObjectPropertyOf` axiom.\n\n Args:\n object_property_subsumption_axiom (OWLAxiom): The object property subsumption axiom to be verbalised.\n\n Returns:\n (Tuple[CfgNode, CfgNode]): The verbalised sub-property $\\mathcal{V}(r_{sub})$ and super-property $\\mathcal{V}(r_{super})$ (order matters).\n \"\"\"\n\n # input check\n self._axiom_input_check(\n object_property_subsumption_axiom,\n \"SubObjectPropertyOf\",\n \"SuperObjectPropertyOf\",\n \"SubPropertyChainOf\",\n \"SuperPropertyChainOf\",\n )\n\n parsed_subsumption_axiom = self.parser.parse(object_property_subsumption_axiom).children[\n 0\n ] # skip the root node\n if str(object_property_subsumption_axiom).startswith(\"SubObjectPropertyOf\"):\n parsed_sub_property, parsed_super_property = parsed_subsumption_axiom.children\n elif str(object_property_subsumption_axiom).startswith(\"SuperObjectPropertyOf\"):\n parsed_super_property, parsed_sub_property = parsed_subsumption_axiom.children\n\n verbalised_sub_property = self._verbalise_property(parsed_sub_property)\n verbalised_super_property = self._verbalise_property(parsed_super_property)\n return verbalised_sub_property, verbalised_super_property\n
"},{"location":"deeponto/onto/verbalisation/#deeponto.onto.verbalisation.OntologyVerbaliser.verbalise_object_property_assertion_axiom","title":"verbalise_object_property_assertion_axiom(object_property_assertion_axiom)
","text":"Verbalise an object property assertion axiom.
The object property assertion axiom has the form \\(r(x, y)\\).
Parameters:
Name Type Description Defaultobject_property_assertion_axiom
OWLAxiom
The object property assertion axiom to be verbalised.
requiredReturns:
Type DescriptionTuple[CfgNode, CfgNode]
The verbalised object property \\(\\mathcal{V}(r)\\) and two individuals \\(\\mathcal{V}(x)\\) and \\(\\mathcal{V}(y)\\) (order matters).
Source code insrc/deeponto/onto/verbalisation.py
def verbalise_object_property_assertion_axiom(self, object_property_assertion_axiom: OWLAxiom):\nr\"\"\"Verbalise an object property assertion axiom.\n\n The object property assertion axiom has the form $r(x, y)$.\n\n Args:\n object_property_assertion_axiom (OWLAxiom): The object property assertion axiom to be verbalised.\n\n Returns:\n (Tuple[CfgNode, CfgNode]): The verbalised object property $\\mathcal{V}(r)$ and two individuals $\\mathcal{V}(x)$ and $\\mathcal{V}(y)$ (order matters).\n \"\"\"\n\n # input check\n self._axiom_input_check(object_property_assertion_axiom, \"ObjectPropertyAssertion\")\n\n # skip the root node\n parsed_object_property_assertion_axiom = self.parser.parse(object_property_assertion_axiom).children[0]\n parsed_obj_prop, parsed_indiv_x, parsed_indiv_y = parsed_object_property_assertion_axiom.children\n\n verbalised_object_property = self._verbalise_iri(parsed_obj_prop, is_property=True)\n verbalised_individual_x = self._verbalise_iri(parsed_indiv_x)\n verbalised_individual_y = self._verbalise_iri(parsed_indiv_y)\n return verbalised_object_property, verbalised_individual_x, verbalised_individual_y\n
"},{"location":"deeponto/onto/verbalisation/#deeponto.onto.verbalisation.OntologyVerbaliser.verbalise_object_property_domain_axiom","title":"verbalise_object_property_domain_axiom(object_property_domain_axiom)
","text":"Verbalise an object property domain axiom.
The domain of a property \\(r: X \\rightarrow Y\\) specifies the concept expression \\(X\\) of its subject.
Parameters:
Name Type Description Defaultobject_property_domain_axiom
OWLAxiom
The object property domain axiom to be verbalised.
requiredReturns:
Type DescriptionTuple[CfgNode, CfgNode]
The verbalised object property \\(\\mathcal{V}(r)\\) and its domain \\(\\mathcal{V}(X)\\) (order matters).
Source code insrc/deeponto/onto/verbalisation.py
def verbalise_object_property_domain_axiom(self, object_property_domain_axiom: OWLAxiom):\nr\"\"\"Verbalise an object property domain axiom.\n\n The domain of a property $r: X \\rightarrow Y$ specifies the concept expression $X$ of its subject.\n\n Args:\n object_property_domain_axiom (OWLAxiom): The object property domain axiom to be verbalised.\n\n Returns:\n (Tuple[CfgNode, CfgNode]): The verbalised object property $\\mathcal{V}(r)$ and its domain $\\mathcal{V}(X)$ (order matters).\n \"\"\"\n\n # input check\n self._axiom_input_check(object_property_domain_axiom, \"ObjectPropertyDomain\")\n\n # skip the root node\n parsed_object_property_domain_axiom = self.parser.parse(object_property_domain_axiom).children[0]\n parsed_obj_prop, parsed_obj_prop_domain = parsed_object_property_domain_axiom.children\n\n verbalised_object_property = self._verbalise_iri(parsed_obj_prop, is_property=True)\n verbalised_object_property_domain = self.verbalise_class_expression(parsed_obj_prop_domain)\n\n return verbalised_object_property, verbalised_object_property_domain\n
"},{"location":"deeponto/onto/verbalisation/#deeponto.onto.verbalisation.OntologyVerbaliser.verbalise_object_property_range_axiom","title":"verbalise_object_property_range_axiom(object_property_range_axiom)
","text":"Verbalise an object property range axiom.
The range of a property \\(r: X \\rightarrow Y\\) specifies the concept expression \\(Y\\) of its object.
Parameters:
Name Type Description Defaultobject_property_range_axiom
OWLAxiom
The object property range axiom to be verbalised.
requiredReturns:
Type DescriptionTuple[CfgNode, CfgNode]
The verbalised object property \\(\\mathcal{V}(r)\\) and its range \\(\\mathcal{V}(Y)\\) (order matters).
Source code insrc/deeponto/onto/verbalisation.py
def verbalise_object_property_range_axiom(self, object_property_range_axiom: OWLAxiom):\nr\"\"\"Verbalise an object property range axiom.\n\n The range of a property $r: X \\rightarrow Y$ specifies the concept expression $Y$ of its object.\n\n Args:\n object_property_range_axiom (OWLAxiom): The object property range axiom to be verbalised.\n\n Returns:\n (Tuple[CfgNode, CfgNode]): The verbalised object property $\\mathcal{V}(r)$ and its range $\\mathcal{V}(Y)$ (order matters).\n \"\"\"\n\n # input check\n self._axiom_input_check(object_property_range_axiom, \"ObjectPropertyRange\")\n\n # skip the root node\n parsed_object_property_range_axiom = self.parser.parse(object_property_range_axiom).children[0]\n parsed_obj_prop, parsed_obj_prop_range = parsed_object_property_range_axiom.children\n\n verbalised_object_property = self._verbalise_iri(parsed_obj_prop, is_property=True)\n verbalised_object_property_range = self.verbalise_class_expression(parsed_obj_prop_range)\n\n return verbalised_object_property, verbalised_object_property_range\n
"},{"location":"deeponto/onto/verbalisation/#deeponto.onto.verbalisation.OntologySyntaxParser","title":"OntologySyntaxParser()
","text":"A syntax parser for the OWL logical expressions, e.g., OWLAxiom
and OWLClassExpression
.
It makes use of the string representation (based on Manchester Syntax) defined in the OWLAPI. In Python, such string can be accessed by simply using str(some_owl_object)
.
To keep the Java import in the main Ontology
class, this parser does not deal with OWLAxiom
directly but instead its string representation.
Due to the OWLObject
syntax, this parser relies on two components:
RangeNode
).As a result, it will return a RangeNode
that specifies the sub-formulas (and their respective positions in the string representation) in a tree structure.
Examples:
Suppose the input is an OWLAxiom
that has the string representation:
>>> str(owl_axiom)\n>>> 'EquivalentClasses(<http://purl.obolibrary.org/obo/FOODON_00001707> ObjectIntersectionOf(<http://purl.obolibrary.org/obo/FOODON_00002044> ObjectSomeValuesFrom(<http://purl.obolibrary.org/obo/RO_0001000> <http://purl.obolibrary.org/obo/FOODON_03412116>)) )'\n
This corresponds to the following logical expression:
\\[ CephalopodFoodProduct \\equiv MolluskFoodProduct \\sqcap \\exists derivesFrom.Cephalopod \\]After apply the parser, a RangeNode
will be returned, which can be rendered as:
axiom_parser = OntologySyntaxParser()\nprint(axiom_parser.parse(str(owl_axiom)).render_tree())\n
Output:
Root@[0:inf]\n\u2514\u2500\u2500 EQV@[0:212]\n \u251c\u2500\u2500 FOODON_00001707@[6:54]\n \u2514\u2500\u2500 AND@[55:210]\n \u251c\u2500\u2500 FOODON_00002044@[61:109]\n \u2514\u2500\u2500 EX.@[110:209]\n \u251c\u2500\u2500 RO_0001000@[116:159]\n \u2514\u2500\u2500 FOODON_03412116@[160:208]\n
Or, if graphviz
(installed by e.g., sudo apt install graphviz
) is available, you can visualise the tree as an image by:
axiom_parser.parse(str(owl_axiom)).render_image()\n
Output:
The name for each node has the form {node_type}@[{start}:{end}]
, which means a node of the type {node_type}
is located at the range [{start}:{end}]
in the abbreviated expression (see abbreviate_owl_expression
below).
The leaf nodes are IRIs and they are represented by the last segment (split by \"/\"
) of the whole IRI.
Child nodes can be accessed by .children
, the string representation of the sub-formula in this node can be accessed by .text
. For example:
parser.parse(str(owl_axiom)).children[0].children[1].text\n
Output:
'[AND](<http://purl.obolibrary.org/obo/FOODON_00002044> [EX.](<http://purl.obolibrary.org/obo/RO_0001000> <http://purl.obolibrary.org/obo/FOODON_03412116>))'\n
Source code in src/deeponto/onto/verbalisation.py
def __init__(self):\n pass\n
"},{"location":"deeponto/onto/verbalisation/#deeponto.onto.verbalisation.OntologySyntaxParser.abbreviate_owl_expression","title":"abbreviate_owl_expression(owl_expression)
","text":"Abbreviate the string representations of logical operators to a fixed length (easier for parsing).
The abbreviations are specified at deeponto.onto.verbalisation.ABBREVIATION_DICT
.
Parameters:
Name Type Description Defaultowl_expression
str
The string representation of an OWLObject
.
Returns:
Type Descriptionstr
The modified string representation of this OWLObject
where the logical operators are abbreviated.
src/deeponto/onto/verbalisation.py
def abbreviate_owl_expression(self, owl_expression: str):\nr\"\"\"Abbreviate the string representations of logical operators to a\n fixed length (easier for parsing).\n\n The abbreviations are specified at `deeponto.onto.verbalisation.ABBREVIATION_DICT`.\n\n Args:\n owl_expression (str): The string representation of an `OWLObject`.\n\n Returns:\n (str): The modified string representation of this `OWLObject` where the logical operators are abbreviated.\n \"\"\"\n for k, v in ABBREVIATION_DICT.items():\n owl_expression = owl_expression.replace(k, v)\n return owl_expression\n
"},{"location":"deeponto/onto/verbalisation/#deeponto.onto.verbalisation.OntologySyntaxParser.parse","title":"parse(owl_expression)
","text":"Parse an OWLAxiom
into a RangeNode
.
This is the main entry for using the parser, which relies on the parse_by_parentheses
method below.
Parameters:
Name Type Description Defaultowl_expression
Union[str, OWLObject]
The string representation of an OWLObject
or the OWLObject
itself.
Returns:
Type DescriptionRangeNode
A parsed syntactic tree given what parentheses to be matched.
Source code insrc/deeponto/onto/verbalisation.py
def parse(self, owl_expression: Union[str, OWLObject]) -> RangeNode:\nr\"\"\"Parse an `OWLAxiom` into a [`RangeNode`][deeponto.onto.verbalisation.RangeNode].\n\n This is the main entry for using the parser, which relies on the [`parse_by_parentheses`][deeponto.onto.verbalisation.OntologySyntaxParser.parse_by_parentheses]\n method below.\n\n Args:\n owl_expression (Union[str, OWLObject]): The string representation of an `OWLObject` or the `OWLObject` itself.\n\n Returns:\n (RangeNode): A parsed syntactic tree given what parentheses to be matched.\n \"\"\"\n if not isinstance(owl_expression, str):\n owl_expression = str(owl_expression)\n owl_expression = self.abbreviate_owl_expression(owl_expression)\n # print(\"To parse the following (transformed) axiom text:\\n\", owl_expression)\n # parse complex patterns first\n cur_parsed = self.parse_by_parentheses(owl_expression)\n # parse the IRI patterns latter\n return self.parse_by_parentheses(owl_expression, cur_parsed, for_iri=True)\n
"},{"location":"deeponto/onto/verbalisation/#deeponto.onto.verbalisation.OntologySyntaxParser.parse_by_parentheses","title":"parse_by_parentheses(owl_expression, already_parsed=None, for_iri=False)
classmethod
","text":"Parse an OWLAxiom
based on parentheses matching into a RangeNode
.
This function needs to be applied twice to get a fully parsed RangeNode
because IRIs have a different parenthesis pattern.
Parameters:
Name Type Description Defaultowl_expression
str
The string representation of an OWLObject
.
already_parsed
RangeNode
A partially parsed RangeNode
to continue with. Defaults to None
.
None
for_iri
bool
Parentheses are by default ()
but will be changed to <>
for IRIs. Defaults to False
.
False
Raises:
Type DescriptionRuntimeError
Raised when the input axiom text is not properly formatted.
Returns:
Type DescriptionRangeNode
A parsed syntactic tree given the parentheses to be matched.
Source code insrc/deeponto/onto/verbalisation.py
@classmethod\ndef parse_by_parentheses(\n cls, owl_expression: str, already_parsed: RangeNode = None, for_iri: bool = False\n) -> RangeNode:\nr\"\"\"Parse an `OWLAxiom` based on parentheses matching into a [`RangeNode`][deeponto.onto.verbalisation.RangeNode].\n\n This function needs to be applied twice to get a fully parsed [`RangeNode`][deeponto.onto.verbalisation.RangeNode] because IRIs have\n a different parenthesis pattern.\n\n Args:\n owl_expression (str): The string representation of an `OWLObject`.\n already_parsed (RangeNode, optional): A partially parsed [`RangeNode`][deeponto.onto.verbalisation.RangeNode] to continue with. Defaults to `None`.\n for_iri (bool, optional): Parentheses are by default `()` but will be changed to `<>` for IRIs. Defaults to `False`.\n\n Raises:\n RuntimeError: Raised when the input axiom text is nor properly formatted.\n\n Returns:\n (RangeNode): A parsed syntactic tree given what parentheses to be matched.\n \"\"\"\n if not already_parsed:\n # a root node that covers the entire sentence\n parsed = RangeNode(0, math.inf, name=f\"Root\", text=owl_expression, is_iri=False)\n else:\n parsed = already_parsed\n stack = []\n left_par = \"(\"\n right_par = \")\"\n if for_iri:\n left_par = \"<\"\n right_par = \">\"\n\n for i, c in enumerate(owl_expression):\n if c == left_par:\n stack.append(i)\n if c == right_par:\n try:\n start = stack.pop()\n end = i\n if not for_iri:\n # the first character is actually \"[\"\n real_start = start - 5\n axiom_type = owl_expression[real_start + 1 : start - 1]\n node = RangeNode(\n real_start,\n end + 1,\n name=f\"{axiom_type}\",\n text=owl_expression[real_start : end + 1],\n is_iri=False,\n )\n parsed.insert_child(node)\n else:\n # no preceding characters for just atomic class (IRI)\n abbr_iri = owl_expression[start : end + 1].split(\"/\")[-1].rstrip(\">\")\n node = RangeNode(\n start, end + 1, name=abbr_iri, text=owl_expression[start : end + 1], is_iri=True\n )\n parsed.insert_child(node)\n except IndexError:\n print(\"Too many closing parentheses\")\n\n if stack: # check if stack is empty afterwards\n raise RuntimeError(\"Too many opening parentheses\")\n\n return parsed\n
"},{"location":"deeponto/onto/verbalisation/#deeponto.onto.verbalisation.RangeNode","title":"RangeNode(start, end, name=None, **kwargs)
","text":" Bases: NodeMixin
A tree implementation for ranges (without partial overlap).
[1, 10]
is a parent of [2, 5]
.[2, 4]
and [3, 5]
cannot appear in the same RangeNodeTree
.src/deeponto/onto/verbalisation.py
def __init__(self, start, end, name=None, **kwargs):\n if start >= end:\n raise RuntimeError(\"invalid start and end positions ...\")\n self.start = start\n self.end = end\n self.name = \"Root\" if not name else name\n self.name = f\"{self.name}@[{self.start}:{self.end}]\" # add start and ent to the name\n for k, v in kwargs.items():\n setattr(self, k, v)\n super().__init__()\n
"},{"location":"deeponto/onto/verbalisation/#deeponto.onto.verbalisation.RangeNode.__gt__","title":"__gt__(other)
","text":"Compare two ranges if they have a different start
and/or a different end
.
\"irrelevant\"
: if range \\(R_1\\) and range \\(R_2\\) have no overlap.Warning
Partial overlap is not allowed.
Source code insrc/deeponto/onto/verbalisation.py
def __gt__(self, other: RangeNode):\nr\"\"\"Compare two ranges if they have a different `start` and/or a different `end`.\n\n - $R_1 \\lt R_2$: if range $R_1$ is completely contained in range $R_2$, and $R_1 \\neq R_2$.\n - $R_1 \\gt R_2$: if range $R_2$ is completely contained in range $R_1$, and $R_1 \\neq R_2$.\n - `\"irrelevant\"`: if range $R_1$ and range $R_2$ have no overlap.\n\n !!! warning\n\n Partial overlap is not allowed.\n \"\"\"\n # ranges inside\n if self.start <= other.start and other.end <= self.end:\n return True\n\n # ranges outside\n if other.start <= self.start and self.end <= other.end:\n return False\n\n if other.end < self.start or self.end < other.start:\n return \"irrelevant\"\n\n raise RuntimeError(\"Compared ranges have a partial overlap.\")\n
"},{"location":"deeponto/onto/verbalisation/#deeponto.onto.verbalisation.RangeNode.sort_by_start","title":"sort_by_start(nodes)
staticmethod
","text":"A sorting function that sorts the nodes by their starting positions.
Source code insrc/deeponto/onto/verbalisation.py
@staticmethod\ndef sort_by_start(nodes: List[RangeNode]):\n\"\"\"A sorting function that sorts the nodes by their starting positions.\"\"\"\n temp = {sib: sib.start for sib in nodes}\n return list(dict(sorted(temp.items(), key=lambda item: item[1])).keys())\n
"},{"location":"deeponto/onto/verbalisation/#deeponto.onto.verbalisation.RangeNode.insert_child","title":"insert_child(node)
","text":"Inserting a child RangeNode
.
Child nodes have a smaller (inclusive) range, e.g., [2, 5]
is a child of [1, 6]
.
src/deeponto/onto/verbalisation.py
def insert_child(self, node: RangeNode):\nr\"\"\"Inserting a child [`RangeNode`][deeponto.onto.verbalisation.RangeNode].\n\n Child nodes have a smaller (inclusive) range, e.g., `[2, 5]` is a child of `[1, 6]`.\n \"\"\"\n if node > self:\n raise RuntimeError(\"invalid child node\")\n if node.start == self.start and node.end == self.end:\n # duplicated node\n return\n # print(self.children)\n if self.children:\n inserted = False\n for ch in self.children:\n if (node < ch) is True:\n # print(\"further down\")\n ch.insert_child(node)\n inserted = True\n break\n elif (node > ch) is True:\n # print(\"insert in between\")\n ch.parent = node\n # NOTE: should not break here as it could be parent of multiple children !\n # break\n # NOTE: the equal case is when two nodes are exactly the same, no operation needed\n if not inserted:\n self.children = list(self.children) + [node]\n self.children = self.sort_by_start(self.children)\n else:\n node.parent = self\n self.children = [node]\n
"},{"location":"deeponto/onto/verbalisation/#deeponto.onto.verbalisation.RangeNode.render_tree","title":"render_tree()
","text":"Render the whole tree.
Source code insrc/deeponto/onto/verbalisation.py
def render_tree(self):\n\"\"\"Render the whole tree.\"\"\"\n return RenderTree(self)\n
"},{"location":"deeponto/onto/verbalisation/#deeponto.onto.verbalisation.RangeNode.render_image","title":"render_image()
","text":"Calling this function will generate a temporary range_node.png
file which will be displayed.
To make this visualisation work, you need to install graphviz
by, e.g.,
sudo apt install graphviz\n
Source code in src/deeponto/onto/verbalisation.py
def render_image(self):\n\"\"\"Calling this function will generate a temporary `range_node.png` file\n which will be displayed.\n\n To make this visualisation work, you need to install `graphviz` by, e.g.,\n\n ```bash\n sudo apt install graphviz\n ```\n \"\"\"\n RenderTreeGraph(self).to_picture(\"range_node.png\")\n return Image(\"range_node.png\")\n
"},{"location":"deeponto/utils/data_utils/","title":"Data Utilities","text":""},{"location":"deeponto/utils/data_utils/#deeponto.utils.data_utils.set_seed","title":"set_seed(seed)
","text":"Set seed function imported from transformers.
Source code insrc/deeponto/utils/data_utils.py
def set_seed(seed):\n\"\"\"Set seed function imported from transformers.\"\"\"\n t_set_seed(seed)\n
"},{"location":"deeponto/utils/data_utils/#deeponto.utils.data_utils.sort_dict_by_values","title":"sort_dict_by_values(dic, desc=True, k=None)
","text":"Return a sorted dict by values with first k reserved if provided.
Source code insrc/deeponto/utils/data_utils.py
def sort_dict_by_values(dic: dict, desc: bool = True, k: Optional[int] = None):\n\"\"\"Return a sorted dict by values with first k reserved if provided.\"\"\"\n sorted_items = list(sorted(dic.items(), key=lambda item: item[1], reverse=desc))\n return dict(sorted_items[:k])\n
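For illustration (the score dictionary is made up):
from deeponto.utils.data_utils import sort_dict_by_values\n\nscores = {\"b\": 0.9, \"a\": 0.2, \"c\": 0.5}\nsort_dict_by_values(scores)                   # {'b': 0.9, 'c': 0.5, 'a': 0.2}\nsort_dict_by_values(scores, k=2)              # {'b': 0.9, 'c': 0.5}\nsort_dict_by_values(scores, desc=False, k=1)  # {'a': 0.2}\n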
"},{"location":"deeponto/utils/data_utils/#deeponto.utils.data_utils.uniqify","title":"uniqify(ls)
","text":"Return a list of unique elements without messing around the order
Source code insrc/deeponto/utils/data_utils.py
def uniqify(ls):\n\"\"\"Return a list of unique elements without messing around the order\"\"\"\n non_empty_ls = list(filter(lambda x: x != \"\", ls))\n return list(dict.fromkeys(non_empty_ls))\n
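For example:
from deeponto.utils.data_utils import uniqify\n\nuniqify([\"b\", \"a\", \"b\", \"\", \"c\", \"a\"])  # ['b', 'a', 'c']\n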
"},{"location":"deeponto/utils/data_utils/#deeponto.utils.data_utils.print_dict","title":"print_dict(dic)
","text":"Pretty print a dictionary.
Source code insrc/deeponto/utils/data_utils.py
def print_dict(dic: dict):\n\"\"\"Pretty print a dictionary.\"\"\"\n pretty_print = json.dumps(dic, indent=4, separators=(\",\", \": \"))\n # print(pretty_print)\n return pretty_print\n
"},{"location":"deeponto/utils/decorators/","title":"Decorators","text":""},{"location":"deeponto/utils/decorators/#deeponto.utils.decorators.timer","title":"timer(function)
","text":"Print the runtime of the decorated function.
Source code insrc/deeponto/utils/decorators.py
def timer(function):\n\"\"\"Print the runtime of the decorated function.\"\"\"\n\n @wraps(function)\n def wrapper_timer(*args, **kwargs):\n start_time = time.perf_counter() # 1\n value = function(*args, **kwargs)\n end_time = time.perf_counter() # 2\n run_time = end_time - start_time # 3\n print(f\"Finished {function.__name__!r} in {run_time:.4f} secs.\")\n return value\n\n return wrapper_timer\n
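A quick illustration with a made-up function (the measured time in the printed message will vary):
import time\n\nfrom deeponto.utils.decorators import timer\n\n@timer\ndef slow_add(x, y):\n    time.sleep(1)\n    return x + y\n\nslow_add(1, 2)  # prints e.g.: Finished 'slow_add' in 1.0012 secs.\n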
"},{"location":"deeponto/utils/decorators/#deeponto.utils.decorators.debug","title":"debug(function)
","text":"Print the function signature and return value.
Source code insrc/deeponto/utils/decorators.py
def debug(function):\n\"\"\"Print the function signature and return value.\"\"\"\n\n @wraps(function)\n def wrapper_debug(*args, **kwargs):\n args_repr = [repr(a) for a in args]\n kwargs_repr = [f\"{k}={v!r}\" for k, v in kwargs.items()]\n signature = \", \".join(args_repr + kwargs_repr)\n print(f\"Calling {function.__name__}({signature})\")\n value = function(*args, **kwargs)\n print(f\"{function.__name__!r} returned {value!r}.\")\n return value\n\n return wrapper_debug\n
"},{"location":"deeponto/utils/decorators/#deeponto.utils.decorators.paper","title":"paper(title, link)
","text":"Add paper tagger for methods.
Source code insrc/deeponto/utils/decorators.py
def paper(title: str, link: str):\n\"\"\"Add paper tagger for methods.\"\"\"\n # Define a new decorator, named \"decorator\", to return\n def decorator(func):\n # Ensure the decorated function keeps its metadata\n @wraps(func)\n def wrapper(*args, **kwargs):\n # Call the function being decorated and return the result\n return func(*args, **kwargs)\n\n wrapper.paper_title = f'This method is associated with tha paper of title: \"{title}\".'\n wrapper.paper_link = f\"This method is associated with the paper with link: {link}.\"\n return wrapper\n\n # Return the new decorator\n return decorator\n
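A sketch of how the attached metadata can be accessed (the function name, paper title, and link are placeholders):
from deeponto.utils.decorators import paper\n\n@paper(\"An Illustrative Paper Title\", \"https://example.org/paper\")\ndef some_matching_method():\n    pass\n\nprint(some_matching_method.paper_title)\nprint(some_matching_method.paper_link)\n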
"},{"location":"deeponto/utils/file_utils/","title":"File Utilities","text":""},{"location":"deeponto/utils/file_utils/#deeponto.utils.file_utils.create_path","title":"create_path(path)
","text":"Create a path recursively.
Source code insrc/deeponto/utils/file_utils.py
def create_path(path: str):\n\"\"\"Create a path recursively.\"\"\"\n Path(path).mkdir(parents=True, exist_ok=True)\n
"},{"location":"deeponto/utils/file_utils/#deeponto.utils.file_utils.save_file","title":"save_file(obj, save_path, sort_keys=False)
","text":"Save an object to a certain format.
Source code insrc/deeponto/utils/file_utils.py
def save_file(obj, save_path: str, sort_keys: bool = False):\n\"\"\"Save an object to a certain format.\"\"\"\n if save_path.endswith(\".json\"):\n with open(save_path, \"w\") as output:\n json.dump(obj, output, indent=4, separators=(\",\", \": \"), sort_keys=sort_keys)\n elif save_path.endswith(\".pkl\"):\n with open(save_path, \"wb\") as output:\n pickle.dump(obj, output, -1)\n elif save_path.endswith(\".yaml\"):\n with open(save_path, \"w\") as output:\n yaml.dump(obj, output, default_flow_style=False, allow_unicode=True)\n else:\n raise RuntimeError(f\"Unsupported saving format: {save_path}\")\n
"},{"location":"deeponto/utils/file_utils/#deeponto.utils.file_utils.load_file","title":"load_file(save_path)
","text":"Load an object of a certain format.
Source code insrc/deeponto/utils/file_utils.py
def load_file(save_path: str):\n\"\"\"Load an object of a certain format.\"\"\"\n if save_path.endswith(\".json\"):\n with open(save_path, \"r\") as input:\n return json.load(input)\n elif save_path.endswith(\".pkl\"):\n with open(save_path, \"rb\") as input:\n return pickle.load(input)\n elif save_path.endswith(\".yaml\"):\n with open(save_path, \"r\") as input:\n return yaml.safe_load(input)\n else:\n raise RuntimeError(f\"Unsupported loading format: {save_path}\")\n
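A round-trip sketch exercising save_file and load_file (the dictionary and file names are made up):
from deeponto.utils.file_utils import load_file, save_file\n\nconfig = {\"model\": \"bertmap\", \"epochs\": 3}\nsave_file(config, \"config.json\")\nassert load_file(\"config.json\") == config\nsave_file(config, \"config.yaml\")  # .pkl and .yaml work the same way\n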
"},{"location":"deeponto/utils/file_utils/#deeponto.utils.file_utils.copy2","title":"copy2(source, destination)
","text":"Copy a file from source to destination.
Source code insrc/deeponto/utils/file_utils.py
def copy2(source: str, destination: str):\n\"\"\"Copy a file from source to destination.\"\"\"\n try:\n shutil.copy2(source, destination)\n print(f\"copied successfully FROM {source} TO {destination}\")\n except shutil.SameFileError:\n print(f\"same file exists at {destination}\")\n
"},{"location":"deeponto/utils/file_utils/#deeponto.utils.file_utils.read_table","title":"read_table(table_file_path)
","text":"Read csv
or tsv
file as pandas dataframe without treating \"NULL\"
, \"null\"
, and \"n/a\"
as an empty string.
src/deeponto/utils/file_utils.py
def read_table(table_file_path: str):\nr\"\"\"Read `csv` or `tsv` file as pandas dataframe without treating `\"NULL\"`, `\"null\"`, and `\"n/a\"` as an empty string.\"\"\"\n # TODO: this might change with the version of pandas\n na_vals = pd.io.parsers.readers.STR_NA_VALUES.difference({\"NULL\", \"null\", \"n/a\"})\n sep = \"\\t\" if table_file_path.endswith(\".tsv\") else \",\"\n return pd.read_csv(table_file_path, sep=sep, na_values=na_vals, keep_default_na=False)\n
"},{"location":"deeponto/utils/file_utils/#deeponto.utils.file_utils.read_jsonl","title":"read_jsonl(file_path)
","text":"Read .jsonl
file (list of json) introduced in the BLINK project.
src/deeponto/utils/file_utils.py
def read_jsonl(file_path: str):\n\"\"\"Read `.jsonl` file (list of json) introduced in the BLINK project.\"\"\"\n results = []\n key_set = []\n with open(file_path, \"r\", encoding=\"utf-8-sig\") as f:\n lines = f.readlines()\n for line in lines:\n record = json.loads(line)\n results.append(record)\n key_set += list(record.keys())\n print(f\"all available keys: {set(key_set)}\")\n return results\n
"},{"location":"deeponto/utils/file_utils/#deeponto.utils.file_utils.read_oaei_mappings","title":"read_oaei_mappings(rdf_file)
","text":"To read mapping files in the OAEI rdf format.
Source code insrc/deeponto/utils/file_utils.py
def read_oaei_mappings(rdf_file: str):\n\"\"\"To read mapping files in the OAEI rdf format.\"\"\"\n xml_root = ET.parse(rdf_file).getroot()\n ref_mappings = [] # where relation is \"=\"\n ignored_mappings = [] # where relation is \"?\"\n\n for elem in xml_root.iter():\n # every Cell contains a mapping of en1 -rel(some value)-> en2\n if \"Cell\" in elem.tag:\n en1, en2, rel, measure = None, None, None, None\n for sub_elem in elem:\n if \"entity1\" in sub_elem.tag:\n en1 = list(sub_elem.attrib.values())[0]\n elif \"entity2\" in sub_elem.tag:\n en2 = list(sub_elem.attrib.values())[0]\n elif \"relation\" in sub_elem.tag:\n rel = sub_elem.text\n elif \"measure\" in sub_elem.tag:\n measure = sub_elem.text\n row = (en1, en2, measure)\n # =: equivalent; > superset of; < subset of.\n if rel == \"=\" or rel == \">\" or rel == \"<\":\n # rel.replace(\">\", \">\").replace(\"<\", \"<\")\n ref_mappings.append(row)\n elif rel == \"?\":\n ignored_mappings.append(row)\n else:\n print(\"Unknown Relation Warning: \", rel)\n\n print('#Maps (\"=\"):', len(ref_mappings))\n print('#Maps (\"?\"):', len(ignored_mappings))\n\n return ref_mappings, ignored_mappings\n
"},{"location":"deeponto/utils/file_utils/#deeponto.utils.file_utils.run_jar","title":"run_jar(jar_command, timeout=3600)
","text":"Run jar command using subprocess.
Source code insrc/deeponto/utils/file_utils.py
def run_jar(jar_command: str, timeout=3600):\n\"\"\"Run jar command using subprocess.\"\"\"\n print(f\"Run jar command with timeout: {timeout}s.\")\n proc = subprocess.Popen(jar_command.split(\" \"))\n try:\n _, _ = proc.communicate(timeout=timeout)\n except subprocess.TimeoutExpired:\n warnings.warn(\"kill the jar process as timed out\")\n proc.kill()\n _, _ = proc.communicate()\n
"},{"location":"deeponto/utils/logging/","title":"Logging","text":""},{"location":"deeponto/utils/logging/#deeponto.utils.logging.RuntimeFormatter","title":"RuntimeFormatter(*args, **kwargs)
","text":" Bases: logging.Formatter
Auxiliary class for runtime formatting in the logger.
Source code insrc/deeponto/utils/logging.py
def __init__(self, *args, **kwargs):\n super().__init__(*args, **kwargs)\n self.start_time = time.time()\n
"},{"location":"deeponto/utils/logging/#deeponto.utils.logging.RuntimeFormatter.formatTime","title":"formatTime(record, datefmt=None)
","text":"Record relative runtime in hr:min:sec format\u3002
Source code insrc/deeponto/utils/logging.py
def formatTime(self, record, datefmt=None):\n\"\"\"Record relative runtime in hr:min:sec format.\"\"\"\n duration = datetime.datetime.utcfromtimestamp(record.created - self.start_time)\n elapsed = duration.strftime(\"%H:%M:%S\")\n return \"{}\".format(elapsed)\n
"},{"location":"deeponto/utils/logging/#deeponto.utils.logging.create_logger","title":"create_logger(model_name, saved_path)
","text":"Create logger for both console info and saved info.
The pre-existed log file will be cleared before writing into new messages.
Source code insrc/deeponto/utils/logging.py
def create_logger(model_name: str, saved_path: str):\n\"\"\"Create logger for both console info and saved info.\n\n The pre-existed log file will be cleared before writing into new messages.\n \"\"\"\n logger = logging.getLogger(model_name)\n logger.setLevel(logging.DEBUG)\n # create file handler which logs even debug messages\n fh = logging.FileHandler(f\"{saved_path}/{model_name}.log\", mode=\"w\") # \"w\" means clear the log file before writing\n fh.setLevel(logging.DEBUG)\n # create console handler with a higher log level\n ch = logging.StreamHandler()\n ch.setLevel(logging.INFO)\n # create formatter and add it to the handlers\n formatter = RuntimeFormatter(\"[Time: %(asctime)s] - [PID: %(process)d] - [Model: %(name)s] \\n%(message)s\")\n fh.setFormatter(formatter)\n ch.setFormatter(formatter)\n # add the handlers to the logger\n logger.addHandler(fh)\n logger.addHandler(ch)\n logger.propagate = False\n return logger\n
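For example (the model name and directory are illustrative; the directory must already exist, and the PID shown is arbitrary):
from deeponto.utils.logging import create_logger\n\nlogger = create_logger(\"bertmap\", \"./experiments\")\nlogger.info(\"Training started ...\")\n# both the console and ./experiments/bertmap.log receive:\n# [Time: 00:00:00] - [PID: 12345] - [Model: bertmap]\n# Training started ...\n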
"},{"location":"deeponto/utils/logging/#deeponto.utils.logging.banner_message","title":"banner_message(message, sym='^')
","text":"Print a banner message surrounded by special symbols.
Source code insrc/deeponto/utils/logging.py
def banner_message(message: str, sym=\"^\"):\n\"\"\"Print a banner message surrounded by special symbols.\"\"\"\n print()\n message = message.upper()\n banner_len = len(message) + 4\n message = \" \" * ((banner_len - len(message)) // 2) + message\n message = message + \" \" * (banner_len - len(message))\n print(message)\n print(sym * banner_len)\n print()\n
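For example:
from deeponto.utils.logging import banner_message\n\nbanner_message(\"evaluation results\")\n#\n#   EVALUATION RESULTS\n# ^^^^^^^^^^^^^^^^^^^^^^\n#\n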
"},{"location":"deeponto/utils/text_utils/","title":"Text Utilities","text":""},{"location":"deeponto/utils/text_utils/#deeponto.utils.text_utils.Tokenizer","title":"Tokenizer(tokenizer_type)
","text":"A Tokenizer class for both sub-word (pre-trained) and word (rule-based) level tokenization.
Source code insrc/deeponto/utils/text_utils.py
def __init__(self, tokenizer_type: str):\n self.type = tokenizer_type\n self._tokenizer = None # hidden tokenizer\n self.tokenize = None # the tokenization method\n
"},{"location":"deeponto/utils/text_utils/#deeponto.utils.text_utils.Tokenizer.from_pretrained","title":"from_pretrained(pretrained_path='bert-base-uncased')
classmethod
","text":"(Based on transformers) Load a sub-word level tokenizer from pre-trained model.
Source code insrc/deeponto/utils/text_utils.py
@classmethod\ndef from_pretrained(cls, pretrained_path: str = \"bert-base-uncased\"):\n\"\"\"(Based on **transformers**) Load a sub-word level tokenizer from pre-trained model.\"\"\"\n instance = cls(\"pre-trained\")\n instance._tokenizer = AutoTokenizer.from_pretrained(pretrained_path)\n instance.tokenize = instance._tokenizer.tokenize\n return instance\n
"},{"location":"deeponto/utils/text_utils/#deeponto.utils.text_utils.Tokenizer.from_rule_based","title":"from_rule_based()
classmethod
","text":"(Based on spacy) Load a word-level (rule-based) tokenizer.
Source code insrc/deeponto/utils/text_utils.py
@classmethod\ndef from_rule_based(cls):\n\"\"\"(Based on **spacy**) Load a word-level (rule-based) tokenizer.\"\"\"\n spacy.prefer_gpu()\n instance = cls(\"rule-based\")\n instance._tokenizer = English()\n instance.tokenize = lambda texts: [word.text for word in instance._tokenizer(texts).doc]\n return instance\n
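A usage sketch of both constructors (the example text is arbitrary; the sub-word output depends on the model's vocabulary):
from deeponto.utils.text_utils import Tokenizer\n\nsubword_tokenizer = Tokenizer.from_pretrained(\"bert-base-uncased\")\nsubword_tokenizer.tokenize(\"ontology alignment\")  # sub-word tokens, e.g. ['ontology', 'alignment'] or smaller pieces\n\nword_tokenizer = Tokenizer.from_rule_based()\nword_tokenizer.tokenize(\"ontology alignment\")  # ['ontology', 'alignment']\n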
"},{"location":"deeponto/utils/text_utils/#deeponto.utils.text_utils.InvertedIndex","title":"InvertedIndex(index, tokenizer)
","text":"Inverted index built from a text index.
Attributes:
Name Type Descriptiontokenizer
Tokenizer
A tokenizer instance to be used.
original_index
defaultdict
A dictionary where the values are text strings to be tokenized.
constructed_index
defaultdict
A dictionary that acts as the inverted index of original_index
.
src/deeponto/utils/text_utils.py
def __init__(self, index: defaultdict, tokenizer: Tokenizer):\n self.tokenizer = tokenizer\n self.original_index = index\n self.constructed_index = defaultdict(list)\n for k, v in self.original_index.items():\n # value is a list of strings\n for token in self.tokenizer(v):\n self.constructed_index[token].append(k)\n
"},{"location":"deeponto/utils/text_utils/#deeponto.utils.text_utils.InvertedIndex.idf_select","title":"idf_select(texts, pool_size=200)
","text":"Given a list of tokens, select a set candidates based on the inverted document frequency (idf) scores.
We use idf
instead of tf
because labels have different lengths and thus tf is not a fair measure.
src/deeponto/utils/text_utils.py
def idf_select(self, texts: Union[str, List[str]], pool_size: int = 200):\n\"\"\"Given a list of tokens, select a set candidates based on the inverted document frequency (idf) scores.\n\n We use `idf` instead of `tf` because labels have different lengths and thus tf is not a fair measure.\n \"\"\"\n candidate_pool = defaultdict(lambda: 0)\n # D := number of \"documents\", i.e., number of \"keys\" in the original index\n D = len(self.original_index)\n for token in self.tokenizer(texts):\n # each token is associated with some classes\n potential_candidates = self.constructed_index[token]\n if not potential_candidates:\n continue\n # We use idf instead of tf because the text for each class is of different length, tf is not a fair measure\n # inverse document frequency: with more classes to have the current token tk, the score decreases\n idf = math.log10(D / len(potential_candidates))\n for candidate in potential_candidates:\n # each candidate class is scored by sum(idf)\n candidate_pool[candidate] += idf\n candidate_pool = list(sorted(candidate_pool.items(), key=lambda item: item[1], reverse=True))\n # print(f\"Select {min(len(candidate_pool), pool_size)} candidates.\")\n # select the first K ranked\n return candidate_pool[:pool_size]\n
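A rough sketch of building an inverted index over class labels and selecting candidates (the index keys and labels are made up; candidates are ranked by summed idf scores):
from collections import defaultdict\n\nfrom deeponto.utils.text_utils import InvertedIndex, Tokenizer\n\nannotation_index = defaultdict(list)\nannotation_index[\"ex:Heart\"] = [\"heart\"]\nannotation_index[\"ex:HeartValve\"] = [\"heart valve\"]\nannotation_index[\"ex:Lung\"] = [\"lung\"]\n\ninv_index = InvertedIndex(annotation_index, Tokenizer.from_rule_based())\ninv_index.idf_select(\"heart valve\", pool_size=2)\n# e.g., [('ex:HeartValve', ...), ('ex:Heart', ...)]\n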
"},{"location":"deeponto/utils/text_utils/#deeponto.utils.text_utils.process_annotation_literal","title":"process_annotation_literal(annotation_literal, apply_lowercasing=False, normalise_identifiers=False)
","text":"Pre-process an annotation literal string.
Parameters:
Name Type Description Defaultannotation_literal
str
A literal string of an entity's annotation.
requiredapply_lowercasing
bool
A boolean that determines lowercasing or not. Defaults to False
.
False
normalise_identifiers
bool
Whether to normalise annotation text that is in the Java identifier format. Defaults to False
.
False
Returns:
Type Descriptionstr
the processed annotation literal string.
Source code insrc/deeponto/utils/text_utils.py
def process_annotation_literal(\n annotation_literal: str, apply_lowercasing: bool = False, normalise_identifiers: bool = False\n):\n\"\"\"Pre-process an annotation literal string.\n\n Args:\n annotation_literal (str): A literal string of an entity's annotation.\n apply_lowercasing (bool): A boolean that determines lowercasing or not. Defaults to `False`.\n normalise_identifiers (bool): Whether to normalise annotation text that is in the Java identifier format. Defaults to `False`.\n\n Returns:\n (str): the processed annotation literal string.\n \"\"\"\n\n # replace the underscores with spaces\n annotation_literal = annotation_literal.replace(\"_\", \" \")\n\n # if the annotation literal is a valid identifier with first letter capitalised\n # we suspect that it could be a Java style identifier that needs to be split\n if normalise_identifiers and annotation_literal[0].isupper() and annotation_literal.isidentifier():\n annotation_literal = split_java_identifier(annotation_literal)\n\n # lowercase the annotation literal if specfied\n if apply_lowercasing:\n annotation_literal = annotation_literal.lower()\n\n return annotation_literal\n
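For example (the input strings are arbitrary):
from deeponto.utils.text_utils import process_annotation_literal\n\nprocess_annotation_literal(\"Heart_Valve\")  # 'Heart Valve'\nprocess_annotation_literal(\"HeartValve\", normalise_identifiers=True)  # 'Heart Valve'\nprocess_annotation_literal(\"Heart_Valve\", apply_lowercasing=True)  # 'heart valve'\n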
"},{"location":"deeponto/utils/text_utils/#deeponto.utils.text_utils.split_java_identifier","title":"split_java_identifier(java_style_identifier)
","text":"Split words in java's identifier style into natural language phrase.
Examples:
\"SuperNaturalPower\"
\\(\\rightarrow\\) \"Super Natural Power\"
\"APIReference\"
\\(\\rightarrow\\) \"API Reference\"
\"Covid19\"
\\(\\rightarrow\\) \"Covid 19\"
src/deeponto/utils/text_utils.py
def split_java_identifier(java_style_identifier: str):\nr\"\"\"Split words in java's identifier style into natural language phrase.\n\n Examples:\n - `\"SuperNaturalPower\"` $\\rightarrow$ `\"Super Natural Power\"`\n - `\"APIReference\"` $\\rightarrow$ `\"API Reference\"`\n - `\"Covid19\"` $\\rightarrow$ `\"Covid 19\"`\n \"\"\"\n # split at every capital letter or number (numbers are treated as capital letters)\n raw_words = re.findall(\"([0-9A-Z][a-z]*)\", java_style_identifier)\n words = []\n capitalized_word = \"\"\n for i, w in enumerate(raw_words):\n # the above regex pattern will split at capitals\n # so the capitalized words are split into characters\n # i.e., (len(w) == 1)\n if len(w) == 1:\n capitalized_word += w\n # edge case for the last word\n if i == len(raw_words) - 1:\n words.append(capitalized_word)\n\n # if the the current w is a full word, save the previous\n # cached capitalized_word and also save current full word\n elif capitalized_word:\n words.append(capitalized_word)\n words.append(w)\n capitalized_word = \"\"\n\n # just save the current full word otherwise\n else:\n words.append(w)\n\n return \" \".join(words)\n
"}]}
\ No newline at end of file
diff --git a/sitemap.xml.gz b/sitemap.xml.gz
index b4800fc1..15d2ff76 100644
Binary files a/sitemap.xml.gz and b/sitemap.xml.gz differ