A package for ontology engineering with deep learning.

News

- Add the deeponto.onto.taxonomy module; add the structural reasoner type. (v0.8.8)
- Add deeponto.align.oaei for scripts at the sub-repository OAEI-Bio-ML as well as bug fixing. (v0.8.4)
- Add deeponto.onto.OntologyNormaliser and deeponto.onto.OntologyProjector (v0.8.0).
- Add deeponto.subs.bertsubs and deeponto.onto.pruning modules (v0.7.0).
- Add deeponto.probe.ontolama and deeponto.onto.verbalisation modules (v0.6.0).

Check the complete changelog and FAQs. The FAQs page does not contain much information now but will be updated according to feedback.
"},{"location":"#about","title":"About","text":"\\(\\textsf{DeepOnto}\\) aims to provide building blocks for implementing deep learning models, constructing resources, and conducting evaluation for various ontology engineering purposes.
\\(\\textsf{DeepOnto}\\) relies on OWLAPI version 4 (written in Java) for ontologies.
We follow what has been implemented in mOWL that uses JPype to bridge Python and Java Virtual Machine (JVM). Please check JPype's installation page for successful JVM initialisation.
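A quick way to verify, from a Python session, that a JVM has been initialised through JPype (a minimal sketch using JPype's public API; it does not start a JVM by itself):
import jpype\n\n# True once DeepOnto (or any other JPype-based code) has started the JVM in this session\nprint(jpype.isJVMStarted())\n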
"},{"location":"#pytorch","title":"Pytorch","text":"\\(\\textsf{DeepOnto}\\) relies on Pytorch for deep learning framework.
We recommend installing Pytorch prior to installing DeepOnto, following the commands listed on the Pytorch webpage. Note that users can choose either the GPU (with CUDA) or the CPU version of Pytorch.
In case the most recent Pytorch version causes any incompatibility issues, use the following command (with CUDA 11.6), which is known to work:
pip3 install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu116\n
Basic usage of DeepOnto does not rely on GPUs, but for efficient deep learning model training, please make sure torch.cuda.is_available()
returns True
.
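A minimal check, assuming Pytorch is already installed:
import torch\n\nprint(torch.cuda.is_available())  # True if a CUDA-enabled GPU is visible to Pytorch\n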
Other dependencies are specified in setup.cfg
and requirements.txt
which are supposed to be installed along with deeponto
.
# requiring Python>=3.8\npip install deeponto\n
"},{"location":"#install-from-git-repository","title":"Install from Git Repository","text":"To install the latest, probably unreleased version of deeponto, you can directly install from the repository.
pip install git+https://github.com/KRR-Oxford/DeepOnto.git\n
"},{"location":"#main-features","title":"Main Features","text":"
Figure: Illustration of DeepOnto's architecture.
"},{"location":"#ontology-processing","title":"Ontology Processing","text":"The base class of \\(\\textsf{DeepOnto}\\) is Ontology
, which serves as the main entry point to the OWLAPI's features, such as accessing ontology entities, querying for ancestor/descendant (and parent/child) concepts, deleting entities, modifying axioms, and retrieving annotations. See quick usage at load an ontology; a minimal usage sketch is also given after the list below. Along with these basic functionalities, several essential sub-modules are built to enhance the core module, including the following:
Ontology Reasoning (OntologyReasoner
): Each Ontology instance has a reasoner as its attribute. It is used for conducting reasoning activities, such as obtaining inferred subsumers and subsumees, as well as checking entailment and consistency.
Ontology Pruning (OntologyPruner
): This sub-module aims to incorporate pruning algorithms for extracting a sub-ontology from an input ontology. We currently implement the one proposed in [2], which introduces subsumption axioms between the asserted (atomic or complex) parents and children of the class targeted for removal.
Ontology Verbalisation (OntologyVerbaliser
): The recursive concept verbaliser proposed in [4] is implemented here, which can automatically transform a complex logical expression into a textual sentence based on entity names or labels available in the ontology. See verbalising ontology concepts.
Ontology Projection (OntologyProjector
): The projection algorithm adopted in the OWL2Vec* ontology embeddings is implemented here; it transforms an ontology's TBox into a set of RDF triples. The relevant code is modified from the mOWL library.
Ontology Normalisation (OntologyNormaliser
): The implemented \\(\\mathcal{EL}\\) normalisation is also modified from the mOWL library, which is used to transform TBox axioms into normalised forms to support, e.g., geometric ontology embeddings.
Ontology Taxonomy (OntologyTaxonomy
): The taxonomy extracted from an ontology is a directed acyclic graph for the subsumption hierarchy, which is often used to support graph-based deep learning applications.
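A minimal usage sketch of the core Ontology class and its reasoner is given below. The ontology path and class IRI are placeholders, the method names follow the current API (see the changelog), and the reasoner is assumed to be exposed via the .reasoner attribute:
from deeponto.onto import Ontology\n\n# load an ontology (a JVM is initialised on first use)\nonto = Ontology(\"path_to_ontology.owl\")\n\n# retrieve a class by its IRI (placeholder IRI) and query its asserted parents/children\nowl_class = onto.get_owl_object(\"http://purl.obolibrary.org/obo/DOID_0060158\")\nparents = onto.get_asserted_parents(owl_class)\nchildren = onto.get_asserted_children(owl_class)\n\n# retrieve annotations (e.g., labels and synonyms) of the class\nannotations = onto.get_annotations(owl_class)\n\n# use the attached reasoner to obtain inferred subsumers and subsumees\nsupers = onto.reasoner.get_inferred_super_entities(owl_class)\nsubs = onto.reasoner.get_inferred_sub_entities(owl_class)\n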
Individual tools and resources are implemented based on the core ontology processing module. Currently, \\(\\textsf{DeepOnto}\\) supports the following:
BERTMap [1] is a BERT-based ontology matching (OM) system originally developed in repo but is now maintained in \\(\\textsf{DeepOnto}\\). See Ontology Matching with BERTMap & BERTMapLt.
Bio-ML [2] is an OM resource that has been used in the Bio-ML track of the OAEI. See Bio-ML: A Comprehensive Documentation.
BERTSubs [3] is a system for ontology subsumption prediction. We have transformed its original experimental code into this project. See Subsumption Inference with BERTSubs.
OntoLAMA [4] is a collection of language model probing datasets for ontology subsumption inference. See OntoLAMA: Dataset Overview & Usage Guide for the use of the datasets and the prompt-based probing approach.
License
Copyright 2021-2023 Yuan He. Copyright 2023 Yuan He, Jiaoyan Chen. All rights reserved.
Licensed under the Apache License, Version 2.0 (the \"License\"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
"},{"location":"#citation","title":"Citation","text":"The preprint of our system paper for \\(\\textsf{DeepOnto}\\) is currently available at arxiv.
Yuan He, Jiaoyan Chen, Hang Dong, Ian Horrocks, Carlo Allocca, Taehun Kim, and Brahmananda Sapkota. DeepOnto: A Python Package for Ontology Engineering with Deep Learning. arXiv preprint arXiv:2307.03067 (2023).
@article{he2023deeponto,\n title={DeepOnto: A Python Package for Ontology Engineering with Deep Learning},\n author={He, Yuan and Chen, Jiaoyan and Dong, Hang and Horrocks, Ian and Allocca, Carlo and Kim, Taehun and Sapkota, Brahmananda},\n journal={arXiv preprint arXiv:2307.03067},\n year={2023}\n}\n
"},{"location":"#relevant-publications","title":"Relevant Publications","text":"Please report any bugs or queries by raising a GitHub issue or sending emails to the maintainers (Yuan He or Jiaoyan Chen) through:
first_name.last_name@cs.ox.ac.uk
"},{"location":"bertmap/","title":"Ontology Matching with BERTMap and BERTMapLt","text":"Paper
Paper for BERTMap: BERTMap: A BERT-based Ontology Alignment System (AAAI-2022).
@inproceedings{he2022bertmap,\n title={BERTMap: a BERT-based ontology alignment system},\n author={He, Yuan and Chen, Jiaoyan and Antonyrajah, Denvar and Horrocks, Ian},\n booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},\n volume={36},\n number={5},\n pages={5684--5691},\n year={2022}\n}\n
This page gives the tutorial for the \\(\\textsf{BERTMap}\\) family, including a summary of the models and how to use them.
Figure 1. Pipeline illustration of BERTMap.
The ontology matching (OM) pipeline of \\(\\textsf{BERTMap}\\) consists of the following steps:
(src_annotation, tgt_annotation, synonym_label)
into training and validation sets.
Predict mappings for each class \\(c\\) of the source ontology \\(\\mathcal{O}\\) by:
Extend the raw predictions using an iterative algorithm based on the locality principle. To be specific, if \\(c\\) and \\(c'\\) are matched with a relatively high mapping score (\\(\\geq \\kappa\\)), then search for plausible mappings between the parents (resp. children) of \\(c\\) and the parents (resp. children) of \\(c'\\). This process is iterative because new highly scored mappings may be found at each round. Mapping extension terminates when no new mapping with a score \\(\\geq \\kappa\\) is found or the maximum number of iterations is exceeded. Note that \\(\\kappa\\) is set to \\(0.9\\) by default, as in the original paper.
Truncate the extended mappings by preserving only those with scores \\(\\geq \\lambda\\). In the original paper, \\(\\lambda\\) is supposed to be tuned on validation mappings, which are often not available. Moreover, \\(\\lambda\\) is not a sensitive hyperparameter in practice. Therefore, we manually set \\(\\lambda\\) to \\(0.9995\\) as a default value, which usually yields a higher F1 score. Note that both \\(\\kappa\\) and \\(\\lambda\\) are made available in the configuration file.
Repair the remaining mappings with the repair module built into LogMap (BERTMap does not focus on mapping repair). In short, a minimal set of inconsistent mappings is removed to further improve precision.
Steps 5-8 are referred to as the global matching process, which computes OM mappings from two input ontologies. \\(\\textsf{BERTMapLt}\\) is the lightweight version without BERT training and mapping refinement. The mapping filtering threshold for \\(\\textsf{BERTMapLt}\\) is \\(1.0\\) (i.e., string-matched).
In addition to the traditional OM procedure, the scoring modules of \\(\\textsf{BERTMap}\\) and \\(\\textsf{BERTMapLt}\\) can be used to evaluate any class pair given their selected annotations. This is useful in ranking-based evaluation.
Warning
The \\(\\textsf{BERTMap}\\) family relies on sufficient class annotations for constructing the training corpora of the BERT synonym classifier, especially under the unsupervised setting where there are no input mappings and/or external resources. It is very important to specify the correct annotation properties in the configuration file.
"},{"location":"bertmap/#usage","title":"Usage","text":"To use \\(\\textsf{BERTMap}\\), a configuration file and two input ontologies to be matched should be imported.
from deeponto.onto import Ontology\nfrom deeponto.align.bertmap import BERTMapPipeline\n\nconfig_file = \"path_to_config.yaml\"\nsrc_onto_file = \"path_to_the_source_ontology.owl\" \ntgt_onto_file = \"path_to_the_target_ontology.owl\" \n\nconfig = BERTMapPipeline.load_bertmap_config(config_file)\nsrc_onto = Ontology(src_onto_file)\ntgt_onto = Ontology(tgt_onto_file)\n\nBERTMapPipeline(src_onto, tgt_onto, config)\n
The default configuration file can be loaded as:
from deeponto.align.bertmap import BERTMapPipeline, DEFAULT_CONFIG_FILE\n\nconfig = BERTMapPipeline.load_bertmap_config(DEFAULT_CONFIG_FILE)\n
The loaded configuration is a CfgNode
object supporting attribute access of dictionary keys. To customise the configuration, users can either copy the DEFAULT_CONFIG_FILE
, save it locally using BERTMapPipeline.save_bertmap_config
method, and modify it accordingly; or change it at runtime.
from deeponto.align.bertmap import BERTMapPipeline, DEFAULT_CONFIG_FILE\n\nconfig = BERTMapPipeline.load_bertmap_config(DEFAULT_CONFIG_FILE)\n\n# save the configuration file\nBERTMapPipeline.save_bertmap_config(config, \"path_to_saved_config.yaml\")\n\n# modify it in the run time\n# for example, add more annotation properties for synonyms\nconfig.annotation_property_iris.append(\"http://...\") \n
If using \\(\\textsf{BERTMap}\\) for scoring class pairs instead of global matching, disable automatic global matching and load class pairs to be scored.
from deeponto.onto import Ontology\nfrom deeponto.align.bertmap import BERTMapPipeline\n\nconfig_file = \"path_to_config.yaml\"\nsrc_onto_file = \"path_to_the_source_ontology.owl\" \ntgt_onto_file = \"path_to_the_target_ontology.owl\" \n\nconfig = BERTMapPipeline.load_bertmap_config(config_file)\nconfig.global_matching.enabled = False\nsrc_onto = Ontology(src_onto_file)\ntgt_onto = Ontology(tgt_onto_file)\n\nbertmap = BERTMapPipeline(src_onto, tgt_onto, config)\n\nclass_pairs_to_be_scored = [...] # (src_class_iri, tgt_class_iri)\nfor src_class_iri, tgt_class_iri in class_pairs_to_be_scored:\n # retrieve class annotations\n src_class_annotations = bertmap.src_annotation_index[src_class_iri]\n tgt_class_annotations = bertmap.tgt_annotation_index[tgt_class_iri]\n # the bertmap score\n bertmap_score = bertmap.mapping_predictor.bert_mapping_score(\n src_class_annotations, tgt_class_annotations\n )\n # the bertmaplt score\n bertmaplt_score = bertmap.mapping_predictor.edit_similarity_mapping_score(\n src_class_annotations, tgt_class_annotations\n )\n ...\n
Tip
By default, the implemented \\(\\textsf{BERTMap}\\) searches, for each source ontology class, a set of possibly matched target ontology classes. Because of this, it is recommended to set the source ontology as the one with the smaller number of classes for efficiency.
Note that in the original paper, the model is expected to match for both directions src2tgt
and tgt2src
, and also consider the combination of both results. However, this does not usually bring better performance and consumes significantly more time. Therefore, this feature is discarded and the users can choose which direction to match.
Warning
Occasionally, the fine-tuning loss may not converge and the validation accuracy may not improve; in that case, setting a different random seed usually fixes the problem.
"},{"location":"bertmap/#configuration","title":"Configuration","text":"The default configuration file looks like:
model: bertmap # bertmap or bertmaplt\n\noutput_path: null # if not provided, the current path \".\" is used\n\nannotation_property_iris:\n- http://www.w3.org/2000/01/rdf-schema#label # rdfs:label\n- http://www.geneontology.org/formats/oboInOwl#hasSynonym\n- http://www.geneontology.org/formats/oboInOwl#hasExactSynonym\n- http://www.w3.org/2004/02/skos/core#exactMatch\n- http://www.ebi.ac.uk/efo/alternative_term\n- http://www.orpha.net/ORDO/Orphanet_#symbol\n- http://purl.org/sig/ont/fma/synonym\n- http://www.w3.org/2004/02/skos/core#prefLabel\n- http://www.w3.org/2004/02/skos/core#altLabel\n- http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#P108\n- http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#P90\n\n# additional corpora \nknown_mappings: null # cross-ontology corpus\nauxiliary_ontos: [] # auxiliary corpus\n\n# bert config\nbert:\n  pretrained_path: emilyalsentzer/Bio_ClinicalBERT\n  max_length_for_input: 128\n  num_epochs_for_training: 3.0\n  batch_size_for_training: 32\n  batch_size_for_prediction: 128\n  resume_training: null\n\n# global matching config\nglobal_matching:\n  enabled: true\n  num_raw_candidates: 200\n  num_best_predictions: 10\n  mapping_extension_threshold: 0.9\n  mapping_filtered_threshold: 0.9995\n  for_oaei: false\n
"},{"location":"bertmap/#bertmap-or-bertmaplt","title":"BERTMap or BERTMapLt","text":"config.model
By changing this parameter to bertmap
or bertmaplt
, users can switch between \\(\\textsf{BERTMap}\\) and \\(\\textsf{BERTMapLt}\\). Note that \\(\\textsf{BERTMapLt}\\) does not use any training and mapping refinement parameters."},{"location":"bertmap/#annotation-properties","title":"Annotation Properties","text":"config.annotation_property_iris
The IRIs stored in this parameter refer to annotation properties with literal values that define the synonyms of an ontology class. Many ontology matching systems rely on synonyms for good performance, including the \\(\\textsf{BERTMap}\\) family. The default config.annotation_property_iris
are in line with the Bio-ML dataset, which will be constantly updated. Users can append or delete IRIs for specific input ontologies. Note that it is safe to specify all possible annotation properties regardless of input ontologies because the ones that are not used will be ignored.
"},{"location":"bertmap/#additional-training-data","title":"Additional Training Data","text":"The text semantics corpora by default (unsupervised setting) will consist of two intra-ontology sub-corpora built from two input ontologies (based on the specified annotation properties). To add more training data, users can opt to feed input mappings (cross-ontology sub-corpus) and/or a list of auxiliary ontologies (auxiliary sub-corpora).
config.known_mappings
Specify the path to input mapping file here; the input mapping file should be a .tsv
or .csv
file with three columns with headings: [\"SrcEntity\", \"TgtEntity\", \"Score\"]
. Each row corresponds to a triple \\((c, c', s(c, c'))\\) where \\(c\\) is a source ontology class, \\(c'\\) is a target ontology class, and \\(s(c, c')\\) is the matching score. Note that in the BERTMap context, input mappings are assumed to be gold standard (reference) mappings with scores equal to \\(1.0\\). Regardless of scores specified in the mapping file, the scores of the input mappings will be adjusted to \\(1.0\\) automatically. config.auxiliary_ontos
Specify a list of paths to auxiliary ontology files here. For each auxiliary ontology, a corresponding intra-ontology corpus will be created and thus produce more synonym and non-synonym samples."},{"location":"bertmap/#bert-settings","title":"BERT Settings","text":"config.bert.pretrained_path
\\(\\textsf{BERTMap}\\) uses the pre-trained Bio-Clinical BERT as specified in this parameter because it was originally applied to biomedical ontologies. For general-purpose ontology matching, users can use pre-trained variants such as bert-base-uncased
. config.bert.batch_size_for_training
Batch size for BERT fine-tuning. config.bert.batch_size_for_prediction
Batch size for BERT validation and mapping prediction. Adjust these two parameters if the model does not fit into GPU memory.
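For example, these batch sizes can be lowered at runtime if GPU memory is limited (the values below are only illustrative):
from deeponto.align.bertmap import BERTMapPipeline, DEFAULT_CONFIG_FILE\n\nconfig = BERTMapPipeline.load_bertmap_config(DEFAULT_CONFIG_FILE)\nconfig.bert.batch_size_for_training = 16   # halve again if CUDA still runs out of memory\nconfig.bert.batch_size_for_prediction = 64\n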
config.bert.resume_training
Set to true
if the BERT training process is somehow interrupted and users wish to continue training."},{"location":"bertmap/#global-matching-settings","title":"Global Matching Settings","text":"config.global_matching.enabled
As mentioned in usage, users can disable automatic global matching by setting this parameter to false
if they wish to use the mapping scoring module only. config.global_matching.num_raw_candidates
Set the number of raw candidates selected in the mapping prediction phase. config.global_matching.num_best_predictions
Set the number of best scored mappings preserved in the mapping prediction phase. The default value 10
is often more than enough. config.global_matching.mapping_extension_threshold
Set the score threshold of mappings used in the iterative mapping extension process. A higher value shortens the runtime but reduces recall. config.global_matching.mapping_filtered_threshold
The score threshold of mappings preserved for final mapping refinement. config.global_matching.for_oaei
Set to false
for normal use and set to true
for the OAEI 2023 Bio-ML Track such that entities that are annotated as not used in alignment will be ignored during global matching."},{"location":"bertmap/#output-format","title":"Output Format","text":"Running \\(\\textsf{BERTMap}\\) will create a directory named bertmap
or bertmaplt
in the specified output path. The file structure of this directory is as follows:
bertmap\n├── data\n│   ├── fine-tune.data.json\n│   └── text-semantics.corpora.json\n├── bert\n│   ├── tensorboard\n│   ├── checkpoint-{some_number}\n│   └── checkpoint-{some_number}\n├── match\n│   ├── logmap-repair\n│   ├── raw_mappings.json\n│   ├── repaired_mappings.tsv\n│   ├── raw_mappings.tsv\n│   ├── extended_mappings.tsv\n│   └── filtered_mappings.tsv\n├── bertmap.log\n└── config.yaml\n
It is worth mentioning that the match
sub-directory contains all the global matching files:
raw_mappings.tsv
The raw mapping predictions before mapping refinement. The .json
one is used internally to prevent accidental interruption. Note that bertmaplt
only produces raw mapping predictions (no mapping refinement). extended_mappings.tsv
The output mappings after applying mapping extension. filtered_mappings.tsv
The output mappings after mapping extension and threshold filtering. logmap-repair
A folder containing intermediate files needed for applying LogMap's debugger. repaired_mappings.tsv
The final output mappings after mapping repair."},{"location":"bertsubs/","title":"Subsumption Prediction with BERTSubs","text":"Paper
Paper for BERTSubs: Contextual Semantic Embeddings for Ontology Subsumption Prediction (accepted by WWW Journal in 2023).
@article{chen2023contextual,\n title={Contextual semantic embeddings for ontology subsumption prediction},\n author={Chen, Jiaoyan and He, Yuan and Geng, Yuxia and Jim{\\'e}nez-Ruiz, Ernesto and Dong, Hang and Horrocks, Ian},\n journal={World Wide Web},\n pages={1--23},\n year={2023},\n publisher={Springer}\n}\n
This page gives the tutorial for \\(\\textsf{BERTSubs}\\), including its functions, a summary of the models, and usage instructions.
The current version of \\(\\textsf{BERTSubs}\\) is able to predict:
Figure 1. Pipeline illustration of BERTSubs.
The pipeline of \\(\\textsf{BERTSubs}\\) consists of the following steps.
Corpus Construction: extracting a set of sentence pairs from positive and negative subsumptions from the target ontology (or ontologies), with one of three templates (isolated, traversal, or path) used for transforming each class into a sentence.
Model Fine-tuning: fine-tuning a language model such as BERT with the above sentence pairs.
Note that the optionally given subsumptions via a train subsumption file can also be used for fine-tuning. Please see more technical details in the paper.
"},{"location":"bertsubs/#evaluation-case-and-dataset-ontology-completion","title":"Evaluation Case and Dataset (Ontology Completion)","text":"The evaluation is implemented scripts/bertsubs_intra_evaluate.py. Download an ontology (e.g., FoodOn) and run:
python bertsubs_intra_evaluate.py --onto_file ./foodon-merged.0.4.8.owl\n
The parameter --subsumption_type can be set to 'restriction' for complex class subsumptions, and 'named_class' for named class subsumptions. Please see the script for more parameters and their meanings.
It executes the following procedure:
The named class or complex class subsumption axioms of an ontology are partitioned into a training set, a validation set, and a test set. They are saved as train, valid, and test files, respectively.
The test and the valid subsumption axioms are removed from the original ontology, and a new ontology is saved.
Notice: for a named class test/valid subsumption, a set of negative candidate super classes are extracted from the ground truth super class's neighbourhood. For a complex class test/valid subsumption, a set of negative candidate super classes are randomly extracted from all the complex classes in the ontology.
"},{"location":"bertsubs/#usage","title":"Usage","text":"To run \\(\\textsf{BERTSubs}\\), a configuration file and one input ontology (or two ontologies) are mandatory. If candidate class pairs are given, a fine-tuned language model and a file with predicted scores of the candidate class pairs in the test file are output; otherwise, only the fine-grained language model is output. The test metrics (MRR and Hits@K) can also be output if the ground truth and a set of negative candidate super classes are given for the subclass of each valid/test subsumption.
The following code is for intra-ontology subsumption.
from yacs.config import CfgNode\nfrom deeponto.complete.bertsubs import BERTSubsIntraPipeline, DEFAULT_CONFIG_FILE_INTRA\nfrom deeponto.utils import load_file\nfrom deeponto.onto import Ontology\n\nconfig = CfgNode(load_file(DEFAULT_CONFIG_FILE_INTRA)) # Load default configuration file\nconfig.onto_file = './foodon.owl'\nconfig.train_subsumption_file = './train_subsumptions.csv' # optional\nconfig.valid_subsumption_file = './valid_subsumptions.csv' # optional\nconfig.test_subsumption_file = './test_subsumptions.csv' #optional\nconfig.test_type = 'evaluation' #'evaluation': calculate metrics with ground truths given in the test_subsumption_file; 'prediction': predict scores for candidate subsumptions given in test_submission_file\nconfig.subsumption_type = 'named_class' # 'named_class' or 'restriction' \nconfig.prompt.prompt_type = 'isolated' # 'isolated', 'traversal', 'path' (three templates)\n\nonto = Ontology(owl_path=config.onto_file)\nintra_pipeline = BERTSubsIntraPipeline(onto=onto, config=config)\n
The following code is for inter-ontology subsumption.
from yacs.config import CfgNode\nfrom deeponto.complete.bertsubs import BERTSubsInterPipeline, DEFAULT_CONFIG_FILE_INTER\nfrom deeponto.utils import load_file\nfrom deeponto.onto import Ontology\n\nconfig = CfgNode(load_file(DEFAULT_CONFIG_FILE_INTER)) # Load default configuration file\nconfig.src_onto_file = './helis2foodon/helis_v1.00.owl'\nconfig.tgt_onto_file = './helis2foodon/foodon-merged.0.4.8.subs.owl'\nconfig.train_subsumption_file = './helis2foodon/train_subsumptions.csv' # optional\nconfig.valid_subsumption_file = './helis2foodon/valid_subsumptions.csv' # optional\nconfig.test_subsumption_file = './helis2foodon/test_subsumptions.csv' # optional\nconfig.test_type = 'evaluation' # 'evaluation', or 'prediction'\nconfig.subsumption_type = 'named_class' # 'named_class', or 'restriction'\nconfig.prompt.prompt_type = 'path' # 'isolated', 'traversal', 'path' (three templates)\n\nsrc_onto = Ontology(owl_path=config.src_onto_file)\ntgt_onto = Ontology(owl_path=config.tgt_onto_file)\ninter_pipeline = BERTSubsInterPipeline(src_onto=src_onto, tgt_onto=tgt_onto, config=config)\n
For more details on the configuration, please see the comment in the default configuration files default_config_intra.yaml and default_config_inter.yaml.
"},{"location":"bio-ml/","title":"Bio-ML: A Comprehensive Documentation","text":"paper
Paper for Bio-ML: Machine Learning-Friendly Biomedical Datasets for Equivalence and Subsumption Ontology Matching (ISWC 2022). It was nominated as the best resource paper candidate at ISWC 2022.
@inproceedings{he2022machine,\n title={Machine Learning-Friendly Biomedical Datasets for Equivalence and Subsumption Ontology Matching},\n author={He, Yuan and Chen, Jiaoyan and Dong, Hang and Jim{\\'e}nez-Ruiz, Ernesto and Hadian, Ali and Horrocks, Ian},\n booktitle={The Semantic Web--ISWC 2022: 21st International Semantic Web Conference, Virtual Event, October 23--27, 2022, Proceedings},\n pages={575--591},\n year={2022},\n organization={Springer}\n}\n
"},{"location":"bio-ml/#overview","title":"Overview","text":"\\(\\textsf{Bio-ML}\\) is a comprehensive ontology matching (OM) dataset that includes five ontology pairs for both equivalence and subsumption ontology matching. Two of these pairs are based on the Mondo ontology, and the remaining three are based on the UMLS ontology. The construction of these datasets encompasses several steps:
Dataset Download (License: CC BY 4.0 International):
Complete Documentation: https://krr-oxford.github.io/DeepOnto/bio-ml/ (this page).
In order to derive scalable Ontology Matching (OM) pairs, the ontology pruning algorithm proposed in the \\(\\textsf{Bio-ML}\\) paper can be utilised. This algorithm is designed to trim a large-scale ontology based on certain criteria, such as involvement in a reference mapping or association with a particular semantic type (see UMLS data scripts). The primary goal of the pruning function is to discard irrelevant ontology classes whilst preserving the relevant hierarchical structure.
More specifically, for each class, denoted as \\(c\\), that needs to be removed, subsumption axioms are created between the parent and child elements of \\(c\\). This step is followed by the removal of all axioms related to the unwanted classes.
Once a list of class IRIs to be removed has been compiled, the ontology pruning can be executed using the following code:
from deeponto.onto import Ontology, OntologyPruner\n\n# Load the DOID ontology\ndoid = Ontology(\"doid.owl\")\n\n# Initialise the ontology pruner\npruner = OntologyPruner(doid)\n\n# Specify the classes to be removed\nto_be_removed_class_iris = [\n \"http://purl.obolibrary.org/obo/DOID_0060158\",\n \"http://purl.obolibrary.org/obo/DOID_9969\"\n]\n\n# Perform the pruning operation\npruner.prune(to_be_removed_class_iris)\n\n# Save the pruned ontology locally\npruner.save_onto(\"doid.pruned.owl\") \n
"},{"location":"bio-ml/#subsumption-mapping-construction","title":"Subsumption Mapping Construction","text":"Ontology Matching (OM) datasets often include equivalence matching, but not subsumption matching. However, it is feasible to create a subsumption matching task from an equivalence matching task. Given a list of reference equivalence mappings, which take the form of \\({(c, c') | c \\equiv c' }\\), one can construct reference subsumption mappings by identifying the subsumers of \\(c'\\) and producing \\({(c, c'') | c \\equiv c', c' \\sqsubseteq c'' }\\). We have developed a subsumption mapping generator for this purpose.
from deeponto.onto import Ontology\nfrom deeponto.align.mapping import SubsFromEquivMappingGenerator, ReferenceMapping\n\n# Load the NCIT and DOID ontologies\nncit = Ontology(\"ncit.owl\")\ndoid = Ontology(\"doid.owl\")\n\n# Load the equivalence mappings\nncit2doid_equiv_mappings = ReferenceMapping.read_table_mappings(\"ncit2doid_equiv_mappings.tsv\") # The headings are [\"SrcEntity\", \"TgtEntity\", \"Score\"]\n\n# Initialise the subsumption mapping generator \n# and the mapping construction is automatically done\nsubs_generator = SubsFromEquivMappingGenerator(\n ncit, doid, ncit2doid_equiv_mappings, \n subs_generation_ratio=1, delete_used_equiv_tgt_class=True\n)\n
Output:
3299/4686 are used for creating at least one subsumption mapping.\n3305 subsumption mappings are created in the end.\n
Retrieve the generated subsumption mappings with:
subs_generator.subs_from_equivs\n
Output:
[('http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#C9311',\n 'http://purl.obolibrary.org/obo/DOID_120',\n 1.0),\n ('http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#C8410',\n 'http://purl.obolibrary.org/obo/DOID_1612',\n 1.0), ...]\n
See a concrete data script for this process at OAEI-Bio-ML/data_scripts/generate_subs_maps.py
.
The subs_generation_ratio
parameter determines at most how many subsumption mappings can be generated from an equivalence mapping. The delete_used_equiv_tgt_class
parameter determines whether or not to invalidate equivalence mappings that have been used for creating at least one subsumption mapping. If it is set to True
, then the target side of a used equivalence mapping will be marked as deleted from the target ontology. Ontology pruning can then be applied to the list of to-be-deleted target ontology classes:
from deeponto.onto import OntologyPruner\n\npruner = OntologyPruner(doid)\npruner.prune(subs_generator.used_equiv_tgt_class_iris)\npruner.save_onto(\"doid.subs.owl\")\n
See a concrete data script for this process at OAEI-Bio-ML/data_scripts/generate_cand_maps.py
.
Note
In the OAEI 2023 version, the target class deletion is disabled as modularisation counteracts the effects of such deletion. For more details, refer to OAEI Bio-ML 2023.
"},{"location":"bio-ml/#candidate-mapping-generation","title":"Candidate Mapping Generation","text":"To evaluate an Ontology Matching (OM) model's capacity to identify correct mappings amid a pool of challenging negative candidates, we utilise the negative candidate mapping generation algorithm as proposed in the Bio-ML paper. This algorithm uses idf_sample
to generate candidates that are textually ambiguous (i.e., with similar naming), and neighbour_sample
to generate candidates that are structurally ambiguous (e.g., siblings). The algorithm ensures that none of the reference mappings are added as negative candidates. Additionally, for subsumption cases, the algorithm carefully excludes ancestors as they are technically correct subsumptions.
Use the following Python code to perform this operation:
import pandas as pd\n\nfrom deeponto.onto import Ontology\nfrom deeponto.align.mapping import NegativeCandidateMappingGenerator, ReferenceMapping\nfrom deeponto.align.bertmap import BERTMapPipeline\nfrom deeponto.utils import Tokenizer  # sub-word tokenizer wrapper (import path assumed)\n\n# Load the NCIT and DOID ontologies\nncit = Ontology(\"ncit.owl\")\ndoid = Ontology(\"doid.owl\")\n\n# Load the equivalence mappings\nncit2doid_equiv_mappings = ReferenceMapping.read_table_mappings(\"ncit2doid_equiv_mappings.tsv\") # The headings are [\"SrcEntity\", \"TgtEntity\", \"Score\"]\n\n# Load default config in BERTMap\nconfig = BERTMapPipeline.load_bertmap_config()\n\n# Initialise the negative candidate mapping generator\ncand_generator = NegativeCandidateMappingGenerator(\n    ncit, doid, ncit2doid_equiv_mappings, \n    annotation_property_iris = config.annotation_property_iris, # Used for idf sample\n    tokenizer=Tokenizer.from_pretrained(config.bert.pretrained_path), # Used for idf sample\n    max_hops=5, # Used for neighbour sample\n    for_subsumptions=False, # Set to False because the input mappings in this example are equivalence mappings\n)\n\n# Sample candidate mappings for each reference equivalence mapping\nresults = []\nfor test_map in ncit2doid_equiv_mappings:\n    valid_tgts, stats = cand_generator.mixed_sample(test_map, idf=50, neighbour=50)\n    print(f\"STATS for {test_map}:\\n{stats}\")\n    results.append((test_map.head, test_map.tail, valid_tgts))\nresults = pd.DataFrame(results, columns=[\"SrcEntity\", \"TgtEntity\", \"TgtCandidates\"])\nresults.to_csv(\"test.cands.tsv\", sep=\"\\t\", index=False)  # a concrete file name replaces the undefined result_path placeholder\n
See a concrete data script for this process at OAEI-Bio-ML/data_scripts/generate_cand_maps.py
.
The process of sampling using idf scores was originally proposed in the BERTMap paper. The annotation_property_iris
parameter specifies the list of annotation properties used to extract the names or aliases of an ontology class. The tokenizer
parameter refers to a pre-trained sub-word level tokenizer used to build the inverted annotation index. These aspects are thoroughly explained in the BERTMap tutorial.
Our evaluation protocol concerns two scenarios for OM: global matching for overall assessment and local ranking for partial assessment.
"},{"location":"bio-ml/#global-matching","title":"Global Matching","text":"As an overall assessment, given a complete set of reference mappings, an OM system is expected to compute a set of true mappings and compare against the reference mappings using Precision, Recall, and F-score metrics. With \\(\\textsf{DeepOnto}\\), the evaluation can be performed using the following code.
Matching Result
Download an example of matching result file. The three columns, \"SrcEntity\"
, \"TgtEntity\"
, and \"Score\"
refer to the source class IRI, the target class IRI, and the matching score.
from deeponto.align.evaluation import AlignmentEvaluator\nfrom deeponto.align.mapping import ReferenceMapping, EntityMapping\n\n# load prediction mappings and reference mappings\npreds = EntityMapping.read_table_mappings(f\"{experiment_dir}/bertmap/match/repaired_mappings.tsv\")\nrefs = ReferenceMapping.read_table_mappings(f\"{data_dir}/refs_equiv/full.tsv\")\n\n# compute the precision, recall and F-score metrics\nresults = AlignmentEvaluator.f1(preds, refs)\nprint(results)\n
The associated formulas for Precision, Recall and F-score are:
\\[P = \\frac{|\\mathcal{M}_{pred} \\cap \\mathcal{M}_{ref}|}{|\\mathcal{M}_{pred}|}, \\ \\ R = \\frac{|\\mathcal{M}_{pred} \\cap \\mathcal{M}_{ref}|}{|\\mathcal{M}_{ref}|}, \\ \\ F_1 = \\frac{2 P R}{P + R}\\]where \\(\\mathcal{M}_{pred}\\) and \\(\\mathcal{M}_{ref}\\) denote the prediction mappings and reference mappings, respectively.
Output:
{'P': 0.887, 'R': 0.879, 'F1': 0.883}\n
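As a plain illustration of the formulas above (independent of DeepOnto), treating mappings as (source, target) pairs:
preds = {(\"a\", \"x\"), (\"b\", \"y\"), (\"c\", \"z\")}   # prediction mappings (illustrative)\nrefs = {(\"a\", \"x\"), (\"b\", \"y\"), (\"d\", \"w\")}    # reference mappings\n\nprecision = len(preds & refs) / len(preds)      # 2/3\nrecall = len(preds & refs) / len(refs)          # 2/3\nf1 = 2 * precision * recall / (precision + recall)\nprint(round(precision, 3), round(recall, 3), round(f1, 3))  # 0.667 0.667 0.667\n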
For the semi-supervised setting where a small set of training mappings is provided, the training set should also be loaded and set as null (neither positive nor negative) with null_reference_mappings
during evaluation:
train_refs = ReferenceMapping.read_table_mappings(f\"{data_dir}/refs_equiv/train.tsv\")\nresults = AlignmentEvaluator.f1(preds, refs, null_reference_mappings=train_refs)\n
When null reference mappings are involved, the formulas of Precision and Recall become:
\\[P = \\frac{|(\\mathcal{M}_{pred} \\cap \\mathcal{M}_{ref}) - \\mathcal{M}_{null}|}{|\\mathcal{M}_{pred} - \\mathcal{M}_{null} |}, \\ \\ R = \\frac{|(\\mathcal{M}_{pred} \\cap \\mathcal{M}_{ref}) - \\mathcal{M}_{null}|}{|\\mathcal{M}_{ref} - \\mathcal{M}_{null}|}\\]As for the OAEI 2023 version, some prediction mappings could involve classes that are marked as not used in alignment. Therefore, we need to filter out those mappings before evaluation.
from deeponto.onto import Ontology\nfrom deeponto.align.oaei import *\n\n# load the source and target ontologies and \n# extract classes that are marked as not used in alignment\nsrc_onto = Ontology(\"src_onto_file\")\ntgt_onto = Ontology(\"tgt_onto_file\")\nignored_class_index = get_ignored_class_index(src_onto)\nignored_class_index.update(get_ignored_class_index(tgt_onto))\n\n# filter the prediction mappings\npreds = remove_ignored_mappings(preds, ignored_class_index)\n\n# then compute the results\nresults = AlignmentEvaluator.f1(preds, refs, ...)\n
Tip
We have encapsulated above features in the matching_eval
function in the OAEI utilities.
However,
Therefore, the ranking-based evaluation protocol is presented as follows.
"},{"location":"bio-ml/#local-ranking","title":"Local Ranking","text":"An OM system is also expected to distinguish the reference mapping among a set of candidate mappings and the performance can be reflected in Hits@K and MRR metrics.
Warning
The reference subsumption mappings are inherently incomplete, so only the ranking metrics are adopted in evaluating system performance in subsumption matching.
Ranking Result
Download an example of raw (unscored) candidate mapping file and an example of scored candidate mapping file. The \"SrcEntity\"
and \"TgtEntity\"
columns refer to the source class IRI and the target class IRI involved in a reference mapping. The \"TgtCandidates\"
column stores a sequence of tgt_cand_iri
in the unscored file and a list of tuples (tgt_cand_iri, score)
in the scored file, which can be accessed by the built-in Python function eval
.
With \\(\\textsf{DeepOnto}\\), the evaluation can be performed as follows. First, an OM system needs to assign a score to each target candidate class and save the results as a list of tuples (tgt_cand_class_iri, matching_score)
.
from deeponto.utils import read_table\nimport pandas as pd\n\ntest_candidate_mappings = read_table(\"test.cands.tsv\").values.tolist()\nranking_results = []\nfor src_ref_class, tgt_ref_class, tgt_cands in test_candidate_mappings:\n    tgt_cands = eval(tgt_cands) # transform string into list or sequence\n    scored_cands = []\n    for tgt_cand in tgt_cands:\n        # assign a score to each candidate with an OM system\n        ...\n        scored_cands.append((tgt_cand, matching_score))\n    ranking_results.append((src_ref_class, tgt_ref_class, scored_cands))\n# save the scored candidate mappings in the same format as the original `test.cands.tsv`\npd.DataFrame(ranking_results, columns=[\"SrcEntity\", \"TgtEntity\", \"TgtCandidates\"]).to_csv(\"scored.test.cands.tsv\", sep=\"\\t\", index=False)\n
Then, the ranking evaluation results can be obtained by:
from deeponto.align.oaei import *\n\n# If `has_score` is False, assume default ranking (see tips below)\nranking_eval(\"scored.test.cands.tsv\", has_score=True, Ks=[1, 5, 10])\n
Output:
{'MRR': 0.9586373098280843,\n 'Hits@1': 0.9371951219512196,\n 'Hits@5': 0.9820121951219513,\n 'Hits@10': 0.9878048780487805}\n
The associated formulas for MRR and Hits@K are:
\\[MRR = \\sum_i^N rank_i^{-1} / N, \\ \\ Hits@K = \\sum_i^N \\mathbb{I}_{rank_i \\leq K} / N\\]where \\(N\\) is the number of reference mappings used for testing, and \\(rank_i\\) is the relative rank of the \\(i\\)-th reference mapping among its candidate mappings.
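As a plain illustration of these formulas (independent of DeepOnto), given the 1-based ranks of the reference mappings among their candidates:
def mrr_and_hits(ranks, ks=(1, 5, 10)):\n    # compute MRR and Hits@K from the 1-based ranks of reference mappings\n    n = len(ranks)\n    metrics = {\"MRR\": sum(1.0 / r for r in ranks) / n}\n    for k in ks:\n        metrics[f\"Hits@{k}\"] = sum(r <= k for r in ranks) / n\n    return metrics\n\nprint(mrr_and_hits([1, 2, 1, 4]))  # {'MRR': 0.6875, 'Hits@1': 0.5, 'Hits@5': 1.0, 'Hits@10': 1.0}\n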
Tip
If matching scores are not available, the target candidate classes should be sorted in descending order and saved in a list, the ranking_eval
function will compute scores according to the sorted list.
Below is a table showing the data statistics for the original Bio-ML used in OAEI 2022. In the Category column, \"Disease\" indicates that the data from Mondo mainly covers disease concepts, while \"Body\", \"Pharm\", and \"Neoplas\" denote semantic types of \"Body Part, Organ, or Organ Components\", \"Pharmacologic Substance\", and \"Neoplastic Process\" in UMLS, respectively.
Note that each subsumption matching task is constructed from an equivalence matching task subject to target ontology class deletion, therefore #TgtCls (subs)
differs from #TgtCls
.
| Source | Task | Category | #SrcCls | #TgtCls | #TgtCls(\\(\\sqsubseteq\\)) | #Ref(\\(\\equiv\\)) | #Ref(\\(\\sqsubseteq\\)) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Mondo | OMIM-ORDO | Disease | 9,642 | 8,838 | 8,735 | 3,721 | 103 |
| Mondo | NCIT-DOID | Disease | 6,835 | 8,448 | 5,113 | 4,686 | 3,339 |
| UMLS | SNOMED-FMA | Body | 24,182 | 64,726 | 59,567 | 7,256 | 5,506 |
| UMLS | SNOMED-NCIT | Pharm | 16,045 | 15,250 | 12,462 | 5,803 | 4,225 |
| UMLS | SNOMED-NCIT | Neoplas | 11,271 | 13,956 | 13,790 | 3,804 | 213 |
The datasets, which can be downloaded from Zenodo, include Mondo.zip
and UMLS.zip
for resources constructed from Mondo and UMLS, respectively. Each .zip
file contains three folders: raw_data
, equiv_match
, and subs_match
, corresponding to the raw source ontologies, data for equivalence matching, and data for subsumption matching, respectively. The detailed file structure is illustrated in the figure below.
"},{"location":"bio-ml/#oaei-bio-ml-2023","title":"OAEI Bio-ML 2023","text":"
For the OAEI 2023 version, we implemented several updates, including:
Locality Module Enrichment: In response to the loss of ontology context due to pruning, we used the locality module technique (access the code) to enrich pruned ontologies with logical modules that provide context for existing classes. To ensure the completeness of reference mappings, the new classes added are annotated as not used in alignment with the annotation property use_in_alignment
set to false
. While these supplemental classes can be used by OM systems as auxiliary information, they can be excluded from the alignment process. Even if they are considered in the final output mappings, our evaluation will ensure that they are excluded from the metric computation (see Evaluation Framework).
Simplified Task Settings: For each of the five OM pairs, we simplified the task settings to the following:
{task_name}/refs_equiv/full.tsv
is used for global matching evaluation.{task_name}/refs_equiv/test.tsv
is used for global matching evaluation.{task_name}/refs_equiv/test.cands.tsv
for local ranking evaluation.Subsumption Matching:
{task_name}/refs_subs/test.cands.tsv
. Bio-LLM: A Special Sub-Track for Large Language Models: We introduced a unique sub-track for Large Language Model (LLM)-based OM systems. We extracted small but challenging subsets from the NCIT-DOID and SNOMED-FMA (Body) datasets for this purpose (refer to OAEI Bio-LLM 2023).
The table below shows the data statistics for the OAEI 2023 version of Bio-ML, where the input ontologies are enriched with locality modules compared to the pruned versions used in OAEI 2022. The augmented structural and logical contexts make these ontologies more similar to their original, unprocessed versions (available at raw_data
). The changes compared to the previous version (see Bio-ML OAEI 2022) are reflected in the +
numbers of ontology classes.
In the Category column, \"Disease\" indicates that the Mondo data are mainly about disease concepts, while \"Body\", \"Pharm\", and \"Neoplas\" denote semantic types of \"Body Part, Organ, or Organ Components\", \"Pharmacologic Substance\", and \"Neoplastic Process\" in UMLS, respectively.
| Source | Task | Category | #SrcCls | #TgtCls | #Ref(\\(\\equiv\\)) | #Ref(\\(\\sqsubseteq\\)) |
| --- | --- | --- | --- | --- | --- | --- |
| Mondo | OMIM-ORDO | Disease | 9,648 (+6) | 9,275 (+437) | 3,721 | 103 |
| Mondo | NCIT-DOID | Disease | 15,762 (+8,927) | 8,465 (+17) | 4,686 | 3,339 |
| UMLS | SNOMED-FMA | Body | 34,418 (+10,236) | 88,955 (+24,229) | 7,256 | 5,506 |
| UMLS | SNOMED-NCIT | Pharm | 29,500 (+13,455) | 22,136 (+6,886) | 5,803 | 4,225 |
| UMLS | SNOMED-NCIT | Neoplas | 22,971 (+11,700) | 20,247 (+6,291) | 3,804 | 213 |
The file structure for the download datasets (from Zenodo) is also simplified this year to accommodate the changes. Detailed structure is presented in the following figure.
Remarks on this figure:
refs_equiv/full.tsv
in the unsupervised setting, and on refs_equiv/test.tsv
(with refs_equiv/train.tsv
set to null reference mappings) in the semi-supervised setting. Testing of the local ranking evaluation should be performed on refs_equiv/test.cands.tsv
for both settings.refs_equiv/test.cands.tsv
and the training mapping set refs_subs/train.tsv
is optional.test.cands.tsv
file in the Bio-LLM sub-track is different from the main Bio-LM track ones. See OAEI Bio-LLM 2023 for more information and how to evaluate on it.As Large Language Models (LLMs) are trending in the AI community, we formulate a special sub-track for evaluating LLM-based OM systems. However, evaluating LLMs with the current OM datasets can be time and resource intensive. To yield insightful results prior to full implementation, we leverage two challenging subsets extracted from the NCIT-DOID and the SNOMED-FMA (Body) equivalence matching datasets.
For each original dataset, we first randomly select 50 matched class pairs from ground truth mappings, but excluding pairs that can be aligned with direct string matching (i.e., having at least one shared label) to restrict the efficacy of conventional lexical matching. Next, with a fixed source ontology class, we further select 99 negative target ontology classes, thus forming a total of 100 candidate mappings (inclusive of the ground truth mapping). This selection is guided by the sub-word inverted index-based idf scores as in the BERTMap paper (see BERTMap tutorial for more details), which are capable of producing target ontology classes lexically akin to the fixed source class. We finally randomly choose 50 source classes that do not have a matched target class according to the ground truth mappings, and create 100 candidate mappings using the inverted index for each. Therefore, each subset comprises 50 source ontology classes with a match and 50 without. Each class is associated with 100 candidate mappings, culminating in a total extraction of 10,000, i.e., (50+50)*100, class pairs.
"},{"location":"bio-ml/#evaluation","title":"Evaluation","text":""},{"location":"bio-ml/#matching","title":"Matching","text":"From all the 10,000 class pairs in a given subset, the OM system is expected to predict the true mappings among them, which can be compared against the 50 available ground truth mappings using Precision, Recall, and F-score.
We use the same formulas in the main track evaluation framework to calculate Precision, Recall, and F-score. The prediction mappings \\(\\mathcal{M}_{pred}\\) are the class pairs an OM system predicts as true mappings, and the reference mappings \\(\\mathcal{M}_{ref}\\) refers to the 50 matched pairs.
\\[P = \\frac{|\\mathcal{M}_{pred} \\cap \\mathcal{M}_{ref}|}{|\\mathcal{M}_{pred}|}, \\ \\ R = \\frac{|\\mathcal{M}_{pred} \\cap \\mathcal{M}_{ref}|}{|\\mathcal{M}_{ref}|}, \\ \\ F_1 = \\frac{2 P R}{P + R}\\]"},{"location":"bio-ml/#ranking","title":"Ranking","text":"Given that each source class is associated with 100 candidate mappings, we can compute ranking-based metrics based on their scores. Specifically, we calculate:
\\(Hits@1\\) for the 50 matched source classes, counting a hit when the top-ranked candidate mapping is a ground truth mapping. The corresponding formula is:
\\[ Hits@K = \\sum_{(c, c') \\in \\mathcal{M}_{ref}} \\mathbb{I}_{rank_{c'} \\leq K} / |\\mathcal{M}_{ref}| \\]where \\(rank_{c'}\\) is the predicted relative rank of \\(c'\\) among its candidates, \\(\\mathbb{I}_{rank_{c'} \\leq K}\\) is a binary indicator function that outputs 1 if the rank is less than or equal to \\(K\\) and outputs 0 otherwise.
The \\(MRR\\) score is also computed for these matched source classes, summing the inverses of the ground truth mappings' relative ranks among candidate mappings. The corresponding formula is:
\\[ MRR = \\sum_{(c, c') \\in \\mathcal{M}_{ref}} rank_{c'}^{-1} / |\\mathcal{M}_{ref}| \\]For the 50 unmatched source classes, we compute the rejection rate (denoted as \\(RR\\)), counting a successful rejection when all the candidate mappings are predicted as false mappings. We assign each unmatched source class with a null class \\(c_{null}\\), which refers to any target class that does not have a match with the source class, and denote this set of unreferenced mappings as \\(\\mathcal{M}_{unref}\\).
\\[ RR = \\sum_{(c, c_{null}) \\in \\mathcal{M}_{unref}} \\prod_{d \\in \\mathcal{T}_c} (1 - \\mathbb{I}_{c \\equiv d}) / |\\mathcal{M}_{unref}| \\]where \\(\\mathcal{T}_c\\) is the set of target candidate classes for \\(c\\), and \\(\\mathbb{I}_{c \\equiv d}\\) is a binary indicator that outputs 0 if the OM system predicts a false mapping between \\(c\\) and \\(d\\), and outputs 1 otherwise. The product term in this equation returns 1 if all target candidate classes are predicted as unmatched, i.e., \\(\\forall d \\in \\mathcal{T}_c.\\mathbb{I}_{c \\equiv d}=0\\).
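As a plain illustration of the rejection rate (independent of DeepOnto): for each unmatched source class, record one boolean per target candidate indicating whether the system predicts it as a match; a rejection only succeeds when every candidate is predicted as false.
def rejection_rate(predictions):\n    # predictions: one list of booleans per unmatched source class,\n    # where True means the system predicts the candidate as a match\n    rejected = sum(1 for cand_answers in predictions if not any(cand_answers))\n    return rejected / len(predictions)\n\n# the first source class is fully rejected, the second is not\nprint(rejection_rate([[False, False, False], [False, True, False]]))  # 0.5\n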
To summarise, the Bio-LLM sub-track provides two representative OM subsets and adopts a range of evaluation metrics to gain meaningful insights from this partial assessment, thus promoting robust and efficient development of LLM-based OM systems.
"},{"location":"bio-ml/#oaei-participation","title":"OAEI Participation","text":"To participate in the OAEI track, please visit the OAEI Bio-ML website for more information, especially on the instructions of system submission or direct result submission. In the following, we present the formats of result files we expect participants to submit.
"},{"location":"bio-ml/#result-submission-format","title":"Result Submission Format","text":"For the main Bio-ML track, we expect two result files for each setting:
(1) A prediction mapping file named match.result.tsv
in the same format as the reference mapping file (e.g., task_name/refs_equiv/full.tsv
).
Matching Result
Download an example of mapping file. The three columns, \"SrcEntity\"
, \"TgtEntity\"
, and \"Score\"
refer to the source class IRI, the target class IRI, and the matching score.
(2) A scored or ranked candidate mapping file named rank.result.tsv
in the same format as the test candidate mapping file (e.g., task_name/refs_equiv/test.cands.tsv
).
Ranking Result
Download an example of raw (unscored) candidate mapping file and an example of scored candidate mapping file. The \"SrcEntity\"
and \"TgtEntity\"
columns refer to the source class IRI and the target class IRI involved in a reference mapping. The \"TgtCandidates\"
column stores a sequence of tgt_cand_iri
in the unscored file and a list of tuples (tgt_cand_iri, score)
in the scored file, which can be accessed by the built-in Python function eval
.
We also accept a result file without scores and in that case we assume the list of tgt_cand_iri
has been sorted in descending order.
Note that each OM pair is accompanied with an unsupervised and a semi-supervised setting and thus separate sets of result files should be submitted. Moreover, for subsumption matching, only the ranking result file in (2) is required.
For the Bio-LLM sub-track, we expect one result file (similar to (2) but requiring a list of triples) for the task:
(3) A scored or ranked (with answers) candidate mapping file named biollm.result.tsv
in the same format as the test candidate mapping file (i.e., task_name/test.cands.tsv
).
Bio-LLM Result
Download an example of bio-llm mapping file. The \"SrcEntity\"
and \"TgtEntity\"
columns refer to the source class IRI and the target class IRI involved in a reference mapping. The \"TgtCandidates\"
column stores a sequence of a list of triples (tgt_cand_iri, score, answer)
in the scored file, which can be accessed by the built-in Python function eval
. The additional answer
values are True
or False
indicating whether the OM system predicts (src_class_iri, tgt_cand_iri)
as a true mapping.
It is important to notice that the answer
values are necessary for the matching evaluation of P, R, F-score, and the computation of the rejection rate, while the score
values are used for ranking evaluation of MRR and Hits@1.
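For instance, a single \"TgtCandidates\" cell could contain a string like the one below (the IRIs and values are purely illustrative), which eval parses back into a list of (tgt_cand_iri, score, answer) triples:
tgt_cands = '[(\"http://purl.obolibrary.org/obo/DOID_120\", 0.97, True), (\"http://purl.obolibrary.org/obo/DOID_1612\", 0.31, False)]'\nparsed = eval(tgt_cands)  # a list of (tgt_cand_iri, score, answer) tuples\n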
deeponto.complete
.check_consistency()
at deeponto.onto.Ontology
.deeponto.onto.OntologyVerbaliser
.deeponto.subs
to deeponto.complete
.deeponto.probe.ontolama
into deeponto.complete
....
"},{"location":"changelog/#v088-2023-october","title":"v0.8.8 (2023 October)","text":""},{"location":"changelog/#added_1","title":"Added","text":"deeponto.onto.OntologyVerbaliser
.\"struct\"
(Structural Reasoner) at deeponto.onto.OntologyReasoner
.load_reasoner()
method at deeponto.onto.OntologyReasoner
for convenience of changing the reasoner type and remove reload_reasoner()
method as it is a special case of load_reasoner()
.rdflib
into the dependencies for building graph-related features.deeponto.onto.taxonomy
for building the taxonomy over ontologies and potentially other structured data.read_table_mappings()
method at deeponto.align.mapping
from using dataframe.iterrows()
to dataframe.itertuples()
which is much more efficient.deeponto.utils.process_annotation_literal()
to False
.slf4j
to warn
to prevent tons of printing at ELK (issue (#13)[https://github.com/KRR-Oxford/DeepOnto/issues/13]).deeponto.align.oaei
.reasoner_type
argument at deeponto.onto.OntologyReasoner
, now supporting hermit
(default) and elk
.get_all_axioms()
method at deeponto.onto.Ontology
. Add get_iri()
method at deeponto.onto.Ontology
.
Add new features into deeponto.onto.OntologyVerbaliser
including:
verbalise_object_property_subsumption()
for object property subsumption axioms.
verbalise_class_expression()
.verbalise_class_subsumption()
for class subsumption axioms;verbalise_class_equivalence()
for class equivalence axioms;verbalise_class_assertion()
for class assertion axioms;verbalise_relation_assertion()
for relation assertion axioms;auto-correction
option for fixing entity names.keep_iri
option for keeping entity IRIs.add_quantifier_word
option for adding quantifier words as in the Manchester syntax.
Add get_assertion_axioms()
method at deeponto.onto.Ontology
.
get_axiom_type()
method at deeponto.onto.Ontology
.owl_individuals
attribute at deeponto.onto.Ontology
.get_owl_objects()
method to be anonymous as it is only used for creating pre-processed entity index at deeponto.onto.Ontology
.get_owl_object_from_iri()
method to get_owl_object()
at deeponto.onto.Ontology
.ERROR
.set_seed()
method at deeponto.utils
..verbalise_class_expression()
method by adding an option to keep entity IRIs without verbalising them using .vocabs
at deeponto.onto.OntologyVerbaliser
.apply_lowercasing
value to False
for both .get_annotations()
and .build_annotation_index()
methods at deeponto.onto.Ontology
..get_owl_object_annotations()
to .get_annotations()
at deeponto.onto.Ontology
.use_in_alignment
annotation in BERTMap for the OAEI.deeponto.align.oaei
.read_table_mappings
method to allow None
for threshold.deeponto.onto.OntologyPruner
.f1
and MRR
method in deeponto.align.evaluation.AlignmentEvaluator
.deeponto.onto.OntologyNormaliser
.deeponto.onto.OntologyProjector
.transformers
to transformers[torch]
.lib
from mowl to direct import.get_owl_object_annotations
by adding uniqify
at the end to preserve the order.deeponto.subs.bertsubs
; its inter-ontology setting is also imported at deeponto.align.bertsubs
.deeponto.onto.OntologyPruner
as a separate module.deeponto.onto.Ontology
; if started already, skip this step.get_owl_object_annotations
at deeponto.onto.Ontology
by preserving the relative order of annotation retrieval, i.e., create set
first and use the .add()
function instead of casting the list
into set
in the end.check_deprecated
at deeponto.onto.Ontology
by adding a check for the \\(\\texttt{owl:deprecated}\\) annotation property -- if this property does not exist in the current ontology, return False
(not deprecated).remove_axiom
for removing an axiom from the ontology at deeponto.onto.Ontology
(note that the counterpart add_axiom
has already been available).check_named_entity
for checking if an entity is named at deeponto.onto.Ontology
.get_subsumption_axioms
for getting subsumption axioms subject to different entity types at deeponto.onto.Ontology
.get_asserted_complex_classes
for getting all complex classes that occur in ontology (subsumption and/or equivalence) axioms at deeponto.onto.Ontology
.get_asserted_parents
and get_asserted_children
for getting the asserted parents and children of a given entity at deeponto.onto.Ontology
.check_deprecation
for checking an owl object's deprecation (annotated) at deeponto.onto.Ontology
.en_core_web_sm
download into the initialisation of OntologyVerbaliser
.deeponto.onto.Ontology
.deeponto.onto.OntologyReasoner
:super_entities_of
\\(\\rightarrow\\) get_inferred_super_entities
sub_entities_of
\\(\\rightarrow\\) get_inferred_sub_entities
deeponto.onto.Ontology
.deeponto.lama
.deeponto.onto.verbalisation
.deeponto.onto.verbalisation
.src/
layout. The code before v0.5.0 is no longer available.
"},{"location":"faqs/","title":"FAQs","text":"Q1: System compatibility?
Q2: Encountering issues with the JPype installation?
Q3: Missing system-level dependencies on Linux?
g++
and python-dev
need to be installed.paper
Paper for OntoLAMA: Language Model Analysis for Ontology Subsumption Inference (Findings of ACL 2023).
@inproceedings{he-etal-2023-language,\n title = \"Language Model Analysis for Ontology Subsumption Inference\",\n author = \"He, Yuan and\n Chen, Jiaoyan and\n Jimenez-Ruiz, Ernesto and\n Dong, Hang and\n Horrocks, Ian\",\n booktitle = \"Findings of the Association for Computational Linguistics: ACL 2023\",\n month = jul,\n year = \"2023\",\n address = \"Toronto, Canada\",\n publisher = \"Association for Computational Linguistics\",\n url = \"https://aclanthology.org/2023.findings-acl.213\",\n doi = \"10.18653/v1/2023.findings-acl.213\",\n pages = \"3439--3453\"\n}\n
This page provides an overview of the \\(\\textsf{OntoLAMA}\\) datasets, how to use them, and the related probing approach introduced in the research paper.
"},{"location":"ontolama/#overview","title":"Overview","text":"\\(\\textsf{OntoLAMA}\\) is a set of language model (LM) probing datasets and a prompt-based probing method for ontology subsumption inference or ontology completion. The work follows the \"LMs-as-KBs\" literature but focuses on conceptualised knowledge extracted from formalised KBs such as the OWL ontologies. Specifically, the subsumption inference (SI) task is introduced and formulated in the Natural Language Inference (NLI) style, where the sub-concept and the super-concept involved in a subsumption axiom are verbalised and fitted into a template to form the premise and hypothesis, respectively. The sampled axioms are verified through ontology reasoning. The SI task is further divided into Atomic SI and Complex SI where the former involves only atomic named concepts and the latter involves both atomic and complex concepts. Real-world ontologies of different scales and domains are used for constructing OntoLAMA and in total there are four Atomic SI datasets and two Complex SI datasets.
"},{"location":"ontolama/#useful-links","title":"Useful Links","text":"Source #NamedConcepts #EquivAxioms #Dataset (Train/Dev/Test) Schema.org 894 - Atomic SI: 808/404/2,830 DOID 11,157 - Atomic SI: 90,500/11,312/11,314 FoodOn 30,995 2,383 Atomic SI: 768,486/96,060/96,062 Complex SI: 3,754/1,850/13,080 GO 43,303 11,456 Atomic SI: 772,870/96,608/96,610 Complex SI: 72,318/9,040/9,040 MNLI - - biMNLI: 235,622/26,180/12,906
"},{"location":"ontolama/#usage","title":"Usage","text":"Users have two options for accessing the OntoLAMA datasets. They can either download the datasets directly from Zenodo or use the Huggingface Datasets platform.
If using Huggingface, users should first install the datasets
package:
pip install datasets\n
Then, a dataset can be accessed by:
from datasets import load_dataset\n# dataset = load_dataset(\"krr-oxford/OntoLAMA\", dataset_name)\n# for example, loading the Complex SI dataset of Go\ndataset = load_dataset(\"krr-oxford/OntoLAMA\", \"go-complex-SI\") \n
Options of dataset_name
include:
\"bimnli\"
(from MNLI)\"schemaorg-atomic-SI\"
(from Schema.org)\"doid-atomic-SI\"
(from DOID)\"foodon-atomic-SI\"
, \"foodon-complex-SI\"
(from FoodOn)\"go-atomic-SI\"
, \"go-complex-SI\"
(from GO)After loading the dataset, a particular data split can be accessed by:
dataset[split_name] # split_name = \"train\", \"validation\", or \"test\"\n
Please refer to the Huggingface page for examples of data points and explanations of data fields.
If downloading from Zenodo, users can directly work with the specific .jsonl files they need.
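For example, a downloaded split can be read with Python's standard library (the file name below is a placeholder for whichever split was downloaded):

```python
import json

# read one of the downloaded OntoLAMA .jsonl files line by line;
# "train.jsonl" is a placeholder file name
examples = []
with open("train.jsonl") as f:
    for line in f:
        examples.append(json.loads(line))
```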
\\(\\textsf{OntoLAMA}\\) adopts the prompt-based probing approach to examine an LM's knowledge. Specifically, it wraps the verbalised sub-concept and super-concept into a template with a masked position; the LM is expected to predict the masked token and determine whether there exists a subsumption relationship between the two concepts.
The verbalisation algorithm has been implemented as a separate ontology processing module, see verbalise ontology concepts.
To conduct probing, users can write the following code into a script, e.g., probing.py
:
from openprompt.config import get_config\nfrom deeponto.complete.ontolama import run_inference\n\nconfig, args = get_config()\n# you can then manipulate the configuration before running the inference\nconfig.learning_setting = \"few_shot\" # zero_shot, full\nconfig.manual_template.choice = 0 # using the first template in the template file\n...\n\n# run the subsumption inference\nrun_inference(config, args)\n
Then, run the script with the following command:
python probing.py --config_yaml config.yaml\n
See an example of config.yaml
at DeepOnto/scripts/ontolama/config.yaml
The template file for the SI task (two templates) is located in DeepOnto/scripts/ontolama/si_templates.txt
.
The template file for the biMNLI task (two templates) is located in DeepOnto/scripts/ontolama/nli_templates.txt
.
The label word file for both SI and biMNLI tasks is located in DeepOnto/scripts/ontolama/label_words.jsonl
.
\\(\\textsf{DeepOnto}\\) extends from the OWLAPI and implements many useful methods for ontology processing and reasoning, integrated in the base class Ontology
.
This page gives typical examples of how to use Ontology
. For other, more specific usages, please refer to the documentation by clicking Ontology
.
Ontology
can be easily loaded from a local ontology file by its path:
from deeponto.onto import Ontology\n
Importing Ontology
will require JVM memory allocation (defaults to 8g
; if nohup
is used to run the program in the background, use nohup echo \"8g\" | python command
):
Please enter the maximum memory located to JVM: [8g]: 16g\n\n16g maximum memory allocated to JVM.\nJVM started successfully.\n
Loading an ontology from a local file:
onto = Ontology(\"path_to_ontology.owl\")\n
It is also possible to choose which reasoner to use:
onto = Ontology(\"path_to_ontology.owl\", \"hermit\")\n
Tip
For faster (but incomplete) reasoning over larger ontologies, choose a reasoner like \"elk\"
.
The most fundamental feature of Ontology
is to access entities in the ontology such as classes (or concepts) and properties (object, data, and annotation properties). To get an entity by its IRI, do the following:
from deeponto.onto import Ontology\n# e.g., load the disease ontology\ndoid = Ontology(\"doid.owl\")\n# class or property IRI as input\ndoid.get_owl_object(\"http://purl.obolibrary.org/obo/DOID_9969\")\n
To get the asserted parents or children of a given class or property, do the following:
doid.get_asserted_parents(doid.get_owl_object(\"http://purl.obolibrary.org/obo/DOID_9969\"))\ndoid.get_asserted_children(doid.get_owl_object(\"http://purl.obolibrary.org/obo/DOID_9969\"))\n
To obtain the literal values (as Set[str]
) of an annotation property (such as \\(\\texttt{rdfs:label}\\)) for an entity:
# note that annotations with no language tags are deemed as in English (\"en\")\ndoid.get_annotations(\n doid.get_owl_object(\"http://purl.obolibrary.org/obo/DOID_9969\"),\n annotation_property_iri='http://www.w3.org/2000/01/rdf-schema#label',\n annotation_language_tag=None,\n apply_lowercasing=False,\n normalise_identifiers=False\n)\n
Output:
{'carotenemia'}\n
To get the special entities related to top (\\(\\top\\)) and bottom (\\(\\bot\\)), for example, to get \\(\\texttt{owl:Thing}\\):
doid.OWLThing\n
"},{"location":"ontology/#ontology-reasoning","title":"Ontology Reasoning","text":"Ontology
has an important attribute .reasoner
for conducting reasoning activities. Currently, two types of reasoners are supported, i.e., HermiT and ELK.
To get the super-entities (a super-class, or a super-property) of an entity, do the following:
doid_class = doid.get_owl_object(\"http://purl.obolibrary.org/obo/DOID_9969\")\ndoid.reasoner.get_inferred_super_entities(doid_class, direct=False) \n
Output:
['http://purl.obolibrary.org/obo/DOID_0014667',\n'http://purl.obolibrary.org/obo/DOID_0060158',\n'http://purl.obolibrary.org/obo/DOID_4']\n
The outputs are IRIs of the corresponding super-entities. direct
is a boolean value indicating whether the returned entities are parents (direct=True
) or ancestors (direct=False
).
To get the sub-entities, simply replace the method name with get_inferred_sub_entities
.
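For example, reusing the doid_class obtained above, the direct children can be retrieved with:

```python
# direct=True returns only the direct sub-entities (children);
# direct=False returns all descendants
doid.reasoner.get_inferred_sub_entities(doid_class, direct=True)
```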
To retrieve the entailed instances of a class:
doid.reasoner.instances_of(doid_class)\n
"},{"location":"ontology/#checking-entailment","title":"Checking Entailment","text":"The implemented reasoner also supports several entailment checks for subsumption, disjointness, and so on. For example:
doid.reasoner.check_subsumption(doid_potential_sub_entity, doid_potential_super_entity)\n
"},{"location":"ontology/#feature-requests","title":"Feature Requests","text":"Should you have any feature requests (such as those commonly used in the OWLAPI), please raise a ticket in the \\(\\textsf{DeepOnto}\\) GitHub repository.
"},{"location":"verbaliser/","title":"Verbalise Ontology Concepts","text":"Verbalising concept expressions is very useful for models that take textual inputs. While the named concepts can be verbalised simply using their names (or labels), complex concepts that involve logical operators require a more sophisticated algorithm. In \\(\\textsf{DeepOnto}\\), we have implemented the recursive concept verbaliser originally proposed in the OntoLAMA paper to address the need.
Paper
The recursive concept verbaliser is proposed in the paper: Language Model Analysis for Ontology Subsumption Inference (Findings of ACL 2023).
@inproceedings{he-etal-2023-language,\n title = \"Language Model Analysis for Ontology Subsumption Inference\",\n author = \"He, Yuan and\n Chen, Jiaoyan and\n Jimenez-Ruiz, Ernesto and\n Dong, Hang and\n Horrocks, Ian\",\n booktitle = \"Findings of the Association for Computational Linguistics: ACL 2023\",\n month = jul,\n year = \"2023\",\n address = \"Toronto, Canada\",\n publisher = \"Association for Computational Linguistics\",\n url = \"https://aclanthology.org/2023.findings-acl.213\",\n doi = \"10.18653/v1/2023.findings-acl.213\",\n pages = \"3439--3453\"\n}\n
This rule-based verbaliser (found in OntologyVerbaliser
) first parses a complex concept expression into a sub-formula tree (with OntologySyntaxParser
). Each intermediate node within the tree represents the decomposition of a specific logical operator, while the leaf nodes are named concepts or properties. The verbaliser then recursively merges the verbalisations in a bottom-to-top manner, creating the overall textual representation of the complex concept. An example of this process is shown in the following figure:
Figure 1. Verbalising a complex concept recursively.
To use the verbaliser, do the following:
from deeponto.onto import Ontology, OntologyVerbaliser\n\n# load an ontology and init the verbaliser\nonto = Ontology(\"some_ontology_file.owl\")\nverbaliser = OntologyVerbaliser(onto)\n
To verbalise a complex concept expression:
# get complex concepts asserted in the ontology\ncomplex_concepts = list(onto.get_asserted_complex_classes())\n\n# verbalise the first complex concept\nv_concept = verbaliser.verbalise_class_expression(complex_concepts[0])\n
To verbalise a class subsumption axiom:
# get subsumption axioms from the ontology\nsubsumption_axioms = onto.get_subsumption_axioms(entity_type=\"Classes\")\n\n# verbalise the first subsumption axiom\nv_sub, v_super = verbaliser.verbalise_class_subsumption_axiom(subsumption_axioms[0])\n
Tip
The concept verbaliser is under development to incorporate the parsing of various axiom types. Please check the existing functions of OntologyVerbaliser
for specific usage.
Notice that the verbalised result is a CfgNode
object which keeps track of the recursive process. Users can access the final verbalisation by:
result.verbal\n
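For instance, with the results obtained above:

```python
# v_concept and (v_sub, v_super) are the CfgNode results returned earlier
print(v_concept.verbal)              # verbalised complex concept
print(v_sub.verbal, v_super.verbal)  # verbalised sub- and super-class of the axiom
```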
Users can also manually update the vocabulary for named entities by:
verbaliser.update_entity_name(entity_iri, entity_name)\n
This is useful when the entity labels are not naturally fitted into the verbalised sentence.
Moreover, users can see the parsed sub-formula tree using:
tree = verbaliser.parser.parse(str(subsumption_axioms[0]))\ntree.render_image()\n
Note that rendering the image requires graphviz to be installed. Check this link for installing graphviz
.
See an example with image at OntologySyntaxParser
.
AlignmentEvaluator()
","text":"Class that provides evaluation metrics for alignment.
Source code insrc/deeponto/align/evaluation.py
def __init__(self):\n pass\n
"},{"location":"deeponto/align/evaluation/#deeponto.align.evaluation.AlignmentEvaluator.precision","title":"precision(prediction_mappings, reference_mappings)
staticmethod
","text":"The percentage of correct predictions.
\\[P = \\frac{|\\mathcal{M}_{pred} \\cap \\mathcal{M}_{ref}|}{|\\mathcal{M}_{pred}|}\\] Source code insrc/deeponto/align/evaluation.py
@staticmethod\ndef precision(prediction_mappings: List[EntityMapping], reference_mappings: List[ReferenceMapping]) -> float:\nr\"\"\"The percentage of correct predictions.\n\n $$P = \\frac{|\\mathcal{M}_{pred} \\cap \\mathcal{M}_{ref}|}{|\\mathcal{M}_{pred}|}$$\n \"\"\"\n preds = [p.to_tuple() for p in prediction_mappings]\n refs = [r.to_tuple() for r in reference_mappings]\n return len(set(preds).intersection(set(refs))) / len(set(preds))\n
"},{"location":"deeponto/align/evaluation/#deeponto.align.evaluation.AlignmentEvaluator.recall","title":"recall(prediction_mappings, reference_mappings)
staticmethod
","text":"The percentage of correct retrievals.
\\[R = \\frac{|\\mathcal{M}_{pred} \\cap \\mathcal{M}_{ref}|}{|\\mathcal{M}_{ref}|}\\] Source code insrc/deeponto/align/evaluation.py
@staticmethod\ndef recall(prediction_mappings: List[EntityMapping], reference_mappings: List[ReferenceMapping]) -> float:\nr\"\"\"The percentage of correct retrievals.\n\n $$R = \\frac{|\\mathcal{M}_{pred} \\cap \\mathcal{M}_{ref}|}{|\\mathcal{M}_{ref}|}$$\n \"\"\"\n preds = [p.to_tuple() for p in prediction_mappings]\n refs = [r.to_tuple() for r in reference_mappings]\n return len(set(preds).intersection(set(refs))) / len(set(refs))\n
"},{"location":"deeponto/align/evaluation/#deeponto.align.evaluation.AlignmentEvaluator.f1","title":"f1(prediction_mappings, reference_mappings, null_reference_mappings=[])
staticmethod
","text":"Compute the F1 score given the prediction and reference mappings.
\\[F_1 = \\frac{2 P R}{P + R}\\]null_reference_mappings
is an additional set whose elements should be ignored in the calculation, i.e., neither positive nor negative. Specifically, both \\(\\mathcal{M}_{pred}\\) and \\(\\mathcal{M}_{ref}\\) will substract \\(\\mathcal{M}_{null}\\) from them.
src/deeponto/align/evaluation.py
@staticmethod\ndef f1(\n prediction_mappings: List[EntityMapping],\n reference_mappings: List[ReferenceMapping],\n null_reference_mappings: List[ReferenceMapping] = [],\n):\nr\"\"\"Compute the F1 score given the prediction and reference mappings.\n\n $$F_1 = \\frac{2 P R}{P + R}$$\n\n `null_reference_mappings` is an additional set whose elements\n should be **ignored** in the calculation, i.e., **neither positive nor negative**.\n Specifically, both $\\mathcal{M}_{pred}$ and $\\mathcal{M}_{ref}$ will **substract**\n $\\mathcal{M}_{null}$ from them.\n \"\"\"\n preds = [p.to_tuple() for p in prediction_mappings]\n refs = [r.to_tuple() for r in reference_mappings]\n null_refs = [n.to_tuple() for n in null_reference_mappings]\n # elements in the {null_set} are removed from both {pred} and {ref} (ignored)\n if null_refs:\n preds = set(preds) - set(null_refs)\n refs = set(refs) - set(null_refs)\n P = len(set(preds).intersection(set(refs))) / len(set(preds))\n R = len(set(preds).intersection(set(refs))) / len(set(refs))\n F1 = 2 * P * R / (P + R)\n\n return {\"P\": round(P, 3), \"R\": round(R, 3), \"F1\": round(F1, 3)}\n
"},{"location":"deeponto/align/evaluation/#deeponto.align.evaluation.AlignmentEvaluator.hits_at_K","title":"hits_at_K(reference_and_candidates, K)
staticmethod
","text":"Compute \\(Hits@K\\) for a list of (reference_mapping, candidate_mappings)
pair.
It is computed as the number of a reference_mapping
existed in the first \\(K\\) ranked candidate_mappings
, divided by the total number of input pairs.
src/deeponto/align/evaluation.py
@staticmethod\ndef hits_at_K(reference_and_candidates: List[Tuple[ReferenceMapping, List[EntityMapping]]], K: int):\nr\"\"\"Compute $Hits@K$ for a list of `(reference_mapping, candidate_mappings)` pair.\n\n It is computed as the number of a `reference_mapping` existed in the first $K$ ranked `candidate_mappings`,\n divided by the total number of input pairs.\n\n $$Hits@K = \\sum_i^N \\mathbb{I}_{rank_i \\leq k} / N$$\n \"\"\"\n n_hits = 0\n for pred, cands in reference_and_candidates:\n ordered_candidates = [c.to_tuple() for c in EntityMapping.sort_entity_mappings_by_score(cands, k=K)]\n if pred.to_tuple() in ordered_candidates:\n n_hits += 1\n return n_hits / len(reference_and_candidates)\n
"},{"location":"deeponto/align/evaluation/#deeponto.align.evaluation.AlignmentEvaluator.mean_reciprocal_rank","title":"mean_reciprocal_rank(reference_and_candidates)
staticmethod
","text":"Compute \\(MRR\\) for a list of (reference_mapping, candidate_mappings)
pair.
src/deeponto/align/evaluation.py
@staticmethod\ndef mean_reciprocal_rank(reference_and_candidates: List[Tuple[ReferenceMapping, List[EntityMapping]]]):\nr\"\"\"Compute $MRR$ for a list of `(reference_mapping, candidate_mappings)` pair.\n\n $$MRR = \\sum_i^N rank_i^{-1} / N$$\n \"\"\"\n sum_inverted_ranks = 0\n for pred, cands in reference_and_candidates:\n ordered_candidates = [c.to_tuple() for c in EntityMapping.sort_entity_mappings_by_score(cands)]\n if pred.to_tuple() in ordered_candidates:\n rank = ordered_candidates.index(pred.to_tuple()) + 1\n else:\n rank = math.inf\n sum_inverted_ranks += 1 / rank\n return sum_inverted_ranks / len(reference_and_candidates)\n
"},{"location":"deeponto/align/mapping/","title":"Mapping","text":""},{"location":"deeponto/align/mapping/#deeponto.align.mapping.EntityMapping","title":"EntityMapping(src_entity_iri, tgt_entity_iri, relation=DEFAULT_REL, score=0.0)
","text":"A datastructure for entity mapping.
Such entities should be named and have an IRI.
Attributes:
Name Type Descriptionsrc_entity_iri
str
The IRI of the source entity, usually its IRI if available.
tgt_entity_iri
str
The IRI of the target entity, usually its IRI if available.
relation
str
A symbol that represents what semantic relation this mapping stands for. Defaults to <?rel>
which means unspecified. Suggested inputs are \"<EquivalentTo>\"
and \"<SubsumedBy>\"
.
score
float
The score that indicates the confidence of this mapping. Defaults to 0.0
.
Parameters:
Name Type Description Defaultsrc_entity_iri
str
The IRI of the source entity, usually its IRI if available.
requiredtgt_entity_iri
str
The IRI of the target entity, usually its IRI if available.
requiredrelation
str
A symbol that represents what semantic relation this mapping stands for. Defaults to <?rel>
which means unspecified. Suggested inputs are \"<EquivalentTo>\"
and \"<SubsumedBy>\"
.
DEFAULT_REL
score
float
The score that indicates the confidence of this mapping. Defaults to 0.0
.
0.0
Source code in src/deeponto/align/mapping.py
def __init__(self, src_entity_iri: str, tgt_entity_iri: str, relation: str = DEFAULT_REL, score: float = 0.0):\n\"\"\"Intialise an entity mapping.\n\n Args:\n src_entity_iri (str): The IRI of the source entity, usually its IRI if available.\n tgt_entity_iri (str): The IRI of the target entity, usually its IRI if available.\n relation (str, optional): A symbol that represents what semantic relation this mapping stands for. Defaults to `<?rel>` which means unspecified.\n Suggested inputs are `\"<EquivalentTo>\"` and `\"<SubsumedBy>\"`.\n score (float, optional): The score that indicates the confidence of this mapping. Defaults to `0.0`.\n \"\"\"\n self.head = src_entity_iri\n self.tail = tgt_entity_iri\n self.relation = relation\n self.score = score\n
"},{"location":"deeponto/align/mapping/#deeponto.align.mapping.EntityMapping.from_owl_objects","title":"from_owl_objects(src_entity, tgt_entity, relation=DEFAULT_REL, score=0.0)
classmethod
","text":"Create an entity mapping from two OWLObject
entities which have an IRI.
Parameters:
Name Type Description Defaultsrc_entity
OWLObject
The source entity in OWLObject
.
tgt_entity
OWLObject
The target entity in OWLObject
.
relation
str
A symbol that represents what semantic relation this mapping stands for. Defaults to <?rel>
which means unspecified. Suggested inputs are \"<EquivalentTo>\"
and \"<SubsumedBy>\"
.
DEFAULT_REL
score
float
The score that indicates the confidence of this mapping. Defaults to 0.0
.
0.0
Returns:
Type DescriptionEntityMapping
The entity mapping created from the source and target entities.
Source code insrc/deeponto/align/mapping.py
@classmethod\ndef from_owl_objects(\n cls, src_entity: OWLObject, tgt_entity: OWLObject, relation: str = DEFAULT_REL, score: float = 0.0\n):\n\"\"\"Create an entity mapping from two `OWLObject` entities which have an IRI.\n\n Args:\n src_entity (OWLObject): The source entity in `OWLObject`.\n tgt_entity (OWLObject): The target entity in `OWLObject`.\n relation (str, optional): A symbol that represents what semantic relation this mapping stands for. Defaults to `<?rel>` which means unspecified.\n Suggested inputs are `\"<EquivalentTo>\"` and `\"<SubsumedBy>\"`.\n score (float, optional): The score that indicates the confidence of this mapping. Defaults to `0.0`.\n Returns:\n (EntityMapping): The entity mapping created from the source and target entities.\n \"\"\"\n return cls(str(src_entity.getIRI()), str(tgt_entity.getIRI()), relation, score)\n
"},{"location":"deeponto/align/mapping/#deeponto.align.mapping.EntityMapping.to_tuple","title":"to_tuple(with_score=False)
","text":"Transform an entity mapping (self
) to a tuple representation
Note that relation
is discarded and score
is optionally preserved).
src/deeponto/align/mapping.py
def to_tuple(self, with_score: bool = False):\n\"\"\"Transform an entity mapping (`self`) to a tuple representation\n\n Note that `relation` is discarded and `score` is optionally preserved).\n \"\"\"\n if with_score:\n return (self.head, self.tail, self.score)\n else:\n return (self.head, self.tail)\n
"},{"location":"deeponto/align/mapping/#deeponto.align.mapping.EntityMapping.as_tuples","title":"as_tuples(entity_mappings, with_score=False)
staticmethod
","text":"Transform a list of entity mappings to their tuple representations.
Note that relation
is discarded and score
is optionally preserved).
src/deeponto/align/mapping.py
@staticmethod\ndef as_tuples(entity_mappings: List[EntityMapping], with_score: bool = False):\n\"\"\"Transform a list of entity mappings to their tuple representations.\n\n Note that `relation` is discarded and `score` is optionally preserved).\n \"\"\"\n return [m.to_tuple(with_score=with_score) for m in entity_mappings]\n
"},{"location":"deeponto/align/mapping/#deeponto.align.mapping.EntityMapping.sort_entity_mappings_by_score","title":"sort_entity_mappings_by_score(entity_mappings, k=None)
staticmethod
","text":"Sort the entity mappings in a list by their scores in descending order.
Parameters:
Name Type Description Defaultentity_mappings
List[EntityMapping]
A list entity mappings to sort.
requiredk
int
The number of top \\(k\\) scored entities preserved if specified. Defaults to None
which means to return all entity mappings.
None
Returns:
Type DescriptionList[EntityMapping]
A list of sorted entity mappings.
Source code insrc/deeponto/align/mapping.py
@staticmethod\ndef sort_entity_mappings_by_score(entity_mappings: List[EntityMapping], k: Optional[int] = None):\nr\"\"\"Sort the entity mappings in a list by their scores in descending order.\n\n Args:\n entity_mappings (List[EntityMapping]): A list entity mappings to sort.\n k (int, optional): The number of top $k$ scored entities preserved if specified. Defaults to `None` which\n means to return **all** entity mappings.\n\n Returns:\n (List[EntityMapping]): A list of sorted entity mappings.\n \"\"\"\n return list(sorted(entity_mappings, key=lambda x: x.score, reverse=True))[:k]\n
"},{"location":"deeponto/align/mapping/#deeponto.align.mapping.EntityMapping.read_table_mappings","title":"read_table_mappings(table_of_mappings_file, threshold=None, relation=DEFAULT_REL, is_reference=False)
staticmethod
","text":"Read entity mappings from .csv
or .tsv
files.
Mapping Table Format
The columns of the mapping table must have the headings: \"SrcEntity\"
, \"TgtEntity\"
, and \"Score\"
.
Parameters:
Name Type Description Defaulttable_of_mappings_file
str
The path to the table (.csv
or .tsv
) of mappings.
threshold
Optional[float]
Mappings with scores less than threshold
will not be loaded. Defaults to 0.0.
None
relation
str
A symbol that represents what semantic relation this mapping stands for. Defaults to <?rel>
which means unspecified. Suggested inputs are \"<EquivalentTo>\"
and \"<SubsumedBy>\"
.
DEFAULT_REL
is_reference
bool
Whether the loaded mappings are reference mappigns; if so, threshold
is disabled and mapping scores are all set to \\(1.0\\). Defaults to False
.
False
Returns:
Type DescriptionList[EntityMapping]
A list of entity mappings loaded from the table file.
Source code insrc/deeponto/align/mapping.py
@staticmethod\ndef read_table_mappings(\n table_of_mappings_file: str,\n threshold: Optional[float] = None,\n relation: str = DEFAULT_REL,\n is_reference: bool = False,\n) -> List[EntityMapping]:\nr\"\"\"Read entity mappings from `.csv` or `.tsv` files.\n\n !!! note \"Mapping Table Format\"\n\n The columns of the mapping table must have the headings: `\"SrcEntity\"`, `\"TgtEntity\"`, and `\"Score\"`.\n\n Args:\n table_of_mappings_file (str): The path to the table (`.csv` or `.tsv`) of mappings.\n threshold (Optional[float], optional): Mappings with scores less than `threshold` will not be loaded. Defaults to 0.0.\n relation (str, optional): A symbol that represents what semantic relation this mapping stands for. Defaults to `<?rel>` which means unspecified.\n Suggested inputs are `\"<EquivalentTo>\"` and `\"<SubsumedBy>\"`.\n is_reference (bool): Whether the loaded mappings are reference mappigns; if so, `threshold` is disabled and mapping scores\n are all set to $1.0$. Defaults to `False`.\n\n Returns:\n (List[EntityMapping]): A list of entity mappings loaded from the table file.\n \"\"\"\n df = read_table(table_of_mappings_file)\n entity_mappings = []\n for dp in df.itertuples():\n if is_reference:\n entity_mappings.append(ReferenceMapping(dp.SrcEntity, dp.TgtEntity, relation))\n else:\n # allow `None` for threshold\n if not threshold or dp[\"Score\"] >= threshold:\n entity_mappings.append(EntityMapping(dp.SrcEntity, dp.TgtEntity, relation, dp.Score))\n return entity_mappings\n
"},{"location":"deeponto/align/mapping/#deeponto.align.mapping.ReferenceMapping","title":"ReferenceMapping(src_entity_iri, tgt_entity_iri, relation=DEFAULT_REL, candidate_mappings=[])
","text":" Bases: EntityMapping
A datastructure for entity mapping that acts as a reference mapping.
A reference mapppings is a ground truth entity mapping (with \\(score = 1.0\\)) and can have several entity mappings as candidates. These candidate mappings should have the same head
(i.e., source entity) as the reference mapping.
Attributes:
Name Type Descriptionsrc_entity_iri
str
The IRI of the source entity, usually its IRI if available.
tgt_entity_iri
str
The IRI of the target entity, usually its IRI if available.
relation
str
A symbol that represents what semantic relation this mapping stands for. Defaults to <?rel>
which means unspecified. Suggested inputs are \"<EquivalentTo>\"
and \"<SubsumedBy>\"
.
Parameters:
Name Type Description Defaultsrc_entity_iri
str
The IRI of the source entity, usually its IRI if available.
requiredtgt_entity_iri
str
The IRI of the target entity, usually its IRI if available.
requiredrelation
str
A symbol that represents what semantic relation this mapping stands for. Defaults to <?rel>
which means unspecified. Suggested inputs are \"<EquivalentTo>\"
and \"<SubsumedBy>\"
.
DEFAULT_REL
candidate_mappings
List[EntityMapping]
A list of entity mappings that are candidates for this reference mapping. Defaults to []
.
[]
Source code in src/deeponto/align/mapping.py
def __init__(\n self,\n src_entity_iri: str,\n tgt_entity_iri: str,\n relation: str = DEFAULT_REL,\n candidate_mappings: Optional[List[EntityMapping]] = [],\n):\nr\"\"\"Intialise a reference mapping.\n\n Args:\n src_entity_iri (str): The IRI of the source entity, usually its IRI if available.\n tgt_entity_iri (str): The IRI of the target entity, usually its IRI if available.\n relation (str, optional): A symbol that represents what semantic relation this mapping stands for. Defaults to `<?rel>` which means unspecified.\n Suggested inputs are `\"<EquivalentTo>\"` and `\"<SubsumedBy>\"`.\n candidate_mappings (List[EntityMapping], optional): A list of entity mappings that are candidates for this reference mapping. Defaults to `[]`.\n \"\"\"\n super().__init__(src_entity_iri, tgt_entity_iri, relation, 1.0)\n self.candidates = []\n for candidate in candidate_mappings:\n self.add_candidate(candidate)\n
"},{"location":"deeponto/align/mapping/#deeponto.align.mapping.ReferenceMapping.add_candidate","title":"add_candidate(candidate_mapping)
","text":"Add a candidate mapping whose relation and head entity are the same as the reference mapping's.
Source code insrc/deeponto/align/mapping.py
def add_candidate(self, candidate_mapping: EntityMapping):\n\"\"\"Add a candidate mapping whose relation and head entity are the\n same as the reference mapping's.\n \"\"\"\n if self.relation != candidate_mapping.relation:\n raise ValueError(\n f\"Expect relation of candidate mapping to be {self.relation} but got {candidate_mapping.relation}\"\n )\n if self.head != candidate_mapping.head:\n raise ValueError(\"Candidate mapping does not have the same head entity as the anchor mapping.\")\n self.candidates.append(candidate_mapping)\n
"},{"location":"deeponto/align/mapping/#deeponto.align.mapping.ReferenceMapping.read_table_mappings","title":"read_table_mappings(table_of_mappings_file, relation=DEFAULT_REL)
staticmethod
","text":"Read reference mappings from .csv
or .tsv
files.
Mapping Table Format
The columns of the mapping table must have the headings: \"SrcEntity\"
, \"TgtEntity\"
, and \"Score\"
.
Parameters:
Name Type Description Defaulttable_of_mappings_file
str
The path to the table (.csv
or .tsv
) of mappings.
relation
str
A symbol that represents what semantic relation this mapping stands for. Defaults to <?rel>
which means unspecified. Suggested inputs are \"<EquivalentTo>\"
and \"<SubsumedBy>\"
.
DEFAULT_REL
Returns:
Type DescriptionList[ReferenceMapping]
A list of reference mappings loaded from the table file.
Source code insrc/deeponto/align/mapping.py
@staticmethod\ndef read_table_mappings(table_of_mappings_file: str, relation: str = DEFAULT_REL):\nr\"\"\"Read reference mappings from `.csv` or `.tsv` files.\n\n !!! note \"Mapping Table Format\"\n\n The columns of the mapping table must have the headings: `\"SrcEntity\"`, `\"TgtEntity\"`, and `\"Score\"`.\n\n Args:\n table_of_mappings_file (str): The path to the table (`.csv` or `.tsv`) of mappings.\n relation (str, optional): A symbol that represents what semantic relation this mapping stands for. Defaults to `<?rel>` which means unspecified.\n Suggested inputs are `\"<EquivalentTo>\"` and `\"<SubsumedBy>\"`.\n\n Returns:\n (List[ReferenceMapping]): A list of reference mappings loaded from the table file.\n \"\"\"\n return EntityMapping.read_table_mappings(table_of_mappings_file, relation=relation, is_reference=True)\n
"},{"location":"deeponto/align/mapping/#deeponto.align.mapping.SubsFromEquivMappingGenerator","title":"SubsFromEquivMappingGenerator(src_onto, tgt_onto, equiv_mappings, subs_generation_ratio=None, delete_used_equiv_tgt_class=True)
","text":"Generating subsumption mappings from gold standard equivalence mappings.
paper
The online subsumption mapping construction algorithm is proposed in the paper: Machine Learning-Friendly Biomedical Datasets for Equivalence and Subsumption Ontology Matching (ISWC 2022).
This generator has an attribute delete_used_equiv_tgt_class
for determining whether or not to sabotage the equivalence mappings used to create \\(\\geq 1\\) subsumption mappings. The reason is that, if the equivalence mapping is broken, then the OM tool is expected to predict subsumption mappings directly without relying on the equivalence mappings as an intermediate.
Attributes:
Name Type Descriptionsrc_onto
Ontology
The source ontology.
tgt_onto
Ontology
The target ontology.
equiv_class_pairs
List[Tuple[str, str]]
A list of class pairs (in IRIs) that are equivalent according to the input equivalence mappings.
subs_generation_ratio
int
The maximum number of subsumption mappings generated from each equivalence mapping. Defaults to None
which means there is no limit on the number of subsumption mappings.
delete_used_equiv_tgt_class
bool
Whether to mark the target side of an equivalence mapping used for creating at least one subsumption mappings as \"deleted\". Defaults to True
.
src/deeponto/align/mapping.py
def __init__(\n self,\n src_onto: Ontology,\n tgt_onto: Ontology,\n equiv_mappings: List[ReferenceMapping],\n subs_generation_ratio: Optional[int] = None,\n delete_used_equiv_tgt_class: bool = True,\n):\n self.src_onto = src_onto\n self.tgt_onto = tgt_onto\n self.equiv_class_pairs = [m.to_tuple() for m in equiv_mappings]\n self.subs_generation_ratio = subs_generation_ratio\n self.delete_used_equiv_tgt_class = delete_used_equiv_tgt_class\n\n subs_from_equivs, self.used_equiv_tgt_class_iris = self.online_construction()\n # turn into triples with scores 1.0\n self.subs_from_equivs = [(c, p, 1.0) for c, p in subs_from_equivs]\n
"},{"location":"deeponto/align/mapping/#deeponto.align.mapping.SubsFromEquivMappingGenerator.online_construction","title":"online_construction()
","text":"An online algorithm for constructing subsumption mappings from gold standard equivalence mappings.
Let \\(t\\) denote the boolean value that indicates if the target class involved in an equivalence mapping will be deleted. If \\(t\\) is true, then for each equivalent class pair \\((c, c')\\), do the following:
Steps 1 and 2 ensure that target classes that have been involved in a subsumption mapping have no conflicts with target classes that have been used to create a subsumption mapping.
This algorithm is online because the construction and deletion depend on the order of the input equivalent class pairs.
Source code insrc/deeponto/align/mapping.py
def online_construction(self):\nr\"\"\"An **online** algorithm for constructing subsumption mappings from gold standard equivalence mappings.\n\n Let $t$ denote the boolean value that indicates if the target class involved in an equivalence mapping\n will be deleted. If $t$ is true, then for each equivalent class pair $(c, c')$, do the following:\n\n 1. If $c'$ has been inolved in a subsumption mapping, skip this pair as otherwise $c'$ will need to be deleted.\n 2. For each parent class of $c'$, skip it if it has been marked deleted (i.e., involved in an equivalence mapping that has been used to create a subsumption mapping).\n 3. If any subsumption mapping has been created from $(c, c')$, mark $c'$ as deleted.\n\n Steps 1 and 2 ensure that target classes that have been **involved in a subsumption mapping** have **no conflicts** with\n target classes that have been **used to create a subsumption mapping**.\n\n This algorithm is *online* because the construction and deletion depend on the order of the input equivalent class pairs.\n \"\"\"\n subs_class_pairs = []\n in_subs = defaultdict(lambda: False) # in a subsumption mapping\n used_equivs = defaultdict(lambda: False) # in a used equivalence mapping\n\n for src_class_iri, tgt_class_iri in self.equiv_class_pairs:\n\n cur_subs_pairs = []\n\n # NOTE (1) an equiv pair is skipped if the target side is marked constructed\n if self.delete_used_equiv_tgt_class and in_subs[tgt_class_iri]:\n continue\n\n # construct subsumption pairs by matching the source class and the target class's parents\n tgt_class = self.tgt_onto.get_owl_object(tgt_class_iri)\n # tgt_class_parent_iris = self.tgt_onto.reasoner.get_inferred_super_entities(tgt_class, direct=True)\n tgt_class_parent_iris = [str(p.getIRI()) for p in self.tgt_onto.get_asserted_parents(tgt_class, named_only=True)]\n for parent_iri in tgt_class_parent_iris:\n # skip this parent if it is marked as \"used\"\n if self.delete_used_equiv_tgt_class and used_equivs[parent_iri]:\n continue\n cur_subs_pairs.append((src_class_iri, parent_iri))\n # if successfully created, mark this parent as \"in\"\n if self.delete_used_equiv_tgt_class:\n in_subs[parent_iri] = True\n\n # mark the target class as \"used\" because it has been used for creating a subsumption mapping\n if self.delete_used_equiv_tgt_class and cur_subs_pairs:\n used_equivs[tgt_class_iri] = True\n\n if self.subs_generation_ratio and len(cur_subs_pairs) > self.subs_generation_ratio:\n cur_subs_pairs = random.sample(cur_subs_pairs, self.subs_generation_ratio)\n subs_class_pairs += cur_subs_pairs\n\n used_equiv_tgt_class_iris = None\n if self.delete_used_equiv_tgt_class:\n used_equiv_tgt_class_iris = [iri for iri, used in used_equivs.items() if used is True]\n logger.info(\n f\"{len(used_equiv_tgt_class_iris)}/{len(self.equiv_class_pairs)} are used for creating at least one subsumption mapping.\"\n )\n\n subs_class_pairs = uniqify(subs_class_pairs)\n logger.info(f\"{len(subs_class_pairs)} subsumption mappings are created in the end.\")\n\n return subs_class_pairs, used_equiv_tgt_class_iris\n
"},{"location":"deeponto/align/mapping/#deeponto.align.mapping.SubsFromEquivMappingGenerator.save_subs","title":"save_subs(save_path)
","text":"Save the constructed subsumption mappings (in tuples) to a local .tsv
file.
src/deeponto/align/mapping.py
def save_subs(self, save_path: str):\n\"\"\"Save the constructed subsumption mappings (in tuples) to a local `.tsv` file.\"\"\"\n subs_df = pd.DataFrame(self.subs_from_equivs, columns=[\"SrcEntity\", \"TgtEntity\", \"Score\"])\n subs_df.to_csv(save_path, sep=\"\\t\", index=False)\n
"},{"location":"deeponto/align/mapping/#deeponto.align.mapping.NegativeCandidateMappingGenerator","title":"NegativeCandidateMappingGenerator(src_onto, tgt_onto, reference_class_mappings, annotation_property_iris, tokenizer, max_hops=5, for_subsumption=False)
","text":"Generating negative candidate mappings for each gold standard mapping.
Note that the source side of the golden standard mapping is fixed, i.e., candidate mappings are generated according to the target side.
paper
The candidate mapping generation algorithm is proposed in the paper: Machine Learning-Friendly Biomedical Datasets for Equivalence and Subsumption Ontology Matching (ISWC 2022).
Source code insrc/deeponto/align/mapping.py
def __init__(\n self,\n src_onto: Ontology,\n tgt_onto: Ontology,\n reference_class_mappings: List[ReferenceMapping], # equivalence or subsumption\n annotation_property_iris: List[str], # for text-based candidates\n tokenizer: Tokenizer, # for text-based candidates\n max_hops: int = 5, # for graph-based candidates\n for_subsumption: bool = False, # if for subsumption, avoid adding ancestors as candidates\n):\n\n self.src_onto = src_onto\n self.tgt_onto = tgt_onto\n self.reference_class_mappings = reference_class_mappings\n self.reference_class_dict = defaultdict(list) # to prevent wrongly adding negative candidates\n for m in self.reference_class_mappings:\n src_class_iri, tgt_class_iri = m.to_tuple()\n self.reference_class_dict[src_class_iri].append(tgt_class_iri)\n\n # for IDF sample\n self.tgt_annotation_index, self.annotation_property_iris = self.tgt_onto.build_annotation_index(\n annotation_property_iris, apply_lowercasing=True\n )\n self.tokenizer = tokenizer\n self.tgt_inverted_annotation_index = self.tgt_onto.build_inverted_annotation_index(\n self.tgt_annotation_index, self.tokenizer\n )\n\n # for neighbour sample\n self.max_hops = max_hops\n\n # if for subsumption, avoid adding ancestors as candidates\n self.for_subsumption = for_subsumption\n # if for subsumption, add (src_class, tgt_class_ancestor) into the reference mappings\n if self.for_subsumption:\n for m in self.reference_class_mappings:\n src_class_iri, tgt_class_iri = m.to_tuple()\n tgt_class = self.tgt_onto.get_owl_object(tgt_class_iri)\n tgt_class_ancestors = self.tgt_onto.reasoner.get_inferred_super_entities(tgt_class)\n for tgt_ancestor_iri in tgt_class_ancestors:\n self.reference_class_dict[src_class_iri].append(tgt_ancestor_iri)\n
"},{"location":"deeponto/align/mapping/#deeponto.align.mapping.NegativeCandidateMappingGenerator.mixed_sample","title":"mixed_sample(reference_class_mapping, **strategy2nums)
","text":"A mixed sampling approach that combines several sampling strategies.
As introduced in the Bio-ML paper, this mixed approach guarantees that the number of samples for each strategy is either the maximum that can be sampled or the required number.
Specifically, at each sampling iteration, the number of candidates is first increased by the number of previously sampled candidates, as in the worst case, all the candidates sampled at this iteration will be duplicated with the previous.
The random sampling is used as the amending strategy, i.e., if other sampling strategies cannot retrieve the specified number of samples, then use random sampling to amend the number.
Parameters:
Name Type Description Defaultreference_class_mapping
ReferenceMapping
The reference class mapping for generating the candidate mappings.
required**strategy2nums
int
The keyword arguments that specify the expected number of candidates for each sampling strategy.
{}
Source code in src/deeponto/align/mapping.py
def mixed_sample(self, reference_class_mapping: ReferenceMapping, **strategy2nums):\n\"\"\"A mixed sampling approach that combines several sampling strategies.\n\n As introduced in the Bio-ML paper, this mixed approach guarantees that the number of samples for each\n strategy is either the **maximum that can be sampled** or the required number.\n\n Specifically, at each sampling iteration, the number of candidates is **first increased by the number of \n previously sampled candidates**, as in the worst case, all the candidates sampled at this iteration\n will be duplicated with the previous. \n\n The random sampling is used as the amending strategy, i.e., if other sampling strategies cannot retrieve\n the specified number of samples, then use random sampling to amend the number.\n\n Args:\n reference_class_mapping (ReferenceMapping): The reference class mapping for generating the candidate mappings.\n **strategy2nums (int): The keyword arguments that specify the expected number of candidates for each\n sampling strategy.\n \"\"\"\n\n valid_tgt_candidate_iris = []\n sample_stats = defaultdict(lambda: 0)\n i = 0\n total_num_candidates = 0\n for strategy, num_canddiates in strategy2nums.items():\n i += 1\n if strategy in SAMPLING_OPTIONS:\n sampler = getattr(self, f\"{strategy}_sample\")\n # for ith iteration, the worst case is when all n_cands are duplicated\n # or should be excluded from other reference targets so we generate\n # NOTE: total_num_candidates + num_candidates + len(excluded_tgt_class_iris)\n # candidates first and prune the rest; another edge case is when sampled\n # candidates are not sufficient and we use random sample to meet n_cands\n cur_valid_tgt_candidate_iris = sampler(\n reference_class_mapping, total_num_candidates + num_canddiates\n )\n # remove the duplicated candidates (and excluded refs) and prune the tail\n cur_valid_tgt_candidate_iris = list(\n set(cur_valid_tgt_candidate_iris) - set(valid_tgt_candidate_iris)\n )[:num_canddiates]\n sample_stats[strategy] += len(cur_valid_tgt_candidate_iris)\n # use random samples for complementation if not enough\n while len(cur_valid_tgt_candidate_iris) < num_canddiates:\n amend_candidate_iris = self.random_sample(\n reference_class_mapping, num_canddiates - len(cur_valid_tgt_candidate_iris)\n )\n amend_candidate_iris = list(\n set(amend_candidate_iris)\n - set(valid_tgt_candidate_iris)\n - set(cur_valid_tgt_candidate_iris)\n )\n cur_valid_tgt_candidate_iris += amend_candidate_iris\n assert len(cur_valid_tgt_candidate_iris) == num_canddiates\n # record how many random samples to amend\n if strategy != \"random\":\n sample_stats[\"random\"] += num_canddiates - sample_stats[strategy]\n valid_tgt_candidate_iris += cur_valid_tgt_candidate_iris\n total_num_candidates += num_canddiates\n else:\n raise ValueError(f\"Invalid sampling trategy: {strategy}.\")\n assert len(valid_tgt_candidate_iris) == total_num_candidates\n\n # TODO: add the candidate mappings into the reference mapping \n\n return valid_tgt_candidate_iris, sample_stats\n
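Assuming a NegativeCandidateMappingGenerator has already been constructed as above, a sampling call might look like the following; the keyword names mirror the *_sample methods documented below and the numbers are arbitrary:

```python
# reference_mapping is one of the loaded ReferenceMapping objects
candidate_iris, stats = generator.mixed_sample(
    reference_mapping, idf=50, neighbour=30, random=20
)
print(stats)  # how many candidates each strategy actually contributed
```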
"},{"location":"deeponto/align/mapping/#deeponto.align.mapping.NegativeCandidateMappingGenerator.random_sample","title":"random_sample(reference_class_mapping, num_candidates)
","text":"Randomly sample a set of target class candidates \\(c'_{cand}\\) for a given reference mapping \\((c, c')\\).
The sampled candidate classes will be combined with the source reference class \\(c\\) to get a set of candidate mappings \\(\\{(c, c'_{cand})\\}\\).
Parameters:
Name Type Description Defaultreference_class_mapping
ReferenceMapping
The reference class mapping for generating the candidate mappings.
requirednum_candidates
int
The expected number of candidate mappings to generate.
required Source code insrc/deeponto/align/mapping.py
def random_sample(self, reference_class_mapping: ReferenceMapping, num_candidates: int):\nr\"\"\"**Randomly** sample a set of target class candidates $c'_{cand}$ for a given reference mapping $(c, c')$.\n\n The sampled candidate classes will be combined with the source reference class $c$ to get a set of\n candidate mappings $\\{(c, c'_{cand})\\}$.\n\n Args:\n reference_class_mapping (ReferenceMapping): The reference class mapping for generating the candidate mappings.\n num_candidates (int): The expected number of candidate mappings to generate.\n \"\"\"\n ref_src_class_iri, ref_tgt_class_iri = reference_class_mapping.to_tuple()\n all_tgt_class_iris = set(self.tgt_onto.owl_classes.keys())\n valid_tgt_class_iris = all_tgt_class_iris - set(\n self.reference_class_dict[ref_src_class_iri]\n ) # exclude gold standards\n assert not ref_tgt_class_iri in valid_tgt_class_iris\n return random.sample(valid_tgt_class_iris, num_candidates)\n
"},{"location":"deeponto/align/mapping/#deeponto.align.mapping.NegativeCandidateMappingGenerator.idf_sample","title":"idf_sample(reference_class_mapping, num_candidates)
","text":"Sample a set of target class candidates \\(c'_{cand}\\) for a given reference mapping \\((c, c')\\) based on the \\(idf\\) scores w.r.t. the inverted annotation index (sub-word level).
Candidate classes with higher \\(idf\\) scores will be considered first, and then combined with the source reference class \\(c\\) to get a set of candidate mappings \\(\\{(c, c'_{cand})\\}\\).
Parameters:
Name Type Description Defaultreference_class_mapping
ReferenceMapping
The reference class mapping for generating the candidate mappings.
requirednum_candidates
int
The expected number of candidate mappings to generate.
required Source code insrc/deeponto/align/mapping.py
def idf_sample(self, reference_class_mapping: ReferenceMapping, num_candidates: int):\nr\"\"\"Sample a set of target class candidates $c'_{cand}$ for a given reference mapping $(c, c')$ based on the $idf$ scores\n w.r.t. the inverted annotation index (sub-word level).\n\n Candidate classes with higher $idf$ scores will be considered first, and then combined with the source reference class $c$\n to get a set of candidate mappings $\\{(c, c'_{cand})\\}$.\n\n Args:\n reference_class_mapping (ReferenceMapping): The reference class mapping for generating the candidate mappings.\n num_candidates (int): The expected number of candidate mappings to generate.\n \"\"\"\n ref_src_class_iri, ref_tgt_class_iri = reference_class_mapping.to_tuple()\n\n tgt_candidates = self.tgt_inverted_annotation_index.idf_select(\n self.tgt_annotation_index[ref_tgt_class_iri]\n ) # select all non-trivial candidates first\n valid_tgt_class_iris = []\n for tgt_candidate_iri, _ in tgt_candidates:\n # valid as long as it is not one of the reference target\n if tgt_candidate_iri not in self.reference_class_dict[ref_src_class_iri]:\n valid_tgt_class_iris.append(tgt_candidate_iri)\n if len(valid_tgt_class_iris) == num_candidates:\n break\n assert not ref_tgt_class_iri in valid_tgt_class_iris\n return valid_tgt_class_iris\n
"},{"location":"deeponto/align/mapping/#deeponto.align.mapping.NegativeCandidateMappingGenerator.neighbour_sample","title":"neighbour_sample(reference_class_mapping, num_candidates)
","text":"Sample a set of target class candidates \\(c'_{cand}\\) for a given reference mapping \\((c, c')\\) based on the subsumption hierarchy.
Define one-hop as one edge derived from an asserted subsumption axiom, i.e., to the parent class or the child class. Candidates classes with nearer hops will be considered first, and then combined with the source reference class \\(c\\) to get a set of candidate mappings \\(\\{(c, c'_{cand})\\}\\).
Parameters:
Name Type Description Defaultreference_class_mapping
ReferenceMapping
The reference class mapping for generating the candidate mappings.
requirednum_candidates
int
The expected number of candidate mappings to generate.
required Source code insrc/deeponto/align/mapping.py
def neighbour_sample(self, reference_class_mapping: ReferenceMapping, num_candidates: int):\nr\"\"\"Sample a set of target class candidates $c'_{cand}$ for a given reference mapping $(c, c')$ based on the **subsumption\n hierarchy**.\n\n Define one-hop as one edge derived from an **asserted** subsumption axiom, i.e., to the parent class or the child class.\n Candidates classes with nearer hops will be considered first, and then combined with the source reference class $c$\n to get a set of candidate mappings $\\{(c, c'_{cand})\\}$.\n\n Args:\n reference_class_mapping (ReferenceMapping): The reference class mapping for generating the candidate mappings.\n num_candidates (int): The expected number of candidate mappings to generate.\n \"\"\"\n ref_src_class_iri, ref_tgt_class_iri = reference_class_mapping.to_tuple()\n\n valid_tgt_class_iris = set()\n cur_hop = 1\n frontier = [ref_tgt_class_iri]\n # extract from the nearest neighbours until enough candidates or max hop\n while len(valid_tgt_class_iris) < num_candidates and cur_hop <= self.max_hops:\n\n neighbours_of_cur_hop = []\n for tgt_class_iri in frontier:\n tgt_class = self.tgt_onto.get_owl_object(tgt_class_iri)\n parents = self.tgt_onto.reasoner.get_inferred_super_entities(tgt_class, direct=True)\n children = self.tgt_onto.reasoner.get_inferred_sub_entities(tgt_class, direct=True)\n neighbours_of_cur_hop += parents + children # used for further hop expansion\n\n valid_neighbours_of_cur_hop = set(neighbours_of_cur_hop) - set(self.reference_class_dict[ref_src_class_iri])\n # print(valid_neighbours_of_cur_hop)\n\n # NOTE if by adding neighbours of current hop the require number will be met\n # we randomly pick among them\n if len(valid_neighbours_of_cur_hop) > num_candidates - len(valid_tgt_class_iris):\n valid_neighbours_of_cur_hop = random.sample(\n valid_neighbours_of_cur_hop, num_candidates - len(valid_tgt_class_iris)\n )\n valid_tgt_class_iris.update(valid_neighbours_of_cur_hop)\n\n frontier = neighbours_of_cur_hop # update the frontier with all possible neighbors\n cur_hop += 1\n\n assert not ref_tgt_class_iri in valid_tgt_class_iris\n return list(valid_tgt_class_iris)\n
"},{"location":"deeponto/align/oaei/","title":"OAEI Utilities","text":"This page concerns utility functions used in the OAEI.
"},{"location":"deeponto/align/oaei/#deeponto.align.oaei.get_ignored_class_index","title":"get_ignored_class_index(onto)
","text":"Get an index for filtering classes that are marked as not used in alignment.
This is indicated by the special class annotation use_in_alignment
with the following IRI: http://oaei.ontologymatching.org/bio-ml/ann/use_in_alignment
src/deeponto/align/oaei.py
def get_ignored_class_index(onto: Ontology):\n\"\"\"Get an index for filtering classes that are marked as not used in alignment.\n\n This is indicated by the special class annotation `use_in_alignment` with the following IRI:\n http://oaei.ontologymatching.org/bio-ml/ann/use_in_alignment\n \"\"\"\n ignored_class_index = defaultdict(lambda: False)\n for class_iri, class_obj in onto.owl_classes.items():\n use_in_alignment = onto.get_annotations(\n class_obj, \"http://oaei.ontologymatching.org/bio-ml/ann/use_in_alignment\"\n )\n if use_in_alignment and str(use_in_alignment[0]).lower() == \"false\":\n ignored_class_index[class_iri] = True\n return ignored_class_index\n
"},{"location":"deeponto/align/oaei/#deeponto.align.oaei.remove_ignored_mappings","title":"remove_ignored_mappings(mappings, ignored_class_index)
","text":"Filter prediction mappings that involve classes to be ignored.
Source code insrc/deeponto/align/oaei.py
def remove_ignored_mappings(mappings: List[EntityMapping], ignored_class_index: dict):\n\"\"\"Filter prediction mappings that involve classes to be ignored.\"\"\"\n results = []\n for m in mappings:\n if ignored_class_index[m.head] or ignored_class_index[m.tail]:\n continue\n results.append(m)\n return results\n
"},{"location":"deeponto/align/oaei/#deeponto.align.oaei.matching_eval","title":"matching_eval(pred_maps_file, ref_maps_file, null_ref_maps_file=None, ignored_class_index=None, pred_maps_threshold=None)
","text":"Conduct global matching evaluation for the prediction mappings against the reference mappings.
The prediction mappings are formatted the same as full.tsv
(the full reference mappings), with three columns: \"SrcEntity\"
, \"TgtEntity\"
, and \"Score\"
, indicating the source class IRI, the target class IRI, and the corresponding mapping score.
An ignored_class_index
needs to be constructed for omitting prediction mappings that involve a class marked as not used in alignment.
Use the following code to obtain such index for both the source and target ontologies:
ignored_class_index = get_ignored_class_index(src_onto)\nignored_class_index.update(get_ignored_class_index(tgt_onto))\n
Source code in src/deeponto/align/oaei.py
def matching_eval(\n pred_maps_file: str,\n ref_maps_file: str,\n null_ref_maps_file: Optional[str] = None,\n ignored_class_index: Optional[dict] = None,\n pred_maps_threshold: Optional[float] = None,\n):\nr\"\"\"Conduct **global matching** evaluation for the prediction mappings against the\n reference mappings.\n\n The prediction mappings are formatted the same as `full.tsv` (the full reference mappings),\n with three columns: `\"SrcEntity\"`, `\"TgtEntity\"`, and `\"Score\"`, indicating the source\n class IRI, the target class IRI, and the corresponding mapping score.\n\n An `ignored_class_index` needs to be constructed for omitting prediction mappings\n that involve a class marked as **not used in alignment**.\n\n Use the following code to obtain such index for both the source and target ontologies:\n\n ```python\n ignored_class_index = get_ignored_class_index(src_onto)\n ignored_class_index.update(get_ignored_class_index(tgt_onto))\n ```\n \"\"\"\n refs = ReferenceMapping.read_table_mappings(ref_maps_file, relation=\"=\")\n preds = EntityMapping.read_table_mappings(pred_maps_file, relation=\"=\", threshold=pred_maps_threshold)\n if ignored_class_index:\n preds = remove_ignored_mappings(preds, ignored_class_index)\n null_refs = ReferenceMapping.read_table_mappings(null_ref_maps_file, relation=\"=\") if null_ref_maps_file else []\n results = AlignmentEvaluator.f1(preds, refs, null_reference_mappings=null_refs)\n return results\n
"},{"location":"deeponto/align/oaei/#deeponto.align.oaei.read_candidate_mappings","title":"read_candidate_mappings(cand_maps_file, for_biollm=False, threshold=0.0)
","text":"Load scored or already ranked candidate mappings.
The predicted candidate mappings are formatted the same as test.cands.tsv
, with three columns: \"SrcEntity\"
, \"TgtEntity\"
, and \"TgtCandidates\"
, indicating the source reference class IRI, the target reference class IRI, and a list of tuples in the form of (target_candidate_class_IRI, score)
where score
is optional if the candidate mappings have been ranked. For the Bio-LLM special sub-track, \"TgtCandidates\"
refers to a list of triples in the form of (target_candidate_class_IRI, score, answer)
where the answer
is required for computing matching scores.
This method loads the candidate mappings in this format and parses them into the inputs of mean_reciprocal_rank and hits_at_K.
For Bio-LLM, the true prediction mappings and reference mappings will also be generated for the matching evaluation, i.e., the inputs of f1
.
src/deeponto/align/oaei.py
def read_candidate_mappings(cand_maps_file: str, for_biollm: bool = False, threshold: float = 0.0):\nr\"\"\"Load scored or already ranked candidate mappings.\n\n The predicted candidate mappings are formatted the same as `test.cands.tsv`, with three columns:\n `\"SrcEntity\"`, `\"TgtEntity\"`, and `\"TgtCandidates\"`, indicating the source reference class IRI, the\n target reference class IRI, and a list of **tuples** in the form of `(target_candidate_class_IRI, score)` where\n `score` is optional if the candidate mappings have been ranked. For the Bio-LLM special sub-track, `\"TgtCandidates\"`\n refers to a list of **triples** in the form of `(target_candidate_class_IRI, score, answer)` where the `answer` is\n required for computing matching scores.\n\n This method loads the candidate mappings in this format and parse them into the inputs of [`mean_reciprocal_rank`][deeponto.align.evaluation.AlignmentEvaluator.mean_reciprocal_rank]\n and [`hits_at_K`][[`mean_reciprocal_rank`][deeponto.align.evaluation.AlignmentEvaluator.hits_at_K].\n\n For Bio-LLM, the true prediction mappings and reference mappings will also be generated for the matching evaluation, i.e., the inputs of [`f1`][deeponto.align.evaluation.AlignmentEvaluator.f1].\n \"\"\"\n\n all_cand_maps = read_table(cand_maps_file).values.tolist()\n cands = []\n unmatched_cands = []\n preds = [] # only used for bio-llm\n refs = [] # only used for bio-llm\n\n for src_ref_class, tgt_ref_class, tgt_cands in all_cand_maps:\n ref_map = ReferenceMapping(src_ref_class, tgt_ref_class, \"=\")\n tgt_cands = eval(tgt_cands)\n has_score = True if all([not isinstance(x, str) for x in tgt_cands]) else False\n cand_maps = []\n refs.append(ref_map) if tgt_ref_class != \"UnMatched\" else None\n if for_biollm:\n for t, s, a in tgt_cands:\n m = EntityMapping(src_ref_class, t, \"=\", s)\n cand_maps.append(m)\n if a is True and s >= threshold: # only keep first one\n preds.append(m)\n elif has_score:\n cand_maps = [EntityMapping(src_ref_class, t, \"=\", s) for t, s in tgt_cands]\n else:\n warnings.warn(\"Input candidate mappings do not have a score, assume default rank in descending order.\")\n cand_maps = [\n EntityMapping(src_ref_class, t, \"=\", (len(tgt_cands) - i) / len(tgt_cands))\n for i, t in enumerate(tgt_cands)\n ]\n cand_maps = EntityMapping.sort_entity_mappings_by_score(cand_maps)\n if for_biollm and tgt_ref_class == \"UnMatched\":\n unmatched_cands.append((ref_map, cand_maps))\n else:\n cands.append((ref_map, cand_maps))\n\n if for_biollm:\n return cands, unmatched_cands, preds, refs\n else:\n return cands\n
"},{"location":"deeponto/align/oaei/#deeponto.align.oaei.ranking_result_file_check","title":"ranking_result_file_check(cand_maps_file, ref_cand_maps_file)
","text":"Check if the ranking result file is formatted correctly as the original test.cands.tsv
file provided in the dataset.
src/deeponto/align/oaei.py
def ranking_result_file_check(cand_maps_file: str, ref_cand_maps_file: str):\nr\"\"\"Check if the ranking result file is formatted correctly as the original\n `test.cands.tsv` file provided in the dataset.\n \"\"\"\n formatted_cand_maps = read_candidate_mappings(cand_maps_file)\n formatted_ref_cand_maps = read_candidate_mappings(ref_cand_maps_file)\n assert len(formatted_cand_maps) == len(\n formatted_ref_cand_maps\n ), f\"Mismatched number of reference mappings: {len(formatted_cand_maps)}; should be {len(formatted_ref_cand_maps)}.\"\n for i in range(len(formatted_cand_maps)):\n anchor, cands = formatted_cand_maps[i]\n ref_anchor, ref_cands = formatted_ref_cand_maps[i]\n assert (\n anchor.to_tuple() == ref_anchor.to_tuple()\n ), f\"Mismatched reference mapping: {anchor}; should be {ref_anchor}.\"\n cands = [c.to_tuple() for c in cands]\n ref_cands = [rc.to_tuple() for rc in ref_cands]\n assert not (\n set(cands) - set(ref_cands)\n ), f\"Mismatch set of candidate mappings for the reference mapping: {anchor}.\"\n
"},{"location":"deeponto/align/oaei/#deeponto.align.oaei.ranking_eval","title":"ranking_eval(cand_maps_file, Ks=[1, 5, 10])
","text":"Conduct local ranking evaluation for the scored or ranked candidate mappings.
See read_candidate_mappings
for the file format and loading.
src/deeponto/align/oaei.py
def ranking_eval(cand_maps_file: str, Ks=[1, 5, 10]):\nr\"\"\"Conduct **local ranking** evaluation for the scored or ranked candidate mappings.\n\n See [`read_candidate_mappings`][deeponto.align.oaei.read_candidate_mappings] for the file format and loading.\n \"\"\"\n formatted_cand_maps = read_candidate_mappings(cand_maps_file)\n results = {\"MRR\": AlignmentEvaluator.mean_reciprocal_rank(formatted_cand_maps)}\n for K in Ks:\n results[f\"Hits@{K}\"] = AlignmentEvaluator.hits_at_K(formatted_cand_maps, K=K)\n return results\n
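A minimal usage sketch (the file name is a placeholder for a scored or ranked candidate mapping file in the test.cands.tsv format):
from deeponto.align.oaei import ranking_eval\nranking_results = ranking_eval('test.cands.tsv', Ks=[1, 5, 10])\nprint(ranking_results)  # {'MRR': ..., 'Hits@1': ..., 'Hits@5': ..., 'Hits@10': ...}\n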
"},{"location":"deeponto/align/oaei/#deeponto.align.oaei.is_rejection","title":"is_rejection(preds, cands)
","text":"A successful rejection means none of the candidate mappings are predicted as true mappings.
Source code insrc/deeponto/align/oaei.py
def is_rejection(preds: List[EntityMapping], cands: List[EntityMapping]):\n\"\"\"A successful rejection means none of the candidate mappings are predicted as true mappings.\"\"\"\n return set([p.to_tuple() for p in preds]).intersection(set([c.to_tuple() for c in cands])) == set()\n
"},{"location":"deeponto/align/oaei/#deeponto.align.oaei.biollm_eval","title":"biollm_eval(cand_maps_file, Ks=[1], threshold=0.0)
","text":"Conduct Bio-LLM evaluation for the Bio-LLM formatted candidate mappings.
See read_candidate_mappings
for the file format and loading.
src/deeponto/align/oaei.py
def biollm_eval(cand_maps_file, Ks=[1], threshold: float = 0.0):\nr\"\"\"Conduct Bio-LLM evaluation for the Bio-LLM formatted candidate mappings.\n\n See [`read_candidate_mappings`][deeponto.align.oaei.read_candidate_mappings] for the file format and loading.\n \"\"\"\n matched_cand_maps, unmatched_cand_maps, preds, refs = read_candidate_mappings(\n cand_maps_file, for_biollm=True, threshold=threshold\n )\n\n results = AlignmentEvaluator.f1(preds, refs)\n for K in Ks:\n results[f\"Hits@{K}\"] = AlignmentEvaluator.hits_at_K(matched_cand_maps, K=K)\n results[\"MRR\"] = AlignmentEvaluator.mean_reciprocal_rank(matched_cand_maps)\n rej = 0\n for _, cs in unmatched_cand_maps:\n rej += int(is_rejection(preds, cs))\n results[\"RR\"] = rej / len(unmatched_cand_maps)\n return results\n
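A minimal usage sketch for the Bio-LLM sub-track (the file name is a placeholder):
from deeponto.align.oaei import biollm_eval\nbiollm_results = biollm_eval('test.cands.tsv', Ks=[1], threshold=0.0)\nprint(biollm_results)  # matching scores plus Hits@1, MRR, and RR (rejection rate)\n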
"},{"location":"deeponto/align/bertmap/","title":"BERTMap","text":"Paper
\\(\\textsf{BERTMap}\\) is proposed in the paper: BERTMap: A BERT-based Ontology Alignment System (AAAI-2022).
@inproceedings{he2022bertmap,\n title={BERTMap: a BERT-based ontology alignment system},\n author={He, Yuan and Chen, Jiaoyan and Antonyrajah, Denvar and Horrocks, Ian},\n booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},\n volume={36},\n number={5},\n pages={5684--5691},\n year={2022}\n}\n
\\(\\textsf{BERTMap}\\) is a BERT-based ontology matching (OM) system consisting of following components:
\\(\\textsf{BERTMapLt}\\) is a light-weight version of \\(\\textsf{BERTMap}\\) without the BERT module and mapping refiner.
See the tutorial for \\(\\textsf{BERTMap}\\) here.
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.pipeline.BERTMapPipeline","title":"BERTMapPipeline(src_onto, tgt_onto, config)
","text":"Class for the whole ontology alignment pipeline of \\(\\textsf{BERTMap}\\) and \\(\\textsf{BERTMapLt}\\) models.
Note
Parameters related to BERT training are None
by default. They will be constructed for \\(\\textsf{BERTMap}\\) and stay as None
for \\(\\textsf{BERTMapLt}\\).
Attributes:
Name Type Descriptionconfig
CfgNode
The configuration for BERTMap or BERTMapLt.
name
str
The name of the model, either bertmap
or bertmaplt
.
output_path
str
The path to the output directory.
src_onto
Ontology
The source ontology to be matched.
tgt_onto
Ontology
The target ontology to be matched.
annotation_property_iris
List[str]
The annotation property IRIs used for extracting synonyms and nonsynonyms.
src_annotation_index
dict
A dictionary that stores the (class_iri, class_annotations)
pairs from src_onto
according to annotation_property_iris
.
tgt_annotation_index
dict
A dictionary that stores the (class_iri, class_annotations)
pairs from tgt_onto
according to annotation_property_iris
.
known_mappings
List[ReferenceMapping]
List of known mappings for constructing the cross-ontology corpus.
auxiliary_ontos
List[Ontology]
List of auxiliary ontologies for constructing any auxiliary corpus.
corpora
dict
A dictionary that stores the summary
of built text semantics corpora and the sampled synonyms
and nonsynonyms
.
finetune_data
dict
A dictionary that stores the training
and validation
splits of samples from corpora
.
bert
BERTSynonymClassifier
A BERT model for synonym classification and mapping prediction.
best_checkpoint
str
The path to the best BERT checkpoint which will be loaded after training.
mapping_predictor
MappingPredictor
The predictor function based on class annotations, used for global matching or mapping scoring.
Parameters:
Name Type Description Defaultsrc_onto
Ontology
The source ontology for alignment.
requiredtgt_onto
Ontology
The target ontology for alignment.
requiredconfig
CfgNode
The configuration for BERTMap or BERTMapLt.
required Source code insrc/deeponto/align/bertmap/pipeline.py
def __init__(self, src_onto: Ontology, tgt_onto: Ontology, config: CfgNode):\n\"\"\"Initialise the BERTMap or BERTMapLt model.\n\n Args:\n src_onto (Ontology): The source ontology for alignment.\n tgt_onto (Ontology): The target ontology for alignment.\n config (CfgNode): The configuration for BERTMap or BERTMapLt.\n \"\"\"\n # load the configuration and confirm model name is valid\n self.config = config\n self.name = self.config.model\n if not self.name in MODEL_OPTIONS.keys():\n raise RuntimeError(f\"`model` {self.name} in the config file is not one of the supported.\")\n\n # create the output directory, e.g., experiments/bertmap\n self.config.output_path = \".\" if not self.config.output_path else self.config.output_path\n self.config.output_path = os.path.abspath(self.config.output_path)\n self.output_path = os.path.join(self.config.output_path, self.name)\n create_path(self.output_path)\n\n # create logger and progress manager (hidden attribute) \n self.logger = create_logger(self.name, self.output_path)\n self.enlighten_manager = enlighten.get_manager()\n\n # ontology\n self.src_onto = src_onto\n self.tgt_onto = tgt_onto\n self.annotation_property_iris = self.config.annotation_property_iris\n self.logger.info(f\"Load the following configurations:\\n{print_dict(self.config)}\")\n config_path = os.path.join(self.output_path, \"config.yaml\")\n self.logger.info(f\"Save the configuration file at {config_path}.\")\n self.save_bertmap_config(self.config, config_path)\n\n # build the annotation thesaurus\n self.src_annotation_index, _ = self.src_onto.build_annotation_index(self.annotation_property_iris, apply_lowercasing=True)\n self.tgt_annotation_index, _ = self.tgt_onto.build_annotation_index(self.annotation_property_iris, apply_lowercasing=True)\n if (not self.src_annotation_index) or (not self.tgt_annotation_index):\n raise RuntimeError(\"No class annotations found in input ontologies; unable to produce alignment.\")\n\n # provided mappings if any\n self.known_mappings = self.config.known_mappings\n if self.known_mappings:\n self.known_mappings = ReferenceMapping.read_table_mappings(self.known_mappings)\n\n # auxiliary ontologies if any\n self.auxiliary_ontos = self.config.auxiliary_ontos\n if self.auxiliary_ontos:\n self.auxiliary_ontos = [Ontology(ao) for ao in self.auxiliary_ontos]\n\n self.data_path = os.path.join(self.output_path, \"data\")\n # load or construct the corpora\n self.corpora_path = os.path.join(self.data_path, \"text-semantics.corpora.json\")\n self.corpora = self.load_text_semantics_corpora()\n\n # load or construct fine-tune data\n self.finetune_data_path = os.path.join(self.data_path, \"fine-tune.data.json\")\n self.finetune_data = self.load_finetune_data()\n\n # load the bert model and train\n self.bert_config = self.config.bert\n self.bert_pretrained_path = self.bert_config.pretrained_path\n self.bert_finetuned_path = os.path.join(self.output_path, \"bert\")\n self.bert_resume_training = self.bert_config.resume_training\n self.bert_synonym_classifier = None\n self.best_checkpoint = None\n if self.name == \"bertmap\":\n self.bert_synonym_classifier = self.load_bert_synonym_classifier()\n # train if the loaded classifier is not in eval mode\n if self.bert_synonym_classifier.eval_mode == False:\n self.logger.info(\n f\"Data statistics:\\n \\\n{print_dict(self.bert_synonym_classifier.data_stat)}\"\n )\n self.bert_synonym_classifier.train(self.bert_resume_training)\n # turn on eval mode after training\n self.bert_synonym_classifier.eval()\n # NOTE potential 
redundancy here: after training, load the best checkpoint\n self.best_checkpoint = self.load_best_checkpoint()\n if not self.best_checkpoint:\n raise RuntimeError(f\"No best checkpoint found for the BERT synonym classifier model.\")\n self.logger.info(f\"Fine-tuning finished, found best checkpoint at {self.best_checkpoint}.\")\n else:\n self.logger.info(f\"No training needed; skip BERT fine-tuning.\")\n\n # pretty progress bar tracking\n self.enlighten_status = self.enlighten_manager.status_bar(\n status_format=u'Global Matching{fill}Stage: {demo}{fill}{elapsed}',\n color='bold_underline_bright_white_on_lightslategray',\n justify=enlighten.Justify.CENTER, demo='Initializing',\n autorefresh=True, min_delta=0.5\n )\n\n # mapping predictions\n self.global_matching_config = self.config.global_matching\n\n # build ignored class index for OAEI\n self.ignored_class_index = None \n if self.global_matching_config.for_oaei:\n self.ignored_class_index = defaultdict(lambda: False)\n for src_class_iri, src_class in self.src_onto.owl_classes.items():\n use_in_alignment = self.src_onto.get_annotations(src_class, \"http://oaei.ontologymatching.org/bio-ml/ann/use_in_alignment\")\n if use_in_alignment and str(use_in_alignment[0]).lower() == \"false\":\n self.ignored_class_index[src_class_iri] = True\n for tgt_class_iri, tgt_class in self.tgt_onto.owl_classes.items():\n use_in_alignment = self.tgt_onto.get_annotations(tgt_class, \"http://oaei.ontologymatching.org/bio-ml/ann/use_in_alignment\")\n if use_in_alignment and str(use_in_alignment[0]).lower() == \"false\":\n self.ignored_class_index[tgt_class_iri] = True\n\n self.mapping_predictor = MappingPredictor(\n output_path=self.output_path,\n tokenizer_path=self.bert_config.pretrained_path,\n src_annotation_index=self.src_annotation_index,\n tgt_annotation_index=self.tgt_annotation_index,\n bert_synonym_classifier=self.bert_synonym_classifier,\n num_raw_candidates=self.global_matching_config.num_raw_candidates,\n num_best_predictions=self.global_matching_config.num_best_predictions,\n batch_size_for_prediction=self.bert_config.batch_size_for_prediction,\n logger=self.logger,\n enlighten_manager=self.enlighten_manager,\n enlighten_status=self.enlighten_status,\n ignored_class_index=self.ignored_class_index,\n )\n self.mapping_refiner = None\n\n # if global matching is disabled (potentially used for class pair scoring)\n if self.config.global_matching.enabled:\n self.mapping_predictor.mapping_prediction() # mapping prediction\n if self.name == \"bertmap\":\n self.mapping_refiner = MappingRefiner(\n output_path=self.output_path,\n src_onto=self.src_onto,\n tgt_onto=self.tgt_onto,\n mapping_predictor=self.mapping_predictor,\n mapping_extension_threshold=self.global_matching_config.mapping_extension_threshold,\n mapping_filtered_threshold=self.global_matching_config.mapping_filtered_threshold,\n logger=self.logger,\n enlighten_manager=self.enlighten_manager,\n enlighten_status=self.enlighten_status\n )\n self.mapping_refiner.mapping_extension() # mapping extension\n self.mapping_refiner.mapping_repair() # mapping repair\n self.enlighten_status.update(demo=\"Finished\") \n else:\n self.enlighten_status.update(demo=\"Skipped\") \n\n self.enlighten_status.close()\n
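A minimal sketch of running the whole pipeline (the ontology and configuration file paths are placeholders and the JVM memory setting is illustrative):
from deeponto import init_jvm\ninit_jvm('8g')  # start the JVM before loading any ontology\nfrom deeponto.onto import Ontology\nfrom deeponto.align.bertmap import BERTMapPipeline\n\nconfig = BERTMapPipeline.load_bertmap_config('config.yaml')  # or load_bertmap_config() for the default\nsrc_onto = Ontology('src_onto.owl')\ntgt_onto = Ontology('tgt_onto.owl')\nBERTMapPipeline(src_onto, tgt_onto, config)  # corpus construction, fine-tuning, and matching all run inside __init__\n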
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.pipeline.BERTMapPipeline.load_or_construct","title":"load_or_construct(data_file, data_name, construct_func, *args, **kwargs)
","text":"Load existing data or construct a new one.
An auxiliary function that checks the existence of a data file and loads it if it exists. Otherwise, it constructs new data with the input construct_func
, which is supposed to generate a local data file.
src/deeponto/align/bertmap/pipeline.py
def load_or_construct(self, data_file: str, data_name: str, construct_func: Callable, *args, **kwargs):\n\"\"\"Load existing data or construct a new one.\n\n An auxlirary function that checks the existence of a data file and loads it if it exists.\n Otherwise, construct new data with the input `construct_func` which is supported generate\n a local data file.\n \"\"\"\n if os.path.exists(data_file):\n self.logger.info(f\"Load existing {data_name} from {data_file}.\")\n else:\n self.logger.info(f\"Construct new {data_name} and save at {data_file}.\")\n construct_func(*args, **kwargs)\n # load the data file that is supposed to be saved locally\n return load_file(data_file)\n
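For instance, a simplified sketch of this pattern (bertmap denotes a hypothetical BERTMapPipeline instance and construct is a hypothetical closure that builds and saves the data file):
def construct():\n    ...  # build the data and save it to 'data.json'\n\ndata = bertmap.load_or_construct('data.json', 'my data', construct)\n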
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.pipeline.BERTMapPipeline.load_text_semantics_corpora","title":"load_text_semantics_corpora()
","text":"Load or construct text semantics corpora.
See TextSemanticsCorpora
.
src/deeponto/align/bertmap/pipeline.py
def load_text_semantics_corpora(self):\n\"\"\"Load or construct text semantics corpora.\n\n See [`TextSemanticsCorpora`][deeponto.align.bertmap.text_semantics.TextSemanticsCorpora].\n \"\"\"\n data_name = \"text semantics corpora\"\n\n if self.name == \"bertmap\":\n\n def construct():\n corpora = TextSemanticsCorpora(\n src_onto=self.src_onto,\n tgt_onto=self.tgt_onto,\n annotation_property_iris=self.annotation_property_iris,\n class_mappings=self.known_mappings,\n auxiliary_ontos=self.auxiliary_ontos,\n )\n self.logger.info(str(corpora))\n corpora.save(self.data_path)\n\n return self.load_or_construct(self.corpora_path, data_name, construct)\n\n self.logger.info(f\"No training needed; skip the construction of {data_name}.\")\n return None\n
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.pipeline.BERTMapPipeline.load_finetune_data","title":"load_finetune_data()
","text":"Load or construct fine-tuning data from text semantics corpora.
Steps of constructing fine-tuning data from text semantics:
src/deeponto/align/bertmap/pipeline.py
def load_finetune_data(self):\nr\"\"\"Load or construct fine-tuning data from text semantics corpora.\n\n Steps of constructing fine-tuning data from text semantics:\n\n 1. Mix synonym and nonsynonym data.\n 2. Randomly sample 90% as training samples and 10% as validation.\n \"\"\"\n data_name = \"fine-tuning data\"\n\n if self.name == \"bertmap\":\n\n def construct():\n finetune_data = dict()\n samples = self.corpora[\"synonyms\"] + self.corpora[\"nonsynonyms\"]\n random.shuffle(samples)\n split_index = int(0.9 * len(samples)) # split at 90%\n finetune_data[\"training\"] = samples[:split_index]\n finetune_data[\"validation\"] = samples[split_index:]\n save_file(finetune_data, self.finetune_data_path)\n\n return self.load_or_construct(self.finetune_data_path, data_name, construct)\n\n self.logger.info(f\"No training needed; skip the construction of {data_name}.\")\n return None\n
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.pipeline.BERTMapPipeline.load_bert_synonym_classifier","title":"load_bert_synonym_classifier()
","text":"Load the BERT model from a pre-trained or a local checkpoint.
bert-uncased
.eval
mode for mapping predictions.self.bert_resume_training
is True
, it will be loaded from the latest saved checkpoint.src/deeponto/align/bertmap/pipeline.py
def load_bert_synonym_classifier(self):\n\"\"\"Load the BERT model from a pre-trained or a local checkpoint.\n\n - If loaded from pre-trained, it means to start training from a pre-trained model such as `bert-uncased`.\n - If loaded from local, turn on the `eval` mode for mapping predictions.\n - If `self.bert_resume_training` is `True`, it will be loaded from the latest saved checkpoint.\n \"\"\"\n checkpoint = self.load_best_checkpoint() # load the best checkpoint or nothing\n eval_mode = True\n # if no checkpoint has been found, start training from scratch OR resume training\n # no point to load the best checkpoint if resume training (will automatically search for the latest checkpoint)\n if not checkpoint or self.bert_resume_training:\n checkpoint = self.bert_pretrained_path\n eval_mode = False # since it is for training now\n\n return BERTSynonymClassifier(\n loaded_path=checkpoint,\n output_path=self.bert_finetuned_path,\n eval_mode=eval_mode,\n max_length_for_input=self.bert_config.max_length_for_input,\n num_epochs_for_training=self.bert_config.num_epochs_for_training,\n batch_size_for_training=self.bert_config.batch_size_for_training,\n batch_size_for_prediction=self.bert_config.batch_size_for_prediction,\n training_data=self.finetune_data[\"training\"],\n validation_data=self.finetune_data[\"validation\"],\n )\n
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.pipeline.BERTMapPipeline.load_best_checkpoint","title":"load_best_checkpoint()
","text":"Find the best checkpoint by searching for trainer states in each checkpoint file.
Source code insrc/deeponto/align/bertmap/pipeline.py
def load_best_checkpoint(self) -> Optional[str]:\n\"\"\"Find the best checkpoint by searching for trainer states in each checkpoint file.\"\"\"\n best_checkpoint = -1\n\n if os.path.exists(self.bert_finetuned_path):\n for file in os.listdir(self.bert_finetuned_path):\n # load trainer states from each checkpoint file\n if file.startswith(\"checkpoint\"):\n trainer_state = load_file(\n os.path.join(self.bert_finetuned_path, file, \"trainer_state.json\")\n )\n checkpoint = int(trainer_state[\"best_model_checkpoint\"].split(\"/\")[-1].split(\"-\")[-1])\n # find the latest best checkpoint\n if checkpoint > best_checkpoint:\n best_checkpoint = checkpoint\n\n if best_checkpoint == -1:\n best_checkpoint = None\n else:\n best_checkpoint = os.path.join(self.bert_finetuned_path, f\"checkpoint-{best_checkpoint}\")\n\n return best_checkpoint\n
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.pipeline.BERTMapPipeline.load_bertmap_config","title":"load_bertmap_config(config_file=None)
staticmethod
","text":"Load the BERTMap configuration in .yaml
. If the file is not provided, use the default configuration.
src/deeponto/align/bertmap/pipeline.py
@staticmethod\ndef load_bertmap_config(config_file: Optional[str] = None):\n\"\"\"Load the BERTMap configuration in `.yaml`. If the file\n is not provided, use the default configuration.\n \"\"\"\n if not config_file:\n config_file = DEFAULT_CONFIG_FILE\n print(f\"Use the default configuration at {DEFAULT_CONFIG_FILE}.\") \n if not config_file.endswith(\".yaml\"):\n raise RuntimeError(\"Configuration file should be in `yaml` format.\")\n return CfgNode(load_file(config_file))\n
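For example, one may start from the default configuration and override a few fields before running the pipeline (a sketch; the overridden values are illustrative):
from deeponto.align.bertmap import BERTMapPipeline\n\nconfig = BERTMapPipeline.load_bertmap_config()  # default configuration\nconfig.output_path = './bertmap_exp'\nconfig.annotation_property_iris.append('http://www.w3.org/2004/02/skos/core#prefLabel')\nBERTMapPipeline.save_bertmap_config(config, './config.saved.yaml')\n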
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.pipeline.BERTMapPipeline.save_bertmap_config","title":"save_bertmap_config(config, config_file)
staticmethod
","text":"Save the BERTMap configuration in .yaml
.
src/deeponto/align/bertmap/pipeline.py
@staticmethod\ndef save_bertmap_config(config: CfgNode, config_file: str):\n\"\"\"Save the BERTMap configuration in `.yaml`.\"\"\"\n with open(config_file, \"w\") as c:\n config.dump(stream=c, sort_keys=False, default_flow_style=False)\n
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.text_semantics.AnnotationThesaurus","title":"AnnotationThesaurus(onto, annotation_property_iris, apply_transitivity=False)
","text":"A thesaurus class for synonyms and non-synonyms extracted from an ontology.
Some related definitions of arguments here:
synonym_group
is a set of annotation phrases that are synonymous to each other;transitivity
of synonyms means if A and B are synonymous and B and C are synonymous, then A and C are synonymous. This is achieved by a connected graph-based algorithm.synonym_pair
is a pair of synonymous annotation phrases which can be extracted from the cartesian product of a synonym_group
and itself. NOTE that reflexivity and symmetry are preserved meaning that (i) every phrase A is a synonym of itself and (ii) if (A, B) is a synonym pair then (B, A) is a synonym pair, too.Attributes:
Name Type Descriptiononto
Ontology
An ontology to construct the annotation thesaurus from.
annotation_index
Dict[str, Set[str]]
An index of the class annotations with (class_iri, annotations)
pairs.
annotation_property_iris
List[str]
A list of annotation property IRIs used to extract the annotations.
average_number_of_annotations_per_class
int
The average number of (extracted) annotations per ontology class.
apply_transitivity
bool
Apply synonym transitivity to merge synonym groups or not.
synonym_groups
List[Set[str]]
The list of synonym groups extracted from the ontology according to specified annotation properties.
Parameters:
Name Type Description Defaultonto
Ontology
The input ontology to extract annotations from.
requiredannotation_property_iris
List[str]
Specify which annotation properties to be used.
requiredapply_transitivity
bool
Apply synonym transitivity to merge synonym groups or not. Defaults to False
.
False
Source code in src/deeponto/align/bertmap/text_semantics.py
def __init__(self, onto: Ontology, annotation_property_iris: List[str], apply_transitivity: bool = False):\nr\"\"\"Initialise a thesaurus for ontology class annotations.\n\n Args:\n onto (Ontology): The input ontology to extract annotations from.\n annotation_property_iris (List[str]): Specify which annotation properties to be used.\n apply_transitivity (bool, optional): Apply synonym transitivity to merge synonym groups or not. Defaults to `False`.\n \"\"\"\n\n self.onto = onto\n # build the annotation index to extract synonyms from `onto`\n # the input property iris may not exist in this ontology\n # the output property iris will be truncated to the existing ones\n index, iris = self.onto.build_annotation_index(\n annotation_property_iris=annotation_property_iris,\n entity_type=\"Classes\",\n apply_lowercasing=True,\n )\n self.annotation_index = index\n self.annotation_property_iris = iris\n total_number_of_annotations = sum([len(v) for v in self.annotation_index.values()])\n self.average_number_of_annotations_per_class = total_number_of_annotations / len(self.annotation_index)\n\n # synonym groups\n self.apply_transitivity = apply_transitivity\n self.synonym_groups = list(self.annotation_index.values())\n if self.apply_transitivity:\n self.synonym_groups = self.merge_synonym_groups_by_transitivity(self.synonym_groups)\n\n # summary\n self.info = {\n type(self).__name__: {\n \"ontology\": self.onto.info[type(self.onto).__name__],\n \"average_number_of_annotations_per_class\": round(self.average_number_of_annotations_per_class, 3),\n \"number_of_synonym_groups\": len(self.synonym_groups),\n }\n }\n
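A minimal sketch of building a thesaurus and sampling positive and negative pairs (the ontology path is a placeholder and rdfs:label is used as the only annotation property):
from deeponto.onto import Ontology\nfrom deeponto.align.bertmap.text_semantics import AnnotationThesaurus\n\nonto = Ontology('onto.owl')\nthesaurus = AnnotationThesaurus(onto, ['http://www.w3.org/2000/01/rdf-schema#label'])\nsynonyms = thesaurus.synonym_sampling()  # all unique synonym pairs\nsoft_negatives = thesaurus.soft_nonsynonym_sampling(2 * len(synonyms))\nhard_negatives = thesaurus.hard_nonsynonym_sampling(2 * len(synonyms))\n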
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.text_semantics.AnnotationThesaurus.get_synonym_pairs","title":"get_synonym_pairs(synonym_group, remove_duplicates=True)
staticmethod
","text":"Get synonym pairs from a synonym group through a cartesian product.
Parameters:
Name Type Description Defaultsynonym_group
Set[str]
A set of annotation phrases that are synonymous to each other.
requiredReturns:
Type DescriptionList[Tuple[str, str]]
A list of synonym pairs.
Source code insrc/deeponto/align/bertmap/text_semantics.py
@staticmethod\ndef get_synonym_pairs(synonym_group: Set[str], remove_duplicates: bool = True):\n\"\"\"Get synonym pairs from a synonym group through a cartesian product.\n\n Args:\n synonym_group (Set[str]): A set of annotation phrases that are synonymous to each other.\n\n Returns:\n (List[Tuple[str, str]]): A list of synonym pairs.\n \"\"\"\n synonym_pairs = list(itertools.product(synonym_group, synonym_group))\n if remove_duplicates:\n return uniqify(synonym_pairs)\n else:\n return synonym_pairs\n
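For example (a toy illustration; the order of pairs may differ because the input is a set):
from deeponto.align.bertmap.text_semantics import AnnotationThesaurus\n\npairs = AnnotationThesaurus.get_synonym_pairs({'hand', 'manus'})\n# e.g., [('hand', 'hand'), ('hand', 'manus'), ('manus', 'hand'), ('manus', 'manus')]\n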
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.text_semantics.AnnotationThesaurus.merge_synonym_groups_by_transitivity","title":"merge_synonym_groups_by_transitivity(synonym_groups)
staticmethod
","text":"Merge synonym groups by transitivity.
Synonym groups that share a common annotation phrase will be merged. NOTE that for multiple ontologies, we can merge their synonym groups by first concatenating them and then using this function.
Note
In \\(\\textsf{BERTMap}\\) experiments we have considered this as a data augmentation approach but it does not bring a significant performance improvement. However, if the overall number of annotations is not large enough then this could be a good option.
Parameters:
Name Type Description Defaultsynonym_groups
List[Set[str]]
A sequence of synonym groups to be merged.
requiredReturns:
Type DescriptionList[Set[str]]
A list of merged synonym groups.
Source code insrc/deeponto/align/bertmap/text_semantics.py
@staticmethod\ndef merge_synonym_groups_by_transitivity(synonym_groups: List[Set[str]]):\nr\"\"\"Merge synonym groups by transitivity.\n\n Synonym groups that share a common annotation phrase will be merged. NOTE that for\n multiple ontologies, we can merge their synonym groups by first concatenating them\n then use this function.\n\n !!! note\n\n In $\\textsf{BERTMap}$ experiments we have considered this as a data augmentation approach\n but it does not bring a significant performance improvement. However, if the\n overall number of annotations is not large enough then this could be a good option.\n\n Args:\n synonym_groups (List[Set[str]]): A sequence of synonym groups to be merged.\n\n Returns:\n (List[Set[str]]): A list of merged synonym groups.\n \"\"\"\n synonym_pairs = []\n for synonym_group in synonym_groups:\n # gather synonym pairs from the self-product of a synonym group\n synonym_pairs += AnnotationThesaurus.get_synonym_pairs(synonym_group, remove_duplicates=False)\n synonym_pairs = uniqify(synonym_pairs)\n merged_grouped_synonyms = AnnotationThesaurus.connected_labels(synonym_pairs)\n return merged_grouped_synonyms\n
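A toy illustration of the intended behaviour: groups sharing a common phrase collapse into one, while disjoint groups stay separate.
from deeponto.align.bertmap.text_semantics import AnnotationThesaurus\n\ngroups = [{'hand', 'manus'}, {'manus', 'paw'}, {'tail'}]\nmerged = AnnotationThesaurus.merge_synonym_groups_by_transitivity(groups)\n# expected: one group {'hand', 'manus', 'paw'} and one group {'tail'}\n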
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.text_semantics.AnnotationThesaurus.connected_annotations","title":"connected_annotations(synonym_pairs)
staticmethod
","text":"Build a graph for adjacency among the class annotations (labels) such that the transitivity of synonyms is ensured.
Auxiliary function for merge_synonym_groups_by_transitivity
.
Parameters:
Name Type Description Defaultsynonym_pairs
List[Tuple[str, str]]
List of pairs of phrases that are synonymous.
requiredReturns:
Type DescriptionList[Set[str]]
A list of synonym groups.
Source code insrc/deeponto/align/bertmap/text_semantics.py
@staticmethod\ndef connected_annotations(synonym_pairs: List[Tuple[str, str]]):\n\"\"\"Build a graph for adjacency among the class annotations (labels) such that\n the **transitivity** of synonyms is ensured.\n\n Auxiliary function for [`merge_synonym_groups_by_transitivity`][deeponto.align.bertmap.text_semantics.AnnotationThesaurus.merge_synonym_groups_by_transitivity].\n\n Args:\n synonym_pairs (List[Tuple[str, str]]): List of pairs of phrases that are synonymous.\n\n Returns:\n (List[Set[str]]): A list of synonym groups.\n \"\"\"\n graph = nx.Graph()\n graph.add_edges_from(synonym_pairs)\n # nx.draw(G, with_labels = True)\n connected = list(nx.connected_components(graph))\n return connected\n
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.text_semantics.AnnotationThesaurus.synonym_sampling","title":"synonym_sampling(num_samples=None)
","text":"Sample synonym pairs from a list of synonym groups extracted from the input ontology.
According to the \\(\\textsf{BERTMap}\\) paper, synonyms are defined as label pairs that belong to the same ontology class.
NOTE this has been validated for getting the same results as in the original \\(\\textsf{BERTMap}\\) repository.
Parameters:
Name Type Description Defaultnum_samples
int
The (maximum) number of unique samples extracted. Defaults to None
.
None
Returns:
Type DescriptionList[Tuple[str, str]]
A list of unique synonym pair samples.
Source code insrc/deeponto/align/bertmap/text_semantics.py
def synonym_sampling(self, num_samples: Optional[int] = None):\nr\"\"\"Sample synonym pairs from a list of synonym groups extracted from the input ontology.\n\n According to the $\\textsf{BERTMap}$ paper, **synonyms** are defined as label pairs that belong\n to the same ontology class.\n\n NOTE this has been validated for getting the same results as in the original $\\textsf{BERTMap}$ repository.\n\n Args:\n num_samples (int, optional): The (maximum) number of **unique** samples extracted. Defaults to `None`.\n\n Returns:\n (List[Tuple[str, str]]): A list of unique synonym pair samples.\n \"\"\"\n synonym_pool = []\n for synonym_group in self.synonym_groups:\n # do not remove duplicates in the loop to save time\n synonym_pairs = self.get_synonym_pairs(synonym_group, remove_duplicates=False)\n synonym_pool += synonym_pairs\n # remove duplicates afer the loop\n synonym_pool = uniqify(synonym_pool)\n\n if (not num_samples) or (num_samples >= len(synonym_pool)):\n # print(\"Return all synonym pairs without downsampling.\")\n return synonym_pool\n else:\n return random.sample(synonym_pool, num_samples)\n
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.text_semantics.AnnotationThesaurus.soft_nonsynonym_sampling","title":"soft_nonsynonym_sampling(num_samples, max_iter=5)
","text":"Sample soft non-synonyms from a list of synonym groups extracted from the input ontology.
According to the \\(\\textsf{BERTMap}\\) paper, soft non-synonyms are defined as label pairs from two different synonym groups that are randomly selected.
Parameters:
Name Type Description Defaultnum_samples
int
The (maximum) number of unique samples extracted; this is required unlike for synonym sampling because the non-synonym pool is significantly larger (considering random combinations of different synonym groups).
requiredmax_iter
int
The maximum number of iterations for conducting sampling. Defaults to 5
.
5
Returns:
Type DescriptionList[Tuple[str, str]]
A list of unique (soft) non-synonym pair samples.
Source code insrc/deeponto/align/bertmap/text_semantics.py
def soft_nonsynonym_sampling(self, num_samples: int, max_iter: int = 5):\nr\"\"\"Sample **soft** non-synonyms from a list of synonym groups extracted from the input ontology.\n\n According to the $\\textsf{BERTMap}$ paper, **soft non-synonyms** are defined as label pairs\n from two *different* synonym groups that are **randomly** selected.\n\n Args:\n num_samples (int): The (maximum) number of **unique** samples extracted; this is\n required **unlike for synonym sampling** because the non-synonym pool is **significantly\n larger** (considering random combinations of different synonym groups).\n max_iter (int): The maximum number of iterations for conducting sampling. Defaults to `5`.\n\n Returns:\n (List[Tuple[str, str]]): A list of unique (soft) non-synonym pair samples.\n \"\"\"\n nonsyonym_pool = []\n # randomly select disjoint synonym group pairs from all\n for _ in range(num_samples):\n left_synonym_group, right_synonym_group = tuple(random.sample(self.synonym_groups, 2))\n try:\n # randomly choose one label from a synonym group\n left_label = random.choice(list(left_synonym_group))\n right_label = random.choice(list(right_synonym_group))\n nonsyonym_pool.append((left_label, right_label))\n except:\n # skip if there are no class labels\n continue\n\n # DataUtils.uniqify is too slow so we should avoid operating it too often\n nonsyonym_pool = uniqify(nonsyonym_pool)\n\n while len(nonsyonym_pool) < num_samples and max_iter > 0:\n max_iter = max_iter - 1 # reduce the iteration to prevent exhausting loop\n nonsyonym_pool += self.soft_nonsynonym_sampling(num_samples - len(nonsyonym_pool), max_iter)\n nonsyonym_pool = uniqify(nonsyonym_pool)\n\n return nonsyonym_pool\n
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.text_semantics.AnnotationThesaurus.weighted_random_choices_of_sibling_groups","title":"weighted_random_choices_of_sibling_groups(k=1)
","text":"Randomly (weighted) select a number of sibling class groups.
The weights are computed according to the sizes of the sibling class groups.
Source code insrc/deeponto/align/bertmap/text_semantics.py
def weighted_random_choices_of_sibling_groups(self, k: int = 1):\n\"\"\"Randomly (weighted) select a number of sibling class groups.\n\n The weights are computed according to the sizes of the sibling class groups.\n \"\"\"\n weights = [len(s) for s in self.onto.sibling_class_groups]\n weights = [w / sum(weights) for w in weights] # normalised\n return random.choices(self.onto.sibling_class_groups, weights=weights, k=k)\n
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.text_semantics.AnnotationThesaurus.hard_nonsynonym_sampling","title":"hard_nonsynonym_sampling(num_samples, max_iter=5)
","text":"Sample hard non-synonyms from sibling classes of the input ontology.
According to the \\(\\textsf{BERTMap}\\) paper, hard non-synonyms are defined as label pairs that belong to two disjoint ontology classes. For practical reason, the condition is eased to two sibling ontology classes.
Parameters:
Name Type Description Defaultnum_samples
int
The (maximum) number of unique samples extracted; this is required unlike for synonym sampling because the non-synonym pool is significantly larger (considering random combinations of different synonym groups).
requiredmax_iter
int
The maximum number of iterations for conducting sampling. Defaults to 5
.
5
Returns:
Type DescriptionList[Tuple[str, str]]
A list of unique (hard) non-synonym pair samples.
Source code insrc/deeponto/align/bertmap/text_semantics.py
def hard_nonsynonym_sampling(self, num_samples: int, max_iter: int = 5):\nr\"\"\"Sample **hard** non-synonyms from sibling classes of the input ontology.\n\n According to the $\\textsf{BERTMap}$ paper, **hard non-synonyms** are defined as label pairs\n that belong to two **disjoint** ontology classes. For practical reason, the condition\n is eased to two **sibling** ontology classes.\n\n Args:\n num_samples (int): The (maximum) number of **unique** samples extracted; this is\n required **unlike for synonym sampling** because the non-synonym pool is **significantly\n larger** (considering random combinations of different synonym groups).\n max_iter (int): The maximum number of iterations for conducting sampling. Defaults to `5`.\n\n Returns:\n (List[Tuple[str, str]]): A list of unique (hard) non-synonym pair samples.\n \"\"\"\n # intialise the sibling class groups\n self.onto.sibling_class_groups\n\n if not self.onto.sibling_class_groups:\n warnings.warn(\"Skip hard negative sampling as no sibling class groups are defined.\")\n return []\n\n # flatten the disjointness groups into all pairs of hard neagtives\n nonsynonym_pool = []\n # randomly (weighted) select a number of sibling class groups with replacement\n sibling_class_groups = self.weighted_random_choices_of_sibling_groups(k=num_samples)\n\n for sibling_class_group in sibling_class_groups:\n # random select two sibling classes; no weights this time\n left_class_iri, right_class_iri = tuple(random.sample(sibling_class_group, 2))\n try:\n # random select a label for each of them\n left_label = random.choice(list(self.annotation_index[left_class_iri]))\n right_label = random.choice(list(self.annotation_index[right_class_iri]))\n # add the label pair to the pool\n nonsynonym_pool.append((left_label, right_label))\n except:\n # skip them if there are no class labels\n continue\n\n # DataUtils.uniqify is too slow so we should avoid operating it too often\n nonsynonym_pool = uniqify(nonsynonym_pool)\n\n while len(nonsynonym_pool) < num_samples and max_iter > 0:\n max_iter = max_iter - 1 # reduce the iteration to prevent exhausting loop\n nonsynonym_pool += self.hard_nonsynonym_sampling(num_samples - len(nonsynonym_pool), max_iter)\n nonsynonym_pool = uniqify(nonsynonym_pool)\n\n return nonsynonym_pool\n
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.text_semantics.IntraOntologyTextSemanticsCorpus","title":"IntraOntologyTextSemanticsCorpus(onto, annotation_property_iris, soft_negative_ratio=2, hard_negative_ratio=2)
","text":"Class for creating the intra-ontology text semantics corpus from an ontology.
As defined in the \\(\\textsf{BERTMap}\\) paper, the intra-ontology text semantics corpus consists of synonym and non-synonym pairs extracted from the ontology class annotations.
Attributes:
Name Type Descriptiononto
Ontology
An ontology to construct the intra-ontology text semantics corpus from.
annotation_property_iris
List[str]
Specify which annotation properties to be used.
soft_negative_ratio
int
The expected negative sample ratio of the soft non-synonyms to the extracted synonyms. Defaults to 2
.
hard_negative_ratio
int
The expected negative sample ratio of the hard non-synonyms to the extracted synonyms. Defaults to 2
. However, hard non-synonyms are sometimes insufficient given an ontology's hierarchy, the soft ones are used to compensate the number in this case.
src/deeponto/align/bertmap/text_semantics.py
def __init__(\n self,\n onto: Ontology,\n annotation_property_iris: List[str],\n soft_negative_ratio: int = 2,\n hard_negative_ratio: int = 2,\n):\n self.onto = onto\n # $\\textsf{BERTMap}$ does not apply synonym transitivity\n self.thesaurus = AnnotationThesaurus(onto, annotation_property_iris, apply_transitivity=False)\n\n self.synonyms = self.thesaurus.synonym_sampling()\n # sample hard negatives first as they might not be enough\n num_hard = hard_negative_ratio * len(self.synonyms)\n self.hard_nonsynonyms = self.thesaurus.hard_nonsynonym_sampling(num_hard)\n # compensate the number of hard negatives as soft negatives are almost always available\n num_soft = (soft_negative_ratio + hard_negative_ratio) * len(self.synonyms) - len(self.hard_nonsynonyms)\n self.soft_nonsynonyms = self.thesaurus.soft_nonsynonym_sampling(num_soft)\n\n self.info = {\n type(self).__name__: {\n \"num_synonyms\": len(self.synonyms),\n \"num_nonsynonyms\": len(self.soft_nonsynonyms) + len(self.hard_nonsynonyms),\n \"num_soft_nonsynonyms\": len(self.soft_nonsynonyms),\n \"num_hard_nonsynonyms\": len(self.hard_nonsynonyms),\n \"annotation_thesaurus\": self.thesaurus.info[\"AnnotationThesaurus\"],\n }\n }\n
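A minimal usage sketch (onto is assumed to be a loaded Ontology and the annotation property IRI is illustrative):
from deeponto.align.bertmap.text_semantics import IntraOntologyTextSemanticsCorpus\n\ncorpus = IntraOntologyTextSemanticsCorpus(onto, ['http://www.w3.org/2000/01/rdf-schema#label'])\ncorpus.save('./data')  # writes intra-onto.corpus.json under ./data\nprint(corpus.info)  # summary of sampled synonyms and non-synonyms\n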
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.text_semantics.IntraOntologyTextSemanticsCorpus.save","title":"save(save_path)
","text":"Save the intra-ontology corpus (a .json
file for label pairs and its summary) in the specified directory.
src/deeponto/align/bertmap/text_semantics.py
def save(self, save_path: str):\n\"\"\"Save the intra-ontology corpus (a `.json` file for label pairs\n and its summary) in the specified directory.\n \"\"\"\n create_path(save_path)\n save_json = {\n \"summary\": self.info,\n \"synonyms\": [(pos[0], pos[1], 1) for pos in self.synonyms],\n \"nonsynonyms\": [(neg[0], neg[1], 0) for neg in self.soft_nonsynonyms + self.hard_nonsynonyms],\n }\n save_file(save_json, os.path.join(save_path, \"intra-onto.corpus.json\"))\n
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.text_semantics.CrossOntologyTextSemanticsCorpus","title":"CrossOntologyTextSemanticsCorpus(class_mappings, src_onto, tgt_onto, annotation_property_iris, negative_ratio=4)
","text":"Class for creating the cross-ontology text semantics corpus from two ontologies and provided mappings between them.
As defined in the \\(\\textsf{BERTMap}\\) paper, the cross-ontology text semantics corpus consists of synonym and non-synonym pairs extracted from the annotations/labels of class pairs involved in the provided cross-ontology mappigns.
Attributes:
Name Type Descriptionclass_mappings
List[ReferenceMapping]
A list of cross-ontology class mappings.
src_onto
Ontology
The source ontology whose class IRIs are heads of the class_mappings
.
tgt_onto
Ontology
The target ontology whose class IRIs are tails of the class_mappings
.
annotation_property_iris
List[str]
A list of annotation property IRIs used to extract the annotations.
negative_ratio
int
The expected negative sample ratio of the non-synonyms to the extracted synonyms. Defaults to 4
. NOTE that we do not have hard non-synonyms at the cross-ontology level.
src/deeponto/align/bertmap/text_semantics.py
def __init__(\n self,\n class_mappings: List[ReferenceMapping],\n src_onto: Ontology,\n tgt_onto: Ontology,\n annotation_property_iris: List[str],\n negative_ratio: int = 4,\n):\n self.class_mappings = class_mappings\n self.src_onto = src_onto\n self.tgt_onto = tgt_onto\n # build the annotation thesaurus for each ontology\n self.src_thesaurus = AnnotationThesaurus(src_onto, annotation_property_iris)\n self.tgt_thesaurus = AnnotationThesaurus(tgt_onto, annotation_property_iris)\n self.negative_ratio = negative_ratio\n\n self.synonyms = self.synonym_sampling_from_mappings()\n num_negative = negative_ratio * len(self.synonyms)\n self.nonsynonyms = self.nonsynonym_sampling_from_mappings(num_negative)\n\n self.info = {\n type(self).__name__: {\n \"num_synonyms\": len(self.synonyms),\n \"num_nonsynonyms\": len(self.nonsynonyms),\n \"num_mappings\": len(self.class_mappings),\n \"src_annotation_thesaurus\": self.src_thesaurus.info[\"AnnotationThesaurus\"],\n \"tgt_annotation_thesaurus\": self.tgt_thesaurus.info[\"AnnotationThesaurus\"],\n }\n }\n
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.text_semantics.CrossOntologyTextSemanticsCorpus.save","title":"save(save_path)
","text":"Save the cross-ontology corpus (a .json
file for label pairs and its summary) in the specified directory.
src/deeponto/align/bertmap/text_semantics.py
def save(self, save_path: str):\n\"\"\"Save the cross-ontology corpus (a `.json` file for label pairs\n and its summary) in the specified directory.\n \"\"\"\n create_path(save_path)\n save_json = {\n \"summary\": self.info,\n \"synonyms\": [(pos[0], pos[1], 1) for pos in self.synonyms],\n \"nonsynonyms\": [(neg[0], neg[1], 0) for neg in self.nonsynonyms],\n }\n save_file(save_json, os.path.join(save_path, \"cross-onto.corpus.json\"))\n
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.text_semantics.CrossOntologyTextSemanticsCorpus.synonym_sampling_from_mappings","title":"synonym_sampling_from_mappings()
","text":"Sample synonyms from cross-ontology class mappings.
Arguments of this method are all class attributes. See CrossOntologyTextSemanticsCorpus
.
According to the \\(\\textsf{BERTMap}\\) paper, cross-ontology synonyms are defined as label pairs that belong to two matched classes. Suppose the class \\(C\\) from the source ontology and the class \\(D\\) from the target ontology are matched according to one of the class_mappings
, then the cartesian product of labels of \\(C\\) and labels of \\(D\\) form cross-ontology synonyms. Note that identity synonyms in the form of \\((a, a)\\) are removed because they have been covered in the intra-ontology case.
Returns:
Type DescriptionList[Tuple[str, str]]
A list of unique synonym pair samples from ontology class mappings.
Source code insrc/deeponto/align/bertmap/text_semantics.py
def synonym_sampling_from_mappings(self):\nr\"\"\"Sample synonyms from cross-ontology class mappings.\n\n Arguments of this method are all class attributes.\n See [`CrossOntologyTextSemanticsCorpus`][deeponto.align.bertmap.text_semantics.CrossOntologyTextSemanticsCorpus].\n\n According to the $\\textsf{BERTMap}$ paper, **cross-ontology synonyms** are defined as label pairs\n that belong to two **matched** classes. Suppose the class $C$ from the source ontology\n and the class $D$ from the target ontology are matched according to one of the `class_mappings`,\n then the cartesian product of labels of $C$ and labels of $D$ form cross-ontology synonyms.\n Note that **identity synonyms** in the form of $(a, a)$ are removed because they have been covered\n in the intra-ontology case.\n\n Returns:\n (List[Tuple[str, str]]): A list of unique synonym pair samples from ontology class mappings.\n \"\"\"\n synonym_pool = []\n\n for class_mapping in self.class_mappings:\n src_class_iri, tgt_class_iri = class_mapping.to_tuple()\n src_class_annotations = self.src_thesaurus.annotation_index[src_class_iri]\n tgt_class_annotations = self.tgt_thesaurus.annotation_index[tgt_class_iri]\n synonym_pairs = list(itertools.product(src_class_annotations, tgt_class_annotations))\n # remove the identity synonyms as the have been covered in the intra-ontology case\n synonym_pairs = [(l, r) for l, r in synonym_pairs if l != r]\n backward_synonym_pairs = [(r, l) for l, r in synonym_pairs]\n synonym_pool += synonym_pairs + backward_synonym_pairs\n\n synonym_pool = uniqify(synonym_pool)\n return synonym_pool\n
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.text_semantics.CrossOntologyTextSemanticsCorpus.nonsynonym_sampling_from_mappings","title":"nonsynonym_sampling_from_mappings(num_samples, max_iter=5)
","text":"Sample non-synonyms from cross-ontology class mappings.
Arguments of this method are all class attributes. See CrossOntologyTextSemanticsCorpus
.
According to the \\(\\textsf{BERTMap}\\) paper, cross-ontology non-synonyms are defined as label pairs that belong to two unmatched classes. Assume that the provided class mappings are self-contained in the sense that they are complete for the classes involved in them, then we can randomly sample two cross-ontology classes that are not matched according to the mappings and take their labels as nonsynonyms. In practice, it is quite unlikely to obtain false negatives since the number of incorrect mappings is much larger than the number of correct ones.
Returns:
Type DescriptionList[Tuple[str, str]]
A list of unique nonsynonym pair samples from ontology class mappings.
Source code insrc/deeponto/align/bertmap/text_semantics.py
def nonsynonym_sampling_from_mappings(self, num_samples: int, max_iter: int = 5):\nr\"\"\"Sample non-synonyms from cross-ontology class mappings.\n\n Arguments of this method are all class attributes.\n See [`CrossOntologyTextSemanticsCorpus`][deeponto.align.bertmap.text_semantics.CrossOntologyTextSemanticsCorpus].\n\n According to the $\\textsf{BERTMap}$ paper, **cross-ontology non-synonyms** are defined as label pairs\n that belong to two **unmatched** classes. Assume that the provided class mappings are self-contained\n in the sense that they are complete for the classes involved in them, then we can randomly\n sample two cross-ontology classes that are not matched according to the mappings and take\n their labels as nonsynonyms. In practice, it is quite unlikely to obtain false negatives since\n the number of incorrect mappings is much larger than the number of correct ones.\n\n Returns:\n (List[Tuple[str, str]]): A list of unique nonsynonym pair samples from ontology class mappings.\n \"\"\"\n nonsynonym_pool = []\n\n # form cross-ontology synonym groups\n cross_onto_synonym_group_pair = []\n for class_mapping in self.class_mappings:\n src_class_iri, tgt_class_iri = class_mapping.to_tuple()\n src_class_annotations = self.src_thesaurus.annotation_index[src_class_iri]\n tgt_class_annotations = self.tgt_thesaurus.annotation_index[tgt_class_iri]\n # let each matched class pair's annotations form a synonym group_pair\n cross_onto_synonym_group_pair.append((src_class_annotations, tgt_class_annotations))\n\n # randomly select disjoint synonym group pairs from all\n for _ in range(num_samples):\n left_class_pair, right_class_pair = tuple(random.sample(cross_onto_synonym_group_pair, 2))\n try:\n # randomly choose one label from a synonym group\n left_label = random.choice(list(left_class_pair[0])) # choosing the src side by [0]\n right_label = random.choice(list(right_class_pair[1])) # choosing the tgt side by [1]\n nonsynonym_pool.append((left_label, right_label))\n except:\n # skip if there are no class labels\n continue\n\n # DataUtils.uniqify is too slow so we should avoid operating it too often\n nonsynonym_pool = uniqify(nonsynonym_pool)\n while len(nonsynonym_pool) < num_samples and max_iter > 0:\n max_iter = max_iter - 1 # reduce the iteration to prevent exhausting loop\n nonsynonym_pool += self.nonsynonym_sampling_from_mappings(num_samples - len(nonsynonym_pool), max_iter)\n nonsynonym_pool = uniqify(nonsynonym_pool)\n return nonsynonym_pool\n
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.text_semantics.TextSemanticsCorpora","title":"TextSemanticsCorpora(src_onto, tgt_onto, annotation_property_iris, class_mappings=None, auxiliary_ontos=None)
","text":"Class for creating the collection text semantics corpora.
As defined in the \\(\\textsf{BERTMap}\\) paper, the collection of text semantics corpora contains at least two intra-ontology sub-corpora from the source and target ontologies, respectively. If some class mappings are provided, then a cross-ontology sub-corpus will be created. If some additional auxiliary ontologies are provided, the intra-ontology corpora created from them will serve as the auxiliary sub-corpora.
Attributes:
Name Type Descriptionsrc_onto
Ontology
The source ontology to be matched or aligned.
tgt_onto
Ontology
The target ontology to be matched or aligned.
annotation_property_iris
List[str]
A list of annotation property IRIs used to extract the annotations.
class_mappings
List[ReferenceMapping]
A list of cross-ontology class mappings between the source and the target ontologies. Defaults to None
.
auxiliary_ontos
List[Ontology]
A list of auxiliary ontologies for augmenting more synonym/non-synonym samples. Defaults to None
.
src/deeponto/align/bertmap/text_semantics.py
def __init__(\n self,\n src_onto: Ontology,\n tgt_onto: Ontology,\n annotation_property_iris: List[str],\n class_mappings: Optional[List[ReferenceMapping]] = None,\n auxiliary_ontos: Optional[List[Ontology]] = None,\n):\n self.synonyms = []\n self.nonsynonyms = []\n\n # build intra-ontology corpora\n # negative sample ratios are by default\n self.intra_src_onto_corpus = IntraOntologyTextSemanticsCorpus(src_onto, annotation_property_iris)\n self.add_samples_from_sub_corpus(self.intra_src_onto_corpus)\n self.intra_tgt_onto_corpus = IntraOntologyTextSemanticsCorpus(tgt_onto, annotation_property_iris)\n self.add_samples_from_sub_corpus(self.intra_tgt_onto_corpus)\n\n # build cross-ontolgoy corpora\n self.class_mappings = class_mappings\n self.cross_onto_corpus = None\n if self.class_mappings:\n self.cross_onto_corpus = CrossOntologyTextSemanticsCorpus(\n class_mappings, src_onto, tgt_onto, annotation_property_iris\n )\n self.add_samples_from_sub_corpus(self.cross_onto_corpus)\n\n # build auxiliary ontology corpora (same as intra-ontology)\n self.auxiliary_ontos = auxiliary_ontos\n self.auxiliary_onto_corpora = []\n if self.auxiliary_ontos:\n for auxiliary_onto in self.auxiliary_ontos:\n self.auxiliary_onto_corpora.append(\n IntraOntologyTextSemanticsCorpus(auxiliary_onto, annotation_property_iris)\n )\n for auxiliary_onto_corpus in self.auxiliary_onto_corpora:\n self.add_samples_from_sub_corpus(auxiliary_onto_corpus)\n\n # DataUtils.uniqify the samples\n self.synonyms = uniqify(self.synonyms)\n self.nonsynonyms = uniqify(self.nonsynonyms)\n # remove invalid nonsynonyms\n self.nonsynonyms = list(set(self.nonsynonyms) - set(self.synonyms))\n\n # summary\n self.info = {\n type(self).__name__: {\n \"num_synonyms\": len(self.synonyms),\n \"num_nonsynonyms\": len(self.nonsynonyms),\n \"intra_src_onto_corpus\": self.intra_src_onto_corpus.info[\"IntraOntologyTextSemanticsCorpus\"],\n \"intra_tgt_onto_corpus\": self.intra_tgt_onto_corpus.info[\"IntraOntologyTextSemanticsCorpus\"],\n \"cross_onto_corpus\": self.cross_onto_corpus.info[\"CrossOntologyTextSemanticsCorpus\"]\n if self.cross_onto_corpus\n else None,\n \"auxiliary_onto_corpora\": [\n a.info[\"IntraOntologyTextSemanticsCorpus\"] for a in self.auxiliary_onto_corpora\n ],\n }\n }\n
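A minimal usage sketch combining the sub-corpora for two loaded ontologies (class_mappings and auxiliary_ontos are optional and omitted here; the annotation property IRI is illustrative):
from deeponto.align.bertmap.text_semantics import TextSemanticsCorpora\n\ncorpora = TextSemanticsCorpora(src_onto, tgt_onto, ['http://www.w3.org/2000/01/rdf-schema#label'])\ncorpora.save('./data')  # writes text-semantics.corpora.json under ./data\nprint(len(corpora.synonyms), len(corpora.nonsynonyms))\n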
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.text_semantics.TextSemanticsCorpora.save","title":"save(save_path)
","text":"Save the overall text semantics corpora (a .json
file for label pairs and its summary) in the specified directory.
src/deeponto/align/bertmap/text_semantics.py
def save(self, save_path: str):\n\"\"\"Save the overall text semantics corpora (a `.json` file for label pairs\n and its summary) in the specified directory.\n \"\"\"\n create_path(save_path)\n save_json = {\n \"summary\": self.info,\n \"synonyms\": [(pos[0], pos[1], 1) for pos in self.synonyms],\n \"nonsynonyms\": [(neg[0], neg[1], 0) for neg in self.nonsynonyms],\n }\n save_file(save_json, os.path.join(save_path, \"text-semantics.corpora.json\"))\n
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.text_semantics.TextSemanticsCorpora.add_samples_from_sub_corpus","title":"add_samples_from_sub_corpus(sub_corpus)
","text":"Add synonyms and non-synonyms from each sub-corpus to the overall collection.
Source code insrc/deeponto/align/bertmap/text_semantics.py
def add_samples_from_sub_corpus(\n self, sub_corpus: Union[IntraOntologyTextSemanticsCorpus, CrossOntologyTextSemanticsCorpus]\n):\n\"\"\"Add synonyms and non-synonyms from each sub-corpus to the overall collection.\"\"\"\n self.synonyms += sub_corpus.synonyms\n if isinstance(sub_corpus, IntraOntologyTextSemanticsCorpus):\n self.nonsynonyms += sub_corpus.soft_nonsynonyms + sub_corpus.hard_nonsynonyms\n else:\n self.nonsynonyms += sub_corpus.nonsynonyms\n
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.bert_classifier.BERTSynonymClassifier","title":"BERTSynonymClassifier(loaded_path, output_path, eval_mode, max_length_for_input, num_epochs_for_training=None, batch_size_for_training=None, batch_size_for_prediction=None, training_data=None, validation_data=None)
","text":"Class for BERT synonym classifier.
The main scoring module of \\(\\textsf{BERTMap}\\) consisting of a BERT model and a binary synonym classifier.
Attributes:
Name Type Descriptionloaded_path
str
The path to the checkpoint of a pre-trained BERT model.
output_path
str
The path to the output BERT model (usually fine-tuned).
eval_mode
bool
Set to False
if the model is loaded for training.
max_length_for_input
int
The maximum length of an input sequence.
num_epochs_for_training
int
The number of epochs for training a BERT model.
batch_size_for_training
int
The batch size for training a BERT model.
batch_size_for_prediction
int
The batch size for making predictions.
training_data
Dataset
Data for training the model if eval_mode
is set to False
. Defaults to None
.
validation_data
Dataset
Data for validating the model if eval_mode
is set to False
. Defaults to None
.
training_args
TrainingArguments
Training arguments for training the model if eval_mode
is set to False
. Defaults to None
.
trainer
Trainer
The model trainer fed with training_args
and data samples. Defaults to None
.
softmax
torch.nn.Softmax
The softmax layer used for normalising synonym scores. Defaults to None
.
src/deeponto/align/bertmap/bert_classifier.py
def __init__(\n self,\n loaded_path: str,\n output_path: str,\n eval_mode: bool,\n max_length_for_input: int,\n num_epochs_for_training: Optional[float] = None,\n batch_size_for_training: Optional[int] = None,\n batch_size_for_prediction: Optional[int] = None,\n training_data: Optional[List[Tuple[str, str, int]]] = None, # (sentence1, sentence2, label)\n validation_data: Optional[List[Tuple[str, str, int]]] = None,\n):\n # Load the pretrained BERT model from the given path\n self.loaded_path = loaded_path\n print(f\"Loading a BERT model from: {self.loaded_path}.\")\n self.model = AutoModelForSequenceClassification.from_pretrained(\n self.loaded_path, output_hidden_states=eval_mode\n )\n self.tokenizer = Tokenizer.from_pretrained(loaded_path)\n\n self.output_path = output_path\n self.eval_mode = eval_mode\n self.max_length_for_input = max_length_for_input\n self.num_epochs_for_training = num_epochs_for_training\n self.batch_size_for_training = batch_size_for_training\n self.batch_size_for_prediction = batch_size_for_prediction\n self.training_data = None\n self.validation_data = None\n self.data_stat = {}\n self.training_args = None\n self.trainer = None\n self.softmax = None\n\n # load the pre-trained BERT model and set it to eval mode (static)\n if self.eval_mode:\n self.eval()\n # load the pre-trained BERT model for fine-tuning\n else:\n if not training_data:\n raise RuntimeError(\"Training data should be provided when `for_training` is `True`.\")\n if not validation_data:\n raise RuntimeError(\"Validation data should be provided when `for_training` is `True`.\")\n # load data (max_length is used for truncation)\n self.training_data = self.load_dataset(training_data, \"training\")\n self.validation_data = self.load_dataset(validation_data, \"validation\")\n self.data_stat = {\n \"num_training\": len(self.training_data),\n \"num_validation\": len(self.validation_data),\n }\n\n # generate training arguments\n epoch_steps = len(self.training_data) // self.batch_size_for_training # total steps of an epoch\n if torch.cuda.device_count() > 0:\n epoch_steps = epoch_steps // torch.cuda.device_count() # to deal with multi-gpus case\n # keep logging steps consisitent even for small batch size\n # report logging on every 0.02 epoch\n logging_steps = int(epoch_steps * 0.02)\n # eval on every 0.2 epoch\n eval_steps = 10 * logging_steps\n # generate the training arguments\n self.training_args = TrainingArguments(\n output_dir=self.output_path,\n num_train_epochs=self.num_epochs_for_training,\n per_device_train_batch_size=self.batch_size_for_training,\n per_device_eval_batch_size=self.batch_size_for_training,\n warmup_ratio=0.0,\n weight_decay=0.01,\n logging_steps=logging_steps,\n logging_dir=f\"{self.output_path}/tensorboard\",\n eval_steps=eval_steps,\n evaluation_strategy=\"steps\",\n do_train=True,\n do_eval=True,\n save_steps=eval_steps,\n save_total_limit=2,\n load_best_model_at_end=True,\n )\n # build the trainer\n self.trainer = Trainer(\n model=self.model,\n args=self.training_args,\n train_dataset=self.training_data,\n eval_dataset=self.validation_data,\n compute_metrics=self.compute_metrics,\n tokenizer=self.tokenizer._tokenizer,\n )\n
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.bert_classifier.BERTSynonymClassifier.train","title":"train(resume_from_checkpoint=None)
","text":"Start training the BERT model.
Source code insrc/deeponto/align/bertmap/bert_classifier.py
def train(self, resume_from_checkpoint: Optional[Union[bool, str]] = None):\n\"\"\"Start training the BERT model.\"\"\"\n if self.eval_mode:\n raise RuntimeError(\"Training cannot be started in `eval` mode.\")\n self.trainer.train(resume_from_checkpoint=resume_from_checkpoint)\n
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.bert_classifier.BERTSynonymClassifier.eval","title":"eval()
","text":"To eval mode.
Source code insrc/deeponto/align/bertmap/bert_classifier.py
def eval(self):\n\"\"\"To eval mode.\"\"\"\n print(\"The BERT model is set to eval mode for making predictions.\")\n self.model.eval()\n # TODO: to implement multi-gpus for inference\n self.device = self.get_device(device_num=0)\n self.model.to(self.device)\n self.softmax = torch.nn.Softmax(dim=1).to(self.device)\n
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.bert_classifier.BERTSynonymClassifier.predict","title":"predict(sent_pairs)
","text":"Run prediction pipeline for synonym classification.
Return the softmax
probabilities of predicting pairs as synonyms (index=1
).
src/deeponto/align/bertmap/bert_classifier.py
def predict(self, sent_pairs: List[Tuple[str, str]]):\nr\"\"\"Run prediction pipeline for synonym classification.\n\n Return the `softmax` probailities of predicting pairs as synonyms (`index=1`).\n \"\"\"\n inputs = self.process_inputs(sent_pairs)\n with torch.no_grad():\n return self.softmax(self.model(**inputs).logits)[:, 1]\n
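A hedged usage sketch of the classifier in eval mode (the checkpoint path is illustrative; any fine-tuned BERT sequence-classification checkpoint saved by the pipeline would do):

from deeponto.align.bertmap.bert_classifier import BERTSynonymClassifier

classifier = BERTSynonymClassifier(
    loaded_path="./bertmap/bert/fine-tuned-checkpoint",  # hypothetical checkpoint directory
    output_path="./bertmap/bert",
    eval_mode=True,               # static model, used only for scoring
    max_length_for_input=128,
    batch_size_for_prediction=32,
)
scores = classifier.predict([
    ("heart attack", "myocardial infarction"),
    ("heart attack", "femoral fracture"),
])
print(scores)  # one synonym probability per input pair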
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.bert_classifier.BERTSynonymClassifier.load_dataset","title":"load_dataset(data, split)
","text":"Load the list of (annotation1, annotation2, label)
samples into a datasets.Dataset
.
src/deeponto/align/bertmap/bert_classifier.py
def load_dataset(self, data: List[Tuple[str, str, int]], split: str) -> Dataset:\nr\"\"\"Load the list of `(annotation1, annotation2, label)` samples into a `datasets.Dataset`.\"\"\"\n\n def iterate():\n for sample in data:\n yield {\"annotation1\": sample[0], \"annotation2\": sample[1], \"labels\": sample[2]}\n\n dataset = Dataset.from_generator(iterate)\n # NOTE: no padding here because the Trainer class supports dynamic padding\n dataset = dataset.map(\n lambda examples: self.tokenizer._tokenizer(\n examples[\"annotation1\"], examples[\"annotation2\"], max_length=self.max_length_for_input, truncation=True\n ),\n batched=True,\n desc=f\"Load {split} data:\",\n )\n return dataset\n
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.bert_classifier.BERTSynonymClassifier.process_inputs","title":"process_inputs(sent_pairs)
","text":"Process input sentence pairs for the BERT model.
Transform the sentences into BERT input embeddings and load them into the device. This function is called only when the BERT model is about to make predictions (eval
mode).
src/deeponto/align/bertmap/bert_classifier.py
def process_inputs(self, sent_pairs: List[Tuple[str, str]]):\nr\"\"\"Process input sentence pairs for the BERT model.\n\n Transform the sentences into BERT input embeddings and load them into the device.\n This function is called only when the BERT model is about to make predictions (`eval` mode).\n \"\"\"\n return self.tokenizer._tokenizer(\n sent_pairs,\n return_tensors=\"pt\",\n max_length=self.max_length_for_input,\n padding=True,\n truncation=True,\n ).to(self.device)\n
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.bert_classifier.BERTSynonymClassifier.compute_metrics","title":"compute_metrics(pred)
staticmethod
","text":"Add more evaluation metrics into the training log.
Source code insrc/deeponto/align/bertmap/bert_classifier.py
@staticmethod\ndef compute_metrics(pred):\n\"\"\"Add more evaluation metrics into the training log.\"\"\"\n # TODO: currently only accuracy is added, will expect more in the future if needed\n labels = pred.label_ids\n preds = pred.predictions.argmax(-1)\n acc = accuracy_score(labels, preds)\n return {\"accuracy\": acc}\n
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.bert_classifier.BERTSynonymClassifier.get_device","title":"get_device(device_num=0)
staticmethod
","text":"Get a device (GPU or CPU) for the torch model
Source code insrc/deeponto/align/bertmap/bert_classifier.py
@staticmethod\ndef get_device(device_num: int = 0):\n\"\"\"Get a device (GPU or CPU) for the torch model\"\"\"\n # If there's a GPU available...\n if torch.cuda.is_available():\n # Tell PyTorch to use the GPU.\n device = torch.device(f\"cuda:{device_num}\")\n print(\"There are %d GPU(s) available.\" % torch.cuda.device_count())\n print(\"We will use the GPU:\", torch.cuda.get_device_name(device_num))\n # If not...\n else:\n print(\"No GPU available, using the CPU instead.\")\n device = torch.device(\"cpu\")\n return device\n
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.bert_classifier.BERTSynonymClassifier.set_seed","title":"set_seed(seed_val=888)
staticmethod
","text":"Set random seed for reproducible results.
Source code insrc/deeponto/align/bertmap/bert_classifier.py
@staticmethod\ndef set_seed(seed_val: int = 888):\n\"\"\"Set random seed for reproducible results.\"\"\"\n random.seed(seed_val)\n np.random.seed(seed_val)\n torch.manual_seed(seed_val)\n torch.cuda.manual_seed_all(seed_val)\n
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.mapping_prediction.MappingPredictor","title":"MappingPredictor(output_path, tokenizer_path, src_annotation_index, tgt_annotation_index, bert_synonym_classifier, num_raw_candidates, num_best_predictions, batch_size_for_prediction, logger, enlighten_manager, enlighten_status, ignored_class_index=None)
","text":"Class for the mapping prediction module of \\(\\textsf{BERTMap}\\) and \\(\\textsf{BERTMapLt}\\) models.
Attributes:
Name Type Descriptiontokenizer
Tokenizer
The tokenizer used for constructing the inverted annotation index and candidate selection.
src_annotation_index
dict
A dictionary that stores the (class_iri, class_annotations)
pairs from src_onto
according to annotation_property_iris
.
tgt_annotation_index
dict
A dictionary that stores the (class_iri, class_annotations)
pairs from tgt_onto
according to annotation_property_iris
.
tgt_inverted_annotation_index
InvertedIndex
The inverted index built from tgt_annotation_index
used for target class candidate selection.
bert_synonym_classifier
BERTSynonymClassifier
The BERT synonym classifier fine-tuned on text semantics corpora.
num_raw_candidates
int
The maximum number of selected target class candidates for a source class.
num_best_predictions
int
The maximum number of best scored mappings preserved for a source class.
batch_size_for_prediction
int
The batch size of class annotation pairs for computing synonym scores.
ignored_class_index
dict
OAEI argument, a dictionary that stores the (class_iri, used_in_alignment)
pairs.
src/deeponto/align/bertmap/mapping_prediction.py
def __init__(\n self,\n output_path: str,\n tokenizer_path: str,\n src_annotation_index: dict,\n tgt_annotation_index: dict,\n bert_synonym_classifier: Optional[BERTSynonymClassifier],\n num_raw_candidates: Optional[int],\n num_best_predictions: Optional[int],\n batch_size_for_prediction: int,\n logger: Logger,\n enlighten_manager: enlighten.Manager,\n enlighten_status: enlighten.StatusBar,\n ignored_class_index: Optional[dict] = None,\n):\n self.logger = logger\n self.enlighten_manager = enlighten_manager\n self.enlighten_status = enlighten_status\n\n self.tokenizer = Tokenizer.from_pretrained(tokenizer_path)\n\n self.logger.info(\"Build inverted annotation index for candidate selection.\")\n self.src_annotation_index = src_annotation_index\n self.tgt_annotation_index = tgt_annotation_index\n self.tgt_inverted_annotation_index = Ontology.build_inverted_annotation_index(\n tgt_annotation_index, self.tokenizer\n )\n # the fundamental judgement for whether bertmap or bertmaplt is loaded\n self.bert_synonym_classifier = bert_synonym_classifier\n self.num_raw_candidates = num_raw_candidates\n self.num_best_predictions = num_best_predictions\n self.batch_size_for_prediction = batch_size_for_prediction\n self.output_path = output_path\n\n # for the OAEI, adding in check for classes that are not used in alignment\n self.ignored_class_index = ignored_class_index\n\n self.init_class_mapping = lambda head, tail, score: EntityMapping(head, tail, \"<EquivalentTo>\", score)\n
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.mapping_prediction.MappingPredictor.bert_mapping_score","title":"bert_mapping_score(src_class_annotations, tgt_class_annotations)
","text":"\\(\\textsf{BERTMap}\\)'s main mapping score module which utilises the fine-tuned BERT synonym classifier.
Compute the synonym score for each pair of src-tgt class annotations, and return the average score as the mapping score. Apply string matching before applying the BERT module to filter easy mappings (with scores \\(1.0\\)).
Source code insrc/deeponto/align/bertmap/mapping_prediction.py
def bert_mapping_score(\n self,\n src_class_annotations: Set[str],\n tgt_class_annotations: Set[str],\n):\nr\"\"\"$\\textsf{BERTMap}$'s main mapping score module which utilises the fine-tuned BERT synonym\n classifier.\n\n Compute the **synonym score** for each pair of src-tgt class annotations, and return\n the **average** score as the mapping score. Apply string matching before applying the\n BERT module to filter easy mappings (with scores $1.0$).\n \"\"\"\n\n if not src_class_annotations or not tgt_class_annotations:\n warnings.warn(\"Return zero score due to empty input class annotations...\")\n return 0.0\n\n # apply string matching before applying the bert module\n prelim_score = self.edit_similarity_mapping_score(\n src_class_annotations,\n tgt_class_annotations,\n string_match_only=True,\n )\n if prelim_score == 1.0:\n return prelim_score\n # apply BERT classifier and define mapping score := Average(SynonymScores)\n class_annotation_pairs = list(itertools.product(src_class_annotations, tgt_class_annotations))\n synonym_scores = self.bert_synonym_classifier.predict(class_annotation_pairs)\n # only one element tensor is able to be extracted as a scalar by .item()\n return float(torch.mean(synonym_scores).item())\n
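In other words, the mapping score is the mean of the pairwise synonym scores over the cross product of the two annotation sets. A small sketch with a stand-in scorer (toy_scorer below is hypothetical; in BERTMap the scorer is BERTSynonymClassifier.predict):

import itertools
import torch

def mapping_score(src_annotations, tgt_annotations, synonym_scorer):
    # average the pairwise synonym scores over the annotation cross product
    pairs = list(itertools.product(src_annotations, tgt_annotations))
    return float(torch.mean(synonym_scorer(pairs)).item())

# toy stand-in scorer: identical strings count as synonyms
toy_scorer = lambda pairs: torch.tensor([1.0 if a == b else 0.2 for a, b in pairs])
print(mapping_score({"heart attack"}, {"heart attack", "cardiac arrest"}, toy_scorer))  # ~0.6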
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.mapping_prediction.MappingPredictor.edit_similarity_mapping_score","title":"edit_similarity_mapping_score(src_class_annotations, tgt_class_annotations, string_match_only=False)
staticmethod
","text":"\\(\\textsf{BERTMap}\\)'s string match module and \\(\\textsf{BERTMapLt}\\)'s mapping prediction function.
Compute the normalised edit similarity (1 - normalised edit distance)
for each pair of src-tgt class annotations, and return the maximum score as the mapping score.
src/deeponto/align/bertmap/mapping_prediction.py
@staticmethod\ndef edit_similarity_mapping_score(\n src_class_annotations: Set[str],\n tgt_class_annotations: Set[str],\n string_match_only: bool = False,\n):\nr\"\"\"$\\textsf{BERTMap}$'s string match module and $\\textsf{BERTMapLt}$'s mapping prediction function.\n\n Compute the **normalised edit similarity** `(1 - normalised edit distance)` for each pair\n of src-tgt class annotations, and return the **maximum** score as the mapping score.\n \"\"\"\n\n if not src_class_annotations or not tgt_class_annotations:\n warnings.warn(\"Return zero score due to empty input class annotations...\")\n return 0.0\n\n # edge case when src and tgt classes have an exact match of annotation\n if len(src_class_annotations.intersection(tgt_class_annotations)) > 0:\n return 1.0\n # a shortcut to save time for $\\textsf{BERTMap}$\n if string_match_only:\n return 0.0\n annotation_pairs = itertools.product(src_class_annotations, tgt_class_annotations)\n sim_scores = [levenshtein.normalized_similarity(src, tgt) for src, tgt in annotation_pairs]\n return max(sim_scores)\n
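A self-contained sketch of this scoring rule, using a plain dynamic-programming edit distance as a stand-in for the levenshtein.normalized_similarity call in the source:

import itertools

def levenshtein_distance(a: str, b: str) -> int:
    # classic dynamic-programming edit distance
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]

def edit_similarity_score(src_annotations: set, tgt_annotations: set) -> float:
    # maximum (1 - normalised edit distance) over all annotation pairs
    if src_annotations & tgt_annotations:   # exact string match shortcut
        return 1.0
    return max(
        1 - levenshtein_distance(s, t) / max(len(s), len(t))
        for s, t in itertools.product(src_annotations, tgt_annotations)
    )

print(edit_similarity_score({"heart attack"}, {"heart attacks"}))  # ~0.923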
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.mapping_prediction.MappingPredictor.mapping_prediction_for_src_class","title":"mapping_prediction_for_src_class(src_class_iri)
","text":"Predict \\(N\\) best scored mappings for a source ontology class, where \\(N\\) is specified in self.num_best_predictions
.
1. Apply the string matching module to compute \"easy\" mappings.
2. Return these mappings if any are found, or if there is no BERT synonym classifier as in \(\textsf{BERTMapLt}\).
3. If using the BERT synonym classifier module: generate batches of class annotation pairs, where each batch combines the source class annotations with the annotations of \(M\) target candidate classes; \(M\) is determined by batch_size_for_prediction
, i.e., stop adding annotations of a target class candidate into the current batch if this operation will cause the size of current batch to exceed the limit.src/deeponto/align/bertmap/mapping_prediction.py
def mapping_prediction_for_src_class(self, src_class_iri: str) -> List[EntityMapping]:\nr\"\"\"Predict $N$ best scored mappings for a source ontology class, where\n $N$ is specified in `self.num_best_predictions`.\n\n 1. Apply the **string matching** module to compute \"easy\" mappings.\n 2. Return the mappings if found any, or if there is no BERT synonym classifier\n as in $\\textsf{BERTMapLt}$.\n 3. If using the BERT synonym classifier module:\n\n - Generate batches for class annotation pairs. Each batch contains the combinations of the\n source class annotations and $M$ target candidate classes' annotations. $M$ is determined\n by `batch_size_for_prediction`, i.e., stop adding annotations of a target class candidate into\n the current batch if this operation will cause the size of current batch to exceed the limit.\n - Compute the synonym scores for each batch and aggregate them into mapping scores; preserve\n $N$ best scored candidates and update them in the next batch. By this dynamic process, we eventually\n get $N$ best scored mappings for a source ontology class.\n \"\"\"\n\n src_class_annotations = self.src_annotation_index[src_class_iri]\n # previously wrongly put tokenizer again !!!\n tgt_class_candidates = self.tgt_inverted_annotation_index.idf_select(\n list(src_class_annotations), pool_size=len(self.tgt_annotation_index.keys())\n ) # [(tgt_class_iri, idf_score)]\n # if some classes are set to be ignored, remove them from the candidates\n if self.ignored_class_index:\n tgt_class_candidates = [(iri, idf_score) for iri, idf_score in tgt_class_candidates if not self.ignored_class_index[iri]]\n # select a truncated number of candidates\n tgt_class_candidates = tgt_class_candidates[:self.num_raw_candidates]\n best_scored_mappings = []\n\n # for string matching: save time if already found string-matched candidates\n def string_match():\n\"\"\"Compute string-matched mappings.\"\"\"\n string_matched_mappings = []\n for tgt_candidate_iri, _ in tgt_class_candidates:\n tgt_candidate_annotations = self.tgt_annotation_index[tgt_candidate_iri]\n prelim_score = self.edit_similarity_mapping_score(\n src_class_annotations,\n tgt_candidate_annotations,\n string_match_only=True,\n )\n if prelim_score > 0.0:\n # if src_class_annotations.intersection(tgt_candidate_annotations):\n string_matched_mappings.append(\n self.init_class_mapping(src_class_iri, tgt_candidate_iri, prelim_score)\n )\n\n return string_matched_mappings\n\n best_scored_mappings += string_match()\n # return string-matched mappings if found or if there is no bert module (bertmaplt)\n if best_scored_mappings or not self.bert_synonym_classifier:\n self.logger.info(f\"The best scored class mappings for {src_class_iri} are\\n{best_scored_mappings}\")\n return best_scored_mappings\n\n def generate_batched_annotations(batch_size: int):\n\"\"\"Generate batches of class annotations for the input source class and its\n target candidates.\n \"\"\"\n batches = []\n # the `nums`` parameter determines how the annotations are grouped\n current_batch = CfgNode({\"annotations\": [], \"nums\": []})\n for i, (tgt_candidate_iri, _) in enumerate(tgt_class_candidates):\n tgt_candidate_annotations = self.tgt_annotation_index[tgt_candidate_iri]\n annotation_pairs = list(itertools.product(src_class_annotations, tgt_candidate_annotations))\n current_batch.annotations += annotation_pairs\n num_annotation_pairs = len(annotation_pairs)\n current_batch.nums.append(num_annotation_pairs)\n # collect when the batch is full or for the last target class 
candidate\n if sum(current_batch.nums) > batch_size or i == len(tgt_class_candidates) - 1:\n batches.append(current_batch)\n current_batch = CfgNode({\"annotations\": [], \"nums\": []})\n return batches\n\n def bert_match():\n\"\"\"Compute mappings with fine-tuned BERT synonym classifier.\"\"\"\n bert_matched_mappings = []\n class_annotation_batches = generate_batched_annotations(self.batch_size_for_prediction)\n batch_base_candidate_idx = (\n 0 # after each batch, the base index will be increased by # of covered target candidates\n )\n device = self.bert_synonym_classifier.device\n\n # intialize N prediction scores and N corresponding indices w.r.t `tgt_class_candidates`\n final_best_scores = torch.tensor([-1] * self.num_best_predictions).to(device)\n final_best_idxs = torch.tensor([-1] * self.num_best_predictions).to(device)\n\n for annotation_batch in class_annotation_batches:\n\n synonym_scores = self.bert_synonym_classifier.predict(annotation_batch.annotations)\n # aggregating to mappings cores\n grouped_synonym_scores = torch.split(\n synonym_scores,\n split_size_or_sections=annotation_batch.nums,\n )\n mapping_scores = torch.stack([torch.mean(chunk) for chunk in grouped_synonym_scores])\n assert len(mapping_scores) == len(annotation_batch.nums)\n\n # preserve N best scored mappings\n # scale N in case there are less than N tgt candidates in this batch\n N = min(len(mapping_scores), self.num_best_predictions)\n batch_best_scores, batch_best_idxs = torch.topk(mapping_scores, k=N)\n batch_best_idxs += batch_base_candidate_idx\n\n # we do the substitution for every batch to prevent from memory overflow\n final_best_scores, _idxs = torch.topk(\n torch.cat([batch_best_scores, final_best_scores]),\n k=self.num_best_predictions,\n )\n final_best_idxs = torch.cat([batch_best_idxs, final_best_idxs])[_idxs]\n\n # update the index for target candidate classes\n batch_base_candidate_idx += len(annotation_batch.nums)\n\n for candidate_idx, mapping_score in zip(final_best_idxs, final_best_scores):\n # ignore intial values (-1.0) for dummy mappings\n # the threshold 0.9 is for mapping extension\n if mapping_score.item() >= 0.9:\n tgt_candidate_iri = tgt_class_candidates[candidate_idx.item()][0]\n bert_matched_mappings.append(\n self.init_class_mapping(\n src_class_iri,\n tgt_candidate_iri,\n mapping_score.item(),\n )\n )\n\n assert len(bert_matched_mappings) <= self.num_best_predictions\n self.logger.info(f\"The best scored class mappings for {src_class_iri} are\\n{bert_matched_mappings}\")\n return bert_matched_mappings\n\n return bert_match()\n
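Step 3 ends with a running top-\(N\) selection: the synonym scores of each batch are aggregated into mapping scores and then merged with the best mappings found so far. A compact sketch of that bookkeeping with torch.topk, assuming the per-batch mapping scores are already computed:

import torch

num_best = 3
final_best_scores = torch.full((num_best,), -1.0)
final_best_idxs = torch.full((num_best,), -1, dtype=torch.long)
base_idx = 0  # offset of the current batch within the candidate list

batched_mapping_scores = [torch.tensor([0.2, 0.95]), torch.tensor([0.4, 0.99, 0.1])]
for batch_scores in batched_mapping_scores:
    k = min(len(batch_scores), num_best)
    batch_best_scores, batch_best_idxs = torch.topk(batch_scores, k=k)
    batch_best_idxs = batch_best_idxs + base_idx
    # merge with the running top-N and keep the overall best num_best
    final_best_scores, merge_idxs = torch.topk(
        torch.cat([batch_best_scores, final_best_scores]), k=num_best
    )
    final_best_idxs = torch.cat([batch_best_idxs, final_best_idxs])[merge_idxs]
    base_idx += len(batch_scores)

print(final_best_idxs.tolist(), final_best_scores.tolist())
# best candidate indices [3, 1, 2] with scores ~[0.99, 0.95, 0.4]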
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.mapping_prediction.MappingPredictor.mapping_prediction","title":"mapping_prediction()
","text":"Apply global matching for each class in the source ontology.
See mapping_prediction_for_src_class
.
If this process is accidentally stopped, it can be resumed from already saved predictions. The progress bar keeps track of the number of source ontology classes that have been matched.
Source code insrc/deeponto/align/bertmap/mapping_prediction.py
def mapping_prediction(self):\nr\"\"\"Apply global matching for each class in the source ontology.\n\n See [`mapping_prediction_for_src_class`][deeponto.align.bertmap.mapping_prediction.MappingPredictor.mapping_prediction_for_src_class].\n\n If this process is accidentally stopped, it can be resumed from already saved predictions. The progress\n bar keeps track of the number of source ontology classes that have been matched.\n \"\"\"\n self.logger.info(\"Start global matching for each class in the source ontology.\")\n\n match_dir = os.path.join(self.output_path, \"match\")\n try:\n mapping_index = load_file(os.path.join(match_dir, \"raw_mappings.json\"))\n self.logger.info(\"Load the existing mapping prediction file.\")\n except:\n mapping_index = dict()\n create_path(match_dir)\n\n progress_bar = self.enlighten_manager.counter(\n total=len(self.src_annotation_index), desc=\"Mapping Prediction\", unit=\"per src class\"\n )\n self.enlighten_status.update(demo=\"Mapping Prediction\")\n\n for i, src_class_iri in enumerate(self.src_annotation_index.keys()):\n # skip computed classes\n if src_class_iri in mapping_index.keys():\n self.logger.info(f\"[Class {i}] Skip matching {src_class_iri} as already computed.\")\n progress_bar.update()\n continue\n # for OAEI\n if self.ignored_class_index and self.ignored_class_index[src_class_iri]:\n self.logger.info(f\"[Class {i}] Skip matching {src_class_iri} as marked as not used in alignment.\")\n progress_bar.update()\n continue\n mappings = self.mapping_prediction_for_src_class(src_class_iri)\n mapping_index[src_class_iri] = [m.to_tuple(with_score=True) for m in mappings]\n\n if i % 100 == 0 or i == len(self.src_annotation_index) - 1:\n save_file(mapping_index, os.path.join(match_dir, \"raw_mappings.json\"))\n # also save a .tsv version\n mapping_in_tuples = list(itertools.chain.from_iterable(mapping_index.values()))\n mapping_df = pd.DataFrame(mapping_in_tuples, columns=[\"SrcEntity\", \"TgtEntity\", \"Score\"])\n mapping_df.to_csv(os.path.join(match_dir, \"raw_mappings.tsv\"), sep=\"\\t\", index=False)\n self.logger.info(\"Save currently computed mappings to prevent undesirable loss.\")\n\n progress_bar.update()\n\n self.logger.info(\"Finished mapping prediction for each class in the source ontology.\")\n progress_bar.close()\n
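The intermediate predictions are persisted as both raw_mappings.json and raw_mappings.tsv under the match sub-folder of the output directory. A small sketch of inspecting the .tsv file with pandas (the output directory name is illustrative):

import pandas as pd

raw = pd.read_csv("./bertmap/match/raw_mappings.tsv", sep="\t")
print(raw.columns.tolist())                              # ['SrcEntity', 'TgtEntity', 'Score']
print(raw.sort_values("Score", ascending=False).head())  # highest-scored class mappings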
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.mapping_refinement.MappingRefiner","title":"MappingRefiner(output_path, src_onto, tgt_onto, mapping_predictor, mapping_extension_threshold, mapping_filtered_threshold, logger, enlighten_manager, enlighten_status)
","text":"Class for the mapping refinement module of \\(\\textsf{BERTMap}\\).
\\(\\textsf{BERTMapLt}\\) does not go through mapping refinement for its being \"light\". All the attributes of this class are supposed to be passed from BERTMapPipeline
.
Attributes:
Name Type Descriptionsrc_onto
Ontology
The source ontology to be matched.
tgt_onto
Ontology
The target ontology to be matched.
mapping_predictor
MappingPredictor
The mapping prediction module of BERTMap.
mapping_extension_threshold
float
Mappings with scores \\(\\geq\\) this value will be considered in the iterative mapping extension process.
raw_mappings
List[EntityMapping]
List of raw class mappings predicted in the global matching phase.
mapping_score_dict
dict
A dynamic dictionary that keeps track of mappings (with scores) that have already been computed.
mapping_filtered_threshold
float
Mappings with scores \\(\\geq\\) this value will be preserved for the final mapping repairing.
Source code insrc/deeponto/align/bertmap/mapping_refinement.py
def __init__(\n self,\n output_path: str,\n src_onto: Ontology,\n tgt_onto: Ontology,\n mapping_predictor: MappingPredictor,\n mapping_extension_threshold: float,\n mapping_filtered_threshold: float,\n logger: Logger,\n enlighten_manager: enlighten.Manager,\n enlighten_status: enlighten.StatusBar\n):\n self.output_path = output_path\n self.logger = logger\n self.enlighten_manager = enlighten_manager\n self.enlighten_status = enlighten_status\n\n self.src_onto = src_onto\n self.tgt_onto = tgt_onto\n\n # iterative mapping extension\n self.mapping_predictor = mapping_predictor\n self.mapping_extension_threshold = mapping_extension_threshold # \\kappa\n self.raw_mappings = EntityMapping.read_table_mappings(\n os.path.join(self.output_path, \"match\", \"raw_mappings.tsv\"),\n threshold=self.mapping_extension_threshold,\n relation=\"<EquivalentTo>\",\n )\n # keep track of already scored mappings to prevent duplicated predictions\n self.mapping_score_dict = dict()\n for m in self.raw_mappings:\n src_class_iri, tgt_class_iri, score = m.to_tuple(with_score=True)\n self.mapping_score_dict[(src_class_iri, tgt_class_iri)] = score\n\n # the threshold for final filtering the extended mappings\n self.mapping_filtered_threshold = mapping_filtered_threshold # \\lambda\n\n # logmap mapping repair folder\n self.logmap_repair_path = os.path.join(self.output_path, \"match\", \"logmap-repair\")\n\n # paths for mapping extension and repair\n self.extended_mapping_path = os.path.join(self.output_path, \"match\", \"extended_mappings.tsv\")\n self.filtered_mapping_path = os.path.join(self.output_path, \"match\", \"filtered_mappings.tsv\")\n self.repaired_mapping_path = os.path.join(self.output_path, \"match\", \"repaired_mappings.tsv\")\n
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.mapping_refinement.MappingRefiner.mapping_extension","title":"mapping_extension(max_iter=10)
","text":"Iterative mapping extension based on the locality principle.
For each class pair \\((c, c')\\) (scored in the global matching phase) with score \\(\\geq \\kappa\\), search for plausible mappings between the parents of \\(c\\) and \\(c'\\), and between the children of \\(c\\) and \\(c'\\). This is an iterative process as the set newly discovered mappings can act renew the frontier for searching. Terminate if no new mappings with score \\(\\geq \\kappa\\) can be found or the limit max_iter
has been reached. Note that \\(\\kappa\\) is set to \\(0.9\\) by default (can be altered in the configuration file). The mapping extension progress bar keeps track of the total number of extended mappings (including the previously predicted ones).
A further filtering step preserves only mappings with score \(\geq \lambda\). In the original BERTMap paper, \(\lambda\) is determined by the validation mappings, but in practice \(\lambda\) is not a sensitive hyperparameter and validation mappings are often not available. Therefore, we manually set \(\lambda\) to \(0.9995\) by default (this can be altered in the configuration file). The mapping filtering progress bar keeps track of the total number of filtered mappings (this bar is purely for logging purposes).
Parameters:
Name Type Description Defaultmax_iter
int
The maximum number of mapping extension iterations. Defaults to 10
.
10
Source code in src/deeponto/align/bertmap/mapping_refinement.py
def mapping_extension(self, max_iter: int = 10):\nr\"\"\"Iterative mapping extension based on the locality principle.\n\n For each class pair $(c, c')$ (scored in the global matching phase) with score \n $\\geq \\kappa$, search for plausible mappings between the parents of $c$ and $c'$,\n and between the children of $c$ and $c'$. This is an iterative process as the set \n newly discovered mappings can act renew the frontier for searching. Terminate if\n no new mappings with score $\\geq \\kappa$ can be found or the limit `max_iter` has \n been reached. Note that $\\kappa$ is set to $0.9$ by default (can be altered\n in the configuration file). The mapping extension progress bar keeps track of the \n total number of extended mappings (including the previously predicted ones).\n\n A further filtering will be performed by only preserving mappings with score $\\geq \\lambda$,\n in the original BERTMap paper, $\\lambda$ is determined by the validation mappings, but\n in practice $\\lambda$ is not a sensitive hyperparameter and validation mappings are often\n not available. Therefore, we manually set $\\lambda$ to $0.9995$ by default (can be altered\n in the configuration file). The mapping filtering progress bar keeps track of the \n total number of filtered mappings (this bar is purely for logging purpose).\n\n Args:\n max_iter (int, optional): The maximum number of mapping extension iterations. Defaults to `10`.\n \"\"\"\n\n num_iter = 0\n self.enlighten_status.update(demo=\"Mapping Extension\")\n extension_progress_bar = self.enlighten_manager.counter(\n desc=f\"Mapping Extension [Iteration #{num_iter}]\", unit=\"mapping\"\n )\n filtering_progress_bar = self.enlighten_manager.counter(\n desc=f\"Mapping Filtering\", unit=\"mapping\"\n )\n\n if os.path.exists(self.extended_mapping_path) and os.path.exists(self.filtered_mapping_path):\n self.logger.info(\n f\"Found extended and filtered mapping files at {self.extended_mapping_path}\"\n + f\" and {self.filtered_mapping_path}.\\nPlease check file integrity; if incomplete, \"\n + \"delete them and re-run the program.\"\n )\n\n # for animation purposes\n extension_progress_bar.desc = f\"Mapping Extension\"\n for _ in EntityMapping.read_table_mappings(self.extended_mapping_path):\n extension_progress_bar.update()\n\n self.enlighten_status.update(demo=\"Mapping Filtering\")\n for _ in EntityMapping.read_table_mappings(self.filtered_mapping_path):\n filtering_progress_bar.update()\n\n extension_progress_bar.close()\n filtering_progress_bar.close()\n\n return\n # intialise the frontier, explored, final expansion sets with the raw mappings\n # NOTE be careful of address pointers\n frontier = [m.to_tuple() for m in self.raw_mappings]\n expansion = [m.to_tuple(with_score=True) for m in self.raw_mappings]\n # for animation purposes\n for _ in range(len(expansion)):\n extension_progress_bar.update()\n\n self.logger.info(\n f\"Start mapping extension for each class pair with score >= {self.mapping_extension_threshold}.\"\n )\n while frontier and num_iter < max_iter:\n new_mappings = []\n for src_class_iri, tgt_class_iri in frontier:\n # one hop extension makes sure new mappings are really \"new\"\n cur_new_mappings = self.one_hop_extend(src_class_iri, tgt_class_iri)\n extension_progress_bar.update(len(cur_new_mappings))\n new_mappings += cur_new_mappings\n # add new mappings to the expansion set\n expansion += new_mappings\n # renew frontier with the newly discovered mappings\n frontier = [(x, y) for x, y, _ in new_mappings]\n\n self.logger.info(f\"Add 
{len(new_mappings)} mappings at iteration #{num_iter}.\")\n num_iter += 1\n extension_progress_bar.desc = f\"Mapping Extension [Iteration #{num_iter}]\"\n\n num_extended = len(expansion) - len(self.raw_mappings)\n self.logger.info(\n f\"Finished iterative mapping extension with {num_extended} new mappings and in total {len(expansion)} extended mappings.\"\n )\n\n extended_mapping_df = pd.DataFrame(expansion, columns=[\"SrcEntity\", \"TgtEntity\", \"Score\"])\n extended_mapping_df.to_csv(self.extended_mapping_path, sep=\"\\t\", index=False)\n\n self.enlighten_status.update(demo=\"Mapping Filtering\")\n\n filtered_expansion = [\n (src, tgt, score) for src, tgt, score in expansion if score >= self.mapping_filtered_threshold\n ]\n self.logger.info(\n f\"Filtered the extended mappings by a threshold of {self.mapping_filtered_threshold}.\"\n + f\"There are {len(filtered_expansion)} mappings left for mapping repair.\"\n )\n\n for _ in range(len(filtered_expansion)):\n filtering_progress_bar.update()\n\n filtered_mapping_df = pd.DataFrame(filtered_expansion, columns=[\"SrcEntity\", \"TgtEntity\", \"Score\"])\n filtered_mapping_df.to_csv(self.filtered_mapping_path, sep=\"\\t\", index=False)\n\n extension_progress_bar.close()\n filtering_progress_bar.close()\n return filtered_expansion\n
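Stripped of logging and progress bars, the extension loop is a frontier-based expansion over scored class pairs. A simplified sketch, with one_hop_extend treated as a black box that returns only previously unseen scored pairs:

def iterative_extension(raw_mappings, one_hop_extend, max_iter=10):
    # raw_mappings: list of (src_iri, tgt_iri, score) triples above the kappa threshold
    frontier = [(src, tgt) for src, tgt, _ in raw_mappings]
    expansion = list(raw_mappings)
    num_iter = 0
    while frontier and num_iter < max_iter:
        new_mappings = []
        for src, tgt in frontier:
            new_mappings += one_hop_extend(src, tgt)
        expansion += new_mappings
        frontier = [(src, tgt) for src, tgt, _ in new_mappings]  # renew the search frontier
        num_iter += 1
    return expansion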
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.mapping_refinement.MappingRefiner.one_hop_extend","title":"one_hop_extend(src_class_iri, tgt_class_iri, pool_size=200)
","text":"Extend mappings from a scored class pair \\((c, c')\\) by searching from one-hop neighbors.
Search for plausible mappings between the parents of \\(c\\) and \\(c'\\), and between the children of \\(c\\) and \\(c'\\). Mappings that are not already computed (recorded in self.mapping_score_dict
) and have a score \\(\\geq\\) self.mapping_extension_threshold
will be returned as new mappings.
Parameters:
Name Type Description Defaultsrc_class_iri
str
The IRI of the source ontology class \\(c\\).
requiredtgt_class_iri
str
The IRI of the target ontology class \\(c'\\).
requiredpool_size
int
The maximum number of plausible mappings to be extended. Defaults to 200.
200
Returns:
Type DescriptionList[EntityMapping]
A list of one-hop extended mappings.
Source code insrc/deeponto/align/bertmap/mapping_refinement.py
def one_hop_extend(self, src_class_iri: str, tgt_class_iri: str, pool_size: int = 200):\nr\"\"\"Extend mappings from a scored class pair $(c, c')$ by\n searching from one-hop neighbors.\n\n Search for plausible mappings between the parents of $c$ and $c'$,\n and between the children of $c$ and $c'$. Mappings that are not\n already computed (recorded in `self.mapping_score_dict`) and have\n a score $\\geq$ `self.mapping_extension_threshold` will be returned as\n **new** mappings.\n\n Args:\n src_class_iri (str): The IRI of the source ontology class $c$.\n tgt_class_iri (str): The IRI of the target ontology class $c'$.\n pool_size (int, optional): The maximum number of plausible mappings to be extended. Defaults to 200.\n\n Returns:\n (List[EntityMapping]): A list of one-hop extended mappings.\n \"\"\"\n\n def get_iris(owl_objects):\n return [str(x.getIRI()) for x in owl_objects]\n\n src_class = self.src_onto.get_owl_object(src_class_iri)\n src_class_parent_iris = get_iris(self.src_onto.get_asserted_parents(src_class, named_only=True))\n src_class_children_iris = get_iris(self.src_onto.get_asserted_children(src_class, named_only=True))\n\n tgt_class = self.tgt_onto.get_owl_object(tgt_class_iri)\n tgt_class_parent_iris = get_iris(self.tgt_onto.get_asserted_parents(tgt_class, named_only=True))\n tgt_class_children_iris = get_iris(self.tgt_onto.get_asserted_children(tgt_class, named_only=True))\n\n # pair up parents and children, respectively; NOTE set() might not be necessary\n parent_pairs = list(set(itertools.product(src_class_parent_iris, tgt_class_parent_iris)))\n children_pairs = list(set(itertools.product(src_class_children_iris, tgt_class_children_iris)))\n\n candidate_pairs = parent_pairs + children_pairs\n # downsample if the number of candidates is too large\n if len(candidate_pairs) > pool_size:\n candidate_pairs = random.sample(candidate_pairs, pool_size)\n\n extended_mappings = []\n for src_candidate_iri, tgt_candidate_iri in parent_pairs + children_pairs:\n\n # if already computed meaning that it is not a new mapping\n if (src_candidate_iri, tgt_candidate_iri) in self.mapping_score_dict:\n continue\n\n src_candidate_annotations = self.mapping_predictor.src_annotation_index[src_candidate_iri]\n tgt_candidate_annotations = self.mapping_predictor.tgt_annotation_index[tgt_candidate_iri]\n score = self.mapping_predictor.bert_mapping_score(src_candidate_annotations, tgt_candidate_annotations)\n # add to already scored collection\n self.mapping_score_dict[(src_candidate_iri, tgt_candidate_iri)] = score\n\n # skip mappings with low scores\n if score < self.mapping_extension_threshold:\n continue\n\n extended_mappings.append((src_candidate_iri, tgt_candidate_iri, score))\n\n self.logger.info(\n f\"New mappings (in tuples) extended from {(src_class_iri, tgt_class_iri)} are:\\n\" + f\"{extended_mappings}\"\n )\n\n return extended_mappings\n
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.mapping_refinement.MappingRefiner.mapping_repair","title":"mapping_repair()
","text":"Repair the filtered mappings with LogMap's debugger.
Note
A sub-folder under match
named logmap-repair
contains LogMap-related intermediate files.
src/deeponto/align/bertmap/mapping_refinement.py
def mapping_repair(self):\n\"\"\"Repair the filtered mappings with LogMap's debugger.\n\n !!! note\n\n A sub-folder under `match` named `logmap-repair` contains LogMap-related intermediate files.\n \"\"\"\n\n # progress bar for animation purposes\n self.enlighten_status.update(demo=\"Mapping Repairing\")\n repair_progress_bar = self.enlighten_manager.counter(\n desc=f\"Mapping Repairing\", unit=\"mapping\"\n )\n\n # skip repairing if already found the file\n if os.path.exists(self.repaired_mapping_path):\n self.logger.info(\n f\"Found the repaired mapping file at {self.repaired_mapping_path}.\"\n + \"\\nPlease check file integrity; if incomplete, \"\n + \"delete it and re-run the program.\"\n )\n # update progress bar for animation purposes\n for _ in EntityMapping.read_table_mappings(self.repaired_mapping_path):\n repair_progress_bar.update()\n repair_progress_bar.close()\n return \n\n # start mapping repair\n self.logger.info(\"Repair the filtered mappings with LogMap debugger.\")\n # formatting the filtered mappings\n self.logmap_repair_formatting()\n\n # run the LogMap repair module on the extended mappings\n run_logmap_repair(\n self.src_onto.owl_path,\n self.tgt_onto.owl_path,\n os.path.join(self.logmap_repair_path, f\"filtered_mappings_for_LogMap_repair.txt\"),\n self.logmap_repair_path,\n Ontology.get_max_jvm_memory()\n )\n\n # create table mappings from LogMap repair outputs\n with open(os.path.join(self.logmap_repair_path, \"mappings_repaired_with_LogMap.tsv\"), \"r\") as f:\n lines = f.readlines()\n with open(os.path.join(self.output_path, \"match\", \"repaired_mappings.tsv\"), \"w+\") as f:\n f.write(\"SrcEntity\\tTgtEntity\\tScore\\n\")\n for line in lines:\n src_ent_iri, tgt_ent_iri, score = line.split(\"\\t\")\n f.write(f\"{src_ent_iri}\\t{tgt_ent_iri}\\t{score}\")\n repair_progress_bar.update()\n\n self.logger.info(\"Mapping repair finished.\")\n repair_progress_bar.close()\n
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.mapping_refinement.MappingRefiner.logmap_repair_formatting","title":"logmap_repair_formatting()
","text":"Transform the filtered mapping file into the LogMap format.
An auxiliary function of the mapping repair module which requires mappings to be formatted as LogMap's input format.
Source code insrc/deeponto/align/bertmap/mapping_refinement.py
def logmap_repair_formatting(self):\n\"\"\"Transform the filtered mapping file into the LogMap format.\n\n An auxiliary function of the mapping repair module which requires mappings\n to be formatted as LogMap's input format.\n \"\"\"\n # read the filtered mapping file and convert to tuples\n filtered_mappings = EntityMapping.read_table_mappings(self.filtered_mapping_path)\n filtered_mappings_in_tuples = [m.to_tuple(with_score=True) for m in filtered_mappings]\n\n # write the mappings into logmap format\n lines = []\n for src_class_iri, tgt_class_iri, score in filtered_mappings_in_tuples:\n lines.append(f\"{src_class_iri}|{tgt_class_iri}|=|{score}|CLS\\n\")\n\n # create a path to prevent error\n create_path(self.logmap_repair_path)\n formatted_file = os.path.join(self.logmap_repair_path, f\"filtered_mappings_for_LogMap_repair.txt\")\n with open(formatted_file, \"w\") as f:\n f.writelines(lines)\n\n return lines\n
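The LogMap debugger expects one pipe-separated line per mapping. A tiny sketch of the format produced above (the IRIs are illustrative):

mappings = [
    ("http://example.org/src#ClassA", "http://example.org/tgt#ClassB", 0.9997),
]
lines = [f"{src_iri}|{tgt_iri}|=|{score}|CLS\n" for src_iri, tgt_iri, score in mappings]
print(lines[0])  # http://example.org/src#ClassA|http://example.org/tgt#ClassB|=|0.9997|CLS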
"},{"location":"deeponto/align/bertsubs/","title":"BERTSubs (Inter)","text":""},{"location":"deeponto/align/bertsubs/#deeponto.complete.bertsubs.pipeline_inter.BERTSubsInterPipeline","title":"BERTSubsInterPipeline(src_onto, tgt_onto, config)
","text":"Class for the model training and prediction/validation pipeline of inter-ontology subsumption of BERTSubs.
Attributes:
Name Type Descriptionsrc_onto
Ontology
Source ontology (the sub-class side).
tgt_onto
Ontology
Target ontology (the super-class side).
config
CfgNode
Configuration.
src_sampler
SubsumptionSampler
Object for sampling-related functions of the source ontology.
tgt_sampler
SubsumptionSampler
Object for sampling-related functions of the target ontology.
Source code insrc/deeponto/complete/bertsubs/pipeline_inter.py
def __init__(self, src_onto: Ontology, tgt_onto: Ontology, config: CfgNode):\n self.src_onto = src_onto\n self.tgt_onto = tgt_onto\n self.config = config\n self.config.label_property = self.config.src_label_property\n self.src_sampler = SubsumptionSampler(onto=self.src_onto, config=self.config)\n self.config.label_property = self.config.tgt_label_property\n self.tgt_sampler = SubsumptionSampler(onto=self.tgt_onto, config=self.config)\n start_time = datetime.datetime.now()\n\n read_subsumptions = lambda file_name: [line.strip().split(',') for line in open(file_name).readlines()]\n test_subsumptions = None if config.test_subsumption_file is None or config.test_subsumption_file == 'None' \\\n else read_subsumptions(config.test_subsumption_file)\n valid_subsumptions = None if config.valid_subsumption_file is None or config.valid_subsumption_file == 'None' \\\n else read_subsumptions(config.valid_subsumption_file)\n\n if config.use_ontology_subsumptions_training:\n src_subsumptions = BERTSubsIntraPipeline.extract_subsumptions_from_ontology(onto=self.src_onto,\n subsumption_type=config.subsumption_type)\n tgt_subsumptions = BERTSubsIntraPipeline.extract_subsumptions_from_ontology(onto=self.tgt_onto,\n subsumption_type=config.subsumption_type)\n src_subsumptions0, tgt_subsumptions0 = [], []\n if config.subsumption_type == 'named_class':\n for subs in src_subsumptions:\n c1, c2 = subs.getSubClass(), subs.getSuperClass()\n src_subsumptions0.append([str(c1.getIRI()), str(c2.getIRI())])\n for subs in tgt_subsumptions:\n c1, c2 = subs.getSubClass(), subs.getSuperClass()\n tgt_subsumptions0.append([str(c1.getIRI()), str(c2.getIRI())])\n elif config.subsumption_type == 'restriction':\n for subs in src_subsumptions:\n c1, c2 = subs.getSubClass(), subs.getSuperClass()\n src_subsumptions0.append([str(c1.getIRI()), str(c2)])\n for subs in tgt_subsumptions:\n c1, c2 = subs.getSubClass(), subs.getSuperClass()\n tgt_subsumptions0.append([str(c1.getIRI()), str(c2)])\n restrictions = BERTSubsIntraPipeline.extract_restrictions_from_ontology(onto=self.tgt_onto)\n print('restrictions in the target ontology: %d' % len(restrictions))\n else:\n warnings.warn('Unknown subsumption type %s' % config.subsumption_type)\n sys.exit(0)\n print('Positive train subsumptions from the source/target ontology: %d/%d' % (\n len(src_subsumptions0), len(tgt_subsumptions0)))\n\n src_tr = self.src_sampler.generate_samples(subsumptions=src_subsumptions0)\n tgt_tr = self.tgt_sampler.generate_samples(subsumptions=tgt_subsumptions0)\n else:\n src_tr, tgt_tr = [], []\n\n if config.train_subsumption_file is None or config.train_subsumption_file == 'None':\n tr = src_tr + tgt_tr\n else:\n train_subsumptions = read_subsumptions(config.train_subsumption_file)\n tr = self.inter_ontology_sampling(subsumptions=train_subsumptions, pos_dup=config.fine_tune.train_pos_dup,\n neg_dup=config.fine_tune.train_neg_dup)\n tr = tr + src_tr + tgt_tr\n\n if len(tr) == 0:\n warnings.warn('No training samples extracted')\n if config.fine_tune.do_fine_tune:\n sys.exit(0)\n\n end_time = datetime.datetime.now()\n print('data pre-processing costs %.1f minutes' % ((end_time - start_time).seconds / 60))\n\n start_time = datetime.datetime.now()\n torch.cuda.empty_cache()\n bert_trainer = BERTSubsumptionClassifierTrainer(config.fine_tune.pretrained, train_data=tr,\n val_data=tr[0:int(len(tr) / 5)],\n max_length=config.prompt.max_length,\n early_stop=config.fine_tune.early_stop)\n\n epoch_steps = len(bert_trainer.tra) // config.fine_tune.batch_size # total steps of an 
epoch\n logging_steps = int(epoch_steps * 0.02) if int(epoch_steps * 0.02) > 0 else 5\n eval_steps = 5 * logging_steps\n training_args = TrainingArguments(\n output_dir=config.fine_tune.output_dir,\n num_train_epochs=config.fine_tune.num_epochs,\n per_device_train_batch_size=config.fine_tune.batch_size,\n per_device_eval_batch_size=config.fine_tune.batch_size,\n warmup_ratio=config.fine_tune.warm_up_ratio,\n weight_decay=0.01,\n logging_steps=logging_steps,\n logging_dir=f\"{config.fine_tune.output_dir}/tb\",\n eval_steps=eval_steps,\n evaluation_strategy=\"steps\",\n do_train=True,\n do_eval=True,\n save_steps=eval_steps,\n load_best_model_at_end=True,\n save_total_limit=1,\n metric_for_best_model=\"accuracy\",\n greater_is_better=True\n )\n if config.fine_tune.do_fine_tune and (config.prompt.prompt_type == 'traversal' or (\n config.prompt.prompt_type == 'path' and config.prompt.use_sub_special_token)):\n bert_trainer.add_special_tokens(['<SUB>'])\n\n bert_trainer.train(train_args=training_args, do_fine_tune=config.fine_tune.do_fine_tune)\n if config.fine_tune.do_fine_tune:\n bert_trainer.trainer.save_model(\n output_dir=os.path.join(config.fine_tune.output_dir, 'fine-tuned-checkpoint'))\n print('fine-tuning done, fine-tuned model saved')\n else:\n print('pretrained or fine-tuned model loaded.')\n end_time = datetime.datetime.now()\n print('Fine-tuning costs %.1f minutes' % ((end_time - start_time).seconds / 60))\n\n bert_trainer.model.eval()\n self.device = torch.device(f\"cuda\") if torch.cuda.is_available() else torch.device(\"cpu\")\n bert_trainer.model.to(self.device)\n self.tokenize = lambda x: bert_trainer.tokenizer(x, max_length=config.prompt.max_length, truncation=True,\n padding=True, return_tensors=\"pt\")\n softmax = torch.nn.Softmax(dim=1)\n self.classifier = lambda x: softmax(bert_trainer.model(**x).logits)[:, 1]\n\n if valid_subsumptions is not None:\n self.evaluate(target_subsumptions=valid_subsumptions, test_type='valid')\n\n if test_subsumptions is not None:\n if config.test_type == 'evaluation':\n self.evaluate(target_subsumptions=test_subsumptions, test_type='test')\n elif config.test_type == 'prediction':\n self.predict(target_subsumptions=test_subsumptions)\n else:\n warnings.warn(\"Unknown test_type: %s\" % config.test_type)\n print('\\n ------------------------- done! ---------------------------\\n\\n\\n')\n
"},{"location":"deeponto/align/bertsubs/#deeponto.complete.bertsubs.pipeline_inter.BERTSubsInterPipeline.inter_ontology_sampling","title":"inter_ontology_sampling(subsumptions, pos_dup=1, neg_dup=1)
","text":"Transform inter-ontology subsumptions to two-string samples
Parameters:
Name Type Description Defaultsubsumptions
List[List]
A list of subsumptions; each subsumption is composed of two IRIs.
requiredpos_dup
int
Positive sample duplication.
1
neg_dup
int
Negative sample duplication.
1
Source code in src/deeponto/complete/bertsubs/pipeline_inter.py
def inter_ontology_sampling(self, subsumptions: List[List], pos_dup: int = 1, neg_dup: int = 1):\nr\"\"\"Transform inter-ontology subsumptions to two-string samples\n Args:\n subsumptions (List[List]): A list of subsumptions; each subsumption is composed of two IRIs.\n pos_dup (int): Positive sample duplication.\n neg_dup (int): Negative sample duplication.\n \"\"\"\n pos_samples = list()\n for subs in subsumptions:\n sub_strs = self.src_sampler.subclass_to_strings(subcls=subs[0])\n sup_strs = self.tgt_sampler.supclass_to_strings(supcls=subs[1],\n subsumption_type=self.config.subsumption_type)\n for sub_str in sub_strs:\n for sup_str in sup_strs:\n pos_samples.append([sub_str, sup_str, 1])\n pos_samples = pos_dup * pos_samples\n\n neg_subsumptions = list()\n for subs in subsumptions:\n for _ in range(neg_dup):\n neg_c = self.tgt_sampler.get_negative_sample(subclass_iri=subs[1],\n subsumption_type=self.config.subsumption_type)\n neg_subsumptions.append([subs[0], neg_c])\n\n neg_samples = list()\n for subs in neg_subsumptions:\n sub_strs = self.src_sampler.subclass_to_strings(subcls=subs[0])\n sup_strs = self.tgt_sampler.supclass_to_strings(supcls=subs[1],\n subsumption_type=self.config.subsumption_type)\n for sub_str in sub_strs:\n for sup_str in sup_strs:\n neg_samples.append([sub_str, sup_str, 0])\n\n if len(neg_samples) < len(pos_samples):\n neg_samples = neg_samples + [random.choice(neg_samples) for _ in range(len(pos_samples) - len(neg_samples))]\n if len(neg_samples) > len(pos_samples):\n pos_samples = pos_samples + [random.choice(pos_samples) for _ in range(len(neg_samples) - len(pos_samples))]\n print('training mappings, pos_samples: %d, neg_samples: %d' % (len(pos_samples), len(neg_samples)))\n all_samples = [s for s in pos_samples + neg_samples if s[0] != '' and s[1] != '']\n return all_samples\n
"},{"location":"deeponto/align/bertsubs/#deeponto.complete.bertsubs.pipeline_inter.BERTSubsInterPipeline.inter_ontology_subsumption_to_sample","title":"inter_ontology_subsumption_to_sample(subsumption)
","text":"Transform an inter ontology subsumption into a sample (a two-string list).
Parameters:
Name Type Description Defaultsubsumption
List
a subsumption composed of two IRIs.
required Source code insrc/deeponto/complete/bertsubs/pipeline_inter.py
def inter_ontology_subsumption_to_sample(self, subsumption: List):\nr\"\"\"Transform an inter ontology subsumption into a sample (a two-string list).\n\n Args:\n subsumption (List): a subsumption composed of two IRIs.\n \"\"\"\n subcls, supcls = subsumption[0], subsumption[1]\n substrs = self.src_sampler.subclass_to_strings(subcls=subcls)\n supstrs = self.tgt_sampler.supclass_to_strings(supcls=supcls, subsumption_type='named_class')\n samples = list()\n for substr in substrs:\n for supstr in supstrs:\n samples.append([substr, supstr])\n return samples\n
"},{"location":"deeponto/align/bertsubs/#deeponto.complete.bertsubs.pipeline_inter.BERTSubsInterPipeline.score","title":"score(samples)
","text":"Score the samples with the classifier.
Parameters:
Name Type Description Defaultsamples
List[List]
Each item is a list with two strings (input).
required Source code insrc/deeponto/complete/bertsubs/pipeline_inter.py
def score(self, samples):\nr\"\"\"Score the samples with the classifier.\n\n Args:\n samples (List[List]): Each item is a list with two strings (input).\n \"\"\"\n sample_size = len(samples)\n scores = np.zeros(sample_size)\n batch_num = math.ceil(sample_size / self.config.evaluation.batch_size)\n for i in range(batch_num):\n j = (i + 1) * self.config.evaluation.batch_size \\\n if (i + 1) * self.config.evaluation.batch_size <= sample_size else sample_size\n inputs = self.tokenize(samples[i * self.config.evaluation.batch_size:j])\n inputs.to(self.device)\n with torch.no_grad():\n batch_scores = self.classifier(inputs)\n scores[i * self.config.evaluation.batch_size:j] = batch_scores.cpu().numpy()\n return scores\n
"},{"location":"deeponto/align/bertsubs/#deeponto.complete.bertsubs.pipeline_inter.BERTSubsInterPipeline.evaluate","title":"evaluate(target_subsumptions, test_type='test')
","text":"Test and calculate the metrics according to a given list of subsumptions.
Parameters:
Name Type Description Defaulttarget_subsumptions
List[List]
A list of subsumptions, each of which is a two-component list (subclass_iri, super_class_iri_or_str)
.
test_type
str
\"test\"
or \"valid\"
.
'test'
Source code in src/deeponto/complete/bertsubs/pipeline_inter.py
def evaluate(self, target_subsumptions: List[List], test_type: str = 'test'):\nr\"\"\"Test and calculate the metrics according to a given list of subsumptions.\n\n Args:\n target_subsumptions (List[List]): A list of subsumptions, each of which of is a two-component list `(subclass_iri, super_class_iri_or_str)`.\n test_type (str): `\"test\"` or `\"valid\"`.\n \"\"\"\n MRR_sum, hits1_sum, hits5_sum, hits10_sum = 0, 0, 0, 0\n MRR, Hits1, Hits5, Hits10 = 0, 0, 0, 0\n size_sum, size_n = 0, 0\n for k0, test in enumerate(target_subsumptions):\n subcls, gt = test[0], test[1]\n candidates = test[1:]\n candidate_subsumptions = [[subcls, c] for c in candidates]\n candidate_scores = np.zeros(len(candidate_subsumptions))\n for k1, candidate_subsumption in enumerate(candidate_subsumptions):\n samples = self.inter_ontology_subsumption_to_sample(subsumption=candidate_subsumption)\n size_sum += len(samples)\n size_n += 1\n scores = self.score(samples=samples)\n candidate_scores[k1] = np.average(scores)\n\n sorted_indexes = np.argsort(candidate_scores)[::-1]\n sorted_classes = [candidates[i] for i in sorted_indexes]\n rank = sorted_classes.index(gt) + 1\n MRR_sum += 1.0 / rank\n hits1_sum += 1 if gt in sorted_classes[:1] else 0\n hits5_sum += 1 if gt in sorted_classes[:5] else 0\n hits10_sum += 1 if gt in sorted_classes[:10] else 0\n num = k0 + 1\n MRR, Hits1, Hits5, Hits10 = MRR_sum / num, hits1_sum / num, hits5_sum / num, hits10_sum / num\n if num % 500 == 0:\n print('\\n%d tested, MRR: %.3f, Hits@1: %.3f, Hits@5: %.3f, Hits@10: %.3f\\n' % (\n num, MRR, Hits1, Hits5, Hits10))\n print('\\n[%s], MRR: %.3f, Hits@1: %.3f, Hits@5: %.3f, Hits@10: %.3f\\n' % (test_type, MRR, Hits1, Hits5, Hits10))\n print('%.2f samples per testing subsumption' % (size_sum / size_n))\n
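The evaluation ranks each ground-truth super-class among its candidates and accumulates MRR and Hits@k over the test cases. A self-contained sketch of the per-case metrics:

import numpy as np

def rank_metrics(candidate_scores, gt_index):
    # returns (reciprocal rank, hits@1, hits@5, hits@10) for one test case
    order = np.argsort(candidate_scores)[::-1]          # candidate indices by descending score
    rank = int(np.where(order == gt_index)[0][0]) + 1   # 1-based rank of the ground truth
    return 1.0 / rank, int(rank <= 1), int(rank <= 5), int(rank <= 10)

# the ground truth (candidate 2) receives the second-highest score -> rank 2
print(rank_metrics(np.array([0.1, 0.8, 0.5, 0.2]), gt_index=2))  # (0.5, 0, 1, 1)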
"},{"location":"deeponto/align/bertsubs/#deeponto.complete.bertsubs.pipeline_inter.BERTSubsInterPipeline.predict","title":"predict(target_subsumptions)
","text":"Predict a score for each given subsumption.
The scores will be saved in test_subsumption_scores.csv
.
Parameters:
Name Type Description Defaulttarget_subsumptions
List[List]
Each item is a list whose first element is the sub-class IRI and whose remaining elements are the candidate super-classes to be scored.
required Source code insrc/deeponto/complete/bertsubs/pipeline_inter.py
def predict(self, target_subsumptions: List[List]):\nr\"\"\"Predict a score for each given subsumption. \n\n The scores will be saved in `test_subsumption_scores.csv`.\n\n Args:\n target_subsumptions (List[List]): Each item is a list with the first element as the sub-class,\n and the remaining elements as n candidate super-classes.\n \"\"\"\n out_lines = []\n for test in target_subsumptions:\n subcls, candidates = test[0], test[1:]\n candidate_subsumptions = [[subcls, c] for c in candidates]\n candidate_scores = []\n\n for candidate_subsumption in candidate_subsumptions:\n samples = self.inter_ontology_subsumption_to_sample(subsumption=candidate_subsumption)\n scores = self.score(samples=samples)\n candidate_scores.append(np.average(scores))\n out_lines.append(','.join([str(i) for i in candidate_scores]))\n\n out_file = 'test_subsumption_scores.csv'\n with open(out_file, 'w') as f:\n for line in out_lines:\n f.write('%s\\n' % line)\n print('Predicted subsumption scores are saved to %s' % out_file)\n
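A minimal sketch of the expected input, assuming pipeline is an already constructed BERTSubsInterPipeline and the IRIs below are placeholders:
target_subsumptions = [\n    # sub-class IRI followed by its candidate super-class IRIs (placeholders)\n    [\"http://example.org/src#A\", \"http://example.org/tgt#B\", \"http://example.org/tgt#C\"],\n    [\"http://example.org/src#D\", \"http://example.org/tgt#E\", \"http://example.org/tgt#F\"],\n]\npipeline.predict(target_subsumptions)  # writes one comma-separated row of scores per item to test_subsumption_scores.csv\n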
"},{"location":"deeponto/align/logmap/","title":"LogMap","text":"Run LogMap matcher 4.0 in a jar
command.
Credit
See LogMap repository at: https://github.com/ernestojimenezruiz/logmap-matcher.
"},{"location":"deeponto/align/logmap/#deeponto.align.logmap.run_logmap_repair","title":"run_logmap_repair(src_onto_path, tgt_onto_path, mapping_file_path, output_path, max_jvm_memory='10g')
","text":"Run the repair module of LogMap with java -jar
.
src/deeponto/align/logmap/__init__.py
def run_logmap_repair(\n src_onto_path: str, tgt_onto_path: str, mapping_file_path: str, output_path: str, max_jvm_memory: str = \"10g\"\n):\n\"\"\"Run the repair module of LogMap with `java -jar`.\"\"\"\n\n # find logmap directory\n logmap_path = os.path.dirname(__file__)\n\n # obtain absolute paths\n src_onto_path = os.path.abspath(src_onto_path)\n tgt_onto_path = os.path.abspath(tgt_onto_path)\n mapping_file_path = os.path.abspath(mapping_file_path)\n output_path = os.path.abspath(output_path)\n\n # run jar command\n print(f\"Run the repair module of LogMap from {logmap_path}.\")\n repair_command = (\n f\"java -Xms500m -Xmx{max_jvm_memory} -DentityExpansionLimit=100000000 -jar {logmap_path}/logmap-matcher-4.0.jar DEBUGGER \"\n + f\"file:{src_onto_path} file:{tgt_onto_path} TXT {mapping_file_path}\"\n + f\" {output_path} false false\"\n )\n print(f\"The jar command is:\\n{repair_command}.\")\n run_jar(repair_command)\n
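A minimal usage sketch (the file paths below are placeholders):
from deeponto.align.logmap import run_logmap_repair\n\n# repair a set of candidate mappings with LogMap's DEBUGGER mode (placeholder paths)\nrun_logmap_repair(\n    src_onto_path=\"src_onto.owl\",\n    tgt_onto_path=\"tgt_onto.owl\",\n    mapping_file_path=\"candidate_mappings.txt\",\n    output_path=\"logmap_repair_output\",\n    max_jvm_memory=\"10g\",\n)\n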
"},{"location":"deeponto/complete/ontolama/","title":"OntoLAMA","text":""},{"location":"deeponto/complete/ontolama/#deeponto.complete.ontolama.inference.run_inference","title":"run_inference(config, args)
","text":"Main entry for running the OpenPrompt script.
Source code insrc/deeponto/complete/ontolama/inference.py
def run_inference(config, args):\n\"\"\"Main entry for running the OpenPrompt script.\n \"\"\"\n global CUR_TEMPLATE, CUR_VERBALIZER\n # exit()\n # init logger, create log dir and set log level, etc.\n if args.resume and args.test:\n raise Exception(\"cannot use flag --resume and --test together\")\n if args.resume or args.test:\n config.logging.path = EXP_PATH = args.resume or args.test\n else:\n EXP_PATH = config_experiment_dir(config)\n init_logger(\n os.path.join(EXP_PATH, \"log.txt\"),\n config.logging.file_level,\n config.logging.console_level,\n )\n # save config to the logger directory\n save_config_to_yaml(config)\n\n # load dataset. The valid_dataset can be None\n train_dataset, valid_dataset, test_dataset, Processor = OntoLAMADataProcessor.load_inference_dataset(\n config, test=args.test is not None or config.learning_setting == \"zero_shot\"\n )\n\n # main\n if config.learning_setting == \"full\":\n res = trainer(\n EXP_PATH,\n config,\n Processor,\n resume=args.resume,\n test=args.test,\n train_dataset=train_dataset,\n valid_dataset=valid_dataset,\n test_dataset=test_dataset,\n )\n elif config.learning_setting == \"few_shot\":\n if config.few_shot.few_shot_sampling is None:\n raise ValueError(\"use few_shot setting but config.few_shot.few_shot_sampling is not specified\")\n seeds = config.sampling_from_train.seed\n res = 0\n for seed in seeds:\n if not args.test:\n sampler = FewShotSampler(\n num_examples_per_label=config.sampling_from_train.num_examples_per_label,\n also_sample_dev=config.sampling_from_train.also_sample_dev,\n num_examples_per_label_dev=config.sampling_from_train.num_examples_per_label_dev,\n )\n train_sampled_dataset, valid_sampled_dataset = sampler(\n train_dataset=train_dataset, valid_dataset=valid_dataset, seed=seed\n )\n result = trainer(\n os.path.join(EXP_PATH, f\"seed-{seed}\"),\n config,\n Processor,\n resume=args.resume,\n test=args.test,\n train_dataset=train_sampled_dataset,\n valid_dataset=valid_sampled_dataset,\n test_dataset=test_dataset,\n )\n else:\n result = trainer(\n os.path.join(EXP_PATH, f\"seed-{seed}\"),\n config,\n Processor,\n test=args.test,\n test_dataset=test_dataset,\n )\n res += result\n res /= len(seeds)\n elif config.learning_setting == \"zero_shot\":\n res = trainer(\n EXP_PATH,\n config,\n Processor,\n zero=True,\n train_dataset=train_dataset,\n valid_dataset=valid_dataset,\n test_dataset=test_dataset,\n )\n\n return config, CUR_TEMPLATE, CUR_VERBALIZER\n
"},{"location":"deeponto/complete/ontolama/#deeponto.complete.ontolama.subsumption_sampler.SubsumptionSamplerBase","title":"SubsumptionSamplerBase(onto)
","text":"Base Class for Sampling Subsumption Pairs.
Source code insrc/deeponto/complete/ontolama/subsumption_sampler.py
def __init__(self, onto: Ontology):\n self.onto = onto\n self.progress_manager = enlighten.get_manager()\n\n # for faster sampling\n self.concept_iris = list(self.onto.owl_classes.keys())\n self.object_property_iris = list(self.onto.owl_object_properties.keys())\n self.sibling_concept_groups = self.onto.sibling_class_groups\n self.sibling_auxiliary_dict = defaultdict(list)\n for i, sib_group in enumerate(self.sibling_concept_groups):\n for sib in sib_group:\n self.sibling_auxiliary_dict[sib].append(i)\n
"},{"location":"deeponto/complete/ontolama/#deeponto.complete.ontolama.subsumption_sampler.SubsumptionSamplerBase.random_named_concept","title":"random_named_concept()
","text":"Randomly draw a named concept's IRI.
Source code insrc/deeponto/complete/ontolama/subsumption_sampler.py
def random_named_concept(self) -> str:\n\"\"\"Randomly draw a named concept's IRI.\"\"\"\n return random.choice(self.concept_iris)\n
"},{"location":"deeponto/complete/ontolama/#deeponto.complete.ontolama.subsumption_sampler.SubsumptionSamplerBase.random_object_property","title":"random_object_property()
","text":"Randomly draw a object property's IRI.
Source code insrc/deeponto/complete/ontolama/subsumption_sampler.py
def random_object_property(self) -> str:\n\"\"\"Randomly draw an object property's IRI.\"\"\"\n    return random.choice(self.object_property_iris)\n
"},{"location":"deeponto/complete/ontolama/#deeponto.complete.ontolama.subsumption_sampler.SubsumptionSamplerBase.get_siblings","title":"get_siblings(concept_iri)
","text":"Get the sibling concepts of the given concept.
Source code insrc/deeponto/complete/ontolama/subsumption_sampler.py
def get_siblings(self, concept_iri: str):\n\"\"\"Get the sibling concepts of the given concept.\"\"\"\n sibling_group = self.sibling_auxiliary_dict[concept_iri]\n sibling_group = [self.sibling_concept_groups[i] for i in sibling_group]\n sibling_group = list(itertools.chain.from_iterable(sibling_group))\n return sibling_group\n
"},{"location":"deeponto/complete/ontolama/#deeponto.complete.ontolama.subsumption_sampler.SubsumptionSamplerBase.random_sibling","title":"random_sibling(concept_iri)
","text":"Randomly draw a sibling concept for a given concept.
Source code insrc/deeponto/complete/ontolama/subsumption_sampler.py
def random_sibling(self, concept_iri: str) -> str:\n\"\"\"Randomly draw a sibling concept for a given concept.\"\"\"\n sibling_group = self.get_siblings(concept_iri)\n if sibling_group:\n return random.choice(sibling_group)\n else:\n # not every concept has a sibling concept\n return None\n
"},{"location":"deeponto/complete/ontolama/#deeponto.complete.ontolama.subsumption_sampler.AtomicSubsumptionSampler","title":"AtomicSubsumptionSampler(onto)
","text":" Bases: SubsumptionSamplerBase
Sampler for constructing the Atomic Subsumption Inference (SI) dataset.
Positive samples come from the entailed subsumptions.
Soft negative samples come from the pairs of randomly selected concepts, subject to passing the assumed disjointness check.
Hard negative samples come from the pairs of randomly selected sibling concepts, subject to passing the assumed disjointness check.
Source code insrc/deeponto/complete/ontolama/subsumption_sampler.py
def __init__(self, onto: Ontology):\n super().__init__(onto)\n\n # compute the sibling concept pairs for faster hard negative sampling\n self.sibling_pairs = []\n for sib_group in self.sibling_concept_groups:\n self.sibling_pairs += [(x, y) for x, y in itertools.product(sib_group, sib_group) if x != y]\n self.maximum_num_hard_negatives = len(self.sibling_pairs)\n
"},{"location":"deeponto/complete/ontolama/#deeponto.complete.ontolama.subsumption_sampler.AtomicSubsumptionSampler.positive_sampling","title":"positive_sampling(num_samples=None)
","text":"Sample named concept pairs that are involved in a subsumption axiom.
An extracted pair \\((C, D)\\) indicates \\(\\mathcal{O} \\models C \\sqsubseteq D\\) where \\(\\mathcal{O}\\) is the input ontology.
Source code insrc/deeponto/complete/ontolama/subsumption_sampler.py
def positive_sampling(self, num_samples: Optional[int] = None):\nr\"\"\"Sample named concept pairs that are involved in a subsumption axiom.\n\n An extracted pair $(C, D)$ indicates $\\mathcal{O} \\models C \\sqsubseteq D$ where\n $\\mathcal{O}$ is the input ontology.\n \"\"\"\n pbar = self.progress_manager.counter(desc=\"Sample Positive Subsumptions\", unit=\"pair\")\n positives = []\n for concept_iri in self.concept_iris:\n owl_concept = self.onto.owl_classes[concept_iri]\n for subsumer_iri in self.onto.reasoner.get_inferred_super_entities(owl_concept, direct=False):\n positives.append((concept_iri, subsumer_iri))\n pbar.update()\n positives = list(set(sorted(positives)))\n if num_samples:\n positives = random.sample(positives, num_samples)\n print(f\"Sample {len(positives)} unique positive subsumption pairs.\")\n return positives\n
"},{"location":"deeponto/complete/ontolama/#deeponto.complete.ontolama.subsumption_sampler.AtomicSubsumptionSampler.negative_sampling","title":"negative_sampling(negative_sample_type, num_samples, apply_assumed_disjointness_alternative=True)
","text":"Sample named concept pairs that are involved in a disjoiness (assumed) axiom, which then implies non-subsumption.
Source code insrc/deeponto/complete/ontolama/subsumption_sampler.py
def negative_sampling(\n self,\n negative_sample_type: str,\n num_samples: int,\n apply_assumed_disjointness_alternative: bool = True,\n):\nr\"\"\"Sample named concept pairs that are involved in a disjoiness (assumed) axiom, which then\n implies non-subsumption.\n \"\"\"\n if negative_sample_type == \"soft\":\n draw_one = lambda: tuple(random.sample(self.concept_iris, k=2))\n elif negative_sample_type == \"hard\":\n draw_one = lambda: random.choice(self.sibling_pairs)\n else:\n raise RuntimeError(f\"{negative_sample_type} not supported.\")\n\n negatives = []\n max_iter = 2 * num_samples\n\n # which method to validate the negative sample\n valid_negative = self.onto.reasoner.check_assumed_disjoint\n if apply_assumed_disjointness_alternative:\n valid_negative = self.onto.reasoner.check_assumed_disjoint_alternative\n\n print(f\"Sample {negative_sample_type} negative subsumption pairs.\")\n # create two bars for process tracking\n added_bar = self.progress_manager.counter(total=num_samples, desc=\"Sample Negative Subsumptions\", unit=\"pair\")\n iter_bar = self.progress_manager.counter(total=max_iter, desc=\"#Iteration\", unit=\"it\")\n i = 0\n added = 0\n while added < num_samples and i < max_iter:\n sub_concept_iri, super_concept_iri = draw_one()\n sub_concept = self.onto.get_owl_object(sub_concept_iri)\n super_concept = self.onto.get_owl_object(super_concept_iri)\n # collect class iri if accepted\n if valid_negative(sub_concept, super_concept):\n neg = (sub_concept_iri, super_concept_iri)\n negatives.append(neg)\n added += 1\n added_bar.update(1)\n if added == num_samples:\n negatives = list(set(sorted(negatives)))\n added = len(negatives)\n added_bar.count = added\n i += 1\n iter_bar.update(1)\n negatives = list(set(sorted(negatives)))\n print(f\"Sample {len(negatives)} unique positive subsumption pairs.\")\n return negatives\n
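A minimal sketch of building an atomic SI sample set with this class (the ontology file is a placeholder, and loading follows the library's quick-usage pattern for Ontology):
from deeponto.onto import Ontology\nfrom deeponto.complete.ontolama.subsumption_sampler import AtomicSubsumptionSampler\n\nonto = Ontology(\"example.owl\")  # placeholder ontology file\nsampler = AtomicSubsumptionSampler(onto)\npositives = sampler.positive_sampling(num_samples=1000)\nsoft_negatives = sampler.negative_sampling(negative_sample_type=\"soft\", num_samples=1000)\nhard_negatives = sampler.negative_sampling(negative_sample_type=\"hard\", num_samples=1000)\n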
"},{"location":"deeponto/complete/ontolama/#deeponto.complete.ontolama.subsumption_sampler.ComplexSubsumptionSampler","title":"ComplexSubsumptionSampler(onto)
","text":" Bases: SubsumptionSamplerBase
Sampler for constructing the Complex Subsumption Inference (SI) dataset.
To obtain complex concept expressions on both sides of the subsumption relationship (as a sub-concept or a super-concept), this sampler utilises the equivalence axioms in the form of \\(C \\equiv C_{comp}\\) where \\(C\\) is atomic and \\(C_{comp}\\) is complex.
An equivalence axiom like \\(C \\equiv C_{comp}\\) is deemed as an anchor axiom.
Positive samples take the form \\(C_{sub} \\sqsubseteq C_{comp}\\) or \\(C_{comp} \\sqsubseteq C_{super}\\), where \\(C_{sub}\\) is an entailed sub-concept of \\(C\\) (and hence of \\(C_{comp}\\)) and \\(C_{super}\\) is an entailed super-concept of \\(C\\) (and hence of \\(C_{comp}\\)).
Negative samples are formed by replacing one of the named entities in the anchor axiom; the modified sub-concept and super-concept must pass the assumed disjointness check to be accepted as a valid negative sample. Without loss of generality, suppose we take \\(C \\sqsubseteq C_{comp}\\) and replace a named entity in \\(C_{comp}\\) to form \\(C \\sqsubseteq C_{comp}'\\); then \\((C, C_{comp}')\\) is a valid negative only if it satisfies the assumed disjointness check.
Source code insrc/deeponto/complete/ontolama/subsumption_sampler.py
def __init__(self, onto: Ontology):\n super().__init__(onto)\n self.anchor_axioms = self.onto.get_equivalence_axioms(\"Classes\")\n
"},{"location":"deeponto/complete/ontolama/#deeponto.complete.ontolama.subsumption_sampler.ComplexSubsumptionSampler.positive_sampling_from_anchor","title":"positive_sampling_from_anchor(anchor_axiom)
","text":"Returns all positive subsumption pairs extracted from an anchor equivalence axiom.
Source code insrc/deeponto/complete/ontolama/subsumption_sampler.py
def positive_sampling_from_anchor(self, anchor_axiom: OWLAxiom):\n\"\"\"Returns all positive subsumption pairs extracted from an anchor equivalence axiom.\"\"\"\n sub_axiom = list(anchor_axiom.asOWLSubClassOfAxioms())[0]\n atomic_concept, complex_concept = sub_axiom.getSubClass(), sub_axiom.getSuperClass()\n # determine which is the atomic concept\n if complex_concept.isClassExpressionLiteral():\n atomic_concept, complex_concept = complex_concept, atomic_concept\n\n # intialise the positive samples from the anchor equivalence axiom\n positives = list(anchor_axiom.asOWLSubClassOfAxioms())\n for super_concept_iri in self.onto.reasoner.get_inferred_super_entities(atomic_concept, direct=False):\n positives.append(\n self.onto.owl_data_factory.getOWLSubClassOfAxiom(\n complex_concept, self.onto.get_owl_object(super_concept_iri)\n )\n )\n for sub_concept_iri in self.onto.reasoner.get_inferred_sub_entities(atomic_concept, direct=False):\n positives.append(\n self.onto.owl_data_factory.getOWLSubClassOfAxiom(\n self.onto.get_owl_object(sub_concept_iri), complex_concept\n )\n )\n\n # TESTING\n # for p in positives:\n # assert self.onto.reasoner.owl_reasoner.isEntailed(p) \n\n return list(set(sorted(positives)))\n
"},{"location":"deeponto/complete/ontolama/#deeponto.complete.ontolama.subsumption_sampler.ComplexSubsumptionSampler.positive_sampling","title":"positive_sampling(num_samples_per_anchor=10)
","text":"Sample positive subsumption axioms that involve one atomic and one complex concepts.
An extracted pair \\((C, D)\\) indicates \\(\\mathcal{O} \\models C \\sqsubseteq D\\) where \\(\\mathcal{O}\\) is the input ontology.
Source code insrc/deeponto/complete/ontolama/subsumption_sampler.py
def positive_sampling(self, num_samples_per_anchor: Optional[int] = 10):\nr\"\"\"Sample positive subsumption axioms that involve one atomic and one complex concepts.\n\n An extracted pair $(C, D)$ indicates $\\mathcal{O} \\models C \\sqsubseteq D$ where\n $\\mathcal{O}$ is the input ontology.\n \"\"\"\n print(f\"Maximum number of positive samples for each anchor is set to {num_samples_per_anchor}.\")\n pbar = self.progress_manager.counter(desc=\"Sample Positive Subsumptions from\", unit=\"anchor axiom\")\n positives = dict()\n for anchor in self.anchor_axioms:\n positives_from_anchor = self.positive_sampling_from_anchor(anchor)\n if num_samples_per_anchor and num_samples_per_anchor < len(positives_from_anchor):\n positives_from_anchor = random.sample(positives_from_anchor, k = num_samples_per_anchor)\n positives[str(anchor)] = positives_from_anchor\n pbar.update()\n # positives = list(set(sorted(positives)))\n print(f\"Sample {sum([len(v) for v in positives.values()])} unique positive subsumption pairs.\")\n return positives\n
"},{"location":"deeponto/complete/ontolama/#deeponto.complete.ontolama.subsumption_sampler.ComplexSubsumptionSampler.negative_sampling","title":"negative_sampling(num_samples_per_anchor=10)
","text":"Sample negative subsumption axioms that involve one atomic and one complex concepts.
An extracted pair \\((C, D)\\) indicates that \\(C\\) and \\(D\\) pass the assumed disjointness check.
Source code insrc/deeponto/complete/ontolama/subsumption_sampler.py
def negative_sampling(self, num_samples_per_anchor: Optional[int] = 10):\nr\"\"\"Sample negative subsumption axioms that involve one atomic and one complex concepts.\n\n An extracted pair $(C, D)$ indicates $C$ and $D$ pass the [assumed disjointness check][deeponto.onto.OntologyReasoner.check_assumed_disjoint].\n \"\"\"\n print(f\"Maximum number of negative samples for each anchor is set to {num_samples_per_anchor}.\")\n pbar = self.progress_manager.counter(desc=\"Sample Negative Subsumptions from\", unit=\"anchor axiom\")\n negatives = dict()\n for anchor in self.anchor_axioms:\n negatives_from_anchor = []\n i, max_iter = 0, num_samples_per_anchor + 2\n while i < max_iter and len(negatives_from_anchor) < num_samples_per_anchor:\n corrupted_anchor = self.random_corrupt(anchor)\n corrupted_sub_axiom = random.choice(list(corrupted_anchor.asOWLSubClassOfAxioms()))\n sub_concept, super_concept = corrupted_sub_axiom.getSubClass(), corrupted_sub_axiom.getSuperClass()\n if self.onto.reasoner.check_assumed_disjoint_alternative(sub_concept, super_concept):\n negatives_from_anchor.append(corrupted_sub_axiom)\n i += 1\n negatives[str(anchor)] = list(set(sorted(negatives_from_anchor)))\n pbar.update()\n # negatives = list(set(sorted(negatives)))\n print(f\"Sample {sum([len(v) for v in negatives.values()])} unique positive subsumption pairs.\")\n return negatives\n
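An analogous sketch for the complex case; both sampling methods return dictionaries keyed by the string form of each anchor equivalence axiom (the ontology file is a placeholder):
from deeponto.onto import Ontology\nfrom deeponto.complete.ontolama.subsumption_sampler import ComplexSubsumptionSampler\n\nonto = Ontology(\"example.owl\")  # placeholder ontology file\nsampler = ComplexSubsumptionSampler(onto)\npositives = sampler.positive_sampling(num_samples_per_anchor=10)  # anchor axiom string -> positive subsumption axioms\nnegatives = sampler.negative_sampling(num_samples_per_anchor=10)  # anchor axiom string -> corrupted (negative) axioms\n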
"},{"location":"deeponto/complete/ontolama/#deeponto.complete.ontolama.subsumption_sampler.ComplexSubsumptionSampler.random_corrupt","title":"random_corrupt(axiom)
","text":"Randomly change an IRI in the input axiom and return a new one.
Source code insrc/deeponto/complete/ontolama/subsumption_sampler.py
def random_corrupt(self, axiom: OWLAxiom):\n\"\"\"Randomly change an IRI in the input axiom and return a new one.\n \"\"\"\n replaced_iri = random.choice(re.findall(IRI, str(axiom)))[1:-1]\n replaced_entity = self.onto.get_owl_object(replaced_iri)\n replacement_iri = None\n if self.onto.get_entity_type(replaced_entity) == \"Classes\":\n replacement_iri = self.random_named_concept()\n elif self.onto.get_entity_type(replaced_entity) == \"ObjectProperties\":\n replacement_iri = self.random_object_property()\n else:\n # NOTE: to extend to other types of entities in future\n raise RuntimeError(\"Unknown type of axiom.\")\n return self.onto.replace_entity(axiom, replaced_iri, replacement_iri)\n
"},{"location":"deeponto/complete/ontolama/#deeponto.complete.ontolama.data_processor.OntoLAMADataProcessor","title":"OntoLAMADataProcessor()
","text":" Bases: DataProcessor
Class for processing the OntoLAMA data points.
Source code insrc/deeponto/complete/ontolama/data_processor.py
def __init__(self):\n super().__init__()\n self.labels = [\"negative\", \"positive\"]\n
"},{"location":"deeponto/complete/ontolama/#deeponto.complete.ontolama.data_processor.OntoLAMADataProcessor.load_dataset","title":"load_dataset(task_name, split)
staticmethod
","text":"Load a specific OntoLAMA dataset from huggingface dataset hub.
Source code insrc/deeponto/complete/ontolama/data_processor.py
@staticmethod\ndef load_dataset(task_name: str, split: str):\n\"\"\"Load a specific OntoLAMA dataset from huggingface dataset hub.\"\"\"\n # TODO: remove use_auth_token after going to public\n return load_dataset(\"krr-oxford/OntoLAMA\", task_name, split=split, use_auth_token=True)\n
"},{"location":"deeponto/complete/ontolama/#deeponto.complete.ontolama.data_processor.OntoLAMADataProcessor.get_examples","title":"get_examples(task_name, split)
","text":"Load a specific OntoLAMA dataset and transform the data points into input examples for prompt-based inference.
Source code insrc/deeponto/complete/ontolama/data_processor.py
def get_examples(self, task_name, split):\n\"\"\"Load a specific OntoLAMA dataset and transform the data points into\n input examples for prompt-based inference.\n \"\"\"\n\n dataset = self.load_dataset(task_name, split)\n\n premise_name = \"v_sub_concept\"\n hypothesis_name = \"v_super_concept\"\n # different data fields for the bimnli dataset\n if \"bimnli\" in task_name:\n premise_name = \"premise\"\n hypothesis_name = \"hypothesis\"\n\n prompt_samples = []\n for samp in dataset:\n inp = InputExample(text_a=samp[premise_name], text_b=samp[hypothesis_name], label=samp[\"label\"])\n prompt_samples.append(inp)\n\n return prompt_samples\n
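A small sketch of turning an OntoLAMA split into OpenPrompt input examples; \"bimnli\" is the task name referenced in the code above, and access to the Hugging Face dataset may require authentication:
from deeponto.complete.ontolama.data_processor import OntoLAMADataProcessor\n\nprocessor = OntoLAMADataProcessor()\nexamples = processor.get_examples(task_name=\"bimnli\", split=\"test\")\nprint(examples[0].text_a, examples[0].text_b, examples[0].label)\n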
"},{"location":"deeponto/complete/ontolama/#deeponto.complete.ontolama.data_processor.OntoLAMADataProcessor.load_inference_dataset","title":"load_inference_dataset(config, return_class=True, test=False)
classmethod
","text":"A plm loader using a global config. It will load the train, valid, and test set (if exists) simulatenously.
Parameters:
Name Type Description Defaultconfig
CfgNode
The global config from the CfgNode.
requiredreturn_class
bool
Whether to also return the data processor object for future usage.
True
Returns:
Type DescriptionOptional[List[InputExample]]
The train dataset.
Optional[List[InputExample]]
The valid dataset.
Optional[List[InputExample]]
The test dataset.
Optional[OntoLAMADataProcessor]
The data processor object.
Source code insrc/deeponto/complete/ontolama/data_processor.py
@classmethod\ndef load_inference_dataset(cls, config: CfgNode, return_class=True, test=False):\nr\"\"\"A plm loader using a global config.\n It will load the train, valid, and test set (if exists) simulatenously.\n\n Args:\n config (CfgNode): The global config from the CfgNode.\n return_class (bool): Whether return the data processor class for future usage.\n\n Returns:\n (Optional[List[InputExample]]): The train dataset.\n (Optional[List[InputExample]]): The valid dataset.\n (Optional[List[InputExample]]): The test dataset.\n (Optional[OntoLAMADataProcessor]): The data processor object.\n \"\"\"\n dataset_config = config.dataset\n\n processor = cls()\n\n train_dataset = None\n valid_dataset = None\n if not test:\n try:\n train_dataset = processor.get_examples(dataset_config.task_name, \"train\")\n except FileNotFoundError:\n logger.warning(f\"Has no training dataset in krr-oxford/OntoLAMA/{dataset_config.task_name}.\")\n try:\n valid_dataset = processor.get_examples(dataset_config.task_name, \"validation\")\n except FileNotFoundError:\n logger.warning(f\"Has no validation dataset in krr-oxford/OntoLAMA/{dataset_config.task_name}.\")\n\n test_dataset = None\n try:\n test_dataset = processor.get_examples(dataset_config.task_name, \"test\")\n except FileNotFoundError:\n logger.warning(f\"Has no test dataset in krr-oxford/OntoLAMA/{dataset_config.task_name}.\")\n # checking whether donwloaded.\n if (train_dataset is None) and (valid_dataset is None) and (test_dataset is None):\n logger.error(\n \"Dataset is empty. Either there is no download or the path is wrong. \"\n + \"If not downloaded, please `cd datasets/` and `bash download_xxx.sh`\"\n )\n exit()\n if return_class:\n return train_dataset, valid_dataset, test_dataset, processor\n else:\n return train_dataset, valid_dataset, test_dataset\n
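Assuming config is the global CfgNode described above (with config.dataset.task_name set to an OntoLAMA task), the loader can be used as follows; any missing split comes back as None:
train_dataset, valid_dataset, test_dataset, processor = OntoLAMADataProcessor.load_inference_dataset(\n    config, return_class=True, test=False\n)\n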
"},{"location":"deeponto/complete/bertsubs/","title":"BERTSubs (Intra)","text":""},{"location":"deeponto/complete/bertsubs/#deeponto.complete.bertsubs.pipeline_intra.BERTSubsIntraPipeline","title":"BERTSubsIntraPipeline(onto, config)
","text":"Class for the intra-ontology subsumption prediction setting of BERTSubs.
Attributes:
Name Type Descriptiononto
Ontology
The target ontology.
config
CfgNode
The configuration for BERTSubs.
sampler
SubsumptionSampler
The subsumption sampler for BERTSubs.
Source code insrc/deeponto/complete/bertsubs/pipeline_intra.py
def __init__(self, onto: Ontology, config: CfgNode):\n self.onto = onto\n self.config = config\n self.sampler = SubsumptionSampler(onto=onto, config=config)\n start_time = datetime.datetime.now()\n\n n = 0\n for k in self.sampler.named_classes:\n n += len(self.sampler.iri_label[k])\n print(\n \"%d named classes, %.1f labels per class\"\n % (len(self.sampler.named_classes), n / len(self.sampler.named_classes))\n )\n\n read_subsumptions = lambda file_name: [line.strip().split(\",\") for line in open(file_name).readlines()]\n test_subsumptions = (\n None\n if config.test_subsumption_file is None or config.test_subsumption_file == \"None\"\n else read_subsumptions(config.test_subsumption_file)\n )\n\n # The train/valid subsumptions are not given. They will be extracted from the given ontology:\n if config.train_subsumption_file is None or config.train_subsumption_file == \"None\":\n subsumptions0 = self.extract_subsumptions_from_ontology(\n onto=onto, subsumption_type=config.subsumption_type\n )\n random.shuffle(subsumptions0)\n valid_size = int(len(subsumptions0) * config.valid.valid_ratio)\n train_subsumptions0, valid_subsumptions0 = subsumptions0[valid_size:], subsumptions0[0:valid_size]\n train_subsumptions, valid_subsumptions = [], []\n if config.subsumption_type == \"named_class\":\n for subs in train_subsumptions0:\n c1, c2 = subs.getSubClass(), subs.getSuperClass()\n train_subsumptions.append([str(c1.getIRI()), str(c2.getIRI())])\n\n size_sum = 0\n for subs in valid_subsumptions0:\n c1, c2 = subs.getSubClass(), subs.getSuperClass()\n neg_candidates = BERTSubsIntraPipeline.get_test_neg_candidates_named_class(\n subclass=c1, gt=c2, max_neg_size=config.valid.max_neg_size, onto=onto\n )\n size = len(neg_candidates)\n size_sum += size\n if size > 0:\n item = [str(c1.getIRI()), str(c2.getIRI())] + [str(c.getIRI()) for c in neg_candidates]\n valid_subsumptions.append(item)\n print(\"\\t average neg candidate size in validation: %.2f\" % (size_sum / len(valid_subsumptions)))\n\n elif config.subsumption_type == \"restriction\":\n for subs in train_subsumptions0:\n c1, c2 = subs.getSubClass(), subs.getSuperClass()\n train_subsumptions.append([str(c1.getIRI()), str(c2)])\n\n restrictions = BERTSubsIntraPipeline.extract_restrictions_from_ontology(onto=onto)\n print(\"restrictions: %d\" % len(restrictions))\n size_sum = 0\n for subs in valid_subsumptions0:\n c1, c2 = subs.getSubClass(), subs.getSuperClass()\n c2_neg = BERTSubsIntraPipeline.get_test_neg_candidates_restriction(\n subcls=c1, max_neg_size=config.valid.max_neg_size, restrictions=restrictions, onto=onto\n )\n size_sum += len(c2_neg)\n item = [str(c1.getIRI()), str(c2)] + [str(r) for r in c2_neg]\n valid_subsumptions.append(item)\n print(\"valid candidate negative avg. 
size: %.1f\" % (size_sum / len(valid_subsumptions)))\n else:\n warnings.warn(\"Unknown subsumption type %s\" % config.subsumption_type)\n sys.exit(0)\n\n # The train/valid subsumptions are given:\n else:\n train_subsumptions = read_subsumptions(config.train_subsumption_file)\n valid_subsumptions = read_subsumptions(config.valid_subsumption_file)\n\n print(\"Positive train/valid subsumptions: %d/%d\" % (len(train_subsumptions), len(valid_subsumptions)))\n tr = self.sampler.generate_samples(subsumptions=train_subsumptions)\n va = self.sampler.generate_samples(subsumptions=valid_subsumptions, duplicate=False)\n\n end_time = datetime.datetime.now()\n print(\"data pre-processing costs %.1f minutes\" % ((end_time - start_time).seconds / 60))\n\n start_time = datetime.datetime.now()\n torch.cuda.empty_cache()\n bert_trainer = BERTSubsumptionClassifierTrainer(\n config.fine_tune.pretrained,\n train_data=tr,\n val_data=va,\n max_length=config.prompt.max_length,\n early_stop=config.fine_tune.early_stop,\n )\n\n epoch_steps = len(bert_trainer.tra) // config.fine_tune.batch_size # total steps of an epoch\n logging_steps = int(epoch_steps * 0.02) if int(epoch_steps * 0.02) > 0 else 5\n eval_steps = 5 * logging_steps\n training_args = TrainingArguments(\n output_dir=config.fine_tune.output_dir,\n num_train_epochs=config.fine_tune.num_epochs,\n per_device_train_batch_size=config.fine_tune.batch_size,\n per_device_eval_batch_size=config.fine_tune.batch_size,\n warmup_ratio=config.fine_tune.warm_up_ratio,\n weight_decay=0.01,\n logging_steps=logging_steps,\n logging_dir=f\"{config.fine_tune.output_dir}/tb\",\n eval_steps=eval_steps,\n evaluation_strategy=\"steps\",\n do_train=True,\n do_eval=True,\n save_steps=eval_steps,\n load_best_model_at_end=True,\n save_total_limit=1,\n metric_for_best_model=\"accuracy\",\n greater_is_better=True,\n )\n if config.fine_tune.do_fine_tune and (\n config.prompt.prompt_type == \"traversal\"\n or (config.prompt.prompt_type == \"path\" and config.prompt.use_sub_special_token)\n ):\n bert_trainer.add_special_tokens([\"<SUB>\"])\n\n bert_trainer.train(train_args=training_args, do_fine_tune=config.fine_tune.do_fine_tune)\n if config.fine_tune.do_fine_tune:\n bert_trainer.trainer.save_model(\n output_dir=os.path.join(config.fine_tune.output_dir, \"fine-tuned-checkpoint\")\n )\n print(\"fine-tuning done, fine-tuned model saved\")\n else:\n print(\"pretrained or fine-tuned model loaded.\")\n end_time = datetime.datetime.now()\n print(\"Fine-tuning costs %.1f minutes\" % ((end_time - start_time).seconds / 60))\n\n bert_trainer.model.eval()\n self.device = torch.device(f\"cuda\") if torch.cuda.is_available() else torch.device(\"cpu\")\n bert_trainer.model.to(self.device)\n self.tokenize = lambda x: bert_trainer.tokenizer(\n x, max_length=config.prompt.max_length, truncation=True, padding=True, return_tensors=\"pt\"\n )\n softmax = torch.nn.Softmax(dim=1)\n self.classifier = lambda x: softmax(bert_trainer.model(**x).logits)[:, 1]\n\n self.evaluate(target_subsumptions=valid_subsumptions, test_type=\"valid\")\n if test_subsumptions is not None:\n if config.test_type == \"evaluation\":\n self.evaluate(target_subsumptions=test_subsumptions, test_type=\"test\")\n elif config.test_type == \"prediction\":\n self.predict(target_subsumptions=test_subsumptions)\n else:\n warnings.warn(\"Unknown test_type: %s\" % config.test_type)\n print(\"\\n ------------------------- done! ---------------------------\\n\\n\\n\")\n
"},{"location":"deeponto/complete/bertsubs/#deeponto.complete.bertsubs.pipeline_intra.BERTSubsIntraPipeline.score","title":"score(samples)
","text":"The scoring function based on the fine-tuned BERT classifier.
Parameters:
Name Type Description Defaultsamples
List[List]
A list of input sentence pairs to be scored.
required Source code insrc/deeponto/complete/bertsubs/pipeline_intra.py
def score(self, samples: List[List]):\nr\"\"\"The scoring function based on the fine-tuned BERT classifier.\n\n Args:\n samples (List[Tuple]): A list of input sentence pairs to be scored.\n \"\"\"\n sample_size = len(samples)\n scores = np.zeros(sample_size)\n batch_num = math.ceil(sample_size / self.config.evaluation.batch_size)\n for i in range(batch_num):\n j = (\n (i + 1) * self.config.evaluation.batch_size\n if (i + 1) * self.config.evaluation.batch_size <= sample_size\n else sample_size\n )\n inputs = self.tokenize(samples[i * self.config.evaluation.batch_size : j])\n inputs.to(self.device)\n with torch.no_grad():\n batch_scores = self.classifier(inputs)\n scores[i * self.config.evaluation.batch_size : j] = batch_scores.cpu().numpy()\n return scores\n
"},{"location":"deeponto/complete/bertsubs/#deeponto.complete.bertsubs.pipeline_intra.BERTSubsIntraPipeline.evaluate","title":"evaluate(target_subsumptions, test_type='test')
","text":"Test and calculate the metrics for a given list of subsumption pairs.
Parameters:
Name Type Description Defaulttarget_subsumptions
List[Tuple]
A list of test items; each is a list containing the sub-class IRI, the ground-truth super-class, and the remaining candidate super-classes.
requiredtest_type
str
test
for testing or valid
for validation.
'test'
Source code in src/deeponto/complete/bertsubs/pipeline_intra.py
def evaluate(self, target_subsumptions: List[List], test_type: str = \"test\"):\nr\"\"\"Test and calculate the metrics for a given list of subsumption pairs.\n\n Args:\n target_subsumptions (List[Tuple]): A list of subsumption pairs.\n test_type (str): `test` for testing or `valid` for validation.\n \"\"\"\n\n MRR_sum, hits1_sum, hits5_sum, hits10_sum = 0, 0, 0, 0\n MRR, Hits1, Hits5, Hits10 = 0, 0, 0, 0\n size_sum, size_n = 0, 0\n for k0, test in enumerate(target_subsumptions):\n subcls, gt = test[0], test[1]\n candidates = test[1:]\n\n candidate_subsumptions = [[subcls, c] for c in candidates]\n candidate_scores = np.zeros(len(candidate_subsumptions))\n for k1, candidate_subsumption in enumerate(candidate_subsumptions):\n samples = self.sampler.subsumptions_to_samples(subsumptions=[candidate_subsumption], sample_label=None)\n size_sum += len(samples)\n size_n += 1\n scores = self.score(samples=samples)\n candidate_scores[k1] = np.average(scores)\n\n sorted_indexes = np.argsort(candidate_scores)[::-1]\n sorted_classes = [candidates[i] for i in sorted_indexes]\n\n rank = sorted_classes.index(gt) + 1\n MRR_sum += 1.0 / rank\n hits1_sum += 1 if gt in sorted_classes[:1] else 0\n hits5_sum += 1 if gt in sorted_classes[:5] else 0\n hits10_sum += 1 if gt in sorted_classes[:10] else 0\n num = k0 + 1\n MRR, Hits1, Hits5, Hits10 = MRR_sum / num, hits1_sum / num, hits5_sum / num, hits10_sum / num\n if num % 500 == 0:\n print(\n \"\\n%d tested, MRR: %.3f, Hits@1: %.3f, Hits@5: %.3f, Hits@10: %.3f\\n\"\n % (num, MRR, Hits1, Hits5, Hits10)\n )\n print(\n \"\\n[%s], MRR: %.3f, Hits@1: %.3f, Hits@5: %.3f, Hits@10: %.3f\\n\" % (test_type, MRR, Hits1, Hits5, Hits10)\n )\n print(\"%.2f samples per testing subsumption\" % (size_sum / size_n))\n
"},{"location":"deeponto/complete/bertsubs/#deeponto.complete.bertsubs.pipeline_intra.BERTSubsIntraPipeline.predict","title":"predict(target_subsumptions)
","text":"Predict a score for each given subsumption in the list.
The scores will be saved in test_subsumption_scores.csv
.
Parameters:
Name Type Description Defaulttarget_subsumptions
List[List]
Each item is a list where the first element is a fixed ontology class \\(C\\), and the remaining elements are potential (candidate) super-classes of \\(C\\).
required Source code insrc/deeponto/complete/bertsubs/pipeline_intra.py
def predict(self, target_subsumptions: List[List]):\nr\"\"\"Predict a score for each given subsumption in the list.\n\n The scores will be saved in `test_subsumption_scores.csv`.\n\n Args:\n target_subsumptions (List[List]): Each item is a list where the first element is a fixed ontology class $C$,\n and the remaining elements are potential (candidate) super-classes of $C$.\n \"\"\"\n out_lines = []\n for test in target_subsumptions:\n subcls, candidates = test[0], test[1:]\n candidate_subsumptions = [[subcls, c] for c in candidates]\n candidate_scores = []\n\n for candidate_subsumption in candidate_subsumptions:\n samples = self.sampler.subsumptions_to_samples(subsumptions=[candidate_subsumption], sample_label=None)\n scores = self.score(samples=samples)\n candidate_scores.append(np.average(scores))\n\n out_lines.append(\",\".join([str(i) for i in candidate_scores]))\n\n out_file = \"test_subsumption_scores.csv\"\n with open(out_file, \"w\") as f:\n for line in out_lines:\n f.write(\"%s\\n\" % line)\n print(\"Predicted subsumption scores are saved to %s\" % out_file)\n
"},{"location":"deeponto/complete/bertsubs/#deeponto.complete.bertsubs.pipeline_intra.BERTSubsIntraPipeline.extract_subsumptions_from_ontology","title":"extract_subsumptions_from_ontology(onto, subsumption_type)
staticmethod
","text":"Extract target subsumptions from a given ontology.
Parameters:
Name Type Description Defaultonto
Ontology
The target ontology.
requiredsubsumption_type
str
the type of subsumptions, options are \"named_class\"
or \"restriction\"
.
src/deeponto/complete/bertsubs/pipeline_intra.py
@staticmethod\ndef extract_subsumptions_from_ontology(onto: Ontology, subsumption_type: str):\nr\"\"\"Extract target subsumptions from a given ontology.\n\n Args:\n onto (Ontology): The target ontology.\n subsumption_type (str): the type of subsumptions, options are `\"named_class\"` or `\"restriction\"`.\n\n \"\"\"\n all_subsumptions = onto.get_subsumption_axioms(entity_type=\"Classes\")\n subsumptions = []\n if subsumption_type == \"restriction\":\n for subs in all_subsumptions:\n if (\n not onto.check_deprecated(owl_object=subs.getSubClass())\n and not onto.check_named_entity(owl_object=subs.getSuperClass())\n and SubsumptionSampler.is_basic_existential_restriction(\n complex_class_str=str(subs.getSuperClass())\n )\n ):\n subsumptions.append(subs)\n elif subsumption_type == \"named_class\":\n for subs in all_subsumptions:\n c1, c2 = subs.getSubClass(), subs.getSuperClass()\n if (\n onto.check_named_entity(owl_object=c1)\n and not onto.check_deprecated(owl_object=c1)\n and onto.check_named_entity(owl_object=c2)\n and not onto.check_deprecated(owl_object=c2)\n ):\n subsumptions.append(subs)\n else:\n warnings.warn(\"\\nUnknown subsumption type: %s\\n\" % subsumption_type)\n return subsumptions\n
"},{"location":"deeponto/complete/bertsubs/#deeponto.complete.bertsubs.pipeline_intra.BERTSubsIntraPipeline.extract_restrictions_from_ontology","title":"extract_restrictions_from_ontology(onto)
staticmethod
","text":"Extract basic existential restriction from an ontology.
Parameters:
Name Type Description Defaultonto
Ontology
The target ontology.
requiredReturns:
Name Type Descriptionrestrictions
List
a list of existential restrictions.
Source code insrc/deeponto/complete/bertsubs/pipeline_intra.py
@staticmethod\ndef extract_restrictions_from_ontology(onto: Ontology):\nr\"\"\"Extract basic existential restriction from an ontology.\n\n Args:\n onto (Ontology): The target ontology.\n Returns:\n restrictions (List): a list of existential restrictions.\n \"\"\"\n restrictions = []\n for complexC in onto.get_asserted_complex_classes():\n if SubsumptionSampler.is_basic_existential_restriction(complex_class_str=str(complexC)):\n restrictions.append(complexC)\n return restrictions\n
"},{"location":"deeponto/complete/bertsubs/#deeponto.complete.bertsubs.pipeline_intra.BERTSubsIntraPipeline.get_test_neg_candidates_restriction","title":"get_test_neg_candidates_restriction(subcls, max_neg_size, restrictions, onto)
staticmethod
","text":"Get a list of negative candidate class restrictions for testing.
Source code insrc/deeponto/complete/bertsubs/pipeline_intra.py
@staticmethod\ndef get_test_neg_candidates_restriction(subcls, max_neg_size, restrictions, onto):\n\"\"\"Get a list of negative candidate class restrictions for testing.\"\"\"\n neg_restrictions = list()\n n = max_neg_size * 2 if max_neg_size * 2 <= len(restrictions) else len(restrictions)\n for r in random.sample(restrictions, n):\n if not onto.reasoner.check_subsumption(sub_entity=subcls, super_entity=r):\n neg_restrictions.append(r)\n if len(neg_restrictions) >= max_neg_size:\n break\n return neg_restrictions\n
"},{"location":"deeponto/complete/bertsubs/#deeponto.complete.bertsubs.pipeline_intra.BERTSubsIntraPipeline.get_test_neg_candidates_named_class","title":"get_test_neg_candidates_named_class(subclass, gt, max_neg_size, onto, max_depth=3, max_width=8)
staticmethod
","text":"Get a list of negative candidate named classes for testing.
Source code insrc/deeponto/complete/bertsubs/pipeline_intra.py
@staticmethod\ndef get_test_neg_candidates_named_class(subclass, gt, max_neg_size, onto, max_depth=3, max_width=8):\n\"\"\"Get a list of negative candidate named classes for testing.\"\"\"\n all_nebs, seeds = set(), [gt]\n depth = 1\n while depth <= max_depth:\n new_seeds = set()\n for seed in seeds:\n nebs = set()\n for nc_iri in onto.reasoner.get_inferred_sub_entities(\n seed, direct=True\n ) + onto.reasoner.get_inferred_super_entities(seed, direct=True):\n nc = onto.owl_classes[nc_iri]\n if onto.check_named_entity(owl_object=nc) and not onto.check_deprecated(owl_object=nc):\n nebs.add(nc)\n new_seeds = new_seeds.union(nebs)\n all_nebs = all_nebs.union(nebs)\n depth += 1\n seeds = random.sample(new_seeds, max_width) if len(new_seeds) > max_width else new_seeds\n all_nebs = (\n all_nebs\n - {onto.owl_classes[iri] for iri in onto.reasoner.get_inferred_super_entities(subclass, direct=False)}\n - {subclass}\n )\n if len(all_nebs) > max_neg_size:\n return random.sample(all_nebs, max_neg_size)\n else:\n return list(all_nebs)\n
"},{"location":"deeponto/complete/bertsubs/#deeponto.complete.bertsubs.text_semantics.SubsumptionSampler","title":"SubsumptionSampler(onto, config)
","text":"Class for sampling functions for training the subsumption prediction model.
Attributes:
Name Type Descriptiononto
Ontology
The target ontology.
config
CfgNode
The loaded configuration.
named_classes
Set[str]
IRIs of named classes that are not deprecated.
iri_label
Dict[str, List]
key -- class iris from named_classes
, value -- a list of labels.
restrictionObjects
Set[OWLClassExpression]
Basic existential restrictions that appear in the ontology.
restrictions
set[str]
Strings of basic existential restrictions corresponding to restrictionObjects
.
restriction_label
Dict[str, List]
key -- existential restriction string, value -- a list of existential restriction labels.
verb
OntologyVerbaliser
object for verbalisation.
Source code insrc/deeponto/complete/bertsubs/text_semantics.py
def __init__(self, onto: Ontology, config: CfgNode):\n self.onto = onto\n self.config = config\n self.named_classes = self.extract_named_classes(onto=onto)\n self.iri_label = dict()\n for iri in self.named_classes:\n self.iri_label[iri] = []\n for p in config.label_property:\n strings = onto.get_annotations(\n owl_object=onto.get_owl_object(iri),\n annotation_property_iri=p,\n annotation_language_tag=None,\n apply_lowercasing=False,\n normalise_identifiers=False,\n )\n for s in strings:\n if s not in self.iri_label[iri]:\n self.iri_label[iri].append(s)\n\n self.restrictionObjects = set()\n self.restrictions = set()\n self.restriction_label = dict()\n self.verb = OntologyVerbaliser(onto=onto)\n for complexC in onto.get_asserted_complex_classes():\n s = str(complexC)\n self.restriction_label[s] = []\n if self.is_basic_existential_restriction(complex_class_str=s):\n self.restrictionObjects.add(complexC)\n self.restrictions.add(s)\n self.restriction_label[s].append(self.verb.verbalise_class_expression(complexC).verbal)\n
"},{"location":"deeponto/complete/bertsubs/#deeponto.complete.bertsubs.text_semantics.SubsumptionSampler.is_basic_existential_restriction","title":"is_basic_existential_restriction(complex_class_str)
staticmethod
","text":"Determine if a complex class expression is a basic existential restriction.
Source code insrc/deeponto/complete/bertsubs/text_semantics.py
@staticmethod\ndef is_basic_existential_restriction(complex_class_str: str):\n\"\"\"Determine if a complex class expression is a basic existential restriction.\"\"\"\n IRI = \"<https?:\\\\/\\\\/(?:www\\\\.)?[-a-zA-Z0-9@:%._\\\\+~#=]{1,256}\\\\.[a-zA-Z0-9()]{1,6}\\\\b(?:[-a-zA-Z0-9()@:%_\\\\+.~#?&\\\\/=]*)>\"\n p = rf\"ObjectSomeValuesFrom\\({IRI}\\s{IRI}\\)\"\n if re.match(p, complex_class_str):\n return True\n else:\n return False\n
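For instance, the check accepts a single existential restriction between two named entities and rejects anything else (the IRIs below are placeholders):
from deeponto.complete.bertsubs.text_semantics import SubsumptionSampler\n\nSubsumptionSampler.is_basic_existential_restriction(\n    \"ObjectSomeValuesFrom(<http://example.org/hasPart> <http://example.org/Wheel>)\"\n)  # True\nSubsumptionSampler.is_basic_existential_restriction(\n    \"ObjectIntersectionOf(<http://example.org/Car> <http://example.org/Vehicle>)\"\n)  # False\n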
"},{"location":"deeponto/complete/bertsubs/#deeponto.complete.bertsubs.text_semantics.SubsumptionSampler.generate_samples","title":"generate_samples(subsumptions, duplicate=True)
","text":"Generate text samples from subsumptions.
Parameters:
Name Type Description Defaultsubsumptions
List[List]
A list of subsumptions, each of which is a two-component list (sub_class_iri, super_class_iri_or_str).
duplicate
bool
True
-- duplicate the positive and negative samples, False
-- do not duplicate.
True
Returns:
Type DescriptionList[List]
A list of samples, each element is a triple in the form of (sub_class_string, super_class_string, label_index)
.
src/deeponto/complete/bertsubs/text_semantics.py
def generate_samples(self, subsumptions: List[List], duplicate: bool = True):\nr\"\"\"Generate text samples from subsumptions.\n\n Args:\n subsumptions (List[List]): A list of subsumptions, each of which of is a two-component list `(sub_class_iri, super_class_iri_or_str)`.\n duplicate (bool): `True` -- duplicate the positive and negative samples, `False` -- do not duplicate.\n\n Returns:\n (List[List]): A list of samples, each element is a triple\n in the form of `(sub_class_string, super_class_string, label_index)`.\n \"\"\"\n if duplicate:\n pos_dup, neg_dup = self.config.fine_tune.train_pos_dup, self.config.fine_tune.train_neg_dup\n else:\n pos_dup, neg_dup = 1, 1\n neg_subsumptions = list()\n for subs in subsumptions:\n c1 = subs[0]\n for _ in range(neg_dup):\n neg_c = self.get_negative_sample(subclass_iri=c1, subsumption_type=self.config.subsumption_type)\n if neg_c is not None:\n neg_subsumptions.append([c1, neg_c])\n pos_samples = self.subsumptions_to_samples(subsumptions=subsumptions, sample_label=1)\n pos_samples = pos_dup * pos_samples\n neg_samples = self.subsumptions_to_samples(subsumptions=neg_subsumptions, sample_label=0)\n if len(neg_samples) < len(pos_samples):\n neg_samples = neg_samples + [\n random.choice(neg_samples) for _ in range(len(pos_samples) - len(neg_samples))\n ]\n if len(neg_samples) > len(pos_samples):\n pos_samples = pos_samples + [\n random.choice(pos_samples) for _ in range(len(neg_samples) - len(pos_samples))\n ]\n print(\"pos_samples: %d, neg_samples: %d\" % (len(pos_samples), len(neg_samples)))\n all_samples = [s for s in pos_samples + neg_samples if s[0] != \"\" and s[1] != \"\"]\n random.shuffle(all_samples)\n return all_samples\n
"},{"location":"deeponto/complete/bertsubs/#deeponto.complete.bertsubs.text_semantics.SubsumptionSampler.subsumptions_to_samples","title":"subsumptions_to_samples(subsumptions, sample_label)
","text":"Transform subsumptions into samples of strings.
Parameters:
Name Type Description Defaultsubsumptions
List[List]
The given subsumptions.
requiredsample_label
Union[int, None]
1
(positive), 0
(negative), None
(no label).
Returns:
Type DescriptionList[List]
A list of samples, each element is a triple in the form of (sub_class_string, super_class_string, label_index)
.
src/deeponto/complete/bertsubs/text_semantics.py
def subsumptions_to_samples(self, subsumptions: List[List], sample_label: Union[int, None]):\nr\"\"\"Transform subsumptions into samples of strings.\n\n Args:\n subsumptions (List[List]): The given subsumptions.\n sample_label (Union[int, None]): `1` (positive), `0` (negative), `None` (no label).\n\n Returns:\n (List[List]): A list of samples, each element is a triple\n in the form of `(sub_class_string, super_class_string, label_index)`.\n\n \"\"\"\n local_samples = list()\n for subs in subsumptions:\n subcls, supcls = subs[0], subs[1]\n substrs = self.iri_label[subcls] if subcls in self.iri_label and len(self.iri_label[subcls]) > 0 else [\"\"]\n\n if self.config.subsumption_type == \"named_class\":\n supstrs = self.iri_label[supcls] if supcls in self.iri_label and len(self.iri_label[supcls]) else [\"\"]\n else:\n if supcls in self.restriction_label and len(self.restriction_label[supcls]) > 0:\n supstrs = self.restriction_label[supcls]\n else:\n supstrs = [self.verb.verbalise_class_expression(supcls).verbal]\n\n if self.config.use_one_label:\n substrs, supstrs = substrs[0:1], supstrs[0:1]\n\n if self.config.prompt.prompt_type == \"isolated\":\n for substr in substrs:\n for supstr in supstrs:\n local_samples.append([substr, supstr])\n\n elif self.config.prompt.prompt_type == \"traversal\":\n subs_list_strs = set()\n for _ in range(self.config.prompt.context_dup):\n context_sub, no_duplicate = self.traversal_subsumptions(\n cls=subcls,\n hop=self.config.prompt.prompt_hop,\n direction=\"subclass\",\n max_subsumptions=self.config.prompt.prompt_max_subsumptions,\n )\n subs_list = [self.named_subsumption_to_str(subsum) for subsum in context_sub]\n subs_list_str = \" <SEP> \".join(subs_list)\n subs_list_strs.add(subs_list_str)\n if no_duplicate:\n break\n\n if self.config.subsumption_type == \"named_class\":\n sups_list_strs = set()\n for _ in range(self.config.prompt.context_dup):\n context_sup, no_duplicate = self.traversal_subsumptions(\n cls=supcls,\n hop=self.config.prompt.prompt_hop,\n direction=\"supclass\",\n max_subsumptions=self.config.prompt.prompt_max_subsumptions,\n )\n sups_list = [self.named_subsumption_to_str(subsum) for subsum in context_sup]\n sups_list_str = \" <SEP> \".join(sups_list)\n sups_list_strs.add(sups_list_str)\n if no_duplicate:\n break\n else:\n sups_list_strs = set(supstrs)\n\n for subs_list_str in subs_list_strs:\n for substr in substrs:\n s1 = substr + \" <SEP> \" + subs_list_str\n for sups_list_str in sups_list_strs:\n for supstr in supstrs:\n s2 = supstr + \" <SEP> \" + sups_list_str\n local_samples.append([s1, s2])\n\n elif self.config.prompt.prompt_type == \"path\":\n sep_token = \"<SUB>\" if self.config.prompt.use_sub_special_token else \"<SEP>\"\n\n s1_set = set()\n for _ in range(self.config.prompt.context_dup):\n context_sub, no_duplicate = self.path_subsumptions(\n cls=subcls, hop=self.config.prompt.prompt_hop, direction=\"subclass\"\n )\n if len(context_sub) > 0:\n s1 = \"\"\n for i in range(len(context_sub)):\n subsum = context_sub[len(context_sub) - i - 1]\n subc = subsum[0]\n s1 += \"%s %s \" % (\n self.iri_label[subc][0]\n if subc in self.iri_label and len(self.iri_label[subc]) > 0\n else \"\",\n sep_token,\n )\n for substr in substrs:\n s1_set.add(s1 + substr)\n else:\n for substr in substrs:\n s1_set.add(\"%s %s\" % (sep_token, substr))\n\n if no_duplicate:\n break\n\n if self.config.subsumption_type == \"named_class\":\n s2_set = set()\n for _ in range(self.config.prompt.context_dup):\n context_sup, no_duplicate = self.path_subsumptions(\n 
cls=supcls, hop=self.config.prompt.prompt_hop, direction=\"supclass\"\n )\n if len(context_sup) > 0:\n s2 = \"\"\n for subsum in context_sup:\n supc = subsum[1]\n s2 += \" %s %s\" % (\n sep_token,\n self.iri_label[supc][0]\n if supc in self.iri_label and len(self.iri_label[supc]) > 0\n else \"\",\n )\n for supstr in supstrs:\n s2_set.add(supstr + s2)\n else:\n for supstr in supstrs:\n s2_set.add(\"%s %s\" % (supstr, sep_token))\n\n if no_duplicate:\n break\n else:\n s2_set = set(supstrs)\n\n for s1 in s1_set:\n for s2 in s2_set:\n local_samples.append([s1, s2])\n\n else:\n print(f\"unknown context type {self.config.prompt.prompt_type}\")\n sys.exit(0)\n\n if sample_label is not None:\n for i in range(len(local_samples)):\n local_samples[i].append(sample_label)\n\n return local_samples\n
"},{"location":"deeponto/complete/bertsubs/#deeponto.complete.bertsubs.text_semantics.SubsumptionSampler.get_negative_sample","title":"get_negative_sample(subclass_iri, subsumption_type='named_class')
","text":"Given a named subclass, get a negative class for a negative subsumption.
Parameters:
Name Type Description Defaultsubclass_iri
str
IRI of a given sub-class.
requiredsubsumption_type
str
named_class
or restriction
.
'named_class'
Source code in src/deeponto/complete/bertsubs/text_semantics.py
def get_negative_sample(self, subclass_iri: str, subsumption_type: str = \"named_class\"):\nr\"\"\"Given a named subclass, get a negative class for a negative subsumption.\n\n Args:\n subclass_iri (str): IRI of a given sub-class.\n subsumption_type (str): `named_class` or `restriction`.\n \"\"\"\n subclass = self.onto.get_owl_object(iri=subclass_iri)\n if subsumption_type == \"named_class\":\n if self.config.no_reasoning:\n parents = self.onto.get_asserted_parents(owl_object=subclass, named_only=True)\n ancestors = set([str(item.getIRI()) for item in parents])\n else:\n ancestors = set(self.onto.reasoner.get_inferred_super_entities(subclass, direct=False))\n neg_c = random.sample(self.named_classes - ancestors, 1)[0]\n return neg_c\n else:\n for neg_c in random.sample(self.restrictionObjects, 5):\n if self.config.no_reasoning:\n return str(neg_c)\n else:\n if not self.onto.reasoner.check_subsumption(sub_entity=subclass, super_entity=neg_c):\n return str(neg_c)\n return None\n
"},{"location":"deeponto/complete/bertsubs/#deeponto.complete.bertsubs.text_semantics.SubsumptionSampler.named_subsumption_to_str","title":"named_subsumption_to_str(subsum)
","text":"Transform a named subsumption into string with <SUB>
and classes' labels.
Parameters:
Name Type Description Defaultsubsum
List[Tuple]
A subsumption pair in the form of (sub_class_iri, super_class_iri)
.
src/deeponto/complete/bertsubs/text_semantics.py
def named_subsumption_to_str(self, subsum: List):\nr\"\"\"Transform a named subsumption into string with `<SUB>` and classes' labels.\n\n Args:\n subsum (List[Tuple]): A list of subsumption pairs in the form of `(sub_class_iri, super_class_iri)`.\n \"\"\"\n subc, supc = subsum[0], subsum[1]\n subs = self.iri_label[subc][0] if subc in self.iri_label and len(self.iri_label[subc]) > 0 else \"\"\n sups = self.iri_label[supc][0] if supc in self.iri_label and len(self.iri_label[supc]) > 0 else \"\"\n return \"%s <SUB> %s\" % (subs, sups)\n
"},{"location":"deeponto/complete/bertsubs/#deeponto.complete.bertsubs.text_semantics.SubsumptionSampler.subclass_to_strings","title":"subclass_to_strings(subcls)
","text":"Transform a sub-class into strings (with the path or traversal context template).
Parameters:
Name Type Description Defaultsubcls
str
IRI of the sub-class.
required Source code insrc/deeponto/complete/bertsubs/text_semantics.py
def subclass_to_strings(self, subcls):\nr\"\"\"Transform a sub-class into strings (with the path or traversal context template).\n\n Args:\n subcls (str): IRI of the sub-class.\n \"\"\"\n substrs = self.iri_label[subcls] if subcls in self.iri_label and len(self.iri_label[subcls]) > 0 else [\"\"]\n\n if self.config.use_one_label:\n substrs = substrs[0:1]\n\n if self.config.prompt.prompt_type == \"isolated\":\n return substrs\n\n elif self.config.prompt.prompt_type == \"traversal\":\n subs_list_strs = set()\n for _ in range(self.config.prompt.context_dup):\n context_sub, no_duplicate = self.traversal_subsumptions(\n cls=subcls,\n hop=self.config.prompt.prompt_hop,\n direction=\"subclass\",\n max_subsumptions=self.config.prompt.prompt_max_subsumptions,\n )\n subs_list = [self.named_subsumption_to_str(subsum) for subsum in context_sub]\n subs_list_str = \" <SEP> \".join(subs_list)\n subs_list_strs.add(subs_list_str)\n if no_duplicate:\n break\n\n strs = list()\n for subs_list_str in subs_list_strs:\n for substr in substrs:\n s1 = substr + \" <SEP> \" + subs_list_str\n strs.append(s1)\n return strs\n\n elif self.config.prompt.prompt_type == \"path\":\n sep_token = \"<SUB>\" if self.config.prompt.use_sub_special_token else \"<SEP>\"\n\n s1_set = set()\n for _ in range(self.config.prompt.context_dup):\n context_sub, no_duplicate = self.path_subsumptions(\n cls=subcls, hop=self.config.prompt.prompt_hop, direction=\"subclass\"\n )\n if len(context_sub) > 0:\n s1 = \"\"\n for i in range(len(context_sub)):\n subsum = context_sub[len(context_sub) - i - 1]\n subc = subsum[0]\n s1 += \"%s %s \" % (\n self.iri_label[subc][0]\n if subc in self.iri_label and len(self.iri_label[subc]) > 0\n else \"\",\n sep_token,\n )\n for substr in substrs:\n s1_set.add(s1 + substr)\n else:\n for substr in substrs:\n s1_set.add(\"%s %s\" % (sep_token, substr))\n if no_duplicate:\n break\n\n return list(s1_set)\n
"},{"location":"deeponto/complete/bertsubs/#deeponto.complete.bertsubs.text_semantics.SubsumptionSampler.supclass_to_strings","title":"supclass_to_strings(supcls, subsumption_type='named_class')
","text":"Transform a super-class into strings (with the path or traversal context template if the subsumption type is \"named_class\"
).
Parameters:
Name Type Description Defaultsupcls
str
IRI of the super-class.
requiredsubsumption_type
str
The type of the subsumption.
'named_class'
Source code in src/deeponto/complete/bertsubs/text_semantics.py
def supclass_to_strings(self, supcls: str, subsumption_type: str = \"named_class\"):\nr\"\"\"Transform a super-class into strings (with the path or traversal context template if the subsumption type is `\"named_class\"`).\n\n Args:\n supcls (str): IRI of the super-class.\n subsumption_type (str): The type of the subsumption.\n \"\"\"\n\n if subsumption_type == \"named_class\":\n supstrs = self.iri_label[supcls] if supcls in self.iri_label and len(self.iri_label[supcls]) else [\"\"]\n else:\n if supcls in self.restriction_label and len(self.restriction_label[supcls]) > 0:\n supstrs = self.restriction_label[supcls]\n else:\n warnings.warn(\"Warning: %s has no descriptions\" % supcls)\n supstrs = [\"\"]\n\n if self.config.use_one_label:\n if subsumption_type == \"named_class\":\n supstrs = supstrs[0:1]\n\n if self.config.prompt.prompt_type == \"isolated\":\n return supstrs\n\n elif self.config.prompt.prompt_type == \"traversal\":\n if subsumption_type == \"named_class\":\n sups_list_strs = set()\n for _ in range(self.config.prompt.context_dup):\n context_sup, no_duplicate = self.traversal_subsumptions(\n cls=supcls,\n hop=self.config.prompt.prompt_hop,\n direction=\"supclass\",\n max_subsumptions=self.config.prompt.prompt_max_subsumptions,\n )\n sups_list = [self.named_subsumption_to_str(subsum) for subsum in context_sup]\n sups_list_str = \" <SEP> \".join(sups_list)\n sups_list_strs.add(sups_list_str)\n if no_duplicate:\n break\n\n else:\n sups_list_strs = set(supstrs)\n\n strs = list()\n for sups_list_str in sups_list_strs:\n for supstr in supstrs:\n s2 = supstr + \" <SEP> \" + sups_list_str\n strs.append(s2)\n return strs\n\n elif self.config.prompt.prompt_type == \"path\":\n sep_token = \"<SUB>\" if self.config.prompt.use_sub_special_token else \"<SEP>\"\n\n if subsumption_type == \"named_class\":\n s2_set = set()\n for _ in range(self.config.prompt.context_dup):\n context_sup, no_duplicate = self.path_subsumptions(\n cls=supcls, hop=self.config.prompt.prompt_hop, direction=\"supclass\"\n )\n if len(context_sup) > 0:\n s2 = \"\"\n for subsum in context_sup:\n supc = subsum[1]\n s2 += \" %s %s\" % (\n sep_token,\n self.iri_label[supc][0]\n if supc in self.iri_label and len(self.iri_label[supc]) > 0\n else \"\",\n )\n for supstr in supstrs:\n s2_set.add(supstr + s2)\n else:\n for supstr in supstrs:\n s2_set.add(\"%s %s\" % (supstr, sep_token))\n\n if no_duplicate:\n break\n else:\n s2_set = set(supstrs)\n\n return list(s2_set)\n\n else:\n print(\"unknown context type %s\" % self.config.prompt.prompt_type)\n sys.exit(0)\n
"},{"location":"deeponto/complete/bertsubs/#deeponto.complete.bertsubs.text_semantics.SubsumptionSampler.traversal_subsumptions","title":"traversal_subsumptions(cls, hop=1, direction='subclass', max_subsumptions=5)
","text":"Given a class, get its subsumptions by traversing the class hierarchy.
If the class is a sub-class in the subsumption axiom, subsumptions are collected downwards; if it is a super-class, subsumptions are collected upwards.
Parameters:
Name Type Description Defaultcls
str
IRI of a named class.
requiredhop
int
The depth of the path.
1
direction
str
subclass
(downside path) or supclass
(upside path).
'subclass'
max_subsumptions
int
The maximum number of subsumptions to consider.
5
Source code in src/deeponto/complete/bertsubs/text_semantics.py
def traversal_subsumptions(self, cls: str, hop: int = 1, direction: str = \"subclass\", max_subsumptions: int = 5):\nr\"\"\"Given a class, get its subsumptions by traversing the class hierarchy.\n\n If the class is a sub-class in the subsumption axiom, get subsumptions from downside.\n If the class is a super-class in the subsumption axiom, get subsumptions from upside.\n\n Args:\n cls (str): IRI of a named class.\n hop (int): The depth of the path.\n direction (str): `subclass` (downside path) or `supclass` (upside path).\n max_subsumptions (int): The maximum number of subsumptions to consider.\n \"\"\"\n subsumptions = list()\n seeds = [cls]\n d = 1\n no_duplicate = True\n while d <= hop:\n new_seeds = list()\n for s in seeds:\n if direction == \"subclass\":\n tmp = self.onto.reasoner.get_inferred_sub_entities(\n self.onto.get_owl_object(iri=s), direct=True\n )\n if len(tmp) > 1:\n no_duplicate = False\n random.shuffle(tmp)\n for c in tmp:\n if not self.onto.check_deprecated(owl_object=self.onto.get_owl_object(iri=c)):\n subsumptions.append([c, s])\n if c not in new_seeds:\n new_seeds.append(c)\n elif direction == \"supclass\":\n tmp = self.onto.reasoner.get_inferred_super_entities(\n self.onto.get_owl_object(iri=s), direct=True\n )\n if len(tmp) > 1:\n no_duplicate = False\n random.shuffle(tmp)\n for c in tmp:\n if not self.onto.check_deprecated(owl_object=self.onto.get_owl_object(iri=c)):\n subsumptions.append([s, c])\n if c not in new_seeds:\n new_seeds.append(c)\n else:\n warnings.warn(\"Unknown direction: %s\" % direction)\n if len(subsumptions) >= max_subsumptions:\n subsumptions = random.sample(subsumptions, max_subsumptions)\n break\n else:\n seeds = new_seeds\n random.shuffle(seeds)\n d += 1\n return subsumptions, no_duplicate\n
"},{"location":"deeponto/complete/bertsubs/#deeponto.complete.bertsubs.text_semantics.SubsumptionSampler.path_subsumptions","title":"path_subsumptions(cls, hop=1, direction='subclass')
","text":"Given a class, get its path subsumptions.
If the class is a sub-class in the subsumption axiom, subsumptions are collected downwards; if it is a super-class, subsumptions are collected upwards.
Parameters:
Name Type Description Defaultcls
str
IRI of a named class.
requiredhop
int
The depth of the path.
1
direction
str
subclass
(downside path) or supclass
(upside path).
'subclass'
Source code in src/deeponto/complete/bertsubs/text_semantics.py
def path_subsumptions(self, cls: str, hop: int = 1, direction: str = \"subclass\"):\nr\"\"\"Given a class, get its path subsumptions.\n\n If the class is a sub-class in the subsumption axiom, get subsumptions from downside.\n If the class is a super-class in the subsumption axiom, get subsumptions from upside.\n\n Args:\n cls (str): IRI of a named class.\n hop (int): The depth of the path.\n direction (str): `subclass` (downside path) or `supclass` (upside path).\n \"\"\"\n subsumptions = list()\n seed = cls\n d = 1\n no_duplicate = True\n while d <= hop:\n if direction == \"subclass\":\n tmp = self.onto.reasoner.get_inferred_sub_entities(\n self.onto.get_owl_object(iri=seed), direct=True\n )\n if len(tmp) > 1:\n no_duplicate = False\n end = True\n if len(tmp) > 0:\n random.shuffle(tmp)\n for c in tmp:\n if not self.onto.check_deprecated(owl_object=self.onto.get_owl_object(iri=c)):\n subsumptions.append([c, seed])\n seed = c\n end = False\n break\n if end:\n break\n elif direction == \"supclass\":\n tmp = self.onto.reasoner.get_inferred_super_entities(\n self.onto.get_owl_object(iri=seed), direct=True\n )\n if len(tmp) > 1:\n no_duplicate = False\n end = True\n if len(tmp) > 0:\n random.shuffle(tmp)\n for c in tmp:\n if not self.onto.check_deprecated(owl_object=self.onto.get_owl_object(iri=c)):\n subsumptions.append([seed, c])\n seed = c\n end = False\n break\n if end:\n break\n else:\n warnings.warn(\"Unknown direction: %s\" % direction)\n\n d += 1\n return subsumptions, no_duplicate\n
"},{"location":"deeponto/complete/bertsubs/#deeponto.complete.bertsubs.bert_classifier.BERTSubsumptionClassifierTrainer","title":"BERTSubsumptionClassifierTrainer(bert_checkpoint, train_data, val_data, max_length=128, early_stop=False, early_stop_patience=10)
","text":"Source code in src/deeponto/complete/bertsubs/bert_classifier.py
def __init__(\n self,\n bert_checkpoint: str,\n train_data: List,\n val_data: List,\n max_length: int = 128,\n early_stop: bool = False,\n early_stop_patience: int = 10,\n):\n print(f\"initialize BERT for Binary Classification from the Pretrained BERT model at: {bert_checkpoint} ...\")\n\n # BERT\n self.model = AutoModelForSequenceClassification.from_pretrained(bert_checkpoint)\n self.tokenizer = AutoTokenizer.from_pretrained(bert_checkpoint)\n self.trainer = None\n\n self.max_length = max_length\n self.tra = self.load_dataset(train_data, max_length=self.max_length, count_token_size=True)\n self.val = self.load_dataset(val_data, max_length=self.max_length, count_token_size=True)\n print(f\"text max length: {self.max_length}\")\n print(f\"data files loaded with sizes:\")\n print(f\"\\t[# Train]: {len(self.tra)}, [# Val]: {len(self.val)}\")\n\n # early stopping\n self.early_stop = early_stop\n self.early_stop_patience = early_stop_patience\n
"},{"location":"deeponto/complete/bertsubs/#deeponto.complete.bertsubs.bert_classifier.BERTSubsumptionClassifierTrainer.add_special_tokens","title":"add_special_tokens(tokens)
","text":"Add additional special tokens into the tokenizer's vocab.
Parameters:
Name Type Description Defaulttokens
List[str]
additional tokens to add, e.g., [\"<SUB>\",\"<EOA>\",\"<EOC>\"]
src/deeponto/complete/bertsubs/bert_classifier.py
def add_special_tokens(self, tokens: List):\nr\"\"\"Add additional special tokens into the tokenizer's vocab.\n Args:\n tokens (List[str]): additional tokens to add, e.g., `[\"<SUB>\",\"<EOA>\",\"<EOC>\"]`\n \"\"\"\n special_tokens_dict = {\"additional_special_tokens\": tokens}\n self.tokenizer.add_special_tokens(special_tokens_dict)\n self.model.resize_token_embeddings(len(self.tokenizer))\n
"},{"location":"deeponto/complete/bertsubs/#deeponto.complete.bertsubs.bert_classifier.BERTSubsumptionClassifierTrainer.train","title":"train(train_args, do_fine_tune=True)
","text":"Initiate the Huggingface trainer with input arguments and start training.
Parameters:
Name Type Description Defaulttrain_args
TrainingArguments
Arguments for training.
requireddo_fine_tune
bool
False
means loading the checkpoint without training. Defaults to True
.
True
Source code in src/deeponto/complete/bertsubs/bert_classifier.py
def train(self, train_args: TrainingArguments, do_fine_tune: bool = True):\nr\"\"\"Initiate the Huggingface trainer with input arguments and start training.\n Args:\n train_args (TrainingArguments): Arguments for training.\n do_fine_tune (bool): `False` means loading the checkpoint without training. Defaults to `True`.\n \"\"\"\n self.trainer = Trainer(\n model=self.model,\n args=train_args,\n train_dataset=self.tra,\n eval_dataset=self.val,\n compute_metrics=self.compute_metrics,\n tokenizer=self.tokenizer,\n )\n if self.early_stop:\n self.trainer.add_callback(EarlyStoppingCallback(early_stopping_patience=self.early_stop_patience))\n if do_fine_tune:\n self.trainer.train()\n
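A hedged end-to-end sketch; the checkpoint name, toy samples, and output directory are illustrative assumptions rather than part of the library:
from transformers import TrainingArguments\nfrom deeponto.complete.bertsubs.bert_classifier import BERTSubsumptionClassifierTrainer\n\n# toy (sentence_1, sentence_2, label) samples; in practice they come from the text templates above\ntrain_data = [(\"lymphocyte <SUB> cell\", \"leukocyte\", 1), (\"lymphocyte <SUB> cell\", \"organ\", 0)]\nval_data = [(\"neuron <SUB> cell\", \"cell\", 1)]\ntrainer = BERTSubsumptionClassifierTrainer(\"bert-base-uncased\", train_data, val_data, max_length=128)\ntrainer.add_special_tokens([\"<SUB>\"])  # register the template token so the tokenizer does not split it\nargs = TrainingArguments(output_dir=\"./bertsubs_out\", num_train_epochs=1, per_device_train_batch_size=2)\ntrainer.train(train_args=args, do_fine_tune=True)\n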
"},{"location":"deeponto/complete/bertsubs/#deeponto.complete.bertsubs.bert_classifier.BERTSubsumptionClassifierTrainer.compute_metrics","title":"compute_metrics(pred)
staticmethod
","text":"Auxiliary function to add accurate metric into evaluation.
Source code insrc/deeponto/complete/bertsubs/bert_classifier.py
@staticmethod\ndef compute_metrics(pred):\n\"\"\"Auxiliary function to add accurate metric into evaluation.\n \"\"\"\n labels = pred.label_ids\n preds = pred.predictions.argmax(-1)\n acc = accuracy_score(labels, preds)\n return {\"accuracy\": acc}\n
"},{"location":"deeponto/complete/bertsubs/#deeponto.complete.bertsubs.bert_classifier.BERTSubsumptionClassifierTrainer.load_dataset","title":"load_dataset(data, max_length=512, count_token_size=False)
","text":"Load a Huggingface dataset from a list of samples.
Parameters:
Name Type Description Defaultdata
List[Tuple]
Data samples in a list.
requiredmax_length
int
Maximum length of the input sequence.
512
count_token_size
bool
Whether or not to count the token sizes of the data. Defaults to False
.
False
Source code in src/deeponto/complete/bertsubs/bert_classifier.py
def load_dataset(self, data: List, max_length: int = 512, count_token_size: bool = False) -> Dataset:\nr\"\"\"Load a Huggingface dataset from a list of samples.\n Args:\n data (List[Tuple]): Data samples in a list.\n max_length (int): Maximum length of the input sequence.\n count_token_size (bool): Whether or not to count the token sizes of the data. Defaults to `False`.\n \"\"\"\n # data_df = pd.DataFrame(data, columns=[\"sent1\", \"sent2\", \"labels\"])\n # dataset = Dataset.from_pandas(data_df)\n\n def iterate():\n for sample in data:\n yield {\"sent1\": sample[0], \"sent2\": sample[1], \"labels\": sample[2]}\n\n dataset = Dataset.from_generator(iterate)\n\n if count_token_size:\n tokens = self.tokenizer(dataset[\"sent1\"], dataset[\"sent2\"])\n l_sum, num_128, num_256, num_512, l_max = 0, 0, 0, 0, 0\n for item in tokens[\"input_ids\"]:\n l = len(item)\n l_sum += l\n if l <= 128:\n num_128 += 1\n if l <= 256:\n num_256 += 1\n if l <= 512:\n num_512 += 1\n if l > l_max:\n l_max = l\n print(\"average token size: %.2f\" % (l_sum / len(tokens[\"input_ids\"])))\n print(\"ratio of token size <= 128: %.3f\" % (num_128 / len(tokens[\"input_ids\"])))\n print(\"ratio of token size <= 256: %.3f\" % (num_256 / len(tokens[\"input_ids\"])))\n print(\"ratio of token size <= 512: %.3f\" % (num_512 / len(tokens[\"input_ids\"])))\n print(\"max token size: %d\" % l_max)\n dataset = dataset.map(\n lambda examples: self.tokenizer(\n examples[\"sent1\"], examples[\"sent2\"], max_length=max_length, truncation=True\n ),\n batched=True,\n num_proc=1,\n )\n return dataset\n
"},{"location":"deeponto/onto/normalisation/","title":"Ontology Pruning","text":""},{"location":"deeponto/onto/normalisation/#deeponto.onto.normalisation.OntologyNormaliser","title":"OntologyNormaliser()
","text":"Class for ontology normalisation.
Credit
The code of this class originates from the mOWL library, which utilises the normalisation functionality from the Java library Jcel
.
The normalisation process transforms ontology axioms into normal forms in the Description Logic \(\mathcal{EL}\), including \(C \sqsubseteq D\), \(C \sqcap C' \sqsubseteq D\), \(C \sqsubseteq \exists r.D\), and \(\exists r.C \sqsubseteq D\),
where \\(C\\) and \\(C'\\) can be named concepts or \\(\\top\\), \\(D\\) is a named concept or \\(\\bot\\), \\(r\\) is a role (property).
Attributes:
Name Type Descriptiononto
Ontology
The input ontology to be normalised.
temp_super_class_index
Dict[OWLClassExpression, OWLClass]
A dictionary in the form of {complex_sub_class: temp_super_class}
, which means temp_super_class
is created during the normalisation of a complex subsumption axiom that has complex_sub_class
as the sub-class.
src/deeponto/onto/normalisation.py
def __init__(self):\n return\n
"},{"location":"deeponto/onto/normalisation/#deeponto.onto.normalisation.OntologyNormaliser.normalise","title":"normalise(ontology)
","text":"Performs the \\(\\mathcal{EL}\\) normalisation.
Parameters:
Name Type Description Defaultontology
Ontology
An ontology to be normalised.
requiredReturns:
Type Descriptionlist[OWLAxiom]
A list of normalised TBox axioms.
Source code insrc/deeponto/onto/normalisation.py
def normalise(self, ontology: Ontology):\nr\"\"\"Performs the $\\mathcal{EL}$ normalisation.\n\n Args:\n ontology (Ontology): An ontology to be normalised.\n\n Returns:\n (list[OWLAxiom]): A list of normalised TBox axioms.\n \"\"\"\n\n processed_owl_onto = self.preprocess_ontology(ontology)\n root_ont = processed_owl_onto\n translator = Translator(\n processed_owl_onto.getOWLOntologyManager().getOWLDataFactory(), IntegerOntologyObjectFactoryImpl()\n )\n axioms = HashSet()\n axioms.addAll(root_ont.getAxioms())\n translator.getTranslationRepository().addAxiomEntities(root_ont)\n\n for ont in root_ont.getImportsClosure():\n axioms.addAll(ont.getAxioms())\n translator.getTranslationRepository().addAxiomEntities(ont)\n\n intAxioms = translator.translateSA(axioms)\n\n normaliser = OntologyNormalizer()\n\n factory = IntegerOntologyObjectFactoryImpl()\n normalised_ontology = normaliser.normalize(intAxioms, factory)\n self.rTranslator = ReverseAxiomTranslator(translator, processed_owl_onto)\n\n normalised_axioms = []\n # revert the jcel axioms to the original OWLAxioms\n for ax in normalised_ontology:\n try:\n axiom = self.rTranslator.visit(ax)\n normalised_axioms.append(axiom)\n except Exception as e:\n logging.info(\"Reverse translation. Ignoring axiom: %s\", ax)\n logging.info(e)\n\n return list(set(axioms))\n
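A minimal usage sketch (the ontology file path is hypothetical):
from deeponto.onto import Ontology\nfrom deeponto.onto.normalisation import OntologyNormaliser\n\nonto = Ontology(\"example.owl\")  # hypothetical path\nnormaliser = OntologyNormaliser()\nnormalised_axioms = normaliser.normalise(onto)\n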
"},{"location":"deeponto/onto/normalisation/#deeponto.onto.normalisation.OntologyNormaliser.preprocess_ontology","title":"preprocess_ontology(ontology)
","text":"Preprocess the ontology to remove axioms that are not supported by the normalisation process.
Source code insrc/deeponto/onto/normalisation.py
def preprocess_ontology(self, ontology: Ontology):\n\"\"\"Preprocess the ontology to remove axioms that are not supported by the normalisation process.\"\"\"\n\n tbox_axioms = ontology.owl_onto.getTBoxAxioms(Imports.fromBoolean(True))\n new_tbox_axioms = HashSet()\n\n for axiom in tbox_axioms:\n axiom_as_str = axiom.toString()\n\n if \"UnionOf\" in axiom_as_str:\n continue\n elif \"MinCardinality\" in axiom_as_str:\n continue\n elif \"ComplementOf\" in axiom_as_str:\n continue\n elif \"AllValuesFrom\" in axiom_as_str:\n continue\n elif \"MaxCardinality\" in axiom_as_str:\n continue\n elif \"ExactCardinality\" in axiom_as_str:\n continue\n elif \"Annotation\" in axiom_as_str:\n continue\n elif \"ObjectHasSelf\" in axiom_as_str:\n continue\n elif \"urn:swrl\" in axiom_as_str:\n continue\n elif \"EquivalentObjectProperties\" in axiom_as_str:\n continue\n elif \"SymmetricObjectProperty\" in axiom_as_str:\n continue\n elif \"AsymmetricObjectProperty\" in axiom_as_str:\n continue\n elif \"ObjectOneOf\" in axiom_as_str:\n continue\n else:\n new_tbox_axioms.add(axiom)\n\n processed_owl_onto = ontology.owl_manager.createOntology(new_tbox_axioms)\n # NOTE: the returned object is `owlapi.OWLOntology` not `deeponto.onto.Ontology`\n return processed_owl_onto\n
"},{"location":"deeponto/onto/ontology/","title":"Ontology","text":"Python classes in this page are strongly dependent on the OWLAPI library. The base class Ontology
extends several features including convenient access to specially defined entities (e.g., owl:Thing
and owl:Nothing
), indexing of entities in the signature with their IRIs as keys, and some other customised functions for specific ontology engineering purposes. Ontology
also has an OntologyReasoner
attribute which provides reasoning facilities such as classifying entities, checking entailment, and so on. Users who are familiar with the OWLAPI should find it relatively easy to extend the Python classes here.
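For example, loading a local ontology file is a one-liner (the file path below is hypothetical):
from deeponto.onto import Ontology\n\nonto = Ontology(\"example.owl\", reasoner_type=\"hermit\")  # hypothetical path\nprint(onto.info)  # summary of the loaded classes, properties, and individuals\n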
Ontology(owl_path, reasoner_type='hermit')
","text":"Ontology class that extends from the Java library OWLAPI.
Typing from OWLAPI
Types with OWL
prefix are mostly imported from the OWLAPI library by, for example, from org.semanticweb.owlapi.model import OWLObject
.
Attributes:
Name Type Descriptionowl_path
str
The path to the OWL ontology file.
owl_manager
OWLOntologyManager
An ontology manager for creating OWLOntology
.
owl_onto
OWLOntology
An OWLOntology
created by owl_manager
from owl_path
.
owl_iri
str
The IRI of the owl_onto
.
owl_classes
dict[str, OWLClass]
A dictionary that stores the (iri, ontology_class)
pairs.
owl_object_properties
dict[str, OWLObjectProperty]
A dictionary that stores the (iri, ontology_object_property)
pairs.
owl_data_properties
dict[str, OWLDataProperty]
A dictionary that stores the (iri, ontology_data_property)
pairs.
owl_annotation_properties
dict[str, OWLAnnotationProperty]
A dictionary that stores the (iri, ontology_annotation_property)
pairs.
owl_individuals
dict[str, OWLIndividual]
A dictionary that stores the (iri, ontology_individual)
pairs.
owl_data_factory
OWLDataFactory
A data factory for manipulating axioms.
reasoner_type
str
The type of reasoner used. Defaults to \"hermit\"
. Options are [\"hermit\", \"elk\", \"struct\"]
.
reasoner
OntologyReasoner
A reasoner for ontology inference.
Parameters:
Name Type Description Defaultowl_path
str
The path to the OWL ontology file.
requiredreasoner_type
str
The type of reasoner used. Defaults to \"hermit\"
. Options are [\"hermit\", \"elk\", \"struct\"]
.
'hermit'
Source code in src/deeponto/onto/ontology.py
def __init__(self, owl_path: str, reasoner_type: str = \"hermit\"):\n\"\"\"Initialise a new ontology.\n\n Args:\n owl_path (str): The path to the OWL ontology file.\n reasoner_type (str): The type of reasoner used. Defaults to `\"hermit\"`. Options are `[\"hermit\", \"elk\", \"struct\"]`.\n \"\"\"\n self.owl_path = os.path.abspath(owl_path)\n self.owl_manager = OWLManager.createOWLOntologyManager()\n self.owl_onto = self.owl_manager.loadOntologyFromOntologyDocument(IRI.create(File(self.owl_path)))\n self.owl_iri = str(self.owl_onto.getOntologyID().getOntologyIRI().get())\n self.owl_classes = self._get_owl_objects(\"Classes\")\n self.owl_object_properties = self._get_owl_objects(\"ObjectProperties\")\n # for some reason the top object property is included\n if OWL_TOP_OBJECT_PROPERTY in self.owl_object_properties.keys():\n del self.owl_object_properties[OWL_TOP_OBJECT_PROPERTY]\n self.owl_data_properties = self._get_owl_objects(\"DataProperties\")\n self.owl_data_factory = self.owl_manager.getOWLDataFactory()\n self.owl_annotation_properties = self._get_owl_objects(\"AnnotationProperties\")\n self.owl_individuals = self._get_owl_objects(\"Individuals\")\n\n # reasoning\n self.reasoner_type = reasoner_type\n self.reasoner = OntologyReasoner(self, self.reasoner_type)\n\n # hidden attributes\n self._multi_children_classes = None\n self._sibling_class_groups = None\n self._axiom_type = AxiomType # for development use\n\n # summary\n self.info = {\n type(self).__name__: {\n \"loaded_from\": os.path.basename(self.owl_path),\n \"num_classes\": len(self.owl_classes),\n \"num_object_properties\": len(self.owl_object_properties),\n \"num_data_properties\": len(self.owl_data_properties),\n \"num_annotation_properties\": len(self.owl_annotation_properties),\n \"num_individuals\": len(self.owl_individuals),\n \"reasoner_type\": self.reasoner_type,\n }\n }\n
"},{"location":"deeponto/onto/ontology/#deeponto.onto.ontology.Ontology.name","title":"name
property
","text":"Return the name of the ontology file.
"},{"location":"deeponto/onto/ontology/#deeponto.onto.ontology.Ontology.OWLThing","title":"OWLThing
property
","text":"Return OWLThing
.
OWLNothing
property
","text":"Return OWLNoThing
.
OWLTopObjectProperty
property
","text":"Return OWLTopObjectProperty
.
OWLBottomObjectProperty
property
","text":"Return OWLBottomObjectProperty
.
OWLTopDataProperty
property
","text":"Return OWLTopDataProperty
.
OWLBottomDataProperty
property
","text":"Return OWLBottomDataProperty
.
sibling_class_groups: List[List[str]]
property
","text":"Return grouped sibling classes (with a common direct parent);
NOTE that only groups with size > 1 will be considered
"},{"location":"deeponto/onto/ontology/#deeponto.onto.ontology.Ontology.get_entity_type","title":"get_entity_type(entity, is_singular=False)
staticmethod
","text":"A handy method to get the type
of an OWLObject
entity.
src/deeponto/onto/ontology.py
@staticmethod\ndef get_entity_type(entity: OWLObject, is_singular: bool = False):\n\"\"\"A handy method to get the `type` of an `OWLObject` entity.\"\"\"\n if isinstance(entity, OWLClassExpression):\n return \"Classes\" if not is_singular else \"Class\"\n elif isinstance(entity, OWLObjectPropertyExpression):\n return \"ObjectProperties\" if not is_singular else \"ObjectProperty\"\n elif isinstance(entity, OWLDataPropertyExpression):\n return \"DataProperties\" if not is_singular else \"DataProperty\"\n elif isinstance(entity, OWLIndividual):\n return \"Individuals\" if not is_singular else \"Individual\"\n else:\n # NOTE: add further options in future\n pass\n
"},{"location":"deeponto/onto/ontology/#deeponto.onto.ontology.Ontology.get_max_jvm_memory","title":"get_max_jvm_memory()
staticmethod
","text":"Get the maximum heap size assigned to the JVM.
Source code insrc/deeponto/onto/ontology.py
@staticmethod\ndef get_max_jvm_memory():\n\"\"\"Get the maximum heap size assigned to the JVM.\"\"\"\n if jpype.isJVMStarted():\n return int(Runtime.getRuntime().maxMemory())\n else:\n raise RuntimeError(\"Cannot retrieve JVM memory as it is not started.\")\n
"},{"location":"deeponto/onto/ontology/#deeponto.onto.ontology.Ontology.get_owl_object","title":"get_owl_object(iri)
","text":"Get an OWLObject
given its IRI.
src/deeponto/onto/ontology.py
def get_owl_object(self, iri: str):\n\"\"\"Get an `OWLObject` given its IRI.\"\"\"\n if iri in self.owl_classes.keys():\n return self.owl_classes[iri]\n elif iri in self.owl_object_properties.keys():\n return self.owl_object_properties[iri]\n elif iri in self.owl_data_properties.keys():\n return self.owl_data_properties[iri]\n elif iri in self.owl_annotation_properties.keys():\n return self.owl_annotation_properties[iri]\n elif iri in self.owl_individuals.keys():\n return self.owl_individuals[iri]\n else:\n raise KeyError(f\"Cannot retrieve unknown IRI: {iri}.\")\n
"},{"location":"deeponto/onto/ontology/#deeponto.onto.ontology.Ontology.get_iri","title":"get_iri(owl_object)
","text":"Get the IRI of an OWLObject
. Raises an exception if there is no associated IRI.
src/deeponto/onto/ontology.py
def get_iri(self, owl_object: OWLObject):\n\"\"\"Get the IRI of an `OWLObject`. Raises an exception if there is no associated IRI.\"\"\"\n try:\n return str(owl_object.getIRI())\n except:\n raise RuntimeError(\"Input owl object does not have IRI.\")\n
"},{"location":"deeponto/onto/ontology/#deeponto.onto.ontology.Ontology.get_axiom_type","title":"get_axiom_type(axiom)
staticmethod
","text":"Get the axiom type (in str
) for the given axiom.
Check full list at: http://owlcs.github.io/owlapi/apidocs_5/org/semanticweb/owlapi/model/AxiomType.html.
Source code insrc/deeponto/onto/ontology.py
@staticmethod\ndef get_axiom_type(axiom: OWLAxiom):\nr\"\"\"Get the axiom type (in `str`) for the given axiom.\n\n Check full list at: <http://owlcs.github.io/owlapi/apidocs_5/org/semanticweb/owlapi/model/AxiomType.html>.\n \"\"\"\n return str(axiom.getAxiomType())\n
"},{"location":"deeponto/onto/ontology/#deeponto.onto.ontology.Ontology.get_all_axioms","title":"get_all_axioms()
","text":"Return all axioms (in a list) asserted in the ontology.
Source code insrc/deeponto/onto/ontology.py
def get_all_axioms(self):\n\"\"\"Return all axioms (in a list) asserted in the ontology.\"\"\"\n return list(self.owl_onto.getAxioms())\n
"},{"location":"deeponto/onto/ontology/#deeponto.onto.ontology.Ontology.get_subsumption_axioms","title":"get_subsumption_axioms(entity_type='Classes')
","text":"Return subsumption axioms (subject to input entity type) asserted in the ontology.
Parameters:
Name Type Description Defaultentity_type
str
The entity type to be considered. Defaults to \"Classes\"
. Options are \"Classes\"
, \"ObjectProperties\"
, \"DataProperties\"
, and \"AnnotationProperties\"
.
'Classes'
Returns:
Type DescriptionList[OWLAxiom]
A list of subsumption axioms subject to the input entity type.
Source code insrc/deeponto/onto/ontology.py
def get_subsumption_axioms(self, entity_type: str = \"Classes\"):\n\"\"\"Return subsumption axioms (subject to input entity type) asserted in the ontology.\n\n Args:\n entity_type (str, optional): The entity type to be considered. Defaults to `\"Classes\"`.\n Options are `\"Classes\"`, `\"ObjectProperties\"`, `\"DataProperties\"`, and `\"AnnotationProperties\"`.\n Returns:\n (List[OWLAxiom]): A list of equivalence axioms subject to input entity type.\n \"\"\"\n if entity_type == \"Classes\":\n return list(self.owl_onto.getAxioms(AxiomType.SUBCLASS_OF))\n elif entity_type == \"ObjectProperties\":\n return list(self.owl_onto.getAxioms(AxiomType.SUB_OBJECT_PROPERTY))\n elif entity_type == \"DataProperties\":\n return list(self.owl_onto.getAxioms(AxiomType.SUB_DATA_PROPERTY))\n elif entity_type == \"AnnotationProperties\":\n return list(self.owl_onto.getAxioms(AxiomType.SUB_ANNOTATION_PROPERTY_OF))\n else:\n raise ValueError(f\"Unknown entity type {entity_type}.\")\n
"},{"location":"deeponto/onto/ontology/#deeponto.onto.ontology.Ontology.get_equivalence_axioms","title":"get_equivalence_axioms(entity_type='Classes')
","text":"Return equivalence axioms (subject to input entity type) asserted in the ontology.
Parameters:
Name Type Description Defaultentity_type
str
The entity type to be considered. Defaults to \"Classes\"
. Options are \"Classes\"
, \"ObjectProperties\"
, and \"DataProperties\"
.
'Classes'
Returns:
Type Descriptionlist[OWLAxiom]
A list of equivalence axioms subject to input entity type.
Source code insrc/deeponto/onto/ontology.py
def get_equivalence_axioms(self, entity_type: str = \"Classes\"):\n\"\"\"Return equivalence axioms (subject to input entity type) asserted in the ontology.\n\n Args:\n entity_type (str, optional): The entity type to be considered. Defaults to `\"Classes\"`.\n Options are `\"Classes\"`, `\"ObjectProperties\"`, and `\"DataProperties\"`.\n Returns:\n (list[OWLAxiom]): A list of equivalence axioms subject to input entity type.\n \"\"\"\n if entity_type == \"Classes\":\n return list(self.owl_onto.getAxioms(AxiomType.EQUIVALENT_CLASSES))\n elif entity_type == \"ObjectProperties\":\n return list(self.owl_onto.getAxioms(AxiomType.EQUIVALENT_OBJECT_PROPERTIES))\n elif entity_type == \"DataProperties\":\n return list(self.owl_onto.getAxioms(AxiomType.EQUIVALENT_DATA_PROPERTIES))\n else:\n raise ValueError(f\"Unknown entity type {entity_type}.\")\n
"},{"location":"deeponto/onto/ontology/#deeponto.onto.ontology.Ontology.get_assertion_axioms","title":"get_assertion_axioms(entity_type='Classes')
","text":"Return assertion (ABox) axioms (subject to input entity type) asserted in the ontology.
Parameters:
Name Type Description Defaultentity_type
str
The entity type to be considered. Defaults to \"Classes\"
. Options are \"Classes\"
, \"ObjectProperties\"
, and \"DataProperties\"
.
'Classes'
Returns:
Type Descriptionlist[OWLAxiom]
A list of assertion axioms subject to input entity type.
Source code insrc/deeponto/onto/ontology.py
def get_assertion_axioms(self, entity_type: str = \"Classes\"):\n\"\"\"Return assertion (ABox) axioms (subject to input entity type) asserted in the ontology.\n\n Args:\n entity_type (str, optional): The entity type to be considered. Defaults to `\"Classes\"`.\n Options are `\"Classes\"`, `\"ObjectProperties\"`, and `\"DataProperties\"`.\n Returns:\n (list[OWLAxiom]): A list of assertion axioms subject to input entity type.\n \"\"\"\n if entity_type == \"Classes\":\n return list(self.owl_onto.getAxioms(AxiomType.CLASS_ASSERTION))\n elif entity_type == \"ObjectProperties\":\n return list(self.owl_onto.getAxioms(AxiomType.OBJECT_PROPERTY_ASSERTION))\n elif entity_type == \"DataProperties\":\n return list(self.owl_onto.getAxioms(AxiomType.DATA_PROPERTY_ASSERTION))\n elif entity_type == \"Annotations\":\n return list(self.owl_onto.getAxioms(AxiomType.ANNOTATION_ASSERTION))\n else:\n raise ValueError(f\"Unknown entity type {entity_type}.\")\n
"},{"location":"deeponto/onto/ontology/#deeponto.onto.ontology.Ontology.get_asserted_parents","title":"get_asserted_parents(owl_object, named_only=False)
","text":"Get all the asserted parents of a given owl object.
Parameters:
Name Type Description Defaultowl_object
OWLObject
An owl object that could have a parent.
requirednamed_only
bool
If True
, return parents that are named classes.
False
Returns:
Type Descriptionset[OWLObject]
The parent set of the given owl object.
Source code insrc/deeponto/onto/ontology.py
def get_asserted_parents(self, owl_object: OWLObject, named_only: bool = False):\nr\"\"\"Get all the asserted parents of a given owl object.\n\n Args:\n owl_object (OWLObject): An owl object that could have a parent.\n named_only (bool): If `True`, return parents that are named classes.\n Returns:\n (set[OWLObject]): The parent set of the given owl object.\n \"\"\"\n entity_type = self.get_entity_type(owl_object)\n if entity_type == \"Classes\":\n parents = set(EntitySearcher.getSuperClasses(owl_object, self.owl_onto))\n elif entity_type.endswith(\"Properties\"):\n parents = set(EntitySearcher.getSuperProperties(owl_object, self.owl_onto))\n else:\n raise ValueError(f\"Unsupported entity type {entity_type}.\")\n if named_only:\n parents = set([p for p in parents if self.check_named_entity(p)])\n return parents\n
"},{"location":"deeponto/onto/ontology/#deeponto.onto.ontology.Ontology.get_asserted_children","title":"get_asserted_children(owl_object, named_only=False)
","text":"Get all the asserted children of a given owl object.
Parameters:
Name Type Description Defaultowl_object
OWLObject
An owl object that could have a child.
requirednamed_only
bool
If True
, return children that are named classes.
False
Returns:
Type Descriptionset[OWLObject]
The children set of the given owl object.
Source code insrc/deeponto/onto/ontology.py
def get_asserted_children(self, owl_object: OWLObject, named_only: bool = False):\nr\"\"\"Get all the asserted children of a given owl object.\n\n Args:\n owl_object (OWLObject): An owl object that could have a child.\n named_only (bool): If `True`, return children that are named classes.\n Returns:\n (set[OWLObject]): The children set of the given owl object.\n \"\"\"\n entity_type = self.get_entity_type(owl_object)\n if entity_type == \"Classes\":\n children = set(EntitySearcher.getSubClasses(owl_object, self.owl_onto))\n elif entity_type.endswith(\"Properties\"):\n children = set(EntitySearcher.getSubProperties(owl_object, self.owl_onto))\n else:\n raise ValueError(f\"Unsupported entity type {entity_type}.\")\n if named_only:\n children = set([c for c in children if self.check_named_entity(c)])\n return children\n
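A small sketch combining the two methods above (the class IRI is hypothetical):
cls = onto.get_owl_object(\"http://example.org/onto#Leukocyte\")  # hypothetical IRI\nparents = onto.get_asserted_parents(cls, named_only=True)\nchildren = onto.get_asserted_children(cls, named_only=True)\n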
"},{"location":"deeponto/onto/ontology/#deeponto.onto.ontology.Ontology.get_asserted_complex_classes","title":"get_asserted_complex_classes(gci_only=False)
","text":"Get complex classes that occur in at least one of the ontology axioms.
Parameters:
Name Type Description Defaultgci_only
bool
If True
, consider complex classes that occur in GCIs only; otherwise consider those that occur in equivalence axioms as well.
False
Returns:
Type Descriptionset[OWLClassExpression]
A set of complex classes.
Source code insrc/deeponto/onto/ontology.py
def get_asserted_complex_classes(self, gci_only: bool = False):\n\"\"\"Get complex classes that occur in at least one of the ontology axioms.\n\n Args:\n gci_only (bool): If `True`, consider complex classes that occur in GCIs only; otherwise consider\n those that occur in equivalence axioms as well.\n Returns:\n (set[OWLClassExpression]): A set of complex classes.\n \"\"\"\n complex_classes = []\n\n for gci in self.get_subsumption_axioms(\"Classes\"):\n super_class = gci.getSuperClass()\n sub_class = gci.getSubClass()\n if not OntologyReasoner.has_iri(super_class):\n complex_classes.append(super_class)\n if not OntologyReasoner.has_iri(sub_class):\n complex_classes.append(sub_class)\n\n # also considering equivalence axioms\n if not gci_only:\n for eq in self.get_equivalence_axioms(\"Classes\"):\n gci = list(eq.asOWLSubClassOfAxioms())[0]\n super_class = gci.getSuperClass()\n sub_class = gci.getSubClass()\n if not OntologyReasoner.has_iri(super_class):\n complex_classes.append(super_class)\n if not OntologyReasoner.has_iri(sub_class):\n complex_classes.append(sub_class)\n\n return set(complex_classes)\n
"},{"location":"deeponto/onto/ontology/#deeponto.onto.ontology.Ontology.get_annotations","title":"get_annotations(owl_object, annotation_property_iri=None, annotation_language_tag=None, apply_lowercasing=False, normalise_identifiers=False)
","text":"Get the annotation literals of the given OWLObject
.
Parameters:
Name Type Description Defaultowl_object
Union[OWLObject, str]
An OWLObject
or its IRI.
annotation_property_iri
str
Any particular annotation property IRI of interest. Defaults to None
.
None
annotation_language_tag
str
Any particular annotation language tag of interest; NOTE that not every annotation has a language tag, in which case it is assumed to be in English. Defaults to None
. Options are \"en\"
, \"ge\"
etc.
None
apply_lowercasing
bool
Whether or not to apply lowercasing to annotation literals. Defaults to False
.
False
normalise_identifiers
bool
Whether to normalise annotation text that is in the Java identifier format. Defaults to False
.
False
Returns:
Type Descriptionset[str]
A set of annotation literals of the given OWLObject
.
src/deeponto/onto/ontology.py
def get_annotations(\n self,\n owl_object: Union[OWLObject, str],\n annotation_property_iri: Optional[str] = None,\n annotation_language_tag: Optional[str] = None,\n apply_lowercasing: bool = False,\n normalise_identifiers: bool = False,\n):\n\"\"\"Get the annotation literals of the given `OWLObject`.\n\n Args:\n owl_object (Union[OWLObject, str]): An `OWLObject` or its IRI.\n annotation_property_iri (str, optional):\n Any particular annotation property IRI of interest. Defaults to `None`.\n annotation_language_tag (str, optional):\n Any particular annotation language tag of interest; NOTE that not every\n annotation has a language tag, in this case assume it is in English.\n Defaults to `None`. Options are `\"en\"`, `\"ge\"` etc.\n apply_lowercasing (bool): Whether or not to apply lowercasing to annotation literals.\n Defaults to `False`.\n normalise_identifiers (bool): Whether to normalise annotation text that is in the Java identifier format.\n Defaults to `False`.\n Returns:\n (set[str]): A set of annotation literals of the given `OWLObject`.\n \"\"\"\n if isinstance(owl_object, str):\n owl_object = self.get_owl_object(owl_object)\n\n annotation_property = None\n if annotation_property_iri:\n # return an empty list if `annotation_property_iri` does not exist in this OWLOntology`\n annotation_property = self.get_owl_object(annotation_property_iri)\n\n annotations = []\n for annotation in EntitySearcher.getAnnotations(owl_object, self.owl_onto, annotation_property):\n annotation = annotation.getValue()\n # boolean that indicates whether the annotation's language is of interest\n fit_language = False\n if not annotation_language_tag:\n # it is set to `True` if `annotation_langauge` is not specified\n fit_language = True\n else:\n # restrict the annotations to a language if specified\n try:\n # NOTE: not every annotation has a language attribute\n fit_language = annotation.getLang() == annotation_language_tag\n except:\n # in the case when this annotation has no language tag\n # we assume it is in English\n if annotation_language_tag == \"en\":\n fit_language = True\n\n if fit_language:\n # only get annotations that have a literal value\n if annotation.isLiteral():\n annotations.append(\n process_annotation_literal(\n str(annotation.getLiteral()), apply_lowercasing, normalise_identifiers\n )\n )\n\n return uniqify(annotations)\n
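A usage sketch for retrieving English rdfs:label literals (the class IRI is hypothetical):
labels = onto.get_annotations(\n \"http://example.org/onto#Leukocyte\",  # hypothetical IRI\n annotation_property_iri=\"http://www.w3.org/2000/01/rdf-schema#label\",\n annotation_language_tag=\"en\",\n)\n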
"},{"location":"deeponto/onto/ontology/#deeponto.onto.ontology.Ontology.check_consistency","title":"check_consistency()
","text":"Check if the ontology is consistent according to the pre-loaded reasoner.
Source code insrc/deeponto/onto/ontology.py
def check_consistency(self):\n\"\"\"Check if the ontology is consistent according to the pre-loaded reasoner.\n \"\"\"\n logging.info(f\"Checking consistency with `{self.reasoner_type}` reasoner.\")\n return self.reasoner.owl_reasoner.isConsistent()\n
"},{"location":"deeponto/onto/ontology/#deeponto.onto.ontology.Ontology.check_named_entity","title":"check_named_entity(owl_object)
","text":"Check if the input entity is a named atomic entity. That is, it is not a complex entity, \\(\\top\\), or \\(\\bot\\).
Source code insrc/deeponto/onto/ontology.py
def check_named_entity(self, owl_object: OWLObject):\nr\"\"\"Check if the input entity is a named atomic entity. That is,\n it is not a complex entity, $\\top$, or $\\bot$.\n \"\"\"\n entity_type = self.get_entity_type(owl_object)\n top = TOP_BOTTOMS[entity_type].TOP\n bottom = TOP_BOTTOMS[entity_type].BOTTOM\n if OntologyReasoner.has_iri(owl_object):\n iri = str(owl_object.getIRI())\n # check if the entity is TOP or BOTTOM\n return iri != top and iri != bottom\n return False\n
"},{"location":"deeponto/onto/ontology/#deeponto.onto.ontology.Ontology.check_deprecated","title":"check_deprecated(owl_object)
","text":"Check if the given OWL object is marked as deprecated according to \\(\\texttt{owl:deprecated}\\).
NOTE: the string literal indicating deprecation is either 'true'
or 'True'
. Also, if \\(\\texttt{owl:deprecated}\\) is not defined in this ontology, return False
by default.
src/deeponto/onto/ontology.py
def check_deprecated(self, owl_object: OWLObject):\nr\"\"\"Check if the given OWL object is marked as deprecated according to $\\texttt{owl:deprecated}$.\n\n NOTE: the string literal indicating deprecation is either `'true'` or `'True'`. Also, if $\\texttt{owl:deprecated}$\n is not defined in this ontology, return `False` by default.\n \"\"\"\n if not OWL_DEPRECATED in self.owl_annotation_properties.keys():\n # return False if owl:deprecated is not defined in this ontology\n return False\n\n deprecated = self.get_annotations(owl_object, annotation_property_iri=OWL_DEPRECATED)\n if deprecated and (list(deprecated)[0] == \"true\" or list(deprecated)[0] == \"True\"):\n return True\n else:\n return False\n
"},{"location":"deeponto/onto/ontology/#deeponto.onto.ontology.Ontology.save_onto","title":"save_onto(save_path)
","text":"Save the ontology file to the given path.
Source code insrc/deeponto/onto/ontology.py
def save_onto(self, save_path: str):\n\"\"\"Save the ontology file to the given path.\"\"\"\n self.owl_onto.saveOntology(IRI.create(File(save_path).toURI()))\n
"},{"location":"deeponto/onto/ontology/#deeponto.onto.ontology.Ontology.build_annotation_index","title":"build_annotation_index(annotation_property_iris=[RDFS_LABEL], entity_type='Classes', apply_lowercasing=False, normalise_identifiers=False)
","text":"Build an annotation index for a given type of entities.
Parameters:
Name Type Description Defaultannotation_property_iris
list[str]
A list of annotation property IRIs (it is possible that not every annotation property IRI is in use); if not provided, the built-in rdfs:label
is considered. Defaults to [RDFS_LABEL]
.
[RDFS_LABEL]
entity_type
str
The entity type to be considered. Defaults to \"Classes\"
. Options are \"Classes\"
, \"ObjectProperties\"
, \"DataProperties\"
, etc.
'Classes'
apply_lowercasing
bool
Whether or not to apply lowercasing to annotation literals. Defaults to False
.
False
normalise_identifiers
bool
Whether to normalise annotation text that is in the Java identifier format. Defaults to False
.
False
Returns:
Type DescriptionTuple[dict, list[str]]
The built annotation index, and the list of annotation property IRIs that are in use.
Source code insrc/deeponto/onto/ontology.py
def build_annotation_index(\n self,\n annotation_property_iris: List[str] = [RDFS_LABEL],\n entity_type: str = \"Classes\",\n apply_lowercasing: bool = False,\n normalise_identifiers: bool = False,\n):\n\"\"\"Build an annotation index for a given type of entities.\n\n Args:\n annotation_property_iris (list[str]): A list of annotation property IRIs (it is possible\n that not every annotation property IRI is in use); if not provided, the built-in\n `rdfs:label` is considered. Defaults to `[RDFS_LABEL]`.\n entity_type (str, optional): The entity type to be considered. Defaults to `\"Classes\"`.\n Options are `\"Classes\"`, `\"ObjectProperties\"`, `\"DataProperties\"`, etc.\n apply_lowercasing (bool): Whether or not to apply lowercasing to annotation literals.\n Defaults to `True`.\n normalise_identifiers (bool): Whether to normalise annotation text that is in the Java identifier format.\n Defaults to `False`.\n\n Returns:\n (Tuple[dict, list[str]]): The built annotation index, and the list of annotation property IRIs that are in use.\n \"\"\"\n\n annotation_index = defaultdict(set)\n # example: Classes => owl_classes; ObjectProperties => owl_object_properties\n entity_type = \"owl_\" + split_java_identifier(entity_type).replace(\" \", \"_\").lower()\n entity_index = getattr(self, entity_type)\n\n # preserve available annotation properties\n annotation_property_iris = [\n airi for airi in annotation_property_iris if airi in self.owl_annotation_properties.keys()\n ]\n\n # build the annotation index without duplicated literals\n for airi in annotation_property_iris:\n for iri, entity in entity_index.items():\n annotation_index[iri].update(\n self.get_annotations(\n owl_object=entity,\n annotation_property_iri=airi,\n annotation_language_tag=None,\n apply_lowercasing=apply_lowercasing,\n normalise_identifiers=normalise_identifiers,\n )\n )\n\n return annotation_index, annotation_property_iris\n
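A usage sketch that builds a lowercased class label index keyed by IRI:
annotation_index, used_property_iris = onto.build_annotation_index(\n annotation_property_iris=[\"http://www.w3.org/2000/01/rdf-schema#label\"],\n entity_type=\"Classes\",\n apply_lowercasing=True,\n)\n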
"},{"location":"deeponto/onto/ontology/#deeponto.onto.ontology.Ontology.build_inverted_annotation_index","title":"build_inverted_annotation_index(annotation_index, tokenizer)
staticmethod
","text":"Build an inverted annotation index given an annotation index and a tokenizer.
Source code insrc/deeponto/onto/ontology.py
@staticmethod\ndef build_inverted_annotation_index(annotation_index: dict, tokenizer: Tokenizer):\n\"\"\"Build an inverted annotation index given an annotation index and a tokenizer.\"\"\"\n return InvertedIndex(annotation_index, tokenizer)\n
"},{"location":"deeponto/onto/ontology/#deeponto.onto.ontology.Ontology.add_axiom","title":"add_axiom(owl_axiom, return_undo=True)
","text":"Add an axiom into the current ontology.
Parameters:
Name Type Description Defaultowl_axiom
OWLAxiom
An axiom to be added.
requiredreturn_undo
bool
Returning the undo operation or not. Defaults to True
.
True
Source code in src/deeponto/onto/ontology.py
def add_axiom(self, owl_axiom: OWLAxiom, return_undo: bool = True):\n\"\"\"Add an axiom into the current ontology.\n\n Args:\n owl_axiom (OWLAxiom): An axiom to be added.\n return_undo (bool, optional): Returning the undo operation or not. Defaults to `True`.\n \"\"\"\n change = AddAxiom(self.owl_onto, owl_axiom)\n result = self.owl_onto.applyChange(change)\n logger.info(f\"[{str(result)}] Adding the axiom {str(owl_axiom)} into the ontology.\")\n if return_undo:\n return change.reverseChange()\n
"},{"location":"deeponto/onto/ontology/#deeponto.onto.ontology.Ontology.remove_axiom","title":"remove_axiom(owl_axiom, return_undo=True)
","text":"Remove an axiom from the current ontology.
Parameters:
Name Type Description Defaultowl_axiom
OWLAxiom
An axiom to be removed.
requiredreturn_undo
bool
Returning the undo operation or not. Defaults to True
.
True
Source code in src/deeponto/onto/ontology.py
def remove_axiom(self, owl_axiom: OWLAxiom, return_undo: bool = True):\n\"\"\"Remove an axiom from the current ontology.\n\n Args:\n owl_axiom (OWLAxiom): An axiom to be removed.\n return_undo (bool, optional): Returning the undo operation or not. Defaults to `True`.\n \"\"\"\n change = RemoveAxiom(self.owl_onto, owl_axiom)\n result = self.owl_onto.applyChange(change)\n logger.info(f\"[{str(result)}] Removing the axiom {str(owl_axiom)} from the ontology.\")\n if return_undo:\n return change.reverseChange()\n
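A sketch of asserting and then retracting a subsumption axiom (the class IRIs are hypothetical):
sub = onto.get_owl_object(\"http://example.org/onto#Leukocyte\")  # hypothetical IRI\nsup = onto.get_owl_object(\"http://example.org/onto#Cell\")  # hypothetical IRI\naxiom = onto.owl_data_factory.getOWLSubClassOfAxiom(sub, sup)\nonto.add_axiom(axiom)\nonto.remove_axiom(axiom)\n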
"},{"location":"deeponto/onto/ontology/#deeponto.onto.ontology.Ontology.replace_entity","title":"replace_entity(owl_object, entity_iri, replacement_iri)
","text":"Replace an entity in a class expression with another entity.
Parameters:
Name Type Description Defaultowl_object
OWLObject
An OWLObject
entity to be manipulated.
entity_iri
str
IRI of the entity to be replaced.
requiredreplacement_iri
str
IRI of the entity to replace.
requiredReturns:
Type DescriptionOWLObject
The changed OWLObject
entity.
src/deeponto/onto/ontology.py
def replace_entity(self, owl_object: OWLObject, entity_iri: str, replacement_iri: str):\n\"\"\"Replace an entity in a class expression with another entity.\n\n Args:\n owl_object (OWLObject): An `OWLObject` entity to be manipulated.\n entity_iri (str): IRI of the entity to be replaced.\n replacement_iri (str): IRI of the entity to replace.\n\n Returns:\n (OWLObject): The changed `OWLObject` entity.\n \"\"\"\n iri_dict = {IRI.create(entity_iri): IRI.create(replacement_iri)}\n replacer = OWLObjectDuplicator(self.owl_data_factory, iri_dict)\n return replacer.duplicateObject(owl_object)\n
"},{"location":"deeponto/onto/projection/","title":"Ontology Projection","text":""},{"location":"deeponto/onto/projection/#deeponto.onto.projection.OntologyProjector","title":"OntologyProjector(bidirectional_taxonomy=False, only_taxonomy=False, include_literals=False)
","text":"Class for ontology projection -- transforming ontology axioms into triples.
Credit
The code of this class originates from the mOWL library.
Attributes:
Name Type Descriptionbidirectional_taxonomy
bool
If True
then per each SubClass
edge one SuperClass
edge will be generated. Defaults to False
.
only_taxonomy
bool
If True
, then projection will only include subClass
edges. Defaults to False
.
include_literals
bool
If True
the projection will also include triples involving data property assertions and annotations. Defaults to False
.
Parameters:
Name Type Description Defaultbidirectional_taxonomy
bool
If True
then per each SubClass
edge one SuperClass
edge will be generated. Defaults to False
.
False
only_taxonomy
bool
If True
, then projection will only include subClass
edges. Defaults to False
.
False
include_literals
bool
If True,
the projection will also include triples involving data property assertions and annotations. Defaults to False
.
False
Source code in src/deeponto/onto/projection.py
def __init__(self, bidirectional_taxonomy: bool=False, only_taxonomy: bool=False, include_literals: bool=False):\n\"\"\"Initialise an ontology projector.\n\n Args:\n bidirectional_taxonomy (bool, optional): _description_. If `True` then per each `SubClass` edge one `SuperClass` edge will\n be generated. Defaults to `False`.\n only_taxonomy (bool, optional): If `True`, then projection will only include `subClass` edges. Defaults to `False`.\n include_literals (bool, optional): _description_. If `True` the projection will also include triples involving data property\n assertions and annotations. Defaults to `False`.\n \"\"\"\n self.bidirectional_taxonomy = bidirectional_taxonomy\n self.include_literals = include_literals\n self.only_taxonomy = only_taxonomy\n self.projector = Projector(self.bidirectional_taxonomy, self.only_taxonomy,\n self.include_literals)\n
"},{"location":"deeponto/onto/projection/#deeponto.onto.projection.OntologyProjector.project","title":"project(ontology)
","text":"The projection algorithm implemented in OWL2Vec*.
Parameters:
Name Type Description Defaultontology
Ontology
An ontology to be processed.
requiredReturns:
Type Descriptionset
Set of triples after projection.
Source code insrc/deeponto/onto/projection.py
def project(self, ontology: Ontology):\n\"\"\"The projection algorithm implemented in OWL2Vec*.\n\n Args:\n ontology (Ontology): An ontology to be processed.\n\n Returns:\n (set): Set of triples after projection.\n \"\"\"\n ontology = ontology.owl_onto\n if not isinstance(ontology, OWLOntology):\n raise TypeError(\n \"Input ontology must be of type `org.semanticweb.owlapi.model.OWLOntology`.\")\n edges = self.projector.project(ontology)\n triples = []\n for e in edges:\n s, r, o = str(e.src()), str(e.rel()), str(e.dst())\n if o != \"\":\n if r == \"http://subclassof\":\n r = str(RDFS.subClassOf)\n triples.append((s, r, o))\n return set(triples)\n
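A minimal usage sketch (onto is a loaded Ontology instance):
from deeponto.onto.projection import OntologyProjector\n\nprojector = OntologyProjector(bidirectional_taxonomy=True, only_taxonomy=False, include_literals=True)\ntriples = projector.project(onto)  # a set of (subject, relation, object) string triples\n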
"},{"location":"deeponto/onto/pruning/","title":"Ontology Pruning","text":""},{"location":"deeponto/onto/pruning/#deeponto.onto.pruning.OntologyPruner","title":"OntologyPruner(onto)
","text":"Class for in-place ontology pruning.
Attributes:
Name Type Descriptiononto
Ontology
The input ontology to be pruned. Note that the pruning process is in-place.
Parameters:
Name Type Description Defaultonto
Ontology
The input ontology to be pruned. Note that the pruning process is in-place.
required Source code insrc/deeponto/onto/pruning.py
def __init__(self, onto: Ontology):\n\"\"\"Initialise an ontology pruner.\n\n Args:\n onto (Ontology): The input ontology to be pruned. Note that the pruning process is in-place.\n \"\"\"\n self.onto = onto\n self._pruning_applied = None\n
"},{"location":"deeponto/onto/pruning/#deeponto.onto.pruning.OntologyPruner.save_onto","title":"save_onto(save_path)
","text":"Save the pruned ontology file to the given path.
Source code insrc/deeponto/onto/pruning.py
def save_onto(self, save_path: str):\n\"\"\"Save the pruned ontology file to the given path.\"\"\"\n logging.info(f\"{self._pruning_applied} pruning algorithm has been applied.\")\n logging.info(f\"Save the pruned ontology file to {save_path}.\")\n return self.onto.save_onto(save_path)\n
"},{"location":"deeponto/onto/pruning/#deeponto.onto.pruning.OntologyPruner.prune","title":"prune(class_iris_to_be_removed)
","text":"Apply ontology pruning while preserving the relevant hierarchy.
paper
This refers to the ontology pruning algorithm introduced in the paper: Machine Learning-Friendly Biomedical Datasets for Equivalence and Subsumption Ontology Matching (ISWC 2022).
For each class \\(c\\) to be pruned, subsumption axioms will be created between \\(c\\)'s parents and children so as to preserve the relevant hierarchy.
Parameters:
Name Type Description Defaultclass_iris_to_be_removed
list[str]
Classes with IRIs in this list will be pruned and the relevant hierarchy will be repaired.
required Source code insrc/deeponto/onto/pruning.py
def prune(self, class_iris_to_be_removed: List[str]):\nr\"\"\"Apply ontology pruning while preserving the relevant hierarchy.\n\n !!! credit \"paper\"\n\n This refers to the ontology pruning algorithm introduced in the paper:\n [*Machine Learning-Friendly Biomedical Datasets for Equivalence and Subsumption Ontology Matching (ISWC 2022)*](https://link.springer.com/chapter/10.1007/978-3-031-19433-7_33).\n\n For each class $c$ to be pruned, subsumption axioms will be created between $c$'s parents and children so as to preserve the\n relevant hierarchy.\n\n Args:\n class_iris_to_be_removed (list[str]): Classes with IRIs in this list will be pruned and the relevant hierarchy will be repaired.\n \"\"\"\n\n # create the subsumption axioms first\n for cl_iri in class_iris_to_be_removed:\n cl = self.onto.get_owl_object(cl_iri)\n cl_parents = self.onto.get_asserted_parents(cl)\n cl_children = self.onto.get_asserted_children(cl)\n for parent, child in itertools.product(cl_parents, cl_children):\n sub_axiom = self.onto.owl_data_factory.getOWLSubClassOfAxiom(child, parent)\n self.onto.add_axiom(sub_axiom)\n\n # apply pruning\n class_remover = OWLEntityRemover(Collections.singleton(self.onto.owl_onto))\n for cl_iri in class_iris_to_be_removed:\n cl = self.onto.get_owl_object(cl_iri)\n cl.accept(class_remover)\n self.onto.owl_manager.applyChanges(class_remover.getChanges())\n
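A minimal usage sketch (the class IRI and save path are hypothetical):
from deeponto.onto.pruning import OntologyPruner\n\npruner = OntologyPruner(onto)\npruner.prune([\"http://example.org/onto#ObsoleteClass\"])  # hypothetical IRI of the class to remove\npruner.save_onto(\"example.pruned.owl\")  # hypothetical save path\n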
"},{"location":"deeponto/onto/reasoning/","title":"Ontology Reasoning","text":""},{"location":"deeponto/onto/reasoning/#deeponto.onto.ontology.OntologyReasoner","title":"OntologyReasoner(onto, reasoner_type)
","text":"Ontology reasoner class that extends from the Java library OWLAPI.
Attributes:
Name Type Descriptiononto
Ontology
The input deeponto
ontology.
owl_reasoner_factory
OWLReasonerFactory
A reasoner factory for creating a reasoner.
owl_reasoner
OWLReasoner
The created reasoner.
owl_data_factory
OWLDataFactory
A data factory (inherited from onto
) for manipulating axioms.
Parameters:
Name Type Description Defaultonto
Ontology
The input ontology to conduct reasoning on.
requiredreasoner_type
str
The type of reasoner used. Options are [\"hermit\", \"elk\", \"struct\"]
.
src/deeponto/onto/ontology.py
def __init__(self, onto: Ontology, reasoner_type: str):\n\"\"\"Initialise an ontology reasoner.\n\n Args:\n onto (Ontology): The input ontology to conduct reasoning on.\n reasoner_type (str): The type of reasoner used. Options are `[\"hermit\", \"elk\", \"struct\"]`.\n \"\"\"\n self.onto = onto\n self.owl_reasoner_factory = None\n self.owl_reasoner = None\n self.reasoner_type = reasoner_type\n self.load_reasoner(self.reasoner_type)\n self.owl_data_factory = self.onto.owl_data_factory\n
"},{"location":"deeponto/onto/reasoning/#deeponto.onto.ontology.OntologyReasoner.load_reasoner","title":"load_reasoner(reasoner_type)
","text":"Load a new reaonser and dispose the old one if existed.
Source code insrc/deeponto/onto/ontology.py
def load_reasoner(self, reasoner_type: str):\n\"\"\"Load a new reaonser and dispose the old one if existed.\"\"\"\n assert reasoner_type in REASONER_DICT.keys(), f\"Unknown or unsupported reasoner type: {reasoner_type}.\"\n\n if self.owl_reasoner:\n self.owl_reasoner.dispose()\n\n self.reasoner_type = reasoner_type\n self.owl_reasoner_factory = REASONER_DICT[self.reasoner_type]()\n # TODO: remove ELK message\n # somehow Level.ERROR does not prevent the INFO message from ELK\n # Logger.getLogger(\"org.semanticweb.elk\").setLevel(Level.OFF)\n\n self.owl_reasoner = self.owl_reasoner_factory.createReasoner(self.onto.owl_onto)\n
"},{"location":"deeponto/onto/reasoning/#deeponto.onto.ontology.OntologyReasoner.get_entity_type","title":"get_entity_type(entity, is_singular=False)
staticmethod
","text":"A handy method to get the type of an entity (OWLObject
).
NOTE: This method is inherited from the Ontology Class.
Source code insrc/deeponto/onto/ontology.py
@staticmethod\ndef get_entity_type(entity: OWLObject, is_singular: bool = False):\n\"\"\"A handy method to get the type of an entity (`OWLObject`).\n\n NOTE: This method is inherited from the Ontology Class.\n \"\"\"\n return Ontology.get_entity_type(entity, is_singular)\n
"},{"location":"deeponto/onto/reasoning/#deeponto.onto.ontology.OntologyReasoner.has_iri","title":"has_iri(entity)
staticmethod
","text":"Check if an entity has an IRI.
Source code insrc/deeponto/onto/ontology.py
@staticmethod\ndef has_iri(entity: OWLObject):\n\"\"\"Check if an entity has an IRI.\"\"\"\n try:\n entity.getIRI()\n return True\n except:\n return False\n
"},{"location":"deeponto/onto/reasoning/#deeponto.onto.ontology.OntologyReasoner.get_inferred_super_entities","title":"get_inferred_super_entities(entity, direct=False)
","text":"Return the IRIs of named super-entities of a given OWLObject
according to the reasoner.
A mixture of getSuperClasses
, getSuperObjectProperties
, getSuperDataProperties
functions imported from the OWLAPI reasoner. The type of input entity will be automatically determined. The top entity such as owl:Thing
is ignored.
Parameters:
Name Type Description Defaultentity
OWLObject
An OWLObject
entity of interest.
direct
bool
Return parents (direct=True
) or ancestors (direct=False
). Defaults to False
.
False
Returns:
Type Descriptionlist[str]
A list of IRIs of the super-entities of the given OWLObject
entity.
src/deeponto/onto/ontology.py
def get_inferred_super_entities(self, entity: OWLObject, direct: bool = False):\nr\"\"\"Return the IRIs of named super-entities of a given `OWLObject` according to the reasoner.\n\n A mixture of `getSuperClasses`, `getSuperObjectProperties`, `getSuperDataProperties`\n functions imported from the OWLAPI reasoner. The type of input entity will be\n automatically determined. The top entity such as `owl:Thing` is ignored.\n\n\n Args:\n entity (OWLObject): An `OWLObject` entity of interest.\n direct (bool, optional): Return parents (`direct=True`) or\n ancestors (`direct=False`). Defaults to `False`.\n\n Returns:\n (list[str]): A list of IRIs of the super-entities of the given `OWLObject` entity.\n \"\"\"\n entity_type = self.get_entity_type(entity)\n get_super = f\"getSuper{entity_type}\"\n TOP = TOP_BOTTOMS[entity_type].TOP # get the corresponding TOP entity\n super_entities = getattr(self.owl_reasoner, get_super)(entity, direct).getFlattened()\n super_entity_iris = [str(s.getIRI()) for s in super_entities]\n # the root node is owl#Thing\n if TOP in super_entity_iris:\n super_entity_iris.remove(TOP)\n return super_entity_iris\n
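For example (a sketch continuing from a loaded ontology onto; the class IRI is a placeholder):
c = onto.get_owl_object_from_iri("http://purl.obolibrary.org/obo/DOID_4058")  # placeholder IRI
parent_iris = onto.reasoner.get_inferred_super_entities(c, direct=True)    # named parents only
ancestor_iris = onto.reasoner.get_inferred_super_entities(c, direct=False) # all named ancestors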
"},{"location":"deeponto/onto/reasoning/#deeponto.onto.ontology.OntologyReasoner.get_inferred_sub_entities","title":"get_inferred_sub_entities(entity, direct=False)
","text":"Return the IRIs of named sub-entities of a given OWLObject
according to the reasoner.
A mixture of getSubClasses
, getSubObjectProperties
, getSubDataProperties
functions imported from the OWLAPI reasoner. The type of input entity will be automatically determined. The bottom entity such as owl:Nothing
is ignored.
Parameters:
Name Type Description Defaultentity
OWLObject
An OWLObject
entity of interest.
direct
bool
Return children (direct=True
) or descendants (direct=False
). Defaults to False
.
False
Returns:
Type Descriptionlist[str]
A list of IRIs of the sub-entities of the given OWLObject
entity.
src/deeponto/onto/ontology.py
def get_inferred_sub_entities(self, entity: OWLObject, direct: bool = False):\n\"\"\"Return the IRIs of named sub-entities of a given `OWLObject` according to the reasoner.\n\n A mixture of `getSubClasses`, `getSubObjectProperties`, `getSubDataProperties`\n functions imported from the OWLAPI reasoner. The type of input entity will be\n automatically determined. The bottom entity such as `owl:Nothing` is ignored.\n\n Args:\n entity (OWLObject): An `OWLObject` entity of interest.\n direct (bool, optional): Return children (`direct=True`) or\n descendants (`direct=False`). Defaults to `False`.\n\n Returns:\n (list[str]): A list of IRIs of the sub-entities of the given `OWLObject` entity.\n \"\"\"\n entity_type = self.get_entity_type(entity)\n get_sub = f\"getSub{entity_type}\"\n BOTTOM = TOP_BOTTOMS[entity_type].BOTTOM\n sub_entities = getattr(self.owl_reasoner, get_sub)(entity, direct).getFlattened()\n sub_entity_iris = [str(s.getIRI()) for s in sub_entities]\n # the bottom node is owl:Nothing\n if BOTTOM in sub_entity_iris:\n sub_entity_iris.remove(BOTTOM)\n return sub_entity_iris\n
"},{"location":"deeponto/onto/reasoning/#deeponto.onto.ontology.OntologyReasoner.check_subsumption","title":"check_subsumption(sub_entity, super_entity)
","text":"Check if the first entity is subsumed by the second entity according to the reasoner.
Source code insrc/deeponto/onto/ontology.py
def check_subsumption(self, sub_entity: OWLObject, super_entity: OWLObject):\n\"\"\"Check if the first entity is subsumed by the second entity according to the reasoner.\"\"\"\n entity_type = self.get_entity_type(sub_entity, is_singular=True)\n assert entity_type == self.get_entity_type(super_entity, is_singular=True)\n\n sub_axiom = getattr(self.owl_data_factory, f\"getOWLSub{entity_type}OfAxiom\")(sub_entity, super_entity)\n\n return self.owl_reasoner.isEntailed(sub_axiom)\n
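For example (a sketch; both entities must be of the same type, here two classes with placeholder IRIs):
sub_class = onto.get_owl_object_from_iri("http://purl.obolibrary.org/obo/DOID_0001816")  # placeholder
super_class = onto.get_owl_object_from_iri("http://purl.obolibrary.org/obo/DOID_4058")   # placeholder
is_entailed = onto.reasoner.check_subsumption(sub_class, super_class)  # True iff the SubClassOf axiom is entailed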
"},{"location":"deeponto/onto/reasoning/#deeponto.onto.ontology.OntologyReasoner.check_disjoint","title":"check_disjoint(entity1, entity2)
","text":"Check if two entities are disjoint according to the reasoner.
Source code insrc/deeponto/onto/ontology.py
def check_disjoint(self, entity1: OWLObject, entity2: OWLObject):\n\"\"\"Check if two entities are disjoint according to the reasoner.\"\"\"\n entity_type = self.get_entity_type(entity1)\n assert entity_type == self.get_entity_type(entity2)\n\n disjoint_axiom = getattr(self.owl_data_factory, f\"getOWLDisjoint{entity_type}Axiom\")([entity1, entity2])\n\n return self.owl_reasoner.isEntailed(disjoint_axiom)\n
"},{"location":"deeponto/onto/reasoning/#deeponto.onto.ontology.OntologyReasoner.check_common_descendants","title":"check_common_descendants(entity1, entity2)
","text":"Check if two entities have a common decendant.
Entities can be OWL class or property expressions, and can be either atomic or complex. It takes longer computation time for the complex ones. Complex entities do not have an IRI. This method is optimised in the way that if there exists an atomic entity A
, we compute descendants for A
and compare them against the other entity which could be complex.
src/deeponto/onto/ontology.py
def check_common_descendants(self, entity1: OWLObject, entity2: OWLObject):\n\"\"\"Check if two entities have a common decendant.\n\n Entities can be **OWL class or property expressions**, and can be either **atomic\n or complex**. It takes longer computation time for the complex ones. Complex\n entities do not have an IRI. This method is optimised in the way that if\n there exists an atomic entity `A`, we compute descendants for `A` and\n compare them against the other entity which could be complex.\n \"\"\"\n entity_type = self.get_entity_type(entity1)\n assert entity_type == self.get_entity_type(entity2)\n\n if not self.has_iri(entity1) and not self.has_iri(entity2):\n logger.warn(\"Computing descendants for two complex entities is not efficient.\")\n\n # `computed` is the one we compute the descendants\n # `compared` is the one we compare `computed`'s descendant one-by-one\n # we set the atomic entity as `computed` for efficiency if there is one\n computed, compared = entity1, entity2\n if not self.has_iri(entity1) and self.has_iri(entity2):\n computed, compared = entity2, entity1\n\n # for every inferred child of `computed`, check if it is subsumed by `compared``\n for descendant_iri in self.get_inferred_sub_entities(computed, direct=False):\n # print(\"check a subsumption\")\n if self.check_subsumption(self.onto.get_owl_object(descendant_iri), compared):\n return True\n return False\n
"},{"location":"deeponto/onto/reasoning/#deeponto.onto.ontology.OntologyReasoner.instances_of","title":"instances_of(owl_class, direct=False)
","text":"Return the list of named individuals that are instances of a given OWL class expression.
Parameters:
Name Type Description Defaultowl_class
OWLClassExpression
An ontology class of interest.
requireddirect
bool
Return direct instances (direct=True
) or also include the sub-classes' instances (direct=False
). Defaults to False
.
False
Returns:
Type Descriptionlist[OWLIndividual]
A list of named individuals that are instances of owl_class
.
src/deeponto/onto/ontology.py
def instances_of(self, owl_class: OWLClassExpression, direct: bool = False):\n\"\"\"Return the list of named individuals that are instances of a given OWL class expression.\n\n Args:\n owl_class (OWLClassExpression): An ontology class of interest.\n direct (bool, optional): Return direct instances (`direct=True`) or\n also include the sub-classes' instances (`direct=False`). Defaults to `False`.\n\n Returns:\n (list[OWLIndividual]): A list of named individuals that are instances of `owl_class`.\n \"\"\"\n return list(self.owl_reasoner.getInstances(owl_class, direct).getFlattened())\n
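For example (a sketch; meaningful only if the loaded ontology has an ABox, and the class IRI is a placeholder):
c = onto.get_owl_object_from_iri("http://purl.obolibrary.org/obo/DOID_4058")  # placeholder IRI
individuals = onto.reasoner.instances_of(c, direct=False)  # named individuals of the class and its sub-classes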
"},{"location":"deeponto/onto/reasoning/#deeponto.onto.ontology.OntologyReasoner.check_instance","title":"check_instance(owl_instance, owl_class)
","text":"Check if a named individual is an instance of an OWL class.
Source code insrc/deeponto/onto/ontology.py
def check_instance(self, owl_instance: OWLIndividual, owl_class: OWLClassExpression):\n\"\"\"Check if a named individual is an instance of an OWL class.\"\"\"\n assertion_axiom = self.owl_data_factory.getOWLClassAssertionAxiom(owl_class, owl_instance)\n return self.owl_reasoner.isEntailed(assertion_axiom)\n
"},{"location":"deeponto/onto/reasoning/#deeponto.onto.ontology.OntologyReasoner.check_common_instances","title":"check_common_instances(owl_class1, owl_class2)
","text":"Check if two OWL class expressions have a common instance.
Class expressions can be atomic or complex, and it takes longer computation time for the complex ones. Complex classes do not have an IRI. This method is optimised in the way that if there exists an atomic class A
, we compute instances for A
and compare them against the other class which could be complex.
Difference with check_common_descendants
The inputs of this function are restricted to OWL class expressions. This is because descendant
is related to hierarchy and both class and property expressions have a hierarchy, but instance
is restricted to classes.
src/deeponto/onto/ontology.py
def check_common_instances(self, owl_class1: OWLClassExpression, owl_class2: OWLClassExpression):\n\"\"\"Check if two OWL class expressions have a common instance.\n\n Class expressions can be **atomic or complex**, and it takes longer computation time\n for the complex ones. Complex classes do not have an IRI. This method is optimised\n in the way that if there exists an atomic class `A`, we compute instances for `A` and\n compare them against the other class which could be complex.\n\n !!! note \"Difference with [`check_common_descendants`][deeponto.onto.OntologyReasoner.check_common_descendants]\"\n The inputs of this function are restricted to **OWL class expressions**. This is because\n `descendant` is related to hierarchy and both class and property expressions have a hierarchy,\n but `instance` is restricted to classes.\n \"\"\"\n\n if not self.has_iri(owl_class1) and not self.has_iri(owl_class2):\n logger.warn(\"Computing instances for two complex classes is not efficient.\")\n\n # `computed` is the one we compute the instances\n # `compared` is the one we compare `computed`'s instances one-by-one\n # we set the atomic entity as `computed` for efficiency if there is one\n computed, compared = owl_class1, owl_class2\n if not self.has_iri(owl_class1) and self.has_iri(owl_class2):\n computed, compared = owl_class2, owl_class1\n\n # for every inferred instance of `computed`, check if it is an instance of `compared`\n for instance in self.instances_of(computed, direct=False):\n if self.check_instance(instance, compared):\n return True\n return False\n
"},{"location":"deeponto/onto/reasoning/#deeponto.onto.ontology.OntologyReasoner.check_assumed_disjoint","title":"check_assumed_disjoint(owl_class1, owl_class2)
","text":"Check if two OWL class expressions satisfy the Assumed Disjointness.
Paper
The definition of Assumed Disjointness comes from the paper: Language Model Analysis for Ontology Subsumption Inference.
Assumed Disjointness (Definition)
Two class expressions \\(C\\) and \\(D\\) are assumed to be disjoint if they meet the following two conditions: (1) after adding the disjointness axiom over them into the ontology, \\(C\\) and \\(D\\) are still satisfiable; (2) \\(C\\) and \\(D\\) do not have a common descendant (otherwise \\(C\\) and \\(D\\) can be satisfiable but their common descendants become the bottom \\(\\bot\\)).
Note that the special case where \\(C\\) and \\(D\\) are already disjoint is covered by the first check. The paper also proposed a practical alternative to decide Assumed Disjointness. See check_assumed_disjoint_alternative
.
Examples:
Suppose we have pre-loaded an ontology onto
from the disease ontology file doid.owl
.
>>> c1 = onto.get_owl_object_from_iri(\"http://purl.obolibrary.org/obo/DOID_4058\")\n>>> c2 = onto.get_owl_object_from_iri(\"http://purl.obolibrary.org/obo/DOID_0001816\")\n>>> onto.reasoner.check_assumed_disjoint(c1, c2)\n[SUCCESSFULLY] Adding the axiom DisjointClasses(<http://purl.obolibrary.org/obo/DOID_0001816> <http://purl.obolibrary.org/obo/DOID_4058>) into the ontology.\n[CHECK1 True] input classes are still satisfiable;\n[SUCCESSFULLY] Removing the axiom from the ontology.\n[CHECK2 False] input classes have NO common descendant.\n[PASSED False] assumed disjointness check done.\nFalse\n
Source code in src/deeponto/onto/ontology.py
def check_assumed_disjoint(self, owl_class1: OWLClassExpression, owl_class2: OWLClassExpression):\nr\"\"\"Check if two OWL class expressions satisfy the Assumed Disjointness.\n\n !!! credit \"Paper\"\n\n The definition of **Assumed Disjointness** comes from the paper:\n [Language Model Analysis for Ontology Subsumption Inference](https://aclanthology.org/2023.findings-acl.213).\n\n !!! note \"Assumed Disjointness (Definition)\"\n Two class expressions $C$ and $D$ are assumed to be disjoint if they meet the followings:\n\n 1. By adding the disjointness axiom of them into the ontology, $C$ and $D$ are **still satisfiable**.\n 2. $C$ and $D$ **do not have a common descendant** (otherwise $C$ and $D$ can be satisfiable but their\n common descendants become the bottom $\\bot$.)\n\n Note that the special case where $C$ and $D$ are already disjoint is covered by the first check.\n The paper also proposed a practical alternative to decide Assumed Disjointness.\n See [`check_assumed_disjoint_alternative`][deeponto.onto.OntologyReasoner.check_assumed_disjoint_alternative].\n\n Examples:\n Suppose pre-load an ontology `onto` from the disease ontology file `doid.owl`.\n\n ```python\n >>> c1 = onto.get_owl_object_from_iri(\"http://purl.obolibrary.org/obo/DOID_4058\")\n >>> c2 = onto.get_owl_object_from_iri(\"http://purl.obolibrary.org/obo/DOID_0001816\")\n >>> onto.reasoner.check_assumed_disjoint(c1, c2)\n [SUCCESSFULLY] Adding the axiom DisjointClasses(<http://purl.obolibrary.org/obo/DOID_0001816> <http://purl.obolibrary.org/obo/DOID_4058>) into the ontology.\n [CHECK1 True] input classes are still satisfiable;\n [SUCCESSFULLY] Removing the axiom from the ontology.\n [CHECK2 False] input classes have NO common descendant.\n [PASSED False] assumed disjointness check done.\n False\n ```\n \"\"\"\n # banner_message(\"Check Asssumed Disjointness\")\n\n entity_type = self.get_entity_type(owl_class1)\n assert entity_type == self.get_entity_type(owl_class2)\n\n # adding the disjointness axiom of `class1`` and `class2``\n disjoint_axiom = getattr(self.owl_data_factory, f\"getOWLDisjoint{entity_type}Axiom\")([owl_class1, owl_class2])\n undo_change = self.onto.add_axiom(disjoint_axiom, return_undo=True)\n self.load_reasoner(self.reasoner_type)\n\n # check if they are still satisfiable\n still_satisfiable = self.owl_reasoner.isSatisfiable(owl_class1)\n still_satisfiable = still_satisfiable and self.owl_reasoner.isSatisfiable(owl_class2)\n logger.info(f\"[CHECK1 {still_satisfiable}] input classes are still satisfiable;\")\n\n # remove the axiom and re-construct the reasoner\n undo_change_result = self.onto.owl_onto.applyChange(undo_change)\n logger.info(f\"[{str(undo_change_result)}] Removing the axiom from the ontology.\")\n self.load_reasoner(self.reasoner_type)\n\n # failing first check, there is no need to do the second.\n if not still_satisfiable:\n logger.info(\"Failed `satisfiability check`, skip the `common descendant` check.\")\n logger.info(f\"[PASSED {still_satisfiable}] assumed disjointness check done.\")\n return False\n\n # otherwise, the classes are still satisfiable and we should conduct the second check\n has_common_descendants = self.check_common_descendants(owl_class1, owl_class2)\n logger.info(f\"[CHECK2 {not has_common_descendants}] input classes have NO common descendant.\")\n logger.info(f\"[PASSED {not has_common_descendants}] assumed disjointness check done.\")\n return not has_common_descendants\n
"},{"location":"deeponto/onto/reasoning/#deeponto.onto.ontology.OntologyReasoner.check_assumed_disjoint_alternative","title":"check_assumed_disjoint_alternative(owl_class1, owl_class2, verbose=False)
","text":"Check if two OWL class expressions satisfy the Assumed Disjointness.
Paper
The definition of Assumed Disjointness comes from the paper: Language Model Analysis for Ontology Subsumption Inference.
The practical alternative version of check_assumed_disjoint
with the following conditions:
Assumed Disjointness (Practical Alternative)
Two class expressions \\(C\\) and \\(D\\) are assumed to be disjoint if they (1) do not have a subsumption relationship between them, (2) do not have a common descendant (in the TBox), and (3) do not have a common instance (in the ABox).
If all the conditions have been met, then we assume class1
and class2
as disjoint.
Examples:
Suppose we have pre-loaded an ontology onto
from the disease ontology file doid.owl
.
>>> c1 = onto.get_owl_object_from_iri(\"http://purl.obolibrary.org/obo/DOID_4058\")\n>>> c2 = onto.get_owl_object_from_iri(\"http://purl.obolibrary.org/obo/DOID_0001816\")\n>>> onto.reasoner.check_assumed_disjoint(c1, c2, verbose=True)\n[CHECK1 True] input classes have NO subsumption relationship;\n[CHECK2 False] input classes have NO common descendant;\nFailed the `common descendant check`, skip the `common instance` check.\n[PASSED False] assumed disjointness check done.\nFalse\n
In this alternative implementation, we do not need to add and remove axioms, which saves time. Source code in src/deeponto/onto/ontology.py
def check_assumed_disjoint_alternative(\n self, owl_class1: OWLClassExpression, owl_class2: OWLClassExpression, verbose: bool = False\n):\nr\"\"\"Check if two OWL class expressions satisfy the Assumed Disjointness.\n\n !!! credit \"Paper\"\n\n The definition of **Assumed Disjointness** comes from the paper:\n [Language Model Analysis for Ontology Subsumption Inference](https://aclanthology.org/2023.findings-acl.213).\n\n The practical alternative version of [`check_assumed_disjoint`][deeponto.onto.OntologyReasoner.check_assumed_disjoint]\n with following conditions:\n\n\n !!! note \"Assumed Disjointness (Practical Alternative)\"\n Two class expressions $C$ and $D$ are assumed to be disjoint if they\n\n 1. **do not** have a **subsumption relationship** between them,\n 2. **do not** have a **common descendant** (in TBox),\n 3. **do not** have a **common instance** (in ABox).\n\n If all the conditions have been met, then we assume `class1` and `class2` as disjoint.\n\n Examples:\n Suppose pre-load an ontology `onto` from the disease ontology file `doid.owl`.\n\n ```python\n >>> c1 = onto.get_owl_object_from_iri(\"http://purl.obolibrary.org/obo/DOID_4058\")\n >>> c2 = onto.get_owl_object_from_iri(\"http://purl.obolibrary.org/obo/DOID_0001816\")\n >>> onto.reasoner.check_assumed_disjoint(c1, c2, verbose=True)\n [CHECK1 True] input classes have NO subsumption relationship;\n [CHECK2 False] input classes have NO common descendant;\n Failed the `common descendant check`, skip the `common instance` check.\n [PASSED False] assumed disjointness check done.\n False\n ```\n In this alternative implementation, we do no need to add and remove axioms which will then\n be time-saving.\n \"\"\"\n # banner_message(\"Check Asssumed Disjointness (Alternative)\")\n\n # # Check for entailed disjointness (short-cut)\n # if self.check_disjoint(owl_class1, owl_class2):\n # print(f\"Input classes are already entailed as disjoint.\")\n # return True\n\n # Check for entailed subsumption,\n # common descendants and common instances\n\n has_subsumption = self.check_subsumption(owl_class1, owl_class2)\n has_subsumption = has_subsumption or self.check_subsumption(owl_class2, owl_class1)\n if verbose:\n logger.info(f\"[CHECK1 {not has_subsumption}] input classes have NO subsumption relationship;\")\n if has_subsumption:\n if verbose:\n logger.info(\"Failed the `subsumption check`, skip the `common descendant` check.\")\n logger.info(f\"[PASSED {not has_subsumption}] assumed disjointness check done.\")\n return False\n\n has_common_descendants = self.check_common_descendants(owl_class1, owl_class2)\n if verbose:\n logger.info(f\"[CHECK2 {not has_common_descendants}] input classes have NO common descendant;\")\n if has_common_descendants:\n if verbose:\n logger.info(\"Failed the `common descendant check`, skip the `common instance` check.\")\n logger.info(f\"[PASSED {not has_common_descendants}] assumed disjointness check done.\")\n return False\n\n # TODO: `check_common_instances` is still experimental because we have not tested it with ontologies of rich ABox.\n has_common_instances = self.check_common_instances(owl_class1, owl_class2)\n if verbose:\n logger.info(f\"[CHECK3 {not has_common_instances}] input classes have NO common instance;\")\n logger.info(f\"[PASSED {not has_common_instances}] assumed disjointness check done.\")\n return not has_common_instances\n
"},{"location":"deeponto/onto/taxonomy/","title":"Ontology Taxonomy","text":"Extracting the taxonomy from an ontology often comes in handy for graph-based machine learning techniques. Here we provide a basic Taxonomy
class built upon networkx.DiGraph
where nodes represent entities and edges represent subsumptions. We then provide the OntologyTaxonomy
class that extends the basic Taxonomy
. It utilises the simple structural reasoner to enrich the ontology subsumptions beyond asserted ones, and build the taxonomy over the expanded subsumptions. Each node represents a named class and has a label (rdfs:label
) attribute. The root node owl:Thing
is also specified for functions like counting the node depths, etc. Moreover, we provide the WordnetTaxonomy
class that wraps the WordNet knowledge graph for easier access.
Note
It is also possible to use OntologyProjector
to extract triples from the ontology as edges of the taxonomy. We will consider this feature in the future.
Taxonomy(edges, root_node=None)
","text":"Class for building the taxonomy over structured data.
Attributes:
Name Type Descriptionnodes
list
A list of entity ids.
edges
list
A list of (parent, child)
pairs.
graph
networkx.DiGraph
A directed graph that represents the taxonomy.
root_node
Optional[str]
Optional root node id. Defaults to None
.
src/deeponto/onto/taxonomy.py
def __init__(self, edges: list, root_node: Optional[str] = None):\n self.edges = edges\n self.graph = nx.DiGraph(self.edges)\n self.nodes = list(self.graph.nodes)\n self.root_node = root_node\n
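A toy taxonomy can be built directly from (parent, child) edges, for example (a self-contained sketch):
from deeponto.onto.taxonomy import Taxonomy

edges = [("animal", "mammal"), ("mammal", "dog"), ("mammal", "cat")]
taxo = Taxonomy(edges=edges, root_node="animal")
taxo.get_children("mammal")                        # {"dog", "cat"}
taxo.get_parents("dog", apply_transitivity=True)   # {"mammal", "animal"}
taxo.get_shortest_node_depth("dog")                # 2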
"},{"location":"deeponto/onto/taxonomy/#deeponto.onto.taxonomy.Taxonomy.get_node_attributes","title":"get_node_attributes(entity_id)
","text":"Get the attributes of the given entity.
Source code insrc/deeponto/onto/taxonomy.py
def get_node_attributes(self, entity_id: str):\n\"\"\"Get the attributes of the given entity.\"\"\"\n return self.graph.nodes[entity_id]\n
"},{"location":"deeponto/onto/taxonomy/#deeponto.onto.taxonomy.Taxonomy.get_children","title":"get_children(entity_id, apply_transitivity=False)
","text":"Get the set of children for a given entity.
Source code insrc/deeponto/onto/taxonomy.py
def get_children(self, entity_id: str, apply_transitivity: bool = False):\nr\"\"\"Get the set of children for a given entity.\"\"\"\n if not apply_transitivity:\n return set(self.graph.successors(entity_id))\n else:\n return set(itertools.chain.from_iterable(nx.dfs_successors(self.graph, entity_id).values()))\n
"},{"location":"deeponto/onto/taxonomy/#deeponto.onto.taxonomy.Taxonomy.get_parents","title":"get_parents(entity_id, apply_transitivity=False)
","text":"Get the set of parents for a given entity.
Source code insrc/deeponto/onto/taxonomy.py
def get_parents(self, entity_id: str, apply_transitivity: bool = False):\nr\"\"\"Get the set of parents for a given entity.\"\"\"\n if not apply_transitivity:\n return set(self.graph.predecessors(entity_id))\n else:\n # NOTE: the nx.dfs_predecessors does not give desirable results\n frontier = list(self.get_parents(entity_id))\n explored = set()\n descendants = frontier\n while frontier:\n for candidate in frontier:\n descendants += list(self.get_parents(candidate))\n explored.update(frontier)\n frontier = set(descendants) - explored\n return set(descendants)\n
"},{"location":"deeponto/onto/taxonomy/#deeponto.onto.taxonomy.Taxonomy.get_descendant_graph","title":"get_descendant_graph(entity_id)
","text":"Create a descendant graph (networkx.DiGraph
) for a given entity.
src/deeponto/onto/taxonomy.py
def get_descendant_graph(self, entity_id: str):\nr\"\"\"Create a descendant graph (`networkx.DiGraph`) for a given entity.\"\"\"\n descendants = self.get_children(entity_id, apply_transitivity=True)\n return self.graph.subgraph(list(descendants))\n
"},{"location":"deeponto/onto/taxonomy/#deeponto.onto.taxonomy.Taxonomy.get_shortest_node_depth","title":"get_shortest_node_depth(entity_id)
","text":"Get the shortest depth of the given entity in the taxonomy.
Source code insrc/deeponto/onto/taxonomy.py
def get_shortest_node_depth(self, entity_id: str):\n\"\"\"Get the shortest depth of the given entity in the taxonomy.\"\"\"\n if not self.root_node:\n raise RuntimeError(\"No root node specified.\")\n return nx.shortest_path_length(self.graph, self.root_node, entity_id)\n
"},{"location":"deeponto/onto/taxonomy/#deeponto.onto.taxonomy.Taxonomy.get_longest_node_depth","title":"get_longest_node_depth(entity_id)
","text":"Get the longest depth of the given entity in the taxonomy.
Source code insrc/deeponto/onto/taxonomy.py
def get_longest_node_depth(self, entity_id: str):\n\"\"\"Get the longest depth of the given entity in the taxonomy.\"\"\"\n if not self.root_node:\n raise RuntimeError(\"No root node specified.\")\n return max([len(p) for p in nx.all_simple_paths(self.graph, self.root_node, entity_id)])\n
"},{"location":"deeponto/onto/taxonomy/#deeponto.onto.taxonomy.Taxonomy.get_lowest_common_ancestor","title":"get_lowest_common_ancestor(entity_id1, entity_id2)
","text":"Get the lowest common ancestor of the given two entities.
Source code insrc/deeponto/onto/taxonomy.py
def get_lowest_common_ancestor(self, entity_id1: str, entity_id2: str):\n\"\"\"Get the lowest common ancestor of the given two entities.\"\"\"\n return nx.lowest_common_ancestor(self.graph, entity_id1, entity_id2)\n
"},{"location":"deeponto/onto/taxonomy/#deeponto.onto.taxonomy.OntologyTaxonomy","title":"OntologyTaxonomy(onto, reasoner_type='struct')
","text":" Bases: Taxonomy
Class for building the taxonomy (top-down subsumption graph) from an ontology.
The nodes of this graph are named classes only, but the hierarchy is enriched (beyond asserted axioms) by an ontology reasoner.
Attributes:
Name Type Descriptiononto
Ontology
The input ontology to build the taxonomy.
reasoner_type
str
The type of reasoner used. Defaults to \"struct\"
. Options are [\"hermit\", \"elk\", \"struct\"]
.
reasoner
OntologyReasoner
An ontology reasoner used for completing the hierarchy. If the reasoner_type
is the same as onto.reasoner_type
, then re-use onto.reasoner
; otherwise, create a new one.
root_node
str
The root node that represents owl:Thing
.
nodes
list
A list of named class IRIs.
edges
list
A list of (parent, child)
class pairs. That is, if \\(C \\sqsubseteq D\\), then \\((D, C)\\) will be added as an edge.
graph
networkx.DiGraph
A directed subsumption graph.
Source code insrc/deeponto/onto/taxonomy.py
def __init__(self, onto: Ontology, reasoner_type: str = \"struct\"):\n self.onto = onto\n # the reasoner is used for completing the hierarchy\n self.reasoner_type = reasoner_type\n # re-use onto.reasoner if the reasoner type is the same; otherwise create a new one\n self.reasoner = (\n self.onto.reasoner\n if reasoner_type == self.onto.reasoner_type\n else OntologyReasoner(self.onto, reasoner_type)\n )\n root_node = \"owl:Thing\"\n subsumption_pairs = []\n for cl_iri, cl in self.onto.owl_classes.items():\n # NOTE: this is different from using self.onto.get_asserted_parents which does not conduct simple reasoning\n named_parents = self.reasoner.get_inferred_super_entities(cl, direct=True)\n if not named_parents:\n # if no parents then add root node as the parent\n named_parents.append(root_node)\n for named_parent in named_parents:\n subsumption_pairs.append((named_parent, cl_iri))\n super().__init__(edges=subsumption_pairs, root_node=root_node)\n\n # set node annotations (rdfs:label)\n for class_iri in self.nodes:\n if class_iri == self.root_node:\n self.graph.nodes[class_iri][\"label\"] = \"Thing\"\n else:\n owl_class = self.onto.get_owl_object(class_iri)\n self.graph.nodes[class_iri][\"label\"] = self.onto.get_annotations(owl_class, RDFS_LABEL)\n
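For example (a sketch; the ontology file path and class IRI are placeholders):
from deeponto.onto import Ontology
from deeponto.onto.taxonomy import OntologyTaxonomy

onto = Ontology("doid.owl")  # placeholder path to a local OWL file
onto_taxo = OntologyTaxonomy(onto, reasoner_type="struct")
onto_taxo.get_parents("http://purl.obolibrary.org/obo/DOID_4058")             # enriched named parent IRIs
onto_taxo.get_shortest_node_depth("http://purl.obolibrary.org/obo/DOID_4058") # depth measured from owl:Thing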
"},{"location":"deeponto/onto/taxonomy/#deeponto.onto.taxonomy.OntologyTaxonomy.get_parents","title":"get_parents(class_iri, apply_transitivity=False)
","text":"Get the set of parents for a given class.
It is worth noting that this method with transitivity applied can be deemed as simple structural reasoning. For more advanced logical reasoning, use the DL reasoner self.onto.reasoner
instead.
src/deeponto/onto/taxonomy.py
def get_parents(self, class_iri: str, apply_transitivity: bool = False):\nr\"\"\"Get the set of parents for a given class.\n\n It is worth noting that this method with transitivity applied can be deemed as simple structural reasoning.\n For more advanced logical reasoning, use the DL reasoner `self.onto.reasoner` instead.\n \"\"\"\n return super().get_parents(class_iri, apply_transitivity)\n
"},{"location":"deeponto/onto/taxonomy/#deeponto.onto.taxonomy.OntologyTaxonomy.get_children","title":"get_children(class_iri, apply_transitivity=False)
","text":"Get the set of children for a given class.
It is worth noting that this method with transitivity applied can be deemed as simple structural reasoning. For more advanced logical reasoning, use the DL reasoner self.onto.reasoner
instead.
src/deeponto/onto/taxonomy.py
def get_children(self, class_iri: str, apply_transitivity: bool = False):\nr\"\"\"Get the set of children for a given class.\n\n It is worth noting that this method with transitivity applied can be deemed as simple structural reasoning.\n For more advanced logical reasoning, use the DL reasoner `self.onto.reasoner` instead.\n \"\"\"\n return super().get_children(class_iri, apply_transitivity)\n
"},{"location":"deeponto/onto/taxonomy/#deeponto.onto.taxonomy.OntologyTaxonomy.get_descendant_graph","title":"get_descendant_graph(class_iri)
","text":"Create a descendant graph (networkx.DiGraph
) for a given ontology class.
src/deeponto/onto/taxonomy.py
def get_descendant_graph(self, class_iri: str):\nr\"\"\"Create a descendant graph (`networkx.DiGraph`) for a given ontology class.\"\"\"\n return super().get_descendant_graph(class_iri)\n
"},{"location":"deeponto/onto/taxonomy/#deeponto.onto.taxonomy.OntologyTaxonomy.get_shortest_node_depth","title":"get_shortest_node_depth(class_iri)
","text":"Get the shortest depth of the given named class in the taxonomy.
Source code insrc/deeponto/onto/taxonomy.py
def get_shortest_node_depth(self, class_iri: str):\n\"\"\"Get the shortest depth of the given named class in the taxonomy.\"\"\"\n return nx.shortest_path_length(self.graph, self.root_node, class_iri)\n
"},{"location":"deeponto/onto/taxonomy/#deeponto.onto.taxonomy.OntologyTaxonomy.get_longest_node_depth","title":"get_longest_node_depth(class_iri)
","text":"Get the longest depth of the given named class in the taxonomy.
Source code insrc/deeponto/onto/taxonomy.py
def get_longest_node_depth(self, class_iri: str):\n\"\"\"Get the longest depth of the given named class in the taxonomy.\"\"\"\n return max([len(p) for p in nx.all_simple_paths(self.graph, self.root_node, class_iri)])\n
"},{"location":"deeponto/onto/taxonomy/#deeponto.onto.taxonomy.OntologyTaxonomy.get_lowest_common_ancestor","title":"get_lowest_common_ancestor(class_iri1, class_iri2)
","text":"Get the lowest common ancestor of the given two named classes.
Source code insrc/deeponto/onto/taxonomy.py
def get_lowest_common_ancestor(self, class_iri1: str, class_iri2: str):\n\"\"\"Get the lowest common ancestor of the given two named classes.\"\"\"\n return super().get_lowest_common_ancestor(class_iri1, class_iri2)\n
"},{"location":"deeponto/onto/taxonomy/#deeponto.onto.taxonomy.WordnetTaxonomy","title":"WordnetTaxonomy(pos='n', include_membership=False)
","text":" Bases: Taxonomy
Class for the building the taxonomy (hypernym graph) from wordnet.
Attributes:
Name Type Descriptionpos
str
The pos-tag of entities to be extracted from wordnet.
nodes
list
A list of entity ids extracted from wordnet.
edges
list
A list of hyponym-hypernym pairs.
graph
networkx.DiGraph
A directed hypernym graph.
Parameters:
Name Type Description Defaultpos
str
The pos-tag of entities to be extracted from wordnet.
'n'
include_membership
bool
Whether to include instance_hypernyms
or not (e.g., London is an instance of City). Defaults to False
.
False
Source code in src/deeponto/onto/taxonomy.py
def __init__(self, pos: str = \"n\", include_membership: bool = False):\nr\"\"\"Initialise the wordnet taxonomy.\n\n Args:\n pos (str): The pos-tag of entities to be extracted from wordnet.\n include_membership (bool): Whether to include `instance_hypernyms` or not (e.g., London is an instance of City). Defaults to `False`.\n \"\"\"\n\n self.pos = pos\n synsets = self.fetch_synsets(pos=pos)\n hypernym_pairs = self.fetch_hypernyms(synsets, include_membership)\n super().__init__(edges=hypernym_pairs)\n\n # set node annotations\n for synset in synsets:\n try:\n self.graph.nodes[synset.name()][\"name\"] = synset.name().split(\".\")[0].replace(\"_\", \" \")\n self.graph.nodes[synset.name()][\"definition\"] = synset.definition()\n except:\n continue\n
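For example (a sketch; requires the NLTK WordNet corpus to be available locally, and "dog.n.01" is just an illustrative synset id):
from deeponto.onto.taxonomy import WordnetTaxonomy

wn_taxo = WordnetTaxonomy(pos="n", include_membership=False)
wn_taxo.get_parents("dog.n.01")          # direct hypernym synset ids
wn_taxo.get_node_attributes("dog.n.01")  # e.g., {"name": "dog", "definition": "..."}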
"},{"location":"deeponto/onto/taxonomy/#deeponto.onto.taxonomy.WordnetTaxonomy.fetch_synsets","title":"fetch_synsets(pos='n')
staticmethod
","text":"Get synsets of certain pos-tag from wordnet.
Source code insrc/deeponto/onto/taxonomy.py
@staticmethod\ndef fetch_synsets(pos: str = \"n\"):\n\"\"\"Get synsets of certain pos-tag from wordnet.\"\"\"\n words = wn.words()\n synsets = set()\n for word in words:\n synsets.update(wn.synsets(word, pos=pos))\n logger.info(f'{len(synsets)} synsets (pos=\"{pos}\") fetched.')\n return synsets\n
"},{"location":"deeponto/onto/taxonomy/#deeponto.onto.taxonomy.WordnetTaxonomy.fetch_hypernyms","title":"fetch_hypernyms(synsets, include_membership=False)
staticmethod
","text":"Get hypernym-hyponym pairs from a given set of wordnet synsets.
Source code insrc/deeponto/onto/taxonomy.py
@staticmethod\ndef fetch_hypernyms(synsets: set, include_membership: bool = False):\n\"\"\"Get hypernym-hyponym pairs from a given set of wordnet synsets.\"\"\"\n hypernym_hyponym_pairs = []\n for synset in synsets:\n for h_synset in synset.hypernyms():\n hypernym_hyponym_pairs.append((h_synset.name(), synset.name()))\n if include_membership:\n for h_synset in synset.instance_hypernyms():\n hypernym_hyponym_pairs.append((h_synset.name(), synset.name()))\n logger.info(f\"{len(hypernym_hyponym_pairs)} hypernym-hyponym pairs fetched.\")\n return hypernym_hyponym_pairs\n
"},{"location":"deeponto/onto/taxonomy/#deeponto.onto.taxonomy.TaxonomyNegativeSampler","title":"TaxonomyNegativeSampler(taxonomy, entity_weights=None)
","text":"Class for the efficient negative sampling with buffer over the taxonomy.
Attributes:
Name Type Descriptiontaxonomy
str
The taxonomy for negative sampling.
entity_weights
Optional[dict]
A dictionary with the taxonomy entities as keys and their corresponding weights as values. Defaults to None
.
src/deeponto/onto/taxonomy.py
def __init__(self, taxonomy: Taxonomy, entity_weights: Optional[dict] = None):\n self.taxonomy = taxonomy\n self.entities = self.taxonomy.nodes\n # uniform distribution if weights not provided\n self.entity_weights = entity_weights\n\n self._entity_probs = None\n if self.entity_weights:\n self._entity_probs = np.array([self.entity_weights[e] for e in self.entities])\n self._entity_probs = self._entity_probs / self._entity_probs.sum()\n self._buffer = []\n self._default_buffer_size = 10000\n
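For example (a self-contained sketch; without entity weights the sampling distribution is uniform, and "dog.n.01" is an illustrative entity id):
from deeponto.onto.taxonomy import WordnetTaxonomy, TaxonomyNegativeSampler

wn_taxo = WordnetTaxonomy(pos="n")
sampler = TaxonomyNegativeSampler(wn_taxo)
negatives = sampler.sample("dog.n.01", n_samples=5)  # 5 entities that are not ancestors of "dog.n.01"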
"},{"location":"deeponto/onto/taxonomy/#deeponto.onto.taxonomy.TaxonomyNegativeSampler.fill","title":"fill(buffer_size=None)
","text":"Buffer a large collection of entities sampled with replacement for faster negative sampling.
Source code insrc/deeponto/onto/taxonomy.py
def fill(self, buffer_size: Optional[int] = None):\n\"\"\"Buffer a large collection of entities sampled with replacement for faster negative sampling.\"\"\"\n buffer_size = buffer_size if buffer_size else self._default_buffer_size\n if self._entity_probs is not None:\n self._buffer = np.random.choice(self.entities, size=buffer_size, p=self._entity_probs)\n else:\n self._buffer = np.random.choice(self.entities, size=buffer_size)\n
"},{"location":"deeponto/onto/taxonomy/#deeponto.onto.taxonomy.TaxonomyNegativeSampler.sample","title":"sample(entity_id, n_samples, buffer_size=None)
","text":"Sample N negative samples for a given entity with replacement.
Source code insrc/deeponto/onto/taxonomy.py
def sample(self, entity_id: str, n_samples: int, buffer_size: Optional[int] = None):\n\"\"\"Sample N negative samples for a given entity with replacement.\"\"\"\n negative_samples = []\n positive_samples = self.taxonomy.get_parents(entity_id, True)\n while len(negative_samples) < n_samples:\n if len(self._buffer) < n_samples:\n self.fill(buffer_size)\n negative_samples += list(filter(lambda x: x not in positive_samples, self._buffer[:n_samples]))\n self._buffer = self._buffer[n_samples:] # remove the samples from the buffer\n return negative_samples[:n_samples]\n
"},{"location":"deeponto/onto/verbalisation/","title":"Ontology Verbalisation","text":"Verbalising an ontology into natural language texts is a challenging task. \\(\\textsf{DeepOnto}\\) provides some basic building blocks for achieving this goal. The implemented OntologyVerbaliser
is essentially a recursive concept verbaliser that first splits a complex concept \\(C\\) into a sub-formula tree, then verbalises the leaf nodes (atomic concepts or object properties) by their names, and finally merges the verbalised child nodes according to the logical pattern at their parent node.
Please cite the following paper if you consider using our verbaliser.
Paper
The recursive concept verbaliser is proposed in the paper: Language Model Analysis for Ontology Subsumption Inference (Findings of ACL 2023).
@inproceedings{he-etal-2023-language,\n title = \"Language Model Analysis for Ontology Subsumption Inference\",\n author = \"He, Yuan and\n Chen, Jiaoyan and\n Jimenez-Ruiz, Ernesto and\n Dong, Hang and\n Horrocks, Ian\",\n booktitle = \"Findings of the Association for Computational Linguistics: ACL 2023\",\n month = jul,\n year = \"2023\",\n address = \"Toronto, Canada\",\n publisher = \"Association for Computational Linguistics\",\n url = \"https://aclanthology.org/2023.findings-acl.213\",\n doi = \"10.18653/v1/2023.findings-acl.213\",\n pages = \"3439--3453\"\n}\n
"},{"location":"deeponto/onto/verbalisation/#deeponto.onto.verbalisation.OntologyVerbaliser","title":"OntologyVerbaliser(onto, apply_lowercasing=False, keep_iri=False, apply_auto_correction=False, add_quantifier_word=False)
","text":"A recursive natural language verbaliser for the OWL logical expressions, e.g., OWLAxiom
and OWLClassExpression
.
The concept patterns supported by this verbaliser are shown below:
Pattern Verbalisation (\\(\\mathcal{V}\\)) \\(A\\) (atomic) the name (\\(\\texttt{rdfs:label}\\)) of \\(A\\) (auto-correction is optional) \\(r\\) (property) the name (\\(\\texttt{rdfs:label}\\)) of \\(r\\) (auto-correction is optional) \\(\\neg C\\) \"not \\(\\mathcal{V}(C)\\)\" \\(\\exists r.C\\) \"something that \\(\\mathcal{V}(r)\\) some \\(\\mathcal{V}(C)\\)\" (the quantifier word \"some\" is optional) \\(\\forall r.C\\) \"something that \\(\\mathcal{V}(r)\\) only \\(\\mathcal{V}(C)\\)\" (the quantifier word \"only\" is optional) \\(C_1 \\sqcap ... \\sqcap C_n\\) if \\(C_i = \\exists/\\forall r.D_i\\) and \\(C_j = \\exists/\\forall r.D_j\\), they will be re-written into \\(\\exists/\\forall r.(D_i \\sqcap D_j)\\) before verbalisation; suppose after re-writing the new expression is \\(C_1 \\sqcap ... \\sqcap C_{n'}\\)(a) if all \\(C_i\\)s (for \\(i = 1, ..., n'\\)) are restrictions, in the form of \\(\\exists/\\forall r_i.D_i\\): \"something that \\(\\mathcal{V}(r_1)\\) some/only \\(V(D_1)\\) and ... and \\(\\mathcal{V}(r_{n'})\\) some/only \\(V(D_{n'})\\)\" (b) if some \\(C_i\\)s (for \\(i = m+1, ..., n'\\)) are restrictions, in the form of \\(\\exists/\\forall r_i.D_i\\): \"\\(\\mathcal{V}(C_{1})\\) and ... and \\(\\mathcal{V}(C_{m})\\) that \\(\\mathcal{V}(r_{m+1})\\) some/only \\(V(D_{m+1})\\) and ... and \\(\\mathcal{V}(r_{n'})\\) some/only \\(V(D_{n'})\\)\" (c) if no \\(C_i\\) is a restriction: \"\\(\\mathcal{V}(C_{1})\\) and ... and \\(\\mathcal{V}(C_{n'})\\)\" \\(C_1 \\sqcup ... \\sqcup C_n\\) similar to verbalising \\(C_1 \\sqcap ... \\sqcap C_n\\) except that \"and\" is replaced by \"or\" and case (b) uses the same verbalisation as case (c) \\(r_1 \\cdot r_2\\) (property chain) \\(\\mathcal{V}(r_1)\\) something that \\(\\mathcal{V}(r_2)\\)
With this concept verbaliser, a range of OWL axioms are supported, including class subsumption, class equivalence, class assertion, object property subsumption, object property assertion, and object property domain axioms (see the verbalise_*_axiom methods below).
The verbaliser operates at the concept level, and an additional template is needed to integrate the verbalised components of an axiom.
Warning
This verbaliser utilises spacy for POS tagging used in the auto-correction of property names. Automatic download of the rule-based library en_core_web_sm
is attempted in the init function. However, if it somehow cannot be found, please download it manually using python -m spacy download en_core_web_sm
.
Attributes:
Name Type Descriptiononto
Ontology
An ontology whose entities and axioms are to be verbalised.
parser
OntologySyntaxParser
A syntax parser for the string representation of an OWLObject
.
vocab
dict[str, list[str]]
A dictionary with (entity_iri, entity_name)
pairs, by default the names are retrieved from \\(\\texttt{rdfs:label}\\).
apply_lowercasing
bool
Whether to apply lowercasing to the entity names. Defaults to False
.
keep_iri
bool
Whether to keep the IRIs of entities without verbalising them using self.vocab
. Defaults to False
.
apply_auto_correction
bool
Whether to automatically apply rule-based auto-correction to entity names. Defaults to False
.
add_quantifier_word
bool
Whether to add quantifier words (\"some\"/\"only\") as in the Manchester syntax. Defaults to False
.
Parameters:
Name Type Description Defaultonto
Ontology
An ontology whose entities and axioms are to be verbalised.
requiredapply_lowercasing
bool
Whether to apply lowercasing to the entity names. Defaults to False
.
False
keep_iri
bool
Whether to keep the IRIs of entities without verbalising them using self.vocab
. Defaults to False
.
False
apply_auto_correction
bool
Whether to automatically apply rule-based auto-correction to entity names. Defaults to False
.
False
add_quantifier_word
bool
Whether to add quantifier words (\"some\"/\"only\") as in the Manchester syntax. Defaults to False
.
False
Source code in src/deeponto/onto/verbalisation.py
def __init__(\n self,\n onto: Ontology,\n apply_lowercasing: bool = False,\n keep_iri: bool = False,\n apply_auto_correction: bool = False,\n add_quantifier_word: bool = False,\n):\n\"\"\"Initialise an ontology verbaliser.\n\n Args:\n onto (Ontology): An ontology whose entities and axioms are to be verbalised.\n apply_lowercasing (bool, optional): Whether to apply lowercasing to the entity names. Defaults to `False`.\n keep_iri (bool, optional): Whether to keep the IRIs of entities without verbalising them using `self.vocab`. Defaults to `False`.\n apply_auto_correction (bool, optional): Whether to automatically apply rule-based auto-correction to entity names. Defaults to `False`.\n add_quantifier_word (bool, optional): Whether to add quantifier words (\"some\"/\"only\") as in the Manchester syntax. Defaults to `False`.\n \"\"\"\n self.onto = onto\n self.parser = OntologySyntaxParser()\n\n # download en_core_web_sm for object property\n try:\n spacy.load(\"en_core_web_sm\")\n except:\n print(\"Download `en_core_web_sm` for pos tagger.\")\n os.system(\"python -m spacy download en_core_web_sm\")\n\n self.nlp = spacy.load(\"en_core_web_sm\")\n\n # build the default vocabulary for entities\n self.apply_lowercasing_to_vocab = apply_lowercasing\n self.vocab = dict()\n for entity_type in [\"Classes\", \"ObjectProperties\", \"DataProperties\", \"Individuals\"]:\n entity_annotations, _ = self.onto.build_annotation_index(\n entity_type=entity_type, apply_lowercasing=self.apply_lowercasing_to_vocab\n )\n self.vocab.update(**entity_annotations)\n literal_or_iri = lambda k, v: list(v)[0] if v else k # set vocab to IRI if no string available\n self.vocab = {k: literal_or_iri(k, v) for k, v in self.vocab.items()} # only set one name for each entity\n\n self.keep_iri = keep_iri\n self.apply_auto_correction = apply_auto_correction\n self.add_quantifier_word = add_quantifier_word\n
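For example (a sketch; the ontology file path is a placeholder and the keyword choices below are just one possible configuration):
from deeponto.onto import Ontology
from deeponto.onto.verbalisation import OntologyVerbaliser

onto = Ontology("doid.owl")  # placeholder path to a local OWL file
verbaliser = OntologyVerbaliser(onto, apply_lowercasing=True, add_quantifier_word=True)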
"},{"location":"deeponto/onto/verbalisation/#deeponto.onto.verbalisation.OntologyVerbaliser.update_entity_name","title":"update_entity_name(entity_iri, entity_name)
","text":"Update the name of an entity in self.vocab
.
If you want to change the name of a specific entity, you should call this function before applying verbalisation.
Source code insrc/deeponto/onto/verbalisation.py
def update_entity_name(self, entity_iri: str, entity_name: str):\n\"\"\"Update the name of an entity in `self.vocab`.\n\n If you want to change the name of a specific entity, you should call this\n function before applying verbalisation.\n \"\"\"\n self.vocab[entity_iri] = entity_name\n
"},{"location":"deeponto/onto/verbalisation/#deeponto.onto.verbalisation.OntologyVerbaliser.verbalise_class_expression","title":"verbalise_class_expression(class_expression)
","text":"Verbalise a class expression (OWLClassExpression
) or its parsed form (in RangeNode
).
See currently supported types of class (or concept) expressions here.
Parameters:
Name Type Description Defaultclass_expression
Union[OWLClassExpression, str, RangeNode]
A class expression to be verbalised.
requiredRaises:
Type DescriptionRuntimeError
Occurs when the class expression is not in one of the supported types.
Returns:
Type DescriptionCfgNode
A nested dictionary that presents the recursive results of verbalisation. The verbalised string can be accessed with the key [\"verbal\"]
or with the attribute .verbal
.
src/deeponto/onto/verbalisation.py
def verbalise_class_expression(self, class_expression: Union[OWLClassExpression, str, RangeNode]):\nr\"\"\"Verbalise a class expression (`OWLClassExpression`) or its parsed form (in `RangeNode`).\n\n See currently supported types of class (or concept) expressions [here][deeponto.onto.verbalisation.OntologyVerbaliser].\n\n\n Args:\n class_expression (Union[OWLClassExpression, str, RangeNode]): A class expression to be verbalised.\n\n Raises:\n RuntimeError: Occurs when the class expression is not in one of the supported types.\n\n Returns:\n (CfgNode): A nested dictionary that presents the recursive results of verbalisation. The verbalised string\n can be accessed with the key `[\"verbal\"]` or with the attribute `.verbal`.\n \"\"\"\n\n if not isinstance(class_expression, RangeNode):\n parsed_class_expression = self.parser.parse(class_expression).children[0] # skip the root node\n else:\n parsed_class_expression = class_expression\n\n # for a singleton IRI\n if parsed_class_expression.is_iri:\n return self._verbalise_iri(parsed_class_expression)\n\n if parsed_class_expression.name.startswith(\"NEG\"):\n # negation only has one child\n cl = self.verbalise_class_expression(parsed_class_expression.children[0])\n return CfgNode({\"verbal\": \"not \" + cl.verbal, \"class\": cl, \"type\": \"NEG\"})\n\n # for existential and universal restrictions\n if parsed_class_expression.name.startswith(\"EX.\") or parsed_class_expression.name.startswith(\"ALL\"):\n return self._verbalise_restriction(parsed_class_expression)\n\n # for conjunction and disjunction\n if parsed_class_expression.name.startswith(\"AND\") or parsed_class_expression.name.startswith(\"OR\"):\n return self._verbalise_junction(parsed_class_expression)\n\n # for a property chain\n if parsed_class_expression.name.startswith(\"OPC\"):\n return self._verbalise_property(parsed_class_expression)\n\n raise RuntimeError(f\"Input class expression `{str(class_expression)}` is not in one of the supported types.\")\n
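For example (a sketch continuing from the verbaliser above; the class IRI is a placeholder, and an atomic class is simply verbalised by its name):
c = onto.get_owl_object_from_iri("http://purl.obolibrary.org/obo/DOID_4058")  # placeholder IRI
result = verbaliser.verbalise_class_expression(c)
print(result.verbal)  # the natural-language rendering; the nested structure is kept in `result`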
"},{"location":"deeponto/onto/verbalisation/#deeponto.onto.verbalisation.OntologyVerbaliser.verbalise_class_subsumption_axiom","title":"verbalise_class_subsumption_axiom(class_subsumption_axiom)
","text":"Verbalise a class subsumption axiom.
The subsumption axiom can have two forms:
SubClassOf
axiom;SuperClassOf
axiom.Parameters:
Name Type Description Defaultclass_subsumption_axiom
OWLAxiom
The class subsumption axiom to be verbalised.
requiredReturns:
Type DescriptionTuple[CfgNode, CfgNode]
The verbalised sub-concept \\(\\mathcal{V}(C_{sub})\\) and super-concept \\(\\mathcal{V}(C_{super})\\) (order matters).
Source code insrc/deeponto/onto/verbalisation.py
def verbalise_class_subsumption_axiom(self, class_subsumption_axiom: OWLAxiom):\nr\"\"\"Verbalise a class subsumption axiom.\n\n The subsumption axiom can have two forms:\n\n - $C_{sub} \\sqsubseteq C_{super}$, the `SubClassOf` axiom;\n - $C_{super} \\sqsupseteq C_{sub}$, the `SuperClassOf` axiom.\n\n Args:\n class_subsumption_axiom (OWLAxiom): Then class subsumption axiom to be verbalised.\n\n Returns:\n (Tuple[CfgNode, CfgNode]): The verbalised sub-concept $\\mathcal{V}(C_{sub})$ and super-concept $\\mathcal{V}(C_{super})$ (order matters).\n \"\"\"\n\n # input check\n self._axiom_input_check(class_subsumption_axiom, \"SubClassOf\", \"SuperClassOf\")\n\n parsed_subsumption_axiom = self.parser.parse(class_subsumption_axiom).children[0] # skip the root node\n if str(class_subsumption_axiom).startswith(\"SubClassOf\"):\n parsed_sub_class, parsed_super_class = parsed_subsumption_axiom.children\n elif str(class_subsumption_axiom).startswith(\"SuperClassOf\"):\n parsed_super_class, parsed_sub_class = parsed_subsumption_axiom.children\n\n verbalised_sub_class = self.verbalise_class_expression(parsed_sub_class)\n verbalised_super_class = self.verbalise_class_expression(parsed_super_class)\n return verbalised_sub_class, verbalised_super_class\n
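For example (a sketch continuing from the verbaliser above; how the axiom is retrieved is left open here):
# `sub_axiom` is assumed to be a SubClassOf OWLAxiom already obtained from the ontology (retrieval omitted)
v_sub, v_super = verbaliser.verbalise_class_subsumption_axiom(sub_axiom)
print(v_sub.verbal, "is subsumed by", v_super.verbal)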
"},{"location":"deeponto/onto/verbalisation/#deeponto.onto.verbalisation.OntologyVerbaliser.verbalise_class_equivalence_axiom","title":"verbalise_class_equivalence_axiom(class_equivalence_axiom)
","text":"Verbalise a class equivalence axiom.
The equivalence axiom has the form \\(C \\equiv D\\).
Parameters:
Name Type Description Defaultclass_equivalence_axiom
OWLAxiom
The class equivalence axiom to be verbalised.
requiredReturns:
Type DescriptionTuple[CfgNode, CfgNode]
The verbalised concept \\(\\mathcal{V}(C)\\) and its equivalent concept \\(\\mathcal{V}(D)\\) (order matters).
Source code insrc/deeponto/onto/verbalisation.py
def verbalise_class_equivalence_axiom(self, class_equivalence_axiom: OWLAxiom):\nr\"\"\"Verbalise a class equivalence axiom.\n\n The equivalence axiom has the form $C \\equiv D$.\n\n Args:\n class_equivalence_axiom (OWLAxiom): The class equivalence axiom to be verbalised.\n\n Returns:\n (Tuple[CfgNode, CfgNode]): The verbalised concept $\\mathcal{V}(C)$ and its equivalent concept $\\mathcal{V}(D)$ (order matters).\n \"\"\"\n\n # input check\n self._axiom_input_check(class_equivalence_axiom, \"EquivalentClasses\")\n\n parsed_equivalence_axiom = self.parser.parse(class_equivalence_axiom).children[0] # skip the root node\n parsed_class_left, parsed_class_right = parsed_equivalence_axiom.children\n\n verbalised_left_class = self.verbalise_class_expression(parsed_class_left)\n verbalised_right_class = self.verbalise_class_expression(parsed_class_right)\n return verbalised_left_class, verbalised_right_class\n
"},{"location":"deeponto/onto/verbalisation/#deeponto.onto.verbalisation.OntologyVerbaliser.verbalise_class_assertion_axiom","title":"verbalise_class_assertion_axiom(class_assertion_axiom)
","text":"Verbalise a class assertion axiom.
The class assertion axiom has the form \\(C(x)\\).
Parameters:
Name Type Description Defaultclass_assertion_axiom
OWLAxiom
The class assertion axiom to be verbalised.
requiredReturns:
Type DescriptionTuple[CfgNode, CfgNode]
The verbalised class \\(\\mathcal{V}(C)\\) and individual \\(\\mathcal{V}(x)\\) (order matters).
Source code insrc/deeponto/onto/verbalisation.py
def verbalise_class_assertion_axiom(self, class_assertion_axiom: OWLAxiom):\nr\"\"\"Verbalise a class assertion axiom.\n\n The class assertion axiom has the form $C(x)$.\n\n Args:\n class_assertion_axiom (OWLAxiom): The class assertion axiom to be verbalised.\n\n Returns:\n (Tuple[CfgNode, CfgNode]): The verbalised class $\\mathcal{V}(C)$ and individual $\\mathcal{V}(x)$ (order matters).\n \"\"\"\n\n # input check\n self._axiom_input_check(class_assertion_axiom, \"ClassAssertion\")\n\n parsed_equivalence_axiom = self.parser.parse(class_assertion_axiom).children[0] # skip the root node\n parsed_class, parsed_individual = parsed_equivalence_axiom.children\n\n verbalised_class = self.verbalise_class_expression(parsed_class)\n verbalised_individual = self._verbalise_iri(parsed_individual)\n return verbalised_class, verbalised_individual\n
"},{"location":"deeponto/onto/verbalisation/#deeponto.onto.verbalisation.OntologyVerbaliser.verbalise_object_property_subsumption_axiom","title":"verbalise_object_property_subsumption_axiom(object_property_subsumption_axiom)
","text":"Verbalise an object property subsumption axiom.
The subsumption axiom can have two forms:
SubObjectPropertyOf
axiom;SuperObjectPropertyOf
axiom.Parameters:
Name Type Description Defaultobject_property_subsumption_axiom
OWLAxiom
The object property subsumption axiom to be verbalised.
requiredReturns:
Type DescriptionTuple[CfgNode, CfgNode]
The verbalised sub-property \\(\\mathcal{V}(r_{sub})\\) and super-property \\(\\mathcal{V}(r_{super})\\) (order matters).
Source code insrc/deeponto/onto/verbalisation.py
def verbalise_object_property_subsumption_axiom(self, object_property_subsumption_axiom: OWLAxiom):\nr\"\"\"Verbalise an object property subsumption axiom.\n\n The subsumption axiom can have two forms:\n\n - $r_{sub} \\sqsubseteq r_{super}$, the `SubObjectPropertyOf` axiom;\n - $r_{super} \\sqsupseteq r_{sub}$, the `SuperObjectPropertyOf` axiom.\n\n Args:\n object_property_subsumption_axiom (OWLAxiom): The object property subsumption axiom to be verbalised.\n\n Returns:\n (Tuple[CfgNode, CfgNode]): The verbalised sub-property $\\mathcal{V}(r_{sub})$ and super-property $\\mathcal{V}(r_{super})$ (order matters).\n \"\"\"\n\n # input check\n self._axiom_input_check(\n object_property_subsumption_axiom,\n \"SubObjectPropertyOf\",\n \"SuperObjectPropertyOf\",\n \"SubPropertyChainOf\",\n \"SuperPropertyChainOf\",\n )\n\n parsed_subsumption_axiom = self.parser.parse(object_property_subsumption_axiom).children[\n 0\n ] # skip the root node\n if str(object_property_subsumption_axiom).startswith(\"SubObjectPropertyOf\"):\n parsed_sub_property, parsed_super_property = parsed_subsumption_axiom.children\n elif str(object_property_subsumption_axiom).startswith(\"SuperObjectPropertyOf\"):\n parsed_super_property, parsed_sub_property = parsed_subsumption_axiom.children\n\n verbalised_sub_property = self._verbalise_property(parsed_sub_property)\n verbalised_super_property = self._verbalise_property(parsed_super_property)\n return verbalised_sub_property, verbalised_super_property\n
"},{"location":"deeponto/onto/verbalisation/#deeponto.onto.verbalisation.OntologyVerbaliser.verbalise_object_property_assertion_axiom","title":"verbalise_object_property_assertion_axiom(object_property_assertion_axiom)
","text":"Verbalise an object property assertion axiom.
The object property assertion axiom has the form \\(r(x, y)\\).
Parameters:
Name Type Description Defaultobject_property_assertion_axiom
OWLAxiom
The object property assertion axiom to be verbalised.
requiredReturns:
Type DescriptionTuple[CfgNode, CfgNode]
The verbalised object property \\(\\mathcal{V}(r)\\) and two individuals \\(\\mathcal{V}(x)\\) and \\(\\mathcal{V}(y)\\) (order matters).
Source code insrc/deeponto/onto/verbalisation.py
def verbalise_object_property_assertion_axiom(self, object_property_assertion_axiom: OWLAxiom):\nr\"\"\"Verbalise an object property assertion axiom.\n\n The object property assertion axiom has the form $r(x, y)$.\n\n Args:\n object_property_assertion_axiom (OWLAxiom): The object property assertion axiom to be verbalised.\n\n Returns:\n (Tuple[CfgNode, CfgNode]): The verbalised object property $\\mathcal{V}(r)$ and two individuals $\\mathcal{V}(x)$ and $\\mathcal{V}(y)$ (order matters).\n \"\"\"\n\n # input check\n self._axiom_input_check(object_property_assertion_axiom, \"ObjectPropertyAssertion\")\n\n # skip the root node\n parsed_object_property_assertion_axiom = self.parser.parse(object_property_assertion_axiom).children[0]\n parsed_obj_prop, parsed_indiv_x, parsed_indiv_y = parsed_object_property_assertion_axiom.children\n\n verbalised_object_property = self._verbalise_iri(parsed_obj_prop, is_property=True)\n verbalised_individual_x = self._verbalise_iri(parsed_indiv_x)\n verbalised_individual_y = self._verbalise_iri(parsed_indiv_y)\n return verbalised_object_property, verbalised_individual_x, verbalised_individual_y\n
"},{"location":"deeponto/onto/verbalisation/#deeponto.onto.verbalisation.OntologyVerbaliser.verbalise_object_property_domain_axiom","title":"verbalise_object_property_domain_axiom(object_property_domain_axiom)
","text":"Verbalise an object property domain axiom.
The domain of a property \\(r: X \\rightarrow Y\\) specifies the concept expression \\(X\\) of its subject.
Parameters:
Name Type Description Defaultobject_property_domain_axiom
OWLAxiom
The object property domain axiom to be verbalised.
requiredReturns:
Type DescriptionTuple[CfgNode, CfgNode]
The verbalised object property \\(\\mathcal{V}(r)\\) and its domain \\(\\mathcal{V}(X)\\) (order matters).
Source code insrc/deeponto/onto/verbalisation.py
def verbalise_object_property_domain_axiom(self, object_property_domain_axiom: OWLAxiom):\nr\"\"\"Verbalise an object property domain axiom.\n\n The domain of a property $r: X \\rightarrow Y$ specifies the concept expression $X$ of its subject.\n\n Args:\n object_property_domain_axiom (OWLAxiom): The object property domain axiom to be verbalised.\n\n Returns:\n (Tuple[CfgNode, CfgNode]): The verbalised object property $\\mathcal{V}(r)$ and its domain $\\mathcal{V}(X)$ (order matters).\n \"\"\"\n\n # input check\n self._axiom_input_check(object_property_domain_axiom, \"ObjectPropertyDomain\")\n\n # skip the root node\n parsed_object_property_domain_axiom = self.parser.parse(object_property_domain_axiom).children[0]\n parsed_obj_prop, parsed_obj_prop_domain = parsed_object_property_domain_axiom.children\n\n verbalised_object_property = self._verbalise_iri(parsed_obj_prop, is_property=True)\n verbalised_object_property_domain = self.verbalise_class_expression(parsed_obj_prop_domain)\n\n return verbalised_object_property, verbalised_object_property_domain\n
"},{"location":"deeponto/onto/verbalisation/#deeponto.onto.verbalisation.OntologyVerbaliser.verbalise_object_property_range_axiom","title":"verbalise_object_property_range_axiom(object_property_range_axiom)
","text":"Verbalise an object property range axiom.
The range of a property \\(r: X \\rightarrow Y\\) specifies the concept expression \\(Y\\) of its object.
Parameters:
Name Type Description Defaultobject_property_range_axiom
OWLAxiom
The object property range axiom to be verbalised.
requiredReturns:
Type DescriptionTuple[CfgNode, CfgNode]
The verbalised object property \\(\\mathcal{V}(r)\\) and its range \\(\\mathcal{V}(Y)\\) (order matters).
Source code insrc/deeponto/onto/verbalisation.py
def verbalise_object_property_range_axiom(self, object_property_range_axiom: OWLAxiom):\nr\"\"\"Verbalise an object property range axiom.\n\n The range of a property $r: X \\rightarrow Y$ specifies the concept expression $Y$ of its object.\n\n Args:\n object_property_range_axiom (OWLAxiom): The object property range axiom to be verbalised.\n\n Returns:\n (Tuple[CfgNode, CfgNode]): The verbalised object property $\\mathcal{V}(r)$ and its range $\\mathcal{V}(Y)$ (order matters).\n \"\"\"\n\n # input check\n self._axiom_input_check(object_property_range_axiom, \"ObjectPropertyRange\")\n\n # skip the root node\n parsed_object_property_range_axiom = self.parser.parse(object_property_range_axiom).children[0]\n parsed_obj_prop, parsed_obj_prop_range = parsed_object_property_range_axiom.children\n\n verbalised_object_property = self._verbalise_iri(parsed_obj_prop, is_property=True)\n verbalised_object_property_range = self.verbalise_class_expression(parsed_obj_prop_range)\n\n return verbalised_object_property, verbalised_object_property_range\n
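Similarly, a hedged sketch for the domain and range methods, under the same assumption about verbaliser and with domain_axiom and range_axiom being an ObjectPropertyDomain and an ObjectPropertyRange axiom, respectively:
v_property, v_domain = verbaliser.verbalise_object_property_domain_axiom(domain_axiom)\nv_property, v_range = verbaliser.verbalise_object_property_range_axiom(range_axiom)\n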
"},{"location":"deeponto/onto/verbalisation/#deeponto.onto.verbalisation.OntologySyntaxParser","title":"OntologySyntaxParser()
","text":"A syntax parser for the OWL logical expressions, e.g., OWLAxiom
and OWLClassExpression
.
It makes use of the string representation (based on Manchester Syntax) defined in the OWLAPI. In Python, such a string can be accessed by simply using str(some_owl_object)
.
To keep the Java import in the main Ontology
class, this parser does not deal with OWLAxiom
directly but instead its string representation.
Due to the OWLObject
syntax, this parser relies on two components:
RangeNode
).As a result, it will return a RangeNode
that specifies the sub-formulas (and their respective positions in the string representation) in a tree structure.
Examples:
Suppose the input is an OWLAxiom
that has the string representation:
>>> str(owl_axiom)\n>>> 'EquivalentClasses(<http://purl.obolibrary.org/obo/FOODON_00001707> ObjectIntersectionOf(<http://purl.obolibrary.org/obo/FOODON_00002044> ObjectSomeValuesFrom(<http://purl.obolibrary.org/obo/RO_0001000> <http://purl.obolibrary.org/obo/FOODON_03412116>)) )'\n
This corresponds to the following logical expression:
\\[ CephalopodFoodProduct \\equiv MolluskFoodProduct \\sqcap \\exists derivesFrom.Cephalopod \\]After applying the parser, a RangeNode
will be returned, which can be rendered as:
axiom_parser = OntologySyntaxParser()\nprint(axiom_parser.parse(str(owl_axiom)).render_tree())\n
Output:
Root@[0:inf]\n\u2514\u2500\u2500 EQV@[0:212]\n \u251c\u2500\u2500 FOODON_00001707@[6:54]\n \u2514\u2500\u2500 AND@[55:210]\n \u251c\u2500\u2500 FOODON_00002044@[61:109]\n \u2514\u2500\u2500 EX.@[110:209]\n \u251c\u2500\u2500 RO_0001000@[116:159]\n \u2514\u2500\u2500 FOODON_03412116@[160:208]\n
Or, if graphviz
(installed by e.g., sudo apt install graphviz
) is available, you can visualise the tree as an image by:
axiom_parser.parse(str(owl_axiom)).render_image()\n
Output:
The name for each node has the form {node_type}@[{start}:{end}]
, which means a node of the type {node_type}
is located at the range [{start}:{end}]
in the abbreviated expression (see abbreviate_owl_expression
below).
The leaf nodes are IRIs and they are represented by the last segment (split by \"/\"
) of the whole IRI.
Child nodes can be accessed by .children
, and the string representation of the sub-formula at this node can be accessed by .text
. For example:
parser.parse(str(owl_axiom)).children[0].children[1].text\n
Output:
'[AND](<http://purl.obolibrary.org/obo/FOODON_00002044> [EX.](<http://purl.obolibrary.org/obo/RO_0001000> <http://purl.obolibrary.org/obo/FOODON_03412116>))'\n
Source code in src/deeponto/onto/verbalisation.py
def __init__(self):\n pass\n
"},{"location":"deeponto/onto/verbalisation/#deeponto.onto.verbalisation.OntologySyntaxParser.abbreviate_owl_expression","title":"abbreviate_owl_expression(owl_expression)
","text":"Abbreviate the string representations of logical operators to a fixed length (easier for parsing).
The abbreviations are specified at deeponto.onto.verbalisation.ABBREVIATION_DICT
.
Parameters:
Name Type Description Defaultowl_expression
str
The string representation of an OWLObject
.
Returns:
Type Descriptionstr
The modified string representation of this OWLObject
where the logical operators are abbreviated.
src/deeponto/onto/verbalisation.py
def abbreviate_owl_expression(self, owl_expression: str):\nr\"\"\"Abbreviate the string representations of logical operators to a\n fixed length (easier for parsing).\n\n The abbreviations are specified at `deeponto.onto.verbalisation.ABBREVIATION_DICT`.\n\n Args:\n owl_expression (str): The string representation of an `OWLObject`.\n\n Returns:\n (str): The modified string representation of this `OWLObject` where the logical operators are abbreviated.\n \"\"\"\n for k, v in ABBREVIATION_DICT.items():\n owl_expression = owl_expression.replace(k, v)\n return owl_expression\n
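For instance, based on the abbreviations visible in the parsing example above (e.g., ObjectIntersectionOf becomes [AND]), a call could look like the following sketch, where the IRIs <A> and <B> are placeholders:
parser = OntologySyntaxParser()\nparser.abbreviate_owl_expression(\"ObjectIntersectionOf(<A> <B>)\")  # e.g., '[AND](<A> <B>)'\n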
"},{"location":"deeponto/onto/verbalisation/#deeponto.onto.verbalisation.OntologySyntaxParser.parse","title":"parse(owl_expression)
","text":"Parse an OWLAxiom
into a RangeNode
.
This is the main entry for using the parser, which relies on the parse_by_parentheses
method below.
Parameters:
Name Type Description Defaultowl_expression
Union[str, OWLObject]
The string representation of an OWLObject
or the OWLObject
itself.
Returns:
Type DescriptionRangeNode
A parsed syntactic tree based on the matched parentheses.
Source code insrc/deeponto/onto/verbalisation.py
def parse(self, owl_expression: Union[str, OWLObject]) -> RangeNode:\nr\"\"\"Parse an `OWLAxiom` into a [`RangeNode`][deeponto.onto.verbalisation.RangeNode].\n\n This is the main entry for using the parser, which relies on the [`parse_by_parentheses`][deeponto.onto.verbalisation.OntologySyntaxParser.parse_by_parentheses]\n method below.\n\n Args:\n owl_expression (Union[str, OWLObject]): The string representation of an `OWLObject` or the `OWLObject` itself.\n\n Returns:\n (RangeNode): A parsed syntactic tree given what parentheses to be matched.\n \"\"\"\n if not isinstance(owl_expression, str):\n owl_expression = str(owl_expression)\n owl_expression = self.abbreviate_owl_expression(owl_expression)\n # print(\"To parse the following (transformed) axiom text:\\n\", owl_expression)\n # parse complex patterns first\n cur_parsed = self.parse_by_parentheses(owl_expression)\n # parse the IRI patterns latter\n return self.parse_by_parentheses(owl_expression, cur_parsed, for_iri=True)\n
"},{"location":"deeponto/onto/verbalisation/#deeponto.onto.verbalisation.OntologySyntaxParser.parse_by_parentheses","title":"parse_by_parentheses(owl_expression, already_parsed=None, for_iri=False)
classmethod
","text":"Parse an OWLAxiom
based on parentheses matching into a RangeNode
.
This function needs to be applied twice to get a fully parsed RangeNode
because IRIs have a different parenthesis pattern.
Parameters:
Name Type Description Defaultowl_expression
str
The string representation of an OWLObject
.
already_parsed
RangeNode
A partially parsed RangeNode
to continue with. Defaults to None
.
None
for_iri
bool
Parentheses are by default ()
but will be changed to <>
for IRIs. Defaults to False
.
False
Raises:
Type DescriptionRuntimeError
Raised when the input axiom text is not properly formatted.
Returns:
Type DescriptionRangeNode
A parsed syntactic tree based on the matched parentheses.
Source code insrc/deeponto/onto/verbalisation.py
@classmethod\ndef parse_by_parentheses(\n cls, owl_expression: str, already_parsed: RangeNode = None, for_iri: bool = False\n) -> RangeNode:\nr\"\"\"Parse an `OWLAxiom` based on parentheses matching into a [`RangeNode`][deeponto.onto.verbalisation.RangeNode].\n\n This function needs to be applied twice to get a fully parsed [`RangeNode`][deeponto.onto.verbalisation.RangeNode] because IRIs have\n a different parenthesis pattern.\n\n Args:\n owl_expression (str): The string representation of an `OWLObject`.\n already_parsed (RangeNode, optional): A partially parsed [`RangeNode`][deeponto.onto.verbalisation.RangeNode] to continue with. Defaults to `None`.\n for_iri (bool, optional): Parentheses are by default `()` but will be changed to `<>` for IRIs. Defaults to `False`.\n\n Raises:\n RuntimeError: Raised when the input axiom text is nor properly formatted.\n\n Returns:\n (RangeNode): A parsed syntactic tree given what parentheses to be matched.\n \"\"\"\n if not already_parsed:\n # a root node that covers the entire sentence\n parsed = RangeNode(0, math.inf, name=f\"Root\", text=owl_expression, is_iri=False)\n else:\n parsed = already_parsed\n stack = []\n left_par = \"(\"\n right_par = \")\"\n if for_iri:\n left_par = \"<\"\n right_par = \">\"\n\n for i, c in enumerate(owl_expression):\n if c == left_par:\n stack.append(i)\n if c == right_par:\n try:\n start = stack.pop()\n end = i\n if not for_iri:\n # the first character is actually \"[\"\n real_start = start - 5\n axiom_type = owl_expression[real_start + 1 : start - 1]\n node = RangeNode(\n real_start,\n end + 1,\n name=f\"{axiom_type}\",\n text=owl_expression[real_start : end + 1],\n is_iri=False,\n )\n parsed.insert_child(node)\n else:\n # no preceding characters for just atomic class (IRI)\n abbr_iri = owl_expression[start : end + 1].split(\"/\")[-1].rstrip(\">\")\n node = RangeNode(\n start, end + 1, name=abbr_iri, text=owl_expression[start : end + 1], is_iri=True\n )\n parsed.insert_child(node)\n except IndexError:\n print(\"Too many closing parentheses\")\n\n if stack: # check if stack is empty afterwards\n raise RuntimeError(\"Too many opening parentheses\")\n\n return parsed\n
"},{"location":"deeponto/onto/verbalisation/#deeponto.onto.verbalisation.RangeNode","title":"RangeNode(start, end, name=None, **kwargs)
","text":" Bases: NodeMixin
A tree implementation for ranges (without partial overlap).
[1, 10]
is a parent of [2, 5]
; partially overlapping ranges such as [2, 4]
and [3, 5]
cannot appear in the same RangeNodeTree
.src/deeponto/onto/verbalisation.py
def __init__(self, start, end, name=None, **kwargs):\n if start >= end:\n raise RuntimeError(\"invalid start and end positions ...\")\n self.start = start\n self.end = end\n self.name = \"Root\" if not name else name\n self.name = f\"{self.name}@[{self.start}:{self.end}]\" # add start and ent to the name\n for k, v in kwargs.items():\n setattr(self, k, v)\n super().__init__()\n
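A minimal construction sketch (the node names and ranges here are arbitrary):
import math\nfrom deeponto.onto.verbalisation import RangeNode\n\nroot = RangeNode(0, math.inf, name=\"Root\")\nchild = RangeNode(2, 5, name=\"AND\")  # a sub-range fully contained in the root range\nroot.insert_child(child)\nprint(root.render_tree())  # renders the two-node tree\n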
"},{"location":"deeponto/onto/verbalisation/#deeponto.onto.verbalisation.RangeNode.__gt__","title":"__gt__(other)
","text":"Compare two ranges if they have a different start
and/or a different end
.
\"irrelevant\"
: if range \\(R_1\\) and range \\(R_2\\) have no overlap.Warning
Partial overlap is not allowed.
Source code insrc/deeponto/onto/verbalisation.py
def __gt__(self, other: RangeNode):\nr\"\"\"Compare two ranges if they have a different `start` and/or a different `end`.\n\n - $R_1 \\lt R_2$: if range $R_1$ is completely contained in range $R_2$, and $R_1 \\neq R_2$.\n - $R_1 \\gt R_2$: if range $R_2$ is completely contained in range $R_1$, and $R_1 \\neq R_2$.\n - `\"irrelevant\"`: if range $R_1$ and range $R_2$ have no overlap.\n\n !!! warning\n\n Partial overlap is not allowed.\n \"\"\"\n # ranges inside\n if self.start <= other.start and other.end <= self.end:\n return True\n\n # ranges outside\n if other.start <= self.start and self.end <= other.end:\n return False\n\n if other.end < self.start or self.end < other.start:\n return \"irrelevant\"\n\n raise RuntimeError(\"Compared ranges have a partial overlap.\")\n
"},{"location":"deeponto/onto/verbalisation/#deeponto.onto.verbalisation.RangeNode.sort_by_start","title":"sort_by_start(nodes)
staticmethod
","text":"A sorting function that sorts the nodes by their starting positions.
Source code insrc/deeponto/onto/verbalisation.py
@staticmethod\ndef sort_by_start(nodes: List[RangeNode]):\n\"\"\"A sorting function that sorts the nodes by their starting positions.\"\"\"\n temp = {sib: sib.start for sib in nodes}\n return list(dict(sorted(temp.items(), key=lambda item: item[1])).keys())\n
"},{"location":"deeponto/onto/verbalisation/#deeponto.onto.verbalisation.RangeNode.insert_child","title":"insert_child(node)
","text":"Inserting a child RangeNode
.
Child nodes have a smaller (inclusive) range, e.g., [2, 5]
is a child of [1, 6]
.
src/deeponto/onto/verbalisation.py
def insert_child(self, node: RangeNode):\nr\"\"\"Inserting a child [`RangeNode`][deeponto.onto.verbalisation.RangeNode].\n\n Child nodes have a smaller (inclusive) range, e.g., `[2, 5]` is a child of `[1, 6]`.\n \"\"\"\n if node > self:\n raise RuntimeError(\"invalid child node\")\n if node.start == self.start and node.end == self.end:\n # duplicated node\n return\n # print(self.children)\n if self.children:\n inserted = False\n for ch in self.children:\n if (node < ch) is True:\n # print(\"further down\")\n ch.insert_child(node)\n inserted = True\n break\n elif (node > ch) is True:\n # print(\"insert in between\")\n ch.parent = node\n # NOTE: should not break here as it could be parent of multiple children !\n # break\n # NOTE: the equal case is when two nodes are exactly the same, no operation needed\n if not inserted:\n self.children = list(self.children) + [node]\n self.children = self.sort_by_start(self.children)\n else:\n node.parent = self\n self.children = [node]\n
"},{"location":"deeponto/onto/verbalisation/#deeponto.onto.verbalisation.RangeNode.render_tree","title":"render_tree()
","text":"Render the whole tree.
Source code insrc/deeponto/onto/verbalisation.py
def render_tree(self):\n\"\"\"Render the whole tree.\"\"\"\n return RenderTree(self)\n
"},{"location":"deeponto/onto/verbalisation/#deeponto.onto.verbalisation.RangeNode.render_image","title":"render_image()
","text":"Calling this function will generate a temporary range_node.png
file which will be displayed.
To make this visualisation work, you need to install graphviz
by, e.g.,
sudo apt install graphviz\n
Source code in src/deeponto/onto/verbalisation.py
def render_image(self):\n\"\"\"Calling this function will generate a temporary `range_node.png` file\n which will be displayed.\n\n To make this visualisation work, you need to install `graphviz` by, e.g.,\n\n ```bash\n sudo apt install graphviz\n ```\n \"\"\"\n RenderTreeGraph(self).to_picture(\"range_node.png\")\n return Image(\"range_node.png\")\n
"},{"location":"deeponto/utils/data_utils/","title":"Data Utilities","text":""},{"location":"deeponto/utils/data_utils/#deeponto.utils.data_utils.set_seed","title":"set_seed(seed)
","text":"Set seed function imported from transformers.
Source code insrc/deeponto/utils/data_utils.py
def set_seed(seed):\n\"\"\"Set seed function imported from transformers.\"\"\"\n t_set_seed(seed)\n
"},{"location":"deeponto/utils/data_utils/#deeponto.utils.data_utils.sort_dict_by_values","title":"sort_dict_by_values(dic, desc=True, k=None)
","text":"Return a sorted dict by values with first k reserved if provided.
Source code insrc/deeponto/utils/data_utils.py
def sort_dict_by_values(dic: dict, desc: bool = True, k: Optional[int] = None):\n\"\"\"Return a sorted dict by values with first k reserved if provided.\"\"\"\n sorted_items = list(sorted(dic.items(), key=lambda item: item[1], reverse=desc))\n return dict(sorted_items[:k])\n
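A small usage sketch with its expected results shown as comments:
from deeponto.utils.data_utils import sort_dict_by_values\n\nscores = {\"a\": 0.1, \"b\": 0.9, \"c\": 0.5}\nsort_dict_by_values(scores)       # {\"b\": 0.9, \"c\": 0.5, \"a\": 0.1}\nsort_dict_by_values(scores, k=1)  # {\"b\": 0.9}\n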
"},{"location":"deeponto/utils/data_utils/#deeponto.utils.data_utils.uniqify","title":"uniqify(ls)
","text":"Return a list of unique elements without messing around the order
Source code insrc/deeponto/utils/data_utils.py
def uniqify(ls):\n\"\"\"Return a list of unique elements without messing around the order\"\"\"\n non_empty_ls = list(filter(lambda x: x != \"\", ls))\n return list(dict.fromkeys(non_empty_ls))\n
"},{"location":"deeponto/utils/data_utils/#deeponto.utils.data_utils.print_dict","title":"print_dict(dic)
","text":"Pretty print a dictionary.
Source code insrc/deeponto/utils/data_utils.py
def print_dict(dic: dict):\n\"\"\"Pretty print a dictionary.\"\"\"\n pretty_print = json.dumps(dic, indent=4, separators=(\",\", \": \"))\n # print(pretty_print)\n return pretty_print\n
"},{"location":"deeponto/utils/decorators/","title":"Decorators","text":""},{"location":"deeponto/utils/decorators/#deeponto.utils.decorators.timer","title":"timer(function)
","text":"Print the runtime of the decorated function.
Source code insrc/deeponto/utils/decorators.py
def timer(function):\n\"\"\"Print the runtime of the decorated function.\"\"\"\n\n @wraps(function)\n def wrapper_timer(*args, **kwargs):\n start_time = time.perf_counter() # 1\n value = function(*args, **kwargs)\n end_time = time.perf_counter() # 2\n run_time = end_time - start_time # 3\n print(f\"Finished {function.__name__!r} in {run_time:.4f} secs.\")\n return value\n\n return wrapper_timer\n
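A usage sketch of the decorator (the function name and the printed runtime are illustrative):
from deeponto.utils.decorators import timer\n\n@timer\ndef train():\n    ...  # some long-running computation\n\ntrain()  # prints, e.g., Finished 'train' in 0.0001 secs.\n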
"},{"location":"deeponto/utils/decorators/#deeponto.utils.decorators.debug","title":"debug(function)
","text":"Print the function signature and return value.
Source code insrc/deeponto/utils/decorators.py
def debug(function):\n\"\"\"Print the function signature and return value.\"\"\"\n\n @wraps(function)\n def wrapper_debug(*args, **kwargs):\n args_repr = [repr(a) for a in args]\n kwargs_repr = [f\"{k}={v!r}\" for k, v in kwargs.items()]\n signature = \", \".join(args_repr + kwargs_repr)\n print(f\"Calling {function.__name__}({signature})\")\n value = function(*args, **kwargs)\n print(f\"{function.__name__!r} returned {value!r}.\")\n return value\n\n return wrapper_debug\n
"},{"location":"deeponto/utils/decorators/#deeponto.utils.decorators.paper","title":"paper(title, link)
","text":"Add paper tagger for methods.
Source code insrc/deeponto/utils/decorators.py
def paper(title: str, link: str):\n\"\"\"Add paper tagger for methods.\"\"\"\n # Define a new decorator, named \"decorator\", to return\n def decorator(func):\n # Ensure the decorated function keeps its metadata\n @wraps(func)\n def wrapper(*args, **kwargs):\n # Call the function being decorated and return the result\n return func(*args, **kwargs)\n\n wrapper.paper_title = f'This method is associated with tha paper of title: \"{title}\".'\n wrapper.paper_link = f\"This method is associated with the paper with link: {link}.\"\n return wrapper\n\n # Return the new decorator\n return decorator\n
"},{"location":"deeponto/utils/file_utils/","title":"File Utilities","text":""},{"location":"deeponto/utils/file_utils/#deeponto.utils.file_utils.create_path","title":"create_path(path)
","text":"Create a path recursively.
Source code insrc/deeponto/utils/file_utils.py
def create_path(path: str):\n\"\"\"Create a path recursively.\"\"\"\n Path(path).mkdir(parents=True, exist_ok=True)\n
"},{"location":"deeponto/utils/file_utils/#deeponto.utils.file_utils.save_file","title":"save_file(obj, save_path, sort_keys=False)
","text":"Save an object to a certain format.
Source code insrc/deeponto/utils/file_utils.py
def save_file(obj, save_path: str, sort_keys: bool = False):\n\"\"\"Save an object to a certain format.\"\"\"\n if save_path.endswith(\".json\"):\n with open(save_path, \"w\") as output:\n json.dump(obj, output, indent=4, separators=(\",\", \": \"), sort_keys=sort_keys)\n elif save_path.endswith(\".pkl\"):\n with open(save_path, \"wb\") as output:\n pickle.dump(obj, output, -1)\n elif save_path.endswith(\".yaml\"):\n with open(save_path, \"w\") as output:\n yaml.dump(obj, output, default_flow_style=False, allow_unicode=True)\n else:\n raise RuntimeError(f\"Unsupported saving format: {save_path}\")\n
"},{"location":"deeponto/utils/file_utils/#deeponto.utils.file_utils.load_file","title":"load_file(save_path)
","text":"Load an object of a certain format.
Source code insrc/deeponto/utils/file_utils.py
def load_file(save_path: str):\n\"\"\"Load an object of a certain format.\"\"\"\n if save_path.endswith(\".json\"):\n with open(save_path, \"r\") as input:\n return json.load(input)\n elif save_path.endswith(\".pkl\"):\n with open(save_path, \"rb\") as input:\n return pickle.load(input)\n elif save_path.endswith(\".yaml\"):\n with open(save_path, \"r\") as input:\n return yaml.safe_load(input)\n else:\n raise RuntimeError(f\"Unsupported loading format: {save_path}\")\n
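A round-trip sketch combining save_file and load_file (the file name is arbitrary; the format is inferred from the extension):
from deeponto.utils.file_utils import save_file, load_file\n\nstats = {\"num_mappings\": 100}\nsave_file(stats, \"./stats.json\")\nassert load_file(\"./stats.json\") == stats\n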
"},{"location":"deeponto/utils/file_utils/#deeponto.utils.file_utils.copy2","title":"copy2(source, destination)
","text":"Copy a file from source to destination.
Source code insrc/deeponto/utils/file_utils.py
def copy2(source: str, destination: str):\n\"\"\"Copy a file from source to destination.\"\"\"\n try:\n shutil.copy2(source, destination)\n print(f\"copied successfully FROM {source} TO {destination}\")\n except shutil.SameFileError:\n print(f\"same file exists at {destination}\")\n
"},{"location":"deeponto/utils/file_utils/#deeponto.utils.file_utils.read_table","title":"read_table(table_file_path)
","text":"Read csv
or tsv
file as pandas dataframe without treating \"NULL\"
, \"null\"
, and \"n/a\"
as an empty string.
src/deeponto/utils/file_utils.py
def read_table(table_file_path: str):\nr\"\"\"Read `csv` or `tsv` file as pandas dataframe without treating `\"NULL\"`, `\"null\"`, and `\"n/a\"` as an empty string.\"\"\"\n # TODO: this might change with the version of pandas\n na_vals = pd.io.parsers.readers.STR_NA_VALUES.difference({\"NULL\", \"null\", \"n/a\"})\n sep = \"\\t\" if table_file_path.endswith(\".tsv\") else \",\"\n return pd.read_csv(table_file_path, sep=sep, na_values=na_vals, keep_default_na=False)\n
"},{"location":"deeponto/utils/file_utils/#deeponto.utils.file_utils.read_jsonl","title":"read_jsonl(file_path)
","text":"Read .jsonl
file (list of json) introduced in the BLINK project.
src/deeponto/utils/file_utils.py
def read_jsonl(file_path: str):\n\"\"\"Read `.jsonl` file (list of json) introduced in the BLINK project.\"\"\"\n results = []\n key_set = []\n with open(file_path, \"r\", encoding=\"utf-8-sig\") as f:\n lines = f.readlines()\n for line in lines:\n record = json.loads(line)\n results.append(record)\n key_set += list(record.keys())\n print(f\"all available keys: {set(key_set)}\")\n return results\n
"},{"location":"deeponto/utils/file_utils/#deeponto.utils.file_utils.read_oaei_mappings","title":"read_oaei_mappings(rdf_file)
","text":"To read mapping files in the OAEI rdf format.
Source code insrc/deeponto/utils/file_utils.py
def read_oaei_mappings(rdf_file: str):\n\"\"\"To read mapping files in the OAEI rdf format.\"\"\"\n xml_root = ET.parse(rdf_file).getroot()\n ref_mappings = [] # where relation is \"=\"\n ignored_mappings = [] # where relation is \"?\"\n\n for elem in xml_root.iter():\n # every Cell contains a mapping of en1 -rel(some value)-> en2\n if \"Cell\" in elem.tag:\n en1, en2, rel, measure = None, None, None, None\n for sub_elem in elem:\n if \"entity1\" in sub_elem.tag:\n en1 = list(sub_elem.attrib.values())[0]\n elif \"entity2\" in sub_elem.tag:\n en2 = list(sub_elem.attrib.values())[0]\n elif \"relation\" in sub_elem.tag:\n rel = sub_elem.text\n elif \"measure\" in sub_elem.tag:\n measure = sub_elem.text\n row = (en1, en2, measure)\n # =: equivalent; > superset of; < subset of.\n if rel == \"=\" or rel == \">\" or rel == \"<\":\n # rel.replace(\">\", \">\").replace(\"<\", \"<\")\n ref_mappings.append(row)\n elif rel == \"?\":\n ignored_mappings.append(row)\n else:\n print(\"Unknown Relation Warning: \", rel)\n\n print('#Maps (\"=\"):', len(ref_mappings))\n print('#Maps (\"?\"):', len(ignored_mappings))\n\n return ref_mappings, ignored_mappings\n
"},{"location":"deeponto/utils/file_utils/#deeponto.utils.file_utils.run_jar","title":"run_jar(jar_command, timeout=3600)
","text":"Run jar command using subprocess.
Source code insrc/deeponto/utils/file_utils.py
def run_jar(jar_command: str, timeout=3600):\n\"\"\"Run jar command using subprocess.\"\"\"\n print(f\"Run jar command with timeout: {timeout}s.\")\n proc = subprocess.Popen(jar_command.split(\" \"))\n try:\n _, _ = proc.communicate(timeout=timeout)\n except subprocess.TimeoutExpired:\n warnings.warn(\"kill the jar process as timed out\")\n proc.kill()\n _, _ = proc.communicate()\n
"},{"location":"deeponto/utils/logging/","title":"Logging","text":""},{"location":"deeponto/utils/logging/#deeponto.utils.logging.RuntimeFormatter","title":"RuntimeFormatter(*args, **kwargs)
","text":" Bases: logging.Formatter
Auxiliary class for runtime formatting in the logger.
Source code insrc/deeponto/utils/logging.py
def __init__(self, *args, **kwargs):\n super().__init__(*args, **kwargs)\n self.start_time = time.time()\n
"},{"location":"deeponto/utils/logging/#deeponto.utils.logging.RuntimeFormatter.formatTime","title":"formatTime(record, datefmt=None)
","text":"Record relative runtime in hr:min:sec format\u3002
Source code insrc/deeponto/utils/logging.py
def formatTime(self, record, datefmt=None):\n\"\"\"Record relative runtime in hr:min:sec format.\"\"\"\n duration = datetime.datetime.utcfromtimestamp(record.created - self.start_time)\n elapsed = duration.strftime(\"%H:%M:%S\")\n return \"{}\".format(elapsed)\n
"},{"location":"deeponto/utils/logging/#deeponto.utils.logging.create_logger","title":"create_logger(model_name, saved_path)
","text":"Create logger for both console info and saved info.
The pre-existing log file will be cleared before new messages are written.
Source code insrc/deeponto/utils/logging.py
def create_logger(model_name: str, saved_path: str):\n\"\"\"Create logger for both console info and saved info.\n\n The pre-existed log file will be cleared before writing into new messages.\n \"\"\"\n logger = logging.getLogger(model_name)\n logger.setLevel(logging.DEBUG)\n # create file handler which logs even debug messages\n fh = logging.FileHandler(f\"{saved_path}/{model_name}.log\", mode=\"w\") # \"w\" means clear the log file before writing\n fh.setLevel(logging.DEBUG)\n # create console handler with a higher log level\n ch = logging.StreamHandler()\n ch.setLevel(logging.INFO)\n # create formatter and add it to the handlers\n formatter = RuntimeFormatter(\"[Time: %(asctime)s] - [PID: %(process)d] - [Model: %(name)s] \\n%(message)s\")\n fh.setFormatter(formatter)\n ch.setFormatter(formatter)\n # add the handlers to the logger\n logger.addHandler(fh)\n logger.addHandler(ch)\n logger.propagate = False\n return logger\n
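A minimal sketch (the model name and path are arbitrary; the log file is written to {saved_path}/{model_name}.log):
from deeponto.utils.logging import create_logger\n\nlogger = create_logger(model_name=\"bertmap\", saved_path=\".\")  # writes ./bertmap.log\nlogger.info(\"Training started.\")\n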
"},{"location":"deeponto/utils/logging/#deeponto.utils.logging.banner_message","title":"banner_message(message, sym='^')
","text":"Print a banner message surrounded by special symbols.
Source code insrc/deeponto/utils/logging.py
def banner_message(message: str, sym=\"^\"):\n\"\"\"Print a banner message surrounded by special symbols.\"\"\"\n print()\n message = message.upper()\n banner_len = len(message) + 4\n message = \" \" * ((banner_len - len(message)) // 2) + message\n message = message + \" \" * (banner_len - len(message))\n print(message)\n print(sym * banner_len)\n print()\n
"},{"location":"deeponto/utils/text_utils/","title":"Text Utilities","text":""},{"location":"deeponto/utils/text_utils/#deeponto.utils.text_utils.Tokenizer","title":"Tokenizer(tokenizer_type)
","text":"A Tokenizer class for both sub-word (pre-trained) and word (rule-based) level tokenization.
Source code insrc/deeponto/utils/text_utils.py
def __init__(self, tokenizer_type: str):\n self.type = tokenizer_type\n self._tokenizer = None # hidden tokenizer\n self.tokenize = None # the tokenization method\n
"},{"location":"deeponto/utils/text_utils/#deeponto.utils.text_utils.Tokenizer.from_pretrained","title":"from_pretrained(pretrained_path='bert-base-uncased')
classmethod
","text":"(Based on transformers) Load a sub-word level tokenizer from pre-trained model.
Source code insrc/deeponto/utils/text_utils.py
@classmethod\ndef from_pretrained(cls, pretrained_path: str = \"bert-base-uncased\"):\n\"\"\"(Based on **transformers**) Load a sub-word level tokenizer from pre-trained model.\"\"\"\n instance = cls(\"pre-trained\")\n instance._tokenizer = AutoTokenizer.from_pretrained(pretrained_path)\n instance.tokenize = instance._tokenizer.tokenize\n return instance\n
"},{"location":"deeponto/utils/text_utils/#deeponto.utils.text_utils.Tokenizer.from_rule_based","title":"from_rule_based()
classmethod
","text":"(Based on spacy) Load a word-level (rule-based) tokenizer.
Source code insrc/deeponto/utils/text_utils.py
@classmethod\ndef from_rule_based(cls):\n\"\"\"(Based on **spacy**) Load a word-level (rule-based) tokenizer.\"\"\"\n spacy.prefer_gpu()\n instance = cls(\"rule-based\")\n instance._tokenizer = English()\n instance.tokenize = lambda texts: [word.text for word in instance._tokenizer(texts).doc]\n return instance\n
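A usage sketch of both constructors (the input text is arbitrary):
from deeponto.utils.text_utils import Tokenizer\n\nsub_word_tokenizer = Tokenizer.from_pretrained(\"bert-base-uncased\")  # sub-word level\nsub_word_tokenizer.tokenize(\"ontology matching\")\n\nword_tokenizer = Tokenizer.from_rule_based()  # word level (rule-based)\nword_tokenizer.tokenize(\"ontology matching\")\n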
"},{"location":"deeponto/utils/text_utils/#deeponto.utils.text_utils.InvertedIndex","title":"InvertedIndex(index, tokenizer)
","text":"Inverted index built from a text index.
Attributes:
Name Type Descriptiontokenizer
Tokenizer
A tokenizer instance to be used.
original_index
defaultdict
A dictionary where the values are text strings to be tokenized.
constructed_index
defaultdict
A dictionary that acts as the inverted index of original_index
.
src/deeponto/utils/text_utils.py
def __init__(self, index: defaultdict, tokenizer: Tokenizer):\n self.tokenizer = tokenizer\n self.original_index = index\n self.constructed_index = defaultdict(list)\n for k, v in self.original_index.items():\n # value is a list of strings\n for token in self.tokenizer(v):\n self.constructed_index[token].append(k)\n
"},{"location":"deeponto/utils/text_utils/#deeponto.utils.text_utils.InvertedIndex.idf_select","title":"idf_select(texts, pool_size=200)
","text":"Given a list of tokens, select a set candidates based on the inverted document frequency (idf) scores.
We use idf
instead of tf
because labels have different lengths and thus tf is not a fair measure.
src/deeponto/utils/text_utils.py
def idf_select(self, texts: Union[str, List[str]], pool_size: int = 200):\n\"\"\"Given a list of tokens, select a set candidates based on the inverted document frequency (idf) scores.\n\n We use `idf` instead of `tf` because labels have different lengths and thus tf is not a fair measure.\n \"\"\"\n candidate_pool = defaultdict(lambda: 0)\n # D := number of \"documents\", i.e., number of \"keys\" in the original index\n D = len(self.original_index)\n for token in self.tokenizer(texts):\n # each token is associated with some classes\n potential_candidates = self.constructed_index[token]\n if not potential_candidates:\n continue\n # We use idf instead of tf because the text for each class is of different length, tf is not a fair measure\n # inverse document frequency: with more classes to have the current token tk, the score decreases\n idf = math.log10(D / len(potential_candidates))\n for candidate in potential_candidates:\n # each candidate class is scored by sum(idf)\n candidate_pool[candidate] += idf\n candidate_pool = list(sorted(candidate_pool.items(), key=lambda item: item[1], reverse=True))\n # print(f\"Select {min(len(candidate_pool), pool_size)} candidates.\")\n # select the first K ranked\n return candidate_pool[:pool_size]\n
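A sketch with a hypothetical two-class annotation index (the IRIs and labels are made up for illustration):
from collections import defaultdict\nfrom deeponto.utils.text_utils import InvertedIndex, Tokenizer\n\nannotation_index = defaultdict(list)\nannotation_index[\"http://example.org/ClassA\"] = [\"heart disease\"]\nannotation_index[\"http://example.org/ClassB\"] = [\"lung disease\"]\n\ninverted_index = InvertedIndex(annotation_index, Tokenizer.from_rule_based())\ninverted_index.idf_select(\"disease of the heart\", pool_size=10)  # ranked (class, idf score) pairs\n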
"},{"location":"deeponto/utils/text_utils/#deeponto.utils.text_utils.process_annotation_literal","title":"process_annotation_literal(annotation_literal, apply_lowercasing=False, normalise_identifiers=False)
","text":"Pre-process an annotation literal string.
Parameters:
Name Type Description Defaultannotation_literal
str
A literal string of an entity's annotation.
requiredapply_lowercasing
bool
A boolean that determines lowercasing or not. Defaults to False
.
False
normalise_identifiers
bool
Whether to normalise annotation text that is in the Java identifier format. Defaults to False
.
False
Returns:
Type Descriptionstr
the processed annotation literal string.
Source code insrc/deeponto/utils/text_utils.py
def process_annotation_literal(\n annotation_literal: str, apply_lowercasing: bool = False, normalise_identifiers: bool = False\n):\n\"\"\"Pre-process an annotation literal string.\n\n Args:\n annotation_literal (str): A literal string of an entity's annotation.\n apply_lowercasing (bool): A boolean that determines lowercasing or not. Defaults to `False`.\n normalise_identifiers (bool): Whether to normalise annotation text that is in the Java identifier format. Defaults to `False`.\n\n Returns:\n (str): the processed annotation literal string.\n \"\"\"\n\n # replace the underscores with spaces\n annotation_literal = annotation_literal.replace(\"_\", \" \")\n\n # if the annotation literal is a valid identifier with first letter capitalised\n # we suspect that it could be a Java style identifier that needs to be split\n if normalise_identifiers and annotation_literal[0].isupper() and annotation_literal.isidentifier():\n annotation_literal = split_java_identifier(annotation_literal)\n\n # lowercase the annotation literal if specfied\n if apply_lowercasing:\n annotation_literal = annotation_literal.lower()\n\n return annotation_literal\n
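For example, combining both options on a Java-style identifier:
from deeponto.utils.text_utils import process_annotation_literal\n\nprocess_annotation_literal(\"APIReference\", apply_lowercasing=True, normalise_identifiers=True)\n# 'api reference'\n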
"},{"location":"deeponto/utils/text_utils/#deeponto.utils.text_utils.split_java_identifier","title":"split_java_identifier(java_style_identifier)
","text":"Split words in java's identifier style into natural language phrase.
Examples:
\"SuperNaturalPower\"
\\(\\rightarrow\\) \"Super Natural Power\"
\"APIReference\"
\\(\\rightarrow\\) \"API Reference\"
\"Covid19\"
\\(\\rightarrow\\) \"Covid 19\"
src/deeponto/utils/text_utils.py
def split_java_identifier(java_style_identifier: str):\nr\"\"\"Split words in java's identifier style into natural language phrase.\n\n Examples:\n - `\"SuperNaturalPower\"` $\\rightarrow$ `\"Super Natural Power\"`\n - `\"APIReference\"` $\\rightarrow$ `\"API Reference\"`\n - `\"Covid19\"` $\\rightarrow$ `\"Covid 19\"`\n \"\"\"\n # split at every capital letter or number (numbers are treated as capital letters)\n raw_words = re.findall(\"([0-9A-Z][a-z]*)\", java_style_identifier)\n words = []\n capitalized_word = \"\"\n for i, w in enumerate(raw_words):\n # the above regex pattern will split at capitals\n # so the capitalized words are split into characters\n # i.e., (len(w) == 1)\n if len(w) == 1:\n capitalized_word += w\n # edge case for the last word\n if i == len(raw_words) - 1:\n words.append(capitalized_word)\n\n # if the the current w is a full word, save the previous\n # cached capitalized_word and also save current full word\n elif capitalized_word:\n words.append(capitalized_word)\n words.append(w)\n capitalized_word = \"\"\n\n # just save the current full word otherwise\n else:\n words.append(w)\n\n return \" \".join(words)\n
"}]}
\ No newline at end of file
+{"config":{"lang":["en"],"separator":"[\\s\\-]+","pipeline":["stopWordFilter"]},"docs":[{"location":"","title":"DeepOnto","text":"A package for ontology engineering with deep learning.
News
deeponto.onto.taxonomy
; add the structural reasoner type. (v0.8.8)deeponto.align.oaei
for scripts at the sub-repository OAEI-Bio-ML as well as bug fixing. (v0.8.4)deeponto.onto.OntologyNormaliser
and deeponto.onto.OntologyProjector
(v0.8.0).deeponto.subs.bertsubs
and deeponto.onto.pruning
modules (v0.7.0).deeponto.probe.ontolama
and deeponto.onto.verbalisation
modules (v0.6.0). Check the complete changelog and FAQs. The FAQs page does not contain much information now but will be updated according to feedback.
"},{"location":"#about","title":"About","text":"\\(\\textsf{DeepOnto}\\) aims to provide building blocks for implementing deep learning models, constructing resources, and conducting evaluation for various ontology engineering purposes.
\\(\\textsf{DeepOnto}\\) relies on OWLAPI version 4 (written in Java) for ontologies.
We follow what has been implemented in mOWL that uses JPype to bridge Python and Java Virtual Machine (JVM). Please check JPype's installation page for successful JVM initialisation.
"},{"location":"#pytorch","title":"Pytorch","text":"\\(\\textsf{DeepOnto}\\) relies on Pytorch for deep learning framework.
We recommend installing Pytorch prior to installing DeepOnto following the commands listed on the Pytorch webpage. Notice that users can choose either GPU (with CUDA) or CPU version of Pytorch.
In case the most recent Pytorch version causes any incompatibility issues, use the following command (with CUDA 11.6
) known to work:
pip3 install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu116\n
Basic usage of DeepOnto does not rely on GPUs, but for efficient deep learning model training, please make sure torch.cuda.is_available()
returns True
.
Other dependencies are specified in setup.cfg
and requirements.txt
which are supposed to be installed along with deeponto
.
# requiring Python>=3.8\npip install deeponto\n
"},{"location":"#install-from-git-repository","title":"Install from Git Repository","text":"To install the latest, probably unreleased version of deeponto, you can directly install from the repository.
pip install git+https://github.com/KRR-Oxford/DeepOnto.git\n
"},{"location":"#main-features","title":"Main Features","text":"
Figure: Illustration of DeepOnto's architecture.
"},{"location":"#ontology-processing","title":"Ontology Processing","text":"The base class of \\(\\textsf{DeepOnto}\\) is Ontology
, which serves as the main entry point for introducing the OWLAPI's features, such as accessing ontology entities, querying for ancestor/descendent (and parent/child) concepts, deleting entities, modifying axioms, and retrieving annotations. See quick usage at load an ontology. Along with these basic functionalities, several essential sub-modules are built to enhance the core module, including the following:
Ontology Reasoning (OntologyReasoner
): Each instance of \\(\\textsf{DeepOnto}\\) has a reasoner as its attribute. It is used for conducting reasoning activities, such as obtaining inferred subsumers and subsumees, as well as checking entailment and consistency.
Ontology Pruning (OntologyPruner
): This sub-module aims to incorporate pruning algorithms for extracting a sub-ontology from an input ontology. We currently implement the one proposed in [2], which introduces subsumption axioms between the asserted (atomic or complex) parents and children of the class targeted for removal.
Ontology Verbalisation (OntologyVerbaliser
): The recursive concept verbaliser proposed in [4] is implemented here, which can automatically transform a complex logical expression into a textual sentence based on entity names or labels available in the ontology. See verbalising ontology concepts.
Ontology Projection (OntologyProjector
): The projection algorithm adopted in the OWL2Vec* ontology embeddings is implemented here, which is to transform an ontology's TBox into a set of RDF triples. The relevant code is modified from the mOWL library.
Ontology Normalisation (OntologyNormaliser
): The implemented \\(\\mathcal{EL}\\) normalisation is also modified from the mOWL library, which is used to transform TBox axioms into normalised forms to support, e.g., geometric ontology embeddings.
Ontology Taxonomy (OntologyTaxonomy
): The taxonomy extracted from an ontology is a directed acyclic graph for the subsumption hierarchy, which is often used to support graph-based deep learning applications.
Individual tools and resources are implemented based on the core ontology processing module. Currently, \\(\\textsf{DeepOnto}\\) supports the following:
BERTMap [1] is a BERT-based ontology matching (OM) system originally developed in repo but is now maintained in \\(\\textsf{DeepOnto}\\). See Ontology Matching with BERTMap & BERTMapLt.
Bio-ML [2] is an OM resource that has been used in the Bio-ML track of the OAEI. See Bio-ML: A Comprehensive Documentation.
BERTSubs [3] is a system for ontology subsumption prediction. We have transformed its original experimental code into this project. See Subsumption Inference with BERTSubs.
OntoLAMA [4] is an evaluation of language model probing datasets for ontology subsumption inference. See OntoLAMA: Dataset Overview & Usage Guide for the use of the datasets and the prompt-based probing approach.
License
Copyright 2021-2023 Yuan He. Copyright 2023 Yuan He, Jiaoyan Chen. All rights reserved.
Licensed under the Apache License, Version 2.0 (the \"License\"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
"},{"location":"#citation","title":"Citation","text":"The preprint of our system paper for \\(\\textsf{DeepOnto}\\) is currently available at arxiv.
Yuan He, Jiaoyan Chen, Hang Dong, Ian Horrocks, Carlo Allocca, Taehun Kim, and Brahmananda Sapkota. DeepOnto: A Python Package for Ontology Engineering with Deep Learning. arXiv preprint arXiv:2307.03067 (2023).
@article{he2023deeponto,\n title={DeepOnto: A Python Package for Ontology Engineering with Deep Learning},\n author={He, Yuan and Chen, Jiaoyan and Dong, Hang and Horrocks, Ian and Allocca, Carlo and Kim, Taehun and Sapkota, Brahmananda},\n journal={arXiv preprint arXiv:2307.03067},\n year={2023}\n}\n
"},{"location":"#relevant-publications","title":"Relevant Publications","text":"Please report any bugs or queries by raising a GitHub issue or sending emails to the maintainers (Yuan He or Jiaoyan Chen) through:
first_name.last_name@cs.ox.ac.uk
"},{"location":"bertmap/","title":"Ontology Matching with BERTMap and BERTMapLt","text":"Paper
Paper for BERTMap: BERTMap: A BERT-based Ontology Alignment System (AAAI-2022).
@inproceedings{he2022bertmap,\n title={BERTMap: a BERT-based ontology alignment system},\n author={He, Yuan and Chen, Jiaoyan and Antonyrajah, Denvar and Horrocks, Ian},\n booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},\n volume={36},\n number={5},\n pages={5684--5691},\n year={2022}\n}\n
This page gives the tutorial for \\(\\textsf{BERTMap}\\) family including the summary of the models and how to use them.
Figure 1. Pipeline illustration of BERTMap.
The ontology matching (OM) pipeline of \\(\\textsf{BERTMap}\\) consists of following steps:
(src_annotation, tgt_annotation, synonym_label)
into training and validation sets.Predict mappings for each class \\(c\\) of the source ontology \\(\\mathcal{O}\\) by:
Extend the raw predictions using an iterative algorithm based on the locality principle. To be specific, if \\(c\\) and \\(c'\\) are matched with a relatively high mapping score (\\(\\geq \\kappa\\)), then search for plausible mappings between the parents (resp. children) of \\(c\\) and the parents (resp. children) of \\(c'\\). This process is iterative because there would be new highly scored mappings at each round. Terminate mapping extension when there is no new mapping with score \\(\\geq \\kappa\\) found or it exceeds the maximum number of iterations. Note that \\(\\kappa\\) is set to \\(0.9\\) by default, as in the original paper.
Truncate the extended mappings by preserving only those with scores \\(\\geq \\lambda\\). In the original paper, \\(\\lambda\\) is supposed to be tuned on validation mappings \u2013 which are often not available. Also, \\(\\lambda\\) is not a sensitive hyperparameter in practice. Therefore, we manually set \\(\\lambda\\) to \\(0.9995\\) as a default value which usually yields a higher F1 score. Note that both \\(\\kappa\\) and \\(\\lambda\\) are made available in the configuration file.
Repair the rest of the mappings with the repair module built in LogMap (BERTMap does not focus on mapping repair). In short, a minimum set of inconsistent mappings will be removed (further improve precision).
Steps 5-8 are referred to as the global matching process which computes OM mappings from two input ontologies. \\(\\textsf{BERTMapLt}\\) is the light-weight version without BERT training and mapping refinement. The mapping filtering threshold for \\(\\textsf{BERTMapLt}\\) is \\(1.0\\) (i.e., string-matched).
In addition to the traditional OM procedure, the scoring modules of \\(\\textsf{BERTMap}\\) and \\(\\textsf{BERTMapLt}\\) can be used to evaluate any class pair given their selected annotations. This is useful in ranking-based evaluation.
Warning
The \\(\\textsf{BERTMap}\\) family rely on sufficient class annotations for constructing training corpora of the BERT synonym classifier, especially under the unsupervised setting where there are no input mappings and/or external resources. It is very important to specify correct annotation properties in the configuration file.
"},{"location":"bertmap/#usage","title":"Usage","text":"To use \\(\\textsf{BERTMap}\\), a configuration file and two input ontologies to be matched should be imported.
from deeponto.onto import Ontology\nfrom deeponto.align.bertmap import BERTMapPipeline\n\nconfig_file = \"path_to_config.yaml\"\nsrc_onto_file = \"path_to_the_source_ontology.owl\" \ntgt_onto_file = \"path_to_the_target_ontology.owl\" \n\nconfig = BERTMapPipeline.load_bertmap_config(config_file)\nsrc_onto = Ontology(src_onto_file)\ntgt_onto = Ontology(tgt_onto_file)\n\nBERTMapPipeline(src_onto, tgt_onto, config)\n
The default configuration file can be loaded as:
from deeponto.align.bertmap import BERTMapPipeline, DEFAULT_CONFIG_FILE\n\nconfig = BERTMapPipeline.load_bertmap_config(DEFAULT_CONFIG_FILE)\n
The loaded configuration is a CfgNode
object supporting attribute access of dictionary keys. To customise the configuration, users can either copy the DEFAULT_CONFIG_FILE
, save it locally using BERTMapPipeline.save_bertmap_config
method, and modify it accordingly; or change it in the run time.
from deeponto.align.bertmap import BERTMapPipeline, DEFAULT_CONFIG_FILE\n\nconfig = BERTMapPipeline.load_bertmap_config(DEFAULT_CONFIG_FILE)\n\n# save the configuration file\nBERTMapPipeline.save_bertmap_config(config, \"path_to_saved_config.yaml\")\n\n# modify it in the run time\n# for example, add more annotation properties for synonyms\nconfig.annotation_property_iris.append(\"http://...\") \n
If using \\(\\textsf{BERTMap}\\) for scoring class pairs instead of global matching, disable automatic global matching and load class pairs to be scored.
from deeponto.onto import Ontology\nfrom deeponto.align.bertmap import BERTMapPipeline\n\nconfig_file = \"path_to_config.yaml\"\nsrc_onto_file = \"path_to_the_source_ontology.owl\" \ntgt_onto_file = \"path_to_the_target_ontology.owl\" \n\nconfig = BERTMapPipeline.load_bertmap_config(config_file)\nconfig.global_matching.enabled = False\nsrc_onto = Ontology(src_onto_file)\ntgt_onto = Ontology(tgt_onto_file)\n\nbertmap = BERTMapPipeline(src_onto, tgt_onto, config)\n\nclass_pairs_to_be_scored = [...] # (src_class_iri, tgt_class_iri)\nfor src_class_iri, tgt_class_iri in class_pairs_to_be_scored:\n # retrieve class annotations\n src_class_annotations = bertmap.src_annotation_index[src_class_iri]\n tgt_class_annotations = bertmap.tgt_annotation_index[tgt_class_iri]\n # the bertmap score\n bertmap_score = bertmap.mapping_predictor.bert_mapping_score(\n src_class_annotations, tgt_class_annotations\n )\n # the bertmaplt score\n bertmaplt_score = bertmap.mapping_predictor.edit_similarity_mapping_score(\n src_class_annotations, tgt_class_annotations\n )\n ...\n
Tip
The implemented \\(\\textsf{BERTMap}\\) by default searches for each source ontology class a set of possible matched target ontology classes. Because of this, it is recommended to set the source ontology as the one with a smaller number of classes for efficiency.
Note that in the original paper, the model is expected to match for both directions src2tgt
and tgt2src
, and also consider the combination of both results. However, this does not usually bring better performance and consumes significantly more time. Therefore, this feature is discarded and the users can choose which direction to match.
Warning
Occasionally, the fine-tuning loss may not be converging and the validation accuracy is not improving; in that case, set to a different random seed can usually fix the problem.
"},{"location":"bertmap/#configuration","title":"Configuration","text":"The default configuration file looks like:
model: bertmap # bertmap or bertmaplt\n\noutput_path: null # if not provided, the current path \".\" is used\n\nannotation_property_iris:\n- http://www.w3.org/2000/01/rdf-schema#label # rdfs:label\n- http://www.geneontology.org/formats/oboInOwl#hasSynonym\n- http://www.geneontology.org/formats/oboInOwl#hasExactSynonym\n- http://www.w3.org/2004/02/skos/core#exactMatch\n- http://www.ebi.ac.uk/efo/alternative_term\n- http://www.orpha.net/ORDO/Orphanet_#symbol\n- http://purl.org/sig/ont/fma/synonym\n- http://www.w3.org/2004/02/skos/core#prefLabel\n- http://www.w3.org/2004/02/skos/core#altLabel\n- http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#P108\n- http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#P90\n\n# additional corpora \nknown_mappings: null # cross-ontology corpus\nauxiliary_ontos: [] # auxiliary corpus\n\n# bert config\nbert: pretrained_path: emilyalsentzer/Bio_ClinicalBERT max_length_for_input: 128 num_epochs_for_training: 3.0\nbatch_size_for_training: 32\nbatch_size_for_prediction: 128\nresume_training: null\n\n# global matching config\nglobal_matching:\nenabled: true\nnum_raw_candidates: 200 num_best_predictions: 10 mapping_extension_threshold: 0.9 mapping_filtered_threshold: 0.9995 for_oaei: false\n
"},{"location":"bertmap/#bertmap-or-bertmaplt","title":"BERTMap or BERTMapLt","text":"config.model
By changing this parameter to bertmap
or bertmaplt
, users can switch between \\(\\textsf{BERTMap}\\) and \\(\\textsf{BERTMapLt}\\). Note that \\(\\textsf{BERTMapLt}\\) does not use any training and mapping refinement parameters."},{"location":"bertmap/#annotation-properties","title":"Annotation Properties","text":"config.annotation_property_iris
The IRIs stored in this parameter refer to annotation properties with literal values that define the synonyms of an ontology class. Many ontology matching systems rely on synonyms for good performance, including the \\(\\textsf{BERTMap}\\) family. The default config.annotation_property_iris
are in line with the Bio-ML dataset, which will be constantly updated. Users can append or delete IRIs for specific input ontologies. Note that it is safe to specify all possible annotation properties regardless of input ontologies because the ones that are not used will be ignored.
"},{"location":"bertmap/#additional-training-data","title":"Additional Training Data","text":"The text semantics corpora by default (unsupervised setting) will consist of two intra-ontology sub-corpora built from two input ontologies (based on the specified annotation properties). To add more training data, users can opt to feed input mappings (cross-ontology sub-corpus) and/or a list of auxiliary ontologies (auxiliary sub-corpora).
config.known_mappings
Specify the path to input mapping file here; the input mapping file should be a .tsv
or .csv
file with three columns with headings: [\"SrcEntity\", \"TgtEntity\", \"Score\"]
. Each row corresponds to a triple \\((c, c', s(c, c'))\\) where \\(c\\) is a source ontology class, \\(c'\\) is a target ontology class, and \\(s(c, c')\\) is the matching score. Note that in the BERTMap context, input mapppings are assumed to be gold standard (reference) mappings with scores equal to \\(1.0\\). Regardless of scores specified in the mapping file, the scores of the input mapppings will be adjusted to \\(1.0\\) automatically. config.auxiliary_ontos
Specify a list of paths to auxiliary ontology files here. For each auxiliary ontology, a corresponding intra-ontology corpus will be created and thus produce more synonym and non-synonym samples."},{"location":"bertmap/#bert-settings","title":"BERT Settings","text":"config.bert.pretrained_path
\\(\\textsf{BERTMap}\\) uses the pre-trained Bio-Clincal BERT as specified in this parameter because it was originally applied on biomedical ontologies. For general purpose ontology matching, users can use pre-trained variants such as bert-base-uncased
. config.bert.batch_size_for_training
Batch size for BERT fine-tuning. config.bert.batch_size_for_prediction
Batch size for BERT validation and mapping prediction. Adjust these two parameters if users found an inappropriate GPU memory fit.
config.bert.resume_training
Set to true
if the BERT training process is somehow interrupted and users wish to continue training."},{"location":"bertmap/#global-matching-settings","title":"Global Matching Settings","text":"config.global_matching.enabled
As mentioned in usage, users can disable automatic global matching by setting this parameter to false
if they wish to use the mapping scoring module only. config.global_matching.num_raw_candidates
Set the number of raw candidates selected in the mapping prediction phase. config.global_matching.num_best_predictions
Set the number of best scored mappings preserved in the mapping prediction phase. The default value 10
is often more than enough. config.global_matching.mapping_extension_threshold
Set the score threshold of mappings used in the iterative mapping extension process. A higher value shortens the extension time but may reduce recall. config.global_matching.mapping_filtered_threshold
The score threshold of mappings preserved for final mapping refinement. config.global_matching.for_oaei
Set to false
for normal use and set to true
for the OAEI 2023 Bio-ML Track such that entities that are annotated as not used in alignment will be ignored during global matching."},{"location":"bertmap/#output-format","title":"Output Format","text":"Running \\(\\textsf{BERTMap}\\) will create a directory named bertmap
or bertmaplt
in the specified output path. The file structure of this directory is as follows:
bertmap\n\u251c\u2500\u2500 data\n\u2502 \u251c\u2500\u2500 fine-tune.data.json\n\u2502 \u2514\u2500\u2500 text-semantics.corpora.json\n\u251c\u2500\u2500 bert\n\u2502 \u251c\u2500\u2500 tensorboard\n\u2502 \u251c\u2500\u2500 checkpoint-{some_number}\n\u2502 \u2514\u2500\u2500 checkpoint-{some_number}\n\u251c\u2500\u2500 match\n\u2502 \u251c\u2500\u2500 logmap-repair\n\u2502 \u251c\u2500\u2500 raw_mappings.json\n\u2502 \u251c\u2500\u2500 repaired_mappings.tsv \n\u2502 \u251c\u2500\u2500 raw_mappings.tsv\n\u2502 \u251c\u2500\u2500 extended_mappings.tsv\n\u2502 \u2514\u2500\u2500 filtered_mappings.tsv\n\u251c\u2500\u2500 bertmap.log\n\u2514\u2500\u2500 config.yaml\n
It is worth mentioning that the match
sub-directory contains all the global matching files:
raw_mappings.tsv
The raw mapping predictions before mapping refinement. The .json
one is used internally to guard against losing results upon accidental interruption. Note that bertmaplt
only produces raw mapping predictions (no mapping refinement). extended_mappings.tsv
The output mappings after applying mapping extension. filtered_mappings.tsv
The output mappings after mapping extension and threshold filtering. logmap-repair
A folder containing intermediate files needed for applying LogMap's debugger. repaired_mappings.tsv
The final output mappings after mapping repair."},{"location":"bertsubs/","title":"Subsumption Prediction with BERTSubs","text":"Paper
Paper for BERTSubs: Contextual Semantic Embeddings for Ontology Subsumption Prediction (accepted by WWW Journal in 2023).
@article{chen2023contextual,\n title={Contextual semantic embeddings for ontology subsumption prediction},\n author={Chen, Jiaoyan and He, Yuan and Geng, Yuxia and Jim{\\'e}nez-Ruiz, Ernesto and Dong, Hang and Horrocks, Ian},\n journal={World Wide Web},\n pages={1--23},\n year={2023},\n publisher={Springer}\n}\n
This page gives a tutorial for \(\textsf{BERTSubs}\), including its functions, a summary of the models, and usage instructions.
The current version of \\(\\textsf{BERTSubs}\\) is able to predict:
Figure 1. Pipeline illustration of BERTSubs.
The pipeline of \(\textsf{BERTSubs}\) consists of the following steps.
Corpus Construction: extracting a set of sentence pairs from positive and negative subsumptions from the target ontology (or ontologies), with one of the following three templates used for transforming each class into a sentence,
Model Fine-tuning: fine-tuning a language model such as BERT with the above sentence pairs.
Note that subsumptions optionally provided via a train subsumption file can also be used for fine-tuning. Please see the paper for more technical details.
"},{"location":"bertsubs/#evaluation-case-and-dataset-ontology-completion","title":"Evaluation Case and Dataset (Ontology Completion)","text":"The evaluation is implemented scripts/bertsubs_intra_evaluate.py. Download an ontology (e.g., FoodOn) and run:
python bertsubs_intra_evaluate.py --onto_file ./foodon-merged.0.4.8.owl\n
The parameter --subsumption_type can be set to 'restriction' for complex class subsumptions, and 'named_class' for named class subsumptions. Please see the script for more parameters and their meanings.
It executes the following procedure:
The named class or complex class subsumption axioms of an ontology are partitioned into a train set, a valid set and a test set. They are saved as train, valid and test files, respectively.
The test and the valid subsumption axioms are removed from the original ontology, and a new ontology is saved.
Notice: for a named class test/valid subsumption, a set of negative candidate super classes are extracted from the ground truth super class's neighbourhood. For a complex class test/valid subsumption, a set of negative candidate super classes are randomly extracted from all the complex classes in the ontology.
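For example, to build the complex class (restriction) subsumption data instead of the default named class data, the script could be run as follows (a sketch using only the parameter mentioned above):
python bertsubs_intra_evaluate.py --onto_file ./foodon-merged.0.4.8.owl --subsumption_type restriction\n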
"},{"location":"bertsubs/#usage","title":"Usage","text":"To run \\(\\textsf{BERTSubs}\\), a configuration file and one input ontology (or two ontologies) are mandatory. If candidate class pairs are given, a fine-tuned language model and a file with predicted scores of the candidate class pairs in the test file are output; otherwise, only the fine-grained language model is output. The test metrics (MRR and Hits@K) can also be output if the ground truth and a set of negative candidate super classes are given for the subclass of each valid/test subsumption.
The following code is for intra-ontology subsumption.
from yacs.config import CfgNode\nfrom deeponto.complete.bertsubs import BERTSubsIntraPipeline, DEFAULT_CONFIG_FILE_INTRA\nfrom deeponto.utils import load_file\nfrom deeponto.onto import Ontology\n\nconfig = CfgNode(load_file(DEFAULT_CONFIG_FILE_INTRA)) # Load default configuration file\nconfig.onto_file = './foodon.owl'\nconfig.train_subsumption_file = './train_subsumptions.csv' # optional\nconfig.valid_subsumption_file = './valid_subsumptions.csv' # optional\nconfig.test_subsumption_file = './test_subsumptions.csv' #optional\nconfig.test_type = 'evaluation' #'evaluation': calculate metrics with ground truths given in the test_subsumption_file; 'prediction': predict scores for candidate subsumptions given in test_submission_file\nconfig.subsumption_type = 'named_class' # 'named_class' or 'restriction' \nconfig.prompt.prompt_type = 'isolated' # 'isolated', 'traversal', 'path' (three templates)\n\nonto = Ontology(owl_path=config.onto_file)\nintra_pipeline = BERTSubsIntraPipeline(onto=onto, config=config)\n
The following code is for inter-ontology subsumption.
from yacs.config import CfgNode\nfrom deeponto.complete.bertsubs import BERTSubsInterPipeline, DEFAULT_CONFIG_FILE_INTER\nfrom deeponto.utils import load_file\nfrom deeponto.onto import Ontology\n\nconfig = CfgNode(load_file(DEFAULT_CONFIG_FILE_INTER)) # Load default configuration file\nconfig.src_onto_file = './helis2foodon/helis_v1.00.owl'\nconfig.tgt_onto_file = './helis2foodon/foodon-merged.0.4.8.subs.owl'\nconfig.train_subsumption_file = './helis2foodon/train_subsumptions.csv' # optional\nconfig.valid_subsumption_file = './helis2foodon/valid_subsumptions.csv' # optional\nconfig.test_subsumption_file = './helis2foodon/test_subsumptions.csv' # optional\nconfig.test_type = 'evaluation' # 'evaluation', or 'prediction'\nconfig.subsumption_type = 'named_class' # 'named_class', or 'restriction'\nconfig.prompt.prompt_type = 'path' # 'isolated', 'traversal', 'path' (three templates)\n\nsrc_onto = Ontology(owl_path=config.src_onto_file)\ntgt_onto = Ontology(owl_path=config.tgt_onto_file)\ninter_pipeline = BERTSubsInterPipeline(src_onto=src_onto, tgt_onto=tgt_onto, config=config)\n
For more details on the configuration, please see the comments in the default configuration files default_config_intra.yaml and default_config_inter.yaml.
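To inspect all available options before editing them, one can simply load and print the default configuration (a minimal sketch using the same imports as above):
from yacs.config import CfgNode\nfrom deeponto.complete.bertsubs import DEFAULT_CONFIG_FILE_INTRA\nfrom deeponto.utils import load_file\n\n# load and print the default intra-ontology configuration to see all available options\nconfig = CfgNode(load_file(DEFAULT_CONFIG_FILE_INTRA))\nprint(config)\n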
"},{"location":"bio-ml/","title":"Bio-ML: A Comprehensive Documentation","text":"paper
Paper for Bio-ML: Machine Learning-Friendly Biomedical Datasets for Equivalence and Subsumption Ontology Matching (ISWC 2022). It was nominated as the best resource paper candidate at ISWC 2022.
@inproceedings{he2022machine,\n title={Machine Learning-Friendly Biomedical Datasets for Equivalence and Subsumption Ontology Matching},\n author={He, Yuan and Chen, Jiaoyan and Dong, Hang and Jim{\\'e}nez-Ruiz, Ernesto and Hadian, Ali and Horrocks, Ian},\n booktitle={The Semantic Web--ISWC 2022: 21st International Semantic Web Conference, Virtual Event, October 23--27, 2022, Proceedings},\n pages={575--591},\n year={2022},\n organization={Springer}\n}\n
"},{"location":"bio-ml/#overview","title":"Overview","text":"\\(\\textsf{Bio-ML}\\) is a comprehensive ontology matching (OM) dataset that includes five ontology pairs for both equivalence and subsumption ontology matching. Two of these pairs are based on the Mondo ontology, and the remaining three are based on the UMLS ontology. The construction of these datasets encompasses several steps:
Dataset Download (License: CC BY 4.0 International):
Complete Documentation: https://krr-oxford.github.io/DeepOnto/bio-ml/ (this page).
In order to derive scalable Ontology Matching (OM) pairs, the ontology pruning algorithm proposed in the \(\textsf{Bio-ML}\) paper can be utilised. This algorithm is designed to trim a large-scale ontology based on certain criteria, such as involvement in a reference mapping or association with a particular semantic type (see UMLS data scripts). The primary goal of the pruning function is to discard irrelevant ontology classes whilst preserving the relevant hierarchical structure.
More specifically, for each class, denoted as \\(c\\), that needs to be removed, subsumption axioms are created between the parent and child elements of \\(c\\). This step is followed by the removal of all axioms related to the unwanted classes.
Once a list of class IRIs to be removed has been compiled, the ontology pruning can be executed using the following code:
from deeponto.onto import Ontology, OntologyPruner\n\n# Load the DOID ontology\ndoid = Ontology(\"doid.owl\")\n\n# Initialise the ontology pruner\npruner = OntologyPruner(doid)\n\n# Specify the classes to be removed\nto_be_removed_class_iris = [\n \"http://purl.obolibrary.org/obo/DOID_0060158\",\n \"http://purl.obolibrary.org/obo/DOID_9969\"\n]\n\n# Perform the pruning operation\npruner.prune(to_be_removed_class_iris)\n\n# Save the pruned ontology locally\npruner.save_onto(\"doid.pruned.owl\") \n
"},{"location":"bio-ml/#subsumption-mapping-construction","title":"Subsumption Mapping Construction","text":"Ontology Matching (OM) datasets often include equivalence matching, but not subsumption matching. However, it is feasible to create a subsumption matching task from an equivalence matching task. Given a list of reference equivalence mappings, which take the form of \\({(c, c') | c \\equiv c' }\\), one can construct reference subsumption mappings by identifying the subsumers of \\(c'\\) and producing \\({(c, c'') | c \\equiv c', c' \\sqsubseteq c'' }\\). We have developed a subsumption mapping generator for this purpose.
from deeponto.onto import Ontology\nfrom deeponto.align.mapping import SubsFromEquivMappingGenerator, ReferenceMapping\n\n# Load the NCIT and DOID ontologies\nncit = Ontology(\"ncit.owl\")\ndoid = Ontology(\"doid.owl\")\n\n# Load the equivalence mappings\nncit2doid_equiv_mappings = ReferenceMapping.read_table_mappings(\"ncit2doid_equiv_mappings.tsv\") # The headings are [\"SrcEntity\", \"TgtEntity\", \"Score\"]\n\n# Initialise the subsumption mapping generator \n# and the mapping construction is automatically done\nsubs_generator = SubsFromEquivMappingGenerator(\n ncit, doid, ncit2doid_equiv_mappings, \n subs_generation_ratio=1, delete_used_equiv_tgt_class=True\n)\n
Output:
3299/4686 are used for creating at least one subsumption mapping.\n3305 subsumption mappings are created in the end.\n
Retrieve the generated subsumption mappings with:
subs_generator.subs_from_equivs\n
Output:
[('http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#C9311',\n 'http://purl.obolibrary.org/obo/DOID_120',\n 1.0),\n ('http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#C8410',\n 'http://purl.obolibrary.org/obo/DOID_1612',\n 1.0), ...]\n
See a concrete data script for this process at OAEI-Bio-ML/data_scripts/generate_subs_maps.py
.
The subs_generation_ratio
parameter determines at most how many subsumption mappings can be generated from an equivalence mapping. The delete_used_equiv_tgt_class
determines whether to invalidate equivalence mappings that have been used for creating at least one subsumption mapping. If it is set to True
, then the target class of a used equivalence mapping will be marked as deleted from the target ontology. Then, apply ontology pruning to the list of to-be-deleted target ontology classes:
from deeponto.onto import OntologyPruner\n\npruner = OntologyPruner(doid)\npruner.prune(subs_generator.used_equiv_tgt_class_iris)\npruner.save_onto(\"doid.subs.owl\")\n
See a concrete data script for this process at OAEI-Bio-ML/data_scripts/generate_cand_maps.py
.
Note
In the OAEI 2023 version, the target class deletion is disabled as modularisation counteracts the effects of such deletion. For more details, refer to OAEI Bio-ML 2023.
"},{"location":"bio-ml/#candidate-mapping-generation","title":"Candidate Mapping Generation","text":"To evaluate an Ontology Matching (OM) model's capacity to identify correct mappings amid a pool of challenging negative candidates, we utilise the negative candidate mapping generation algorithm as proposed in the Bio-ML paper. This algorithm uses idf_sample
to generate candidates that are textually ambiguous (i.e., with similar naming), and neighbour_sample
to generate candidates that are structurally ambiguous (e.g., siblings). The algorithm ensures that none of the reference mappings are added as negative candidates. Additionally, for subsumption cases, the algorithm carefully excludes ancestors as they are technically correct subsumptions.
Use the following Python code to perform this operation:
from deeponto.onto import Ontology\nfrom deeponto.align.mapping import NegativeCandidateMappingGenerator, ReferenceMapping\nfrom deeponto.align.bertmap import BERTMapPipeline\nfrom deeponto.utils import Tokenizer  # sub-word tokenizer used for the idf sample\nimport pandas as pd\n\n# Load the NCIT and DOID ontologies\nncit = Ontology(\"ncit.owl\")\ndoid = Ontology(\"doid.owl\")\n\n# Load the equivalence mappings\nncit2doid_equiv_mappings = ReferenceMapping.read_table_mappings(\"ncit2doid_equiv_mappings.tsv\") # The headings are [\"SrcEntity\", \"TgtEntity\", \"Score\"]\n\n# Load default config in BERTMap\nconfig = BERTMapPipeline.load_bertmap_config()\n\n# Initialise the candidate mapping generator\ncand_generator = NegativeCandidateMappingGenerator(\n    ncit, doid, ncit2doid_equiv_mappings, \n    annotation_property_iris=config.annotation_property_iris, # Used for idf sample\n    tokenizer=Tokenizer.from_pretrained(config.bert.pretrained_path), # Used for idf sample\n    max_hops=5, # Used for neighbour sample\n    for_subsumptions=False, # Set to False because the input mappings in this example are equivalence mappings\n)\n\n# Sample negative candidate mappings for each reference equivalence mapping\nresults = []\nfor test_map in ncit2doid_equiv_mappings:\n    valid_tgts, stats = cand_generator.mixed_sample(test_map, idf=50, neighbour=50)\n    print(f\"STATS for {test_map}:\\n{stats}\")\n    results.append((test_map.head, test_map.tail, valid_tgts))\n\n# Save the sampled candidate mappings to a .tsv file (output path is a placeholder)\nresult_path = \"ncit2doid.test.cands.tsv\"\nresults = pd.DataFrame(results, columns=[\"SrcEntity\", \"TgtEntity\", \"TgtCandidates\"])\nresults.to_csv(result_path, sep=\"\\t\", index=False)\n
See a concrete data script for this process at OAEI-Bio-ML/data_scripts/generate_cand_maps.py
.
The process of sampling using idf scores was originally proposed in the BERTMap paper. The annotation_property_iris
parameter specifies the list of annotation properties used to extract the names or aliases of an ontology class. The tokenizer
parameter refers to a pre-trained sub-word level tokenizer used to build the inverted annotation index. These aspects are thoroughly explained in the BERTMap tutorial.
Our evaluation protocol concerns two scenarios for OM: global matching for overall assessment and local ranking for partial assessment.
"},{"location":"bio-ml/#global-matching","title":"Global Matching","text":"As an overall assessment, given a complete set of reference mappings, an OM system is expected to compute a set of true mappings and compare against the reference mappings using Precision, Recall, and F-score metrics. With \\(\\textsf{DeepOnto}\\), the evaluation can be performed using the following code.
Matching Result
Download an example of matching result file. The three columns, \"SrcEntity\"
, \"TgtEntity\"
, and \"Score\"
refer to the source class IRI, the target class IRI, and the matching score.
from deeponto.align.evaluation import AlignmentEvaluator\nfrom deeponto.align.mapping import ReferenceMapping, EntityMapping\n\n# load prediction mappings and reference mappings\npreds = EntityMapping.read_table_mappings(f\"{experiment_dir}/bertmap/match/repaired_mappings.tsv\")\nrefs = ReferenceMapping.read_table_mappings(f\"{data_dir}/refs_equiv/full.tsv\")\n\n# compute the precision, recall and F-score metrics\nresults = AlignmentEvaluator.f1(preds, refs)\nprint(results)\n
The associated formulas for Precision, Recall and F-score are:
\\[P = \\frac{|\\mathcal{M}_{pred} \\cap \\mathcal{M}_{ref}|}{|\\mathcal{M}_{pred}|}, \\ \\ R = \\frac{|\\mathcal{M}_{pred} \\cap \\mathcal{M}_{ref}|}{|\\mathcal{M}_{ref}|}, \\ \\ F_1 = \\frac{2 P R}{P + R}\\]where \\(\\mathcal{M}_{pred}\\) and \\(\\mathcal{M}_{ref}\\) denote the prediction mappings and reference mappings, respectively.
Output:
{'P': 0.887, 'R': 0.879, 'F1': 0.883}\n
For the semi-supervised setting where a small set of training mappings is provided, the training set should also be loaded and set as null (neither positive nor negative) with null_reference_mappings
during evaluation:
train_refs = ReferenceMapping.read_table_mappings(f\"{data_dir}/refs_equiv/train.tsv\")\nresults = AlignmentEvaluator.f1(preds, refs, null_reference_mappings=train_refs)\n
When null reference mappings are involved, the formulas of Precision and Recall become:
\\[P = \\frac{|(\\mathcal{M}_{pred} \\cap \\mathcal{M}_{ref}) - \\mathcal{M}_{null}|}{|\\mathcal{M}_{pred} - \\mathcal{M}_{null} |}, \\ \\ R = \\frac{|(\\mathcal{M}_{pred} \\cap \\mathcal{M}_{ref}) - \\mathcal{M}_{null}|}{|\\mathcal{M}_{ref} - \\mathcal{M}_{null}|}\\]As for the OAEI 2023 version, some prediction mappings could involve classes that are marked as not used in alignment. Therefore, we need to filter out those mappings before evaluation.
from deeponto.onto import Ontology\nfrom deeponto.align.oaei import *\n\n# load the source and target ontologies and \n# extract classes that are marked as not used in alignment\nsrc_onto = Ontology(\"src_onto_file\")\ntgt_onto = Ontology(\"tgt_onto_file\")\nignored_class_index = get_ignored_class_index(src_onto)\nignored_class_index.update(get_ignored_class_index(tgt_onto))\n\n# filter the prediction mappings\npreds = remove_ignored_mappings(preds, ignored_class_index)\n\n# then compute the results\nresults = AlignmentEvaluator.f1(preds, refs, ...)\n
Tip
We have encapsulated above features in the matching_eval
function in the OAEI utilities.
However,
Therefore, the ranking-based evaluation protocol is presented as follows.
"},{"location":"bio-ml/#local-ranking","title":"Local Ranking","text":"An OM system is also expected to distinguish the reference mapping among a set of candidate mappings and the performance can be reflected in Hits@K and MRR metrics.
Warning
The reference subsumption mappings are inherently incomplete, so only the ranking metrics are adopted in evaluating system performance in subsumption matching.
Ranking Result
Download an example of raw (unscored) candidate mapping file and an example of scored candidate mapping file. The \"SrcEntity\"
and \"TgtEntity\"
columns refer to the source class IRI and the target class IRI involved in a reference mapping. The \"TgtCandidates\"
column stores a sequence of tgt_cand_iri
in the unscored file and a list of tuples (tgt_cand_iri, score)
in the scored file, which can be accessed by the built-in Python function eval
.
With \\(\\textsf{DeepOnto}\\), the evaluation can be performed as follows. First, an OM system needs to assign a score to each target candidate class and save the results as a list of tuples (tgt_cand_class_iri, matching_score)
.
from deeponto.utils import read_table\nimport pandas as pd\n\ntest_candidate_mappings = read_table(\"test.cands.tsv\").values.tolist()\nranking_results = []\nfor src_ref_class, tgt_ref_class, tgt_cands in test_candidate_mappings:\n    tgt_cands = eval(tgt_cands) # transform string into list or sequence\n    scored_cands = []\n    for tgt_cand in tgt_cands:\n        # assign a score to each candidate with an OM system\n        ...\n        scored_cands.append((tgt_cand, matching_score))\n    ranking_results.append((src_ref_class, tgt_ref_class, scored_cands))\n# save the scored candidate mappings in the same format as the original `test.cands.tsv`\npd.DataFrame(ranking_results, columns=[\"SrcEntity\", \"TgtEntity\", \"TgtCandidates\"]).to_csv(\"scored.test.cands.tsv\", sep=\"\\t\", index=False)\n
Then, the ranking evaluation results can be obtained by:
from deeponto.align.oaei import *\n\n# If `has_score` is False, assume default ranking (see tips below)\nranking_eval(\"scored.test.cands.tsv\", has_score=True, Ks=[1, 5, 10])\n
Output:
{'MRR': 0.9586373098280843,\n 'Hits@1': 0.9371951219512196,\n 'Hits@5': 0.9820121951219513,\n 'Hits@10': 0.9878048780487805}\n
The associated formulas for MRR and Hits@K are:
\\[MRR = \\sum_i^N rank_i^{-1} / N, \\ \\ Hits@K = \\sum_i^N \\mathbb{I}_{rank_i \\leq k} / N\\]where \\(N\\) is the number of reference mappings used for testing, \\(rank_i\\) is the relative rank of the reference mapping among its candidate mappings.
Tip
If matching scores are not available, the target candidate classes should be sorted in descending order and saved in a list, the ranking_eval
function will compute scores according to the sorted list.
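For instance, if the candidate file stores only sorted (best-first) target candidates without scores, the same function can be called as follows (the file name is a placeholder):
from deeponto.align.oaei import *\n\n# candidates are assumed to be already sorted in descending order of preference\nranking_eval(\"sorted.test.cands.tsv\", has_score=False, Ks=[1, 5, 10])\n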
Below is a table showing the data statistics for the original Bio-ML used in OAEI 2022. In the Category column, \"Disease\" indicates that the data from Mondo mainly covers disease concepts, while \"Body\", \"Pharm\", and \"Neoplas\" denote semantic types of \"Body Part, Organ, or Organ Components\", \"Pharmacologic Substance\", and \"Neoplastic Process\" in UMLS, respectively.
Note that each subsumption matching task is constructed from an equivalence matching task subject to target ontology class deletion, therefore #TgtCls (subs)
differs from #TgtCls
.
| Source | Task | Category | #SrcCls | #TgtCls | #TgtCls(\(\sqsubseteq\)) | #Ref(\(\equiv\)) | #Ref(\(\sqsubseteq\)) |
|---|---|---|---|---|---|---|---|
| Mondo | OMIM-ORDO | Disease | 9,642 | 8,838 | 8,735 | 3,721 | 103 |
| Mondo | NCIT-DOID | Disease | 6,835 | 8,448 | 5,113 | 4,686 | 3,339 |
| UMLS | SNOMED-FMA | Body | 24,182 | 64,726 | 59,567 | 7,256 | 5,506 |
| UMLS | SNOMED-NCIT | Pharm | 16,045 | 15,250 | 12,462 | 5,803 | 4,225 |
| UMLS | SNOMED-NCIT | Neoplas | 11,271 | 13,956 | 13,790 | 3,804 | 213 |
The datasets, which can be downloaded from Zenodo, include Mondo.zip
and UMLS.zip
for resources constructed from Mondo and UMLS, respectively. Each .zip
file contains three folders: raw_data
, equiv_match
, and subs_match
, corresponding to the raw source ontologies, data for equivalence matching, and data for subsumption matching, respectively. The detailed file structure is illustrated in the figure below.
"},{"location":"bio-ml/#oaei-bio-ml-2023","title":"OAEI Bio-ML 2023","text":"
For the OAEI 2023 version, we implemented several updates, including:
Locality Module Enrichment: In response to the loss of ontology context due to pruning, we used the locality module technique (access the code) to enrich pruned ontologies with logical modules that provide context for existing classes. To ensure the completeness of reference mappings, the new classes added are annotated as not used in alignment with the annotation property use_in_alignment
set to false
. While these supplemental classes can be used by OM systems as auxiliary information, they can be excluded from the alignment process. Even if they are considered in the final output mappings, our evaluation will ensure that they are excluded from the metric computation (see Evaluation Framework).
Simplified Task Settings: For each of the five OM pairs, we simplified the task settings to the following:
{task_name}/refs_equiv/full.tsv
is used for global matching evaluation.{task_name}/refs_equiv/test.tsv
is used for global matching evaluation.{task_name}/refs_equiv/test.cands.tsv
for local ranking evaluation.Subsumption Matching:
{task_name}/refs_subs/test.cands.tsv
. Bio-LLM: A Special Sub-Track for Large Language Models: We introduced a unique sub-track for Large Language Model (LLM)-based OM systems. We extracted small but challenging subsets from the NCIT-DOID and SNOMED-FMA (Body) datasets for this purpose (refer to OAEI Bio-LLM 2023).
The table below shows the data statistics for the OAEI 2023 version of Bio-ML, where the input ontologies are enriched with locality modules compared to the pruned versions used in OAEI 2022. The augmented structural and logical contexts make these ontologies more similar to their original versions without any processing (available at raw_data
). The changes compared to the previous version (see Bio-ML OAEI 2022) are reflected in the +
numbers of ontology classes.
In the Category column, \"Disease\" indicates that the Mondo data are mainly about disease concepts, while \"Body\", \"Pharm\", and \"Neoplas\" denote semantic types of \"Body Part, Organ, or Organ Components\", \"Pharmacologic Substance\", and \"Neoplastic Process\" in UMLS, respectively.
| Source | Task | Category | #SrcCls | #TgtCls | #Ref(\(\equiv\)) | #Ref(\(\sqsubseteq\)) |
|---|---|---|---|---|---|---|
| Mondo | OMIM-ORDO | Disease | 9,648 (+6) | 9,275 (+437) | 3,721 | 103 |
| Mondo | NCIT-DOID | Disease | 15,762 (+8,927) | 8,465 (+17) | 4,686 | 3,339 |
| UMLS | SNOMED-FMA | Body | 34,418 (+10,236) | 88,955 (+24,229) | 7,256 | 5,506 |
| UMLS | SNOMED-NCIT | Pharm | 29,500 (+13,455) | 22,136 (+6,886) | 5,803 | 4,225 |
| UMLS | SNOMED-NCIT | Neoplas | 22,971 (+11,700) | 20,247 (+6291) | 3,804 | 213 |
The file structure for the download datasets (from Zenodo) is also simplified this year to accommodate the changes. Detailed structure is presented in the following figure.
Remarks on this figure:
refs_equiv/full.tsv
in the unsupervised setting, and on refs_equiv/test.tsv
(with refs_equiv/train.tsv
set to null reference mappings) in the semi-supervised setting. Testing of the local ranking evaluation should be performed on refs_equiv/test.cands.tsv
for both settings.refs_equiv/test.cands.tsv
and the training mapping set refs_subs/train.tsv
is optional.test.cands.tsv
file in the Bio-LLM sub-track is different from the main Bio-LM track ones. See OAEI Bio-LLM 2023 for more information and how to evaluate on it.As Large Language Models (LLMs) are trending in the AI community, we formulate a special sub-track for evaluating LLM-based OM systems. However, evaluating LLMs with the current OM datasets can be time and resource intensive. To yield insightful results prior to full implementation, we leverage two challenging subsets extracted from the NCIT-DOID and the SNOMED-FMA (Body) equivalence matching datasets.
For each original dataset, we first randomly select 50 matched class pairs from ground truth mappings, but excluding pairs that can be aligned with direct string matching (i.e., having at least one shared label) to restrict the efficacy of conventional lexical matching. Next, with a fixed source ontology class, we further select 99 negative target ontology classes, thus forming a total of 100 candidate mappings (inclusive of the ground truth mapping). This selection is guided by the sub-word inverted index-based idf scores as in the BERTMap paper (see BERTMap tutorial for more details), which are capable of producing target ontology classes lexically akin to the fixed source class. We finally randomly choose 50 source classes that do not have a matched target class according to the ground truth mappings, and create 100 candidate mappings using the inverted index for each. Therefore, each subset comprises 50 source ontology classes with a match and 50 without. Each class is associated with 100 candidate mappings, culminating in a total extraction of 10,000, i.e., (50+50)*100, class pairs.
"},{"location":"bio-ml/#evaluation","title":"Evaluation","text":""},{"location":"bio-ml/#matching","title":"Matching","text":"From all the 10,000 class pairs in a given subset, the OM system is expected to predict the true mappings among them, which can be compared against the 50 available ground truth mappings using Precision, Recall, and F-score.
We use the same formulas in the main track evaluation framework to calculate Precision, Recall, and F-score. The prediction mappings \\(\\mathcal{M}_{pred}\\) are the class pairs an OM system predicts as true mappings, and the reference mappings \\(\\mathcal{M}_{ref}\\) refers to the 50 matched pairs.
\\[P = \\frac{|\\mathcal{M}_{pred} \\cap \\mathcal{M}_{ref}|}{|\\mathcal{M}_{pred}|}, \\ \\ R = \\frac{|\\mathcal{M}_{pred} \\cap \\mathcal{M}_{ref}|}{|\\mathcal{M}_{ref}|}, \\ \\ F_1 = \\frac{2 P R}{P + R}\\]"},{"location":"bio-ml/#ranking","title":"Ranking","text":"Given that each source class is associated with 100 candidate mappings, we can compute ranking-based metrics based on their scores. Specifically, we calculate:
\\(Hits@1\\) for the 50 matched source classes, counting a hit when the top-ranked candidate mapping is a ground truth mapping. The corresponding formula is:
\\[ Hits@K = \\sum_{(c, c') \\in \\mathcal{M}_{ref}} \\mathbb{I}_{rank_{c'} \\leq K} / |\\mathcal{M}_{ref}| \\]where \\(rank_{c'}\\) is the predicted relative rank of \\(c'\\) among its candidates, \\(\\mathbb{I}_{rank_{c'} \\leq K}\\) is a binary indicator function that outputs 1 if the rank is less than or equal to \\(K\\) and outputs 0 otherwise.
The \\(MRR\\) score is also computed for these matched source classes, summing the inverses of the ground truth mappings' relative ranks among candidate mappings. The corresponding formula is:
\\[ MRR = \\sum_{(c, c') \\in \\mathcal{M}_{ref}} rank_{c'}^{-1} / |\\mathcal{M}_{ref}| \\]For the 50 unmatched source classes, we compute the rejection rate (denoted as \\(RR\\)), counting a successful rejection when all the candidate mappings are predicted as false mappings. We assign each unmatched source class with a null class \\(c_{null}\\), which refers to any target class that does not have a match with the source class, and denote this set of unreferenced mappings as \\(\\mathcal{M}_{unref}\\).
\\[ RR = \\sum_{(c, c_{null}) \\in \\mathcal{M}_{unref}} \\prod_{d \\in \\mathcal{T}_c} (1 - \\mathbb{I}_{c \\equiv d}) / |\\mathcal{M}_{unref}| \\]where \\(\\mathcal{T}_c\\) is the set of target candidate classes for \\(c\\), and \\(\\mathbb{I}_{c \\equiv d}\\) is a binary indicator that outputs 0 if the OM system predicts a false mapping between \\(c\\) and \\(d\\), and outputs 1 otherwise. The product term in this equation returns 1 if all target candidate classes are predicted as unmatched, i.e., \\(\\forall d \\in \\mathcal{T}_c.\\mathbb{I}_{c \\equiv d}=0\\).
To summarise, the Bio-LLM sub-track provides two representative OM subsets and adopts a range of evaluation metrics to gain meaningful insights from this partial assessment, thus promoting robust and efficient development of LLM-based OM systems.
"},{"location":"bio-ml/#oaei-participation","title":"OAEI Participation","text":"To participate in the OAEI track, please visit the OAEI Bio-ML website for more information, especially on the instructions of system submission or direct result submission. In the following, we present the formats of result files we expect participants to submit.
"},{"location":"bio-ml/#result-submission-format","title":"Result Submission Format","text":"For the main Bio-ML track, we expect two result files for each setting:
(1) A prediction mapping file named match.result.tsv
in the same format as the reference mapping file (e.g., task_name/refs_equiv/full.tsv
).
Matching Result
Download an example of mapping file. The three columns, \"SrcEntity\"
, \"TgtEntity\"
, and \"Score\"
refer to the source class IRI, the target class IRI, and the matching score.
(2) A scored or ranked candidate mapping file named rank.result.tsv
in the same format as the test candidate mapping file (e.g., task_name/refs_equiv/test.cands.tsv
).
Ranking Result
Download an example of raw (unscored) candidate mapping file and an example of scored candidate mapping file. The \"SrcEntity\"
and \"TgtEntity\"
columns refer to the source class IRI and the target class IRI involved in a reference mapping. The \"TgtCandidates\"
column stores a sequence of tgt_cand_iri
in the unscored file and a list of tuples (tgt_cand_iri, score)
in the scored file, which can be accessed by the built-in Python function eval
.
We also accept a result file without scores and in that case we assume the list of tgt_cand_iri
has been sorted in descending order.
Note that each OM pair is accompanied by an unsupervised and a semi-supervised setting, and thus separate sets of result files should be submitted. Moreover, for subsumption matching, only the ranking result file in (2) is required.
For the Bio-LLM sub-track, we expect one result file (similar to (2) but requiring a list of triples) for the task:
(3) A scored or ranked (with answers) candidate mapping file named biollm.result.tsv
in the same format as the test candidate mapping file (i.e., task_name/test.cands.tsv
).
Bio-LLM Result
Download an example of bio-llm mapping file. The \"SrcEntity\"
and \"TgtEntity\"
columns refer to the source class IRI and the target class IRI involved in a reference mapping. The \"TgtCandidates\"
column stores a list of triples (tgt_cand_iri, score, answer)
in the scored file, which can be accessed by the built-in Python function eval
. The additional answer
values are True
or False
indicating whether the OM system predicts (src_class_iri, tgt_cand_iri)
as a true mapping.
It is important to notice that the answer
values are necessary for the matching evaluation of P, R, F-score and the computation of the rejection rate, while the score
values are used for ranking evaluation of MRR and Hits@1.
deeponto.complete
.check_consistency()
at deeponto.onto.Ontology
.deeponto.onto.OntologyVerbaliser
.deeponto.subs
to deeponto.complete
.deeponto.probe.ontolama
into deeponto.complete
....
"},{"location":"changelog/#v088-2023-october","title":"v0.8.8 (2023 October)","text":""},{"location":"changelog/#added_1","title":"Added","text":"deeponto.onto.OntologyVerbaliser
.\"struct\"
(Structural Reasoner) at deeponto.onto.OntologyReasoner
.load_reasoner()
method at deeponto.onto.OntologyReasoner
for convenience of changing the reasoner type and remove reload_reasoner()
method as it is a special case of load_reasoner()
.rdflib
into the dependencies for building graph-related features.deeponto.onto.taxonomy
for building the taxonomy over ontologies and potentially other structured data.read_table_mappings()
method at deeponto.align.mapping
from using dataframe.iterrows()
to dataframe.itertuples()
which is much more efficient.deeponto.utils.process_annotation_literal()
to False
.slf4j
to warn
to prevent tons of printing at ELK (issue (#13)[https://github.com/KRR-Oxford/DeepOnto/issues/13]).deeponto.align.oaei
.reasoner_type
argument at deeponto.onto.OntologyReasoner
, now supporting hermit
(default) and elk
.get_all_axioms()
method at deeponto.onto.Ontology
. Add get_iri()
method at deeponto.onto.Ontology
.
Add new features into deeponto.onto.OntologyVerbaliser
including:
verbalise_object_property_subsumption()
for object property subsumption axioms.
verbalise_class_expression()
.verbalise_class_subsumption()
for class subsumption axioms;verbalise_class_equivalence()
for class equivalence axioms;verbalise_class_assertion()
for class assertion axioms;verbalise_relation_assertion()
for relation assertion axioms;auto-correction
option for fixing entity names.keep_iri
option for keeping entity IRIs.add_quantifier_word
option for adding quantifier words as in the Manchester syntax.
Add get_assertion_axioms()
method at deeponto.onto.Ontology
.
get_axiom_type()
method at deeponto.onto.Ontology
.owl_individuals
attribute at deeponto.onto.Ontology
.get_owl_objects()
method to be anonymous as it is only used for creating pre-processed entity index at deeponto.onto.Ontology
.get_owl_object_from_iri()
method to get_owl_object()
at deeponto.onto.Ontology
.ERROR
.set_seed()
method at deeponto.utils
..verbalise_class_expression()
method by adding an option to keep entity IRIs without verbalising them using .vocabs
at deeponto.onto.OntologyVerbaliser
.apply_lowercasing
value to False
for both .get_annotations()
and .build_annotation_index()
methods at deeponto.onto.Ontology
..get_owl_object_annotations()
to .get_annotations()
at deeponto.onto.Ontology
.use_in_alignment
annotation in BERTMap for the OAEI.deeponto.align.oaei
.read_table_mappings
method to allow None
for threshold.deeponto.onto.OntologyPruner
.f1
and MRR
method in deeponto.align.evaluation.AlignmentEvaluator
.deeponto.onto.OntologyNormaliser
.deeponto.onto.OntologyProjector
.transformers
to transformers[torch]
.lib
from mowl to direct import.get_owl_object_annotations
by adding uniqify
at the end to preserve the order.deeponto.subs.bertsubs
; its inter-ontology setting is also imported at deeponto.align.bertsubs
.deeponto.onto.OntologyPruner
as a separate module.deeponto.onto.Ontology
; if started already, skip this step.get_owl_object_annotations
at deeponto.onto.Ontology
by preserving the relative order of annotation retrieval, i.e., create set
first and use the .add()
function instead of casting the list
into set
in the end.check_deprecated
at deeponto.onto.Ontology
by adding a check for the \\(\\texttt{owl:deprecated}\\) annotation property -- if this property does not exist in the current ontology, return False
(not deprecated).remove_axiom
for removing an axiom from the ontology at deeponto.onto.Ontology
(note that the counterpart add_axiom
has already been available).check_named_entity
for checking if an entity is named at deeponto.onto.Ontology
.get_subsumption_axioms
for getting subsumption axioms subject to different entity types at deeponto.onto.Ontology
.get_asserted_complex_classes
for getting all complex classes that occur in ontology (subsumption and/or equivalence) axioms at deeponto.onto.Ontology
.get_asserted_parents
and get_asserted_children
for getting asserted parent and children for a given entity at deeponto.onto.Ontology
.check_deprecation
for checking an owl object's deprecation (annotated) at deeponto.onto.Ontology
.en_core_web_sm
download into the initialisation of OntologyVerbaliser
.deeponto.onto.Ontology
.deeponto.onto.OntologyReasoner
:super_entities_of
\\(\\rightarrow\\) get_inferred_super_entities
sub_entities_of
\\(\\rightarrow\\) get_inferred_sub_entities
deeponto.onto.Ontology
.deeponto.lama
.deeponto.onto.verbalisation
.deeponto.onto.verbalisation
.src/
layout.The code before v0.5.0 is no longer available.
"},{"location":"faqs/","title":"FAQs","text":"Q1: System compatibility?
Q2: Encountering issues with the JPype installation?
Q3: Missing system-level dependencies on Linux?
g++
and python-dev
need to be installed.paper
Paper for OntoLAMA: Language Model Analysis for Ontology Subsumption Inference (Findings of ACL 2023).
@inproceedings{he-etal-2023-language,\n title = \"Language Model Analysis for Ontology Subsumption Inference\",\n author = \"He, Yuan and\n Chen, Jiaoyan and\n Jimenez-Ruiz, Ernesto and\n Dong, Hang and\n Horrocks, Ian\",\n booktitle = \"Findings of the Association for Computational Linguistics: ACL 2023\",\n month = jul,\n year = \"2023\",\n address = \"Toronto, Canada\",\n publisher = \"Association for Computational Linguistics\",\n url = \"https://aclanthology.org/2023.findings-acl.213\",\n doi = \"10.18653/v1/2023.findings-acl.213\",\n pages = \"3439--3453\"\n}\n
This page provides an overview of the \\(\\textsf{OntoLAMA}\\) datasets, how to use them, and the related probing approach introduced in the research paper.
"},{"location":"ontolama/#overview","title":"Overview","text":"\\(\\textsf{OntoLAMA}\\) is a set of language model (LM) probing datasets and a prompt-based probing method for ontology subsumption inference or ontology completion. The work follows the \"LMs-as-KBs\" literature but focuses on conceptualised knowledge extracted from formalised KBs such as the OWL ontologies. Specifically, the subsumption inference (SI) task is introduced and formulated in the Natural Language Inference (NLI) style, where the sub-concept and the super-concept involved in a subsumption axiom are verbalised and fitted into a template to form the premise and hypothesis, respectively. The sampled axioms are verified through ontology reasoning. The SI task is further divided into Atomic SI and Complex SI where the former involves only atomic named concepts and the latter involves both atomic and complex concepts. Real-world ontologies of different scales and domains are used for constructing OntoLAMA and in total there are four Atomic SI datasets and two Complex SI datasets.
"},{"location":"ontolama/#useful-links","title":"Useful Links","text":"Source #NamedConcepts #EquivAxioms #Dataset (Train/Dev/Test) Schema.org 894 - Atomic SI: 808/404/2,830 DOID 11,157 - Atomic SI: 90,500/11,312/11,314 FoodOn 30,995 2,383 Atomic SI: 768,486/96,060/96,062 Complex SI: 3,754/1,850/13,080 GO 43,303 11,456 Atomic SI: 772,870/96,608/96,610 Complex SI: 72,318/9,040/9,040 MNLI - - biMNLI: 235,622/26,180/12,906
"},{"location":"ontolama/#usage","title":"Usage","text":"Users have two options for accessing the OntoLAMA datasets. They can either download the datasets directly from Zenodo or use the Huggingface Datasets platform.
If using Huggingface, users should first install the dataset
package:
pip install datasets\n
Then, a dataset can be accessed by:
from datasets import load_dataset\n# dataset = load_dataset(\"krr-oxford/OntoLAMA\", dataset_name)\n# for example, loading the Complex SI dataset of Go\ndataset = load_dataset(\"krr-oxford/OntoLAMA\", \"go-complex-SI\") \n
Options of dataset_name
include:
\"bimnli\"
(from MNLI)\"schemaorg-atomic-SI\"
(from Schema.org)\"doid-atomic-SI\"
(from DOID)\"foodon-atomic-SI\"
, \"foodon-complex-SI\"
(from FoodOn)\"go-atomic-SI\"
, \"go-complex-SI\"
(from Go)After loading the dataset, a particular data split can be accessed by:
dataset[split_name] # split_name = \"train\", \"validation\", or \"test\"\n
Please refer to the Huggingface page for examples of data points and explanations of data fields.
If downloading from Zenodo, users can simply target on specific .jsonl
files.
\\(\\textsf{OntoLAMA}\\) adopts the prompt-based probing approach to examine an LM's knowledge. Specifically, it wraps the verbalised sub-concept and super-concept into a template with a masked position; the LM is expected to predict the masked token and determine whether there exists a subsumption relationship between the two concepts.
The verbalisation algorithm has been implemented as a separate ontology processing module, see verbalise ontology concepts.
To conduct probing, users can write the following code into a script, e.g., probing.py
:
from openprompt.config import get_config\nfrom deeponto.complete.ontolama import run_inference\n\nconfig, args = get_config()\n# you can then manipulate the configuration before running the inference\nconfig.learning_setting = \"few_shot\" # zero_shot, full\nconfig.manual_template.choice = 0 # using the first template in the template file\n...\n\n# run the subsumption inference\nrun_inference(config, args)\n
Then, run the script with the following command:
python probing.py --config_yaml config.yaml\n
See an example of config.yaml
at DeepOnto/scripts/ontolama/config.yaml
The template file for the SI task (two templates) is located in DeepOnto/scripts/ontolama/si_templates.txt
.
The template file for the biMNLI task (two templates) is located in DeepOnto/scripts/ontolama/nli_templates.txt
.
The label word file for both SI and biMNLI tasks is located in DeepOnto/scripts/ontolama/label_words.jsonl
.
\\(\\textsf{DeepOnto}\\) extends from the OWLAPI and implements many useful methods for ontology processing and reasoning, integrated in the base class Ontology
.
This page gives typical examples of how to use Ontology
. There are other more specific usages, please refer to the documentation by clicking Ontology
.
Ontology
can be easily loaded from a local ontology file by its path:
from deeponto.onto import Ontology\n
Importing Ontology
will require JVM memory allocation (defaults to 8g
; if nohup
is used to run the program in the backend, use nohup echo \"8g\" | python command
):
Please enter the maximum memory located to JVM: [8g]: 16g\n\n16g maximum memory allocated to JVM.\nJVM started successfully.\n
Loading an ontology from a local file:
onto = Ontology(\"path_to_ontology.owl\")\n
It also possible to choose a reasoner to be used:
onto = Ontology(\"path_to_ontology.owl\", \"hermit\")\n
Tip
For faster (but incomplete) reasoning over larger ontologies, choose a reasoner like \"elk\"
.
The most fundamental feature of Ontology
is to access entities in the ontology such as classes (or concepts) and properties (object, data, and annotation properties). To get an entity by its IRI, do the following:
from deeponto.onto import Ontology\n# e.g., load the disease ontology\ndoid = Ontology(\"doid.owl\")\n# class or property IRI as input\ndoid.get_owl_object(\"http://purl.obolibrary.org/obo/DOID_9969\")\n
To get the asserted parents or children of a given class or property, do the following:
doid.get_asserted_parents(doid.get_owl_object(\"http://purl.obolibrary.org/obo/DOID_9969\"))\ndoid.get_asserted_children(doid.get_owl_object(\"http://purl.obolibrary.org/obo/DOID_9969\"))\n
To obtain the literal values (as Set[str]
) of an annotation property (such as \\(\\texttt{rdfs:label}\\)) for an entity:
# note that annotations with no language tags are deemed as in English (\"en\")\ndoid.get_annotations(\n doid.get_owl_object(\"http://purl.obolibrary.org/obo/DOID_9969\"),\n annotation_property_iri='http://www.w3.org/2000/01/rdf-schema#label',\n annotation_language_tag=None,\n apply_lowercasing=False,\n normalise_identifiers=False\n)\n
Output:
{'carotenemia'}\n
To get the special entities related to top (\\(\\top\\)) and bottom (\\(\\bot\\)), for example, to get \\(\\texttt{owl:Thing}\\):
doid.OWLThing\n
"},{"location":"ontology/#ontology-reasoning","title":"Ontology Reasoning","text":"Ontology
has an important attribute .reasoner
for conducting reasoning activities. Currently, two types of reasoners are supported, i.e., HermitT and ELK.
To get the super-entities (a super-class, or a super-propety) of an entity, do the following:
doid_class = doid.get_owl_object(\"http://purl.obolibrary.org/obo/DOID_9969\")\ndoid.reasoner.get_inferred_super_entities(doid_class, direct=False) \n
Output:
['http://purl.obolibrary.org/obo/DOID_0014667',\n'http://purl.obolibrary.org/obo/DOID_0060158',\n'http://purl.obolibrary.org/obo/DOID_4']\n
The outputs are IRIs of the corresponding super-entities. direct
is a boolean value indicating whether the returned entities are parents (direct=True
) or ancestors (direct=False
).
To get the sub-entities, simply replace the method name with sub_entities_of
.
To retrieve the entailed instances of a class:
doid.reasoner.instances_of(doid_class)\n
"},{"location":"ontology/#checking-entailment","title":"Checking Entailment","text":"The implemented reasoner also supports several entailment checks for subsumption, disjointness, and so on. For example:
doid.reasoner.check_subsumption(doid_potential_sub_entity, doid_potential_super_entity)\n
"},{"location":"ontology/#feature-requests","title":"Feature Requests","text":"Should you have any feature requests (such as those commonly used in the OWLAPI), please raise a ticket in the \\(\\textsf{DeepOnto}\\) GitHub repository.
"},{"location":"verbaliser/","title":"Verbalise Ontology Concepts","text":"Verbalising concept expressions is very useful for models that take textual inputs. While the named concepts can be verbalised simply using their names (or labels), complex concepts that involve logical operators require a more sophisticated algorithm. In \\(\\textsf{DeepOnto}\\), we have implemented the recursive concept verbaliser originally proposed in the OntoLAMA paper to address the need.
Paper
The recursive concept verbaliser is proposed in the paper: Language Model Analysis for Ontology Subsumption Inference (Findings of ACL 2023).
@inproceedings{he-etal-2023-language,\n title = \"Language Model Analysis for Ontology Subsumption Inference\",\n author = \"He, Yuan and\n Chen, Jiaoyan and\n Jimenez-Ruiz, Ernesto and\n Dong, Hang and\n Horrocks, Ian\",\n booktitle = \"Findings of the Association for Computational Linguistics: ACL 2023\",\n month = jul,\n year = \"2023\",\n address = \"Toronto, Canada\",\n publisher = \"Association for Computational Linguistics\",\n url = \"https://aclanthology.org/2023.findings-acl.213\",\n doi = \"10.18653/v1/2023.findings-acl.213\",\n pages = \"3439--3453\"\n}\n
This rule-based verbaliser (found in OntologyVerbaliser
) first parses a complex concept expression into a sub-formula tree (with OntologySyntaxParser
). Each intermediate node within the tree represents the decomposition of a specific logical operator, while the leaf nodes are named concepts or properties. The verbaliser then recursively merges the verbalisations in a bottom-to-top manner, creating the overall textual representation of the complex concept. An example of this process is shown in the following figure:
Figure 1. Verbalising a complex concept recursively.
To use the verbaliser, do the following:
from deeponto.onto import Ontology, OntologyVerbaliser\n\n# load an ontology and init the verbaliser\nonto = Ontology(\"some_ontology_file.owl\")\nverbaliser = OntologyVerbaliser(onto)\n
To verbalise a complex concept expression:
# get complex concepts asserted in the ontology\ncomplex_concepts = list(onto.get_asserted_complex_classes())\n\n# verbalise the first complex concept\nv_concept = verbaliser.verbalise_class_expression(complex_concepts[0])\n
To verbaliser a class subsumption axiom:
# get subsumption axioms from the ontology\nsubsumption_axioms = onto.get_subsumption_axioms(entity_type=\"Classes\")\n\n# verbalise the first subsumption axiom\nv_sub, v_super = verbaliser.verbalise_class_subsumption_axiom(subsumption_axioms[0])\n
Tip
The concept verbaliser is under development to incorporate the parsing of various axiom types. Please check the existing functions of OntologyVerbaliser
for specific usage.
Notice that the verbalised result is a CfgNode
object which keeps track of the recursive process. Users can access the final verbalisation by:
result.verbal\n
Users can also manually update the vocabulary for named entities by:
verbaliser.update_entity_name(entity_iri, entity_name)\n
This is useful when the entity labels are not naturally fitted into the verbalised sentence.
Moreover, users can see the parsed sub-formula tree using:
tree = verbaliser.parser.parse(str(subsumption_axioms[0]))\ntree.render_image()\n
Note that rendering the image requires graphiviz
to be installed. Check this link for installing graphiviz
.
See an example with image at OntologySyntaxParser
.
AlignmentEvaluator()
","text":"Class that provides evaluation metrics for alignment.
Source code insrc/deeponto/align/evaluation.py
def __init__(self):\n pass\n
"},{"location":"deeponto/align/evaluation/#deeponto.align.evaluation.AlignmentEvaluator.precision","title":"precision(prediction_mappings, reference_mappings)
staticmethod
","text":"The percentage of correct predictions.
\\[P = \\frac{|\\mathcal{M}_{pred} \\cap \\mathcal{M}_{ref}|}{|\\mathcal{M}_{pred}|}\\] Source code insrc/deeponto/align/evaluation.py
@staticmethod\ndef precision(prediction_mappings: List[EntityMapping], reference_mappings: List[ReferenceMapping]) -> float:\nr\"\"\"The percentage of correct predictions.\n\n $$P = \\frac{|\\mathcal{M}_{pred} \\cap \\mathcal{M}_{ref}|}{|\\mathcal{M}_{pred}|}$$\n \"\"\"\n preds = [p.to_tuple() for p in prediction_mappings]\n refs = [r.to_tuple() for r in reference_mappings]\n return len(set(preds).intersection(set(refs))) / len(set(preds))\n
"},{"location":"deeponto/align/evaluation/#deeponto.align.evaluation.AlignmentEvaluator.recall","title":"recall(prediction_mappings, reference_mappings)
staticmethod
","text":"The percentage of correct retrievals.
\\[R = \\frac{|\\mathcal{M}_{pred} \\cap \\mathcal{M}_{ref}|}{|\\mathcal{M}_{ref}|}\\] Source code insrc/deeponto/align/evaluation.py
@staticmethod\ndef recall(prediction_mappings: List[EntityMapping], reference_mappings: List[ReferenceMapping]) -> float:\nr\"\"\"The percentage of correct retrievals.\n\n $$R = \\frac{|\\mathcal{M}_{pred} \\cap \\mathcal{M}_{ref}|}{|\\mathcal{M}_{ref}|}$$\n \"\"\"\n preds = [p.to_tuple() for p in prediction_mappings]\n refs = [r.to_tuple() for r in reference_mappings]\n return len(set(preds).intersection(set(refs))) / len(set(refs))\n
"},{"location":"deeponto/align/evaluation/#deeponto.align.evaluation.AlignmentEvaluator.f1","title":"f1(prediction_mappings, reference_mappings, null_reference_mappings=[])
staticmethod
","text":"Compute the F1 score given the prediction and reference mappings.
\\[F_1 = \\frac{2 P R}{P + R}\\]null_reference_mappings
is an additional set whose elements should be ignored in the calculation, i.e., neither positive nor negative. Specifically, both \\(\\mathcal{M}_{pred}\\) and \\(\\mathcal{M}_{ref}\\) will substract \\(\\mathcal{M}_{null}\\) from them.
src/deeponto/align/evaluation.py
@staticmethod\ndef f1(\n prediction_mappings: List[EntityMapping],\n reference_mappings: List[ReferenceMapping],\n null_reference_mappings: List[ReferenceMapping] = [],\n):\nr\"\"\"Compute the F1 score given the prediction and reference mappings.\n\n $$F_1 = \\frac{2 P R}{P + R}$$\n\n `null_reference_mappings` is an additional set whose elements\n should be **ignored** in the calculation, i.e., **neither positive nor negative**.\n Specifically, both $\\mathcal{M}_{pred}$ and $\\mathcal{M}_{ref}$ will **substract**\n $\\mathcal{M}_{null}$ from them.\n \"\"\"\n preds = [p.to_tuple() for p in prediction_mappings]\n refs = [r.to_tuple() for r in reference_mappings]\n null_refs = [n.to_tuple() for n in null_reference_mappings]\n # elements in the {null_set} are removed from both {pred} and {ref} (ignored)\n if null_refs:\n preds = set(preds) - set(null_refs)\n refs = set(refs) - set(null_refs)\n P = len(set(preds).intersection(set(refs))) / len(set(preds))\n R = len(set(preds).intersection(set(refs))) / len(set(refs))\n F1 = 2 * P * R / (P + R)\n\n return {\"P\": round(P, 3), \"R\": round(R, 3), \"F1\": round(F1, 3)}\n
"},{"location":"deeponto/align/evaluation/#deeponto.align.evaluation.AlignmentEvaluator.hits_at_K","title":"hits_at_K(reference_and_candidates, K)
staticmethod
","text":"Compute \\(Hits@K\\) for a list of (reference_mapping, candidate_mappings)
pair.
It is computed as the number of a reference_mapping
existed in the first \\(K\\) ranked candidate_mappings
, divided by the total number of input pairs.
src/deeponto/align/evaluation.py
@staticmethod\ndef hits_at_K(reference_and_candidates: List[Tuple[ReferenceMapping, List[EntityMapping]]], K: int):\nr\"\"\"Compute $Hits@K$ for a list of `(reference_mapping, candidate_mappings)` pair.\n\n It is computed as the number of a `reference_mapping` existed in the first $K$ ranked `candidate_mappings`,\n divided by the total number of input pairs.\n\n $$Hits@K = \\sum_i^N \\mathbb{I}_{rank_i \\leq k} / N$$\n \"\"\"\n n_hits = 0\n for pred, cands in reference_and_candidates:\n ordered_candidates = [c.to_tuple() for c in EntityMapping.sort_entity_mappings_by_score(cands, k=K)]\n if pred.to_tuple() in ordered_candidates:\n n_hits += 1\n return n_hits / len(reference_and_candidates)\n
"},{"location":"deeponto/align/evaluation/#deeponto.align.evaluation.AlignmentEvaluator.mean_reciprocal_rank","title":"mean_reciprocal_rank(reference_and_candidates)
staticmethod
","text":"Compute \\(MRR\\) for a list of (reference_mapping, candidate_mappings)
pair.
src/deeponto/align/evaluation.py
@staticmethod\ndef mean_reciprocal_rank(reference_and_candidates: List[Tuple[ReferenceMapping, List[EntityMapping]]]):\nr\"\"\"Compute $MRR$ for a list of `(reference_mapping, candidate_mappings)` pair.\n\n $$MRR = \\sum_i^N rank_i^{-1} / N$$\n \"\"\"\n sum_inverted_ranks = 0\n for pred, cands in reference_and_candidates:\n ordered_candidates = [c.to_tuple() for c in EntityMapping.sort_entity_mappings_by_score(cands)]\n if pred.to_tuple() in ordered_candidates:\n rank = ordered_candidates.index(pred.to_tuple()) + 1\n else:\n rank = math.inf\n sum_inverted_ranks += 1 / rank\n return sum_inverted_ranks / len(reference_and_candidates)\n
"},{"location":"deeponto/align/mapping/","title":"Mapping","text":""},{"location":"deeponto/align/mapping/#deeponto.align.mapping.EntityMapping","title":"EntityMapping(src_entity_iri, tgt_entity_iri, relation=DEFAULT_REL, score=0.0)
","text":"A datastructure for entity mapping.
Such entities should be named and have an IRI.
Attributes:
Name Type Descriptionsrc_entity_iri
str
The IRI of the source entity, usually its IRI if available.
tgt_entity_iri
str
The IRI of the target entity, usually its IRI if available.
relation
str
A symbol that represents what semantic relation this mapping stands for. Defaults to <?rel>
which means unspecified. Suggested inputs are \"<EquivalentTo>\"
and \"<SubsumedBy>\"
.
score
float
The score that indicates the confidence of this mapping. Defaults to 0.0
.
Parameters:
Name Type Description Defaultsrc_entity_iri
str
The IRI of the source entity, usually its IRI if available.
requiredtgt_entity_iri
str
The IRI of the target entity, usually its IRI if available.
requiredrelation
str
A symbol that represents what semantic relation this mapping stands for. Defaults to <?rel>
which means unspecified. Suggested inputs are \"<EquivalentTo>\"
and \"<SubsumedBy>\"
.
DEFAULT_REL
score
float
The score that indicates the confidence of this mapping. Defaults to 0.0
.
0.0
Source code in src/deeponto/align/mapping.py
def __init__(self, src_entity_iri: str, tgt_entity_iri: str, relation: str = DEFAULT_REL, score: float = 0.0):\n\"\"\"Intialise an entity mapping.\n\n Args:\n src_entity_iri (str): The IRI of the source entity, usually its IRI if available.\n tgt_entity_iri (str): The IRI of the target entity, usually its IRI if available.\n relation (str, optional): A symbol that represents what semantic relation this mapping stands for. Defaults to `<?rel>` which means unspecified.\n Suggested inputs are `\"<EquivalentTo>\"` and `\"<SubsumedBy>\"`.\n score (float, optional): The score that indicates the confidence of this mapping. Defaults to `0.0`.\n \"\"\"\n self.head = src_entity_iri\n self.tail = tgt_entity_iri\n self.relation = relation\n self.score = score\n
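A brief sketch of creating an entity mapping with one of the suggested relation symbols (the IRIs are placeholders):
from deeponto.align.mapping import EntityMapping

mapping = EntityMapping(
    src_entity_iri='http://onto1#Heart',
    tgt_entity_iri='http://onto2#Heart',
    relation='<EquivalentTo>',
    score=0.98,
)
print(mapping.head, mapping.tail, mapping.relation, mapping.score)
print(mapping.to_tuple(with_score=True))  # ('http://onto1#Heart', 'http://onto2#Heart', 0.98)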
"},{"location":"deeponto/align/mapping/#deeponto.align.mapping.EntityMapping.from_owl_objects","title":"from_owl_objects(src_entity, tgt_entity, relation=DEFAULT_REL, score=0.0)
classmethod
","text":"Create an entity mapping from two OWLObject
entities which have an IRI.
Parameters:
Name Type Description Defaultsrc_entity
OWLObject
The source entity in OWLObject
.
tgt_entity
OWLObject
The target entity in OWLObject
.
relation
str
A symbol that represents what semantic relation this mapping stands for. Defaults to <?rel>
which means unspecified. Suggested inputs are \"<EquivalentTo>\"
and \"<SubsumedBy>\"
.
DEFAULT_REL
score
float
The score that indicates the confidence of this mapping. Defaults to 0.0
.
0.0
Returns:
Type DescriptionEntityMapping
The entity mapping created from the source and target entities.
Source code insrc/deeponto/align/mapping.py
@classmethod\ndef from_owl_objects(\n cls, src_entity: OWLObject, tgt_entity: OWLObject, relation: str = DEFAULT_REL, score: float = 0.0\n):\n\"\"\"Create an entity mapping from two `OWLObject` entities which have an IRI.\n\n Args:\n src_entity (OWLObject): The source entity in `OWLObject`.\n tgt_entity (OWLObject): The target entity in `OWLObject`.\n relation (str, optional): A symbol that represents what semantic relation this mapping stands for. Defaults to `<?rel>` which means unspecified.\n Suggested inputs are `\"<EquivalentTo>\"` and `\"<SubsumedBy>\"`.\n score (float, optional): The score that indicates the confidence of this mapping. Defaults to `0.0`.\n Returns:\n (EntityMapping): The entity mapping created from the source and target entities.\n \"\"\"\n return cls(str(src_entity.getIRI()), str(tgt_entity.getIRI()), relation, score)\n
"},{"location":"deeponto/align/mapping/#deeponto.align.mapping.EntityMapping.to_tuple","title":"to_tuple(with_score=False)
","text":"Transform an entity mapping (self
) to a tuple representation.
Note that relation
is discarded and score
is optionally preserved.
src/deeponto/align/mapping.py
def to_tuple(self, with_score: bool = False):\n\"\"\"Transform an entity mapping (`self`) to a tuple representation\n\n Note that `relation` is discarded and `score` is optionally preserved).\n \"\"\"\n if with_score:\n return (self.head, self.tail, self.score)\n else:\n return (self.head, self.tail)\n
"},{"location":"deeponto/align/mapping/#deeponto.align.mapping.EntityMapping.as_tuples","title":"as_tuples(entity_mappings, with_score=False)
staticmethod
","text":"Transform a list of entity mappings to their tuple representations.
Note that relation
is discarded and score
is optionally preserved.
src/deeponto/align/mapping.py
@staticmethod\ndef as_tuples(entity_mappings: List[EntityMapping], with_score: bool = False):\n\"\"\"Transform a list of entity mappings to their tuple representations.\n\n Note that `relation` is discarded and `score` is optionally preserved).\n \"\"\"\n return [m.to_tuple(with_score=with_score) for m in entity_mappings]\n
"},{"location":"deeponto/align/mapping/#deeponto.align.mapping.EntityMapping.sort_entity_mappings_by_score","title":"sort_entity_mappings_by_score(entity_mappings, k=None)
staticmethod
","text":"Sort the entity mappings in a list by their scores in descending order.
Parameters:
Name Type Description Defaultentity_mappings
List[EntityMapping]
A list of entity mappings to sort.
requiredk
int
The number of top \\(k\\) scored entities preserved if specified. Defaults to None
which means to return all entity mappings.
None
Returns:
Type DescriptionList[EntityMapping]
A list of sorted entity mappings.
Source code insrc/deeponto/align/mapping.py
@staticmethod\ndef sort_entity_mappings_by_score(entity_mappings: List[EntityMapping], k: Optional[int] = None):\nr\"\"\"Sort the entity mappings in a list by their scores in descending order.\n\n Args:\n entity_mappings (List[EntityMapping]): A list entity mappings to sort.\n k (int, optional): The number of top $k$ scored entities preserved if specified. Defaults to `None` which\n means to return **all** entity mappings.\n\n Returns:\n (List[EntityMapping]): A list of sorted entity mappings.\n \"\"\"\n return list(sorted(entity_mappings, key=lambda x: x.score, reverse=True))[:k]\n
"},{"location":"deeponto/align/mapping/#deeponto.align.mapping.EntityMapping.read_table_mappings","title":"read_table_mappings(table_of_mappings_file, threshold=None, relation=DEFAULT_REL, is_reference=False)
staticmethod
","text":"Read entity mappings from .csv
or .tsv
files.
Mapping Table Format
The columns of the mapping table must have the headings: \"SrcEntity\"
, \"TgtEntity\"
, and \"Score\"
.
Parameters:
Name Type Description Defaulttable_of_mappings_file
str
The path to the table (.csv
or .tsv
) of mappings.
threshold
Optional[float]
Mappings with scores less than threshold
will not be loaded. Defaults to None, i.e., no filtering.
None
relation
str
A symbol that represents what semantic relation this mapping stands for. Defaults to <?rel>
which means unspecified. Suggested inputs are \"<EquivalentTo>\"
and \"<SubsumedBy>\"
.
DEFAULT_REL
is_reference
bool
Whether the loaded mappings are reference mappings; if so, threshold
is disabled and mapping scores are all set to \\(1.0\\). Defaults to False
.
False
Returns:
Type DescriptionList[EntityMapping]
A list of entity mappings loaded from the table file.
Source code insrc/deeponto/align/mapping.py
@staticmethod\ndef read_table_mappings(\n table_of_mappings_file: str,\n threshold: Optional[float] = None,\n relation: str = DEFAULT_REL,\n is_reference: bool = False,\n) -> List[EntityMapping]:\nr\"\"\"Read entity mappings from `.csv` or `.tsv` files.\n\n !!! note \"Mapping Table Format\"\n\n The columns of the mapping table must have the headings: `\"SrcEntity\"`, `\"TgtEntity\"`, and `\"Score\"`.\n\n Args:\n table_of_mappings_file (str): The path to the table (`.csv` or `.tsv`) of mappings.\n threshold (Optional[float], optional): Mappings with scores less than `threshold` will not be loaded. Defaults to 0.0.\n relation (str, optional): A symbol that represents what semantic relation this mapping stands for. Defaults to `<?rel>` which means unspecified.\n Suggested inputs are `\"<EquivalentTo>\"` and `\"<SubsumedBy>\"`.\n is_reference (bool): Whether the loaded mappings are reference mappigns; if so, `threshold` is disabled and mapping scores\n are all set to $1.0$. Defaults to `False`.\n\n Returns:\n (List[EntityMapping]): A list of entity mappings loaded from the table file.\n \"\"\"\n df = read_table(table_of_mappings_file)\n entity_mappings = []\n for dp in df.itertuples():\n if is_reference:\n entity_mappings.append(ReferenceMapping(dp.SrcEntity, dp.TgtEntity, relation))\n else:\n # allow `None` for threshold\n if not threshold or dp[\"Score\"] >= threshold:\n entity_mappings.append(EntityMapping(dp.SrcEntity, dp.TgtEntity, relation, dp.Score))\n return entity_mappings\n
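An illustrative sketch of the expected table format and loading (the file name is a placeholder); reference mappings can be loaded from the same kind of file via ReferenceMapping.read_table_mappings:
import pandas as pd
from deeponto.align.mapping import EntityMapping, ReferenceMapping

# write a tiny mapping table with the required headings
pd.DataFrame(
    [
        ['http://onto1#A', 'http://onto2#A', 0.95],
        ['http://onto1#B', 'http://onto2#C', 0.42],
    ],
    columns=['SrcEntity', 'TgtEntity', 'Score'],
).to_csv('pred_maps.tsv', sep='\t', index=False)

pred_maps = EntityMapping.read_table_mappings('pred_maps.tsv')    # scored entity mappings
ref_maps = ReferenceMapping.read_table_mappings('pred_maps.tsv')  # scores forced to 1.0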
"},{"location":"deeponto/align/mapping/#deeponto.align.mapping.ReferenceMapping","title":"ReferenceMapping(src_entity_iri, tgt_entity_iri, relation=DEFAULT_REL, candidate_mappings=[])
","text":" Bases: EntityMapping
A data structure for entity mapping that acts as a reference mapping.
A reference mapping is a ground truth entity mapping (with \(score = 1.0\)) and can have several entity mappings as candidates. These candidate mappings should have the same head
(i.e., source entity) as the reference mapping.
Attributes:
Name Type Descriptionsrc_entity_iri
str
The IRI of the source entity, usually its IRI if available.
tgt_entity_iri
str
The IRI of the target entity, usually its IRI if available.
relation
str
A symbol that represents what semantic relation this mapping stands for. Defaults to <?rel>
which means unspecified. Suggested inputs are \"<EquivalentTo>\"
and \"<SubsumedBy>\"
.
Parameters:
Name Type Description Defaultsrc_entity_iri
str
The IRI of the source entity, usually its IRI if available.
requiredtgt_entity_iri
str
The IRI of the target entity, usually its IRI if available.
requiredrelation
str
A symbol that represents what semantic relation this mapping stands for. Defaults to <?rel>
which means unspecified. Suggested inputs are \"<EquivalentTo>\"
and \"<SubsumedBy>\"
.
DEFAULT_REL
candidate_mappings
List[EntityMapping]
A list of entity mappings that are candidates for this reference mapping. Defaults to []
.
[]
Source code in src/deeponto/align/mapping.py
def __init__(\n self,\n src_entity_iri: str,\n tgt_entity_iri: str,\n relation: str = DEFAULT_REL,\n candidate_mappings: Optional[List[EntityMapping]] = [],\n):\nr\"\"\"Intialise a reference mapping.\n\n Args:\n src_entity_iri (str): The IRI of the source entity, usually its IRI if available.\n tgt_entity_iri (str): The IRI of the target entity, usually its IRI if available.\n relation (str, optional): A symbol that represents what semantic relation this mapping stands for. Defaults to `<?rel>` which means unspecified.\n Suggested inputs are `\"<EquivalentTo>\"` and `\"<SubsumedBy>\"`.\n candidate_mappings (List[EntityMapping], optional): A list of entity mappings that are candidates for this reference mapping. Defaults to `[]`.\n \"\"\"\n super().__init__(src_entity_iri, tgt_entity_iri, relation, 1.0)\n self.candidates = []\n for candidate in candidate_mappings:\n self.add_candidate(candidate)\n
"},{"location":"deeponto/align/mapping/#deeponto.align.mapping.ReferenceMapping.add_candidate","title":"add_candidate(candidate_mapping)
","text":"Add a candidate mapping whose relation and head entity are the same as the reference mapping's.
Source code insrc/deeponto/align/mapping.py
def add_candidate(self, candidate_mapping: EntityMapping):\n\"\"\"Add a candidate mapping whose relation and head entity are the\n same as the reference mapping's.\n \"\"\"\n if self.relation != candidate_mapping.relation:\n raise ValueError(\n f\"Expect relation of candidate mapping to be {self.relation} but got {candidate_mapping.relation}\"\n )\n if self.head != candidate_mapping.head:\n raise ValueError(\"Candidate mapping does not have the same head entity as the anchor mapping.\")\n self.candidates.append(candidate_mapping)\n
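For example (placeholder IRIs), candidate mappings sharing the reference mapping's head entity and relation can be attached as follows:
from deeponto.align.mapping import EntityMapping, ReferenceMapping

ref = ReferenceMapping('http://onto1#A', 'http://onto2#A', '<EquivalentTo>')
ref.add_candidate(EntityMapping('http://onto1#A', 'http://onto2#B', '<EquivalentTo>', 0.63))
ref.add_candidate(EntityMapping('http://onto1#A', 'http://onto2#C', '<EquivalentTo>', 0.21))
print(len(ref.candidates))  # 2
# a candidate with a different head entity (or relation) raises a ValueError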
"},{"location":"deeponto/align/mapping/#deeponto.align.mapping.ReferenceMapping.read_table_mappings","title":"read_table_mappings(table_of_mappings_file, relation=DEFAULT_REL)
staticmethod
","text":"Read reference mappings from .csv
or .tsv
files.
Mapping Table Format
The columns of the mapping table must have the headings: \"SrcEntity\"
, \"TgtEntity\"
, and \"Score\"
.
Parameters:
Name Type Description Defaulttable_of_mappings_file
str
The path to the table (.csv
or .tsv
) of mappings.
relation
str
A symbol that represents what semantic relation this mapping stands for. Defaults to <?rel>
which means unspecified. Suggested inputs are \"<EquivalentTo>\"
and \"<SubsumedBy>\"
.
DEFAULT_REL
Returns:
Type DescriptionList[ReferenceMapping]
A list of reference mappings loaded from the table file.
Source code insrc/deeponto/align/mapping.py
@staticmethod\ndef read_table_mappings(table_of_mappings_file: str, relation: str = DEFAULT_REL):\nr\"\"\"Read reference mappings from `.csv` or `.tsv` files.\n\n !!! note \"Mapping Table Format\"\n\n The columns of the mapping table must have the headings: `\"SrcEntity\"`, `\"TgtEntity\"`, and `\"Score\"`.\n\n Args:\n table_of_mappings_file (str): The path to the table (`.csv` or `.tsv`) of mappings.\n relation (str, optional): A symbol that represents what semantic relation this mapping stands for. Defaults to `<?rel>` which means unspecified.\n Suggested inputs are `\"<EquivalentTo>\"` and `\"<SubsumedBy>\"`.\n\n Returns:\n (List[ReferenceMapping]): A list of reference mappings loaded from the table file.\n \"\"\"\n return EntityMapping.read_table_mappings(table_of_mappings_file, relation=relation, is_reference=True)\n
"},{"location":"deeponto/align/mapping/#deeponto.align.mapping.SubsFromEquivMappingGenerator","title":"SubsFromEquivMappingGenerator(src_onto, tgt_onto, equiv_mappings, subs_generation_ratio=None, delete_used_equiv_tgt_class=True)
","text":"Generating subsumption mappings from gold standard equivalence mappings.
paper
The online subsumption mapping construction algorithm is proposed in the paper: Machine Learning-Friendly Biomedical Datasets for Equivalence and Subsumption Ontology Matching (ISWC 2022).
This generator has an attribute delete_used_equiv_tgt_class
for determining whether or not to sabotage the equivalence mappings used to create \\(\\geq 1\\) subsumption mappings. The reason is that, if the equivalence mapping is broken, then the OM tool is expected to predict subsumption mappings directly without relying on the equivalence mappings as an intermediate.
Attributes:
Name Type Descriptionsrc_onto
Ontology
The source ontology.
tgt_onto
Ontology
The target ontology.
equiv_class_pairs
List[Tuple[str, str]]
A list of class pairs (in IRIs) that are equivalent according to the input equivalence mappings.
subs_generation_ratio
int
The maximum number of subsumption mappings generated from each equivalence mapping. Defaults to None
which means there is no limit on the number of subsumption mappings.
delete_used_equiv_tgt_class
bool
Whether to mark the target side of an equivalence mapping used for creating at least one subsumption mapping as \"deleted\". Defaults to True
.
src/deeponto/align/mapping.py
def __init__(\n self,\n src_onto: Ontology,\n tgt_onto: Ontology,\n equiv_mappings: List[ReferenceMapping],\n subs_generation_ratio: Optional[int] = None,\n delete_used_equiv_tgt_class: bool = True,\n):\n self.src_onto = src_onto\n self.tgt_onto = tgt_onto\n self.equiv_class_pairs = [m.to_tuple() for m in equiv_mappings]\n self.subs_generation_ratio = subs_generation_ratio\n self.delete_used_equiv_tgt_class = delete_used_equiv_tgt_class\n\n subs_from_equivs, self.used_equiv_tgt_class_iris = self.online_construction()\n # turn into triples with scores 1.0\n self.subs_from_equivs = [(c, p, 1.0) for c, p in subs_from_equivs]\n
"},{"location":"deeponto/align/mapping/#deeponto.align.mapping.SubsFromEquivMappingGenerator.online_construction","title":"online_construction()
","text":"An online algorithm for constructing subsumption mappings from gold standard equivalence mappings.
Let \\(t\\) denote the boolean value that indicates if the target class involved in an equivalence mapping will be deleted. If \\(t\\) is true, then for each equivalent class pair \\((c, c')\\), do the following:
Steps 1 and 2 ensure that target classes that have been involved in a subsumption mapping have no conflicts with target classes that have been used to create a subsumption mapping.
This algorithm is online because the construction and deletion depend on the order of the input equivalent class pairs.
Source code insrc/deeponto/align/mapping.py
def online_construction(self):\nr\"\"\"An **online** algorithm for constructing subsumption mappings from gold standard equivalence mappings.\n\n Let $t$ denote the boolean value that indicates if the target class involved in an equivalence mapping\n will be deleted. If $t$ is true, then for each equivalent class pair $(c, c')$, do the following:\n\n 1. If $c'$ has been inolved in a subsumption mapping, skip this pair as otherwise $c'$ will need to be deleted.\n 2. For each parent class of $c'$, skip it if it has been marked deleted (i.e., involved in an equivalence mapping that has been used to create a subsumption mapping).\n 3. If any subsumption mapping has been created from $(c, c')$, mark $c'$ as deleted.\n\n Steps 1 and 2 ensure that target classes that have been **involved in a subsumption mapping** have **no conflicts** with\n target classes that have been **used to create a subsumption mapping**.\n\n This algorithm is *online* because the construction and deletion depend on the order of the input equivalent class pairs.\n \"\"\"\n subs_class_pairs = []\n in_subs = defaultdict(lambda: False) # in a subsumption mapping\n used_equivs = defaultdict(lambda: False) # in a used equivalence mapping\n\n for src_class_iri, tgt_class_iri in self.equiv_class_pairs:\n\n cur_subs_pairs = []\n\n # NOTE (1) an equiv pair is skipped if the target side is marked constructed\n if self.delete_used_equiv_tgt_class and in_subs[tgt_class_iri]:\n continue\n\n # construct subsumption pairs by matching the source class and the target class's parents\n tgt_class = self.tgt_onto.get_owl_object(tgt_class_iri)\n # tgt_class_parent_iris = self.tgt_onto.reasoner.get_inferred_super_entities(tgt_class, direct=True)\n tgt_class_parent_iris = [str(p.getIRI()) for p in self.tgt_onto.get_asserted_parents(tgt_class, named_only=True)]\n for parent_iri in tgt_class_parent_iris:\n # skip this parent if it is marked as \"used\"\n if self.delete_used_equiv_tgt_class and used_equivs[parent_iri]:\n continue\n cur_subs_pairs.append((src_class_iri, parent_iri))\n # if successfully created, mark this parent as \"in\"\n if self.delete_used_equiv_tgt_class:\n in_subs[parent_iri] = True\n\n # mark the target class as \"used\" because it has been used for creating a subsumption mapping\n if self.delete_used_equiv_tgt_class and cur_subs_pairs:\n used_equivs[tgt_class_iri] = True\n\n if self.subs_generation_ratio and len(cur_subs_pairs) > self.subs_generation_ratio:\n cur_subs_pairs = random.sample(cur_subs_pairs, self.subs_generation_ratio)\n subs_class_pairs += cur_subs_pairs\n\n used_equiv_tgt_class_iris = None\n if self.delete_used_equiv_tgt_class:\n used_equiv_tgt_class_iris = [iri for iri, used in used_equivs.items() if used is True]\n logger.info(\n f\"{len(used_equiv_tgt_class_iris)}/{len(self.equiv_class_pairs)} are used for creating at least one subsumption mapping.\"\n )\n\n subs_class_pairs = uniqify(subs_class_pairs)\n logger.info(f\"{len(subs_class_pairs)} subsumption mappings are created in the end.\")\n\n return subs_class_pairs, used_equiv_tgt_class_iris\n
"},{"location":"deeponto/align/mapping/#deeponto.align.mapping.SubsFromEquivMappingGenerator.save_subs","title":"save_subs(save_path)
","text":"Save the constructed subsumption mappings (in tuples) to a local .tsv
file.
src/deeponto/align/mapping.py
def save_subs(self, save_path: str):\n\"\"\"Save the constructed subsumption mappings (in tuples) to a local `.tsv` file.\"\"\"\n subs_df = pd.DataFrame(self.subs_from_equivs, columns=[\"SrcEntity\", \"TgtEntity\", \"Score\"])\n subs_df.to_csv(save_path, sep=\"\\t\", index=False)\n
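A usage sketch (ontology and mapping file names are placeholders) of generating and saving subsumption mappings:
from deeponto.onto import Ontology
from deeponto.align.mapping import ReferenceMapping, SubsFromEquivMappingGenerator

src_onto = Ontology('src_onto.owl')
tgt_onto = Ontology('tgt_onto.owl')
equiv_maps = ReferenceMapping.read_table_mappings('equiv_maps.tsv')

subs_generator = SubsFromEquivMappingGenerator(
    src_onto, tgt_onto, equiv_maps,
    subs_generation_ratio=1,           # at most one subsumption mapping per equivalence mapping
    delete_used_equiv_tgt_class=True,  # mark used target classes as "deleted"
)
subs_generator.save_subs('subs_maps.tsv')  # columns: SrcEntity, TgtEntity, Score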
"},{"location":"deeponto/align/mapping/#deeponto.align.mapping.NegativeCandidateMappingGenerator","title":"NegativeCandidateMappingGenerator(src_onto, tgt_onto, reference_class_mappings, annotation_property_iris, tokenizer, max_hops=5, for_subsumption=False)
","text":"Generating negative candidate mappings for each gold standard mapping.
Note that the source side of the gold standard mapping is fixed, i.e., candidate mappings are generated according to the target side.
paper
The candidate mapping generation algorithm is proposed in the paper: Machine Learning-Friendly Biomedical Datasets for Equivalence and Subsumption Ontology Matching (ISWC 2022).
Source code insrc/deeponto/align/mapping.py
def __init__(\n self,\n src_onto: Ontology,\n tgt_onto: Ontology,\n reference_class_mappings: List[ReferenceMapping], # equivalence or subsumption\n annotation_property_iris: List[str], # for text-based candidates\n tokenizer: Tokenizer, # for text-based candidates\n max_hops: int = 5, # for graph-based candidates\n for_subsumption: bool = False, # if for subsumption, avoid adding ancestors as candidates\n):\n\n self.src_onto = src_onto\n self.tgt_onto = tgt_onto\n self.reference_class_mappings = reference_class_mappings\n self.reference_class_dict = defaultdict(list) # to prevent wrongly adding negative candidates\n for m in self.reference_class_mappings:\n src_class_iri, tgt_class_iri = m.to_tuple()\n self.reference_class_dict[src_class_iri].append(tgt_class_iri)\n\n # for IDF sample\n self.tgt_annotation_index, self.annotation_property_iris = self.tgt_onto.build_annotation_index(\n annotation_property_iris, apply_lowercasing=True\n )\n self.tokenizer = tokenizer\n self.tgt_inverted_annotation_index = self.tgt_onto.build_inverted_annotation_index(\n self.tgt_annotation_index, self.tokenizer\n )\n\n # for neighbour sample\n self.max_hops = max_hops\n\n # if for subsumption, avoid adding ancestors as candidates\n self.for_subsumption = for_subsumption\n # if for subsumption, add (src_class, tgt_class_ancestor) into the reference mappings\n if self.for_subsumption:\n for m in self.reference_class_mappings:\n src_class_iri, tgt_class_iri = m.to_tuple()\n tgt_class = self.tgt_onto.get_owl_object(tgt_class_iri)\n tgt_class_ancestors = self.tgt_onto.reasoner.get_inferred_super_entities(tgt_class)\n for tgt_ancestor_iri in tgt_class_ancestors:\n self.reference_class_dict[src_class_iri].append(tgt_ancestor_iri)\n
"},{"location":"deeponto/align/mapping/#deeponto.align.mapping.NegativeCandidateMappingGenerator.mixed_sample","title":"mixed_sample(reference_class_mapping, **strategy2nums)
","text":"A mixed sampling approach that combines several sampling strategies.
As introduced in the Bio-ML paper, this mixed approach guarantees that the number of samples for each strategy is either the maximum that can be sampled or the required number.
Specifically, at each sampling iteration, the number of candidates is first increased by the number of previously sampled candidates, since in the worst case all the candidates sampled at this iteration will duplicate the previous ones.
The random sampling is used as the amending strategy, i.e., if other sampling strategies cannot retrieve the specified number of samples, then use random sampling to amend the number.
Parameters:
Name Type Description Defaultreference_class_mapping
ReferenceMapping
The reference class mapping for generating the candidate mappings.
required**strategy2nums
int
The keyword arguments that specify the expected number of candidates for each sampling strategy.
{}
Source code in src/deeponto/align/mapping.py
def mixed_sample(self, reference_class_mapping: ReferenceMapping, **strategy2nums):\n\"\"\"A mixed sampling approach that combines several sampling strategies.\n\n As introduced in the Bio-ML paper, this mixed approach guarantees that the number of samples for each\n strategy is either the **maximum that can be sampled** or the required number.\n\n Specifically, at each sampling iteration, the number of candidates is **first increased by the number of \n previously sampled candidates**, as in the worst case, all the candidates sampled at this iteration\n will be duplicated with the previous. \n\n The random sampling is used as the amending strategy, i.e., if other sampling strategies cannot retrieve\n the specified number of samples, then use random sampling to amend the number.\n\n Args:\n reference_class_mapping (ReferenceMapping): The reference class mapping for generating the candidate mappings.\n **strategy2nums (int): The keyword arguments that specify the expected number of candidates for each\n sampling strategy.\n \"\"\"\n\n valid_tgt_candidate_iris = []\n sample_stats = defaultdict(lambda: 0)\n i = 0\n total_num_candidates = 0\n for strategy, num_canddiates in strategy2nums.items():\n i += 1\n if strategy in SAMPLING_OPTIONS:\n sampler = getattr(self, f\"{strategy}_sample\")\n # for ith iteration, the worst case is when all n_cands are duplicated\n # or should be excluded from other reference targets so we generate\n # NOTE: total_num_candidates + num_candidates + len(excluded_tgt_class_iris)\n # candidates first and prune the rest; another edge case is when sampled\n # candidates are not sufficient and we use random sample to meet n_cands\n cur_valid_tgt_candidate_iris = sampler(\n reference_class_mapping, total_num_candidates + num_canddiates\n )\n # remove the duplicated candidates (and excluded refs) and prune the tail\n cur_valid_tgt_candidate_iris = list(\n set(cur_valid_tgt_candidate_iris) - set(valid_tgt_candidate_iris)\n )[:num_canddiates]\n sample_stats[strategy] += len(cur_valid_tgt_candidate_iris)\n # use random samples for complementation if not enough\n while len(cur_valid_tgt_candidate_iris) < num_canddiates:\n amend_candidate_iris = self.random_sample(\n reference_class_mapping, num_canddiates - len(cur_valid_tgt_candidate_iris)\n )\n amend_candidate_iris = list(\n set(amend_candidate_iris)\n - set(valid_tgt_candidate_iris)\n - set(cur_valid_tgt_candidate_iris)\n )\n cur_valid_tgt_candidate_iris += amend_candidate_iris\n assert len(cur_valid_tgt_candidate_iris) == num_canddiates\n # record how many random samples to amend\n if strategy != \"random\":\n sample_stats[\"random\"] += num_canddiates - sample_stats[strategy]\n valid_tgt_candidate_iris += cur_valid_tgt_candidate_iris\n total_num_candidates += num_canddiates\n else:\n raise ValueError(f\"Invalid sampling trategy: {strategy}.\")\n assert len(valid_tgt_candidate_iris) == total_num_candidates\n\n # TODO: add the candidate mappings into the reference mapping \n\n return valid_tgt_candidate_iris, sample_stats\n
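An illustrative call (the per-strategy numbers are arbitrary, and generator stands for an already initialised NegativeCandidateMappingGenerator); the keyword names are assumed to match the samplers defined below (idf, neighbour, random):
for ref_map in generator.reference_class_mappings:
    # sample negative target candidates for this reference mapping
    candidate_iris, stats = generator.mixed_sample(ref_map, idf=50, neighbour=40, random=10)
    # `stats` records how many candidates each strategy actually contributed
    print(ref_map.to_tuple(), dict(stats))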
"},{"location":"deeponto/align/mapping/#deeponto.align.mapping.NegativeCandidateMappingGenerator.random_sample","title":"random_sample(reference_class_mapping, num_candidates)
","text":"Randomly sample a set of target class candidates \\(c'_{cand}\\) for a given reference mapping \\((c, c')\\).
The sampled candidate classes will be combined with the source reference class \\(c\\) to get a set of candidate mappings \\(\\{(c, c'_{cand})\\}\\).
Parameters:
Name Type Description Defaultreference_class_mapping
ReferenceMapping
The reference class mapping for generating the candidate mappings.
requirednum_candidates
int
The expected number of candidate mappings to generate.
required Source code insrc/deeponto/align/mapping.py
def random_sample(self, reference_class_mapping: ReferenceMapping, num_candidates: int):\nr\"\"\"**Randomly** sample a set of target class candidates $c'_{cand}$ for a given reference mapping $(c, c')$.\n\n The sampled candidate classes will be combined with the source reference class $c$ to get a set of\n candidate mappings $\\{(c, c'_{cand})\\}$.\n\n Args:\n reference_class_mapping (ReferenceMapping): The reference class mapping for generating the candidate mappings.\n num_candidates (int): The expected number of candidate mappings to generate.\n \"\"\"\n ref_src_class_iri, ref_tgt_class_iri = reference_class_mapping.to_tuple()\n all_tgt_class_iris = set(self.tgt_onto.owl_classes.keys())\n valid_tgt_class_iris = all_tgt_class_iris - set(\n self.reference_class_dict[ref_src_class_iri]\n ) # exclude gold standards\n assert not ref_tgt_class_iri in valid_tgt_class_iris\n return random.sample(valid_tgt_class_iris, num_candidates)\n
"},{"location":"deeponto/align/mapping/#deeponto.align.mapping.NegativeCandidateMappingGenerator.idf_sample","title":"idf_sample(reference_class_mapping, num_candidates)
","text":"Sample a set of target class candidates \\(c'_{cand}\\) for a given reference mapping \\((c, c')\\) based on the \\(idf\\) scores w.r.t. the inverted annotation index (sub-word level).
Candidate classes with higher \\(idf\\) scores will be considered first, and then combined with the source reference class \\(c\\) to get a set of candidate mappings \\(\\{(c, c'_{cand})\\}\\).
Parameters:
Name Type Description Defaultreference_class_mapping
ReferenceMapping
The reference class mapping for generating the candidate mappings.
requirednum_candidates
int
The expected number of candidate mappings to generate.
required Source code insrc/deeponto/align/mapping.py
def idf_sample(self, reference_class_mapping: ReferenceMapping, num_candidates: int):\nr\"\"\"Sample a set of target class candidates $c'_{cand}$ for a given reference mapping $(c, c')$ based on the $idf$ scores\n w.r.t. the inverted annotation index (sub-word level).\n\n Candidate classes with higher $idf$ scores will be considered first, and then combined with the source reference class $c$\n to get a set of candidate mappings $\\{(c, c'_{cand})\\}$.\n\n Args:\n reference_class_mapping (ReferenceMapping): The reference class mapping for generating the candidate mappings.\n num_candidates (int): The expected number of candidate mappings to generate.\n \"\"\"\n ref_src_class_iri, ref_tgt_class_iri = reference_class_mapping.to_tuple()\n\n tgt_candidates = self.tgt_inverted_annotation_index.idf_select(\n self.tgt_annotation_index[ref_tgt_class_iri]\n ) # select all non-trivial candidates first\n valid_tgt_class_iris = []\n for tgt_candidate_iri, _ in tgt_candidates:\n # valid as long as it is not one of the reference target\n if tgt_candidate_iri not in self.reference_class_dict[ref_src_class_iri]:\n valid_tgt_class_iris.append(tgt_candidate_iri)\n if len(valid_tgt_class_iris) == num_candidates:\n break\n assert not ref_tgt_class_iri in valid_tgt_class_iris\n return valid_tgt_class_iris\n
"},{"location":"deeponto/align/mapping/#deeponto.align.mapping.NegativeCandidateMappingGenerator.neighbour_sample","title":"neighbour_sample(reference_class_mapping, num_candidates)
","text":"Sample a set of target class candidates \\(c'_{cand}\\) for a given reference mapping \\((c, c')\\) based on the subsumption hierarchy.
Define one-hop as one edge derived from an asserted subsumption axiom, i.e., to the parent class or the child class. Candidate classes with nearer hops will be considered first, and then combined with the source reference class \(c\) to get a set of candidate mappings \(\{(c, c'_{cand})\}\).
Parameters:
Name Type Description Defaultreference_class_mapping
ReferenceMapping
The reference class mapping for generating the candidate mappings.
requirednum_candidates
int
The expected number of candidate mappings to generate.
required Source code insrc/deeponto/align/mapping.py
def neighbour_sample(self, reference_class_mapping: ReferenceMapping, num_candidates: int):\nr\"\"\"Sample a set of target class candidates $c'_{cand}$ for a given reference mapping $(c, c')$ based on the **subsumption\n hierarchy**.\n\n Define one-hop as one edge derived from an **asserted** subsumption axiom, i.e., to the parent class or the child class.\n Candidates classes with nearer hops will be considered first, and then combined with the source reference class $c$\n to get a set of candidate mappings $\\{(c, c'_{cand})\\}$.\n\n Args:\n reference_class_mapping (ReferenceMapping): The reference class mapping for generating the candidate mappings.\n num_candidates (int): The expected number of candidate mappings to generate.\n \"\"\"\n ref_src_class_iri, ref_tgt_class_iri = reference_class_mapping.to_tuple()\n\n valid_tgt_class_iris = set()\n cur_hop = 1\n frontier = [ref_tgt_class_iri]\n # extract from the nearest neighbours until enough candidates or max hop\n while len(valid_tgt_class_iris) < num_candidates and cur_hop <= self.max_hops:\n\n neighbours_of_cur_hop = []\n for tgt_class_iri in frontier:\n tgt_class = self.tgt_onto.get_owl_object(tgt_class_iri)\n parents = self.tgt_onto.reasoner.get_inferred_super_entities(tgt_class, direct=True)\n children = self.tgt_onto.reasoner.get_inferred_sub_entities(tgt_class, direct=True)\n neighbours_of_cur_hop += parents + children # used for further hop expansion\n\n valid_neighbours_of_cur_hop = set(neighbours_of_cur_hop) - set(self.reference_class_dict[ref_src_class_iri])\n # print(valid_neighbours_of_cur_hop)\n\n # NOTE if by adding neighbours of current hop the require number will be met\n # we randomly pick among them\n if len(valid_neighbours_of_cur_hop) > num_candidates - len(valid_tgt_class_iris):\n valid_neighbours_of_cur_hop = random.sample(\n valid_neighbours_of_cur_hop, num_candidates - len(valid_tgt_class_iris)\n )\n valid_tgt_class_iris.update(valid_neighbours_of_cur_hop)\n\n frontier = neighbours_of_cur_hop # update the frontier with all possible neighbors\n cur_hop += 1\n\n assert not ref_tgt_class_iri in valid_tgt_class_iris\n return list(valid_tgt_class_iris)\n
"},{"location":"deeponto/align/oaei/","title":"OAEI Utilities","text":"This page concerns utility functions used in the OAEI.
"},{"location":"deeponto/align/oaei/#deeponto.align.oaei.get_ignored_class_index","title":"get_ignored_class_index(onto)
","text":"Get an index for filtering classes that are marked as not used in alignment.
This is indicated by the special class annotation use_in_alignment
with the following IRI: http://oaei.ontologymatching.org/bio-ml/ann/use_in_alignment
src/deeponto/align/oaei.py
def get_ignored_class_index(onto: Ontology):\n\"\"\"Get an index for filtering classes that are marked as not used in alignment.\n\n This is indicated by the special class annotation `use_in_alignment` with the following IRI:\n http://oaei.ontologymatching.org/bio-ml/ann/use_in_alignment\n \"\"\"\n ignored_class_index = defaultdict(lambda: False)\n for class_iri, class_obj in onto.owl_classes.items():\n use_in_alignment = onto.get_annotations(\n class_obj, \"http://oaei.ontologymatching.org/bio-ml/ann/use_in_alignment\"\n )\n if use_in_alignment and str(use_in_alignment[0]).lower() == \"false\":\n ignored_class_index[class_iri] = True\n return ignored_class_index\n
"},{"location":"deeponto/align/oaei/#deeponto.align.oaei.remove_ignored_mappings","title":"remove_ignored_mappings(mappings, ignored_class_index)
","text":"Filter prediction mappings that involve classes to be ignored.
Source code insrc/deeponto/align/oaei.py
def remove_ignored_mappings(mappings: List[EntityMapping], ignored_class_index: dict):\n\"\"\"Filter prediction mappings that involve classes to be ignored.\"\"\"\n results = []\n for m in mappings:\n if ignored_class_index[m.head] or ignored_class_index[m.tail]:\n continue\n results.append(m)\n return results\n
"},{"location":"deeponto/align/oaei/#deeponto.align.oaei.matching_eval","title":"matching_eval(pred_maps_file, ref_maps_file, null_ref_maps_file=None, ignored_class_index=None, pred_maps_threshold=None)
","text":"Conduct global matching evaluation for the prediction mappings against the reference mappings.
The prediction mappings are formatted the same as full.tsv
(the full reference mappings), with three columns: \"SrcEntity\"
, \"TgtEntity\"
, and \"Score\"
, indicating the source class IRI, the target class IRI, and the corresponding mapping score.
An ignored_class_index
needs to be constructed for omitting prediction mappings that involve a class marked as not used in alignment.
Use the following code to obtain such index for both the source and target ontologies:
ignored_class_index = get_ignored_class_index(src_onto)\nignored_class_index.update(get_ignored_class_index(tgt_onto))\n
Source code in src/deeponto/align/oaei.py
def matching_eval(\n pred_maps_file: str,\n ref_maps_file: str,\n null_ref_maps_file: Optional[str] = None,\n ignored_class_index: Optional[dict] = None,\n pred_maps_threshold: Optional[float] = None,\n):\nr\"\"\"Conduct **global matching** evaluation for the prediction mappings against the\n reference mappings.\n\n The prediction mappings are formatted the same as `full.tsv` (the full reference mappings),\n with three columns: `\"SrcEntity\"`, `\"TgtEntity\"`, and `\"Score\"`, indicating the source\n class IRI, the target class IRI, and the corresponding mapping score.\n\n An `ignored_class_index` needs to be constructed for omitting prediction mappings\n that involve a class marked as **not used in alignment**.\n\n Use the following code to obtain such index for both the source and target ontologies:\n\n ```python\n ignored_class_index = get_ignored_class_index(src_onto)\n ignored_class_index.update(get_ignored_class_index(tgt_onto))\n ```\n \"\"\"\n refs = ReferenceMapping.read_table_mappings(ref_maps_file, relation=\"=\")\n preds = EntityMapping.read_table_mappings(pred_maps_file, relation=\"=\", threshold=pred_maps_threshold)\n if ignored_class_index:\n preds = remove_ignored_mappings(preds, ignored_class_index)\n null_refs = ReferenceMapping.read_table_mappings(null_ref_maps_file, relation=\"=\") if null_ref_maps_file else []\n results = AlignmentEvaluator.f1(preds, refs, null_reference_mappings=null_refs)\n return results\n
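A hedged end-to-end sketch of the global matching evaluation (all file names are placeholders, e.g., Bio-ML-style reference splits):
from deeponto.onto import Ontology
from deeponto.align.oaei import get_ignored_class_index, matching_eval

src_onto = Ontology('src_onto.owl')
tgt_onto = Ontology('tgt_onto.owl')
ignored_class_index = get_ignored_class_index(src_onto)
ignored_class_index.update(get_ignored_class_index(tgt_onto))

results = matching_eval(
    pred_maps_file='pred_maps.tsv',  # prediction mappings in the full.tsv format
    ref_maps_file='test.tsv',        # reference mappings to score against
    null_ref_maps_file='train.tsv',  # mappings ignored in scoring (neither positive nor negative)
    ignored_class_index=ignored_class_index,
)
print(results)  # {'P': ..., 'R': ..., 'F1': ...}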
"},{"location":"deeponto/align/oaei/#deeponto.align.oaei.read_candidate_mappings","title":"read_candidate_mappings(cand_maps_file, for_biollm=False, threshold=0.0)
","text":"Load scored or already ranked candidate mappings.
The predicted candidate mappings are formatted the same as test.cands.tsv
, with three columns: \"SrcEntity\"
, \"TgtEntity\"
, and \"TgtCandidates\"
, indicating the source reference class IRI, the target reference class IRI, and a list of tuples in the form of (target_candidate_class_IRI, score)
where score
is optional if the candidate mappings have been ranked. For the Bio-LLM special sub-track, \"TgtCandidates\"
refers to a list of triples in the form of (target_candidate_class_IRI, score, answer)
where the answer
is required for computing matching scores.
This method loads the candidate mappings in this format and parses them into the inputs of mean_reciprocal_rank
and hits_at_K
.
For Bio-LLM, the true prediction mappings and reference mappings will also be generated for the matching evaluation, i.e., the inputs of f1
.
src/deeponto/align/oaei.py
def read_candidate_mappings(cand_maps_file: str, for_biollm: bool = False, threshold: float = 0.0):\nr\"\"\"Load scored or already ranked candidate mappings.\n\n The predicted candidate mappings are formatted the same as `test.cands.tsv`, with three columns:\n `\"SrcEntity\"`, `\"TgtEntity\"`, and `\"TgtCandidates\"`, indicating the source reference class IRI, the\n target reference class IRI, and a list of **tuples** in the form of `(target_candidate_class_IRI, score)` where\n `score` is optional if the candidate mappings have been ranked. For the Bio-LLM special sub-track, `\"TgtCandidates\"`\n refers to a list of **triples** in the form of `(target_candidate_class_IRI, score, answer)` where the `answer` is\n required for computing matching scores.\n\n This method loads the candidate mappings in this format and parse them into the inputs of [`mean_reciprocal_rank`][deeponto.align.evaluation.AlignmentEvaluator.mean_reciprocal_rank]\n and [`hits_at_K`][[`mean_reciprocal_rank`][deeponto.align.evaluation.AlignmentEvaluator.hits_at_K].\n\n For Bio-LLM, the true prediction mappings and reference mappings will also be generated for the matching evaluation, i.e., the inputs of [`f1`][deeponto.align.evaluation.AlignmentEvaluator.f1].\n \"\"\"\n\n all_cand_maps = read_table(cand_maps_file).values.tolist()\n cands = []\n unmatched_cands = []\n preds = [] # only used for bio-llm\n refs = [] # only used for bio-llm\n\n for src_ref_class, tgt_ref_class, tgt_cands in all_cand_maps:\n ref_map = ReferenceMapping(src_ref_class, tgt_ref_class, \"=\")\n tgt_cands = eval(tgt_cands)\n has_score = True if all([not isinstance(x, str) for x in tgt_cands]) else False\n cand_maps = []\n refs.append(ref_map) if tgt_ref_class != \"UnMatched\" else None\n if for_biollm:\n for t, s, a in tgt_cands:\n m = EntityMapping(src_ref_class, t, \"=\", s)\n cand_maps.append(m)\n if a is True and s >= threshold: # only keep first one\n preds.append(m)\n elif has_score:\n cand_maps = [EntityMapping(src_ref_class, t, \"=\", s) for t, s in tgt_cands]\n else:\n warnings.warn(\"Input candidate mappings do not have a score, assume default rank in descending order.\")\n cand_maps = [\n EntityMapping(src_ref_class, t, \"=\", (len(tgt_cands) - i) / len(tgt_cands))\n for i, t in enumerate(tgt_cands)\n ]\n cand_maps = EntityMapping.sort_entity_mappings_by_score(cand_maps)\n if for_biollm and tgt_ref_class == \"UnMatched\":\n unmatched_cands.append((ref_map, cand_maps))\n else:\n cands.append((ref_map, cand_maps))\n\n if for_biollm:\n return cands, unmatched_cands, preds, refs\n else:\n return cands\n
"},{"location":"deeponto/align/oaei/#deeponto.align.oaei.ranking_result_file_check","title":"ranking_result_file_check(cand_maps_file, ref_cand_maps_file)
","text":"Check if the ranking result file is formatted correctly as the original test.cands.tsv
file provided in the dataset.
src/deeponto/align/oaei.py
def ranking_result_file_check(cand_maps_file: str, ref_cand_maps_file: str):\nr\"\"\"Check if the ranking result file is formatted correctly as the original\n `test.cands.tsv` file provided in the dataset.\n \"\"\"\n formatted_cand_maps = read_candidate_mappings(cand_maps_file)\n formatted_ref_cand_maps = read_candidate_mappings(ref_cand_maps_file)\n assert len(formatted_cand_maps) == len(\n formatted_ref_cand_maps\n ), f\"Mismatched number of reference mappings: {len(formatted_cand_maps)}; should be {len(formatted_ref_cand_maps)}.\"\n for i in range(len(formatted_cand_maps)):\n anchor, cands = formatted_cand_maps[i]\n ref_anchor, ref_cands = formatted_ref_cand_maps[i]\n assert (\n anchor.to_tuple() == ref_anchor.to_tuple()\n ), f\"Mismatched reference mapping: {anchor}; should be {ref_anchor}.\"\n cands = [c.to_tuple() for c in cands]\n ref_cands = [rc.to_tuple() for rc in ref_cands]\n assert not (\n set(cands) - set(ref_cands)\n ), f\"Mismatch set of candidate mappings for the reference mapping: {anchor}.\"\n
"},{"location":"deeponto/align/oaei/#deeponto.align.oaei.ranking_eval","title":"ranking_eval(cand_maps_file, Ks=[1, 5, 10])
","text":"Conduct local ranking evaluation for the scored or ranked candidate mappings.
See read_candidate_mappings
for the file format and loading.
src/deeponto/align/oaei.py
def ranking_eval(cand_maps_file: str, Ks=[1, 5, 10]):\nr\"\"\"Conduct **local ranking** evaluation for the scored or ranked candidate mappings.\n\n See [`read_candidate_mappings`][deeponto.align.oaei.read_candidate_mappings] for the file format and loading.\n \"\"\"\n formatted_cand_maps = read_candidate_mappings(cand_maps_file)\n results = {\"MRR\": AlignmentEvaluator.mean_reciprocal_rank(formatted_cand_maps)}\n for K in Ks:\n results[f\"Hits@{K}\"] = AlignmentEvaluator.hits_at_K(formatted_cand_maps, K=K)\n return results\n
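For example (the file name is a placeholder following the test.cands.tsv format):
from deeponto.align.oaei import ranking_eval

results = ranking_eval('scored.test.cands.tsv', Ks=[1, 5, 10])
print(results)  # {'MRR': ..., 'Hits@1': ..., 'Hits@5': ..., 'Hits@10': ...}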
"},{"location":"deeponto/align/oaei/#deeponto.align.oaei.is_rejection","title":"is_rejection(preds, cands)
","text":"A successful rejection means none of the candidate mappings are predicted as true mappings.
Source code insrc/deeponto/align/oaei.py
def is_rejection(preds: List[EntityMapping], cands: List[EntityMapping]):\n\"\"\"A successful rejection means none of the candidate mappings are predicted as true mappings.\"\"\"\n return set([p.to_tuple() for p in preds]).intersection(set([c.to_tuple() for c in cands])) == set()\n
"},{"location":"deeponto/align/oaei/#deeponto.align.oaei.biollm_eval","title":"biollm_eval(cand_maps_file, Ks=[1], threshold=0.0)
","text":"Conduct Bio-LLM evaluation for the Bio-LLM formatted candidate mappings.
See read_candidate_mappings
for the file format and loading.
src/deeponto/align/oaei.py
def biollm_eval(cand_maps_file, Ks=[1], threshold: float = 0.0):\nr\"\"\"Conduct Bio-LLM evaluation for the Bio-LLM formatted candidate mappings.\n\n See [`read_candidate_mappings`][deeponto.align.oaei.read_candidate_mappings] for the file format and loading.\n \"\"\"\n matched_cand_maps, unmatched_cand_maps, preds, refs = read_candidate_mappings(\n cand_maps_file, for_biollm=True, threshold=threshold\n )\n\n results = AlignmentEvaluator.f1(preds, refs)\n for K in Ks:\n results[f\"Hits@{K}\"] = AlignmentEvaluator.hits_at_K(matched_cand_maps, K=K)\n results[\"MRR\"] = AlignmentEvaluator.mean_reciprocal_rank(matched_cand_maps)\n rej = 0\n for _, cs in unmatched_cand_maps:\n rej += int(is_rejection(preds, cs))\n results[\"RR\"] = rej / len(unmatched_cand_maps)\n return results\n
"},{"location":"deeponto/align/bertmap/","title":"BERTMap","text":"Paper
\\(\\textsf{BERTMap}\\) is proposed in the paper: BERTMap: A BERT-based Ontology Alignment System (AAAI-2022).
@inproceedings{he2022bertmap,\n title={BERTMap: a BERT-based ontology alignment system},\n author={He, Yuan and Chen, Jiaoyan and Antonyrajah, Denvar and Horrocks, Ian},\n booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},\n volume={36},\n number={5},\n pages={5684--5691},\n year={2022}\n}\n
\\(\\textsf{BERTMap}\\) is a BERT-based ontology matching (OM) system consisting of following components:
\\(\\textsf{BERTMapLt}\\) is a light-weight version of \\(\\textsf{BERTMap}\\) without the BERT module and mapping refiner.
See the tutorial for \\(\\textsf{BERTMap}\\) here.
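A usage sketch following the tutorial (the ontology file names are placeholders; load_bertmap_config and DEFAULT_CONFIG_FILE are assumed to be available from deeponto.align.bertmap as in the tutorial):
from deeponto.onto import Ontology
from deeponto.align.bertmap import BERTMapPipeline, DEFAULT_CONFIG_FILE

config = BERTMapPipeline.load_bertmap_config(DEFAULT_CONFIG_FILE)
config.output_path = './bertmap_experiment'  # where corpora, checkpoints, and mappings are written
# config.model = 'bertmaplt'                 # switch to the light-weight variant if desired

src_onto = Ontology('src_onto.owl')
tgt_onto = Ontology('tgt_onto.owl')
BERTMapPipeline(src_onto, tgt_onto, config)  # runs fine-tuning and global matching as configured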
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.pipeline.BERTMapPipeline","title":"BERTMapPipeline(src_onto, tgt_onto, config)
","text":"Class for the whole ontology alignment pipeline of \\(\\textsf{BERTMap}\\) and \\(\\textsf{BERTMapLt}\\) models.
Note
Parameters related to BERT training are None
by default. They will be constructed for \\(\\textsf{BERTMap}\\) and stay as None
for \\(\\textsf{BERTMapLt}\\).
Attributes:
Name Type Descriptionconfig
CfgNode
The configuration for BERTMap or BERTMapLt.
name
str
The name of the model, either bertmap
or bertmaplt
.
output_path
str
The path to the output directory.
src_onto
Ontology
The source ontology to be matched.
tgt_onto
Ontology
The target ontology to be matched.
annotation_property_iris
List[str]
The annotation property IRIs used for extracting synonyms and nonsynonyms.
src_annotation_index
dict
A dictionary that stores the (class_iri, class_annotations)
pairs from src_onto
according to annotation_property_iris
.
tgt_annotation_index
dict
A dictionary that stores the (class_iri, class_annotations)
pairs from tgt_onto
according to annotation_property_iris
.
known_mappings
List[ReferenceMapping]
List of known mappings for constructing the cross-ontology corpus.
auxiliary_ontos
List[Ontology]
List of auxiliary ontologies for constructing any auxiliary corpus.
corpora
dict
A dictionary that stores the summary
of built text semantics corpora and the sampled synonyms
and nonsynonyms
.
finetune_data
dict
A dictionary that stores the training
and validation
splits of samples from corpora
.
bert
BERTSynonymClassifier
A BERT model for synonym classification and mapping prediction.
best_checkpoint
str
The path to the best BERT checkpoint which will be loaded after training.
mapping_predictor
MappingPredictor
The predictor function based on class annotations, used for global matching or mapping scoring.
Parameters:
Name Type Description Defaultsrc_onto
Ontology
The source ontology for alignment.
requiredtgt_onto
Ontology
The target ontology for alignment.
requiredconfig
CfgNode
The configuration for BERTMap or BERTMapLt.
required Source code insrc/deeponto/align/bertmap/pipeline.py
def __init__(self, src_onto: Ontology, tgt_onto: Ontology, config: CfgNode):\n\"\"\"Initialise the BERTMap or BERTMapLt model.\n\n Args:\n src_onto (Ontology): The source ontology for alignment.\n tgt_onto (Ontology): The target ontology for alignment.\n config (CfgNode): The configuration for BERTMap or BERTMapLt.\n \"\"\"\n # load the configuration and confirm model name is valid\n self.config = config\n self.name = self.config.model\n if not self.name in MODEL_OPTIONS.keys():\n raise RuntimeError(f\"`model` {self.name} in the config file is not one of the supported.\")\n\n # create the output directory, e.g., experiments/bertmap\n self.config.output_path = \".\" if not self.config.output_path else self.config.output_path\n self.config.output_path = os.path.abspath(self.config.output_path)\n self.output_path = os.path.join(self.config.output_path, self.name)\n create_path(self.output_path)\n\n # create logger and progress manager (hidden attribute) \n self.logger = create_logger(self.name, self.output_path)\n self.enlighten_manager = enlighten.get_manager()\n\n # ontology\n self.src_onto = src_onto\n self.tgt_onto = tgt_onto\n self.annotation_property_iris = self.config.annotation_property_iris\n self.logger.info(f\"Load the following configurations:\\n{print_dict(self.config)}\")\n config_path = os.path.join(self.output_path, \"config.yaml\")\n self.logger.info(f\"Save the configuration file at {config_path}.\")\n self.save_bertmap_config(self.config, config_path)\n\n # build the annotation thesaurus\n self.src_annotation_index, _ = self.src_onto.build_annotation_index(self.annotation_property_iris, apply_lowercasing=True)\n self.tgt_annotation_index, _ = self.tgt_onto.build_annotation_index(self.annotation_property_iris, apply_lowercasing=True)\n if (not self.src_annotation_index) or (not self.tgt_annotation_index):\n raise RuntimeError(\"No class annotations found in input ontologies; unable to produce alignment.\")\n\n # provided mappings if any\n self.known_mappings = self.config.known_mappings\n if self.known_mappings:\n self.known_mappings = ReferenceMapping.read_table_mappings(self.known_mappings)\n\n # auxiliary ontologies if any\n self.auxiliary_ontos = self.config.auxiliary_ontos\n if self.auxiliary_ontos:\n self.auxiliary_ontos = [Ontology(ao) for ao in self.auxiliary_ontos]\n\n self.data_path = os.path.join(self.output_path, \"data\")\n # load or construct the corpora\n self.corpora_path = os.path.join(self.data_path, \"text-semantics.corpora.json\")\n self.corpora = self.load_text_semantics_corpora()\n\n # load or construct fine-tune data\n self.finetune_data_path = os.path.join(self.data_path, \"fine-tune.data.json\")\n self.finetune_data = self.load_finetune_data()\n\n # load the bert model and train\n self.bert_config = self.config.bert\n self.bert_pretrained_path = self.bert_config.pretrained_path\n self.bert_finetuned_path = os.path.join(self.output_path, \"bert\")\n self.bert_resume_training = self.bert_config.resume_training\n self.bert_synonym_classifier = None\n self.best_checkpoint = None\n if self.name == \"bertmap\":\n self.bert_synonym_classifier = self.load_bert_synonym_classifier()\n # train if the loaded classifier is not in eval mode\n if self.bert_synonym_classifier.eval_mode == False:\n self.logger.info(\n f\"Data statistics:\\n \\\n{print_dict(self.bert_synonym_classifier.data_stat)}\"\n )\n self.bert_synonym_classifier.train(self.bert_resume_training)\n # turn on eval mode after training\n self.bert_synonym_classifier.eval()\n # NOTE potential 
redundancy here: after training, load the best checkpoint\n self.best_checkpoint = self.load_best_checkpoint()\n if not self.best_checkpoint:\n raise RuntimeError(f\"No best checkpoint found for the BERT synonym classifier model.\")\n self.logger.info(f\"Fine-tuning finished, found best checkpoint at {self.best_checkpoint}.\")\n else:\n self.logger.info(f\"No training needed; skip BERT fine-tuning.\")\n\n # pretty progress bar tracking\n self.enlighten_status = self.enlighten_manager.status_bar(\n status_format=u'Global Matching{fill}Stage: {demo}{fill}{elapsed}',\n color='bold_underline_bright_white_on_lightslategray',\n justify=enlighten.Justify.CENTER, demo='Initializing',\n autorefresh=True, min_delta=0.5\n )\n\n # mapping predictions\n self.global_matching_config = self.config.global_matching\n\n # build ignored class index for OAEI\n self.ignored_class_index = None \n if self.global_matching_config.for_oaei:\n self.ignored_class_index = defaultdict(lambda: False)\n for src_class_iri, src_class in self.src_onto.owl_classes.items():\n use_in_alignment = self.src_onto.get_annotations(src_class, \"http://oaei.ontologymatching.org/bio-ml/ann/use_in_alignment\")\n if use_in_alignment and str(use_in_alignment[0]).lower() == \"false\":\n self.ignored_class_index[src_class_iri] = True\n for tgt_class_iri, tgt_class in self.tgt_onto.owl_classes.items():\n use_in_alignment = self.tgt_onto.get_annotations(tgt_class, \"http://oaei.ontologymatching.org/bio-ml/ann/use_in_alignment\")\n if use_in_alignment and str(use_in_alignment[0]).lower() == \"false\":\n self.ignored_class_index[tgt_class_iri] = True\n\n self.mapping_predictor = MappingPredictor(\n output_path=self.output_path,\n tokenizer_path=self.bert_config.pretrained_path,\n src_annotation_index=self.src_annotation_index,\n tgt_annotation_index=self.tgt_annotation_index,\n bert_synonym_classifier=self.bert_synonym_classifier,\n num_raw_candidates=self.global_matching_config.num_raw_candidates,\n num_best_predictions=self.global_matching_config.num_best_predictions,\n batch_size_for_prediction=self.bert_config.batch_size_for_prediction,\n logger=self.logger,\n enlighten_manager=self.enlighten_manager,\n enlighten_status=self.enlighten_status,\n ignored_class_index=self.ignored_class_index,\n )\n self.mapping_refiner = None\n\n # if global matching is disabled (potentially used for class pair scoring)\n if self.config.global_matching.enabled:\n self.mapping_predictor.mapping_prediction() # mapping prediction\n if self.name == \"bertmap\":\n self.mapping_refiner = MappingRefiner(\n output_path=self.output_path,\n src_onto=self.src_onto,\n tgt_onto=self.tgt_onto,\n mapping_predictor=self.mapping_predictor,\n mapping_extension_threshold=self.global_matching_config.mapping_extension_threshold,\n mapping_filtered_threshold=self.global_matching_config.mapping_filtered_threshold,\n logger=self.logger,\n enlighten_manager=self.enlighten_manager,\n enlighten_status=self.enlighten_status\n )\n self.mapping_refiner.mapping_extension() # mapping extension\n self.mapping_refiner.mapping_repair() # mapping repair\n self.enlighten_status.update(demo=\"Finished\") \n else:\n self.enlighten_status.update(demo=\"Skipped\") \n\n self.enlighten_status.close()\n
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.pipeline.BERTMapPipeline.load_or_construct","title":"load_or_construct(data_file, data_name, construct_func, *args, **kwargs)
","text":"Load existing data or construct a new one.
An auxiliary function that checks the existence of a data file and loads it if it exists. Otherwise, it constructs new data with the input construct_func
, which is expected to generate a local data file.
src/deeponto/align/bertmap/pipeline.py
def load_or_construct(self, data_file: str, data_name: str, construct_func: Callable, *args, **kwargs):\n\"\"\"Load existing data or construct a new one.\n\n An auxlirary function that checks the existence of a data file and loads it if it exists.\n Otherwise, construct new data with the input `construct_func` which is supported generate\n a local data file.\n \"\"\"\n if os.path.exists(data_file):\n self.logger.info(f\"Load existing {data_name} from {data_file}.\")\n else:\n self.logger.info(f\"Construct new {data_name} and save at {data_file}.\")\n construct_func(*args, **kwargs)\n # load the data file that is supposed to be saved locally\n return load_file(data_file)\n
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.pipeline.BERTMapPipeline.load_text_semantics_corpora","title":"load_text_semantics_corpora()
","text":"Load or construct text semantics corpora.
See TextSemanticsCorpora
.
src/deeponto/align/bertmap/pipeline.py
def load_text_semantics_corpora(self):\n\"\"\"Load or construct text semantics corpora.\n\n See [`TextSemanticsCorpora`][deeponto.align.bertmap.text_semantics.TextSemanticsCorpora].\n \"\"\"\n data_name = \"text semantics corpora\"\n\n if self.name == \"bertmap\":\n\n def construct():\n corpora = TextSemanticsCorpora(\n src_onto=self.src_onto,\n tgt_onto=self.tgt_onto,\n annotation_property_iris=self.annotation_property_iris,\n class_mappings=self.known_mappings,\n auxiliary_ontos=self.auxiliary_ontos,\n )\n self.logger.info(str(corpora))\n corpora.save(self.data_path)\n\n return self.load_or_construct(self.corpora_path, data_name, construct)\n\n self.logger.info(f\"No training needed; skip the construction of {data_name}.\")\n return None\n
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.pipeline.BERTMapPipeline.load_finetune_data","title":"load_finetune_data()
","text":"Load or construct fine-tuning data from text semantics corpora.
Steps of constructing fine-tuning data from text semantics corpora: (1) mix the synonym and non-synonym samples; (2) randomly sample 90% as training data and 10% as validation data.
src/deeponto/align/bertmap/pipeline.py
def load_finetune_data(self):\nr\"\"\"Load or construct fine-tuning data from text semantics corpora.\n\n Steps of constructing fine-tuning data from text semantics:\n\n 1. Mix synonym and nonsynonym data.\n 2. Randomly sample 90% as training samples and 10% as validation.\n \"\"\"\n data_name = \"fine-tuning data\"\n\n if self.name == \"bertmap\":\n\n def construct():\n finetune_data = dict()\n samples = self.corpora[\"synonyms\"] + self.corpora[\"nonsynonyms\"]\n random.shuffle(samples)\n split_index = int(0.9 * len(samples)) # split at 90%\n finetune_data[\"training\"] = samples[:split_index]\n finetune_data[\"validation\"] = samples[split_index:]\n save_file(finetune_data, self.finetune_data_path)\n\n return self.load_or_construct(self.finetune_data_path, data_name, construct)\n\n self.logger.info(f\"No training needed; skip the construction of {data_name}.\")\n return None\n
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.pipeline.BERTMapPipeline.load_bert_synonym_classifier","title":"load_bert_synonym_classifier()
","text":"Load the BERT model from a pre-trained or a local checkpoint.
If loaded from pre-trained, it means to start training from a pre-trained model such as bert-uncased.
If loaded from local, turn on the eval mode for mapping predictions.
If self.bert_resume_training is True
, it will be loaded from the latest saved checkpoint.src/deeponto/align/bertmap/pipeline.py
def load_bert_synonym_classifier(self):\n\"\"\"Load the BERT model from a pre-trained or a local checkpoint.\n\n - If loaded from pre-trained, it means to start training from a pre-trained model such as `bert-uncased`.\n - If loaded from local, turn on the `eval` mode for mapping predictions.\n - If `self.bert_resume_training` is `True`, it will be loaded from the latest saved checkpoint.\n \"\"\"\n checkpoint = self.load_best_checkpoint() # load the best checkpoint or nothing\n eval_mode = True\n # if no checkpoint has been found, start training from scratch OR resume training\n # no point to load the best checkpoint if resume training (will automatically search for the latest checkpoint)\n if not checkpoint or self.bert_resume_training:\n checkpoint = self.bert_pretrained_path\n eval_mode = False # since it is for training now\n\n return BERTSynonymClassifier(\n loaded_path=checkpoint,\n output_path=self.bert_finetuned_path,\n eval_mode=eval_mode,\n max_length_for_input=self.bert_config.max_length_for_input,\n num_epochs_for_training=self.bert_config.num_epochs_for_training,\n batch_size_for_training=self.bert_config.batch_size_for_training,\n batch_size_for_prediction=self.bert_config.batch_size_for_prediction,\n training_data=self.finetune_data[\"training\"],\n validation_data=self.finetune_data[\"validation\"],\n )\n
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.pipeline.BERTMapPipeline.load_best_checkpoint","title":"load_best_checkpoint()
","text":"Find the best checkpoint by searching for trainer states in each checkpoint file.
Source code insrc/deeponto/align/bertmap/pipeline.py
def load_best_checkpoint(self) -> Optional[str]:\n\"\"\"Find the best checkpoint by searching for trainer states in each checkpoint file.\"\"\"\n best_checkpoint = -1\n\n if os.path.exists(self.bert_finetuned_path):\n for file in os.listdir(self.bert_finetuned_path):\n # load trainer states from each checkpoint file\n if file.startswith(\"checkpoint\"):\n trainer_state = load_file(\n os.path.join(self.bert_finetuned_path, file, \"trainer_state.json\")\n )\n checkpoint = int(trainer_state[\"best_model_checkpoint\"].split(\"/\")[-1].split(\"-\")[-1])\n # find the latest best checkpoint\n if checkpoint > best_checkpoint:\n best_checkpoint = checkpoint\n\n if best_checkpoint == -1:\n best_checkpoint = None\n else:\n best_checkpoint = os.path.join(self.bert_finetuned_path, f\"checkpoint-{best_checkpoint}\")\n\n return best_checkpoint\n
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.pipeline.BERTMapPipeline.load_bertmap_config","title":"load_bertmap_config(config_file=None)
staticmethod
","text":"Load the BERTMap configuration in .yaml
. If the file is not provided, use the default configuration.
src/deeponto/align/bertmap/pipeline.py
@staticmethod\ndef load_bertmap_config(config_file: Optional[str] = None):\n\"\"\"Load the BERTMap configuration in `.yaml`. If the file\n is not provided, use the default configuration.\n \"\"\"\n if not config_file:\n config_file = DEFAULT_CONFIG_FILE\n print(f\"Use the default configuration at {DEFAULT_CONFIG_FILE}.\") \n if not config_file.endswith(\".yaml\"):\n raise RuntimeError(\"Configuration file should be in `yaml` format.\")\n return CfgNode(load_file(config_file))\n
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.pipeline.BERTMapPipeline.save_bertmap_config","title":"save_bertmap_config(config, config_file)
staticmethod
","text":"Save the BERTMap configuration in .yaml
.
src/deeponto/align/bertmap/pipeline.py
@staticmethod\ndef save_bertmap_config(config: CfgNode, config_file: str):\n\"\"\"Save the BERTMap configuration in `.yaml`.\"\"\"\n with open(config_file, \"w\") as c:\n config.dump(stream=c, sort_keys=False, default_flow_style=False)\n
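A short sketch of the typical configuration round-trip (the paths are placeholders, and the overridden fields follow the names referenced in the pipeline source above): load the default configuration, adjust a few fields, and save the result for reproducibility.

from deeponto.align.bertmap import BERTMapPipeline

# load the default configuration shipped with DeepOnto
config = BERTMapPipeline.load_bertmap_config()

# override a few fields before running the pipeline (CfgNode supports attribute access)
config.output_path = "./bertmap_experiment"            # placeholder output directory
config.bert.num_epochs_for_training = 1.0              # shorter fine-tuning for a quick test
config.global_matching.num_best_predictions = 5

# persist the modified configuration for the record
BERTMapPipeline.save_bertmap_config(config, "./bertmap_config.yaml")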
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.text_semantics.AnnotationThesaurus","title":"AnnotationThesaurus(onto, annotation_property_iris, apply_transitivity=False)
","text":"A thesaurus class for synonyms and non-synonyms extracted from an ontology.
Some related definitions of arguments here:
synonym_group
is a set of annotation phrases that are synonymous to each other;transitivity
of synonyms means if A and B are synonymous and B and C are synonymous, then A and C are synonymous. This is achieved by a connected graph-based algorithm.synonym_pair
is a pair of synonymous annotation phrases which can be extracted from the cartesian product of a synonym_group
and itself. NOTE that reflexivity and symmetry are preserved meaning that (i) every phrase A is a synonym of itself and (ii) if (A, B) is a synonym pair then (B, A) is a synonym pair, too.Attributes:
Name Type Descriptiononto
Ontology
An ontology to construct the annotation thesaurus from.
annotation_index
Dict[str, Set[str]]
An index of the class annotations with (class_iri, annotations)
pairs.
annotation_property_iris
List[str]
A list of annotation property IRIs used to extract the annotations.
average_number_of_annotations_per_class
int
The average number of (extracted) annotations per ontology class.
apply_transitivity
bool
Apply synonym transitivity to merge synonym groups or not.
synonym_groups
List[Set[str]]
The list of synonym groups extracted from the ontology according to specified annotation properties.
Parameters:
Name Type Description Defaultonto
Ontology
The input ontology to extract annotations from.
requiredannotation_property_iris
List[str]
Specify which annotation properties to be used.
requiredapply_transitivity
bool
Apply synonym transitivity to merge synonym groups or not. Defaults to False
.
False
Source code in src/deeponto/align/bertmap/text_semantics.py
def __init__(self, onto: Ontology, annotation_property_iris: List[str], apply_transitivity: bool = False):\nr\"\"\"Initialise a thesaurus for ontology class annotations.\n\n Args:\n onto (Ontology): The input ontology to extract annotations from.\n annotation_property_iris (List[str]): Specify which annotation properties to be used.\n apply_transitivity (bool, optional): Apply synonym transitivity to merge synonym groups or not. Defaults to `False`.\n \"\"\"\n\n self.onto = onto\n # build the annotation index to extract synonyms from `onto`\n # the input property iris may not exist in this ontology\n # the output property iris will be truncated to the existing ones\n index, iris = self.onto.build_annotation_index(\n annotation_property_iris=annotation_property_iris,\n entity_type=\"Classes\",\n apply_lowercasing=True,\n )\n self.annotation_index = index\n self.annotation_property_iris = iris\n total_number_of_annotations = sum([len(v) for v in self.annotation_index.values()])\n self.average_number_of_annotations_per_class = total_number_of_annotations / len(self.annotation_index)\n\n # synonym groups\n self.apply_transitivity = apply_transitivity\n self.synonym_groups = list(self.annotation_index.values())\n if self.apply_transitivity:\n self.synonym_groups = self.merge_synonym_groups_by_transitivity(self.synonym_groups)\n\n # summary\n self.info = {\n type(self).__name__: {\n \"ontology\": self.onto.info[type(self.onto).__name__],\n \"average_number_of_annotations_per_class\": round(self.average_number_of_annotations_per_class, 3),\n \"number_of_synonym_groups\": len(self.synonym_groups),\n }\n }\n
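A brief usage sketch (the ontology file path and the annotation property IRI are placeholders) for building a thesaurus and inspecting the extracted synonym groups:

from deeponto.onto import Ontology
from deeponto.align.bertmap.text_semantics import AnnotationThesaurus

onto = Ontology("path/to/ontology.owl")  # placeholder ontology file
thesaurus = AnnotationThesaurus(
    onto,
    annotation_property_iris=["http://www.w3.org/2000/01/rdf-schema#label"],
    apply_transitivity=False,
)
print(thesaurus.info)                 # summary statistics of the thesaurus
print(len(thesaurus.synonym_groups))  # one synonym group per class when transitivity is off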
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.text_semantics.AnnotationThesaurus.get_synonym_pairs","title":"get_synonym_pairs(synonym_group, remove_duplicates=True)
staticmethod
","text":"Get synonym pairs from a synonym group through a cartesian product.
Parameters:
Name Type Description Defaultsynonym_group
Set[str]
A set of annotation phrases that are synonymous to each other.
requiredReturns:
Type DescriptionList[Tuple[str, str]]
A list of synonym pairs.
Source code insrc/deeponto/align/bertmap/text_semantics.py
@staticmethod\ndef get_synonym_pairs(synonym_group: Set[str], remove_duplicates: bool = True):\n\"\"\"Get synonym pairs from a synonym group through a cartesian product.\n\n Args:\n synonym_group (Set[str]): A set of annotation phrases that are synonymous to each other.\n\n Returns:\n (List[Tuple[str, str]]): A list of synonym pairs.\n \"\"\"\n synonym_pairs = list(itertools.product(synonym_group, synonym_group))\n if remove_duplicates:\n return uniqify(synonym_pairs)\n else:\n return synonym_pairs\n
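As a quick illustration, a synonym group of size n yields n² ordered pairs from the cartesian product with itself (including reflexive pairs), before any de-duplication:

from deeponto.align.bertmap.text_semantics import AnnotationThesaurus

pairs = AnnotationThesaurus.get_synonym_pairs({"hand", "manus"})
# 2 * 2 = 4 ordered pairs, including the reflexive ("hand", "hand") and ("manus", "manus")
print(len(pairs))  # 4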
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.text_semantics.AnnotationThesaurus.merge_synonym_groups_by_transitivity","title":"merge_synonym_groups_by_transitivity(synonym_groups)
staticmethod
","text":"Merge synonym groups by transitivity.
Synonym groups that share a common annotation phrase will be merged. NOTE that for multiple ontologies, we can merge their synonym groups by first concatenating them and then applying this function.
Note
In \\(\\textsf{BERTMap}\\) experiments we have considered this as a data augmentation approach but it does not bring a significant performance improvement. However, if the overall number of annotations is not large enough then this could be a good option.
Parameters:
Name Type Description Defaultsynonym_groups
List[Set[str]]
A sequence of synonym groups to be merged.
requiredReturns:
Type DescriptionList[Set[str]]
A list of merged synonym groups.
Source code insrc/deeponto/align/bertmap/text_semantics.py
@staticmethod\ndef merge_synonym_groups_by_transitivity(synonym_groups: List[Set[str]]):\nr\"\"\"Merge synonym groups by transitivity.\n\n Synonym groups that share a common annotation phrase will be merged. NOTE that for\n multiple ontologies, we can merge their synonym groups by first concatenating them\n then use this function.\n\n !!! note\n\n In $\\textsf{BERTMap}$ experiments we have considered this as a data augmentation approach\n but it does not bring a significant performance improvement. However, if the\n overall number of annotations is not large enough then this could be a good option.\n\n Args:\n synonym_groups (List[Set[str]]): A sequence of synonym groups to be merged.\n\n Returns:\n (List[Set[str]]): A list of merged synonym groups.\n \"\"\"\n synonym_pairs = []\n for synonym_group in synonym_groups:\n # gather synonym pairs from the self-product of a synonym group\n synonym_pairs += AnnotationThesaurus.get_synonym_pairs(synonym_group, remove_duplicates=False)\n synonym_pairs = uniqify(synonym_pairs)\n merged_grouped_synonyms = AnnotationThesaurus.connected_labels(synonym_pairs)\n return merged_grouped_synonyms\n
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.text_semantics.AnnotationThesaurus.connected_annotations","title":"connected_annotations(synonym_pairs)
staticmethod
","text":"Build a graph for adjacency among the class annotations (labels) such that the transitivity of synonyms is ensured.
Auxiliary function for merge_synonym_groups_by_transitivity
.
Parameters:
Name Type Description Defaultsynonym_pairs
List[Tuple[str, str]]
List of pairs of phrases that are synonymous.
requiredReturns:
Type DescriptionList[Set[str]]
A list of synonym groups.
Source code insrc/deeponto/align/bertmap/text_semantics.py
@staticmethod\ndef connected_annotations(synonym_pairs: List[Tuple[str, str]]):\n\"\"\"Build a graph for adjacency among the class annotations (labels) such that\n the **transitivity** of synonyms is ensured.\n\n Auxiliary function for [`merge_synonym_groups_by_transitivity`][deeponto.align.bertmap.text_semantics.AnnotationThesaurus.merge_synonym_groups_by_transitivity].\n\n Args:\n synonym_pairs (List[Tuple[str, str]]): List of pairs of phrases that are synonymous.\n\n Returns:\n (List[Set[str]]): A list of synonym groups.\n \"\"\"\n graph = nx.Graph()\n graph.add_edges_from(synonym_pairs)\n # nx.draw(G, with_labels = True)\n connected = list(nx.connected_components(graph))\n return connected\n
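A toy example of the transitivity construction: two pairs that share the phrase "muscle cell" collapse into a single connected component.

from deeponto.align.bertmap.text_semantics import AnnotationThesaurus

pairs = [("myocyte", "muscle cell"), ("muscle cell", "muscle fibre")]
groups = AnnotationThesaurus.connected_annotations(pairs)
print(groups)  # [{"myocyte", "muscle cell", "muscle fibre"}]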
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.text_semantics.AnnotationThesaurus.synonym_sampling","title":"synonym_sampling(num_samples=None)
","text":"Sample synonym pairs from a list of synonym groups extracted from the input ontology.
According to the \\(\\textsf{BERTMap}\\) paper, synonyms are defined as label pairs that belong to the same ontology class.
NOTE this has been validated for getting the same results as in the original \\(\\textsf{BERTMap}\\) repository.
Parameters:
Name Type Description Defaultnum_samples
int
The (maximum) number of unique samples extracted. Defaults to None
.
None
Returns:
Type DescriptionList[Tuple[str, str]]
A list of unique synonym pair samples.
Source code insrc/deeponto/align/bertmap/text_semantics.py
def synonym_sampling(self, num_samples: Optional[int] = None):\nr\"\"\"Sample synonym pairs from a list of synonym groups extracted from the input ontology.\n\n According to the $\\textsf{BERTMap}$ paper, **synonyms** are defined as label pairs that belong\n to the same ontology class.\n\n NOTE this has been validated for getting the same results as in the original $\\textsf{BERTMap}$ repository.\n\n Args:\n num_samples (int, optional): The (maximum) number of **unique** samples extracted. Defaults to `None`.\n\n Returns:\n (List[Tuple[str, str]]): A list of unique synonym pair samples.\n \"\"\"\n synonym_pool = []\n for synonym_group in self.synonym_groups:\n # do not remove duplicates in the loop to save time\n synonym_pairs = self.get_synonym_pairs(synonym_group, remove_duplicates=False)\n synonym_pool += synonym_pairs\n # remove duplicates afer the loop\n synonym_pool = uniqify(synonym_pool)\n\n if (not num_samples) or (num_samples >= len(synonym_pool)):\n # print(\"Return all synonym pairs without downsampling.\")\n return synonym_pool\n else:\n return random.sample(synonym_pool, num_samples)\n
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.text_semantics.AnnotationThesaurus.soft_nonsynonym_sampling","title":"soft_nonsynonym_sampling(num_samples, max_iter=5)
","text":"Sample soft non-synonyms from a list of synonym groups extracted from the input ontology.
According to the \\(\\textsf{BERTMap}\\) paper, soft non-synonyms are defined as label pairs from two different synonym groups that are randomly selected.
Parameters:
Name Type Description Defaultnum_samples
int
The (maximum) number of unique samples extracted; this is required unlike for synonym sampling because the non-synonym pool is significantly larger (considering random combinations of different synonym groups).
requiredmax_iter
int
The maximum number of iterations for conducting sampling. Defaults to 5
.
5
Returns:
Type DescriptionList[Tuple[str, str]]
A list of unique (soft) non-synonym pair samples.
Source code insrc/deeponto/align/bertmap/text_semantics.py
def soft_nonsynonym_sampling(self, num_samples: int, max_iter: int = 5):\nr\"\"\"Sample **soft** non-synonyms from a list of synonym groups extracted from the input ontology.\n\n According to the $\\textsf{BERTMap}$ paper, **soft non-synonyms** are defined as label pairs\n from two *different* synonym groups that are **randomly** selected.\n\n Args:\n num_samples (int): The (maximum) number of **unique** samples extracted; this is\n required **unlike for synonym sampling** because the non-synonym pool is **significantly\n larger** (considering random combinations of different synonym groups).\n max_iter (int): The maximum number of iterations for conducting sampling. Defaults to `5`.\n\n Returns:\n (List[Tuple[str, str]]): A list of unique (soft) non-synonym pair samples.\n \"\"\"\n nonsyonym_pool = []\n # randomly select disjoint synonym group pairs from all\n for _ in range(num_samples):\n left_synonym_group, right_synonym_group = tuple(random.sample(self.synonym_groups, 2))\n try:\n # randomly choose one label from a synonym group\n left_label = random.choice(list(left_synonym_group))\n right_label = random.choice(list(right_synonym_group))\n nonsyonym_pool.append((left_label, right_label))\n except:\n # skip if there are no class labels\n continue\n\n # DataUtils.uniqify is too slow so we should avoid operating it too often\n nonsyonym_pool = uniqify(nonsyonym_pool)\n\n while len(nonsyonym_pool) < num_samples and max_iter > 0:\n max_iter = max_iter - 1 # reduce the iteration to prevent exhausting loop\n nonsyonym_pool += self.soft_nonsynonym_sampling(num_samples - len(nonsyonym_pool), max_iter)\n nonsyonym_pool = uniqify(nonsyonym_pool)\n\n return nonsyonym_pool\n
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.text_semantics.AnnotationThesaurus.weighted_random_choices_of_sibling_groups","title":"weighted_random_choices_of_sibling_groups(k=1)
","text":"Randomly (weighted) select a number of sibling class groups.
The weights are computed according to the sizes of the sibling class groups.
Source code insrc/deeponto/align/bertmap/text_semantics.py
def weighted_random_choices_of_sibling_groups(self, k: int = 1):\n\"\"\"Randomly (weighted) select a number of sibling class groups.\n\n The weights are computed according to the sizes of the sibling class groups.\n \"\"\"\n weights = [len(s) for s in self.onto.sibling_class_groups]\n weights = [w / sum(weights) for w in weights] # normalised\n return random.choices(self.onto.sibling_class_groups, weights=weights, k=k)\n
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.text_semantics.AnnotationThesaurus.hard_nonsynonym_sampling","title":"hard_nonsynonym_sampling(num_samples, max_iter=5)
","text":"Sample hard non-synonyms from sibling classes of the input ontology.
According to the \\(\\textsf{BERTMap}\\) paper, hard non-synonyms are defined as label pairs that belong to two disjoint ontology classes. For practical reason, the condition is eased to two sibling ontology classes.
Parameters:
Name Type Description Defaultnum_samples
int
The (maximum) number of unique samples extracted; this is required unlike for synonym sampling because the non-synonym pool is significantly larger (considering random combinations of different synonym groups).
requiredmax_iter
int
The maximum number of iterations for conducting sampling. Defaults to 5
.
5
Returns:
Type DescriptionList[Tuple[str, str]]
A list of unique (hard) non-synonym pair samples.
Source code insrc/deeponto/align/bertmap/text_semantics.py
def hard_nonsynonym_sampling(self, num_samples: int, max_iter: int = 5):\nr\"\"\"Sample **hard** non-synonyms from sibling classes of the input ontology.\n\n According to the $\\textsf{BERTMap}$ paper, **hard non-synonyms** are defined as label pairs\n that belong to two **disjoint** ontology classes. For practical reason, the condition\n is eased to two **sibling** ontology classes.\n\n Args:\n num_samples (int): The (maximum) number of **unique** samples extracted; this is\n required **unlike for synonym sampling** because the non-synonym pool is **significantly\n larger** (considering random combinations of different synonym groups).\n max_iter (int): The maximum number of iterations for conducting sampling. Defaults to `5`.\n\n Returns:\n (List[Tuple[str, str]]): A list of unique (hard) non-synonym pair samples.\n \"\"\"\n # intialise the sibling class groups\n self.onto.sibling_class_groups\n\n if not self.onto.sibling_class_groups:\n warnings.warn(\"Skip hard negative sampling as no sibling class groups are defined.\")\n return []\n\n # flatten the disjointness groups into all pairs of hard neagtives\n nonsynonym_pool = []\n # randomly (weighted) select a number of sibling class groups with replacement\n sibling_class_groups = self.weighted_random_choices_of_sibling_groups(k=num_samples)\n\n for sibling_class_group in sibling_class_groups:\n # random select two sibling classes; no weights this time\n left_class_iri, right_class_iri = tuple(random.sample(sibling_class_group, 2))\n try:\n # random select a label for each of them\n left_label = random.choice(list(self.annotation_index[left_class_iri]))\n right_label = random.choice(list(self.annotation_index[right_class_iri]))\n # add the label pair to the pool\n nonsynonym_pool.append((left_label, right_label))\n except:\n # skip them if there are no class labels\n continue\n\n # DataUtils.uniqify is too slow so we should avoid operating it too often\n nonsynonym_pool = uniqify(nonsynonym_pool)\n\n while len(nonsynonym_pool) < num_samples and max_iter > 0:\n max_iter = max_iter - 1 # reduce the iteration to prevent exhausting loop\n nonsynonym_pool += self.hard_nonsynonym_sampling(num_samples - len(nonsynonym_pool), max_iter)\n nonsynonym_pool = uniqify(nonsynonym_pool)\n\n return nonsynonym_pool\n
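Continuing the AnnotationThesaurus sketch above (where thesaurus was built from an ontology), the three samplers can be combined to assemble a labelled pool in the spirit of the intra-ontology corpus; the ratios here are illustrative only.

synonyms = thesaurus.synonym_sampling()                          # all unique synonym pairs
hard_negatives = thesaurus.hard_nonsynonym_sampling(2 * len(synonyms))
soft_negatives = thesaurus.soft_nonsynonym_sampling(2 * len(synonyms))
print(len(synonyms), len(hard_negatives), len(soft_negatives))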
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.text_semantics.IntraOntologyTextSemanticsCorpus","title":"IntraOntologyTextSemanticsCorpus(onto, annotation_property_iris, soft_negative_ratio=2, hard_negative_ratio=2)
","text":"Class for creating the intra-ontology text semantics corpus from an ontology.
As defined in the \\(\\textsf{BERTMap}\\) paper, the intra-ontology text semantics corpus consists of synonym and non-synonym pairs extracted from the ontology class annotations.
Attributes:
Name Type Descriptiononto
Ontology
An ontology to construct the intra-ontology text semantics corpus from.
annotation_property_iris
List[str]
Specify which annotation properties to be used.
soft_negative_ratio
int
The expected negative sample ratio of the soft non-synonyms to the extracted synonyms. Defaults to 2
.
hard_negative_ratio
int
The expected negative sample ratio of the hard non-synonyms to the extracted synonyms. Defaults to 2
. However, hard non-synonyms are sometimes insufficient given an ontology's hierarchy, so soft non-synonyms are used to make up the required number in this case.
src/deeponto/align/bertmap/text_semantics.py
def __init__(\n self,\n onto: Ontology,\n annotation_property_iris: List[str],\n soft_negative_ratio: int = 2,\n hard_negative_ratio: int = 2,\n):\n self.onto = onto\n # $\\textsf{BERTMap}$ does not apply synonym transitivity\n self.thesaurus = AnnotationThesaurus(onto, annotation_property_iris, apply_transitivity=False)\n\n self.synonyms = self.thesaurus.synonym_sampling()\n # sample hard negatives first as they might not be enough\n num_hard = hard_negative_ratio * len(self.synonyms)\n self.hard_nonsynonyms = self.thesaurus.hard_nonsynonym_sampling(num_hard)\n # compensate the number of hard negatives as soft negatives are almost always available\n num_soft = (soft_negative_ratio + hard_negative_ratio) * len(self.synonyms) - len(self.hard_nonsynonyms)\n self.soft_nonsynonyms = self.thesaurus.soft_nonsynonym_sampling(num_soft)\n\n self.info = {\n type(self).__name__: {\n \"num_synonyms\": len(self.synonyms),\n \"num_nonsynonyms\": len(self.soft_nonsynonyms) + len(self.hard_nonsynonyms),\n \"num_soft_nonsynonyms\": len(self.soft_nonsynonyms),\n \"num_hard_nonsynonyms\": len(self.hard_nonsynonyms),\n \"annotation_thesaurus\": self.thesaurus.info[\"AnnotationThesaurus\"],\n }\n }\n
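The compensation logic in the constructor above can be made concrete with some made-up numbers:

# illustrative arithmetic only (made-up numbers)
num_synonyms = 1000
soft_negative_ratio, hard_negative_ratio = 2, 2
num_hard_requested = hard_negative_ratio * num_synonyms       # 2000 hard negatives requested
num_hard_found = 1500                                         # the hierarchy may not yield enough siblings
num_soft = (soft_negative_ratio + hard_negative_ratio) * num_synonyms - num_hard_found
print(num_soft)  # 2500 soft negatives keep the total at 4 * num_synonyms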
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.text_semantics.IntraOntologyTextSemanticsCorpus.save","title":"save(save_path)
","text":"Save the intra-ontology corpus (a .json
file for label pairs and its summary) in the specified directory.
src/deeponto/align/bertmap/text_semantics.py
def save(self, save_path: str):\n\"\"\"Save the intra-ontology corpus (a `.json` file for label pairs\n and its summary) in the specified directory.\n \"\"\"\n create_path(save_path)\n save_json = {\n \"summary\": self.info,\n \"synonyms\": [(pos[0], pos[1], 1) for pos in self.synonyms],\n \"nonsynonyms\": [(neg[0], neg[1], 0) for neg in self.soft_nonsynonyms + self.hard_nonsynonyms],\n }\n save_file(save_json, os.path.join(save_path, \"intra-onto.corpus.json\"))\n
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.text_semantics.CrossOntologyTextSemanticsCorpus","title":"CrossOntologyTextSemanticsCorpus(class_mappings, src_onto, tgt_onto, annotation_property_iris, negative_ratio=4)
","text":"Class for creating the cross-ontology text semantics corpus from two ontologies and provided mappings between them.
As defined in the \\(\\textsf{BERTMap}\\) paper, the cross-ontology text semantics corpus consists of synonym and non-synonym pairs extracted from the annotations/labels of class pairs involved in the provided cross-ontology mappigns.
Attributes:
Name Type Descriptionclass_mappings
List[ReferenceMapping]
A list of cross-ontology class mappings.
src_onto
Ontology
The source ontology whose class IRIs are heads of the class_mappings
.
tgt_onto
Ontology
The target ontology whose class IRIs are tails of the class_mappings
.
annotation_property_iris
List[str]
A list of annotation property IRIs used to extract the annotations.
negative_ratio
int
The expected negative sample ratio of the non-synonyms to the extracted synonyms. Defaults to 4
. NOTE that we do not have hard non-synonyms at the cross-ontology level.
src/deeponto/align/bertmap/text_semantics.py
def __init__(\n self,\n class_mappings: List[ReferenceMapping],\n src_onto: Ontology,\n tgt_onto: Ontology,\n annotation_property_iris: List[str],\n negative_ratio: int = 4,\n):\n self.class_mappings = class_mappings\n self.src_onto = src_onto\n self.tgt_onto = tgt_onto\n # build the annotation thesaurus for each ontology\n self.src_thesaurus = AnnotationThesaurus(src_onto, annotation_property_iris)\n self.tgt_thesaurus = AnnotationThesaurus(tgt_onto, annotation_property_iris)\n self.negative_ratio = negative_ratio\n\n self.synonyms = self.synonym_sampling_from_mappings()\n num_negative = negative_ratio * len(self.synonyms)\n self.nonsynonyms = self.nonsynonym_sampling_from_mappings(num_negative)\n\n self.info = {\n type(self).__name__: {\n \"num_synonyms\": len(self.synonyms),\n \"num_nonsynonyms\": len(self.nonsynonyms),\n \"num_mappings\": len(self.class_mappings),\n \"src_annotation_thesaurus\": self.src_thesaurus.info[\"AnnotationThesaurus\"],\n \"tgt_annotation_thesaurus\": self.tgt_thesaurus.info[\"AnnotationThesaurus\"],\n }\n }\n
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.text_semantics.CrossOntologyTextSemanticsCorpus.save","title":"save(save_path)
","text":"Save the cross-ontology corpus (a .json
file for label pairs and its summary) in the specified directory.
src/deeponto/align/bertmap/text_semantics.py
def save(self, save_path: str):\n\"\"\"Save the cross-ontology corpus (a `.json` file for label pairs\n and its summary) in the specified directory.\n \"\"\"\n create_path(save_path)\n save_json = {\n \"summary\": self.info,\n \"synonyms\": [(pos[0], pos[1], 1) for pos in self.synonyms],\n \"nonsynonyms\": [(neg[0], neg[1], 0) for neg in self.nonsynonyms],\n }\n save_file(save_json, os.path.join(save_path, \"cross-onto.corpus.json\"))\n
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.text_semantics.CrossOntologyTextSemanticsCorpus.synonym_sampling_from_mappings","title":"synonym_sampling_from_mappings()
","text":"Sample synonyms from cross-ontology class mappings.
Arguments of this method are all class attributes. See CrossOntologyTextSemanticsCorpus
.
According to the \\(\\textsf{BERTMap}\\) paper, cross-ontology synonyms are defined as label pairs that belong to two matched classes. Suppose the class \\(C\\) from the source ontology and the class \\(D\\) from the target ontology are matched according to one of the class_mappings
, then the cartesian product of labels of \\(C\\) and labels of \\(D\\) form cross-ontology synonyms. Note that identity synonyms in the form of \\((a, a)\\) are removed because they have been covered in the intra-ontology case.
Returns:
Type DescriptionList[Tuple[str, str]]
A list of unique synonym pair samples from ontology class mappings.
Source code insrc/deeponto/align/bertmap/text_semantics.py
def synonym_sampling_from_mappings(self):\nr\"\"\"Sample synonyms from cross-ontology class mappings.\n\n Arguments of this method are all class attributes.\n See [`CrossOntologyTextSemanticsCorpus`][deeponto.align.bertmap.text_semantics.CrossOntologyTextSemanticsCorpus].\n\n According to the $\\textsf{BERTMap}$ paper, **cross-ontology synonyms** are defined as label pairs\n that belong to two **matched** classes. Suppose the class $C$ from the source ontology\n and the class $D$ from the target ontology are matched according to one of the `class_mappings`,\n then the cartesian product of labels of $C$ and labels of $D$ form cross-ontology synonyms.\n Note that **identity synonyms** in the form of $(a, a)$ are removed because they have been covered\n in the intra-ontology case.\n\n Returns:\n (List[Tuple[str, str]]): A list of unique synonym pair samples from ontology class mappings.\n \"\"\"\n synonym_pool = []\n\n for class_mapping in self.class_mappings:\n src_class_iri, tgt_class_iri = class_mapping.to_tuple()\n src_class_annotations = self.src_thesaurus.annotation_index[src_class_iri]\n tgt_class_annotations = self.tgt_thesaurus.annotation_index[tgt_class_iri]\n synonym_pairs = list(itertools.product(src_class_annotations, tgt_class_annotations))\n # remove the identity synonyms as the have been covered in the intra-ontology case\n synonym_pairs = [(l, r) for l, r in synonym_pairs if l != r]\n backward_synonym_pairs = [(r, l) for l, r in synonym_pairs]\n synonym_pool += synonym_pairs + backward_synonym_pairs\n\n synonym_pool = uniqify(synonym_pool)\n return synonym_pool\n
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.text_semantics.CrossOntologyTextSemanticsCorpus.nonsynonym_sampling_from_mappings","title":"nonsynonym_sampling_from_mappings(num_samples, max_iter=5)
","text":"Sample non-synonyms from cross-ontology class mappings.
Arguments of this method are all class attributes. See CrossOntologyTextSemanticsCorpus
.
According to the \\(\\textsf{BERTMap}\\) paper, cross-ontology non-synonyms are defined as label pairs that belong to two unmatched classes. Assume that the provided class mappings are self-contained in the sense that they are complete for the classes involved in them, then we can randomly sample two cross-ontology classes that are not matched according to the mappings and take their labels as nonsynonyms. In practice, it is quite unlikely to obtain false negatives since the number of incorrect mappings is much larger than the number of correct ones.
Returns:
Type DescriptionList[Tuple[str, str]]
A list of unique nonsynonym pair samples from ontology class mappings.
Source code insrc/deeponto/align/bertmap/text_semantics.py
def nonsynonym_sampling_from_mappings(self, num_samples: int, max_iter: int = 5):\nr\"\"\"Sample non-synonyms from cross-ontology class mappings.\n\n Arguments of this method are all class attributes.\n See [`CrossOntologyTextSemanticsCorpus`][deeponto.align.bertmap.text_semantics.CrossOntologyTextSemanticsCorpus].\n\n According to the $\\textsf{BERTMap}$ paper, **cross-ontology non-synonyms** are defined as label pairs\n that belong to two **unmatched** classes. Assume that the provided class mappings are self-contained\n in the sense that they are complete for the classes involved in them, then we can randomly\n sample two cross-ontology classes that are not matched according to the mappings and take\n their labels as nonsynonyms. In practice, it is quite unlikely to obtain false negatives since\n the number of incorrect mappings is much larger than the number of correct ones.\n\n Returns:\n (List[Tuple[str, str]]): A list of unique nonsynonym pair samples from ontology class mappings.\n \"\"\"\n nonsynonym_pool = []\n\n # form cross-ontology synonym groups\n cross_onto_synonym_group_pair = []\n for class_mapping in self.class_mappings:\n src_class_iri, tgt_class_iri = class_mapping.to_tuple()\n src_class_annotations = self.src_thesaurus.annotation_index[src_class_iri]\n tgt_class_annotations = self.tgt_thesaurus.annotation_index[tgt_class_iri]\n # let each matched class pair's annotations form a synonym group_pair\n cross_onto_synonym_group_pair.append((src_class_annotations, tgt_class_annotations))\n\n # randomly select disjoint synonym group pairs from all\n for _ in range(num_samples):\n left_class_pair, right_class_pair = tuple(random.sample(cross_onto_synonym_group_pair, 2))\n try:\n # randomly choose one label from a synonym group\n left_label = random.choice(list(left_class_pair[0])) # choosing the src side by [0]\n right_label = random.choice(list(right_class_pair[1])) # choosing the tgt side by [1]\n nonsynonym_pool.append((left_label, right_label))\n except:\n # skip if there are no class labels\n continue\n\n # DataUtils.uniqify is too slow so we should avoid operating it too often\n nonsynonym_pool = uniqify(nonsynonym_pool)\n while len(nonsynonym_pool) < num_samples and max_iter > 0:\n max_iter = max_iter - 1 # reduce the iteration to prevent exhausting loop\n nonsynonym_pool += self.nonsynonym_sampling_from_mappings(num_samples - len(nonsynonym_pool), max_iter)\n nonsynonym_pool = uniqify(nonsynonym_pool)\n return nonsynonym_pool\n
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.text_semantics.TextSemanticsCorpora","title":"TextSemanticsCorpora(src_onto, tgt_onto, annotation_property_iris, class_mappings=None, auxiliary_ontos=None)
","text":"Class for creating the collection text semantics corpora.
As defined in the \\(\\textsf{BERTMap}\\) paper, the collection of text semantics corpora contains at least two intra-ontology sub-corpora from the source and target ontologies, respectively. If some class mappings are provided, then a cross-ontology sub-corpus will be created. If some additional auxiliary ontologies are provided, the intra-ontology corpora created from them will serve as the auxiliary sub-corpora.
Attributes:
Name Type Descriptionsrc_onto
Ontology
The source ontology to be matched or aligned.
tgt_onto
Ontology
The target ontology to be matched or aligned.
annotation_property_iris
List[str]
A list of annotation property IRIs used to extract the annotations.
class_mappings
List[ReferenceMapping]
A list of cross-ontology class mappings between the source and the target ontologies. Defaults to None
.
auxiliary_ontos
List[Ontology]
A list of auxiliary ontologies for augmenting more synonym/non-synonym samples. Defaults to None
.
src/deeponto/align/bertmap/text_semantics.py
def __init__(\n self,\n src_onto: Ontology,\n tgt_onto: Ontology,\n annotation_property_iris: List[str],\n class_mappings: Optional[List[ReferenceMapping]] = None,\n auxiliary_ontos: Optional[List[Ontology]] = None,\n):\n self.synonyms = []\n self.nonsynonyms = []\n\n # build intra-ontology corpora\n # negative sample ratios are by default\n self.intra_src_onto_corpus = IntraOntologyTextSemanticsCorpus(src_onto, annotation_property_iris)\n self.add_samples_from_sub_corpus(self.intra_src_onto_corpus)\n self.intra_tgt_onto_corpus = IntraOntologyTextSemanticsCorpus(tgt_onto, annotation_property_iris)\n self.add_samples_from_sub_corpus(self.intra_tgt_onto_corpus)\n\n # build cross-ontolgoy corpora\n self.class_mappings = class_mappings\n self.cross_onto_corpus = None\n if self.class_mappings:\n self.cross_onto_corpus = CrossOntologyTextSemanticsCorpus(\n class_mappings, src_onto, tgt_onto, annotation_property_iris\n )\n self.add_samples_from_sub_corpus(self.cross_onto_corpus)\n\n # build auxiliary ontology corpora (same as intra-ontology)\n self.auxiliary_ontos = auxiliary_ontos\n self.auxiliary_onto_corpora = []\n if self.auxiliary_ontos:\n for auxiliary_onto in self.auxiliary_ontos:\n self.auxiliary_onto_corpora.append(\n IntraOntologyTextSemanticsCorpus(auxiliary_onto, annotation_property_iris)\n )\n for auxiliary_onto_corpus in self.auxiliary_onto_corpora:\n self.add_samples_from_sub_corpus(auxiliary_onto_corpus)\n\n # DataUtils.uniqify the samples\n self.synonyms = uniqify(self.synonyms)\n self.nonsynonyms = uniqify(self.nonsynonyms)\n # remove invalid nonsynonyms\n self.nonsynonyms = list(set(self.nonsynonyms) - set(self.synonyms))\n\n # summary\n self.info = {\n type(self).__name__: {\n \"num_synonyms\": len(self.synonyms),\n \"num_nonsynonyms\": len(self.nonsynonyms),\n \"intra_src_onto_corpus\": self.intra_src_onto_corpus.info[\"IntraOntologyTextSemanticsCorpus\"],\n \"intra_tgt_onto_corpus\": self.intra_tgt_onto_corpus.info[\"IntraOntologyTextSemanticsCorpus\"],\n \"cross_onto_corpus\": self.cross_onto_corpus.info[\"CrossOntologyTextSemanticsCorpus\"]\n if self.cross_onto_corpus\n else None,\n \"auxiliary_onto_corpora\": [\n a.info[\"IntraOntologyTextSemanticsCorpus\"] for a in self.auxiliary_onto_corpora\n ],\n }\n }\n
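An end-to-end sketch (the ontology paths, the annotation property IRI, and the save directory are placeholders) of building and saving the full corpora collection:

from deeponto.onto import Ontology
from deeponto.align.bertmap.text_semantics import TextSemanticsCorpora

src_onto = Ontology("path/to/source.owl")  # placeholder paths
tgt_onto = Ontology("path/to/target.owl")

corpora = TextSemanticsCorpora(
    src_onto=src_onto,
    tgt_onto=tgt_onto,
    annotation_property_iris=["http://www.w3.org/2000/01/rdf-schema#label"],
    class_mappings=None,    # no cross-ontology sub-corpus
    auxiliary_ontos=None,   # no auxiliary sub-corpora
)
print(corpora.info)
corpora.save("outputs/corpora")  # writes text-semantics.corpora.json in that directory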
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.text_semantics.TextSemanticsCorpora.save","title":"save(save_path)
","text":"Save the overall text semantics corpora (a .json
file for label pairs and its summary) in the specified directory.
src/deeponto/align/bertmap/text_semantics.py
def save(self, save_path: str):\n\"\"\"Save the overall text semantics corpora (a `.json` file for label pairs\n and its summary) in the specified directory.\n \"\"\"\n create_path(save_path)\n save_json = {\n \"summary\": self.info,\n \"synonyms\": [(pos[0], pos[1], 1) for pos in self.synonyms],\n \"nonsynonyms\": [(neg[0], neg[1], 0) for neg in self.nonsynonyms],\n }\n save_file(save_json, os.path.join(save_path, \"text-semantics.corpora.json\"))\n
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.text_semantics.TextSemanticsCorpora.add_samples_from_sub_corpus","title":"add_samples_from_sub_corpus(sub_corpus)
","text":"Add synonyms and non-synonyms from each sub-corpus to the overall collection.
Source code insrc/deeponto/align/bertmap/text_semantics.py
def add_samples_from_sub_corpus(\n self, sub_corpus: Union[IntraOntologyTextSemanticsCorpus, CrossOntologyTextSemanticsCorpus]\n):\n\"\"\"Add synonyms and non-synonyms from each sub-corpus to the overall collection.\"\"\"\n self.synonyms += sub_corpus.synonyms\n if isinstance(sub_corpus, IntraOntologyTextSemanticsCorpus):\n self.nonsynonyms += sub_corpus.soft_nonsynonyms + sub_corpus.hard_nonsynonyms\n else:\n self.nonsynonyms += sub_corpus.nonsynonyms\n
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.bert_classifier.BERTSynonymClassifier","title":"BERTSynonymClassifier(loaded_path, output_path, eval_mode, max_length_for_input, num_epochs_for_training=None, batch_size_for_training=None, batch_size_for_prediction=None, training_data=None, validation_data=None)
","text":"Class for BERT synonym classifier.
The main scoring module of \\(\\textsf{BERTMap}\\) consisting of a BERT model and a binary synonym classifier.
Attributes:
Name Type Descriptionloaded_path
str
The path to the checkpoint of a pre-trained BERT model.
output_path
str
The path to the output BERT model (usually fine-tuned).
eval_mode
bool
Set to False
if the model is loaded for training.
max_length_for_input
int
The maximum length of an input sequence.
num_epochs_for_training
int
The number of epochs for training a BERT model.
batch_size_for_training
int
The batch size for training a BERT model.
batch_size_for_prediction
int
The batch size for making predictions.
training_data
Dataset
Data for training the model if for_training
is set to True
. Defaults to None
.
validation_data
Dataset
Data for validating the model if for_training
is set to True
. Defaults to None
.
training_args
TrainingArguments
Training arguments for training the model if for_training
is set to True
. Defaults to None
.
trainer
Trainer
The model trainer fed with training_args
and data samples. Defaults to None
.
softmax
torch.nn.SoftMax
The softmax layer used for normalising synonym scores. Defaults to None
.
src/deeponto/align/bertmap/bert_classifier.py
def __init__(\n self,\n loaded_path: str,\n output_path: str,\n eval_mode: bool,\n max_length_for_input: int,\n num_epochs_for_training: Optional[float] = None,\n batch_size_for_training: Optional[int] = None,\n batch_size_for_prediction: Optional[int] = None,\n training_data: Optional[List[Tuple[str, str, int]]] = None, # (sentence1, sentence2, label)\n validation_data: Optional[List[Tuple[str, str, int]]] = None,\n):\n # Load the pretrained BERT model from the given path\n self.loaded_path = loaded_path\n print(f\"Loading a BERT model from: {self.loaded_path}.\")\n self.model = AutoModelForSequenceClassification.from_pretrained(\n self.loaded_path, output_hidden_states=eval_mode\n )\n self.tokenizer = Tokenizer.from_pretrained(loaded_path)\n\n self.output_path = output_path\n self.eval_mode = eval_mode\n self.max_length_for_input = max_length_for_input\n self.num_epochs_for_training = num_epochs_for_training\n self.batch_size_for_training = batch_size_for_training\n self.batch_size_for_prediction = batch_size_for_prediction\n self.training_data = None\n self.validation_data = None\n self.data_stat = {}\n self.training_args = None\n self.trainer = None\n self.softmax = None\n\n # load the pre-trained BERT model and set it to eval mode (static)\n if self.eval_mode:\n self.eval()\n # load the pre-trained BERT model for fine-tuning\n else:\n if not training_data:\n raise RuntimeError(\"Training data should be provided when `for_training` is `True`.\")\n if not validation_data:\n raise RuntimeError(\"Validation data should be provided when `for_training` is `True`.\")\n # load data (max_length is used for truncation)\n self.training_data = self.load_dataset(training_data, \"training\")\n self.validation_data = self.load_dataset(validation_data, \"validation\")\n self.data_stat = {\n \"num_training\": len(self.training_data),\n \"num_validation\": len(self.validation_data),\n }\n\n # generate training arguments\n epoch_steps = len(self.training_data) // self.batch_size_for_training # total steps of an epoch\n if torch.cuda.device_count() > 0:\n epoch_steps = epoch_steps // torch.cuda.device_count() # to deal with multi-gpus case\n # keep logging steps consisitent even for small batch size\n # report logging on every 0.02 epoch\n logging_steps = int(epoch_steps * 0.02)\n # eval on every 0.2 epoch\n eval_steps = 10 * logging_steps\n # generate the training arguments\n self.training_args = TrainingArguments(\n output_dir=self.output_path,\n num_train_epochs=self.num_epochs_for_training,\n per_device_train_batch_size=self.batch_size_for_training,\n per_device_eval_batch_size=self.batch_size_for_training,\n warmup_ratio=0.0,\n weight_decay=0.01,\n logging_steps=logging_steps,\n logging_dir=f\"{self.output_path}/tensorboard\",\n eval_steps=eval_steps,\n evaluation_strategy=\"steps\",\n do_train=True,\n do_eval=True,\n save_steps=eval_steps,\n save_total_limit=2,\n load_best_model_at_end=True,\n )\n # build the trainer\n self.trainer = Trainer(\n model=self.model,\n args=self.training_args,\n train_dataset=self.training_data,\n eval_dataset=self.validation_data,\n compute_metrics=self.compute_metrics,\n tokenizer=self.tokenizer._tokenizer,\n )\n
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.bert_classifier.BERTSynonymClassifier.train","title":"train(resume_from_checkpoint=None)
","text":"Start training the BERT model.
Source code insrc/deeponto/align/bertmap/bert_classifier.py
def train(self, resume_from_checkpoint: Optional[Union[bool, str]] = None):\n\"\"\"Start training the BERT model.\"\"\"\n if self.eval_mode:\n raise RuntimeError(\"Training cannot be started in `eval` mode.\")\n self.trainer.train(resume_from_checkpoint=resume_from_checkpoint)\n
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.bert_classifier.BERTSynonymClassifier.eval","title":"eval()
","text":"To eval mode.
Source code insrc/deeponto/align/bertmap/bert_classifier.py
def eval(self):\n\"\"\"To eval mode.\"\"\"\n print(\"The BERT model is set to eval mode for making predictions.\")\n self.model.eval()\n # TODO: to implement multi-gpus for inference\n self.device = self.get_device(device_num=0)\n self.model.to(self.device)\n self.softmax = torch.nn.Softmax(dim=1).to(self.device)\n
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.bert_classifier.BERTSynonymClassifier.predict","title":"predict(sent_pairs)
","text":"Run prediction pipeline for synonym classification.
Return the softmax
probabilities of predicting pairs as synonyms (index=1
).
src/deeponto/align/bertmap/bert_classifier.py
def predict(self, sent_pairs: List[Tuple[str, str]]):\nr\"\"\"Run prediction pipeline for synonym classification.\n\n Return the `softmax` probailities of predicting pairs as synonyms (`index=1`).\n \"\"\"\n inputs = self.process_inputs(sent_pairs)\n with torch.no_grad():\n return self.softmax(self.model(**inputs).logits)[:, 1]\n
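A small sketch (the checkpoint directory is a placeholder and must contain a fine-tuned model) of loading the classifier in eval mode and scoring a few annotation pairs:

from deeponto.align.bertmap.bert_classifier import BERTSynonymClassifier

classifier = BERTSynonymClassifier(
    loaded_path="bertmap_output/bert/checkpoint-1000",  # placeholder fine-tuned checkpoint
    output_path="bertmap_output/bert",
    eval_mode=True,              # no training/validation data needed in eval mode
    max_length_for_input=128,
    batch_size_for_prediction=32,
)
scores = classifier.predict([
    ("hand", "manus"),
    ("hand", "kidney"),
])
print(scores)  # a tensor of synonym probabilities, one per input pair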
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.bert_classifier.BERTSynonymClassifier.load_dataset","title":"load_dataset(data, split)
","text":"Load the list of (annotation1, annotation2, label)
samples into a datasets.Dataset
.
src/deeponto/align/bertmap/bert_classifier.py
def load_dataset(self, data: List[Tuple[str, str, int]], split: str) -> Dataset:\nr\"\"\"Load the list of `(annotation1, annotation2, label)` samples into a `datasets.Dataset`.\"\"\"\n\n def iterate():\n for sample in data:\n yield {\"annotation1\": sample[0], \"annotation2\": sample[1], \"labels\": sample[2]}\n\n dataset = Dataset.from_generator(iterate)\n # NOTE: no padding here because the Trainer class supports dynamic padding\n dataset = dataset.map(\n lambda examples: self.tokenizer._tokenizer(\n examples[\"annotation1\"], examples[\"annotation2\"], max_length=self.max_length_for_input, truncation=True\n ),\n batched=True,\n desc=f\"Load {split} data:\",\n )\n return dataset\n
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.bert_classifier.BERTSynonymClassifier.process_inputs","title":"process_inputs(sent_pairs)
","text":"Process input sentence pairs for the BERT model.
Transform the sentences into BERT input embeddings and load them into the device. This function is called only when the BERT model is about to make predictions (eval
mode).
src/deeponto/align/bertmap/bert_classifier.py
def process_inputs(self, sent_pairs: List[Tuple[str, str]]):\nr\"\"\"Process input sentence pairs for the BERT model.\n\n Transform the sentences into BERT input embeddings and load them into the device.\n This function is called only when the BERT model is about to make predictions (`eval` mode).\n \"\"\"\n return self.tokenizer._tokenizer(\n sent_pairs,\n return_tensors=\"pt\",\n max_length=self.max_length_for_input,\n padding=True,\n truncation=True,\n ).to(self.device)\n
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.bert_classifier.BERTSynonymClassifier.compute_metrics","title":"compute_metrics(pred)
staticmethod
","text":"Add more evaluation metrics into the training log.
Source code insrc/deeponto/align/bertmap/bert_classifier.py
@staticmethod\ndef compute_metrics(pred):\n\"\"\"Add more evaluation metrics into the training log.\"\"\"\n # TODO: currently only accuracy is added, will expect more in the future if needed\n labels = pred.label_ids\n preds = pred.predictions.argmax(-1)\n acc = accuracy_score(labels, preds)\n return {\"accuracy\": acc}\n
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.bert_classifier.BERTSynonymClassifier.get_device","title":"get_device(device_num=0)
staticmethod
","text":"Get a device (GPU or CPU) for the torch model
Source code insrc/deeponto/align/bertmap/bert_classifier.py
@staticmethod\ndef get_device(device_num: int = 0):\n\"\"\"Get a device (GPU or CPU) for the torch model\"\"\"\n # If there's a GPU available...\n if torch.cuda.is_available():\n # Tell PyTorch to use the GPU.\n device = torch.device(f\"cuda:{device_num}\")\n print(\"There are %d GPU(s) available.\" % torch.cuda.device_count())\n print(\"We will use the GPU:\", torch.cuda.get_device_name(device_num))\n # If not...\n else:\n print(\"No GPU available, using the CPU instead.\")\n device = torch.device(\"cpu\")\n return device\n
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.bert_classifier.BERTSynonymClassifier.set_seed","title":"set_seed(seed_val=888)
staticmethod
","text":"Set random seed for reproducible results.
Source code insrc/deeponto/align/bertmap/bert_classifier.py
@staticmethod\ndef set_seed(seed_val: int = 888):\n\"\"\"Set random seed for reproducible results.\"\"\"\n random.seed(seed_val)\n np.random.seed(seed_val)\n torch.manual_seed(seed_val)\n torch.cuda.manual_seed_all(seed_val)\n
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.mapping_prediction.MappingPredictor","title":"MappingPredictor(output_path, tokenizer_path, src_annotation_index, tgt_annotation_index, bert_synonym_classifier, num_raw_candidates, num_best_predictions, batch_size_for_prediction, logger, enlighten_manager, enlighten_status, ignored_class_index=None)
","text":"Class for the mapping prediction module of \\(\\textsf{BERTMap}\\) and \\(\\textsf{BERTMapLt}\\) models.
Attributes:
Name Type Descriptiontokenizer
Tokenizer
The tokenizer used for constructing the inverted annotation index and candidate selection.
src_annotation_index
dict
A dictionary that stores the (class_iri, class_annotations)
pairs from src_onto
according to annotation_property_iris
.
tgt_annotation_index
dict
A dictionary that stores the (class_iri, class_annotations)
pairs from tgt_onto
according to annotation_property_iris
.
tgt_inverted_annotation_index
InvertedIndex
The inverted index built from tgt_annotation_index
used for target class candidate selection.
bert_synonym_classifier
BERTSynonymClassifier
The BERT synonym classifier fine-tuned on text semantics corpora.
num_raw_candidates
int
The maximum number of selected target class candidates for a source class.
num_best_predictions
int
The maximum number of best scored mappings preserved for a source class.
batch_size_for_prediction
int
The batch size of class annotation pairs for computing synonym scores.
ignored_class_index
dict
OAEI argument, a dictionary that stores the (class_iri, used_in_alignment)
pairs.
src/deeponto/align/bertmap/mapping_prediction.py
def __init__(\n self,\n output_path: str,\n tokenizer_path: str,\n src_annotation_index: dict,\n tgt_annotation_index: dict,\n bert_synonym_classifier: Optional[BERTSynonymClassifier],\n num_raw_candidates: Optional[int],\n num_best_predictions: Optional[int],\n batch_size_for_prediction: int,\n logger: Logger,\n enlighten_manager: enlighten.Manager,\n enlighten_status: enlighten.StatusBar,\n ignored_class_index: Optional[dict] = None,\n):\n self.logger = logger\n self.enlighten_manager = enlighten_manager\n self.enlighten_status = enlighten_status\n\n self.tokenizer = Tokenizer.from_pretrained(tokenizer_path)\n\n self.logger.info(\"Build inverted annotation index for candidate selection.\")\n self.src_annotation_index = src_annotation_index\n self.tgt_annotation_index = tgt_annotation_index\n self.tgt_inverted_annotation_index = Ontology.build_inverted_annotation_index(\n tgt_annotation_index, self.tokenizer\n )\n # the fundamental judgement for whether bertmap or bertmaplt is loaded\n self.bert_synonym_classifier = bert_synonym_classifier\n self.num_raw_candidates = num_raw_candidates\n self.num_best_predictions = num_best_predictions\n self.batch_size_for_prediction = batch_size_for_prediction\n self.output_path = output_path\n\n # for the OAEI, adding in check for classes that are not used in alignment\n self.ignored_class_index = ignored_class_index\n\n self.init_class_mapping = lambda head, tail, score: EntityMapping(head, tail, \"<EquivalentTo>\", score)\n
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.mapping_prediction.MappingPredictor.bert_mapping_score","title":"bert_mapping_score(src_class_annotations, tgt_class_annotations)
","text":"\\(\\textsf{BERTMap}\\)'s main mapping score module which utilises the fine-tuned BERT synonym classifier.
Compute the synonym score for each pair of src-tgt class annotations, and return the average score as the mapping score. Apply string matching before applying the BERT module to filter easy mappings (with scores \\(1.0\\)).
Source code insrc/deeponto/align/bertmap/mapping_prediction.py
def bert_mapping_score(\n self,\n src_class_annotations: Set[str],\n tgt_class_annotations: Set[str],\n):\nr\"\"\"$\\textsf{BERTMap}$'s main mapping score module which utilises the fine-tuned BERT synonym\n classifier.\n\n Compute the **synonym score** for each pair of src-tgt class annotations, and return\n the **average** score as the mapping score. Apply string matching before applying the\n BERT module to filter easy mappings (with scores $1.0$).\n \"\"\"\n\n if not src_class_annotations or not tgt_class_annotations:\n warnings.warn(\"Return zero score due to empty input class annotations...\")\n return 0.0\n\n # apply string matching before applying the bert module\n prelim_score = self.edit_similarity_mapping_score(\n src_class_annotations,\n tgt_class_annotations,\n string_match_only=True,\n )\n if prelim_score == 1.0:\n return prelim_score\n # apply BERT classifier and define mapping score := Average(SynonymScores)\n class_annotation_pairs = list(itertools.product(src_class_annotations, tgt_class_annotations))\n synonym_scores = self.bert_synonym_classifier.predict(class_annotation_pairs)\n # only one element tensor is able to be extracted as a scalar by .item()\n return float(torch.mean(synonym_scores).item())\n
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.mapping_prediction.MappingPredictor.edit_similarity_mapping_score","title":"edit_similarity_mapping_score(src_class_annotations, tgt_class_annotations, string_match_only=False)
staticmethod
","text":"\\(\\textsf{BERTMap}\\)'s string match module and \\(\\textsf{BERTMapLt}\\)'s mapping prediction function.
Compute the normalised edit similarity (1 - normalised edit distance)
for each pair of src-tgt class annotations, and return the maximum score as the mapping score.
src/deeponto/align/bertmap/mapping_prediction.py
@staticmethod\ndef edit_similarity_mapping_score(\n src_class_annotations: Set[str],\n tgt_class_annotations: Set[str],\n string_match_only: bool = False,\n):\nr\"\"\"$\\textsf{BERTMap}$'s string match module and $\\textsf{BERTMapLt}$'s mapping prediction function.\n\n Compute the **normalised edit similarity** `(1 - normalised edit distance)` for each pair\n of src-tgt class annotations, and return the **maximum** score as the mapping score.\n \"\"\"\n\n if not src_class_annotations or not tgt_class_annotations:\n warnings.warn(\"Return zero score due to empty input class annotations...\")\n return 0.0\n\n # edge case when src and tgt classes have an exact match of annotation\n if len(src_class_annotations.intersection(tgt_class_annotations)) > 0:\n return 1.0\n # a shortcut to save time for $\\textsf{BERTMap}$\n if string_match_only:\n return 0.0\n annotation_pairs = itertools.product(src_class_annotations, tgt_class_annotations)\n sim_scores = [levenshtein.normalized_similarity(src, tgt) for src, tgt in annotation_pairs]\n return max(sim_scores)\n
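For intuition, the same maximum-over-pairs rule can be reproduced directly, assuming the rapidfuzz implementation of normalised Levenshtein similarity; the annotation sets are made up:

from itertools import product
from rapidfuzz.distance import Levenshtein

src_class_annotations = {"heart muscle", "cardiac muscle"}   # made-up annotation sets
tgt_class_annotations = {"cardiac muscle tissue"}

scores = [
    Levenshtein.normalized_similarity(src, tgt)
    for src, tgt in product(src_class_annotations, tgt_class_annotations)
]
print(max(scores))  # the mapping score is the best pairwise similarity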
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.mapping_prediction.MappingPredictor.mapping_prediction_for_src_class","title":"mapping_prediction_for_src_class(src_class_iri)
","text":"Predict \\(N\\) best scored mappings for a source ontology class, where \\(N\\) is specified in self.num_best_predictions
.
If using the BERT synonym classifier module:
batch_size_for_prediction
, i.e., stop adding annotations of a target class candidate into the current batch if doing so would cause the size of the current batch to exceed the limit.Compute the synonym scores for each batch and aggregate them into mapping scores; preserve the \(N\) best scored candidates and update them with each new batch. Through this dynamic process, we eventually obtain the \(N\) best scored mappings for a source ontology class (see the sketch after the source code below).Source code insrc/deeponto/align/bertmap/mapping_prediction.py
def mapping_prediction_for_src_class(self, src_class_iri: str) -> List[EntityMapping]:\nr\"\"\"Predict $N$ best scored mappings for a source ontology class, where\n $N$ is specified in `self.num_best_predictions`.\n\n 1. Apply the **string matching** module to compute \"easy\" mappings.\n 2. Return the mappings if found any, or if there is no BERT synonym classifier\n as in $\\textsf{BERTMapLt}$.\n 3. If using the BERT synonym classifier module:\n\n - Generate batches for class annotation pairs. Each batch contains the combinations of the\n source class annotations and $M$ target candidate classes' annotations. $M$ is determined\n by `batch_size_for_prediction`, i.e., stop adding annotations of a target class candidate into\n the current batch if this operation will cause the size of current batch to exceed the limit.\n - Compute the synonym scores for each batch and aggregate them into mapping scores; preserve\n $N$ best scored candidates and update them in the next batch. By this dynamic process, we eventually\n get $N$ best scored mappings for a source ontology class.\n \"\"\"\n\n src_class_annotations = self.src_annotation_index[src_class_iri]\n # previously wrongly put tokenizer again !!!\n tgt_class_candidates = self.tgt_inverted_annotation_index.idf_select(\n list(src_class_annotations), pool_size=len(self.tgt_annotation_index.keys())\n ) # [(tgt_class_iri, idf_score)]\n # if some classes are set to be ignored, remove them from the candidates\n if self.ignored_class_index:\n tgt_class_candidates = [(iri, idf_score) for iri, idf_score in tgt_class_candidates if not self.ignored_class_index[iri]]\n # select a truncated number of candidates\n tgt_class_candidates = tgt_class_candidates[:self.num_raw_candidates]\n best_scored_mappings = []\n\n # for string matching: save time if already found string-matched candidates\n def string_match():\n\"\"\"Compute string-matched mappings.\"\"\"\n string_matched_mappings = []\n for tgt_candidate_iri, _ in tgt_class_candidates:\n tgt_candidate_annotations = self.tgt_annotation_index[tgt_candidate_iri]\n prelim_score = self.edit_similarity_mapping_score(\n src_class_annotations,\n tgt_candidate_annotations,\n string_match_only=True,\n )\n if prelim_score > 0.0:\n # if src_class_annotations.intersection(tgt_candidate_annotations):\n string_matched_mappings.append(\n self.init_class_mapping(src_class_iri, tgt_candidate_iri, prelim_score)\n )\n\n return string_matched_mappings\n\n best_scored_mappings += string_match()\n # return string-matched mappings if found or if there is no bert module (bertmaplt)\n if best_scored_mappings or not self.bert_synonym_classifier:\n self.logger.info(f\"The best scored class mappings for {src_class_iri} are\\n{best_scored_mappings}\")\n return best_scored_mappings\n\n def generate_batched_annotations(batch_size: int):\n\"\"\"Generate batches of class annotations for the input source class and its\n target candidates.\n \"\"\"\n batches = []\n # the `nums`` parameter determines how the annotations are grouped\n current_batch = CfgNode({\"annotations\": [], \"nums\": []})\n for i, (tgt_candidate_iri, _) in enumerate(tgt_class_candidates):\n tgt_candidate_annotations = self.tgt_annotation_index[tgt_candidate_iri]\n annotation_pairs = list(itertools.product(src_class_annotations, tgt_candidate_annotations))\n current_batch.annotations += annotation_pairs\n num_annotation_pairs = len(annotation_pairs)\n current_batch.nums.append(num_annotation_pairs)\n # collect when the batch is full or for the last target class 
candidate\n if sum(current_batch.nums) > batch_size or i == len(tgt_class_candidates) - 1:\n batches.append(current_batch)\n current_batch = CfgNode({\"annotations\": [], \"nums\": []})\n return batches\n\n def bert_match():\n\"\"\"Compute mappings with fine-tuned BERT synonym classifier.\"\"\"\n bert_matched_mappings = []\n class_annotation_batches = generate_batched_annotations(self.batch_size_for_prediction)\n batch_base_candidate_idx = (\n 0 # after each batch, the base index will be increased by # of covered target candidates\n )\n device = self.bert_synonym_classifier.device\n\n # intialize N prediction scores and N corresponding indices w.r.t `tgt_class_candidates`\n final_best_scores = torch.tensor([-1] * self.num_best_predictions).to(device)\n final_best_idxs = torch.tensor([-1] * self.num_best_predictions).to(device)\n\n for annotation_batch in class_annotation_batches:\n\n synonym_scores = self.bert_synonym_classifier.predict(annotation_batch.annotations)\n # aggregating to mappings cores\n grouped_synonym_scores = torch.split(\n synonym_scores,\n split_size_or_sections=annotation_batch.nums,\n )\n mapping_scores = torch.stack([torch.mean(chunk) for chunk in grouped_synonym_scores])\n assert len(mapping_scores) == len(annotation_batch.nums)\n\n # preserve N best scored mappings\n # scale N in case there are less than N tgt candidates in this batch\n N = min(len(mapping_scores), self.num_best_predictions)\n batch_best_scores, batch_best_idxs = torch.topk(mapping_scores, k=N)\n batch_best_idxs += batch_base_candidate_idx\n\n # we do the substitution for every batch to prevent from memory overflow\n final_best_scores, _idxs = torch.topk(\n torch.cat([batch_best_scores, final_best_scores]),\n k=self.num_best_predictions,\n )\n final_best_idxs = torch.cat([batch_best_idxs, final_best_idxs])[_idxs]\n\n # update the index for target candidate classes\n batch_base_candidate_idx += len(annotation_batch.nums)\n\n for candidate_idx, mapping_score in zip(final_best_idxs, final_best_scores):\n # ignore intial values (-1.0) for dummy mappings\n # the threshold 0.9 is for mapping extension\n if mapping_score.item() >= 0.9:\n tgt_candidate_iri = tgt_class_candidates[candidate_idx.item()][0]\n bert_matched_mappings.append(\n self.init_class_mapping(\n src_class_iri,\n tgt_candidate_iri,\n mapping_score.item(),\n )\n )\n\n assert len(bert_matched_mappings) <= self.num_best_predictions\n self.logger.info(f\"The best scored class mappings for {src_class_iri} are\\n{bert_matched_mappings}\")\n return bert_matched_mappings\n\n return bert_match()\n
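The per-batch top-\(N\) bookkeeping used in `bert_match` above (dummy scores of \(-1\) merged with each batch's `torch.topk` result) can be isolated into a few lines. The sketch below reproduces just that bookkeeping on toy scores; the batch contents and candidate indices are made up for illustration.

```python
import torch

num_best = 3
# running best scores/indices, initialised with dummy -1 values (as in the source above)
best_scores = torch.full((num_best,), -1.0)
best_idxs = torch.full((num_best,), -1, dtype=torch.long)

base_idx = 0  # index of the first target candidate covered by the current batch
for batch_scores in [torch.tensor([0.2, 0.95, 0.4]), torch.tensor([0.99, 0.1])]:
    k = min(len(batch_scores), num_best)
    top_scores, top_idxs = torch.topk(batch_scores, k=k)
    top_idxs = top_idxs + base_idx  # shift to global candidate indices
    # merge with the running best and keep only the overall top-N
    best_scores, keep = torch.topk(torch.cat([top_scores, best_scores]), k=num_best)
    best_idxs = torch.cat([top_idxs, best_idxs])[keep]
    base_idx += len(batch_scores)

print(best_scores)  # tensor([0.9900, 0.9500, 0.4000])
print(best_idxs)    # tensor([3, 1, 2])
```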
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.mapping_prediction.MappingPredictor.mapping_prediction","title":"mapping_prediction()
","text":"Apply global matching for each class in the source ontology.
See mapping_prediction_for_src_class
.
If this process is accidentally stopped, it can be resumed from already saved predictions. The progress bar keeps track of the number of source ontology classes that have been matched.
Source code insrc/deeponto/align/bertmap/mapping_prediction.py
def mapping_prediction(self):\nr\"\"\"Apply global matching for each class in the source ontology.\n\n See [`mapping_prediction_for_src_class`][deeponto.align.bertmap.mapping_prediction.MappingPredictor.mapping_prediction_for_src_class].\n\n If this process is accidentally stopped, it can be resumed from already saved predictions. The progress\n bar keeps track of the number of source ontology classes that have been matched.\n \"\"\"\n self.logger.info(\"Start global matching for each class in the source ontology.\")\n\n match_dir = os.path.join(self.output_path, \"match\")\n try:\n mapping_index = load_file(os.path.join(match_dir, \"raw_mappings.json\"))\n self.logger.info(\"Load the existing mapping prediction file.\")\n except:\n mapping_index = dict()\n create_path(match_dir)\n\n progress_bar = self.enlighten_manager.counter(\n total=len(self.src_annotation_index), desc=\"Mapping Prediction\", unit=\"per src class\"\n )\n self.enlighten_status.update(demo=\"Mapping Prediction\")\n\n for i, src_class_iri in enumerate(self.src_annotation_index.keys()):\n # skip computed classes\n if src_class_iri in mapping_index.keys():\n self.logger.info(f\"[Class {i}] Skip matching {src_class_iri} as already computed.\")\n progress_bar.update()\n continue\n # for OAEI\n if self.ignored_class_index and self.ignored_class_index[src_class_iri]:\n self.logger.info(f\"[Class {i}] Skip matching {src_class_iri} as marked as not used in alignment.\")\n progress_bar.update()\n continue\n mappings = self.mapping_prediction_for_src_class(src_class_iri)\n mapping_index[src_class_iri] = [m.to_tuple(with_score=True) for m in mappings]\n\n if i % 100 == 0 or i == len(self.src_annotation_index) - 1:\n save_file(mapping_index, os.path.join(match_dir, \"raw_mappings.json\"))\n # also save a .tsv version\n mapping_in_tuples = list(itertools.chain.from_iterable(mapping_index.values()))\n mapping_df = pd.DataFrame(mapping_in_tuples, columns=[\"SrcEntity\", \"TgtEntity\", \"Score\"])\n mapping_df.to_csv(os.path.join(match_dir, \"raw_mappings.tsv\"), sep=\"\\t\", index=False)\n self.logger.info(\"Save currently computed mappings to prevent undesirable loss.\")\n\n progress_bar.update()\n\n self.logger.info(\"Finished mapping prediction for each class in the source ontology.\")\n progress_bar.close()\n
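A minimal sketch of the resume-and-checkpoint pattern described above, with a dummy predictor and a made-up file name; the real method additionally writes a `.tsv` copy of the mapping index and respects the ignored-class index.

```python
import json
import os

def resumable_matching(src_class_iris, predict_fn, out_file, save_every=100):
    """Skip classes already present in the saved index and checkpoint periodically (sketch)."""
    mapping_index = {}
    if os.path.exists(out_file):  # resume from a previous, interrupted run
        with open(out_file) as f:
            mapping_index = json.load(f)
    for i, iri in enumerate(src_class_iris):
        if iri in mapping_index:
            continue  # already computed in an earlier run
        mapping_index[iri] = predict_fn(iri)
        # save periodically (and at the end) to prevent undesirable loss
        if i % save_every == 0 or i == len(src_class_iris) - 1:
            with open(out_file, "w") as f:
                json.dump(mapping_index, f)
    return mapping_index

# toy usage with a dummy predictor
print(resumable_matching(["a", "b"], lambda iri: [[iri, "x", 0.9]], "raw_mappings_demo.json"))
```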
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.mapping_refinement.MappingRefiner","title":"MappingRefiner(output_path, src_onto, tgt_onto, mapping_predictor, mapping_extension_threshold, mapping_filtered_threshold, logger, enlighten_manager, enlighten_status)
","text":"Class for the mapping refinement module of \\(\\textsf{BERTMap}\\).
\\(\\textsf{BERTMapLt}\\) does not go through mapping refinement for its being \"light\". All the attributes of this class are supposed to be passed from BERTMapPipeline
.
Attributes:
Name Type Descriptionsrc_onto
Ontology
The source ontology to be matched.
tgt_onto
Ontology
The target ontology to be matched.
mapping_predictor
MappingPredictor
The mapping prediction module of BERTMap.
mapping_extension_threshold
float
Mappings with scores \\(\\geq\\) this value will be considered in the iterative mapping extension process.
raw_mappings
List[EntityMapping]
List of raw class mappings predicted in the global matching phase.
mapping_score_dict
dict
A dynamic dictionary that keeps track of mappings (with scores) that have already been computed.
mapping_filtered_threshold
float
Mappings with scores \\(\\geq\\) this value will be preserved for the final mapping repairing.
Source code insrc/deeponto/align/bertmap/mapping_refinement.py
def __init__(\n self,\n output_path: str,\n src_onto: Ontology,\n tgt_onto: Ontology,\n mapping_predictor: MappingPredictor,\n mapping_extension_threshold: float,\n mapping_filtered_threshold: float,\n logger: Logger,\n enlighten_manager: enlighten.Manager,\n enlighten_status: enlighten.StatusBar\n):\n self.output_path = output_path\n self.logger = logger\n self.enlighten_manager = enlighten_manager\n self.enlighten_status = enlighten_status\n\n self.src_onto = src_onto\n self.tgt_onto = tgt_onto\n\n # iterative mapping extension\n self.mapping_predictor = mapping_predictor\n self.mapping_extension_threshold = mapping_extension_threshold # \\kappa\n self.raw_mappings = EntityMapping.read_table_mappings(\n os.path.join(self.output_path, \"match\", \"raw_mappings.tsv\"),\n threshold=self.mapping_extension_threshold,\n relation=\"<EquivalentTo>\",\n )\n # keep track of already scored mappings to prevent duplicated predictions\n self.mapping_score_dict = dict()\n for m in self.raw_mappings:\n src_class_iri, tgt_class_iri, score = m.to_tuple(with_score=True)\n self.mapping_score_dict[(src_class_iri, tgt_class_iri)] = score\n\n # the threshold for final filtering the extended mappings\n self.mapping_filtered_threshold = mapping_filtered_threshold # \\lambda\n\n # logmap mapping repair folder\n self.logmap_repair_path = os.path.join(self.output_path, \"match\", \"logmap-repair\")\n\n # paths for mapping extension and repair\n self.extended_mapping_path = os.path.join(self.output_path, \"match\", \"extended_mappings.tsv\")\n self.filtered_mapping_path = os.path.join(self.output_path, \"match\", \"filtered_mappings.tsv\")\n self.repaired_mapping_path = os.path.join(self.output_path, \"match\", \"repaired_mappings.tsv\")\n
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.mapping_refinement.MappingRefiner.mapping_extension","title":"mapping_extension(max_iter=10)
","text":"Iterative mapping extension based on the locality principle.
For each class pair \((c, c')\) (scored in the global matching phase) with score \(\geq \kappa\), search for plausible mappings between the parents of \(c\) and \(c'\), and between the children of \(c\) and \(c'\). This is an iterative process, as the set of newly discovered mappings renews the frontier for searching. Terminate if no new mappings with score \(\geq \kappa\) can be found or the limit max_iter
has been reached. Note that \\(\\kappa\\) is set to \\(0.9\\) by default (can be altered in the configuration file). The mapping extension progress bar keeps track of the total number of extended mappings (including the previously predicted ones).
A further filtering step preserves only mappings with score \(\geq \lambda\). In the original BERTMap paper, \(\lambda\) is determined using the validation mappings; in practice, \(\lambda\) is not a sensitive hyperparameter and validation mappings are often not available, so we manually set \(\lambda\) to \(0.9995\) by default (this can be altered in the configuration file). The mapping filtering progress bar keeps track of the total number of filtered mappings (this bar is purely for logging purposes).
Parameters:
Name Type Description Defaultmax_iter
int
The maximum number of mapping extension iterations. Defaults to 10
.
10
Source code in src/deeponto/align/bertmap/mapping_refinement.py
def mapping_extension(self, max_iter: int = 10):\nr\"\"\"Iterative mapping extension based on the locality principle.\n\n For each class pair $(c, c')$ (scored in the global matching phase) with score \n $\\geq \\kappa$, search for plausible mappings between the parents of $c$ and $c'$,\n and between the children of $c$ and $c'$. This is an iterative process as the set \n newly discovered mappings can act renew the frontier for searching. Terminate if\n no new mappings with score $\\geq \\kappa$ can be found or the limit `max_iter` has \n been reached. Note that $\\kappa$ is set to $0.9$ by default (can be altered\n in the configuration file). The mapping extension progress bar keeps track of the \n total number of extended mappings (including the previously predicted ones).\n\n A further filtering will be performed by only preserving mappings with score $\\geq \\lambda$,\n in the original BERTMap paper, $\\lambda$ is determined by the validation mappings, but\n in practice $\\lambda$ is not a sensitive hyperparameter and validation mappings are often\n not available. Therefore, we manually set $\\lambda$ to $0.9995$ by default (can be altered\n in the configuration file). The mapping filtering progress bar keeps track of the \n total number of filtered mappings (this bar is purely for logging purpose).\n\n Args:\n max_iter (int, optional): The maximum number of mapping extension iterations. Defaults to `10`.\n \"\"\"\n\n num_iter = 0\n self.enlighten_status.update(demo=\"Mapping Extension\")\n extension_progress_bar = self.enlighten_manager.counter(\n desc=f\"Mapping Extension [Iteration #{num_iter}]\", unit=\"mapping\"\n )\n filtering_progress_bar = self.enlighten_manager.counter(\n desc=f\"Mapping Filtering\", unit=\"mapping\"\n )\n\n if os.path.exists(self.extended_mapping_path) and os.path.exists(self.filtered_mapping_path):\n self.logger.info(\n f\"Found extended and filtered mapping files at {self.extended_mapping_path}\"\n + f\" and {self.filtered_mapping_path}.\\nPlease check file integrity; if incomplete, \"\n + \"delete them and re-run the program.\"\n )\n\n # for animation purposes\n extension_progress_bar.desc = f\"Mapping Extension\"\n for _ in EntityMapping.read_table_mappings(self.extended_mapping_path):\n extension_progress_bar.update()\n\n self.enlighten_status.update(demo=\"Mapping Filtering\")\n for _ in EntityMapping.read_table_mappings(self.filtered_mapping_path):\n filtering_progress_bar.update()\n\n extension_progress_bar.close()\n filtering_progress_bar.close()\n\n return\n # intialise the frontier, explored, final expansion sets with the raw mappings\n # NOTE be careful of address pointers\n frontier = [m.to_tuple() for m in self.raw_mappings]\n expansion = [m.to_tuple(with_score=True) for m in self.raw_mappings]\n # for animation purposes\n for _ in range(len(expansion)):\n extension_progress_bar.update()\n\n self.logger.info(\n f\"Start mapping extension for each class pair with score >= {self.mapping_extension_threshold}.\"\n )\n while frontier and num_iter < max_iter:\n new_mappings = []\n for src_class_iri, tgt_class_iri in frontier:\n # one hop extension makes sure new mappings are really \"new\"\n cur_new_mappings = self.one_hop_extend(src_class_iri, tgt_class_iri)\n extension_progress_bar.update(len(cur_new_mappings))\n new_mappings += cur_new_mappings\n # add new mappings to the expansion set\n expansion += new_mappings\n # renew frontier with the newly discovered mappings\n frontier = [(x, y) for x, y, _ in new_mappings]\n\n self.logger.info(f\"Add 
{len(new_mappings)} mappings at iteration #{num_iter}.\")\n num_iter += 1\n extension_progress_bar.desc = f\"Mapping Extension [Iteration #{num_iter}]\"\n\n num_extended = len(expansion) - len(self.raw_mappings)\n self.logger.info(\n f\"Finished iterative mapping extension with {num_extended} new mappings and in total {len(expansion)} extended mappings.\"\n )\n\n extended_mapping_df = pd.DataFrame(expansion, columns=[\"SrcEntity\", \"TgtEntity\", \"Score\"])\n extended_mapping_df.to_csv(self.extended_mapping_path, sep=\"\\t\", index=False)\n\n self.enlighten_status.update(demo=\"Mapping Filtering\")\n\n filtered_expansion = [\n (src, tgt, score) for src, tgt, score in expansion if score >= self.mapping_filtered_threshold\n ]\n self.logger.info(\n f\"Filtered the extended mappings by a threshold of {self.mapping_filtered_threshold}.\"\n + f\"There are {len(filtered_expansion)} mappings left for mapping repair.\"\n )\n\n for _ in range(len(filtered_expansion)):\n filtering_progress_bar.update()\n\n filtered_mapping_df = pd.DataFrame(filtered_expansion, columns=[\"SrcEntity\", \"TgtEntity\", \"Score\"])\n filtered_mapping_df.to_csv(self.filtered_mapping_path, sep=\"\\t\", index=False)\n\n extension_progress_bar.close()\n filtering_progress_bar.close()\n return filtered_expansion\n
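Stripped of ontology access and BERT scoring, the extension loop is a frontier-based expansion: score one-hop neighbour pairs of every mapping in the frontier, keep those with score \(\geq \kappa\), and let the newly accepted mappings form the next frontier. The sketch below runs that loop on a toy `one_hop` function with made-up class pairs and scores.

```python
def iterative_extension(seed_mappings, one_hop, threshold=0.9, max_iter=10):
    """Frontier-based mapping extension (sketch of the loop structure only)."""
    scored = {(s, t): score for s, t, score in seed_mappings}  # already-computed pairs
    frontier = [(s, t) for s, t, _ in seed_mappings]
    expansion = list(seed_mappings)
    for _ in range(max_iter):
        if not frontier:
            break
        new_mappings = []
        for src, tgt in frontier:
            for cand_src, cand_tgt, score in one_hop(src, tgt):
                if (cand_src, cand_tgt) in scored:
                    continue  # only genuinely new mappings renew the frontier
                scored[(cand_src, cand_tgt)] = score
                if score >= threshold:
                    new_mappings.append((cand_src, cand_tgt, score))
        expansion += new_mappings
        frontier = [(s, t) for s, t, _ in new_mappings]
    return expansion

# toy one-hop neighbourhood: only ("a", "x") has a confident neighbour pair
toy_one_hop = lambda s, t: [("a1", "x1", 0.95)] if (s, t) == ("a", "x") else []
print(iterative_extension([("a", "x", 0.97)], toy_one_hop))
```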
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.mapping_refinement.MappingRefiner.one_hop_extend","title":"one_hop_extend(src_class_iri, tgt_class_iri, pool_size=200)
","text":"Extend mappings from a scored class pair \\((c, c')\\) by searching from one-hop neighbors.
Search for plausible mappings between the parents of \\(c\\) and \\(c'\\), and between the children of \\(c\\) and \\(c'\\). Mappings that are not already computed (recorded in self.mapping_score_dict
) and have a score \\(\\geq\\) self.mapping_extension_threshold
will be returned as new mappings.
Parameters:
Name Type Description Defaultsrc_class_iri
str
The IRI of the source ontology class \\(c\\).
requiredtgt_class_iri
str
The IRI of the target ontology class \\(c'\\).
requiredpool_size
int
The maximum number of plausible mappings to be extended. Defaults to 200.
200
Returns:
Type DescriptionList[EntityMapping]
A list of one-hop extended mappings.
Source code insrc/deeponto/align/bertmap/mapping_refinement.py
def one_hop_extend(self, src_class_iri: str, tgt_class_iri: str, pool_size: int = 200):\nr\"\"\"Extend mappings from a scored class pair $(c, c')$ by\n searching from one-hop neighbors.\n\n Search for plausible mappings between the parents of $c$ and $c'$,\n and between the children of $c$ and $c'$. Mappings that are not\n already computed (recorded in `self.mapping_score_dict`) and have\n a score $\\geq$ `self.mapping_extension_threshold` will be returned as\n **new** mappings.\n\n Args:\n src_class_iri (str): The IRI of the source ontology class $c$.\n tgt_class_iri (str): The IRI of the target ontology class $c'$.\n pool_size (int, optional): The maximum number of plausible mappings to be extended. Defaults to 200.\n\n Returns:\n (List[EntityMapping]): A list of one-hop extended mappings.\n \"\"\"\n\n def get_iris(owl_objects):\n return [str(x.getIRI()) for x in owl_objects]\n\n src_class = self.src_onto.get_owl_object(src_class_iri)\n src_class_parent_iris = get_iris(self.src_onto.get_asserted_parents(src_class, named_only=True))\n src_class_children_iris = get_iris(self.src_onto.get_asserted_children(src_class, named_only=True))\n\n tgt_class = self.tgt_onto.get_owl_object(tgt_class_iri)\n tgt_class_parent_iris = get_iris(self.tgt_onto.get_asserted_parents(tgt_class, named_only=True))\n tgt_class_children_iris = get_iris(self.tgt_onto.get_asserted_children(tgt_class, named_only=True))\n\n # pair up parents and children, respectively; NOTE set() might not be necessary\n parent_pairs = list(set(itertools.product(src_class_parent_iris, tgt_class_parent_iris)))\n children_pairs = list(set(itertools.product(src_class_children_iris, tgt_class_children_iris)))\n\n candidate_pairs = parent_pairs + children_pairs\n # downsample if the number of candidates is too large\n if len(candidate_pairs) > pool_size:\n candidate_pairs = random.sample(candidate_pairs, pool_size)\n\n extended_mappings = []\n for src_candidate_iri, tgt_candidate_iri in parent_pairs + children_pairs:\n\n # if already computed meaning that it is not a new mapping\n if (src_candidate_iri, tgt_candidate_iri) in self.mapping_score_dict:\n continue\n\n src_candidate_annotations = self.mapping_predictor.src_annotation_index[src_candidate_iri]\n tgt_candidate_annotations = self.mapping_predictor.tgt_annotation_index[tgt_candidate_iri]\n score = self.mapping_predictor.bert_mapping_score(src_candidate_annotations, tgt_candidate_annotations)\n # add to already scored collection\n self.mapping_score_dict[(src_candidate_iri, tgt_candidate_iri)] = score\n\n # skip mappings with low scores\n if score < self.mapping_extension_threshold:\n continue\n\n extended_mappings.append((src_candidate_iri, tgt_candidate_iri, score))\n\n self.logger.info(\n f\"New mappings (in tuples) extended from {(src_class_iri, tgt_class_iri)} are:\\n\" + f\"{extended_mappings}\"\n )\n\n return extended_mappings\n
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.mapping_refinement.MappingRefiner.mapping_repair","title":"mapping_repair()
","text":"Repair the filtered mappings with LogMap's debugger.
Note
A sub-folder under match
named logmap-repair
contains LogMap-related intermediate files.
src/deeponto/align/bertmap/mapping_refinement.py
def mapping_repair(self):\n\"\"\"Repair the filtered mappings with LogMap's debugger.\n\n !!! note\n\n A sub-folder under `match` named `logmap-repair` contains LogMap-related intermediate files.\n \"\"\"\n\n # progress bar for animation purposes\n self.enlighten_status.update(demo=\"Mapping Repairing\")\n repair_progress_bar = self.enlighten_manager.counter(\n desc=f\"Mapping Repairing\", unit=\"mapping\"\n )\n\n # skip repairing if already found the file\n if os.path.exists(self.repaired_mapping_path):\n self.logger.info(\n f\"Found the repaired mapping file at {self.repaired_mapping_path}.\"\n + \"\\nPlease check file integrity; if incomplete, \"\n + \"delete it and re-run the program.\"\n )\n # update progress bar for animation purposes\n for _ in EntityMapping.read_table_mappings(self.repaired_mapping_path):\n repair_progress_bar.update()\n repair_progress_bar.close()\n return \n\n # start mapping repair\n self.logger.info(\"Repair the filtered mappings with LogMap debugger.\")\n # formatting the filtered mappings\n self.logmap_repair_formatting()\n\n # run the LogMap repair module on the extended mappings\n run_logmap_repair(\n self.src_onto.owl_path,\n self.tgt_onto.owl_path,\n os.path.join(self.logmap_repair_path, f\"filtered_mappings_for_LogMap_repair.txt\"),\n self.logmap_repair_path,\n Ontology.get_max_jvm_memory()\n )\n\n # create table mappings from LogMap repair outputs\n with open(os.path.join(self.logmap_repair_path, \"mappings_repaired_with_LogMap.tsv\"), \"r\") as f:\n lines = f.readlines()\n with open(os.path.join(self.output_path, \"match\", \"repaired_mappings.tsv\"), \"w+\") as f:\n f.write(\"SrcEntity\\tTgtEntity\\tScore\\n\")\n for line in lines:\n src_ent_iri, tgt_ent_iri, score = line.split(\"\\t\")\n f.write(f\"{src_ent_iri}\\t{tgt_ent_iri}\\t{score}\")\n repair_progress_bar.update()\n\n self.logger.info(\"Mapping repair finished.\")\n repair_progress_bar.close()\n
"},{"location":"deeponto/align/bertmap/#deeponto.align.bertmap.mapping_refinement.MappingRefiner.logmap_repair_formatting","title":"logmap_repair_formatting()
","text":"Transform the filtered mapping file into the LogMap format.
An auxiliary function of the mapping repair module which requires mappings to be formatted as LogMap's input format.
Source code insrc/deeponto/align/bertmap/mapping_refinement.py
def logmap_repair_formatting(self):\n\"\"\"Transform the filtered mapping file into the LogMap format.\n\n An auxiliary function of the mapping repair module which requires mappings\n to be formatted as LogMap's input format.\n \"\"\"\n # read the filtered mapping file and convert to tuples\n filtered_mappings = EntityMapping.read_table_mappings(self.filtered_mapping_path)\n filtered_mappings_in_tuples = [m.to_tuple(with_score=True) for m in filtered_mappings]\n\n # write the mappings into logmap format\n lines = []\n for src_class_iri, tgt_class_iri, score in filtered_mappings_in_tuples:\n lines.append(f\"{src_class_iri}|{tgt_class_iri}|=|{score}|CLS\\n\")\n\n # create a path to prevent error\n create_path(self.logmap_repair_path)\n formatted_file = os.path.join(self.logmap_repair_path, f\"filtered_mappings_for_LogMap_repair.txt\")\n with open(formatted_file, \"w\") as f:\n f.writelines(lines)\n\n return lines\n
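The LogMap input format produced here is one mapping per line with pipe-separated fields. A minimal sketch with a made-up IRI pair:

```python
filtered_mappings = [
    ("http://onto1.owl#HeartDisease", "http://onto2.owl#CardiacDisorder", 0.9997),
]
# one line per mapping: source IRI | target IRI | relation | score | entity kind
lines = [f"{src}|{tgt}|=|{score}|CLS\n" for src, tgt, score in filtered_mappings]
with open("filtered_mappings_for_LogMap_repair.txt", "w") as f:
    f.writelines(lines)
print(lines[0], end="")
```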
"},{"location":"deeponto/align/bertsubs/","title":"BERTSubs (Inter)","text":""},{"location":"deeponto/align/bertsubs/#deeponto.complete.bertsubs.pipeline_inter.BERTSubsInterPipeline","title":"BERTSubsInterPipeline(src_onto, tgt_onto, config)
","text":"Class for the model training and prediction/validation pipeline of inter-ontology subsumption of BERTSubs.
Attributes:
Name Type Descriptionsrc_onto
Ontology
Source ontology (the sub-class side).
tgt_onto
Ontology
Target ontology (the super-class side).
config
CfgNode
Configuration.
src_sampler
SubsumptionSampler
Object for sampling-related functions of the source ontology.
tgt_sampler
SubsumptionSampler
Object for sampling-related functions of the target ontology.
Source code insrc/deeponto/complete/bertsubs/pipeline_inter.py
def __init__(self, src_onto: Ontology, tgt_onto: Ontology, config: CfgNode):\n self.src_onto = src_onto\n self.tgt_onto = tgt_onto\n self.config = config\n self.config.label_property = self.config.src_label_property\n self.src_sampler = SubsumptionSampler(onto=self.src_onto, config=self.config)\n self.config.label_property = self.config.tgt_label_property\n self.tgt_sampler = SubsumptionSampler(onto=self.tgt_onto, config=self.config)\n start_time = datetime.datetime.now()\n\n read_subsumptions = lambda file_name: [line.strip().split(',') for line in open(file_name).readlines()]\n test_subsumptions = None if config.test_subsumption_file is None or config.test_subsumption_file == 'None' \\\n else read_subsumptions(config.test_subsumption_file)\n valid_subsumptions = None if config.valid_subsumption_file is None or config.valid_subsumption_file == 'None' \\\n else read_subsumptions(config.valid_subsumption_file)\n\n if config.use_ontology_subsumptions_training:\n src_subsumptions = BERTSubsIntraPipeline.extract_subsumptions_from_ontology(onto=self.src_onto,\n subsumption_type=config.subsumption_type)\n tgt_subsumptions = BERTSubsIntraPipeline.extract_subsumptions_from_ontology(onto=self.tgt_onto,\n subsumption_type=config.subsumption_type)\n src_subsumptions0, tgt_subsumptions0 = [], []\n if config.subsumption_type == 'named_class':\n for subs in src_subsumptions:\n c1, c2 = subs.getSubClass(), subs.getSuperClass()\n src_subsumptions0.append([str(c1.getIRI()), str(c2.getIRI())])\n for subs in tgt_subsumptions:\n c1, c2 = subs.getSubClass(), subs.getSuperClass()\n tgt_subsumptions0.append([str(c1.getIRI()), str(c2.getIRI())])\n elif config.subsumption_type == 'restriction':\n for subs in src_subsumptions:\n c1, c2 = subs.getSubClass(), subs.getSuperClass()\n src_subsumptions0.append([str(c1.getIRI()), str(c2)])\n for subs in tgt_subsumptions:\n c1, c2 = subs.getSubClass(), subs.getSuperClass()\n tgt_subsumptions0.append([str(c1.getIRI()), str(c2)])\n restrictions = BERTSubsIntraPipeline.extract_restrictions_from_ontology(onto=self.tgt_onto)\n print('restrictions in the target ontology: %d' % len(restrictions))\n else:\n warnings.warn('Unknown subsumption type %s' % config.subsumption_type)\n sys.exit(0)\n print('Positive train subsumptions from the source/target ontology: %d/%d' % (\n len(src_subsumptions0), len(tgt_subsumptions0)))\n\n src_tr = self.src_sampler.generate_samples(subsumptions=src_subsumptions0)\n tgt_tr = self.tgt_sampler.generate_samples(subsumptions=tgt_subsumptions0)\n else:\n src_tr, tgt_tr = [], []\n\n if config.train_subsumption_file is None or config.train_subsumption_file == 'None':\n tr = src_tr + tgt_tr\n else:\n train_subsumptions = read_subsumptions(config.train_subsumption_file)\n tr = self.inter_ontology_sampling(subsumptions=train_subsumptions, pos_dup=config.fine_tune.train_pos_dup,\n neg_dup=config.fine_tune.train_neg_dup)\n tr = tr + src_tr + tgt_tr\n\n if len(tr) == 0:\n warnings.warn('No training samples extracted')\n if config.fine_tune.do_fine_tune:\n sys.exit(0)\n\n end_time = datetime.datetime.now()\n print('data pre-processing costs %.1f minutes' % ((end_time - start_time).seconds / 60))\n\n start_time = datetime.datetime.now()\n torch.cuda.empty_cache()\n bert_trainer = BERTSubsumptionClassifierTrainer(config.fine_tune.pretrained, train_data=tr,\n val_data=tr[0:int(len(tr) / 5)],\n max_length=config.prompt.max_length,\n early_stop=config.fine_tune.early_stop)\n\n epoch_steps = len(bert_trainer.tra) // config.fine_tune.batch_size # total steps of an 
epoch\n logging_steps = int(epoch_steps * 0.02) if int(epoch_steps * 0.02) > 0 else 5\n eval_steps = 5 * logging_steps\n training_args = TrainingArguments(\n output_dir=config.fine_tune.output_dir,\n num_train_epochs=config.fine_tune.num_epochs,\n per_device_train_batch_size=config.fine_tune.batch_size,\n per_device_eval_batch_size=config.fine_tune.batch_size,\n warmup_ratio=config.fine_tune.warm_up_ratio,\n weight_decay=0.01,\n logging_steps=logging_steps,\n logging_dir=f\"{config.fine_tune.output_dir}/tb\",\n eval_steps=eval_steps,\n evaluation_strategy=\"steps\",\n do_train=True,\n do_eval=True,\n save_steps=eval_steps,\n load_best_model_at_end=True,\n save_total_limit=1,\n metric_for_best_model=\"accuracy\",\n greater_is_better=True\n )\n if config.fine_tune.do_fine_tune and (config.prompt.prompt_type == 'traversal' or (\n config.prompt.prompt_type == 'path' and config.prompt.use_sub_special_token)):\n bert_trainer.add_special_tokens(['<SUB>'])\n\n bert_trainer.train(train_args=training_args, do_fine_tune=config.fine_tune.do_fine_tune)\n if config.fine_tune.do_fine_tune:\n bert_trainer.trainer.save_model(\n output_dir=os.path.join(config.fine_tune.output_dir, 'fine-tuned-checkpoint'))\n print('fine-tuning done, fine-tuned model saved')\n else:\n print('pretrained or fine-tuned model loaded.')\n end_time = datetime.datetime.now()\n print('Fine-tuning costs %.1f minutes' % ((end_time - start_time).seconds / 60))\n\n bert_trainer.model.eval()\n self.device = torch.device(f\"cuda\") if torch.cuda.is_available() else torch.device(\"cpu\")\n bert_trainer.model.to(self.device)\n self.tokenize = lambda x: bert_trainer.tokenizer(x, max_length=config.prompt.max_length, truncation=True,\n padding=True, return_tensors=\"pt\")\n softmax = torch.nn.Softmax(dim=1)\n self.classifier = lambda x: softmax(bert_trainer.model(**x).logits)[:, 1]\n\n if valid_subsumptions is not None:\n self.evaluate(target_subsumptions=valid_subsumptions, test_type='valid')\n\n if test_subsumptions is not None:\n if config.test_type == 'evaluation':\n self.evaluate(target_subsumptions=test_subsumptions, test_type='test')\n elif config.test_type == 'prediction':\n self.predict(target_subsumptions=test_subsumptions)\n else:\n warnings.warn(\"Unknown test_type: %s\" % config.test_type)\n print('\\n ------------------------- done! ---------------------------\\n\\n\\n')\n
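The train/valid/test subsumption files consumed by this pipeline are plain text with one subsumption per line and comma-separated IRIs; for evaluation files, the first IRI is the sub-class and the remaining IRIs are the candidate super-classes (the first of which is the ground truth). A minimal sketch with a made-up file:

```python
# write a tiny example file: sub-class IRI followed by super-class IRI(s)
with open("valid_subsumptions_demo.csv", "w") as f:
    f.write("http://onto1.owl#A,http://onto2.owl#B,http://onto2.owl#C\n")

# same parsing as the read_subsumptions lambda in the source above
read_subsumptions = lambda file_name: [
    line.strip().split(",") for line in open(file_name).readlines()
]
print(read_subsumptions("valid_subsumptions_demo.csv"))
# [['http://onto1.owl#A', 'http://onto2.owl#B', 'http://onto2.owl#C']]
```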
"},{"location":"deeponto/align/bertsubs/#deeponto.complete.bertsubs.pipeline_inter.BERTSubsInterPipeline.inter_ontology_sampling","title":"inter_ontology_sampling(subsumptions, pos_dup=1, neg_dup=1)
","text":"Transform inter-ontology subsumptions to two-string samples
Parameters:
Name Type Description Defaultsubsumptions
List[List]
A list of subsumptions; each subsumption is composed of two IRIs.
requiredpos_dup
int
Positive sample duplication.
1
neg_dup
int
Negative sample duplication.
1
Source code in src/deeponto/complete/bertsubs/pipeline_inter.py
def inter_ontology_sampling(self, subsumptions: List[List], pos_dup: int = 1, neg_dup: int = 1):\nr\"\"\"Transform inter-ontology subsumptions to two-string samples\n Args:\n subsumptions (List[List]): A list of subsumptions; each subsumption is composed of two IRIs.\n pos_dup (int): Positive sample duplication.\n neg_dup (int): Negative sample duplication.\n \"\"\"\n pos_samples = list()\n for subs in subsumptions:\n sub_strs = self.src_sampler.subclass_to_strings(subcls=subs[0])\n sup_strs = self.tgt_sampler.supclass_to_strings(supcls=subs[1],\n subsumption_type=self.config.subsumption_type)\n for sub_str in sub_strs:\n for sup_str in sup_strs:\n pos_samples.append([sub_str, sup_str, 1])\n pos_samples = pos_dup * pos_samples\n\n neg_subsumptions = list()\n for subs in subsumptions:\n for _ in range(neg_dup):\n neg_c = self.tgt_sampler.get_negative_sample(subclass_iri=subs[1],\n subsumption_type=self.config.subsumption_type)\n neg_subsumptions.append([subs[0], neg_c])\n\n neg_samples = list()\n for subs in neg_subsumptions:\n sub_strs = self.src_sampler.subclass_to_strings(subcls=subs[0])\n sup_strs = self.tgt_sampler.supclass_to_strings(supcls=subs[1],\n subsumption_type=self.config.subsumption_type)\n for sub_str in sub_strs:\n for sup_str in sup_strs:\n neg_samples.append([sub_str, sup_str, 0])\n\n if len(neg_samples) < len(pos_samples):\n neg_samples = neg_samples + [random.choice(neg_samples) for _ in range(len(pos_samples) - len(neg_samples))]\n if len(neg_samples) > len(pos_samples):\n pos_samples = pos_samples + [random.choice(pos_samples) for _ in range(len(neg_samples) - len(pos_samples))]\n print('training mappings, pos_samples: %d, neg_samples: %d' % (len(pos_samples), len(neg_samples)))\n all_samples = [s for s in pos_samples + neg_samples if s[0] != '' and s[1] != '']\n return all_samples\n
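One detail worth noting in the sampling above is how the positive and negative sample lists are balanced at the end: whichever list is shorter is padded by resampling its own elements. A minimal sketch of just that balancing step, with toy samples:

```python
import random

pos_samples = [["a", "b", 1]] * 5
neg_samples = [["a", "c", 0]] * 2

# pad the shorter list by resampling from itself until the sizes match
if len(neg_samples) < len(pos_samples):
    neg_samples = neg_samples + [random.choice(neg_samples)
                                 for _ in range(len(pos_samples) - len(neg_samples))]
elif len(neg_samples) > len(pos_samples):
    pos_samples = pos_samples + [random.choice(pos_samples)
                                 for _ in range(len(neg_samples) - len(pos_samples))]

print(len(pos_samples), len(neg_samples))  # 5 5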
"},{"location":"deeponto/align/bertsubs/#deeponto.complete.bertsubs.pipeline_inter.BERTSubsInterPipeline.inter_ontology_subsumption_to_sample","title":"inter_ontology_subsumption_to_sample(subsumption)
","text":"Transform an inter ontology subsumption into a sample (a two-string list).
Parameters:
Name Type Description Defaultsubsumption
List
a subsumption composed of two IRIs.
required Source code insrc/deeponto/complete/bertsubs/pipeline_inter.py
def inter_ontology_subsumption_to_sample(self, subsumption: List):\nr\"\"\"Transform an inter ontology subsumption into a sample (a two-string list).\n\n Args:\n subsumption (List): a subsumption composed of two IRIs.\n \"\"\"\n subcls, supcls = subsumption[0], subsumption[1]\n substrs = self.src_sampler.subclass_to_strings(subcls=subcls)\n supstrs = self.tgt_sampler.supclass_to_strings(supcls=supcls, subsumption_type='named_class')\n samples = list()\n for substr in substrs:\n for supstr in supstrs:\n samples.append([substr, supstr])\n return samples\n
"},{"location":"deeponto/align/bertsubs/#deeponto.complete.bertsubs.pipeline_inter.BERTSubsInterPipeline.score","title":"score(samples)
","text":"Score the samples with the classifier.
Parameters:
Name Type Description Defaultsamples
List[List]
Each item is a list with two strings (input).
required Source code insrc/deeponto/complete/bertsubs/pipeline_inter.py
def score(self, samples):\nr\"\"\"Score the samples with the classifier.\n\n Args:\n samples (List[List]): Each item is a list with two strings (input).\n \"\"\"\n sample_size = len(samples)\n scores = np.zeros(sample_size)\n batch_num = math.ceil(sample_size / self.config.evaluation.batch_size)\n for i in range(batch_num):\n j = (i + 1) * self.config.evaluation.batch_size \\\n if (i + 1) * self.config.evaluation.batch_size <= sample_size else sample_size\n inputs = self.tokenize(samples[i * self.config.evaluation.batch_size:j])\n inputs.to(self.device)\n with torch.no_grad():\n batch_scores = self.classifier(inputs)\n scores[i * self.config.evaluation.batch_size:j] = batch_scores.cpu().numpy()\n return scores\n
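`score` walks over the samples in evaluation-size batches and records one positive-class probability per pair. The sketch below reproduces the batching and aggregation with a dummy scoring function standing in for the tokeniser plus fine-tuned classifier:

```python
import math
import numpy as np
import torch

def batched_scores(samples, classify, batch_size=4):
    """Score (sub, super) string pairs batch by batch (sketch)."""
    scores = np.zeros(len(samples))
    for i in range(math.ceil(len(samples) / batch_size)):
        start, end = i * batch_size, min((i + 1) * batch_size, len(samples))
        with torch.no_grad():
            scores[start:end] = classify(samples[start:end]).cpu().numpy()
    return scores

# dummy classifier returning one probability per pair
dummy_classify = lambda batch: torch.rand(len(batch))
samples = [["disease of heart", "cardiovascular disease"]] * 10
print(batched_scores(samples, dummy_classify))
```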
"},{"location":"deeponto/align/bertsubs/#deeponto.complete.bertsubs.pipeline_inter.BERTSubsInterPipeline.evaluate","title":"evaluate(target_subsumptions, test_type='test')
","text":"Test and calculate the metrics according to a given list of subsumptions.
Parameters:
Name Type Description Defaulttarget_subsumptions
List[List]
A list of subsumptions, each of which of is a two-component list (subclass_iri, super_class_iri_or_str)
.
test_type
str
\"test\"
or \"valid\"
.
'test'
Source code in src/deeponto/complete/bertsubs/pipeline_inter.py
def evaluate(self, target_subsumptions: List[List], test_type: str = 'test'):\nr\"\"\"Test and calculate the metrics according to a given list of subsumptions.\n\n Args:\n target_subsumptions (List[List]): A list of subsumptions, each of which of is a two-component list `(subclass_iri, super_class_iri_or_str)`.\n test_type (str): `\"test\"` or `\"valid\"`.\n \"\"\"\n MRR_sum, hits1_sum, hits5_sum, hits10_sum = 0, 0, 0, 0\n MRR, Hits1, Hits5, Hits10 = 0, 0, 0, 0\n size_sum, size_n = 0, 0\n for k0, test in enumerate(target_subsumptions):\n subcls, gt = test[0], test[1]\n candidates = test[1:]\n candidate_subsumptions = [[subcls, c] for c in candidates]\n candidate_scores = np.zeros(len(candidate_subsumptions))\n for k1, candidate_subsumption in enumerate(candidate_subsumptions):\n samples = self.inter_ontology_subsumption_to_sample(subsumption=candidate_subsumption)\n size_sum += len(samples)\n size_n += 1\n scores = self.score(samples=samples)\n candidate_scores[k1] = np.average(scores)\n\n sorted_indexes = np.argsort(candidate_scores)[::-1]\n sorted_classes = [candidates[i] for i in sorted_indexes]\n rank = sorted_classes.index(gt) + 1\n MRR_sum += 1.0 / rank\n hits1_sum += 1 if gt in sorted_classes[:1] else 0\n hits5_sum += 1 if gt in sorted_classes[:5] else 0\n hits10_sum += 1 if gt in sorted_classes[:10] else 0\n num = k0 + 1\n MRR, Hits1, Hits5, Hits10 = MRR_sum / num, hits1_sum / num, hits5_sum / num, hits10_sum / num\n if num % 500 == 0:\n print('\\n%d tested, MRR: %.3f, Hits@1: %.3f, Hits@5: %.3f, Hits@10: %.3f\\n' % (\n num, MRR, Hits1, Hits5, Hits10))\n print('\\n[%s], MRR: %.3f, Hits@1: %.3f, Hits@5: %.3f, Hits@10: %.3f\\n' % (test_type, MRR, Hits1, Hits5, Hits10))\n print('%.2f samples per testing subsumption' % (size_sum / size_n))\n
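The reported metrics are standard ranking metrics: each test case ranks the ground-truth super-class among its candidate super-classes by predicted score. The sketch below computes the rank, MRR contribution, and Hits@k for a single made-up test case in the same way as `evaluate` (descending argsort of the candidate scores, then locate the ground truth):

```python
import numpy as np

candidates = ["http://t.owl#B", "http://t.owl#C", "http://t.owl#D"]
gt = "http://t.owl#C"                      # ground-truth super-class
candidate_scores = np.array([0.30, 0.85, 0.10])

# rank candidates by descending score and find where the ground truth lands
sorted_classes = [candidates[i] for i in np.argsort(candidate_scores)[::-1]]
rank = sorted_classes.index(gt) + 1

mrr = 1.0 / rank
hits_at_1 = int(gt in sorted_classes[:1])
hits_at_5 = int(gt in sorted_classes[:5])
print(rank, mrr, hits_at_1, hits_at_5)  # 1 1.0 1 1
```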
"},{"location":"deeponto/align/bertsubs/#deeponto.complete.bertsubs.pipeline_inter.BERTSubsInterPipeline.predict","title":"predict(target_subsumptions)
","text":"Predict a score for each given subsumption.
The scores will be saved in test_subsumption_scores.csv
.
Parameters:
Name Type Description Defaulttarget_subsumptions
List[List]
Each item is a list with the first element as the sub-class, and the remaining elements as n candidate super-classes.
required Source code insrc/deeponto/complete/bertsubs/pipeline_inter.py
def predict(self, target_subsumptions: List[List]):\nr\"\"\"Predict a score for each given subsumption. \n\n The scores will be saved in `test_subsumption_scores.csv`.\n\n Args:\n target_subsumptions (List[List]): Each item is a list with the first element as the sub-class,\n and the remaining elements as n candidate super-classes.\n \"\"\"\n out_lines = []\n for test in target_subsumptions:\n subcls, candidates = test[0], test[1:]\n candidate_subsumptions = [[subcls, c] for c in candidates]\n candidate_scores = []\n\n for candidate_subsumption in candidate_subsumptions:\n samples = self.inter_ontology_subsumption_to_sample(subsumption=candidate_subsumption)\n scores = self.score(samples=samples)\n candidate_scores.append(np.average(scores))\n out_lines.append(','.join([str(i) for i in candidate_scores]))\n\n out_file = 'test_subsumption_scores.csv'\n with open(out_file, 'w') as f:\n for line in out_lines:\n f.write('%s\\n' % line)\n print('Predicted subsumption scores are saved to %s' % out_file)\n
"},{"location":"deeponto/align/logmap/","title":"LogMap","text":"Run LogMap matcher 4.0 in a jar
command.
Credit
See LogMap repository at: https://github.com/ernestojimenezruiz/logmap-matcher.
"},{"location":"deeponto/align/logmap/#deeponto.align.logmap.run_logmap_repair","title":"run_logmap_repair(src_onto_path, tgt_onto_path, mapping_file_path, output_path, max_jvm_memory='10g')
","text":"Run the repair module of LogMap with java -jar
.
src/deeponto/align/logmap/__init__.py
def run_logmap_repair(\n src_onto_path: str, tgt_onto_path: str, mapping_file_path: str, output_path: str, max_jvm_memory: str = \"10g\"\n):\n\"\"\"Run the repair module of LogMap with `java -jar`.\"\"\"\n\n # find logmap directory\n logmap_path = os.path.dirname(__file__)\n\n # obtain absolute paths\n src_onto_path = os.path.abspath(src_onto_path)\n tgt_onto_path = os.path.abspath(tgt_onto_path)\n mapping_file_path = os.path.abspath(mapping_file_path)\n output_path = os.path.abspath(output_path)\n\n # run jar command\n print(f\"Run the repair module of LogMap from {logmap_path}.\")\n repair_command = (\n f\"java -Xms500m -Xmx{max_jvm_memory} -DentityExpansionLimit=100000000 -jar {logmap_path}/logmap-matcher-4.0.jar DEBUGGER \"\n + f\"file:{src_onto_path} file:{tgt_onto_path} TXT {mapping_file_path}\"\n + f\" {output_path} false false\"\n )\n print(f\"The jar command is:\\n{repair_command}.\")\n run_jar(repair_command)\n
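A hedged usage sketch of this function; the ontology and mapping paths below are placeholders, and a working Java installation plus the bundled `logmap-matcher-4.0.jar` are required:

```python
from deeponto.align.logmap import run_logmap_repair

run_logmap_repair(
    src_onto_path="data/src.owl",        # placeholder paths
    tgt_onto_path="data/tgt.owl",
    mapping_file_path="match/logmap-repair/filtered_mappings_for_LogMap_repair.txt",
    output_path="match/logmap-repair",
    max_jvm_memory="10g",
)
```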
"},{"location":"deeponto/complete/ontolama/","title":"OntoLAMA","text":""},{"location":"deeponto/complete/ontolama/#deeponto.complete.ontolama.inference.run_inference","title":"run_inference(config, args)
","text":"Main entry for running the OpenPrompt script.
Source code insrc/deeponto/complete/ontolama/inference.py
def run_inference(config, args):\n\"\"\"Main entry for running the OpenPrompt script.\n \"\"\"\n global CUR_TEMPLATE, CUR_VERBALIZER\n # exit()\n # init logger, create log dir and set log level, etc.\n if args.resume and args.test:\n raise Exception(\"cannot use flag --resume and --test together\")\n if args.resume or args.test:\n config.logging.path = EXP_PATH = args.resume or args.test\n else:\n EXP_PATH = config_experiment_dir(config)\n init_logger(\n os.path.join(EXP_PATH, \"log.txt\"),\n config.logging.file_level,\n config.logging.console_level,\n )\n # save config to the logger directory\n save_config_to_yaml(config)\n\n # load dataset. The valid_dataset can be None\n train_dataset, valid_dataset, test_dataset, Processor = OntoLAMADataProcessor.load_inference_dataset(\n config, test=args.test is not None or config.learning_setting == \"zero_shot\"\n )\n\n # main\n if config.learning_setting == \"full\":\n res = trainer(\n EXP_PATH,\n config,\n Processor,\n resume=args.resume,\n test=args.test,\n train_dataset=train_dataset,\n valid_dataset=valid_dataset,\n test_dataset=test_dataset,\n )\n elif config.learning_setting == \"few_shot\":\n if config.few_shot.few_shot_sampling is None:\n raise ValueError(\"use few_shot setting but config.few_shot.few_shot_sampling is not specified\")\n seeds = config.sampling_from_train.seed\n res = 0\n for seed in seeds:\n if not args.test:\n sampler = FewShotSampler(\n num_examples_per_label=config.sampling_from_train.num_examples_per_label,\n also_sample_dev=config.sampling_from_train.also_sample_dev,\n num_examples_per_label_dev=config.sampling_from_train.num_examples_per_label_dev,\n )\n train_sampled_dataset, valid_sampled_dataset = sampler(\n train_dataset=train_dataset, valid_dataset=valid_dataset, seed=seed\n )\n result = trainer(\n os.path.join(EXP_PATH, f\"seed-{seed}\"),\n config,\n Processor,\n resume=args.resume,\n test=args.test,\n train_dataset=train_sampled_dataset,\n valid_dataset=valid_sampled_dataset,\n test_dataset=test_dataset,\n )\n else:\n result = trainer(\n os.path.join(EXP_PATH, f\"seed-{seed}\"),\n config,\n Processor,\n test=args.test,\n test_dataset=test_dataset,\n )\n res += result\n res /= len(seeds)\n elif config.learning_setting == \"zero_shot\":\n res = trainer(\n EXP_PATH,\n config,\n Processor,\n zero=True,\n train_dataset=train_dataset,\n valid_dataset=valid_dataset,\n test_dataset=test_dataset,\n )\n\n return config, CUR_TEMPLATE, CUR_VERBALIZER\n
"},{"location":"deeponto/complete/ontolama/#deeponto.complete.ontolama.subsumption_sampler.SubsumptionSamplerBase","title":"SubsumptionSamplerBase(onto)
","text":"Base Class for Sampling Subsumption Pairs.
Source code insrc/deeponto/complete/ontolama/subsumption_sampler.py
def __init__(self, onto: Ontology):\n self.onto = onto\n self.progress_manager = enlighten.get_manager()\n\n # for faster sampling\n self.concept_iris = list(self.onto.owl_classes.keys())\n self.object_property_iris = list(self.onto.owl_object_properties.keys())\n self.sibling_concept_groups = self.onto.sibling_class_groups\n self.sibling_auxiliary_dict = defaultdict(list)\n for i, sib_group in enumerate(self.sibling_concept_groups):\n for sib in sib_group:\n self.sibling_auxiliary_dict[sib].append(i)\n
"},{"location":"deeponto/complete/ontolama/#deeponto.complete.ontolama.subsumption_sampler.SubsumptionSamplerBase.random_named_concept","title":"random_named_concept()
","text":"Randomly draw a named concept's IRI.
Source code insrc/deeponto/complete/ontolama/subsumption_sampler.py
def random_named_concept(self) -> str:\n\"\"\"Randomly draw a named concept's IRI.\"\"\"\n return random.choice(self.concept_iris)\n
"},{"location":"deeponto/complete/ontolama/#deeponto.complete.ontolama.subsumption_sampler.SubsumptionSamplerBase.random_object_property","title":"random_object_property()
","text":"Randomly draw a object property's IRI.
Source code insrc/deeponto/complete/ontolama/subsumption_sampler.py
def random_object_property(self) -> str:\n\"\"\"Randomly draw a object property's IRI.\"\"\"\n return random.choice(self.object_property_iris)\n
"},{"location":"deeponto/complete/ontolama/#deeponto.complete.ontolama.subsumption_sampler.SubsumptionSamplerBase.get_siblings","title":"get_siblings(concept_iri)
","text":"Get the sibling concepts of the given concept.
Source code insrc/deeponto/complete/ontolama/subsumption_sampler.py
def get_siblings(self, concept_iri: str):\n\"\"\"Get the sibling concepts of the given concept.\"\"\"\n sibling_group = self.sibling_auxiliary_dict[concept_iri]\n sibling_group = [self.sibling_concept_groups[i] for i in sibling_group]\n sibling_group = list(itertools.chain.from_iterable(sibling_group))\n return sibling_group\n
"},{"location":"deeponto/complete/ontolama/#deeponto.complete.ontolama.subsumption_sampler.SubsumptionSamplerBase.random_sibling","title":"random_sibling(concept_iri)
","text":"Randomly draw a sibling concept for a given concept.
Source code insrc/deeponto/complete/ontolama/subsumption_sampler.py
def random_sibling(self, concept_iri: str) -> str:\n\"\"\"Randomly draw a sibling concept for a given concept.\"\"\"\n sibling_group = self.get_siblings(concept_iri)\n if sibling_group:\n return random.choice(sibling_group)\n else:\n # not every concept has a sibling concept\n return None\n
"},{"location":"deeponto/complete/ontolama/#deeponto.complete.ontolama.subsumption_sampler.AtomicSubsumptionSampler","title":"AtomicSubsumptionSampler(onto)
","text":" Bases: SubsumptionSamplerBase
Sampler for constructing the Atomic Subsumption Inference (SI) dataset.
Positive samples come from the entailed subsumptions.
Soft negative samples come from the pairs of randomly selected concepts, subject to passing the assumed disjointness check.
Hard negative samples come from the pairs of randomly selected sibling concepts, subject to passing the assumed disjointness check.
Source code insrc/deeponto/complete/ontolama/subsumption_sampler.py
def __init__(self, onto: Ontology):\n super().__init__(onto)\n\n # compute the sibling concept pairs for faster hard negative sampling\n self.sibling_pairs = []\n for sib_group in self.sibling_concept_groups:\n self.sibling_pairs += [(x, y) for x, y in itertools.product(sib_group, sib_group) if x != y]\n self.maximum_num_hard_negatives = len(self.sibling_pairs)\n
"},{"location":"deeponto/complete/ontolama/#deeponto.complete.ontolama.subsumption_sampler.AtomicSubsumptionSampler.positive_sampling","title":"positive_sampling(num_samples=None)
","text":"Sample named concept pairs that are involved in a subsumption axiom.
An extracted pair \\((C, D)\\) indicates \\(\\mathcal{O} \\models C \\sqsubseteq D\\) where \\(\\mathcal{O}\\) is the input ontology.
Source code insrc/deeponto/complete/ontolama/subsumption_sampler.py
def positive_sampling(self, num_samples: Optional[int] = None):\nr\"\"\"Sample named concept pairs that are involved in a subsumption axiom.\n\n An extracted pair $(C, D)$ indicates $\\mathcal{O} \\models C \\sqsubseteq D$ where\n $\\mathcal{O}$ is the input ontology.\n \"\"\"\n pbar = self.progress_manager.counter(desc=\"Sample Positive Subsumptions\", unit=\"pair\")\n positives = []\n for concept_iri in self.concept_iris:\n owl_concept = self.onto.owl_classes[concept_iri]\n for subsumer_iri in self.onto.reasoner.get_inferred_super_entities(owl_concept, direct=False):\n positives.append((concept_iri, subsumer_iri))\n pbar.update()\n positives = list(set(sorted(positives)))\n if num_samples:\n positives = random.sample(positives, num_samples)\n print(f\"Sample {len(positives)} unique positive subsumption pairs.\")\n return positives\n
"},{"location":"deeponto/complete/ontolama/#deeponto.complete.ontolama.subsumption_sampler.AtomicSubsumptionSampler.negative_sampling","title":"negative_sampling(negative_sample_type, num_samples, apply_assumed_disjointness_alternative=True)
","text":"Sample named concept pairs that are involved in a disjoiness (assumed) axiom, which then implies non-subsumption.
Source code insrc/deeponto/complete/ontolama/subsumption_sampler.py
def negative_sampling(\n self,\n negative_sample_type: str,\n num_samples: int,\n apply_assumed_disjointness_alternative: bool = True,\n):\nr\"\"\"Sample named concept pairs that are involved in a disjoiness (assumed) axiom, which then\n implies non-subsumption.\n \"\"\"\n if negative_sample_type == \"soft\":\n draw_one = lambda: tuple(random.sample(self.concept_iris, k=2))\n elif negative_sample_type == \"hard\":\n draw_one = lambda: random.choice(self.sibling_pairs)\n else:\n raise RuntimeError(f\"{negative_sample_type} not supported.\")\n\n negatives = []\n max_iter = 2 * num_samples\n\n # which method to validate the negative sample\n valid_negative = self.onto.reasoner.check_assumed_disjoint\n if apply_assumed_disjointness_alternative:\n valid_negative = self.onto.reasoner.check_assumed_disjoint_alternative\n\n print(f\"Sample {negative_sample_type} negative subsumption pairs.\")\n # create two bars for process tracking\n added_bar = self.progress_manager.counter(total=num_samples, desc=\"Sample Negative Subsumptions\", unit=\"pair\")\n iter_bar = self.progress_manager.counter(total=max_iter, desc=\"#Iteration\", unit=\"it\")\n i = 0\n added = 0\n while added < num_samples and i < max_iter:\n sub_concept_iri, super_concept_iri = draw_one()\n sub_concept = self.onto.get_owl_object(sub_concept_iri)\n super_concept = self.onto.get_owl_object(super_concept_iri)\n # collect class iri if accepted\n if valid_negative(sub_concept, super_concept):\n neg = (sub_concept_iri, super_concept_iri)\n negatives.append(neg)\n added += 1\n added_bar.update(1)\n if added == num_samples:\n negatives = list(set(sorted(negatives)))\n added = len(negatives)\n added_bar.count = added\n i += 1\n iter_bar.update(1)\n negatives = list(set(sorted(negatives)))\n print(f\"Sample {len(negatives)} unique positive subsumption pairs.\")\n return negatives\n
"},{"location":"deeponto/complete/ontolama/#deeponto.complete.ontolama.subsumption_sampler.ComplexSubsumptionSampler","title":"ComplexSubsumptionSampler(onto)
","text":" Bases: SubsumptionSamplerBase
Sampler for constructing the Complex Subsumption Inference (SI) dataset.
To obtain complex concept expressions on either side of the subsumption relationship (as the sub-concept or the super-concept), this sampler utilises the equivalence axioms in the form of \(C \equiv C_{comp}\) where \(C\) is atomic and \(C_{comp}\) is complex.
An equivalence axiom like \(C \equiv C_{comp}\) is deemed an anchor axiom.
Positive samples are in the form of \(C_{sub} \sqsubseteq C_{comp}\) or \(C_{comp} \sqsubseteq C_{super}\), where \(C_{sub}\) is an entailed sub-concept of \(C\) and \(C_{comp}\), and \(C_{super}\) is an entailed super-concept of \(C\) and \(C_{comp}\).
Negative samples are formed by replacing one of the named entities in the anchor axiom; the modified sub-concept and super-concept need to pass the assumed disjointness check to be accepted as a valid negative sample. Without loss of generality, suppose we choose \(C \sqsubseteq C_{comp}\) and replace a named entity in \(C_{comp}\) to form \(C \sqsubseteq C_{comp}'\); then \(C\) and \(C_{comp}'\) form a valid negative only if they satisfy the assumed disjointness check.
Source code insrc/deeponto/complete/ontolama/subsumption_sampler.py
def __init__(self, onto: Ontology):\n super().__init__(onto)\n self.anchor_axioms = self.onto.get_equivalence_axioms(\"Classes\")\n
"},{"location":"deeponto/complete/ontolama/#deeponto.complete.ontolama.subsumption_sampler.ComplexSubsumptionSampler.positive_sampling_from_anchor","title":"positive_sampling_from_anchor(anchor_axiom)
","text":"Returns all positive subsumption pairs extracted from an anchor equivalence axiom.
Source code insrc/deeponto/complete/ontolama/subsumption_sampler.py
def positive_sampling_from_anchor(self, anchor_axiom: OWLAxiom):\n\"\"\"Returns all positive subsumption pairs extracted from an anchor equivalence axiom.\"\"\"\n sub_axiom = list(anchor_axiom.asOWLSubClassOfAxioms())[0]\n atomic_concept, complex_concept = sub_axiom.getSubClass(), sub_axiom.getSuperClass()\n # determine which is the atomic concept\n if complex_concept.isClassExpressionLiteral():\n atomic_concept, complex_concept = complex_concept, atomic_concept\n\n # intialise the positive samples from the anchor equivalence axiom\n positives = list(anchor_axiom.asOWLSubClassOfAxioms())\n for super_concept_iri in self.onto.reasoner.get_inferred_super_entities(atomic_concept, direct=False):\n positives.append(\n self.onto.owl_data_factory.getOWLSubClassOfAxiom(\n complex_concept, self.onto.get_owl_object(super_concept_iri)\n )\n )\n for sub_concept_iri in self.onto.reasoner.get_inferred_sub_entities(atomic_concept, direct=False):\n positives.append(\n self.onto.owl_data_factory.getOWLSubClassOfAxiom(\n self.onto.get_owl_object(sub_concept_iri), complex_concept\n )\n )\n\n # TESTING\n # for p in positives:\n # assert self.onto.reasoner.owl_reasoner.isEntailed(p) \n\n return list(set(sorted(positives)))\n
"},{"location":"deeponto/complete/ontolama/#deeponto.complete.ontolama.subsumption_sampler.ComplexSubsumptionSampler.positive_sampling","title":"positive_sampling(num_samples_per_anchor=10)
","text":"Sample positive subsumption axioms that involve one atomic and one complex concepts.
An extracted pair \\((C, D)\\) indicates \\(\\mathcal{O} \\models C \\sqsubseteq D\\) where \\(\\mathcal{O}\\) is the input ontology.
Source code insrc/deeponto/complete/ontolama/subsumption_sampler.py
def positive_sampling(self, num_samples_per_anchor: Optional[int] = 10):\nr\"\"\"Sample positive subsumption axioms that involve one atomic and one complex concepts.\n\n An extracted pair $(C, D)$ indicates $\\mathcal{O} \\models C \\sqsubseteq D$ where\n $\\mathcal{O}$ is the input ontology.\n \"\"\"\n print(f\"Maximum number of positive samples for each anchor is set to {num_samples_per_anchor}.\")\n pbar = self.progress_manager.counter(desc=\"Sample Positive Subsumptions from\", unit=\"anchor axiom\")\n positives = dict()\n for anchor in self.anchor_axioms:\n positives_from_anchor = self.positive_sampling_from_anchor(anchor)\n if num_samples_per_anchor and num_samples_per_anchor < len(positives_from_anchor):\n positives_from_anchor = random.sample(positives_from_anchor, k = num_samples_per_anchor)\n positives[str(anchor)] = positives_from_anchor\n pbar.update()\n # positives = list(set(sorted(positives)))\n print(f\"Sample {sum([len(v) for v in positives.values()])} unique positive subsumption pairs.\")\n return positives\n
"},{"location":"deeponto/complete/ontolama/#deeponto.complete.ontolama.subsumption_sampler.ComplexSubsumptionSampler.negative_sampling","title":"negative_sampling(num_samples_per_anchor=10)
","text":"Sample negative subsumption axioms that involve one atomic and one complex concepts.
An extracted pair \\((C, D)\\) indicates \\(C\\) and \\(D\\) pass the assumed disjointness check.
Source code insrc/deeponto/complete/ontolama/subsumption_sampler.py
def negative_sampling(self, num_samples_per_anchor: Optional[int] = 10):\nr\"\"\"Sample negative subsumption axioms that involve one atomic and one complex concepts.\n\n An extracted pair $(C, D)$ indicates $C$ and $D$ pass the [assumed disjointness check][deeponto.onto.OntologyReasoner.check_assumed_disjoint].\n \"\"\"\n print(f\"Maximum number of negative samples for each anchor is set to {num_samples_per_anchor}.\")\n pbar = self.progress_manager.counter(desc=\"Sample Negative Subsumptions from\", unit=\"anchor axiom\")\n negatives = dict()\n for anchor in self.anchor_axioms:\n negatives_from_anchor = []\n i, max_iter = 0, num_samples_per_anchor + 2\n while i < max_iter and len(negatives_from_anchor) < num_samples_per_anchor:\n corrupted_anchor = self.random_corrupt(anchor)\n corrupted_sub_axiom = random.choice(list(corrupted_anchor.asOWLSubClassOfAxioms()))\n sub_concept, super_concept = corrupted_sub_axiom.getSubClass(), corrupted_sub_axiom.getSuperClass()\n if self.onto.reasoner.check_assumed_disjoint_alternative(sub_concept, super_concept):\n negatives_from_anchor.append(corrupted_sub_axiom)\n i += 1\n negatives[str(anchor)] = list(set(sorted(negatives_from_anchor)))\n pbar.update()\n # negatives = list(set(sorted(negatives)))\n print(f\"Sample {sum([len(v) for v in negatives.values()])} unique positive subsumption pairs.\")\n return negatives\n
"},{"location":"deeponto/complete/ontolama/#deeponto.complete.ontolama.subsumption_sampler.ComplexSubsumptionSampler.random_corrupt","title":"random_corrupt(axiom)
","text":"Randomly change an IRI in the input axiom and return a new one.
Source code insrc/deeponto/complete/ontolama/subsumption_sampler.py
def random_corrupt(self, axiom: OWLAxiom):\n\"\"\"Randomly change an IRI in the input axiom and return a new one.\n \"\"\"\n replaced_iri = random.choice(re.findall(IRI, str(axiom)))[1:-1]\n replaced_entity = self.onto.get_owl_object(replaced_iri)\n replacement_iri = None\n if self.onto.get_entity_type(replaced_entity) == \"Classes\":\n replacement_iri = self.random_named_concept()\n elif self.onto.get_entity_type(replaced_entity) == \"ObjectProperties\":\n replacement_iri = self.random_object_property()\n else:\n # NOTE: to extend to other types of entities in future\n raise RuntimeError(\"Unknown type of axiom.\")\n return self.onto.replace_entity(axiom, replaced_iri, replacement_iri)\n
"},{"location":"deeponto/complete/ontolama/#deeponto.complete.ontolama.data_processor.OntoLAMADataProcessor","title":"OntoLAMADataProcessor()
","text":" Bases: DataProcessor
Class for processing the OntoLAMA data points.
Source code insrc/deeponto/complete/ontolama/data_processor.py
def __init__(self):\n super().__init__()\n self.labels = [\"negative\", \"positive\"]\n
"},{"location":"deeponto/complete/ontolama/#deeponto.complete.ontolama.data_processor.OntoLAMADataProcessor.load_dataset","title":"load_dataset(task_name, split)
staticmethod
","text":"Load a specific OntoLAMA dataset from huggingface dataset hub.
Source code insrc/deeponto/complete/ontolama/data_processor.py
@staticmethod\ndef load_dataset(task_name: str, split: str):\n\"\"\"Load a specific OntoLAMA dataset from huggingface dataset hub.\"\"\"\n # TODO: remove use_auth_token after going to public\n return load_dataset(\"krr-oxford/OntoLAMA\", task_name, split=split, use_auth_token=True)\n
"},{"location":"deeponto/complete/ontolama/#deeponto.complete.ontolama.data_processor.OntoLAMADataProcessor.get_examples","title":"get_examples(task_name, split)
","text":"Load a specific OntoLAMA dataset and transform the data points into input examples for prompt-based inference.
Source code insrc/deeponto/complete/ontolama/data_processor.py
def get_examples(self, task_name, split):\n\"\"\"Load a specific OntoLAMA dataset and transform the data points into\n input examples for prompt-based inference.\n \"\"\"\n\n dataset = self.load_dataset(task_name, split)\n\n premise_name = \"v_sub_concept\"\n hypothesis_name = \"v_super_concept\"\n # different data fields for the bimnli dataset\n if \"bimnli\" in task_name:\n premise_name = \"premise\"\n hypothesis_name = \"hypothesis\"\n\n prompt_samples = []\n for samp in dataset:\n inp = InputExample(text_a=samp[premise_name], text_b=samp[hypothesis_name], label=samp[\"label\"])\n prompt_samples.append(inp)\n\n return prompt_samples\n
"},{"location":"deeponto/complete/ontolama/#deeponto.complete.ontolama.data_processor.OntoLAMADataProcessor.load_inference_dataset","title":"load_inference_dataset(config, return_class=True, test=False)
classmethod
","text":"A plm loader using a global config. It will load the train, valid, and test set (if exists) simulatenously.
Parameters:
Name Type Description Defaultconfig
CfgNode
The global config from the CfgNode.
requiredreturn_class
bool
Whether to return the data processor object for future use.
True
Returns:
Type DescriptionOptional[List[InputExample]]
The train dataset.
Optional[List[InputExample]]
The valid dataset.
Optional[List[InputExample]]
The test dataset.
Optional[OntoLAMADataProcessor]
The data processor object.
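A minimal sketch, assuming a yacs-style CfgNode and a task name taken from the code below; a real config typically contains more fields:
from yacs.config import CfgNode\n\n# only the dataset.task_name field is shown here\nconfig = CfgNode({\"dataset\": {\"task_name\": \"bimnli\"}})\ntrain, valid, test, processor = OntoLAMADataProcessor.load_inference_dataset(config, return_class=True)\n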
Source code insrc/deeponto/complete/ontolama/data_processor.py
@classmethod\ndef load_inference_dataset(cls, config: CfgNode, return_class=True, test=False):\nr\"\"\"A plm loader using a global config.\n It will load the train, valid, and test set (if exists) simulatenously.\n\n Args:\n config (CfgNode): The global config from the CfgNode.\n return_class (bool): Whether return the data processor class for future usage.\n\n Returns:\n (Optional[List[InputExample]]): The train dataset.\n (Optional[List[InputExample]]): The valid dataset.\n (Optional[List[InputExample]]): The test dataset.\n (Optional[OntoLAMADataProcessor]): The data processor object.\n \"\"\"\n dataset_config = config.dataset\n\n processor = cls()\n\n train_dataset = None\n valid_dataset = None\n if not test:\n try:\n train_dataset = processor.get_examples(dataset_config.task_name, \"train\")\n except FileNotFoundError:\n logger.warning(f\"Has no training dataset in krr-oxford/OntoLAMA/{dataset_config.task_name}.\")\n try:\n valid_dataset = processor.get_examples(dataset_config.task_name, \"validation\")\n except FileNotFoundError:\n logger.warning(f\"Has no validation dataset in krr-oxford/OntoLAMA/{dataset_config.task_name}.\")\n\n test_dataset = None\n try:\n test_dataset = processor.get_examples(dataset_config.task_name, \"test\")\n except FileNotFoundError:\n logger.warning(f\"Has no test dataset in krr-oxford/OntoLAMA/{dataset_config.task_name}.\")\n # checking whether donwloaded.\n if (train_dataset is None) and (valid_dataset is None) and (test_dataset is None):\n logger.error(\n \"Dataset is empty. Either there is no download or the path is wrong. \"\n + \"If not downloaded, please `cd datasets/` and `bash download_xxx.sh`\"\n )\n exit()\n if return_class:\n return train_dataset, valid_dataset, test_dataset, processor\n else:\n return train_dataset, valid_dataset, test_dataset\n
"},{"location":"deeponto/complete/bertsubs/","title":"BERTSubs (Intra)","text":""},{"location":"deeponto/complete/bertsubs/#deeponto.complete.bertsubs.pipeline_intra.BERTSubsIntraPipeline","title":"BERTSubsIntraPipeline(onto, config)
","text":"Class for the intra-ontology subsumption prediction setting of BERTSubs.
Attributes:
Name Type Descriptiononto
Ontology
The target ontology.
config
CfgNode
The configuration for BERTSubs.
sampler
SubsumptionSampler
The subsumption sampler for BERTSubs.
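A minimal usage sketch, assuming the class is exported from deeponto.complete.bertsubs and that a YAML file provides the expected configuration fields (the file paths and the yacs-based CfgNode are assumptions):
from yacs.config import CfgNode\nfrom deeponto.onto import Ontology\nfrom deeponto.complete.bertsubs import BERTSubsIntraPipeline\n\nonto = Ontology(\"./ontology.owl\")  # placeholder path\nconfig = CfgNode(new_allowed=True)\nconfig.merge_from_file(\"./bertsubs_intra_config.yaml\")  # placeholder config file\n\n# constructing the pipeline samples the data, fine-tunes the BERT classifier, and runs validation/testing\npipeline = BERTSubsIntraPipeline(onto=onto, config=config)\n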
Source code insrc/deeponto/complete/bertsubs/pipeline_intra.py
def __init__(self, onto: Ontology, config: CfgNode):\n self.onto = onto\n self.config = config\n self.sampler = SubsumptionSampler(onto=onto, config=config)\n start_time = datetime.datetime.now()\n\n n = 0\n for k in self.sampler.named_classes:\n n += len(self.sampler.iri_label[k])\n print(\n \"%d named classes, %.1f labels per class\"\n % (len(self.sampler.named_classes), n / len(self.sampler.named_classes))\n )\n\n read_subsumptions = lambda file_name: [line.strip().split(\",\") for line in open(file_name).readlines()]\n test_subsumptions = (\n None\n if config.test_subsumption_file is None or config.test_subsumption_file == \"None\"\n else read_subsumptions(config.test_subsumption_file)\n )\n\n # The train/valid subsumptions are not given. They will be extracted from the given ontology:\n if config.train_subsumption_file is None or config.train_subsumption_file == \"None\":\n subsumptions0 = self.extract_subsumptions_from_ontology(\n onto=onto, subsumption_type=config.subsumption_type\n )\n random.shuffle(subsumptions0)\n valid_size = int(len(subsumptions0) * config.valid.valid_ratio)\n train_subsumptions0, valid_subsumptions0 = subsumptions0[valid_size:], subsumptions0[0:valid_size]\n train_subsumptions, valid_subsumptions = [], []\n if config.subsumption_type == \"named_class\":\n for subs in train_subsumptions0:\n c1, c2 = subs.getSubClass(), subs.getSuperClass()\n train_subsumptions.append([str(c1.getIRI()), str(c2.getIRI())])\n\n size_sum = 0\n for subs in valid_subsumptions0:\n c1, c2 = subs.getSubClass(), subs.getSuperClass()\n neg_candidates = BERTSubsIntraPipeline.get_test_neg_candidates_named_class(\n subclass=c1, gt=c2, max_neg_size=config.valid.max_neg_size, onto=onto\n )\n size = len(neg_candidates)\n size_sum += size\n if size > 0:\n item = [str(c1.getIRI()), str(c2.getIRI())] + [str(c.getIRI()) for c in neg_candidates]\n valid_subsumptions.append(item)\n print(\"\\t average neg candidate size in validation: %.2f\" % (size_sum / len(valid_subsumptions)))\n\n elif config.subsumption_type == \"restriction\":\n for subs in train_subsumptions0:\n c1, c2 = subs.getSubClass(), subs.getSuperClass()\n train_subsumptions.append([str(c1.getIRI()), str(c2)])\n\n restrictions = BERTSubsIntraPipeline.extract_restrictions_from_ontology(onto=onto)\n print(\"restrictions: %d\" % len(restrictions))\n size_sum = 0\n for subs in valid_subsumptions0:\n c1, c2 = subs.getSubClass(), subs.getSuperClass()\n c2_neg = BERTSubsIntraPipeline.get_test_neg_candidates_restriction(\n subcls=c1, max_neg_size=config.valid.max_neg_size, restrictions=restrictions, onto=onto\n )\n size_sum += len(c2_neg)\n item = [str(c1.getIRI()), str(c2)] + [str(r) for r in c2_neg]\n valid_subsumptions.append(item)\n print(\"valid candidate negative avg. 
size: %.1f\" % (size_sum / len(valid_subsumptions)))\n else:\n warnings.warn(\"Unknown subsumption type %s\" % config.subsumption_type)\n sys.exit(0)\n\n # The train/valid subsumptions are given:\n else:\n train_subsumptions = read_subsumptions(config.train_subsumption_file)\n valid_subsumptions = read_subsumptions(config.valid_subsumption_file)\n\n print(\"Positive train/valid subsumptions: %d/%d\" % (len(train_subsumptions), len(valid_subsumptions)))\n tr = self.sampler.generate_samples(subsumptions=train_subsumptions)\n va = self.sampler.generate_samples(subsumptions=valid_subsumptions, duplicate=False)\n\n end_time = datetime.datetime.now()\n print(\"data pre-processing costs %.1f minutes\" % ((end_time - start_time).seconds / 60))\n\n start_time = datetime.datetime.now()\n torch.cuda.empty_cache()\n bert_trainer = BERTSubsumptionClassifierTrainer(\n config.fine_tune.pretrained,\n train_data=tr,\n val_data=va,\n max_length=config.prompt.max_length,\n early_stop=config.fine_tune.early_stop,\n )\n\n epoch_steps = len(bert_trainer.tra) // config.fine_tune.batch_size # total steps of an epoch\n logging_steps = int(epoch_steps * 0.02) if int(epoch_steps * 0.02) > 0 else 5\n eval_steps = 5 * logging_steps\n training_args = TrainingArguments(\n output_dir=config.fine_tune.output_dir,\n num_train_epochs=config.fine_tune.num_epochs,\n per_device_train_batch_size=config.fine_tune.batch_size,\n per_device_eval_batch_size=config.fine_tune.batch_size,\n warmup_ratio=config.fine_tune.warm_up_ratio,\n weight_decay=0.01,\n logging_steps=logging_steps,\n logging_dir=f\"{config.fine_tune.output_dir}/tb\",\n eval_steps=eval_steps,\n evaluation_strategy=\"steps\",\n do_train=True,\n do_eval=True,\n save_steps=eval_steps,\n load_best_model_at_end=True,\n save_total_limit=1,\n metric_for_best_model=\"accuracy\",\n greater_is_better=True,\n )\n if config.fine_tune.do_fine_tune and (\n config.prompt.prompt_type == \"traversal\"\n or (config.prompt.prompt_type == \"path\" and config.prompt.use_sub_special_token)\n ):\n bert_trainer.add_special_tokens([\"<SUB>\"])\n\n bert_trainer.train(train_args=training_args, do_fine_tune=config.fine_tune.do_fine_tune)\n if config.fine_tune.do_fine_tune:\n bert_trainer.trainer.save_model(\n output_dir=os.path.join(config.fine_tune.output_dir, \"fine-tuned-checkpoint\")\n )\n print(\"fine-tuning done, fine-tuned model saved\")\n else:\n print(\"pretrained or fine-tuned model loaded.\")\n end_time = datetime.datetime.now()\n print(\"Fine-tuning costs %.1f minutes\" % ((end_time - start_time).seconds / 60))\n\n bert_trainer.model.eval()\n self.device = torch.device(f\"cuda\") if torch.cuda.is_available() else torch.device(\"cpu\")\n bert_trainer.model.to(self.device)\n self.tokenize = lambda x: bert_trainer.tokenizer(\n x, max_length=config.prompt.max_length, truncation=True, padding=True, return_tensors=\"pt\"\n )\n softmax = torch.nn.Softmax(dim=1)\n self.classifier = lambda x: softmax(bert_trainer.model(**x).logits)[:, 1]\n\n self.evaluate(target_subsumptions=valid_subsumptions, test_type=\"valid\")\n if test_subsumptions is not None:\n if config.test_type == \"evaluation\":\n self.evaluate(target_subsumptions=test_subsumptions, test_type=\"test\")\n elif config.test_type == \"prediction\":\n self.predict(target_subsumptions=test_subsumptions)\n else:\n warnings.warn(\"Unknown test_type: %s\" % config.test_type)\n print(\"\\n ------------------------- done! ---------------------------\\n\\n\\n\")\n
"},{"location":"deeponto/complete/bertsubs/#deeponto.complete.bertsubs.pipeline_intra.BERTSubsIntraPipeline.score","title":"score(samples)
","text":"The scoring function based on the fine-tuned BERT classifier.
Parameters:
Name Type Description Defaultsamples
List[Tuple]
A list of input sentence pairs to be scored.
required Source code insrc/deeponto/complete/bertsubs/pipeline_intra.py
def score(self, samples: List[List]):\nr\"\"\"The scoring function based on the fine-tuned BERT classifier.\n\n Args:\n samples (List[Tuple]): A list of input sentence pairs to be scored.\n \"\"\"\n sample_size = len(samples)\n scores = np.zeros(sample_size)\n batch_num = math.ceil(sample_size / self.config.evaluation.batch_size)\n for i in range(batch_num):\n j = (\n (i + 1) * self.config.evaluation.batch_size\n if (i + 1) * self.config.evaluation.batch_size <= sample_size\n else sample_size\n )\n inputs = self.tokenize(samples[i * self.config.evaluation.batch_size : j])\n inputs.to(self.device)\n with torch.no_grad():\n batch_scores = self.classifier(inputs)\n scores[i * self.config.evaluation.batch_size : j] = batch_scores.cpu().numpy()\n return scores\n
"},{"location":"deeponto/complete/bertsubs/#deeponto.complete.bertsubs.pipeline_intra.BERTSubsIntraPipeline.evaluate","title":"evaluate(target_subsumptions, test_type='test')
","text":"Test and calculate the metrics for a given list of subsumption pairs.
Parameters:
Name Type Description Defaulttarget_subsumptions
List[Tuple]
A list of subsumption pairs.
requiredtest_type
str
test
for testing or valid
for validation.
'test'
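A sketch of the expected input format with placeholder IRIs (assuming pipeline is a constructed BERTSubsIntraPipeline): each item starts with the sub-class, followed by the ground-truth super-class and the remaining candidate classes against which the ground truth is ranked.
test_pairs = [\n    [\"http://example.org/A\", \"http://example.org/B\", \"http://example.org/C\", \"http://example.org/D\"],\n]\npipeline.evaluate(target_subsumptions=test_pairs, test_type=\"test\")\n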
Source code in src/deeponto/complete/bertsubs/pipeline_intra.py
def evaluate(self, target_subsumptions: List[List], test_type: str = \"test\"):\nr\"\"\"Test and calculate the metrics for a given list of subsumption pairs.\n\n Args:\n target_subsumptions (List[Tuple]): A list of subsumption pairs.\n test_type (str): `test` for testing or `valid` for validation.\n \"\"\"\n\n MRR_sum, hits1_sum, hits5_sum, hits10_sum = 0, 0, 0, 0\n MRR, Hits1, Hits5, Hits10 = 0, 0, 0, 0\n size_sum, size_n = 0, 0\n for k0, test in enumerate(target_subsumptions):\n subcls, gt = test[0], test[1]\n candidates = test[1:]\n\n candidate_subsumptions = [[subcls, c] for c in candidates]\n candidate_scores = np.zeros(len(candidate_subsumptions))\n for k1, candidate_subsumption in enumerate(candidate_subsumptions):\n samples = self.sampler.subsumptions_to_samples(subsumptions=[candidate_subsumption], sample_label=None)\n size_sum += len(samples)\n size_n += 1\n scores = self.score(samples=samples)\n candidate_scores[k1] = np.average(scores)\n\n sorted_indexes = np.argsort(candidate_scores)[::-1]\n sorted_classes = [candidates[i] for i in sorted_indexes]\n\n rank = sorted_classes.index(gt) + 1\n MRR_sum += 1.0 / rank\n hits1_sum += 1 if gt in sorted_classes[:1] else 0\n hits5_sum += 1 if gt in sorted_classes[:5] else 0\n hits10_sum += 1 if gt in sorted_classes[:10] else 0\n num = k0 + 1\n MRR, Hits1, Hits5, Hits10 = MRR_sum / num, hits1_sum / num, hits5_sum / num, hits10_sum / num\n if num % 500 == 0:\n print(\n \"\\n%d tested, MRR: %.3f, Hits@1: %.3f, Hits@5: %.3f, Hits@10: %.3f\\n\"\n % (num, MRR, Hits1, Hits5, Hits10)\n )\n print(\n \"\\n[%s], MRR: %.3f, Hits@1: %.3f, Hits@5: %.3f, Hits@10: %.3f\\n\" % (test_type, MRR, Hits1, Hits5, Hits10)\n )\n print(\"%.2f samples per testing subsumption\" % (size_sum / size_n))\n
"},{"location":"deeponto/complete/bertsubs/#deeponto.complete.bertsubs.pipeline_intra.BERTSubsIntraPipeline.predict","title":"predict(target_subsumptions)
","text":"Predict a score for each given subsumption in the list.
The scores will be saved in test_subsumption_scores.csv
.
Parameters:
Name Type Description Defaulttarget_subsumptions
List[List]
Each item is a list where the first element is a fixed ontology class \\(C\\), and the remaining elements are potential (candidate) super-classes of \\(C\\).
required Source code insrc/deeponto/complete/bertsubs/pipeline_intra.py
def predict(self, target_subsumptions: List[List]):\nr\"\"\"Predict a score for each given subsumption in the list.\n\n The scores will be saved in `test_subsumption_scores.csv`.\n\n Args:\n target_subsumptions (List[List]): Each item is a list where the first element is a fixed ontology class $C$,\n and the remaining elements are potential (candidate) super-classes of $C$.\n \"\"\"\n out_lines = []\n for test in target_subsumptions:\n subcls, candidates = test[0], test[1:]\n candidate_subsumptions = [[subcls, c] for c in candidates]\n candidate_scores = []\n\n for candidate_subsumption in candidate_subsumptions:\n samples = self.sampler.subsumptions_to_samples(subsumptions=[candidate_subsumption], sample_label=None)\n scores = self.score(samples=samples)\n candidate_scores.append(np.average(scores))\n\n out_lines.append(\",\".join([str(i) for i in candidate_scores]))\n\n out_file = \"test_subsumption_scores.csv\"\n with open(out_file, \"w\") as f:\n for line in out_lines:\n f.write(\"%s\\n\" % line)\n print(\"Predicted subsumption scores are saved to %s\" % out_file)\n
"},{"location":"deeponto/complete/bertsubs/#deeponto.complete.bertsubs.pipeline_intra.BERTSubsIntraPipeline.extract_subsumptions_from_ontology","title":"extract_subsumptions_from_ontology(onto, subsumption_type)
staticmethod
","text":"Extract target subsumptions from a given ontology.
Parameters:
Name Type Description Defaultonto
Ontology
The target ontology.
requiredsubsumption_type
str
The type of subsumptions; options are \"named_class\"
or \"restriction\"
.
src/deeponto/complete/bertsubs/pipeline_intra.py
@staticmethod\ndef extract_subsumptions_from_ontology(onto: Ontology, subsumption_type: str):\nr\"\"\"Extract target subsumptions from a given ontology.\n\n Args:\n onto (Ontology): The target ontology.\n subsumption_type (str): the type of subsumptions, options are `\"named_class\"` or `\"restriction\"`.\n\n \"\"\"\n all_subsumptions = onto.get_subsumption_axioms(entity_type=\"Classes\")\n subsumptions = []\n if subsumption_type == \"restriction\":\n for subs in all_subsumptions:\n if (\n not onto.check_deprecated(owl_object=subs.getSubClass())\n and not onto.check_named_entity(owl_object=subs.getSuperClass())\n and SubsumptionSampler.is_basic_existential_restriction(\n complex_class_str=str(subs.getSuperClass())\n )\n ):\n subsumptions.append(subs)\n elif subsumption_type == \"named_class\":\n for subs in all_subsumptions:\n c1, c2 = subs.getSubClass(), subs.getSuperClass()\n if (\n onto.check_named_entity(owl_object=c1)\n and not onto.check_deprecated(owl_object=c1)\n and onto.check_named_entity(owl_object=c2)\n and not onto.check_deprecated(owl_object=c2)\n ):\n subsumptions.append(subs)\n else:\n warnings.warn(\"\\nUnknown subsumption type: %s\\n\" % subsumption_type)\n return subsumptions\n
"},{"location":"deeponto/complete/bertsubs/#deeponto.complete.bertsubs.pipeline_intra.BERTSubsIntraPipeline.extract_restrictions_from_ontology","title":"extract_restrictions_from_ontology(onto)
staticmethod
","text":"Extract basic existential restriction from an ontology.
Parameters:
Name Type Description Defaultonto
Ontology
The target ontology.
requiredReturns:
Name Type Descriptionrestrictions
List
A list of existential restrictions.
Source code insrc/deeponto/complete/bertsubs/pipeline_intra.py
@staticmethod\ndef extract_restrictions_from_ontology(onto: Ontology):\nr\"\"\"Extract basic existential restriction from an ontology.\n\n Args:\n onto (Ontology): The target ontology.\n Returns:\n restrictions (List): a list of existential restrictions.\n \"\"\"\n restrictions = []\n for complexC in onto.get_asserted_complex_classes():\n if SubsumptionSampler.is_basic_existential_restriction(complex_class_str=str(complexC)):\n restrictions.append(complexC)\n return restrictions\n
"},{"location":"deeponto/complete/bertsubs/#deeponto.complete.bertsubs.pipeline_intra.BERTSubsIntraPipeline.get_test_neg_candidates_restriction","title":"get_test_neg_candidates_restriction(subcls, max_neg_size, restrictions, onto)
staticmethod
","text":"Get a list of negative candidate class restrictions for testing.
Source code insrc/deeponto/complete/bertsubs/pipeline_intra.py
@staticmethod\ndef get_test_neg_candidates_restriction(subcls, max_neg_size, restrictions, onto):\n\"\"\"Get a list of negative candidate class restrictions for testing.\"\"\"\n neg_restrictions = list()\n n = max_neg_size * 2 if max_neg_size * 2 <= len(restrictions) else len(restrictions)\n for r in random.sample(restrictions, n):\n if not onto.reasoner.check_subsumption(sub_entity=subcls, super_entity=r):\n neg_restrictions.append(r)\n if len(neg_restrictions) >= max_neg_size:\n break\n return neg_restrictions\n
"},{"location":"deeponto/complete/bertsubs/#deeponto.complete.bertsubs.pipeline_intra.BERTSubsIntraPipeline.get_test_neg_candidates_named_class","title":"get_test_neg_candidates_named_class(subclass, gt, max_neg_size, onto, max_depth=3, max_width=8)
staticmethod
","text":"Get a list of negative candidate named classes for testing.
Source code insrc/deeponto/complete/bertsubs/pipeline_intra.py
@staticmethod\ndef get_test_neg_candidates_named_class(subclass, gt, max_neg_size, onto, max_depth=3, max_width=8):\n\"\"\"Get a list of negative candidate named classes for testing.\"\"\"\n all_nebs, seeds = set(), [gt]\n depth = 1\n while depth <= max_depth:\n new_seeds = set()\n for seed in seeds:\n nebs = set()\n for nc_iri in onto.reasoner.get_inferred_sub_entities(\n seed, direct=True\n ) + onto.reasoner.get_inferred_super_entities(seed, direct=True):\n nc = onto.owl_classes[nc_iri]\n if onto.check_named_entity(owl_object=nc) and not onto.check_deprecated(owl_object=nc):\n nebs.add(nc)\n new_seeds = new_seeds.union(nebs)\n all_nebs = all_nebs.union(nebs)\n depth += 1\n seeds = random.sample(new_seeds, max_width) if len(new_seeds) > max_width else new_seeds\n all_nebs = (\n all_nebs\n - {onto.owl_classes[iri] for iri in onto.reasoner.get_inferred_super_entities(subclass, direct=False)}\n - {subclass}\n )\n if len(all_nebs) > max_neg_size:\n return random.sample(all_nebs, max_neg_size)\n else:\n return list(all_nebs)\n
"},{"location":"deeponto/complete/bertsubs/#deeponto.complete.bertsubs.text_semantics.SubsumptionSampler","title":"SubsumptionSampler(onto, config)
","text":"Class for sampling functions for training the subsumption prediction model.
Attributes:
Name Type Descriptiononto
Ontology
The target ontology.
config
CfgNode
The loaded configuration.
named_classes
Set[str]
IRIs of named classes that are not deprecated.
iri_label
Dict[str, List]
key -- class IRIs from named_classes
, value -- a list of labels.
restrictionObjects
Set[OWLClassExpression]
Basic existential restrictions that appear in the ontology.
restrictions
set[str]
Strings of basic existential restrictions corresponding to restrictionObjects
.
restriction_label
Dict[str, List]
key -- existential restriction string, value -- a list of existential restriction labels.
verb
OntologyVerbaliser
An object for verbalisation.
Source code insrc/deeponto/complete/bertsubs/text_semantics.py
def __init__(self, onto: Ontology, config: CfgNode):\n self.onto = onto\n self.config = config\n self.named_classes = self.extract_named_classes(onto=onto)\n self.iri_label = dict()\n for iri in self.named_classes:\n self.iri_label[iri] = []\n for p in config.label_property:\n strings = onto.get_annotations(\n owl_object=onto.get_owl_object(iri),\n annotation_property_iri=p,\n annotation_language_tag=None,\n apply_lowercasing=False,\n normalise_identifiers=False,\n )\n for s in strings:\n if s not in self.iri_label[iri]:\n self.iri_label[iri].append(s)\n\n self.restrictionObjects = set()\n self.restrictions = set()\n self.restriction_label = dict()\n self.verb = OntologyVerbaliser(onto=onto)\n for complexC in onto.get_asserted_complex_classes():\n s = str(complexC)\n self.restriction_label[s] = []\n if self.is_basic_existential_restriction(complex_class_str=s):\n self.restrictionObjects.add(complexC)\n self.restrictions.add(s)\n self.restriction_label[s].append(self.verb.verbalise_class_expression(complexC).verbal)\n
"},{"location":"deeponto/complete/bertsubs/#deeponto.complete.bertsubs.text_semantics.SubsumptionSampler.is_basic_existential_restriction","title":"is_basic_existential_restriction(complex_class_str)
staticmethod
","text":"Determine if a complex class expression is a basic existential restriction.
Source code insrc/deeponto/complete/bertsubs/text_semantics.py
@staticmethod\ndef is_basic_existential_restriction(complex_class_str: str):\n\"\"\"Determine if a complex class expression is a basic existential restriction.\"\"\"\n IRI = \"<https?:\\\\/\\\\/(?:www\\\\.)?[-a-zA-Z0-9@:%._\\\\+~#=]{1,256}\\\\.[a-zA-Z0-9()]{1,6}\\\\b(?:[-a-zA-Z0-9()@:%_\\\\+.~#?&\\\\/=]*)>\"\n p = rf\"ObjectSomeValuesFrom\\({IRI}\\s{IRI}\\)\"\n if re.match(p, complex_class_str):\n return True\n else:\n return False\n
"},{"location":"deeponto/complete/bertsubs/#deeponto.complete.bertsubs.text_semantics.SubsumptionSampler.generate_samples","title":"generate_samples(subsumptions, duplicate=True)
","text":"Generate text samples from subsumptions.
Parameters:
Name Type Description Defaultsubsumptions
List[List]
A list of subsumptions, each of which is a two-component list (sub_class_iri, super_class_iri_or_str)
.
duplicate
bool
True
-- duplicate the positive and negative samples, False
-- do not duplicate.
True
Returns:
Type DescriptionList[List]
A list of samples, each element is a triple in the form of (sub_class_string, super_class_string, label_index)
.
src/deeponto/complete/bertsubs/text_semantics.py
def generate_samples(self, subsumptions: List[List], duplicate: bool = True):\nr\"\"\"Generate text samples from subsumptions.\n\n Args:\n subsumptions (List[List]): A list of subsumptions, each of which of is a two-component list `(sub_class_iri, super_class_iri_or_str)`.\n duplicate (bool): `True` -- duplicate the positive and negative samples, `False` -- do not duplicate.\n\n Returns:\n (List[List]): A list of samples, each element is a triple\n in the form of `(sub_class_string, super_class_string, label_index)`.\n \"\"\"\n if duplicate:\n pos_dup, neg_dup = self.config.fine_tune.train_pos_dup, self.config.fine_tune.train_neg_dup\n else:\n pos_dup, neg_dup = 1, 1\n neg_subsumptions = list()\n for subs in subsumptions:\n c1 = subs[0]\n for _ in range(neg_dup):\n neg_c = self.get_negative_sample(subclass_iri=c1, subsumption_type=self.config.subsumption_type)\n if neg_c is not None:\n neg_subsumptions.append([c1, neg_c])\n pos_samples = self.subsumptions_to_samples(subsumptions=subsumptions, sample_label=1)\n pos_samples = pos_dup * pos_samples\n neg_samples = self.subsumptions_to_samples(subsumptions=neg_subsumptions, sample_label=0)\n if len(neg_samples) < len(pos_samples):\n neg_samples = neg_samples + [\n random.choice(neg_samples) for _ in range(len(pos_samples) - len(neg_samples))\n ]\n if len(neg_samples) > len(pos_samples):\n pos_samples = pos_samples + [\n random.choice(pos_samples) for _ in range(len(neg_samples) - len(pos_samples))\n ]\n print(\"pos_samples: %d, neg_samples: %d\" % (len(pos_samples), len(neg_samples)))\n all_samples = [s for s in pos_samples + neg_samples if s[0] != \"\" and s[1] != \"\"]\n random.shuffle(all_samples)\n return all_samples\n
"},{"location":"deeponto/complete/bertsubs/#deeponto.complete.bertsubs.text_semantics.SubsumptionSampler.subsumptions_to_samples","title":"subsumptions_to_samples(subsumptions, sample_label)
","text":"Transform subsumptions into samples of strings.
Parameters:
Name Type Description Defaultsubsumptions
List[List]
The given subsumptions.
requiredsample_label
Union[int, None]
1
(positive), 0
(negative), None
(no label).
Returns:
Type DescriptionList[List]
A list of samples, each element is a triple in the form of (sub_class_string, super_class_string, label_index)
.
src/deeponto/complete/bertsubs/text_semantics.py
def subsumptions_to_samples(self, subsumptions: List[List], sample_label: Union[int, None]):\nr\"\"\"Transform subsumptions into samples of strings.\n\n Args:\n subsumptions (List[List]): The given subsumptions.\n sample_label (Union[int, None]): `1` (positive), `0` (negative), `None` (no label).\n\n Returns:\n (List[List]): A list of samples, each element is a triple\n in the form of `(sub_class_string, super_class_string, label_index)`.\n\n \"\"\"\n local_samples = list()\n for subs in subsumptions:\n subcls, supcls = subs[0], subs[1]\n substrs = self.iri_label[subcls] if subcls in self.iri_label and len(self.iri_label[subcls]) > 0 else [\"\"]\n\n if self.config.subsumption_type == \"named_class\":\n supstrs = self.iri_label[supcls] if supcls in self.iri_label and len(self.iri_label[supcls]) else [\"\"]\n else:\n if supcls in self.restriction_label and len(self.restriction_label[supcls]) > 0:\n supstrs = self.restriction_label[supcls]\n else:\n supstrs = [self.verb.verbalise_class_expression(supcls).verbal]\n\n if self.config.use_one_label:\n substrs, supstrs = substrs[0:1], supstrs[0:1]\n\n if self.config.prompt.prompt_type == \"isolated\":\n for substr in substrs:\n for supstr in supstrs:\n local_samples.append([substr, supstr])\n\n elif self.config.prompt.prompt_type == \"traversal\":\n subs_list_strs = set()\n for _ in range(self.config.prompt.context_dup):\n context_sub, no_duplicate = self.traversal_subsumptions(\n cls=subcls,\n hop=self.config.prompt.prompt_hop,\n direction=\"subclass\",\n max_subsumptions=self.config.prompt.prompt_max_subsumptions,\n )\n subs_list = [self.named_subsumption_to_str(subsum) for subsum in context_sub]\n subs_list_str = \" <SEP> \".join(subs_list)\n subs_list_strs.add(subs_list_str)\n if no_duplicate:\n break\n\n if self.config.subsumption_type == \"named_class\":\n sups_list_strs = set()\n for _ in range(self.config.prompt.context_dup):\n context_sup, no_duplicate = self.traversal_subsumptions(\n cls=supcls,\n hop=self.config.prompt.prompt_hop,\n direction=\"supclass\",\n max_subsumptions=self.config.prompt.prompt_max_subsumptions,\n )\n sups_list = [self.named_subsumption_to_str(subsum) for subsum in context_sup]\n sups_list_str = \" <SEP> \".join(sups_list)\n sups_list_strs.add(sups_list_str)\n if no_duplicate:\n break\n else:\n sups_list_strs = set(supstrs)\n\n for subs_list_str in subs_list_strs:\n for substr in substrs:\n s1 = substr + \" <SEP> \" + subs_list_str\n for sups_list_str in sups_list_strs:\n for supstr in supstrs:\n s2 = supstr + \" <SEP> \" + sups_list_str\n local_samples.append([s1, s2])\n\n elif self.config.prompt.prompt_type == \"path\":\n sep_token = \"<SUB>\" if self.config.prompt.use_sub_special_token else \"<SEP>\"\n\n s1_set = set()\n for _ in range(self.config.prompt.context_dup):\n context_sub, no_duplicate = self.path_subsumptions(\n cls=subcls, hop=self.config.prompt.prompt_hop, direction=\"subclass\"\n )\n if len(context_sub) > 0:\n s1 = \"\"\n for i in range(len(context_sub)):\n subsum = context_sub[len(context_sub) - i - 1]\n subc = subsum[0]\n s1 += \"%s %s \" % (\n self.iri_label[subc][0]\n if subc in self.iri_label and len(self.iri_label[subc]) > 0\n else \"\",\n sep_token,\n )\n for substr in substrs:\n s1_set.add(s1 + substr)\n else:\n for substr in substrs:\n s1_set.add(\"%s %s\" % (sep_token, substr))\n\n if no_duplicate:\n break\n\n if self.config.subsumption_type == \"named_class\":\n s2_set = set()\n for _ in range(self.config.prompt.context_dup):\n context_sup, no_duplicate = self.path_subsumptions(\n 
cls=supcls, hop=self.config.prompt.prompt_hop, direction=\"supclass\"\n )\n if len(context_sup) > 0:\n s2 = \"\"\n for subsum in context_sup:\n supc = subsum[1]\n s2 += \" %s %s\" % (\n sep_token,\n self.iri_label[supc][0]\n if supc in self.iri_label and len(self.iri_label[supc]) > 0\n else \"\",\n )\n for supstr in supstrs:\n s2_set.add(supstr + s2)\n else:\n for supstr in supstrs:\n s2_set.add(\"%s %s\" % (supstr, sep_token))\n\n if no_duplicate:\n break\n else:\n s2_set = set(supstrs)\n\n for s1 in s1_set:\n for s2 in s2_set:\n local_samples.append([s1, s2])\n\n else:\n print(f\"unknown context type {self.config.prompt.prompt_type}\")\n sys.exit(0)\n\n if sample_label is not None:\n for i in range(len(local_samples)):\n local_samples[i].append(sample_label)\n\n return local_samples\n
"},{"location":"deeponto/complete/bertsubs/#deeponto.complete.bertsubs.text_semantics.SubsumptionSampler.get_negative_sample","title":"get_negative_sample(subclass_iri, subsumption_type='named_class')
","text":"Given a named subclass, get a negative class for a negative subsumption.
Parameters:
Name Type Description Defaultsubclass_iri
str
IRI of a given sub-class.
requiredsubsumption_type
str
named_class
or restriction
.
'named_class'
Source code in src/deeponto/complete/bertsubs/text_semantics.py
def get_negative_sample(self, subclass_iri: str, subsumption_type: str = \"named_class\"):\nr\"\"\"Given a named subclass, get a negative class for a negative subsumption.\n\n Args:\n subclass_iri (str): IRI of a given sub-class.\n subsumption_type (str): `named_class` or `restriction`.\n \"\"\"\n subclass = self.onto.get_owl_object(iri=subclass_iri)\n if subsumption_type == \"named_class\":\n if self.config.no_reasoning:\n parents = self.onto.get_asserted_parents(owl_object=subclass, named_only=True)\n ancestors = set([str(item.getIRI()) for item in parents])\n else:\n ancestors = set(self.onto.reasoner.get_inferred_super_entities(subclass, direct=False))\n neg_c = random.sample(self.named_classes - ancestors, 1)[0]\n return neg_c\n else:\n for neg_c in random.sample(self.restrictionObjects, 5):\n if self.config.no_reasoning:\n return str(neg_c)\n else:\n if not self.onto.reasoner.check_subsumption(sub_entity=subclass, super_entity=neg_c):\n return str(neg_c)\n return None\n
"},{"location":"deeponto/complete/bertsubs/#deeponto.complete.bertsubs.text_semantics.SubsumptionSampler.named_subsumption_to_str","title":"named_subsumption_to_str(subsum)
","text":"Transform a named subsumption into string with <SUB>
and classes' labels.
Parameters:
Name Type Description Defaultsubsum
List[Tuple]
A list of subsumption pairs in the form of (sub_class_iri, super_class_iri)
.
src/deeponto/complete/bertsubs/text_semantics.py
def named_subsumption_to_str(self, subsum: List):\nr\"\"\"Transform a named subsumption into string with `<SUB>` and classes' labels.\n\n Args:\n subsum (List[Tuple]): A list of subsumption pairs in the form of `(sub_class_iri, super_class_iri)`.\n \"\"\"\n subc, supc = subsum[0], subsum[1]\n subs = self.iri_label[subc][0] if subc in self.iri_label and len(self.iri_label[subc]) > 0 else \"\"\n sups = self.iri_label[supc][0] if supc in self.iri_label and len(self.iri_label[supc]) > 0 else \"\"\n return \"%s <SUB> %s\" % (subs, sups)\n
"},{"location":"deeponto/complete/bertsubs/#deeponto.complete.bertsubs.text_semantics.SubsumptionSampler.subclass_to_strings","title":"subclass_to_strings(subcls)
","text":"Transform a sub-class into strings (with the path or traversal context template).
Parameters:
Name Type Description Defaultsubcls
str
IRI of the sub-class.
required Source code insrc/deeponto/complete/bertsubs/text_semantics.py
def subclass_to_strings(self, subcls):\nr\"\"\"Transform a sub-class into strings (with the path or traversal context template).\n\n Args:\n subcls (str): IRI of the sub-class.\n \"\"\"\n substrs = self.iri_label[subcls] if subcls in self.iri_label and len(self.iri_label[subcls]) > 0 else [\"\"]\n\n if self.config.use_one_label:\n substrs = substrs[0:1]\n\n if self.config.prompt.prompt_type == \"isolated\":\n return substrs\n\n elif self.config.prompt.prompt_type == \"traversal\":\n subs_list_strs = set()\n for _ in range(self.config.prompt.context_dup):\n context_sub, no_duplicate = self.traversal_subsumptions(\n cls=subcls,\n hop=self.config.prompt.prompt_hop,\n direction=\"subclass\",\n max_subsumptions=self.config.prompt.prompt_max_subsumptions,\n )\n subs_list = [self.named_subsumption_to_str(subsum) for subsum in context_sub]\n subs_list_str = \" <SEP> \".join(subs_list)\n subs_list_strs.add(subs_list_str)\n if no_duplicate:\n break\n\n strs = list()\n for subs_list_str in subs_list_strs:\n for substr in substrs:\n s1 = substr + \" <SEP> \" + subs_list_str\n strs.append(s1)\n return strs\n\n elif self.config.prompt.prompt_type == \"path\":\n sep_token = \"<SUB>\" if self.config.prompt.use_sub_special_token else \"<SEP>\"\n\n s1_set = set()\n for _ in range(self.config.prompt.context_dup):\n context_sub, no_duplicate = self.path_subsumptions(\n cls=subcls, hop=self.config.prompt.prompt_hop, direction=\"subclass\"\n )\n if len(context_sub) > 0:\n s1 = \"\"\n for i in range(len(context_sub)):\n subsum = context_sub[len(context_sub) - i - 1]\n subc = subsum[0]\n s1 += \"%s %s \" % (\n self.iri_label[subc][0]\n if subc in self.iri_label and len(self.iri_label[subc]) > 0\n else \"\",\n sep_token,\n )\n for substr in substrs:\n s1_set.add(s1 + substr)\n else:\n for substr in substrs:\n s1_set.add(\"%s %s\" % (sep_token, substr))\n if no_duplicate:\n break\n\n return list(s1_set)\n
"},{"location":"deeponto/complete/bertsubs/#deeponto.complete.bertsubs.text_semantics.SubsumptionSampler.supclass_to_strings","title":"supclass_to_strings(supcls, subsumption_type='named_class')
","text":"Transform a super-class into strings (with the path or traversal context template if the subsumption type is \"named_class\"
).
Parameters:
Name Type Description Defaultsupcls
str
IRI of the super-class.
requiredsubsumption_type
str
The type of the subsumption.
'named_class'
Source code in src/deeponto/complete/bertsubs/text_semantics.py
def supclass_to_strings(self, supcls: str, subsumption_type: str = \"named_class\"):\nr\"\"\"Transform a super-class into strings (with the path or traversal context template if the subsumption type is `\"named_class\"`).\n\n Args:\n supcls (str): IRI of the super-class.\n subsumption_type (str): The type of the subsumption.\n \"\"\"\n\n if subsumption_type == \"named_class\":\n supstrs = self.iri_label[supcls] if supcls in self.iri_label and len(self.iri_label[supcls]) else [\"\"]\n else:\n if supcls in self.restriction_label and len(self.restriction_label[supcls]) > 0:\n supstrs = self.restriction_label[supcls]\n else:\n warnings.warn(\"Warning: %s has no descriptions\" % supcls)\n supstrs = [\"\"]\n\n if self.config.use_one_label:\n if subsumption_type == \"named_class\":\n supstrs = supstrs[0:1]\n\n if self.config.prompt.prompt_type == \"isolated\":\n return supstrs\n\n elif self.config.prompt.prompt_type == \"traversal\":\n if subsumption_type == \"named_class\":\n sups_list_strs = set()\n for _ in range(self.config.prompt.context_dup):\n context_sup, no_duplicate = self.traversal_subsumptions(\n cls=supcls,\n hop=self.config.prompt.prompt_hop,\n direction=\"supclass\",\n max_subsumptions=self.config.prompt.prompt_max_subsumptions,\n )\n sups_list = [self.named_subsumption_to_str(subsum) for subsum in context_sup]\n sups_list_str = \" <SEP> \".join(sups_list)\n sups_list_strs.add(sups_list_str)\n if no_duplicate:\n break\n\n else:\n sups_list_strs = set(supstrs)\n\n strs = list()\n for sups_list_str in sups_list_strs:\n for supstr in supstrs:\n s2 = supstr + \" <SEP> \" + sups_list_str\n strs.append(s2)\n return strs\n\n elif self.config.prompt.prompt_type == \"path\":\n sep_token = \"<SUB>\" if self.config.prompt.use_sub_special_token else \"<SEP>\"\n\n if subsumption_type == \"named_class\":\n s2_set = set()\n for _ in range(self.config.prompt.context_dup):\n context_sup, no_duplicate = self.path_subsumptions(\n cls=supcls, hop=self.config.prompt.prompt_hop, direction=\"supclass\"\n )\n if len(context_sup) > 0:\n s2 = \"\"\n for subsum in context_sup:\n supc = subsum[1]\n s2 += \" %s %s\" % (\n sep_token,\n self.iri_label[supc][0]\n if supc in self.iri_label and len(self.iri_label[supc]) > 0\n else \"\",\n )\n for supstr in supstrs:\n s2_set.add(supstr + s2)\n else:\n for supstr in supstrs:\n s2_set.add(\"%s %s\" % (supstr, sep_token))\n\n if no_duplicate:\n break\n else:\n s2_set = set(supstrs)\n\n return list(s2_set)\n\n else:\n print(\"unknown context type %s\" % self.config.prompt.prompt_type)\n sys.exit(0)\n
"},{"location":"deeponto/complete/bertsubs/#deeponto.complete.bertsubs.text_semantics.SubsumptionSampler.traversal_subsumptions","title":"traversal_subsumptions(cls, hop=1, direction='subclass', max_subsumptions=5)
","text":"Given a class, get its subsumptions by traversing the class hierarchy.
If the class acts as the sub-class of the target subsumption, context subsumptions are collected from below (its descendants).\nIf the class acts as the super-class of the target subsumption, context subsumptions are collected from above (its ancestors).\n
Parameters:
Name Type Description Defaultcls
str
IRI of a named class.
requiredhop
int
The depth of the path.
1
direction
str
subclass
(downside path) or supclass
(upside path).
'subclass'
max_subsumptions
int
The maximum number of subsumptions to consider.
5
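A small sketch (assuming sampler is a SubsumptionSampler instance and the class IRI is a placeholder):
# collect up to 5 one-hop context subsumptions below the given class\ncontext, no_duplicate = sampler.traversal_subsumptions(cls=\"http://example.org/C\", hop=1, direction=\"subclass\", max_subsumptions=5)\n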
Source code in src/deeponto/complete/bertsubs/text_semantics.py
def traversal_subsumptions(self, cls: str, hop: int = 1, direction: str = \"subclass\", max_subsumptions: int = 5):\nr\"\"\"Given a class, get its subsumptions by traversing the class hierarchy.\n\n If the class is a sub-class in the subsumption axiom, get subsumptions from downside.\n If the class is a super-class in the subsumption axiom, get subsumptions from upside.\n\n Args:\n cls (str): IRI of a named class.\n hop (int): The depth of the path.\n direction (str): `subclass` (downside path) or `supclass` (upside path).\n max_subsumptions (int): The maximum number of subsumptions to consider.\n \"\"\"\n subsumptions = list()\n seeds = [cls]\n d = 1\n no_duplicate = True\n while d <= hop:\n new_seeds = list()\n for s in seeds:\n if direction == \"subclass\":\n tmp = self.onto.reasoner.get_inferred_sub_entities(\n self.onto.get_owl_object(iri=s), direct=True\n )\n if len(tmp) > 1:\n no_duplicate = False\n random.shuffle(tmp)\n for c in tmp:\n if not self.onto.check_deprecated(owl_object=self.onto.get_owl_object(iri=c)):\n subsumptions.append([c, s])\n if c not in new_seeds:\n new_seeds.append(c)\n elif direction == \"supclass\":\n tmp = self.onto.reasoner.get_inferred_super_entities(\n self.onto.get_owl_object(iri=s), direct=True\n )\n if len(tmp) > 1:\n no_duplicate = False\n random.shuffle(tmp)\n for c in tmp:\n if not self.onto.check_deprecated(owl_object=self.onto.get_owl_object(iri=c)):\n subsumptions.append([s, c])\n if c not in new_seeds:\n new_seeds.append(c)\n else:\n warnings.warn(\"Unknown direction: %s\" % direction)\n if len(subsumptions) >= max_subsumptions:\n subsumptions = random.sample(subsumptions, max_subsumptions)\n break\n else:\n seeds = new_seeds\n random.shuffle(seeds)\n d += 1\n return subsumptions, no_duplicate\n
"},{"location":"deeponto/complete/bertsubs/#deeponto.complete.bertsubs.text_semantics.SubsumptionSampler.path_subsumptions","title":"path_subsumptions(cls, hop=1, direction='subclass')
","text":"Given a class, get its path subsumptions.
If the class is a sub-class in the subsumption axiom, get subsumptions from downside.\nIf the class is a super-class in the subsumption axiom, get subsumptions from upside.\n
Parameters:
Name Type Description Defaultcls
str
IRI of a named class.
requiredhop
int
The depth of the path.
1
direction
str
subclass
(downside path) or supclass
(upside path).
'subclass'
Source code in src/deeponto/complete/bertsubs/text_semantics.py
def path_subsumptions(self, cls: str, hop: int = 1, direction: str = \"subclass\"):\nr\"\"\"Given a class, get its path subsumptions.\n\n If the class is a sub-class in the subsumption axiom, get subsumptions from downside.\n If the class is a super-class in the subsumption axiom, get subsumptions from upside.\n\n Args:\n cls (str): IRI of a named class.\n hop (int): The depth of the path.\n direction (str): `subclass` (downside path) or `supclass` (upside path).\n \"\"\"\n subsumptions = list()\n seed = cls\n d = 1\n no_duplicate = True\n while d <= hop:\n if direction == \"subclass\":\n tmp = self.onto.reasoner.get_inferred_sub_entities(\n self.onto.get_owl_object(iri=seed), direct=True\n )\n if len(tmp) > 1:\n no_duplicate = False\n end = True\n if len(tmp) > 0:\n random.shuffle(tmp)\n for c in tmp:\n if not self.onto.check_deprecated(owl_object=self.onto.get_owl_object(iri=c)):\n subsumptions.append([c, seed])\n seed = c\n end = False\n break\n if end:\n break\n elif direction == \"supclass\":\n tmp = self.onto.reasoner.get_inferred_super_entities(\n self.onto.get_owl_object(iri=seed), direct=True\n )\n if len(tmp) > 1:\n no_duplicate = False\n end = True\n if len(tmp) > 0:\n random.shuffle(tmp)\n for c in tmp:\n if not self.onto.check_deprecated(owl_object=self.onto.get_owl_object(iri=c)):\n subsumptions.append([seed, c])\n seed = c\n end = False\n break\n if end:\n break\n else:\n warnings.warn(\"Unknown direction: %s\" % direction)\n\n d += 1\n return subsumptions, no_duplicate\n
"},{"location":"deeponto/complete/bertsubs/#deeponto.complete.bertsubs.bert_classifier.BERTSubsumptionClassifierTrainer","title":"BERTSubsumptionClassifierTrainer(bert_checkpoint, train_data, val_data, max_length=128, early_stop=False, early_stop_patience=10)
","text":"Source code in src/deeponto/complete/bertsubs/bert_classifier.py
def __init__(\n self,\n bert_checkpoint: str,\n train_data: List,\n val_data: List,\n max_length: int = 128,\n early_stop: bool = False,\n early_stop_patience: int = 10,\n):\n print(f\"initialize BERT for Binary Classification from the Pretrained BERT model at: {bert_checkpoint} ...\")\n\n # BERT\n self.model = AutoModelForSequenceClassification.from_pretrained(bert_checkpoint)\n self.tokenizer = AutoTokenizer.from_pretrained(bert_checkpoint)\n self.trainer = None\n\n self.max_length = max_length\n self.tra = self.load_dataset(train_data, max_length=self.max_length, count_token_size=True)\n self.val = self.load_dataset(val_data, max_length=self.max_length, count_token_size=True)\n print(f\"text max length: {self.max_length}\")\n print(f\"data files loaded with sizes:\")\n print(f\"\\t[# Train]: {len(self.tra)}, [# Val]: {len(self.val)}\")\n\n # early stopping\n self.early_stop = early_stop\n self.early_stop_patience = early_stop_patience\n
"},{"location":"deeponto/complete/bertsubs/#deeponto.complete.bertsubs.bert_classifier.BERTSubsumptionClassifierTrainer.add_special_tokens","title":"add_special_tokens(tokens)
","text":"Add additional special tokens into the tokenizer's vocab.
Parameters:
Name Type Description Defaulttokens
List[str]
additional tokens to add, e.g., [\"<SUB>\",\"<EOA>\",\"<EOC>\"]
src/deeponto/complete/bertsubs/bert_classifier.py
def add_special_tokens(self, tokens: List):\nr\"\"\"Add additional special tokens into the tokenizer's vocab.\n Args:\n tokens (List[str]): additional tokens to add, e.g., `[\"<SUB>\",\"<EOA>\",\"<EOC>\"]`\n \"\"\"\n special_tokens_dict = {\"additional_special_tokens\": tokens}\n self.tokenizer.add_special_tokens(special_tokens_dict)\n self.model.resize_token_embeddings(len(self.tokenizer))\n
"},{"location":"deeponto/complete/bertsubs/#deeponto.complete.bertsubs.bert_classifier.BERTSubsumptionClassifierTrainer.train","title":"train(train_args, do_fine_tune=True)
","text":"Initiate the Huggingface trainer with input arguments and start training.
Parameters:
Name Type Description Defaulttrain_args
TrainingArguments
Arguments for training.
requireddo_fine_tune
bool
False
means loading the checkpoint without training. Defaults to True
.
True
Source code in src/deeponto/complete/bertsubs/bert_classifier.py
def train(self, train_args: TrainingArguments, do_fine_tune: bool = True):\nr\"\"\"Initiate the Huggingface trainer with input arguments and start training.\n Args:\n train_args (TrainingArguments): Arguments for training.\n do_fine_tune (bool): `False` means loading the checkpoint without training. Defaults to `True`.\n \"\"\"\n self.trainer = Trainer(\n model=self.model,\n args=train_args,\n train_dataset=self.tra,\n eval_dataset=self.val,\n compute_metrics=self.compute_metrics,\n tokenizer=self.tokenizer,\n )\n if self.early_stop:\n self.trainer.add_callback(EarlyStoppingCallback(early_stopping_patience=self.early_stop_patience))\n if do_fine_tune:\n self.trainer.train()\n
"},{"location":"deeponto/complete/bertsubs/#deeponto.complete.bertsubs.bert_classifier.BERTSubsumptionClassifierTrainer.compute_metrics","title":"compute_metrics(pred)
staticmethod
","text":"Auxiliary function to add accurate metric into evaluation.
Source code insrc/deeponto/complete/bertsubs/bert_classifier.py
@staticmethod\ndef compute_metrics(pred):\n\"\"\"Auxiliary function to add accurate metric into evaluation.\n \"\"\"\n labels = pred.label_ids\n preds = pred.predictions.argmax(-1)\n acc = accuracy_score(labels, preds)\n return {\"accuracy\": acc}\n
"},{"location":"deeponto/complete/bertsubs/#deeponto.complete.bertsubs.bert_classifier.BERTSubsumptionClassifierTrainer.load_dataset","title":"load_dataset(data, max_length=512, count_token_size=False)
","text":"Load a Huggingface dataset from a list of samples.
Parameters:
Name Type Description Defaultdata
List[Tuple]
Data samples in a list.
requiredmax_length
int
Maximum length of the input sequence.
512
count_token_size
bool
Whether or not to count the token sizes of the data. Defaults to False
.
False
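A minimal sketch of the expected input format (hypothetical label strings; trainer is assumed to be a BERTSubsumptionClassifierTrainer instance):
samples = [\n    [\"lung disease\", \"disease\", 1],  # a positive subsumption sample\n    [\"lung disease\", \"enzyme\", 0],   # a negative subsumption sample\n]\ndataset = trainer.load_dataset(samples, max_length=128, count_token_size=False)\n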
Source code in src/deeponto/complete/bertsubs/bert_classifier.py
def load_dataset(self, data: List, max_length: int = 512, count_token_size: bool = False) -> Dataset:\nr\"\"\"Load a Huggingface dataset from a list of samples.\n Args:\n data (List[Tuple]): Data samples in a list.\n max_length (int): Maximum length of the input sequence.\n count_token_size (bool): Whether or not to count the token sizes of the data. Defaults to `False`.\n \"\"\"\n # data_df = pd.DataFrame(data, columns=[\"sent1\", \"sent2\", \"labels\"])\n # dataset = Dataset.from_pandas(data_df)\n\n def iterate():\n for sample in data:\n yield {\"sent1\": sample[0], \"sent2\": sample[1], \"labels\": sample[2]}\n\n dataset = Dataset.from_generator(iterate)\n\n if count_token_size:\n tokens = self.tokenizer(dataset[\"sent1\"], dataset[\"sent2\"])\n l_sum, num_128, num_256, num_512, l_max = 0, 0, 0, 0, 0\n for item in tokens[\"input_ids\"]:\n l = len(item)\n l_sum += l\n if l <= 128:\n num_128 += 1\n if l <= 256:\n num_256 += 1\n if l <= 512:\n num_512 += 1\n if l > l_max:\n l_max = l\n print(\"average token size: %.2f\" % (l_sum / len(tokens[\"input_ids\"])))\n print(\"ratio of token size <= 128: %.3f\" % (num_128 / len(tokens[\"input_ids\"])))\n print(\"ratio of token size <= 256: %.3f\" % (num_256 / len(tokens[\"input_ids\"])))\n print(\"ratio of token size <= 512: %.3f\" % (num_512 / len(tokens[\"input_ids\"])))\n print(\"max token size: %d\" % l_max)\n dataset = dataset.map(\n lambda examples: self.tokenizer(\n examples[\"sent1\"], examples[\"sent2\"], max_length=max_length, truncation=True\n ),\n batched=True,\n num_proc=1,\n )\n return dataset\n
"},{"location":"deeponto/onto/normalisation/","title":"Ontology Pruning","text":""},{"location":"deeponto/onto/normalisation/#deeponto.onto.normalisation.OntologyNormaliser","title":"OntologyNormaliser()
","text":"Class for ontology normalisation.
Credit
The code of this class originates from the mOWL library, which utilises the normalisation functionality from the Java library Jcel
.
The normalisation process transforms ontology axioms into normal forms in the Description Logic \\(\\mathcal{EL}\\), including \\(C \\sqsubseteq D\\), \\(C \\sqcap C' \\sqsubseteq D\\), \\(C \\sqsubseteq \\exists r.C'\\), and \\(\\exists r.C' \\sqsubseteq D\\),
where \\(C\\) and \\(C'\\) can be named concepts or \\(\\top\\), \\(D\\) is a named concept or \\(\\bot\\), and \\(r\\) is a role (property).
Attributes:
Name Type Descriptiononto
Ontology
The input ontology to be normalised.
temp_super_class_index
Dict[OWLCLassExpression, OWLClass]
A dictionary in the form of {complex_sub_class: temp_super_class}
, which means temp_super_class
is created during the normalisation of a complex subsumption axiom that has complex_sub_class
as the sub-class.
src/deeponto/onto/normalisation.py
def __init__(self):\n return\n
"},{"location":"deeponto/onto/normalisation/#deeponto.onto.normalisation.OntologyNormaliser.normalise","title":"normalise(ontology)
","text":"Performs the \\(\\mathcal{EL}\\) normalisation.
Parameters:
Name Type Description Defaultontology
Ontology
An ontology to be normalised.
requiredReturns:
Type Descriptionlist[OWLAxiom]
A list of normalised TBox axioms.
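A minimal usage sketch (the ontology path is a placeholder):
from deeponto.onto import Ontology, OntologyNormaliser\n\nonto = Ontology(\"./ontology.owl\")  # placeholder path\nnormaliser = OntologyNormaliser()\nnormalised_axioms = normaliser.normalise(onto)\n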
Source code insrc/deeponto/onto/normalisation.py
def normalise(self, ontology: Ontology):\nr\"\"\"Performs the $\\mathcal{EL}$ normalisation.\n\n Args:\n ontology (Ontology): An ontology to be normalised.\n\n Returns:\n (list[OWLAxiom]): A list of normalised TBox axioms.\n \"\"\"\n\n processed_owl_onto = self.preprocess_ontology(ontology)\n root_ont = processed_owl_onto\n translator = Translator(\n processed_owl_onto.getOWLOntologyManager().getOWLDataFactory(), IntegerOntologyObjectFactoryImpl()\n )\n axioms = HashSet()\n axioms.addAll(root_ont.getAxioms())\n translator.getTranslationRepository().addAxiomEntities(root_ont)\n\n for ont in root_ont.getImportsClosure():\n axioms.addAll(ont.getAxioms())\n translator.getTranslationRepository().addAxiomEntities(ont)\n\n intAxioms = translator.translateSA(axioms)\n\n normaliser = OntologyNormalizer()\n\n factory = IntegerOntologyObjectFactoryImpl()\n normalised_ontology = normaliser.normalize(intAxioms, factory)\n self.rTranslator = ReverseAxiomTranslator(translator, processed_owl_onto)\n\n normalised_axioms = []\n # revert the jcel axioms to the original OWLAxioms\n for ax in normalised_ontology:\n try:\n axiom = self.rTranslator.visit(ax)\n normalised_axioms.append(axiom)\n except Exception as e:\n logging.info(\"Reverse translation. Ignoring axiom: %s\", ax)\n logging.info(e)\n\n return list(set(axioms))\n
"},{"location":"deeponto/onto/normalisation/#deeponto.onto.normalisation.OntologyNormaliser.preprocess_ontology","title":"preprocess_ontology(ontology)
","text":"Preprocess the ontology to remove axioms that are not supported by the normalisation process.
Source code insrc/deeponto/onto/normalisation.py
def preprocess_ontology(self, ontology: Ontology):\n\"\"\"Preprocess the ontology to remove axioms that are not supported by the normalisation process.\"\"\"\n\n tbox_axioms = ontology.owl_onto.getTBoxAxioms(Imports.fromBoolean(True))\n new_tbox_axioms = HashSet()\n\n for axiom in tbox_axioms:\n axiom_as_str = axiom.toString()\n\n if \"UnionOf\" in axiom_as_str:\n continue\n elif \"MinCardinality\" in axiom_as_str:\n continue\n elif \"ComplementOf\" in axiom_as_str:\n continue\n elif \"AllValuesFrom\" in axiom_as_str:\n continue\n elif \"MaxCardinality\" in axiom_as_str:\n continue\n elif \"ExactCardinality\" in axiom_as_str:\n continue\n elif \"Annotation\" in axiom_as_str:\n continue\n elif \"ObjectHasSelf\" in axiom_as_str:\n continue\n elif \"urn:swrl\" in axiom_as_str:\n continue\n elif \"EquivalentObjectProperties\" in axiom_as_str:\n continue\n elif \"SymmetricObjectProperty\" in axiom_as_str:\n continue\n elif \"AsymmetricObjectProperty\" in axiom_as_str:\n continue\n elif \"ObjectOneOf\" in axiom_as_str:\n continue\n else:\n new_tbox_axioms.add(axiom)\n\n processed_owl_onto = ontology.owl_manager.createOntology(new_tbox_axioms)\n # NOTE: the returned object is `owlapi.OWLOntology` not `deeponto.onto.Ontology`\n return processed_owl_onto\n
"},{"location":"deeponto/onto/ontology/","title":"Ontology","text":"Python classes in this page are strongly dependent on the OWLAPI library. The base class Ontology
extends several features including convenient access to specially defined entities (e.g., owl:Thing
and owl:Nothing
), indexing of entities in the signature with their IRIs as keys, and some other customised functions for specific ontology engineering purposes. Ontology
also has an OntologyReasoner
attribute which provides reasoning facilities such as classifying entities, checking entailment, and so on. Users who are familiar with the OWLAPI should find it relatively easy to extend the Python classes here.
Ontology(owl_path, reasoner_type='hermit')
","text":"Ontology class that extends from the Java library OWLAPI.
Typing from OWLAPI
Types with OWL
prefix are mostly imported from the OWLAPI library by, for example, from org.semanticweb.owlapi.model import OWLObject
.
Attributes:
Name Type Descriptionowl_path
str
The path to the OWL ontology file.
owl_manager
OWLOntologyManager
An ontology manager for creating OWLOntology
.
owl_onto
OWLOntology
An OWLOntology
created by owl_manager
from owl_path
.
owl_iri
str
The IRI of the owl_onto
.
owl_classes
dict[str, OWLClass]
A dictionary that stores the (iri, ontology_class)
pairs.
owl_object_properties
dict[str, OWLObjectProperty]
A dictionary that stores the (iri, ontology_object_property)
pairs.
owl_data_properties
dict[str, OWLDataProperty]
A dictionary that stores the (iri, ontology_data_property)
pairs.
owl_annotation_properties
dict[str, OWLAnnotationProperty]
A dictionary that stores the (iri, ontology_annotation_property)
pairs.
owl_individuals
dict[str, OWLIndividual]
A dictionary that stores the (iri, ontology_individual)
pairs.
owl_data_factory
OWLDataFactory
A data factory for manipulating axioms.
reasoner_type
str
The type of reasoner used. Defaults to \"hermit\"
. Options are [\"hermit\", \"elk\", \"struct\"]
.
reasoner
OntologyReasoner
A reasoner for ontology inference.
Parameters:
Name Type Description Defaultowl_path
str
The path to the OWL ontology file.
requiredreasoner_type
str
The type of reasoner used. Defaults to \"hermit\"
. Options are [\"hermit\", \"elk\", \"struct\"]
.
'hermit'
Source code in src/deeponto/onto/ontology.py
def __init__(self, owl_path: str, reasoner_type: str = \"hermit\"):\n\"\"\"Initialise a new ontology.\n\n Args:\n owl_path (str): The path to the OWL ontology file.\n reasoner_type (str): The type of reasoner used. Defaults to `\"hermit\"`. Options are `[\"hermit\", \"elk\", \"struct\"]`.\n \"\"\"\n self.owl_path = os.path.abspath(owl_path)\n self.owl_manager = OWLManager.createOWLOntologyManager()\n self.owl_onto = self.owl_manager.loadOntologyFromOntologyDocument(IRI.create(File(self.owl_path)))\n self.owl_iri = str(self.owl_onto.getOntologyID().getOntologyIRI().get())\n self.owl_classes = self._get_owl_objects(\"Classes\")\n self.owl_object_properties = self._get_owl_objects(\"ObjectProperties\")\n # for some reason the top object property is included\n if OWL_TOP_OBJECT_PROPERTY in self.owl_object_properties.keys():\n del self.owl_object_properties[OWL_TOP_OBJECT_PROPERTY]\n self.owl_data_properties = self._get_owl_objects(\"DataProperties\")\n self.owl_data_factory = self.owl_manager.getOWLDataFactory()\n self.owl_annotation_properties = self._get_owl_objects(\"AnnotationProperties\")\n self.owl_individuals = self._get_owl_objects(\"Individuals\")\n\n # reasoning\n self.reasoner_type = reasoner_type\n self.reasoner = OntologyReasoner(self, self.reasoner_type)\n\n # hidden attributes\n self._multi_children_classes = None\n self._sibling_class_groups = None\n self._axiom_type = AxiomType # for development use\n\n # summary\n self.info = {\n type(self).__name__: {\n \"loaded_from\": os.path.basename(self.owl_path),\n \"num_classes\": len(self.owl_classes),\n \"num_object_properties\": len(self.owl_object_properties),\n \"num_data_properties\": len(self.owl_data_properties),\n \"num_annotation_properties\": len(self.owl_annotation_properties),\n \"num_individuals\": len(self.owl_individuals),\n \"reasoner_type\": self.reasoner_type,\n }\n }\n
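As a quick illustration, a minimal loading sketch under the assumption that a local OWL file (here the placeholder doid.owl) is available:
from deeponto.onto.ontology import Ontology

# load an ontology with the lightweight ELK reasoner
onto = Ontology("doid.owl", reasoner_type="elk")

# the entity indices are plain dictionaries keyed by IRI
print(len(onto.owl_classes), "named classes")
print(len(onto.owl_object_properties), "object properties")

# summary information collected at construction time
print(onto.info)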
"},{"location":"deeponto/onto/ontology/#deeponto.onto.ontology.Ontology.name","title":"name
property
","text":"Return the name of the ontology file.
"},{"location":"deeponto/onto/ontology/#deeponto.onto.ontology.Ontology.OWLThing","title":"OWLThing
property
","text":"Return OWLThing
.
OWLNothing
property
","text":"Return OWLNoThing
.
OWLTopObjectProperty
property
","text":"Return OWLTopObjectProperty
.
OWLBottomObjectProperty
property
","text":"Return OWLBottomObjectProperty
.
OWLTopDataProperty
property
","text":"Return OWLTopDataProperty
.
OWLBottomDataProperty
property
","text":"Return OWLBottomDataProperty
.
sibling_class_groups: List[List[str]]
property
","text":"Return grouped sibling classes (with a common direct parent);
NOTE that only groups with size > 1 will be considered
"},{"location":"deeponto/onto/ontology/#deeponto.onto.ontology.Ontology.get_entity_type","title":"get_entity_type(entity, is_singular=False)
staticmethod
","text":"A handy method to get the type
of an OWLObject
entity.
src/deeponto/onto/ontology.py
@staticmethod\ndef get_entity_type(entity: OWLObject, is_singular: bool = False):\n\"\"\"A handy method to get the `type` of an `OWLObject` entity.\"\"\"\n if isinstance(entity, OWLClassExpression):\n return \"Classes\" if not is_singular else \"Class\"\n elif isinstance(entity, OWLObjectPropertyExpression):\n return \"ObjectProperties\" if not is_singular else \"ObjectProperty\"\n elif isinstance(entity, OWLDataPropertyExpression):\n return \"DataProperties\" if not is_singular else \"DataProperty\"\n elif isinstance(entity, OWLIndividual):\n return \"Individuals\" if not is_singular else \"Individual\"\n else:\n # NOTE: add further options in future\n pass\n
"},{"location":"deeponto/onto/ontology/#deeponto.onto.ontology.Ontology.get_max_jvm_memory","title":"get_max_jvm_memory()
staticmethod
","text":"Get the maximum heap size assigned to the JVM.
Source code insrc/deeponto/onto/ontology.py
@staticmethod\ndef get_max_jvm_memory():\n\"\"\"Get the maximum heap size assigned to the JVM.\"\"\"\n if jpype.isJVMStarted():\n return int(Runtime.getRuntime().maxMemory())\n else:\n raise RuntimeError(\"Cannot retrieve JVM memory as it is not started.\")\n
"},{"location":"deeponto/onto/ontology/#deeponto.onto.ontology.Ontology.get_owl_object","title":"get_owl_object(iri)
","text":"Get an OWLObject
given its IRI.
src/deeponto/onto/ontology.py
def get_owl_object(self, iri: str):\n\"\"\"Get an `OWLObject` given its IRI.\"\"\"\n if iri in self.owl_classes.keys():\n return self.owl_classes[iri]\n elif iri in self.owl_object_properties.keys():\n return self.owl_object_properties[iri]\n elif iri in self.owl_data_properties.keys():\n return self.owl_data_properties[iri]\n elif iri in self.owl_annotation_properties.keys():\n return self.owl_annotation_properties[iri]\n elif iri in self.owl_individuals.keys():\n return self.owl_individuals[iri]\n else:\n raise KeyError(f\"Cannot retrieve unknown IRI: {iri}.\")\n
"},{"location":"deeponto/onto/ontology/#deeponto.onto.ontology.Ontology.get_iri","title":"get_iri(owl_object)
","text":"Get the IRI of an OWLObject
. Raises an exception if there is no associated IRI.
src/deeponto/onto/ontology.py
def get_iri(self, owl_object: OWLObject):\n\"\"\"Get the IRI of an `OWLObject`. Raises an exception if there is no associated IRI.\"\"\"\n try:\n return str(owl_object.getIRI())\n except:\n raise RuntimeError(\"Input owl object does not have IRI.\")\n
"},{"location":"deeponto/onto/ontology/#deeponto.onto.ontology.Ontology.get_axiom_type","title":"get_axiom_type(axiom)
staticmethod
","text":"Get the axiom type (in str
) for the given axiom.
Check full list at: http://owlcs.github.io/owlapi/apidocs_5/org/semanticweb/owlapi/model/AxiomType.html.
Source code insrc/deeponto/onto/ontology.py
@staticmethod\ndef get_axiom_type(axiom: OWLAxiom):\nr\"\"\"Get the axiom type (in `str`) for the given axiom.\n\n Check full list at: <http://owlcs.github.io/owlapi/apidocs_5/org/semanticweb/owlapi/model/AxiomType.html>.\n \"\"\"\n return str(axiom.getAxiomType())\n
"},{"location":"deeponto/onto/ontology/#deeponto.onto.ontology.Ontology.get_all_axioms","title":"get_all_axioms()
","text":"Return all axioms (in a list) asserted in the ontology.
Source code insrc/deeponto/onto/ontology.py
def get_all_axioms(self):\n\"\"\"Return all axioms (in a list) asserted in the ontology.\"\"\"\n return list(self.owl_onto.getAxioms())\n
"},{"location":"deeponto/onto/ontology/#deeponto.onto.ontology.Ontology.get_subsumption_axioms","title":"get_subsumption_axioms(entity_type='Classes')
","text":"Return subsumption axioms (subject to input entity type) asserted in the ontology.
Parameters:
Name Type Description Defaultentity_type
str
The entity type to be considered. Defaults to \"Classes\"
. Options are \"Classes\"
, \"ObjectProperties\"
, \"DataProperties\"
, and \"AnnotationProperties\"
.
'Classes'
Returns:
Type DescriptionList[OWLAxiom]
A list of equivalence axioms subject to input entity type.
Source code insrc/deeponto/onto/ontology.py
def get_subsumption_axioms(self, entity_type: str = \"Classes\"):\n\"\"\"Return subsumption axioms (subject to input entity type) asserted in the ontology.\n\n Args:\n entity_type (str, optional): The entity type to be considered. Defaults to `\"Classes\"`.\n Options are `\"Classes\"`, `\"ObjectProperties\"`, `\"DataProperties\"`, and `\"AnnotationProperties\"`.\n Returns:\n (List[OWLAxiom]): A list of equivalence axioms subject to input entity type.\n \"\"\"\n if entity_type == \"Classes\":\n return list(self.owl_onto.getAxioms(AxiomType.SUBCLASS_OF))\n elif entity_type == \"ObjectProperties\":\n return list(self.owl_onto.getAxioms(AxiomType.SUB_OBJECT_PROPERTY))\n elif entity_type == \"DataProperties\":\n return list(self.owl_onto.getAxioms(AxiomType.SUB_DATA_PROPERTY))\n elif entity_type == \"AnnotationProperties\":\n return list(self.owl_onto.getAxioms(AxiomType.SUB_ANNOTATION_PROPERTY_OF))\n else:\n raise ValueError(f\"Unknown entity type {entity_type}.\")\n
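A short, hedged sketch of retrieving asserted axioms with the method above (doid.owl is a placeholder file):
from deeponto.onto.ontology import Ontology

onto = Ontology("doid.owl")  # placeholder file

# asserted SubClassOf axioms between (possibly complex) class expressions
subsumptions = onto.get_subsumption_axioms(entity_type="Classes")
print(len(subsumptions), "class subsumption axioms")

# inspect the axiom type of the first one, e.g. "SubClassOf"
if subsumptions:
    print(onto.get_axiom_type(subsumptions[0]))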
"},{"location":"deeponto/onto/ontology/#deeponto.onto.ontology.Ontology.get_equivalence_axioms","title":"get_equivalence_axioms(entity_type='Classes')
","text":"Return equivalence axioms (subject to input entity type) asserted in the ontology.
Parameters:
Name Type Description Defaultentity_type
str
The entity type to be considered. Defaults to \"Classes\"
. Options are \"Classes\"
, \"ObjectProperties\"
, and \"DataProperties\"
.
'Classes'
Returns:
Type Descriptionlist[OWLAxiom]
A list of equivalence axioms subject to input entity type.
Source code insrc/deeponto/onto/ontology.py
def get_equivalence_axioms(self, entity_type: str = \"Classes\"):\n\"\"\"Return equivalence axioms (subject to input entity type) asserted in the ontology.\n\n Args:\n entity_type (str, optional): The entity type to be considered. Defaults to `\"Classes\"`.\n Options are `\"Classes\"`, `\"ObjectProperties\"`, and `\"DataProperties\"`.\n Returns:\n (list[OWLAxiom]): A list of equivalence axioms subject to input entity type.\n \"\"\"\n if entity_type == \"Classes\":\n return list(self.owl_onto.getAxioms(AxiomType.EQUIVALENT_CLASSES))\n elif entity_type == \"ObjectProperties\":\n return list(self.owl_onto.getAxioms(AxiomType.EQUIVALENT_OBJECT_PROPERTIES))\n elif entity_type == \"DataProperties\":\n return list(self.owl_onto.getAxioms(AxiomType.EQUIVALENT_DATA_PROPERTIES))\n else:\n raise ValueError(f\"Unknown entity type {entity_type}.\")\n
"},{"location":"deeponto/onto/ontology/#deeponto.onto.ontology.Ontology.get_assertion_axioms","title":"get_assertion_axioms(entity_type='Classes')
","text":"Return assertion (ABox) axioms (subject to input entity type) asserted in the ontology.
Parameters:
Name Type Description Defaultentity_type
str
The entity type to be considered. Defaults to \"Classes\"
. Options are \"Classes\"
, \"ObjectProperties\"
, and \"DataProperties\"
.
'Classes'
Returns:
Type Descriptionlist[OWLAxiom]
A list of assertion axioms subject to input entity type.
Source code insrc/deeponto/onto/ontology.py
def get_assertion_axioms(self, entity_type: str = \"Classes\"):\n\"\"\"Return assertion (ABox) axioms (subject to input entity type) asserted in the ontology.\n\n Args:\n entity_type (str, optional): The entity type to be considered. Defaults to `\"Classes\"`.\n Options are `\"Classes\"`, `\"ObjectProperties\"`, and `\"DataProperties\"`.\n Returns:\n (list[OWLAxiom]): A list of assertion axioms subject to input entity type.\n \"\"\"\n if entity_type == \"Classes\":\n return list(self.owl_onto.getAxioms(AxiomType.CLASS_ASSERTION))\n elif entity_type == \"ObjectProperties\":\n return list(self.owl_onto.getAxioms(AxiomType.OBJECT_PROPERTY_ASSERTION))\n elif entity_type == \"DataProperties\":\n return list(self.owl_onto.getAxioms(AxiomType.DATA_PROPERTY_ASSERTION))\n elif entity_type == \"Annotations\":\n return list(self.owl_onto.getAxioms(AxiomType.ANNOTATION_ASSERTION))\n else:\n raise ValueError(f\"Unknown entity type {entity_type}.\")\n
"},{"location":"deeponto/onto/ontology/#deeponto.onto.ontology.Ontology.get_asserted_parents","title":"get_asserted_parents(owl_object, named_only=False)
","text":"Get all the asserted parents of a given owl object.
Parameters:
Name Type Description Defaultowl_object
OWLObject
An owl object that could have a parent.
requirednamed_only
bool
If True
, return parents that are named classes.
False
Returns:
Type Descriptionset[OWLObject]
The parent set of the given owl object.
Source code insrc/deeponto/onto/ontology.py
def get_asserted_parents(self, owl_object: OWLObject, named_only: bool = False):\nr\"\"\"Get all the asserted parents of a given owl object.\n\n Args:\n owl_object (OWLObject): An owl object that could have a parent.\n named_only (bool): If `True`, return parents that are named classes.\n Returns:\n (set[OWLObject]): The parent set of the given owl object.\n \"\"\"\n entity_type = self.get_entity_type(owl_object)\n if entity_type == \"Classes\":\n parents = set(EntitySearcher.getSuperClasses(owl_object, self.owl_onto))\n elif entity_type.endswith(\"Properties\"):\n parents = set(EntitySearcher.getSuperProperties(owl_object, self.owl_onto))\n else:\n raise ValueError(f\"Unsupported entity type {entity_type}.\")\n if named_only:\n parents = set([p for p in parents if self.check_named_entity(p)])\n return parents\n
"},{"location":"deeponto/onto/ontology/#deeponto.onto.ontology.Ontology.get_asserted_children","title":"get_asserted_children(owl_object, named_only=False)
","text":"Get all the asserted children of a given owl object.
Parameters:
Name Type Description Defaultowl_object
OWLObject
An owl object that could have a child.
requirednamed_only
bool
If True
, return children that are named classes.
False
Returns:
Type Descriptionset[OWLObject]
The children set of the given owl object.
Source code insrc/deeponto/onto/ontology.py
def get_asserted_children(self, owl_object: OWLObject, named_only: bool = False):\nr\"\"\"Get all the asserted children of a given owl object.\n\n Args:\n owl_object (OWLObject): An owl object that could have a child.\n named_only (bool): If `True`, return children that are named classes.\n Returns:\n (set[OWLObject]): The children set of the given owl object.\n \"\"\"\n entity_type = self.get_entity_type(owl_object)\n if entity_type == \"Classes\":\n children = set(EntitySearcher.getSubClasses(owl_object, self.owl_onto))\n elif entity_type.endswith(\"Properties\"):\n children = set(EntitySearcher.getSubProperties(owl_object, self.owl_onto))\n else:\n raise ValueError(f\"Unsupported entity type {entity_type}.\")\n if named_only:\n children = set([c for c in children if self.check_named_entity(c)])\n return children\n
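For example, a minimal sketch of walking the asserted hierarchy (the DOID IRI is a placeholder borrowed from the examples further down this page):
from deeponto.onto.ontology import Ontology

onto = Ontology("doid.owl")  # placeholder file
doid_class = onto.get_owl_object("http://purl.obolibrary.org/obo/DOID_4058")

# named asserted parents and children of the class
parents = onto.get_asserted_parents(doid_class, named_only=True)
children = onto.get_asserted_children(doid_class, named_only=True)
print(len(parents), "named parents;", len(children), "named children")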
"},{"location":"deeponto/onto/ontology/#deeponto.onto.ontology.Ontology.get_asserted_complex_classes","title":"get_asserted_complex_classes(gci_only=False)
","text":"Get complex classes that occur in at least one of the ontology axioms.
Parameters:
Name Type Description Defaultgci_only
bool
If True
, consider complex classes that occur in GCIs only; otherwise consider those that occur in equivalence axioms as well.
False
Returns:
Type Descriptionset[OWLClassExpression]
A set of complex classes.
Source code insrc/deeponto/onto/ontology.py
def get_asserted_complex_classes(self, gci_only: bool = False):\n\"\"\"Get complex classes that occur in at least one of the ontology axioms.\n\n Args:\n gci_only (bool): If `True`, consider complex classes that occur in GCIs only; otherwise consider\n those that occur in equivalence axioms as well.\n Returns:\n (set[OWLClassExpression]): A set of complex classes.\n \"\"\"\n complex_classes = []\n\n for gci in self.get_subsumption_axioms(\"Classes\"):\n super_class = gci.getSuperClass()\n sub_class = gci.getSubClass()\n if not OntologyReasoner.has_iri(super_class):\n complex_classes.append(super_class)\n if not OntologyReasoner.has_iri(sub_class):\n complex_classes.append(sub_class)\n\n # also considering equivalence axioms\n if not gci_only:\n for eq in self.get_equivalence_axioms(\"Classes\"):\n gci = list(eq.asOWLSubClassOfAxioms())[0]\n super_class = gci.getSuperClass()\n sub_class = gci.getSubClass()\n if not OntologyReasoner.has_iri(super_class):\n complex_classes.append(super_class)\n if not OntologyReasoner.has_iri(sub_class):\n complex_classes.append(sub_class)\n\n return set(complex_classes)\n
"},{"location":"deeponto/onto/ontology/#deeponto.onto.ontology.Ontology.get_annotations","title":"get_annotations(owl_object, annotation_property_iri=None, annotation_language_tag=None, apply_lowercasing=False, normalise_identifiers=False)
","text":"Get the annotation literals of the given OWLObject
.
Parameters:
Name Type Description Defaultowl_object
Union[OWLObject, str]
An OWLObject
or its IRI.
annotation_property_iri
str
Any particular annotation property IRI of interest. Defaults to None
.
None
annotation_language_tag
str
Any particular annotation language tag of interest; NOTE that not every annotation has a language tag, in this case assume it is in English. Defaults to None
. Options are \"en\"
, \"ge\"
etc.
None
apply_lowercasing
bool
Whether or not to apply lowercasing to annotation literals. Defaults to False
.
False
normalise_identifiers
bool
Whether to normalise annotation text that is in the Java identifier format. Defaults to False
.
False
Returns:
Type Descriptionset[str]
A set of annotation literals of the given OWLObject
.
src/deeponto/onto/ontology.py
def get_annotations(\n self,\n owl_object: Union[OWLObject, str],\n annotation_property_iri: Optional[str] = None,\n annotation_language_tag: Optional[str] = None,\n apply_lowercasing: bool = False,\n normalise_identifiers: bool = False,\n):\n\"\"\"Get the annotation literals of the given `OWLObject`.\n\n Args:\n owl_object (Union[OWLObject, str]): An `OWLObject` or its IRI.\n annotation_property_iri (str, optional):\n Any particular annotation property IRI of interest. Defaults to `None`.\n annotation_language_tag (str, optional):\n Any particular annotation language tag of interest; NOTE that not every\n annotation has a language tag, in this case assume it is in English.\n Defaults to `None`. Options are `\"en\"`, `\"ge\"` etc.\n apply_lowercasing (bool): Whether or not to apply lowercasing to annotation literals.\n Defaults to `False`.\n normalise_identifiers (bool): Whether to normalise annotation text that is in the Java identifier format.\n Defaults to `False`.\n Returns:\n (set[str]): A set of annotation literals of the given `OWLObject`.\n \"\"\"\n if isinstance(owl_object, str):\n owl_object = self.get_owl_object(owl_object)\n\n annotation_property = None\n if annotation_property_iri:\n # return an empty list if `annotation_property_iri` does not exist in this OWLOntology`\n annotation_property = self.get_owl_object(annotation_property_iri)\n\n annotations = []\n for annotation in EntitySearcher.getAnnotations(owl_object, self.owl_onto, annotation_property):\n annotation = annotation.getValue()\n # boolean that indicates whether the annotation's language is of interest\n fit_language = False\n if not annotation_language_tag:\n # it is set to `True` if `annotation_langauge` is not specified\n fit_language = True\n else:\n # restrict the annotations to a language if specified\n try:\n # NOTE: not every annotation has a language attribute\n fit_language = annotation.getLang() == annotation_language_tag\n except:\n # in the case when this annotation has no language tag\n # we assume it is in English\n if annotation_language_tag == \"en\":\n fit_language = True\n\n if fit_language:\n # only get annotations that have a literal value\n if annotation.isLiteral():\n annotations.append(\n process_annotation_literal(\n str(annotation.getLiteral()), apply_lowercasing, normalise_identifiers\n )\n )\n\n return uniqify(annotations)\n
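A hedged usage sketch for label retrieval (rdfs:label is used here as the annotation property; the class IRI and file name are placeholders):
from deeponto.onto.ontology import Ontology

onto = Ontology("doid.owl")  # placeholder file

# English rdfs:label literals of a class, lowercased
labels = onto.get_annotations(
    "http://purl.obolibrary.org/obo/DOID_4058",
    annotation_property_iri="http://www.w3.org/2000/01/rdf-schema#label",
    annotation_language_tag="en",
    apply_lowercasing=True,
)
print(labels)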
"},{"location":"deeponto/onto/ontology/#deeponto.onto.ontology.Ontology.check_consistency","title":"check_consistency()
","text":"Check if the ontology is consistent according to the pre-loaded reasoner.
Source code insrc/deeponto/onto/ontology.py
def check_consistency(self):\n\"\"\"Check if the ontology is consistent according to the pre-loaded reasoner.\n \"\"\"\n logging.info(f\"Checking consistency with `{self.reasoner_type}` reasoner.\")\n return self.reasoner.owl_reasoner.isConsistent()\n
"},{"location":"deeponto/onto/ontology/#deeponto.onto.ontology.Ontology.check_named_entity","title":"check_named_entity(owl_object)
","text":"Check if the input entity is a named atomic entity. That is, it is not a complex entity, \\(\\top\\), or \\(\\bot\\).
Source code insrc/deeponto/onto/ontology.py
def check_named_entity(self, owl_object: OWLObject):\nr\"\"\"Check if the input entity is a named atomic entity. That is,\n it is not a complex entity, $\\top$, or $\\bot$.\n \"\"\"\n entity_type = self.get_entity_type(owl_object)\n top = TOP_BOTTOMS[entity_type].TOP\n bottom = TOP_BOTTOMS[entity_type].BOTTOM\n if OntologyReasoner.has_iri(owl_object):\n iri = str(owl_object.getIRI())\n # check if the entity is TOP or BOTTOM\n return iri != top and iri != bottom\n return False\n
"},{"location":"deeponto/onto/ontology/#deeponto.onto.ontology.Ontology.check_deprecated","title":"check_deprecated(owl_object)
","text":"Check if the given OWL object is marked as deprecated according to \\(\\texttt{owl:deprecated}\\).
NOTE: the string literal indicating deprecation is either 'true'
or 'True'
. Also, if \\(\\texttt{owl:deprecated}\\) is not defined in this ontology, return False
by default.
src/deeponto/onto/ontology.py
def check_deprecated(self, owl_object: OWLObject):\nr\"\"\"Check if the given OWL object is marked as deprecated according to $\\texttt{owl:deprecated}$.\n\n NOTE: the string literal indicating deprecation is either `'true'` or `'True'`. Also, if $\\texttt{owl:deprecated}$\n is not defined in this ontology, return `False` by default.\n \"\"\"\n if not OWL_DEPRECATED in self.owl_annotation_properties.keys():\n # return False if owl:deprecated is not defined in this ontology\n return False\n\n deprecated = self.get_annotations(owl_object, annotation_property_iri=OWL_DEPRECATED)\n if deprecated and (list(deprecated)[0] == \"true\" or list(deprecated)[0] == \"True\"):\n return True\n else:\n return False\n
"},{"location":"deeponto/onto/ontology/#deeponto.onto.ontology.Ontology.save_onto","title":"save_onto(save_path)
","text":"Save the ontology file to the given path.
Source code insrc/deeponto/onto/ontology.py
def save_onto(self, save_path: str):\n\"\"\"Save the ontology file to the given path.\"\"\"\n self.owl_onto.saveOntology(IRI.create(File(save_path).toURI()))\n
"},{"location":"deeponto/onto/ontology/#deeponto.onto.ontology.Ontology.build_annotation_index","title":"build_annotation_index(annotation_property_iris=[RDFS_LABEL], entity_type='Classes', apply_lowercasing=False, normalise_identifiers=False)
","text":"Build an annotation index for a given type of entities.
Parameters:
Name Type Description Defaultannotation_property_iris
list[str]
A list of annotation property IRIs (it is possible that not every annotation property IRI is in use); if not provided, the built-in rdfs:label
is considered. Defaults to [RDFS_LABEL]
.
[RDFS_LABEL]
entity_type
str
The entity type to be considered. Defaults to \"Classes\"
. Options are \"Classes\"
, \"ObjectProperties\"
, \"DataProperties\"
, etc.
'Classes'
apply_lowercasing
bool
Whether or not to apply lowercasing to annotation literals. Defaults to True
.
False
normalise_identifiers
bool
Whether to normalise annotation text that is in the Java identifier format. Defaults to False
.
False
Returns:
Type DescriptionTuple[dict, list[str]]
The built annotation index, and the list of annotation property IRIs that are in use.
Source code insrc/deeponto/onto/ontology.py
def build_annotation_index(\n self,\n annotation_property_iris: List[str] = [RDFS_LABEL],\n entity_type: str = \"Classes\",\n apply_lowercasing: bool = False,\n normalise_identifiers: bool = False,\n):\n\"\"\"Build an annotation index for a given type of entities.\n\n Args:\n annotation_property_iris (list[str]): A list of annotation property IRIs (it is possible\n that not every annotation property IRI is in use); if not provided, the built-in\n `rdfs:label` is considered. Defaults to `[RDFS_LABEL]`.\n entity_type (str, optional): The entity type to be considered. Defaults to `\"Classes\"`.\n Options are `\"Classes\"`, `\"ObjectProperties\"`, `\"DataProperties\"`, etc.\n apply_lowercasing (bool): Whether or not to apply lowercasing to annotation literals.\n Defaults to `True`.\n normalise_identifiers (bool): Whether to normalise annotation text that is in the Java identifier format.\n Defaults to `False`.\n\n Returns:\n (Tuple[dict, list[str]]): The built annotation index, and the list of annotation property IRIs that are in use.\n \"\"\"\n\n annotation_index = defaultdict(set)\n # example: Classes => owl_classes; ObjectProperties => owl_object_properties\n entity_type = \"owl_\" + split_java_identifier(entity_type).replace(\" \", \"_\").lower()\n entity_index = getattr(self, entity_type)\n\n # preserve available annotation properties\n annotation_property_iris = [\n airi for airi in annotation_property_iris if airi in self.owl_annotation_properties.keys()\n ]\n\n # build the annotation index without duplicated literals\n for airi in annotation_property_iris:\n for iri, entity in entity_index.items():\n annotation_index[iri].update(\n self.get_annotations(\n owl_object=entity,\n annotation_property_iri=airi,\n annotation_language_tag=None,\n apply_lowercasing=apply_lowercasing,\n normalise_identifiers=normalise_identifiers,\n )\n )\n\n return annotation_index, annotation_property_iris\n
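A minimal sketch of building an annotation index over class labels (missing annotation property IRIs are simply filtered out, as described above; the file name is a placeholder):
from deeponto.onto.ontology import Ontology

onto = Ontology("doid.owl")  # placeholder file
annotation_index, used_iris = onto.build_annotation_index(
    annotation_property_iris=["http://www.w3.org/2000/01/rdf-schema#label"],
    entity_type="Classes",
    apply_lowercasing=True,
)
print(len(annotation_index), "classes indexed using", used_iris)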
"},{"location":"deeponto/onto/ontology/#deeponto.onto.ontology.Ontology.build_inverted_annotation_index","title":"build_inverted_annotation_index(annotation_index, tokenizer)
staticmethod
","text":"Build an inverted annotation index given an annotation index and a tokenizer.
Source code insrc/deeponto/onto/ontology.py
@staticmethod\ndef build_inverted_annotation_index(annotation_index: dict, tokenizer: Tokenizer):\n\"\"\"Build an inverted annotation index given an annotation index and a tokenizer.\"\"\"\n return InvertedIndex(annotation_index, tokenizer)\n
"},{"location":"deeponto/onto/ontology/#deeponto.onto.ontology.Ontology.add_axiom","title":"add_axiom(owl_axiom, return_undo=True)
","text":"Add an axiom into the current ontology.
Parameters:
Name Type Description Defaultowl_axiom
OWLAxiom
An axiom to be added.
requiredreturn_undo
bool
Returning the undo operation or not. Defaults to True
.
True
Source code in src/deeponto/onto/ontology.py
def add_axiom(self, owl_axiom: OWLAxiom, return_undo: bool = True):\n\"\"\"Add an axiom into the current ontology.\n\n Args:\n owl_axiom (OWLAxiom): An axiom to be added.\n return_undo (bool, optional): Returning the undo operation or not. Defaults to `True`.\n \"\"\"\n change = AddAxiom(self.owl_onto, owl_axiom)\n result = self.owl_onto.applyChange(change)\n logger.info(f\"[{str(result)}] Adding the axiom {str(owl_axiom)} into the ontology.\")\n if return_undo:\n return change.reverseChange()\n
"},{"location":"deeponto/onto/ontology/#deeponto.onto.ontology.Ontology.remove_axiom","title":"remove_axiom(owl_axiom, return_undo=True)
","text":"Remove an axiom from the current ontology.
Parameters:
Name Type Description Defaultowl_axiom
OWLAxiom
An axiom to be removed.
requiredreturn_undo
bool
Returning the undo operation or not. Defaults to True
.
True
Source code in src/deeponto/onto/ontology.py
def remove_axiom(self, owl_axiom: OWLAxiom, return_undo: bool = True):\n\"\"\"Remove an axiom from the current ontology.\n\n Args:\n owl_axiom (OWLAxiom): An axiom to be removed.\n return_undo (bool, optional): Returning the undo operation or not. Defaults to `True`.\n \"\"\"\n change = RemoveAxiom(self.owl_onto, owl_axiom)\n result = self.owl_onto.applyChange(change)\n logger.info(f\"[{str(result)}] Removing the axiom {str(owl_axiom)} from the ontology.\")\n if return_undo:\n return change.reverseChange()\n
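A small sketch of adding an axiom and reverting it with the returned undo operation (class IRIs and the file name are placeholders; applying the undo through owl_onto.applyChange mirrors what the reasoner utilities below do):
from deeponto.onto.ontology import Ontology

onto = Ontology("doid.owl")  # placeholder file
sub = onto.get_owl_object("http://purl.obolibrary.org/obo/DOID_4058")
sup = onto.get_owl_object("http://purl.obolibrary.org/obo/DOID_0001816")

# build and add a SubClassOf axiom, keeping the undo operation
axiom = onto.owl_data_factory.getOWLSubClassOfAxiom(sub, sup)
undo = onto.add_axiom(axiom, return_undo=True)

# revert the change
onto.owl_onto.applyChange(undo)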
"},{"location":"deeponto/onto/ontology/#deeponto.onto.ontology.Ontology.replace_entity","title":"replace_entity(owl_object, entity_iri, replacement_iri)
","text":"Replace an entity in a class expression with another entity.
Parameters:
Name Type Description Defaultowl_object
OWLObject
An OWLObject
entity to be manipulated.
entity_iri
str
IRI of the entity to be replaced.
requiredreplacement_iri
str
IRI of the entity to replace.
requiredReturns:
Type DescriptionOWLObject
The changed OWLObject
entity.
src/deeponto/onto/ontology.py
def replace_entity(self, owl_object: OWLObject, entity_iri: str, replacement_iri: str):\n\"\"\"Replace an entity in a class expression with another entity.\n\n Args:\n owl_object (OWLObject): An `OWLObject` entity to be manipulated.\n entity_iri (str): IRI of the entity to be replaced.\n replacement_iri (str): IRI of the entity to replace.\n\n Returns:\n (OWLObject): The changed `OWLObject` entity.\n \"\"\"\n iri_dict = {IRI.create(entity_iri): IRI.create(replacement_iri)}\n replacer = OWLObjectDuplicator(self.owl_data_factory, iri_dict)\n return replacer.duplicateObject(owl_object)\n
"},{"location":"deeponto/onto/projection/","title":"Ontology Projection","text":""},{"location":"deeponto/onto/projection/#deeponto.onto.projection.OntologyProjector","title":"OntologyProjector(bidirectional_taxonomy=False, only_taxonomy=False, include_literals=False)
","text":"Class for ontology projection -- transforming ontology axioms into triples.
Credit
The code of this class originates from the mOWL library.
Attributes:
Name Type Descriptionbidirectional_taxonomy
bool
If True
then per each SubClass
edge one SuperClass
edge will be generated. Defaults to False
.
only_taxonomy
bool
If True
, then projection will only include subClass
edges. Defaults to False
.
include_literals
bool
If True
the projection will also include triples involving data property assertions and annotations. Defaults to False
.
Parameters:
Name Type Description Defaultbidirectional_taxonomy
bool
If True
then per each SubClass
edge one SuperClass
edge will be generated. Defaults to False
.
False
only_taxonomy
bool
If True
, then projection will only include subClass
edges. Defaults to False
.
False
include_literals
bool
If True
the projection will also include triples involving data property assertions and annotations. Defaults to False
.
False
Source code in src/deeponto/onto/projection.py
def __init__(self, bidirectional_taxonomy: bool=False, only_taxonomy: bool=False, include_literals: bool=False):\n\"\"\"Initialise an ontology projector.\n\n Args:\n bidirectional_taxonomy (bool, optional): If `True` then per each `SubClass` edge one `SuperClass` edge will\n be generated. Defaults to `False`.\n only_taxonomy (bool, optional): If `True`, then projection will only include `subClass` edges. Defaults to `False`.\n include_literals (bool, optional): If `True` the projection will also include triples involving data property\n assertions and annotations. Defaults to `False`.\n \"\"\"\n self.bidirectional_taxonomy = bidirectional_taxonomy\n self.include_literals = include_literals\n self.only_taxonomy = only_taxonomy\n self.projector = Projector(self.bidirectional_taxonomy, self.only_taxonomy,\n self.include_literals)\n
"},{"location":"deeponto/onto/projection/#deeponto.onto.projection.OntologyProjector.project","title":"project(ontology)
","text":"The projection algorithm implemented in OWL2Vec*.
Parameters:
Name Type Description Defaultontology
Ontology
An ontology to be processed.
requiredReturns:
Type Descriptionset
Set of triples after projection.
Source code insrc/deeponto/onto/projection.py
def project(self, ontology: Ontology):\n\"\"\"The projection algorithm implemented in OWL2Vec*.\n\n Args:\n ontology (Ontology): An ontology to be processed.\n\n Returns:\n (set): Set of triples after projection.\n \"\"\"\n ontology = ontology.owl_onto\n if not isinstance(ontology, OWLOntology):\n raise TypeError(\n \"Input ontology must be of type `org.semanticweb.owlapi.model.OWLOntology`.\")\n edges = self.projector.project(ontology)\n triples = []\n for e in edges:\n s, r, o = str(e.src()), str(e.rel()), str(e.dst())\n if o != \"\":\n if r == \"http://subclassof\":\n r = str(RDFS.subClassOf)\n triples.append((s, r, o))\n return set(triples)\n
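For illustration, a minimal projection sketch (placeholder file name; the resulting triples are plain string tuples):
from deeponto.onto.ontology import Ontology
from deeponto.onto.projection import OntologyProjector

onto = Ontology("doid.owl")  # placeholder file
projector = OntologyProjector(
    bidirectional_taxonomy=True,  # also emit SuperClass edges
    only_taxonomy=False,
    include_literals=False,
)
triples = projector.project(onto)
print(len(triples), "projected triples")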
"},{"location":"deeponto/onto/pruning/","title":"Ontology Pruning","text":""},{"location":"deeponto/onto/pruning/#deeponto.onto.pruning.OntologyPruner","title":"OntologyPruner(onto)
","text":"Class for in-place ontology pruning.
Attributes:
Name Type Descriptiononto
Ontology
The input ontology to be pruned. Note that the pruning process is in-place.
Parameters:
Name Type Description Defaultonto
Ontology
The input ontology to be pruned. Note that the pruning process is in-place.
required Source code insrc/deeponto/onto/pruning.py
def __init__(self, onto: Ontology):\n\"\"\"Initialise an ontology pruner.\n\n Args:\n onto (Ontology): The input ontology to be pruned. Note that the pruning process is in-place.\n \"\"\"\n self.onto = onto\n self._pruning_applied = None\n
"},{"location":"deeponto/onto/pruning/#deeponto.onto.pruning.OntologyPruner.save_onto","title":"save_onto(save_path)
","text":"Save the pruned ontology file to the given path.
Source code insrc/deeponto/onto/pruning.py
def save_onto(self, save_path: str):\n\"\"\"Save the pruned ontology file to the given path.\"\"\"\n logging.info(f\"{self._pruning_applied} pruning algorithm has been applied.\")\n logging.info(f\"Save the pruned ontology file to {save_path}.\")\n return self.onto.save_onto(save_path)\n
"},{"location":"deeponto/onto/pruning/#deeponto.onto.pruning.OntologyPruner.prune","title":"prune(class_iris_to_be_removed)
","text":"Apply ontology pruning while preserving the relevant hierarchy.
paper
This refers to the ontology pruning algorithm introduced in the paper: Machine Learning-Friendly Biomedical Datasets for Equivalence and Subsumption Ontology Matching (ISWC 2022).
For each class \\(c\\) to be pruned, subsumption axioms will be created between \\(c\\)'s parents and children so as to preserve the relevant hierarchy.
Parameters:
Name Type Description Defaultclass_iris_to_be_removed
list[str]
Classes with IRIs in this list will be pruned and the relevant hierarchy will be repaired.
required Source code insrc/deeponto/onto/pruning.py
def prune(self, class_iris_to_be_removed: List[str]):\nr\"\"\"Apply ontology pruning while preserving the relevant hierarchy.\n\n !!! credit \"paper\"\n\n This refers to the ontology pruning algorithm introduced in the paper:\n [*Machine Learning-Friendly Biomedical Datasets for Equivalence and Subsumption Ontology Matching (ISWC 2022)*](https://link.springer.com/chapter/10.1007/978-3-031-19433-7_33).\n\n For each class $c$ to be pruned, subsumption axioms will be created between $c$'s parents and children so as to preserve the\n relevant hierarchy.\n\n Args:\n class_iris_to_be_removed (list[str]): Classes with IRIs in this list will be pruned and the relevant hierarchy will be repaired.\n \"\"\"\n\n # create the subsumption axioms first\n for cl_iri in class_iris_to_be_removed:\n cl = self.onto.get_owl_object(cl_iri)\n cl_parents = self.onto.get_asserted_parents(cl)\n cl_children = self.onto.get_asserted_children(cl)\n for parent, child in itertools.product(cl_parents, cl_children):\n sub_axiom = self.onto.owl_data_factory.getOWLSubClassOfAxiom(child, parent)\n self.onto.add_axiom(sub_axiom)\n\n # apply pruning\n class_remover = OWLEntityRemover(Collections.singleton(self.onto.owl_onto))\n for cl_iri in class_iris_to_be_removed:\n cl = self.onto.get_owl_object(cl_iri)\n cl.accept(class_remover)\n self.onto.owl_manager.applyChanges(class_remover.getChanges())\n
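A short pruning sketch (the IRI list and file names are placeholders; remember that pruning modifies the ontology in place):
from deeponto.onto.ontology import Ontology
from deeponto.onto.pruning import OntologyPruner

onto = Ontology("doid.owl")  # placeholder file
pruner = OntologyPruner(onto)

# remove these classes while repairing the surrounding subsumption hierarchy
pruner.prune(["http://purl.obolibrary.org/obo/DOID_4058"])
pruner.save_onto("doid.pruned.owl")  # placeholder output path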
"},{"location":"deeponto/onto/reasoning/","title":"Ontology Reasoning","text":""},{"location":"deeponto/onto/reasoning/#deeponto.onto.ontology.OntologyReasoner","title":"OntologyReasoner(onto, reasoner_type)
","text":"Ontology reasoner class that extends from the Java library OWLAPI.
Attributes:
Name Type Descriptiononto
Ontology
The input deeponto
ontology.
owl_reasoner_factory
OWLReasonerFactory
A reasoner factory for creating a reasoner.
owl_reasoner
OWLReasoner
The created reasoner.
owl_data_factory
OWLDataFactory
A data factory (inherited from onto
) for manipulating axioms.
Parameters:
Name Type Description Defaultonto
Ontology
The input ontology to conduct reasoning on.
requiredreasoner_type
str
The type of reasoner used. Options are [\"hermit\", \"elk\", \"struct\"]
.
src/deeponto/onto/ontology.py
def __init__(self, onto: Ontology, reasoner_type: str):\n\"\"\"Initialise an ontology reasoner.\n\n Args:\n onto (Ontology): The input ontology to conduct reasoning on.\n reasoner_type (str): The type of reasoner used. Options are `[\"hermit\", \"elk\", \"struct\"]`.\n \"\"\"\n self.onto = onto\n self.owl_reasoner_factory = None\n self.owl_reasoner = None\n self.reasoner_type = reasoner_type\n self.load_reasoner(self.reasoner_type)\n self.owl_data_factory = self.onto.owl_data_factory\n
"},{"location":"deeponto/onto/reasoning/#deeponto.onto.ontology.OntologyReasoner.load_reasoner","title":"load_reasoner(reasoner_type)
","text":"Load a new reaonser and dispose the old one if existed.
Source code insrc/deeponto/onto/ontology.py
def load_reasoner(self, reasoner_type: str):\n\"\"\"Load a new reaonser and dispose the old one if existed.\"\"\"\n assert reasoner_type in REASONER_DICT.keys(), f\"Unknown or unsupported reasoner type: {reasoner_type}.\"\n\n if self.owl_reasoner:\n self.owl_reasoner.dispose()\n\n self.reasoner_type = reasoner_type\n self.owl_reasoner_factory = REASONER_DICT[self.reasoner_type]()\n # TODO: remove ELK message\n # somehow Level.ERROR does not prevent the INFO message from ELK\n # Logger.getLogger(\"org.semanticweb.elk\").setLevel(Level.OFF)\n\n self.owl_reasoner = self.owl_reasoner_factory.createReasoner(self.onto.owl_onto)\n
"},{"location":"deeponto/onto/reasoning/#deeponto.onto.ontology.OntologyReasoner.get_entity_type","title":"get_entity_type(entity, is_singular=False)
staticmethod
","text":"A handy method to get the type of an entity (OWLObject
).
NOTE: This method is inherited from the Ontology Class.
Source code insrc/deeponto/onto/ontology.py
@staticmethod\ndef get_entity_type(entity: OWLObject, is_singular: bool = False):\n\"\"\"A handy method to get the type of an entity (`OWLObject`).\n\n NOTE: This method is inherited from the Ontology Class.\n \"\"\"\n return Ontology.get_entity_type(entity, is_singular)\n
"},{"location":"deeponto/onto/reasoning/#deeponto.onto.ontology.OntologyReasoner.has_iri","title":"has_iri(entity)
staticmethod
","text":"Check if an entity has an IRI.
Source code insrc/deeponto/onto/ontology.py
@staticmethod\ndef has_iri(entity: OWLObject):\n\"\"\"Check if an entity has an IRI.\"\"\"\n try:\n entity.getIRI()\n return True\n except:\n return False\n
"},{"location":"deeponto/onto/reasoning/#deeponto.onto.ontology.OntologyReasoner.get_inferred_super_entities","title":"get_inferred_super_entities(entity, direct=False)
","text":"Return the IRIs of named super-entities of a given OWLObject
according to the reasoner.
A mixture of getSuperClasses
, getSuperObjectProperties
, getSuperDataProperties
functions imported from the OWLAPI reasoner. The type of input entity will be automatically determined. The top entity such as owl:Thing
is ignored.
Parameters:
Name Type Description Defaultentity
OWLObject
An OWLObject
entity of interest.
direct
bool
Return parents (direct=True
) or ancestors (direct=False
). Defaults to False
.
False
Returns:
Type Descriptionlist[str]
A list of IRIs of the super-entities of the given OWLObject
entity.
src/deeponto/onto/ontology.py
def get_inferred_super_entities(self, entity: OWLObject, direct: bool = False):\nr\"\"\"Return the IRIs of named super-entities of a given `OWLObject` according to the reasoner.\n\n A mixture of `getSuperClasses`, `getSuperObjectProperties`, `getSuperDataProperties`\n functions imported from the OWLAPI reasoner. The type of input entity will be\n automatically determined. The top entity such as `owl:Thing` is ignored.\n\n\n Args:\n entity (OWLObject): An `OWLObject` entity of interest.\n direct (bool, optional): Return parents (`direct=True`) or\n ancestors (`direct=False`). Defaults to `False`.\n\n Returns:\n (list[str]): A list of IRIs of the super-entities of the given `OWLObject` entity.\n \"\"\"\n entity_type = self.get_entity_type(entity)\n get_super = f\"getSuper{entity_type}\"\n TOP = TOP_BOTTOMS[entity_type].TOP # get the corresponding TOP entity\n super_entities = getattr(self.owl_reasoner, get_super)(entity, direct).getFlattened()\n super_entity_iris = [str(s.getIRI()) for s in super_entities]\n # the root node is owl#Thing\n if TOP in super_entity_iris:\n super_entity_iris.remove(TOP)\n return super_entity_iris\n
"},{"location":"deeponto/onto/reasoning/#deeponto.onto.ontology.OntologyReasoner.get_inferred_sub_entities","title":"get_inferred_sub_entities(entity, direct=False)
","text":"Return the IRIs of named sub-entities of a given OWLObject
according to the reasoner.
A mixture of getSubClasses
, getSubObjectProperties
, getSubDataProperties
functions imported from the OWLAPI reasoner. The type of input entity will be automatically determined. The bottom entity such as owl:Nothing
is ignored.
Parameters:
Name Type Description Defaultentity
OWLObject
An OWLObject
entity of interest.
direct
bool
Return parents (direct=True
) or ancestors (direct=False
). Defaults to False
.
False
Returns:
Type Descriptionlist[str]
A list of IRIs of the sub-entities of the given OWLObject
entity.
src/deeponto/onto/ontology.py
def get_inferred_sub_entities(self, entity: OWLObject, direct: bool = False):\n\"\"\"Return the IRIs of named sub-entities of a given `OWLObject` according to the reasoner.\n\n A mixture of `getSubClasses`, `getSubObjectProperties`, `getSubDataProperties`\n functions imported from the OWLAPI reasoner. The type of input entity will be\n automatically determined. The bottom entity such as `owl:Nothing` is ignored.\n\n Args:\n entity (OWLObject): An `OWLObject` entity of interest.\n direct (bool, optional): Return parents (`direct=True`) or\n ancestors (`direct=False`). Defaults to `False`.\n\n Returns:\n (list[str]): A list of IRIs of the sub-entities of the given `OWLObject` entity.\n \"\"\"\n entity_type = self.get_entity_type(entity)\n get_sub = f\"getSub{entity_type}\"\n BOTTOM = TOP_BOTTOMS[entity_type].BOTTOM\n sub_entities = getattr(self.owl_reasoner, get_sub)(entity, direct).getFlattened()\n sub_entity_iris = [str(s.getIRI()) for s in sub_entities]\n # the root node is owl#Thing\n if BOTTOM in sub_entity_iris:\n sub_entity_iris.remove(BOTTOM)\n return sub_entity_iris\n
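A minimal reasoning sketch using the reasoner attached to a loaded ontology (placeholder file and IRIs):
from deeponto.onto.ontology import Ontology

onto = Ontology("doid.owl", reasoner_type="elk")  # placeholder file
c1 = onto.get_owl_object("http://purl.obolibrary.org/obo/DOID_4058")
c2 = onto.get_owl_object("http://purl.obolibrary.org/obo/DOID_0001816")

# inferred ancestors (direct=False) versus direct parents (direct=True)
ancestors = onto.reasoner.get_inferred_super_entities(c1, direct=False)
parents = onto.reasoner.get_inferred_super_entities(c1, direct=True)
print(len(ancestors), "ancestors;", len(parents), "direct parents")

# entailment check between the two classes
print(onto.reasoner.check_subsumption(c1, c2))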
"},{"location":"deeponto/onto/reasoning/#deeponto.onto.ontology.OntologyReasoner.check_subsumption","title":"check_subsumption(sub_entity, super_entity)
","text":"Check if the first entity is subsumed by the second entity according to the reasoner.
Source code insrc/deeponto/onto/ontology.py
def check_subsumption(self, sub_entity: OWLObject, super_entity: OWLObject):\n\"\"\"Check if the first entity is subsumed by the second entity according to the reasoner.\"\"\"\n entity_type = self.get_entity_type(sub_entity, is_singular=True)\n assert entity_type == self.get_entity_type(super_entity, is_singular=True)\n\n sub_axiom = getattr(self.owl_data_factory, f\"getOWLSub{entity_type}OfAxiom\")(sub_entity, super_entity)\n\n return self.owl_reasoner.isEntailed(sub_axiom)\n
"},{"location":"deeponto/onto/reasoning/#deeponto.onto.ontology.OntologyReasoner.check_disjoint","title":"check_disjoint(entity1, entity2)
","text":"Check if two entities are disjoint according to the reasoner.
Source code insrc/deeponto/onto/ontology.py
def check_disjoint(self, entity1: OWLObject, entity2: OWLObject):\n\"\"\"Check if two entities are disjoint according to the reasoner.\"\"\"\n entity_type = self.get_entity_type(entity1)\n assert entity_type == self.get_entity_type(entity2)\n\n disjoint_axiom = getattr(self.owl_data_factory, f\"getOWLDisjoint{entity_type}Axiom\")([entity1, entity2])\n\n return self.owl_reasoner.isEntailed(disjoint_axiom)\n
"},{"location":"deeponto/onto/reasoning/#deeponto.onto.ontology.OntologyReasoner.check_common_descendants","title":"check_common_descendants(entity1, entity2)
","text":"Check if two entities have a common decendant.
Entities can be OWL class or property expressions, and can be either atomic or complex. It takes longer computation time for the complex ones. Complex entities do not have an IRI. This method is optimised in the way that if there exists an atomic entity A
, we compute descendants for A
and compare them against the other entity which could be complex.
src/deeponto/onto/ontology.py
def check_common_descendants(self, entity1: OWLObject, entity2: OWLObject):\n\"\"\"Check if two entities have a common decendant.\n\n Entities can be **OWL class or property expressions**, and can be either **atomic\n or complex**. It takes longer computation time for the complex ones. Complex\n entities do not have an IRI. This method is optimised in the way that if\n there exists an atomic entity `A`, we compute descendants for `A` and\n compare them against the other entity which could be complex.\n \"\"\"\n entity_type = self.get_entity_type(entity1)\n assert entity_type == self.get_entity_type(entity2)\n\n if not self.has_iri(entity1) and not self.has_iri(entity2):\n logger.warn(\"Computing descendants for two complex entities is not efficient.\")\n\n # `computed` is the one we compute the descendants\n # `compared` is the one we compare `computed`'s descendant one-by-one\n # we set the atomic entity as `computed` for efficiency if there is one\n computed, compared = entity1, entity2\n if not self.has_iri(entity1) and self.has_iri(entity2):\n computed, compared = entity2, entity1\n\n # for every inferred child of `computed`, check if it is subsumed by `compared``\n for descendant_iri in self.get_inferred_sub_entities(computed, direct=False):\n # print(\"check a subsumption\")\n if self.check_subsumption(self.onto.get_owl_object(descendant_iri), compared):\n return True\n return False\n
"},{"location":"deeponto/onto/reasoning/#deeponto.onto.ontology.OntologyReasoner.instances_of","title":"instances_of(owl_class, direct=False)
","text":"Return the list of named individuals that are instances of a given OWL class expression.
Parameters:
Name Type Description Defaultowl_class
OWLClassExpression
An ontology class of interest.
requireddirect
bool
Return direct instances (direct=True
) or also include the sub-classes' instances (direct=False
). Defaults to False
.
False
Returns:
Type Descriptionlist[OWLIndividual]
A list of named individuals that are instances of owl_class
.
src/deeponto/onto/ontology.py
def instances_of(self, owl_class: OWLClassExpression, direct: bool = False):\n\"\"\"Return the list of named individuals that are instances of a given OWL class expression.\n\n Args:\n owl_class (OWLClassExpression): An ontology class of interest.\n direct (bool, optional): Return direct instances (`direct=True`) or\n also include the sub-classes' instances (`direct=False`). Defaults to `False`.\n\n Returns:\n (list[OWLIndividual]): A list of named individuals that are instances of `owl_class`.\n \"\"\"\n return list(self.owl_reasoner.getInstances(owl_class, direct).getFlattened())\n
"},{"location":"deeponto/onto/reasoning/#deeponto.onto.ontology.OntologyReasoner.check_instance","title":"check_instance(owl_instance, owl_class)
","text":"Check if a named individual is an instance of an OWL class.
Source code insrc/deeponto/onto/ontology.py
def check_instance(self, owl_instance: OWLIndividual, owl_class: OWLClassExpression):\n\"\"\"Check if a named individual is an instance of an OWL class.\"\"\"\n assertion_axiom = self.owl_data_factory.getOWLClassAssertionAxiom(owl_class, owl_instance)\n return self.owl_reasoner.isEntailed(assertion_axiom)\n
"},{"location":"deeponto/onto/reasoning/#deeponto.onto.ontology.OntologyReasoner.check_common_instances","title":"check_common_instances(owl_class1, owl_class2)
","text":"Check if two OWL class expressions have a common instance.
Class expressions can be atomic or complex, and it takes longer computation time for the complex ones. Complex classes do not have an IRI. This method is optimised in the way that if there exists an atomic class A
, we compute instances for A
and compare them against the other class which could be complex.
Difference with check_common_descendants
The inputs of this function are restricted to OWL class expressions. This is because descendant
is related to hierarchy and both class and property expressions have a hierarchy, but instance
is restricted to classes.
src/deeponto/onto/ontology.py
def check_common_instances(self, owl_class1: OWLClassExpression, owl_class2: OWLClassExpression):\n\"\"\"Check if two OWL class expressions have a common instance.\n\n Class expressions can be **atomic or complex**, and it takes longer computation time\n for the complex ones. Complex classes do not have an IRI. This method is optimised\n in the way that if there exists an atomic class `A`, we compute instances for `A` and\n compare them against the other class which could be complex.\n\n !!! note \"Difference with [`check_common_descendants`][deeponto.onto.OntologyReasoner.check_common_descendants]\"\n The inputs of this function are restricted to **OWL class expressions**. This is because\n `descendant` is related to hierarchy and both class and property expressions have a hierarchy,\n but `instance` is restricted to classes.\n \"\"\"\n\n if not self.has_iri(owl_class1) and not self.has_iri(owl_class2):\n logger.warn(\"Computing instances for two complex classes is not efficient.\")\n\n # `computed` is the one we compute the instances\n # `compared` is the one we compare `computed`'s instances one-by-one\n # we set the atomic entity as `computed` for efficiency if there is one\n computed, compared = owl_class1, owl_class2\n if not self.has_iri(owl_class1) and self.has_iri(owl_class2):\n # swap so that the atomic class is `computed` and the complex one is `compared`\n computed, compared = owl_class2, owl_class1\n\n # for every inferred instance of `computed`, check if it is an instance of `compared`\n for instance in self.instances_of(computed, direct=False):\n if self.check_instance(instance, compared):\n return True\n return False\n
"},{"location":"deeponto/onto/reasoning/#deeponto.onto.ontology.OntologyReasoner.check_assumed_disjoint","title":"check_assumed_disjoint(owl_class1, owl_class2)
","text":"Check if two OWL class expressions satisfy the Assumed Disjointness.
Paper
The definition of Assumed Disjointness comes from the paper: Language Model Analysis for Ontology Subsumption Inference.
Assumed Disjointness (Definition)
Two class expressions \\(C\\) and \\(D\\) are assumed to be disjoint if they meet the followings:
Note that the special case where \\(C\\) and \\(D\\) are already disjoint is covered by the first check. The paper also proposed a practical alternative to decide Assumed Disjointness. See check_assumed_disjoint_alternative
.
Examples:
Suppose we pre-load an ontology onto
from the disease ontology file doid.owl
.
>>> c1 = onto.get_owl_object_from_iri(\"http://purl.obolibrary.org/obo/DOID_4058\")\n>>> c2 = onto.get_owl_object_from_iri(\"http://purl.obolibrary.org/obo/DOID_0001816\")\n>>> onto.reasoner.check_assumed_disjoint(c1, c2)\n[SUCCESSFULLY] Adding the axiom DisjointClasses(<http://purl.obolibrary.org/obo/DOID_0001816> <http://purl.obolibrary.org/obo/DOID_4058>) into the ontology.\n[CHECK1 True] input classes are still satisfiable;\n[SUCCESSFULLY] Removing the axiom from the ontology.\n[CHECK2 False] input classes have NO common descendant.\n[PASSED False] assumed disjointness check done.\nFalse\n
Source code in src/deeponto/onto/ontology.py
def check_assumed_disjoint(self, owl_class1: OWLClassExpression, owl_class2: OWLClassExpression):\nr\"\"\"Check if two OWL class expressions satisfy the Assumed Disjointness.\n\n !!! credit \"Paper\"\n\n The definition of **Assumed Disjointness** comes from the paper:\n [Language Model Analysis for Ontology Subsumption Inference](https://aclanthology.org/2023.findings-acl.213).\n\n !!! note \"Assumed Disjointness (Definition)\"\n Two class expressions $C$ and $D$ are assumed to be disjoint if they meet the followings:\n\n 1. By adding the disjointness axiom of them into the ontology, $C$ and $D$ are **still satisfiable**.\n 2. $C$ and $D$ **do not have a common descendant** (otherwise $C$ and $D$ can be satisfiable but their\n common descendants become the bottom $\\bot$.)\n\n Note that the special case where $C$ and $D$ are already disjoint is covered by the first check.\n The paper also proposed a practical alternative to decide Assumed Disjointness.\n See [`check_assumed_disjoint_alternative`][deeponto.onto.OntologyReasoner.check_assumed_disjoint_alternative].\n\n Examples:\n Suppose pre-load an ontology `onto` from the disease ontology file `doid.owl`.\n\n ```python\n >>> c1 = onto.get_owl_object_from_iri(\"http://purl.obolibrary.org/obo/DOID_4058\")\n >>> c2 = onto.get_owl_object_from_iri(\"http://purl.obolibrary.org/obo/DOID_0001816\")\n >>> onto.reasoner.check_assumed_disjoint(c1, c2)\n [SUCCESSFULLY] Adding the axiom DisjointClasses(<http://purl.obolibrary.org/obo/DOID_0001816> <http://purl.obolibrary.org/obo/DOID_4058>) into the ontology.\n [CHECK1 True] input classes are still satisfiable;\n [SUCCESSFULLY] Removing the axiom from the ontology.\n [CHECK2 False] input classes have NO common descendant.\n [PASSED False] assumed disjointness check done.\n False\n ```\n \"\"\"\n # banner_message(\"Check Asssumed Disjointness\")\n\n entity_type = self.get_entity_type(owl_class1)\n assert entity_type == self.get_entity_type(owl_class2)\n\n # adding the disjointness axiom of `class1`` and `class2``\n disjoint_axiom = getattr(self.owl_data_factory, f\"getOWLDisjoint{entity_type}Axiom\")([owl_class1, owl_class2])\n undo_change = self.onto.add_axiom(disjoint_axiom, return_undo=True)\n self.load_reasoner(self.reasoner_type)\n\n # check if they are still satisfiable\n still_satisfiable = self.owl_reasoner.isSatisfiable(owl_class1)\n still_satisfiable = still_satisfiable and self.owl_reasoner.isSatisfiable(owl_class2)\n logger.info(f\"[CHECK1 {still_satisfiable}] input classes are still satisfiable;\")\n\n # remove the axiom and re-construct the reasoner\n undo_change_result = self.onto.owl_onto.applyChange(undo_change)\n logger.info(f\"[{str(undo_change_result)}] Removing the axiom from the ontology.\")\n self.load_reasoner(self.reasoner_type)\n\n # failing first check, there is no need to do the second.\n if not still_satisfiable:\n logger.info(\"Failed `satisfiability check`, skip the `common descendant` check.\")\n logger.info(f\"[PASSED {still_satisfiable}] assumed disjointness check done.\")\n return False\n\n # otherwise, the classes are still satisfiable and we should conduct the second check\n has_common_descendants = self.check_common_descendants(owl_class1, owl_class2)\n logger.info(f\"[CHECK2 {not has_common_descendants}] input classes have NO common descendant.\")\n logger.info(f\"[PASSED {not has_common_descendants}] assumed disjointness check done.\")\n return not has_common_descendants\n
"},{"location":"deeponto/onto/reasoning/#deeponto.onto.ontology.OntologyReasoner.check_assumed_disjoint_alternative","title":"check_assumed_disjoint_alternative(owl_class1, owl_class2, verbose=False)
","text":"Check if two OWL class expressions satisfy the Assumed Disjointness.
Paper
The definition of Assumed Disjointness comes from the paper: Language Model Analysis for Ontology Subsumption Inference.
This is the practical alternative version of check_assumed_disjoint
, which relies on the following conditions:
Assumed Disjointness (Practical Alternative)
Two class expressions \\(C\\) and \\(D\\) are assumed to be disjoint if they
If all the conditions have been met, then we assume class1
and class2
to be disjoint.
Examples:
Suppose an ontology onto
has been pre-loaded from the disease ontology file doid.owl
.
>>> c1 = onto.get_owl_object_from_iri(\"http://purl.obolibrary.org/obo/DOID_4058\")\n>>> c2 = onto.get_owl_object_from_iri(\"http://purl.obolibrary.org/obo/DOID_0001816\")\n>>> onto.reasoner.check_assumed_disjoint_alternative(c1, c2, verbose=True)\n[CHECK1 True] input classes have NO subsumption relationship;\n[CHECK2 False] input classes have NO common descendant;\nFailed the `common descendant check`, skip the `common instance` check.\n[PASSED False] assumed disjointness check done.\nFalse\n
In this alternative implementation, we do not need to add and remove axioms, which saves time. Source code in src/deeponto/onto/ontology.py
def check_assumed_disjoint_alternative(\n self, owl_class1: OWLClassExpression, owl_class2: OWLClassExpression, verbose: bool = False\n):\nr\"\"\"Check if two OWL class expressions satisfy the Assumed Disjointness.\n\n !!! credit \"Paper\"\n\n The definition of **Assumed Disjointness** comes from the paper:\n [Language Model Analysis for Ontology Subsumption Inference](https://aclanthology.org/2023.findings-acl.213).\n\n The practical alternative version of [`check_assumed_disjoint`][deeponto.onto.OntologyReasoner.check_assumed_disjoint]\n with following conditions:\n\n\n !!! note \"Assumed Disjointness (Practical Alternative)\"\n Two class expressions $C$ and $D$ are assumed to be disjoint if they\n\n 1. **do not** have a **subsumption relationship** between them,\n 2. **do not** have a **common descendant** (in TBox),\n 3. **do not** have a **common instance** (in ABox).\n\n If all the conditions have been met, then we assume `class1` and `class2` as disjoint.\n\n Examples:\n Suppose pre-load an ontology `onto` from the disease ontology file `doid.owl`.\n\n ```python\n >>> c1 = onto.get_owl_object_from_iri(\"http://purl.obolibrary.org/obo/DOID_4058\")\n >>> c2 = onto.get_owl_object_from_iri(\"http://purl.obolibrary.org/obo/DOID_0001816\")\n >>> onto.reasoner.check_assumed_disjoint(c1, c2, verbose=True)\n [CHECK1 True] input classes have NO subsumption relationship;\n [CHECK2 False] input classes have NO common descendant;\n Failed the `common descendant check`, skip the `common instance` check.\n [PASSED False] assumed disjointness check done.\n False\n ```\n In this alternative implementation, we do no need to add and remove axioms which will then\n be time-saving.\n \"\"\"\n # banner_message(\"Check Asssumed Disjointness (Alternative)\")\n\n # # Check for entailed disjointness (short-cut)\n # if self.check_disjoint(owl_class1, owl_class2):\n # print(f\"Input classes are already entailed as disjoint.\")\n # return True\n\n # Check for entailed subsumption,\n # common descendants and common instances\n\n has_subsumption = self.check_subsumption(owl_class1, owl_class2)\n has_subsumption = has_subsumption or self.check_subsumption(owl_class2, owl_class1)\n if verbose:\n logger.info(f\"[CHECK1 {not has_subsumption}] input classes have NO subsumption relationship;\")\n if has_subsumption:\n if verbose:\n logger.info(\"Failed the `subsumption check`, skip the `common descendant` check.\")\n logger.info(f\"[PASSED {not has_subsumption}] assumed disjointness check done.\")\n return False\n\n has_common_descendants = self.check_common_descendants(owl_class1, owl_class2)\n if verbose:\n logger.info(f\"[CHECK2 {not has_common_descendants}] input classes have NO common descendant;\")\n if has_common_descendants:\n if verbose:\n logger.info(\"Failed the `common descendant check`, skip the `common instance` check.\")\n logger.info(f\"[PASSED {not has_common_descendants}] assumed disjointness check done.\")\n return False\n\n # TODO: `check_common_instances` is still experimental because we have not tested it with ontologies of rich ABox.\n has_common_instances = self.check_common_instances(owl_class1, owl_class2)\n if verbose:\n logger.info(f\"[CHECK3 {not has_common_instances}] input classes have NO common instance;\")\n logger.info(f\"[PASSED {not has_common_instances}] assumed disjointness check done.\")\n return not has_common_instances\n
"},{"location":"deeponto/onto/taxonomy/","title":"Ontology Taxonomy","text":"Extracting the taxonomy from an ontology often comes in handy for graph-based machine learning techniques. Here we provide a basic Taxonomy
class built upon networkx.DiGraph
where nodes represent entities and edges represent subsumptions. We then provide the OntologyTaxonomy
class that extends the basic Taxonomy
. It utilises the simple structural reasoner to enrich the ontology subsumptions beyond asserted ones, and builds the taxonomy over the expanded subsumptions. Each node represents a named class and has a label (rdfs:label
) attribute. The root node owl:Thing
is also specified for functions like counting the node depths, etc. Moreover, we provide the WordnetTaxonomy
class that wraps the WordNet knowledge graph for easier access.
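For a quick, illustrative sketch of how these classes are meant to be used together (the OWL file path and class IRI are placeholders, constructing Ontology directly from a file path is an assumption, and the import paths follow the module locations documented on this page):
from deeponto.onto import Ontology\nfrom deeponto.onto.taxonomy import OntologyTaxonomy\n\n# load an ontology from a local OWL file (the path here is illustrative)\nonto = Ontology(\"doid.owl\")\n# build the class taxonomy; subsumptions are enriched by the simple structural reasoner\ntaxonomy = OntologyTaxonomy(onto, reasoner_type=\"struct\")\n\nclass_iri = \"http://purl.obolibrary.org/obo/DOID_4058\"    # illustrative class IRI\ntaxonomy.get_parents(class_iri)                            # direct named super-classes\ntaxonomy.get_parents(class_iri, apply_transitivity=True)   # all named ancestors\ntaxonomy.get_shortest_node_depth(class_iri)                # depth counted from owl:Thing\n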
Note
It is also possible to use OntologyProjector
to extract triples from the ontology as edges of the taxonomy. We will consider this feature in the future.
Taxonomy(edges, root_node=None)
","text":"Class for building the taxonomy over structured data.
Attributes:
Name Type Descriptionnodes
list
A list of entity ids.
edges
list
A list of (parent, child)
pairs.
graph
networkx.DiGraph
A directed graph that represents the taxonomy.
root_node
Optional[str]
Optional root node id. Defaults to None
.
src/deeponto/onto/taxonomy.py
def __init__(self, edges: list, root_node: Optional[str] = None):\n self.edges = edges\n self.graph = nx.DiGraph(self.edges)\n self.nodes = list(self.graph.nodes)\n self.root_node = root_node\n
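A minimal sketch with made-up entity ids, showing how the (parent, child) edge list is interpreted:
from deeponto.onto.taxonomy import Taxonomy\n\n# (parent, child) pairs; the entity ids are made up for illustration\nedges = [(\"animal\", \"mammal\"), (\"mammal\", \"dog\"), (\"mammal\", \"cat\")]\ntaxonomy = Taxonomy(edges=edges, root_node=\"animal\")\n\ntaxonomy.get_children(\"mammal\")                            # {'dog', 'cat'}\ntaxonomy.get_children(\"animal\", apply_transitivity=True)   # {'mammal', 'dog', 'cat'}\ntaxonomy.get_parents(\"dog\")                                # {'mammal'}\ntaxonomy.get_shortest_node_depth(\"dog\")                    # 2\n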
"},{"location":"deeponto/onto/taxonomy/#deeponto.onto.taxonomy.Taxonomy.get_node_attributes","title":"get_node_attributes(entity_id)
","text":"Get the attributes of the given entity.
Source code insrc/deeponto/onto/taxonomy.py
def get_node_attributes(self, entity_id: str):\n\"\"\"Get the attributes of the given entity.\"\"\"\n return self.graph.nodes[entity_id]\n
"},{"location":"deeponto/onto/taxonomy/#deeponto.onto.taxonomy.Taxonomy.get_children","title":"get_children(entity_id, apply_transitivity=False)
","text":"Get the set of children for a given entity.
Source code insrc/deeponto/onto/taxonomy.py
def get_children(self, entity_id: str, apply_transitivity: bool = False):\nr\"\"\"Get the set of children for a given entity.\"\"\"\n if not apply_transitivity:\n return set(self.graph.successors(entity_id))\n else:\n return set(itertools.chain.from_iterable(nx.dfs_successors(self.graph, entity_id).values()))\n
"},{"location":"deeponto/onto/taxonomy/#deeponto.onto.taxonomy.Taxonomy.get_parents","title":"get_parents(entity_id, apply_transitivity=False)
","text":"Get the set of parents for a given entity.
Source code insrc/deeponto/onto/taxonomy.py
def get_parents(self, entity_id: str, apply_transitivity: bool = False):\nr\"\"\"Get the set of parents for a given entity.\"\"\"\n if not apply_transitivity:\n return set(self.graph.predecessors(entity_id))\n else:\n # NOTE: the nx.dfs_predecessors does not give desirable results\n frontier = list(self.get_parents(entity_id))\n explored = set()\n descendants = frontier\n while frontier:\n for candidate in frontier:\n descendants += list(self.get_parents(candidate))\n explored.update(frontier)\n frontier = set(descendants) - explored\n return set(descendants)\n
"},{"location":"deeponto/onto/taxonomy/#deeponto.onto.taxonomy.Taxonomy.get_descendant_graph","title":"get_descendant_graph(entity_id)
","text":"Create a descendant graph (networkx.DiGraph
) for a given entity.
src/deeponto/onto/taxonomy.py
def get_descendant_graph(self, entity_id: str):\nr\"\"\"Create a descendant graph (`networkx.DiGraph`) for a given entity.\"\"\"\n descendants = self.get_children(entity_id, apply_transitivity=True)\n return self.graph.subgraph(list(descendants))\n
"},{"location":"deeponto/onto/taxonomy/#deeponto.onto.taxonomy.Taxonomy.get_shortest_node_depth","title":"get_shortest_node_depth(entity_id)
","text":"Get the shortest depth of the given entity in the taxonomy.
Source code insrc/deeponto/onto/taxonomy.py
def get_shortest_node_depth(self, entity_id: str):\n\"\"\"Get the shortest depth of the given entity in the taxonomy.\"\"\"\n if not self.root_node:\n raise RuntimeError(\"No root node specified.\")\n return nx.shortest_path_length(self.graph, self.root_node, entity_id)\n
"},{"location":"deeponto/onto/taxonomy/#deeponto.onto.taxonomy.Taxonomy.get_longest_node_depth","title":"get_longest_node_depth(entity_id)
","text":"Get the longest depth of the given entity in the taxonomy.
Source code insrc/deeponto/onto/taxonomy.py
def get_longest_node_depth(self, entity_id: str):\n\"\"\"Get the longest depth of the given entity in the taxonomy.\"\"\"\n if not self.root_node:\n raise RuntimeError(\"No root node specified.\")\n return max([len(p) for p in nx.all_simple_paths(self.graph, self.root_node, entity_id)])\n
"},{"location":"deeponto/onto/taxonomy/#deeponto.onto.taxonomy.Taxonomy.get_lowest_common_ancestor","title":"get_lowest_common_ancestor(entity_id1, entity_id2)
","text":"Get the lowest common ancestor of the given two entities.
Source code insrc/deeponto/onto/taxonomy.py
def get_lowest_common_ancestor(self, entity_id1: str, entity_id2: str):\n\"\"\"Get the lowest common ancestor of the given two entities.\"\"\"\n return nx.lowest_common_ancestor(self.graph, entity_id1, entity_id2)\n
"},{"location":"deeponto/onto/taxonomy/#deeponto.onto.taxonomy.OntologyTaxonomy","title":"OntologyTaxonomy(onto, reasoner_type='struct')
","text":" Bases: Taxonomy
Class for building the taxonomy (top-down subsumption graph) from an ontology.
The nodes of this graph are named classes only, but the hierarchy is enriched (beyond asserted axioms) by an ontology reasoner.
Attributes:
Name Type Descriptiononto
Ontology
The input ontology to build the taxonomy.
reasoner_type
str
The type of reasoner used. Defaults to \"struct\"
. Options are [\"hermit\", \"elk\", \"struct\"]
.
reasoner
OntologyReasoner
An ontology reasoner used for completing the hierarchy. If the reasoner_type
is the same as onto.reasoner_type
, then re-use onto.reasoner
; otherwise, create a new one.
root_node
str
The root node that represents owl:Thing
.
nodes
list
A list of named class IRIs.
edges
list
A list of (parent, child)
class pairs. That is, if \\(C \\sqsubseteq D\\), then \\((D, C)\\) will be added as an edge.
graph
networkx.DiGraph
A directed subsumption graph.
Source code insrc/deeponto/onto/taxonomy.py
def __init__(self, onto: Ontology, reasoner_type: str = \"struct\"):\n self.onto = onto\n # the reasoner is used for completing the hierarchy\n self.reasoner_type = reasoner_type\n # re-use onto.reasoner if the reasoner type is the same; otherwise create a new one\n self.reasoner = (\n self.onto.reasoner\n if reasoner_type == self.onto.reasoner_type\n else OntologyReasoner(self.onto, reasoner_type)\n )\n root_node = \"owl:Thing\"\n subsumption_pairs = []\n for cl_iri, cl in self.onto.owl_classes.items():\n # NOTE: this is different from using self.onto.get_asserted_parents which does not conduct simple reasoning\n named_parents = self.reasoner.get_inferred_super_entities(cl, direct=True)\n if not named_parents:\n # if no parents then add root node as the parent\n named_parents.append(root_node)\n for named_parent in named_parents:\n subsumption_pairs.append((named_parent, cl_iri))\n super().__init__(edges=subsumption_pairs, root_node=root_node)\n\n # set node annotations (rdfs:label)\n for class_iri in self.nodes:\n if class_iri == self.root_node:\n self.graph.nodes[class_iri][\"label\"] = \"Thing\"\n else:\n owl_class = self.onto.get_owl_object(class_iri)\n self.graph.nodes[class_iri][\"label\"] = self.onto.get_annotations(owl_class, RDFS_LABEL)\n
"},{"location":"deeponto/onto/taxonomy/#deeponto.onto.taxonomy.OntologyTaxonomy.get_parents","title":"get_parents(class_iri, apply_transitivity=False)
","text":"Get the set of parents for a given class.
It is worth noting that this method with transitivity applied can be deemed as simple structural reasoning. For more advanced logical reasoning, use the DL reasoner self.onto.reasoner
instead.
src/deeponto/onto/taxonomy.py
def get_parents(self, class_iri: str, apply_transitivity: bool = False):\nr\"\"\"Get the set of parents for a given class.\n\n It is worth noting that this method with transitivity applied can be deemed as simple structural reasoning.\n For more advanced logical reasoning, use the DL reasoner `self.onto.reasoner` instead.\n \"\"\"\n return super().get_parents(class_iri, apply_transitivity)\n
"},{"location":"deeponto/onto/taxonomy/#deeponto.onto.taxonomy.OntologyTaxonomy.get_children","title":"get_children(class_iri, apply_transitivity=False)
","text":"Get the set of children for a given class.
It is worth noting that this method with transitivity applied can be deemed as simple structural reasoning. For more advanced logical reasoning, use the DL reasoner self.onto.reasoner
instead.
src/deeponto/onto/taxonomy.py
def get_children(self, class_iri: str, apply_transitivity: bool = False):\nr\"\"\"Get the set of children for a given class.\n\n It is worth noting that this method with transitivity applied can be deemed as simple structural reasoning.\n For more advanced logical reasoning, use the DL reasoner `self.onto.reasoner` instead.\n \"\"\"\n return super().get_children(class_iri, apply_transitivity)\n
"},{"location":"deeponto/onto/taxonomy/#deeponto.onto.taxonomy.OntologyTaxonomy.get_descendant_graph","title":"get_descendant_graph(class_iri)
","text":"Create a descendant graph (networkx.DiGraph
) for a given ontology class.
src/deeponto/onto/taxonomy.py
def get_descendant_graph(self, class_iri: str):\nr\"\"\"Create a descendant graph (`networkx.DiGraph`) for a given ontology class.\"\"\"\n return super().get_descendant_graph(class_iri)\n
"},{"location":"deeponto/onto/taxonomy/#deeponto.onto.taxonomy.OntologyTaxonomy.get_shortest_node_depth","title":"get_shortest_node_depth(class_iri)
","text":"Get the shortest depth of the given named class in the taxonomy.
Source code insrc/deeponto/onto/taxonomy.py
def get_shortest_node_depth(self, class_iri: str):\n\"\"\"Get the shortest depth of the given named class in the taxonomy.\"\"\"\n return nx.shortest_path_length(self.graph, self.root_node, class_iri)\n
"},{"location":"deeponto/onto/taxonomy/#deeponto.onto.taxonomy.OntologyTaxonomy.get_longest_node_depth","title":"get_longest_node_depth(class_iri)
","text":"Get the longest depth of the given named class in the taxonomy.
Source code insrc/deeponto/onto/taxonomy.py
def get_longest_node_depth(self, class_iri: str):\n\"\"\"Get the longest depth of the given named class in the taxonomy.\"\"\"\n return max([len(p) for p in nx.all_simple_paths(self.graph, self.root_node, class_iri)])\n
"},{"location":"deeponto/onto/taxonomy/#deeponto.onto.taxonomy.OntologyTaxonomy.get_lowest_common_ancestor","title":"get_lowest_common_ancestor(class_iri1, class_iri2)
","text":"Get the lowest common ancestor of the given two named classes.
Source code insrc/deeponto/onto/taxonomy.py
def get_lowest_common_ancestor(self, class_iri1: str, class_iri2: str):\n\"\"\"Get the lowest common ancestor of the given two named classes.\"\"\"\n return super().get_lowest_common_ancestor(class_iri1, class_iri2)\n
"},{"location":"deeponto/onto/taxonomy/#deeponto.onto.taxonomy.WordnetTaxonomy","title":"WordnetTaxonomy(pos='n', include_membership=False)
","text":" Bases: Taxonomy
Class for building the taxonomy (hypernym graph) from WordNet.
Attributes:
Name Type Descriptionpos
str
The pos-tag of entities to be extracted from wordnet.
nodes
list
A list of entity ids extracted from wordnet.
edges
list
A list of hyponym-hypernym pairs.
graph
networkx.DiGraph
A directed hypernym graph.
Parameters:
Name Type Description Defaultpos
str
The pos-tag of entities to be extracted from wordnet.
'n'
include_membership
bool
Whether to include instance_hypernyms
or not (e.g., London is an instance of City). Defaults to False
.
False
Source code in src/deeponto/onto/taxonomy.py
def __init__(self, pos: str = \"n\", include_membership: bool = False):\nr\"\"\"Initialise the wordnet taxonomy.\n\n Args:\n pos (str): The pos-tag of entities to be extracted from wordnet.\n include_membership (bool): Whether to include `instance_hypernyms` or not (e.g., London is an instance of City). Defaults to `False`.\n \"\"\"\n\n self.pos = pos\n synsets = self.fetch_synsets(pos=pos)\n hypernym_pairs = self.fetch_hypernyms(synsets, include_membership)\n super().__init__(edges=hypernym_pairs)\n\n # set node annotations\n for synset in synsets:\n try:\n self.graph.nodes[synset.name()][\"name\"] = synset.name().split(\".\")[0].replace(\"_\", \" \")\n self.graph.nodes[synset.name()][\"definition\"] = synset.definition()\n except:\n continue\n
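A minimal usage sketch; it assumes the WordNet data for nltk has already been downloaded (e.g., via nltk.download(\"wordnet\")), and the synset id is illustrative:
from deeponto.onto.taxonomy import WordnetTaxonomy\n\n# build the noun hypernym graph from WordNet\nwt = WordnetTaxonomy(pos=\"n\", include_membership=False)\n\n# synset names such as 'dog.n.01' are used as node ids\nwt.get_parents(\"dog.n.01\")          # direct hypernyms\nwt.get_node_attributes(\"dog.n.01\")  # {'name': 'dog', 'definition': '...'}\n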
"},{"location":"deeponto/onto/taxonomy/#deeponto.onto.taxonomy.WordnetTaxonomy.fetch_synsets","title":"fetch_synsets(pos='n')
staticmethod
","text":"Get synsets of certain pos-tag from wordnet.
Source code insrc/deeponto/onto/taxonomy.py
@staticmethod\ndef fetch_synsets(pos: str = \"n\"):\n\"\"\"Get synsets of certain pos-tag from wordnet.\"\"\"\n words = wn.words()\n synsets = set()\n for word in words:\n synsets.update(wn.synsets(word, pos=pos))\n logger.info(f'{len(synsets)} synsets (pos=\"{pos}\") fetched.')\n return synsets\n
"},{"location":"deeponto/onto/taxonomy/#deeponto.onto.taxonomy.WordnetTaxonomy.fetch_hypernyms","title":"fetch_hypernyms(synsets, include_membership=False)
staticmethod
","text":"Get hypernym-hyponym pairs from a given set of wordnet synsets.
Source code insrc/deeponto/onto/taxonomy.py
@staticmethod\ndef fetch_hypernyms(synsets: set, include_membership: bool = False):\n\"\"\"Get hypernym-hyponym pairs from a given set of wordnet synsets.\"\"\"\n hypernym_hyponym_pairs = []\n for synset in synsets:\n for h_synset in synset.hypernyms():\n hypernym_hyponym_pairs.append((h_synset.name(), synset.name()))\n if include_membership:\n for h_synset in synset.instance_hypernyms():\n hypernym_hyponym_pairs.append((h_synset.name(), synset.name()))\n logger.info(f\"{len(hypernym_hyponym_pairs)} hypernym-hyponym pairs fetched.\")\n return hypernym_hyponym_pairs\n
"},{"location":"deeponto/onto/taxonomy/#deeponto.onto.taxonomy.TaxonomyNegativeSampler","title":"TaxonomyNegativeSampler(taxonomy, entity_weights=None)
","text":"Class for the efficient negative sampling with buffer over the taxonomy.
Attributes:
Name Type Descriptiontaxonomy
str
The taxonomy for negative sampling.
entity_weights
Optional[dict]
A dictionary with the taxonomy entities as keys and their corresponding weights as values. Defaults to None
.
src/deeponto/onto/taxonomy.py
def __init__(self, taxonomy: Taxonomy, entity_weights: Optional[dict] = None):\n self.taxonomy = taxonomy\n self.entities = self.taxonomy.nodes\n # uniform distribution if weights not provided\n self.entity_weights = entity_weights\n\n self._entity_probs = None\n if self.entity_weights:\n self._entity_probs = np.array([self.entity_weights[e] for e in self.entities])\n self._entity_probs = self._entity_probs / self._entity_probs.sum()\n self._buffer = []\n self._default_buffer_size = 10000\n
"},{"location":"deeponto/onto/taxonomy/#deeponto.onto.taxonomy.TaxonomyNegativeSampler.fill","title":"fill(buffer_size=None)
","text":"Buffer a large collection of entities sampled with replacement for faster negative sampling.
Source code insrc/deeponto/onto/taxonomy.py
def fill(self, buffer_size: Optional[int] = None):\n\"\"\"Buffer a large collection of entities sampled with replacement for faster negative sampling.\"\"\"\n buffer_size = buffer_size if buffer_size else self._default_buffer_size\n if self._entity_probs:\n self._buffer = np.random.choice(self.entities, size=buffer_size, p=self._entity_probs)\n else:\n self._buffer = np.random.choice(self.entities, size=buffer_size)\n
"},{"location":"deeponto/onto/taxonomy/#deeponto.onto.taxonomy.TaxonomyNegativeSampler.sample","title":"sample(entity_id, n_samples, buffer_size=None)
","text":"Sample N negative samples for a given entity with replacement.
Source code insrc/deeponto/onto/taxonomy.py
def sample(self, entity_id: str, n_samples: int, buffer_size: Optional[int] = None):\n\"\"\"Sample N negative samples for a given entity with replacement.\"\"\"\n negative_samples = []\n positive_samples = self.taxonomy.get_parents(entity_id, True)\n while len(negative_samples) < n_samples:\n if len(self._buffer) < n_samples:\n self.fill(buffer_size)\n negative_samples += list(filter(lambda x: x not in positive_samples, self._buffer[:n_samples]))\n self._buffer = self._buffer[n_samples:] # remove the samples from the buffer\n return negative_samples[:n_samples]\n
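A minimal sketch of drawing negatives for one entity; here taxonomy is any Taxonomy instance (e.g., the small one built earlier), and no entity weights are given, so sampling is uniform:
from deeponto.onto.taxonomy import TaxonomyNegativeSampler\n\n# weights are omitted, so entities are sampled uniformly\nsampler = TaxonomyNegativeSampler(taxonomy)\n# draw 10 entities that are not (transitive) parents of the given entity\nnegatives = sampler.sample(\"dog\", n_samples=10)\n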
"},{"location":"deeponto/onto/verbalisation/","title":"Ontology Verbalisation","text":"Verbalising an ontology into natural language texts is a challenging task. \\(\\textsf{DeepOnto}\\) provides some basic building blocks for achieving this goal. The implemented OntologyVerbaliser
is essentially a recursive concept verbaliser that first splits a complex concept \(C\) into a sub-formula tree, then verbalises the leaf nodes (atomic concepts or object properties) by their names, and finally merges the verbalised child nodes according to the logical pattern at their parent node.
Please cite the following paper if you consider using our verbaliser.
Paper
The recursive concept verbaliser is proposed in the paper: Language Model Analysis for Ontology Subsumption Inference (Findings of ACL 2023).
@inproceedings{he-etal-2023-language,\n title = \"Language Model Analysis for Ontology Subsumption Inference\",\n author = \"He, Yuan and\n Chen, Jiaoyan and\n Jimenez-Ruiz, Ernesto and\n Dong, Hang and\n Horrocks, Ian\",\n booktitle = \"Findings of the Association for Computational Linguistics: ACL 2023\",\n month = jul,\n year = \"2023\",\n address = \"Toronto, Canada\",\n publisher = \"Association for Computational Linguistics\",\n url = \"https://aclanthology.org/2023.findings-acl.213\",\n doi = \"10.18653/v1/2023.findings-acl.213\",\n pages = \"3439--3453\"\n}\n
"},{"location":"deeponto/onto/verbalisation/#deeponto.onto.verbalisation.OntologyVerbaliser","title":"OntologyVerbaliser(onto, apply_lowercasing=False, keep_iri=False, apply_auto_correction=False, add_quantifier_word=False)
","text":"A recursive natural language verbaliser for the OWL logical expressions, e.g., OWLAxiom
and OWLClassExpression
.
The concept patterns supported by this verbaliser are shown below:
Pattern Verbalisation (\\(\\mathcal{V}\\)) \\(A\\) (atomic) the name (\\(\\texttt{rdfs:label}\\)) of \\(A\\) (auto-correction is optional) \\(r\\) (property) the name (\\(\\texttt{rdfs:label}\\)) of \\(r\\) (auto-correction is optional) \\(\\neg C\\) \"not \\(\\mathcal{V}(C)\\)\" \\(\\exists r.C\\) \"something that \\(\\mathcal{V}(r)\\) some \\(\\mathcal{V}(C)\\)\" (the quantifier word \"some\" is optional) \\(\\forall r.C\\) \"something that \\(\\mathcal{V}(r)\\) only \\(\\mathcal{V}(C)\\)\" (the quantifier word \"only\" is optional) \\(C_1 \\sqcap ... \\sqcap C_n\\) if \\(C_i = \\exists/\\forall r.D_i\\) and \\(C_j = \\exists/\\forall r.D_j\\), they will be re-written into \\(\\exists/\\forall r.(D_i \\sqcap D_j)\\) before verbalisation; suppose after re-writing the new expression is \\(C_1 \\sqcap ... \\sqcap C_{n'}\\)(a) if all \\(C_i\\)s (for \\(i = 1, ..., n'\\)) are restrictions, in the form of \\(\\exists/\\forall r_i.D_i\\): \"something that \\(\\mathcal{V}(r_1)\\) some/only \\(V(D_1)\\) and ... and \\(\\mathcal{V}(r_{n'})\\) some/only \\(V(D_{n'})\\)\" (b) if some \\(C_i\\)s (for \\(i = m+1, ..., n'\\)) are restrictions, in the form of \\(\\exists/\\forall r_i.D_i\\): \"\\(\\mathcal{V}(C_{1})\\) and ... and \\(\\mathcal{V}(C_{m})\\) that \\(\\mathcal{V}(r_{m+1})\\) some/only \\(V(D_{m+1})\\) and ... and \\(\\mathcal{V}(r_{n'})\\) some/only \\(V(D_{n'})\\)\" (c) if no \\(C_i\\) is a restriction: \"\\(\\mathcal{V}(C_{1})\\) and ... and \\(\\mathcal{V}(C_{n'})\\)\" \\(C_1 \\sqcup ... \\sqcup C_n\\) similar to verbalising \\(C_1 \\sqcap ... \\sqcap C_n\\) except that \"and\" is replaced by \"or\" and case (b) uses the same verbalisation as case (c) \\(r_1 \\cdot r_2\\) (property chain) \\(\\mathcal{V}(r_1)\\) something that \\(\\mathcal{V}(r_2)\\)
With this concept verbaliser, a range of OWL axiom types is supported; see the verbalise_*_axiom methods documented below.
The verbaliser operates at the concept level, and an additional template is needed to integrate the verbalised components of an axiom.
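As an illustrative sketch (not the definitive recipe), one can verbalise both sides of a class subsumption axiom and glue them together with a hand-written template. The ontology file path is a placeholder, and using get_subsumption_axioms to pick an axiom is an assumption here; any OWLAxiom of the right type works.
from deeponto.onto import Ontology\nfrom deeponto.onto.verbalisation import OntologyVerbaliser\n\nonto = Ontology(\"foodon.owl\")  # illustrative file path\nverbaliser = OntologyVerbaliser(onto)\n\n# pick some SubClassOf axiom from the ontology (axiom selection here is an assumption)\naxiom = onto.get_subsumption_axioms(entity_type=\"Classes\")[0]\nv_sub, v_super = verbaliser.verbalise_class_subsumption_axiom(axiom)\n\n# an additional template integrates the verbalised components into a sentence\nsentence = f\"{v_sub.verbal} is a kind of {v_super.verbal}\"\n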
Warning
This verbaliser utilises spacy for POS tagging used in the auto-correction of property names. Automatic download of the rule-based library en_core_web_sm
is attempted in the init function. However, if the automatic download fails, please download it manually using python -m spacy download en_core_web_sm
.
Attributes:
Name Type Descriptiononto
Ontology
An ontology whose entities and axioms are to be verbalised.
parser
OntologySyntaxParser
A syntax parser for the string representation of an OWLObject
.
vocab
dict[str, list[str]]
A dictionary with (entity_iri, entity_name)
pairs, by default the names are retrieved from \\(\\texttt{rdfs:label}\\).
apply_lowercasing
bool
Whether to apply lowercasing to the entity names. Defaults to False
.
keep_iri
bool
Whether to keep the IRIs of entities without verbalising them using self.vocab
. Defaults to False
.
apply_auto_correction
bool
Whether to automatically apply rule-based auto-correction to entity names. Defaults to False
.
add_quantifier_word
bool
Whether to add quantifier words (\"some\"/\"only\") as in the Manchester syntax. Defaults to False
.
Parameters:
Name Type Description Defaultonto
Ontology
An ontology whose entities and axioms are to be verbalised.
requiredapply_lowercasing
bool
Whether to apply lowercasing to the entity names. Defaults to False
.
False
keep_iri
bool
Whether to keep the IRIs of entities without verbalising them using self.vocab
. Defaults to False
.
False
apply_auto_correction
bool
Whether to automatically apply rule-based auto-correction to entity names. Defaults to False
.
False
add_quantifier_word
bool
Whether to add quantifier words (\"some\"/\"only\") as in the Manchester syntax. Defaults to False
.
False
Source code in src/deeponto/onto/verbalisation.py
def __init__(\n self,\n onto: Ontology,\n apply_lowercasing: bool = False,\n keep_iri: bool = False,\n apply_auto_correction: bool = False,\n add_quantifier_word: bool = False,\n):\n\"\"\"Initialise an ontology verbaliser.\n\n Args:\n onto (Ontology): An ontology whose entities and axioms are to be verbalised.\n apply_lowercasing (bool, optional): Whether to apply lowercasing to the entity names. Defaults to `False`.\n keep_iri (bool, optional): Whether to keep the IRIs of entities without verbalising them using `self.vocab`. Defaults to `False`.\n apply_auto_correction (bool, optional): Whether to automatically apply rule-based auto-correction to entity names. Defaults to `False`.\n add_quantifier_word (bool, optional): Whether to add quantifier words (\"some\"/\"only\") as in the Manchester syntax. Defaults to `False`.\n \"\"\"\n self.onto = onto\n self.parser = OntologySyntaxParser()\n\n # download en_core_web_sm for object property\n try:\n spacy.load(\"en_core_web_sm\")\n except:\n print(\"Download `en_core_web_sm` for pos tagger.\")\n os.system(\"python -m spacy download en_core_web_sm\")\n\n self.nlp = spacy.load(\"en_core_web_sm\")\n\n # build the default vocabulary for entities\n self.apply_lowercasing_to_vocab = apply_lowercasing\n self.vocab = dict()\n for entity_type in [\"Classes\", \"ObjectProperties\", \"DataProperties\", \"Individuals\"]:\n entity_annotations, _ = self.onto.build_annotation_index(\n entity_type=entity_type, apply_lowercasing=self.apply_lowercasing_to_vocab\n )\n self.vocab.update(**entity_annotations)\n literal_or_iri = lambda k, v: list(v)[0] if v else k # set vocab to IRI if no string available\n self.vocab = {k: literal_or_iri(k, v) for k, v in self.vocab.items()} # only set one name for each entity\n\n self.keep_iri = keep_iri\n self.apply_auto_correction = apply_auto_correction\n self.add_quantifier_word = add_quantifier_word\n
"},{"location":"deeponto/onto/verbalisation/#deeponto.onto.verbalisation.OntologyVerbaliser.update_entity_name","title":"update_entity_name(entity_iri, entity_name)
","text":"Update the name of an entity in self.vocab
.
If you want to change the name of a specific entity, you should call this function before applying verbalisation.
Source code insrc/deeponto/onto/verbalisation.py
def update_entity_name(self, entity_iri: str, entity_name: str):\n\"\"\"Update the name of an entity in `self.vocab`.\n\n If you want to change the name of a specific entity, you should call this\n function before applying verbalisation.\n \"\"\"\n self.vocab[entity_iri] = entity_name\n
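For example (the entity IRI and the replacement name below are illustrative):
# override the default rdfs:label-based name for one entity before verbalising\nverbaliser.update_entity_name(\"http://purl.obolibrary.org/obo/FOODON_00001707\", \"cephalopod food product\")\n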
"},{"location":"deeponto/onto/verbalisation/#deeponto.onto.verbalisation.OntologyVerbaliser.verbalise_class_expression","title":"verbalise_class_expression(class_expression)
","text":"Verbalise a class expression (OWLClassExpression
) or its parsed form (in RangeNode
).
See currently supported types of class (or concept) expressions here.
Parameters:
Name Type Description Defaultclass_expression
Union[OWLClassExpression, str, RangeNode]
A class expression to be verbalised.
requiredRaises:
Type DescriptionRuntimeError
Occurs when the class expression is not in one of the supported types.
Returns:
Type DescriptionCfgNode
A nested dictionary that presents the recursive results of verbalisation. The verbalised string can be accessed with the key [\"verbal\"]
or with the attribute .verbal
.
src/deeponto/onto/verbalisation.py
def verbalise_class_expression(self, class_expression: Union[OWLClassExpression, str, RangeNode]):\nr\"\"\"Verbalise a class expression (`OWLClassExpression`) or its parsed form (in `RangeNode`).\n\n See currently supported types of class (or concept) expressions [here][deeponto.onto.verbalisation.OntologyVerbaliser].\n\n\n Args:\n class_expression (Union[OWLClassExpression, str, RangeNode]): A class expression to be verbalised.\n\n Raises:\n RuntimeError: Occurs when the class expression is not in one of the supported types.\n\n Returns:\n (CfgNode): A nested dictionary that presents the recursive results of verbalisation. The verbalised string\n can be accessed with the key `[\"verbal\"]` or with the attribute `.verbal`.\n \"\"\"\n\n if not isinstance(class_expression, RangeNode):\n parsed_class_expression = self.parser.parse(class_expression).children[0] # skip the root node\n else:\n parsed_class_expression = class_expression\n\n # for a singleton IRI\n if parsed_class_expression.is_iri:\n return self._verbalise_iri(parsed_class_expression)\n\n if parsed_class_expression.name.startswith(\"NEG\"):\n # negation only has one child\n cl = self.verbalise_class_expression(parsed_class_expression.children[0])\n return CfgNode({\"verbal\": \"not \" + cl.verbal, \"class\": cl, \"type\": \"NEG\"})\n\n # for existential and universal restrictions\n if parsed_class_expression.name.startswith(\"EX.\") or parsed_class_expression.name.startswith(\"ALL\"):\n return self._verbalise_restriction(parsed_class_expression)\n\n # for conjunction and disjunction\n if parsed_class_expression.name.startswith(\"AND\") or parsed_class_expression.name.startswith(\"OR\"):\n return self._verbalise_junction(parsed_class_expression)\n\n # for a property chain\n if parsed_class_expression.name.startswith(\"OPC\"):\n return self._verbalise_property(parsed_class_expression)\n\n raise RuntimeError(f\"Input class expression `{str(class_expression)}` is not in one of the supported types.\")\n
"},{"location":"deeponto/onto/verbalisation/#deeponto.onto.verbalisation.OntologyVerbaliser.verbalise_class_subsumption_axiom","title":"verbalise_class_subsumption_axiom(class_subsumption_axiom)
","text":"Verbalise a class subsumption axiom.
The subsumption axiom can have two forms:
SubClassOf
axiom;SuperClassOf
axiom.Parameters:
Name Type Description Defaultclass_subsumption_axiom
OWLAxiom
Then class subsumption axiom to be verbalised.
requiredReturns:
Type DescriptionTuple[CfgNode, CfgNode]
The verbalised sub-concept \\(\\mathcal{V}(C_{sub})\\) and super-concept \\(\\mathcal{V}(C_{super})\\) (order matters).
Source code insrc/deeponto/onto/verbalisation.py
def verbalise_class_subsumption_axiom(self, class_subsumption_axiom: OWLAxiom):\nr\"\"\"Verbalise a class subsumption axiom.\n\n The subsumption axiom can have two forms:\n\n - $C_{sub} \\sqsubseteq C_{super}$, the `SubClassOf` axiom;\n - $C_{super} \\sqsupseteq C_{sub}$, the `SuperClassOf` axiom.\n\n Args:\n class_subsumption_axiom (OWLAxiom): Then class subsumption axiom to be verbalised.\n\n Returns:\n (Tuple[CfgNode, CfgNode]): The verbalised sub-concept $\\mathcal{V}(C_{sub})$ and super-concept $\\mathcal{V}(C_{super})$ (order matters).\n \"\"\"\n\n # input check\n self._axiom_input_check(class_subsumption_axiom, \"SubClassOf\", \"SuperClassOf\")\n\n parsed_subsumption_axiom = self.parser.parse(class_subsumption_axiom).children[0] # skip the root node\n if str(class_subsumption_axiom).startswith(\"SubClassOf\"):\n parsed_sub_class, parsed_super_class = parsed_subsumption_axiom.children\n elif str(class_subsumption_axiom).startswith(\"SuperClassOf\"):\n parsed_super_class, parsed_sub_class = parsed_subsumption_axiom.children\n\n verbalised_sub_class = self.verbalise_class_expression(parsed_sub_class)\n verbalised_super_class = self.verbalise_class_expression(parsed_super_class)\n return verbalised_sub_class, verbalised_super_class\n
"},{"location":"deeponto/onto/verbalisation/#deeponto.onto.verbalisation.OntologyVerbaliser.verbalise_class_equivalence_axiom","title":"verbalise_class_equivalence_axiom(class_equivalence_axiom)
","text":"Verbalise a class equivalence axiom.
The equivalence axiom has the form \\(C \\equiv D\\).
Parameters:
Name Type Description Defaultclass_equivalence_axiom
OWLAxiom
The class equivalence axiom to be verbalised.
requiredReturns:
Type DescriptionTuple[CfgNode, CfgNode]
The verbalised concept \\(\\mathcal{V}(C)\\) and its equivalent concept \\(\\mathcal{V}(D)\\) (order matters).
Source code insrc/deeponto/onto/verbalisation.py
def verbalise_class_equivalence_axiom(self, class_equivalence_axiom: OWLAxiom):\nr\"\"\"Verbalise a class equivalence axiom.\n\n The equivalence axiom has the form $C \\equiv D$.\n\n Args:\n class_equivalence_axiom (OWLAxiom): The class equivalence axiom to be verbalised.\n\n Returns:\n (Tuple[CfgNode, CfgNode]): The verbalised concept $\\mathcal{V}(C)$ and its equivalent concept $\\mathcal{V}(D)$ (order matters).\n \"\"\"\n\n # input check\n self._axiom_input_check(class_equivalence_axiom, \"EquivalentClasses\")\n\n parsed_equivalence_axiom = self.parser.parse(class_equivalence_axiom).children[0] # skip the root node\n parsed_class_left, parsed_class_right = parsed_equivalence_axiom.children\n\n verbalised_left_class = self.verbalise_class_expression(parsed_class_left)\n verbalised_right_class = self.verbalise_class_expression(parsed_class_right)\n return verbalised_left_class, verbalised_right_class\n
"},{"location":"deeponto/onto/verbalisation/#deeponto.onto.verbalisation.OntologyVerbaliser.verbalise_class_assertion_axiom","title":"verbalise_class_assertion_axiom(class_assertion_axiom)
","text":"Verbalise a class assertion axiom.
The class assertion axiom has the form \\(C(x)\\).
Parameters:
Name Type Description Defaultclass_assertion_axiom
OWLAxiom
The class assertion axiom to be verbalised.
requiredReturns:
Type DescriptionTuple[CfgNode, CfgNode]
The verbalised class \\(\\mathcal{V}(C)\\) and individual \\(\\mathcal{V}(x)\\) (order matters).
Source code insrc/deeponto/onto/verbalisation.py
def verbalise_class_assertion_axiom(self, class_assertion_axiom: OWLAxiom):\nr\"\"\"Verbalise a class assertion axiom.\n\n The class assertion axiom has the form $C(x)$.\n\n Args:\n class_assertion_axiom (OWLAxiom): The class assertion axiom to be verbalised.\n\n Returns:\n (Tuple[CfgNode, CfgNode]): The verbalised class $\\mathcal{V}(C)$ and individual $\\mathcal{V}(x)$ (order matters).\n \"\"\"\n\n # input check\n self._axiom_input_check(class_assertion_axiom, \"ClassAssertion\")\n\n parsed_equivalence_axiom = self.parser.parse(class_assertion_axiom).children[0] # skip the root node\n parsed_class, parsed_individual = parsed_equivalence_axiom.children\n\n verbalised_class = self.verbalise_class_expression(parsed_class)\n verbalised_individual = self._verbalise_iri(parsed_individual)\n return verbalised_class, verbalised_individual\n
"},{"location":"deeponto/onto/verbalisation/#deeponto.onto.verbalisation.OntologyVerbaliser.verbalise_object_property_subsumption_axiom","title":"verbalise_object_property_subsumption_axiom(object_property_subsumption_axiom)
","text":"Verbalise an object property subsumption axiom.
The subsumption axiom can have two forms:
SubObjectPropertyOf
axiom;SuperObjectPropertyOf
axiom.Parameters:
Name Type Description Defaultobject_property_subsumption_axiom
OWLAxiom
The object property subsumption axiom to be verbalised.
requiredReturns:
Type DescriptionTuple[CfgNode, CfgNode]
The verbalised sub-property \\(\\mathcal{V}(r_{sub})\\) and super-property \\(\\mathcal{V}(r_{super})\\) (order matters).
Source code insrc/deeponto/onto/verbalisation.py
def verbalise_object_property_subsumption_axiom(self, object_property_subsumption_axiom: OWLAxiom):\nr\"\"\"Verbalise an object property subsumption axiom.\n\n The subsumption axiom can have two forms:\n\n - $r_{sub} \\sqsubseteq r_{super}$, the `SubObjectPropertyOf` axiom;\n - $r_{super} \\sqsupseteq r_{sub}$, the `SuperObjectPropertyOf` axiom.\n\n Args:\n object_property_subsumption_axiom (OWLAxiom): The object property subsumption axiom to be verbalised.\n\n Returns:\n (Tuple[CfgNode, CfgNode]): The verbalised sub-property $\\mathcal{V}(r_{sub})$ and super-property $\\mathcal{V}(r_{super})$ (order matters).\n \"\"\"\n\n # input check\n self._axiom_input_check(\n object_property_subsumption_axiom,\n \"SubObjectPropertyOf\",\n \"SuperObjectPropertyOf\",\n \"SubPropertyChainOf\",\n \"SuperPropertyChainOf\",\n )\n\n parsed_subsumption_axiom = self.parser.parse(object_property_subsumption_axiom).children[\n 0\n ] # skip the root node\n if str(object_property_subsumption_axiom).startswith(\"SubObjectPropertyOf\"):\n parsed_sub_property, parsed_super_property = parsed_subsumption_axiom.children\n elif str(object_property_subsumption_axiom).startswith(\"SuperObjectPropertyOf\"):\n parsed_super_property, parsed_sub_property = parsed_subsumption_axiom.children\n\n verbalised_sub_property = self._verbalise_property(parsed_sub_property)\n verbalised_super_property = self._verbalise_property(parsed_super_property)\n return verbalised_sub_property, verbalised_super_property\n
"},{"location":"deeponto/onto/verbalisation/#deeponto.onto.verbalisation.OntologyVerbaliser.verbalise_object_property_assertion_axiom","title":"verbalise_object_property_assertion_axiom(object_property_assertion_axiom)
","text":"Verbalise an object property assertion axiom.
The object property assertion axiom has the form \\(r(x, y)\\).
Parameters:
Name Type Description Defaultobject_property_assertion_axiom
OWLAxiom
The object property assertion axiom to be verbalised.
requiredReturns:
Type DescriptionTuple[CfgNode, CfgNode]
The verbalised object property \\(\\mathcal{V}(r)\\) and two individuals \\(\\mathcal{V}(x)\\) and \\(\\mathcal{V}(y)\\) (order matters).
Source code insrc/deeponto/onto/verbalisation.py
def verbalise_object_property_assertion_axiom(self, object_property_assertion_axiom: OWLAxiom):\nr\"\"\"Verbalise an object property assertion axiom.\n\n The object property assertion axiom has the form $r(x, y)$.\n\n Args:\n object_property_assertion_axiom (OWLAxiom): The object property assertion axiom to be verbalised.\n\n Returns:\n (Tuple[CfgNode, CfgNode]): The verbalised object property $\\mathcal{V}(r)$ and two individuals $\\mathcal{V}(x)$ and $\\mathcal{V}(y)$ (order matters).\n \"\"\"\n\n # input check\n self._axiom_input_check(object_property_assertion_axiom, \"ObjectPropertyAssertion\")\n\n # skip the root node\n parsed_object_property_assertion_axiom = self.parser.parse(object_property_assertion_axiom).children[0]\n parsed_obj_prop, parsed_indiv_x, parsed_indiv_y = parsed_object_property_assertion_axiom.children\n\n verbalised_object_property = self._verbalise_iri(parsed_obj_prop, is_property=True)\n verbalised_individual_x = self._verbalise_iri(parsed_indiv_x)\n verbalised_individual_y = self._verbalise_iri(parsed_indiv_y)\n return verbalised_object_property, verbalised_individual_x, verbalised_individual_y\n
"},{"location":"deeponto/onto/verbalisation/#deeponto.onto.verbalisation.OntologyVerbaliser.verbalise_object_property_domain_axiom","title":"verbalise_object_property_domain_axiom(object_property_domain_axiom)
","text":"Verbalise an object property domain axiom.
The domain of a property \\(r: X \\rightarrow Y\\) specifies the concept expression \\(X\\) of its subject.
Parameters:
Name Type Description Defaultobject_property_domain_axiom
OWLAxiom
The object property domain axiom to be verbalised.
requiredReturns:
Type DescriptionTuple[CfgNode, CfgNode]
The verbalised object property \\(\\mathcal{V}(r)\\) and its domain \\(\\mathcal{V}(X)\\) (order matters).
Source code insrc/deeponto/onto/verbalisation.py
def verbalise_object_property_domain_axiom(self, object_property_domain_axiom: OWLAxiom):\nr\"\"\"Verbalise an object property domain axiom.\n\n The domain of a property $r: X \\rightarrow Y$ specifies the concept expression $X$ of its subject.\n\n Args:\n object_property_domain_axiom (OWLAxiom): The object property domain axiom to be verbalised.\n\n Returns:\n (Tuple[CfgNode, CfgNode]): The verbalised object property $\\mathcal{V}(r)$ and its domain $\\mathcal{V}(X)$ (order matters).\n \"\"\"\n\n # input check\n self._axiom_input_check(object_property_domain_axiom, \"ObjectPropertyDomain\")\n\n # skip the root node\n parsed_object_property_domain_axiom = self.parser.parse(object_property_domain_axiom).children[0]\n parsed_obj_prop, parsed_obj_prop_domain = parsed_object_property_domain_axiom.children\n\n verbalised_object_property = self._verbalise_iri(parsed_obj_prop, is_property=True)\n verbalised_object_property_domain = self.verbalise_class_expression(parsed_obj_prop_domain)\n\n return verbalised_object_property, verbalised_object_property_domain\n
"},{"location":"deeponto/onto/verbalisation/#deeponto.onto.verbalisation.OntologyVerbaliser.verbalise_object_property_range_axiom","title":"verbalise_object_property_range_axiom(object_property_range_axiom)
","text":"Verbalise an object property range axiom.
The range of a property \\(r: X \\rightarrow Y\\) specifies the concept expression \\(Y\\) of its object.
Parameters:
Name Type Description Defaultobject_property_range_axiom
OWLAxiom
The object property range axiom to be verbalised.
requiredReturns:
Type DescriptionTuple[CfgNode, CfgNode]
The verbalised object property \\(\\mathcal{V}(r)\\) and its range \\(\\mathcal{V}(Y)\\) (order matters).
Source code insrc/deeponto/onto/verbalisation.py
def verbalise_object_property_range_axiom(self, object_property_range_axiom: OWLAxiom):\nr\"\"\"Verbalise an object property range axiom.\n\n The range of a property $r: X \\rightarrow Y$ specifies the concept expression $Y$ of its object.\n\n Args:\n object_property_range_axiom (OWLAxiom): The object property range axiom to be verbalised.\n\n Returns:\n (Tuple[CfgNode, CfgNode]): The verbalised object property $\\mathcal{V}(r)$ and its range $\\mathcal{V}(Y)$ (order matters).\n \"\"\"\n\n # input check\n self._axiom_input_check(object_property_range_axiom, \"ObjectPropertyRange\")\n\n # skip the root node\n parsed_object_property_range_axiom = self.parser.parse(object_property_range_axiom).children[0]\n parsed_obj_prop, parsed_obj_prop_range = parsed_object_property_range_axiom.children\n\n verbalised_object_property = self._verbalise_iri(parsed_obj_prop, is_property=True)\n verbalised_object_property_range = self.verbalise_class_expression(parsed_obj_prop_range)\n\n return verbalised_object_property, verbalised_object_property_range\n
"},{"location":"deeponto/onto/verbalisation/#deeponto.onto.verbalisation.OntologySyntaxParser","title":"OntologySyntaxParser()
","text":"A syntax parser for the OWL logical expressions, e.g., OWLAxiom
and OWLClassExpression
.
It makes use of the string representation (based on Manchester Syntax) defined in the OWLAPI. In Python, such string can be accessed by simply using str(some_owl_object)
.
To keep the Java import in the main Ontology
class, this parser does not deal with OWLAxiom
directly but instead its string representation.
Due to the OWLObject
syntax, this parser relies on two components:
RangeNode
).As a result, it will return a RangeNode
that specifies the sub-formulas (and their respective positions in the string representation) in a tree structure.
Examples:
Suppose the input is an OWLAxiom
that has the string representation:
>>> str(owl_axiom)\n>>> 'EquivalentClasses(<http://purl.obolibrary.org/obo/FOODON_00001707> ObjectIntersectionOf(<http://purl.obolibrary.org/obo/FOODON_00002044> ObjectSomeValuesFrom(<http://purl.obolibrary.org/obo/RO_0001000> <http://purl.obolibrary.org/obo/FOODON_03412116>)) )'\n
This corresponds to the following logical expression:
\\[ CephalopodFoodProduct \\equiv MolluskFoodProduct \\sqcap \\exists derivesFrom.Cephalopod \\]After apply the parser, a RangeNode
will be returned, which can be rendered as:
axiom_parser = OntologySyntaxParser()\nprint(axiom_parser.parse(str(owl_axiom)).render_tree())\n
Output:
Root@[0:inf]\n\u2514\u2500\u2500 EQV@[0:212]\n \u251c\u2500\u2500 FOODON_00001707@[6:54]\n \u2514\u2500\u2500 AND@[55:210]\n \u251c\u2500\u2500 FOODON_00002044@[61:109]\n \u2514\u2500\u2500 EX.@[110:209]\n \u251c\u2500\u2500 RO_0001000@[116:159]\n \u2514\u2500\u2500 FOODON_03412116@[160:208]\n
Or, if graphviz
(installed by e.g., sudo apt install graphviz
) is available, you can visualise the tree as an image by:
axiom_parser.parse(str(owl_axiom)).render_image()\n
Output:
The name for each node has the form {node_type}@[{start}:{end}]
, which means a node of the type {node_type}
is located at the range [{start}:{end}]
in the abbreviated expression (see abbreviate_owl_expression
below).
The leaf nodes are IRIs and they are represented by the last segment (split by \"/\"
) of the whole IRI.
Child nodes can be accessed by .children
, the string representation of the sub-formula in this node can be accessed by .text
. For example:
parser.parse(str(owl_axiom)).children[0].children[1].text\n
Output:
'[AND](<http://purl.obolibrary.org/obo/FOODON_00002044> [EX.](<http://purl.obolibrary.org/obo/RO_0001000> <http://purl.obolibrary.org/obo/FOODON_03412116>))'\n
Source code in src/deeponto/onto/verbalisation.py
def __init__(self):\n pass\n
"},{"location":"deeponto/onto/verbalisation/#deeponto.onto.verbalisation.OntologySyntaxParser.abbreviate_owl_expression","title":"abbreviate_owl_expression(owl_expression)
","text":"Abbreviate the string representations of logical operators to a fixed length (easier for parsing).
The abbreviations are specified at deeponto.onto.verbalisation.ABBREVIATION_DICT
.
Parameters:
Name Type Description Defaultowl_expression
str
The string representation of an OWLObject
.
Returns:
Type Descriptionstr
The modified string representation of this OWLObject
where the logical operators are abbreviated.
src/deeponto/onto/verbalisation.py
def abbreviate_owl_expression(self, owl_expression: str):\nr\"\"\"Abbreviate the string representations of logical operators to a\n fixed length (easier for parsing).\n\n The abbreviations are specified at `deeponto.onto.verbalisation.ABBREVIATION_DICT`.\n\n Args:\n owl_expression (str): The string representation of an `OWLObject`.\n\n Returns:\n (str): The modified string representation of this `OWLObject` where the logical operators are abbreviated.\n \"\"\"\n for k, v in ABBREVIATION_DICT.items():\n owl_expression = owl_expression.replace(k, v)\n return owl_expression\n
"},{"location":"deeponto/onto/verbalisation/#deeponto.onto.verbalisation.OntologySyntaxParser.parse","title":"parse(owl_expression)
","text":"Parse an OWLAxiom
into a RangeNode
.
This is the main entry for using the parser, which relies on the parse_by_parentheses
method below.
Parameters:
Name Type Description Defaultowl_expression
Union[str, OWLObject]
The string representation of an OWLObject
or the OWLObject
itself.
Returns:
Type DescriptionRangeNode
A parsed syntactic tree given what parentheses to be matched.
Source code insrc/deeponto/onto/verbalisation.py
def parse(self, owl_expression: Union[str, OWLObject]) -> RangeNode:\nr\"\"\"Parse an `OWLAxiom` into a [`RangeNode`][deeponto.onto.verbalisation.RangeNode].\n\n This is the main entry for using the parser, which relies on the [`parse_by_parentheses`][deeponto.onto.verbalisation.OntologySyntaxParser.parse_by_parentheses]\n method below.\n\n Args:\n owl_expression (Union[str, OWLObject]): The string representation of an `OWLObject` or the `OWLObject` itself.\n\n Returns:\n (RangeNode): A parsed syntactic tree given what parentheses to be matched.\n \"\"\"\n if not isinstance(owl_expression, str):\n owl_expression = str(owl_expression)\n owl_expression = self.abbreviate_owl_expression(owl_expression)\n # print(\"To parse the following (transformed) axiom text:\\n\", owl_expression)\n # parse complex patterns first\n cur_parsed = self.parse_by_parentheses(owl_expression)\n # parse the IRI patterns latter\n return self.parse_by_parentheses(owl_expression, cur_parsed, for_iri=True)\n
"},{"location":"deeponto/onto/verbalisation/#deeponto.onto.verbalisation.OntologySyntaxParser.parse_by_parentheses","title":"parse_by_parentheses(owl_expression, already_parsed=None, for_iri=False)
classmethod
","text":"Parse an OWLAxiom
based on parentheses matching into a RangeNode
.
This function needs to be applied twice to get a fully parsed RangeNode
because IRIs have a different parenthesis pattern.
Parameters:
Name Type Description Defaultowl_expression
str
The string representation of an OWLObject
.
already_parsed
RangeNode
A partially parsed RangeNode
to continue with. Defaults to None
.
None
for_iri
bool
Parentheses are by default ()
but will be changed to <>
for IRIs. Defaults to False
.
False
Raises:
Type DescriptionRuntimeError
Raised when the input axiom text is not properly formatted.
Returns:
Type DescriptionRangeNode
A parsed syntactic tree given the parentheses to be matched.
Source code insrc/deeponto/onto/verbalisation.py
@classmethod\ndef parse_by_parentheses(\n cls, owl_expression: str, already_parsed: RangeNode = None, for_iri: bool = False\n) -> RangeNode:\nr\"\"\"Parse an `OWLAxiom` based on parentheses matching into a [`RangeNode`][deeponto.onto.verbalisation.RangeNode].\n\n This function needs to be applied twice to get a fully parsed [`RangeNode`][deeponto.onto.verbalisation.RangeNode] because IRIs have\n a different parenthesis pattern.\n\n Args:\n owl_expression (str): The string representation of an `OWLObject`.\n already_parsed (RangeNode, optional): A partially parsed [`RangeNode`][deeponto.onto.verbalisation.RangeNode] to continue with. Defaults to `None`.\n for_iri (bool, optional): Parentheses are by default `()` but will be changed to `<>` for IRIs. Defaults to `False`.\n\n Raises:\n RuntimeError: Raised when the input axiom text is nor properly formatted.\n\n Returns:\n (RangeNode): A parsed syntactic tree given what parentheses to be matched.\n \"\"\"\n if not already_parsed:\n # a root node that covers the entire sentence\n parsed = RangeNode(0, math.inf, name=f\"Root\", text=owl_expression, is_iri=False)\n else:\n parsed = already_parsed\n stack = []\n left_par = \"(\"\n right_par = \")\"\n if for_iri:\n left_par = \"<\"\n right_par = \">\"\n\n for i, c in enumerate(owl_expression):\n if c == left_par:\n stack.append(i)\n if c == right_par:\n try:\n start = stack.pop()\n end = i\n if not for_iri:\n # the first character is actually \"[\"\n real_start = start - 5\n axiom_type = owl_expression[real_start + 1 : start - 1]\n node = RangeNode(\n real_start,\n end + 1,\n name=f\"{axiom_type}\",\n text=owl_expression[real_start : end + 1],\n is_iri=False,\n )\n parsed.insert_child(node)\n else:\n # no preceding characters for just atomic class (IRI)\n abbr_iri = owl_expression[start : end + 1].split(\"/\")[-1].rstrip(\">\")\n node = RangeNode(\n start, end + 1, name=abbr_iri, text=owl_expression[start : end + 1], is_iri=True\n )\n parsed.insert_child(node)\n except IndexError:\n print(\"Too many closing parentheses\")\n\n if stack: # check if stack is empty afterwards\n raise RuntimeError(\"Too many opening parentheses\")\n\n return parsed\n
"},{"location":"deeponto/onto/verbalisation/#deeponto.onto.verbalisation.RangeNode","title":"RangeNode(start, end, name=None, **kwargs)
","text":" Bases: NodeMixin
A tree implementation for ranges (without partial overlap).
[1, 10]
is a parent of [2, 5]
.[2, 4]
and [3, 5]
cannot appear in the same RangeNodeTree
.src/deeponto/onto/verbalisation.py
def __init__(self, start, end, name=None, **kwargs):\n if start >= end:\n raise RuntimeError(\"invalid start and end positions ...\")\n self.start = start\n self.end = end\n self.name = \"Root\" if not name else name\n self.name = f\"{self.name}@[{self.start}:{self.end}]\" # add start and ent to the name\n for k, v in kwargs.items():\n setattr(self, k, v)\n super().__init__()\n
"},{"location":"deeponto/onto/verbalisation/#deeponto.onto.verbalisation.RangeNode.__gt__","title":"__gt__(other)
","text":"Compare two ranges if they have a different start
and/or a different end
.
\"irrelevant\"
: if range \\(R_1\\) and range \\(R_2\\) have no overlap.Warning
Partial overlap is not allowed.
Source code insrc/deeponto/onto/verbalisation.py
def __gt__(self, other: RangeNode):\nr\"\"\"Compare two ranges if they have a different `start` and/or a different `end`.\n\n - $R_1 \\lt R_2$: if range $R_1$ is completely contained in range $R_2$, and $R_1 \\neq R_2$.\n - $R_1 \\gt R_2$: if range $R_2$ is completely contained in range $R_1$, and $R_1 \\neq R_2$.\n - `\"irrelevant\"`: if range $R_1$ and range $R_2$ have no overlap.\n\n !!! warning\n\n Partial overlap is not allowed.\n \"\"\"\n # ranges inside\n if self.start <= other.start and other.end <= self.end:\n return True\n\n # ranges outside\n if other.start <= self.start and self.end <= other.end:\n return False\n\n if other.end < self.start or self.end < other.start:\n return \"irrelevant\"\n\n raise RuntimeError(\"Compared ranges have a partial overlap.\")\n
"},{"location":"deeponto/onto/verbalisation/#deeponto.onto.verbalisation.RangeNode.sort_by_start","title":"sort_by_start(nodes)
staticmethod
","text":"A sorting function that sorts the nodes by their starting positions.
Source code insrc/deeponto/onto/verbalisation.py
@staticmethod\ndef sort_by_start(nodes: List[RangeNode]):\n\"\"\"A sorting function that sorts the nodes by their starting positions.\"\"\"\n temp = {sib: sib.start for sib in nodes}\n return list(dict(sorted(temp.items(), key=lambda item: item[1])).keys())\n
"},{"location":"deeponto/onto/verbalisation/#deeponto.onto.verbalisation.RangeNode.insert_child","title":"insert_child(node)
","text":"Inserting a child RangeNode
.
Child nodes have a smaller (inclusive) range, e.g., [2, 5]
is a child of [1, 6]
.
src/deeponto/onto/verbalisation.py
def insert_child(self, node: RangeNode):\nr\"\"\"Inserting a child [`RangeNode`][deeponto.onto.verbalisation.RangeNode].\n\n Child nodes have a smaller (inclusive) range, e.g., `[2, 5]` is a child of `[1, 6]`.\n \"\"\"\n if node > self:\n raise RuntimeError(\"invalid child node\")\n if node.start == self.start and node.end == self.end:\n # duplicated node\n return\n # print(self.children)\n if self.children:\n inserted = False\n for ch in self.children:\n if (node < ch) is True:\n # print(\"further down\")\n ch.insert_child(node)\n inserted = True\n break\n elif (node > ch) is True:\n # print(\"insert in between\")\n ch.parent = node\n # NOTE: should not break here as it could be parent of multiple children !\n # break\n # NOTE: the equal case is when two nodes are exactly the same, no operation needed\n if not inserted:\n self.children = list(self.children) + [node]\n self.children = self.sort_by_start(self.children)\n else:\n node.parent = self\n self.children = [node]\n
"},{"location":"deeponto/onto/verbalisation/#deeponto.onto.verbalisation.RangeNode.render_tree","title":"render_tree()
","text":"Render the whole tree.
Source code insrc/deeponto/onto/verbalisation.py
def render_tree(self):\n\"\"\"Render the whole tree.\"\"\"\n return RenderTree(self)\n
"},{"location":"deeponto/onto/verbalisation/#deeponto.onto.verbalisation.RangeNode.render_image","title":"render_image()
","text":"Calling this function will generate a temporary range_node.png
file which will be displayed.
To make this visualisation work, you need to install graphviz
by, e.g.,
sudo apt install graphviz\n
Source code in src/deeponto/onto/verbalisation.py
def render_image(self):\n\"\"\"Calling this function will generate a temporary `range_node.png` file\n which will be displayed.\n\n To make this visualisation work, you need to install `graphviz` by, e.g.,\n\n ```bash\n sudo apt install graphviz\n ```\n \"\"\"\n RenderTreeGraph(self).to_picture(\"range_node.png\")\n return Image(\"range_node.png\")\n
"},{"location":"deeponto/utils/data_utils/","title":"Data Utilities","text":""},{"location":"deeponto/utils/data_utils/#deeponto.utils.data_utils.set_seed","title":"set_seed(seed)
","text":"Set seed function imported from transformers.
Source code insrc/deeponto/utils/data_utils.py
def set_seed(seed):\n\"\"\"Set seed function imported from transformers.\"\"\"\n t_set_seed(seed)\n
"},{"location":"deeponto/utils/data_utils/#deeponto.utils.data_utils.sort_dict_by_values","title":"sort_dict_by_values(dic, desc=True, k=None)
","text":"Return a sorted dict by values with first k reserved if provided.
Source code insrc/deeponto/utils/data_utils.py
def sort_dict_by_values(dic: dict, desc: bool = True, k: Optional[int] = None):\n\"\"\"Return a sorted dict by values with first k reserved if provided.\"\"\"\n sorted_items = list(sorted(dic.items(), key=lambda item: item[1], reverse=desc))\n return dict(sorted_items[:k])\n
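For illustration (the score dictionary is made up):
from deeponto.utils.data_utils import sort_dict_by_values\n\nscores = {\"b\": 0.9, \"a\": 0.2, \"c\": 0.5}\nsort_dict_by_values(scores)                   # {'b': 0.9, 'c': 0.5, 'a': 0.2}\nsort_dict_by_values(scores, k=2)              # {'b': 0.9, 'c': 0.5}\nsort_dict_by_values(scores, desc=False, k=1)  # {'a': 0.2}\n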
"},{"location":"deeponto/utils/data_utils/#deeponto.utils.data_utils.uniqify","title":"uniqify(ls)
","text":"Return a list of unique elements without messing around the order
Source code insrc/deeponto/utils/data_utils.py
def uniqify(ls):\n\"\"\"Return a list of unique elements without messing around the order\"\"\"\n non_empty_ls = list(filter(lambda x: x != \"\", ls))\n return list(dict.fromkeys(non_empty_ls))\n
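For example:
from deeponto.utils.data_utils import uniqify\n\nuniqify([\"b\", \"a\", \"b\", \"\", \"c\", \"a\"])  # ['b', 'a', 'c']\n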
"},{"location":"deeponto/utils/data_utils/#deeponto.utils.data_utils.print_dict","title":"print_dict(dic)
","text":"Pretty print a dictionary.
Source code insrc/deeponto/utils/data_utils.py
def print_dict(dic: dict):\n\"\"\"Pretty print a dictionary.\"\"\"\n pretty_print = json.dumps(dic, indent=4, separators=(\",\", \": \"))\n # print(pretty_print)\n return pretty_print\n
"},{"location":"deeponto/utils/decorators/","title":"Decorators","text":""},{"location":"deeponto/utils/decorators/#deeponto.utils.decorators.timer","title":"timer(function)
","text":"Print the runtime of the decorated function.
Source code insrc/deeponto/utils/decorators.py
def timer(function):\n\"\"\"Print the runtime of the decorated function.\"\"\"\n\n @wraps(function)\n def wrapper_timer(*args, **kwargs):\n start_time = time.perf_counter() # 1\n value = function(*args, **kwargs)\n end_time = time.perf_counter() # 2\n run_time = end_time - start_time # 3\n print(f\"Finished {function.__name__!r} in {run_time:.4f} secs.\")\n return value\n\n return wrapper_timer\n
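A quick illustration with a made-up function (the measured time in the printed message will vary):
import time\n\nfrom deeponto.utils.decorators import timer\n\n@timer\ndef slow_add(x, y):\n    time.sleep(1)\n    return x + y\n\nslow_add(1, 2)  # prints e.g.: Finished 'slow_add' in 1.0012 secs.\n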
"},{"location":"deeponto/utils/decorators/#deeponto.utils.decorators.debug","title":"debug(function)
","text":"Print the function signature and return value.
Source code insrc/deeponto/utils/decorators.py
def debug(function):\n\"\"\"Print the function signature and return value.\"\"\"\n\n @wraps(function)\n def wrapper_debug(*args, **kwargs):\n args_repr = [repr(a) for a in args]\n kwargs_repr = [f\"{k}={v!r}\" for k, v in kwargs.items()]\n signature = \", \".join(args_repr + kwargs_repr)\n print(f\"Calling {function.__name__}({signature})\")\n value = function(*args, **kwargs)\n print(f\"{function.__name__!r} returned {value!r}.\")\n return value\n\n return wrapper_debug\n
"},{"location":"deeponto/utils/decorators/#deeponto.utils.decorators.paper","title":"paper(title, link)
","text":"Add paper tagger for methods.
Source code insrc/deeponto/utils/decorators.py
def paper(title: str, link: str):\n\"\"\"Add paper tagger for methods.\"\"\"\n # Define a new decorator, named \"decorator\", to return\n def decorator(func):\n # Ensure the decorated function keeps its metadata\n @wraps(func)\n def wrapper(*args, **kwargs):\n # Call the function being decorated and return the result\n return func(*args, **kwargs)\n\n wrapper.paper_title = f'This method is associated with tha paper of title: \"{title}\".'\n wrapper.paper_link = f\"This method is associated with the paper with link: {link}.\"\n return wrapper\n\n # Return the new decorator\n return decorator\n
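A sketch of how the attached metadata can be accessed (the function name, paper title, and link are placeholders):
from deeponto.utils.decorators import paper\n\n@paper(\"An Illustrative Paper Title\", \"https://example.org/paper\")\ndef some_matching_method():\n    pass\n\nprint(some_matching_method.paper_title)\nprint(some_matching_method.paper_link)\n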
"},{"location":"deeponto/utils/file_utils/","title":"File Utilities","text":""},{"location":"deeponto/utils/file_utils/#deeponto.utils.file_utils.create_path","title":"create_path(path)
","text":"Create a path recursively.
Source code insrc/deeponto/utils/file_utils.py
def create_path(path: str):\n\"\"\"Create a path recursively.\"\"\"\n Path(path).mkdir(parents=True, exist_ok=True)\n
"},{"location":"deeponto/utils/file_utils/#deeponto.utils.file_utils.save_file","title":"save_file(obj, save_path, sort_keys=False)
","text":"Save an object to a certain format.
Source code insrc/deeponto/utils/file_utils.py
def save_file(obj, save_path: str, sort_keys: bool = False):\n\"\"\"Save an object to a certain format.\"\"\"\n if save_path.endswith(\".json\"):\n with open(save_path, \"w\") as output:\n json.dump(obj, output, indent=4, separators=(\",\", \": \"), sort_keys=sort_keys)\n elif save_path.endswith(\".pkl\"):\n with open(save_path, \"wb\") as output:\n pickle.dump(obj, output, -1)\n elif save_path.endswith(\".yaml\"):\n with open(save_path, \"w\") as output:\n yaml.dump(obj, output, default_flow_style=False, allow_unicode=True)\n else:\n raise RuntimeError(f\"Unsupported saving format: {save_path}\")\n
"},{"location":"deeponto/utils/file_utils/#deeponto.utils.file_utils.load_file","title":"load_file(save_path)
","text":"Load an object of a certain format.
Source code insrc/deeponto/utils/file_utils.py
def load_file(save_path: str):\n\"\"\"Load an object of a certain format.\"\"\"\n if save_path.endswith(\".json\"):\n with open(save_path, \"r\") as input:\n return json.load(input)\n elif save_path.endswith(\".pkl\"):\n with open(save_path, \"rb\") as input:\n return pickle.load(input)\n elif save_path.endswith(\".yaml\"):\n with open(save_path, \"r\") as input:\n return yaml.safe_load(input)\n else:\n raise RuntimeError(f\"Unsupported loading format: {save_path}\")\n
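A round-trip sketch exercising save_file and load_file (the dictionary and file names are made up):
from deeponto.utils.file_utils import load_file, save_file\n\nconfig = {\"model\": \"bertmap\", \"epochs\": 3}\nsave_file(config, \"config.json\")\nassert load_file(\"config.json\") == config\nsave_file(config, \"config.yaml\")  # .pkl and .yaml work the same way\n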
"},{"location":"deeponto/utils/file_utils/#deeponto.utils.file_utils.copy2","title":"copy2(source, destination)
","text":"Copy a file from source to destination.
Source code insrc/deeponto/utils/file_utils.py
def copy2(source: str, destination: str):\n\"\"\"Copy a file from source to destination.\"\"\"\n try:\n shutil.copy2(source, destination)\n print(f\"copied successfully FROM {source} TO {destination}\")\n except shutil.SameFileError:\n print(f\"same file exists at {destination}\")\n
"},{"location":"deeponto/utils/file_utils/#deeponto.utils.file_utils.read_table","title":"read_table(table_file_path)
","text":"Read csv
or tsv
file as pandas dataframe without treating \"NULL\"
, \"null\"
, and \"n/a\"
as an empty string.
src/deeponto/utils/file_utils.py
def read_table(table_file_path: str):\nr\"\"\"Read `csv` or `tsv` file as pandas dataframe without treating `\"NULL\"`, `\"null\"`, and `\"n/a\"` as an empty string.\"\"\"\n # TODO: this might change with the version of pandas\n na_vals = pd.io.parsers.readers.STR_NA_VALUES.difference({\"NULL\", \"null\", \"n/a\"})\n sep = \"\\t\" if table_file_path.endswith(\".tsv\") else \",\"\n return pd.read_csv(table_file_path, sep=sep, na_values=na_vals, keep_default_na=False)\n
"},{"location":"deeponto/utils/file_utils/#deeponto.utils.file_utils.read_jsonl","title":"read_jsonl(file_path)
","text":"Read .jsonl
file (list of json) introduced in the BLINK project.
src/deeponto/utils/file_utils.py
def read_jsonl(file_path: str):\n\"\"\"Read `.jsonl` file (list of json) introduced in the BLINK project.\"\"\"\n results = []\n key_set = []\n with open(file_path, \"r\", encoding=\"utf-8-sig\") as f:\n lines = f.readlines()\n for line in lines:\n record = json.loads(line)\n results.append(record)\n key_set += list(record.keys())\n print(f\"all available keys: {set(key_set)}\")\n return results\n
"},{"location":"deeponto/utils/file_utils/#deeponto.utils.file_utils.read_oaei_mappings","title":"read_oaei_mappings(rdf_file)
","text":"To read mapping files in the OAEI rdf format.
Source code insrc/deeponto/utils/file_utils.py
def read_oaei_mappings(rdf_file: str):\n\"\"\"To read mapping files in the OAEI rdf format.\"\"\"\n xml_root = ET.parse(rdf_file).getroot()\n ref_mappings = [] # where relation is \"=\"\n ignored_mappings = [] # where relation is \"?\"\n\n for elem in xml_root.iter():\n # every Cell contains a mapping of en1 -rel(some value)-> en2\n if \"Cell\" in elem.tag:\n en1, en2, rel, measure = None, None, None, None\n for sub_elem in elem:\n if \"entity1\" in sub_elem.tag:\n en1 = list(sub_elem.attrib.values())[0]\n elif \"entity2\" in sub_elem.tag:\n en2 = list(sub_elem.attrib.values())[0]\n elif \"relation\" in sub_elem.tag:\n rel = sub_elem.text\n elif \"measure\" in sub_elem.tag:\n measure = sub_elem.text\n row = (en1, en2, measure)\n # =: equivalent; > superset of; < subset of.\n if rel == \"=\" or rel == \">\" or rel == \"<\":\n # rel.replace(\">\", \">\").replace(\"<\", \"<\")\n ref_mappings.append(row)\n elif rel == \"?\":\n ignored_mappings.append(row)\n else:\n print(\"Unknown Relation Warning: \", rel)\n\n print('#Maps (\"=\"):', len(ref_mappings))\n print('#Maps (\"?\"):', len(ignored_mappings))\n\n return ref_mappings, ignored_mappings\n
"},{"location":"deeponto/utils/file_utils/#deeponto.utils.file_utils.run_jar","title":"run_jar(jar_command, timeout=3600)
","text":"Run jar command using subprocess.
Source code insrc/deeponto/utils/file_utils.py
def run_jar(jar_command: str, timeout=3600):\n\"\"\"Run jar command using subprocess.\"\"\"\n print(f\"Run jar command with timeout: {timeout}s.\")\n proc = subprocess.Popen(jar_command.split(\" \"))\n try:\n _, _ = proc.communicate(timeout=timeout)\n except subprocess.TimeoutExpired:\n warnings.warn(\"kill the jar process as timed out\")\n proc.kill()\n _, _ = proc.communicate()\n
"},{"location":"deeponto/utils/logging/","title":"Logging","text":""},{"location":"deeponto/utils/logging/#deeponto.utils.logging.RuntimeFormatter","title":"RuntimeFormatter(*args, **kwargs)
","text":" Bases: logging.Formatter
Auxiliary class for runtime formatting in the logger.
Source code insrc/deeponto/utils/logging.py
def __init__(self, *args, **kwargs):\n super().__init__(*args, **kwargs)\n self.start_time = time.time()\n
"},{"location":"deeponto/utils/logging/#deeponto.utils.logging.RuntimeFormatter.formatTime","title":"formatTime(record, datefmt=None)
","text":"Record relative runtime in hr:min:sec format\u3002
Source code insrc/deeponto/utils/logging.py
def formatTime(self, record, datefmt=None):\n\"\"\"Record relative runtime in hr:min:sec format.\"\"\"\n duration = datetime.datetime.utcfromtimestamp(record.created - self.start_time)\n elapsed = duration.strftime(\"%H:%M:%S\")\n return \"{}\".format(elapsed)\n
"},{"location":"deeponto/utils/logging/#deeponto.utils.logging.create_logger","title":"create_logger(model_name, saved_path)
","text":"Create logger for both console info and saved info.
The pre-existed log file will be cleared before writing into new messages.
Source code insrc/deeponto/utils/logging.py
def create_logger(model_name: str, saved_path: str):\n\"\"\"Create logger for both console info and saved info.\n\n The pre-existed log file will be cleared before writing into new messages.\n \"\"\"\n logger = logging.getLogger(model_name)\n logger.setLevel(logging.DEBUG)\n # create file handler which logs even debug messages\n fh = logging.FileHandler(f\"{saved_path}/{model_name}.log\", mode=\"w\") # \"w\" means clear the log file before writing\n fh.setLevel(logging.DEBUG)\n # create console handler with a higher log level\n ch = logging.StreamHandler()\n ch.setLevel(logging.INFO)\n # create formatter and add it to the handlers\n formatter = RuntimeFormatter(\"[Time: %(asctime)s] - [PID: %(process)d] - [Model: %(name)s] \\n%(message)s\")\n fh.setFormatter(formatter)\n ch.setFormatter(formatter)\n # add the handlers to the logger\n logger.addHandler(fh)\n logger.addHandler(ch)\n logger.propagate = False\n return logger\n
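For example (the model name and directory are illustrative; the directory must already exist, and the PID shown is arbitrary):
from deeponto.utils.logging import create_logger\n\nlogger = create_logger(\"bertmap\", \"./experiments\")\nlogger.info(\"Training started ...\")\n# both the console and ./experiments/bertmap.log receive:\n# [Time: 00:00:00] - [PID: 12345] - [Model: bertmap]\n# Training started ...\n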
"},{"location":"deeponto/utils/logging/#deeponto.utils.logging.banner_message","title":"banner_message(message, sym='^')
","text":"Print a banner message surrounded by special symbols.
Source code insrc/deeponto/utils/logging.py
def banner_message(message: str, sym=\"^\"):\n\"\"\"Print a banner message surrounded by special symbols.\"\"\"\n print()\n message = message.upper()\n banner_len = len(message) + 4\n message = \" \" * ((banner_len - len(message)) // 2) + message\n message = message + \" \" * (banner_len - len(message))\n print(message)\n print(sym * banner_len)\n print()\n
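For example:
from deeponto.utils.logging import banner_message\n\nbanner_message(\"evaluation results\")\n#\n#   EVALUATION RESULTS\n# ^^^^^^^^^^^^^^^^^^^^^^\n#\n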
"},{"location":"deeponto/utils/text_utils/","title":"Text Utilities","text":""},{"location":"deeponto/utils/text_utils/#deeponto.utils.text_utils.Tokenizer","title":"Tokenizer(tokenizer_type)
","text":"A Tokenizer class for both sub-word (pre-trained) and word (rule-based) level tokenization.
Source code insrc/deeponto/utils/text_utils.py
def __init__(self, tokenizer_type: str):\n self.type = tokenizer_type\n self._tokenizer = None # hidden tokenizer\n self.tokenize = None # the tokenization method\n
"},{"location":"deeponto/utils/text_utils/#deeponto.utils.text_utils.Tokenizer.from_pretrained","title":"from_pretrained(pretrained_path='bert-base-uncased')
classmethod
","text":"(Based on transformers) Load a sub-word level tokenizer from pre-trained model.
Source code insrc/deeponto/utils/text_utils.py
@classmethod\ndef from_pretrained(cls, pretrained_path: str = \"bert-base-uncased\"):\n\"\"\"(Based on **transformers**) Load a sub-word level tokenizer from pre-trained model.\"\"\"\n instance = cls(\"pre-trained\")\n instance._tokenizer = AutoTokenizer.from_pretrained(pretrained_path)\n instance.tokenize = instance._tokenizer.tokenize\n return instance\n
"},{"location":"deeponto/utils/text_utils/#deeponto.utils.text_utils.Tokenizer.from_rule_based","title":"from_rule_based()
classmethod
","text":"(Based on spacy) Load a word-level (rule-based) tokenizer.
Source code insrc/deeponto/utils/text_utils.py
@classmethod\ndef from_rule_based(cls):\n\"\"\"(Based on **spacy**) Load a word-level (rule-based) tokenizer.\"\"\"\n spacy.prefer_gpu()\n instance = cls(\"rule-based\")\n instance._tokenizer = English()\n instance.tokenize = lambda texts: [word.text for word in instance._tokenizer(texts).doc]\n return instance\n
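A usage sketch of both constructors (the example text is arbitrary; the sub-word output depends on the model's vocabulary):
from deeponto.utils.text_utils import Tokenizer\n\nsubword_tokenizer = Tokenizer.from_pretrained(\"bert-base-uncased\")\nsubword_tokenizer.tokenize(\"ontology alignment\")  # sub-word tokens, e.g. ['ontology', 'alignment'] or smaller pieces\n\nword_tokenizer = Tokenizer.from_rule_based()\nword_tokenizer.tokenize(\"ontology alignment\")  # ['ontology', 'alignment']\n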
"},{"location":"deeponto/utils/text_utils/#deeponto.utils.text_utils.InvertedIndex","title":"InvertedIndex(index, tokenizer)
","text":"Inverted index built from a text index.
Attributes:
Name Type Descriptiontokenizer
Tokenizer
A tokenizer instance to be used.
original_index
defaultdict
A dictionary where the values are text strings to be tokenized.
constructed_index
defaultdict
A dictionary that acts as the inverted index of original_index
.
src/deeponto/utils/text_utils.py
def __init__(self, index: defaultdict, tokenizer: Tokenizer):\n self.tokenizer = tokenizer\n self.original_index = index\n self.constructed_index = defaultdict(list)\n for k, v in self.original_index.items():\n # value is a list of strings\n for token in self.tokenizer(v):\n self.constructed_index[token].append(k)\n
"},{"location":"deeponto/utils/text_utils/#deeponto.utils.text_utils.InvertedIndex.idf_select","title":"idf_select(texts, pool_size=200)
","text":"Given a list of tokens, select a set candidates based on the inverted document frequency (idf) scores.
We use idf
instead of tf
because labels have different lengths and thus tf is not a fair measure.
src/deeponto/utils/text_utils.py
def idf_select(self, texts: Union[str, List[str]], pool_size: int = 200):\n\"\"\"Given a list of tokens, select a set candidates based on the inverted document frequency (idf) scores.\n\n We use `idf` instead of `tf` because labels have different lengths and thus tf is not a fair measure.\n \"\"\"\n candidate_pool = defaultdict(lambda: 0)\n # D := number of \"documents\", i.e., number of \"keys\" in the original index\n D = len(self.original_index)\n for token in self.tokenizer(texts):\n # each token is associated with some classes\n potential_candidates = self.constructed_index[token]\n if not potential_candidates:\n continue\n # We use idf instead of tf because the text for each class is of different length, tf is not a fair measure\n # inverse document frequency: with more classes to have the current token tk, the score decreases\n idf = math.log10(D / len(potential_candidates))\n for candidate in potential_candidates:\n # each candidate class is scored by sum(idf)\n candidate_pool[candidate] += idf\n candidate_pool = list(sorted(candidate_pool.items(), key=lambda item: item[1], reverse=True))\n # print(f\"Select {min(len(candidate_pool), pool_size)} candidates.\")\n # select the first K ranked\n return candidate_pool[:pool_size]\n
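A rough sketch of building an inverted index over class labels and selecting candidates (the index keys and labels are made up; candidates are ranked by summed idf scores):
from collections import defaultdict\n\nfrom deeponto.utils.text_utils import InvertedIndex, Tokenizer\n\nannotation_index = defaultdict(list)\nannotation_index[\"ex:Heart\"] = [\"heart\"]\nannotation_index[\"ex:HeartValve\"] = [\"heart valve\"]\nannotation_index[\"ex:Lung\"] = [\"lung\"]\n\ninv_index = InvertedIndex(annotation_index, Tokenizer.from_rule_based())\ninv_index.idf_select(\"heart valve\", pool_size=2)\n# e.g., [('ex:HeartValve', ...), ('ex:Heart', ...)]\n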
"},{"location":"deeponto/utils/text_utils/#deeponto.utils.text_utils.process_annotation_literal","title":"process_annotation_literal(annotation_literal, apply_lowercasing=False, normalise_identifiers=False)
","text":"Pre-process an annotation literal string.
Parameters:
Name Type Description Defaultannotation_literal
str
A literal string of an entity's annotation.
requiredapply_lowercasing
bool
A boolean that determines lowercasing or not. Defaults to False
.
False
normalise_identifiers
bool
Whether to normalise annotation text that is in the Java identifier format. Defaults to False
.
False
Returns:
Type Descriptionstr
the processed annotation literal string.
Source code insrc/deeponto/utils/text_utils.py
def process_annotation_literal(\n annotation_literal: str, apply_lowercasing: bool = False, normalise_identifiers: bool = False\n):\n\"\"\"Pre-process an annotation literal string.\n\n Args:\n annotation_literal (str): A literal string of an entity's annotation.\n apply_lowercasing (bool): A boolean that determines lowercasing or not. Defaults to `False`.\n normalise_identifiers (bool): Whether to normalise annotation text that is in the Java identifier format. Defaults to `False`.\n\n Returns:\n (str): the processed annotation literal string.\n \"\"\"\n\n # replace the underscores with spaces\n annotation_literal = annotation_literal.replace(\"_\", \" \")\n\n # if the annotation literal is a valid identifier with first letter capitalised\n # we suspect that it could be a Java style identifier that needs to be split\n if normalise_identifiers and annotation_literal[0].isupper() and annotation_literal.isidentifier():\n annotation_literal = split_java_identifier(annotation_literal)\n\n # lowercase the annotation literal if specfied\n if apply_lowercasing:\n annotation_literal = annotation_literal.lower()\n\n return annotation_literal\n
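For example (the input strings are arbitrary):
from deeponto.utils.text_utils import process_annotation_literal\n\nprocess_annotation_literal(\"Heart_Valve\")  # 'Heart Valve'\nprocess_annotation_literal(\"HeartValve\", normalise_identifiers=True)  # 'Heart Valve'\nprocess_annotation_literal(\"Heart_Valve\", apply_lowercasing=True)  # 'heart valve'\n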
"},{"location":"deeponto/utils/text_utils/#deeponto.utils.text_utils.split_java_identifier","title":"split_java_identifier(java_style_identifier)
","text":"Split words in java's identifier style into natural language phrase.
Examples:
\"SuperNaturalPower\"
\\(\\rightarrow\\) \"Super Natural Power\"
\"APIReference\"
\\(\\rightarrow\\) \"API Reference\"
\"Covid19\"
\\(\\rightarrow\\) \"Covid 19\"
src/deeponto/utils/text_utils.py
def split_java_identifier(java_style_identifier: str):\nr\"\"\"Split words in java's identifier style into natural language phrase.\n\n Examples:\n - `\"SuperNaturalPower\"` $\\rightarrow$ `\"Super Natural Power\"`\n - `\"APIReference\"` $\\rightarrow$ `\"API Reference\"`\n - `\"Covid19\"` $\\rightarrow$ `\"Covid 19\"`\n \"\"\"\n # split at every capital letter or number (numbers are treated as capital letters)\n raw_words = re.findall(\"([0-9A-Z][a-z]*)\", java_style_identifier)\n words = []\n capitalized_word = \"\"\n for i, w in enumerate(raw_words):\n # the above regex pattern will split at capitals\n # so the capitalized words are split into characters\n # i.e., (len(w) == 1)\n if len(w) == 1:\n capitalized_word += w\n # edge case for the last word\n if i == len(raw_words) - 1:\n words.append(capitalized_word)\n\n # if the the current w is a full word, save the previous\n # cached capitalized_word and also save current full word\n elif capitalized_word:\n words.append(capitalized_word)\n words.append(w)\n capitalized_word = \"\"\n\n # just save the current full word otherwise\n else:\n words.append(w)\n\n return \" \".join(words)\n
"}]}
\ No newline at end of file
diff --git a/sitemap.xml.gz b/sitemap.xml.gz
index b4800fc1..15d2ff76 100644
Binary files a/sitemap.xml.gz and b/sitemap.xml.gz differ