Code to reproduce experiments in:
@article{BelhdImprovinGarda2024,
archiveprefix = {arXiv},
author = {Garda, Samuele and Leser, Ulf},
eprint = {2401.05125v1},
month = {Jan},
primaryclass = {cs.CL},
title = {BELHD: Improving Biomedical Entity Linking with Homonym Disambiguation},
url = {http://arxiv.org/abs/2401.05125v1},
year = {2024},
}
Install the belb library in your Python environment:
git clone https://github.com/sg-wbi/belb
cd belb
pip install -e .
Then install the additional requirements specific to BELHD:
(belhd) user $ pip install -r requirements.txt
We store the predictions of all models and the gold labels in the data directory.
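As a rough illustration of the kind of number the evaluation scripts report, here is a hypothetical top-1 accuracy computation; the mention ids, identifiers and data layout below are made up and do not reflect the actual contents of the data directory.

def top1_accuracy(predictions: dict[str, str], gold: dict[str, set[str]]) -> float:
    # predictions: mention id -> top-ranked KB identifier
    # gold: mention id -> set of acceptable KB identifiers
    correct = sum(1 for m, pred in predictions.items() if pred in gold.get(m, set()))
    return correct / len(predictions) if predictions else 0.0

# Toy usage with made-up mention ids and KB identifiers:
preds = {"m1": "MESH:D003924", "m2": "NCBIGene:7157"}
gold = {"m1": {"MESH:D003924"}, "m2": {"NCBIGene:1017"}}
print(top1_accuracy(preds, gold))  # 0.5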
Below you find the commands to reproduce all tables reported in the paper.
Reproduce the results on BELB.
Main table:
(belhd) user $ python -m scripts.evaluate
BELHD ablations:
(belhd) user $ python -m scripts.evaluate_ablations
Ad-hoc solutions for homonyms. Abbreviation resolution:
(belhd) user $ python -m scripts.evaluate_ar
and species assignment:
(belhd) user $ python -m scripts.evaluate_sa
Evaluation on BioRED:
(belhd) user $ python -m biored.evaluate
If you wish to use our code with BELB, you first need to follow the belb instructions to set up a directory with all the data (corpora and KBs).
To create KB versions with disambiguated homonyms:
(belhd) user $ python -m scripts.disambiguate_kbs --dir /path/to/belb/dir
Note that belb deals with large KBs and its code is not optimized.
This step takes quite a while, especially for NCBI Gene.
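As a conceptual sketch of what disambiguating homonyms in a KB means (this is not the belb implementation; the attribute used for disambiguation and the naming format are assumptions), the idea is to make names shared by several identifiers unique by attaching extra context:

from collections import defaultdict

# Hypothetical KB entries: the name "CDK2" is a homonym shared by two identifiers.
entries = [
    {"id": "NCBIGene:12566", "name": "CDK2", "species": "Mus musculus"},
    {"id": "NCBIGene:1017", "name": "CDK2", "species": "Homo sapiens"},
    {"id": "NCBIGene:1026", "name": "p21", "species": "Homo sapiens"},
]

by_name = defaultdict(list)
for e in entries:
    by_name[e["name"]].append(e)

disambiguated = []
for name, group in by_name.items():
    for e in group:
        # Only names shared by more than one identifier get the extra context.
        new_name = f"{name} ({e['species']})" if len(group) > 1 else name
        disambiguated.append({"id": e["id"], "name": new_name})

for e in disambiguated:
    print(e["id"], "->", e["name"])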
To train BELHD you first need to convert the BELB data into the required input format.
Edit data/configs/data.yaml:
belb_dir : 'path/to/belb/directory'
exp_dir : 'path/to/experiments/directory'
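As an optional sanity check before the next step (a sketch assuming PyYAML is available in the environment), you can load the config and verify that both directories exist:

from pathlib import Path
import yaml

cfg = yaml.safe_load(Path("data/configs/data.yaml").read_text())
for key in ("belb_dir", "exp_dir"):
    path = Path(cfg[key])
    print(f"{key}: {path} -> {'ok' if path.is_dir() else 'missing'}")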
Prepare data with:
(belhd) user $ python -m scripts.tokenize_corpora
and
(belhd) user $ python -m scripts.tokenize_dkbs
Then you can use the helper scripts bin/train.sh to train the models and bin/predict.sh to obtain the predictions for each corpus.
To reproduce the ablation experiments, run the scripts bin/train_ablations.sh and bin/predict_ablations.sh.
You first need to train BELHD without HD (homonym disambiguation) and with abbreviation resolution (bin/train_nohd.sh) and obtain the predictions (bin/predict_nohd.sh).
For this you need to create a version of the data with abbreviation resolution:
(belhd) user $ python -m scripts.tokenize_corpora abbres=true
Similarly, you need to rerun the baselines with abbreviation resolution.
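For context, abbreviation resolution expands document-local short forms to their long forms before linking. The following toy sketch only illustrates the idea; it is not the tool used in this pipeline, and the abbreviation map is hypothetical (in practice it is extracted from each document):

import re

text = "Patients received nitroglycerin (NTG). NTG infusion was stopped."
abbreviations = {"NTG": "nitroglycerin"}  # short form -> long form

for short, long in abbreviations.items():
    # Replace every whole-word occurrence of the short form; a real tool would
    # also avoid rewriting the defining "long (short)" occurrence.
    text = re.sub(rf"\b{re.escape(short)}\b", long, text)

print(text)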
Gene corpora with species assignment are stored in ./data/belb/species_assign
(see SpeciesAssignment.md for details).
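As a conceptual sketch of species assignment (not the approach documented in SpeciesAssignment.md; the candidate records and detected species below are hypothetical), the idea is to restrict the candidates of a gene mention to those matching a species found in the document:

# Hypothetical candidate KB entries for the gene mention "CDK2".
candidates = [
    {"id": "NCBIGene:1017", "name": "CDK2", "species": "Homo sapiens"},
    {"id": "NCBIGene:12566", "name": "CDK2", "species": "Mus musculus"},
]
document_species = {"Homo sapiens"}  # e.g. output of a species tagger

filtered = [c for c in candidates if c["species"] in document_species]
print([c["id"] for c in filtered])  # ['NCBIGene:1017']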
For each baseline we use the original code. We provide detailed instructions on how to run them in separate files:
- BioSyn: ./baselines/biosyn/README.md
- GenBioEL: ./baselines/genbioel/README.md
- arboEL: ./baselines/arboel/README.md