Skip to content

Repository containing code and results from the Large Scale Evaluation of Keyphrase Extraction Models, published in JCDL 2020.

License

Notifications You must be signed in to change notification settings

ygorg/JCDL_2020_KPE_Eval

Folders and files

NameName
Last commit message
Last commit date

Latest commit

3d5ca46 · Mar 23, 2021

History

3 Commits
Jul 3, 2020
Jul 3, 2020
May 28, 2020
Mar 23, 2021
Jul 3, 2020
Jul 3, 2020
Jul 3, 2020

Repository files navigation

Large-Scale Evaluation of Keyphrase Extraction Models

This repository holds the code necessary to reproduce results from the paper "Large-Scale Evaluation of Keyphrase Extraction Models" accepted at JCDL2020.

This table shows the f-score @ top 10 (F@10).

model PubMed ACM SemEval-2010 Inspec WWW KP20k DUC-2001 500N-KPCrowd KPTimes NYTime
FirstPhrases 15.4 13.6 13.8 29.3 10.2 13.5 24.6 17.1 11.4 9.2
TextRank 1.8 2.5 3.5 35.8 8.4 10.2 21.5 7.1 2.8 2.7
TfIdf 16.7 12.1 17.7 36.5 9.3 11.5 23.3 16.9 12.4 9.6
PositionRank 4.9 5.7 6.8 34.2 11.6 14.1 28.6 13.4 10.4 8.5
MultipartiteRank 15.8 11.6 14.3 30.5 10.8 13.6 25.6 18.2 14.0 11.2
EmbedRank 3.7 2.1 2.5 35.6 10.7 12.4 29.5 12.4 4.7 3.1
Kea 18.6 14.2 19.5 34.5 11.0 14.0 26.5 17.3 13.8 11.0
CopyRNN 24.2 24.4 20.3 28.2 22.2 25.5 12.7 15.5 14.9 11.0
CopyCorrRNN 20.8 21.1 19.4 27.9 19.9 22.0 17.0 11.5 11.9 9.7
CopyRNN_News 11.6 5.1 7.0 9.2 6.3 6.6 10.5 8.4 31.9 39.3
CopyCorrRNN_News n/a n/a n/a n/a n/a n/a 10.5 7.8 19.8 20.5

Requirements

  • pke
  • ake-datasets
    • Clone with git clone https://github.com/boudinfl/ake-datasets
    • Define environment variable export PATH_AKE_DATASET=PATH/TO/ake-datasets
    • You will need Stanford CoreNLP
    • Define environment variable export PATH_CORENLP=PATH/TO/stanford-corenlp-full-...
    • Preprocess datasets by running _preprocess.sh for each dataset (this can take a while for large dataset)
    • KP20k and KPTimes are downloaded automatically when running _preprocess.sh but you can start downloading now with these links:

Running models

To run keyphrase extraction models on each dataset:

bash _benchmarks.sh

The output will be stored in output/DATASET/DATASET.MODEL(.stem)?.json. You can change which models are executed by editing corresponding params/DATASET.json file.

Evaluating

Evaluate one specific output:

python3 evaluation/eval.py -i output/DATASET/DATASET.MODEL.stem.json -r $PATH_AKE_DATASETS/datasets/DATASET/references/REF_TYPE.test.stem.json

Evaluate all outputs and create a .csv holding all scores:

python3 evaluation/evaluate_all.py -v output scores.csv

Using python3 evaluation/make_tables.py scores.csv will output a table (like the one in this README).

Citing this paper

Large-Scale Evaluation of Keyphrase Extraction Models. [arXiv, code] Ygor Gallina, Florian Boudin, Béatrice Daille. Joint Conference on Digital Libraries (JCDL), 2020.

About

Repository containing code and results from the Large Scale Evaluation of Keyphrase Extraction Models, published in JCDL 2020.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published