This repository holds the code necessary to reproduce the results from the paper "Large-Scale Evaluation of Keyphrase Extraction Models", accepted at JCDL 2020.
The table below reports the F-score at the top 10 extracted keyphrases (F@10) for each model on each dataset.
model | PubMed | ACM | SemEval-2010 | Inspec | WWW | KP20k | DUC-2001 | 500N-KPCrowd | KPTimes | NYTime |
---|---|---|---|---|---|---|---|---|---|---|
FirstPhrases | 15.4 | 13.6 | 13.8 | 29.3 | 10.2 | 13.5 | 24.6 | 17.1 | 11.4 | 9.2 |
TextRank | 1.8 | 2.5 | 3.5 | 35.8 | 8.4 | 10.2 | 21.5 | 7.1 | 2.8 | 2.7 |
TfIdf | 16.7 | 12.1 | 17.7 | 36.5 | 9.3 | 11.5 | 23.3 | 16.9 | 12.4 | 9.6 |
PositionRank | 4.9 | 5.7 | 6.8 | 34.2 | 11.6 | 14.1 | 28.6 | 13.4 | 10.4 | 8.5 |
MultipartiteRank | 15.8 | 11.6 | 14.3 | 30.5 | 10.8 | 13.6 | 25.6 | 18.2 | 14.0 | 11.2 |
EmbedRank | 3.7 | 2.1 | 2.5 | 35.6 | 10.7 | 12.4 | 29.5 | 12.4 | 4.7 | 3.1 |
Kea | 18.6 | 14.2 | 19.5 | 34.5 | 11.0 | 14.0 | 26.5 | 17.3 | 13.8 | 11.0 |
CopyRNN | 24.2 | 24.4 | 20.3 | 28.2 | 22.2 | 25.5 | 12.7 | 15.5 | 14.9 | 11.0 |
CopyCorrRNN | 20.8 | 21.1 | 19.4 | 27.9 | 19.9 | 22.0 | 17.0 | 11.5 | 11.9 | 9.7 |
CopyRNN_News | 11.6 | 5.1 | 7.0 | 9.2 | 6.3 | 6.6 | 10.5 | 8.4 | 31.9 | 39.3 |
CopyCorrRNN_News | n/a | n/a | n/a | n/a | n/a | n/a | 10.5 | 7.8 | 19.8 | 20.5 |
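
For readers unfamiliar with the metric, here is a minimal sketch of how an F@10 score can be computed for a single document. It assumes exact string matching between already-stemmed keyphrases and divides precision by the number of candidates actually returned; the repository's `evaluation/eval.py` may make different choices (for example dividing by 10, or averaging over documents in a particular way). The function name `f_at_k` and the sample keyphrases are hypothetical.

```python
def f_at_k(candidates, references, k=10):
    """Return (precision, recall, f-score) for the top-k candidates.

    Assumption: exact matching on already-stemmed, lowercased keyphrases.
    """
    top_k = candidates[:k]
    reference_set = set(references)
    # Count distinct top-k candidates that also appear in the references.
    matches = len(set(top_k) & reference_set)
    precision = matches / len(top_k) if top_k else 0.0
    recall = matches / len(reference_set) if reference_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# Hypothetical document: 4 predicted keyphrases, 3 reference keyphrases.
predicted = ["keyphras extract", "neural network", "benchmark", "dataset"]
gold = ["keyphras extract", "benchmark", "evalu"]
p, r, f = f_at_k(predicted, gold)
print(f"P={p:.2f} R={r:.2f} F={f:.2f}")  # P=0.50 R=0.67 F=0.57
```

The values in the table are on a 0-100 scale, so a per-document score like the one above would presumably be averaged over a dataset and multiplied by 100.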
- pke
  - Install with
    python3 -m pip install git+https://github.com/boudinfl/pke
  - To execute EmbedRank you will need sent2vec_wiki_bigrams (16 GB!), downloadable from epfml/sent2vec
  - To execute CopyRNN and CopyCorrRNN you will need the pretrained CopyRNN and CorrRNN models
- ake-datasets
  - Clone with
    git clone https://github.com/boudinfl/ake-datasets
  - Define environment variable
    export PATH_AKE_DATASET=PATH/TO/ake-datasets
- You will need Stanford CoreNLP
  - Define environment variable
    export PATH_CORENLP=PATH/TO/stanford-corenlp-full-...
- Preprocess datasets by running _preprocess.sh for each dataset (this can take a while for large datasets)
  - KP20k and KPTimes are downloaded automatically when running _preprocess.sh, but you can start downloading now with these links:
    - KP20k (214MB)
    - KPTimes Test (30MB)
    - KPTimes Valid (19MB)
    - KPTimes Train (474MB)
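
Before launching the preprocessing, it can help to confirm that the environment variables from the list above are set and point to real directories. The sketch below is not part of the repository; `check_env` is a hypothetical helper. It checks both `PATH_AKE_DATASET` and `PATH_AKE_DATASETS` because this README uses both spellings and it is not clear from the text alone which one the scripts read.

```python
import os


def check_env(names=("PATH_AKE_DATASET", "PATH_AKE_DATASETS", "PATH_CORENLP"),
              environ=None):
    """Map each variable name to 'unset', 'not a directory' or 'ok'."""
    environ = os.environ if environ is None else environ
    status = {}
    for name in names:
        value = environ.get(name)
        if not value:
            status[name] = "unset"
        elif not os.path.isdir(value):
            status[name] = "not a directory"
        else:
            status[name] = "ok"
    return status


if __name__ == "__main__":
    for name, state in check_env().items():
        print(f"{name}: {state}")
```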
To run keyphrase extraction models on each dataset:
bash _benchmarks.sh
The output will be stored in output/DATASET/DATASET.MODEL(.stem)?.json.
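
The naming pattern above can be read back programmatically, for example to list which models have already been run. The following is a minimal sketch, assuming that neither dataset nor model names contain a dot; `parse_output_path` is a hypothetical helper, and the `Inspec`/`TfIdf` names in the example are taken from the results table rather than from actual output directories.

```python
import re
from pathlib import PurePosixPath

# Mirrors the file name part of output/DATASET/DATASET.MODEL(.stem)?.json
OUTPUT_NAME = re.compile(
    r"^(?P<dataset>[^.]+)\.(?P<model>[^.]+)(?P<stem>\.stem)?\.json$"
)


def parse_output_path(path):
    """Return (dataset, model, is_stemmed), or None if the path does not match."""
    p = PurePosixPath(path)
    match = OUTPUT_NAME.match(p.name)
    # The directory name must repeat the dataset name used in the file name.
    if match is None or match["dataset"] != p.parent.name:
        return None
    return match["dataset"], match["model"], match["stem"] is not None


print(parse_output_path("output/Inspec/Inspec.TfIdf.stem.json"))
# ('Inspec', 'TfIdf', True)
```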
You can change which models are executed by editing the corresponding params/DATASET.json file.
Evaluate one specific output:
python3 evaluation/eval.py -i output/DATASET/DATASET.MODEL.stem.json -r $PATH_AKE_DATASETS/datasets/DATASET/references/REF_TYPE.test.stem.json
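
When evaluating several outputs one at a time, the command above can be assembled from its parts. This is only a sketch: `eval_command` is a hypothetical helper, it covers the stemmed case shown above, and it reads `PATH_AKE_DATASETS` as spelled in that command. Valid values for `REF_TYPE` depend on the dataset and are not listed here.

```python
import os


def eval_command(dataset, model, ref_type, datasets_root=None):
    """Build the argument list for evaluation/eval.py on one stemmed output."""
    if datasets_root is None:
        # Assumption: the variable name used in the command above.
        datasets_root = os.environ["PATH_AKE_DATASETS"]
    output = f"output/{dataset}/{dataset}.{model}.stem.json"
    reference = (
        f"{datasets_root}/datasets/{dataset}/references/"
        f"{ref_type}.test.stem.json"
    )
    return ["python3", "evaluation/eval.py", "-i", output, "-r", reference]
```

The returned list can be passed to `subprocess.run(..., check=True)` from the repository root.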
Evaluate all outputs and create a .csv file holding all scores:
python3 evaluation/evaluate_all.py -v output scores.csv
Running python3 evaluation/make_tables.py scores.csv will output a table (like the one in this README).
Large-Scale Evaluation of Keyphrase Extraction Models. [arXiv, code] Ygor Gallina, Florian Boudin, Béatrice Daille. Joint Conference on Digital Libraries (JCDL), 2020.