We present a tool capable of (1) extracting question-worthy answers from paragraphs and (2) generating questions from both paragraphs and answers.
Question Generation (QG) aims to automatically generate questions from a given input. This input usually has two components: the source text and the word(s) that make up the question's answer. QG systems are typically developed on datasets of paragraph-answer pairs. In real-world systems, however, pre-identified answers are rarely available. We therefore propose a system that extracts question-worthy answers from paragraphs and generates questions from paragraph-answer pairs. Formally, we define four components (also called agents). The first three (answer extractors) identify question-worthy answers in the paragraphs. The last one (generator) generates a question for each paragraph-answer pair.
- Question-Worthy Answer Extraction using:
  - Named Entity Recognition
  - Keywords using BERT
  - Clauses
- Question Generation with T5 model
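The way the four agents fit together can be sketched as follows. This is a minimal illustration, not the project's actual code: the names `extract_answers`, `run_pipeline`, and `QAPair` are hypothetical, and the regex-based extractors and the placeholder generator are toy stand-ins for the NER, keyword, and clause extractors and for the T5 model.

```python
import re
from dataclasses import dataclass
from typing import Callable, Iterable, List

# An answer extractor maps a paragraph to candidate answer strings.
AnswerExtractor = Callable[[str], Iterable[str]]
# A question generator maps (paragraph, answer) to a question string.
QuestionGenerator = Callable[[str, str], str]


@dataclass(frozen=True)
class QAPair:
    """Hypothetical container for one extracted answer and its question."""
    answer: str
    question: str


def extract_answers(paragraph: str, extractors: List[AnswerExtractor]) -> List[str]:
    """Run every extractor and merge the results in order,
    dropping empty strings and case-insensitive duplicates."""
    seen = set()
    merged = []
    for extractor in extractors:
        for answer in extractor(paragraph):
            cleaned = answer.strip()
            key = cleaned.lower()
            if key and key not in seen:
                seen.add(key)
                merged.append(cleaned)
    return merged


def run_pipeline(
    paragraph: str,
    extractors: List[AnswerExtractor],
    generator: QuestionGenerator,
) -> List[QAPair]:
    """Extract candidate answers, then generate one question per answer."""
    answers = extract_answers(paragraph, extractors)
    return [QAPair(a, generator(paragraph, a)) for a in answers]


# --- Toy stand-ins (assumptions, NOT the real agents) ---

def capitalized_spans(paragraph: str) -> List[str]:
    """Crude stand-in for NER: runs of capitalized words."""
    return re.findall(r"[A-Z][a-z]+(?:\s[A-Z][a-z]+)*", paragraph)


def four_digit_numbers(paragraph: str) -> List[str]:
    """Crude stand-in for a second extractor: four-digit numbers."""
    return re.findall(r"\b\d{4}\b", paragraph)


def placeholder_generator(paragraph: str, answer: str) -> str:
    """Stand-in for the T5 generator: returns a fixed template."""
    return f"Which question is answered by '{answer}'?"


if __name__ == "__main__":
    text = "Marie Curie won the Nobel Prize in 1903."
    for pair in run_pipeline(text, [capitalized_spans, four_digit_numbers], placeholder_generator):
        print(pair.answer, "->", pair.question)
```

Treating each agent as a plain callable keeps the extractors interchangeable: a real extractor only needs to accept a paragraph and return candidate answer strings.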
Python 3 (tested with version 3.9.13 on Windows 10)
- Install the Python packages from requirements.txt. If you use a virtual environment for Python package management, you can install all required packages with the following command:
pip install -r requirements.txt
- Install spacy-clausie (as indicated on the authors' page)
git clone https://github.com/mmxgn/spacy-clausie.git
cd spacy-clausie
python setup.py build
python setup.py install [--user]
python setup.py test  # optional
- Download this T5 model checkpoint (for question generation) and save it in the models_checkpoints folder (you can create it in the project root)
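After completing the steps above, a quick sanity check can confirm that the setup is in place. The sketch below is an assumption-laden helper, not part of the project: the import names `stanza`, `keybert`, `claucy` (for spacy-clausie), and `transformers` (for T5) are guesses at what the code imports, and since the checkpoint's file name is not stated here, the helper simply lists whatever files sit in models_checkpoints.

```python
from importlib.util import find_spec
from pathlib import Path
from typing import List

# Assumed top-level import names; adjust to match requirements.txt.
ASSUMED_MODULES = ["stanza", "keybert", "claucy", "transformers"]


def missing_modules(names: List[str]) -> List[str]:
    """Return the top-level module names that cannot be found."""
    return [name for name in names if find_spec(name) is None]


def checkpoint_files(project_root: Path, folder: str = "models_checkpoints") -> List[Path]:
    """List files in the checkpoint folder; empty list if the folder is absent."""
    ckpt_dir = project_root / folder
    if not ckpt_dir.is_dir():
        return []
    return sorted(p for p in ckpt_dir.iterdir() if p.is_file())


if __name__ == "__main__":
    # Run from the project root.
    root = Path(".")
    print("Missing modules:", missing_modules(ASSUMED_MODULES) or "none")
    print("Checkpoint files:", [p.name for p in checkpoint_files(root)] or "none found")
```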
- Web Application
cd web_app/
python app.py
- Command Line (to be done)
To ask questions, report issues or request features, please use the GitHub Issue Tracker.
Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement". Don't forget to give the project a star! Thanks in advance!
- Fork the Project
- Create your Feature Branch (git checkout -b feature/AmazingFeature)
- Commit your Changes (git commit -m 'Add some AmazingFeature')
- Push to the Branch (git push origin feature/AmazingFeature)
- Open a Pull Request
This project is released under the General Public License Version 3.0 (or later). For details, please see the file LICENSE in the root directory.
Additionally, this project includes third-party software components: stanza, KeyBERT, spacy-clausie, and the T5 model. Each of these components has its own license. Please see stanza/license, KeyBERT/license, spacy-clausie/license, and T5, respectively.
A commercial license may also be available for use in industrial projects, collaborations or distributors of proprietary software that do not wish to use an open-source license. Please contact the author if you are interested.
If you use this software in your research/job/work, please kindly cite our project:
@software{ae_and_qg,
author = {Bernardo Leite},
title = {Answer Extraction and Question Generation},
year = {2022},
version = {1.0.0},
url = {https://github.com/bernardoleite/answer-extraction-and-qg},
}
Also consider citing the third party software components (see on their respective pages): stanza, KeyBERT, spacy-clausie and T5.
Bernardo Leite, [email protected]