This project is dedicated to the problem of Argument Mining and aims to reproduce the paper published by Haddadan et al. in 2019. The dataset comprises 39 speeches from US presidential debates from 1960 to 2016. The corpus sentences are labeled as claims, premises, or none of the above.
Like the authors, we solve two binary classification tasks: 1) classification of argumentative vs. non-argumentative sentences; and 2) classification of claims vs. premises.
The methodology of the project follows the original paper. We split the practical part into four experimental settings, each based on a different methodological approach:
- Setting 1: tf-idf word matrices for unigrams + linear SVM
- Setting 2: tf-idf word matrices for uni-, bi-, and trigrams and a set of engineered features + SVM with an RBF kernel
- Setting 3: fastText embeddings + LSTM
- Setting 4: tf-idf word matrices for uni-, bi-, and trigrams and a set of engineered features + feed-forward network
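As an illustration, Setting 1 corresponds roughly to the following scikit-learn pipeline; the hyperparameters below are placeholders, and the actual configuration lives in the notebooks:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Sketch of Setting 1: unigram tf-idf features fed into a linear SVM.
# Hyperparameters are illustrative; see the notebooks for the real setup.
setting_1 = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 1))),
    ("svm", LinearSVC(C=1.0)),
])

# sentences: list of corpus sentences; labels: e.g. 1 = argumentative, 0 = not
# setting_1.fit(train_sentences, train_labels)
# predictions = setting_1.predict(test_sentences)
```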
The project is carried out as part of the PM "Mining Opinions and Arguments", taught by Prof. Dr. Manfred Stede at the University of Potsdam, Winter Term 2021/22. Our final report can be found here.
Team: Chen Peng, Mariia Poiaganova, Milena Voskanyan
| Folder | Item | Description |
|---|---|---|
| 📁 code | 📁 experiments | Notebooks for 4 experimental settings × 2 tasks |
| | `linguistic_features.py` | Documented functions for feature engineering |
| | `preprocessing.py` | Code snippet for sentence preprocessing |
| | `requirements.txt` | Necessary Python frameworks and their versions |
| 📁 data | 📁 raw | Debate scripts along with the respective annotation files |
| | `sentence_db_candidate.csv` | Main dataset used |
| 📑 paper | | Final report |
| 📁 presentations | `dec-presentation.pdf` | Slides for the first presentation |
| | `feb-presentation.pdf` | Slides for the final presentation |
🔹 To reproduce the project, start by cloning the repository:
```
git clone https://github.com/pc852/Argument-Mining-Presidential-Debates.git
```
🔹 Next, we recommend running the experiments from a Jupyter Notebook environment. If you are new to it, a nice guideline can be found here.
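For example, assuming Jupyter is already installed, you can start it from the repository root (the notebooks live in `code/experiments`):

```
cd Argument-Mining-Presidential-Debates
jupyter notebook
```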
🔹 You will need Python version 3.7 or higher, and the following packages: `pandas`, `scikit-learn`, `nltk`, `keras`, `tensorflow`, `gensim`, `numpy`, `spacy`, `vadersentiment`. Our versions for them can be found here.
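The pinned versions are listed in `requirements.txt` (in the `code` folder), so assuming you are in the repository root, they can all be installed in one step:

```
pip install -r code/requirements.txt
```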
🔹 Each notebook is tailored to be run independently; however, you might need some additional preparation for certain experimental settings.
**For Settings 2 and 4 (with engineered features)**

To run the Part-of-Speech and NER feature-engineering functions, you need to download the spaCy model `en_core_web_sm` to your system.
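This is done with spaCy's standard model download command:

```
python -m spacy download en_core_web_sm
```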
**For Setting 3 (with the LSTM model)**

You should download the pre-trained fastText embeddings file `wiki-news-300d-1M.vec` from the fastText website and move it to the project folder.
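As a quick sanity check that the file is in place, the vectors can be loaded with `gensim` (one of the listed dependencies); the file is in plain-text word2vec format, though the exact loading code in the notebook may differ:

```python
from gensim.models import KeyedVectors

# Load the pre-trained fastText vectors (plain-text word2vec format).
# Assumes wiki-news-300d-1M.vec sits in the folder you run this from.
embeddings = KeyedVectors.load_word2vec_format("wiki-news-300d-1M.vec")

print(embeddings["debate"].shape)  # every word maps to a 300-dimensional vector
```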
⚪ The dataset is kept here, but for all the experiments you will only need this file.