Skim-Attention: Learning to Focus via Document Layout
Laura Nguyen, Thomas Scialom, Jacopo Staiano, Benjamin Piwowarski. EMNLP 2021.
Skim-Attention is a new attention mechanism that takes advantage of the document layout by attending exclusively to the 2-dimensional position of the words. We propose two approaches to exploit Skim-Attention: i) Skimformer, wherein self-attention is replaced with Skim-Attention; and ii) SkimmingMask, where an attention mask is built from Skim-Attention and fed to a Transformer language model.
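The core idea can be written compactly: attention scores are computed from the layout (2-D position) embeddings only, and the resulting weights are used to mix the token representations. The PyTorch sketch below is illustrative only; the class and argument names are ours rather than the repository's, and the sizes simply mirror the flags used in the commands below.

import torch
import torch.nn as nn


class SkimAttention(nn.Module):
    """Minimal sketch: attention scores come from 2-D layout embeddings only."""

    def __init__(self, hidden_size=768, layout_size=768, num_heads=12, head_size=64):
        super().__init__()
        self.num_heads, self.head_size = num_heads, head_size
        # Queries and keys are projected from layout embeddings; values from token embeddings.
        self.q = nn.Linear(layout_size, num_heads * head_size)
        self.k = nn.Linear(layout_size, num_heads * head_size)
        self.v = nn.Linear(hidden_size, num_heads * head_size)

    def _split(self, x):
        b, t, _ = x.shape
        return x.view(b, t, self.num_heads, self.head_size).transpose(1, 2)

    def forward(self, token_emb, layout_emb):
        # token_emb: (batch, seq, hidden_size); layout_emb: (batch, seq, layout_size)
        q, k = self._split(self.q(layout_emb)), self._split(self.k(layout_emb))
        v = self._split(self.v(token_emb))
        scores = q @ k.transpose(-2, -1) / self.head_size ** 0.5  # layout-only attention
        probs = scores.softmax(dim=-1)
        context = (probs @ v).transpose(1, 2).flatten(2)          # (batch, seq, heads * head_size)
        return context, probs                                     # probs can also feed a SkimmingMask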
Set up the environment as follows:
$ conda create -n skim-attention python=3.8
$ conda activate skim-attention
$ git clone ssh://[email protected]:2222/research/doc-intelligence/skim-attention.git
$ cd skim-attention
$ pip install -r requirements.txt
$ pip install -e .
Pre-train Skimformer as follows:
$ srun python experiments/run_pretraining.py --output_dir path/to/output/dir \
--max_seq_length 512 \
--data_dir path/to/data/dir \
--model_type skimformer \
--hidden_layout_size 768 \
--skim_attention_head_size 64 \
--tokenizer_name bert-base-uncased \
--use_fast_tokenizer \
--contextualize_2d_positions \
--do_train \
--do_eval \
--do_predict \
--learning_rate 1e-4 \
--max_steps 10000 \
--weight_decay 0.01 \
--warmup_steps 100 \
--per_device_train_batch_size 8 \
--fp16 \
--gradient_accumulation_steps 2
To pre-train LongSkimformer, replace skimformer with longskimformer and provide a value for the attention_window parameter.
Skim-Attention is evaluated on document layout analysis as a downstream task, using a subset of the DocBank dataset.
To fine-tune Skimformer, run the following:
$ srun python experiments/run_layout_analysis.py --output_dir path/to/output/dir \
--max_seq_length 512 \
--data_dir path/to/data/dir \
--model_type skimformer \
--model_name_or_path path/to/pretrained/model \
--tokenizer_name bert-base-uncased \
--use_fast_tokenizer \
--do_train \
--do_eval \
--do_predict \
--num_train_epochs 10 \
--learning_rate 5e-5 \
--per_device_train_batch_size 8 \
--fp16 \
--gradient_accumulation_steps 2
To plug Skimformer's layout embeddings into BERT, run the following (a sketch of how the embeddings can be combined appears after the command):
$ srun python experiments/run_layout_analysis.py --output_dir path/to/output/dir \
--max_seq_length 512 \
--data_dir path/to/data/dir \
--model_type bertwithskimembed \
--skim_model_name_or_path path/to/pretrained/skimformer/model \
--core_model_type bert \
--core_model_name_or_path path/to/pretrained/bert/model \
--tokenizer_name bert-base-uncased \
--use_fast_tokenizer \
--do_train \
--do_eval \
--do_predict \
--num_train_epochs 10 \
--learning_rate 5e-5 \
--per_device_train_batch_size 8 \
--fp16 \
--gradient_accumulation_steps 2
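For intuition, one plausible way to inject a pre-trained Skimformer's layout embeddings into BERT is to project them into BERT's hidden space and add them to the word embeddings, LayoutLM-style. This is only a sketch under that assumption; the class name is ours and the exact wiring used by bertwithskimembed in this repository may differ.

import torch.nn as nn
from transformers import BertModel


class BertWithLayoutEmbeddings(nn.Module):
    """Sketch: add projected Skimformer layout embeddings to BERT's word embeddings."""

    def __init__(self, bert_name="bert-base-uncased", layout_size=768):
        super().__init__()
        self.bert = BertModel.from_pretrained(bert_name)
        self.layout_proj = nn.Linear(layout_size, self.bert.config.hidden_size)

    def forward(self, input_ids, attention_mask, layout_emb):
        # layout_emb: (batch, seq, layout_size), e.g. produced by a frozen, pre-trained Skimformer
        word_emb = self.bert.embeddings.word_embeddings(input_ids)
        inputs_embeds = word_emb + self.layout_proj(layout_emb)
        return self.bert(inputs_embeds=inputs_embeds, attention_mask=attention_mask)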
To plug SkimmingMask into BERT or LayoutLM, run the following (the sketch after the command illustrates how the top_k mask is built):
$ srun python experiments/run_layout_analysis.py --output_dir path/to/output/dir \
--max_seq_length 512 \
--data_dir path/to/data/dir \
--model_type skimmingmask \
--skim_model_name_or_path path/to/pretrained/skimformer/model \
--core_model_type <bert | layoutlm> \
--core_model_name_or_path path/to/pretrained/bert/or/layoutlm/model \
--tokenizer_name <bert-base-uncased | microsoft/layoutlm-base-uncased> \
--use_fast_tokenizer \
--top_k 128 \
--do_train \
--do_eval \
--do_predict \
--num_train_epochs 10 \
--learning_rate 5e-5 \
--per_device_train_batch_size 8 \
--fp16 \
--gradient_accumulation_steps 2
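The top_k flag controls how many positions each token is allowed to attend to in the core model: for every query token, only the keys with the highest skim-attention scores are kept. A minimal sketch of how such a mask could be built (the function name is ours, not the repository's):

import torch


def skimming_mask(skim_scores: torch.Tensor, top_k: int = 128) -> torch.Tensor:
    """Build a binary attention mask from skim-attention scores.

    skim_scores: (batch, seq, seq) scores, e.g. skim-attention probabilities averaged over heads.
    Returns a (batch, seq, seq) mask with 1 at the top_k keys for each query token, 0 elsewhere.
    """
    k = min(top_k, skim_scores.size(-1))
    top_idx = skim_scores.topk(k, dim=-1).indices  # (batch, seq, k)
    mask = torch.zeros_like(skim_scores)
    mask.scatter_(-1, top_idx, 1.0)                # keep only the highest-scoring keys
    return mask                                    # passed to BERT / LayoutLM as its attention mask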
@article{nguyen2021skimattention,
title={Skim-Attention: Learning to Focus via Document Layout},
author={Laura Nguyen and Thomas Scialom and Jacopo Staiano and Benjamin Piwowarski},
journal={arXiv preprint arXiv:2109.01078},
year={2021},
}