musti-eval-baselines

Setup

Follow the steps in Repository Setup in volta. Except cloning original volta repository, use git clone [email protected]:Odeuropa/musti-eval-baselines.git.

Test Data Preparation and Downloading Images

To prepare test file on a specific language use:

python download_images.py --split --lang en --filepath data/musti_data/musti-train.json

To include all languages in test file, use:

python download_images.py --no-split

Feature Extraction

To install Feature Extractor follow the steps in here.

conda activate py-bottomup

cd volta/features_extraction/datasets/flickr30k
python flickr30k_boxes36_h5-proposal.py --root /musti-eval-baselines/data/image_data/images --outdir /musti-eval-baselines/data/image_data/features

cd ../..
python h5_to_lmdb.py --h5 /musti-eval-baselines/data/image_data/features/musti_boxes36.h5 --lmdb /musti-eval-baselines/data/image_data/features/lmdb

conda deactivate

Models

Find links to download pretrained models from this link. Place baselines folder that contains models under musti-eval-baselines folder. Model configuration files are stored in volta/config/.

Evaluate data

Make sure that config_test_task.yml has the right test file path as val_annotations_jsonpath.

conda activate volta

cd test_scripts
bash test_vilbert_finetuned.sh

cd ..
python eval_result.py --test_file data/test_files/musti-train-en.jsonl --logit_file results/vilbert/pretrained/musti-train-en-logits.txt

conda deactivate

Data

	Positive pairs (Train)	Total pairs (Train)	Positive pairs (Test)	Total pairs (Test)
English	198 (24.90%)	795	-	200
German	95 (19.79%)	480	-	213
French	102 (34.00%)	300	-	200
Italian	198 (24.78%)	799	-	201

Table 1: MUSTI train and test set data statistics. The class distribution of the test data is kept confidential.

Results

Majority of data in each language is the negative class. Therefore we present the dummy baseline results where the baseline classifies all pairs as negative. The Overall score is the F1-macro on all predictions on all test data.

	English	German	French	Italian	Overall
dummy-baseline	.4285	.4289	.3333	.4273	.4075
mUNITER	.4269	.4289	.3551	.4398	.4177
mUNITER-SNLI	.4474	.4644	.3605	.5020	.4473
mUNITER-MUSTI	.6965	.4579	.5022	.6535	.6011
mUNITER-SNLI-MUSTI	.7482	.5014	.5053	.6850	.6176
Shao et al.	.7867	.4568	.3743	.7501	.6033
ViLBERT-SNLI-MUSTI	.8024	-	-	-	-

Table 2: Multilingual models and ViLBERT-SNLI-MUSTI results on the MUSTI test set, given as F1-macro score. Best performing monolingual model Vilbert-SNLI-MUSTI added for reference.

ViLBERT	ViLBERT-SNLI	ViLBERT-MUSTI	ViLBERT-SNLI-MUSTI	Shao et al. (multilingual)
.4609	.4373	.7834	.8024	.7867

Table 3: ViLBERT results on the MUSTI English test set, given as F1-macro score. Best performing multilingual model on English Shao et al. added for reference.

Acknowledgement

This code is modified from volta which in turn relies on vilbert-multi-task and other repositories.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

musti-eval-baselines

Setup

Test Data Preparation and Downloading Images

Feature Extraction

Models

Evaluate data

Data

Results

Acknowledgement

About

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 37 Commits
test_scripts		test_scripts
volta		volta
.gitignore		.gitignore
README.md		README.md
config_test_task.yml		config_test_task.yml
download_images.py		download_images.py
eval_result.py		eval_result.py

Odeuropa/musti-eval-baselines

Folders and files

Latest commit

History

Repository files navigation

musti-eval-baselines

Setup

Test Data Preparation and Downloading Images

Feature Extraction

Models

Evaluate data

Data

Results

Acknowledgement

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages