This repository provides the data for the DSL-ML shared task held at VarDial 2024.
The DSL-ML task covers five language groups:
- BCMS (Bosnian/Croatian/Montenegrin/Serbian)
- EN (American and British English)
- ES (Argentinian and Peninsular Spanish)
- FR (Belgian, Canadian, French and Swiss French)
- PT (Brazilian and European Portuguese)
Each group has its own training, development and test dataset. Participants are expected to submit results for all five language groups, but partial submissions will also be accepted.
Each file contains two tab-separated columns: the first column holds the label(s) and the second column holds the text. When an instance has more than one label, the labels are separated by commas:
EN-GB,EN-US HAMAR, Norway -- Welcome to the Rink of Dreams. If you build it, they will come. Welcome to the very temporary home of the world's greatest collection of masters figure skaters.
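
The format described above can be read with a few lines of Python. The following is a minimal sketch, not an official loader: the function name `parse_dsl_ml_lines` is hypothetical, and it assumes each non-empty line has exactly one tab between the label column and the text. Splitting on the first tab before splitting the labels matters, because the text itself may contain commas.

```python
from typing import Iterable, List, Tuple


def parse_dsl_ml_lines(lines: Iterable[str]) -> List[Tuple[List[str], str]]:
    """Parse lines of the form '<label>[,<label>...]<TAB><text>'.

    Hypothetical helper for illustration; raises ValueError on a
    non-empty line that has no tab.
    """
    rows = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        # Split only on the first tab so tabs or commas in the text are kept.
        label_field, text = line.split("\t", 1)
        labels = [label.strip() for label in label_field.split(",")]
        rows.append((labels, text))
    return rows


# Shortened version of the example line, with an explicit tab separator.
sample = ["EN-GB,EN-US\tHAMAR, Norway -- Welcome to the Rink of Dreams.\n"]
rows = parse_dsl_ml_lines(sample)
print(rows)
```

To read a real file, pass an open file handle instead of a list, for example `parse_dsl_ml_lines(open(path, encoding="utf-8"))`, where `path` points to one of the training, development or test files.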