test suites #17

Open
arademaker opened this issue Jul 14, 2021 · 5 comments

Comments

@arademaker
Member

This topic was already partially discussed in
https://delphinqa.ling.washington.edu/t/matrix-mrs-test-suite/484

We have some wiki pages related to the MrsTestSuite ...TestSuite: a discussion page and pages that are actually data (translations). I believe they could be moved to a better place. The English version of the MRS test suite is also in the ERG repository http://svn.delph-in.net/erg/trunk/tsdb/gold/. In fact, the ERG repo contains both the MRS and the CSLI test suites, and @oepen explained the difference to me:

the MRS test suite is something that ann and dan cooked up over the course of five or so weeks while dan was visiting cambridge, 2001 or 2002, i would say. except for some reuse of Abrams and Browne, I doubt there is any overlap in actual sentences with what was originally called the HP test suite. the latter was created to explore variation in syntactic structures and lives on in the DELPH-IN universe under the name CSLI test suite (since around 1994). the MRS test suite, on the other hand, exemplifies basic semantic constructions. so, in my view it is misleading to say it was derived from the HP data, but dan was of course centrally involved in both efforts.

(hope that @oepen is fine with my quote above)

This is also related to LR-POR/PorGram#5, where we have started to use and care about these test suites for the development of the Portuguese (Brazilian) grammar.

  1. I wonder if we can organize this data better. Maybe we could create a separate repository to hold all the versions/translations instead of keeping them as wiki pages.
  2. The discussion page suggested using https://github.com/xigt/xigt to move from simple text files to something more informative that allows more annotations.
  3. What about other grammars? Do they also incorporate this data in their repositories? Using profiles?
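
To make point 1 a bit more concrete, here is a minimal sketch of how per-language plain-text files (one sentence per line) could be merged into records keyed by sentence number. Everything in it is an assumption for illustration: the function name `align_suites`, the record layout, and the idea that translations line up by position. The sample sentences are placeholders, not copied from the wiki pages, and the sketch does not use xigt.

```python
import json


def align_suites(suites):
    """Align per-language sentence lists by 1-based line position.

    `suites` maps a language code to the text of a hypothetical
    one-sentence-per-line file. Returns one record per sentence number.
    """
    # Split each file into non-empty, stripped lines.
    per_lang = {
        lang: [line.strip() for line in text.splitlines() if line.strip()]
        for lang, text in suites.items()
    }
    longest = max((len(sents) for sents in per_lang.values()), default=0)

    records = []
    for i in range(longest):
        # A language with fewer sentences is simply absent from the record.
        translations = {
            lang: sents[i] for lang, sents in per_lang.items() if i < len(sents)
        }
        records.append({"id": i + 1, "translations": translations})
    return records


# Placeholder content standing in for the real per-language files.
sample = {
    "en": "It rained.\nAbrams barked.\n",
    "pt": "Choveu.\n",
}

records = align_suites(sample)
print(json.dumps(records, ensure_ascii=False, indent=2))
```

Positional alignment only works if every translation keeps the same sentence order, so a real repository would probably want explicit sentence numbers in each file instead.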
@arademaker
Member Author

Does the note about the numbers at the end of https://github.com/delph-in/docs/wiki/MatrixMrsTestSuite apply only to the table on that same page? The page is confusing because it contains a table with EN and JA translations, but the JA sentences are also in https://github.com/delph-in/docs/wiki/MatrixMrsTestSuiteJa, and this seems to be the only page that uses the suggested schema for the sentence numbers.

The experiments with Tatoeba (https://github.com/delph-in/docs/wiki/MatrixMrsTestSuiteTatoeba) were not very conclusive either. What can we do about it?

Do we want a consolidation of these pages?

@fcbond
Member

fcbond commented Jul 16, 2021 via email

@arademaker
Member Author

Not sure if I got your point... I was thinking about how to move this data out of the wiki into a more structured format, maybe into its own repo. But we also have the ERG with the monolingual version inside its own repo...

@fcbond
Member

fcbond commented Jul 18, 2021 via email

@arademaker
Member Author

arademaker commented Dec 10, 2022

An update on the Tatoeba website: the sentences are still there, and many more translations are now available:

https://tatoeba.org/en/sentences_lists/show/166576/und/und
