organize the test sentences into test suites #5
I mentioned this in delph-in/docs#17. Note that the ERG keeps both this MRS test suite and the CSLI test suite in its repository. According to Stephan (see the quote in that issue), the CSLI suite targets syntactic structures and the MRS suite targets semantic representations. One solution would be to adopt the same strategy as the ERG: incorporate the test suites and their gold annotations into this repo as profiles. LKB_FOS on Linux connects to TSDB and can work with profiles instead of text files.
@leoalenc, I would like to unify the files test_sentences and my-test_sentences.txt that are in the repository. This is related to the new repository I moved here, http://github.com/LR-POR/ud-matrix, and to issue #19. Once we organize the test suites, the table in #19 could contain only the IDs of the sentences in the respective suites. But today we have several new sentences that were not in the original datasets:
@arademaker, test_sentences is always generated by the Matrix from the choices file, which contains these examples. So, following your suggestion, I will delete test_sentences.
I'm not sure deleting it makes sense, since many other files are also generated from choices. But I certainly need to understand your examples better. The idea is to have, inside the tsdb directory, profiles for the translated MRS and CSLI test suites. But you created an additional set of sentences with variations of the sentences in these test suites, right? For each profile we have in tsdb, the idea is to have the same sentences analyzed in UD in this repository, now called UD-Matrix, which still needs to be renamed.
@arademaker, at the moment we do not have a final version of the Brazilian Portuguese MRS test suite, and I have not yet translated the CSLI test suite. Therefore, the only test suite that has been used to test the grammar is my-test_sentences.txt, of which the deleted file is a subset.
@arademaker, see 6c66770.
Related to #31.
We now have a
Once I have fixed that, I will close this issue and recreate the core profile. We already have some sentences from the MRS test suite translation in the core. So we may need to decide whether, in the future, we should revise the core profile, removing the sentences that come from MRS and keeping MRS separate from the new variations created by @leoalenc.
In c8f3141 I used the command below to create the core.tsv
Next, I prepared the
Removed files that can be produced automatically from the core.txt file.
@leoalenc, the I confirmed that
Regarding the questions above: I also found two different versions of the choices file. Can @leoalenc confirm which one is the latest?
@arademaker,
@arademaker, with 4363808, I removed two duplicates from
@arademaker, the grammar does not generate any output for the following sentences:
This is an excellent question for Dan! This is why
The file test_sentences has not been used to develop the grammar for a long time; see my first comment above. It is deprecated. @arademaker, the sole basis for developing the grammar is the file
The problem raised in #5 (comment) has been fixed.
1. The folder test was removed. The sentences created so far for testing the grammar were combined into core.txt, which was moved to the tsdb/skeletons folder.
2. The folder gold was removed. @leoalenc agreed that it was probably a test folder for some old issue.
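The consolidation step above can be sketched as follows. This is a minimal illustration, not the script actually used in the repository: it assumes each source file holds one sentence per line, keeps the first occurrence of each sentence in input order, and treats only exact matches (after trimming surrounding whitespace) as duplicates. The function name `merge_sentences` and the command-line usage are hypothetical.

```python
import sys
from pathlib import Path
from typing import Iterable, List


def merge_sentences(*sources: Iterable[str]) -> List[str]:
    """Merge sentence lists in order, dropping blank lines and exact duplicates.

    Hypothetical helper: assumes one sentence per line. Sentences are
    compared after stripping surrounding whitespace, so variants that
    differ in punctuation or spacing inside the line are NOT merged.
    """
    seen = set()
    merged = []
    for source in sources:
        for line in source:
            sentence = line.strip()
            if sentence and sentence not in seen:
                seen.add(sentence)
                merged.append(sentence)
    return merged


if __name__ == "__main__":
    # Hypothetical usage: python merge_sentences.py a.txt b.txt > merged.txt
    texts = [Path(p).read_text(encoding="utf-8").splitlines() for p in sys.argv[1:]]
    print("\n".join(merge_sentences(*texts)))
```

Because only exact matches are removed, near-duplicates (for example, the same sentence with and without a final period) still need a manual review.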
@leoalenc, can you double-check the examples in https://github.com/LR-POR/PorGram/blob/main/tsdb/skeletons/core.txt? I found some cases that I expected to be ungrammatical (marked as 0 in the last field):
@arademaker, all the sentences mentioned are grammatical. However, you are right to question them. In fact, the first sentence seems to be more common in European Portuguese. This construction appears on page 240 of the second edition of the Grammar of the Portuguese Language by Maria Helena Mira Mateus et al., published in Lisbon by Caminho in 1989. Prescriptive approaches consider the plural universal quantifier without an article incorrect. Personally, I prefer the version with the article, but in my own dialect the construction without the definite article is also possible. It is attested by Mário A. Perini in his Descriptive Grammar of Brazilian Portuguese (Vozes, 2016), p. 364.
@arademaker, the grammar analyzes both sentences with LKB if we set the
@arademaker, some sentences from
The sentence should have two readings. I'm going to correct this. In the future, I could design a tense feature architecture to avoid this. Would you like to ask Dan whether this effort is worthwhile?
The last number is 1 for grammatical and 0 for ungrammatical; we are not recording ambiguity. That is a good question for @danflink. Thanks.
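To make the judgment convention concrete, here is a minimal parsing sketch. It assumes, without confirmation from the thread, that each non-blank line of core.txt has the form `sentence<TAB>judgment`, with the judgment (1 or 0) as the last tab-separated field. The names `parse_judgments` and `ungrammatical` are hypothetical. Since ambiguity is not recorded, the parser yields only a boolean per sentence.

```python
from typing import Iterable, List, Tuple


def parse_judgments(lines: Iterable[str]) -> List[Tuple[str, bool]]:
    """Parse lines into (sentence, is_grammatical) pairs.

    ASSUMED layout: 'sentence<TAB>judgment', where the last field is
    1 (grammatical) or 0 (ungrammatical). Blank lines are skipped;
    any other malformed line raises ValueError with its line number.
    """
    items = []
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        if not line.strip():
            continue
        # Split on the LAST tab so the judgment is always the final field.
        sentence, sep, judgment = line.rpartition("\t")
        judgment = judgment.strip()
        if not sep or judgment not in ("0", "1"):
            raise ValueError(f"line {lineno}: expected a final 0/1 field, got {line!r}")
        items.append((sentence.strip(), judgment == "1"))
    return items


def ungrammatical(items: List[Tuple[str, bool]]) -> List[str]:
    """Return the sentences marked 0, e.g. to review them by hand."""
    return [sentence for sentence, ok in items if not ok]
```

Failing loudly on a missing or unexpected final field catches lines where the judgment was forgotten, which would otherwise silently change what the profile expects the grammar to parse.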
@arademaker, I am reviewing the translation, or adaptation, of this test suite into Brazilian Portuguese so that I can evaluate the grammar's coverage.
As a first step, I will simply gather in one file all the possible translations that occur to me or that are suggested to me. I will also take into account the equivalent sentences in the German and French test suites.
As a second step, it would be useful to organize the translations according to these guidelines, and I would appreciate help with this if it is really relevant to the development of PorGram:
http://moin.delph-in.net/wiki/MatrixMrsTestSuite
This issue will be closed once both goals have been met.