Release notes for stable version "ERG 2023"
Highlights:
- Improved overall syntactic coverage on Redwoods profiles to 93.77% on 100K items.
- Improved parse selection by about 1% using the new redwoods.mem model.
- Improved overall parsing efficiency by about 20%.
2021-12-14 - Added files for the Singlish dialect, authored by Siew Yeng Chow and
based on her Master's thesis at NTU.
2022-07 - Incorporated changes to enable chart-mapping in LKB-FOS, thanks to
John Carroll.
2022-10 - Adopted Emerson-Turing construction types for appending SLASH, with
thanks to Guy Emerson and John Carroll.
2022-11 - Improved Version.lsp, METADATA, and grammar-loading files for a better
interface with the LTDB, thanks to Francis Bond.
Because the erg.hds file is now generated each time the grammar is loaded into
the LKB, erg/etc/rules.hds has been discarded.
------------------------------------------------------------------------------
Release notes for stable version "ERG 2020"
Punctuation marks now separate tokens
- Revised the syntactic analysis to treat all punctuation marks as separate
tokens instead of as affixes. Syntactic rules now combine a punctuation token
with either the immediately preceding or the immediately following token,
except for the possessive apostrophe, which attaches to the preceding NP.
Thanks to Stephan Oepen for motivation, assistance, and guidance in making
this conversion, which enables better consistency of ERG output with that of
other NLP tools and conventions. Also thanks to Woodley Packard for
engineering support to accommodate treebanking updates in the face of
near-universal changes in token counts.
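As a toy illustration of why token counts changed almost everywhere, the Python sketch below contrasts whitespace tokens, where punctuation stays attached as under the old affix analysis, with tokens where leading and trailing punctuation marks are split off. It is a simplification under assumed rules, not the ERG's actual REPP or token-mapping rules: the punctuation set is arbitrary, and word-internal characters such as the possessive apostrophe are left alone.

```python
from __future__ import annotations

# Characters split off in this toy example; an assumption, not the ERG's
# actual inventory of punctuation marks.
PUNCT = set('.,;:!?"()')


def affix_tokens(text: str) -> list[str]:
    """Old-style view: punctuation stays attached to its word, as an affix."""
    return text.split()


def split_tokens(text: str) -> list[str]:
    """New-style view: leading and trailing punctuation become separate tokens.

    Word-internal characters (including apostrophes) are left alone.
    """
    tokens: list[str] = []
    for word in text.split():
        lead: list[str] = []
        while word and word[0] in PUNCT:
            lead.append(word[0])
            word = word[1:]
        trail: list[str] = []
        while word and word[-1] in PUNCT:
            trail.insert(0, word[-1])
            word = word[:-1]
        tokens.extend(lead)
        if word:
            tokens.append(word)
        tokens.extend(trail)
    return tokens


sentence = "Abrams, we suspect, left early."
print(affix_tokens(sentence))  # ['Abrams,', 'we', 'suspect,', 'left', 'early.']
print(split_tokens(sentence))  # ['Abrams', ',', 'we', 'suspect', ',', 'left', 'early', '.']
```

The same five-word sentence grows from five tokens to eight, which is the kind of shift that forced every treebanked item to be revisited.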
Full Redwoods treebank update
- All of the usual treebanked profiles, totaling 1.5M tokens, have now been
updated using the full-forest treebanking tool fftb, and now reflect the
changed analysis of punctuation. An additional 1000 items from WSJ section
23 were also treebanked after the release had stabilized, to provide a new
set of annotations for evaluation.
Documentation strings throughout the grammar
- Both ACE and the LKB, along with PyDelphin, now fully support
triple-quote-marked documentation strings on types and instances, so such
strings have been added to most leaf lexical types, constructions, and
lexical rules in the ERG. Thanks to Francis Bond for pushing this cause
forward, and to the platform developers for accommodating the necessary
formalism changes. Alas, PET does not yet provide full support, so this
release of the grammar includes variants of several grammar files
("...for-pet.tdl") from which the doc strings have been deleted. For now,
compile and run PET using these variant files as follows:
flop english-for-pet
cheap -cm -repp -default-les=all -packing -verb=4 english
------------------------------------------------------------------------------
In trunk, as an interim update:
- refreshed support for openproof generation
- expanded coverage with mal-rules and types
- added full-forest treebank profiles for wsj06-09
------------------------------------------------------------------------------
Release notes for stable version "ERG 2018"
Annotations
- Supplied full-forest treebanks for Redwoods profiles, including the first five
sections of the WSJ.
- Added profiles for WeSearch user-generated content (wlb03, wnb03), and for
a Sherlock Holmes story (sh-spec).
- Improved the well-formedness and consistency of the MRSs, aiming for closer
alignment with an updated version of the semantic algebra.
Token mapping
- Upgraded to use GML 1.0, particularly relevant for the WeScience corpus.
- Improved support for both `strong' brackets (manually inserted) and `weak'
delimiters (motivated by, for example, hyphens) to signal phrase boundaries
that should not be crossed.
Syntax
- Enabled extraction from within NPs.
- Changed the attachment order of pre- and post-nominal modifiers, so now the
pre-modifiers attach before post-modifiers.
- Added two new types of modifiers of verbal projections: indefinite NPs as in
|She walked out of the casino a rich woman.|; and gapped clauses of saying, as
in |They will, we suspect, leave early.|
- Added the `do-be' construction as in |the only thing we said she had to do was
finish the assignment|
Semantics
- Moved information-structure constraints from RELS to ICONS in MRSs, including
for focus movement (`topicalization') and passivization.
- Simplified the inventory of role labels, notably for conjunction relations,
where ARG1 and ARG2 replace the old L-HNDL, R-HNDL, L-INDEX, R-INDEX roles.
- Improved SEM-I consistency.
Platforms and applications
- Added support for ubertagging with PET and ACE, thanks to Rebecca Dridan.
- Added robust parsing mode for ACE using csaw PCFG, thanks to Woodley Packard.
- Added support for the `agree' parser/generator.
- Expanded the inventory of mal-rules for grammar-checking.
- Added `transfer' rules and support for generation from first order logic.
- Added support for manually inserted `strong' brackets which force phrasal
boundaries, as in |we ⌊(⌋ saw a man ⌊)⌋ with a telescope|.
- Changed RNAME value on rules from string to type, to allow `weak' bracket
rules to constrain which rules must apply, as for named entities such as
|New York Stock Exchange|.
- Added support for robust `bridging' rules (disabled by default).
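Strong brackets such as those in |we ⌊(⌋ saw a man ⌊)⌋ with a telescope| can be pictured as a crossing-brackets test: assuming that forcing a phrasal boundary means no constituent may partially overlap a bracketed span, a candidate phrase is ruled out when it crosses one. The Python sketch below is a hypothetical illustration of that test over word positions; `bracket_spans` and `allowed` are invented names, and this is not how ACE, PET, or the LKB actually implement brackets.

```python
OPEN, CLOSE = "⌊(⌋", "⌊)⌋"


def bracket_spans(tokens):
    """Remove bracket markers and return (words, spans), where each span is a
    half-open (start, end) range of word positions enclosed by one pair."""
    words, spans, stack = [], [], []
    for tok in tokens:
        if tok == OPEN:
            stack.append(len(words))
        elif tok == CLOSE:
            spans.append((stack.pop(), len(words)))
        else:
            words.append(tok)
    if stack:
        raise ValueError("unbalanced brackets")
    return words, spans


def crosses(span, bracket):
    """True if the two ranges overlap without one containing the other."""
    (i, j), (a, b) = span, bracket
    overlap = i < b and a < j
    nested = (a <= i and j <= b) or (i <= a and b <= j)
    return overlap and not nested


def allowed(span, brackets):
    """A candidate constituent is allowed if it crosses no bracketed span."""
    return not any(crosses(span, br) for br in brackets)


tokens = "we ⌊(⌋ saw a man ⌊)⌋ with a telescope".split()
words, spans = bracket_spans(tokens)
print(words)  # ['we', 'saw', 'a', 'man', 'with', 'a', 'telescope']
print(spans)  # [(1, 4)]
print(allowed((2, 7), spans))  # False: "a man with a telescope" crosses the bracket
print(allowed((1, 4), spans), allowed((1, 7), spans))  # True True
```

Under this reading, the bracketing rules out the NP |a man with a telescope|, so the PP can only attach outside the bracketed phrase.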
------------------------------------------------------------------------------
Release notes for trunk version 2016-09-27
Full-forest treebanking of the Redwoods profiles, and eventually of the WSJ as
well, is now underway, with minor grammar corrections being made along the way.
------------------------------------------------------------------------------
Release notes for trunk version 2015-06-19
[After a long hiatus, returning to commenting on trunk version changes.]
Tuned paraphrase rules both for educ and for openproof. The educ set is
mostly for generating variant correct answers for the new Reading composition
exercises in the Redbird Language Arts course. The openproof modifications
are aimed at reducing the remaining ambiguity in the generated English outputs.
------------------------------------------------------------------------------
Release notes for trunk version 2013-03-19
Added two constructions motivated by Sherlock Holmes corpus:
(1) adverbial clauses with gaps and verbs of saying, as in
|You have, I presume, considered this.|
(2) adverbial indefinite NPs as VP modifiers, as in
|He arrived a hero and departed a villain|
Also improved treatment of present participles as adjectives, employing verb
predications for semantics.
------------------------------------------------------------------------------
Inflectional rules: instances made one-to-one with types
------------------------------------------------------------------------------
Release notes for version "ERG (1212)"
Stable tagged release, including updates of all tsdb/gold profiles. This
release is also used for the treebanked profiles of DeepBank 1.0, the Wall
Street Journal corpus included in the Penn Treebank.
Details and an online demo can be found at
www.delph-in.net/erg
------------------------------------------------------------------------------
Release notes for version "ERG (1111)"
Stable tagged release, including updates of all tsdb/gold profiles, plus
the addition of two new profiles from the Tanaka corpus: rtc000 and rtc001.
Details on ERG coverage of all gold profiles can be found on the
Redwoods web page: http://www.delph-in.net/redwoods.
------------------------------------------------------------------------------
Update of `trunk' version as of August 2011:
Added coverage for the following phenomena:
- pre-determiner adjective phrases, as in
|too tall a building|
|too strong an opponent to overcome|
- `enough' + VP/NP complement, as in
|We met a tall enough player to hire.|
- sentence-initial indefinite NP `depictives', as in
|A happy cat, she purred.|
- extraposed relative clauses, as in
|A cat appeared suddenly which had no tail.|
- gapping constructions, where the second head in a conjoined VP or S is
missing, as in
|He persuades Kim to sing and Abrams to act.|
- `do-be' construction as in
|The only thing we didn't expect him to do was give himself a raise.|
- conditional inversion, as in
|Were we to visit Paris, we would be happy.|
- more freedom in the ordering of complements, as in
|The book was given to Kim by Sandy.|
|The book was given by Sandy to Kim.|
Also made minor improvements for generation, including corrected trigger rules.
Gold profile updates are included only for csli, mrs, hike, cb, and jh1.
------------------------------------------------------------------------------
Release notes for version "ERG (1010)"
Stable tagged release with full (manual) updates of all gold profiles
including LOGON, WeScience, and (after a long hiatus) the Verbmobil and
ecommerce treebanks, along with the newly added SemCor (the semantically
tagged portion of the Brown corpus; the first 3100 items so far). Details
on current ERG coverage of these profiles can be found on the Redwoods
web page: http://www.delph-in.net/redwoods.
------------------------------------------------------------------------------
Release notes for version "ERG (1007)"
Minor improvements for better coverage of WSJ corpus and of the education and
speech application corpora.
------------------------------------------------------------------------------
Release notes for version "ERG (1004)"
This is intended as a `stable' release, accompanied by a full manual update
of the `gold' treebanked profiles, and parse-ranking models trained on them.
------------------------------------------------------------------------------
Release notes for version "ERG (1003)"
- This release is essentially a pre-release of a proposed stable release
next month ("ERG (1004)"), and will serve as the basis for final tuning,
debugging, and updating of the various treebanks.
- At long last, the rule names have all been converted to conform to the
naming scheme proposed in 2008, and described at
http://wiki.delph-in.net/moin/ErgTop
PLEASE NOTE that all pre-existing treebanks constructed using the ERG
will have to be converted before they can be used for treebank updates
with this new grammar version. See
http://wiki.delph-in.net/moin/ErgRules
for instructions to effect this conversion automatically.
- This release also includes adaptation of the arboretum files for current
use in the EPGY grammar-checking application.
- The chart-mapping machinery includes a revised treatment of quote marks
in their splendid variety, aiming for more normalization in preprocessing,
thanks to Stephan Oepen.
- The grammar currently includes some temporary patches to support generation
using unknown words, most recently for experiments in generating from
DMRSs. While largely functional, this should be considered work still in
progress, since at least the mechanism for assigning semantic predicate
names to unknown words is far from ideal.
- The `gold' directory now contains an additional profile `petet' for the
Evaluation by Textual Entailment trial data set. In addition to this
new profile, the usual three `gold' profiles have been updated
using this version of the grammar: csli, mrs, hike. Expectations are
that the full `gold' collection of profiles will be updated by April.
------------------------------------------------------------------------------
Release notes for version "ERG (1002)"
- Re-working of arboretum files to apply to error analysis in grammar checking
- Factoring out of the treatment of quote marks in the preprocessor.
- Better interim accommodation for unknown words in generation, consistent
with current naming convention for unknown predicates (see erg/tmr/pos.tdl).
------------------------------------------------------------------------------
Release notes for version "ERG (0909)"
- Addition of EPGY-specific types and lexical entry constraints
------------------------------------------------------------------------------
Release notes for version "ERG (0907)" (the Barcelona release)
- Note that the attribute STEM has been renamed to ORTH, for more clarity.
For those few who use the lexical database in connection with the ERG, it
will be necessary to reload the database, using the revised table
definitions in this version of the grammar.
- Improved coverage to admit some VP-modifying relative clauses, as in
'Abrams hired us, which bothers Browne.'
- Further stabilizing of the chart-mapping machinery for preprocessing and
for accommodation of unknown words.
- Extended support for generation with unknown words
- Experimental support for paraphrasing as an external LOGON MT-like task
which uses the external SEM-I (semantic interface) specification. See
the file $LOGONROOT/uio/enen/README for a quick introduction.
------------------------------------------------------------------------------
- Added treebanks for WeScience profiles 3 and 4.
- Added more support for generation with unknown words
------------------------------------------------------------------------------
Release notes for version "ERG (0902)"
- First version making use of the new chart-mapping machinery
- please note that you will need a correspondingly new version of the LKB
and PET (no older than 22-Feb-09; PET compiled off its `cm' branch).
- for the LOGON tree, please use the `trunk' version and select appropriate
PET binaries (from the `cm' branch) as `flop -t' and `cheap -t'.
- Added the first four treebanked profiles in the WeScience corpus
- Updated other profiles in the 'gold' subdirectory (but note that a few still
await updating, including the SensEval and SemCor profiles).
------------------------------------------------------------------------------
Release notes for version "LinGO (July-08)"
- Elaborated the chart-mapping rules to accommodate the existing treebanked
corpora, including more systematic treatment of POS-driven unknown word
handling.
- Added treebanks for some corpus data from Senseval, SemCor, and
ILIAD (Melbourne).
- Added syntactic coverage for some additional sentential modifier phrases
('as'+passiveVP, and NP predicatives like "His project fully funded, Abrams
celebrated."), and for marked word order with PPs appearing before some VPs,
and before some complement NPs.
---------------------------------------------------------------------------
Release notes for version "LinGO (Apr-08)"
- Included experimental chart-mapping preprocessor rules in inpmap-rules.tdl
and lexmap-rules.tdl.
- Enriched the hierarchy of semantic predicates to support underspecification
in translation, including abstract predicates for locative 'in, on, at'
and for 'the, a, udef' quantifiers.
- Tuned lexicon for Semcor data to support treebanking.
- Added syntactic coverage for 'small clause' predicatives such as
'The dog barked, its heart beating wildly.'
---------------------------------------------------------------------------
Release notes for version "LinGO (26-Jan-08)"
Final tuning for SciBorg's first treebank of six abstracts
Final tuning for LOGON/HandOn treebank update
---------------------------------------------------------------------------
Release notes for version "LinGO (24-Jan-08)"
A few corrections to lexical entries based on most recent HandOn fan-outs
---------------------------------------------------------------------------
Release notes for version "LinGO (23-Jan-08)"
Added a few missing lexical entries for degree specifiers
---------------------------------------------------------------------------
Release notes for version "LinGO (21-Jan-08)"
And still more tuning - maybe the final round - for HandOn
------------------------------------------------------------------------------
Release notes for version "LinGO (20-Jan-08)"
1. More tuning for HandOn driven by 'sti' and 'vei' fan-out logs
------------------------------------------------------------------------------
Release notes for version "LinGO (17-Jan-08)"
1. Minor adjustments to lexicon, grammar, and trigger rules for fine-tuning
of HandOn system.
------------------------------------------------------------------------------
Release notes for version "LinGO (15-Jan-08)"
1. Added vocabulary for HandOn based on missing predicates from NoEn
2. Completed tuning of lexicon and preprocessing for HandOn English data
3. One recent change that may affect transfer:
Decomposition of N-V compounds like "snow-covered" and "T-marked"
- used to be multi-words with single predicate, but are now constructed
via compound rule, with the two component EPs and an additional
linking EP with PRED |argument_rel| similar to |compound_rel|
------------------------------------------------------------------------------
Release notes for version "LinGO (Nov-07)"
1. Treebanks
- Updated all treebanks in erg/gold, but have not yet rebuilt the jhpstg.mem file
2. MRS quality improvements / harmonization
- Added type constraints on ARG1s for several classes of modifiers
- Corrected missing semantic link in P-PP construction "from behind the hill"
- Removed spurious pron_rel from infinitival subordinate constructions like
"Kim sang to impress Sandy."
- Made minor changes to title construction:
- changed pred name for post-head titles to be consistent with pre-head one
- corrected rule for number-headed phrases like "page 3"
------------------------------------------------------------------------------
Release notes for version "LinGO (Oct-07)"
Added lexical coverage for vocabulary in the English data for the HandOn
project, in this case keeping the large number of domain-specific proper
names in a separate file 'handon-propers.tdl'. Also made some repairs to
remaining inconsistencies in MRSs in the message-free universe.
In addition, did several bits of minor tuning of syntactic constructions
in support of the DFKI Checkpoint project, and added first version of the
token-mapping rules for PET's emerging support for this functionality.
This release also includes an additional settings file for PET, 'mrs.set',
to support development of generation capability for PET.
Note that only three of the 'gold' profiles (csli, hike, and mrs) have been
updated in this release; the rest will follow shortly.
------------------------------------------------------------------------------
Release notes for version "LinGO (Jul-07)"
Added lexical coverage for several additional treebanked data sets, including
Senseval 2-4, FraCaS, SciBorg, and Acrolinx (though the latter two data sets
are not distributable). Also updated the full set of 'gold' profiles for
the existing data sets.
PLEASE NOTE that this version requires an up-to-date version of the LKB to
get correct behavior with the treebanked data in 'gold', since the derivation
trees are now augmented with a specification of which root constraint was
used to admit each tree.
------------------------------------------------------------------------------
Release notes for version "LinGO (21-Mar-07)"
The most significant change in this version of the ERG is the complete removal
of messages, as announced at the Fefor DELPH-IN meeting to follow the
completion of the LOGON demonstrator. This version is a nearly exact non-msg
equivalent of the final LOGON version "LinGO (17-Mar-07)", so it should be
straightforward to compare and contrast the two variants. In brief, the
distinction among propositions, questions, and commands is now made via the
value of the attribute SF ('sentence force', i.e., illocutionary force), a
property of events. This attribute and its values are also used in the
most recent release of the Grammar Matrix.
In addition, this release contains the following modifications/improvements,
the first of which is also included in the final LOGON version:
- Adoption of Stephan Oepen's proposal for a more uniform treatment of
properties of MRS events and indices
- Adoption of Berthold Crysmann's proposal for the full cross-product of
subtypes for encoding person-number
- Addition of missing vocabulary for the Senseval 2 test data
- Addition of pragmatic EPs to encode focus (formerly referred to as
'topicalization') and promoted arguments in passive constructions.
------------------------------------------------------------------------------
Release notes for version "LinGO (17-Mar-07)"
(Final version with messages)
Added missing lexical entries for the known-vocabulary held-out portion of
the LOGON corpus (43 proper names and 5 common nouns)
------------------------------------------------------------------------------
Release notes for version "LinGO (20-Dec-06)" (Final LOGON version)
- A few more corrections for mixed case and the preprocessor
------------------------------------------------------------------------------
Release notes for version "LinGO (19-Dec-06)"
- Added feature on 'index' called IND for 'individuated', to enable distinction
in SEM-I among count nouns, mass nouns, and mass-or-count nouns.
- Added further improvements to mixed case orthography, here primarily for
country-related adjectives and nouns like "Englishman" and "Norwegian"
------------------------------------------------------------------------------
Release notes for version "LinGO (15-Dec-06)"
- Corrected preprocessor to preserve mixed case for Norwegian 'special'
characters - they were being downcased because PET doesn't like them for
interactive parsing, but now that the lexical entries require mixed case
for proper names, we have to keep it for batch processing, which works okay.
- Modified lexical entries per latest transfer requests:
- deleted bogus adjective entries for "U-shaped"
- corrected "T-marked" to also work in predicative position
- added entry for "noticeable that S"
------------------------------------------------------------------------------
Release notes for version "LinGO (14-Dec-06)"
- Added more consistent capitalization in orthography (and CARG) for ERG
lexicon, to enable higher quality generation.
- Added spelling variants for two lexical entries used in training corpus:
'Hedmarker/Hedemarker' and 'El Dorado/Eldorado'
- Added entries as requested by transfer: 'mackerel', 'arboretum', "vitamin C"
------------------------------------------------------------------------------
Release notes for version "LinGO (13-Dec-06)"
More adjustments for final LOGON integration:
- Added and corrected lexical entries as requested by Transfer
- Corrected generator 'black hole' errors so generator will always terminate
at least on the usual test suite data.
- Incorporated new LNK feature which replaces old WLINK, for mapping from
MRS relations to their corresponding surface form positions [oe]
------------------------------------------------------------------------------
Release notes for version "LinGO (01-Dec-06)"
Minor additions for final LOGON integration:
- Added remaining missing lexical entries for known-vocabulary held-out data
- Improved efficiency for generation with coordination, by collapsing
near-duplicate lexical entries for conjunctions
- Made minor corrections guided by LOGON fan-out logs, to improve both
coverage and quality
------------------------------------------------------------------------------
Release notes for version "LinGO (Nov-06)"
NOTE: Users of this version of the ERG are strongly encouraged to also
obtain a current version of the LKB and [incr tsdb()], in order to
benefit fully from recent enhancements.
- Since the last public release in July, the ERG's lexicon has been expanded
to include about 3000 additional nouns and adjectives that occur with high
frequency in the British National Corpus (100 times or more).
- Some additional technical vocabulary was added to accommodate a sample
of data from the Cambridge SciBorg project; these lexical entries are
also tagged with "SciBorg" in the lexical database.
- The remaining changes have focused on tuning the grammar and SEM-I for
generation in the near-final LOGON demonstrator.
- Updated treebank summary for LOGON data, in erg/gold. Note that this
version of the treebank benefitted from the welcome addition to PET by
Yi Zhang enabling the selective unpacking strategy used in the LKB.
Profile   Items   Parsed   Treebank
-----------------------------------
JH0         261      248        226
JH1        1353     1302       1221
JH2        1307     1154       1058
JH3        1443     1367       1230
JH4        1603     1505       1416
JH5         464      420        398
PS          965      908        860
TG         2014     1875       1735
ROND       1290     1203       1133
-----     -----    -----      -----
Totals    10700     9982       9277
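As a quick arithmetic check, the Totals row can be recomputed from the per-profile counts, and the same counts give each profile's parse coverage (Parsed/Items) and treebanked share (Treebank/Items). The Python sketch below just transcribes the table; the ratios it prints are derived here and are not figures stated in the original notes.

```python
# Per-profile counts transcribed from the table: (items, parsed, treebanked).
PROFILES = {
    "JH0": (261, 248, 226),
    "JH1": (1353, 1302, 1221),
    "JH2": (1307, 1154, 1058),
    "JH3": (1443, 1367, 1230),
    "JH4": (1603, 1505, 1416),
    "JH5": (464, 420, 398),
    "PS": (965, 908, 860),
    "TG": (2014, 1875, 1735),
    "ROND": (1290, 1203, 1133),
}

# Sum each column across all profiles.
totals = tuple(sum(col) for col in zip(*PROFILES.values()))
print("Totals", totals)  # Totals (10700, 9982, 9277)

# Derived ratios per profile and overall.
for name, (items, parsed, treebanked) in PROFILES.items():
    print(f"{name:6} parsed {parsed / items:6.1%}  treebanked {treebanked / items:6.1%}")

items, parsed, treebanked = totals
print(f"{'All':6} parsed {parsed / items:6.1%}  treebanked {treebanked / items:6.1%}")
```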
-------------------------------------------------------------------------------
Release notes for version "LinGO (13-Oct-06)"
Added entries for digit-orthography cardinal adjectives to help generator.
-------------------------------------------------------------------------------
Release notes for version "LinGO (12-Oct-06)"
Maybe the final round of tuning for this integration:
1. Merged falsely ambiguous lexical predicates:
NEW                       OLD
"_fine_a_for_rel"         "_fine_a_1_rel"
"_good_a_at-for_rel"      "_good_a_for_rel"
"_good_a_at-for_rel"      "_good_a_at_rel"
"_good_a_at-for_rel"      "_good_a_1_rel"
"_understand_v_by_rel"    "_understand_v_1_rel"
2. Added missing trigger rules for it-cleft construction.
3. Corrected a few minor errors in grammar rules.
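Anyone with transfer rules or stored MRSs that still use the old predicate names can treat the merges in item 1 as an old-to-new mapping. A minimal Python sketch, assuming predicate names occur as whole tokens (quoted or not) in plain text; `rename_preds` is a hypothetical helper, not part of any DELPH-IN tool, and the MRS fragment in the example is made up.

```python
import re

# Old-to-new predicate names from the merge list above.
RENAMES = {
    "_fine_a_1_rel": "_fine_a_for_rel",
    "_good_a_for_rel": "_good_a_at-for_rel",
    "_good_a_at_rel": "_good_a_at-for_rel",
    "_good_a_1_rel": "_good_a_at-for_rel",
    "_understand_v_1_rel": "_understand_v_by_rel",
}

# Match an old name only when it is not part of a longer predicate name.
_PATTERN = re.compile(
    r"(?<![\w+-])(" + "|".join(map(re.escape, RENAMES)) + r")(?![\w+-])"
)


def rename_preds(text):
    """Replace every old predicate name in text with its merged name."""
    return _PATTERN.sub(lambda m: RENAMES[m.group(1)], text)


print(rename_preds('[ "_good_a_at_rel" LBL: h1 ARG0: e2 ]'))
# [ "_good_a_at-for_rel" LBL: h1 ARG0: e2 ]
```

Because none of the new names contains an old name as a whole token, running the rewrite twice leaves the text unchanged.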
------------------------------------------------------------------------------
Release notes for version "LinGO (10-Oct-06)"
Still more minor tuning
1. Corrected the "_guess_" unknown-noun lexical entry to work in compounds
2. Corrected NP fragment rules to allow fragments that are conjoined NPs
3. Enabled entry for prep "to" to also modify proper names.
------------------------------------------------------------------------------
Release notes for version "LinGO (09-Oct-06)"
More minor tuning for impending LOGON release:
1. Fixed spelling of 'considerred' for dative passive form
2. Enabled generation of implicit NP coordination
3. Corrected lexical entry's PRED name for 'edge'
4. Corrected modification of imperatives
5. Added missing topmost message for sentence-initial conjunction
6. Allowed Adj-N as title
7. Added missing entries for 'follow' and 'transport': NP+PP-dir
8. Corrected multiple SEM-I entries for 'choose' verb
9. Added missing analysis of NP's + PP construction
10. Added lexical entry for 'mountain pasture' as title
11. Renamed inconsistent degree adverb preds:
"_a+little_x_deg_rel"
"_steeply_x_deg_rel"
"_directly_x_deg_rel"
"_shortly_x_deg_rel"
12. Added entry for adj 'so' ('true') with expl-it subj: "It is so that ..."
Note that only the gold profiles for 'csli', 'mrs', and 'hike' have been
updated for this release.
------------------------------------------------------------------------------
Release notes for version "LinGO (05-Oct-06)"
Minor tuning for upcoming LOGON release:
1. Corrected PRED name for "downstairs"
Old SEM-I entry:
"_downstairs_a_1_rel" : ARG0 e, ARG1 u.
New:
_downstairs_p_rel : ARG0 e, ARG1 u.
2. Corrected nbar-fragment rule to also analyze measure-nouns like "centimeter"
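The SEM-I entries quoted in item 1 use a compact `predicate : ROLE type, ...` layout, and the predicate names themselves follow the DELPH-IN convention _lemma_pos_sense_rel, with the sense optional. The Python sketch below decomposes both parts for the simple form shown here only (optional quotes, comma-separated role/type pairs, final period); real SEM-I files may use additional syntax that it does not attempt to handle, and both helpers are hypothetical.

```python
def parse_semi_entry(line):
    """Split a simple SEM-I line such as '_downstairs_p_rel : ARG0 e, ARG1 u.'
    into the predicate name and a dict mapping each role to its variable type."""
    pred, _, roles_part = line.partition(":")
    pred = pred.strip().strip('"')
    roles = {}
    for item in roles_part.strip().rstrip(".").split(","):
        role, var_type = item.split()
        roles[role] = var_type
    return pred, roles


def split_pred(pred):
    """Split a surface predicate '_lemma_pos[_sense]_rel' into its parts.

    Assumes the lemma and sense contain no underscores, which holds for the
    predicate names quoted in these notes.
    """
    parts = pred.strip("_").split("_")
    if parts[-1] == "rel":
        parts = parts[:-1]
    lemma, pos, *sense = parts
    return lemma, pos, (sense[0] if sense else None)


old = parse_semi_entry('"_downstairs_a_1_rel" : ARG0 e, ARG1 u.')
new = parse_semi_entry("_downstairs_p_rel : ARG0 e, ARG1 u.")
print(split_pred(old[0]), old[1])  # ('downstairs', 'a', '1') {'ARG0': 'e', 'ARG1': 'u'}
print(split_pred(new[0]), new[1])  # ('downstairs', 'p', None) {'ARG0': 'e', 'ARG1': 'u'}
```

Read this way, the correction changes only the predicate's part of speech (a, adjective, to p, preposition) and drops the sense number, while the argument frame stays the same.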
------------------------------------------------------------------------------
Release notes for version "LinGO (27-Sept-06)"
- Further LOGON tuning for better harmonization with NorGram
- Used the newly available selective unpacking in PET to create the treebanks
in the 'gold' subdirectory.
- Updated the treebanks for the full set of profiles in 'gold'
------------------------------------------------------------------------------
Release notes for version "LinGO (11-Sept-06)"
- Tuning for improved LOGON generation for 'vei' development corpus
- Added several thousand lexical entries based on frequency of use in
the BNC, guided by the unigram and bigram error mining analysis of
Yi Zhang. In particular, added entries for (i) words that were entirely
missing and had a BNC frequency of 100 or more; and for all words with at
least one entry already in the ERG but with (ii) a unigram error score of
0.00, or (iii) a bigram error score of 0.00.
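The three selection criteria can be read as one filter over candidate words. In the Python sketch below, the `Candidate` record, its field names, and the placeholder words are hypothetical stand-ins for the actual error-mining output; only the thresholds (BNC frequency of 100 or more, error scores of 0.00) come from the notes, and treating 0.00 as "rounds to zero at two decimals" is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    """Hypothetical record for one word considered during error mining."""
    word: str
    bnc_freq: int
    has_erg_entry: bool
    unigram_error: Optional[float] = None  # None when no score was computed
    bigram_error: Optional[float] = None


def is_zero(score: Optional[float]) -> bool:
    """True if a score is present and rounds to 0.00 (assumed interpretation)."""
    return score is not None and round(score, 2) == 0.0


def should_add(c: Candidate) -> bool:
    """Apply selection criteria (i)-(iii) as stated in the notes."""
    if not c.has_erg_entry:
        return c.bnc_freq >= 100  # (i) entirely missing, BNC frequency >= 100
    return is_zero(c.unigram_error) or is_zero(c.bigram_error)  # (ii) or (iii)


candidates = [
    Candidate("wordA", 250, has_erg_entry=False),
    Candidate("wordB", 40, has_erg_entry=False),
    Candidate("wordC", 900, has_erg_entry=True, unigram_error=0.0),
    Candidate("wordD", 900, has_erg_entry=True, unigram_error=0.3, bigram_error=0.1),
    Candidate("wordE", 120, has_erg_entry=True, bigram_error=0.001),
]
selected = [c.word for c in candidates if should_add(c)]
print(selected)  # ['wordA', 'wordC', 'wordE']
```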
------------------------------------------------------------------------------
Release notes for version "LinGO (18-Jul-06)"
- Minor tuning to improve coverage on LOGON 'vei' items
------------------------------------------------------------------------------
Release notes for version "LinGO (Jul-06)"
- Improvements in semantic composition (assisted by useful error analysis
in Utool) and additional lexical entries, as noted in the internal LOGON
release notes below, since the last public release of January 2006.
- Converted leaf lexical type names to conform to new naming conventions,
with mapping from old to new names provided in file "new-le-types.txt".
See wiki.delph-in.net/erg for documentation of new LE types.
- Adopted use of new variable property mappings given in file "semi.vpm".
- Updated treebank summary for LOGON data, in erg/gold (these profiles
were used to retrain the parse selection model in "jh.mem"):
Profile Items Parsed Treebank
-----------------------------------
JH0 261 233 197
JH1 1254 1132 1043
JH2 1185 1047 908
JH3 1311 1197 1057
JH4 1454 1336 1214
JH5 464 408 371
PS 965 892 833
TG 2014 1831 1656
ROND 1290 1196 1072
----- ----- ----- -----
Totals 10198 9272 8351
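The Totals row can be re-derived from the per-profile rows. A short Python check using only the figures in the table above (ROWS and column_totals are names chosen for illustration; the last column is read as the number of treebanked items):

```python
# Figures copied from the table above: (profile, items, parsed, treebanked)
ROWS = [
    ("JH0", 261, 233, 197),
    ("JH1", 1254, 1132, 1043),
    ("JH2", 1185, 1047, 908),
    ("JH3", 1311, 1197, 1057),
    ("JH4", 1454, 1336, 1214),
    ("JH5", 464, 408, 371),
    ("PS", 965, 892, 833),
    ("TG", 2014, 1831, 1656),
    ("ROND", 1290, 1196, 1072),
]
REPORTED_TOTALS = (10198, 9272, 8351)


def column_totals(rows):
    """Sum the Items, Parsed and Treebank columns."""
    return tuple(sum(row[i] for row in rows) for i in (1, 2, 3))


items, parsed, treebanked = column_totals(ROWS)
assert (items, parsed, treebanked) == REPORTED_TOTALS
print(f"parsed {parsed / items:.1%} of items, "
      f"treebanked {treebanked / parsed:.1%} of parsed items")
```

The assertion passes, and the final line prints a parse coverage of about 90.9% and a treebanking rate of about 90.1% of the parsed items.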
------------------------------------------------------------------------------
Internal release notes for version "LinGO (08-Jun-06)"
Small corrections to semantics of title nouns, both alone and in compounds.
Note that only the 'gold' profiles for csli, mrs, and hike have been updated.
------------------------------------------------------------------------------
Internal release notes for version "LinGO (24-May-06)"
Added lexical entries needed for remaining LOGON development corpus (Turglede
and Preikestolen texts).
Made semantics for comparatives, superlatives, and much/many more consistent.
Reduced generation output of variants with commas for modification & coord.
NP-coord - Corrected semantics, adding qeq (more consistent, and more scopes)
Free rels - Made embedded message be prpstn_m_rel, not underspecified.
Corrected semantics errors throughout, using Utool
Added treebank profiles for ps (Preikestolen) and tg (Turglede) data.
------------------------------------------------------------------------------
Internal release notes for version "LinGO (13-Feb-06)"
Corrected semantics for quantifiers 'most' and 'the most', dropping the
predicate 'most_q_rel' in favor of decomposed semantics using the usual
"many-much_a_rel".
------------------------------------------------------------------------------
Internal release notes for version "LinGO (09-Feb-06)"
Minor improvements in SEM-I content, and correction of an item in gold MRS.
------------------------------------------------------------------------------
Internal release notes for version "LinGO (06-Feb-06)"
More harmony for depictives, now with same semantics as other subordinate
clauses. Also corrections to SEM-I for directional PP verbs.
------------------------------------------------------------------------------
Internal release notes for version "LinGO (03-Feb-06)"
Improved harmony:
- Comparative and superlative determiners now have decomposed semantics
analogous to corresponding adjectives, consistent with NorGram
- Comparative and superlative adjectives now present the ARG0 of the
comp_rel/superl_rel as their INDEX, with one benefit being a better
MRS for measured comparatives, as in 'Dogs are 5 cm taller than horses.'
- Free relatives, like ordinary relatives, no longer introduce a TPC
value for their message.
Consistency:
- Lexical entry type for named years ('2004') is now treated more like
other named entities, undergoing a bare-NP rule to project a full NP.
- Title compounds as in 'project manager Abrams' now have the compound
relation take two ref-inds as arguments, as one would expect.
------------------------------------------------------------------------------
Release notes for version "LinGO (Jan-06)"
PLEASE NOTE: This version of the ERG requires up-to-date versions of both
the LKB and PET, since it takes advantage of improvements in the treatment
of morphology in the LKB, and also depends on a consistent treatment of
special characters like \?, \(, and \".
This version includes minor tuning adjustments to the lexicon and grammar,
to improve overall precision and coverage on the data sets included in the
Redwoods 6 (Norwegian Growth) treebank, which has been expanded to include
about 5000 items from the LOGON development corpus on Norwegian back-country
tourism. The single-best-parse profiles for this additional data appear as
usual in the subdirectory 'gold', in the six directories jh0 - jh5.
In addition, the grammar now includes a semantic interface file 'erg.smi'
which currently specifies the minimal properties of each lexical predicate,
including its name and its arguments, their types, and their optionality.
This file should soon also include the grammar predicates (those
introduced by rules rather than by lexical entries), as well as the set
of abstract predicates which are intended as part of the external interface
to the grammar.
------------------------------------------------------------------------------
Release notes for version "LinGO (05-Dec-05)"
1. Punctuation - Eliminated the duplication in files that was formerly
needed for minor differences between the LKB and PET, now resolved.
2. Lexicon - Added vocabulary needed for the LOGON development corpus
on tourism in the Norwegian mountains.
3. Generation - Tuned the trigger rules for introducing semantically
empty lexical entries, for improved efficiency.
4. Treebanks - There are now additional profiles jh* in the directory gold,
for several segments of the LOGON development corpus for the Jotunheimen
region. In this release, only jh1 is updated; the other five sections
will follow soon. The other (non-LOGON) profiles are all up to date.
------------------------------------------------------------------------------
Release notes for version "LinGO (23-Nov-05)"
1. Corrected lexical entries for "write" and "unevaluated", as well as
the preprocessor-related "twodigitdomersatz". Also added entry for
"untrafficked".
2. Repaired error in comma punctuation which was causing overgeneration.
3. Corrected error in lexical types for day-of-month entries which was
producing ill-formed MRSs.
------------------------------------------------------------------------------
Release notes for version "LinGO (15-Nov-05)"
1. Added and corrected lexical entries and SEM-I
- Most interestingly, added some entries for 'kind' readings, as for
the noun "bear" in "they hunted bear." The predicate names are
distinct, since presumably these would be derived from some lexical
rule producing a distinct sense, and take the form "_<noun>_n_kind_rel"
- Changed the single entry for the adjective "born" so it is treated
semantically more like the passive participle it once was, and now
introduces the predicate "_bear_v_2_rel" with a distinct sense of
the verb "bear" from that in "Kim can't bear to lose"
- Made changes in response to requests from JTL for transfer.
2. Tuned grammar in minor respects to improve consistency in treebanking
the JH corpus.
------------------------------------------------------------------------------
Release notes for version "LinGO (10-Nov-05)"
1. Corrected SEM-I and lexicon errors noted by JTL, and improved constraints
on lexical types with handle arguments so the SEM-I reflects these
(introducing e.g. [ ARG3 h ] instead of formerly [ ARG3 u ]).
2. Added a few more lexical entries needed for JH, and some minor syntactic
additions for constructions like "Try it yourself" and "Kvame became
sole owner".
------------------------------------------------------------------------------
Release notes for version "LinGO (05-Nov-05)"
Quick additional release with improvements for treebanking the Jotunheimen data:
1. Punctuation - Cleaned out a few more temporary patches in preprocessor
and lexicon, especially for |"|, |(|, |)| which had had substitutions.
2. Preprocessing - Added a few more cases revealed by the Jotunheimen data.
3. Lexicon - Added a few missing multi-words that emerged from initial
treebanking, and changed a few more formerly relational nouns to just
ordinary nouns, to avoid spurious ambiguity:
'top, bottom, side, front, back'
Also (finally) corrected the pred names for "anybody", "someone", etc.
to now use _any_q_rel rather than any_q_rel, and same for _some_q_rel.
4. Fixed TPC assignments in relative clauses and for 'wonder'.
5. Corrected nominalization, which became too constrained in an attempt to
avoid spurious ambiguity.
------------------------------------------------------------------------------
Release notes for version "LinGO (01-Nov-05)"
1. Tuned generation trigger rules to reduce overgeneration and improve efficiency.
Also attempted to make more consistent use of TPC and PSV, allowing underspecification.
2. Revised morphology to benefit from improvements in LKB and later in PET,
now that irregularly inflected words can co-exist with punctuation suffixes
(so eliminated files inflr-pet.tdl, inflr-pnct-pet.tdl, robust.tdl, and
robust-pnct.tdl).
3. Reduced inventory of scopal adverbs, and improved consistency for adverbs.
Note in particular that most so-called discourse adverbs have been
converted to scopal adverbs, and the conjunctions 'and, or, but' are now
treated as such even when they are sentence-initial.
4. Corrected some errors in lexical types and in syntactic rules; in particular
fixed type for mass_ppcomp, which was broken, and improved nbar-coordination
whose semantics was not ideal.
5. Some other lexical changes:
- 'both' determiner is now logically equivalent to "the two".
- 'respect (for)' wasn't entered as a mass noun, now is.
- 'cross_over_v1, _v2' removed from lexicon (now done compositionally)
- various entries for cardinal "one" had CARG "01", now just CARG "1".
------------------------------------------------------------------------------
Release notes for version "LinGO (09-Sep-05)"
1. Repaired punctuation overgeneration for non-WH topicalization by removing
the licensing of constructions like "Who won? asked Kim." (not frequent in
our data set, though seen in Rondane).
2. Removed STATIVE from grammar, since no longer used
3. Removed spurious fragment rules only used for parsing dictionary definitions
4. Corrected lexical predicates in SEM-I
_have_v_to_rel => "_have_v_to_rel" (from type to string)
"_fail1_v_1_rel" => "_fail_v_1_rel" (misspelling)
5. Added missing lexical entry for unaccusative (intransitive) "weaken"
6. Added lexical entries for "move" and "drive" analogous to "put", still
using the same inventory of predicates in the SEM-I.
7. Split the lexical rule for prenominal verbal modifiers into two rules,
one for present participles and one for passives, to avoid spurious
verb-particle entries which should be disallowed as modifiers (since
the particle can't be present).
8. Modified the types for raising verbs taking an infinitival VP complement
so they uniformly combine with the infinitival "to" which introduces a
message.
9. Added reentrancies for TPC and PSV so the appropriate values appear
on messages in embedded clauses.
10. Improved generator efficiency by adding the grammar-internal feature --TPC,
to which new generator compliance rules assign a value based on the public
feature TPC.
11. Also further refined trigger rules, and exploited the newly invented
compliance rules which adjust the input MRS to comply with grammar-internal
constraints (so far restricted to assigning a value for --TPC based on TPC).
12. Again for efficiency, added constraints on events introduced by adverbs
and degree specifiers so they will not trigger lexical entries in
generation.
13. Once again corrected the reported failure to generate some examples like
"Abrams could." which made use of ellipsis_rel as underspecification of
ellipsis_ref_rel.
------------------------------------------------------------------------------
Release notes for version "LinGO (05-Sep-05)"
Improved generation with punctuation and fragments. Updated Verbmobil
section of Redwoods treebank, and filled in missing gold profiles.
------------------------------------------------------------------------------
Release notes for version "LinGO (02-Sep-05)"
Minor update: Modified trigger rules to use unification rather than
subsumption, and added some abstractions over trigger rules, in mtr.tdl
Further reduced spurious commas preceding modifiers in generation.
Punctuation rules now compatible with current LKB morphology. Infinitival
subjects no longer introduce nominalization (as in "To err is human.")
------------------------------------------------------------------------------
Release notes for version "LinGO (15-Aug-05)"
Minor update: The usual normalizing of predicate names, this time mostly for
expletive-it-taking predicates. Also some further tuning of trigger rules,
and change to verb_synsem to make sure uninflected lexical entries already
identify their INDEX and KEYREL.ARG0, for better generator initialization.
------------------------------------------------------------------------------
Release notes for version "LinGO (09-Aug-05)"
Minor update for yet more consistency in predicate names, especially for
relational nouns and adjectives, so that their related entries have
matching predicate names. Also corrected an ordering error in prp_infl_rule
and added a few additional lexical entries for the LOGON development corpus.
------------------------------------------------------------------------------
Release notes for version "LinGO (05-Aug-05)"
Minor update to improve consistency in predicate naming conventions, and
to restore the 'chunking' roots in roots.tdl which are used experimentally
in trying to generate from fragmented MRSs.
Note that in this release, only the 'gold' profiles for 'csli', 'mrs',
and 'hike' have been updated.
------------------------------------------------------------------------------
Release notes for version "LinGO (Jul-05)"
This release incorporates several significant changes from the previous
release, and at long last also includes a first step toward documenting an
external semantic interface for the grammar. The changes will soon be
described in a little more detail on the ERG Wiki, but in summary:
1. Punctuation as affixation
Previous versions of the grammar implemented a treatment of punctuation
adopting a standard but linguistically dubious strategy of using a
preprocessor to make all punctuation marks distinct tokens, adding
spaces around each one. This version implements an analysis which
leaves the input string unchanged with respect to punctuation (except
for apostrophes), and treats the punctuation marks as spell-changing
affixes. This change creates backward incompatibilities with earlier
treebanks because the tokenization for each sentence is now different.
A few infelicities remain from making this change, including
- minor inconsistencies in the readers of affixation rules for the
LKB and PET (and even for previous and current versions of the LKB)
- imperfect interaction of irregular inflected forms and punctuation
- imperfect interaction of multi-words and punctuation
There are work-arounds for some of these, awaiting better resolution.
2. Semantics
a. Semantically empty prepositions no longer introduce an EP (they
used to add an EP whose predicate name ended in "_sel_rel", for
lexically 'selected'). So the generator trigger rules have been
augmented to automatically introduce the necessary lexical entries
for generation, currently based on predicate-naming conventions
for the lexical entries that select empty prepositions.
b. Messages now introduce an additional attribute, ARG0, whose value
is the event of the highest-scoping verbal EP within the scope of
the message. The main motivation is to make it simpler for
applications to identify the relevant event properties of a
clause's semantics without looking 'inside' the clause's MRS.
c. All lexical predicates now have some value in the 'sense' field
of the predicate name (Background: by convention in the ERG, each
lexical predicate name has the following form: _ORTH_POS_SENSE_rel
where ORTH is the lexeme's orthography, POS is a coarse-grained
sense distinction drawing from the vocabulary [v n a p x q c], and
SENSE is an arbitrary sequence of characters (excluding |_|), and
where each of the fields is separated by an underscore. Earlier,
the sense field could have been left empty.) The default value for
the sense field is now '1'.
d. Relational nouns now specify in their sense field the orthography
of the preposition marking their oblique complement (usually 'of').
e. Tag questions previously discarded the semantics of the tag phrase,
contrary to the monotonicity assumption in the ERG. This is now
corrected, with the result that the semantics of sentences with
tag questions is now rather more baroque. The main benefit of the
reanalysis is that lexical rules now properly always preserve the
semantics of their input lexemes.
f. Sentential subjects were previously analyzed via a nominalization
rule. This simplified the syntactic analysis of "That Abrams
arrived annoyed Browne" since the "annoy" lexeme could always
unify its ARG1 value with the semantic index of its subject. But
the resulting asymmetry for the 'extraposed' and non-extraposed
variants of lexemes like 'annoy' was annoying. This version of
the grammar now provides the same MRS for both variants ('It
annoyed Browne that Abrams arrived' and the above example), via
a syntactic variant of an 'it-extraposition' lexical rule, with
thanks to Ann Copestake for the suggested implementation. One
consequence is that the earlier treatment of examples like "The
problem was that Abrams arrived" no longer works, since the
identity copula was being used, and requires its complement to
supply a referential index. So there is also yet another entry
for the verb 'be', which supplies an EP similar to the identity
'be'.
g. Verbal modifiers of nouns were being given an inconsistent
semantics, with postnominal modifiers as in 'people singing arias'
supplying a message for the modifier phrase, but with prenominal
modifiers as in 'the singing people' not contributing a message.
In this version of the grammar, verbal projections now always
supply a message, making the world a little more consistent, but
leaving a sharper contrast now between "the singing children"
and "the interesting children" where 'interesting' is analyzed
as an adjective and hence does not supply a message.
3. Lexicon
New lexical entries have been added drawn from the Norwegian tourism
domain of the LOGON development corpus, bringing the current number
of lexemes to 22,750 for this release, of which about 2700 are proper
names.
4. SEM-I
A first draft of the semantic interface for the grammar is now
presented in the file erg-full.smi, including the predicate names and
semantic arguments of all predicates introduced either by lexical
entries or by the grammar (either via lexical/syntactic rules or via
abstractions over more specific predicates). Documentation of this
file is under active development.
5. Naming conventions
The feature name DIVISIBLE on referential indices has been shortened
to DIV for better readability of MRSs.
6. LKB warnings on grammar loading
The LKB's new and improved treatment of morphology offers several
advantages, and the current version of the grammar benefits from
these, but still results in some warning messages when loading.
Users can ignore these messages for now, while the developers resolve
the underlying causes. The first is about the 'punct_bang_rule',
and the others warn of lexical rules that can feed themselves.
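The difference between the two tokenization strategies in item 1 can be illustrated with a small Python sketch. This is not how the LKB or PET implement punctuation (there the marks are handled by spelling-changing affixation rules in the grammar itself); old_tokenize and split_punct_affixes are hypothetical helpers, the set of punctuation marks is an assumption, and apostrophes are left out because the notes exclude them from this change.

```python
import re
from typing import List, Tuple

# Assumed inventory of punctuation marks; apostrophes deliberately excluded.
PUNCT = set('!?.,;:"()')


def old_tokenize(text: str) -> List[str]:
    """Earlier strategy: pad every punctuation mark with spaces so that it
    becomes a separate token."""
    return re.sub(r'([!?.,;:"()])', r" \1 ", text).split()


def split_punct_affixes(token: str) -> Tuple[List[str], str, List[str]]:
    """New strategy, roughly: keep the token intact and analyse its leading
    and trailing punctuation marks as affixes on the word."""
    prefixes: List[str] = []
    suffixes: List[str] = []
    while token and token[0] in PUNCT:
        prefixes.append(token[0])
        token = token[1:]
    while token and token[-1] in PUNCT:
        suffixes.insert(0, token[-1])
        token = token[:-1]
    return prefixes, token, suffixes


sentence = "Who won? asked Kim."
print(old_tokenize(sentence))        # ['Who', 'won', '?', 'asked', 'Kim', '.']
print(sentence.split())              # ['Who', 'won?', 'asked', 'Kim.']
print(split_punct_affixes("won?"))   # ([], 'won', ['?'])
```

Under the affix analysis the sentence has four tokens instead of six, which is why treebanks built with the earlier tokenization no longer line up with the new analyses.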
------------------------------------------------------------------------------
Release notes for version "LinGO (30-Apr-05)"
This is a minor update to the Apr-05 version, including some lexical
additions, adjustments to the semantic predicate hierarchy, and tuning
of syntactic analyses, all designed to improve end-to-end translation
for LOGON. The only substantive difference is in the analysis of
possessive constructions, where the grammar now produces nearly