Delphin Tools: scripts for automating parsing, translation, and generation within the Delphin HPSG framework
Author: Eric Nichols. Created while a graduate student at Nara Institute of Science and Technology (NAIST).
- Via git
$ git clone 'https://github.com/delph-in/delphintools.git' /path/to/delphintools
- Via zip file
$ wget 'https://github.com/delph-in/delphintools/archive/master.zip' -O /path/to/delphintools.zip
$ cd /path/to/
$ unzip delphintools.zip
$ mv delphintools-master delphintools
Add the following to your .bashrc
# Batch processing with Delphin
DTHOME=/path/to/delphintools
if [ -f $DTHOME/setup.sh ]; then
. $DTHOME/setup.sh
fi
delphintools/bin/logon_do provides a simple interface for parsing, translating, and generating using the LOGON binary. It automatically creates a collection of profiles from a bitext-formatted input file.
Prior to any processing, logon_do should be invoked with the --bitext (or -b) option to create a directory in which future profiles will be stored:
$ logon_do --bitext /path/to/bitext.txt /path/to/profile_top
$ ls /path/to/profile_top/*
/path/to/profile_top/bitext:
object original
The above command will create the directory /path/to/profile_top and a subdirectory bitext/ that contains the original bitext file (called 'original') and a bitext file with the source and target languages swapped (called 'object'), which is used for parsing and paraphrasing the target language.
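To illustrate what "source and target swapped" means, here is a minimal sketch. It assumes one sentence pair per line with the source and target separated by a tab; this README does not specify the bitext format, so treat that layout as an assumption. The helper name swap_bitext is hypothetical and is not part of delphintools.

```shell
# Hypothetical helper, not part of delphintools.
# ASSUMPTION: each bitext line is "source<TAB>target"; the real format may differ.
# Prints each pair with the two columns exchanged; lines without a tab are skipped.
swap_bitext() {
    awk -F'\t' 'BEGIN { OFS = "\t" } NF >= 2 { print $2, $1 }' "$1"
}
```

$ swap_bitext /path/to/profile_top/bitext/original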
usage: logon_do -g|--grammar <jaen|enen> -t|--task <smrs|tmrs|pmrs|gmrs|vmrs|omrs|imrs> [/path/to/profile_top]
Calling logon_do in the following manner will create the profile /path/to/profile_top/smrs and parse the source sentences in bitext/original with the Japanese grammar Jacy.
$ logon_do --grammar jaen --task smrs /path/to/profile_top
If /path/to/profile_top is omitted, logon_do will default to the current directory, so the following command is equivalent:
$ cd /path/to/profile_top && logon_do --grammar jaen --task smrs
Once parsing is complete, the profile smrs will contain the parse results:
$ ls /path/to/profile_top/smrs
analysis decision fold item-phenomenon output parse preference result run set update
daughter edge item item-set parameter phenomenon relations rule score tree
Currently, the following grammars are supported:
- jaen: Japanese->English translation
- enen: English->English paraphrasing (omrs and imrs are supported)
Users can add support for their own systems by copying and modifying the settings files for an existing grammar to call appropriate TSDB CPUs for each task:
$ cp -r lisp/jaen lisp/koen
$ emacs lisp/koen/*.lisp
Currently, the following tasks are supported:
- smrs: MRS from parsing source language
- tmrs: MRS from applying transfer grammar to smrs
- pmrs: "partial transfer" results -- MRS containing mix of translated and untranslated rels from applying transfer grammar to smrs
- gmrs: MRS from applying generation grammar to tmrs
- vmrs: variants (i.e. paraphrases) of source language parses in smrs
- omrs: MRS from parsing "object" (i.e. target) language
- imrs: "iikae" (i.e. paraphrases) of target language parses in omrs
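The task descriptions above imply a dependency order for translation: smrs, then tmrs, then gmrs. A minimal sketch of running them in that order follows. The function name run_translation_tasks and the LOGON_DO override variable are hypothetical conveniences, not part of delphintools; only the --grammar and --task options shown in the usage line are used.

```shell
# Hypothetical wrapper, not part of delphintools.
# Runs the jaen tasks in dependency order: parse (smrs), transfer (tmrs), generate (gmrs).
# LOGON_DO may be overridden (e.g. LOGON_DO=echo) to preview the commands without running them.
run_translation_tasks() {
    profile_top=${1:-.}
    for task in smrs tmrs gmrs; do
        # Stop at the first failing task, since later tasks depend on earlier results.
        "${LOGON_DO:-logon_do}" --grammar jaen --task "$task" "$profile_top" || return 1
    done
}
```

$ run_translation_tasks /path/to/profile_top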
logon_do supports the following command line arguments:
- --count|-c: sets the number of CPUs a task may use in parallel.
- --debug|-d: outputs lisp code without calling the LOGON binary, for debugging purposes.
- --edges|-e: sets the number of edges to use for parsing and generation. Transfer is not affected by this setting.
- --limit|-l: sets the number of results to store for a task. Also determines the number of dependency results to use.
- --settings|-s: loads a user-specified settings file of arbitrary lisp code. See lisp/settings/ for examples. Defaults to lisp/settings/default.lisp.
profile_top has the following directory structure. tmrs, pmrs, gmrs, and vmrs store their profiles in subdirectories that match the result-id of the dependency used to create them.
/path/to/profile_top/bitext/original
                    /bitext/object
                    /smrs/
                    /tmrs/0/
                    /pmrs/0/
                    /gmrs/0/0/
                    /vmrs/0/
                    /omrs/
                    /imrs/0/
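Because the nesting depth differs per task, it can help to list the profiles that exist for a given task. The sketch below assumes a directory counts as a profile if it contains a file named "result", as in the smrs listing shown earlier. The function name list_profiles is hypothetical.

```shell
# Hypothetical helper, not part of delphintools.
# ASSUMPTION: a profile directory is any directory holding a file named "result".
# Usage: list_profiles <profile_top> <task>
list_profiles() {
    # Find every "result" file under the task directory, strip the file name,
    # and print the containing directories in sorted order.
    find "$1/$2" -type f -name result 2>/dev/null | sed 's|/result$||' | sort
}
```

$ list_profiles /path/to/profile_top gmrs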