Reformatted readme markdown
amir-zeldes committed Mar 11, 2016
1 parent eb0dd8b commit c195a75
Showing 1 changed file with 9 additions and 1 deletion.
10 changes: 9 additions & 1 deletion README.md
@@ -4,7 +4,7 @@ Version: 1.9 (includes POS tagging and lemmatization, with DDGLC Greek lemma inf

The part-of-speech tagging models are for use with the freely available TreeTagger
(http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/). The models are based
on the guidelines of the Coptic SCRIPTORIUM project, which closely follow Layton's (2004)
on the guidelines of the Coptic SCRIPTORIUM project, which closely follow Layton's (2011)
grammar. The lexicon used by the tagger is based on a lexicon kindly provided by Prof.
Tito Orlandi and the CMCL project (http://cmcl.let.uniroma1.it/) and a lemma list provided by
Prof. Tonio Sebastian Richter and the DDGLC project (http://research.uni-leipzig.de/ddglc/).
@@ -20,25 +20,33 @@ the TreeTagger executable, which requires one of the two parameter files to run.
also expects an input file in a one-token-per-line format. For example, the input file input.txt could
include the following tokens (in UTF-8! The ASCII characters below are for illustration purposes only):

```
p
noute
pe
.
```

These will be tagged as:

```
p ART
noute N
pe COP
. PUNCT
```
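
As a rough illustration of the one-token-per-line format, the following Python sketch writes a token list to a UTF-8 file and reads tagger output back into tuples. The helper names `write_tokens` and `read_tagged` are hypothetical and not part of TreeTagger or this repository. The reader assumes that the output columns are separated by tabs, which should be checked against an actual output file.

```python
def write_tokens(tokens, path):
    """Write one token per line, encoded as UTF-8 (hypothetical helper)."""
    with open(path, "w", encoding="utf-8", newline="\n") as handle:
        for token in tokens:
            handle.write(token + "\n")


def read_tagged(path):
    """Read tagger output into a list of column tuples (hypothetical helper).

    Assumption: columns (token, tag, and optionally lemma) are tab-separated.
    """
    rows = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.rstrip("\r\n")
            if line.strip():  # skip blank lines
                rows.append(tuple(line.split("\t")))
    return rows


if __name__ == "__main__":
    # ASCII stand-ins as in the example above; real input would be Coptic text.
    write_tokens(["p", "noute", "pe", "."], "input.txt")
```

Writing the file with an explicit `encoding="utf-8"` avoids depending on the platform default encoding, which matters on Windows in particular.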

To run the tagger, run the TreeTagger executable as follows (Windows example):

```
tree-tagger.exe coptic_fine.par -token input.txt output.txt
```

Or to include lemmas in a third column in the output use:

```
tree-tagger.exe coptic_fine.par -token -lemma input.txt output.txt
```
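
If the tagger is called from a script rather than from the command line, the two invocations above could be wrapped roughly as follows. This is only a sketch: the function names `build_tagger_command` and `run_tagger` are hypothetical, and it assumes that `tree-tagger.exe` and the parameter file can be found from the current working directory.

```python
import subprocess


def build_tagger_command(par_file, input_file, output_file, lemma=False,
                         executable="tree-tagger.exe"):
    """Assemble the argument list for the invocations shown above."""
    cmd = [executable, par_file, "-token"]
    if lemma:
        cmd.append("-lemma")  # request the lemma column in the output
    cmd += [input_file, output_file]
    return cmd


def run_tagger(par_file, input_file, output_file, lemma=False,
               executable="tree-tagger.exe"):
    """Run TreeTagger; raises CalledProcessError on a non-zero exit code."""
    cmd = build_tagger_command(par_file, input_file, output_file,
                               lemma=lemma, executable=executable)
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch works
    # even where TreeTagger is not installed.
    print(build_tagger_command("coptic_fine.par", "input.txt",
                               "output.txt", lemma=True))
```

Passing the arguments as a list rather than a single string avoids shell quoting problems with file names that contain spaces.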

The option -token tells the TreeTagger that the input is already tokenized. For a Coptic tokenizer,
see the Coptic SCRIPTORIUM project web page. Further options, such as allowing for SGML tags in the