<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>SCRIPTORIUM | Tools</title>
<link rel="shortcut icon" href="/favicon.ico" type="image/x-icon">
<link rel="icon" href="/favicon.ico" type="image/x-icon">
<link href="https://fonts.googleapis.com/css?family=Asul:400,700" rel="stylesheet">
<link rel="stylesheet" href="css/global.css" type="text/css" charset="utf-8"/>
<script>
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,'script','//www.google-analytics.com/analytics.js','ga');
ga('create', 'UA-55145025-1', 'auto');
ga('send', 'pageview');
</script>
</head>
<body>
<div id="wrapper">
<header id="header">
</header>
<section id="main">
<div class="container">
<div class="breadcrumb">
<a href="/tools">Tools</a>
</div>
<!-- CONTENT STARTS HERE =============================== -->
<div id="content">
<p>
Some of the tools below use a Sahidic Coptic lexicon based on data kindly provided by Prof. Tito Orlandi and the <a href="http://cmcl.let.uniroma1.it/">CMCL</a> project.
When using the part-of-speech tagging models or the tokenization script and its lexicon, please make sure to cite
the CMCL project.
</p>
<h4>Lacuna Prediction Tool</h4>
<p><img src="img/lacuna_pred.png" width="500" class="content-box" style="padding:0px"></p>
<p><b style="color:red">New:</b>
Check out the demo of the neural lacuna prediction tool from <a href="https://aclanthology.org/2024.ml4al-1.8/">our paper</a>:
</p>
<p> <a href="https://gucorpling.org/lacuna-demo/" class="btn">Lacuna prediction</a></p>
<h4>Entity Visualizations</h4>
<p><img src="https://i1.wp.com/blog.copticscriptorium.org/wp-content/uploads/2020/06/entity_treemap.png" width="600" class="content-box" style="padding:0px"></p>
<p>
We've posted some visualizations of our Coptic entity annotations. Try playing with the data, which comes from our freely available corpora:
</p>
<p> <a href="entities/breakdown.html" class="btn">Entity visualizations</a></p>
<h4>Natural Language Processing API</h4>
<p>
You can now get unified access to the latest NLP tools online through a web interface
or a machine-actionable REST API:
</p>
<p><a href="https://tools.copticscriptorium.org/coptic-nlp/" class="btn">Coptic NLP Service</a></p>
<p>The NLP service currently covers segmentation, normalization, part-of-speech tagging, lemmatization and language-of-origin tagging. For individual command-line tools, see below.</p>
<h4>Part-of-Speech Tagging</h4>
<div class="content-box">
<div class="content-col">
<h4>Scripts and models</h4>
<ul>
<li><a href="https://github.com/CopticScriptorium/Tokenizers/releases/latest" target="new">Tokenization script and lexicon</a> (assumes normalized Coptic, see tokenization guidelines)</li>
<li><a href="http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/" target="new">TreeTagger</a> - an open source part-of-speech tagger (<a href="http://www.smo.uhi.ac.uk/~oduibhin/oideasra/interfaces/winttinterface.htm" target="new">additional Windows interface WinTreeTagger</a>)</li>
<li><a href="https://github.com/CopticScriptorium/Tagger-Part-of-Speech/releases/latest" target="new">Coptic TreeTagger training models</a> - for the fine- and coarse-grained tagsets (see tagging guidelines)</li>
</ul>
</div>
<div class="content-col">
<h4>Documentation</h4>
<ul>
<li><a href="https://github.com/CopticScriptorium/tagger-part-of-speech/raw/master/scriptorium-transcription-guidelines.pdf" target="new">Diplomatic Transcription Guidelines</a></li>
<li>Tokenization Guidelines (see sections 3 &amp; 4 of the Transcription Guidelines)</li>
<li><a href="https://github.com/CopticScriptorium/tagger-part-of-speech/raw/master/scriptorium_tagset_documentation.pdf" target="new">Part-of-Speech Tagging Guidelines</a></li>
<li><a href="https://github.com/CopticScriptorium/tagger-part-of-speech/blob/master/Coptic SCRIPTORIUM lemmatization guidelines.pdf" target="new">Lemmatization Guidelines</a> </li>
</ul>
</div>
</div>
<h4>Coptic Universal Dependency Treebank</h4>
<p>A treebank is a collection of texts in which sentences have been exhaustively annotated with syntactic analyses. Our Coptic Treebank project uses the <a href="http://universaldependencies.org/" target="_blank">Universal Dependencies</a> standards, which apply the same annotation scheme to multiple languages.</p>
<div class="content-box">
<div class="content-col">
<h4>Models and Examples</h4>
<ul>
<li><a href="treebank.html" target="new">Description and introduction to the Coptic Treebank</a></li>
<li><a href="https://github.com/UniversalDependencies/UD_Coptic-Scriptorium" target="new">Download Coptic Universal Dependency Treebank in CoNLL-U format</a></li>
<li><a href="https://annis.copticscriptorium.org/annis/scriptorium/#_q=cG9zPSJWIiAtPmRlcFtmdW5jPSJuc3ViaiJdIGxlbW1hPSLisoHispvisp_ispUi&_c=c2hlbm91dGUuZm94&cl=5&cr=5&s=0&l=10&_seg=bm9ybV9ncm91cA&_bt=bm9ybQ" target="new">Sample search of verbs governing a first-person subject in Shenoute's Not Because a Fox Barks</a></li>
</ul>
</div>
<div class="content-col">
<h4>Documentation</h4>
<ul>
<li><a href="http://universaldependencies.org/cop/" target="new">Coptic Treebank Annotation Guidelines</a></li>
<li><a href="http://universaldependencies.org/" target="_blank">Universal Dependencies Project</a></li>
</ul>
</div>
</div>
<h4>Additional Annotation Tools</h4>
<div class="content-box">
<ul>
<li><a href="https://github.com/CopticScriptorium/normalizer/releases/latest" target="_blank">Normalizer</a> (normalizes orthography, removes diacritics)</li>
<li><a href="https://github.com/CopticScriptorium/lexical-taggers/releases/latest" target="_blank">Language of origin tagger</a> (to annotate loan words from Greek, Latin, Hebrew/Greco-Hebrew, Aramaic)</li>
<li><a href="https://github.com/CopticScriptorium/tagger-part-of-speech/releases/latest" target="_blank">Lemmatizer</a> (to annotate words with their dictionary head word; embedded in the Part-of-Speech Tagger)</li>
</ul>
</div>
<h4>Converters</h4>
<div class="content-box">
<ul>
<li>Coptic encoding converter (converts legacy font-based character encodings, such as those used by the Coptic and Laser Coptic fonts, into standards-compliant Coptic Unicode characters)
<ul>
<li>Simple recoding script in Perl (supports CMCL, Laser Coptic and UTF-8 encoding conversion)</li>
<li>Converter between UTF-8 and the ASCII encoding of Dirk Van Damme and Gregor Wurst</li>
<li><a href="https://github.com/CopticScriptorium/converters/releases/latest" target="_blank">Download both converters</a></li>
</ul>
</li>
<li><a href="http://corpus-tools.org/pepper/" target="new">SaltNPepper</a> - a metamodel-based Java framework for multi-format conversion</li>
<li><a href="https://github.com/amir-zeldes/XLAddIns">Excel plug-in</a> for importing and exporting EXMARaLDA XML, SGML, PAULA XML and subsets of TEI XML</li>
</ul>
</div>
</div>
<!-- END CONTENT ============================== -->
</section>
<footer id="footer">
</footer>
</div> <!-- /#wrapper -->
<script src="https://ajax.googleapis.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>
<script>
$(function(){
//$("#navbar").load("nav.html");
$("#header").load("header.html",function() {
$(".m-tools").addClass('on');
});
$("#footer").load("footer.html");
});
</script>
</body>
</html>