I cannot find this in the documentation: does Udapi already include ways to deal with CoNLL-U Plus files (i.e. read, write...)? In particular, I am interested in expanding an existing regular CoNLL-U file into a CoNLL-U Plus file by adding new custom columns.
Thanks!
Udapi does not support CoNLL-U Plus yet. There is read.Conll with the parameter attributes, where you can specify which columns are in a given file. However, it uses setattr(node, attribute_name, value) internally, which means that only existing node attribute names can be used as column names (or an underscore, meaning that the given column should be ignored by the reader).
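To illustrate why a setattr-based reader rejects custom column names, here is a minimal sketch. ToyNode and fill are hypothetical names, not Udapi code. The sketch assumes the node class restricts its attributes via __slots__ (one common way such a restriction arises in Python; whether Udapi does exactly this is an assumption) and that column names are passed as a comma-separated string.

```python
class ToyNode:
    """Hypothetical stand-in for a node class with a fixed attribute set."""
    __slots__ = ("form", "lemma", "upos")


def fill(node, attributes, values):
    """Assign column values to node attributes; '_' skips a column."""
    for name, value in zip(attributes.split(","), values):
        if name == "_":
            continue  # column is ignored by the reader
        setattr(node, name, value)  # fails for names outside __slots__


# Known attribute names work, and '_' skips the second column.
node = ToyNode()
fill(node, "form,_,upos", ["dogs", "ignored", "NOUN"])

# A custom column name is rejected because ToyNode has no such attribute.
rejected = False
try:
    fill(ToyNode(), "form,parseme_mwe", ["dogs", "1:VID"])
except AttributeError:
    rejected = True
```

This is why extra CoNLL-U Plus columns need some other storage location than plain node attributes.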
I would welcome a PR adding read.Conlluplus (or read.Conllup, considering that .conllup is the recommended file extension) and write.Conlluplus. That would mean interpreting the global.columns header (perhaps storing it in document.meta['global.columns'], similarly to document.meta['global.Entity']). The question is where to store the extra (non-standard) columns and how to name them (lowercase?). I would suggest storing them in node.misc, so e.g. global.columns = ID FORM PARSEME:MWE results in node.misc["parseme:mwe"] containing the values from the last column. When serializing this document using write.Conlluplus with document.meta['global.columns'] == "ID FORM MISC PARSEME:MWE", the parseme:mwe attribute would not be stored in MISC, but in the last (PARSEME:MWE) column. This would allow users to easily convert between different formats, possibly using e.g. udapy read.Conllu files=input.conllu util.Eval doc='doc.meta["global.columns"] = "ID FORM LEMMA MISC PARSEME:MWE"' write.Conlluplus files=output.conllup (note that write.Conlluplus does not exist yet).
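The proposed convention can be sketched in plain Python, independent of Udapi. Everything here is hypothetical: read_conllup, write_conllup and the dict-based rows are illustration only, not the suggested read.Conlluplus/write.Conlluplus implementation. The sketch assumes a single sentence, tab-separated columns, and that "_" marks an unspecified value.

```python
STANDARD = {"ID", "FORM", "LEMMA", "UPOS", "XPOS",
            "FEATS", "HEAD", "DEPREL", "DEPS", "MISC"}


def parse_misc(value):
    """Parse a MISC string such as 'A=1|B=2' into a dict."""
    misc = {}
    if value != "_":
        for item in value.split("|"):
            key, _, val = item.partition("=")
            misc[key] = val
    return misc


def read_conllup(text):
    """Return (columns, rows); non-standard columns go to row['misc']."""
    columns, rows = None, []
    for line in text.splitlines():
        if line.startswith("# global.columns ="):
            columns = line.split("=", 1)[1].split()
        elif line and not line.startswith("#"):
            if columns is None:
                raise ValueError("missing global.columns header")
            row = {"misc": {}}
            for name, value in zip(columns, line.split("\t")):
                if name == "MISC":
                    row["misc"].update(parse_misc(value))
                elif name in STANDARD:
                    row[name.lower()] = value
                elif value != "_":
                    # extra column: stored in misc under a lowercased key
                    row["misc"][name.lower()] = value
            rows.append(row)
    return columns, rows


def write_conllup(columns, rows):
    """Serialize rows; misc keys that have their own column leave MISC."""
    extra = {c.lower() for c in columns if c not in STANDARD}
    lines = ["# global.columns = " + " ".join(columns)]
    for row in rows:
        fields = []
        for name in columns:
            if name == "MISC":
                kept = [f"{k}={v}" for k, v in row["misc"].items()
                        if k not in extra]
                fields.append("|".join(kept) or "_")
            elif name in STANDARD:
                fields.append(row.get(name.lower(), "_"))
            else:
                fields.append(row["misc"].get(name.lower(), "_"))
        lines.append("\t".join(fields))
    return "\n".join(lines) + "\n"
```

Reading with one global.columns value and writing with another then moves an attribute between MISC and its own column, which is the conversion described above.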