DataFormatInfo no longer has mime type - where did it go? #32

Open
reckart opened this issue Mar 29, 2018 · 14 comments

@reckart
Member

reckart commented Mar 29, 2018

DataFormatInfo no longer has a mime type info (and file extension) in OMTD-SHARE 3.0.2.

@pennyl67 where did they go?

@reckart reckart added this to the 3.0.2.1 milestone Mar 29, 2018
@pennyl67
Contributor

Data formats in the ontology have a property "hasMimetype" which links a format to the corresponding mimetype. Obviously, this applies only to formats that have a mimetype; broad concepts such as "corpus format" do not have one. The same goes for file extension and documentation URL.
Again, if it helps, I can send you the equivalence relations between data format and mimetype. In fact, they are already in a Google Sheet (intended for checking purposes): https://docs.google.com/spreadsheets/d/1Xs3-RlwyJdrCvMIJsOkOuOOh7EXkLdUHknj1I2w04Wo/edit?usp=sharing. But the sheet has the IRI and not the label. We can add the label of the mimetype, which would help you. If you want this info in another format, let me know.

@reckart
Member Author

reckart commented Mar 29, 2018

Ok, then I'll remove the code regarding mimetypes from the OMTD Maven Plugin.

@pennyl67
Contributor

Does this mean that data format will have to be entered manually?

@reckart
Member Author

reckart commented Mar 29, 2018

Rather... well... what about data formats which are not in the ontology (i.e. otherFormat)? Shouldn't it be possible to specify such information as mimetype and file extension at least for these?

@greenwoodma
Member

Hmm, I'm confused: I thought that when I migrated the Maven plugin code to the latest model version I updated some things around mimetypes. I guess I don't know whether that info is used in any of the examples I've tested on so far, and hence whether it ends up in the right place.

@reckart
Member Author

reckart commented Mar 29, 2018

@greenwoodma as far as I can see, you just commented out the stuff and in some cases added a "todo" comment.

@greenwoodma
Member

@reckart I was just looking at the code, and UimaDescriptorAnalyzer certainly adds mimetype info; see line 263 onwards.

@pennyl67
Contributor

I find it a pity not to have already some mapping from mimetype to data format when it's known - so, if the googlesheet can be used for the mappings, pls let's do; just tell me how I can help.
For other data formats (as for all the ontology-driven elements), the idea is that you use the dataFormat to specify a broader concept and then in the dataFormatOther (free text) you add the new suggested value, which should be monitored by the ontology curators.
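
The fallback described above could look roughly like the following sketch. Everything here is hypothetical and only for illustration: the class and method names, the `FormatEntry` holder, and in particular the `BROADER_CONCEPT` IRI, which is a placeholder and not an actual ontology term. The only value taken from this thread is the Conll2000 identifier.

```java
import java.util.Set;

public class DataFormatFallbackSketch {
    // Stand-in for the controlled vocabulary; holds only the example IRI from this thread.
    static final Set<String> KNOWN_FORMATS =
            Set.of("http://w3id.org/meta-share/omtd-share/Conll2000");

    // Placeholder for a broad concept such as "corpus format"; NOT a real ontology IRI.
    static final String BROADER_CONCEPT = "http://example.org/hypothetical/BroaderFormat";

    // Hypothetical holder for the two fields discussed above.
    static final class FormatEntry {
        final String dataFormat;      // IRI from the vocabulary (or the broader concept)
        final String dataFormatOther; // free-text suggestion, null if not needed

        FormatEntry(String dataFormat, String dataFormatOther) {
            this.dataFormat = dataFormat;
            this.dataFormatOther = dataFormatOther;
        }
    }

    // Known formats are used as-is; unknown ones fall back to a broader concept
    // plus a free-text value that curators could review later.
    static FormatEntry describe(String formatIri, String suggestedName) {
        if (formatIri != null && KNOWN_FORMATS.contains(formatIri)) {
            return new FormatEntry(formatIri, null);
        }
        return new FormatEntry(BROADER_CONCEPT, suggestedName);
    }

    public static void main(String[] args) {
        FormatEntry known = describe("http://w3id.org/meta-share/omtd-share/Conll2000", null);
        FormatEntry unknown = describe(null, "My custom format");
        System.out.println(known.dataFormat + " / " + known.dataFormatOther);
        System.out.println(unknown.dataFormat + " / " + unknown.dataFormatOther);
    }
}
```

Note that this sketch carries only a name in the free-text field; it has no place for a mimetype, file extension or documentation URL of the suggested format.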

@reckart
Member Author

reckart commented Mar 29, 2018

@greenwoodma no, it doesn't - that code is ineffective. It tries to look up the data format in the controlled vocabulary using the mime type.

Data format identifier example: "http://w3id.org/meta-share/omtd-share/Conll2000"

Mime type example: "text/tab-separated-values"

It will obviously never match.
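
To make the mismatch concrete, here is a minimal sketch of the two lookup strategies. It is not the plugin's actual code: the class, the method names and the mapping table are assumptions made up for illustration, and the pairing of "text/tab-separated-values" with the Conll2000 identifier simply reuses the two example strings above rather than reflecting the ontology.

```java
import java.util.Map;
import java.util.Optional;
import java.util.Set;

public class MimeTypeLookupSketch {
    // Stand-in for the controlled vocabulary of data format identifiers (IRIs).
    static final Set<String> DATA_FORMAT_IRIS =
            Set.of("http://w3id.org/meta-share/omtd-share/Conll2000");

    // Hypothetical mime type -> data format table; the pairing is illustrative only.
    static final Map<String, String> MIME_TO_DATA_FORMAT =
            Map.of("text/tab-separated-values",
                   "http://w3id.org/meta-share/omtd-share/Conll2000");

    // Ineffective approach: uses the mime type as if it were a vocabulary identifier.
    static Optional<String> lookupDirect(String mimeType) {
        if (mimeType == null || !DATA_FORMAT_IRIS.contains(mimeType)) {
            return Optional.empty();
        }
        return Optional.of(mimeType);
    }

    // Alternative: translate the mime type to an IRI first, then check the vocabulary.
    static Optional<String> lookupViaMapping(String mimeType) {
        if (mimeType == null) {
            return Optional.empty();
        }
        return Optional.ofNullable(MIME_TO_DATA_FORMAT.get(mimeType))
                .filter(DATA_FORMAT_IRIS::contains);
    }

    public static void main(String[] args) {
        String mime = "text/tab-separated-values";
        System.out.println("direct:  " + lookupDirect(mime).orElse("no match"));
        System.out.println("mapping: " + lookupViaMapping(mime).orElse("no match"));
    }
}
```

The direct lookup compares a string like "text/tab-separated-values" against IRIs, so it cannot succeed; only an explicit mapping step bridges the two namespaces.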

@reckart
Member Author

reckart commented Mar 29, 2018

@pennyl67 the new format could, however, have a different mime type and file extension...

@reckart
Member Author

reckart commented Mar 29, 2018

@pennyl67 @greenwoodma I am presently working on the code, seeing how I can add a UIMA-type -> OMTD-SHARE type mapping. Once I have worked that out, I might also add something like this for mime types.

@greenwoodma
Member

@reckart ah, sorry. I clearly misunderstood how things had changed between the two model versions

@pennyl67
Contributor

@reckart yes, indeed for new data formats we need more info than just a name (a documentation URL, at least! To me that's more important than just a mimetype, if it's not a standard mimetype).
But finish with the types first and then we can discuss this - I'm also putting a note for the discussion on the ontology curation.

@reckart
Member Author

reckart commented Mar 29, 2018

> I find it a pity not to have already some mapping from mimetype to data format when it's known - so, if the googlesheet can be used for the mappings, pls let's do; just tell me how I can help.
> For other data formats (as for all the ontology-driven elements), the idea is that you use the dataFormat to specify a broader concept and then in the dataFormatOther (free text) you add the new suggested value, which should be monitored by the ontology curators.

MIME type mapping has been added: #34

@reckart reckart removed this from the 3.0.2.1 milestone Mar 30, 2018