Data format / markup language #6
Comments
I think, given talk in #5 about linked data and using appropriate ontologies where available, JSONLD is the most appropriate of those listed. Aesthetically I prefer it to RDF/XML, and easy clientside crunching of JSONLD is a major plus. It's not my field so I can't make too strong of a case without a bunch more research... @pmackay? |
I think my preference would be either YAML or JSON/JSONLD, with maybe a slight lean toward YAML purely because it is both machine-friendly and human-friendly. FWIW, I think I mentioned this before, but the Drupal 8 CMI (configuration management initiative) put a lot of time and thought into comparing data types, and settled on YAML. There was a LOT of debate and discussion (as is always the case when a huge project/community makes a big decision like that). I found this from 2011: https://groups.drupal.org/node/159044 |
The human-readability of YAML is the biggest win, IMO. I think it will lend itself to faster adoption among non-techies. I actually know a good number of farmers who know how to program. But I know a lot more who don't. So the approachability of something like YAML could mean a big difference in whether or not we can get traction in these datasets, and get more people to start contributing/maintaining. |
My concern with this is: what problems are created by using text files with a structured format? It's one thing to use JSON/YAML for defining config, metadata or schema files; it's another to define 1000s of data files for plants. What happens if there is a need to change the format of the files? Is this easier or harder than updating a database structure? Does it limit the users who might interact with it? What is gained by writing separate files? If a database were used, it would be easy to output data in any of the formats above. If a file format is needed, I wonder if CSV with multiple sheets/files could be simpler? CSV is easier to edit even for non-techies, since spreadsheets can be used. Drupal 8 CMI generates the YAML files it uses. It's helpful because it is machine-readable, but typically most D8 devs won't write it. It does help with diffing though. |
It's difficult to talk about these things while there is discussion with significant consequences for implementation happening in other issues, but I'll answer some questions from my personal bias.
We can bootstrap a database now using Git for versioning, and GitHub for distributed authoring while we figure out the long term tech stack for the project.
I would favour a document store like CouchDB for a crop database. Being able to deep-nest data natively is super useful for the kind of data we're dealing with, and document stores let you query on nested data easily. On top of that, nailing a schema which applies to every crop is hard... There is a good chance of having cases where different groups need slightly varying schemas.
It also means we need to build the tools to handle authoring and conflict resolution etc. In the long run, I think CouchDB would be a great fit for this project, but I was hoping to get things moving with flatfile and develop tools as we go.
I don't think any markup is good in the long run. The long term goal for me would be an editing interface. Flatfile markup is a bootstrap to get us going with a database based on imports from existing data sources. |
Good questions @pmackay ... I'll add to @andru's responses:
Files are the least common denominator, in a sense. By offering individual files, you set the barrier to entry very low. People with zero database or programming knowledge generally know what a file is. And storing canonical data in files doesn't stop you from also building a "true" database on top of them. For example: a MySQL database that serves the data from memory via a REST app - but pulls the canonical data from YAML files on the hard drive. And the app can serve the data in whatever format it chooses, because it is essentially designing its MySQL table schema to match the schema of the YAML files. @andru you could import into a CouchDB and your app could use that. I would import into my farmOS database (MySQL). The point being: we can have ALL of the above! With files as the base. :-) AND: nothing is stopping OpenFarm.cc (or another site perhaps) from creating a web-based UI for creating/editing the crop files - and then spitting them out as YAML for inclusion in whatever dataset the user wants (either a Git repo they maintain, or they can post a pull-request to another one).
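A rough sketch of the "files as the base, database on top" idea, under several assumptions: the crop files here are JSON rather than YAML (so that only the Python standard library is needed), SQLite stands in for MySQL or CouchDB, and the field names (`name`, `family`, `days_to_maturity`) and their values are invented for illustration, not taken from any agreed schema.

```python
import json
import sqlite3
import tempfile
from pathlib import Path


def load_crop_files(directory: Path) -> list[dict]:
    """Read every *.json crop file in a directory, sorted for a stable order."""
    return [
        json.loads(path.read_text(encoding="utf-8"))
        for path in sorted(directory.glob("*.json"))
    ]


def build_database(crops: list[dict]) -> sqlite3.Connection:
    """Create an in-memory table whose columns mirror the (hypothetical) file schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE crops (name TEXT PRIMARY KEY, family TEXT, days_to_maturity INTEGER)"
    )
    conn.executemany(
        "INSERT INTO crops VALUES (:name, :family, :days_to_maturity)", crops
    )
    conn.commit()
    return conn


def demo() -> list[tuple]:
    """Write two example files, import them, and query the derived database."""
    examples = [
        {"name": "Carrot", "family": "Apiaceae", "days_to_maturity": 70},
        {"name": "Tomato", "family": "Solanaceae", "days_to_maturity": 80},
    ]
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp)
        for crop in examples:
            target = folder / f"{crop['name'].lower()}.json"
            target.write_text(json.dumps(crop), encoding="utf-8")
        conn = build_database(load_crop_files(folder))
        return conn.execute("SELECT name, family FROM crops ORDER BY name").fetchall()
```

The files stay canonical: the database is rebuilt from them on import, so each consumer can pick whatever storage suits its app.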
This is certainly valid. And it's one of the reasons defining a standard - and SIMPLE - schema from the beginning is important. As for changing schema in the future, if that becomes necessary, I think it can be handled with schema versioning. See my comment outlining a potential process here: #7 (comment)
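One way schema versioning could look in practice is sketched below. This is not the process described in #7, which is not reproduced here; it is a generic pattern in which every record carries a `schema_version` field and is upgraded one step at a time. The field rename (`latin_name` to `binomial_name`) and all function names are hypothetical.

```python
from typing import Callable

Record = dict
CURRENT_VERSION = 2


def _v1_to_v2(record: Record) -> Record:
    """Hypothetical change between versions: rename 'latin_name' to 'binomial_name'."""
    upgraded = dict(record)  # copy so the caller's record is not mutated
    upgraded["binomial_name"] = upgraded.pop("latin_name", None)
    upgraded["schema_version"] = 2
    return upgraded


# Maps a version number to the function that upgrades a record FROM that version.
MIGRATIONS: dict[int, Callable[[Record], Record]] = {1: _v1_to_v2}


def upgrade(record: Record) -> Record:
    """Apply migrations in order until the record reaches CURRENT_VERSION."""
    version = record.get("schema_version", 1)  # assume unversioned files are v1
    while version < CURRENT_VERSION:
        record = MIGRATIONS[version](record)
        version = record["schema_version"]
    return record
```

With something like this, old files would not need to be rewritten all at once; an importer could upgrade them on read.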
This is true - but the big disadvantages of CSVs, in my mind, are:
Yea, D8 CMI isn't a perfect comparison, because much of it is machine-generated. But one of the reasons they went with YAML was so that the config could easily be read and edited by hand, when necessary. |
I'm going to take back this statement - if that's alright with everyone. Ultimately, both JSON and YAML can be equally used to define intended data types, to give validators information about how to process the data. So in this case it's not really a big difference. |
What about using JSON Table Schema and also Data packages if needed?
This is a valid issue and if there is a need for highly structured data, could be a decent need for structured files. I appreciate your points too, just discussing :) |
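For anyone who hasn't seen the Data Package approach, here is a minimal sketch: a descriptor lists CSV resources, and each resource carries a table schema naming its fields and their types. The descriptor below follows the general shape of that format, but the crop fields and values are invented, and the type check is hand-rolled for illustration rather than done with any official Data Package library.

```python
import csv
import io

# Hypothetical descriptor, roughly what a datapackage.json might contain.
DESCRIPTOR = {
    "name": "crops",
    "resources": [
        {
            "name": "crops",
            "path": "crops.csv",
            "schema": {
                "fields": [
                    {"name": "name", "type": "string"},
                    {"name": "spacing_cm", "type": "integer"},
                ]
            },
        }
    ],
}

# Only the handful of field types this sketch needs.
CASTERS = {"string": str, "integer": int, "number": float}

# Inline stand-in for the contents of crops.csv.
CSV_TEXT = "name,spacing_cm\nCarrot,5\nTomato,60\n"


def read_resource(csv_text: str, schema: dict) -> list[dict]:
    """Parse CSV text and cast each column to the type declared in the schema."""
    rows = []
    for raw in csv.DictReader(io.StringIO(csv_text)):
        row = {}
        for field in schema["fields"]:
            cast = CASTERS[field["type"]]
            # int()/float() raise ValueError when a cell does not match its type.
            row[field["name"]] = cast(raw[field["name"]])
        rows.append(row)
    return rows
```

The CSV stays editable in a spreadsheet, while the schema gives validators something to check against.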
If only I could go back in time... ;-) |
Data Packages looks good. JSON-LD is becoming very widely used for APIs. It's compatible with RDF vocabularies and can express anything RDF/XML can, but in a package that's easier to work with on the client side. What do you think of it? |
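To give a feel for JSON-LD, the sketch below shows an ordinary JSON document whose `@context` maps short keys to vocabulary IRIs. The `example.org` IRIs and the crop fields are placeholders, not a proposed ontology. The `resolve_terms` helper is a toy that only renames top-level keys; it is not the real JSON-LD expansion algorithm, which a proper JSON-LD library would implement.

```python
import json

# Plain JSON plus an @context that ties each short key to an IRI.
DOC = json.loads(
    """
    {
      "@context": {
        "name": "http://schema.org/name",
        "family": "https://example.org/crop#family"
      },
      "@id": "https://example.org/crops/carrot",
      "name": "Carrot",
      "family": "Apiaceae"
    }
    """
)


def resolve_terms(doc: dict) -> dict:
    """Toy illustration: replace each top-level key with the IRI from @context."""
    context = doc.get("@context", {})
    return {
        context.get(key, key): value  # keys without a mapping (e.g. @id) pass through
        for key, value in doc.items()
        if key != "@context"
    }
```

A client that ignores `@context` still sees plain JSON with `name` and `family`, which is much of the appeal for client-side use.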
I'm curious about JSON-LD - never used it. Data Packages looks good and follows a similar schema format to the one I proposed in http://github.com/farmOS/CropDB-Spec. The one concern I have is that it is limited to CSV data. |
I agree with the point about starting with a very fixed structure: extensible later, but not in the first round, while keeping extensibility in mind. I think building on fairly standard, simple technology would make it easier to find help for the project and improve maintainability. I would also say that, because only a few people are talking here and the project is still at an unstable stage, the focus should remain on feasibility at every step. The second important focus is being useful to the "final gardener", but that depends more on the final websites. For the DB project, contributing should be very easy. |
There's been some discussion in #4 and #5 about the data format/markup language to use to represent the data.
Currently the options presented are:
If anyone has a case to make for/against these or another to propose, here's the place!