Starting rough: a versioned JSON dump #4
Comments
The main issue I can think of is that we'd be splitting our dataset at this moment, and it would evolve separately on OpenFarm (OF) until this one becomes more usable as an endpoint. However, I don't think that we get enough edits at the moment for that to be a real concern. We could also build some scripts that check a variety of sources and aggregate that data, then let humans resolve any merge conflicts?
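The aggregate-then-review idea above could be sketched roughly as follows. This is only an illustration: the `merge_sources` function, the record layout, and the sample values are assumptions made for the example, not anything that exists in OpenFarm or Hortomatic.

```python
from typing import Any


def merge_sources(sources: dict[str, dict[str, dict[str, Any]]]):
    """Merge crop records from several sources (hypothetical helper).

    `sources` maps a source name to {crop_id: {field: value}}.
    Returns (merged, conflicts). Fields on which the sources disagree
    are left out of `merged` and listed in `conflicts` for a human.
    """
    # Collect every value seen for each (crop, field) pair, per source.
    seen: dict[tuple[str, str], dict[str, Any]] = {}
    for source_name, crops in sources.items():
        for crop_id, fields in crops.items():
            for field, value in fields.items():
                seen.setdefault((crop_id, field), {})[source_name] = value

    merged: dict[str, dict[str, Any]] = {}
    conflicts: list[dict[str, Any]] = []
    for (crop_id, field), by_source in seen.items():
        distinct = {repr(value) for value in by_source.values()}
        if len(distinct) == 1:
            # All sources agree, so the value can be merged automatically.
            merged.setdefault(crop_id, {})[field] = next(iter(by_source.values()))
        else:
            # Sources disagree: queue the field for manual review.
            conflicts.append({"crop": crop_id, "field": field, "values": by_source})
    return merged, conflicts


# Illustrative data only; the numbers are made up for the example.
sources = {
    "openfarm": {"tomato": {"name": "Tomato", "days_to_maturity": 60}},
    "hortomatic": {"tomato": {"name": "Tomato", "days_to_maturity": 65}},
}
merged, conflicts = merge_sources(sources)
```

Here the agreed `name` is merged automatically, while the differing `days_to_maturity` values end up in `conflicts` for someone to resolve by hand.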
@andru I like it! Getting started with at least a sketch is the best first step. And it will help to identify where the commonalities are. I agree that each crop/variety should be a separate file. See my comment here: #2 (comment) I would also suggest that we consider YAML instead of JSON. The Drupal community recently chose YAML over JSON for all of its configuration management. I'm not familiar with all of the reasons (I'm sure there are lots of comparisons out there) - but one that stood out to me is that YAML can have comments embedded in it. I do love comments. :-)
It looks like there are options for converting YAML to JSON in JavaScript, as well. So I don't think it would be an impediment to JS-only apps. What do you think?
I think YAML could be a good fit, since we're talking about hand-editing files for now. Fewer braces, but whitespace-sensitive markup can be confusing for some people too. I don't have a clear preference. Whether we go with YAML or JSON for the source files, we should look into options for validation - maybe there's a GitHub pull request integration that can handle it? @simonv3 is there a way you could keep a log of changes made to the data at OpenFarm which, depending on the quantity, could be manually applied to the repo or scripted? From the perspective of Hortomatic, in the short term I'll be using the data as read-only.
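As a rough idea of what validation might look like, here is a minimal sketch that checks an already-parsed crop record (from YAML or JSON) against a small set of required fields. The `validate_crop` function and the field names `name` and `days_to_maturity` are assumptions for illustration; no schema has been agreed in this thread.

```python
def validate_crop(crop: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means valid.

    The required fields below are hypothetical placeholders.
    """
    required = {"name": str, "days_to_maturity": int}
    errors = []
    for field, expected in required.items():
        if field not in crop:
            errors.append(f"missing required field: {field}")
        elif not isinstance(crop[field], expected):
            errors.append(
                f"{field} should be {expected.__name__}, "
                f"got {type(crop[field]).__name__}"
            )
    # A simple range check on top of the type check.
    days = crop.get("days_to_maturity")
    if isinstance(days, int) and days <= 0:
        errors.append("days_to_maturity must be positive")
    return errors


ok = validate_crop({"name": "Tomato", "days_to_maturity": 60})
bad = validate_crop({"name": "Tomato"})
```

A script like this could be run over every file in the repository by whatever check is attached to pull requests, failing the check when any file returns a non-empty list.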
Making crops in OF read-only for now is an option, but I'd have to discuss that with the other people working on the project. Thinking about it - the main editing that's been happening on crops is actually linking to Wikipedia and uploading images, none of which is really "crop" information. I'm personally cool with YAML.
I'm curious, what's the goal? What will the list of crops be used for?
Hey everyone! I sketched up two quick proof-of-concept repositories to demonstrate roughly what I'm thinking. It's not meant to be a "final solution" - I just find it easier to get my ideas out in code sometimes. And maybe it can provide a starting point for further conversation. The two repositories are: https://github.com/farmOS/CropDB-Spec
CropDB-Spec serves as a place to define the data specification. It basically just has two files: cropdb.schema.yml, which defines the basic schema of a crop YAML file; and db/example.yml, which is an example crop YAML file that contains comments about each field/value. CropDB-Base serves as an example of an actual crop collection that implements the spec. I just added a single crop file called "tomato.yml" as an example, but we could start building out more if you like this approach.
The way I see "crop collections" is: perhaps we can provide a "base" collection that contains very general information about a set of very common crops. But other people could create their own sets for more specific ones - i.e., seed producers could create sets that have files for each of their available varieties/cultivars. And they could use the "base" set as a starting point, utilizing the "inherits" field I proposed.
So for example, Johnny's Seeds sells a tomato variety called "Big Beef" (http://www.johnnyseeds.com/p-7958-big-beef.aspx). In their data set, they could create a file called tomato.big_beef.yml (or something like that) and specify in there that it "inherits" from the base tomato.yml file. But they could also include a line in that file that overrides "days_to_maturity" and sets it to 70, because it's different from the default 60 defined in tomato.yml. Again, this is all just a sketch - meant to convey some possible ideas and get your feedback.
I haven't implemented any actual code to use these files, nor do I have much experience with YAML - so there may be things wrong - but hopefully it at least makes sense from a conceptual point of view. What do you think?
If anyone wants commit access to those repos, let me know! Feel free to bang on it, propose changes, etc. Or, if it's completely different from what you're thinking - we can throw them out completely - but this is roughly what I am going to need in farmOS. :-)
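To make the "inherits" idea concrete, here is a minimal sketch of how a consumer might resolve a variety file against its base. It works on already-parsed records (plain dicts) rather than YAML files; the `resolve_crop` function and the `frost_tolerance` value are assumptions for illustration, and only the 60 and 70 day figures come from the example above.

```python
def resolve_crop(crop_id: str, collection: dict, _seen: tuple = ()) -> dict:
    """Return the record for `crop_id` with inherited fields filled in.

    Hypothetical helper: values in the child override values in the parent.
    """
    if crop_id in _seen:
        raise ValueError(f"circular inheritance at {crop_id}")
    record = dict(collection[crop_id])  # copy so the collection is untouched
    parent_id = record.pop("inherits", None)
    if parent_id is None:
        return record
    # Resolve the parent first, then lay the child's own fields on top.
    resolved = resolve_crop(parent_id, collection, _seen + (crop_id,))
    resolved.update(record)
    return resolved


# Stand-ins for tomato.yml and tomato.big_beef.yml after parsing.
collection = {
    "tomato": {
        "name": "Tomato",
        "days_to_maturity": 60,
        "frost_tolerance": "tender",  # illustrative value, not from the spec
    },
    "tomato.big_beef": {
        "inherits": "tomato",
        "name": "Big Beef",
        "days_to_maturity": 70,
    },
}
big_beef = resolve_crop("tomato.big_beef", collection)
```

The resolved "Big Beef" record keeps `frost_tolerance` from the base tomato while its own `days_to_maturity` of 70 replaces the default 60.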
I want to put a link to the datapackages set of tools here: https://www.npmjs.com/search?q=datapackage http://dataprotocols.org/data-packages/ Your spec and implementation files reminded me of it, @mstenta, and there's a group of well-defined tools for this already - it's probably worth just reading up on them and seeing what they do.
@mstenta would it be possible to start by capturing the models and properties you need, separately from the data format? (I wrote a bit more here: #2 (comment).)
Thanks @simonv3 ! That looks like a good guide to follow and learn from! I'll spend some time familiarizing myself with it. The format and structure I used for the YAML was loosely based on the format Drupal 8 is using for configuration storage. I'm sure there's some overlap in the concepts so it would be helpful to identify those. @pmackay - Definitely! I agree starting a wiki to sketch out the properties is a good next step. So far, in the YAML sketch I made, the "crop" model looks something like this:
Just a start... I'm starting to compile a list of other data properties that I plan to use. Should we start a wiki to compile them?
Want to fill out more info here https://github.com/openfarmcc/Crops/wiki/Crop-data-needs?
And just to be clear: my current use-case is specifically to build a set of files that can be imported into farmOS. Within farmOS, users will be able to plan out their plantings via a "Planting Wizard", which will use the data in these files to auto-generate tasks with specific dates. The "frost tolerance" and "days to maturity" that I included in the schema are both useful for that specific purpose. @andru would be able to use these files for Hortomatic, as well. And OpenFarm.cc could use them as a basis upon which guides could be built. It would also help to accomplish your goal in #1 I think.
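As a sketch of how crop data could turn into dated tasks, the snippet below derives a harvest date from a planting date and "days to maturity". The `plan_tasks` function and the task layout are assumptions for illustration and do not describe how the farmOS Planting Wizard is actually built; frost tolerance is left out to keep the example short.

```python
from datetime import date, timedelta


def plan_tasks(crop: dict, planting_date: date) -> list[dict]:
    """Generate a minimal task list for one planting (hypothetical helper)."""
    # Harvest is estimated as planting date plus days to maturity.
    harvest_date = planting_date + timedelta(days=crop["days_to_maturity"])
    return [
        {"task": f"Plant {crop['name']}", "date": planting_date},
        {"task": f"Harvest {crop['name']}", "date": harvest_date},
    ]


# Illustrative crop record and planting date.
tasks = plan_tasks({"name": "Tomato", "days_to_maturity": 60}, date(2024, 5, 15))
for task in tasks:
    print(task["date"].isoformat(), task["task"])
```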
Great! Thanks @pmackay - I will start adding more to that...
@simonv3 - I really like how the datapackages format is put together. That would mean that the crop sets would be CSV files, too - which is good - lots of things can read CSV. :-) Do you know if it can handle other formats too? Is YAML out? I don't really have strong opinions on the format at this point - just curious what the options are. Question: is it limited to flat single-row data? In other words: if we discovered that we needed to represent nested objects somehow, or many-to-one relationships, do you know if that's possible with datapackages? I don't know if that will be necessary - I suppose we'll see what comes together in https://github.com/openfarmcc/Crops/wiki/Crop-data-needs
Great conversation all! "And just to be clear: my current use-case is specifically to build a set of files that can be imported into farmOS. Within farmOS, users will be able to plan out their plantings via a "Planting Wizard", which will use the data in these files to auto-generate tasks with specific dates." ^ This is pretty much what I need for FarmBot 👍 Though we were hoping to use OpenFarm Guides as the main source of data.
Resources I've put together:
Great to see the ball rolling!
@roryaronson That scientific crop traits spreadsheet is great - what's the source ontology? @mstenta I think the discussion over a common schema could use its own issue, so I've started it off with my thoughts over at #5
@andru I don't remember anymore because I made that list like a year ago. It's from a lot of sources cobbled together. I think I just googled "plant traits list" and copy-pasted from like 100 places haha
I second using YAML. It's readable. The USDA has CSV files for plants. If the seed varieties could be cross-referenced with seed vendors it would be really helpful.
@simonv3 What are your thoughts on getting this repo rolling as a JSON file of crops?
I think you already did a bunch of data scraping from openly licensed sets for OpenFarm, and I've done the same for Hortomatic. I think we should get something rough rolling with this for now...
Proposal:
What do you think? Could this model work for OpenFarm for the time being?
@mstenta you mentioned you've already got some crop data going for farmOS, could this model work for you?
1: By which I mean a taxonomic ID of some kind, not common names... species, variety, cultivar... there will be duplicates because horticultural naming is a mess, but it should get us close to something unique
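One way to picture such an ID is a key built from the taxonomic parts rather than the common name. The sketch below is an assumption for illustration only: the `crop_key` function and the dotted, lowercase format are made up here, and as noted above, real horticultural naming will still produce duplicates that need manual attention.

```python
import re


def crop_key(species: str, variety: str = "", cultivar: str = "") -> str:
    """Build a lowercase dotted key from taxonomic parts (hypothetical format)."""
    slugs = []
    for part in (species, variety, cultivar):
        if part and part.strip():
            # Lowercase, then collapse anything non-alphanumeric to "_".
            slug = re.sub(r"[^a-z0-9]+", "_", part.strip().lower()).strip("_")
            slugs.append(slug)
    return ".".join(slugs)


key = crop_key("Solanum lycopersicum", cultivar="Big Beef")
print(key)
```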