Write documentation on how to add datasets to Community Data #78

Open
GilesGibson opened this issue Aug 11, 2014 · 51 comments

Comments

@GilesGibson
Collaborator

Apologies, but I am having severe problems trying to figure out how to add a dataset to COD. I believe that I have to add it to the Lambeth CKAN site first. I have managed to upload the Lambeth bus stops location data file (taken from the Google shared drive of datasets). I think it is there, but I am unsure how to make it appear under the Lambeth organisation's datasets rather than just on the list under my login.

I copy the URL info from the CKAN field and then go to COD, choose the add dataset option, click on the DU "go" and paste in the URL. I keep on getting error 500 messages and nothing more.

I would like to know what I am doing wrong and also how we could document the process so that users of COD could know the process of adding datasets.

@GilesGibson GilesGibson added this to the v1.0 milestone Aug 11, 2014
@GilesGibson GilesGibson assigned pmackay and dataunity and unassigned pmackay Aug 11, 2014
@djwesto
Contributor

djwesto commented Aug 11, 2014

Might this be a good time to start creating some documentation tasks and a small Help section on the main site?

Of course, we could also do with fixing any interface label text that's not clearly explaining how to do this task.

@GilesGibson
Collaborator Author

Yes, some help text on the add dataset screen would be good. In the meantime, any clues so I can actually add something to COD would be welcome. We need to fix this so test users can populate the site.

@dataunity

@GilesGibson Could you paste the exact dataset url you're using here please and I'll take a look.

@pmackay
Member

pmackay commented Aug 11, 2014

@GilesGibson when you add a dataset to CKAN, are you able to see the data rendered as a table properly? If not, CKAN is not parsing it properly (which is required for DU to read it).

@dataunity

The instructions for adding files from CKAN are on Slack (step 4 can be swapped to our new url on COD):

https://communityopendata.slack.com/files/kev/F02ES54BS/adding_a_ckan_datatable_via_data_unity

Looking at the Lambeth Bus Stops dataset, it doesn't look like it's a CSV file, and it can't be previewed in CKAN, so it won't be addable to DU in its current state.

@GilesGibson
Collaborator Author

@pmackay when you downloaded all the datasets from the Lambeth web site and uploaded them, were they CSV files? I am not able to preview them in CKAN.

@dataunity thanks for the text guide. @djwesto can we add most of this to the add dataset page as a guide for users? A screenshot of the CKAN example showing where to preview a CSV file would also help.

@pmackay
Member

pmackay commented Aug 11, 2014

Most but not all are CSV files. The rest are .json files. The extension should show that. I don't know about the quality of the CSV data though.

@GilesGibson
Collaborator Author

Google being Google is not showing easily what type of file it is, apart from saying it is a spreadsheet. Is there an easy way of finding out? The details don't show this. When I click on it, Google just shows it as a spreadsheet, and I cannot figure out how to get it to show the file type.

@dataunity

@GilesGibson to get the file as a CSV file from Google Spreadsheets you can open the file on Google Drive, then go "File -> Download as -> Comma-separated values". This should give you a CSV file you can upload to CKAN, then follow the loading instructions on Slack to get it onto COD (https://communityopendata.slack.com/files/kev/F02ES54BS/adding_a_ckan_datatable_via_data_unity).

@GilesGibson
Collaborator Author

Ah, Google wasn't offering me CSV format. However, under "other file formats" it seems to download as CSV. We now have the situation where all the files that @pmackay carefully extracted from the Lambeth web site and stored on Google Drive will need to be downloaded from Google Drive one by one and then uploaded again to the CKAN site. Is this the only way? Can CKAN not simply refer to Google Drive and save all the duplicate work?

@pmackay
Member

pmackay commented Aug 11, 2014

I have the files as CSV on my computer. But Google automatically converts them when uploaded.

@GilesGibson
Collaborator Author

How frustrating. Is there any workaround? Otherwise there is going to be loads of extra work to get these files on to CKAN. Dropbox them over?

@dataunity

We're using CSV because it's an open format. Open Data should be in an open format for lots of reasons: for example, it prevents vendor lock-in and needs no special tools to view or edit the files.

I don't know of any bulk way to upload to CKAN (but there might be). I think each file will need its own metadata added though, so I'm not sure it would help in this case.

@GilesGibson
Collaborator Author

I realise that CSV is the one to use. It is just frustrating that all the CSV files exist after downloading from the Lambeth web site, and now we cannot get a shared folder with them all in because Google is trying to be clever and getting in the way. All I want is a URL to give to CKAN without having to repeat all the previous work.

@GilesGibson
Collaborator Author

OK, still stuck and missing the obvious.

All Lambeth files have been extracted from Lambeth web site via a clever utility that Paul wrote.

Uploaded to Google drive for all to see. Unfortunately Google converts them all to the Google spreadsheet format.

If you download them to your local drive as a csv then CKAN will not let you put your local drive in as the source.

Seems we still need a web site that we can store csv files on so that we can then get them to CKAN so that COD can get them.

All very confusing unless I am missing something obvious.

@dataunity

Is it possible to cut out the middle man (Google Spreadsheet)? Would it be possible for @pmackay to email the CSV files from his computer to @GilesGibson? It doesn't look like there's any extra metadata in the Google spreadsheets, so I don't think we'll lose any info.

@GilesGibson
Collaborator Author

I realise that the metadata will have to be added one by one. Still haven't got to that situation yet and stuck on not being able to add a dataset that Lambeth created to Lambeth CKAN. It is just my lack of use of a CKAN site and how it works I think.

@pmackay
Member

pmackay commented Aug 11, 2014

@GilesGibson I've uploaded a zip file to https://drive.google.com/?authuser=0#folders/0B_wNdTyma3n1UjV4eTRBZWVURjA containing all the files. Can you access it?

@GilesGibson
Collaborator Author

yes, I can access that, 125 files, many thanks. Where can I put them so that CKAN can refer to them via a URL or is there a way of uploading to CKAN? Maybe I missed that option.

@GilesGibson
Collaborator Author

Hmm, CKAN has me beaten. I have now tried uploading a CSV file to CKAN. Whenever I try to preview it on CKAN, it just triggers a download. The add dataset option within COD still returns error 500.

@dataunity

When I uploaded a file to CKAN I think I put CSV for the 'tags' option. This seemed to make CKAN recognise it as a CSV file. I don't know if that's the official way to do it though.

I've just tried editing the existing Lambeth GP Surgeries. I changed format to CSV and that seemed to have fixed things:

http://5.101.100.119/dataset/lambeth-gp-surgeries/resource/56281b66-fc71-4e34-ba82-31e192f98f3a
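
For anyone who would rather script this than use the edit form, here is a minimal sketch of the same change (setting a resource's format to CSV) through CKAN's action API. It assumes the CKAN instance is recent enough to expose the `resource_patch` action and that you have an API key with edit rights. The helper name `build_set_format_request` and the key value are placeholders, not part of COD or Data Unity, and the request is only built here, not sent.

```python
import json
import urllib.request


def build_set_format_request(ckan_base_url, resource_id, api_key, fmt="CSV"):
    """Build (but do not send) a request that sets a resource's format.

    Assumption: the CKAN site exposes the `resource_patch` action under
    /api/3/action/. `api_key` is a placeholder for your own CKAN API key.
    """
    url = ckan_base_url.rstrip("/") + "/api/3/action/resource_patch"
    payload = json.dumps({"id": resource_id, "format": fmt}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json", "Authorization": api_key},
        method="POST",
    )


# Example: the GP surgeries resource from the URL above.
request = build_set_format_request(
    "http://5.101.100.119",
    "56281b66-fc71-4e34-ba82-31e192f98f3a",
    api_key="YOUR-API-KEY",  # placeholder
)
```

Passing `request` to `urllib.request.urlopen` would send it. Whether the preview then works still depends on the file itself being valid CSV.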

@pmackay
Member

pmackay commented Aug 11, 2014

That looks good - can see it shows the table of values. Can you import that into DU?

@GilesGibson
Collaborator Author

@dataunity Thanks for looking at this. When I went to change the format it didn't allow anything, just blanked it out.

@dataunity

Yes - was blank for me too. I just tried typing in 'csv' to see if it did anything.

@GilesGibson
Collaborator Author

I am still stuck. I have managed to upload another 3 datasets to CKAN but cannot get them into COD. I just get the error 500 every time I try. What does this error message mean? Is the data poor? Is the URL wrong? Some guidance or help for users would be useful.

@dataunity

@GilesGibson have you been following the instructions mentioned above on Slack (https://communityopendata.slack.com/files/kev/F02ES54BS/adding_a_ckan_datatable_via_data_unity)? They should provide the guidance.

Can you paste the urls you are trying to enter into the system here?

@GilesGibson
Collaborator Author

I think I have been following them. This is one URL from the box that sits above the preview. It previews OK in CKAN, it just won't go across.

http://5.101.100.119/dataset/cf3eac6f-f3bb-4693-9db6-cd2498b4640a/resource/06e0cd22-a69a-491a-903a-a8aa136015b3/download/lambethbettingshops.csv

@dataunity

@GilesGibson it's the third bullet point in the instructions which is the key one here:

  • Copy the url of the CKAN data preview (the page where you can see the spreadsheet style data)

So you take the url of the page where you can see the spreadsheet in CKAN, rather than the url of the CSV file itself. This lets Data Unity find the metadata about the dataset as well as just the CSV data. So for your example you would use this url:

http://5.101.100.119/dataset/cf3eac6f-f3bb-4693-9db6-cd2498b4640a/resource/06e0cd22-a69a-491a-903a-a8aa136015b3
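
As a rough illustration of the difference between the two URLs, the sketch below trims a CKAN download link back to the resource page URL. It assumes the path has the shape `/dataset/<dataset>/resource/<resource>[/download/<file>]`, as in the two URLs in this thread, and that CKAN is served from the site root. The function name `resource_page_url` is made up for this example and is not part of CKAN or Data Unity.

```python
from urllib.parse import urlsplit, urlunsplit


def resource_page_url(url):
    """Return the CKAN resource page URL for a resource or download URL.

    Assumed path shape: /dataset/<dataset>/resource/<resource>[/download/<file>]
    """
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    if len(segments) < 4 or segments[0] != "dataset" or segments[2] != "resource":
        raise ValueError("not a CKAN resource URL: " + url)
    # Keep only dataset/<dataset>/resource/<resource>; drop /download/<file>.
    path = "/" + "/".join(segments[:4])
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


download_url = (
    "http://5.101.100.119/dataset/cf3eac6f-f3bb-4693-9db6-cd2498b4640a"
    "/resource/06e0cd22-a69a-491a-903a-a8aa136015b3/download/lambethbettingshops.csv"
)
print(resource_page_url(download_url))
```

A URL that is already a resource page passes through unchanged, so the same check could be applied to whatever a user pastes in.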

@GilesGibson
Collaborator Author

Ah, I had read that differently. Maybe we could add the instructions to the "Add dataset" page and state that the URL needed is the address in the browser bar, not the one shown above the CSV file.

Currently it is trying to create a vis but is hanging there. Is it having difficulty translating from eastings/northings to lat/long?

@GilesGibson
Collaborator Author

tried doing a graph with the betting shop dataset. I wanted it to total each occurrence (Y axis) per ward (x-axis) - still waiting on the build.
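
For reference, the totals I am after can be sketched outside COD like this. It is only an illustration of the aggregation (count of rows per ward), not how Data Unity builds its graphs. The column name `Ward` and the sample rows are invented for the example and may not match the real betting shops file.

```python
import csv
import io
from collections import Counter


def count_by_column(csv_text, column):
    """Count how many rows share each value in the given CSV column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Skip rows where the column is missing or empty.
    return Counter(row[column].strip() for row in reader if row.get(column))


# Invented sample data; the real file may use different column names.
sample = "Name,Ward\nShop A,Brixton Hill\nShop B,Oval\nShop C,Brixton Hill\n"
totals = count_by_column(sample, "Ward")
for ward, total in sorted(totals.items()):
    print(ward, total)
```

Each ward would go on the x-axis and its total on the y-axis.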

@dataunity

I've clarified the instructions on Slack.

The process might take a little while depending on the size of the data. Small datasets take 10-15 secs, larger ones (like the Police data) take a minute or two.

The process won't convert eastings/northings to lat/long. This needs the Data Unity data cleaning interface, but there has been no time to create that yet.

@GilesGibson
Collaborator Author

Thanks for updating the instructions. @djwesto can we add in these instructions to the "add dataset" page where the DU widget lives?

@GilesGibson
Collaborator Author

Progress so far.

Most of the Lambeth datasets are not lat/long. I have sent a request to @pmackay to see if a few can be converted.

I think I have managed to upload the Jan 2014 crime stats to the CKAN site. However, CKAN has errors displaying the data, so the browser URL will not work when pasted into COD. The CKAN site is now giving 504 errors when I try to explore the data.

Leeds Data Mill - tried to get the locations of gambling licence premises into COD. It isn't offered as a CSV (only XML with a schema), so I assume it isn't possible - @dataunity?

@dataunity

@GilesGibson - that's right, not possible to parse XML. Generally every XML file has a different hierarchical structure which means they need customised parsers to extract data for visualisation. Perhaps you could leave a comment at the bottom of the dataset page on Leeds Data Mill asking if they could release as CSV?
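
To show what "customised parser" means in practice, here is a minimal sketch that flattens one particular XML layout into CSV text. The element names (`premise`, `name`, `postcode`) are invented for illustration and will not match the Leeds Data Mill file, and the sketch ignores XML namespaces, which a file published with a schema may well use. A different XML structure would need different tag names or different code, which is the problem described above.

```python
import csv
import io
import xml.etree.ElementTree as ET


def xml_records_to_csv(xml_text, record_tag, fields):
    """Flatten repeated <record_tag> elements into CSV text.

    Each field is read from a direct child element of the record;
    missing children become empty cells.
    """
    root = ET.fromstring(xml_text)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(fields)
    for record in root.iter(record_tag):
        writer.writerow([(record.findtext(f) or "").strip() for f in fields])
    return out.getvalue()


# Invented sample; real files will have their own structure.
sample_xml = (
    "<premises>"
    "<premise><name>Shop A</name><postcode>LS1 1AA</postcode></premise>"
    "<premise><name>Shop B</name></premise>"
    "</premises>"
)
print(xml_records_to_csv(sample_xml, "premise", ["name", "postcode"]))
```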

@dataunity

We've been adding datasets over the last few days, so closing this one.

@GilesGibson
Collaborator Author

Add dataset page - still needs the text guide instructions for adding in new datasets. @djwesto ideally a nice bit of text with some helpful pickies. I am keen to make this process as easy as possible, within the boundaries that it is a quickie bit of code doing it all.

@GilesGibson GilesGibson reopened this Aug 19, 2014
@djwesto
Contributor

djwesto commented Aug 19, 2014

So is this issue getting assigned to me to write some documentation?

@djwesto djwesto assigned djwesto and unassigned dataunity Aug 19, 2014
@GilesGibson
Collaborator Author

combo effort?

@dataunity

@djwesto there's already some documentation on Slack if it helps:

https://communityopendata.slack.com/files/kev/F02ES54BS/adding_a_ckan_datatable_via_data_unity

@djwesto djwesto changed the title from "Adding dataset to COD" to "Write documentation on how to add datasets to Community Data" Sep 4, 2014
@djwesto
Contributor

djwesto commented Sep 15, 2014

OK, I've finally managed to put a guide to adding datasets together here:

http://live-communitydata.gotpantheon.com/adding-your-own-data

Please can you all check through the process and make sure it's all correct?

I think there are some weird problems with CKAN's metadata screens, particularly the fact that two concurrent screens both request a Title and Description for your datasets... Seems overkill so I just ignored it in the guide.

Reassigning to @GilesGibson

@djwesto djwesto assigned GilesGibson and unassigned djwesto Sep 15, 2014
@GilesGibson
Collaborator Author

If we now have the ability to add docs etc., will these go via the CKAN server as well? If so, the guide needs to reflect that.

@GilesGibson GilesGibson assigned pmackay and unassigned GilesGibson Nov 17, 2014
@pmackay
Member

pmackay commented Nov 17, 2014

No, the docs support has nothing to do with CKAN.

@pmackay pmackay assigned GilesGibson and unassigned pmackay Nov 18, 2014
4 participants