CSV Concatenator #1
The test works, but I haven't tested the built image in Docker.
I tested in PyCharm. I had to install pandas and set Clowder_version. It works well.
Approving this. I guess we can merge once you address the other comments.
This adds an extractor that will try to concatenate CSV/TSV/XLSX files when uploaded to Clowder.
To test:
(there's also a Dockerfile)
Then upload two CSVs to a dataset and see that concatenated.csv is created. If you already have 2+ CSVs in the dataset and you upload a new one, all will be merged into the output. Uploading additional CSVs will update the concatenated file as it goes.
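For reviewers who want to see the merge behavior in isolation, here is a minimal sketch of row-wise concatenation with pandas. It is not the extractor's code: the function name `concatenate_csvs` and the sample data are made up for illustration, and it reads from in-memory strings rather than from files in a Clowder dataset.

```python
import io

import pandas as pd


def concatenate_csvs(csv_texts):
    """Concatenate several CSV documents (given as strings) row-wise.

    Hypothetical helper for illustration; the extractor itself may differ.
    """
    frames = [pd.read_csv(io.StringIO(text)) for text in csv_texts]
    # ignore_index renumbers the rows of the combined frame from 0.
    # Columns that are missing from one input are filled with NaN.
    return pd.concat(frames, ignore_index=True, sort=False)


first = "id,value\n1,a\n2,b\n"
second = "id,value\n3,c\n"

merged = concatenate_csvs([first, second])
print(merged.to_csv(index=False))
```

Re-running the same function over every CSV in the dataset each time a file arrives would give the "updates as it goes" behavior described above, at the cost of re-reading all inputs on each upload.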
File types are currently kept separate, meaning CSVs get their own merge, separate from Excel files and so on. This would be easy to change, but I'm not sure there's a good use case for merging across types.
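The per-type separation could look roughly like the sketch below, which only groups filenames and does not read any data. Everything here is an assumption for illustration: the names `MERGE_GROUPS`, `group_by_type` and `mergeable_groups`, the choice to key on the file extension, and the decision to skip the output file so it is not merged into itself.

```python
from collections import defaultdict
from pathlib import PurePath

# Hypothetical mapping from extension to merge group.
# Pointing several extensions at one group name would merge across types.
MERGE_GROUPS = {".csv": "csv", ".tsv": "tsv", ".xlsx": "xlsx"}


def group_by_type(filenames, exclude=("concatenated.csv",)):
    """Bucket filenames by merge group, ignoring unsupported or excluded files."""
    groups = defaultdict(list)
    for name in filenames:
        if name in exclude:
            continue  # assumed: the previous output is not an input
        ext = PurePath(name).suffix.lower()
        if ext in MERGE_GROUPS:
            groups[MERGE_GROUPS[ext]].append(name)
    return dict(groups)


def mergeable_groups(filenames, minimum=2):
    """Keep only groups with enough files to be worth concatenating."""
    grouped = group_by_type(filenames)
    return {kind: names for kind, names in grouped.items() if len(names) >= minimum}


files = ["a.csv", "b.CSV", "c.tsv", "d.xlsx", "e.xlsx", "notes.txt", "concatenated.csv"]
print(mergeable_groups(files))
```

With a table like this, switching to a combined merge would be a one-line change to the mapping rather than a change to the merge logic.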
This uses pyclowder's files.delete, so it requires this version of pyclowder: clowder-framework/pyclowder#92