CSV Concatenator #1

Merged: 7 commits into main from csv_concatenator on Nov 7, 2023

Conversation

@max-zilla (Contributor) commented Oct 10, 2023

This adds an extractor that will try to concatenate CSV/TSV/XLSX files when uploaded to Clowder.

To test:

export CLOWDER_VERSION=2
python concatenate.py

(there's also a Dockerfile)

Then upload two CSVs to a dataset and see that concatenated.csv is created. If you already have 2+ CSVs in the dataset and you upload a new one, all will be merged into the output. Uploading additional CSVs will update the concatenated file as it goes.
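
As a rough illustration of the merge behaviour described above, here is a minimal sketch that stacks the rows of several CSV documents into one output. It is not the extractor's actual code: the function name `concatenate_csv_texts` is hypothetical, it uses only the standard library `csv` module, and the choice to take the union of all headers (leaving missing cells empty) is an assumption about how mismatched columns might be handled.

```python
import csv
import io


def concatenate_csv_texts(texts):
    """Concatenate CSV documents (given as strings) into one CSV string.

    Assumption: output columns are the union of all headers in first-seen
    order, and cells missing from a given file are left empty.
    """
    fieldnames = []
    rows = []
    for text in texts:
        reader = csv.DictReader(io.StringIO(text))
        # Collect any header names not seen in earlier files.
        for name in reader.fieldnames or []:
            if name not in fieldnames:
                fieldnames.append(name)
        rows.extend(reader)

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


# Two small in-memory "uploads"; the second one has an extra column.
merged = concatenate_csv_texts([
    "id,value\n1,a\n2,b\n",
    "id,value,note\n3,c,new\n",
])
print(merged)
```

Re-running the function over every CSV currently in the dataset each time a new file arrives would give the "update as it goes" behaviour, at the cost of re-reading all inputs on each upload.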

File types are currently kept separate: CSVs get their own merged output, separate from Excel files, and so on. This would be easy to change, but I'm not sure there's a good use case for it.
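
The per-type separation could be sketched roughly as follows. This is only an illustration under stated assumptions: `group_by_type` and `TABULAR_SUFFIXES` are hypothetical names, files are grouped purely by filename extension, and a group is treated as mergeable once it holds at least two files.

```python
from collections import defaultdict
from pathlib import PurePath

# Assumption: these three extensions are the ones treated as tabular.
TABULAR_SUFFIXES = {".csv", ".tsv", ".xlsx"}


def group_by_type(filenames):
    """Group tabular filenames by lower-cased extension, ignoring other files."""
    groups = defaultdict(list)
    for name in filenames:
        suffix = PurePath(name).suffix.lower()
        if suffix in TABULAR_SUFFIXES:
            groups[suffix].append(name)
    return dict(groups)


groups = group_by_type(["a.csv", "b.CSV", "c.xlsx", "notes.txt", "d.tsv"])

# Only groups with two or more files have anything to concatenate.
mergeable = {suffix: names for suffix, names in groups.items() if len(names) >= 2}
print(mergeable)
```

Merging across types instead would amount to dropping the grouping step and reading every tabular file into one common row format before concatenating.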

Uses pyclowder's files.delete, so it requires this version of pyclowder: clowder-framework/pyclowder#92

@longshuicy (Member) left a comment

The test works, but I haven't tested the built image in Docker.

csv-concatenator/extractor_info.json (outdated, resolved)
csv-concatenator/requirements.txt (outdated, resolved)
csv-concatenator/Dockerfile (resolved)

@ddey2 (Member) left a comment

I tested in PyCharm. I had to install pandas and set CLOWDER_VERSION. It works well.

Approving this. I guess we can merge once you address the other comments.

@max-zilla max-zilla merged commit 0ba84d2 into main Nov 7, 2023
@max-zilla max-zilla deleted the csv_concatenator branch November 7, 2023 16:14
@longshuicy longshuicy linked an issue Nov 8, 2023 that may be closed by this pull request
Linked issue: CSV concatenator demo
3 participants