Remove the fileContents column by default? #5

Open
wetneb opened this issue Dec 16, 2024 · 1 comment

Comments

@wetneb
Member

wetneb commented Dec 16, 2024

I wonder what use case you have in mind for the fileContents column?
Because OpenRefine isn't great at storing longer texts in cells, the column tends to make the table a lot less compact.
The fact that only the beginning of the text is stored also restricts what we can do with it.

For those reasons I would be tempted to remove it, either completely or just by default, with an option to enable it if needed (which could also be the occasion to configure the maximum length of the contents stored in it, making the feature potentially more useful).
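To make the suggestion concrete, here is a minimal sketch of what an opt-in column with a configurable maximum length could look like. It is written in Python for brevity and is not the extension's actual code: the function name `read_file_contents` and the `max_chars` parameter are hypothetical, and treating `max_chars <= 0` as "column disabled" is an assumption.

```python
from typing import Optional


def read_file_contents(path: str, max_chars: int = 0,
                       encoding: str = "utf-8") -> Optional[str]:
    """Return at most `max_chars` characters from the start of a file.

    Hypothetical helper: `max_chars <= 0` is assumed to mean the
    fileContents column is disabled, in which case nothing is read.
    """
    if max_chars <= 0:
        # Column disabled (the proposed default): do not open the file at all.
        return None
    # errors="replace" keeps undecodable bytes from aborting the import.
    with open(path, "r", encoding=encoding, errors="replace") as f:
        # read(n) in text mode returns at most n characters, so the whole
        # file is never held in memory as a single string.
        return f.read(max_chars)
```

With the default of `0` the cell would stay empty, and a user who needs the text could raise the limit to whatever their project can afford.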

@magdmartin
Member

magdmartin commented Jan 8, 2025

I agree with @wetneb.

I tested the extension by loading a folder with many large XML files (10 files, each 500 MB in size), and OpenRefine crashed with java.lang.OutOfMemoryError: Java heap space. When I tested it again with smaller XML files, it worked, and the content of each XML file was loaded into the fileContents column.

2 participants