Support for synchronous iteration from partially computed grids #343

Draft
wants to merge 296 commits into
base: 4.0

@wetneb (Owner) commented Nov 16, 2023

For OpenRefine#2256.

This draft PR shows the main backend changes that I think are necessary to enable a row/record-based form of parallelism for long-running operations.

The use case mentioned in OpenRefine#2256 is about column-based concurrency, where we are able to execute two long-running operations simultaneously because they work on separate columns. I have described in a forum post the architectural changes that this implies.

This PR is about another form of concurrency: row/record-wise. Here, we execute two consecutive long-running operations that are both row- or record-wise. Even though they might work on the same column, they can still run in parallel, as long as the second operation only processes a row/record once the first one has fully processed it. This PR works towards providing that synchronization.
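To make the idea concrete, here is a minimal Java sketch of row-wise pipelining between two consecutive operations. It is an illustration only, not the code in this PR: the class name `RowWisePipeline`, the `run` method, the `END` sentinel and the use of an in-memory `BlockingQueue` as the hand-off are all assumptions made for the example. The second operation blocks on each row until the first one has finished with it, so both run concurrently while preserving row order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.UnaryOperator;

// Hypothetical illustration: two row-wise operations executed concurrently.
final class RowWisePipeline {

    // Sentinel marking the end of the row stream; compared by identity,
    // so it can never collide with a real row value.
    private static final String END = new String("<end>");

    /** Applies `first` then `second` to every row, running both stages in parallel. */
    static List<String> run(List<String> rows,
                            UnaryOperator<String> first,
                            UnaryOperator<String> second) {
        BlockingQueue<String> handoff = new LinkedBlockingQueue<>();
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            // Stage 1: publishes each row as soon as it is fully processed.
            Future<Object> producer = pool.submit(() -> {
                try {
                    for (String row : rows) {
                        handoff.put(first.apply(row));
                    }
                } finally {
                    handoff.put(END); // always unblock the consumer
                }
                return null;
            });
            // Stage 2: take() blocks until stage 1 has produced the next row.
            Future<List<String>> consumer = pool.submit(() -> {
                List<String> out = new ArrayList<>();
                for (String row = handoff.take(); row != END; row = handoff.take()) {
                    out.add(second.apply(row));
                }
                return out;
            });
            producer.get(); // surfaces failures of the first operation
            return consumer.get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("pipeline failed", e);
        } finally {
            pool.shutdownNow();
        }
    }
}
```

For example, `RowWisePipeline.run(Arrays.asList("a", "b"), s -> s + "1", s -> s + "2")` applies both operations to each row in order. A real implementation would also need to handle records (groups of rows) and data persisted on disk, which this sketch leaves out.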

I wrote a corresponding message on the forum thread, where I will give more motivation for the changes and highlight some design questions.

dependabot bot and others added 30 commits September 1, 2023 22:05
Bumps [actions/setup-node](https://github.com/actions/setup-node) from 3.7.0 to 3.8.1.
- [Release notes](https://github.com/actions/setup-node/releases)
- [Commits](actions/setup-node@v3.7.0...v3.8.1)

---
updated-dependencies:
- dependency-name: actions/setup-node
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…nRefine#6037)

Bumps [shogo82148/actions-upload-release-asset](https://github.com/shogo82148/actions-upload-release-asset) from 1.6.5 to 1.6.6.
- [Release notes](https://github.com/shogo82148/actions-upload-release-asset/releases)
- [Commits](shogo82148/actions-upload-release-asset@v1.6.5...v1.6.6)

---
updated-dependencies:
- dependency-name: shogo82148/actions-upload-release-asset
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
To avoid PRs like OpenRefine#6032, which require changes in Butterfly first.
* Updated Cypress.

* Updated tests.
…ine#6039)

Bumps `jetty.version` from 9.4.51.v20230217 to 9.4.52.v20230823.

Updates `org.eclipse.jetty:jetty-servlets` from 9.4.51.v20230217 to 9.4.52.v20230823
- [Release notes](https://github.com/eclipse/jetty.project/releases)
- [Commits](jetty/jetty.project@jetty-9.4.51.v20230217...jetty-9.4.52.v20230823)

Updates `org.eclipse.jetty:jetty-server` from 9.4.51.v20230217 to 9.4.52.v20230823
- [Release notes](https://github.com/eclipse/jetty.project/releases)
- [Commits](jetty/jetty.project@jetty-9.4.51.v20230217...jetty-9.4.52.v20230823)

Updates `org.eclipse.jetty:jetty-servlet` from 9.4.51.v20230217 to 9.4.52.v20230823
- [Release notes](https://github.com/eclipse/jetty.project/releases)
- [Commits](jetty/jetty.project@jetty-9.4.51.v20230217...jetty-9.4.52.v20230823)

Updates `org.eclipse.jetty:jetty-webapp` from 9.4.51.v20230217 to 9.4.52.v20230823
- [Release notes](https://github.com/eclipse/jetty.project/releases)
- [Commits](jetty/jetty.project@jetty-9.4.51.v20230217...jetty-9.4.52.v20230823)

---
updated-dependencies:
- dependency-name: org.eclipse.jetty:jetty-servlets
  dependency-type: direct:production
  update-type: version-update:semver-patch
- dependency-name: org.eclipse.jetty:jetty-server
  dependency-type: direct:production
  update-type: version-update:semver-patch
- dependency-name: org.eclipse.jetty:jetty-servlet
  dependency-type: direct:production
  update-type: version-update:semver-patch
- dependency-name: org.eclipse.jetty:jetty-webapp
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps org.slf4j:slf4j-api from 2.0.7 to 2.0.9.

---
updated-dependencies:
- dependency-name: org.slf4j:slf4j-api
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps com.google.apis:google-api-services-drive from v3-rev20230815-2.0.0 to v3-rev20230822-2.0.0.

---
updated-dependencies:
- dependency-name: com.google.apis:google-api-services-drive
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps com.google.apis:google-api-services-sheets from v4-rev20230808-2.0.0 to v4-rev20230815-2.0.0.

---
updated-dependencies:
- dependency-name: com.google.apis:google-api-services-sheets
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Currently translated at 66.6% (2 of 3 strings)

Co-authored-by: Nicolas @belett VIGNERON <[email protected]>
Translate-URL: https://hosted.weblate.org/projects/openrefine/openrefine-messages/fr/
Translation: OpenRefine/OpenRefine Messages
…6048)

Bumps [eslint](https://github.com/eslint/eslint) from 8.48.0 to 8.49.0.
- [Release notes](https://github.com/eslint/eslint/releases)
- [Changelog](https://github.com/eslint/eslint/blob/main/CHANGELOG.md)
- [Commits](eslint/eslint@v8.48.0...v8.49.0)

---
updated-dependencies:
- dependency-name: eslint
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…efine#6049)

Bumps org.apache.commons:commons-compress from 1.23.0 to 1.24.0.

---
updated-dependencies:
- dependency-name: org.apache.commons:commons-compress
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…#6054)

Bumps [cypress](https://github.com/cypress-io/cypress) from 13.1.0 to 13.2.0.
- [Release notes](https://github.com/cypress-io/cypress/releases)
- [Changelog](https://github.com/cypress-io/cypress/blob/develop/CHANGELOG.md)
- [Commits](cypress-io/cypress@v13.1.0...v13.2.0)

---
updated-dependencies:
- dependency-name: cypress
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Also add tests for reversed sorts as well as case-sensitive sorts
Updated by "Cleanup translation files" hook in Weblate.

Co-authored-by: Hosted Weblate <[email protected]>
Translate-URL: https://hosted.weblate.org/projects/openrefine/database/
Translation: OpenRefine/database
OpenRefine#6061)

Bumps [org.apache.maven.plugins:maven-javadoc-plugin](https://github.com/apache/maven-javadoc-plugin) from 3.5.0 to 3.6.0.
- [Release notes](https://github.com/apache/maven-javadoc-plugin/releases)
- [Commits](apache/maven-javadoc-plugin@maven-javadoc-plugin-3.5.0...maven-javadoc-plugin-3.6.0)

---
updated-dependencies:
- dependency-name: org.apache.maven.plugins:maven-javadoc-plugin
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…ss (OpenRefine#6062)

Bumps [eslint-plugin-cypress](https://github.com/cypress-io/eslint-plugin-cypress) from 2.14.0 to 2.15.1.
- [Release notes](https://github.com/cypress-io/eslint-plugin-cypress/releases)
- [Commits](cypress-io/eslint-plugin-cypress@v2.14.0...v2.15.1)

---
updated-dependencies:
- dependency-name: eslint-plugin-cypress
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Introducing a new variable that takes only required properties of suggest options and excludes the flyout properties

Closes OpenRefine#6038.
Currently translated at 100.0% (12 of 12 strings)

Translated using Weblate (Spanish)

Currently translated at 1.2% (1 of 79 strings)

Translated using Weblate (Spanish)

Currently translated at 100.0% (62 of 62 strings)

Co-authored-by: Hosted Weblate <[email protected]>
Co-authored-by: gallegonovato <[email protected]>
Translate-URL: https://hosted.weblate.org/projects/openrefine/database/es/
Translate-URL: https://hosted.weblate.org/projects/openrefine/openrefine-control-evaluation-errors/es/
Translate-URL: https://hosted.weblate.org/projects/openrefine/openrefine-evaluation-errors/es/
Translation: OpenRefine/OpenRefine Control Evaluation Errors
Translation: OpenRefine/OpenRefine Evaluation Errors
Translation: OpenRefine/database
Bumps [maven-source-plugin](https://github.com/apache/maven-source-plugin) from 3.2.1 to 3.3.0.
- [Commits](apache/maven-source-plugin@maven-source-plugin-3.2.1...maven-source-plugin-3.3.0)

---
updated-dependencies:
- dependency-name: org.apache.maven.plugins:maven-source-plugin
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
)

* Restore previous constructor behavior

Use modified time if we're given one.

* Enhance test to cover initial read case

It wasn't testing if an unnecessary project
write occurs when projects are read, but
not written

* Make sure lastSave timestamp is initialized on load - fixes OpenRefine#3805

* Fix test for workspace save on project remove from PR OpenRefine#4796

* Save workspace when projects removed. Fixes OpenRefine#1418

* Refactor to clean up workspace save

- only write temp file if needed rather than writing,
  then deleting if unneeded
- check and log errors on file create/delete/rename
- use try-with-resources to avoid resource leaks
* Added preference for auto cluster.

* Added cluster message and button.

* Fixed translation and logic.

* Fixed bind and addClusterMessage().

* Added padding.

* Fixed cluster counter spacing.

* Updated tests.
* fixes OpenRefine#5656

* add FileNameScrutinizer test for "ଫାଇଲ.wav"

* remove printStackTrace

* format code (linter)

* Restore original pattern and widen it with Unicode classes

---------

Co-authored-by: Antonin Delpeuch <[email protected]>
wetneb added 29 commits January 27, 2024 21:47
The goal is to then be able to further restrict those operations to only
access their declared column dependencies.
TODO:
- adapt the testing runner as well
- improve the common test suite to check for Gzip and Zstd support
This makes it possible to watch all partition files for changes,
to stream from a serialized PLL as it gets written.
This makes it possible to apply record grouping transparently on a
grid that is still being computed.
See https://forum.openrefine.org/t/concurrency-of-long-running-operations/1009/3
This enables parallel computation of long-running operations
which are only separated by row/record-wise operations.
This helps check that processes are making progress for large
datasets, where the progress percentage may stay at 0% for a long time.
@wetneb force-pushed the sync_change_data_read branch from 096531f to c94d97f on January 30, 2024 21:10