RumbleDB 1.20.0 "Honeylocust"
Use RumbleDB to query data with JSONiq, even data that does not fit in DataFrames.
Try-it-out sandbox: https://colab.research.google.com/github/RumbleDB/rumble/blob/master/RumbleSandbox.ipynb
Instructions to get started: https://rumble.readthedocs.io/en/latest/Getting%20started/
Spark 3.0 and 3.1 are no longer supported as of RumbleDB 1.20, because the Spark team no longer officially supports them.
RumbleDB comes in several jars that you can pick from depending on your needs:
- rumbledb-1.20.0-standalone.jar already contains Spark and runs out of the box with Java 8 or 11: java -jar rumbledb-1.20.0-standalone.jar
- rumbledb-1.20.0-for-spark-3.X.jar (X is 2 or 3, for Spark 3.2 or 3.3) is smaller and does not contain Spark. Run it in a matching, existing Spark environment, either locally (you need to download and install Spark first) or on a cluster (for example EMR, with just a few clicks): spark-submit rumbledb-1.20.0-for-spark-3.X.jar
New features:
- Open and query YAML files (including files with multiple documents) with yaml-doc()
- Serialize the output of your queries to YAML with --output-format yaml
- General comparisons (which are existentially quantified over their operand sequences) now work on very large sequences and are automatically pushed down to Spark.
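
To illustrate what existential quantification means for a general comparison, here is a minimal Python sketch of the semantics only. It is not RumbleDB code and says nothing about how the pushdown to Spark is implemented; the function name general_eq is hypothetical. The idea is that a comparison such as (1, 2, 3) = (3, 4) is true as soon as one item on the left equals one item on the right.

```python
from itertools import product
from typing import Iterable


def general_eq(left: Iterable, right: Iterable) -> bool:
    """Hypothetical model of an existentially quantified '=' over two sequences."""
    # True as soon as any (l, r) pair compares equal.
    # If either side is empty there is no pair, so the result is False.
    return any(l == r for l, r in product(left, right))


print(general_eq([1, 2, 3], [3, 4]))  # True: 3 appears on both sides
print(general_eq([1, 2], [3, 4]))     # False: no pair is equal
print(general_eq([], [1]))            # False: nothing to pair up
```

Evaluated naively, this is a nested loop over both sequences, which is why running it on very large sequences benefits from being executed by Spark rather than locally.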
Bugfixes:
- Fixed an issue preventing reading Decimal types from Parquet with some precisions and ranges
- Fixed a few bugs in static typing
- Fixed a bug where no error was thrown when the concatenation operator || was used on sequences with more than one item
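
The rule behind the last fix can be sketched in Python as follows. This is only a model of the intended behavior, not RumbleDB's implementation: the function name concat is hypothetical, Python's TypeError stands in for whatever error RumbleDB actually raises, and treating an empty operand as the empty string is an assumption.

```python
from typing import Sequence


def concat(left: Sequence, right: Sequence) -> str:
    """Hypothetical model of '||': each operand may hold at most one item."""
    for name, operand in (("left", left), ("right", right)):
        if len(operand) > 1:
            # Mirrors the fixed behavior: more than one item is an error.
            raise TypeError(f"{name} operand of || has {len(operand)} items")
    # Assumption: an empty operand contributes the empty string.
    return "".join(str(item) for operand in (left, right) for item in operand)


print(concat(["foo"], ["bar"]))  # foobar
try:
    concat(["a", "b"], ["c"])
except TypeError as err:
    print("error:", err)  # error: left operand of || has 2 items
```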