RumbleDB 1.20.0 "Honeylocust"

@ghislainfourny released this 07 Nov 12:57
38e07ca

Use RumbleDB to query data with JSONiq, even data that does not fit in DataFrames.

Try-it-out sandbox: https://colab.research.google.com/github/RumbleDB/rumble/blob/master/RumbleSandbox.ipynb

Instructions to get started: https://rumble.readthedocs.io/en/latest/Getting%20started/

As of RumbleDB 1.20, Spark 3.0 and 3.1 are no longer supported, because the Spark team has officially ended support for them.

RumbleDB comes in several jars that you can pick from depending on your needs:

  • rumbledb-1.20.0-standalone.jar already contains Spark and can be run out of the box with Java 8 or 11: java -jar rumbledb-1.20.0-standalone.jar
  • rumbledb-1.20.0-for-spark-3.X.jar (for Spark 3.2 and 3.3) is smaller, does not contain Spark, and runs in an existing Spark environment of the matching version, either locally (which requires downloading and installing Spark) or on a cluster (for example EMR, with just a few clicks): spark-submit rumbledb-1.20.0-for-spark-3.X.jar

New features:

  • Open and query YAML files (also with multiple documents) with yaml-doc()
  • Serialize the output of your queries to YAML with --output-format yaml
  • General comparisons (which are existentially quantified: true if any pair of items matches) now work on very large sequences and are automatically pushed down to Spark.
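
For readers unfamiliar with general comparisons, the short Python sketch below illustrates their existential semantics: the comparison holds as soon as one item on the left matches one item on the right. Python lists stand in for JSONiq sequences and the function name general_equals is hypothetical; this only models the semantics and says nothing about how RumbleDB evaluates the comparison on Spark.

```python
from typing import Iterable, Sequence


def general_equals(left: Iterable, right: Sequence) -> bool:
    """Existential '=': true if some item of left equals some item of right.

    Hypothetical helper for illustration; Python lists stand in for
    JSONiq sequences.
    """
    return any(l == r for l in left for r in right)


if __name__ == "__main__":
    print(general_equals([1, 2, 3], [3, 4]))  # one matching pair (3, 3) is enough
    print(general_equals([1, 2], [3, 4]))     # no pair matches
```

Note that an empty operand can never produce a matching pair, so under this model the comparison is false whenever either side is empty.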

Bugfixes:

  • Fixed an issue preventing reading Decimal types from Parquet with some precisions and ranges
  • Fixed a few bugs in static typing
  • Fixed a bug where no error was raised when the concatenation operator || was applied to sequences with more than one item
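
The intended behavior of || after the last fix can be sketched in Python as follows. This is a model under stated assumptions, not RumbleDB code: the function name concat is hypothetical, Python lists stand in for sequences of strings, the error is represented by a plain TypeError rather than RumbleDB's own error code, and an empty operand is assumed to behave like the empty string.

```python
from typing import Sequence


def concat(left: Sequence[str], right: Sequence[str]) -> str:
    """Model of '||': each operand may hold at most one item."""
    for operand in (left, right):
        if len(operand) > 1:
            # Assumption: modelled as TypeError; RumbleDB reports its own error.
            raise TypeError("|| expects at most one item per operand")
    # Assumption: an empty operand is treated as the empty string.
    return (left[0] if left else "") + (right[0] if right else "")


if __name__ == "__main__":
    print(concat(["foo"], ["bar"]))
    try:
        concat(["a", "b"], ["c"])
    except TypeError as err:
        print("error:", err)
```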