Merge pull request #606 from RumbleDB/Version1.5

Version1.5

ghislainfourny authored Mar 30, 2020
2 parents a87fb05 + d7e225b commit d161bab
Showing 7 changed files with 21 additions and 16 deletions.
2 changes: 1 addition & 1 deletion docs/Function library.md
@@ -663,7 +663,7 @@ returns the (single) JSON value read from the supplied JSON file. This will also

## Integration with HDFS and Spark

-We support two more functions to read a JSON file from HDFS or send a large sequence to the cluster:
+We support more functions to read JSON, Parquet, CSV, text and ROOT files from various storage layers such as S3 and HDFS, or to send a large sequence to the cluster. Supported schemes are: file, s3, s3a, s3n, hdfs, wasb, gs and root.
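For instance, a JSON file on S3 can be read with the same function as a local one; a minimal sketch (the bucket and path are hypothetical):

```
count(json-file("s3://my-bucket/collection/data.json"))
```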

### json-file (Rumble specific)

2 changes: 1 addition & 1 deletion docs/Getting started.md
@@ -43,7 +43,7 @@ Create, in the same directory as Rumble, a file data.json and put the following

In a shell, from the directory where the rumble .jar lies, type, all on one line:

-spark-submit --master local[*] --deploy-mode client spark-rumble-1.4.jar --shell yes
+spark-submit --master local[*] --deploy-mode client spark-rumble-1.5.jar --shell yes
The Rumble shell appears:

13 changes: 9 additions & 4 deletions docs/JSONiq.md
@@ -42,7 +42,7 @@ return count($z)

### Expressions pushed down to Spark

-Some expressions are pushed down to Spark out of the box. For example, this will work on a large file, leveraging the parallelism of Spark:
+Many expressions are pushed down to Spark out of the box. For example, this will work on a large file, leveraging the parallelism of Spark:

```
count(json-file("file.json")[$$.field eq "foo"].bar[].foo[[1]])
```
@@ -54,24 +54,28 @@ What is pushed down so far is:
- aggregation functions such as count
- JSON navigation expressions: object lookup (as well as keys() call), array lookup, array unboxing, filtering predicates
- predicates on positions, including use of the context-dependent functions position() and last(), e.g.,
+- type checking (instance of, treat as)
+- many builtin function calls (head, tail, exists, etc.; see the sketch after the example below)

```
json-file("file.json")[position() ge 10 and position() le last() - 2]
```
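As an illustration of the newly added pushdowns, here is a minimal sketch, reusing the hypothetical file.json from above, that combines an aggregation, a builtin call and a type-checking predicate:

```
count(tail(json-file("file.json"))[$$ instance of object])
```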

More expressions working on sequences will be pushed down in the future, prioritized based on the feedback we receive.

-We also started to push down some expressions to DataFrames and Spark SQL. In particular, keys() pushes down the schema lookup if used on parquet-file() and structured-json-file(). Likewise, count() on these is also pushed down.
+We have also started to push down some expressions to DataFrames and Spark SQL (obtained via structured-json-file, csv-file and parquet-file calls). In particular, keys() pushes down the schema lookup if used on parquet-file() and structured-json-file(). Likewise, count() as well as object lookup, array unboxing and array lookup are also pushed down on DataFrames.

+When an expression does not support pushdown, it will materialize automatically. To avoid issues, the materialization is capped by default at 100 items, but this can be changed on the command line with --result-size. A warning is issued if a materialization happened and the sequence was truncated.
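For example, a schema lookup on a Parquet file stays entirely in the DataFrame world and is not materialized; a sketch with a hypothetical file name:

```
keys(parquet-file("file.parquet"))
```

A count() over the same call is likewise answered by Spark SQL without bringing the items into memory.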

### Unsupported global variables, settings and modules

Prologs with user-defined functions are now supported, but not yet global variables, settings and library modules.

-Dynamic functions (aka, function items that can be passed as values and dynamically called) are now supported.
+Global external variables with string values are supported (use "--variable:foo bar" on the command line to assign values to them).

-Builtin function calls are type-checked, but user-defined function calls and dynamic calls are not type-checked yet.
+Dynamic functions (aka, function items that can be passed as values and dynamically called) are supported.

+All function calls are type-checked.
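As a sketch of how such an external variable could be bound and used (the name foo and value bar are placeholders, and we assume the query references $foo directly):

spark-submit --master local[*] --deploy-mode client spark-rumble-1.5.jar --query-path "query.jq" --variable:foo bar

With query.jq containing the expression $foo, the output would then be the string bar.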

### Unsupported try/catch

@@ -88,6 +92,7 @@ The type system is not quite complete yet, although a lot of progress was made.
| Type | Status |
|-------|--------|
| atomic | supported |
+| anyURI | supported |
| base64Binary | supported |
| boolean | supported |
| byte | not supported |
10 changes: 5 additions & 5 deletions docs/Run on a cluster.md
@@ -5,19 +5,19 @@ simply by modifying the command line parameters as documented [here for spark-su

If the Spark cluster is running on yarn, then the --master option must be changed from local[\*] to yarn compared to the getting started guide.

-spark-submit --master yarn --deploy-mode client spark-rumble-1.4.jar --shell yes
+spark-submit --master yarn --deploy-mode client spark-rumble-1.5.jar --shell yes
You can also adapt the number of executors, etc.

spark-submit --master yarn --deploy-mode client
--num-executors 30 --executor-cores 3 --executor-memory 10g
-spark-rumble-1.4.jar --shell yes
+spark-rumble-1.5.jar --shell yes

The size limit for materialization can also be raised with --result-size (the default is 100). This affects the number of items displayed in the shell as an answer to a query, as well as any materializations happening within the query when push-down is not supported. Warnings are issued if the cap is reached.

spark-submit --master yarn --deploy-mode client
--num-executors 30 --executor-cores 3 --executor-memory 10g
-spark-rumble-1.4.jar
+spark-rumble-1.5.jar
--shell yes --result-size 10000

## Creation functions
@@ -58,7 +58,7 @@ Rumble also supports executing a single query from the command line, reading fro

spark-submit --master yarn --deploy-mode client
--num-executors 30 --executor-cores 3 --executor-memory 10g
-spark-rumble-1.4.jar
+spark-rumble-1.5.jar
--query-path "hdfs:///user/me/query.jq"
--output-path "hdfs:///user/me/results/output"
--log-path "hdfs:///user/me/logging/mylog"
@@ -67,7 +67,7 @@ The query path can also be a local, absolute path. It is also possible to omit t

spark-submit --master yarn --deploy-mode client
--num-executors 30 --executor-cores 3 --executor-memory 10g
-spark-rumble-1.4.jar
+spark-rumble-1.5.jar
--query-path "/home/me/my-local-machine/query.jq"
--output-path "/user/me/results/output"
--log-path "hdfs:///user/me/logging/mylog"
6 changes: 3 additions & 3 deletions docs/install.md
@@ -58,13 +58,13 @@ Once the ANTLR sources have been generated, you can compile the entire project l

$ mvn clean compile assembly:single

-After successful completion, you can check the `target` directory, which should contain the compiled classes as well as the JAR file `spark-rumble-1.4-jar-with-dependencies.jar`.
+After successful completion, you can check the `target` directory, which should contain the compiled classes as well as the JAR file `spark-rumble-1.5.jar`.

## Running locally

The most straightforward way to test whether the above steps were successful is to run the Rumble shell locally, like so:

-$ spark-submit --master local[2] --deploy-mode client target/spark-rumble-1.4-jar-with-dependencies.jar --shell yes
+$ spark-submit --master local[2] --deploy-mode client target/spark-rumble-1.5.jar --shell yes

The Rumble shell should start:

@@ -113,6 +113,6 @@ This is it. Rumble is set and ready to go locally. You can now move on to a JSO

You can also try to run the Rumble shell on a cluster if you have one available and configured -- this is done in the same way as any other `spark-submit` command:

-$ spark-submit --master yarn --deploy-mode client --num-executors 40 spark-rumble-1.4.jar
+$ spark-submit --master yarn --deploy-mode client --num-executors 40 spark-rumble-1.5.jar

More details are provided in the rest of the documentation.
2 changes: 1 addition & 1 deletion mkdocs.yml
@@ -1,4 +1,4 @@
-site_name: Rumble 1.4.0 "Willow Oak" beta
+site_name: Rumble 1.5.0 "Southern Live Oak" beta
pages:
- '1. Documentation home': 'index.md'
- '2. Getting started': 'Getting started.md'
2 changes: 1 addition & 1 deletion src/main/resources/assets/banner.txt
@@ -1,6 +1,6 @@
____ __ __
/ __ \__ ______ ___ / /_ / /__
/ /_/ / / / / __ `__ \/ __ \/ / _ \ The distributed JSONiq engine
-/ _, _/ /_/ / / / / / / /_/ / /  __/ 1.4 "Willow Oak" beta
+/ _, _/ /_/ / / / / / / /_/ / /  __/ 1.5 "Southern Live Oak" beta
/_/ |_|\__,_/_/ /_/ /_/_.___/_/\___/
