local spark ppl testing documentation #902

Merged: 10 commits merged into opensearch-project:main on Nov 14, 2024

Conversation

YANG-DB (Member) commented Nov 13, 2024

Description

add local spark ppl testing documentation and details

Related Issues

#896

Check List

  • Updated documentation (docs/ppl-lang/README.md)
  • Implemented unit tests
  • Implemented tests for combination with other commands
  • New added source code should include a copyright header
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

YANG-DB added the documentation (Improvements or additions to documentation), Lang:PPL (Pipe Processing Language support), and 0.7 labels on Nov 13, 2024
LantaoJin (Member) commented:

@YANG-DB A high-level question: will we use local spark for sanity tests instead of an OpenSearch Domain in the future? This doesn't seem to be end-to-end testing. For example, we found issue #875 during sanity testing in a Domain env.

qianheng-aws (Contributor) commented Nov 14, 2024

@YANG-DB What's the motivation to add this doc?

I think we already have a guide on local spark PPL usage in the root README:
https://github.com/opensearch-project/opensearch-spark/blob/main/README.md#ppl-build--run

And the PPL commands testing somewhat duplicates the ppl-commands doc; that place should be the single source of truth for each command:
https://github.com/opensearch-project/opensearch-spark/blob/main/docs/ppl-lang/README.md

## emails table
```sql
CREATE TABLE emails (name STRING, age INT, email STRING, street_address STRING, year INT, month INT) PARTITIONED BY (year, month);
INSERT INTO testTable (name, age, email, street_address, year, month) VALUES ('Alice', 30, '[email protected]', '123 Main St, Seattle', 2023, 4), ('Bob', 55, '[email protected]', '456 Elm St, Portland', 2023, 5), ('Charlie', 65, '[email protected]', '789 Pine St, San Francisco', 2023, 4), ('David', 19, '[email protected]', '101 Maple St, New York', 2023, 5), ('Eve', 21, '[email protected]', '202 Oak St, Boston', 2023, 4), ('Frank', 76, '[email protected]', '303 Cedar St, Austin', 2023, 5), ('Grace', 41, '[email protected]', '404 Birch St, Chicago', 2023, 4), ('Hank', 32, '[email protected]', '505 Spruce St, Miami', 2023, 5), ('Ivy', 9, '[email protected]', '606 Fir St, Denver', 2023, 4), ('Jack', 12, '[email protected]', '707 Ash St, Seattle', 2023, 5);
```

Review comment (Member):
`testTable` should be `emails`
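
Applying that change, the corrected statement would target the `emails` table; a minimal sketch showing the first rows (the remaining values are unchanged from the snippet above):

```sql
-- corrected table name: emails instead of testTable
INSERT INTO emails (name, age, email, street_address, year, month) VALUES
  ('Alice', 30, '[email protected]', '123 Main St, Seattle', 2023, 4),
  ('Bob', 55, '[email protected]', '456 Elm St, Portland', 2023, 5);
```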

# Testing PPL using local Spark

## Produce the PPL artifact
The first step would be to produce the spark-ppl artifact: `sbt clean sparkPPLCosmetic/publishM2`

Review comment (Member):

This action is dangerous when a user has write credentials and remote repo settings in their env. How about changing it to `sbt clean sparkPPLCosmetic/assembly`?
It will generate the spark-ppl artifact and print its location at the end:

[info] Built: ./opensearch-spark/sparkPPLCosmetic/target/scala-2.12/opensearch-spark-ppl-assembly-x.y.z-SNAPSHOT.jar
[info] Jar hash: 71dd9c

YANG-DB (Member Author) replied:


@LantaoJin I've updated - please review and see if anything else is missing
thanks

YANG-DB requested a review from LantaoJin on November 14, 2024 at 03:03
YANG-DB (Member Author) commented Nov 14, 2024

> @YANG-DB A high-level question: will we use local spark for sanity tests instead of an OpenSearch Domain in the future? This doesn't seem to be end-to-end testing. For example, we found issue #875 during sanity testing in a Domain env.

Hi @LantaoJin,
The idea behind this is to allow an open-source user to experiment with the PPL language directly in the development environment itself. It serves as a fast way to experiment with a local spark cluster before moving on to more complicated use cases.
The ultimate goal is to have separate testing for an open-source environment that does not depend on a specific provider.
It doesn't function as a sanity test, but rather as a user tutorial for playing around with the language and understanding its capabilities.
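
For example, once spark-sql is running with the PPL artifact loaded, a user could try a simple query against the sample `emails` table from the tutorial (an illustrative sketch, assuming the table above has been created):

```
source = emails | where age > 30 | fields name, email;
```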

YANG-DB (Member Author) commented Nov 14, 2024

> @YANG-DB What's the motivation to add this doc?
>
> I think we already have a guide on local spark PPL usage in the root README: https://github.com/opensearch-project/opensearch-spark/blob/main/README.md#ppl-build--run
>
> And the PPL commands testing somewhat duplicates the ppl-commands doc; that place should be the single source of truth for each command: https://github.com/opensearch-project/opensearch-spark/blob/main/docs/ppl-lang/README.md

Hi @qianheng-aws - thanks for the feedback.
As I mentioned above, this simple tutorial is a basic way of explaining how to quickly get started with PPL on a local spark cluster, and it extends the README section.
It's supposed to be used by developers who are trying to understand whether this open-source spark PPL solution fits their needs, without having to deploy a more complicated use case to a real spark cluster.

## Start Spark with the plugin
Once installed, run spark with the generated PPL artifact:
```shell
bin/spark-sql --jars "/PATH_TO_ARTIFACT/oopensearch-spark-ppl-assembly-x.y.z-SNAPSHOT.jar" \
```

Review comment (Member):
double "o" typo here (`oopensearch` should be `opensearch`)
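
With that fixed, the jar reference would read roughly as follows (any trailing options from the original snippet are unchanged):

```shell
# corrected artifact name: single "o" in "opensearch"
bin/spark-sql --jars "/PATH_TO_ARTIFACT/opensearch-spark-ppl-assembly-x.y.z-SNAPSHOT.jar"
```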

YANG-DB merged commit bf60e59 into opensearch-project:main on Nov 14, 2024 (4 checks passed)
kenrickyap pushed a commit to Bit-Quill/opensearch-spark that referenced this pull request Dec 11, 2024
* add local spark ppl testing documentation and details

Signed-off-by: YANGDB <[email protected]>

* update more sample test tables and commands

Signed-off-by: YANGDB <[email protected]>

* update more sample test tables and commands

Signed-off-by: YANGDB <[email protected]>

* update more sample test tables and commands

Signed-off-by: YANGDB <[email protected]>

* update for using opensearch-spark-ppl-assembly-x.y.z-SNAPSHOT.jar

Signed-off-by: YANGDB <[email protected]>

* update tutorial documentation on using a local spark-cluster with ppl queries

Signed-off-by: YANGDB <[email protected]>

* typo fix

Signed-off-by: YANGDB <[email protected]>

---------

Signed-off-by: YANGDB <[email protected]>