[FEATURE]Create spark ppl local test documentation instructions #896

Closed
YANG-DB opened this issue Nov 12, 2024 · 1 comment
Labels
enhancement New feature or request Lang:PPL Pipe Processing Language support

Comments

YANG-DB commented Nov 12, 2024

Is your feature request related to a problem?
Update the OpenSearch documentation to cover running a local Spark cluster with the OpenSearch Flint jar and testing the new PPL commands.

What solution would you like?
We need documentation for setting up a local Spark cluster. We would like to see detailed instructions for configuring the cluster, including the Spark Flint jars, and possibly a docker-compose file defining these services.

Do you have any additional context?

see comment

# Produce the artifact
sbt clean sparkPPLCosmetic/publishM2

# Start Spark with the plugin
bin/spark-sql --jars "/ABSOLUTE_PATH_TO_ARTIFACT/opensearch-spark-ppl_2.12-0.6.0-SNAPSHOT.jar" \
--conf "spark.sql.extensions=org.opensearch.flint.spark.FlintPPLSparkExtensions"  \
--conf "spark.sql.catalog.dev=org.apache.spark.opensearch.catalog.OpenSearchCatalog" \
--conf "spark.hadoop.hive.cli.print.header=true"

# Create the test table and insert data
CREATE TABLE employees (name STRING, dept STRING, salary INT, age INT, con STRING);

INSERT INTO employees VALUES ("Lisa", "Sales------", 10000, 35, 'test');
INSERT INTO employees VALUES ("Evan", "Sales------", 32000, 38, 'test');
INSERT INTO employees VALUES ("Fred", "Engineering", 21000, 28, 'test');
INSERT INTO employees VALUES ("Alex", "Sales------", 30000, 33, 'test');
INSERT INTO employees VALUES ("Tom", "Engineering", 23000, 33, 'test');
INSERT INTO employees VALUES ("Jane", "Marketing", 29000, 28, 'test');
INSERT INTO employees VALUES ("Jeff", "Marketing", 35000, 38, 'test');
INSERT INTO employees VALUES ("Paul", "Engineering", 29000, 23, 'test');
INSERT INTO employees VALUES ("Chloe", "Engineering", 23000, 25, 'test');

# Execute WMA with basic option:

source=employees | trendline sort age wma(2, salary);

name	dept	salary	age	con	salary_trendline
Paul	Engineering	29000	23	test	NULL
Chloe	Engineering	23000	25	test	25000.0
Jane	Marketing	29000	28	test	27000.0
Fred	Engineering	21000	28	test	23666.666666666668
Alex	Sales------	30000	33	test	27000.0
Tom	Engineering	23000	33	test	25333.333333333332
Lisa	Sales------	10000	35	test	14333.333333333334
Jeff	Marketing	35000	38	test	26666.666666666668
Evan	Sales------	32000	38	test	33000.0


# Execute WMA with alias:

source=employees | trendline sort age wma(2, salary) as CUSTOM_NAME;

name	dept	salary	age	con	CUSTOM_NAME
Paul	Engineering	29000	23	test	NULL
Chloe	Engineering	23000	25	test	25000.0
Jane	Marketing	29000	28	test	27000.0
Fred	Engineering	21000	28	test	23666.666666666668
Alex	Sales------	30000	33	test	27000.0
Tom	Engineering	23000	33	test	25333.333333333332
Lisa	Sales------	10000	35	test	14333.333333333334
Jeff	Marketing	35000	38	test	26666.666666666668
Evan	Sales------	32000	38	test	33000.0


# Execute WMA with multiple calculations:

source=employees | trendline sort age wma(2, salary) as WMA_2 wma(3, salary) as WMA_3;


name	dept	salary	age	con	WMA_2	WMA_3
Paul	Engineering	29000	23	test	NULL	NULL
Chloe	Engineering	23000	25	test	25000.0	NULL
Jane	Marketing	29000	28	test	27000.0	27000.0
Fred	Engineering	21000	28	test	23666.666666666668	24000.0
Alex	Sales------	30000	33	test	27000.0	26833.333333333332
Tom	Engineering	23000	33	test	25333.333333333332	25000.0
Lisa	Sales------	10000	35	test	14333.333333333334	17666.666666666668
Jeff	Marketing	35000	38	test	26666.666666666668	24666.666666666668
Evan	Sales------	32000	38	test	33000.0	29333.333333333332
Time taken: 0.466 seconds, Fetched 9 row(s)
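
For anyone checking the numbers above by hand, here is a minimal Python sketch of the weighted moving average as the output suggests it is computed: within a window of `n` rows, the oldest value gets weight 1 and the newest gets weight `n`, the sum is divided by `n * (n + 1) / 2`, and rows without a full window yield NULL. The helper name `wma` and the hard-coded salary list are illustrative assumptions, not part of the plugin. The order of rows with equal `age` is copied from the output above rather than derived.

```python
from typing import List, Optional


def wma(values: List[float], n: int) -> List[Optional[float]]:
    """Weighted moving average over a trailing window of n values.

    Weights run from 1 (oldest) to n (newest). Positions that do not
    have n values available yet produce None (NULL in the output above).
    """
    denominator = n * (n + 1) / 2
    result: List[Optional[float]] = []
    for i in range(len(values)):
        if i < n - 1:
            result.append(None)
            continue
        window = values[i - n + 1 : i + 1]
        weighted_sum = sum(weight * value for weight, value in enumerate(window, start=1))
        result.append(weighted_sum / denominator)
    return result


# Salaries in the row order shown in the output above (sorted by age).
names = ["Paul", "Chloe", "Jane", "Fred", "Alex", "Tom", "Lisa", "Jeff", "Evan"]
salaries = [29000, 23000, 29000, 21000, 30000, 23000, 10000, 35000, 32000]

wma_2 = wma(salaries, 2)
wma_3 = wma(salaries, 3)

for name, salary, w2, w3 in zip(names, salaries, wma_2, wma_3):
    print(name, salary, w2, w3, sep="\t")
```

Comparing the printed columns against `WMA_2` and `WMA_3` in the last query gives a quick sanity check that does not require a running Spark session.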

YANG-DB added the enhancement, untriaged, and Lang:PPL labels on Nov 12, 2024
YANG-DB commented Nov 13, 2024

In addition, we need to produce an HTML report similar to this one for sanity tests.
