Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PPl flatten command #784

Merged
merged 4 commits into from
Oct 31, 2024
Merged

PPl flatten command #784

merged 4 commits into from
Oct 31, 2024

Conversation

lukasz-soszynski-eliatra
Copy link
Contributor

@lukasz-soszynski-eliatra lukasz-soszynski-eliatra commented Oct 16, 2024

Description

The flatten command is introduced

Issues Resolved

#669

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@lukasz-soszynski-eliatra lukasz-soszynski-eliatra force-pushed the flatten branch 2 times, most recently from a7a0677 to d566d3f Compare October 16, 2024 12:56
@YANG-DB YANG-DB added Lang:PPL Pipe Processing Language support 0.6 labels Oct 16, 2024
docs/ppl-lang/ppl-fillnull-command.md Outdated Show resolved Hide resolved
private val planTransformer = new CatalystQueryPlanVisitor()
private val pplParser = new PPLSyntaxParser()

test("test fillnull only field") {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lukasz-soszynski-eliatra plz change the test name to match the ppl query

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Member

@YANG-DB YANG-DB Oct 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lukasz-soszynski-eliatra plz add more tests with different queries including

  • stats (aggregations)
  • parse (parsing of regExp)
  • eval

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@YANG-DB YANG-DB changed the title Flatten PPl flatten command Oct 29, 2024
super.beforeAll()

// Create test table
createSimpleNestedJsonContentTable(tempFile, testTable)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lukasz-soszynski-eliatra plz add additional test from other types of tables or use cases:

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@YANG-DB
Copy link
Member

YANG-DB commented Oct 30, 2024

@lukasz-soszynski-eliatra also plz rebase ...

#### Data
| \_time | bridges | city | coor | country |
|---------------------|----------------------------------------------|---------|------------------------|---------------|
| 2024-09-13T12:00:00 | [{801, Tower Bridge}, {928, London Bridge}] | London | {35, 51.5074, -0.1278} | England |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where does these example data come from? I suggest to create some example data by ourselves instead of copying from internet only if they are from public dataset. @YANG-DB please do not use internet data, specially documentation of other similar product, in github issue either.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These data come from the issue #669 but were extended.

@@ -53,6 +53,7 @@ commands
| renameCommand
| fillnullCommand
| fieldsummaryCommand
| flattenCommand
;

commandName
Copy link
Member

@LantaoJin LantaoJin Oct 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add FLATTEN keyword in the end of commandName section.

Copy link
Member

@LantaoJin LantaoJin Oct 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great job! LGTM except this^

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Corrected.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add FLATTEN keyword in the end of commandName section.
please add here

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment on lines +4 to +6
Using `flatten` command to flatten a field of type:
- `struct<?,?>`
- `array<struct<?,?>>`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these data_types what we expecting for this command? I thought the expected input is json string. But it's fine for now. We can enhance it later.

@@ -53,6 +53,7 @@ commands
| renameCommand
| fillnullCommand
| fieldsummaryCommand
| flattenCommand
;

commandName
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add FLATTEN keyword in the end of commandName section.
please add here

@YANG-DB YANG-DB merged commit b71a9ed into opensearch-project:main Oct 31, 2024
4 checks passed
YANG-DB added a commit that referenced this pull request Nov 2, 2024
* update antlr grammar for (future) P1 command syntax

Signed-off-by: YANGDB <[email protected]>

* add trendline command

Signed-off-by: YANGDB <[email protected]>

* add expand command

Signed-off-by: YANGDB <[email protected]>

* add geoip command

Signed-off-by: YANGDB <[email protected]>

* PPl `flatten` command (#784)

* The flatten command implemented

Signed-off-by: Lukasz Soszynski <[email protected]>

* The flatten command integration tests were extended with additional checks for logical plans.

Signed-off-by: Lukasz Soszynski <[email protected]>

* flatten, added more tests related to plan translation and integration tests

Signed-off-by: Lukasz Soszynski <[email protected]>

* Flatten command added to command names list.

Signed-off-by: Lukasz Soszynski <[email protected]>

---------

Signed-off-by: Lukasz Soszynski <[email protected]>

* Extract source table names from mv query (#854)

* add sourceTables to MV index metadata properties

Signed-off-by: Sean Kao <[email protected]>

* parse source tables from mv query

Signed-off-by: Sean Kao <[email protected]>

* test cases for parse source tables from mv query

Signed-off-by: Sean Kao <[email protected]>

* use constant for metadata cache version

Signed-off-by: Sean Kao <[email protected]>

* write source tables to metadata cache

Signed-off-by: Sean Kao <[email protected]>

* address comment

Signed-off-by: Sean Kao <[email protected]>

* generate source tables for old mv without new prop

Signed-off-by: Sean Kao <[email protected]>

* syntax fix

Signed-off-by: Sean Kao <[email protected]>

---------

Signed-off-by: Sean Kao <[email protected]>

* Fallback to internal scheduler when index creation failed (#850)

* Fallback to internal scheduler when index creation failed

Signed-off-by: Louis Chu <[email protected]>

* Fix IT

Signed-off-by: Louis Chu <[email protected]>

* Fix IOException

Signed-off-by: Louis Chu <[email protected]>

---------

Signed-off-by: Louis Chu <[email protected]>

* New trendline ppl command (SMA only) (#833)

* WIP trendline command

Signed-off-by: Kacper Trochimiak <[email protected]>

* wip

Signed-off-by: Kacper Trochimiak <[email protected]>

* trendline supports sorting

Signed-off-by: Kacper Trochimiak <[email protected]>

* run scalafmtAll

Signed-off-by: Kacper Trochimiak <[email protected]>

* return null when there are too few data points

Signed-off-by: Kacper Trochimiak <[email protected]>

* sbt scalafmtAll

Signed-off-by: Kacper Trochimiak <[email protected]>

* Remove WMA references

Signed-off-by: Hendrik Saly <[email protected]>

* trendline - sortByField as Optional<Field>

Signed-off-by: Kacper Trochimiak <[email protected]>

* introduce TrendlineStrategy

Signed-off-by: Kacper Trochimiak <[email protected]>

* keywordsCanBeId -> replace SMA with trendlineType

Signed-off-by: Kacper Trochimiak <[email protected]>

* handle trendline alias as qualifiedName instead of fieldExpression

Signed-off-by: Kacper Trochimiak <[email protected]>

* Add docs

Signed-off-by: Hendrik Saly <[email protected]>

* Make alias optional

Signed-off-by: Hendrik Saly <[email protected]>

* Adapt tests for optional alias

Signed-off-by: Hendrik Saly <[email protected]>

* Adden logical plan unittests

Signed-off-by: Hendrik Saly <[email protected]>

* Add missing license headers

Signed-off-by: Hendrik Saly <[email protected]>

* Fix docs

Signed-off-by: Hendrik Saly <[email protected]>

* numberOfDataPoints must be 1 or greater

Signed-off-by: Hendrik Saly <[email protected]>

* Rename TrendlineStrategy to  TrendlineCatalystUtils

Signed-off-by: Hendrik Saly <[email protected]>

* Validate TrendlineType early and pass around enum type

Signed-off-by: Hendrik Saly <[email protected]>

* Add trendline chaining test

Signed-off-by: Hendrik Saly <[email protected]>

* Fix compile errors

Signed-off-by: Hendrik Saly <[email protected]>

* Fix imports

Signed-off-by: Hendrik Saly <[email protected]>

* Fix imports

Signed-off-by: Hendrik Saly <[email protected]>

---------

Signed-off-by: Kacper Trochimiak <[email protected]>
Signed-off-by: Hendrik Saly <[email protected]>
Co-authored-by: Kacper Trochimiak <[email protected]>

* update iplocation antlr

Signed-off-by: YANGDB <[email protected]>

---------

Signed-off-by: YANGDB <[email protected]>
Signed-off-by: Lukasz Soszynski <[email protected]>
Signed-off-by: Sean Kao <[email protected]>
Signed-off-by: Louis Chu <[email protected]>
Signed-off-by: Kacper Trochimiak <[email protected]>
Signed-off-by: Hendrik Saly <[email protected]>
Co-authored-by: lukasz-soszynski-eliatra <[email protected]>
Co-authored-by: Sean Kao <[email protected]>
Co-authored-by: Louis Chu <[email protected]>
Co-authored-by: Hendrik Saly <[email protected]>
Co-authored-by: Kacper Trochimiak <[email protected]>
YANG-DB added a commit that referenced this pull request Nov 5, 2024
* update antlr grammar for (future) P1 command syntax

Signed-off-by: YANGDB <[email protected]>

* add trendline command

Signed-off-by: YANGDB <[email protected]>

* add expand command

Signed-off-by: YANGDB <[email protected]>

* add geoip command

Signed-off-by: YANGDB <[email protected]>

* PPl `flatten` command (#784)

* The flatten command implemented

Signed-off-by: Lukasz Soszynski <[email protected]>

* The flatten command integration tests were extended with additional checks for logical plans.

Signed-off-by: Lukasz Soszynski <[email protected]>

* flatten, added more tests related to plan translation and integration tests

Signed-off-by: Lukasz Soszynski <[email protected]>

* Flatten command added to command names list.

Signed-off-by: Lukasz Soszynski <[email protected]>

---------

Signed-off-by: Lukasz Soszynski <[email protected]>

* Extract source table names from mv query (#854)

* add sourceTables to MV index metadata properties

Signed-off-by: Sean Kao <[email protected]>

* parse source tables from mv query

Signed-off-by: Sean Kao <[email protected]>

* test cases for parse source tables from mv query

Signed-off-by: Sean Kao <[email protected]>

* use constant for metadata cache version

Signed-off-by: Sean Kao <[email protected]>

* write source tables to metadata cache

Signed-off-by: Sean Kao <[email protected]>

* address comment

Signed-off-by: Sean Kao <[email protected]>

* generate source tables for old mv without new prop

Signed-off-by: Sean Kao <[email protected]>

* syntax fix

Signed-off-by: Sean Kao <[email protected]>

---------

Signed-off-by: Sean Kao <[email protected]>

* Fallback to internal scheduler when index creation failed (#850)

* Fallback to internal scheduler when index creation failed

Signed-off-by: Louis Chu <[email protected]>

* Fix IT

Signed-off-by: Louis Chu <[email protected]>

* Fix IOException

Signed-off-by: Louis Chu <[email protected]>

---------

Signed-off-by: Louis Chu <[email protected]>

* New trendline ppl command (SMA only) (#833)

* WIP trendline command

Signed-off-by: Kacper Trochimiak <[email protected]>

* wip

Signed-off-by: Kacper Trochimiak <[email protected]>

* trendline supports sorting

Signed-off-by: Kacper Trochimiak <[email protected]>

* run scalafmtAll

Signed-off-by: Kacper Trochimiak <[email protected]>

* return null when there are too few data points

Signed-off-by: Kacper Trochimiak <[email protected]>

* sbt scalafmtAll

Signed-off-by: Kacper Trochimiak <[email protected]>

* Remove WMA references

Signed-off-by: Hendrik Saly <[email protected]>

* trendline - sortByField as Optional<Field>

Signed-off-by: Kacper Trochimiak <[email protected]>

* introduce TrendlineStrategy

Signed-off-by: Kacper Trochimiak <[email protected]>

* keywordsCanBeId -> replace SMA with trendlineType

Signed-off-by: Kacper Trochimiak <[email protected]>

* handle trendline alias as qualifiedName instead of fieldExpression

Signed-off-by: Kacper Trochimiak <[email protected]>

* Add docs

Signed-off-by: Hendrik Saly <[email protected]>

* Make alias optional

Signed-off-by: Hendrik Saly <[email protected]>

* Adapt tests for optional alias

Signed-off-by: Hendrik Saly <[email protected]>

* Adden logical plan unittests

Signed-off-by: Hendrik Saly <[email protected]>

* Add missing license headers

Signed-off-by: Hendrik Saly <[email protected]>

* Fix docs

Signed-off-by: Hendrik Saly <[email protected]>

* numberOfDataPoints must be 1 or greater

Signed-off-by: Hendrik Saly <[email protected]>

* Rename TrendlineStrategy to  TrendlineCatalystUtils

Signed-off-by: Hendrik Saly <[email protected]>

* Validate TrendlineType early and pass around enum type

Signed-off-by: Hendrik Saly <[email protected]>

* Add trendline chaining test

Signed-off-by: Hendrik Saly <[email protected]>

* Fix compile errors

Signed-off-by: Hendrik Saly <[email protected]>

* Fix imports

Signed-off-by: Hendrik Saly <[email protected]>

* Fix imports

Signed-off-by: Hendrik Saly <[email protected]>

---------

Signed-off-by: Kacper Trochimiak <[email protected]>
Signed-off-by: Hendrik Saly <[email protected]>
Co-authored-by: Kacper Trochimiak <[email protected]>

* update iplocation antlr

Signed-off-by: YANGDB <[email protected]>

* update scala fmt style

Signed-off-by: YANGDB <[email protected]>

* `cidrmatch` ppl command add logical tests and docs (#865)

* update logical tests and docs

Signed-off-by: YANGDB <[email protected]>

* update scala fmt style

Signed-off-by: YANGDB <[email protected]>

* fix type error

Signed-off-by: YANGDB <[email protected]>

---------

Signed-off-by: YANGDB <[email protected]>

* Support Lambda and add related array functions (#864)

* json function enhancement

Signed-off-by: Heng Qian <[email protected]>

* Add JavaToScalaTransformer

Signed-off-by: Heng Qian <[email protected]>

* Apply scalafmtAll

Signed-off-by: Heng Qian <[email protected]>

* Address comments

Signed-off-by: Heng Qian <[email protected]>

* Add IT and change to use the same function name as spark

Signed-off-by: Heng Qian <[email protected]>

* Address comments

Signed-off-by: Heng Qian <[email protected]>

* Add document and separate lambda functions from json functions

Signed-off-by: Heng Qian <[email protected]>

* Add lambda functions transform and reduce

Signed-off-by: Heng Qian <[email protected]>

* polish lambda function document

Signed-off-by: Heng Qian <[email protected]>

* polish lambda function document

Signed-off-by: Heng Qian <[email protected]>

* Minor fix

Signed-off-by: Heng Qian <[email protected]>

* Minor change to polish the documents

Signed-off-by: Heng Qian <[email protected]>

---------

Signed-off-by: Heng Qian <[email protected]>

---------

Signed-off-by: YANGDB <[email protected]>
Signed-off-by: Lukasz Soszynski <[email protected]>
Signed-off-by: Sean Kao <[email protected]>
Signed-off-by: Louis Chu <[email protected]>
Signed-off-by: Kacper Trochimiak <[email protected]>
Signed-off-by: Hendrik Saly <[email protected]>
Signed-off-by: Heng Qian <[email protected]>
Co-authored-by: lukasz-soszynski-eliatra <[email protected]>
Co-authored-by: Sean Kao <[email protected]>
Co-authored-by: Louis Chu <[email protected]>
Co-authored-by: Hendrik Saly <[email protected]>
Co-authored-by: Kacper Trochimiak <[email protected]>
Co-authored-by: qianheng <[email protected]>
kenrickyap pushed a commit to Bit-Quill/opensearch-spark that referenced this pull request Dec 11, 2024
* The flatten command implemented

Signed-off-by: Lukasz Soszynski <[email protected]>

* The flatten command integration tests were extended with additional checks for logical plans.

Signed-off-by: Lukasz Soszynski <[email protected]>

* flatten, added more tests related to plan translation and integration tests

Signed-off-by: Lukasz Soszynski <[email protected]>

* Flatten command added to command names list.

Signed-off-by: Lukasz Soszynski <[email protected]>

---------

Signed-off-by: Lukasz Soszynski <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
0.6 Lang:PPL Pipe Processing Language support
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants