[FEATURE] Understandable logical documentation #1008

Open
adityamohan93 opened this issue Jan 4, 2025 · 3 comments
Labels
documentation Improvements or additions to documentation enhancement New feature or request

Comments

@adityamohan93

Is your feature request related to a problem?
I found this repo, opensearch-spark, while looking for a way to write to our OpenSearch Service domain directly from Spark on EMR, instead of calling a REST endpoint from a service or going through an OpenSearch Ingestion pipeline. However, the README is confusing, to say the least. It introduces something called Flint but never explains what it is; it only says "OpenSearch Flint is ... It consists of four modules:".

What solution would you like?

  1. Document in the README whether and how this repo can be used to write to OpenSearch using Spark, and add more details on doing so with EMR and Glue.
  2. Explain what Flint is.
  3. State the overall purpose of this repo. The repo name is opensearch-spark, but the description is "Spark Accelerator framework ; It enables secondary indices to remote data stores". Is it meant only for that, or can it help with other use cases?

What alternatives have you considered?
Discard the Spark path for my write use case.

Do you have any additional context?
None.

adityamohan93 added the enhancement and untriaged labels on Jan 4, 2025
@YANG-DB
Member

YANG-DB commented Jan 5, 2025

Hi @adityamohan93,
Thanks for your comment. We are currently working on such an example use case: a Docker-based framework that will show API and query usage examples. Please review and let me know what you think and whether you have any suggestions:

dai-chen added the documentation label on Jan 7, 2025
@dai-chen
Collaborator

dai-chen commented Jan 7, 2025

We also use a doctest framework to provide clear, executable examples in the SQL documentation: https://github.com/opensearch-project/sql/blob/main/docs/dev/testing-doctest.md. We could consider applying the same approach in the Spark repository. I've created issue #1009 to track this.

@adityamohan93
Author

Thanks @YANG-DB and @dai-chen. The newly created issue clearly points out what's missing for new external users (like myself) to understand how to use this repo. Many OpenSearch users, especially ML teams, would like to use Spark for batch writes and reads, and I believe your repo could become their one-stop shop. The alternative would be to use opensearch-py directly from native Python.
