[FEATURE] Enhance Flint query profiling and acceleration insights #997

Open
dai-chen opened this issue Dec 19, 2024 · 0 comments
Labels
enhancement (New feature or request), performance (Make it fast!)

dai-chen commented Dec 19, 2024

Is your feature request related to a problem?

Flint users often struggle to understand how their SparkSQL queries benefit from acceleration indices. They lack clarity on whether their queries are being optimized by Flint’s indices or running as direct queries. Furthermore, users have limited visibility into which acceleration index, if any, is enhancing query performance. Troubleshooting performance issues typically requires opening the Spark dashboard UI to inspect job, stage, and task metrics, which is both time-consuming and inefficient.

What solution would you like?

  1. Provide advanced APIs for understanding acceleration, such as "What-If" and "WhyNot" APIs, that help users assess potential performance improvements and understand why existing acceleration cannot be applied to a given query.
  2. Expose more Spark metrics to users in the query execution history, including the metrics currently missing from the Flint reader and writer.
  3. Add more logging to the Flint optimizer rules within Flint's query rewrite logic, with explicit details for troubleshooting.
  4. Provide a user-friendly interface to display query execution history and related metrics, making it easier for users to analyze performance and acceleration impacts. Ref: https://docs.databricks.com/en/sql/user/queries/query-history.html
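
To make the "WhyNot" idea in item 1 more concrete, here is a minimal sketch of the kind of answer such an API might return. It is not Flint's API: `CoveringIndex`, `why_not`, and the example index and table names are all hypothetical, and the only rule modelled is the simplified assumption that a covering index can serve a query only if it is defined on the queried table and contains every column the query references.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoveringIndex:
    """Hypothetical description of a covering index (not Flint's metadata model)."""
    name: str
    table: str
    columns: frozenset


def why_not(index, query_table, query_columns):
    """Return human-readable reasons the index cannot serve the query.

    An empty list means this simplified check found no obstacle.
    """
    reasons = []
    # Assumption: an index is only considered for queries on its own table.
    if index.table != query_table:
        reasons.append(
            f"index {index.name} is defined on {index.table}, not {query_table}"
        )
    # Assumption: every referenced column must be present in the index.
    missing = sorted(set(query_columns) - index.columns)
    if missing:
        reasons.append(
            f"index {index.name} does not cover columns: {', '.join(missing)}"
        )
    return reasons


# Hypothetical index and queries, for illustration only.
idx = CoveringIndex("idx_status", "http_logs", frozenset({"status", "clientip"}))
print(why_not(idx, "http_logs", ["status", "clientip"]))  # no obstacle found
print(why_not(idx, "http_logs", ["status", "request"]))   # one missing column
```

A real implementation would need to work from the optimizer's actual rewrite rules rather than a column-set comparison, but returning a list of explicit reasons is one way to answer "why was my query not accelerated?" without sending users to the Spark dashboard.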

What alternatives have you considered?

  1. Provide a link to the Spark dashboard so that users can check query execution details directly, analyze performance metrics independently, and identify bottlenecks.
  2. Provide guidance so that users can troubleshoot acceleration on their own, as follows:

[Image: query-perf-rca]

Do you have any additional context?

The following questions from users highlight the need for this feature:

  1. How do I know if I am performing a direct query or if my query is benefiting from accelerated indices?
  2. How do I determine which acceleration index helped boost my query performance?
  3. How can I check if my query is performing faster before and after creating accelerations?
  4. To see performance benefits, should I execute the query in OpenSearch SQL workbench or Dev Tools?

dai-chen added the enhancement, untriaged, and performance labels and removed the untriaged label on Dec 19, 2024.