Add support for Spark calculations #493
Comments
https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/athena/client/start_calculation_execution.html |
The way dbt have handled that is by treating the query as a generic
Definitely. It seems like we need to add calls to the new spark api endpoints to the |
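For reference, the endpoint calls mentioned above could be wrapped roughly like this. It is only a sketch, not how PyAthena implements it: `run_calculation` and its parameters are hypothetical names, and `client` is assumed to behave like `boto3.client("athena")`, whose `start_calculation_execution` and `get_calculation_execution_status` operations are used here.

```python
import time

# States after which a calculation execution no longer changes.
TERMINAL_STATES = {"COMPLETED", "FAILED", "CANCELED"}


def run_calculation(client, session_id, code, poll_interval=1.0, sleep=time.sleep):
    """Submit a PySpark code block to an existing session and wait for it.

    `client` is assumed to expose the same methods as boto3.client("athena").
    Returns a (calculation_id, final_state) tuple.
    """
    response = client.start_calculation_execution(
        SessionId=session_id,
        CodeBlock=code,
    )
    calculation_id = response["CalculationExecutionId"]

    # Poll until the calculation reaches a terminal state.
    while True:
        status = client.get_calculation_execution_status(
            CalculationExecutionId=calculation_id
        )
        state = status["Status"]["State"]
        if state in TERMINAL_STATES:
            return calculation_id, state
        sleep(poll_interval)
```

With real AWS access, the session ID would come from a prior `start_session` call against a Spark-enabled workgroup. The `sleep` argument is injectable only so that the polling loop can be exercised without waiting.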
I am trying to implement a cursor class that executes Spark calculations in the following branch. It looks like the PySpark code can be executed as follows.

    import textwrap
    from pyathena import connect

    conn = connect(work_group="spark-primary", cursor_class=CalcCursor)
    with conn.cursor() as cursor:
        cursor.execute(
            textwrap.dedent(
                """
                spark.sql("create database if not exists spark_demo_database")
                """
            )
        )

Since it would be difficult to add features to a regular cursor, I have implemented a different cursor class. If you have any ideas, please feel free to suggest them. |
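A note on the `textwrap.dedent` call in the snippet above: assuming the submitted string is parsed as ordinary Python, a code block written inside an indented triple-quoted string would otherwise begin with unexpected indentation. The sketch below shows the effect locally with `compile`; `prepare_code_block` is a hypothetical helper, not part of PyAthena.

```python
import textwrap


def prepare_code_block(code: str) -> str:
    """Remove common leading indentation and surrounding blank lines."""
    return textwrap.dedent(code).strip() + "\n"


indented = """
    x = 1
    y = x + 1
"""

# The raw string starts with indented statements, which Python rejects
# at module level.
try:
    compile(indented, "<calculation>", "exec")
    raw_is_valid = True
except IndentationError:
    raw_is_valid = False

# After dedenting, the same statements compile without error.
cleaned = prepare_code_block(indented)
compile(cleaned, "<calculation>", "exec")
```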
@laughingman7743 Thank you so much for that. I have reviewed the PR. There are a couple of additional models you can test to check whether they cause issues.

Pandas dataframe:

    import pandas as pd
    return pd.DataFrame({"A": [1, 2, 3, 4]})

Spark dataframe:

    return spark.createDataFrame(data, ["A"])

I think you can also import pyspark and return a PySpark dataframe, but I haven't tested that one out. |
The code for the Athena Example notebook is as follows: Spark Dataframes:
Spark SQL:
|
The DataFrame would be a Spark DataFrame, not a Pandas one. I am not sure what use case would return values; you will probably be running code that writes data out to S3. |
That was mainly to check whether the import causes any issues, but I think we can skip that feedback 👍🏽 |
Implement SparkCursor to support Spark calculations (fix #493)
I have just released v3.1.0. 🎉 |
Description
Add support for running Spark calculations using any cursor
Related docs
Comments
I am currently working on adding support for running Python models with the dbt-athena-community adapter, and it would be much easier to accomplish if the pyathena library supported this first. I don't think mock_athena supports these calls yet, so testing is actually much more difficult than I thought.
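Until the mocking library covers these calls, one possible workaround is to stub the client with `unittest.mock`. This is only a sketch: `submit_python_model` is a hypothetical helper, the IDs are made up, and the `MagicMock` stands in for `boto3.client("athena")`, so no AWS request is made.

```python
from unittest.mock import MagicMock


def submit_python_model(client, session_id, code):
    """Hypothetical helper: submit model code and return the calculation ID."""
    response = client.start_calculation_execution(
        SessionId=session_id,
        CodeBlock=code,
    )
    return response["CalculationExecutionId"]


# Stand-in for boto3.client("athena"); the return value is canned.
client = MagicMock()
client.start_calculation_execution.return_value = {
    "CalculationExecutionId": "calc-123",
    "State": "CREATING",
}

calculation_id = submit_python_model(client, "session-abc", "print('hello')")

# Verify the helper passed the expected parameters to the stubbed endpoint.
client.start_calculation_execution.assert_called_once_with(
    SessionId="session-abc",
    CodeBlock="print('hello')",
)
```

This checks only that the adapter code sends the right parameters; it says nothing about how Athena would actually run the calculation.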