add features to insert_df_to_hive_table
dombean committed Nov 15, 2024
1 parent dc7edab commit 621ff73
Showing 1 changed file with 5 additions and 5 deletions.
10 changes: 5 additions & 5 deletions rdsa_utils/cdp/io/output.py
@@ -48,13 +48,13 @@ def insert_df_to_hive_table(
     Parameters
     ----------
-    spark : SparkSession
+    spark
         Active SparkSession.
-    df : SparkDF
+    df
         SparkDF containing data to be written.
     table_name : str
         Name of the Hive table to write data into.
-    overwrite : bool, optional
+    overwrite
         Controls how existing data is handled, default is False:
         For non-partitioned data:

@@ -64,10 +64,10 @@ def insert_df_to_hive_table(
         For partitioned data:
         - True: Replaces data only in partitions present in DataFrame
         - False: Appends data to existing partitions or creates new ones
-    fill_missing_cols : bool, optional
+    fill_missing_cols
         If True, adds missing columns as nulls. If False, raises error
         on schema mismatch (default is False).
-    repartition_column : Union[int, str, None], optional
+    repartition_column
         Controls data repartitioning, default is None:
         - int: Sets target number of partitions
         - str: Specifies column to repartition by
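
For context, a minimal usage sketch of the parameters documented above. It assumes the function signature matches the docstring (spark, df, table_name, overwrite, fill_missing_cols, repartition_column); the DataFrame columns and table name are hypothetical.

    from pyspark.sql import SparkSession

    from rdsa_utils.cdp.io.output import insert_df_to_hive_table

    spark = SparkSession.builder.enableHiveSupport().getOrCreate()

    # Hypothetical DataFrame; column names are for illustration only.
    df = spark.createDataFrame(
        [("2024-11-15", "A", 1)],
        ["date", "category", "value"],
    )

    # Per the docstring above: for partitioned tables, overwrite=True replaces
    # only the partitions present in df; fill_missing_cols=True adds columns
    # the Hive table has but df lacks as nulls; a string repartition_column
    # repartitions df by that column before writing.
    insert_df_to_hive_table(
        spark=spark,
        df=df,
        table_name="my_database.my_table",  # hypothetical table name
        overwrite=True,
        fill_missing_cols=True,
        repartition_column="date",
    )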
