Add 'include_column_list' parameter #58

Merged
merged 5 commits into from
Dec 16, 2019

eeshugerman

I found this unmerged patch by @kyrozetera in the old databricks repo (link), and it solves an issue I'm facing.

My use case

Many of the Redshift tables that I'm working with have created_dt columns, like so:
created_dt timestamp not null default current_time
These columns are meant to be left unspecified on inserts and copies so that the default is used. For that to work, the COPY statement must include a column list; otherwise the load fails with the error: Missing data for not-null field
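
For illustration, the effect of a column list on the generated COPY statement can be sketched as follows. This is a minimal sketch and not the library's actual implementation: `CopySketch`, `copySql`, and the table, column, and S3 names are hypothetical, and the credentials and format clauses of a real COPY are left out.

```scala
object CopySketch {
  // Hypothetical helper: builds only the leading part of a COPY statement.
  // When includeColumnList is true, the DataFrame's columns are named
  // explicitly, so any table column missing from the list (for example
  // created_dt) falls back to its DEFAULT instead of expecting input data.
  def copySql(
      table: String,
      columns: Seq[String],
      source: String,
      includeColumnList: Boolean): String = {
    val columnList =
      if (includeColumnList && columns.nonEmpty)
        columns.map(c => "\"" + c + "\"").mkString(" (", ", ", ")")
      else ""
    s"COPY $table$columnList FROM '$source'"
  }

  def main(args: Array[String]): Unit = {
    // created_dt is deliberately absent from the DataFrame columns.
    val cols = Seq("id", "name")
    // prints: COPY events ("id", "name") FROM 's3://example-bucket/manifest'
    println(copySql("events", cols, "s3://example-bucket/manifest", includeColumnList = true))
    // prints: COPY events FROM 's3://example-bucket/manifest'
    println(copySql("events", cols, "s3://example-bucket/manifest", includeColumnList = false))
  }
}
```

Without the list, Redshift maps the incoming fields onto all table columns in order, which is what produces the not-null error for created_dt.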

Author's use case

The original author's motivation for this patch appears to be related to, but different from, my own: databricks#340

codecov-io commented Dec 9, 2019

Codecov Report

Merging #58 into master will increase coverage by 0.16%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master      #58      +/-   ##
==========================================
+ Coverage   73.92%   74.09%   +0.16%     
==========================================
  Files          15       15              
  Lines         767      772       +5     
  Branches      103      106       +3     
==========================================
+ Hits          567      572       +5     
  Misses        200      200

@lucagiovagnoli lucagiovagnoli left a comment

This looks great! Thanks for the pull request. Can you please:

  1. Address comments
  2. Bump version and changelog (patch version should be ok as this is backwards compatible)


val mockRedshift = new MockRedshift(
  defaultParams("url"),
  Map(TableName.parseFromEscaped(defaultParams("dbtable")).toString -> null))

Why is the schema null here (in the Map object, Map[Table, StructType])? Should we use TestUtils.testSchema?

Author

Not sure, but I'm guessing the original author copied it from the test above this one, which for some reason uses null. I've changed it to TestUtils.testSchema.

@@ -442,6 +442,28 @@ class RedshiftSourceSuite
mockRedshift.verifyThatExpectedQueriesWereIssued(expectedCommands)
}

test("Include Column List adds the schema columns to the COPY query") {

Could you please write a second small test near this one showing the difference when include_column_list = false?

eeshugerman commented Dec 14, 2019

Thanks for reviewing @lucagiovagnoli! I've made the requested changes to the tests.

  1. Bump version and changelog (patch version should be ok as this is backwards compatible)

I bumped the minor version instead, since this PR adds functionality. From https://semver.org:

Given a version number MAJOR.MINOR.PATCH, increment the:
MAJOR version when you make incompatible API changes,
MINOR version when you add functionality in a backwards compatible manner, and
PATCH version when you make backwards compatible bug fixes.

But I can switch to patch if you prefer.
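
As a small illustration of the semver.org rules quoted above, the choice of bump can be modelled like this. It is only a sketch: the `Change` and `Version` types and the version number 1.2.3 are made up for the example and do not reflect this project's actual version.

```scala
// Illustrative model of the three semver.org bump rules.
sealed trait Change
case object IncompatibleApiChange extends Change
case object BackwardCompatibleFeature extends Change
case object BackwardCompatibleBugFix extends Change

final case class Version(major: Int, minor: Int, patch: Int) {
  // Lower-order components reset to 0 when a higher one is incremented.
  def bump(change: Change): Version = change match {
    case IncompatibleApiChange     => Version(major + 1, 0, 0)
    case BackwardCompatibleFeature => Version(major, minor + 1, 0)
    case BackwardCompatibleBugFix  => Version(major, minor, patch + 1)
  }
  override def toString: String = s"$major.$minor.$patch"
}

object SemverSketch {
  def main(args: Array[String]): Unit = {
    // A new, backwards-compatible option counts as added functionality.
    // prints: 1.3.0
    println(Version(1, 2, 3).bump(BackwardCompatibleFeature))
  }
}
```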

@eeshugerman eeshugerman changed the title add 'include_column_list' parameter Add 'include_column_list' parameter Dec 14, 2019

@lucagiovagnoli lucagiovagnoli left a comment

Minor works! I was trying to follow semver but got confused.

@lucagiovagnoli lucagiovagnoli merged commit d4396f8 into spark-redshift-community:master Dec 16, 2019
pull bot pushed a commit to samklr/spark-redshift that referenced this pull request Sep 2, 2023
…n-list

Add 'include_column_list' parameter