-
Notifications
You must be signed in to change notification settings - Fork 62
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add 'include_column_list' parameter #58
Add 'include_column_list' parameter #58
Conversation
Codecov Report
@@ Coverage Diff @@
## master #58 +/- ##
==========================================
+ Coverage 73.92% 74.09% +0.16%
==========================================
Files 15 15
Lines 767 772 +5
Branches 103 106 +3
==========================================
+ Hits 567 572 +5
Misses 200 200 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great! Thanks for the pull request. Can you please:
- address comments
- Bump version and changelog (patch version should be ok as this is backwards compatible)
|
||
val mockRedshift = new MockRedshift( | ||
defaultParams("url"), | ||
Map(TableName.parseFromEscaped(defaultParams("dbtable")).toString -> null)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is the schema null here? (in the Map object MAP[Table, StructType]). Should we use the TestUtils.testSchema ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure, but I'm guessing the original author copied it from the test above this one, which for some reason uses null
. I've changed it to TestUtils.testSchema
.
@@ -442,6 +442,28 @@ class RedshiftSourceSuite | |||
mockRedshift.verifyThatExpectedQueriesWereIssued(expectedCommands) | |||
} | |||
|
|||
test("Include Column List adds the schema columns to the COPY query") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please write a second small test near this one showing the difference when include_column_list = false ?
Co-authored-by: kyrozetera <[email protected]>
also some minor formatting tweaks for the same test
Thanks for reviewing @lucagiovagnoli! I've made the requested changes to the tests.
I bumped the minor version instead, since this PR adds functionality. From https://semver.org:
But I can switch to patch if you prefer. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor works! I was exactly trying to follow semver but got confused.
…n-list Add 'include_column_list' parameter
I found this un-merged patch by @kyrozetera on the old databricks repo (link), and it solves an issue I'm facing.
My use case
Many of the Redshift tables that I'm working with have
created_dt
columns, like so:created_dt timestamp not null default current_time
These columns are intended to be left unspecified on inserts/copies, so that the
default
is used. But for this to work, a column list must be included in the COPY statement, or else I get an error:Missing data for not-null field
Author's use case
It appears the author's motivation for this patch is related to, but different than my own: databricks#340