Add 'include_column_list' parameter #58

eeshugerman · 2019-12-09T19:14:26Z

I found this un-merged patch by @kyrozetera on the old databricks repo (link), and it solves an issue I'm facing.

My use case

Many of the Redshift tables that I'm working with have created_dt columns, like so:
created_dt timestamp not null default current_time
These columns are meant to be left unspecified on inserts/copies so that the default is used. For this to work, however, a column list must be included in the COPY statement; otherwise I get the error: Missing data for not-null field.
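To make the difference concrete, here is a minimal sketch of how a COPY statement could be assembled with and without a column list. It is a hypothetical helper, not the spark-redshift implementation: `CopySketch`, `copyStatement`, the table name and the S3 path are illustrative, and credentials, format options and quoting of the table name are omitted. The idea is that only the DataFrame's own columns are listed, so a target column such as created_dt that is absent from the list is filled from its DEFAULT expression by Redshift.

```scala
// Hypothetical sketch, NOT the spark-redshift code: builds a simplified COPY
// statement, optionally naming the target columns explicitly.
object CopySketch {
  def copyStatement(
      table: String,
      s3Path: String,
      columns: Seq[String],
      includeColumnList: Boolean): String = {
    // When enabled, double-quote each identifier (escaping embedded quotes by
    // doubling them) and append the list in parentheses after the table name.
    val columnList =
      if (includeColumnList && columns.nonEmpty)
        columns.map(c => "\"" + c.replace("\"", "\"\"") + "\"").mkString(" (", ", ", ")")
      else ""
    // Credentials, FORMAT and other COPY options are intentionally left out.
    s"COPY $table$columnList FROM '$s3Path'"
  }
}
```

With `includeColumnList = true` and columns `Seq("id", "name")`, the sketch yields `COPY events ("id", "name") FROM 's3://bucket/prefix/'`; with `false` it yields `COPY events FROM 's3://bucket/prefix/'`, the form that leads to the not-null error when the table has an extra created_dt column.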

Author's use case

It appears the author's motivation for this patch is related to, but different from, my own: databricks#340

codecov-io · 2019-12-09T20:03:27Z

Codecov Report

Merging #58 into master will increase coverage by 0.16%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master      #58      +/-   ##
==========================================
+ Coverage   73.92%   74.09%   +0.16%     
==========================================
  Files          15       15              
  Lines         767      772       +5     
  Branches      103      106       +3     
==========================================
+ Hits          567      572       +5     
  Misses        200      200

lucagiovagnoli

This looks great! Thanks for the pull request. Can you please:

- address comments
- Bump version and changelog (patch version should be ok as this is backwards compatible)
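Backwards compatibility here presumably rests on the new option being off unless a caller sets it, so existing jobs keep issuing COPY without a column list. A minimal sketch of that lookup, assuming the writer options arrive as a plain `Map[String, String]` and that the default is `false` (the object and method names are hypothetical, not the library's code):

```scala
// Hypothetical sketch of reading the include_column_list option.
object IncludeColumnListOption {
  // Option name as used in this PR.
  val Key = "include_column_list"

  // Assumed default "false": an absent option preserves the pre-PR behavior.
  // StringOps.toBoolean accepts "true"/"false" case-insensitively and throws
  // IllegalArgumentException for any other value.
  def includeColumnList(params: Map[String, String]): Boolean =
    params.getOrElse(Key, "false").toBoolean
}
```

Under that assumption, callers that never set the option get exactly the COPY statement they got before, which is what makes a non-breaking release reasonable.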

lucagiovagnoli · 2019-12-13T16:21:07Z

src/test/scala/io/github/spark_redshift_community/spark/redshift/RedshiftSourceSuite.scala

+
+    val mockRedshift = new MockRedshift(
+      defaultParams("url"),
+      Map(TableName.parseFromEscaped(defaultParams("dbtable")).toString -> null))


Why is the schema null here (the value in the Map[Table, StructType])? Should we use TestUtils.testSchema?

Not sure, but I'm guessing the original author copied it from the test above this one, which for some reason uses null. I've changed it to TestUtils.testSchema.

lucagiovagnoli · 2019-12-13T16:22:10Z

src/test/scala/io/github/spark_redshift_community/spark/redshift/RedshiftSourceSuite.scala

@@ -442,6 +442,28 @@ class RedshiftSourceSuite
    mockRedshift.verifyThatExpectedQueriesWereIssued(expectedCommands)
  }

+  test("Include Column List adds the schema columns to the COPY query") {


Could you please write a second small test near this one showing the difference when include_column_list = false?


eeshugerman · 2019-12-14T05:11:50Z

Thanks for reviewing @lucagiovagnoli! I've made the requested changes to the tests.

Bump version and changelog (patch version should be ok as this is backwards compatible)

I bumped the minor version instead, since this PR adds functionality. From https://semver.org:

Given a version number MAJOR.MINOR.PATCH, increment the:
MAJOR version when you make incompatible API changes,
MINOR version when you add functionality in a backwards compatible manner, and
PATCH version when you make backwards compatible bug fixes.

But I can switch to patch if you prefer.
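The quoted semver rule can be written as a small function. This is an illustrative sketch only: the `Version` type and the change categories are hypothetical, and resetting the lower components to zero follows the semver.org specification rather than anything stated in this PR.

```scala
// Illustrative sketch of the semver bump rule; all names are hypothetical.
object SemverSketch {
  sealed trait Change
  case object BreakingChange extends Change // incompatible API change -> MAJOR
  case object NewFeature extends Change     // backwards-compatible functionality -> MINOR
  case object BugFix extends Change         // backwards-compatible bug fix -> PATCH

  final case class Version(major: Int, minor: Int, patch: Int) {
    override def toString: String = s"$major.$minor.$patch"
  }

  // Increment the component matching the kind of change and reset the
  // lower-order components to zero, as semver.org requires.
  def bump(v: Version, change: Change): Version = change match {
    case BreakingChange => Version(v.major + 1, 0, 0)
    case NewFeature     => v.copy(minor = v.minor + 1, patch = 0)
    case BugFix         => v.copy(patch = v.patch + 1)
  }
}
```

For a hypothetical version 1.2.3, a `NewFeature` bump gives 1.3.0 and a `BugFix` bump gives 1.2.4; adding the include_column_list option falls in the first category.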

lucagiovagnoli

Minor works! I was trying to follow semver exactly but got confused.


lucagiovagnoli suggested changes Dec 13, 2019


Elliott Shugerman and others added 5 commits December 13, 2019 22:00

add 'include_column_list' parameter

849dd82

Co-authored-by: kyrozetera <[email protected]>

fix scalastyle check

edebea4

code review: don't use null for schema in test

bb252e3

also some minor formatting tweaks for the same test

code review: add test for include_column_list=false

5d8f9d2

code review: changelog, bump version

bc113fa

eeshugerman requested a review from lucagiovagnoli December 14, 2019 05:12

eeshugerman changed the title ~~add 'include_column_list' parameter~~ Add 'include_column_list' parameter Dec 14, 2019

lucagiovagnoli approved these changes Dec 16, 2019


lucagiovagnoli merged commit d4396f8 into spark-redshift-community:master Dec 16, 2019

pull bot pushed a commit to samklr/spark-redshift that referenced this pull request Sep 2, 2023

Merge pull request spark-redshift-community#58 from eeshugerman/column-list

Add 'include_column_list' parameter

2d5c49e
