
Multiple tables support (destination) #120

Draft · wants to merge 5 commits into main from samir/dest-multiple-tables

Conversation

lovromazgon (Contributor)

Description

TBD

Quick checks:

  • I have followed the Code Guidelines.
  • There is no other pull request for the same update/change.
  • I have written unit tests.
  • I have made sure that the PR is of reasonable size and can be easily reviewed.

@lovromazgon force-pushed the samir/dest-multiple-tables branch from 65d878e to 39c14e9 on June 26, 2024 at 19:26
@lovromazgon (Contributor, Author) left a comment:

Some notes about the changes.

Comment on lines +27 to +28
pr, pw := io.Pipe()
w := gzip.NewWriter(pw)

Compression now happens in memory while uploading the file.
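For illustration, a minimal, self-contained sketch of the pattern (the inline CSV content and the stdout "upload" are stand-ins for the real record source and the Snowflake stage upload):

```go
package main

import (
	"compress/gzip"
	"io"
	"os"
	"strings"
)

func main() {
	pr, pw := io.Pipe()
	w := gzip.NewWriter(pw)

	go func() {
		// Produce the (stand-in) CSV on the writer side; compressed bytes
		// stream through the pipe as they are written.
		_, err := io.Copy(w, strings.NewReader("id,name\n1,alice\n"))
		if err == nil {
			err = w.Close() // flush the gzip trailer
		}
		pw.CloseWithError(err) // propagate success or failure to the reader
	}()

	// In the connector the reader end feeds the stage upload; here we just
	// write the compressed bytes to stdout.
	if _, err := io.Copy(os.Stdout, pr); err != nil {
		panic(err)
	}
}
```

The upside of the pipe is that no temporary file is needed: compression and upload overlap, bounded by the pipe's backpressure.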

)

// DataType represents a Snowflake data type.
type DataType interface {

DataType was added as a common representation of Snowflake column data types.
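A minimal sketch of what such an interface could look like; the method set and the concrete type are assumptions for illustration, not the PR's actual definition:

```go
package snowflake

import "fmt"

// DataType abstracts over concrete Snowflake column types.
type DataType interface {
	// SQLType returns the Snowflake type name usable in DDL.
	SQLType() string
}

// FixedType is an example implementation for Snowflake's FIXED (number) type.
type FixedType struct {
	Precision int
	Scale     int
}

func (t FixedType) SQLType() string {
	return fmt.Sprintf("NUMBER(%d,%d)", t.Precision, t.Scale)
}
```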

"context"
"strings"
// Table represents a Snowflake table.
type Table struct {

Table is now used everywhere we need the schema of a table. It collects the table name and its schema (columns, primary keys) and exposes a way to get the "connector columns" (operation, created at, updated at, deleted at).
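An illustrative shape, assuming fields that match the description above (not the PR's actual definition):

```go
package snowflake

// Table describes a Snowflake table: its name, columns, and primary keys.
type Table struct {
	Name        string
	Columns     []Column
	PrimaryKeys []string
}

// Column is a single column with its Snowflake data type.
type Column struct {
	Name     string
	DataType DataType
}

// ConnectorColumns returns the metadata columns the connector maintains
// alongside the record data.
func (t Table) ConnectorColumns() []string {
	return []string{"operation", "created_at", "updated_at", "deleted_at"}
}
```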

UpdateBatch
)

type Batch struct {

This struct was introduced to collect everything needed for a batch of records (i.e. a single CSV file). It can be created using either NewInsertBatch or NewUpdateBatch, and depending on the constructor it will produce a different merge query and a different filename.
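Sketch under the assumption that a batch kind, set by the constructor, drives both the merge query and the filename (the field names and the Record stand-in are illustrative):

```go
package destination

type batchKind int

const (
	InsertBatch batchKind = iota
	UpdateBatch
)

type Record map[string]any // stand-in for the SDK record type

// Batch collects everything needed to produce one CSV file and its merge query.
type Batch struct {
	kind    batchKind
	table   string
	records []Record
}

func NewInsertBatch(table string, records []Record) *Batch {
	return &Batch{kind: InsertBatch, table: table, records: records}
}

func NewUpdateBatch(table string, records []Record) *Batch {
	return &Batch{kind: UpdateBatch, table: table, records: records}
}

// Filename derives a distinct name per batch kind.
func (b *Batch) Filename() string {
	if b.kind == InsertBatch {
		return b.table + "_insert.csv.gz"
	}
	return b.table + "_update.csv.gz"
}
```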

return bytes.NewBuffer(nil)
},
},
tableCache: make(map[string]snowflake.Table),

Snowflake tables and their schemas are now cached in the writer. We only retrieve them the first time. The assumption is that the table schema won't be modified externally, only through the connector.
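The lookup reduces to a lazy-load pattern; a sketch with assumed names (the fetch function stands in for whatever queries Snowflake):

```go
package destination

import "context"

type Table struct{ /* schema fields elided */ }

// Writer caches table schemas for its lifetime.
type Writer struct {
	tableCache map[string]Table
	fetchTable func(ctx context.Context, name string) (Table, error)
}

func (w *Writer) getTable(ctx context.Context, name string) (Table, error) {
	if t, ok := w.tableCache[name]; ok {
		return t, nil // schema already known from a previous batch
	}
	t, err := w.fetchTable(ctx, name) // first use: query Snowflake
	if err != nil {
		return Table{}, err
	}
	w.tableCache[name] = t
	return t, nil
}
```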

Comment on lines +506 to +510
col2 := cols2[k]
if !strings.EqualFold(col1.Name, col2.Name) {
return fmt.Errorf("column %d doesn't match (%s:%T != %s:%T)", k, col1.Name, col1.DataType, col2.Name, col2.DataType)
}
// TODO check data type? what if the source record has nil values and we don't know types?

This is a bit shady - we compare the schema (columns) extracted from the first record with the Snowflake table schema. It breaks down if some fields are nullable and/or the record is partially populated, because we expect exactly the same number and order of columns.

I'm pretty sure the ordering is not guaranteed right now: a schema fetched from the record will have connector columns in front (operation, created_at, etc.), while the columns in a Snowflake table schema are simply ordered alphabetically. Some work is needed to ensure consistent ordering; one option is sketched below.
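One way to sidestep the ordering problem (an assumption on my part, not what the PR currently does) is to normalize both column lists before the pairwise check:

```go
package destination

import (
	"sort"
	"strings"
)

type Column struct{ Name string }

// sortColumns sorts a column list case-insensitively by name. Applying it to
// both the record-derived and the Snowflake-derived lists makes the pairwise
// comparison independent of source order.
func sortColumns(cols []Column) {
	sort.Slice(cols, func(i, j int) bool {
		return strings.ToLower(cols[i].Name) < strings.ToLower(cols[j].Name)
	})
}
```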

}

var insertBatchTemplate = template.Must(
template.New("insertBatch").

I've rewritten these merge queries to use Go templates; IMO it's easier to read, since you can see what gets inserted where.
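A runnable toy version of the idea (the SQL and the field names are illustrative, the PR's actual templates are more involved):

```go
package main

import (
	"os"
	"text/template"
)

// insertBatchTemplate renders a MERGE statement; the placeholders make it
// obvious which value lands where.
var insertBatchTemplate = template.Must(
	template.New("insertBatch").Parse(`MERGE INTO {{ .Table }} AS dst
USING (SELECT * FROM @{{ .Stage }}/{{ .File }}) AS src
ON {{ range $i, $k := .PrimaryKeys }}{{ if $i }} AND {{ end }}dst.{{ $k }} = src.{{ $k }}{{ end }}
WHEN NOT MATCHED THEN INSERT VALUES (src.$1, src.$2)
`))

func main() {
	_ = insertBatchTemplate.Execute(os.Stdout, map[string]any{
		"Table":       "users",
		"Stage":       "my_stage",
		"File":        "users_insert.csv.gz",
		"PrimaryKeys": []string{"id"},
	})
}
```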

@@ -12,6 +12,8 @@
// See the License for the specific language governing permissions and
// limitations under the License.

//go:generate mockgen -typed -destination=mock/iterator.go -package=mock . Iterator

I added go:generate directives and removed the make mockgen target.
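With the directives in place, regenerating all mocks is a single standard command instead of a Makefile target:

```
go generate ./...
```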

have string
want DataType
}{{
have: `{"type":"FIXED","precision":38,"scale":0,"nullable":true}`,

The test cases are taken from the Snowflake documentation page, except the last couple of cases, which are not described in the docs. For those I created a test table in Snowflake and checked the output of SHOW COLUMNS.
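The table feeds a standard table-driven loop; a sketch of the body (ParseDataType is a hypothetical name for the function under test, not necessarily the PR's):

```go
for _, tc := range testCases {
	got, err := ParseDataType([]byte(tc.have))
	if err != nil {
		t.Fatal(err)
	}
	if !reflect.DeepEqual(got, tc.want) {
		t.Errorf("ParseDataType(%s) = %v, want %v", tc.have, got, tc.want)
	}
}
```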

sdk "github.com/conduitio/conduit-connector-sdk"
"github.com/go-errors/errors"

In the files I've touched, I replaced github.com/go-errors/errors with the standard library's errors package. Now that the built-in package provides Join, Is, and As, I don't see a good reason to keep an external dependency for this.
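A quick reminder of the stdlib features in question (Join is Go 1.20+; Is and As inspect wrapped error chains):

```go
package main

import (
	"errors"
	"fmt"
)

var ErrNotFound = errors.New("not found")

func main() {
	// Join aggregates multiple errors into one; %w wrapping keeps the
	// sentinel discoverable through the chain.
	err := errors.Join(
		fmt.Errorf("table %q: %w", "users", ErrNotFound),
		errors.New("second failure"),
	)
	fmt.Println(errors.Is(err, ErrNotFound)) // true
}
```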

"encoding/json"
"fmt"

"github.com/lovromazgon/jsonpoly"

Note that I extracted the code that handled polymorphic JSON types into a library, as I thought it could be generally useful (github.com/lovromazgon/jsonpoly). In commit 2c8ff92 I replaced the code in the connector in favor of using the lib.
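For readers unfamiliar with the problem, here's the general technique being extracted - peek at a discriminator field, then unmarshal into the matching concrete type. This is a hand-rolled sketch of the idea, not jsonpoly's actual API:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// typeHeader captures only the discriminator field.
type typeHeader struct {
	Type string `json:"type"`
}

func unmarshalDataType(data []byte) (any, error) {
	var h typeHeader
	if err := json.Unmarshal(data, &h); err != nil {
		return nil, err
	}
	switch h.Type {
	case "FIXED":
		// Second pass: decode the full payload into the concrete shape.
		var v struct {
			Precision int `json:"precision"`
			Scale     int `json:"scale"`
		}
		err := json.Unmarshal(data, &v)
		return v, err
	default:
		return nil, fmt.Errorf("unknown type %q", h.Type)
	}
}

func main() {
	v, err := unmarshalDataType([]byte(`{"type":"FIXED","precision":38,"scale":0,"nullable":true}`))
	fmt.Println(v, err) // {38 0} <nil>
}
```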
