Commit

Updates after review
Joseph Newman authored and Joseph Newman committed Nov 3, 2023
1 parent ca3e1ac commit 0ee9472
Showing 1 changed file with 28 additions and 20 deletions.
modules/data-loading/pages/spark-connection-via-jdbc-driver.adoc
@@ -27,31 +27,21 @@
This limits the concurrent JDBC connections to 40.
====
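For reference, a standard way to enforce such a cap in stock Spark is the JDBC `numPartitions` option: before a JDBC write, Spark coalesces the DataFrame down to at most that many partitions, and it opens one connection per partition. A minimal sketch, assuming an existing `DataFrame` named `df` (this is generic Spark behavior, not necessarily the exact mechanism used elsewhere on this page):

[source, scala]
// Sketch: cap concurrent JDBC connections at 40 for this write.
// Spark calls coalesce(40) on the DataFrame if it has more partitions,
// so at most 40 connections are opened (one per partition).
df.write
  .format("jdbc")
  .option("numPartitions", "40")
  // ... url, driver, and credential options omitted here ...
  .save()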

== Load From a Data Lake via Spark

. Use the `write()` function of the `DataFrame` to build a Spark `DataFrameWriter` (a combined sketch follows this list).
+
.. Specify the `mode("overwrite")` to set the save mode.
.. Specify `format("jdbc")` to leverage the https://spark.apache.org/docs/latest/sql-data-sources-jdbc.html[JDBC Data Source].
.. Specify the JDBC connection properties in configuration options.
. Create a GSQL xref:gsql-ref:ddl-and-loading:creating-a-loading-job.adoc[Loading Job], where you map each source column index to a target vertex/edge attribute.
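
Taken together, the writer side of these steps looks roughly like the sketch below. This is a minimal illustration, not this page's exact example: the driver class, URL, credentials, and the TigerGraph-specific options (`graph`, `dbtable`, `filename`, `sep`, `eol`) are placeholders patterned after the TigerGraph JDBC driver's Spark examples, so check them against the driver's documentation.

[source, scala]
// Minimal sketch; every option value below is a placeholder.
import org.apache.spark.sql.SparkSession
val spark = SparkSession.builder.appName("LoadToTigerGraph").getOrCreate()
import spark.implicits._
// A toy DataFrame whose column order matches the loading job in Step 2.
val df = Seq(("Tom", 40, "male"), ("Jenny", 33, "female")).toDF("name", "age", "gender")
df.write
  .mode("overwrite")                                // save mode
  .format("jdbc")                                   // Spark JDBC Data Source
  .option("driver", "com.tigergraph.jdbc.Driver")   // assumed TigerGraph JDBC driver class
  .option("url", "jdbc:tg:http://host:14240")       // placeholder host and port
  .option("username", "tigergraph")                 // placeholder credentials
  .option("password", "tigergraph")
  .option("graph", "Social")                        // target graph
  .option("dbtable", "job load_Social")             // assumed syntax for invoking the loading job
  .option("filename", "file1")                      // FILENAME defined in the loading job
  .option("sep", ",")                               // field separator for the generated rows
  .option("eol", "\n")                              // line separator
  .save()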

=== 1) Write a Spark `DataFrameWriter`

Write a Spark `DataFrameWriter` that writes the data to CSV files, following the example below.

NOTE: You need to choose names for a GSQL loading job and its data files that you will be using in Step 2.

.Example: `DataFrameWriter` as "df"
[source, gsql]
@@ -91,10 +81,28 @@
Jerry,45,male
Jenny,33,female
Lizzy,19,female

=== 2) Create a Loading Job

Write a GSQL loading job, using the job and file names that you chose in Step 1, to map data from the CSV file(s) to TigerGraph vertices and edges.

.Example:
[source, gsql]
CREATE LOADING JOB load_Social FOR GRAPH Social {
DEFINE FILENAME file1;
DEFINE FILENAME file2;
LOAD file1 TO VERTEX Person VALUES ($0, $1, $2);
LOAD file2 TO EDGE Friendship VALUES ($0, $1);
}

The loading job above, `load_Social`, loads the 1st, 2nd, and 3rd columns of the source file `file1` to the 1st, 2nd, and 3rd attributes of the vertex `Person`.

//Alternatively, loading jobs can be run as POST requests.
//.Example: Post Request to TigerGraph
//[source, gsql]
//http://host:port/restpp/ddl/Social?tag=load_Social&filename=file1
//--data <delimited_data>

See the pages xref:gsql-ref:ddl-and-loading:creating-a-loading-job.adoc[], xref:gsql-ref:ddl-and-loading:running-a-loading-job.adoc[], and xref:tigergraph-server:API:built-in-endpoints.adoc#_loading_jobs[Loading Jobs as a REST Endpoint] for more information about loading jobs in TigerGraph.

== Advanced Usages with Spark