---
title: Transform data with Databricks Jar
description: Learn how to process or transform data by running a Databricks Jar.
services: data-factory
documentationcenter: ''
ms.assetid: ''
ms.service: data-factory
ms.workload: data-services
ms.topic: conceptual
ms.author: abnarain
author: nabhishek
manager: shwang
ms.date: 03/15/2018
---
[!INCLUDE[appliesto-adf-asa-md](includes/appliesto-adf-asa-md.md)]
The Azure Databricks Jar Activity in a Data Factory pipeline runs a Spark Jar in your Azure Databricks cluster. This article builds on the data transformation activities article, which presents a general overview of data transformation and the supported transformation activities. Azure Databricks is a managed platform for running Apache Spark.
For an eleven-minute introduction and demonstration of this feature, watch the following video:
Here is the sample JSON definition of a Databricks Jar Activity:
```json
{
    "name": "SparkJarActivity",
    "type": "DatabricksSparkJar",
    "linkedServiceName": {
        "referenceName": "AzureDatabricks",
        "type": "LinkedServiceReference"
    },
    "typeProperties": {
        "mainClassName": "org.apache.spark.examples.SparkPi",
        "parameters": [ "10" ],
        "libraries": [
            {
                "jar": "dbfs:/docs/sparkpi.jar"
            }
        ]
    }
}
```
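The `linkedServiceName` property refers to an Azure Databricks linked service defined in the same data factory. As a minimal sketch of such a linked service, assuming an existing interactive cluster (the `domain`, `accessToken`, and `existingClusterId` values are placeholders, not working values):

```json
{
    "name": "AzureDatabricks",
    "properties": {
        "type": "AzureDatabricks",
        "typeProperties": {
            "domain": "https://<region>.azuredatabricks.net",
            "accessToken": {
                "type": "SecureString",
                "value": "<Databricks access token>"
            },
            "existingClusterId": "<cluster id>"
        }
    }
}
```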
The following table describes the JSON properties used in the JSON definition:
Property | Description | Required |
---|---|---|
name | Name of the activity in the pipeline. | Yes |
description | Text describing what the activity does. | No |
type | For Databricks Jar Activity, the activity type is DatabricksSparkJar. | Yes |
linkedServiceName | Name of the Databricks linked service on which the Jar activity runs. To learn about this linked service, see the Compute linked services article. | Yes |
mainClassName | The full name of the class containing the main method to be executed. This class must be contained in a JAR provided as a library. | Yes |
parameters | Parameters that will be passed to the main method. This is an array of strings; see the example after this table. | No |
libraries | A list of libraries to be installed on the cluster that will execute the job. It can be an array of <string, object>. | Yes (at least one library containing the mainClassName class) |
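For example, the `parameters` array can use Data Factory expressions to pass values computed at run time. A minimal sketch, assuming a hypothetical pipeline parameter named `inputPath` and a hypothetical main class and JAR:

```json
"typeProperties": {
    "mainClassName": "com.example.ProcessData",
    "parameters": [ "@pipeline().parameters.inputPath" ],
    "libraries": [
        {
            "jar": "dbfs:/mnt/libraries/processdata.jar"
        }
    ]
}
```

At run time, Data Factory evaluates the expression and passes the resulting string to the main method.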
> [!NOTE]
> Known issue: when you use the same interactive cluster to run concurrent Databricks Jar activities (without a cluster restart), there is a known issue in Databricks where the parameters of the first activity are also used by the following activities, resulting in incorrect parameters being passed to the subsequent jobs. To mitigate this, use a job cluster instead.
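One way to switch to a job cluster is to have the Databricks linked service create a new cluster for each activity run instead of referencing an existing interactive cluster. A minimal sketch, assuming placeholder values for the new-cluster settings:

```json
{
    "name": "AzureDatabricksJobCluster",
    "properties": {
        "type": "AzureDatabricks",
        "typeProperties": {
            "domain": "https://<region>.azuredatabricks.net",
            "accessToken": {
                "type": "SecureString",
                "value": "<Databricks access token>"
            },
            "newClusterVersion": "<Databricks runtime version>",
            "newClusterNumOfWorker": "1",
            "newClusterNodeType": "Standard_DS3_v2"
        }
    }
}
```

Because a new cluster is created for each run, parameters from one activity run are not reused by the next.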
In the above Databricks activity definition, you specify these library types: `jar`, `egg`, `maven`, `pypi`, `cran`.
```json
{
    "libraries": [
        {
            "jar": "dbfs:/mnt/libraries/library.jar"
        },
        {
            "egg": "dbfs:/mnt/libraries/library.egg"
        },
        {
            "maven": {
                "coordinates": "org.jsoup:jsoup:1.7.2",
                "exclusions": [ "slf4j:slf4j" ]
            }
        },
        {
            "pypi": {
                "package": "simplejson",
                "repo": "http://my-pypi-mirror.com"
            }
        },
        {
            "cran": {
                "package": "ada",
                "repo": "https://cran.us.r-project.org"
            }
        }
    ]
}
```
For more details, refer to the Databricks documentation for library types.
To obtain the dbfs path of a library added using the UI, you can use the Databricks CLI.

Typically, Jar libraries uploaded through the UI are stored under `dbfs:/FileStore/jars`. You can list them all through the CLI:

```bash
databricks fs ls dbfs:/FileStore/job-jars
```

You can also use the Databricks CLI to copy a library to dbfs. For example, to copy a JAR to dbfs:

```bash
dbfs cp SparkPi-assembly-0.1.jar dbfs:/docs/sparkpi.jar
```