Add example to demonstrate using a custom sys.path #41

Open · wants to merge 1 commit into main
4 changes: 4 additions & 0 deletions knowledge_base/job_with_custom_sys_path/.gitignore
@@ -0,0 +1,4 @@
/.databricks
/.venv
/.vscode
__pycache__
43 changes: 43 additions & 0 deletions knowledge_base/job_with_custom_sys_path/README.md
@@ -0,0 +1,43 @@
# Job with custom `sys.path`

This example demonstrates how to:
1. Define a job that takes parameters whose values derive from the bundle configuration.
2. Use the path parameter to augment Python's `sys.path` to import a module from the bundle.
3. Access job parameters from the imported module.
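
These steps can be sketched end to end on a local machine. The snippet below simulates the `bundle_file_path` job parameter with a temporary directory; the directory and `my_module` are stand-ins invented for this illustration, not part of the example itself:

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Stand-in for the "bundle_file_path" job parameter: a directory that
# contains an importable module, playing the role of my_custom_library.
bundle_file_path = tempfile.mkdtemp()
Path(bundle_file_path, "my_module.py").write_text("VALUE = 42\n")

# Augment sys.path with the parameter value, then import from it.
sys.path.append(bundle_file_path)
my_module = importlib.import_module("my_module")
print(my_module.VALUE)  # → 42
```

On a real job run, `bundle_file_path` would instead come from `dbutils.widgets.get(...)`, as shown in the notebook below.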

## Prerequisites

* Databricks CLI v0.230.0 or above

## Usage

This example includes a unit test for the function defined under `my_custom_library` that you can execute on your machine.

```bash
# Set up a virtual environment
uv venv
source .venv/bin/activate
uv pip install -r ./requirements.txt

# Run the unit test
python -m pytest
```

To deploy the bundle to Databricks, follow these steps:

* Update the `host` field under `workspace` in `databricks.yml` to the Databricks workspace you wish to deploy to.
* Run `databricks bundle deploy` to deploy the job.
* Run `databricks bundle run print_bundle_configuration` to run the job.

Example output:

```
% databricks bundle run print_bundle_configuration
Run URL: https://...

2024-10-15 11:48:43 "[dev pieter_noordhuis] Example to demonstrate job parameterization" TERMINATED SUCCESS
```

Navigate to the run URL to observe the output of the loaded configuration file.

You can execute the same steps for the `prod` target.
1 change: 1 addition & 0 deletions knowledge_base/job_with_custom_sys_path/config/dev.json
@@ -0,0 +1 @@
[ "this is my development config" ]
1 change: 1 addition & 0 deletions knowledge_base/job_with_custom_sys_path/config/prod.json
@@ -0,0 +1 @@
[ "this is my production config" ]
1 change: 1 addition & 0 deletions knowledge_base/job_with_custom_sys_path/config/test.json
@@ -0,0 +1 @@
[ "this is my test config" ]
20 changes: 20 additions & 0 deletions knowledge_base/job_with_custom_sys_path/databricks.yml
@@ -0,0 +1,20 @@
bundle:
  name: job_with_custom_sys_path

include:
  - ./resources/*.job.yml

workspace:
  host: https://e2-dogfood.staging.cloud.databricks.com

targets:
  dev:
    default: true
    mode: development

  prod:
    mode: production

    # Production mode requires explicit configuration of the identity to use to run the job.
    run_as:
      user_name: "${workspace.current_user.userName}"
@@ -0,0 +1,5 @@
from .loader import load_configuration

__all__ = [
    "load_configuration",
]
@@ -0,0 +1,15 @@
import json
from os import path
from typing import Any

from my_custom_library import parameters


def load_configuration() -> Any:
    """
    Load the configuration file for the bundle target.
    """
    config_file_path = path.join(
        parameters.bundle_file_path(), "config", f"{parameters.bundle_target()}.json"
    )
    with open(config_file_path, "r") as file:
        return json.load(file)
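
The lookup that `load_configuration` performs can be reproduced in isolation. The sketch below writes a fake `config/test.json` under a temporary bundle root and loads it the same way, with the two parameter lookups inlined; all paths here are temporary stand-ins:

```python
import json
import tempfile
from os import path
from pathlib import Path

# Fake bundle root with a config/ directory, mirroring the example layout.
bundle_root = Path(tempfile.mkdtemp())
(bundle_root / "config").mkdir()
(bundle_root / "config" / "test.json").write_text('[ "this is my test config" ]')

# Same join-and-load logic as load_configuration, with bundle_file_path()
# and bundle_target() replaced by literal values.
config_file_path = path.join(bundle_root, "config", "test.json")
with open(config_file_path, "r") as file:
    configuration = json.load(file)

print(configuration)  # → ['this is my test config']
```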
@@ -0,0 +1,31 @@
from functools import cache


@cache
def bundle_file_path() -> str:
    """
    Return the bundle file path.

    This function expects a job parameter called "bundle_file_path" to be set.

    It is mocked during testing.

    The dbutils import is done inside the function so that it is skipped when running locally.
    """
    from databricks.sdk.runtime import dbutils

    return dbutils.widgets.get("bundle_file_path")


@cache
def bundle_target() -> str:
    """
    Return the bundle target.

    This function expects a job parameter called "bundle_target" to be set.

    It is mocked during testing.

    The dbutils import is done inside the function so that it is skipped when running locally.
    """
    from databricks.sdk.runtime import dbutils

    return dbutils.widgets.get("bundle_target")
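
Because both functions are decorated with `functools.cache`, the widget lookup happens at most once per process; later calls return the memoized value. A minimal local illustration (`read_parameter` and the call counter are invented for this sketch):

```python
from functools import cache

calls = 0


@cache
def read_parameter() -> str:
    """Stand-in for a dbutils.widgets.get(...) lookup."""
    global calls
    calls += 1
    return "dev"


# The body runs on the first call only; the second call hits the cache.
assert read_parameter() == "dev"
assert read_parameter() == "dev"
print(calls)  # → 1
```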
2 changes: 2 additions & 0 deletions knowledge_base/job_with_custom_sys_path/requirements.txt
@@ -0,0 +1,2 @@
databricks-sdk
pytest
@@ -0,0 +1,18 @@
resources:
  jobs:
    print_bundle_configuration:
      name: Example to demonstrate job parameterization

      parameters:
        - # The bundle deployment's root file path in the workspace.
          name: "bundle_file_path"
          default: "${workspace.file_path}"

        - # The bundle target name (e.g. "dev" or "prod").
          name: "bundle_target"
          default: "${bundle.target}"

      tasks:
        - task_key: print
          notebook_task:
            notebook_path: ../src/print.ipynb
49 changes: 49 additions & 0 deletions knowledge_base/job_with_custom_sys_path/src/print.ipynb
@@ -0,0 +1,49 @@
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The cell below retrieves this bundle's deployment file path\n",
    "and adds it to the Python path."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import sys\n",
    "from databricks.sdk.runtime import dbutils\n",
Contributor comment:
Do we need to import dbutils from the SDK here? It seems like this notebook will only be run from the workspace.

    "bundle_file_path = dbutils.widgets.get(\"bundle_file_path\")\n",
    "sys.path.append(bundle_file_path)"
Contributor comment:
I'm curious if this solves the customer problem. But there's another approach here where you could support interactive execution as well: you could have a %run ./add-sys-path here. The add-sys-path would then be a notebook that does something like https://github.com/databricks/bundle-examples/blob/main/default_python/fixtures/.gitkeep#L11 to add its containing folder to the sys path.
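
For reference, the pattern the comment alludes to (an `add-sys-path` notebook that appends its own containing folder to `sys.path`) might look roughly like the following when reduced to plain Python. The helper name and the use of a file path are assumptions for this sketch; in a real notebook the folder would come from the notebook context rather than `__file__`:

```python
import os
import sys


def add_containing_folder(file: str) -> str:
    """Hypothetical helper: append the directory containing `file` to sys.path."""
    folder = os.path.dirname(os.path.abspath(file))
    if folder not in sys.path:
        sys.path.append(folder)
    return folder


# Example: make modules next to /tmp/example/notebook.py importable.
folder = add_containing_folder("/tmp/example/notebook.py")
print(folder)  # → /tmp/example
```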

   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from my_custom_library import load_configuration\n",
    "from pprint import pprint\n",
    "\n",
    "pprint(load_configuration())"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": ".venv",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "name": "python",
   "version": "3.12.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
@@ -0,0 +1,34 @@
from os import path

import my_custom_library
import my_custom_library.parameters


def mock_bundle_file_path(monkeypatch):
    def mock():
        return path.join(path.dirname(__file__), "..")

    monkeypatch.setattr(
        my_custom_library.parameters,
        "bundle_file_path",
        mock,
    )


def mock_bundle_target(monkeypatch):
    def mock():
        return "test"

    monkeypatch.setattr(
        my_custom_library.parameters,
        "bundle_target",
        mock,
    )


def test_load_configuration(monkeypatch):
    mock_bundle_file_path(monkeypatch)
    mock_bundle_target(monkeypatch)

    configuration = my_custom_library.load_configuration()
    assert configuration == ["this is my test config"]