Azure Databricks — Multiple Asset Bundle Deployment and Runs

Prashanth Kumar
Jul 10, 2024


Introduction to Azure Databricks Asset Bundles

Azure Databricks Asset Bundles simplify the deployment and management of Databricks notebooks, libraries, and other resources as cohesive units. They ensure consistency across environments and streamline the deployment process.

Why Use Azure Databricks Asset Bundles?

Asset Bundles offer several advantages:

  • Consistency and Reproducibility: Bundles package notebooks, libraries, and configurations together, ensuring consistent deployments across various environments.
  • Simplified Deployment: By consolidating all necessary assets into a single deployable unit, errors during deployment are minimized.
  • Version Control: Enables tracking of changes and easy rollback to previous versions.
  • Environment Isolation: Facilitates controlled deployment across different environments, enhancing stability and testing capabilities.

Common Problems and Solutions

Problem:

A common problem when deploying and running an asset bundle in Databricks through a CI tool is that the run command requires an explicit bundle name along with other parameters. How do you manage this when deploying multiple bundles?

databricks bundle run <name> --refresh-all

Solution:

Let’s walk through the steps to achieve this. To handle multiple bundles, I added the new bundles to my GitHub repository. You can store them in either .dbc or .yml format, depending on your requirement. When saving the files, it is good practice to name each one after the job it will deploy as in Azure Databricks.

1. First, save all the new .dbc or .yml files in the appropriate folder; here, I am saving everything under the “Resources” folder.

2. Now, let’s check the Events1.yml file, which contains my job definition and its subsequent tasks.

3. Here, you can see that I have five jobs that need to be part of the Asset Bundle deployment and run. Instead of providing an individual command for each asset bundle, I want to pick all of them up dynamically.

4. For that, I created a new YML workflow file in my GitHub workflow and defined all my Databricks parameters. First, I want to verify that all new .yml files are being picked up, by adding this step and inspecting its output:

- name: List bundle files
  run: |
    ls xxx/Resources/*.yml | xargs -n 1 basename | sed 's/\.yml$//' > xxx/Resources/bundle_names.txt
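As a quick sanity check, the same name-extraction pipeline can be tried locally before wiring it into the workflow. The /tmp/Resources directory and sample file names below are illustrative only:

```shell
# Reproduce the name-extraction pipeline on sample files
# (the /tmp/Resources path and file names are illustrative)
mkdir -p /tmp/Resources
touch /tmp/Resources/Events1.yml /tmp/Resources/Events2.yml

# List the .yml files, strip the directory, then strip the extension
ls /tmp/Resources/*.yml | xargs -n 1 basename | sed 's/\.yml$//'
# prints:
# Events1
# Events2
```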

5. This step is optional; use it if you want to check and read the contents of bundle_names.txt.

- name: Read bundle names
  id: read-bundle-names
  run: |
    while IFS= read -r line; do
      echo "Found bundle: $line"
      # ::set-output is deprecated; write to $GITHUB_OUTPUT instead
      echo "bundle_names=$line" >> "$GITHUB_OUTPUT"
    done < xxx/Resources/bundle_names.txt

6. Finally, to run the bundles, I retrieve the bundle names from the bundle_names.txt file and feed them into the final command:

- name: Run Databricks bundles
  run: |
    cd xxx/Resources
    for bundle_name in $(cat bundle_names.txt); do
      echo "Running bundle: $bundle_name"
      databricks bundle run "$bundle_name" --refresh-all
    done
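To see what this loop will execute without actually calling the Databricks CLI, you can do a hypothetical dry run that echoes each command instead of running it. The /tmp path and sample bundle names are placeholders:

```shell
# Dry run: print the commands the loop would execute, without invoking the CLI
# (sample bundle names and /tmp path are illustrative)
printf 'Events1\nEvents2\n' > /tmp/bundle_names.txt

for bundle_name in $(cat /tmp/bundle_names.txt); do
  echo "databricks bundle run $bundle_name --refresh-all"
done
# prints:
# databricks bundle run Events1 --refresh-all
# databricks bundle run Events2 --refresh-all
```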

How the Provided Script Helps

The GitHub Actions workflow Deploy Databricks Asset Bundle automates the deployment of Databricks asset bundles to a specified environment (develop in this case). Here’s how it works:

Trigger: It triggers on a manual workflow dispatch or a specific branch.

Deployment Steps:

Deploy New Bundle: Deploys the bundle to the develop environment using the databricks bundle deploy command.

Pipeline Update: Once deployed, it triggers a pipeline update (pipeline_update-develop) to validate and execute the bundle.

Execution:

List and Read Bundle Names: Lists all .yml files in xxx/Resources/ and sets them as output variables.
Run Bundles: Iteratively runs each bundle found in bundle_names.txt using databricks bundle run, refreshing all associated resources.

Here is the full YAML file, which anyone can use:

name: Deploy Databricks Asset Bundle

on:
  workflow_dispatch:

jobs:
  deploy-develop:
    if: github.ref == 'refs/heads/feature/prashanth1'
    name: "Deploy develop bundle"
    runs-on: ubuntu-latest
    environment: develop
    env:
      DATABRICKS_HOST: ${{ vars.DATABRICKS_HOST }}
      DATABRICKS_TOKEN: ${{ secrets.DATABRICKS_TOKEN }}
      DATABRICKS_BUNDLE_ENV: develop

    steps:
      - name: Checkout code
        uses: actions/checkout@v3

      - uses: databricks/setup-cli@main

      - run: databricks bundle destroy --auto-approve
        working-directory: RootPath/

      - run: databricks bundle deploy -t develop
        working-directory: RootPath/

  pipeline_update-develop:
    if: github.ref == 'refs/heads/feature/prashanth1'
    name: "Run pipeline update for develop"
    runs-on: ubuntu-latest
    environment: develop
    env:
      DATABRICKS_HOST: ${{ vars.DATABRICKS_HOST }}
      DATABRICKS_TOKEN: ${{ secrets.DATABRICKS_TOKEN }}
      DATABRICKS_BUNDLE_ENV: develop

    needs:
      - deploy-develop

    steps:
      - uses: actions/checkout@v3

      - uses: databricks/setup-cli@main

      - name: List bundle files
        run: |
          ls RootPath/Resources/*.yml | xargs -n 1 basename | sed 's/\.yml$//' > RootPath/Resources/bundle_names.txt

      - name: Read bundle names
        id: read-bundle-names
        run: |
          while IFS= read -r line; do
            echo "Found bundle: $line"
            # ::set-output is deprecated; write to $GITHUB_OUTPUT instead
            echo "bundle_names=$line" >> "$GITHUB_OUTPUT"
          done < RootPath/Resources/bundle_names.txt

      # Run the Databricks bundles for each retrieved bundle name.
      - name: Run Databricks bundles
        run: |
          cd RootPath/Resources
          for bundle_name in $(cat bundle_names.txt); do
            echo "Running bundle: $bundle_name"
            databricks bundle run "$bundle_name" --refresh-all
          done

This is how the output looks during deployment; in my case, specifically Jobs.

And these are my internal task execution runs.

Conclusion

By using Azure Databricks Asset Bundles and automating their deployment with the provided GitHub Actions workflow, teams can ensure consistent, reproducible deployments across environments, mitigate common deployment issues, and maintain better control over their Databricks workflows and configurations. This approach enhances reliability, facilitates collaboration, and supports efficient development practices.
