Azure Databricks Creation using PowerShell and Terraform

Prashanth Kumar
4 min read · Sep 10, 2021


Introduction

Over the last couple of years, Databricks has evolved a lot, and plenty of enhancements have happened in that space as well. There are several options available for creating it: PowerShell, Terraform, or the Azure CLI. As an experiment, I tried all three to see how well they can be integrated.

The purpose of this article is to:

  • create Azure Databricks using PowerShell, Terraform, and REST API calls.

Content

  • azure-pipelines.yml → Azure Pipelines YAML file to build and deploy to the Azure cloud.
  • checkcluster.ps1 → checks cluster creation and its status.
  • createcluster.ps1 → creates a cluster inside Databricks.
  • createresourcegroup.ps1 → creates a new resource group.
  • databricks-workspace-template.json → the actual JSON body of the Databricks workspace and the associated components that will be part of it.
  • databrickstoken.ps1 → gets a token from Azure and assigns it to Databricks for secure login.
  • main.tf → contains the actual cluster creation, resource group creation, and deployment mode.
  • newworkspace.ps1 → creates a new workspace in Databricks.
  • validatetemplate.ps1 → checks whether Databricks was created successfully or whether there were any errors during creation.
  • variables.tf → contains the different properties used during any Terraform call.
  • versions.tf → contains the provider details along with the version.
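As a rough sketch of what versions.tf and main.tf might contain (all names, locations, and SKUs here are illustrative placeholders, not the exact repo contents):

```hcl
# versions.tf -- pins the azurerm provider (Terraform 0.12 syntax; versions illustrative)
terraform {
  required_version = ">= 0.12.3"
  required_providers {
    azurerm = "=2.26.0"
  }
}

# main.tf -- resource group and Databricks workspace (names are placeholders)
provider "azurerm" {
  features {}
}

resource "azurerm_resource_group" "rg" {
  name     = "rg-databricks-demo"
  location = "westeurope"
}

resource "azurerm_databricks_workspace" "ws" {
  name                = "dbw-demo"
  resource_group_name = azurerm_resource_group.rg.name
  location            = azurerm_resource_group.rg.location
  sku                 = "standard"
}
```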

Files Content

Requirements

  • Terraform (azurerm provider = 2.26.0)
  • App registration in the Azure cloud
  • PowerShell

Build Stage

  1. The first step is to create a new enterprise application in Azure. In our case I created a new app named “AzureDatabricks”. Once you create it, please copy the Application ID.

2. Now let's go to our Azure DevOps Git repo where we are modifying the files. Please update the “databrickstoken.ps1” file at line 2:

$TOKEN = (az account get-access-token --resource 2ff814a6-3304-4ab8-85cb-xxxxxxxxxx | jq --raw-output '.accessToken')
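The AAD token obtained above can then be exchanged for a Databricks personal access token via the workspace REST API, which is what databrickstoken.ps1 is responsible for. A minimal sketch (the workspace URL and token lifetime are placeholders, not the repo's exact script):

```powershell
# Illustrative: create a Databricks PAT using the AAD token (URL is a placeholder)
$headers = @{ Authorization = "Bearer $TOKEN" }
$body    = @{ lifetime_seconds = 3600; comment = "pipeline token" } | ConvertTo-Json

$resp = Invoke-RestMethod -Method Post `
  -Uri "https://adb-1234567890123456.7.azuredatabricks.net/api/2.0/token/create" `
  -Headers $headers -Body $body -ContentType "application/json"

$PAT = $resp.token_value   # use this PAT for subsequent Databricks API calls
```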

3. Now let's update the same value in the “azure-pipelines.yml” file, at line 79:

$TOKEN = (az account get-access-token --resource 2ff814a6-3304-4ab8-85cb-xxxxxxxxx | jq --raw-output '.accessToken')

I have added azure-pipelines.yml for the folks who are more comfortable working with the YAML format.
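For reference, the token line typically sits inside an Azure CLI task in the pipeline. This is an illustrative fragment, not the exact file; the service connection name is a placeholder:

```yaml
# Illustrative azure-pipelines.yml fragment (inputs are placeholders)
- task: AzureCLI@2
  displayName: 'Get AAD token for Databricks'
  inputs:
    azureSubscription: 'my-service-connection'   # replace with your service connection
    scriptType: 'pscore'
    scriptLocation: 'inlineScript'
    inlineScript: |
      $TOKEN = (az account get-access-token --resource 2ff814a6-3304-4ab8-85cb-xxxxxxxxxx | jq --raw-output '.accessToken')
      Write-Host "##vso[task.setvariable variable=DATABRICKS_TOKEN;issecret=true]$TOKEN"
```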

4. Now let's start creating a new pipeline. As a first step I am copying the files and creating an artifact.

Databricks CI Pipeline

and adding another task to publish them. Let's trigger a new build.

Release Stage → Option 1

Let's create the release pipeline first using simple Terraform .tf files, with the new workspace created using a PowerShell script.

  1. Open Azure DevOps → go to Releases → create a new release pipeline. Add a new (build) artifact.
  2. As part of the release pipeline I added the tasks below to create a new Databricks cluster:
  • An Azure CLI task to create a new resource group and to save the .tfstate file.
  • An Azure CLI task to get the storage key.
  • Replacing any tokens in the Terraform files.
  • Installing Terraform (version 0.12.3) using the Terraform Installer.
  • Terraform init.
  • Terraform plan.
  • Finally, Terraform apply with the auto-approve option.
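The last three tasks map to the standard Terraform commands. A hedged sketch of the inline scripts, with an azurerm backend for the .tfstate file (all resource names and the key are placeholders):

```powershell
# Illustrative Terraform steps with an azurerm backend (names are placeholders)
terraform init `
  -backend-config="resource_group_name=rg-tfstate" `
  -backend-config="storage_account_name=sttfstate" `
  -backend-config="container_name=tfstate" `
  -backend-config="key=databricks.tfstate" `
  -backend-config="access_key=$env:STORAGE_KEY"

terraform plan -out=tfplan
terraform apply -auto-approve tfplan
```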

Release Stage → Option 2 using the Azure CLI

  • Creating a new resource group.
  • Validating Template for any errors.
  • New Databricks instance creation.
  • Generating Databricks Token.
  • Creating a new cluster inside Databricks.
  • Checking the cluster creation status.
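The Azure CLI steps above can be sketched roughly as follows (the resource group name, location, and template path are placeholders):

```powershell
# Illustrative Azure CLI equivalents of the release tasks (names are placeholders)
az group create --name rg-databricks-demo --location westeurope

az deployment group validate `
  --resource-group rg-databricks-demo `
  --template-file databricks-workspace-template.json

az deployment group create `
  --resource-group rg-databricks-demo `
  --template-file databricks-workspace-template.json
```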

Validation

  1. Now let's log in to Azure and start validating our new Databricks instance and its supporting components.
  2. Check the Databricks status under the specified resource group.

3. The next validation is to log in to Azure Databricks by clicking Launch Workspace. Make sure it asks for your login details; it should show the “Sign in to Databricks” screen.

It will then show the workspace.

4. Check the workspace → make sure you are able to open it.

5. Check whether any clusters are being created or are already ready.

(This corresponds directly to task 6 in our release pipeline [check cluster].)
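The cluster check (checkcluster.ps1's role) can be sketched as polling the Databricks Clusters API until the cluster leaves a transitional state. The workspace URL, $PAT, and $clusterId are placeholders:

```powershell
# Illustrative: poll cluster state via the Databricks Clusters API (placeholders)
$headers = @{ Authorization = "Bearer $PAT" }
$uri = "https://adb-1234567890123456.7.azuredatabricks.net/api/2.0/clusters/get?cluster_id=$clusterId"

do {
    $cluster = Invoke-RestMethod -Method Get -Uri $uri -Headers $headers
    Write-Host "Cluster state: $($cluster.state)"
    Start-Sleep -Seconds 30
} while ($cluster.state -in @("PENDING", "RESTARTING"))
```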

Source code Github link: learnprofile/Databricks (github.com)

Written by Prashanth Kumar

IT professional with 20+ years experience, feel free to contact me at: Prashanth.kumar.ms@outlook.com