Triggering Azure Databricks Jobs from Azure Function App Using Service Principal Authentication
In today's cloud environments, automation and secure integration between services are essential. While Azure Databricks supports triggering jobs through its REST API using Personal Access Tokens (PATs), this approach poses security and lifecycle-management challenges. In this article, I'll walk you through a more secure and scalable alternative: using an Azure Function App to trigger Databricks jobs with Azure Active Directory (AAD) tokens obtained through a Service Principal (SPN).
Why Use a Service Principal Instead of PAT?
- PATs are user-scoped and expire frequently.
- SPNs are app-registered identities that can be centrally managed and rotated.
- AAD tokens allow for better integration with enterprise identity and access management policies.
Architecture Overview
- Azure Function App (HTTP Trigger)
- Azure AD App Registration (Service Principal)
- Databricks REST API (Jobs API v2.1)
- Token acquisition via OAuth2 client credentials flow
Code Walkthrough
The first step is to create a new Azure Function App using Visual Studio Code. You can follow this official guide to get started with the setup and deployment process.
The code consists of three parts:
- A main function that serves as the entry point for the Azure Function App.
- A function that acquires an AAD token for Databricks using the SPN.
- A function that triggers a Databricks job using the acquired token.
Main Function:
The main function serves as the entry point for the Azure Function App. It is triggered by an incoming HTTP request and is responsible for orchestrating the entire flow—from authentication to job execution.
Define the SPN client ID, client secret, and tenant ID, along with the Databricks instance and job ID, at the top of the function:
def main(req: func.HttpRequest) -> func.HttpResponse:
    logging.info("HTTP trigger function received a request to trigger Databricks job.")
    tenant_id = "xxx"
    client_id = "xxx"
    client_secret = "xxx"
    databricks_instance = "adb-xxx.xx.azuredatabricks.net"
    job_id = xxx
    try:
        token = get_azure_ad_token(tenant_id, client_id, client_secret)
        result = trigger_databricks_job(databricks_instance, job_id, token)
        return func.HttpResponse(
            body=json.dumps(result, indent=2),
            status_code=200,
            mimetype="application/json"
        )
    except Exception as e:
        logging.error("Exception occurred: %s", str(e))
        return func.HttpResponse(f"Failed: {str(e)}", status_code=500)
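The hard-coded placeholders are convenient for a first test, but the same values can also come from the Function App's application settings, which the Python worker exposes as environment variables. The following is a minimal sketch of that idea, not part of the original function: the setting names (`DATABRICKS_TENANT_ID` and so on) and the `load_settings` helper are hypothetical.

```python
import os

# Hypothetical setting names; use whatever names your Function App defines.
REQUIRED_SETTINGS = (
    "DATABRICKS_TENANT_ID",
    "DATABRICKS_CLIENT_ID",
    "DATABRICKS_CLIENT_SECRET",
    "DATABRICKS_INSTANCE",
    "DATABRICKS_JOB_ID",
)


def load_settings(env=os.environ):
    """Read the Databricks settings from environment variables."""
    missing = [name for name in REQUIRED_SETTINGS if not env.get(name)]
    if missing:
        raise KeyError(f"Missing application settings: {', '.join(missing)}")
    return {
        "tenant_id": env["DATABRICKS_TENANT_ID"],
        "client_id": env["DATABRICKS_CLIENT_ID"],
        "client_secret": env["DATABRICKS_CLIENT_SECRET"],
        "databricks_instance": env["DATABRICKS_INSTANCE"],
        # Environment variables are strings; the Jobs API expects a number.
        "job_id": int(env["DATABRICKS_JOB_ID"]),
    }
```

Inside `main`, the five assignments could then be replaced by a single `settings = load_settings()` call, with a missing setting surfacing as a clear error instead of a failed token request.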
AAD Token retrieval
def get_azure_ad_token(tenant_id, client_id, client_secret, resource="2ff814a6-3304-4ab8-85cb-cd0e6f879c1d"):
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/token"
    payload = {
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "resource": resource
    }
    logging.info("Requesting Azure AD token...")
    response = requests.post(url, data=payload)
    # Log only the status code: the response body contains the access token.
    logging.info("Token response status: %s", response.status_code)
    response.raise_for_status()
    return response.json()["access_token"]
- The resource ID "2ff814a6-3304-4ab8-85cb-cd0e6f879c1d" is the fixed GUID for Azure Databricks.
- The token is retrieved using the OAuth2 client credentials flow.
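Because each token is valid for a limited time, a function that is called often does not need to request a new one on every invocation. The sketch below shows one possible way to reuse a token until shortly before it expires. It is an illustration under assumptions: `TokenCache` is a hypothetical helper, `fetch` is assumed to be a variant of the token function that returns the whole JSON response rather than only `access_token`, and the response is assumed to carry an `expires_on` Unix timestamp.

```python
import time


class TokenCache:
    """Reuse a token until shortly before it expires (hypothetical helper)."""

    def __init__(self, fetch, skew_seconds=300, clock=time.time):
        self._fetch = fetch  # callable returning the token endpoint's JSON as a dict
        self._skew = skew_seconds  # refresh this many seconds before expiry
        self._clock = clock
        self._token = None
        self._expires_on = 0

    def get(self):
        """Return the cached token, fetching a new one when it is close to expiry."""
        if self._token is None or self._clock() >= self._expires_on - self._skew:
            body = self._fetch()
            self._token = body["access_token"]
            # Assumption: "expires_on" is a Unix timestamp, possibly sent as a string.
            self._expires_on = int(body["expires_on"])
        return self._token
```

A module-level `TokenCache` instance would only help while the same worker process stays warm; after a cold start the first call fetches a fresh token again.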
Trigger Databricks Job
This function calls the Databricks Jobs API to trigger a job using the AAD token.
def trigger_databricks_job(databricks_instance, job_id, token):
    url = f"https://{databricks_instance}/api/2.1/jobs/run-now"
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json"
    }
    payload = {
        "job_id": job_id
    }
    logging.info("Sending request to trigger Databricks job...")
    logging.info("URL: %s", url)
    # The headers are not logged because they contain the bearer token.
    logging.info("Payload: %s", payload)
    response = requests.post(url, headers=headers, json=payload)
    logging.info("Response status: %s", response.status_code)
    logging.info("Response body: %s", response.text)
    response.raise_for_status()
    return response.json()
Testing and Validation
To test the Azure Function locally, you can simply run the following command in your terminal:
func host start
Alternatively, if you’re using Visual Studio Code, you can:
- Click on the Azure icon in the Activity Bar.
- Expand the Workspace section.
- Navigate to Local Project → Functions.
- Click “Start Debugging” to launch and test your function locally.
This allows you to validate the end-to-end flow before deploying it to Azure.
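Once the local host is running, the function can also be called from a short script instead of a browser or Postman. The sketch below assumes the Core Tools defaults (port 7071 and the `/api` route prefix) and that the function accepts GET requests; the function name you pass in is whatever your project defines, so `TriggerDatabricksJob` in the usage note is only a hypothetical example.

```python
import json
import urllib.request


def local_function_url(function_name, port=7071):
    """Build the URL the local Functions host exposes for an HTTP trigger."""
    return f"http://localhost:{port}/api/{function_name}"


def call_local_function(function_name, port=7071, timeout=60):
    """Send a GET request to the locally running function.

    Returns a (status, parsed JSON body) tuple. urlopen raises
    urllib.error.HTTPError when the function answers with an error status.
    """
    url = local_function_url(function_name, port)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.status, json.loads(response.read().decode("utf-8"))
```

For example, `call_local_function("TriggerDatabricksJob")` should return status 200 and the JSON that the Jobs API sent back if the job was triggered.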
Validation in Visual Studio Code
Validation in Postman
Validation On Databricks
Go to Job runs to see the most recent job trigger.
Open the run's properties to view additional details.
Benefits of This Approach
- Improved security: No need to store or rotate PATs.
- Scalability: Easily integrate with CI/CD pipelines or event-driven architectures.
- Auditability: SPN usage can be monitored via Azure AD logs.
Future Enhancements
- Support for triggering notebooks, experiments, or ML models.
- Add parameterization for dynamic job inputs.
- Implement retry logic and status polling.
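As a rough illustration of the parameterization and status-polling items, the sketch below builds a run-now payload with notebook parameters and waits for a run to reach a terminal state. It is a sketch under assumptions rather than a finished implementation: `build_run_now_payload` and `wait_for_run` are hypothetical helpers, `get_run` stands for any callable that returns the JSON body of `GET /api/2.1/jobs/runs/get` for a given run ID, and the polling interval and limit are arbitrary.

```python
import time

# Life-cycle states after which a run will not change any more.
TERMINAL_STATES = {"TERMINATED", "SKIPPED", "INTERNAL_ERROR"}


def build_run_now_payload(job_id, notebook_params=None):
    """Build the run-now request body, optionally with notebook parameters."""
    payload = {"job_id": int(job_id)}
    if notebook_params:
        payload["notebook_params"] = dict(notebook_params)
    return payload


def wait_for_run(get_run, run_id, poll_seconds=15, max_polls=40, sleep=time.sleep):
    """Poll until the run reaches a terminal state and return its state dict.

    get_run is any callable that takes a run_id and returns the parsed JSON
    of the runs/get endpoint for that run.
    """
    for _ in range(max_polls):
        state = get_run(run_id).get("state", {})
        if state.get("life_cycle_state") in TERMINAL_STATES:
            return state
        sleep(poll_seconds)
    raise TimeoutError(f"Run {run_id} did not finish after {max_polls} polls")
```

The `run_id` to poll comes from the run-now response, and the returned state's `result_state` field indicates whether the run succeeded.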
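To further enhance security, move these values out of the code. Store them in Azure Key Vault and either expose them to the function as environment variables through Key Vault references in the application settings, or read them at runtime using a Managed Identity and the DefaultAzureCredential class from the Azure SDK, as in the following version of main: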
import json
import logging
import azure.functions as func
from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient

def get_secret(vault_url, secret_name):
    credential = DefaultAzureCredential()
    client = SecretClient(vault_url=vault_url, credential=credential)
    return client.get_secret(secret_name).value

def main(req: func.HttpRequest) -> func.HttpResponse:
    logging.info("Triggering Databricks job using secrets from Key Vault.")
    vault_url = "https://<your-keyvault-name>.vault.azure.net/"
    try:
        tenant_id = get_secret(vault_url, "databricks-tenant-id")
        client_id = get_secret(vault_url, "databricks-client-id")
        client_secret = get_secret(vault_url, "databricks-client-secret")
        databricks_instance = get_secret(vault_url, "databricks-instance")
        # Key Vault returns strings; the Jobs API expects a numeric job ID.
        job_id = int(get_secret(vault_url, "databricks-job-id"))
        token = get_azure_ad_token(tenant_id, client_id, client_secret)
        result = trigger_databricks_job(databricks_instance, job_id, token)
        return func.HttpResponse(
            body=json.dumps(result, indent=2),
            status_code=200,
            mimetype="application/json"
        )
    except Exception as e:
        logging.error("Error: %s", str(e))
        return func.HttpResponse(f"Failed: {str(e)}", status_code=500)
Full code
Here is the full Python code, which I added to __init__.py:
import logging
import azure.functions as func
import requests
import json

def get_azure_ad_token(tenant_id, client_id, client_secret, resource="2ff814a6-3304-4ab8-85cb-cd0e6f879c1d"):
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/token"
    payload = {
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "resource": resource
    }
    logging.info("Requesting Azure AD token...")
    response = requests.post(url, data=payload)
    # Log only the status code: the response body contains the access token.
    logging.info("Token response status: %s", response.status_code)
    response.raise_for_status()
    return response.json()["access_token"]

def trigger_databricks_job(databricks_instance, job_id, token):
    url = f"https://{databricks_instance}/api/2.1/jobs/run-now"
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json"
    }
    payload = {
        "job_id": job_id
    }
    logging.info("Sending request to trigger Databricks job...")
    logging.info("URL: %s", url)
    # The headers are not logged because they contain the bearer token.
    logging.info("Payload: %s", payload)
    response = requests.post(url, headers=headers, json=payload)
    logging.info("Response status: %s", response.status_code)
    logging.info("Response body: %s", response.text)
    response.raise_for_status()
    return response.json()

def main(req: func.HttpRequest) -> func.HttpResponse:
    logging.info("HTTP trigger function received a request to trigger Databricks job.")
    tenant_id = "xxx"
    client_id = "xxx"
    client_secret = "xxx"
    databricks_instance = "xxx"
    job_id = xxx
    try:
        token = get_azure_ad_token(tenant_id, client_id, client_secret)
        result = trigger_databricks_job(databricks_instance, job_id, token)
        return func.HttpResponse(
            body=json.dumps(result, indent=2),
            status_code=200,
            mimetype="application/json"
        )
    except Exception as e:
        logging.error("Exception occurred: %s", str(e))
        return func.HttpResponse(f"Failed: {str(e)}", status_code=500)