Microsoft Azure Cost Management using Azure Databricks DLT (Delta Live Tables) and PowerBI

Prashanth Kumar
6 min read · Apr 24, 2024


Problem Statement:

Organizations leveraging cloud computing services like Azure often face challenges in managing and optimizing their cloud spending effectively. Without proper visibility into resource utilization and associated costs, organizations may encounter cost overruns, inefficient resource allocation, and difficulty in identifying cost-saving opportunities. Additionally, the dynamic nature of cloud environments, with fluctuating workloads and evolving usage patterns, further complicates cost management efforts.

How to proceed?

This is aimed specifically at Azure Databricks users who rely on the platform extensively for their day-to-day operations.

The integration of Azure Cost Management with Azure Databricks DLT (Delta Live Tables) addresses the need for comprehensive cost analysis by providing insights into both Azure spending and data processing expenses. This integration allows organizations to understand the full cost implications of their data processing activities and make informed decisions about resource allocation.

Customized Reporting:

Integrating Azure Databricks with PowerBI opens up strong reporting options. PowerBI offers powerful visualization capabilities that let organizations build customized reports and dashboards showcasing Azure spending trends, data processing costs, and resource utilization metrics, and combining them with Azure Cost Management data produces interactive reports that give stakeholders actionable insights into cost optimization opportunities.

Predictive Analysis:

Leveraging historical cost data from Azure Cost Management and data processing metrics from Databricks DLT, organizations can perform predictive analysis to forecast future costs based on anticipated workloads and resource utilization patterns. This predictive capability empowers organizations to proactively adjust resource allocation and optimize costs before they escalate.
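
As a minimal sketch, assuming the cleaned azure_costing.costs_azure table built later in this article, a simple month-to-date run-rate projection can be expressed directly in SQL:

-- Hypothetical run-rate forecast: extrapolate month-to-date spend to a month-end estimate
SELECT
  SubscriptionName,
  date_trunc('MONTH', IncurredDate)                AS BillingMonth,
  sum(CostInBillingCurrency)                       AS CostToDate,
  round(sum(CostInBillingCurrency)
        / day(max(IncurredDate))
        * day(last_day(max(IncurredDate))), 2)     AS ProjectedMonthEndCost
FROM azure_costing.costs_azure
GROUP BY SubscriptionName, date_trunc('MONTH', IncurredDate)
ORDER BY BillingMonth DESC, ProjectedMonthEndCost DESC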

Now let's look at the actual implementation steps to achieve this.

Implementation

  1. The first step is to set up an Azure Cost Management export; for this we are just going to use a simple command-line script to create a new export.

There are a couple of ways to create a new export. Make sure you select all available columns so that it is easy to filter on multiple conditions later.

az costmanagement export create --name "NameOftheScope" --type "Usage" --dataset-configuration columns="InvoiceSectionName" columns="AccountName" columns="AccountOwnerId" columns="SubscriptionId" columns="SubscriptionName" columns="ResourceGroup" columns="ResourceLocation" columns="Date" columns="ProductName" columns="MeterCategory" columns="MeterId" columns="MeterName" columns="MeterRegion" columns="UnitOfMeasure" columns="Quantity" columns="CostInBillingCurrency" columns="EffectivePrice" columns="CostCenter" columns="ConsumedService" columns="ResourceId" columns="Tags" columns="OfferId" columns="ServiceInfo1" columns="ServiceInfo2" columns="ResourceName" columns="ReservationId" columns="ReservationName" columns="UnitPrice" columns="ProductOrderId" columns="ProductOrderName" columns="PublisherType" columns="PublisherName" columns="ChargeType" columns="Frequency" columns="PricingModel" columns="AvailabilityZone" columns="BillingAccountId" columns="BillingAccountName" columns="BillingCurrencyCode" columns="BillingPeriodStartDate" columns="BillingPeriodEndDate" columns="Term" columns="AdditionalInfo" --timeframe "MonthToDate" --storage-container="exports" --storage-account-id="/subscriptions/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxx/resourceGroups/ResourceGroupName/providers/Microsoft.Storage/storageAccounts/StorageAccountName" --storage-directory="ad-hoc" --recurrence "Daily" --recurrence-period from="2024-04-19T03:00:00Z" to="2050-02-01T00:00:00Z" --schedule-status "Active" --scope "/subscriptions/xxxxxxxxxx-xxxx-xxxx-xxxx-xxxxxx"

In the above example I have used all available columns for my report. Make sure the --recurrence-period is set so that the from date is current (never in the past) and the to date is well in the future.

The last option, --scope, defines where the costs are pulled from. You can run this command from either the Azure CLI or PowerShell.

2. Now that we have data in the Azure Storage container, the next step is to load it into Delta Live Tables. For this, open Azure Databricks → click on Delta Live Tables.

Click on New Pipeline → enter all the details.

Make sure that under Source Code you point the pipeline at the notebook containing the code that should run on every refresh. Here is the code.

CREATE OR REFRESH STREAMING LIVE TABLE costs_azure_raw
USING DELTA
PARTITIONED BY (BillingPeriodStartDate)
COMMENT "RAW Azure costs per item updated daily, duplicated each day of the month, use the clean data for reporting"
TBLPROPERTIES ("quality" = "raw", "pipelines.autoOptimize.zOrderCols" = "Date")
AS
SELECT
  *,
  -- keep the source file name so the clean-up step can pick the most complete export per billing period
  _metadata.file_name AS filepath
FROM cloud_files(
  "${azurepath}",   -- storage path, supplied as a pipeline configuration parameter under Advanced
  "csv",
  map(
    "header", "true",
    "timestampFormat", "MM/dd/yyyy",
    "charToEscapeQuoteEscaping", '"',
    "escape", '"',
    "cloudFiles.inferColumnTypes", "true"
  )
)

The path to the notebook containing this code is what you add under Source Code.

Under Destination, make sure to provide your destination schema.

Under Advanced, give your Azure Storage container path (this is the value that the ${azurepath} parameter in the code above picks up). In our case, as I am using multiple folders, I have used the top-level folder.

abfss://exports@xxxxxxxxstoragedev.dfs.core.windows.net/ad-hoc/

3. For post-processing of the data, run the SQL query below to load the cleaned data into a reporting table.

CREATE OR REPLACE TABLE azure_costing.costs_azure
COMMENT 'Azure costs cleaned and ready for use in reporting. CostInBillingCurrency is the actual cost per item, IncurredDate is the day it happened'
AS
-- Each daily export re-delivers the whole month to date, so for every subscription and
-- billing period keep only the export file (filepath) with the most rows, i.e. the most complete one
with mostdayspermonth as (
  SELECT filepath, row_number() OVER(PARTITION BY SubscriptionName, BillingPeriodStartDate ORDER BY count(*) desc) AS r
  FROM azure_costing.costs_azure_raw
  group by SubscriptionName, BillingPeriodStartDate, filepath
)
SELECT
SubscriptionName,
ResourceGroup,
ResourceLocation,
Date as IncurredDate,
ProductName,
MeterCategory,
MeterName,
MeterRegion,
UnitOfMeasure,
Quantity,
EffectivePrice,
CostInBillingCurrency,
ConsumedService,
ResourceId,
Tags,
AdditionalInfo,
UnitPrice,
ProductOrderName,
Term,
PublisherType,
PublisherName,
ChargeType,
Frequency,
PricingModel,
BillingCurrencyCode,
BillingPeriodStartDate,
BillingPeriodEndDate,
-- extract the ClusterName value from the Tags JSON string, when the tag is present
lower(trim(coalesce(nullif(replace(substring(regexp_extract(Tags,'("ClusterName.*?":.*?".*?")'),15),'"'),''),''))) as ClusterName
from azure_costing.costs_azure_raw
where filepath in (SELECT filepath from mostdayspermonth where r = 1)
-- drop the most recent day(s), which the daily export may not have fully delivered yet
and Date < current_date() - 1
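
Once the cleaned table is in place, report queries become straightforward. As an illustration (a sketch against the schema above), daily spend per resource group for the current month looks like this:

-- Example report query: daily spend per resource group for the current month
SELECT IncurredDate,
       ResourceGroup,
       sum(CostInBillingCurrency) AS DailyCost
FROM azure_costing.costs_azure
WHERE IncurredDate >= date_trunc('MONTH', current_date())
GROUP BY IncurredDate, ResourceGroup
ORDER BY IncurredDate, DailyCost DESC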

You should now be able to see the new tables.
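
If you prefer to verify from a notebook rather than the DLT UI, a quick listing of the schema used above confirms the tables exist:

-- list the tables in the schema used by the pipeline
SHOW TABLES IN azure_costing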

Once we open the table → click on Sample Data → you should be able to see data from the different subscriptions.

Just for testing, I ran a SQL query to see whether we have data from different subscriptions.
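
Something along these lines, run against the cleaned table from the previous step, is enough for such a check:

-- sanity check: line-item counts and total cost per subscription
SELECT SubscriptionName,
       count(*)                   AS LineItems,
       sum(CostInBillingCurrency) AS TotalCost
FROM azure_costing.costs_azure
GROUP BY SubscriptionName
ORDER BY TotalCost DESC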

PowerBI:

With the cost data now cleaned and available in Databricks, the final step is to bring it into PowerBI and build interactive reports and dashboards that show spending trends, data processing costs, and resource utilization, giving stakeholders actionable insight into cost optimization opportunities.

Steps to Integrate Azure Databricks with PowerBI.

  1. First, make sure you have a valid Azure Databricks (ADB) instance name.
  2. Next, click on your compute cluster → go to the Advanced options tab → under JDBC/ODBC → capture the HTTP path.
  3. For authorization you need to pass a Databricks PAT (personal access token), so make sure to generate the token well in advance.

This is how your Source connection should look:

= Databricks.Catalogs("adb-xxxxxxxxxxxx.14.azuredatabricks.net", "/sql/protocolv1/o/xxxxxxxx/1234-123456-b9ulmva", [Catalog=null, Database=null, EnableAutomaticProxyDiscovery=null])

You can then see the columns under Data.

At a high level, this is how the final dashboard looks.

For data source connectivity:

Open PowerBI → click on Transform Data → Transform Data.

Once the new window opens → under Applied Steps, click “Source”.

Click on Table → it will give you a pop-up window → click Continue,

and then select the appropriate table.

Additional info:

Here we have used the Import option; however, you can use DirectQuery as well, and you can also make use of Unity Catalog.

But the underlying logic remains the same.
