Azure Machine Learning Studio Deployment Using GitHub Actions

Prashanth Kumar
8 min read · Nov 14, 2023

Introduction:

Recently I was working with data scientists and analytics engineers, and one of the common problem statements they mentioned was adding new models/experiments to Azure ML Studio along with assigning new environments/experiments. Most of the time they have to do this by hand, and rather than creating everything manually they want the process automated.

Azure Machine Learning (ML) workflows involve a series of steps, from setting up a workspace to executing training jobs. This article explores a Python script leveraging the Azure Machine Learning (Azure ML) SDK to orchestrate these tasks seamlessly. We’ll break down the script into sub-elements, explaining each component’s role in the overall process.

Azure Machine Learning (Azure ML) allows you to create end-to-end machine learning workflows, from data preparation to model deployment. In this article, we’ll focus on setting up an Azure ML pipeline using GitHub Actions.

The intention behind using a Python script is that it is user friendly from the data scientists’/analytics engineers’ point of view, and many of them are already adopting Python.

Step 1: Set Up Your Azure ML Workspace

First, you need to connect to your Azure ML workspace using the appropriate credentials. Replace "subscription_id", "resource_group", and "workspace_name" with your own values.


import azureml.core
from azureml.core import Workspace

subscription_id = "subscription_id"
resource_group = "resource_group"
workspace_name = "workspace_name"

# Connect to the existing Azure ML workspace
workspace = Workspace.get(
    name=workspace_name,
    subscription_id=subscription_id,
    resource_group=resource_group
)

Here, the Workspace.get method establishes a connection to the Azure ML workspace using provided subscription, resource group, and workspace name.
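
When this script runs non-interactively from GitHub Actions, interactive login is not available, so you may want to authenticate explicitly with a service principal. Here is a minimal sketch using the same secrets that the workflow at the end of this article exposes as environment variables (the variable names are assumptions based on that workflow):

import os
from azureml.core import Workspace
from azureml.core.authentication import ServicePrincipalAuthentication

# Authenticate with the service principal exposed by the GitHub Actions workflow
sp_auth = ServicePrincipalAuthentication(
    tenant_id=os.environ["SPN_TENANT_ID"],
    service_principal_id=os.environ["SPN_CLIENT_ID"],
    service_principal_password=os.environ["SPN_CERT_PASSWORD"],
)

# Pass the credential object explicitly when connecting to the workspace
workspace = Workspace.get(
    name=workspace_name,
    subscription_id=subscription_id,
    resource_group=resource_group,
    auth=sp_auth,
)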

Step 2: Create a Compute Target

Next, you’ll create a compute target for running your notebook and training jobs. This step ensures that you have a designated environment for executing them.


from azureml.core.compute import AmlCompute, ComputeTarget

compute_name = "notebook-compute101"
if compute_name in workspace.compute_targets:
    compute_target = workspace.compute_targets[compute_name]
else:
    # Provision a new compute target
    compute_config = AmlCompute.provisioning_configuration(vm_size="STANDARD_DS2_V2", max_nodes=4)
    compute_target = ComputeTarget.create(workspace, compute_name, compute_config)
    compute_target.wait_for_completion(show_output=True)
    print("New compute name: {}".format(compute_name))

You can find other sizes at : https://learn.microsoft.com/en-us/azure/machine-learning/concept-compute-target?view=azureml-api-2
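
If you want to check programmatically which VM sizes are available in your workspace’s region before provisioning, the SDK exposes a helper for this; a minimal sketch (the exact fields in each entry can vary by SDK version):

from azureml.core.compute import AmlCompute

# List the VM sizes supported in the workspace's region
for size in AmlCompute.supported_vmsizes(workspace):
    print(size)  # each entry is a dict describing a VM size (name, vCPUs, memory, ...)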

Step 3: Defining Python Conda Dependencies and Environment

Create a Conda environment with the necessary packages for your notebook. You can adjust the pip_packages list as needed.

from azureml.core.conda_dependencies import CondaDependencies

conda_dep = CondaDependencies.create(
    conda_packages=['pandas', 'scikit-learn', 'numpy', 'scipy'],
    pip_packages=['azureml-sdk', 'seaborn', 'azureml-mlflow', 'psutil', 'ipykernel', 'matplotlib']
)

To check whether the Conda dependencies deployed correctly, open Azure ML Studio → click on Environments → click on the newly created environment → look under Details.

You can find more information about Conda at: https://www.activestate.com/resources/quick-reads/how-to-manage-python-dependencies-with-conda/
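
If you want to review the generated specification before it is baked into an environment, the CondaDependencies object can be serialized to YAML; a minimal sketch (the output file name is arbitrary):

# Print the generated Conda specification as YAML
print(conda_dep.serialize_to_string())

# Optionally write it to a file for review or version control
conda_dep.save(path="./environment.yml")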

Step 4: Uploading a Jupyter Notebook to the Default Datastore

Jupyter notebook files are sometimes required by data scientists or analytics engineers, so here we upload the file to the default Azure ML datastore. This step makes data and code accessible during experiment runs.

datastore = workspace.get_default_datastore()

local_file_path = "./notebooks/TestIMSDatabaseHelper.ipynb"
target_path = "Users/Prashanth.kumar4/TestIMSDatabaseHelper101.ipynb"

# Uploading the Jupyter notebook to datastore
datastore.upload_files(files=[local_file_path], target_path=target_path, overwrite=True)
print("Notebook uploaded successfully.")

To validate, open Azure ML Studio → click on Data → click on Datastores → click on Browse → expand the named folder.
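
You can also verify programmatically by downloading the uploaded notebook back from the datastore; a minimal sketch (the local target folder is arbitrary, and the prefix matches the target_path used above):

# Download everything under the target folder back from the datastore to verify the upload
datastore.download(
    target_path="./verify_download",
    prefix="Users/Prashanth.kumar4",
    overwrite=True,
    show_progress=True
)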

Step 5: Creating a New Environment

Azure Machine Learning environments are an encapsulation of the environment where your machine learning training happens. They specify the Python packages, environment variables, and software settings around your training and scoring scripts. They also specify runtimes (Python, Spark, or Docker). The environments are managed and versioned entities within your Machine Learning workspace that enable reproducible, auditable, and portable machine learning workflows across a variety of compute targets.

from azureml.core import Environment

# Create a new environment and attach the Conda dependencies defined in Step 3
environment_name = 'notebook101'
environment = Environment(name=environment_name)
environment.python.conda_dependencies = conda_dep

# Register the environment in the workspace
environment.register(workspace=workspace)
print("New environment name: {}".format(environment_name))

You can find more information about environments at

https://azure.github.io/azureml-cheatsheets/docs/cheatsheets/python/v1/environment/#:~:text=Azure%20ML%20Environments%20are%20used,can%20use%20custom%20docker%20images.
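
To confirm the registration from code (or to see which other environments already exist in the workspace), you can list and fetch environments via the SDK; a minimal sketch:

from azureml.core import Environment

# List all environments registered in the workspace
for env_name, env in Environment.list(workspace).items():
    print(env_name, env.version)

# Fetch the environment registered above and inspect its Conda specification
registered_env = Environment.get(workspace, name='notebook101')
print(registered_env.python.conda_dependencies.serialize_to_string())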

Step 6: Setting Up New Experiment(s)

An Azure Machine Learning experiment represents the collection of trials used to validate a user’s hypothesis. An experiment has to be associated with a specific workspace.

from azureml.core import Experiment

experiment_name = 'mlworkspace101'
experiment = Experiment(workspace, experiment_name)
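
If an experiment with that name already exists it is reused rather than recreated, so you can also use the same object to inspect earlier runs; a minimal sketch:

# List previous runs of this experiment (newest first)
for past_run in experiment.get_runs():
    print(past_run.id, past_run.status)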

Step 7: Running the Experiment with a Specific Python Model/Training Script

The script configures and submits a training job as a script run configuration, incorporating the specified compute target and environment.

from azureml.core import ScriptRunConfig

# Tag name for the new training job
new_job_name = 'test_training_job_name101'

# Create a ScriptRunConfig
src = ScriptRunConfig(
    source_directory="./script",
    script="./train.py",
    compute_target=compute_target,
    environment=environment
)

# Submit the training job
run = experiment.submit(src, tags={'job_name': new_job_name})

# Wait for the run to complete
run.wait_for_completion(show_output=True)

This segment executes the training job, leveraging the defined environment and compute resources.

To check the model execution, go to Azure ML Studio → click on Compute → select your newly created compute cluster → click on the Jobs tab → you can see the new job.

Then click on its display name → click on “Outputs + Logs” → you can see the output.

You can also view the jobs under your experiment: click on the experiment and it will show the latest run.
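
You can also inspect the run from the SDK instead of the portal; a minimal sketch:

# Inspect the completed run programmatically
print(run.get_status())       # e.g. Completed or Failed
print(run.get_metrics())      # metrics logged from train.py
print(run.get_file_names())   # files visible under "Outputs + Logs"
print(run.get_portal_url())   # direct link to this run in the studio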

Issues:

Before I conclude: while working with Azure ML Studio I have seen some issues, or maybe they are by design.

1. Experiment(s): when you keep working on experiments and want to drop them, Azure ML Studio unfortunately only offers an Archive option rather than a permanent delete. If you click on “View archived experiments” you can see the list of all your old experiments, and if you want to recover them there is an “Unarchive experiment” option.

2. Environment(s): The same goes for environments. If you click on custom environments → select the “Include archived” toggle, it will show your previous environments and give you the option to restore them. However, to keep the studio clean, data scientists may be looking for a permanent delete option.

3. Pipeline jobs: the next issue is related to pipeline jobs; the studio doesn’t show any list of previous executions. The only option is to look at the currently running one (if someone can give more guidance on this, I’m happy to learn).

4. Running models: One important issue I noticed: when you are using GitHub Actions to run your Python script, make sure you create a separate folder to hold all your Python scripts. If you keep them at the repository root, you will unfortunately see an error such as “your project exceeds the file limit of 2000” (a quick pre-submit check is sketched below).
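
This happens because the whole ScriptRunConfig source_directory is snapshotted when a run is submitted. Here is a minimal sketch that counts the files that would be snapshotted before submitting; you can also place an .amlignore file in the source directory to exclude files from the snapshot:

import os

# Count the files under the ScriptRunConfig source_directory before submitting;
# the Azure ML snapshot has a default limit of 2000 files.
source_dir = "./script"
file_count = sum(len(files) for _, _, files in os.walk(source_dir))
print(f"{file_count} files would be snapshotted from {source_dir}")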

Conclusion

In this article, you’ve learned how to set up an Azure ML pipeline that includes:

  • Creating a new compute cluster
  • Creating a new environment
  • Creating a new experiment
  • Uploading your notebooks to the default datastore
  • Running your new models
  • Running pipelines & jobs

Here is the full Python script I have used:

import azureml.core
from azureml.core import Workspace, ScriptRunConfig, Datastore, Experiment, Environment
from azureml.core.compute import AmlCompute, ComputeTarget
from azureml.core.conda_dependencies import CondaDependencies


subscription_id = "subscription_id"
resource_group = "resource_group"
workspace_name = "workspace_name"

# Connect to the existing Azure ML workspace
workspace = Workspace.get(
    name=workspace_name,
    subscription_id=subscription_id,
    resource_group=resource_group
)

# In case the compute target already exists and you want to reuse it, uncomment the two lines below
#compute_target_name = "notebook-compute"
#compute_target = ComputeTarget(workspace, compute_target_name)


# Create a new compute target (or reuse it if it already exists)
compute_name = "notebook-compute101"
if compute_name in workspace.compute_targets:
    compute_target = workspace.compute_targets[compute_name]
else:
    compute_config = AmlCompute.provisioning_configuration(vm_size="STANDARD_DS2_V2", max_nodes=4)
    compute_target = ComputeTarget.create(workspace, compute_name, compute_config)
    compute_target.wait_for_completion(show_output=True)
    print("New compute name: {}".format(compute_name))



# Define the Conda dependencies for your environment
conda_dep = CondaDependencies.create(
    conda_packages=['pandas', 'scikit-learn', 'numpy', 'scipy'],
    pip_packages=['azureml-sdk', 'seaborn', 'azureml-mlflow', 'psutil', 'ipykernel', 'matplotlib']
)





# Get the default datastore in your workspace
datastore = workspace.get_default_datastore()


# Jupyter notebook file you want to upload
local_file_path = "./notebooks/TestIMSDatabaseHelper.ipynb"

# Target path within the datastore where the file should be uploaded
target_path = "Users/TestIMSDatabaseHelper101.ipynb"

# Uploading new Jupyter notebook to datastore
datastore.upload_files(files=[local_file_path], target_path=target_path, overwrite=True)
print("Notebook uploaded successfully.")


# To reuse an existing custom environment instead, uncomment the three lines below
#environment_name = 'notebook1'
#conda_environment_name = 'project_environment'
#environment = Environment.get(workspace, name=environment_name)




# Create a new environment and attach the Conda dependencies defined above
environment_name = 'notebook101'
environment = Environment(name=environment_name)
environment.python.conda_dependencies = conda_dep

# Register the environment
environment.register(workspace=workspace)
print("New environment name: {}".format(environment_name))

# Create a new experiment & associated EnvironmentImage
#image_name = environment_name.name + "image"
#experiment_image_name = Experiment(workspace, name="testexperiment")
#print ("experiment_image_name: {experiment_image_name}")




# New experiment name
#experiment_name = 'qgcdetmlworkspace'
experiment_name = 'mlworkspace101'
experiment = Experiment(workspace, experiment_name)

# Tag name for the new training job
new_job_name = 'test_training_job_name101'

# Specify the existing compute target
compute_target_name = 'notebook-compute101'
compute_target = ComputeTarget(workspace, compute_target_name)

# Create a ScriptRunConfig
src = ScriptRunConfig(
    source_directory="./script",
    script="./train.py",
    compute_target=compute_target,
    environment=environment
)

# Submit the training job
run = experiment.submit(src, tags={'job_name': new_job_name})

# Wait for the run to complete
run.wait_for_completion(show_output=True)

Here is my GitHub Actions YAML file:

name: Azure ML Workflow

on:
  push:
    branches:
      - main

jobs:
  deploy_notebook:
    runs-on: ubuntu-latest
    environment: dev
    env:
      SPN_CERT_PASSWORD: ${{ secrets.SPN_CERT_PASSWORD }}
      SPN_CLIENT_ID: ${{ secrets.SPN_CLIENT_ID }}
      SPN_TENANT_ID: ${{ secrets.SPN_TENANT_ID }}
      SUBSCRIPTION_ID: ${{ secrets.SUBSCRIPTION_ID }}

    steps:
      - name: Checkout Repository
        uses: actions/checkout@v2

      - name: Login to Azure
        uses: Azure/login@v1
        with:
          creds: '{"clientId":"${{ env.SPN_CLIENT_ID }}","clientSecret":"${{ env.SPN_CERT_PASSWORD }}","subscriptionId":"${{ env.SUBSCRIPTION_ID }}","tenantId":"${{ env.SPN_TENANT_ID }}"}'

      - name: Install Azure ML CLI extension
        run: az extension add -n ml -y

      - name: Set up Python
        uses: actions/setup-python@v2
        with:
          python-version: 3.x

      - name: Set up virtual environment
        run: python -m venv venv

      - name: Activate virtual environment
        run: source venv/bin/activate

      - name: Install Dependencies
        run: |
          pip install azureml-core
          pip install setuptools
          pip install azureml-train

      - name: List Files
        run: ls -la

      - name: Uploading Jupyter notebooks
        run: python mlstudionotebookpush.py

Feel free to provide your comments.
