Ksama Arora

Run a Training Script as a Command Job in Azure ML

Step 1: Convert a Notebook to a Script

Scripts are ideal for testing and automation in production. To create a production-ready script, refactor the notebook code into functions and test the script in the terminal.

Example notebook code to read and split data:

import pandas as pd

# read and visualize the data
print("Reading data...")
df = pd.read_csv('diabetes.csv')
df.head()

# split data
print("Splitting data...")
X, y = df[['Pregnancies','PlasmaGlucose','DiastolicBloodPressure','TricepsThickness','SerumInsulin','BMI','DiabetesPedigree','Age']].values, df['Diabetic'].values

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=0)

The same code refactored into two functions, one that reads the data and one that splits it:

# Import libraries
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.metrics import roc_curve


def main(csv_file):
    # Read data
    df = get_data(csv_file)

    # Split data
    X_train, X_test, y_train, y_test = split_data(df)

# function that reads the data
def get_data(path):
    df = pd.read_csv(path)
    return df

# function that splits the data
def split_data(df):
    X, y = df[['Pregnancies','PlasmaGlucose','DiastolicBloodPressure','TricepsThickness','SerumInsulin','BMI','DiabetesPedigree','Age']].values, df['Diabetic'].values
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=0)
    return X_train, X_test, y_train, y_test

# run script
if __name__ == "__main__":
    main('diabetes.csv')

Test the script in the terminal by running it with Python (for example, python train.py) and checking that it finishes without errors.
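The terminal check can also be automated. The following is a minimal sketch, not part of the original notes: it runs a script in a child process and inspects the exit code and output. The helper name run_script and the stand-in script smoke_test.py are hypothetical; in practice, point the helper at the real training script.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_script(script_path, *script_args):
    """Run a Python script in a child process and return the completed process."""
    return subprocess.run(
        [sys.executable, str(script_path), *script_args],
        capture_output=True,  # collect stdout and stderr instead of printing them
        text=True,            # decode output as str rather than bytes
    )


# Stand-in script for illustration only (hypothetical file name).
with tempfile.TemporaryDirectory() as tmp:
    script = Path(tmp) / "smoke_test.py"
    script.write_text('print("Reading data...")\n')
    result = run_script(script)

# An exit code of 0 means the script finished without an unhandled error.
print(result.returncode, result.stdout.strip())
```

A non-zero return code, or a traceback in result.stderr, signals that the script is not yet ready to be submitted as a job.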

Step 2: Run a Script as a Command Job

To run a script as a command job, configure the job with the command function, then submit it with ml_client.create_or_update:

from azure.ai.ml import command

# Configure job
job = command(
    code="./src", # Folder containing the script
    command="python train.py", # Specifies the file to run
    environment="AzureML-sklearn-0.24-ubuntu18.04-py37-cpu@latest", # Required packages
    compute="aml-cluster", # Compute resource
    display_name="train-model", # Job name
    experiment_name="train-classification-model" # Experiment name
)

# Submit job (ml_client is an MLClient that is already connected to the workspace)
returned_job = ml_client.create_or_update(job)
aml_url = returned_job.studio_url
print("Monitor your job at", aml_url)
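The submit step relies on an ml_client object that the snippet does not create. One common way to create it with the Python SDK (v2) is sketched below. The three placeholder values and the use of DefaultAzureCredential are assumptions; substitute your own workspace details and whichever credential fits your setup.

```python
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

# Placeholder values (assumptions): replace with your own workspace details.
subscription_id = "<subscription-id>"
resource_group = "<resource-group>"
workspace_name = "<workspace-name>"

# Connect to the workspace; the job is later submitted through this client.
ml_client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id=subscription_id,
    resource_group_name=resource_group,
    workspace_name=workspace_name,
)
```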

Step 3: Use Parameters in Command Job

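Parameters make a script more flexible: the same script can run with different inputs, such as a different training dataset, without any change to the code.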

Using script arguments:

import argparse  # standard-library parser for command-line arguments
import pandas as pd
from sklearn.linear_model import LogisticRegression

def main(args):
    # read data
    df = get_data(args.training_data)

# function that reads the data
def get_data(path):
    df = pd.read_csv(path)
    return df

def parse_args():
    # setup arg parser
    parser = argparse.ArgumentParser()

    # add arguments
    parser.add_argument("--training_data", dest='training_data', type=str)

    # parse args
    args = parser.parse_args()

    # return args
    return args

# run script
if __name__ == "__main__":

    # parse args
    args = parse_args()

    # run main function
    main(args)

Passing arguments to a script:

python train.py --training_data diabetes.csv
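Argument parsing can be tried out without a terminal by passing the argument list directly. The sketch below is a variation on parse_args, not the original script: the argv parameter, the required flag, and the extra --reg_rate argument with its default of 0.01 are all assumptions added for illustration.

```python
import argparse


def parse_args(argv=None):
    """Parse script arguments; argv=None falls back to the real command line."""
    parser = argparse.ArgumentParser(description="Train a classification model")

    # Same argument as in the script above, made mandatory here (assumption).
    parser.add_argument("--training_data", dest="training_data", type=str, required=True)

    # Hypothetical extra parameter with a default, to show an optional argument.
    parser.add_argument("--reg_rate", dest="reg_rate", type=float, default=0.01)

    return parser.parse_args(argv)


# Equivalent to: python train.py --training_data diabetes.csv
args = parse_args(["--training_data", "diabetes.csv"])
print(args.training_data, args.reg_rate)
```

Because argv defaults to None, calling parse_args() with no arguments still reads the real command line when the script runs as a job.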

Configure a command job with arguments:

from azure.ai.ml import command

job = command(
    code="./src",
    command="python train.py --training_data diabetes.csv",
    environment="AzureML-sklearn-0.24-ubuntu18.04-py37-cpu@latest",
    compute="aml-cluster",
    display_name="train-model",
    experiment_name="train-classification-model"
)
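When the number of parameters grows, the command string can be assembled from a dictionary instead of being typed by hand. This is a minimal sketch under stated assumptions: build_command is a hypothetical helper, not part of the Azure ML SDK, and it uses POSIX shell quoting on the assumption that the job runs on Linux compute.

```python
import shlex


def build_command(script, params):
    """Return a 'python <script> --name value ...' string with shell-safe quoting."""
    parts = ["python", script]
    for name, value in params.items():
        parts += [f"--{name}", str(value)]
    # shlex.join quotes any part that contains spaces or special characters.
    return shlex.join(parts)


command_line = build_command("train.py", {"training_data": "diabetes.csv"})
print(command_line)  # python train.py --training_data diabetes.csv
```

The resulting string can be passed as the command argument when configuring the job, so the parameter values live in one place.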