Getting Started with AWS SageMaker


Overview

Most machine learning (ML) projects follow a workflow that involves generating sample data, training a model, and deploying the model.
These steps have subtasks and are iterative.
ML engineers and data scientists often need an environment in which they can experiment and prototype ideas quickly.
And after prototyping, deploying and scaling machine learning models is a skill few have mastered.

It would be ideal and convenient if ML engineers and data scientists could move from experimentation or prototyping to deploying scalable, production-ready ML models without tedious setup. This is where Amazon SageMaker comes in.

[Figure: the machine learning workflow]

What is Amazon SageMaker?

SageMaker was created to provide a platform that supports the development and deployment of machine learning models.
Quoting from the official website:

Amazon SageMaker is a fully managed service that gives all developers and data scientists the ability to quickly create, train, and deploy machine learning (ML) models. SageMaker takes the heavy lifting out of every step of the machine learning process to facilitate high-quality model development.

Traditional ML development is a complex, expensive iterative process, and even more complicated because there are no built-in tools for the entire machine learning workflow. You need to tie together tools and workflows, which is time consuming and error prone. SageMaker solves this challenge by providing all the components used for machine learning in a single set of tools so that models get to production faster with much less effort and at a lower cost.
source: https://aws.amazon.com/sagemaker/

Amazon SageMaker Features:

  • SageMaker provides customizable Amazon ML instances with a developer-friendly notebook environment preloaded with ML frameworks and libraries.
  • It integrates seamlessly with AWS storage and data services (S3, RDS, DynamoDB, Redshift, etc.) for analytics.
  • SageMaker provides more than 15 of the most widely used ML algorithms and also supports the creation of custom algorithms.
[Figure: SageMaker features]

To train models in SageMaker, you create a training job specifying the path to your training data in S3, the training script or built-in algorithm, and the EC2 instance type for training.

After training, the model artifacts are uploaded to S3. From these artifacts, a model can be created and deployed to EC2-backed containers with an endpoint configuration for prediction or inference.

What we will build

In this tutorial, we will create a machine learning model to predict the sentiment of a text.
The details of data processing and model building are explained in my previous tutorial. Here we will focus on training and deploying the model on Amazon SageMaker.
Optionally, this tutorial is accompanied by a complete notebook that you can upload to your SageMaker notebook instance and run alongside it.

Because we are building a custom model, it is much more convenient to use the SageMaker Python SDK to train and deploy it.
The same tasks can be performed through the SageMaker web console, particularly when using built-in algorithms.

Steps:

  • Step 1: Create an Amazon S3 bucket
  • Step 2: Create an Amazon SageMaker notebook instance
  • Step 3: Create a Jupyter notebook
  • Step 4: Download, explore, and transform the training data (see the previous tutorial)
  • Step 5: Train a model
  • Step 6: Deploy the model to Amazon SageMaker
  • Step 7: Validate the model
  • Step 8: Integrate the Amazon SageMaker endpoint into an internet-facing application
  • Step 9: Clean up

First, we create an S3 bucket. This is where we will store the training data and where the model artifacts will be saved later.
Create a bucket named tensorflow-sentiment-analysis (S3 bucket names must be globally unique and cannot contain underscores, so adjust the name as needed).
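You can create the bucket in the S3 console, or programmatically. A minimal sketch with boto3, assuming the bucket name above and the us-east-1 region (other regions require a CreateBucketConfiguration with a LocationConstraint):

import boto3

# Assumed bucket name and region; adjust to your account.
s3 = boto3.client('s3', region_name='us-east-1')
s3.create_bucket(Bucket='tensorflow-sentiment-analysis')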

Create an Amazon SageMaker notebook instance:

Go to SageMaker in the AWS console; in the left pane, click Notebook instances (1) and then click Create notebook instance (2).

On the next page, enter a name for the notebook; any name of your choice will work. You can leave the rest at the defaults for the purposes of this tutorial. After that, click Create notebook instance.

[Figure: creating a notebook instance]

The notebook instance will be created; its status will be Pending for a short time and then change to InService. At this stage, you can click Open Jupyter or Open JupyterLab. The two differ only in their user interface.
I prefer JupyterLab because it has a file explorer, supports multiple tabs for open files, and feels more like an IDE.

[Figure: notebook status changes from Pending to InService]

Download, explore and transform training data

Download the dataset and upload it to your notebook instance. See this tutorial for an explanation of data exploration and transformation.

The transformed data is then uploaded to S3.

Before you can use the SageMaker SDK API, you need to create a session,
then call its upload_data method with the name of the data file and a key_prefix, which is the path inside the S3 bucket.
This returns the full S3 path of the uploaded data file, which you can print to verify.
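A minimal sketch of this step, assuming the transformed data was saved locally as train.csv and using an assumed key prefix:

import sagemaker

sess = sagemaker.Session()
# Upload the local file to S3; key_prefix is the path inside the bucket.
training_data_s3_path = sess.upload_data(path='train.csv', key_prefix='sentiment/data')
print(training_data_s3_path)  # e.g. s3://<bucket>/sentiment/data/train.csv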

Training the model

To train a TensorFlow model, you use the TensorFlow estimator from the SageMaker SDK.

The TensorFlow estimator takes the following parameters (a code sketch follows the list):

entry_point: the script that defines and trains your model. This script will run in a container (more on this later).

role: the IAM role assigned to the running notebook. You get it by running role = sagemaker.get_execution_role().

train_instance_count: the number of container instances that will be spun up to train the model.

train_instance_type: the type of container instance that will be used to train the model.

framework_version: the version of TensorFlow used in the training script. You can get it by running tf_version = tf.__version__.

py_version: the Python version used.

script_mode: if set to True, the estimator will use script mode wrappers (default: False). This is ignored if py_version is set to 'py3'.
Script mode allows arbitrary training script code to be run in a container.

hyperparameters: the parameters required to run the training script.
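Putting these together, a minimal sketch of the estimator call (the instance type, framework version, and hyperparameter values here are assumptions; match framework_version to the TensorFlow version in your script):

import sagemaker
from sagemaker.tensorflow import TensorFlow

role = sagemaker.get_execution_role()

estimator = TensorFlow(entry_point='train.py',
                       role=role,
                       train_instance_count=1,        # one training container
                       train_instance_type='ml.m5.xlarge',
                       framework_version='1.15.2',    # assumed; use tf.__version__
                       py_version='py3',
                       script_mode=True,
                       hyperparameters={'epochs': 10,
                                        'batch-size': 100,
                                        'learning-rate': 0.01})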

Now that you know what each parameter means, let’s understand the content of the training script.

%%writefile train.py
import argparse
import os

import pandas as pd
import tensorflow as tf
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense
from tensorflow.keras.layers import Embedding, Dropout

if __name__ == '__main__':

    parser = argparse.ArgumentParser()

    # Hyperparameters sent by the client are passed as command-line
    # arguments to the script.
    parser.add_argument('--epochs', type=int, default=10)
    parser.add_argument('--batch-size', type=int, default=100)
    parser.add_argument('--learning-rate', type=float, default=0.1)
    parser.add_argument('--gpu-count', type=int,
                        default=os.environ['SM_NUM_GPUS'])

    # Input data and model directories provided by SageMaker.
    parser.add_argument('--model-dir', type=str,
                        default=os.environ['SM_MODEL_DIR'])
    parser.add_argument('--train', type=str,
                        default=os.environ['SM_CHANNEL_TRAIN'])
    args, _ = parser.parse_known_args()

    epochs = args.epochs
    lr = args.learning_rate
    batch_size = args.batch_size
    gpu_count = args.gpu_count
    model_dir = args.model_dir
    training_dir = args.train

    # Load the training data from the directory SageMaker copied it to.
    training_data = pd.read_csv(training_dir + '/train.csv', sep=',')
    tweet = training_data.text.values
    labels = training_data.airline_sentiment.values

    # Tokenize the tweets and pad them to a fixed length.
    num_of_words = 5000
    token = Tokenizer(num_words=num_of_words)
    token.fit_on_texts(tweet)

    vocab_size = len(token.word_index) + 1  # 1 is added due to 0 index

    tweet_sequence = token.texts_to_sequences(tweet)

    max_len = 200
    padded_tweet_sequence = pad_sequences(tweet_sequence, maxlen=max_len)

    # Build the model
    embedding_vector_length = 32
    model = Sequential()
    model.add(Embedding(vocab_size, embedding_vector_length,
                        input_length=max_len))
    model.add(Dropout(0.2))
    model.add(LSTM(100))
    model.add(Dropout(0.2))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer='adam',
                  metrics=['accuracy'])

    model.fit(padded_tweet_sequence, labels, validation_split=0.3,
              epochs=epochs, batch_size=batch_size, verbose=2)

    # Export the model in SavedModel format so TensorFlow Serving can host it.
    tf.saved_model.simple_save(
        tf.keras.backend.get_session(),
        os.path.join(model_dir, '1'),
        inputs={'inputs': model.input},
        outputs={t.name: t for t in model.outputs})

Because SageMaker imports your training script, you should put your training code under a main guard (if __name__ == '__main__':) so that SageMaker does not inadvertently run it at the wrong point in execution.

All hyperparameters are passed to the script as command-line arguments.
The training script also has access to environment variables set in the training container, such as the following:

  • SM_MODEL_DIR: a string representing the path where the training job writes the model artifacts. After training, the artifacts in this directory are uploaded to S3 for model hosting.
  • SM_NUM_GPUS: an integer representing the number of GPUs available to the host.
  • SM_CHANNEL_XXXX: a string representing the path to the directory containing the input data for the specified channel. For example, if you specify two input channels named 'train' and 'test' in the TensorFlow estimator's fit call, the environment variables SM_CHANNEL_TRAIN and SM_CHANNEL_TEST are set.

To start training, call the fit method, passing the training data path. This creates a training job in SageMaker; you can check the Training jobs section of the console to see the job that was created.
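A minimal sketch, using the S3 path returned by upload_data earlier:

# The 'train' channel name maps to the SM_CHANNEL_TRAIN environment variable.
estimator.fit({'train': training_data_s3_path})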

[Figure: starting the training job]
[Figure: training job in progress in the console]

If all goes well, you should see the output below in the last section of the output logs.

Deploy the model to Amazon SageMaker

To deploy, we call the deploy method on the estimator, passing the following parameters:

initial_instance_count: the initial number of inference instances to launch.
This can be scaled up if the request load increases.

instance_type: the instance type for the inference container.

endpoint_name: a unique name for the model endpoint.
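A minimal sketch (the instance type and endpoint name are assumptions):

predictor = estimator.deploy(initial_instance_count=1,
                             instance_type='ml.m5.large',
                             endpoint_name='sentiment-analysis-endpoint')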

Validating the model

After calling the deploy method, a predictor for the model endpoint is returned, and it can be used to validate the model with test data, as shown below.
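A sketch of such a check, assuming token and max_len are the tokenizer and sequence length from the data-preparation step:

from tensorflow.keras.preprocessing.sequence import pad_sequences

# Preprocess a sample tweet exactly as the training data was preprocessed.
test_tweet = ['the flight was delayed for five hours, terrible service']
test_sequence = pad_sequences(token.texts_to_sequences(test_tweet), maxlen=max_len)

result = predictor.predict(test_sequence)
print(result)  # a sentiment score between 0 and 1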

Integrating Amazon SageMaker endpoints into internet-facing applications

The end use of ML models is for applications to send them requests for inference or prediction. This can be achieved using Amazon API Gateway and an AWS Lambda function.

[Figure: architecture for application integration]

Applications make requests to the API endpoint, which triggers a Lambda function. The Lambda function preprocesses the data into the input the model expects, that is, it converts the text input into a numeric representation, and then sends it to the model for prediction.
The Lambda function receives the prediction result, which is returned through API Gateway to the user.
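A minimal sketch of such a Lambda handler, assuming the endpoint name used above and that the client sends already-tokenized sequences (the event shape and names are assumptions):

import json
import boto3

runtime = boto3.client('sagemaker-runtime')

def lambda_handler(event, context):
    # In a real application the raw text would be tokenized and padded here,
    # mirroring the preprocessing in the training script.
    payload = json.dumps({'instances': event['sequences']})
    response = runtime.invoke_endpoint(
        EndpointName='sentiment-analysis-endpoint',  # assumed endpoint name
        ContentType='application/json',
        Body=payload)
    result = json.loads(response['Body'].read())
    return {'statusCode': 200, 'body': json.dumps(result)}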

Clean up

Make sure to call predictor.delete_endpoint() to remove the model endpoint.
Then go ahead and delete the files that SageMaker uploaded to your S3 bucket.
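For example (the bucket name is the one assumed earlier):

import boto3

predictor.delete_endpoint()  # delete the hosted endpoint

# Empty the tutorial bucket so you are no longer billed for storage.
s3 = boto3.resource('s3')
s3.Bucket('tensorflow-sentiment-analysis').objects.all().delete()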

Conclusion

In this tutorial, you learned how to train and deploy deep learning models on Amazon SageMaker.