Getting Started with AWS SageMaker
Overview
Most machine learning (ML) projects follow a workflow that involves generating sample data, training a model, and deploying the model.
These steps have subtasks and are iterative.
ML engineers and data scientists often need an environment in which they can experiment and prototype ideas quickly.
After prototyping, deploying and scaling machine learning models is a skill that few have mastered.
Ideally, ML engineers and data scientists could move from experimentation and prototyping to deploying scalable, production-ready ML models without tedious setup. This is where Amazon SageMaker comes in.
What is Amazon SageMaker?
SageMaker was created to provide a platform that supports the development and deployment of machine learning models.
Quoting from the official website:
Amazon SageMaker is a fully managed service that gives all developers and data scientists the ability to quickly create, train, and deploy machine learning (ML) models. SageMaker takes the heavy lifting out of every step of the machine learning process to facilitate high-quality model development.
Traditional ML development is a complex, expensive iterative process, and even more complicated because there are no built-in tools for the entire machine learning workflow. You need to tie together tools and workflows, which is time consuming and error prone. SageMaker solves this challenge by providing all the components used for machine learning in a single set of tools so that models get to production faster with much less effort and at a lower cost.
source: https://aws.amazon.com/sagemaker/
Amazon SageMaker Features:
- SageMaker provides customizable Amazon ML instances with a developer-friendly notebook environment preloaded with ML frameworks and libraries.
- Seamless integration with AWS storage services (S3, RDS, DynamoDB, Redshift, etc.) for analytics.
- SageMaker provides more than 15 of the most widely used ML algorithms and also supports the creation of custom algorithms.
To train a model in SageMaker, you create a training job that specifies the path to your training data in S3, the training script or built-in algorithm, and the EC2 instance type for the training container.
After training, the model artifacts are uploaded to S3. From these artifacts, a model can be created and deployed on EC2 instances, with an endpoint configuration, for prediction (inference).
What we will build
In this tutorial, we will create a machine learning model to predict the sentiment of a text.
The details of data processing and model building are explained in my previous tutorial. Here we will focus on training and deploying the model on Amazon SageMaker.
This tutorial is accompanied by a complete notebook that you can optionally upload to your SageMaker notebook instance and run alongside the tutorial.
Because we are building a custom model, it is much more convenient to use the SageMaker Python SDK to train and deploy it.
The same tasks can be performed in the SageMaker web user interface, mainly when using built-in algorithms.
Steps:
- Step 1: Create an Amazon S3 bucket
- Step 2: Create an Amazon SageMaker notebook instance
- Step 3: Create a Jupyter notebook
- Step 4: Download, explore, and transform the training data (see previous tutorial)
- Step 5: Train a model
- Step 6: Deploy the model to Amazon SageMaker
- Step 7: Validate the model
- Step 8: Integrate Amazon SageMaker endpoints into Internet-facing applications
- Step 9: Clean up
First, we create an S3 bucket. This is where we will store the training data and where the model artifacts will be saved later.
Create a bucket called tensorflow-sentiment-analysis (S3 bucket names may contain hyphens but not underscores).
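The bucket can also be created from code rather than from the console. The following is a minimal sketch, assuming boto3 is installed and AWS credentials are configured. The helper names and the region handling are illustrative and not part of the original tutorial. The name check is a simplified version of the S3 naming rules (lowercase letters, digits, hyphens, and dots only), which is why the sketch uses a hyphenated bucket name.

```python
import re

# Assumed name for illustration; bucket names must also be globally unique.
BUCKET_NAME = "tensorflow-sentiment-analysis"

# Simplified check of the main S3 naming rules: 3-63 characters, lowercase
# letters, digits, hyphens and dots, starting and ending with a letter or digit.
_BUCKET_RE = re.compile(r"^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$")


def is_valid_bucket_name(name: str) -> bool:
    return bool(_BUCKET_RE.match(name))


def create_bucket_kwargs(name: str, region: str) -> dict:
    """Build the arguments for boto3's S3 create_bucket call."""
    kwargs = {"Bucket": name}
    # us-east-1 is the default location and must not be passed explicitly.
    if region != "us-east-1":
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return kwargs


def create_bucket(name: str, region: str) -> None:
    """Create the bucket; needs AWS credentials, so it is not called here."""
    import boto3  # imported lazily so the helpers above work without it

    if not is_valid_bucket_name(name):
        raise ValueError(f"invalid S3 bucket name: {name!r}")
    boto3.client("s3", region_name=region).create_bucket(
        **create_bucket_kwargs(name, region)
    )
```

Calling create_bucket(BUCKET_NAME, "eu-west-1") from the notebook would then create the bucket in that region, provided the name is still available.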
Create an Amazon SageMaker notebook instance:
Go to SageMaker in the AWS console. In the left pane, click Notebook instances (1), and then click Create notebook instance (2).
On the next page, enter a name for the notebook; any name of your choice will work. You can leave the other settings at their defaults for the purposes of this tutorial. Then click Create notebook instance.
The notebook instance will be created. Its status will be Pending for a short time and will then change to InService. At this stage, you can click Open Jupyter or Open JupyterLab. The two differ only in their user interface.
I prefer JupyterLab because it has a file explorer, supports multiple tabs for open files, and feels more like an IDE.
Download, explore and transform training data
Download the dataset and upload it to your notebook instance. See the previous tutorial for an explanation of data exploration and transformation.
The transformed data is then saved to S3.
Before you can use the SageMaker SDK, you need to create a session, then call its upload_data method with the path of the data file and a key prefix, which is the path within the S3 bucket.
This returns the full S3 path of the data file, which you can print to verify the upload.
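The upload step might look like the sketch below. It assumes the sagemaker package is installed on the notebook instance. The file name train.csv matches the training script, but the bucket name and key prefix are placeholders, and the wrapper function is illustrative rather than the author's code.

```python
def upload_training_data(session, local_path, bucket, key_prefix):
    """Upload a local file to S3 and return its full S3 URI.

    `session` is expected to be a sagemaker.Session (or any object with a
    compatible upload_data method).
    """
    s3_uri = session.upload_data(path=local_path, bucket=bucket, key_prefix=key_prefix)
    print("Uploaded training data to", s3_uri)
    return s3_uri


def main():
    """Example call; needs the sagemaker package and AWS credentials."""
    import sagemaker

    session = sagemaker.Session()
    # Bucket name and key prefix are placeholders for illustration.
    return upload_training_data(
        session, "train.csv", "tensorflow-sentiment-analysis", "data"
    )
```

Keeping the session as a parameter makes the helper easy to exercise with a stand-in object before running it against a real bucket.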
Training the model
To train a TensorFlow model, you must use the TensorFlow estimator from the SageMaker SDK. It takes the following parameters:
entry_point: the script that defines and trains your model. This script will run in a container. (More on this later.)
role: the role assigned to the running notebook. You get it by running role = sagemaker.get_execution_role().
train_instance_count: the number of instances that will be launched to train the model.
train_instance_type: the type of instance that will be used to train the model.
framework_version: the version of TensorFlow used in the training script. You get it by running tf_version = tf.__version__.
py_version: the Python version used.
script_mode: if set to True, the estimator will use Script Mode wrappers (default: False). This will be ignored if py_version is set to 'py3'. Script mode allows arbitrary script code to be run in a container.
hyperparameters: the parameters required to run the training script.
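Put together, the estimator might be constructed as in the sketch below. It uses the parameter names listed above, which belong to version 1 of the SageMaker Python SDK; version 2 renamed train_instance_count and train_instance_type to instance_count and instance_type. The instance type and hyperparameter values are assumptions chosen for illustration.

```python
def build_estimator_kwargs(role, tf_version):
    """Collect the estimator arguments described above (SDK v1 names)."""
    return {
        "entry_point": "train.py",
        "role": role,
        "train_instance_count": 1,
        "train_instance_type": "ml.c4.xlarge",  # assumed instance type
        "framework_version": tf_version,
        "py_version": "py3",
        "script_mode": True,
        # Keys become command-line flags for train.py, e.g. --batch-size 100.
        "hyperparameters": {"epochs": 10, "batch-size": 100, "learning-rate": 0.01},
    }


def make_estimator(tf_version):
    """Build the estimator; needs the sagemaker package and an execution role."""
    import sagemaker
    from sagemaker.tensorflow import TensorFlow

    role = sagemaker.get_execution_role()
    return TensorFlow(**build_estimator_kwargs(role, tf_version))
```

Separating the argument dictionary from the constructor call makes it easy to inspect or adjust the configuration before a (billable) training job is started.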
Now that you know what each parameter means, let’s understand the content of the training script.
%%writefile train.py
import argparse
import os
import tensorflow as tf
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense
from tensorflow.keras.layers import Embedding, Dropout
import pandas as pd

if __name__ == '__main__':
    parser = argparse.ArgumentParser()

    # Hyperparameters sent by the client are passed as command-line arguments to the script.
    parser.add_argument('--epochs', type=int, default=10)
    parser.add_argument('--batch-size', type=int, default=100)
    parser.add_argument('--learning-rate', type=float, default=0.1)
    parser.add_argument('--gpu-count', type=int,
                        default=os.environ['SM_NUM_GPUS'])

    # Input data and model directories
    parser.add_argument('--model-dir', type=str,
                        default=os.environ['SM_MODEL_DIR'])
    parser.add_argument('--train', type=str,
                        default=os.environ['SM_CHANNEL_TRAIN'])

    args, _ = parser.parse_known_args()

    # Note: lr and gpu_count are parsed here but not used further in this script.
    epochs = args.epochs
    lr = args.learning_rate
    batch_size = args.batch_size
    gpu_count = args.gpu_count
    model_dir = args.model_dir
    training_dir = args.train

    # Load the training data from the 'train' channel directory
    training_data = pd.read_csv(training_dir + '/train.csv', sep=',')
    tweet = training_data.text.values
    labels = training_data.airline_sentiment.values

    # Tokenize the tweets and pad them to a fixed length
    num_of_words = 5000
    token = Tokenizer(num_words=num_of_words)
    token.fit_on_texts(tweet)
    vocab_size = len(token.word_index) + 1  # 1 is added due to 0 index
    tweet_sequence = token.texts_to_sequences(tweet)
    max_len = 200
    padded_tweet_sequence = pad_sequences(tweet_sequence,
                                          maxlen=max_len)

    # Build the model
    embedding_vector_length = 32
    model = Sequential()
    model.add(Embedding(vocab_size, embedding_vector_length,
                        input_length=max_len))
    model.add(Dropout(0.2))
    model.add(LSTM(100))
    model.add(Dropout(0.2))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer='adam',
                  metrics=['accuracy'])

    model.fit(padded_tweet_sequence, labels, validation_split=0.3,
              epochs=epochs, batch_size=batch_size, verbose=2)

    # Save the model in SavedModel format (TensorFlow 1.x API)
    tf.saved_model.simple_save(
        tf.keras.backend.get_session(),
        os.path.join(model_dir, '1'),
        inputs={'inputs': model.input},
        outputs={t.name: t for t in model.outputs})
Because SageMaker imports your training script, you must put your training code in a main guard (if __name__ == '__main__':) so that SageMaker does not inadvertently run your training code at the wrong point of execution.
All hyperparameters are passed to the script as command line arguments.
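To see how this works, the snippet below simulates the command line that the script receives, using the same flag names as the training script. The extra flag --some-other-flag is made up for the demonstration; it shows why parse_known_args is convenient, since unrecognized flags are returned separately instead of causing an error.

```python
import argparse


def parse_hyperparameters(argv):
    """Parse known hyperparameter flags and return (args, unknown_flags)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--epochs", type=int, default=10)
    parser.add_argument("--batch-size", type=int, default=100)
    parser.add_argument("--learning-rate", type=float, default=0.1)
    # parse_known_args tolerates extra flags instead of exiting with an error.
    return parser.parse_known_args(argv)


# Simulate the flags the training container would pass to the script.
args, unknown = parse_hyperparameters(
    ["--epochs", "5", "--batch-size", "64", "--some-other-flag", "x"]
)
print(args.epochs, args.batch_size, args.learning_rate)  # 5 64 0.1
print(unknown)  # ['--some-other-flag', 'x']
```

Note that argparse converts the hyphen in --batch-size to an underscore, so the value is read as args.batch_size.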
The training script also reads environment variables set in the training container, such as the following:
SM_MODEL_DIR: a string representing the path to which the training job writes the model artifacts. After training, the artifacts in this directory are uploaded to S3 for model hosting.
SM_NUM_GPUS: an integer representing the number of GPUs available to the host.
SM_CHANNEL_XXXX: a string representing the path to the directory that contains the input data for the specified channel. For example, if you specify two input channels named 'train' and 'test' in the TensorFlow estimator's fit call, the environment variables SM_CHANNEL_TRAIN and SM_CHANNEL_TEST are set.
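When testing the script outside a SageMaker container, these variables are not set, and os.environ['SM_MODEL_DIR'] raises a KeyError. One possible workaround, sketched below, is to read them with fallbacks. The fallback paths ./model and ./data are assumptions for local runs, and both helper functions are illustrative.

```python
import os


def read_sagemaker_env(environ=os.environ):
    """Read the SageMaker variables, with fallbacks for runs outside a container."""
    return {
        "model_dir": environ.get("SM_MODEL_DIR", "./model"),
        "train_dir": environ.get("SM_CHANNEL_TRAIN", "./data"),
        # Environment values are strings, so convert the GPU count explicitly.
        "gpu_count": int(environ.get("SM_NUM_GPUS", "0")),
    }


def channel_env_name(channel):
    """Name of the variable holding the data directory of an input channel."""
    return "SM_CHANNEL_" + channel.upper()
```

For example, channel_env_name("test") gives SM_CHANNEL_TEST, the variable that would be set for a channel named 'test'.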
To start training, call the fit method and pass it the path of the training data. This creates a training job in SageMaker. You can check the Training jobs section of the console to see the job that was created.
If all goes well, the last section of the output logs will report that the training job completed successfully.
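A sketch of the call is shown below, assuming an estimator configured as described earlier and the S3 path returned by the upload step. The wrapper function is illustrative. It passes the data as a dictionary so that the channel is named 'train', which is what makes SM_CHANNEL_TRAIN available to the training script.

```python
def start_training(estimator, train_s3_uri):
    """Start a training job with one input channel named 'train'."""
    # The channel name determines the environment variable the script reads:
    # 'train' becomes SM_CHANNEL_TRAIN inside the training container.
    inputs = {"train": train_s3_uri}
    estimator.fit(inputs)
    return inputs
```

The call blocks while the job runs and streams the training logs into the notebook.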
Deploy the model to Amazon SageMaker
To deploy the model, we call the deploy method on the estimator, passing the following parameters:
initial_instance_count: the initial number of inference instances to launch. This can be scaled up if the request load increases.
instance_type: the instance type for the inference container.
endpoint_name: a unique name for the model endpoint.
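The deployment call might look like the following sketch. The instance type ml.c5.xlarge is an assumed default, and the endpoint name is whatever unique name you choose; the wrapper function is illustrative.

```python
def deploy_model(estimator, endpoint_name, instance_type="ml.c5.xlarge"):
    """Deploy the trained model behind an endpoint and return the predictor."""
    predictor = estimator.deploy(
        initial_instance_count=1,  # start with a single inference instance
        instance_type=instance_type,
        endpoint_name=endpoint_name,
    )
    return predictor
```

Deployment typically takes several minutes, and the endpoint incurs charges for as long as it is running.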
Validating the model
After the deploy method is called, it returns the model endpoint, which can be used to validate the model with test data.
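Validation could be sketched as follows. The test tweets must first be converted to padded integer sequences in the same way as during training. The pad_pre helper reproduces the default behaviour of Keras pad_sequences in plain Python, and the {"instances": ...} request body follows the TensorFlow Serving convention. Both the helper names and the payload shape are assumptions, since the exact format depends on the serving container.

```python
MAX_LEN = 200  # must match max_len in the training script


def pad_pre(sequence, max_len=MAX_LEN):
    """Left-pad with zeros, or keep only the last max_len items.

    This mirrors the default behaviour of Keras pad_sequences
    (padding='pre', truncating='pre').
    """
    trimmed = list(sequence)[-max_len:]
    return [0] * (max_len - len(trimmed)) + trimmed


def build_payload(sequences, max_len=MAX_LEN):
    """Wrap padded sequences in a TensorFlow Serving style request body."""
    return {"instances": [pad_pre(seq, max_len) for seq in sequences]}


def validate(predictor, sequences):
    """Send tokenized test tweets to the endpoint and return its response."""
    return predictor.predict(build_payload(sequences))
```

The returned scores can then be compared with the known labels of the test tweets.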
Integrate Amazon SageMaker endpoints into Internet-facing applications
The end use of ML models is for applications to send them requests for inference (prediction). This can be achieved with Amazon API Gateway and an AWS Lambda function.
Applications make requests to the API endpoint, which triggers a Lambda function. The Lambda function preprocesses the data into the form the model expects as input, that is, it converts the text input to a numeric representation, and then sends it to the model for prediction.
The Lambda function receives the prediction result and returns it to API Gateway, which sends it to the user.
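A Lambda handler for this flow might look like the sketch below. It assumes an API Gateway proxy integration whose request body is JSON of the form {"text": "..."}. The endpoint name and the tiny word index are hypothetical; in practice the word index would be exported from the Tokenizer fitted during training. The whitespace tokenization is a simplification of what the Keras Tokenizer does, and the request body format is an assumption about the serving container.

```python
import json

ENDPOINT_NAME = "sentiment-endpoint"  # hypothetical endpoint name
MAX_LEN = 200  # must match max_len in the training script
# Hypothetical word index; in practice, export the one from the fitted Tokenizer.
WORD_INDEX = {"great": 1, "flight": 2, "delayed": 3}


def text_to_sequence(text, word_index=WORD_INDEX, max_len=MAX_LEN):
    """Map known words to their ids and left-pad with zeros to max_len."""
    ids = [word_index[w] for w in text.lower().split() if w in word_index]
    ids = ids[-max_len:]
    return [0] * (max_len - len(ids)) + ids


def lambda_handler(event, context, runtime=None):
    """Handle an API Gateway proxy request of the form {"text": "..."}."""
    if runtime is None:
        import boto3  # available by default in the AWS Lambda Python runtime

        runtime = boto3.client("sagemaker-runtime")
    text = json.loads(event["body"])["text"]
    payload = {"instances": [text_to_sequence(text)]}
    response = runtime.invoke_endpoint(
        EndpointName=ENDPOINT_NAME,
        ContentType="application/json",
        Body=json.dumps(payload),
    )
    prediction = json.loads(response["Body"].read())
    return {"statusCode": 200, "body": json.dumps(prediction)}
```

The optional runtime parameter exists only so the handler can be tested with a stand-in client; Lambda itself calls the handler with event and context alone.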
Clean up
Make sure to call end_point.delete_endpoint() to delete the model endpoint.
Then delete the files that SageMaker uploaded to your S3 bucket.
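Both clean-up steps can be scripted. The sketch below assumes that predictor is the object returned by deploy and that bucket is a boto3 S3 Bucket resource, for example boto3.resource("s3").Bucket("your-bucket-name"). The function name and the key prefix argument are illustrative.

```python
def clean_up(predictor, bucket, key_prefix):
    """Delete the endpoint, then every object under key_prefix in the bucket."""
    # Stops the inference instances so they no longer incur charges.
    predictor.delete_endpoint()
    # Removes the uploaded files whose keys start with key_prefix.
    bucket.objects.filter(Prefix=key_prefix).delete()
```

Check the prefix carefully before running this, because the deletion of the S3 objects cannot be undone unless bucket versioning is enabled.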
Conclusion
In this tutorial, you learned how to train and deploy deep learning models in Amazon SageMaker.