Author: Josh Patterson
Date: April 19th, 2019
This tutorial covers the basics of how to use TensorFlow's Estimator API to write modeling code that will run in a consistent fashion across multiple execution modes. In a previous article we looked at how to run a pre-built TensorFlow program in distributed mode on Kubeflow. However, the TensorFlow code itself was rather complex, as it had the user dealing with all sorts of concerns beyond the model training itself. In this tutorial the reader will learn:
- The basics of the TensorFlow Estimator API
- How to configure a training run with the RunConfig, TrainSpec, and EvalSpec classes
- How to train and evaluate a model with tf.estimator.train_and_evaluate
For newer readers who aren't familiar with the landscape of machine learning tooling, we'll start off by defining TensorFlow:
"TensorFlow is an end-to-end open source platform for machine learning. It has a comprehensive, flexible ecosystem of tools, libraries and community resources that lets researchers push the state-of-the-art in ML and developers easily build and deploy ML powered applications."
Building on that definition, the TensorFlow Estimator API is a high-level TensorFlow API that makes machine learning programming easier when dealing with different execution modes (e.g., "local", "distributed"). Historically, TensorFlow code has involved a lot of low-level details, such as placing specific operations on specific GPUs. Estimators make it easier for data scientists to share model implementations with one another. Another aspect of Estimators is that they build the TensorFlow graph for you, and there is no explicit Session to manage. Many data scientists do not want to deal with these types of details, so Estimators make things considerably simpler. Ultimately, the user gets consistent results regardless of whether they are executing locally or in the cloud in distributed mode.
Estimators provide a standard way to deal with the following actions: training, evaluation, prediction, and export for serving.
There are many pre-built Estimators already in the TensorFlow library, but you may write your own custom Estimator as well. Any Estimator, whether built-in or one we create ourselves, will be based on the tf.estimator.Estimator class.
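To make the custom route concrete, below is a minimal sketch of a custom Estimator built around a model_fn. The feature column and the single dense layer standing in for real model logic are illustrative assumptions, not code from this article's example:

import tensorflow as tf

# Illustrative feature column; a real model defines one per input feature.
feature_columns = [tf.feature_column.numeric_column('x', shape=[4])]

def my_model_fn(features, labels, mode):
    # Convert the raw feature dict into a dense input tensor.
    net = tf.feature_column.input_layer(features, feature_columns)
    # A single dense layer stands in for real model logic.
    logits = tf.layers.dense(net, units=3)
    predicted_classes = tf.argmax(logits, axis=1)
    if mode == tf.estimator.ModeKeys.PREDICT:
        return tf.estimator.EstimatorSpec(
            mode, predictions={'class_ids': predicted_classes})
    loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)
    if mode == tf.estimator.ModeKeys.EVAL:
        return tf.estimator.EstimatorSpec(mode, loss=loss)
    optimizer = tf.train.AdagradOptimizer(learning_rate=0.1)
    train_op = optimizer.minimize(loss, global_step=tf.train.get_global_step())
    return tf.estimator.EstimatorSpec(mode, loss=loss, train_op=train_op)

estimator = tf.estimator.Estimator(model_fn=my_model_fn)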
TensorFlow now supports converting any Keras model into an Estimator, speeding up your model development. This is done by defining a model with tf.keras.Model and then converting the model to a tf.estimator.Estimator object with the tf.keras.estimator.model_to_estimator() method. Once we've got an Estimator representation of the Keras model, we can train the model in the same way we'd train any Estimator model.
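As a quick hedged sketch of that conversion (the Keras model here is a made-up toy, not the article's model):

import tensorflow as tf

# A toy compiled Keras model; any compiled tf.keras model will work.
model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(10, activation='relu', input_shape=(4,)),
    tf.keras.layers.Dense(3, activation='softmax')
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Convert the compiled Keras model into an Estimator we can train
# with the same TrainSpec/EvalSpec machinery shown later.
keras_estimator = tf.keras.estimator.model_to_estimator(keras_model=model)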
Previously we might have used the Experiment class for building TensorFlow training code, but at this point the Experiment class has been marked as deprecated. The Estimator class should now be used directly where we'd previously have used the Experiment class, and it appears to be a better design pattern as well.
The primary steps necessary to write TensorFlow training code with Estimators are:
1. Set up the run configuration with the RunConfig class
2. Instantiate an Estimator (pre-built or custom)
3. Define the training and evaluation specifications (TrainSpec and EvalSpec, respectively) to be passed to tf.estimator.train_and_evaluate
The EvalSpec can also include information on how to export your trained model for prediction (serving), as sketched below.
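Here is a hedged sketch of that export hook; the serving input function, feature names, and exporter name below are illustrative, and the stub input function stands in for the real eval_input_fn built later in this article:

import tensorflow as tf

def serving_input_receiver_fn():
    # Placeholders describing the features the served model will receive;
    # the names mirror the Iris columns but are illustrative here.
    features = {
        'SepalLength': tf.placeholder(tf.float32, [None]),
        'SepalWidth': tf.placeholder(tf.float32, [None]),
        'PetalLength': tf.placeholder(tf.float32, [None]),
        'PetalWidth': tf.placeholder(tf.float32, [None]),
    }
    return tf.estimator.export.ServingInputReceiver(features, features)

def eval_input_fn():
    # Stand-in; the real eval_input_fn is built later in this article.
    raise NotImplementedError

# Export the latest checkpoint as a SavedModel after each evaluation.
exporter = tf.estimator.LatestExporter('iris-serving', serving_input_receiver_fn)
eval_spec = tf.estimator.EvalSpec(input_fn=eval_input_fn,
                                  exporters=[exporter])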
The train_and_evaluate(...) method provides a consistent interface for training locally or in the cloud, non-distributed or distributed. Check out the TensorFlow documentation for more details. In the next section we show an example of the Estimator API in practice.
We include below a basic TensorFlow Estimator API example code listing. This Estimator API example models the canonical Iris dataset. While this example is not a complex deep learning model, the Iris dataset is simple and well-understood, so the example below lets us see the Estimator API in action without the distractions of a more complex model.
In the sections below, we provide commentary on the following areas of the code from above:
- The RunConfig class
- The Estimator, via the DNNClassifier class
- The TrainSpec class
- The EvalSpec class
We use the included iris_data.py util functions to download and load the Iris dataset locally, as seen in the code snippet below (from the program listing above):
# Fetch the data
(train_x, train_y), (test_x, test_y) = iris_data.load_data()
# Feature columns describe how to use the input.
my_feature_columns = []
for key in train_x.keys():
    my_feature_columns.append(tf.feature_column.numeric_column(key=key))
The iris_data.py utilities do a few things under the hood: they download the training and test CSV files, parse them into pandas DataFrames, and split the label column off from the feature columns.
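As a sketch, a load_data() helper like this typically looks like the following; the URLs and column names match the canonical TensorFlow Iris example, but treat the details as illustrative rather than the exact contents of iris_data.py:

import pandas as pd
import tensorflow as tf

TRAIN_URL = "http://download.tensorflow.org/data/iris_training.csv"
TEST_URL = "http://download.tensorflow.org/data/iris_test.csv"
CSV_COLUMN_NAMES = ['SepalLength', 'SepalWidth',
                    'PetalLength', 'PetalWidth', 'Species']

def load_data(label_name='Species'):
    # Download (and locally cache) the training and test CSV files.
    train_path = tf.keras.utils.get_file(TRAIN_URL.split('/')[-1], TRAIN_URL)
    test_path = tf.keras.utils.get_file(TEST_URL.split('/')[-1], TEST_URL)
    # Parse the CSVs into pandas DataFrames, skipping the header row.
    train = pd.read_csv(train_path, names=CSV_COLUMN_NAMES, header=0)
    test = pd.read_csv(test_path, names=CSV_COLUMN_NAMES, header=0)
    # Split the label column off from the feature columns.
    train_x, train_y = train, train.pop(label_name)
    test_x, test_y = test, test.pop(label_name)
    return (train_x, train_y), (test_x, test_y)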
Next, let's configure our training run with the RunConfig class. The RunConfig class is part of the Estimator API in TensorFlow, and it specifies the configuration for an Estimator run. We can see the RunConfig class in action in our code in the snippet below:
config = tf.estimator.RunConfig(
    model_dir="/tmp/tf_estimator_iris_model",
    save_summary_steps=1,
    save_checkpoints_steps=100,
    keep_checkpoint_max=3,
    log_step_count_steps=10)
This configuration does a few things for us, but primarily sets things like the directory where we'll save our model checkpoints (e.g., model_dir="/tmp/tf_estimator_iris_model") and how often to checkpoint our model (e.g., save_checkpoints_steps=100). All of the properties for the RunConfig class are listed in the documentation; we show its init function here:
__init__(
    model_dir=None,
    tf_random_seed=None,
    save_summary_steps=100,
    save_checkpoints_steps=_USE_DEFAULT,
    save_checkpoints_secs=_USE_DEFAULT,
    session_config=None,
    keep_checkpoint_max=5,
    keep_checkpoint_every_n_hours=10000,
    log_step_count_steps=100,
    train_distribute=None,
    device_fn=None,
    protocol=None,
    eval_distribute=None,
    experimental_distribute=None
)
There are other properties that can be set for RunConfig, but we'll save those for a future article. Now that we've looked at how to configure our job, let's move on to how to create the Estimator itself.
For this example we're going to build a small multi-layer perceptron neural network to model our Iris dataset.
# Build 2 hidden layer DNN with 10, 10 units respectively.
estimator = tf.estimator.DNNClassifier(
    feature_columns=my_feature_columns,
    model_dir="/tmp/tf_estimator_iris_model",
    # Two hidden layers of 10 nodes each.
    hidden_units=[10, 10],
    # The model must choose between 3 classes.
    n_classes=3)
In the code section above, we can see the DNNClassifier class being instantiated, defining our feature columns, where we'll save the model (model_dir), and the size of the hidden layers (10 nodes per layer). Finally, we tell TensorFlow that we want the model to choose between 3 output classes.
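One hedged aside: rather than repeating model_dir here, the RunConfig we built earlier could be handed to the Estimator through its config parameter, for example:

# Same classifier, but reusing the RunConfig from earlier; the model
# directory then comes from the config rather than being repeated.
estimator = tf.estimator.DNNClassifier(
    feature_columns=my_feature_columns,
    hidden_units=[10, 10],
    n_classes=3,
    config=config)

Let's now move on to setting up the TrainSpec for the Estimator.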
train_input_fn = lambda: iris_data.train_input_fn(train_x, train_y,
                                                  opts.batch_size)
train_spec = tf.estimator.TrainSpec(input_fn=train_input_fn,
                                    max_steps=1000)
In the code section above we can see two lines. The first line builds a train_input_fn with the lambda keyword in Python, based on the train_input_fn(...) method in our iris_data.py utilities. The second line takes the train_input_fn we created and passes it to the TrainSpec class in the Estimator API. This class instance will be used by the Estimator when we train the model in a moment.
Oftentimes we will not have a pre-made input function for our Estimator. To write a custom input function for a TrainSpec, check out the TensorFlow documentation on datasets for Estimators; a minimal sketch follows.
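Here is a minimal sketch of such a custom input function, following the tf.data pattern from the TensorFlow documentation (the shuffle buffer size is an illustrative choice):

import tensorflow as tf

def train_input_fn(features, labels, batch_size):
    # Build a tf.data.Dataset from in-memory feature/label data.
    dataset = tf.data.Dataset.from_tensor_slices((dict(features), labels))
    # Shuffle, repeat indefinitely, and batch the examples for training.
    return dataset.shuffle(1000).repeat().batch(batch_size)

Now we'll move on to evaluation with Estimators.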
# Evaluate the model.
eval_input_fn = lambda: iris_data.eval_input_fn(test_x, test_y,
                                                opts.batch_size)
eval_spec = tf.estimator.EvalSpec(input_fn=eval_input_fn,
                                  steps=None,
                                  start_delay_secs=0,
                                  throttle_secs=60)
Similar to TrainSpec, EvalSpec uses the lambda keyword in Python to create an input function that is then used directly as a parameter to the EvalSpec class.
tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)
Lastly, we'll highlight how it all comes together for training an Estimator. We use the Estimator, TrainSpec, and EvalSpec as parameters to the tf.estimator.train_and_evaluate(...) method. Veterans of TensorFlow programming will instantly recognize that this function encapsulates a number of things for training runs and is far easier for data scientists to use consistently. Another interesting aspect of this method is that it can train TensorFlow models in a distributed fashion without changing the training code.
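In distributed mode, train_and_evaluate picks up each process's role from the TF_CONFIG environment variable. Here is a minimal sketch of setting it, with purely illustrative host names, ports, and cluster layout:

import json
import os

# Hypothetical two-node cluster; this process plays the worker role.
os.environ['TF_CONFIG'] = json.dumps({
    'cluster': {
        'chief': ['host1:2222'],
        'worker': ['host2:2222']
    },
    'task': {'type': 'worker', 'index': 0}
})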
To run this example application, first we need to pull the code down from GitHub:
git clone https://github.com/pattersonconsulting/tensorflow_estimator_examples.git
Next, change into the new project directory that was just created:
cd tensorflow_estimator_examples
The user needs to account for dependency management when running any Python program. The two common options are a Python virtual environment (e.g., virtualenv) or an Anaconda environment; an illustrative virtualenv setup is shown below.
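For example, one way to set up a virtualenv with TensorFlow (the environment name here is arbitrary):
virtualenv tf_estimator_env
source tf_estimator_env/bin/activate
pip install tensorflow==1.12.0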
Once you have at least TensorFlow 1.12.0 installed locally, you should be able to run the example with the command:
python tf_estimator_iris_single.py
The console output should look similar to the output shown below:
In this example we walked through the new Estimator API for TensorFlow and highlighted some of its core concepts. We hope the reader enjoyed the walkthrough. In future articles we'll take a look at how this example can be extended to distributed TensorFlow and then further executed on systems such as Kubeflow for on-premise/cloud/hybrid operations.
If you'd like further help with topics such as TensorFlow, distributed training, or running machine learning workloads on Kubeflow, reach out to us at Patterson Consulting.