Author: Josh Patterson
Date: February 25th, 2019
This tutorial is on how to get Kubeflow 0.3.5 installed and working on GKE in under 30 minutes. For a review of why Kubeflow is compelling in the current machine learning infrastructure landscape, check out our report.
If you don't already have an account on Google Cloud Platform (GCloud), you can sign up for a free trial.
Before you install Google Cloud's SDK, make sure to upgrade Python to the latest version to avoid issues (e.g., SSL issues). After you have Python updated, take a look at the instructions for getting the GCloud SDK working.
For Mac OSX users, a simple way to do this is to use the interactive installer.
Once you have the GCloud SDK working, log into your GCloud account from the command line with the gcloud auth tool:
gcloud auth login
This will pop up some screens in your web browser asking for permission via OAuth for GCloud tools to access the Google Cloud Platform.
We need a project on Google Cloud Platform to organize our resources. To create a project on GCloud, follow the instructions in the video embedded below:
We also need to enable the Kubernetes Engine API for this project on GCP: https://console.cloud.google.com/apis/library/container.googleapis.com
Once you have the project created, check to make sure it shows up from the command line with the command:
gcloud projects list
This command lists all of the projects we have in our account on Google Cloud Platform. The output should be similar to below:
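Here is a hypothetical example of that output (the project name and number below are placeholders; your values will differ):

PROJECT_ID            NAME                  PROJECT_NUMBER
kubeflow-3-5-project  kubeflow-3-5-project  123456789012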
Now we need to create a Kubernetes cluster for our project on Google Cloud Platform. First, we set our current working project from the command line, and then we create the cluster:
PROJECT_ID=kubeflow-3-5-project
gcloud config set project $PROJECT_ID
gcloud container clusters create [your-cluster-name-here] \
--zone us-central1-a --machine-type n1-standard-2
Note that the name of the project and the project ID may not be exactly the same, so be careful. Most of the time we want to use the project ID when working from the command line. It may take 3-5 minutes for the system to complete the Kubernetes cluster setup on GCP.
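While you wait, an easy way to confirm the cluster comes up is the standard listing command:

gcloud container clusters list

The cluster's status should show as RUNNING once provisioning finishes.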
kubectl controls the Kubernetes cluster manager and is a command line interface for running commands against Kubernetes clusters. We use kubectl to deploy and manage applications on Kubernetes. Using kubectl, we can inspect cluster resources; create, update, and delete components; and view logs from running applications.
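A few everyday examples of what that looks like in practice (all standard kubectl subcommands; the pod name is a placeholder):

# List the nodes in the cluster
kubectl get nodes
# List pods across all namespaces
kubectl get pods --all-namespaces
# View the logs from a specific pod
kubectl logs [pod-name]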
An easy way to install kubectl on OSX is to use the brew command:
brew install kubernetes-cli
Once we have kubectl, we need permission for it to talk to our remote managed Kubernetes cluster on GCP. We get the credentials for our new Kubernetes cluster with the command:
gcloud container clusters get-credentials [your-cluster-name-here] --zone us-central1-a
This command writes a context entry into our local ~/.kube/config file so kubectl knows where to look for the current cluster we're working with. In some cases, you will be working with multiple clusters, and their context information will also be stored in this file.
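If you do end up with multiple clusters, these standard kubectl subcommands let you inspect and switch between contexts (the context name is a placeholder):

# Show all contexts kubectl knows about, with the current one marked
kubectl config get-contexts
# Switch kubectl to a different context
kubectl config use-context [context-name]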
Once we can connect to our Kubernetes cluster with kubectl, we can check the status of the running cluster with the command:
kubectl cluster-info
We should see output similar to below:
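Roughly like the following, at least for the GKE clusters we're working with here (the IP address is a placeholder, and your cluster may list a few additional add-on services):

Kubernetes master is running at https://[cluster-ip]
KubeDNS is running at https://[cluster-ip]/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy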
Ksonnet is a CLI-supported framework for extensible Kubernetes configurations. Ksonnet provides an organizational structure and specialized features for managing configurations across different clusters and environments. For this demo we'll use Ksonnet (the ks command) to install a specific version of Kubeflow (v0.3.5) and deploy it to our new managed GKE cluster. Let's take a quick look at some key terminology used with ksonnet applications.
A Ksonnet environment is a unique location to deploy our application to. It consists of a name, the address of the cluster's API server, a namespace within that cluster, and the version of the Kubernetes API the cluster runs.
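Once ksonnet is installed (below), we can list the environments defined for an application with the standard subcommand:

ks env list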
A Ksonnet prototype is an object that describes a set of Kubernetes resources in an abstract way. This object also includes associated parameters for these resources. An example of this is how Kubeflow has a prototype for tf-job-operator, as we'll see later in this article.
A Ksonnet component is a specific implementation of a prototype. We create a component by "filling in" the parameters of the prototype. A component can be deployed to a Kubernetes cluster and can also directly generate standard Kubernetes YAML files. Each environment may be customized with different parameters for a prototype.
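As a concrete example, once we've generated a component (as we do later in this article), the standard ks show subcommand renders it as plain Kubernetes YAML for a given environment (the component name below is a placeholder):

ks show default -c [component-name]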
To install Ksonnet on OSX, just use brew with the command:
brew install ks
This will get ksonnet installed locally; ksonnet will pull code from GitHub and use kubectl to deploy the application code (Kubeflow) to our GKE cluster.
We need to use a GitHub personal access token with ksonnet, otherwise we quickly run into GitHub API rate limits. We create a personal access token and use it in place of a password when ksonnet performs operations over HTTPS against GitHub's API. Once you have logged in and created the token, set it as an environment variable with the command:
export GITHUB_TOKEN=ece2d65f0070abf00283f000460fc10952a87a2
Now we are ready to use Ksonnet and deploy Kubeflow to our cluster.
Use Ksonnet's ks command to initialize your new Kubernetes application:
ks init [app-name]
cd [app-name]
The output on the screen should look similar to what you see below.
As we can see above, ksonnet found our GKE cluster context in our kubeconfig file and was able to configure our new ksonnet application to use it. Ksonnet also initialized our application directory with some template code. Now we need to customize this application code with the Kubeflow codebase from github.
At this point we need to add Kubeflow to our custom ksonnet project. To do this we add the Kubeflow repository to our project and then pull the individual Kubeflow packages into our local project. Specifically, we need to add Kubeflow 0.3.5 to the ksonnet registry so it knows where to look to download the code. We do this with the following command:
ks registry add kubeflow github.com/kubeflow/kubeflow/tree/v0.3.5/kubeflow
Next we want to install packages from this GitHub repository in our local ksonnet application. We want to work with a specific version of Kubeflow for this tutorial, so we'll specify the v0.3.5 version of Kubeflow with the command below.
ks pkg install kubeflow/core@v0.3.5
The console won't report much, but you should see the following output:
We should now be able to see the installed package in our ./app.yaml file and also in our ./vendor directory.
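A quick, optional sanity check from the shell (assuming the directory layout ks created above):

# The Kubeflow core package should now be vendored locally
ls ./vendor/kubeflow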
Now we need to generate the components for the application from the package we installed from the v0.3.5 Kubeflow codebase on github. We generate components with the following commands:
ks generate ambassador ambassador
ks generate jupyterhub jupyterhub
ks generate centraldashboard centraldashboard
ks generate tf-job-operator tf-job-operator
These components will give us a minimal install of Kubeflow. The console output for each of the commands above will look similar to what we see below.
Each of these components was installed in our custom Ksonnet application in our local directory.
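If you want to double-check, ksonnet can list the components registered in the local application:

ks component list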
Finally, we want to send these components to our Kubernetes cluster. We apply local Kubernetes manifests (components) to remote clusters with the following commands:
# Create all the core components
ks apply default -c ambassador
ks apply default -c jupyterhub
ks apply default -c centraldashboard
ks apply default -c tf-job-operator
For each of these commands, we will see output on the console similar to what we show below:
As we can see above, the Kubeflow components were installed on GKE remotely. We'll confirm those components are on the kubernetes cluster in a moment, but first we want to take a quick look at what was deployed.
Quick notes on each component are listed below.
The system also provides compelling flexibility in how data scientists work: they can use the library of their choice, in the language of their choice, in a notebook or outside of a notebook on Kubeflow. This quickly becomes a compelling offering, as a data scientist can now move a workload built in the language of their choice from their laptop to an on-premise enterprise cloud or to a public cloud and leverage more hardware. This fits the common pattern where machine learning jobs are prototyped on a user's laptop and then, once validated, moved to a more powerful system for training and model deployment.
At this point, we have installed the basic version of Kubeflow 0.3.5 on our remote GKE Kubernetes cluster.
To confirm our cluster is operational and the components are running, try the following command:
kubectl get services
We should see a list of components running that match the components we just installed on our cluster.
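If you want to go a step further and look at the Kubeflow central dashboard, one common approach is to port-forward to the ambassador service; this is a sketch that assumes the components were deployed into the default namespace and that local port 8080 is free:

# Forward local port 8080 to the ambassador service, then browse to http://localhost:8080
kubectl port-forward svc/ambassador 8080:80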