Set up a machine learning pipeline with this Kubeflow tutorial

You don’t have to use Kubernetes to power your machine learning deployment. But if you do, Kubeflow is the easiest and fastest way to get your machine learning workloads up and running on Kubernetes.

Kubeflow is an open source tool that streamlines the deployment of machine learning workflows on Kubernetes. Kubeflow’s main purpose is to simplify environment setup for building, testing, training, and operating machine learning models and applications for data science and MLOps teams.

While it is possible to deploy machine learning tools such as TensorFlow and PyTorch directly into a Kubernetes cluster without using Kubeflow, Kubeflow automates much of the process required to get these tools up and running. To decide whether it's the right choice for your machine learning project, learn how Kubeflow works, when to use it, and how to install it to deploy your machine learning pipelines.

Pros and Cons of Kubernetes and Kubeflow for Machine Learning

Before deciding whether to use Kubeflow in particular, it’s important to understand the pros and cons of running AI and machine learning workflows on Kubernetes in general.

Should you run your machine learning models on Kubernetes?

Kubernetes has several advantages as a platform for hosting machine learning workflows.

The first is scalability. Kubernetes makes it easy to add or remove nodes from a cluster to change the total resources available to that cluster. This is especially beneficial for machine learning workloads whose resource consumption requirements can fluctuate significantly. For example, you can scale your cluster up during training of a model that is typically resource-intensive, and then scale it down to reduce infrastructure costs after training is complete.
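As a concrete sketch of this scale-up/scale-down cycle using K3s (the server URL, token path, and node name below are placeholders; substitute values from your own cluster):

```shell
# Join an extra worker node before a resource-intensive training run.
# K3S_URL points at your existing K3s server; the token lives at
# /var/lib/rancher/k3s/server/node-token on that server.
curl -sfL https://get.k3s.io | K3S_URL=https://my-k3s-server:6443 \
    K3S_TOKEN=<node-token> sh -

# After training completes, drain and remove the node to cut costs.
kubectl drain my-worker-node --ignore-daemonsets --delete-emptydir-data
kubectl delete node my-worker-node
```

On managed Kubernetes services, the same effect is usually achieved by resizing the node pool or enabling the cluster autoscaler instead of joining nodes by hand.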

[Figure: Machine learning project steps: identify a business problem; lay out processes and gather information from experts; select and prepare data; select and tune algorithms; retune based on results.]
Tools like Kubeflow can speed deployment of machine learning projects by standardizing and streamlining the stages of the model development lifecycle.

Hosting your machine learning workflows on Kubernetes also gives your containers access to bare metal hardware. This helps accelerate workload performance using GPUs and other hardware that is not accessible from virtualized infrastructure. You can also reach bare metal infrastructure without Kubernetes by running workloads in standalone containers, but orchestrating those containers with Kubernetes makes it easier to manage workloads at scale.

However, the main reason you don’t want to use Kubernetes to host your machine learning workflow is that it adds another layer of complexity to your software stack. For smaller workloads, a Kubernetes-based deployment may be overkill. In these situations, it makes more sense to run the workload directly on a VM or bare metal server.

When should you choose Kubeflow?

A key advantage of using Kubeflow for machine learning is the tool’s quick and simple deployment process. With just a few kubectl commands, you’ll have a ready-to-use environment to start deploying your machine learning workflows.

Kubeflow, on the other hand, is limited to the tools and frameworks it supports, and may end up including resources you don’t use. If you only need one or two specific machine learning tools, it may be easier to deploy them separately than using Kubeflow. But for those who want a general-purpose machine learning environment on Kubernetes, it’s hard to argue against using Kubeflow.

Kubeflow Tutorial: Installation and Setup

For most Kubernetes distributions, installing Kubeflow is as simple as running a few commands.

This tutorial demonstrates the process using K3s, a lightweight Kubernetes distribution that you can run on your laptop or PC, but you should be able to follow the same steps on any mainstream Kubernetes platform.

Step 1. Create a Kubernetes cluster

Start by creating a Kubernetes cluster if you don’t already have one.

To set up a cluster using K3s, first download and install K3s with the following command:

curl -sfL https://get.k3s.io | sh -

Then run the following command to start the cluster.

sudo k3s server &

To check that everything is running as expected, run the following command:

sudo k3s kubectl get node

The output looks like this:

NAME            STATUS   ROLES                  AGE    VERSION
chris-gazelle   Ready    control-plane,master   2m7s   v1.25.7+k3s1

Step 2. Install Kubeflow

With the cluster up and running, the next step is to install Kubeflow.

To do this on your local machine with K3s, use the following command:

sudo -s
export PIPELINE_VERSION=1.8.5
kubectl apply -k "github.com/kubeflow/pipelines/manifests/kustomize/cluster-scoped-resources?ref=$PIPELINE_VERSION"
kubectl wait --for condition=established --timeout=60s crd/applications.app.k8s.io
kubectl apply -k "github.com/kubeflow/pipelines/manifests/kustomize/env/platform-agnostic-pns?ref=$PIPELINE_VERSION"

When installing Kubeflow in a non-local Kubernetes cluster, the following commands work in most cases.

export PIPELINE_VERSION=<kfp-version-between-0.2.0-and-0.3.0>
kubectl apply -k "github.com/kubeflow/pipelines/manifests/kustomize/base/crds?ref=$PIPELINE_VERSION"
kubectl wait --for condition=established --timeout=60s crd/applications.app.k8s.io
kubectl apply -k "github.com/kubeflow/pipelines/manifests/kustomize/env/dev?ref=$PIPELINE_VERSION"

Step 3. Make sure the container is running

Even after installing Kubeflow, it is not fully functional until all the containers that make up Kubeflow are running. Check the status of the container with the following command:

kubectl get pods -n kubeflow

If the container does not run successfully after a few minutes, examine the logs to determine the cause.
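A few commands along these lines can help pinpoint a stuck pod (the pod name `ml-pipeline-xxxxx` is a placeholder; substitute a real name from the `kubectl get pods` output):

```shell
# List pods that are not yet Running to spot the stragglers.
kubectl get pods -n kubeflow --field-selector=status.phase!=Running

# Describe a stuck pod to see scheduling events and image-pull errors.
kubectl describe pod ml-pipeline-xxxxx -n kubeflow

# Inspect the container logs for runtime errors.
kubectl logs ml-pipeline-xxxxx -n kubeflow
```

Image pulls are a common culprit on first install, since Kubeflow's containers are large; pods in `ContainerCreating` for several minutes often just need more time.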

Step 4. Start using Kubeflow

Kubeflow provides a web-based dashboard for creating and deploying pipelines. To access that dashboard, first make sure your port forwarding is configured correctly by running the following command:

kubectl port-forward -n kubeflow svc/ml-pipeline-ui 8080:80

If you’re running Kubeflow locally, you can access the dashboard by opening http://localhost:8080 in your web browser. If you installed Kubeflow on a remote machine, replace localhost with the IP address or hostname of the server running Kubeflow.
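When Kubeflow runs on a remote server, one option (a sketch; it assumes you can reach port 8080 on that server and that its firewall allows the connection) is to bind the port-forward to all interfaces so other machines can connect:

```shell
# Bind the forwarded port on every interface of the remote host, not just
# loopback, so the dashboard is reachable from your workstation.
kubectl port-forward -n kubeflow svc/ml-pipeline-ui 8080:80 --address 0.0.0.0
```

You can then browse to the server's address on port 8080. For anything beyond a quick test, prefer exposing the dashboard through an Ingress or a TLS-terminating proxy rather than a long-lived port-forward.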
