Flyte: an open-source orchestrator for ML/AI workflows



Do artificial intelligence and machine learning data require their own workflow and orchestration system, separate from the ones built for software? Union.ai says they do, and it offers an open source solution called Flyte that provides workflow orchestration suited to the unique demands of that data.

“The most common feedback I get from people using orchestrators for machine learning is that they are not built for AI workflows, machine learning workflows, because they are forced to write YAML code and forced to understand Docker files,” Martin Stein, chief marketing officer and head of developer relations at Union.ai, told The New Stack. In other words, he said, machine learning engineers, data scientists and researchers end up having to do work they don’t normally do.

Basically, with Flyte, developers write code and run it locally or remotely, he added.

Niels Bantilan, Chief Machine Learning (ML) Engineer at Union.ai said: “In our opinion, the main difference between software and machine learning is that software is stateless. … Data and models, on the other hand, are constantly changing.”

Flyte as a machine learning orchestrator

“What is orchestration?” Bantilan asked rhetorically. “To borrow a musical analogy, the orchestra conductor is essentially the central point of coordination, telling each section how and when to play each instrument and what dynamics to play. A software orchestrator, a workflow orchestrator, orchestrates when certain computations are done, where certain data is pulled from and where it is pushed, essentially orchestrating this whole system to achieve the desired behavior. They are very similar at the level of abstraction that enables …”
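Bantilan's description of an orchestrator deciding when computations run and where data flows can be illustrated with a small sketch in plain Python. This is not Flyte code: the step functions, the `STEPS` table and the `run` helper are hypothetical names invented for this example, and the only library used is Python's standard `graphlib` module.

```python
from graphlib import TopologicalSorter

# Three hypothetical pipeline steps. Each one only knows its own job.
def load():
    return [3, 1, 2]

def clean(rows):
    return sorted(rows)

def summarize(rows):
    return sum(rows) / len(rows)

# The "score": each step name maps to its function and the steps it depends on.
STEPS = {
    "load": (load, []),
    "clean": (clean, ["load"]),
    "summarize": (summarize, ["clean"]),
}

def run(steps):
    """Run every step once its dependencies are done, passing outputs along."""
    graph = {name: set(deps) for name, (_, deps) in steps.items()}
    results = {}
    order = []
    for name in TopologicalSorter(graph).static_order():
        fn, deps = steps[name]
        # Pull each input from the step that produced it.
        results[name] = fn(*(results[dep] for dep in deps))
        order.append(name)
    return order, results

order, results = run(STEPS)
print(order, results["summarize"])
```

The steps never call one another directly. The `run` function plays the conductor: it decides the order and moves each result to the step that needs it.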

He argued that Flyte solves these problems with a wealth of tools.

In Union’s view, a data and artificial intelligence orchestrator should deliver the following capabilities:

  • Management and security: RBAC, data ownership, multi-tenancy, scheduling
  • Monitoring and visualization: data lineage, data visualization, workflow visualization, task-level observability
  • Performance and accuracy: strongly typed interfaces, GPU acceleration, parallelism, signaling
  • Workflow efficiency: intra-task checkpoints, failure recovery, rerunning a single task, caching, spot/preemptible instances, timeouts, dynamic resource allocation, notifications
  • Flexibility: in-task checkpoints, version control, dependency isolation, multi-cloud support

Not coincidentally, since Union came up with the list, Flyte addresses each of these bullet points.

“Most orchestrators don’t do the same thing that Flyte does,” Stein explained. “Flyte is one of the few orchestrators that really goes beyond what data-only orchestrators like Airflow do. It’s not built for this. It’s built for data pipelines.”

Competitor Amazon Web Services’ SageMaker has the same problem, he added.

Flyte works by automating hard infrastructure challenges.

“Things like parallelism and GPUs, you don’t have to write Flyte-specific functions,” Stein said. “This is very important because Flyte does this automatically under the hood, so you don’t have to put ‘Flyte, please run in parallel, yada, yada, yada’ in your Python code, and the decorator is actually where you adjust how much machine you want or how much compute you need at the task level.”
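The pattern Stein describes, where a decorator carries the compute request and the function body stays ordinary Python, can be sketched as follows. This is a stdlib-only illustration and not Flyte's API: the `task` decorator, the `Resources` class and the `REGISTRY` dictionary are hypothetical stand-ins written for this example.

```python
from dataclasses import dataclass
from functools import wraps
from typing import Dict

@dataclass(frozen=True)
class Resources:
    """Hypothetical per-task compute request."""
    cpu: str = "1"
    mem: str = "1Gi"
    gpu: str = "0"

# What a scheduler would read to decide where each task should run.
REGISTRY: Dict[str, Resources] = {}

def task(requests: Resources = Resources()):
    """Attach a resource request to a function without changing its body."""
    def decorator(fn):
        REGISTRY[fn.__name__] = requests

        @wraps(fn)
        def wrapper(*args, **kwargs):
            return fn(*args, **kwargs)

        wrapper.requests = requests
        return wrapper
    return decorator

@task(requests=Resources(cpu="2", mem="4Gi", gpu="1"))
def train(epochs: int) -> float:
    # Plain Python: no infrastructure calls inside the function.
    return 1.0 / (1 + epochs)

@task()
def report(loss: float) -> str:
    return f"final loss: {loss:.3f}"

print(report(train(3)))
print(REGISTRY["train"])
```

The functions still run locally as normal Python, while the resource request lives only in the decorator arguments, which is the separation the quote points to.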

The cloud-native orchestration platform is built on top of Kubernetes, so you’ll need the help of a Kubernetes engineer if you want to run Flyte on your own.

Not an MLOps tool, but…

Flyte is often mistaken for an MLOps tool, Stein said. It is not.

“We run MLOps on top of Flyte, so you can bring in Weights & Biases, whylogs, whatever you want. Basically, you hook these things together and it works perfectly. That’s exactly the power of an orchestrator,” Stein said.

But Flyte does let you see a complete machine learning workflow, in which one workflow connects to another, he added. That allows data scientists to see what is going on from start to finish, Stein said.

“You have a data team, a classification model team, a predictive model team, and they all use the same platform, Flyte. They can all work together in the same workspace, but they still don’t step on each other,” Bantilan said.

The open source tool also boasts 50 integrations, including Databricks, Anyscale and Ray. But it does so without direct access to your data, Stein added. It is also SOC 2 compliant, he said.

“We don’t have access to your data, which is really the most important thing,” says Stein.
