Reduce ML redundancy with Shopify open source Tangle



Shopify Open Source Tangle Accelerates Machine Learning with Reproducible Global Workflows

Shopify has open sourced its internal ML platform, Tangle, to help developers reduce redundant compute and build faster, more reproducible pipelines.

Shopify has open sourced Tangle, its internal machine learning experimentation platform, which is designed to cut iteration time, improve reproducibility, and accelerate development cycles. The platform grew out of the challenges faced by Shopify’s search and discovery team, which handles millions of products and billions of queries.

Before Tangle, engineers who wanted to recreate historical results often had to rebuild identical datasets and rerun lengthy preprocessing steps. According to Shopify, “Machine learning development shouldn’t work this way, but it does. 80% of development time is spent on data engineering, not algorithms.”

The platform has already saved more than a year of computing time internally. “The CPU time savings alone are ridiculous,” says Shopify CTO Mikhail Parakhin. When only one component changes, a 10-hour pipeline can now finish in just 20 minutes, because the outputs of the unchanged components are reused from the cache.

Tangle features a visual pipeline interface with content-based caching. Pipelines are constructed as directed acyclic graphs made up of components, which are language-independent units that wrap any CLI program. Components run in isolation within containers, ensuring deterministic behavior and automatic reuse of artifacts. Shopify says: “Tangle’s cache operates globally across all users. All three pipelines share artifacts, even between executions.”
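
The idea behind content-based caching can be illustrated with a minimal Python sketch. This is not Tangle's actual implementation or API: the names `cache_key`, `run_pipeline`, `CACHE`, and `RUNS` are hypothetical, the components are plain functions rather than containerized CLI programs, and the cache key is assumed to be a hash of a component's name, version, and the keys of its upstream components. That assumption only holds when components are deterministic.

```python
import hashlib
import json

# Hypothetical in-memory stand-in for a cache shared by all users and runs.
CACHE = {}
# Names of components that actually executed (kept only for illustration).
RUNS = []


def cache_key(name, version, input_keys):
    """Hash a component's identity together with the keys of its inputs."""
    payload = json.dumps(
        {"name": name, "version": version, "inputs": input_keys},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode()).hexdigest()


def run_pipeline(pipeline):
    """Run components in the given topological order, reusing cached outputs.

    `pipeline` is a list of (name, version, func, upstream_names) tuples.
    """
    keys, outputs = {}, {}
    for name, version, func, upstream in pipeline:
        key = cache_key(name, version, [keys[u] for u in upstream])
        if key not in CACHE:
            # Cache miss: execute the component and store its artifact.
            CACHE[key] = func(*[outputs[u] for u in upstream])
            RUNS.append(name)
        keys[name], outputs[name] = key, CACHE[key]
    return outputs


# Toy components standing in for real pipeline steps.
def load():
    return [3, 1, 2]


def clean(rows):
    return sorted(rows)


def train(rows):
    return sum(rows) / len(rows)


def train_v2(rows):
    return max(rows)


pipeline = [
    ("load", "1", load, []),
    ("clean", "1", clean, ["load"]),
    ("train", "1", train, ["clean"]),
]
first = run_pipeline(pipeline)  # cold cache: all three components run

# Change only the last component; upstream keys are unchanged.
pipeline[2] = ("train", "2", train_v2, ["clean"])
second = run_pipeline(pipeline)  # only "train" runs again

print(RUNS)  # ['load', 'clean', 'train', 'train']
```

Because a component's key depends on the keys of everything upstream of it, changing one component invalidates only that component and its descendants, which is why a rerun after a small change can skip most of the pipeline.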

The platform is language and environment agnostic, supporting Python, JavaScript, Rust, or any file-based workflow across cloud or on-premises setups. The visual editor shows real-time execution status, cached steps, logs, and performance insights, and it stores complete lineage to ensure reproducibility.

“Tangle is a key part of our Shopify data and ML system,” said Tobi Lütke, Shopify CEO. “It simplifies complex tasks, automatically avoids multiple attempts, and saves a tremendous amount of waste.”

By open sourcing Tangle, Shopify enables the broader developer community to reduce redundant compute, build reproducible ML workflows, integrate existing code without constraints, and promote best practices in machine learning engineering.




