Introducing CoreWeave Sandboxes to accelerate reinforcement learning, agent tool usage, and model evaluation

CoreWeave, Inc., the Essential Cloud for AI™, announced CoreWeave Sandboxes, an execution layer that gives AI researchers and platform teams a secure, isolated environment for reinforcement learning (RL), agent tool usage, and model evaluation. The new product is available on a customer’s own CoreWeave infrastructure or as a serverless runtime via Weights & Biases (W&B).


As AI systems evolve from producing output to performing actions, training them requires more than raw compute. Advanced AI workflows such as RL and evaluation need an isolated execution environment that can run code securely, maintain state across steps, and scale across concurrent workloads.

Furthermore, most organizations do not have a unified execution layer for RL, agent tool usage, and model evaluation. Instead, they rely on custom-built systems, loosely integrated tools, or third-party sandbox products that sit outside the core infrastructure. As scale, concurrency, and workflow complexity increase, those disconnected approaches become less reliable and harder to manage.

CoreWeave Sandboxes provide a unified execution layer through two access models: on-cluster, for platform teams that run training on CoreWeave Kubernetes Service (CKS), and serverless via W&B, for researchers and applied AI teams who need enterprise-grade isolation without the infrastructure overhead.

Designed for scale, simplicity and control
CoreWeave Sandboxes, currently available through the Cloud Console and Python SDK, can run directly within a customer’s CKS cluster, allowing teams to run RL, agent tool usage, and model evaluation workloads in parallel with AI jobs without adding a separate execution stack. The Python SDK, included at launch, lets teams create and manage isolated, secure environments that can handle complex multi-turn tasks and run multiple jobs concurrently. Built-in session management, storage integration, and monitoring tools enable teams to execute these workflows with low operational overhead.
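The fan-out pattern described above — many isolated jobs executed concurrently — can be sketched in plain Python. This is a generic illustration using isolated subprocesses, not the CoreWeave SDK; the actual `cwsandbox` API and its CKS integration may look entirely different.

```python
# Generic sketch of running many isolated jobs in parallel; this is NOT
# the CoreWeave SDK, just subprocesses fanned out over a thread pool.
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

def run_isolated(code: str, timeout: float = 30.0) -> str:
    """Execute a snippet in its own Python process so one job's failure
    or memory spike cannot take down the others."""
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True, text=True, timeout=timeout,
    )
    return proc.stdout.strip()

# Four independent evaluation jobs, run concurrently.
jobs = [f"print({n} * {n})" for n in range(4)]
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(run_isolated, jobs))
print(results)  # ['0', '1', '4', '9']
```

A real sandbox layer adds what this sketch lacks: container-level isolation, session state across steps, and scheduling across a cluster rather than a single machine.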

For teams without an existing CoreWeave cluster, or those looking to scale beyond their current compute, CoreWeave Sandboxes are also available as a serverless runtime through Weights & Biases. Researchers can authenticate with their existing W&B API key, install the Python client, and start running sandboxes in minutes without provisioning a cluster or making any infrastructure decisions. All sandboxes run in their own completely isolated virtual environment by default, so a failure, memory spike, or runaway process in one sandbox won’t affect the others. If something goes wrong, your team doesn’t have to hunt through disconnected systems to find out why: sandbox activity is captured directly into the same W&B execution view as training metrics, so debugging happens in context rather than across tools.
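The failure-isolation guarantee above can be illustrated with a minimal sketch. This stands in for the real product with bare subprocesses — CoreWeave Sandboxes run full isolated virtual environments — but it shows the core idea: a crash in one sandbox is contained, surfacing only as a failed result.

```python
# Hypothetical sketch of per-sandbox failure isolation; real CoreWeave
# Sandboxes use isolated virtual environments, not bare subprocesses.
import subprocess
import sys

def run_sandboxed(code: str) -> tuple[int, str]:
    """Run a snippet in its own process; return (exit_code, stdout)."""
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True, text=True,
    )
    return proc.returncode, proc.stdout.strip()

ok = run_sandboxed("print('healthy')")
bad = run_sandboxed("raise RuntimeError('runaway job')")

print(ok)      # (0, 'healthy')
print(bad[0])  # nonzero exit code: the failure stays in its own process
```

The caller keeps running regardless of what the sandboxed code does, which is exactly the property that lets one misbehaving RL rollout fail without disturbing the thousands running beside it.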

“CoreWeave Sandboxes solve a real gap in our AI research stack: the ability to run secure, isolated code at scale directly on existing compute,” said Brian Belgodere, senior technical staff member for AI/ML Systems at IBM Research. “Our reinforcement learning workflow launches thousands of sandboxes in parallel for each training step. Each sandbox has its own container image and resource boundaries. Researchers pip install cwsandbox and run their sandboxes within minutes, with no infrastructure knowledge required.”

“As the use and evaluation of agent tools moves to production scale, teams need an execution layer that behaves like any other part of their infrastructure, that is managed, observable, and close to the workflows already running in CoreWeave,” said Chen Goldberg, vice president of products and engineering at CoreWeave. “CoreWeave Sandboxes bridges the execution gap between reinforcement learning and agent workflows without the need for teams to build custom execution systems. And for teams who need these capabilities without managing their own clusters, the serverless path through Weights & Biases gives them access to the same execution layer in minutes.”

Addressing increasingly complex AI workflows
“Managing separate clusters and scheduling sandboxes across different node types lacked a unified solution, which was time-consuming and resource-intensive. CoreWeave Sandbox eliminates that problem,” said Roman Soletskyi, AI Scientist at Mistral. “Currently, we run hundreds of sandboxes concurrently on CPU nodes and parallel Slurm training jobs on GPU nodes, all through one setup. The Python SDK allows researchers to get started quickly, and the CoreWeave team worked closely with us to seamlessly fit the open source SDK into our codebase.”

“Enterprises are under pressure to build agent-driven AI automation as quickly as possible and are looking for help to reduce the time from idea to live agents,” said Holger Mueller, Vice President and Principal Analyst at Constellation Research. “As we enter the next phase of agent-driven AI automation, we need to support reward validation and evaluation without adding custom infrastructure to the environments we are already running. Dedicated execution that stays within the existing training infrastructure reduces operational sprawl and eliminates the vulnerability of homegrown sandbox systems. This gap is one that general-purpose sandbox vendors and CPU-only sandbox vendors are not designed to solve.”

Built on proven AI infrastructure
CoreWeave consistently delivers industry-leading infrastructure performance, as evidenced by record-breaking MLPerf benchmark results. It is the only AI cloud to earn top Platinum rankings in both SemiAnalysis ClusterMAX™ 1.0 and 2.0, and it ranked number one in inference speed and price performance for Moonshot AI’s Kimi K2 in independent inference benchmarks conducted by Artificial Analysis.




