Hugging Face’s Ben Burtenshaw recently made the case for AI agents in systems engineering, arguing that coding agents should be put to work on these complex tasks. In his talk, Burtenshaw emphasized that AI agents are becoming increasingly capable, moving beyond simple code generation into more advanced, system-level engineering.
Hugging Face’s Ben Burtenshaw talks about AI systems engineering (source: AI Engineer)
Visual TL;DR. AI Agents for Engineering enable systems engineering tasks. Systems engineering tasks include custom kernels, which lead to performance optimization, and they require agent benchmarking. AI Agents for Engineering also build multi-agent labs, which create autoresearch labs. Both systems engineering tasks and autoresearch labs advance AI systems engineering.
AI Agents for Engineering: Coding agents that evolve beyond simple code generation
Systems Engineering Tasks: Tackling complex engineering challenges, discovering APIs, and connecting systems
Custom kernels: Optimize performance with code tailored to specific hardware
Performance optimization: Achieve faster execution through customized kernel development
Agent benchmarking: Measuring and comparing AI agent capabilities and performance
Multi-agent labs: Building automated research labs with interconnected AI agents
Autoresearch Labs: Enabling AI agents to conduct research and development autonomously
AI systems engineering: Leveraging AI agents to design and implement complex systems
The role of AI agents in systems engineering
Burtenshaw emphasized that AI agents are no longer just tools for writing snippets of code. They are evolving into sophisticated collaborators capable of tackling complex engineering challenges. He pointed to the growing acceptance of coding agents, citing Andrej Karpathy and DHH, both long-time programmers, as examples of developers who now use them. This acceptance is growing as agents demonstrate that they can discover APIs, connect systems, and even manage home automation devices.
Custom kernels and performance optimizations
A large portion of Burtenshaw’s presentation focused on creating and optimizing custom compute kernels for AI workloads. He explained the basics of a kernel: a function compiled to run on the GPU and called from Python. He then stressed why optimizing kernels matters for efficiency. Burtenshaw showed how custom kernels, such as the popular Flash Attention, can significantly increase computational density (more arithmetic per byte of data moved), reduce the time spent moving tensors through memory, and ultimately keep the GPU busy instead of waiting on data.
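The core idea behind Flash Attention can be illustrated without a GPU. The NumPy sketch below is a CPU illustration of the algorithm, not a kernel and not Hugging Face's code: it walks over keys and values in blocks and keeps a running maximum and normalizer per query row (an "online softmax"), so the full attention matrix never has to exist at once. On a GPU, this tiling is what lets each block stay in fast on-chip memory instead of round-tripping through slower global memory. The function names and the `block_size` parameter are illustrative choices.

```python
import numpy as np


def naive_attention(q, k, v):
    """Reference attention: materializes the full (n_q, n_k) score matrix."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # subtract row max for stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v


def tiled_attention(q, k, v, block_size=16):
    """Attention computed block by block over K/V with an online softmax.

    Only an (n_q, block_size) slice of scores exists at any time. The running
    row maxima (m) and normalizers (l) let earlier partial sums be rescaled
    whenever a later block contains a larger score.
    """
    n_q, d = q.shape
    scale = 1.0 / np.sqrt(d)
    out = np.zeros((n_q, v.shape[-1]))
    m = np.full((n_q, 1), -np.inf)  # running max score per query row
    l = np.zeros((n_q, 1))          # running sum of exp(score - m) per row
    for start in range(0, k.shape[0], block_size):
        k_blk = k[start:start + block_size]
        v_blk = v[start:start + block_size]
        s = q @ k_blk.T * scale                       # scores for this block only
        m_new = np.maximum(m, s.max(axis=-1, keepdims=True))
        correction = np.exp(m - m_new)                # rescales old accumulators
        p = np.exp(s - m_new)
        l = l * correction + p.sum(axis=-1, keepdims=True)
        out = out * correction + p @ v_blk
        m = m_new
    return out / l


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q = rng.standard_normal((8, 32))
    k = rng.standard_normal((50, 32))  # 50 is not a multiple of 16: last block is partial
    v = rng.standard_normal((50, 24))
    diff = np.abs(naive_attention(q, k, v) - tiled_attention(q, k, v)).max()
    print(f"max abs difference: {diff:.2e}")
```

Both functions return the same result up to floating-point rounding; the tiled version trades one large intermediate matrix for a few small per-row accumulators, which is the property a fused GPU kernel exploits.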
He also introduced Hugging Face’s “kernels” library, which is designed to make compute kernels easy to build, share, and load. The library aims to enforce a uniform and predictable structure, ensure reproducibility, provide native PyTorch compatibility, and encourage community sharing. Burtenshaw demonstrated how developers can publish their own kernels to the Hugging Face Hub for others to use.
Benchmarks and agent performance
To illustrate how effective agents can be in this area, Burtenshaw presented benchmark results from using agents to generate, benchmark, and optimize CUDA kernels. In one example, he highlighted an average speedup of 1.94x on H100 GPUs for the Qwen3-8B model when an agent generated and optimized the kernels. This demonstrates the tangible performance gains that agent-assisted engineering can deliver.
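The summary does not describe the benchmarking harness itself, so the sketch below is an assumption about the minimum a generate, benchmark, and optimize loop needs: reject candidates that disagree with a reference implementation, time the ones that pass, and keep the fastest. It uses toy Python functions in place of CUDA kernels so it runs anywhere, and every name in it (`median_time`, `evaluate`, `pick_best`, the prefix-sum functions) is hypothetical. When timing real GPU kernels, you would also need to synchronize the device before reading the clock (for example with `torch.cuda.synchronize()` in PyTorch), because kernel launches are asynchronous.

```python
import itertools
import statistics
import time
from typing import Callable, Sequence


def median_time(fn: Callable, args: Sequence, repeats: int = 10) -> float:
    """Median wall-clock seconds for fn(*args), after one untimed warm-up call."""
    fn(*args)  # warm-up: on a GPU, one-time compilation or caching would happen here
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def evaluate(reference: Callable, candidate: Callable, args: Sequence, tol: float = 1e-9) -> dict:
    """Check correctness first; only a correct candidate gets a speedup figure."""
    expected, got = list(reference(*args)), list(candidate(*args))
    correct = len(expected) == len(got) and all(
        abs(a - b) <= tol for a, b in zip(expected, got)
    )
    if not correct:
        return {"correct": False, "speedup": None}
    speedup = median_time(reference, args) / median_time(candidate, args)
    return {"correct": True, "speedup": speedup}


def pick_best(reference: Callable, candidates: Sequence[Callable], args: Sequence):
    """Return (fastest correct candidate, its speedup), or (None, None) if none pass."""
    scored = [(c, evaluate(reference, c, args)) for c in candidates]
    valid = [(c, r["speedup"]) for c, r in scored if r["correct"]]
    return max(valid, key=lambda pair: pair[1], default=(None, None))


# Toy stand-ins for kernels: a quadratic prefix sum and two agent "rewrites" of it.
def prefix_sum_reference(xs):
    return [sum(xs[: i + 1]) for i in range(len(xs))]


def prefix_sum_fast(xs):
    return list(itertools.accumulate(xs))


def prefix_sum_buggy(xs):  # bug: drops the first element, so the output is too short
    return list(itertools.accumulate(xs[1:]))


if __name__ == "__main__":
    data = [float(i % 7) for i in range(300)]
    best, speedup = pick_best(prefix_sum_reference, [prefix_sum_buggy, prefix_sum_fast], (data,))
    print(best.__name__, f"{speedup:.1f}x")
```

Checking correctness before timing matters for agent-generated code, since a kernel that is fast but wrong should never count as a win. The summary does not say how the 1.94x average was computed; when averaging ratios such as speedups across workloads, a geometric mean (Python's `statistics.geometric_mean`) is a common choice.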
The power of multi-agent automated research labs
Burtenshaw also delved into the concept of a multi-agent automated research lab: a system of specialized agents working together. It includes:
Researcher: Scouts Hugging Face Papers for ideas and defines research directions.
Planner: Acts as the central coordinator, owns the experiment queue, and proposes hypotheses.
Worker agents: Run experiments, retrieve code, and test hypotheses.
Reporter: Monitors job progress, synchronizes status, and provides an overview of active jobs and anomalies.
This multi-agent approach enables systematic and automated exploration of hyperparameters and model architectures, resulting in a more efficient and effective research cycle. Monitoring and visualizing these experiments using tools like Trackio can provide important insights into the research process.
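The division of labor described above can be sketched as a small in-process program. This is a hypothetical, single-machine illustration, not the architecture Burtenshaw showed: the class names mirror the four roles, the "research directions" are a fixed hyperparameter grid rather than ideas mined from papers, and each "experiment" is a synthetic loss function (`toy_training_run`, invented for this sketch) rather than a training job. The Reporter treats any run whose loss is NaN as an anomaly.

```python
import itertools
import math
from collections import deque
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Optional


@dataclass
class Experiment:
    config: dict
    status: str = "queued"          # queued -> done | failed
    loss: Optional[float] = None


class Researcher:
    """Proposes research directions; here a fixed, hypothetical search space."""
    def propose_directions(self):
        return {"lr": [1e-4, 1e-3, 1e-2, 1e-1], "depth": [2, 4, 8]}


class Planner:
    """Central coordinator: turns directions into hypotheses and owns the queue."""
    def __init__(self):
        self.queue = deque()

    def plan(self, directions):
        keys = sorted(directions)
        for values in itertools.product(*(directions[k] for k in keys)):
            self.queue.append(Experiment(dict(zip(keys, values))))

    def next_batch(self, size):
        batch = []
        while self.queue and len(batch) < size:
            batch.append(self.queue.popleft())
        return batch


def toy_training_run(config):
    """Stand-in for a real job: synthetic loss that 'diverges' (NaN) at high lr."""
    if config["lr"] >= 1e-1:
        return float("nan")
    return (math.log10(config["lr"]) + 3) ** 2 + 0.1 * (config["depth"] - 4) ** 2


class Worker:
    """Runs one experiment and records its outcome."""
    def run(self, exp):
        exp.loss = toy_training_run(exp.config)
        exp.status = "failed" if math.isnan(exp.loss) else "done"
        return exp


class Reporter:
    """Summarizes finished runs; a real lab might also log them to a tracker."""
    def summarize(self, experiments):
        done = [e for e in experiments if e.status == "done"]
        anomalies = [e.config for e in experiments if e.status == "failed"]
        best = min(done, key=lambda e: e.loss) if done else None
        return {"done": len(done), "anomalies": anomalies,
                "best": best.config if best else None}


def run_lab(num_workers=3):
    researcher, planner, reporter = Researcher(), Planner(), Reporter()
    planner.plan(researcher.propose_directions())
    workers = [Worker() for _ in range(num_workers)]
    finished = []
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # The Planner hands out work in batches; workers never pick experiments themselves.
        while batch := planner.next_batch(num_workers):
            finished.extend(pool.map(lambda w, e: w.run(e), workers, batch))
    return reporter.summarize(finished)


if __name__ == "__main__":
    print(run_lab())
```

Keeping the queue inside the Planner is what makes the exploration systematic: workers only execute what they are given. Replacing `toy_training_run` with a real job launcher, and having the Reporter log each result to a tool such as Trackio, would leave the rest of the structure unchanged.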
Key takeaways
Burtenshaw concluded with several key points:
Agents work best when they use primitives and public, well-defined interfaces rather than heavily abstracted ones.
The Hugging Face Hub is a robust platform for AI workloads, providing core infrastructure for storage, compute, and versioning.
Multi-agent systems can be effectively built with specialized roles to automate and accelerate AI research.
The presentation showed how capable AI agents are becoming in systems engineering and underscored their potential to drive efficiency and innovation in the field.