Super Agent Now Open Source, Boosting Hardware Efficiency


2025 confirmed that the future of artificial intelligence lies in integration, efficiency, and accessibility, not just raw model size. The year-end episode of “Mixture of Experts”, hosted by Tim Hwang, brought together panelists Chris Hay, Gabe Goodhart, Kaoutar El Maghraoui, Aaron Bowman, and Abraham Daniels to analyze the defining trends of the past year and predict the trajectory for 2026. The discussion covered the biggest moments in AI, including the rise of open source and structural constraints on hardware supply. The central story was the maturation of agents and the resulting pressure on the underlying hardware and software ecosystem.

Despite Hwang's playful claim that agents are “dogs that don't bark”, Chris Hay was quick to defend his earlier prediction that 2025 would be “the year of the super agent”. Hay argued that the underlying technology of advanced reasoning and expanded tool use has indeed arrived, although it has been folded into existing products such as ChatGPT Deep Research and Claude Code. He emphasized that today's agents are orchestrators rather than single specialized functions. Hay pointed out that current models “can think and plan much longer,” which lets them chain multiple tools together to accomplish complex goals, such as generating an entire presentation from a single prompt. This shift moves beyond simple one-off tasks to autonomous workflows, redefining what a functional AI agent actually is in the commercial world.
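To make the "orchestrator" idea concrete, here is a minimal sketch of tool chaining in Python. It is not how ChatGPT Deep Research or Claude Code work internally: the three tools and the `run_chain` helper are hypothetical stand-ins, and the plan is fixed in advance, whereas a real agent would let the model decide which tool to call next.

```python
from typing import Callable

# Hypothetical tools. Real ones would call a search API, an LLM, or a slide
# generator; these only return strings so the sketch stays self-contained.
def research(topic: str) -> str:
    return f"notes on {topic}"

def outline(notes: str) -> str:
    return f"outline from [{notes}]"

def make_slides(outline_text: str) -> str:
    return f"slides from [{outline_text}]"

def run_chain(prompt: str, tools: list[Callable[[str], str]]) -> str:
    """Feed the prompt through each tool in order, passing results along."""
    result = prompt
    for tool in tools:
        result = tool(result)
    return result

# A single prompt goes in; the chained tools produce the final artifact.
deck = run_chain("AI trends in 2025", [research, outline, make_slides])
print(deck)
```

The point of the sketch is only the shape of the workflow: one prompt goes in, and each tool's output becomes the next tool's input.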

Gabe Goodhart highlighted a breakthrough year for open source models, citing advances like Kimi K2 Thinking that brought open models closer to performance parity with proprietary systems. This rapid convergence has shifted the primary friction points from model quality to the surrounding infrastructure and user experience. The open source community excels at building individual components but struggles with uniform packaging, and integrating disparate tools and models into a consistent user-facing product remains complex. Goodhart suggested that matching the user experience and polish of closed systems is the next big hurdle to overcome before open source can truly dominate in all areas.

The material reality of AI was a major theme, especially the shortage of high-end AI accelerators. Kaoutar El Maghraoui emphasized that in 2025 the shortage of AI hardware became firmly established as a “structural constraint” rather than a temporary bottleneck. This unprecedented demand is driven primarily by large-scale frontier model training, forcing the industry to split its focus between scale-up (large clusters built on NVIDIA H200/B200, AWS Trainium, and similar hardware) and scale-out (efficient, small-scale models running locally or at the edge). El Maghraoui predicted that 2026 will be defined by the tension between “frontier model classes and efficient model classes,” noting the increasing possibility of running models with 1 billion to 5 billion parameters locally through techniques such as 4-bit quantization and specialized neural processing units (NPUs). Pursuing this efficiency is essential: rather than relying solely on continued increases in computing power, where supply constraints remain, “the industry really needs to scale efficiencies.” This change is driving new hybrid architectures that combine elements such as transformers and state-space models (SSMs) to optimize both performance and energy consumption.
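A rough back-of-the-envelope calculation shows why 4-bit quantization matters for the 1 billion to 5 billion parameter range mentioned above. The sketch below counts only the memory for the weights themselves and assumes 16-bit weights as the unquantized baseline. It ignores activations, the KV cache, and the per-group scale factors that real quantization schemes add, so the figures are a lower bound rather than a measured requirement.

```python
def weight_memory_gib(num_params: float, bits_per_weight: int) -> float:
    """Approximate memory for model weights alone, in GiB."""
    total_bytes = num_params * bits_per_weight / 8
    return total_bytes / (1024 ** 3)

# Compare a 16-bit baseline (an assumption) with 4-bit quantized weights.
for params in (1e9, 5e9):
    fp16 = weight_memory_gib(params, 16)
    int4 = weight_memory_gib(params, 4)
    print(f"{params / 1e9:.0f}B params: 16-bit ~{fp16:.2f} GiB, 4-bit ~{int4:.2f} GiB")
```

Under these assumptions a 5-billion-parameter model drops from roughly 9.3 GiB of weights to roughly 2.3 GiB, which is the difference between needing a dedicated accelerator and fitting alongside other applications on a laptop or phone-class device.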

This focus on efficiency aligns with the growing importance of modularity in multimodal systems, a key focus area for IBM's Granite models. Aaron Bowman highlighted the need for models that can interpret and act on diverse data (language, vision, action) to enable complex and autonomous digital workers. Abraham Daniels detailed the focus on building modular functionality that leverages adapters and external orchestration layers. This lets businesses customize their AI solutions by combining purpose-built components and avoid the overhead of monolithic omni-capable models. This architecture allows developers to “invoke each function as needed,” which reduces the footprint and improves practical performance for certain enterprise use cases, such as complex document processing and retrieval-augmented generation (RAG) pipelines, Daniels claimed.
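One way to picture “invoke each function as needed” is a registry that loads an adapter only on first use. The sketch below is purely illustrative and is not Granite's actual API: the `AdapterRegistry` class and the adapter names are hypothetical, and each "adapter" is a plain Python function standing in for a model component.

```python
from typing import Callable, Dict, List

Adapter = Callable[[str], str]

class AdapterRegistry:
    """Hypothetical registry that loads adapters lazily, on first use."""

    def __init__(self) -> None:
        self._factories: Dict[str, Callable[[], Adapter]] = {}
        self._loaded: Dict[str, Adapter] = {}

    def register(self, name: str, factory: Callable[[], Adapter]) -> None:
        # Registering is cheap: nothing is loaded until the adapter is invoked.
        self._factories[name] = factory

    def invoke(self, name: str, payload: str) -> str:
        if name not in self._loaded:
            self._loaded[name] = self._factories[name]()
        return self._loaded[name](payload)

    def loaded(self) -> List[str]:
        return sorted(self._loaded)

registry = AdapterRegistry()
registry.register("doc_extract", lambda: (lambda text: f"fields from: {text}"))
registry.register("rag_rewrite", lambda: (lambda query: f"rewritten: {query}"))

answer = registry.invoke("rag_rewrite", "quarterly revenue")
print(answer)
print(registry.loaded())  # only the adapter that was actually used
```

Only the adapter that was invoked occupies memory, which is the footprint argument Daniels makes for modular components over a single model that carries every capability at once.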

Panelists agreed that the convergence of these trends sets the stage for a fierce battle for the AI “gateway.” Entities that successfully integrate reasoning, tools, and models (regardless of whether the models are open or closed) will capture the market, whether through browser integration (Perplexity, ChatGPT on steroids), mobile operating systems (Apple, Google/Android), or enterprise platforms. This battle over the control plane is evident across multiple digital surfaces, including browser extensions and embedded mobile applications. Ultimately, success will depend on masking the underlying complexity of the multi-agent, hybrid-architecture systems that are now the norm and delivering the most seamless user experience.


