Inworld AI has announced Inworld Runtime, an AI runtime designed to scale consumer applications.
Developers can move quickly from prototype to production, supporting growth from 10 users to 10 million with minimal code changes.
By automating AI operations, Inworld Runtime frees engineering resources for new product development and provides tools for designing and deploying no-code experiments. Inworld's current partners, including leading media companies, AAA studios, and AI-native startups, have already adopted the runtime as the foundation of their AI stacks for the next generation of real-time AI capabilities and experiences serving millions of users.
It's interesting to see Inworld AI expand its product line, as its focus until now has been on AI tools that help game developers create smart non-player characters and more for games.
“We needed it ourselves, so we built Runtime. Our existing tools couldn't deliver at the speed and scale our partners needed,” Inworld CEO Kylan Gibbs said in a statement. “When we realised that every consumer AI company faced these same barriers, we knew we had to open up what we had built. We saw the industry reach an inflection point: thousands of builders were hitting the same scaling wall we did.”
Built for internal use, now available to all consumer builders
Coming from Google and DeepMind, Inworld AI's founding team recognized that while AI momentum was pouring into business automation and professional applications, consumer applications were lagging behind.
So the company built Inworld to bring the benefits of AI to everyone, everywhere, and make the next generation of consumer applications possible. It started with AI agents for games and media, with partners such as Xbox, Disney, Nvidia, Niantic, and NBCUniversal.
To accelerate this work, Inworld AI built Runtime as internal infrastructure to handle the unique demands of consumer AI: maintaining real-time performance at the scale of millions of concurrent users, meeting consumer-grade quality expectations that sustain engagement, and keeping costs below cents per user per day.
As companies building health and fitness, learning, and social applications began approaching Inworld, the company discovered they faced exactly the same challenges Runtime was already solving internally, and decided to release it publicly.
The three factors that determine consumer AI leaders
Inworld AI says that over four years of deploying consumer AI applications it has identified three key factors that determine success or failure. Excellence in all three is a must: weakness in any one prevents a consumer AI application from achieving market leadership.
1. Time from prototype to production
Creating an AI demo takes hours, but reaching production readiness usually requires more than six months of infrastructure and quality work. Teams need to handle provider outages, implement fallbacks, manage rate limits, provision compute, optimize costs, and ensure consistent quality. Building with category leaders, Inworld saw most consumer AI projects stall and die in that gap rather than make the leap.
2. Allocating resources for new product development
Since launch, most engineering teams spend more than 60% of their time on maintenance tasks: debugging provider changes, managing model updates, handling scaling issues, and optimizing costs. This leaves minimal resources for building new features and causes products to stagnate while competitors move forward. Inworld experienced this firsthand; even innovative teams get trapped in a maintenance cycle rather than building what users want next.
3. Experiment speed
Consumer preferences keep evolving, but traditional two-to-four-week deployment cycles cannot keep pace. Teams need to test numerous variations, measure their impact on real users, and scale the winners, all without the friction of code deployments and app store approvals. Working with partners across the industry has shown that the fastest learners win, yet existing infrastructure makes rapid iteration almost impossible.
“We scaled our prototype to 1 million users in 19 days while cutting costs by more than 20x,” Status CEO Fai said in a statement.
Nanobit CEO Ivan Murat said in a statement: “Inworld AI offers two levers: personalization and content. You need to experiment with features at all times.”
Technical design of Inworld Runtime
Inworld Runtime offers these capabilities through multiple innovations, including:
1. Adaptive graph
A C++-based graph execution engine with SDKs for Node.js, Python, and other languages, built to solve the cross-platform scaling limitations that most AI frameworks face. Developers assemble applications from pre-optimized nodes (LLM, TTS, STT, knowledge, memory, and other APIs from top providers) that serve as building blocks, handle the low-level integration work, and automatically optimize data flow between components. The same graph scales seamlessly from 10 test users to 10 million concurrent users through managed endpoints, with minimal code changes. A vibe-coding-friendly interface lets teams go from prototype to production in days rather than months.
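To make the graph idea concrete, here is a minimal sketch of a node-based pipeline in Python. The `Node` and `Graph` classes and the STT/LLM/TTS stages are illustrative assumptions, not Inworld's actual SDK, which exposes its own pre-optimized node types.

```python
# Hypothetical sketch of a node-based graph pipeline in the spirit of
# adaptive graphs; names and API are illustrative, not Inworld's SDK.
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Node:
    name: str
    fn: Callable[[str], str]

@dataclass
class Graph:
    nodes: list[Node] = field(default_factory=list)

    def add(self, node: Node) -> "Graph":
        self.nodes.append(node)
        return self

    def run(self, payload: str) -> str:
        # Each node transforms the payload and hands it to the next,
        # mirroring an STT -> LLM -> TTS chain.
        for node in self.nodes:
            payload = node.fn(payload)
        return payload

graph = (
    Graph()
    .add(Node("stt", lambda audio: f"transcript({audio})"))
    .add(Node("llm", lambda text: f"reply({text})"))
    .add(Node("tts", lambda text: f"audio({text})"))
)

print(graph.run("mic_input"))  # audio(reply(transcript(mic_input)))
```

Because the pipeline is declared as data rather than hard-wired calls, swapping a provider or adding a stage changes one node, not the application code, which is the property that lets the same graph run unchanged at any scale.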
2. Automated MLOps
Beyond basic operations, Runtime provides self-managing infrastructure automation with integrated telemetry that captures logs, traces, and metrics across every interaction. Actionable insights, such as bugs, user patterns, and optimization opportunities, surface through a portal combining observability and experiment management. Runtime automatically fails over between providers, manages capacity across models, and applies intelligent rate limiting. It also supports custom on-premises deployments with optimized model hosting for enterprises. As applications scale, it provides access to all the cloud infrastructure needed to train, tune, and host custom models that push past the cost-quality frontier of off-the-shelf models.
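Provider failover is one of the behaviors described above; the sketch below shows the general pattern under stated assumptions. The `call_with_failover` helper and the provider stand-ins are hypothetical and not part of Inworld's API.

```python
# Illustrative sketch of automatic provider failover; names and the
# helper function are hypothetical, not Inworld's API.
class ProviderError(Exception):
    pass

def call_with_failover(providers, prompt):
    """Try each provider in priority order, falling back on failure."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except ProviderError as exc:
            errors.append((name, exc))  # telemetry would record this
    raise RuntimeError(f"all providers failed: {errors}")

def flaky(prompt):
    # Stand-in for a provider that is rate limited or down.
    raise ProviderError("rate limited")

def healthy(prompt):
    # Stand-in for a working fallback provider.
    return f"completion for {prompt!r}"

used, result = call_with_failover([("primary", flaky), ("backup", healthy)], "hi")
print(used, result)  # backup completion for 'hi'
```

A production version would add per-provider rate-limit tracking and capacity management, but the priority-ordered fallback loop is the core of the pattern.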
3. Live experiments
Launch or roll back experiments with one click. Because configuration is separated from code, teams get instant A/B testing without deployment friction. Runtime can run hundreds of experiments simultaneously: variants are defined through the SDK, tests are managed through the portal, and different models, prompts, graph configurations, and logic flows can all be tested. Winners roll out in seconds, with automatic measurement of impact on user metrics.
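Code-separated experiment configuration is commonly implemented as deterministic bucketing on a user id, so the same user always sees the same variant while the variant list can change without a deploy. The experiment config and `assign_variant` function below are hypothetical illustrations, not Inworld's SDK.

```python
# Minimal sketch of code-separated A/B variant assignment; the config
# and function names are illustrative, not Inworld's SDK.
import hashlib

# Config lives outside the code path, so variants can change without a deploy.
EXPERIMENT = {
    "name": "greeting_prompt",
    "variants": ["terse", "friendly"],
}

def assign_variant(user_id: str, experiment: dict) -> str:
    """Stable assignment: the same user always gets the same variant."""
    key = f"{experiment['name']}:{user_id}".encode()
    bucket = int(hashlib.sha256(key).hexdigest(), 16) % len(experiment["variants"])
    return experiment["variants"][bucket]

print(assign_variant("user-42", EXPERIMENT))
print(assign_variant("user-42", EXPERIMENT))  # same user, same variant
```

Hashing on the experiment name plus the user id keeps assignments independent across experiments, which is what allows hundreds to run simultaneously on the same user base.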
Proven results from early adopters of Inworld Runtime
Runtime deployments demonstrate consistent technical achievements:
- Inworld's biggest partners (major IP owners, media companies, and AAA studios) have already adopted the runtime as the foundation of their AI stacks
- WishRoll scaled from prototype to 1 million users in 19 days with cost savings of over 95%
- Little Umbrella ships new AI games while using Inworld to reduce update and maintenance effort on existing titles
- Streamlabs built a multimodal real-time streaming assistant with features that were not viable six months ago
- Bible Chat upgraded and expanded its voice capabilities while reducing voice costs by 85%
- Nanobit delivers personalized AI stories to millions of people with sustainable unit economics
Availability and pricing
Developers can get started right away by downloading the Runtime SDK, which ships with comprehensive documentation and migration guides. Runtime works natively with code assistants such as Cursor, Claude Code, Google CLI, Windsurf, Zencoder, and others.
Developers can start with their own projects or use Inworld's templates and demo apps as inspiration. Runtime deploys flexibly: in client applications, on cloud provider servers, or as custom on-premises installations with Inworld-managed model hosting. Once in production, the portal provides observability and rapid experimentation.
Runtime pricing is fully usage-based, with no upfront costs. Developers can experiment with all models and features and pay only as they scale successfully, keeping spend aligned with the economics of consumer applications, where costs must remain sustainable as usage grows. With access to cutting-edge models from Anthropic, Google, Mistral, and OpenAI, developers can easily test and select the best model for their use cases.
Runtime also provides access to top open-source models such as DeepSeek, Llama, and Qwen via fast inference providers Groq, Tenstorrent, and Fireworks AI. Developers with existing Microsoft or Google relationships can apply their cloud commitments toward Runtime through the Azure Marketplace and Google Cloud Marketplace.
Inworld is building an AI runtime for consumer applications. Founded in 2021, the company has raised more than $120 million from investors including Lightspeed Venture Partners, Kleiner Perkins, Section 32, Founders Fund, Stanford University, and Microsoft's M12.
The company will host its first Consumer AI Summit in San Francisco in spring 2026, bringing together technology leaders scaling the next generation of consumer AI applications.
