Hybrid by design: Designing new models for federal AI delivery

This is the 14th article in the IT Lifecycle Management series, Provide technology that helps government.

Federal agencies have spent years balancing cloud and on-premises environments. Today, artificial intelligence is forcing agencies and the systems integrators who support them to design for both, intentionally and at scale.

As AI moves from pilot projects to daily operations, this new model is gaining ground across governments and their delivery partners. “Systems are defined not by where they run, but how quickly and safely they deliver results,” said Graham Gilmer, a senior vice president at Booz Allen whose research focuses on autonomy and AI in defense.

Hybrid architectures are no longer a transitional phase. They are the foundation for delivering AI at mission speed, Gilmer said. “Depending on your use case, you can use one, both, or a hybrid combination for redundancy.”

It’s a fundamental shift in thinking. Mike Watkinson, Chief Revenue Officer at Future Tech Enterprise, says, “The mission defines the architecture, and the architecture enables and secures the mission.”

This principle is not new. What’s new is how directly it shapes technical decisions, right down to the level of computing form factors, data placement, and engineering workflows.

During our panel discussion series, Provide technology that helps government, we asked Gilmer and Watkinson to share how this is creating new infrastructure realities for governments and their technology partners.

From centralized systems to decentralized distribution

For more than a decade, cloud computing has provided a path to scale. However, it has not eliminated the need for control, latency management, or operational resiliency.

Today’s environment requires systems that can operate across classification levels and geographies, often without persistent connections, Gilmer said. To this end, government agencies are pursuing distributed architectures that intentionally combine cloud infrastructure, on-premises environments, and edge systems. Users also have new expectations about how they should be able to access the tools they need to do their jobs.

“We’re constantly monitoring the gap between what’s available outside the ship, on cell phones and at work, and what’s available inside the government,” he said. “I’m happy to say it’s actually finished.”

Gilmer and Watkinson said the focus is shifting from standardization to orchestration, creating new expectations for systems integrators to design, integrate, and operate across environments, not just deploy within a single stack.

New reality 1: GPU form factors are redefining what’s possible

One of the most important technological changes enabling the use of AI in a hybrid model is the rapid evolution of GPU-based computing.

Eighteen months ago, running meaningful large language models (LLMs) required access to hyperscale infrastructure. Today, advances in GPU hardware are changing this situation.

Watkinson noted that advances in GPU memory density, power efficiency, and packaging allow for desktop-class as well as portable deployments. With more than 100 gigabytes of VRAM, these systems can now support medium-sized language models suitable for real-world mission workflows, he said.
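To make the 100-gigabyte figure concrete, here is a rough back-of-the-envelope sketch of how much VRAM a model's weights might need at different numeric precisions. The bytes-per-parameter values are standard for those formats, but the 20% overhead allowance and the example model sizes are assumptions chosen for illustration, not figures from Watkinson or any vendor.

```python
# Rough estimate of the GPU memory needed to serve a model for inference.
# Illustrative only: the overhead allowance is an assumption, not a spec.

BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}


def estimate_vram_gb(params_billions: float, precision: str = "fp16",
                     overhead_fraction: float = 0.2) -> float:
    """Return approximate VRAM in GB: weights plus a flat overhead allowance.

    overhead_fraction is a hypothetical allowance for the KV cache,
    activations, and runtime buffers. Real overhead depends on context
    length, batch size, and the serving stack.
    """
    # 1e9 parameters * bytes per parameter / 1e9 bytes per GB
    weight_gb = params_billions * BYTES_PER_PARAM[precision]
    return weight_gb * (1.0 + overhead_fraction)


def fits(params_billions: float, precision: str, vram_gb: float) -> bool:
    """True if the estimated footprint is within the available VRAM."""
    return estimate_vram_gb(params_billions, precision) <= vram_gb


if __name__ == "__main__":
    for size in (8, 34, 70):  # hypothetical model sizes, in billions of parameters
        for precision in ("fp16", "int8", "int4"):
            need = estimate_vram_gb(size, precision)
            verdict = "fits" if fits(size, precision, 100) else "does not fit"
            print(f"{size}B @ {precision}: ~{need:.0f} GB, {verdict} in 100 GB")
```

Under these assumptions, a 70-billion-parameter model would not fit in 100 GB at 16-bit precision but would at 8-bit, which is one reason quantization often features in desktop-class deployments.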

Watkinson said these systems are at a tipping point, noting that desktop GPUs will “accelerate AI and accelerate mission outcomes.” By consolidating compute, storage and security controls into a single enclosure, they simplify deployment and eliminate many of the integration challenges associated with distributed cloud environments, he said.

Gilmer said the use of on-premises desktops has spread pretty quickly all the way to the edge, where AI tools can now be accessed. “It used to be that there were only very small LLMs,” he said. “Right now, I think it’s going to be about medium-sized, very capable LLMs, things that could change the way missions are executed or companies run, or certainly sensitive or classified workloads that are handled within the government.”

The takeaway? In practice, this means that agencies will be able to:

Run containerized models locally for inference
Deploy AI to sensitive or disconnected networks without external dependencies
Launch a rapid prototyping environment without a multi-year infrastructure investment

This means access to compute is no longer the limiting factor, Watkinson said. For integrators, this shifts the challenge from sourcing infrastructure to rapidly integrating AI capabilities into mission workflows.

New reality 2: Inference, not training, is shaping the architecture

As government agencies scale AI, they are realizing that the primary cost is not building the models, but running them.

Inference workloads, such as real-time queries, agent interactions, and automated pipelines for delivering results from models, currently account for the majority of computing demand. “80% to 90% is heavily skewed toward inference,” Gilmer said, explaining that inference is a “huge variable cost” that grows with query volume and with agent-based systems that run continuously.

This change has direct architectural implications for how and where AI workloads run, especially for integrators who are responsible for balancing cost, performance, and deployment across their environments.

Cloud environments offer elasticity, but they also introduce costs for sending and receiving data, variability in latency, and pricing that is difficult to predict at scale. In contrast, localized GPU deployments allow for a more controlled cost model.

“It reduces the cost of repetition and the cost of experimentation,” Watkinson said.

He pointed to lower “cost per token” and lower iteration costs when workloads run closer to the data. This is especially important as government agencies test and refine AI applications, when iteration speed is just as important as steady-state performance.
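The cost-per-token comparison can be sketched as simple arithmetic: amortize an owned system's cost over the tokens it serves, then compare that with a metered cloud price. Everything in the sketch below is a hypothetical placeholder (hardware cost, lifetime, throughput, utilization, and cloud price), meant only to show the shape of the calculation, not actual pricing.

```python
# Compare a metered cloud per-token price with the amortized per-token
# cost of a locally owned GPU system. All inputs are hypothetical.

SECONDS_PER_YEAR = 365 * 24 * 3600


def local_cost_per_million_tokens(hardware_cost: float, lifetime_years: float,
                                  annual_operating_cost: float,
                                  tokens_per_second: float,
                                  utilization: float) -> float:
    """Amortized cost per one million tokens for an owned system."""
    tokens_per_year = tokens_per_second * utilization * SECONDS_PER_YEAR
    annual_cost = hardware_cost / lifetime_years + annual_operating_cost
    return annual_cost / tokens_per_year * 1_000_000


def breakeven_tokens_per_year(hardware_cost: float, lifetime_years: float,
                              annual_operating_cost: float,
                              cloud_price_per_million: float) -> float:
    """Yearly token volume above which the owned system costs less."""
    annual_cost = hardware_cost / lifetime_years + annual_operating_cost
    return annual_cost / cloud_price_per_million * 1_000_000


if __name__ == "__main__":
    # Hypothetical inputs: a 30,000 system kept 3 years, 2,000/year to run,
    # 500 tokens/second at 25% utilization, cloud price of 5 per million tokens.
    local = local_cost_per_million_tokens(30_000, 3, 2_000, 500, 0.25)
    breakeven = breakeven_tokens_per_year(30_000, 3, 2_000, 5.0)
    print(f"Local cost per million tokens: {local:.2f}")
    print(f"Break-even volume: {breakeven:,.0f} tokens per year")
```

The design point is that the local cost is mostly fixed, so each additional query or experiment lowers the effective cost per token, while the cloud cost scales with volume.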

As a result, Gilmer and Watkinson said, a more nuanced deployment strategy is possible. Training may still take place in a centralized environment. However, inference is increasingly distributed on-premises, at the edge, or in hybrid configurations.

New reality 3: Data locality and control drive design decisions

Although computing is becoming more flexible, data still requires strict governance, which continues to be a challenge for government agencies with vast data stores.

As a result, data strategy becomes a shared responsibility between government agencies and their integration partners, especially in environments with strict classification boundaries.

“It’s not just the amount of data that matters, but what data you work with,” Watkinson said, highlighting a challenge that is becoming central to AI architectures. Federal agencies must balance accessibility with strict requirements for classification, privacy, and provenance.

Gilmer added that the need for on-premises localized processing is increasing because in many cases “there’s a reason why you don’t want your data to leave a certain area.”

That reality is reshaping system design. AI pipelines must account for data residency requirements, version control across environments, and access controls tied to identities and roles (as defined by the federal Zero Trust Framework).

In practice, this often means bringing AI models to the data rather than data to the models, reversing previous cloud-centric assumptions.
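A minimal sketch of what such a placement rule might look like in code follows. The classification levels, environment names, and role check are all hypothetical illustrations, not part of any federal framework or of the panelists' systems. The sketch denies callers without the required role, rules out environments not approved for the data's level, and then prefers an environment where the data already resides.

```python
from dataclasses import dataclass
from enum import IntEnum


class Level(IntEnum):
    """Hypothetical data sensitivity levels, lowest to highest."""
    PUBLIC = 0
    SENSITIVE = 1
    CLASSIFIED = 2


@dataclass(frozen=True)
class Environment:
    name: str
    max_level: Level   # highest data level this environment is approved for
    holds_data: bool   # True if the dataset already resides here


@dataclass(frozen=True)
class Request:
    data_level: Level
    user_roles: frozenset
    required_role: str


def choose_environment(request: Request, environments: list) -> Environment:
    """Pick where inference should run for this request."""
    # Role-based access check comes first: deny before considering placement.
    if request.required_role not in request.user_roles:
        raise PermissionError("caller lacks the required role")
    # Keep only environments approved for this data level.
    allowed = [e for e in environments if e.max_level >= request.data_level]
    if not allowed:
        raise LookupError("no environment is approved for this data level")
    # Bring the model to the data: environments that already hold the
    # data sort first; ties keep their original order (stable sort).
    return sorted(allowed, key=lambda e: not e.holds_data)[0]


if __name__ == "__main__":
    envs = [
        Environment("cloud", Level.SENSITIVE, holds_data=False),
        Environment("on_prem", Level.CLASSIFIED, holds_data=True),
        Environment("edge", Level.CLASSIFIED, holds_data=False),
    ]
    req = Request(Level.CLASSIFIED, frozenset({"analyst"}), "analyst")
    print(choose_environment(req, envs).name)
```

A real policy would weigh more factors, such as connectivity, latency, and audit requirements, but the ordering of the checks (identity, then approval level, then data locality) mirrors the priorities described above.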

New reality 4: Engineering workflows are evolving with AI

The impact of these architectural changes is not just technical. It extends directly to engineering practice, changing the way engineering itself is done, especially for integrators who deliver and maintain systems on behalf of government agencies. AI is becoming an active participant in development.

“Productivity has increased tremendously,” Gilmer said, citing measurable improvements in software engineering results. Development environments are increasingly supported by AI agents that can generate code, refactor legacy systems, and assist with testing.

This is particularly impactful in federal environments where legacy codebases present ongoing challenges.

AI systems can help:

Translate code from outdated languages to modern frameworks
Extract business logic from legacy applications
Accelerate modernization without relying on scarce expertise

“This shift is here to stay, with development agents working in parallel with human engineers,” Gilmer said. The result is a new engineering model that combines human judgment with machine-driven scale.

New reality 5: Speed continues to redefine delivery

All these changes converge on one result: faster delivery.

Traditional acquisition cycles and development schedules no longer match the pace of AI innovation. Government agencies and their technology partners are working together to adopt models built on rapid iteration and real-world validation.

“We have to learn on the job,” Gilmer said. That means deploying early and improving continuously, an approach that allows teams to “iterate faster and experiment more,” Watkinson added, lowering the barrier to trying new approaches. The shift is happening across the government technology sector, at both agencies and their contractors.

Speed is no longer a secondary metric. It is the primary measure of mission success. Watkinson put it simply: “How important is speed to market?” That is now a critical question for government agencies investing in AI.

Key design principles for hybrid AI

Start with mission requirements, not infrastructure

Design for inference cost, performance, and scalability

Keep data local when control or classification is required

Design for integration across environments and vendors

Achieve reliable operations across cloud, on-premises, and edge environments

Prioritize speed, iteration, and continuous improvement

Check out more tips and tactics for FSIs in our series, Provide technology that helps government.
