Users can tap instant AI features such as real-time video effects, automatic speech recognition (ASR), and motion capture in mobile apps. For developers, however, running advanced models on-device often means balancing unique challenges: managing device thermals, preserving battery life, and preventing frame drops. To deliver fast, responsive AI experiences without compromising performance, LiteRT unlocks the neural processing unit (NPU), hardware built specifically for these workloads.
LiteRT is a cross-platform, production-ready framework for on-device AI, providing CPU, GPU, and NPU acceleration across mobile, desktop, and IoT platforms. Designed for performance and scalability, LiteRT simplifies the deployment of fast AI capabilities through a unified API that abstracts away the complexity of integrating with multiple NPU SDKs and lets developers target different silicon without writing vendor-specific code.
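The dispatch pattern behind that abstraction can be sketched in a few lines. The Python below is purely illustrative (the class and function names are invented, not the LiteRT API): it shows how a unified runtime can honor an accelerator preference order while falling back gracefully on devices without an NPU.

```python
# Hypothetical sketch of the accelerator-fallback pattern a unified
# runtime implements internally; names are illustrative, not real API.
from dataclasses import dataclass

@dataclass
class Device:
    name: str
    accelerators: set  # accelerators this device actually supports

def pick_accelerator(preferred: list, device: Device) -> str:
    """Return the first requested accelerator the device supports.

    Falling back NPU -> GPU -> CPU lets one app binary target
    different silicon without vendor-specific code paths.
    """
    for acc in preferred:
        if acc in device.accelerators:
            return acc
    return "CPU"  # CPU is the universal fallback

mid_tier = Device("mid-tier phone", {"GPU", "CPU"})
flagship = Device("flagship with NPU", {"NPU", "GPU", "CPU"})

print(pick_accelerator(["NPU", "GPU", "CPU"], mid_tier))  # -> GPU
print(pick_accelerator(["NPU", "GPU", "CPU"], flagship))  # -> NPU
```

The application states only its preference order; which backend actually runs is resolved per device at load time.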
Transform NPU performance into meaningful experiences
LiteRT already powers Google products, popular apps, and even other SDKs. Here is what NPU acceleration looks like in real-world production apps from industry leaders like Google Meet, Epic Games, and Argmax.
Google Meet: By leveraging mobile NPUs, Google Meet shipped an Ultra-HD segmentation model 25 times larger than previous versions without sacrificing inference speed. The key is maintaining a consistent power footprint and creating the thermal headroom needed to deliver high-quality background replacement throughout a typical 20-30 minute call.
Epic Games: High-fidelity, real-time animation experiences require extraordinary efficiency. Epic’s Live Link Face (Beta) app for Android allows creators to capture performances from a single camera and generate real-time MetaHuman facial animations that can be streamed directly from the device to Unreal Engine.
Real-time facial analysis is computationally intensive and requires consistently low latency. By using LiteRT on the NPU, Epic enables dedicated on-device acceleration on supported Android devices, delivering up to 30 FPS performance for real-time MetaHuman animations.
Real-time MetaHuman Facial Animation in Unreal Engine with NPU
Argmax: We recently collaborated with the LiteRT team to release the Argmax Pro SDK for Android for on-device speech recognition. By leveraging LiteRT and AI Pack feature delivery via Google Play, Argmax was able to achieve the highest levels of accuracy and real-time speed while respecting app size constraints on Android. Importantly, we leveraged LiteRT’s Ahead-of-Time (AOT) compilation to eliminate costly on-device compilation steps and enable cutting-edge audio models like NVIDIA Parakeet TDT 0.6B v2 to run with industry-leading latency.
Performance tests of the Argmax Pro SDK across Google Tensor, MediaTek, and Qualcomm Technologies SoCs show a 2x speedup when upgrading from GPU to NPU. Beyond raw speed, the NPU’s power efficiency allows Argmax SDK Enterprise customers such as Heidi Health to run reliable on-device live transcription even during long sessions, with less impact on battery life. Finally, by offloading runtime libraries and models to on-demand downloads through Play’s AI Packs, each device dynamically retrieves models optimized for its specific NPU.
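The model-selection side of that on-demand delivery can be illustrated with a small sketch. This is not the Play AI Pack API, and the vendor keys and file names below are hypothetical, but the idea is the one described above: map each SoC vendor to a model variant compiled for its NPU, with a portable fallback for unknown silicon.

```python
# Illustrative sketch (not the Play AI Pack API): on-demand delivery
# can serve each device a model compiled for its own NPU.
# Vendor keys and file names below are invented for illustration.
NPU_VARIANTS = {
    "google-tensor": "asr_model.tensor_npu.bin",
    "qualcomm": "asr_model.qnn_npu.bin",
    "mediatek": "asr_model.neuron_npu.bin",
}

def select_model_pack(soc_vendor: str) -> str:
    # Unknown silicon falls back to a portable GPU/CPU build,
    # so no device is left without a working model.
    return NPU_VARIANTS.get(soc_vendor, "asr_model.generic.bin")

print(select_model_pack("qualcomm"))  # -> asr_model.qnn_npu.bin
print(select_model_pack("other"))     # -> asr_model.generic.bin
```

Because only the matching variant is downloaded, the app ships without bundling every per-NPU build, which is how the size constraints mentioned above are respected.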
Argmax’s Kotlin-first SDK brings the highest levels of accuracy and real-time speed to Android with seamless NPU and GPU acceleration powered by Google LiteRT.
Google AI Edge Gallery app: To help developers test and validate NPU acceleration performance, the Google AI Edge Gallery app now features NPU support for select Gemma models, along with built-in benchmarking tools. With AI Edge Gallery available on Android, you can quickly see the true potential of AI performance on mobile hardware. Developers can also visit the Google AI Edge Gallery on GitHub to build their own experiences.
Explore different on-device LLM use cases using Google AI Edge Gallery
Scale performance across the hardware spectrum
While the performance gains for audio, animation, and video are clear, the variety and complexity of vendor-specific SDKs have traditionally made the path to NPUs difficult for developers to navigate. LiteRT provides streamlined workflows and cross-platform support, allowing developers to deploy advanced models everywhere from mobile phones to industrial IoT devices and AI PCs without sacrificing performance or portability.
Cross-platform NPU support
As highlighted in a recent Google AI Edge Gemma 4 blog post, LiteRT extends NPU acceleration beyond mobile, allowing you to deploy models to a wide range of hardware with a single framework. For the industrial edge, LiteRT supports platforms such as the Qualcomm Dragonwing™ IQ8 series, which also powers the Arduino UNO Q, enabling high-reliability use cases such as robotics and smart manufacturing with models like Gemma 4. For desktops, LiteRT is preparing for AI PCs through integration with OpenVINO™ on Intel® Core™ Ultra Series 2 and 3 processors, delivering significant power savings and responsiveness for local GenAI workloads.
Large-scale performance validation
Google AI Edge Portal provides a benchmarking service across 100+ of the most popular mobile phones, offering insight into ML workloads across devices, accelerators, and configurations. Developers can now make data-driven deployment decisions for their use case and target devices, such as choosing between AOT and JIT compilation. To use the latest Portal NPU features, sign up for the private preview here.
Google AI Edge portal benchmark results
Start your NPU journey
With production-ready NPU integration, LiteRT provides a unified workflow that abstracts away the low-level complexity of both Just-in-Time (JIT) and Ahead-of-Time (AOT) deployment.
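The trade-off between those two workflows can be sketched as follows. This is an illustrative model, not LiteRT code: AOT pays the compilation cost at build time and ships a ready-to-run binary per NPU, while JIT compiles on the device at first launch and caches the result for subsequent runs.

```python
# Illustrative sketch (not LiteRT's API) of JIT vs. AOT deployment.

class JitRuntime:
    """Compiles the model on-device at first load, then caches it."""
    def __init__(self):
        self.cache = {}
        self.compile_count = 0

    def load(self, model: str) -> str:
        if model not in self.cache:
            self.compile_count += 1          # first launch pays this cost
            self.cache[model] = f"compiled({model})"
        return self.cache[model]             # later launches hit the cache

class AotRuntime:
    """Loads a binary precompiled at build time for a specific NPU."""
    def __init__(self, precompiled: dict):
        self.precompiled = precompiled       # produced before shipping

    def load(self, model: str) -> str:
        return self.precompiled[model]       # no on-device compile step
```

In practice, AOT trades larger per-NPU build artifacts for zero first-launch compile latency, which is why pairing it with on-demand delivery, as Argmax does, keeps app size in check.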
Read our documentation and get started with NPU acceleration today.
Please share your feedback and feature requests by opening an issue on our GitHub. We can’t wait to see what you build!
Acknowledgements
Google: Akshat Sharma, Alice Zheng, Andrew Zhang, Ashley Lin, Byungchul Kim, Changming Sun, Charlie Xu, Chenchen Tang, Chunlei Niu, Cormac Brick, Derek Bekebrede, Fabian Bergmark, Fengwu Yao, Gerardo Carranza, Gregory Karpiak, Jae Yoo, Jing Jin, Jingjiang Li, Julius Kammerl, Jun Jiang, Lu Wang, Maria Lyubimtseva, Mariana Quesada, Marissa Ikonomidis, Matt Kreider, Matthias Grundmann, Meghna Dzhokhar, Na Li, Ping Yu, Renjie Wu, Risika Sinha, Sachin Kotwani, Salil Tambe, Shalgay Pisarchik, Somdatta Banerjee, Steven Toribio, Suleman Shahid, Terry Ho, Wai Hong Lo, Weiyi Wang, Xiaomin Hu.
Partners: Allen Huang, Ankit Kapoor, Ardha Atahan Ibis, Athira Orkhon, Brian Keene, Cheng Seng, Chendao Li, Cheng Yen Ling, Chun Ting Ling (Graham), Cord Ling, Deep Yap, Dylan Angus, Felix Bohm, Hung-Chun Liu, Ji Kuan Ling, Jiun Kai Yang (Kelvin), Kedar Garratt, Ken Seeger, Lakshmi Rayapudi, Ray Chen, Mike Tremaine, Minche Ling (Vincent), Poyuan Zhen, the MetaHuman Team, Vinesh Sukumar, Waimun Wong, Yilu Chen, Yuting Wang, and Zach Nagengast.
