China is pushing hard in the AI arms race. While the rest of the world is seeing an influx of AI-driven smartphone features, primarily voice assistants and app-to-app interactions, China is taking a bigger step. ZTE, a multinational telecommunications company based in Shenzhen, has introduced a smartphone equipped with an AI agent. Built in collaboration with ByteDance, the device features an agent that does not just live inside an app but is integrated directly into the operating system. Its most striking feature is that it can operate the smartphone just as a human would.
Taylor Ogan, an entrepreneur from Shenzhen, shared a prototype called the Nubia M153 on his X (formerly Twitter) account. The smartphone runs a customized version of Android integrated with ByteDance's Doubao AI agent. For the uninitiated, Doubao is ByteDance's proprietary large-scale, general-purpose AI model ecosystem, widely deployed across China as a chatbot and productivity tool.
This prototype is much more than a regular on-device assistant. Ogan's demo showed that the AI can take full-stack control of the phone. It can examine the user interface, select and download apps, tap and type on the screen, make calls, and perform multi-step tasks without the user even knowing which app they need. Simply put, the AI here uses the phone like a human user, not like an app.
What does the agentic AI smartphone do?
Ogan opened his thread by asking the AI to find someone willing to wait in line for him. Although this is not yet the norm in India, gig economy apps in China commonly offer queue-standing services at hospitals, government offices, and other high-demand venues. Ogan makes the request in English, and the AI responds immediately. The video shows the AI selecting the local service app, configuring the task, filling in the required fields, and presenting a final confirmation screen. Ogan admits in the short video that he would not have known which app could handle the job or how to configure it. The AI agent runs the entire process autonomously.
Another DeepSeek moment. This is the world's first full-fledged smartphone. This is an engineering prototype of ZTE's Nubia M153 running ByteDance's Doubao AI agent fused to Android at the OS level. It has complete control over the phone. It can see the UI, select/download apps, etc. pic.twitter.com/lM9PYMoQek
— Taylor Ogan (@TaylorOgan) December 4, 2025
This is a breakthrough because most current AI assistants on smartphones can reason about tasks but cannot interact with third-party apps on your behalf. Samsung, Apple, and other tech giants are experimenting with AI actions, but these are largely permission-gated and restricted to partner apps. The ZTE-ByteDance prototype goes further: the AI acts directly within the graphical user interface (GUI), as a human would.
The hardware behind Agentic AI
Ogan revealed in the thread that the prototype is powered by Qualcomm's new Snapdragon 8 Elite Gen 5 chipset with 16 GB of RAM. That hardware matters because the agent splits its workload between cloud-based semantic inference and on-device screen control. According to Ogan, running “screen vision” locally lets the AI work quickly and keeps sensitive UI interactions, such as payment flows and passwords, private.
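The split described above can be pictured as a small routing rule. The following is only an illustrative sketch, not ZTE's or ByteDance's actual logic: the function names, the task labels, and the keyword list are all hypothetical, and the rule that any screen containing sensitive text stays on the device is an assumption drawn from the privacy claim in the thread.

```python
# Hypothetical keywords that mark a screen as sensitive (assumption).
SENSITIVE_KEYWORDS = ("password", "payment", "card number")


def is_sensitive(screen_text: str) -> bool:
    """Return True if the visible screen text mentions a sensitive field."""
    lowered = screen_text.lower()
    return any(keyword in lowered for keyword in SENSITIVE_KEYWORDS)


def route(task_kind: str, screen_text: str = "") -> str:
    """Decide where a task runs: "device" or "cloud".

    Screen control always stays local; anything touching a sensitive
    screen also stays local; everything else goes to the cloud model.
    """
    if task_kind == "screen_control":
        return "device"
    if is_sensitive(screen_text):
        return "device"
    return "cloud"


print(route("semantic_inference", "Which hotel is this?"))
print(route("screen_control", "Tap the search box"))
print(route("semantic_inference", "Confirm payment of 30 yuan"))
```

In this sketch the first request would go to the cloud and the other two would stay on the device, which mirrors the speed and privacy argument made in the thread.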
As for the AI model, ByteDance's Doubao is currently used by more than 175 million people in China. It is essentially a large, sparse mixture-of-experts model with multimodal (text and vision) support. In a second example, Ogan takes a photo of a NIO battery swapping station and asks, “What is this?” The model identifies the station from the image, links it to NIO's national EV charging network, and explains how it works.
This is not a chat overlay, but a true multimodal agent. Powered by the latest Snapdragon 8 Elite Gen 5 with 16GB RAM, it can push more agent workloads onto the device. Here, I take a picture of a NIO battery swap station and ask, “What is this?” Running… pic.twitter.com/b0rg7iJX3l
— Taylor Ogan (@TaylorOgan) December 4, 2025
Cloud + on-device architecture
Probably the coolest demo is the hotel booking. Ogan takes a single photo of a hotel entrance and says nothing other than that he plans to book a stay. The AI works out the task and divides the workload.
First, Doubao (in the cloud) interprets the semantics: which hotel it is, that the booking is for tonight, whether the pet policy matters, and so on. Second, Nebula-GUI (on-device), reportedly a 7-billion-parameter model trained by ZTE, handles the physical actions: opening Ctrip (a Chinese booking app), entering dates, finding the best rate, checking the pet policy in the app, and telling Ogan whether dogs are allowed.
According to the demo, this two-tiered architecture allows tasks to run smoothly. Simply put, Doubao makes the plans and Nebula-GUI executes them.
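The plan-then-execute split can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the real Doubao or Nebula-GUI interface: the `Step` schema, `cloud_plan`, and `OnDeviceExecutor` are hypothetical names, the planner is a hard-coded stub standing in for a multimodal model, and the executor only logs actions instead of driving a screen.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """One GUI action for the on-device executor (hypothetical schema)."""
    action: str       # e.g. "open_app", "type", "tap", "read"
    target: str       # app name or UI element label
    value: str = ""   # text to enter, if any


def cloud_plan(intent: str, photo_label: str) -> list[Step]:
    """Stand-in for the cloud planner: turn an intent into ordered steps.

    A real planner would call a multimodal model; this stub hard-codes
    the hotel-booking flow described in the article.
    """
    if "book" not in intent.lower():
        return []
    return [
        Step("open_app", "Ctrip"),
        Step("type", "search_box", photo_label),
        Step("type", "check_in_date", "tonight"),
        Step("tap", "lowest_rate"),
        Step("read", "pet_policy"),
    ]


@dataclass
class OnDeviceExecutor:
    """Stand-in for the on-device GUI model: records each step it performs."""
    log: list[str] = field(default_factory=list)

    def run(self, steps: list[Step]) -> list[str]:
        for step in steps:
            line = f"{step.action}:{step.target}"
            if step.value:
                line += f"={step.value}"
            self.log.append(line)
        return self.log


# "Example Hotel" is a placeholder for whatever the photo is recognized as.
plan = cloud_plan("I plan to book a stay", photo_label="Example Hotel")
executor = OnDeviceExecutor()
for entry in executor.run(plan):
    print(entry)
```

Keeping the plan as plain data means the planner never has to touch the screen and the executor never has to interpret the user's intent, which is the division of labor the demo attributes to the two models.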
App-level knowledge and interaction with other bots
In another demo, the agent is asked to book a robotaxi. Doubao uses GPS data to find local ride-hailing apps and determine which operator serves a particular route. On Ogan's phone, Nebula-GUI opens the Baidu Apollo app, navigates the menu, selects the pickup location, and confirms the trip. Later, during the ride, Ogan asks to change the drop-off location. Again, the AI recognizes the active Apollo session, opens the correct screen to change the destination, and triggers a confirmation both on the phone and in the robotaxi itself. This is a strong demonstration of the AI's app-specific knowledge.
At this point, the feeling of “voice commands” disappears and it starts to feel like a real assistant. I can't remember which number I used to log into the Baidu Apollo robotaxi app. Doubao digs into the app settings and gives me the last four digits of this account's phone number. pic.twitter.com/FT9I9q3QMi
— Taylor Ogan (@TaylorOgan) December 4, 2025
During the demo, when Ogan cannot remember the phone number linked to his Apollo account, the AI navigates through the app's settings and displays the last four digits. This is something most AI assistants cannot do without deep OS-level access and visibility.
In another test, Ogan uses Meituan, a Chinese tech company that provides on-demand drone delivery services. He asks the agent to order two drinks, and the agent updates the cart, makes the payment, and arranges delivery to a nearby locker. Then, when Meituan's automated system makes a confirmation call, Doubao answers on his behalf and speaks to Meituan's bot. The two bots complete the exchange without user intervention. This is an example of how an agent can deal with other agents on behalf of a user.
I tell it to order the two drinks in front of me. It reuses my cart, updates quantities, makes payment, and Meituan's drone flies the order to a nearby locker. When Meituan's automated phone system calls to let me know that the delivery has arrived, Doubao automatically answers and speaks to the bot. pic.twitter.com/rpGvGUVOvA
— Taylor Ogan (@TaylorOgan) December 4, 2025
While walking around, Ogan also uses the device as a passive intelligence layer: to identify whether a store is part of a brand's network in Shenzhen, to check trademark and company registration data, and to assess whether a passerby wearing an NYPD jacket is a real police officer. In the demo, the system correctly contextualizes the location (Shenzhen) and identifies the jacket as a civilian fashion item.
The demo also shows ByteDance's image generation tool, which changes only the clothing in a photo while leaving the rest of the scene unchanged. This allows the agent to re-render people in Chinese police uniforms or FBI jackets on request.
What does this mean for us?
The device is essentially an OS-native GUI agent trained on Chinese mobile UI flows and backed by a large-scale multimodal inference model. You no longer need to understand apps, menus, or workflows. You just state your intent, and the agent handles execution.
Currently, nothing in the global smartphone market has demonstrated this level of autonomy. It remains to be seen whether this will be commercialized, but this prototype clearly shows how agent-enabled smartphones have the potential to change our lives. It also shows that the first true agent smartphones may not come from Silicon Valley, but from China's integrated AI and mobile ecosystem.
