The world's first fully agent-driven AI smartphone: Is this China's second DeepSeek moment?



China is vigorously pursuing the AI arms race. While the rest of the world is seeing an influx of AI-driven smartphone features, primarily voice assistants and app-to-app interactions, China is making bigger strides. ZTE, a multinational telecommunications company based in Shenzhen, has introduced a smartphone equipped with an AI agent. Built in collaboration with ByteDance, the device features an agent that does not just live inside an app but is integrated directly into the operating system. Its most striking feature is that the agent can operate the phone much as a human would.

Taylor Organ, an entrepreneur from Shenzhen, shared a demo of a prototype called the Nubia M153 on his X (formerly Twitter) account. The smartphone runs a customized version of Android integrated with ByteDance's Doubao AI agent. For the uninitiated, Doubao is ByteDance's proprietary family of large general-purpose AI models, widely deployed across China as a chatbot and productivity tool.

This prototype is much more than a regular on-device assistant. Organ's demo showed that the AI can take full-stack control of the phone. It can examine the user interface, open and download apps, tap and type on the screen, make calls, and perform multi-step tasks without the user even knowing which app they need. Simply put, the AI here uses the phone like a human user, not like an app.

What does the Agentic AI smartphone do?

Organ opened his thread by asking the AI to find someone to wait in line for him. Although this is not yet common in India, gig economy apps in China typically offer line-waiting services at hospitals, government offices, and other high-demand venues. Organ makes the request in English, and the AI responds immediately. The video shows the AI selecting a local service app, configuring the task, filling in the required fields, and presenting a final confirmation screen. Organ admits in the short video that he would not have known which app could handle the job or how to configure it. The agent runs the entire process autonomously.

This is a breakthrough because most current AI assistants on smartphones can reason about tasks but cannot interact with third-party apps on your behalf. Samsung, Apple, and other tech giants are experimenting with AI actions, but these are largely permission-gated and restricted to partner apps. The ZTE-ByteDance prototype goes much further: the AI acts directly within the graphical user interface (GUI), as a human would.
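To make the idea of acting through the GUI concrete, here is a minimal sketch in Python. It assumes the agent receives a list of visible widgets with their labels and screen positions, then picks a tap point the way a finger would. The `UIElement` type, the helper names, and the sample screen are all hypothetical illustrations, not details from ZTE or ByteDance.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UIElement:
    """One visible widget, as a screen parser might report it (hypothetical)."""
    label: str
    x: int
    y: int
    width: int
    height: int


def find_element(screen: list[UIElement], text: str) -> Optional[UIElement]:
    """Return the first element whose label contains `text`, ignoring case."""
    needle = text.lower()
    for element in screen:
        if needle in element.label.lower():
            return element
    return None


def tap_center(element: UIElement) -> tuple[int, int]:
    """Compute the point a finger would aim for: the center of the widget."""
    return (element.x + element.width // 2, element.y + element.height // 2)


# Hypothetical screen contents; a real agent would get these from screen vision.
screen = [
    UIElement("Search services", 40, 120, 640, 80),
    UIElement("Confirm order", 40, 1100, 640, 100),
]

target = find_element(screen, "confirm")
if target is not None:
    print(tap_center(target))
```

Because the agent works from what is on screen rather than from a partner API, the same approach applies to any app, which is the point the demo makes.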

The hardware behind Agentic AI

Organ revealed in the thread that the prototype is powered by Qualcomm's new Snapdragon 8 Elite Gen 5 chipset with 16 GB of RAM. That matters because the agent splits its workload between cloud-based semantic inference and on-device screen control. According to Organ, running "screen vision" locally lets the AI work quickly and keeps sensitive UI interactions, such as payment flows and passwords, private.

As for the AI model, ByteDance's Doubao is currently used by more than 175 million people in China. It is essentially a large, sparse mixture-of-experts model with multimodal support, meaning it handles both text and vision. In a second example, Organ takes a photo of a NIO battery swapping station and asks, "What is this?" The model identifies the station from the image, links it to NIO's national EV charging network, and explains how it works.

Cloud + on-device architecture

Probably the coolest demo is the hotel booking. Organ takes a single photo of a hotel entrance and says nothing beyond that he plans to book a stay. The AI works out the task and divides the workload.


First, Doubao, running in the cloud, interprets the semantics: which hotel it is, that the booking is for tonight, whether the pet policy matters, and so on. Second, Nebula-GUI, reportedly a 7-billion-parameter on-device model trained by ZTE, handles the physical actions: opening Ctrip (a Chinese booking app), entering dates, finding the best rate, checking the pet policy in the app, and telling Organ whether dogs are allowed.

According to the demo, this two-tiered architecture allows tasks to run smoothly. Simply put, Doubao makes the plans and Nebula-GUI executes them.
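The planner and executor split described above can be sketched roughly as follows in Python. This is an illustration under stated assumptions, not the actual Doubao or Nebula-GUI interface: the `Step` format, `plan_in_cloud`, and `OnDeviceExecutor` are hypothetical names, the planner is a hard-coded stand-in for a large model, and the executor only records the actions it would perform.

```python
from dataclasses import dataclass


@dataclass
class Step:
    """One GUI action in a plan (hypothetical format)."""
    action: str      # e.g. "open_app", "type", "tap"
    target: str      # app name or on-screen label
    value: str = ""  # text to enter, if any


def plan_in_cloud(intent: str) -> list[Step]:
    """Stand-in for the cloud planner; a real one would call a large model."""
    if "hotel" in intent.lower():
        return [
            Step("open_app", "Ctrip"),
            Step("type", "Dates", "tonight"),
            Step("tap", "Search"),
            Step("tap", "Pet policy"),
        ]
    return []  # unknown intent: nothing to execute


class OnDeviceExecutor:
    """Stand-in for the on-device model; it only logs what it would do."""

    def __init__(self) -> None:
        self.log: list[str] = []

    def run(self, steps: list[Step]) -> list[str]:
        for step in steps:
            detail = f" = {step.value}" if step.value else ""
            self.log.append(f"{step.action}: {step.target}{detail}")
        return self.log


executor = OnDeviceExecutor()
for line in executor.run(plan_in_cloud("Book this hotel for tonight")):
    print(line)
```

Keeping the plan as plain data means the cloud side never needs to touch the screen, and the device side never needs to understand the user's wording; each tier can be swapped or tested on its own.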

App-level knowledge and interaction with other bots

In another demo, the agent is asked to book a robotaxi. Doubao uses GPS data to find local ride-hailing apps and determine which operator serves a particular route. On Organ's phone, Nebula-GUI opens the Baidu Apollo app, navigates the menu, selects the ride location, and confirms the trip. Later, during the ride, Organ asks to change the drop-off location. The AI recognizes the active Apollo session, opens the correct screen to change the destination, and triggers a confirmation both on the phone and in the robotaxi itself. It is a strong demonstration of the AI's app-specific knowledge.

At one point in the demo, Organ forgets the phone number linked to his Apollo account, and the AI navigates through the app's settings and displays the last four digits. Most AI assistants cannot do this, because it requires deep OS-level access and visibility.

In another test, Organ uses Meituan, a Chinese tech company that offers on-demand drone delivery. He asks the agent to order two drinks, and the agent updates the cart, makes the payment, and arranges delivery to a nearby locker. When Meituan's automated system then makes a confirmation call, Doubao answers on his behalf and speaks to Meituan's bot. The two bots complete the exchange without any user intervention. This is an example of how an agent can negotiate with other agents on behalf of a user.

Organ said that while out walking he uses the device as a passive intelligence layer: to identify whether a store is part of a brand network in Shenzhen, to check trademark and company registration data, and to assess whether a passerby wearing an NYPD jacket is a real police officer. In the demo, the system correctly contextualizes the location (Shenzhen) and identifies the jacket as a civilian fashion item.


The demo also shows ByteDance's image generation tool, which changes only the clothing in a photo while leaving the rest of the scene unchanged. On request, the agent can re-render people in Chinese police uniforms or FBI jackets.

What does this mean for us?

The device is essentially an OS-native GUI agent trained on Chinese mobile UI flows and backed by a large-scale multimodal inference model. You no longer need to understand apps, menus, or workflows. You just state your intent, and the agent handles the execution.

Currently, nothing in the global smartphone market has demonstrated this level of autonomy. It remains to be seen whether this will be commercialized, but this prototype clearly shows how agent-enabled smartphones have the potential to change our lives. It also shows that the first true agent smartphones may not come from Silicon Valley, but from China's integrated AI and mobile ecosystem.




