
San Francisco-based Standard Intelligence has announced “FDM-1,” an AI model trained on 11 million hours of video and billed as “the world’s first general-purpose computer action model.”
The first fully general computer action model | Blog
https://si.inc/posts/fdm1/
AI that can operate PCs is already in practical use, but many such models are built by applying reinforcement learning to vision-language models (VLMs) trained on PC screenshots, and they are poorly suited to long-running tasks such as operating CAD applications. In addition, developing these VLMs requires annotating screenshots, which takes a large number of people and a significant amount of time.
Unlike VLM-based PC-operating AI, FDM-1 was trained on a total of 11 million hours of video from the internet, including screen recordings of video editing and live coding streams. A system called “IDM” was also developed to automate video annotation.
Automatically annotating live-action video is difficult, but PC operation videos are comparatively easy to annotate automatically, because screen changes can be linked one-to-one to the operations that caused them: when “h” appears on the screen, the “h” key was pressed. To develop FDM-1, Standard Intelligence first asked a company to manually annotate 40,000 hours of video, then developed IDM from that data and used it to automatically annotate the 11 million hours of video.
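The one-to-one link between screen changes and key presses can be illustrated with a toy sketch. This is not Standard Intelligence's IDM, which works on video. It is a hypothetical rule-based labeler, with the made-up name `infer_keystrokes`, that assumes each "frame" is simply the text currently visible in a text field and that the only possible actions are typing one character at the end or pressing Backspace.

```python
def infer_keystrokes(frames):
    """Infer key presses from consecutive snapshots of a text field.

    Toy assumption (not from the article): each frame is the full text
    visible on screen, and the only actions are appending one character
    or pressing Backspace.
    """
    actions = []
    for before, after in zip(frames, frames[1:]):
        if after == before:
            actions.append(None)  # nothing changed, so no key was pressed
        elif len(after) == len(before) + 1 and after.startswith(before):
            actions.append(after[-1])  # one new trailing character was typed
        elif len(after) == len(before) - 1 and before.startswith(after):
            actions.append("Backspace")  # the last character was deleted
        else:
            actions.append("?")  # a change this toy rule cannot explain
    return actions


# Hypothetical "recording": the text field after each frame.
frames = ["", "h", "hi", "hi", "h", "he"]
print(infer_keystrokes(frames))
```

A learned annotation model replaces these hand-written rules with a model trained on labeled examples, which in the article's account is the role of the 40,000 hours of manually annotated video.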

Furthermore, the team has developed an encoder tailored to the characteristics of PC screen recordings. This makes FDM-1 highly token-efficient: 200,000 tokens can represent 36,000 frames of video. With the same 200,000 tokens, Gemini can process only 775 frames and Claude only 162 frames. The development team emphasized this efficiency, saying, “Approximately 2 hours of 30fps video can be compressed into 1 million tokens.”
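The figures above can be turned into tokens per frame with simple arithmetic. The sketch below uses only the numbers quoted in the article. The helper names `tokens_per_frame` and `minutes_of_video` are made up for illustration, and the extrapolation to 1 million tokens assumes the per-frame cost stays constant, which the article does not state.

```python
TOKEN_BUDGET = 200_000  # token budget quoted in the article
FPS = 30                # frame rate quoted in the article

# Frames each model can fit into TOKEN_BUDGET, as reported in the article.
frames_in_budget = {"FDM-1": 36_000, "Gemini": 775, "Claude": 162}


def tokens_per_frame(frames, budget=TOKEN_BUDGET):
    """Average number of tokens spent on one frame."""
    return budget / frames


def minutes_of_video(frames, fps=FPS):
    """Length of video, in minutes, covered by the given number of frames."""
    return frames / fps / 60


for name, frames in frames_in_budget.items():
    print(f"{name}: {tokens_per_frame(frames):.1f} tokens/frame, "
          f"{minutes_of_video(frames):.2f} min of {FPS}fps video")

# Assumption: FDM-1's per-frame cost is constant, so it can be scaled linearly.
fdm_rate = tokens_per_frame(frames_in_budget["FDM-1"])
hours_per_million = 1_000_000 / fdm_rate / FPS / 3600
print(f"FDM-1 at 1M tokens: about {hours_per_million:.2f} hours")
```

Under that linear assumption, 1 million tokens cover about 1.67 hours of 30fps video, which is in the same range as the quoted “approximately 2 hours.”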

Because FDM-1 can process longer videos with fewer tokens, it can automatically operate applications that depend on long context, such as CG and CAD software.

FDM-1 can also be used as an autonomous driving system by mapping the car's controls to arrow key inputs.
Standard Intelligence claims that FDM-1 moves PC-operating AI from being constrained by training data to being constrained by compute.
