Beijing, February 5, 2026 /PRNewswire/ — Kling AI, the AI-powered creative platform, today announced the launch of the Kling 3.0 models: Video 3.0, Video 3.0 Omni, Image 3.0, and Image 3.0 Omni, giving creators greater narrative control and stronger consistency in AI video generation. The series delivers significant upgrades in consistency and photorealistic output, extends video duration to 15 seconds, and adds native audio production in multiple languages, dialects, and accents.
Powered by a unified training framework, the Kling 3.0 model series supports full multimodal input and output across text, images, audio, and video, bringing together video understanding, generation, and editing in one streamlined AI workflow. The series integrates multiple tasks, such as text-to-video, image-to-video, reference-to-video, and in-video editing, into a single native multimodal architecture, allowing the model to follow complex narrative logic, provide precise shot control, and maintain strong prompt compliance.
Kling AI 3.0 models are currently available for early access exclusively to Ultra subscribers and will be available to the public soon.
Video 3.0: Cinematic quality video production
Video 3.0 improves element consistency and provides even more creative control, allowing creators to upload reference videos and multiple image references to ensure visual consistency of characters, objects, and scenes from frame to frame.
Its main features are:
- Native audio for different languages and accents: The model can generate speech in English, Chinese, Japanese, Korean, and Spanish, with accents such as American, British, and Indian. It also allows users to create complex multi-character dialogue scenes in which each character speaks a different language, with precise user control over content, delivery, and speaking order.
- Extended video duration: Video 3.0 supports generating videos up to 15 seconds long. The longer duration lets the model handle complex sequences with long takes and multiple plot developments, connected by smooth cinematic transitions.
- Intelligent multi-shot storytelling: Video 3.0 understands multi-scene, multi-shot instructions and dynamically adjusts camera angles and shots to suit your creative direction, from classic shot-reverse-shot dialogue to advanced cross-cutting dialogue and voiceovers.
- Improved preservation of text in images: The model can preserve or generate text such as signage, captions, and branding elements with high accuracy. This is especially valuable in e-commerce advertising: when a character wears a branded shirt, for example, the logo remains crisp and legible throughout the video.
- Photorealistic output: Video 3.0 renders realistic characters with expressive, dynamic performances in photorealistic detail.
Video 3.0 Omni: Advanced Storyboarding and Reference Management
Built on Kling Video O1’s “Elements” functionality, Video 3.0 Omni provides advanced reference-based generation for unparalleled consistency. Creators can upload a reference video, allowing AI to extract the visual and vocal characteristics of their characters and faithfully recreate them across new scenes.
The new Video 3.0 Omni model also rolls out multi-shot storyboarding capabilities, allowing users to generate professional shots with the ability to specify each shot’s length, shot size, perspective, narrative content, and camera movement in the storyboard.
Image 3.0 and Image 3.0 Omni: Ultra-high resolution visuals with cinematic realism
In addition to the video updates, Kling AI introduces Image 3.0 and Image 3.0 Omni, which now support 2K and 4K ultra-high resolution output for professional use cases ranging from virtual scene visualization to full-scale production assets. These models maintain the quality of textures, lighting, and materials with exceptional precision and consistency for heightened realism.
Built on the foundation of the newly introduced Kling O1 and 2.6 series, the Kling AI 3.0 model lineup embodies the Multimodal Visual Language (MVL) framework and marks a decisive evolution from basic video generation to sophisticated professional orchestration, delivering measurable advances in narrative accuracy, output quality, and cinematic control.
Since its launch in June 2024, Kling AI has grown to serve more than 60 million creators worldwide, who have produced over 600 million videos to date, and the platform has partnered with more than 30,000 enterprise clients. Its adoption is widespread in the film and advertising industries, accelerating the visualization of storyboards and product concepts and powering production workflows from animation and CGI to the creation of entirely new visual assets.
The debut of Kling 3.0 heralds an era in which anyone can turn their ideas into a movie, with the role of AI fundamentally shifting from a mere generative tool to an intelligent creative partner that can understand artistic intent and transform ideas into reality.
For more information about Video 3.0, visit: https://app.klingai.com/global/release-notes/whbvu8hsip?type=dialog
For more information about Image 3.0, visit: https://app.klingai.com/global/release-notes/rz3idhopum?type=dialog
Source: Kling AI

