First open source model for synchronous 4K video and audio generation

AI Video & Visuals


In a bold move to shake up the landscape of AI content creation, Lightricks, the company behind popular apps like Facetune and the early AI video platform LTX Studio, has fully open sourced its latest multimodal model, LTX-2. Announced in early January 2026, this release marks a significant shift for the company, moving from proprietary tools to community-driven development.

LTX-2 ranks 23rd on the LMSYS Video Arena leaderboard, but its real strength lies in being the first fully open-weight model capable of producing clips of up to 20 seconds with synchronized audio (dialogue, music, and sound effects) at resolutions up to 4K and frame rates up to 50 FPS.

This feature builds on the foundation of the earlier LTX-Video model, which powered LTX Studio's “content factory” capabilities before similar tools flooded platforms like X (formerly Twitter).

Founded in 2013 and known for its AI-driven creative apps, Lightricks has long focused on supporting creators. Its LTX Studio is one of the pioneering platforms for AI-assisted video production, letting users take content from concept to final render. The decision to open source LTX-2, however, raises interesting questions about the company's business strategy.

As one X user pointed out, community tools like ComfyUI and n8n are already replicating LTX Studio functionality using other models, potentially making proprietary technology commoditized.

By releasing the full stack of LTX-2, including model weights, inference pipeline, and training code, Lightricks may aim to foster an ecosystem around its technology and drive adoption while maintaining premium APIs and enterprise-grade services.


LTX-2 technological breakthrough

At its core, LTX-2 is a diffusion transformer (DiT)-based foundational model with 19 billion parameters, roughly divided into 14 billion for video processing and 5 billion for audio. It employs an integrated asymmetric two-stream transformer architecture to co-generate audio and video through a cross-attention mechanism, ensuring seamless synchronization in a single pass.

This design accepts multimodal inputs such as text-to-video, image-to-video, audio-to-video, or a combination thereof, producing a single consistent output in which visuals, lip movements, environmental sounds, and music stay tightly matched.
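
Lightricks has not published the exact block layout, but the cross-attention coupling between the two streams can be sketched in a few lines of NumPy. All shapes and token counts below are illustrative, not taken from the real model:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys, values):
    """Scaled dot-product attention: one stream queries the other stream's tokens."""
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)
    return softmax(scores, axis=-1) @ values

rng = np.random.default_rng(0)
d_model = 64  # hypothetical embedding width
video_tokens = rng.standard_normal((100, d_model))  # e.g. patchified video latents
audio_tokens = rng.standard_normal((40, d_model))   # e.g. audio latents

# Each stream attends to the other, so audio and video are generated
# jointly in one pass instead of being aligned in post-processing.
video_updated = video_tokens + cross_attention(video_tokens, audio_tokens, audio_tokens)
audio_updated = audio_tokens + cross_attention(audio_tokens, video_tokens, video_tokens)

print(video_updated.shape, audio_updated.shape)  # (100, 64) (40, 64)
```

In a real DiT the two streams would also carry different parameter budgets (hence "asymmetric": roughly 14B for video versus 5B for audio), with cross-attention layers interleaved throughout the stack.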

The main features are:

  • Resolution and performance: Supports generation up to 4K, achieved via a multi-stage pipeline with spatial and temporal upscalers rather than pure native output. Frame rates reach 50 FPS and clip lengths extend to 20 seconds, longer than many competitors.
  • Audio integration: Native support for dialogue, background music, and sound effects, generated synchronously without separate post-processing.
  • Controllability: Includes LoRA (Low-Rank Adaptation) adapters for precise control over camera movement, structure, depth, pose, and style. Keyframe interpolation and automatic prompt enhancement further refine the output of a production workflow.
  • Efficiency: Optimized for consumer-grade GPUs, especially NVIDIA hardware. The model ships in NVFP8 (about 30% smaller and up to 2x faster) and NVFP4 quantized formats, enabling local execution on RTX GPUs with up to 60% less VRAM. Developed in partnership with NVIDIA, these optimizations enable high-fidelity generation without relying on the cloud.
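
The VRAM claims are easy to sanity-check with back-of-envelope arithmetic: just holding 19 billion parameters in memory scales linearly with bits per parameter. The numbers below cover weights only; real usage adds activations and overhead:

```python
# Rough, illustrative estimate of memory needed just to store the weights.
PARAMS = 19e9  # total parameter count reported for LTX-2

def weight_gb(bits_per_param):
    """Gigabytes needed to hold the weights alone at a given precision."""
    return PARAMS * bits_per_param / 8 / 1e9

print(weight_gb(16))  # 38.0 -> FP16/BF16: beyond any single consumer GPU
print(weight_gb(8))   # 19.0 -> NVFP8: fits a 24 GB card such as an RTX 4090
print(weight_gb(4))   # 9.5  -> NVFP4: within reach of mid-range RTX GPUs
```

This is why the NVFP8 and NVFP4 formats matter: halving or quartering the per-parameter footprint is what moves a 19B model from data-center hardware onto a single consumer card.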

Although Lightricks' site claims “native 4K,” a look at the technical details reveals that the higher resolution relies on upscaling modules (such as the 2x spatial and temporal upscalers) in a multi-stage pipeline.

While this approach is effective at achieving crisp 4K results, it means that base generation is done at a lower resolution before scaling, similar to techniques used by other AI companies such as Stability AI. Community discussions on X highlight this nuance, with users of ComfyUI workflows noting the built-in upscaling of the final output.
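
The shape of that pipeline is simple to illustrate. The naive upscalers below (nearest-neighbour spatial, frame-averaging temporal) stand in for the learned modules the real model uses; the dimensions are toy-sized, not the actual resolutions:

```python
import numpy as np

def upscale_2x_spatial(frames):
    """Nearest-neighbour 2x spatial upscale; the real model uses a learned upsampler."""
    return frames.repeat(2, axis=1).repeat(2, axis=2)

def upscale_2x_temporal(frames):
    """Roughly double the frame rate by averaging neighbouring frames (naive interpolation)."""
    mids = (frames[:-1] + frames[1:]) / 2
    out = np.empty((frames.shape[0] * 2 - 1,) + frames.shape[1:], frames.dtype)
    out[0::2] = frames  # original frames on even indices
    out[1::2] = mids    # interpolated frames in between
    return out

# Stage 1: base generation at lower resolution (T, H, W, C), toy dimensions
base = np.random.rand(9, 64, 64, 3).astype(np.float32)
# Stage 2: spatial then temporal upscaling toward the final output
video = upscale_2x_spatial(base)    # (9, 128, 128, 3)
video = upscale_2x_temporal(video)  # (17, 128, 128, 3)
print(video.shape)
```

The point of the two-stage split is cost: generating directly at 4K/50 FPS would blow up the diffusion transformer's token count, while upscaling a cheaper base generation keeps inference tractable.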

The model is available on Hugging Face and GitHub as a monorepo containing the core model definition, inference pipeline, and training packages. Training is fully supported, and LoRAs for custom styles and motions can be fine-tuned in under an hour on appropriate hardware. The release ships under a community license that emphasizes ethical use and warns against potential bias and the production of inappropriate content.
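
Fast LoRA fine-tuning is possible because the adapter trains only a low-rank update on top of frozen weights. A minimal sketch of the standard LoRA math, with hypothetical dimensions unrelated to LTX-2's actual layers:

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, rank = 512, 512, 16  # rank << d keeps the adapter tiny

W = rng.standard_normal((d_out, d_in))        # frozen pretrained weight
A = rng.standard_normal((rank, d_in)) * 0.01  # trainable down-projection
B = np.zeros((d_out, rank))                   # trainable up-projection, zero-init
alpha = 32.0

def lora_forward(x):
    # Base path plus low-rank update, scaled by alpha/rank as in the LoRA paper
    return x @ W.T + (x @ A.T @ B.T) * (alpha / rank)

x = rng.standard_normal((4, d_in))
y = lora_forward(x)
print(y.shape)  # (4, 512)

# The adapter trains 2 * rank * d parameters instead of d * d:
print(A.size + B.size, W.size)  # 16384 262144
```

Because `B` starts at zero, the adapter initially leaves the base model untouched; training only the small `A` and `B` matrices is what makes sub-hour style fine-tunes realistic on a single GPU.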


Impact on communities and ecosystems

The release created a buzz across the AI community. On X, creators praise the integration with ComfyUI, which enables a seamless video-generation workflow, amplified further by NVIDIA's optimizations, with reports of 3x faster inference on RTX cards. Early demos showcase cinematic clips, from animated scenes to realistic narratives, all with integrated audio.

But the open source strategy puzzles some observers. Lightricks' core business revolves around premium tools such as LTX Studio, which charges for advanced features. By making LTX-2 freely available, the company risks cannibalizing its own products, especially if enthusiasts replicate Studio-like pipelines with open tools.

The co-founder and CEO's comments on Reddit suggest a focus on accelerating innovation through community contributions, with the potential to feed back into the commercial ecosystem.



Looking to the future

LTX-2 marks a milestone in the democratization of AI video production, lowering barriers for independent creators and studios. Although it does not top the leaderboards, its focus on control, efficiency, and openness positions it as a foundational tool for future development. As AI video evolves, Lightricks' pivot could encourage more companies to embrace open source and facilitate rapid advances in multimodal generation.

Creators can now get involved through the Hugging Face demo or a local setup and turn their prompts into polished videos with unprecedented ease. Whether this keeps Lightricks in business remains to be seen, but there is no doubt that this release will accelerate the creative revolution in AI.


