Why Avataar is bullish on cracking AI video and winning against global giants

The race for AI video generation is currently dominated by global players such as OpenAI, Google, and Chinese startups, with computing budgets running into billions of dollars. All of them compete by building larger models and deploying more GPUs, but a Bengaluru-based startup is betting against that trend.

Just as Chinese AI lab DeepSeek changed the course of the LLM market, the 12-year-old, Peak XV-backed Avataar is attempting a similar disruption in the AI video generation market.

With the release of Varya, an AI video generation model, Avataar claims it can now generate videos for as low as ₹0.50 per second. This is at least 10x cheaper than the cheapest AI video creation model currently available.

According to the startup, this is India’s first distilled AI video generation model developed under the Indian government’s IndiaAI Mission. In general, generating AI videos is a token-intensive task that requires a large amount of computing power.

With its distilled AI video generation model, the startup says it was able to significantly reduce computing costs, making the case that the winners of this race will be not those that use the most compute, but those that need the least.

However, Avataar’s cost advantage does not come from building smaller models. “Most distillation projects work by shrinking the number of parameters; a 70 billion parameter model is compressed to 7 billion,” said Sravanth Aluru, cofounder and CEO of Avataar.

While this method saves users cost and time, it often comes at the expense of quality.

Avataar is aware of this trade-off. Varya is built on Alibaba’s open-source Wan 2.2 architecture and keeps a 14 Bn parameter footprint, the same size as the teacher model.

What Avataar changes is how inference is performed. Standard diffusion-based video models produce their output through a long iterative denoising process, typically around 50 consecutive steps. In this process, the model incrementally refines a noisy signal into a coherent video.

Varya collapses this into four steps. It does so through a redesigned inference framework in which each step performs a distinct function, rather than repeating the same operation at a finer resolution.

Aluru told Inc42 that the first two steps focus on trajectory formation, establishing the rough structure, motion path, and composition logic of the video, and the last two steps generate the actual output frames.

The system internally integrates several techniques, including role-aware monitoring, distribution matching, and classifier-free guidance enhancement.

On an NVIDIA H200 GPU, Varya produces a 5-second 720p video in about 45 seconds.

Running the same task on the underlying base model, Wan 2.2, takes approximately 1,230 seconds. The company claims a 27x improvement in speed and cost over the teacher model.
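The 27x figure follows directly from the two timings quoted above. A minimal sketch of that arithmetic, using only the company's self-reported numbers (the variable and function names are illustrative, not from Avataar):

```python
# Self-reported timings for one 5-second 720p clip on an NVIDIA H200.
BASE_SECONDS = 1230   # Wan 2.2 base (teacher) model
VARYA_SECONDS = 45    # Varya, same task
CLIP_SECONDS = 5      # length of the generated clip


def speedup(baseline_s: float, optimized_s: float) -> float:
    """Return how many times faster the optimized run is."""
    return baseline_s / optimized_s


ratio = speedup(BASE_SECONDS, VARYA_SECONDS)
# GPU seconds spent per second of finished video, for each model.
base_per_video_second = BASE_SECONDS / CLIP_SECONDS
varya_per_video_second = VARYA_SECONDS / CLIP_SECONDS

print(f"Speedup: {ratio:.1f}x")
print(f"Base model: {base_per_video_second:.0f} GPU-seconds per video-second")
print(f"Varya: {varya_per_video_second:.0f} GPU-seconds per video-second")
```

If cost scales with GPU time on the same hardware, the cost improvement equals the speedup, which is consistent with the company quoting a single 27x number for both.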

Will Avataar be able to succeed where Sora failed?

To understand why Varya is important, let’s take a look at the global AI video market. Despite huge spending, this market has failed to find a mass audience.

Of note here is OpenAI’s Sora. Launched in late 2024 to tremendous hype, the model produced photorealistic video clips that seemed to usher in a new era of content creation.

The economics behind the scenes were less appealing. By March 2026, each 10-second clip was costing OpenAI approximately $1.30 to produce, translating to roughly $15 million in inference costs per day, against lifetime revenue of just $2.1 million.

Downloads peaked at 3.33 million in November 2025 and fell to 1.13 million in February 2026, with active users below 500,000.

A $1 billion partnership with Disney, which was supposed to unlock more than 200 characters from Marvel, Pixar, and Star Wars, also fell apart. Ultimately, OpenAI shut down the Sora standalone app in March 2026.

Sora’s collapse did not kill AI video. It did, however, expose a structural problem that the industry has not been able to resolve: the cost of producing high-quality video at inference time remains prohibitive for most users and most use cases.

For example, Google’s Veo 3.1 Standard video costs about $0.75 per second, while OpenAI’s Sora 2 costs between $0.30 and $0.50 per second.

Runway Gen-4.5 costs about $0.15 per second, and Kuaishou’s Kling 3.0 costs about $0.10 per second.

This means that a 30-second video can cost anywhere from a few dollars to more than $20, depending on the model. Although prices have come down significantly from 2024 levels, they are still too expensive for many small businesses, teachers, and independent creators, especially in markets like India.
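The range quoted above can be reproduced from the per-second prices listed in this article. A short sketch, treating those prices as approximate list prices (the dictionary keys and function name are illustrative):

```python
# Approximate per-second prices in USD, as quoted in the article.
PRICE_PER_SECOND_USD = {
    "Veo 3.1 Standard": 0.75,
    "Sora 2 (high)": 0.50,
    "Sora 2 (low)": 0.30,
    "Runway Gen-4.5": 0.15,
    "Kling 3.0": 0.10,
}


def clip_cost(price_per_second: float, duration_s: int) -> float:
    """Cost in USD of generating a clip of the given length."""
    return price_per_second * duration_s


costs = {name: clip_cost(p, 30) for name, p in PRICE_PER_SECOND_USD.items()}

# Print from cheapest to most expensive.
for name, cost in sorted(costs.items(), key=lambda kv: kv[1]):
    print(f"{name}: ${cost:.2f} for a 30-second clip")
```

On these figures, a 30-second clip runs from about $3 on Kling 3.0 to $22.50 on Veo 3.1 Standard.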

Avataar’s Varya prices video generation at ₹0.48 per second, or approximately $0.0057 at current exchange rates. This makes it approximately 17 times cheaper than Kling, the cheapest of the global models, and more than 130 times cheaper than Veo 3.1 Standard.
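These multiples can be checked against the per-second prices quoted in this article. A quick sketch using only those figures (the exchange rate is not stated in the article, so it is derived here from the quoted rupee and dollar prices rather than assumed):

```python
# Per-second prices as quoted in the article (Varya's is self-reported).
VARYA_INR = 0.48
VARYA_USD = 0.0057
KLING_USD = 0.10
VEO_USD = 0.75

# How many times more expensive each rival is per second of video.
kling_ratio = KLING_USD / VARYA_USD
veo_ratio = VEO_USD / VARYA_USD

# Exchange rate implied by the article's two Varya prices (INR per USD).
implied_inr_per_usd = VARYA_INR / VARYA_USD

print(f"Kling 3.0 vs Varya: {kling_ratio:.1f}x")
print(f"Veo 3.1 Standard vs Varya: {veo_ratio:.1f}x")
print(f"Implied exchange rate: {implied_inr_per_usd:.1f} INR per USD")
```

The ratios come out at roughly 17.5x and 131.6x, matching the article's rounded figures.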

Notably, the startup has not yet published technical documentation detailing Varya’s architecture and methodology, meaning these claims remain self-reported for now. An assessment of whether Varya can truly compete with global models in terms of quality as well as cost will only be possible once the paper is published and third-party benchmarks follow.

For now, Varya wins on affordability.

“A huge number of viewers have ideas but don’t have the affordable tools to express them in video,” Aluru said.

12 years of development

Avataar’s path to building AI video models is not as linear as the announcement suggests.

Founded in 2014 by Sravanth Aluru, Prashanth Aluru, Gaurav Baid, and Mayank Tiwari, the startup focuses on spatial visual computing, specifically enabling e-commerce brands to replace flat 2D product images with life-sized 3D augmented reality experiences.

In January 2022, the startup raised $45 million in a Series B round led by Tiger Global with participation from Sequoia Capital India (now Peak XV Partners). This round was one of the largest funding rounds in the applied 3D/AR field at the time.

By 2023, the startup had around 180 specialists across Silicon Valley and Bangalore, and its corporate customers included brands like Sleep Number, Pepperfry and Bajaj Auto. The core competencies the company has built – understanding how objects, space, and visual context interact and doing so computationally at scale – have proven to be a meaningful foundation for the generative video problems they are currently solving.

With the launch, the startup is targeting three main user groups for Varya.

Businesses can fine-tune models based on their own data, and integration with tools like Adobe Firefly is planned to automate video creation across marketing and product catalogs.

For creators and small businesses, the price of video generation is around ₹0.5 per second, making AI video production affordable for platforms like Instagram Reels, YouTube Shorts, and WhatsApp Status.

Aluru is particularly bullish on education, claiming that Varya will enable more than 1.5 million schools in India to create engaging visual learning content, allowing teachers to create educational videos without expensive production resources.

Peak XV MD Rajan Anandan believes Varya’s importance extends beyond the launch of a single product. He sees this as part of a broader pattern that has defined India’s technology success story, where local innovations succeed not by matching Western incumbents in scale, but by radically cutting costs and adapting products for Indian users.

Drawing a parallel with earlier Indian technology successes, he argued that AI adoption in India will ultimately be driven not just by cutting-edge benchmarks, but also by affordability, cultural relevance, and population-scale accessibility.

“India has never built a leadership position in technology by emulating the West, simply because we haven’t been able to afford it. When it comes to AI, what we need going forward is a strategy of working differently: aiming for population scale, significantly lowering costs, making it culturally relevant and in India’s context,” Anandan said.

Edited by Nikhil Subramaniam
