Interrupting encoder training in diffusion models enables more efficient generative AI

Exploring a new approach to improving generative AI models. The developed model adds noise to real data through the encoder and reconstructs samples through the decoder. Two objective functions, the prior loss and drift matching, are used to reduce computational costs and prevent overfitting. Credit: Institute of Science Tokyo

Researchers at Science Tokyo have developed a new framework for generative diffusion models that significantly improves generative AI models. The method reinterprets Schrödinger bridge models as variational autoencoders with infinitely many latent variables, reducing computational costs and preventing overfitting. By interrupting encoder training at the appropriate point, the approach enables more efficient generative AI with wider applicability than standard diffusion models.

Diffusion models are among the most widely used approaches in generative AI for creating images and audio. These models generate new data by gradually adding noise (noising) to real samples and then learning to reverse that process (denoising) to recover realistic data. A widely used version, the score-based model, does this with a diffusion process that connects the prior distribution to the data over a sufficiently long time interval. However, when the data differ strongly from the prior, the noising and denoising processes need longer time intervals, which slows sample generation.
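To make the slowdown concrete, here is a minimal sketch that is not taken from the paper. It assumes the noising process is an Ornstein-Uhlenbeck SDE, dx = -x dt + sqrt(2) dW, whose stationary distribution is a standard normal prior. Under that assumption the mean of a sample that starts at x0 decays as x0 * exp(-t), so the time needed to get within a tolerance of the prior grows with the distance between data and prior. The function names and the tolerance are illustrative choices.

```python
import math
import random


def time_to_reach_prior(x0, tol=0.01):
    """Time at which the OU mean x0 * exp(-t) falls within tol of the prior mean 0."""
    if abs(x0) <= tol:
        return 0.0
    return math.log(abs(x0) / tol)


def noise_sample(x0, t_end, n_steps=1000, seed=0):
    """Euler-Maruyama simulation of dx = -x dt + sqrt(2) dW from x0 up to time t_end."""
    rng = random.Random(seed)
    dt = t_end / n_steps
    x = x0
    for _ in range(n_steps):
        x += -x * dt + math.sqrt(2.0 * dt) * rng.gauss(0.0, 1.0)
    return x


if __name__ == "__main__":
    # Data further from the prior need a longer noising interval.
    for x0 in (1.0, 10.0, 100.0):
        t = time_to_reach_prior(x0)
        print(f"data at {x0:>5}: needs t = {t:.2f}, sample at that time: {noise_sample(x0, t):.2f}")
```

In this toy setting the required time grows only logarithmically with the distance, but every extra unit of time adds simulation steps to both the noising and the denoising pass.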

Now, a research team at the Institute of Science Tokyo (Science Tokyo), Japan, has proposed a new framework for diffusion models. They achieved this by reinterpreting the Schrödinger bridge (SB) model, a type of diffusion model, as a variational autoencoder (VAE).

The study was led by graduate student Kentaro Kaba and Professor Masayuki Ohzeki of the Department of Physics at Science Tokyo. Their findings were published in Physical Review Research on September 3, 2025.

The SB model is more flexible than a standard score-based model because it can connect any two probability distributions over a finite time using stochastic differential equations (SDEs). This supports more complex noising processes and high-quality sample generation. The trade-off is that SB models are mathematically complex and expensive to train.

The proposed method addresses this by reformulating the SB model as a VAE with multiple latent variables. “The key insights are to extend the number of latent variables from one to infinity and to take advantage of the data processing inequality. This perspective allows us to interpret SB-type models within the framework of VAEs,” says Kaba.

In this setup, the encoder represents a forward process that maps real data into a noisy latent space, while the decoder reverses the process to reconstruct realistic samples. Both processes are modeled as SDEs learned by neural networks.
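In generic notation, the encoder and decoder can be written as a pair of SDEs with neural-network drifts. The symbols below (drifts f_phi and b_theta, noise scale g, horizon T) are assumptions chosen for illustration and are not taken from the paper:

```latex
\begin{align}
\text{encoder (forward, } t: 0 \to T\text{):}\quad
  & \mathrm{d}X_t = f_\phi(X_t, t)\,\mathrm{d}t + g(t)\,\mathrm{d}W_t,
  & X_0 &\sim p_{\text{data}}, \\
\text{decoder (reverse, } t: T \to 0\text{):}\quad
  & \mathrm{d}X_t = b_\theta(X_t, t)\,\mathrm{d}t + g(t)\,\mathrm{d}\bar{W}_t,
  & X_T &\sim p_{\text{prior}}.
\end{align}
```

By the standard time-reversal result for diffusions, the decoder reproduces the encoder's paths when b_theta(x, t) = f_phi(x, t) - g(t)^2 \nabla_x \log p_t(x), where p_t is the marginal density of the encoder process at time t.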

The model is trained with an objective that has two components. The first is the prior loss, which ensures that the encoder correctly maps the data distribution to the prior distribution. The second is drift matching, which trains the decoder to mimic the dynamics of the time-reversed encoder process. Furthermore, once the prior loss stabilizes, encoder training can be stopped early. This speeds up training and reduces the risk of overfitting while preserving the high accuracy of SB models.
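The early-stopping idea can be sketched as a training loop that freezes the encoder once the prior loss stops improving. This is a minimal illustration, not the authors' implementation: the plateau rule, the alternating update schedule, and the two step callables are all hypothetical stand-ins for real optimizer steps.

```python
def has_plateaued(losses, patience=3, min_delta=1e-3):
    """True if the last `patience` losses improve on the earlier best by less than min_delta."""
    if len(losses) <= patience:
        return False
    best_before = min(losses[:-patience])
    best_recent = min(losses[-patience:])
    return best_before - best_recent < min_delta


def train(prior_loss_step, drift_matching_step, n_epochs, patience=3, min_delta=1e-3):
    """Alternate encoder and decoder updates, freezing the encoder once the prior loss plateaus.

    prior_loss_step() is assumed to update the encoder and return the prior loss;
    drift_matching_step() is assumed to update the decoder and return the drift-matching loss.
    """
    prior_history, drift_history = [], []
    encoder_frozen_at = None
    for epoch in range(n_epochs):
        if encoder_frozen_at is None:
            prior_history.append(prior_loss_step())
            if has_plateaued(prior_history, patience, min_delta):
                encoder_frozen_at = epoch  # no further encoder updates after this epoch
        drift_history.append(drift_matching_step())
    return encoder_frozen_at, prior_history, drift_history


if __name__ == "__main__":
    # Toy losses: the prior loss quickly reaches a floor, the drift loss keeps falling.
    prior_values = iter([1.0, 0.5, 0.3, 0.25, 0.25, 0.25, 0.25, 0.25, 0.25, 0.25])
    drift_values = iter([2.0 / (k + 1) for k in range(10)])
    frozen_at, prior_hist, drift_hist = train(
        lambda: next(prior_values), lambda: next(drift_values), n_epochs=10
    )
    print(f"encoder frozen at epoch {frozen_at}: "
          f"{len(prior_hist)} encoder steps, {len(drift_hist)} decoder steps")
```

After the freeze, each epoch pays only for the decoder update, which is where the saving in training cost comes from in this sketch.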

“The objective function consists of the prior loss and drift matching parts, which characterize the training of the encoder and decoder neural networks, respectively. Together, they reduce the computational cost of training SB-type models. It has been demonstrated that interrupting the training of the encoder mitigates the challenge of overfitting.”

This approach is flexible and can be applied to other probabilistic rule sets, including non-Markov processes, making it a broadly applicable training scheme.

More information:
Kentaro Kaba et al., Schrödinger bridge-type diffusion models as an extension of variational autoencoders, Physical Review Research (2025). DOI: 10.1103/dxp7-4hby

Provided by Institute of Science Tokyo

Citation: Interrupting encoder training in diffusion models enables more efficient generative AI (2025, September 29), retrieved September 29, 2025 from https://techxplore.com/news/2025-09-Encoder-diffusion-enables-enby-efficient-generic.html

This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no part may be reproduced without written permission. The content is provided for information purposes only.
