Stability AI’s new model is slightly better at hand generation

Stability AI, a startup funding various generative AI experiments, has released a new version of Stable Diffusion, a text-to-image AI system that initially competed against OpenAI’s DALL-E 2.

Called Stable Diffusion XL (SDXL), the new system is available in beta through DreamStudio, Stability AI’s generative art tool, and improves on the original in key ways. Tom Mason, the CTO at Stability AI, said it brings “richness” to image generation that was lacking in the older model (Stable Diffusion 2.1), with the most notable improvements seen in applications such as graphic design and architecture.

“We are excited to announce the latest in our Stable Diffusion series of imaging solutions,” he said in a boilerplate statement. “[It’s] transforming several industries … the results are happening before your eyes.”

Hyperbole aside, SDXL certainly looks as good as, and perhaps even better than, the latest release of Midjourney, the model responsible for the “Balenciaga Pope” (among other memes).

Previous versions of Stable Diffusion, like many other text-to-image systems, struggled to reproduce certain anatomy such as hands, and SDXL does not have quite the same problem. Hands are still … well, unrealistic. But they’re well ahead of the nightmare fuel that SDXL’s predecessor often produced.

SDXL handles hands better, but it is clearly not perfect.

Stable Diffusion 2.1 is clearly worse at hands. (See for yourself.)

SDXL also seems to be better at generating text within images, a task that has historically thrown generative AI art models for a loop. But if my quick test is any indication, it still has a way to go.

The top row is the output from Stable Diffusion 2.1. The bottom row is the output from SDXL.

Stability AI also claims in a press release that SDXL features “enhanced image composition and face generation” and, unlike its predecessor, does not require long and detailed prompts to create “descriptive images.” In addition, SDXL supports not only text-to-image prompting but also image-to-image prompting (inputting one image and getting variations of that image), inpainting (reconstructing missing parts of an image), and outpainting (building a seamless extension of an existing image).

As a wild card, I tried to recreate the Balenciaga Pope meme with the shortest possible prompt: “Balenciaga Pope.” I have to say that the results varied more than I expected. One showed a woman posing in stylish apparel.

Like previous iterations of Stable Diffusion, SDXL will be open sourced once it leaves beta, Stability AI says. In addition to DreamStudio, SDXL is available now in early access through Stability’s API.

While generative AI art technology is moving forward, tools like SDXL are posing major challenges for the companies building and commercializing them. Stability AI is in the crosshairs of a lawsuit alleging that it violated the rights of millions of artists by developing its tools using copyrighted images collected from the web. Stock image supplier Getty Images has also sued Stability AI, reportedly for using images from its site without permission to create the original Stable Diffusion.

The open source release of Stable Diffusion has also been controversial because of its relatively light usage restrictions. Some communities on the web have taken advantage of it to generate celebrity deepfakes and graphic depictions of violence. To date, at least one member of the U.S. Congress has called for regulation to address the release of models with “poor content control,” such as Stable Diffusion.

In response to the lawsuits, Stability AI recently pledged to honor artists’ requests to remove their art from Stable Diffusion’s training data set, but the pledge does not apply to SDXL. It covers only the next generation of Stable Diffusion models, code-named “Stable Diffusion 3.0.” According to Spawning, the organization leading the opt-out effort, artists have so far removed more than 78 million works of art from the training data set.

Stability AI is under pressure to monetize a vast array of AI efforts, from art and animation to biomedicine and generative audio. Stability AI CEO Emad Mostaque has hinted at plans for an IPO, but Semafor recently reported that Stability AI, which raised more than $100 million in venture capital last October at a reported valuation of over $1 billion, has been slow to generate income.
