How is machine learning powering generative NFT art?

Generative art has long been one of the most prominent use cases for machine learning (ML), but the area has only recently gained widespread prominence. This development has been driven primarily by advances in computation and a new class of techniques that allow models to learn without large labeled datasets, which are scarce and very expensive to produce. In addition, the gap between the generative NFT art community and AI research has narrowed in recent years, yet many of the newest generative art techniques have still not been widely adopted by big-name artists.

The driving force behind generative art

Even many of the early AI innovators, who saw generative AI as a relatively obscure subset of machine learning, were surprised by its rapid rise. The remarkable progress in generative AI can be attributed to three main factors:

Multimodal AI: The last five years have seen an explosion of AI techniques that work across multiple domains such as language, image, video, and voice. This enabled the development of models that create videos and images from natural language.

Pre-trained language models: Recent developments in multimodal AI have been fueled by significant advances in language models such as GPT-3. This has allowed language to be used as the primary mechanism for producing artistic results such as images, sounds, and videos. In this new phase of generative AI, language plays a key role in lowering the barriers individuals face when interacting with generative AI models.

Diffusion models: Currently, the majority of photorealistic art created by AI relies on a technique known as diffusion models. Before the introduction of diffusion models, the generative AI space was dominated by techniques such as generative adversarial networks (GANs) and variational autoencoders (VAEs), but these techniques were difficult to scale and produced output that lacked diversity. Diffusion models get around these limitations by gradually adding noise to an input image until nothing but noise remains, and then learning to reverse that process and reconstruct the image. In theory, if a model can reconstruct an image from pure noise, it should be able to generate one from virtually any starting point, including inputs from other domains such as language.
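The noising and reconstruction idea described above can be illustrated with a small sketch. The article does not specify a formulation, so this assumes the common DDPM-style setup, in which a noisy image is a weighted blend of the clean image and Gaussian noise. The function names, the linear schedule, its start and end values, and the four-value toy "image" are all hypothetical choices made for illustration, and no neural network is trained here.

```python
import math
import random

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances rising linearly (an assumed, illustrative schedule)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def cumulative_signal(betas):
    """alpha_bar[t]: share of the original signal variance left after step t."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def add_noise(pixels, alpha_bar, rng):
    """Forward process: blend the clean pixels with Gaussian noise."""
    noise = [rng.gauss(0.0, 1.0) for _ in pixels]
    signal_scale = math.sqrt(alpha_bar)
    noise_scale = math.sqrt(1.0 - alpha_bar)
    noisy = [signal_scale * p + noise_scale * n for p, n in zip(pixels, noise)]
    return noisy, noise

def remove_noise(noisy, noise, alpha_bar):
    """Invert the forward process when the noise is known exactly.

    A trained diffusion model does not know the noise; it has to predict it.
    """
    signal_scale = math.sqrt(alpha_bar)
    noise_scale = math.sqrt(1.0 - alpha_bar)
    return [(x - noise_scale * n) / signal_scale for x, n in zip(noisy, noise)]

rng = random.Random(0)
image = [0.9, -0.4, 0.1, 0.7]  # toy "image": four pixel values in [-1, 1]
alpha_bars = cumulative_signal(linear_beta_schedule(1000))

# After the final step almost none of the original signal is left.
noisy, noise = add_noise(image, alpha_bars[-1], rng)

# With the exact noise in hand, the clean image can be recovered.
restored = remove_noise(noisy, noise, alpha_bars[-1])
```

In this sketch the recovery works only because the exact noise is passed back in. In an actual diffusion model, a neural network is trained to estimate that noise from the noisy image, and a text prompt can be supplied to the network as extra input, which is how language ends up steering the generated image.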
