AI image generators struggle to write and count

Generative AI tools like Midjourney, Stable Diffusion, and DALL-E 2 have amazed us with their ability to generate stunning images in seconds.

author

Seyedali Mirjalili

Professor, Director, Artificial Intelligence Research Optimization Center, Torrens University, Australia

But despite their achievements, a puzzling gap remains between what AI image generators can produce and what humans can. For example, these tools often give unsatisfactory results for seemingly simple tasks such as counting objects or producing accurate text.

If generative AI has reached such unprecedented heights in creative expression, why does it struggle with tasks that elementary school students can complete?

Exploring the root causes can help reveal the complex numerical nature of AI and the nuances of its functioning.

AI's limitations with writing

Humans can easily recognize text symbols (letters, numbers, and other characters) written in different fonts and handwriting. We can also produce text in different contexts and understand how context changes meaning.

Current AI image generators lack this essential understanding. They have no real grasp of what text symbols mean. These generators are built on artificial neural networks trained on large amounts of image data, from which they “learn” associations and make predictions.

Combinations of shapes in training images are associated with different entities. For example, two inward-facing lines that meet might represent the tip of a pencil or the roof of a house.

However, when it comes to text and quantity, even small imperfections are noticeable, so the associations have to be incredibly precise. Our brains can overlook a slight deviation in the tip of a pencil or the line of a roof, but not in how a word is written or how many fingers a hand has.

As far as the text-to-image model is concerned, a text symbol is simply a combination of lines and shapes. With so many different styles of text and letters and numbers used in seemingly infinite arrangements, models often don’t learn how to effectively reproduce text.

The main reason for this is insufficient training data. AI image generators need much more training data to represent text and quantities accurately than they do for other tasks.

The tragedy of AI hands

Problems also arise when working with small objects that require intricate details, such as hands.

In the training images, hands are often small, holding objects, or partially obscured by other elements. It becomes difficult for AI to associate the term “hand” with an accurate representation of a five-fingered human hand.

As a result, AI-generated hands often look misshapen, have more or fewer fingers than they should, or are partially covered by sleeves, purses, or other objects.

A similar problem arises with quantities. AI models lack a clear understanding of quantities such as the abstract concept of “4”.

Therefore, an image generator might respond to the prompt “4 apples” by drawing on what it learned from countless images featuring many different quantities of apples, and return an image with the wrong number.

In other words, the diversity of associations in the training data affects the accuracy of the output quantities.

Will AI ever be able to write and count?

It is important to remember that text-to-image and text-to-video conversion are relatively new concepts in AI. Current generative platforms are “low-res” versions of what we can expect in the future.

Advances in training processes and AI technology may make future AI image generators even more capable of generating accurate visualizations.

Also note that most publicly accessible AI platforms do not offer the highest level of functionality. Generating accurate text and quantities requires a highly optimized and tuned network, so a paid subscription to a more advanced platform may yield better results.

Seyedali Mirjalili does not work for, consult with, own shares in, or receive funding from any company or organization that might benefit from this article, and has disclosed no relevant affiliations beyond his academic appointment.

Courtesy of The Conversation. This material from the originating organization/author(s) may be of a point-in-time nature and has been edited for clarity, style and length. Mirage.News does not take institutional positions or sides, and all views, positions and conclusions expressed herein are solely those of the author(s).
