AI voice clones appear in audiobooks from Amazon, Apple and Google

Audiobooks (originally known as “talking books”) may seem like a fairly recent phenomenon, but their history predates Apple and Amazon. The concept of talking books dates back to the 1930s, when they were produced for blind people. It wasn’t until the 1970s that books on tape began to make commutes more bearable. But the medium didn’t really take off until it was absorbed into our mobile phones.

Audiobooks have grown steadily since the start of the iPhone era. The industry has seen double-digit growth over the past decade, and the trend is expected to accelerate further. Publishing industry research firm Wordsrated estimates that the audiobook sector now generates more than $5 billion in sales, nearly $2 billion of it in the United States, the world’s largest audiobook market, and that revenue is expected to grow 26.4% annually from 2022. According to Wordsrated, audiobooks are “the world’s fastest growing book format,” outpacing other formats by a significant margin.

Audiobooks are also becoming another market that AI is trying to break into, with AI-generated voices taking the microphone away from voice actors. Are consumers ready for AI to whisper in their ears? In fact, it’s already happening.

Alphabet’s Google Play and Apple Books have made some use of AI-generated voices, and we expect this trend to continue. Google Play offers publishers the ability to create auto-narrated audiobooks, as long as the publisher owns the audiobook rights and opts in to auto-narration. Nothing is auto-narrated without the publisher’s consent, and consumers cannot legally create these audiobooks themselves.

“For many publishers, producing an audiobook can be a significant investment,” said Judy Chang, director of product management for Google Play Books. Paying voice actors is part of the cost formula. “Publishers can assess audiobook demand for their titles before investing in human narration,” she says.

How people listen to books

People love audiobooks. They are the most widely consumed form of audio after music. But the use of AI voices in audiobooks is what might be described as a particularly intimate application of the new technology. It’s not like asking Alexa for the weather or to play a song. And this could mark the limit of how far consumers (and businesses) are willing to go in replacing human narrators with computer-generated voices, at least for now.

“People are very sensitive to sound,” says David Ciccarelli, CEO of Voices, the largest voice-over marketplace. The eye perceives motion at 24 frames per second, he explained, but the ear can discern sound with a fidelity of 20,000 times per second. “Most people listen to audiobooks with earphones, so it feels more intimate,” he added.

Narration quality is also a major issue, because a listener’s enjoyment depends heavily on a sense of connection with the voice. “Nearly 60% of listeners have ditched an audiobook because they didn’t like the narrator. People love to listen to other people, especially when stories are being told,” Ciccarelli said.

Making AI voices not only sound human but also connect with listeners is anything but easy. Voice work is, after all, acting, and that art is difficult to reproduce. “The best thing humans can do that AI can’t do is timing,” Ciccarelli said. “Whether it’s the pause in an awkward moment or the split-second timing of comedy, it’s hard for AI voices to get this right just yet.”

Speed can also be an issue for AI, as the pace of narration changes depending on what is being read. We naturally read some parts of a plot or an argument at a different rate than others because we understand what we are reading. AI does not. “Professional narrators know when to speed up and when to return to their normal reading pace,” Ciccarelli said. They also know how to pronounce words and have no problem with homographs, words spelled the same but pronounced differently, such as “lead” the metal and “lead” the verb.

AI voices will get better, and listeners’ resistance to them will diminish accordingly. With groundbreaking new technology, the question is not “if” but “when.” Ciccarelli knows this.

“The industry knew that change was coming and that AI was going to be here and getting better,” he said. “It went from funny to so-so, and now it’s getting better and better,” he added. He sees voice cloning for professional voice artists as foreseeable, and stressed the importance of proceeding ethically and protecting voice actors’ rights to “credit, consent and compensation” for their work.

Even AI voices still involve a voice actor somewhere in the process. According to Brett Kinsella, founder and CEO of Voicebot.ai, text-to-speech systems are gaining popularity in media because they can convey more emotional content, with higher fidelity, through synthesized speech. But even in these cases, a voice actor is needed, whose voice is then transformed into another voice.

What voice actors are saying

Some voice actors have chosen to stay away from AI work altogether. “I refuse any voice work where they take my voice and make an AI model out of it,” said Brad Ziffer, a voice actor with 14 years of experience. “The best way to protect yourself is to just keep your distance.”

Over the past 20 years, narrators have gone from reading printed copies of books and editing out page-turning sounds to reading aloud from tablets, and from recording only in studios to recording multiple titles at home. Audio editors have gone from splicing tape with a razor blade to editing files digitally, rolling back and recording over mistakes. Publishers have moved content from cassettes to CDs to digital distribution. “Each transition has been accompanied by fear and anxiety, but with each transition we have learned, grown, adapted and thrived,” said Michelle Cobb, executive director of the Audio Publishers Association.

Cobb said the growth of the audio industry has expanded the range of opportunities, and new technologies are part of that. As audiences and their appetite for audio content grow, she said, publishers can now promote original and audio-first productions, expand their creative approaches and encourage more consumers to try audio. “AI technology can assist workflows. AI is not a new tool for voice talent, producers and publishers, many of whom are using AI to improve post-production quality control,” she said.

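As of last week, that approach to audio production now includes The Beatles, after Paul McCartney said AI had been used to extract John Lennon’s voice from an old demo for a final Beatles record.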

This evolution inevitably brings risks from AI as well. “The fear of machines taking over someone’s livelihood, regardless of their profession, is real,” Cobb said. “But I know I’m not alone in appreciating the deep, rich, emotionally intelligent performances of my favorite narrators, traditional oral storytellers who effectively convey a human story,” she added.

Where ChatGPT meets Alexa and Siri

The biggest changes now underway center on text and images rather than voice, with generative AI chatbots led by OpenAI’s ChatGPT writing more and more text, including novels, and generative AI image models generating pictures. Kinsella noted that AI voices played a foundational role early on in integrating AI into everyday life. “Voice was actually the previous wave of AI…Siri, Alexa and Google Assistant all use synthetic voices,” he said. The inputs and outputs of these devices are voice-to-voice, and today’s text-based AI may eventually follow a similar development pattern.

“ChatGPT brings back the text-first approach. While some use cases will remain text, others will naturally transition to speech input first, and then to audio (synthetic speech) output over time,” Kinsella said. “ChatGPT’s mobile app currently allows voice input but has no text-to-speech function for audible responses, which will certainly come for some use cases.”

In publishing, audiobooks are on the rise but still represent a relatively small slice of the overall publishing pie, so the extra time and cost they require will continue to weigh on decision-making.

“Some publishers don’t want to pay the extra cost, and some authors are reluctant to pay for it themselves,” Kinsella said. “Even if authors recorded it in their own voice, there would still be studio and editing costs, and it could take days to complete.”

AI can make overcoming these barriers a little easier.

In an effort to bring more audiobooks to readers, Apple has developed a program to reduce or eliminate friction in audiobook production. Authors can create audiobooks with no direct upfront cost or time commitment. The companies that provide this service to Apple’s authors receive a fee for each audiobook sold.

Amazon, which owns Audible, one of the big players in the space, also offers a similar audiobook production service, but it uses professional voice actors rather than synthesized voices. “It would be logical to add voice clones and Polly synthesized voices to this kind of service, but I’m not aware of any activity on this front,” Kinsella said.

Apple declined to comment. Amazon did not respond to requests for information about the audiobooks it offers.

What kinds of text AI is most likely to read

Ziffer is understandably concerned about the role AI will play in his profession. “I am very cautious about the world of AI. I believe AI has great potential…but it is easy to exploit. Believe me, no synthetic speech algorithm has yet come along that can perfectly reproduce all the nuances of the human voice,” he said.

AI voices still need to master natural vocal inflection, comprehension and interpretation of the text, the ability to evoke emotion, and shifts in emotion as the material dictates. As companies start experimenting with AI, Ziffer said he wouldn’t be surprised to see some impact on his income. But, he said, “I still haven’t had a client tell me they chose an AI voice over me.”

Ziffer expects AI to be used most widely by companies with smaller budgets and those focused on e-learning content. “But for those who want the best, the job is best left to humans,” he says. “Live actors who have real feelings, brains and emotions and can bring the work to life are the best candidates for dynamic and authentic VO. Nothing beats the real thing.”

Voice actor Andrea Collins, who has 15 years of experience, also believes AI will bring tradeoffs that make sense for some companies. “I think this will be a great tool for clients who want their projects completed very quickly and at a reasonable price,” she said. Presentations and compliance materials are examples of texts for which companies may forgo real voices in favor of speed. Speed is also an unavoidable factor in audiobook production generally.

“When it comes to audiobooks, AI voices can get through 30,000 words much faster than humans, so I’m pretty sure they’re going to take up a lot of space,” Collins said.

She hasn’t yet seen AI have a significant impact on her finances, but “I think that day will come. So I’m trying to stay ahead of the curve instead of worrying about it,” she added.

Collins has taken steps this year to clone her voice. “Most prominent artists I know are doing the same. My hope is that my cloned voice will become another tool in my business, letting me work passively on some projects while I lend my own voice to bigger-budget projects that need a human voice,” she said.

Veteran voice actor John Kubin says his peers need to navigate AI’s new reality wisely. “For several years, ever since this technology was new, I have been saying that half of all VO acting work would be lost because of it. I still believe this to be true. But it may be years before that happens.”

He is focusing on what he expects to become a new market segment for long-form projects, where AI and human-cloned voices can meet in the middle. “Many of these big projects use scripts of 100,000-plus words that I would never touch with a 10-foot pole. I will hand them off and collect the money without doing the work,” Kubin said.

He knows that many of his colleagues may still object to getting into bed with the machines. “I might be one of the few creators/voice actors who thinks this is the best thing since sliced bread,” Kubin said. But from a business perspective, he said, it will be difficult to keep up with AI as it scales. “I’ve been joking for a while, saying, ‘If I could make money doing voiceovers without actually doing voiceovers, that would be great!’ Well, here we are.”
