Date: 2023-05-07

Kiwi voice artist Toby Ricketts believes his voice has been used without his permission to create an AI voice.

From an AI-generated deepfake Drake song to voice actors discovering their cloned tones on the internet, text-to-speech tech is creating global controversy. Nikki Macdonald asks if New Zealand is immune.

Toby Ricketts had been expecting it. He’d already had friends overseas discover they’d been cloned.

But that didn’t lessen the uncomfortable feeling when he hit play on AI voice “Mitchell” and heard his voice-double talking back.

It’s not exactly him. It’s so awkwardly robotic that calling it him would be defamatory, he laughs. It sounds like a Franken-merge of his voice and others, with maybe some Aussies added in.

It’s no substitute for the mellifluous tones of the 2018 and 2019 Voice of the Year, who makes his living voicing everything from ads to documentaries to World Health Organisation Covid videos, from his home studio in Northland.

But this is only the beginning. When Ricketts later puts a seven-minute clip into another cloning programme, then gets his AI twin to read a promo reel, the likeness is remarkable. “Scary,” he says.

AI is coming for the voiceover stars.

How does it work?

Ricketts is one of maybe 20-30 Kiwis whose main income comes from the most personal of instruments – their voicebox. Mostly, they’re anonymous – the breathy narration of wildlife documentaries, the breathless promotion of car ads, that damn supermarket checkout woman who tells you to remove your items from the bagging area.

But in the past month, the voices of the world have been stepping out of the shadows to protest the harvesting – and copying – of their instruments without their permission.

Sophisticated AI technology can now clone speech using just a short audio clip. Then text-to-speech programmes can make that cloned voice read anything you tell it to.

And as more AI voiceover sites pop up, voice actors are hearing themselves in the AI voices offered.

While artists and musicians are similarly worried their work is being unfairly used to train AI image and music generators, this is more than appropriating someone’s skill. It feels like stealing their very identity.

“It’s a bit like if you see someone on the street and you think, ‘Oh, that’s my friend’ and then you realise, ‘Oh it’s not quite my friend – something about their face is different’,” Ricketts says. “So it’s the same kind of uncanny resemblance, that’s a bit uncomfortable.”

“Mitchell” and “Molly” are two New Zealand accent voices offered on several text-to-speech websites, including play.ht. One site also offers a third Kiwi speaker, “Aria”.

The play.ht Mitchell demo sounds enough like Ricketts that when the Sunday Star-Times sent the sample to BigMouth Voices managing director Sarah McLeod, she immediately recognised him in it.

So if Mitchell is Ricketts, or a synthesis of him and others, where did they get his voice, is it legal, and what does that mean for the future of his craft?

Pretty legal?

Ricketts talks a lot on the internet. His marketing profile includes demo reels for potential clients. He’s on YouTube videos and podcasts. It wouldn’t be hard to find clips of his voice to train an AI voice generator.

Asked how and where it got its Mitchell sample, play.ht did not respond.

Ricketts reckons an “Australian” voice on text-to-voice site Revoicer.com – “sociable and adventure sounding” Noah – also contains elements of him.

Revoicer last month removed one of its sample voices after Irish voice artist Remie Michelle Clarke claimed it was a dead ringer for her smooth sound.

Revoicer says it sources its voices from Microsoft’s Azure “speech studio”. “Noah” is derived from their “William” voice. Azure also features New Zealand accent voices named “Mitchell” and “Molly”.

A Microsoft spokesman says its Mitchell voice is not derived from Ricketts’ work, and the company only uses samples with permission.

“As part of our commitment to using AI responsibly, we inform any voice talent that we work with about the use of their audio recordings to create synthetic voices, and we obtain their legal consent for this use.

“We do not have a relationship with Mr Ricketts, we did not use any voice audio from him, and any similarity is coincidental.”

Already, you can see how hard it would be to track – and remove – the source of a disputed voice clip.

But is it even illegal to copy the way someone speaks?

AJ Park intellectual property lawyer Paul Johns says that while copying someone’s voice seems wrong, there’s no straightforward legal solution.

“It feels like you’re taking something that belongs to someone and using it without compensating them...That may seem morally right – but I don’t think there’s much in the way of intellectual property law that really helps.”

The problem is that copyright law protects the recording and its content, rather than the tool used to make it – the music created, rather than the instrument and way of playing it.

So the only real legal avenue is to trace the snippets used to train the AI voice, and then sue for unauthorised use of those copyright-protected recordings, Johns says. But some old contracts allowed diverse uses and on-selling.

If an AI copy of a celebrity voice was used to promote something – say Morgan Freeman’s voice endorsing a retirement village – that might breach the Fair Trading Act, which outlaws misleading and deceptive conduct.

But that wouldn’t work for an anonymous voice artist, and there’s unlikely to be much motivation for politicians to extend the protection to include them, Johns says.

YouTube pulled an AI-generated track emulating Drake and The Weeknd after allegations of breach of copyright. (Images: Getty/Prince Williams/Wireimage and Getty/Theo Wargo/Live Nation)

There have been successful legal challenges to sound-alike AI. In 2021, Canadian voice actor Bev Standing sued TikTok after discovering she was the voice of the company’s text-to-speech function, which creates video voiceovers. TikTok settled the case.

Standing believed the company got her voice samples from a job she did reading thousands of English sentences for a Chinese government research organisation, which was supposed to be used for translations.

And just last month, an AI-generated fake collaboration track between Drake and The Weeknd was removed from Spotify and YouTube, after Universal Music Group argued it breached copyright.

An easier option than suing retrospectively is making sure contracts rule out the use of any recordings to train AI. That’s something voice agent Sarah McLeod is focusing on.

“AI is out there, it’s not going anywhere. There are a lot of scared voiceover artists out there at the moment. But it’s because there’s no regulation in place, yet.”

But once a voice clip is out there, Ricketts wonders, can you really stop people stealing it?

“Once a video is on YouTube, it's in the public domain. How are they going to trace it back? Especially if they're mixing it with other voices, like they already have been, on the internet.

“So it's a very grey area of the law. Because, how do you start to pick apart a voice and say, what elements are yours and what aren't?”

McLeod reckons someone will develop a DNA-matching tool for synthesised AI voice clips that can identify the component voices.

Ricketts was named the Male Voice Artist of the Year in Europe's 2019 One Voice Awards. (Photo: Supplied)

So is this the end of voice actors?

“This guy can’t read at all,” Ricketts laughs, listening to play.ht’s “Mitchell” speak.

For now, he’s not too threatened by the AI voices that sound a bit like him. Mostly, because they’re still a bit rubbish.

It’s also harder to edit an AI voiceover to change the emotion or intonation than it is to direct a voice actor live in a recording suite.

Also, Ricketts does four distinct accents and a host of different voices.

“When people hire a voiceover like me, they’re paying for nuance and interpretation. They’re not paying for me to read a script.

“If you put the script into the AI voiceover twice, you’ll get exactly the same result. And from my experience of being in voiceover sessions, that’s not what people want. People want options, and they want it to sound human.”

Christchurch voice artist Vanessa Wells expects her “everybody’s voice” to be replaced by a synthesised AI voice. (Photo: Supplied)

Vanessa Wells is everywhere. At the Countdown self-checkout, at Auckland Airport, on Wellington's rail network.

She’s the “everybody’s voice” she believes is most likely to be supplanted by AI. While she would be “gutted” to stumble across a site using her voice without her permission, she expects composite AI voices to become ubiquitous.

“When you synthesise it into a generic, non-person’s voice, that’s when it’s going to be really useful...The kind of voice work I have done over the past 10 years will reduce dramatically with the advent of decent synthesised technology.”

But the Christchurch voice actor isn’t crying about it. She’s branched out into filmmaking so doesn’t rely on voice work for a living. And there are some jobs no-one will be sad to lose, like one of her first voice jobs some 14 years ago, reading out numbers for an automated phone system, so it could read back credit card numbers in natural-sounding strings of three.

“If I never have to count from one to 1000 again in uptone, downtone and mid-tone, I would be delighted.”

Wells reckons there will always be a place for voice actors to inject personality and performance into scripts. She can’t see Pak’nSave subbing in bland AI to replace comedian Paul Ego in its stickman ads, for example. (Pak’nSave says that’s “not something we’re looking at”.)

“I do think that it is the future, and I don’t think we need to be scared of it...It might actually just take away the worst parts of our job, and we just need to get on with finding our niche.”

McLeod also thinks there will always be a place for real, pro voice actors, who have already survived race-to-the-bottom freelance websites such as Fiverr, with voiceovers offered for as little as $5.

“We still have actors acting in films, they’re not all CGI and animations, and I think it will be exactly the same for voiceovers.”

Canadian singer Grimes invited musicians to clone her voice to create new songs and says she will split any royalties 50/50. (Photo: Fiona Goodall/Getty Images)

What if you could clone your voice, and earn without working?

When she was offered an AI job 18 months ago, Wells thought deeply about whether to do it.

It was for a text-to-read company that would turn her samples into an AI voice that could read anything. She made sure the contract was watertight – that it limited both what kind of material could be read (eg no porn) and how her voice samples could be used.

It was well-paid, and for a good cause – helping people with disabilities. And it couldn't be done any other way. So she hit record.

“That’s where the technology is amazing, and it’s making access for people who don’t normally have access to text. Incredible. So there’s massive advantages to it.”

The next step is to clone your own voice, and get it working for you. Canadian singer Grimes invited musicians to clone her voice to create new songs, for which she will claim half the royalties.

Having created a “pretty impressive” AI version of his voice in less than five minutes, Ricketts is both excited and scared by the potential.

He can imagine a future Spotify for voices, where you license your AI clone for a monthly fee. Rather than reading scripts himself, he would manage and market his cloned voice.

McLeod, too, can imagine an agent’s platform offering the top 20 Kiwi and Aussie male and female AI voices, on a pay-per-use model.

“We’ve got to make sure that voiceover artists keep getting paid.”

Australian voice talent manager and consultant Andrew Curnock says AI voiceovers are currently “a little bit of a wild west”, with companies trying to smuggle in “super dodgy and unconscionable” contract clauses to turn actors’ voiceovers into synthetic AI voices that can then be used in perpetuity.

“In terms of voice harvesting, that’s a legitimate concern. Nobody wants to hear their own voice out in the wild, especially if they’re not getting paid for it.”

But Curnock is working with independent startup companies to find ways to both produce AI voices and give voice actors a fair deal. That might mean clients licensing a synthesised voice for 12 months.

He thinks AI voices will still struggle to compete with the range of emotions that live voice actors can switch between. And he wouldn’t recommend a voice actor famous for their narration voice giving that voice to an AI model. But it could be a chance to experiment with different characters or a whole new you.

“Potentially, there’s royalties and whole new passive income streams...The ones that stand back and wish the technology didn’t exist aren’t really helping, because it’s either going to be the indies negotiating a fair framework, or it’s going to be the big tech giants that set the terms and sort of bulldoze their way through.”