Celebrity chef Anthony Bourdain, famous for his TV shows No Reservations and The Layover, died by suicide in 2018.
I remember being shocked to hear that the smiling, witty man who had taken me on several culinary journeys through his television appearances had ended his life so tragically.
Morgan Neville, an Oscar-winning documentary filmmaker, released Roadrunner: A Film About Anthony Bourdain in July 2021. This documentary features three quotes spoken by an AI voice model of Anthony Bourdain, a fact that was not revealed to viewers.
As a result, social media has erupted in a furor over the ethics of cloning a dead man’s voice without his consent and without disclosing the artifice to viewers.
Ethics of voice cloning
Neville commissioned Descript, a voice cloning company, to create an AI model of Anthony Bourdain’s voice.
Neville handed over hours of recordings of Bourdain’s voice pulled from TV shows, radio shows, podcasts, and audiobooks as “training data” for the voice cloning software. The result was artificial audio so close to the real thing that it went unnoticed when Roadrunner was released.
Only when Neville talked about it in interviews did people learn that a fake voice had been used. Naturally, this raises ethical questions:
- What if a dead man’s voice is cloned to speak inflammatory and/or vulgar sentences?
- Do people retain rights over the use of their voice and words after they’re dead?
When Neville was questioned about the ethics of using Bourdain’s voice without his consent, he made an off-hand comment about constituting an ethics committee later. This suggests that Neville isn’t too concerned about the ethical implications of voice cloning.
It looks to me like Neville views it as a technological advancement that has allowed him to reconstruct a dead man’s life through his own words.
To be fair, the cloned clips total only 45 seconds, and they’re words that Bourdain had written in an email. So Neville isn’t putting words in Bourdain’s mouth or attributing false statements to him.
However, it seems strange to me that Neville did not disclose the use of AI in the documentary. Did he not realize it would make him seem dishonest?
The relentless march of AI technology has roused fears of job loss. It costs less to employ an AI to perform a task than a human. It also takes less time. The benefits of using AI, especially for repetitive tasks, are clear to companies.
Deepfakes, face rendering, and voice cloning show us a glimpse of the future that belongs to AI.
Here’s a post by Descript that shares its thoughts about the Anthony Bourdain controversy.
How voice cloning works
Synthetic speech is not a new concept.
If you’ve used voice assistants like Siri or Alexa, or encountered an IVR (interactive voice response) system when calling customer care, you have experienced synthetic speech. However, these artificial voices sound distinctly robotic and it’s easy to understand that a machine is speaking and not a human.
Voice cloning uses a technology called text-to-speech (TTS) that converts text into synthetic audio. This enables humans and computers to interact through voice.
Two approaches to TTS exist:
a) Concatenative approach – wherein a collection of audio recordings is used to create a pool of words and sounds, from which sentences can be generated.
b) Parametric approach – wherein statistical models of speech are used to simplify the process of generating synthetic speech.
The parametric TTS approach costs less and requires less effort than the concatenative approach. But neither approach results in a natural human voice.
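To make the difference between the two approaches concrete, here is a minimal Python sketch. It is a toy under stated assumptions, not how any real TTS product works: sine tones stand in for recorded speech, and the names `tone`, `UNIT_POOL`, `WORD_PARAMS`, `concatenative_tts`, and `parametric_tts` are all hypothetical. The concatenative function stitches stored "recordings" together, while the parametric function stores only a few numbers per word and generates the audio from them.

```python
import math

SAMPLE_RATE = 8000  # samples per second (illustrative value)


def tone(freq_hz, duration_ms):
    """Return a sine tone as a list of samples; a stand-in for real speech audio."""
    n = SAMPLE_RATE * duration_ms // 1000
    return [math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE) for i in range(n)]


# Concatenative: a pool of pre-recorded units (faked here with tones).
UNIT_POOL = {
    "hello": tone(220, 300),
    "world": tone(330, 250),
}


def concatenative_tts(text, pool=UNIT_POOL, gap_ms=50):
    """Stitch stored recordings together, with a short silence between words."""
    silence = [0.0] * (SAMPLE_RATE * gap_ms // 1000)
    audio = []
    for word in text.lower().split():
        if word not in pool:
            # A concatenative system can only say what it has recordings for.
            raise KeyError(f"no recording for {word!r}")
        if audio:
            audio.extend(silence)
        audio.extend(pool[word])
    return audio


# Parametric: no recordings, only a few parameters per word fed to a generator.
WORD_PARAMS = {
    "hello": {"pitch_hz": 220, "duration_ms": 300},
    "world": {"pitch_hz": 330, "duration_ms": 250},
}


def parametric_tts(text, params=WORD_PARAMS):
    """Generate audio from stored parameters instead of stored recordings."""
    audio = []
    for word in text.lower().split():
        p = params[word]
        audio.extend(tone(p["pitch_hz"], p["duration_ms"]))
    return audio
```

Calling `concatenative_tts("hello world")` returns the two stored clips joined by a short silence. The parametric version needs far less storage, which hints at why it is cheaper, but in both toys the output is only as natural as the units or parameters behind it.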
AI and deep learning have advanced voice cloning technology such that a close imitation of a human voice can be generated.
AI-based voice cloning software uses neural networks to generate more human-like speech.
Advanced models require just a few seconds of speech samples to create a natural-sounding human voice. It is also possible to change the gender and accent of the speech!
These AI-based tools are better at capturing the emotion, inflection, pronunciation, and intonation of human speech.
Applications of voice cloning
Voice cloning technology was created to help, not deceive.
a) Assistive technology
Voice cloning can help differently abled people to communicate, especially those who have lost their voice.
b) Dubbing and voiceover
Voiceover artists and actors can use AI-enabled voice cloning software to dub dialogues in different languages faster and more cost-effectively.
c) Audiobooks
AI voice cloning software can recreate the voices of famous people and authors and use them to narrate their books or letters.
d) Documentaries and historical narration
Historical figures could narrate their life stories, much like Bourdain’s AI voice does in Roadrunner. Setting aside the ethical issues, it makes the experience far more engrossing for the viewer or listener.
Check out John F. Kennedy’s speech (in his AI voice) that he would have delivered in Dallas in 1963 had he not been assassinated.
e) Educational aids
The teaching possibilities of cloning the voices of historical figures and using them to narrate important world events and speeches are endless.
Wouldn’t you like to hear Anne Frank narrate her experiences in the Secret Annexe in her own voice?
The dark side of voice cloning
Unfortunately, humans have found ways to use voice cloning technology to deceive people and spread misinformation.
a) Voice phishing (vishing)
Vishing is an evolution of email phishing attacks: voice cloning lets scammers use fake voices to con people into thinking they’re speaking to someone they trust. Phone calls and voicemail are the new weapons.
b) Voice spoofing
AI-created synthetic speech could be used to “make” people say things they have never said in real life. Such voice scams can have disastrous effects for the victim, eroding their credibility and reputation.
Voice spoofing can be used to impersonate government officials, bank executives, and even trusted family members.
It poses serious security issues for biometric systems that have so far considered voice to be a reliable measure of identity. Biometric systems can be fooled into thinking that an authentic user is speaking, thus granting access to sensitive information.
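A rough sketch shows why a good clone is a problem for voice biometrics. The Python below assumes a simplified system that reduces each voice sample to a list of numbers (a "voiceprint") and accepts a speaker when the new sample is similar enough to the enrolled one. The function names, the threshold of 0.9, and all the vectors are made up for illustration; real systems derive voiceprints from audio with trained models.

```python
import math


def cosine_similarity(a, b):
    """Similarity of two voiceprints: 1.0 means they point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def verify_speaker(enrolled, candidate, threshold=0.9):
    """Accept the candidate if it is close enough to the enrolled voiceprint."""
    return cosine_similarity(enrolled, candidate) >= threshold


# Hypothetical voiceprints; the values are invented for this example.
enrolled = [0.90, 0.10, 0.40]   # stored when the real user signed up
genuine = [0.88, 0.12, 0.41]    # the real user on a later call
stranger = [0.10, 0.90, 0.20]   # a different person
clone = [0.86, 0.15, 0.43]      # a synthetic voice trained on the real user

print(verify_speaker(enrolled, genuine))   # real user is accepted
print(verify_speaker(enrolled, stranger))  # stranger is rejected
print(verify_speaker(enrolled, clone))     # clone is accepted too
```

The check only measures how similar a voice sounds, not whether a live person produced it, so a clone that lands close to the enrolled voiceprint is accepted just like the genuine user.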
Do you remember the video of Queen Elizabeth doing a TikTok dance and delivering an alternative Christmas message, created by Channel 4 to illustrate the dangers of deepfake videos?
That’s the danger of misusing AI voices. Criminals can make people say whatever they want to foment unrest and violence and sway public opinion. Imagine the consequences of fake politically motivated speech or hate speech.
Where there’s poison, there’s got to be an antidote.
Thankfully, anti-spoofing technology called “voice liveness detection” exists. It is also AI-enabled software, and it can distinguish between a live human voice and a fake one.
However, the core question of whether the use of AI voice cloning is ethical remains to be satisfactorily answered.
What do you think?