The ability to make anyone say whatever you want is quickly becoming a reality, with new startup Lyrebird pushing voice synthesis further than ever before.
Lyrebird is a massively impressive AI that outpaces its competition with crazy-fast voice scanning and surprisingly accurate results. With just 60 seconds’ worth of speech samples, Lyrebird can clone a voice into a mind-bogglingly accurate copy, reproducing accent, inflection and tone for an almost lifelike replication.
The AI’s algorithm processes the sounds of speech and translates them into what is, in techy terms, a model of how that voice sounds. This is a massive leap forward from many earlier speech synthesis projects, in which the voice had to be explicitly programmed to produce its sounds rather than the system being able to understand and generate them itself (to an extent).
Other companies have made major developments in the area too. Google, through its DeepMind subsidiary, has been leading lifelike AI voice development with its sophisticated WaveNet project. Adobe, meanwhile, have been working on a similar project called ‘Project VoCo’, which lets users edit and tweak speech much as their other programs do for other formats.
However, this Canadian startup blows even a major company like Adobe out of the water: Project VoCo requires 20 minutes of voice samples to recreate a voice, compared with Lyrebird’s one minute. The recreations, whilst still tinged with a robotic quality, are surprisingly accurate given how little audio they need. In one demo, synthesised versions of Barack Obama, Donald Trump and Hillary Clinton discuss how impressed they are with Lyrebird.
Lyrebird’s technology is so advanced, in fact, that it can add emotion to its speech, making the voice sound angry, sympathetic, stressed and more. Of course, accurate voice replication brings plenty of potential issues and privacy conundrums, which the company acknowledge in an ‘Ethics’ section on their website, reading:
Lyrebird is the first company to offer a technology to reproduce the voice of someone as accurately and with as little recorded audio. Such a technology raises important societal issues that we address in the next paragraphs.
Voice recordings are currently considered as strong pieces of evidence in our societies and in particular in jurisdictions of many countries. Our technology questions the validity of such evidence as it allows to easily manipulate audio recordings. This could potentially have dangerous consequences such as misleading diplomats, fraud and more generally any other problem caused by stealing the identity of someone else.
By releasing our technology publicly and making it available to anyone, we want to ensure that there will be no such risks. We hope that everyone will soon be aware that such technology exists and that copying the voice of someone else is possible. More generally, we want to raise attention about the lack of evidence that audio recordings may represent in the near future.
For now, however, the AI isn’t quite realistic enough to trick you into thinking that Trump has declared war; you can just wait for him to do that in real life. The project is still in development and details of a release remain vague, but the technology is fascinating and gives us an exciting (and potentially terrifying) glimpse of our future.