One of the hottest topics in technology right now is Digital Assistants. New devices are trying to catch up with Amazon’s Echo and the vast AI ecosystem of its assistant, Alexa. Google is about to release its own home-based assistant, while Microsoft, with Cortana, and other software giants will soon follow closely with more devices and possibly avatars.
Most of us consider HAL 9000 to be the first, and possibly the best, Digital Assistant we have ever encountered (putting aside its insubordination toward Dave). Who can forget HAL’s final attempt to win Dave’s sympathy as Dave was disconnecting his memory modules?
Who can forget HAL’s calm and soothing voice, even as he disobeyed his master with his epic response: “I’m sorry, Dave. I’m afraid I can’t do that.”
Douglas Rain’s voice was considered the perfect voice for a futuristic Digital Assistant, and it truly was.
In today’s film depictions of Digital Assistants, such as robots or avatars, we often hear robotic voices whenever a machine talks. Even though the AI in these movies is super advanced, its speech is most of the time more robotic than today’s ordinary Text-to-Speech voices. The creators want to remind viewers that this is a robot, a machine: even though it understands everything you say, and even though it can think, reason and have a mind of its own, it will speak back to you like a robot! Cortana in Halo, AUTO in WALL-E and almost all automated announcements on self-destructing spaceships need to sound robotic so that people understand that behind this intelligence there will always be electrons powering it.
When AI assistants have to show a more human side, consciousness or even feelings, they must sound more human-like. Scarlett Johansson in Her, Paul Bettany in Iron Man and Kevin Spacey in Moon voiced some of the most advanced and human-like AI assistants on screen, and made viewers feel sympathy for them.
What kind of voice do we want for a Digital Assistant? A robotic one or a human-like one? But another question needs an answer first: can we actually create a voice that sounds human? A synthetic voice that sounds like HAL, like GERTY, like Samantha or like JARVIS? And if not, how do we decide whether a TTS voice is appropriate for an AI device?
A good rule of thumb comes from a friend and guru in the AI field, who once told me: “If I can listen to a TTS voice reading a Wikipedia article and it does not make me want to jump out of the window, then it is OK for a Digital Assistant’s voice!”
In other words, you have to make sure that the Text-to-Speech component does not backfire and ruin your actual product, which is the AI. TTS must not spoil the user experience or distract from the real service: Digital Assistance!
Are there voices like this? Yes, there are voices that keep you away from the window most of the time. The TTS voices we develop at innoetics are well suited to Digital Assistants, as they sound flawless with almost any text they read aloud. At innoetics we have developed a unique technique for building custom voices that sound perfect, especially in domain-specific applications.
But we are aiming even further! We are currently focusing on the most advanced synthetic voices ever designed, mainly for Digital Assistants and dialogue systems. Voices that will sound more human-like than any synthetic voice before them! Synthetic voices that will sound flawless with any text and that can be driven by advanced linguistic and suprasegmental data. Synthetic voices that will give a unique character to the target AI agent or device and make it part of everyday life.
We are getting there!