
The promise of voice AI in game development



AI voices now sound more realistic than at any point in the history of synthetic speech.

What started as simple text-to-speech (TTS) combined with hundreds of hours of recorded dialogue has evolved into natural-sounding AI voices synthesized from just a few hours of audio.

Replica Studios has published audio examples of these realistic voices, in which the main character is confronted with a monster in a cave.

Why is that important? The latest advances in voice AI offer a host of new opportunities for creatives, game developers, game media, and more.

Leviathan Games, developer of titles based on popular IP such as Spider-Man and The Lord of the Rings, has started using voice AI in its development cycle. “Creatives will always look for new limits in order to cross creative boundaries. Look at how 3D animation software has changed over the past decade,” said Wyeth Ridgway, owner and technical director of Leviathan Games. “Pixar animators changed the direction of the industry by creating their own disruptive modeling, rendering, animation, and lighting software. And now we see parallels with advances in voice AI technology that have the potential to completely transform game development.”

Voice AI is a world apart from concatenative text-to-speech

Traditional, or concatenative, TTS works by joining (concatenating) prerecorded snippets of sound to form words and sentences. Voice actors have to record hundreds of hours of dialogue, and a great deal of manual labor goes into carefully labeling those sounds.

Because of this, adding support for new voices with concatenative TTS is extremely difficult.
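In spirit, concatenative synthesis is a lookup-and-join over a library of labeled recordings. The snippet below is a minimal, hypothetical sketch of that idea; the unit names and clips are invented stand-ins, and real systems use large diphone or unit-selection databases rather than tiny sample lists:

```python
# Sketch of concatenative TTS: audio is assembled by joining prerecorded,
# hand-labeled units. The "clips" here are stand-ins (lists of samples)
# for real recordings; a production unit inventory is vastly larger.

# Hypothetical library of labeled units (e.g., diphones) -> audio samples
unit_library = {
    "h-e": [0.1, 0.2],
    "e-l": [0.3, 0.1],
    "l-o": [0.2, 0.4],
}

def synthesize(units):
    """Concatenate the recorded clips for each requested unit, in order."""
    audio = []
    for unit in units:
        if unit not in unit_library:
            # A missing unit cannot be synthesized: someone must record
            # and label it first -- the core limitation of this approach.
            raise KeyError(f"no recording labeled for unit {unit!r}")
        audio.extend(unit_library[unit])
    return audio

print(synthesize(["h-e", "e-l", "l-o"]))  # → [0.1, 0.2, 0.3, 0.1, 0.2, 0.4]
```

Because every new voice requires re-recording and re-labeling the entire unit inventory, adding voices to a concatenative system is costly; neural TTS instead learns a model that generalizes from far less audio.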

According to Susan Bennett, the original voice of Siri, she recorded hundreds of phrases and sentences to capture all the sound combinations in English, working four hours a day, five days a week, and it took months to complete the initial recordings and subsequent updates.

Voice AI is completely different.

At the end of 2016, DeepMind demonstrated WaveNet, the first deep neural network that could convincingly model the human voice with far fewer audio recordings. Labeling the training data required only very basic work.

Since then, we’ve seen newer deep learning techniques that use LSTMs and GANs. Trained on just a few hours of audio, the AI learns to say words and make sounds that weren’t even part of the original training set, while offering extensive customization of emotion and expressiveness.

Replica Studios has published examples of expressive tone changes from Agartha, a Lord of the Rings-inspired game in which an enemy attacks a fortress (think of the battle of Helm’s Deep in The Two Towers).

Advances in research coupled with the spread of cloud computing make the technology more accessible than ever. Therefore, it may be the right time for game developers to explore voice AI and capitalize on its significant time and cost efficiencies – and for its future promise of better, more personal, and more engaging storytelling.

Dialog prototyping meets scalability

Game development offers numerous ways to use voice AI.

Think of triple-A games like Red Dead Redemption 2 or The Witcher, with hundreds of thousands of recorded lines of dialogue. Producing them is a daunting and costly undertaking, considering the hours it takes to book studio time with voice actors, record dialogue, edit the script, and revise and re-record as needed during development.

Game design is an iterative process. Before launch, designers use a game prototype to test and collect user feedback on many different areas, such as the first-time user experience (FTUE), particular game mechanics, animations, interactions with player characters, and much more.

However, prototyping has lacked a developer-friendly tool for creating in-game voices. Iterating, refining, and perfecting dialogue, including bringing voice actors back for multiple takes, requires significant resources, which is why game studios often forgo it at the prototype stage.

That is changing as voice AI becomes feasible and increasingly accessible for rapid prototyping by game designers.

For smaller studios, voice AI solutions can bring significant savings while raising the bar for production quality (as we’ve seen with animation software).

For larger studios, the advantage lies in time, cost, and production efficiency. Imagine how positively voice AI could have impacted Red Dead Redemption 2’s development schedule and release date had it been used to prototype its 500+ hours of recorded dialogue.

Reaching a tipping point for voice AI

While there are hundreds of thousands of indie games today that have little or no voice dialogue, that could change in the next few years. At the same time, larger game studios could soon explore deeper storytelling with even more NPCs interacting with players.

Replica Studios has also published audio samples from Defense Protocols. The scene features a ship’s captain and an AI (inspired by Portal’s snarky GLaDOS) dealing with an enemy attack.

As voice actors embrace voice AI technology, game developers will gain access to an extensive library of AI voices to choose from for their games, while voice actors can create new revenue streams for themselves through a streamlined voice marketplace. The quality bar will rise as well.

Voice actors are starting to embrace change in their industry. Simon J Smith, writer/director and voice actor, said, “Many wouldn’t expect it, but I am optimistic about the future of voice AI and how it can help me expand the licensing of my voice and intellectual property. I see voice AI on the same evolutionary path as animation, bringing more demand and accessibility to licensing my work for game dialogue created by studios of all sizes.”

As the AI algorithms that learn human speech patterns (the steady advancement of NLP) and speech synthesis applications built on the industry-standard Speech Synthesis Markup Language (SSML) continue to improve, we are entering a phase where developers have the tools they need to create high-quality text-to-speech in-game voices that truly work at scale.
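For illustration, SSML lets a developer mark up a line of dialogue with prosody, emphasis, and pacing hints that a synthesis engine can interpret. A small sketch (the dialogue is invented, and tag support varies by engine):

```xml
<speak>
  <s>The enemy is <emphasis level="strong">inside</emphasis> the walls.</s>
  <s>
    <prosody rate="slow" pitch="-2st" volume="soft">
      Hold the gate as long as you can.
    </prosody>
  </s>
  <break time="500ms"/>
  <s><prosody rate="fast">Go, now!</prosody></s>
</speak>
```

The same script can be re-rendered with different voices or delivery styles just by changing the markup, which is what makes iteration so much cheaper than re-booking studio time.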

More audio samples from Replica Studios are available on its website.

It is still early, but we are not far from that vision. And as technology gains momentum, game developers and content creators, voice actors, and other talent will team up to create this ecosystem.

Dynamic in-game personalization for players

But what about use cases beyond rapid prototyping? The effects of voice AI go well beyond efficiency and scalability.

Voice AI technologies will open new avenues to meet the desire for more personalization in games.

Players spend hours perfecting their avatars in games like Fallout 4 or Fortnite, and soon Cyberpunk 2077 as well. From a character’s physical appearance to clothing, accessories, and even gait, customization is part of the gaming experience.

The possibilities of voice AI for enabling truly personalized, dynamic in-game narration with character voices are endless.

Marco DeMiroz, co-founder and general partner of The Venture Reality Fund, sees voice AI enhancing VR gaming experiences with custom dialogue and gameplay. Said DeMiroz: “Imagine the ability to dynamically add audio and storylines to games. A player can create their personal avatar the way they currently do, and now have a variety of fun and quirky options to choose from for their avatar’s voice. Their avatar can interact with NPCs and other characters that have their own unique voices, also created by the player. In addition, voice AI can deliver ultra-realistic, customized voices that dynamically change gameplay per player based on their skills and progress. Voices could automatically adapt to new vectors in gameplay to provide players with a high-quality, personalized experience.”

While text-to-speech enables simple in-game dialogue in real time, the future promise of voice AI technology is to transform text into performances. The game designer creates the story and script, and each player can play a part in the narrative with their own voice, or even choose a licensed celebrity voice for certain key characters. Imagine a storefront where players can select Samuel L. Jackson to voice their avatar (much as he licensed his voice for the Alexa assistant).

Replica Studios has also shared an example of a character from Moon Defense, a science fiction game in which you play as a member of an alien race.

Or envision a future where game developers can dynamically incorporate dialogue from esports commentators into games, such as World Cup updates in FIFA and Sunday Night Football updates in Madden NFL.

For both engagement and retention, game developers could explore new dynamic language elements that push creative boundaries and unlock new types of game mechanics.

Where voice AI is going in 2021

Above: Starfinder is a science fiction voice game starring Laura Bailey and Nathan Fillion.

Photo credit: Amazon

As synthetic speech matures and creative tools enable voice customization at scale, we will see a profound shift in how we engage with AI-enabled digital voice technologies. The relationship will shift from a primarily transactional one (“Alexa, tell me the weather”) to one based on dynamic interactions and relationships between characters in any digital narrative or experience.

Advances in the underlying technology of voice AI are continuous at this point, driven by the market’s desire for more powerful creative tools and natural-sounding synthetic voices. Voice AI now benefits from better data analysis and newer approaches to modeling prosody and other voice attributes, all of which shape how we perceive and evaluate synthetic speech quality. It’s a significant step change that no one could have predicted.

At the same time, we expect increased investment in digital rights management and security features for digital voice IP in 2021, which will draw more voice actors and other celebrities to a digital marketplace. We anticipate that the tools, speech synthesis technology, and marketplace will continue to motivate content developers and game designers to embrace voice AI over the next year. It’s certainly a market to watch, if not embrace, when considering your game development roadmap for the near future.

Shreyas Nivas is the co-founder and CEO of Replica Studios.

