Voice Controlled Games: The Rise of Speech Technology in Gaming

Last Updated August 12, 2021

Button mashing isn’t the only way to collect coins, smash bricks, and jump on top of enemies. Video games can now be won or lost using a secret weapon: our voices.

Creating a video game is extraordinarily difficult. First, you decide between designing a first-person shooter (FPS) or a role-playing game (RPG). Then, it takes years to properly flesh out the plot, the gameplay, character development, worlds, and game mechanics.

Hundreds of moving parts must come together seamlessly.

In open-world games, there are simply so many scenarios the game developers must run through, test, and create content for. Not only that, but the game must be able to change and adapt based on each player’s actions.

Additionally, there’s audio to add. That involves recording voiceovers with actors, not to mention translating it all for foreign markets. That’s why the game localization sector is growing so fast.

Now, imagine adding another level to gaming through speech recognition technology.

What is a voice-controlled video game?

A voice-controlled game allows you to command your character and interact with others through speech.

Innovative gaming companies champion this idea with the intention of making gaming more accessible for visually impaired and disabled people. It also allows players to immerse themselves further into gameplay through yet another layer of integration.

Voice control also lowers the learning curve for beginners, seeing as less importance is placed on figuring out controls.

The Challenge of Developing Voice-Controlled Games

The challenge for game developers is accounting for the hundreds of hours of speech data needed to build the voice-recognition engines for these games to run smoothly.

Developers must consider accents, dialects, and whole languages on top of baseline video game localization for players in different cultures. They also need to gather the many phrases and voice commands a player might say within the context of the game.

Recording and implementing only a handful of expected phrases without accounting for natural language utterances means the player might never say the “correct” phrase to trigger a response.
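To illustrate the problem, here is a minimal sketch of how a game might map several natural phrasings to one in-game action instead of waiting for a single "correct" phrase. The command table, the action names, and the `match_command` helper are hypothetical, and the input is assumed to be text already produced by a speech recognizer.

```python
import re
from typing import Optional

# Hypothetical command table: each in-game action lists several ways a
# player might phrase it. A real game would need far more variants.
COMMAND_PHRASES = {
    "jump": ["jump", "hop", "leap", "go up"],
    "attack": ["attack", "hit", "strike", "fight"],
    "open_door": ["open the door", "open door", "unlock the door"],
}


def normalize(utterance: str) -> str:
    """Lowercase the text and drop punctuation so matching is more forgiving."""
    return re.sub(r"[^a-z\s]", "", utterance.lower()).strip()


def match_command(utterance: str) -> Optional[str]:
    """Return the first action whose phrase appears as whole words, else None."""
    text = f" {normalize(utterance)} "
    for action, phrases in COMMAND_PHRASES.items():
        if any(f" {phrase} " in text for phrase in phrases):
            return action
    return None  # nothing matched: the game should re-prompt, not stare blankly
```

With only one phrase per action, "Please open the door!" would fail unless the player guessed the exact wording. Listing variants is the simplest remedy, though it is still far short of true natural language understanding.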

As you can see, a lot goes into making speech recognition work smoothly. There’s nothing more frustrating as a gamer than having to repeat yourself, only to be met with your character’s impassive-yet-expectant stare.

So, let’s take a walk through the rise of speech technology in gaming. We’ll start with where we’ve been, move to where we are, then what we’re expecting to see more of.

Early Examples of Voice-Controlled Games

As with any new technology, pioneers will be the ones dealing with the uncharted territory.

Here are a couple of examples of those early attempts to incorporate speech recognition into video games.

Seaman

In the late 1990s, Leonard Nimoy (best known for playing Spock in various incarnations of Star Trek) provided voice-over narration for what Matt Vella of Time Magazine named “one of the strangest, most wonderful experiments in video game history: Seaman.”

Released for Sega’s Dreamcast console, the Japanese game put a virtual pet in the care of players who were charged with feeding, nurturing, and guiding its evolution from sea to land.

The game was first released in 1999, and its speech recognition system was a little lackluster. Players spoke to the creature via a microphone accessory plugged into their controller, resulting in tedious gameplay and controls.

However, it was still lauded for its creativity, receiving an Excellence Award for Interactive Art at the Japan Media Arts Festival.

Perhaps it was the novelty of Nimoy’s voice that elevated the success of the game. Or maybe it was the wonderment of having a fish with a man’s face answer your questions with its own witty replies.

Either way, players were hooked by being able not only to control their characters, but also to communicate with them.

Bot Colony

Canada-based North Side Inc. was another pioneer that saw the potential for speech recognition early on. The company took on the challenge of building a full-fledged 3D-graphics video game centered around speech recognition technology.

The game was named Bot Colony. It was a sci-fi story in which players find themselves in a technology-driven future. Robots have become ubiquitous helpers in human society.

At the core of North Side’s technology is natural language understanding (NLU). It recognizes natural speech patterns rather than being limited to rigid, static command phrases. The players can therefore command the robots remotely using speech.

At least, that’s the idea behind it. Playing the game is an entirely different experience.

“The reality of Bot Colony is that it’s as frustrating as it is inspiring,” says Josh Tolentino of Game Critic. “Even after multiple minutes-long ‘training sessions’ designed to get the computer to accurately recognize the player’s voice… questions and commands will need to be stated and restated multiple times to get [the bots] to respond properly.”

Ultimately it was concluded that gameplay was much more efficient when the microphone was muted. Instead, a keyboard was used to type in questions and commands manually.

Though Bot Colony wasn’t quite the success story North Side was hoping for, Tolentino still saw the appeal in “the prospect of being able to simply ‘talk’ to a machine in the way one talks normally, and have it respond in a (somewhat) intelligent manner.”

It’s this attraction – this magical moment – that keeps developers chugging along in search of a better way to bring speech technology to video games.

Speech Recognition Meets New Video Game Technology

Speaking directly into a microphone isn’t the only new technology to enter the spotlight in the gaming industry. Smart controllers and biofeedback technology, for example, took center stage as video games became even more complex.

Mass Effect 3

Back in 2012, Microsoft struggled to convince the gaming world that motion sensor controllers would be the next big thing. The company believed they would elevate and improve a person’s gaming experience.

However, with the release of Mass Effect 3, all that was about to change.

Aimed at making the gameplay as easy and intuitive as possible, Mass Effect 3’s Kinect support put voice control front and center. Players shout commands to their squad mates, who follow those orders. Simply tell a character to activate a skill, and they will.

Players can also use the Kinect microphone in dialogue sequences.

Traditionally in RPGs, you use your controller buttons to pick phrases from the game’s dialogue wheel to direct the flow of the protagonist’s responses. With the benefit of speech recognition technology, players do this with their voice instead.

In games like Mass Effect 3, speech recognition technology radically simplifies the game’s controls while simultaneously allowing more complex gameplay action.

It’s more immersive – not only because yelling at your screen results in in-game action, but also because you never have to break the cinematic experience to pause your gameplay and jump into an abstract menu screen.

Nevermind

At its core, Flying Mollusk’s Nevermind is an adventure-horror game. You dive into the minds of psychological trauma patients for whom traditional treatment methods have proven ineffective.

In-game, you are a “Neuroprober” – a physician who, using cutting-edge technology, can enter the surreal subconscious of these victims, solve the puzzles of their minds, and unlock the root of their trauma.

The Windows and Mac versions of Nevermind use biofeedback technology to detect the player’s feelings of stress and excitement while playing. When you start to become scared or anxious, the game will dynamically respond to those feelings, which in turn affects gameplay.

The biofeedback functionality is not essential to play Nevermind, but if you have an Apple Watch or a Garmin, you can plug in and start experiencing the tech for yourself.

Nevermind was developed to help players “become more mindful of their feelings of stress and anxiety levels and help them practice managing those feelings.”

How does this relate to voice? Nevermind’s biofeedback technology and voice-activated controls bring it to a whole new level of immersion because it considers not only what you say but how you say it. Stress levels in your voice affect gameplay.

Flying Mollusk even tackled the challenge of bringing Nevermind to an international audience. In fact, we helped conduct video game localization and localization testing for Nevermind into ten languages, including German, Spanish, Japanese, Korean, and Russian, plus Simplified and Traditional Chinese.

Voice-Activation Meets the App Store

Amazon is behind one of today’s most well-known smart speech recognition systems, and the company also invests in voice-activated game development through various accelerator programs.

Techstars x Amazon’s Alexa Accelerator accepted its first batch of incubator companies mid-2017, but what has really propelled Amazon’s Alexa forward as a bona fide platform is Amazon’s Alexa Skills Kit (ASK).

Alexa Skills allows third-party developers to create apps and tap into the power of Alexa without ever needing native support. In other words, video game designers can utilize the thousands of data strings Amazon has already collected and integrated into Alexa to their advantage.

Android Auto gives you the option to activate games by saying “Hey Google, play a game” while you’re in the car. On long road trips, you can play a voice-controlled game of Jeopardy, for example.

Wayne Investigation

The Wayne Investigation is a skill developed by Warner Bros. to help promote Batman v Superman: Dawn of Justice.

Combining speech recognition technology and produced audio assets (namely, compelling music and sound effects), The Wayne Investigation seems almost like an old-timey radio show with a modern technological twist.

Just say “Alexa, open The Wayne Investigation” to begin.

The game models a “decision tree” format, explains Alexa Blogs writer Emily Roberts. “From three starting actions, users can make up to 37 decisions, each taking the user down paths that lead the player to new and iconic Gotham characters and locations before completing the game.”
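A decision-tree game of this kind can be sketched in a few lines. The example below is a hypothetical illustration, not the skill's actual code or story: the `Scene` class, the scene ids, and the narration are all invented, and the spoken choices are assumed to arrive as already-recognized words.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Scene:
    """One node in the story: narration plus the spoken choices it offers."""
    narration: str
    # Maps a recognized spoken word to the id of the next scene.
    choices: Dict[str, str] = field(default_factory=dict)


# Invented four-scene story used only to demonstrate the structure.
STORY = {
    "start": Scene("You arrive at the alley.", {"search": "alley", "question": "witness"}),
    "alley": Scene("You find a clue.", {"follow": "ending"}),
    "witness": Scene("The witness points north.", {"follow": "ending"}),
    "ending": Scene("Case closed."),
}


def play(story: Dict[str, Scene], spoken_choices: List[str], start: str = "start") -> List[str]:
    """Walk the tree with a list of recognized words; return the visited scene ids."""
    current = start
    path = [current]
    for word in spoken_choices:
        next_id = story[current].choices.get(word.lower())
        if next_id is None:
            continue  # unrecognized choice: stay on this scene and re-prompt
        current = next_id
        path.append(current)
    return path
```

Because each scene accepts only a small, fixed set of words, the recognizer has far fewer possibilities to distinguish than an open-ended game does, which may be part of why this format suits voice assistants.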

During the first week, the skill drew seven times more engagement (per weekly average) than all other skills combined. It earned the top spot for both total time spent engaging with the skill and average time spent per user.

Similar games include Runescape Quests Skill: One Piercing Note, The Baker Street Experience, The Magic Door, and many more.

Chicken Scream

While some mobile games are based on complicated technology, some are simpler. We previously tested one of these simplified voice-activated apps for ourselves: Chicken Scream.

The only way to move your character along its journey across dangerous terrain is by using your voice. Stay silent to keep the chicken still, speak to move the chicken forward, and squawk loudly to make it jump.
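Controls like these depend on how loud the player is, not on what the player says. The sketch below shows one plausible way to do it, assuming the game receives short chunks of audio samples scaled between -1.0 and 1.0. The thresholds and function names are assumptions for illustration, not the app's actual implementation.

```python
import math
from typing import Sequence

# Assumed thresholds on a 0.0-1.0 loudness scale; a real game would tune
# these per device and microphone.
SPEAK_THRESHOLD = 0.05
SQUAWK_THRESHOLD = 0.4


def loudness(samples: Sequence[float]) -> float:
    """Root-mean-square level of audio samples in the range -1.0 to 1.0."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def action_for(samples: Sequence[float]) -> str:
    """Pick the chicken's action from the loudness of one audio chunk."""
    level = loudness(samples)
    if level >= SQUAWK_THRESHOLD:
        return "jump"   # loud squawk
    if level >= SPEAK_THRESHOLD:
        return "walk"   # normal speech
    return "stand"      # silence or background noise
```

Nothing in this logic depends on vocabulary or language, which is why a game built this way needs no speech data at all.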

The unique controls offer a nice change of pace from other games you find in the app store. The game doesn’t need to understand and interpret certain words or phrases. The removal of literal speech recognition opens it to be used by anyone, anywhere around the world.

There aren’t any language limitations or barriers for localization here. In fact, we even tested the app using chicken sounds from speakers of different languages around the world. This app, and others like it, are less intimidating examples of voice-activated technology.

Scream Go Hero is a similar game. You use your voice to move and jump between platforms, and the more you shout, the higher you jump.
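A "louder means higher" rule can be expressed as a simple scaling function. This is a hypothetical sketch: the constants and the `jump_height` name are assumptions, and the loudness level is assumed to be already measured on a 0.0 to 1.0 scale.

```python
MAX_JUMP = 5.0      # assumed maximum jump height, in game units
NOISE_FLOOR = 0.05  # assumed level below which input counts as silence


def jump_height(level: float) -> float:
    """Map a loudness level (0.0-1.0) to a jump height; louder means higher."""
    if level < NOISE_FLOOR:
        return 0.0
    clamped = min(level, 1.0)  # cap very loud input at the maximum
    return MAX_JUMP * (clamped - NOISE_FLOOR) / (1.0 - NOISE_FLOOR)
```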

A New Era in Voice-Controlled Games

Some people still favor the classic joystick/button and keyboard/mouse combinations to communicate with and control their characters on-screen. Games that make use of speech recognition, though praised for their innovation, have yet to truly succeed in their mission. They haven’t ever quite moved past the novelty aspect of it all.

One of the major challenges is the vast amount of speech data needed to make it work. Collecting speech data at scale is a complex task, and video games need to work in concert with third-party providers who have spent years developing efficient workflows and technology.

And, as these partnerships start to form, speech recognition technology in video games can become even more accurate, localized, and ubiquitous.

We Can Help with Your Next Voice-Controlled Video Game

At Summa Linguae Technologies, we’ve worked for years to develop our speech data collection and annotation processes.

Our clients recognize our data solutions team for its versatility and outside-the-box thinking. We offer custom speech data collection at scale as well as video game localization and testing to get your next project off the ground.

To learn how we can help your company, book a consultation now.
