In-Car Speech Recognition: The Past, Present, and Future

Lift the hood of the modern car and, instead of the once oily assortment of mechanical, moving parts, you’ll see something that looks more like a large, black computer.

This description is appropriate, as the clicks and clunks of traditional engineering are being replaced by the silent ones and zeroes of digital technology.

The evolution of the automobile has entered a new era. Your daily drive is transforming into a smart device. And, as with other smart devices, speech recognition technology is becoming an everyday part of the in-car experience.

What made the automotive industry take to in-car speech recognition capabilities in such a big way? This is an area where sheer need drove the innovation, rather than just the ability to market a new concept or device to tech-hungry consumers.

Let’s look at the past, present, and future of in-car speech recognition technology.

Why In-Car Speech Recognition?

Many advancements in speech recognition have been driven by the need to keep the public safe while still acknowledging a device-dependent culture. That’s especially true when it comes to vehicles.

Whether it’s reading a text message or checking Google Maps, the impulse to take our eyes off the road has become second nature. In response, in-car speech recognition systems have become an almost standard feature in many new vehicles on the market today.

But even though safe driving behaviors (and, in many places, the law) require us to ignore the constant phone calls, emails, and text messages while behind the wheel, that kind of disconnectedness isn’t quite the reality.

In-car speech recognition systems aim to remove the distraction of looking down at your mobile phone while you drive. Instead, a heads-up display allows drivers to keep their eyes on the road and their mind on safety.

A recent report published by the UK’s Transport Research Laboratory (TRL) shows that driver distraction levels are much lower when using voice-activated systems than when using touch screens.

However, the study recommends that further research is necessary to drive home the use of spoken instructions as the safest method for future in-car control.

Still, it’s a move in the right direction.

How Can In-Car Speech Recognition Help?

The specifics of what can be controlled by speech depend on the car.

The typical selection of voice-controlled features is grouped into three categories: basic, intermediate, and advanced.

The basic voice-activated provision is centered around the car’s media and entertainment system. Drivers can use their voice to switch stations, adjust volume, skip tracks, and so on.
Intermediate systems build on the basic functions, allowing the driver to make and receive telephone calls, program the GPS, and adjust the air-conditioning.
More advanced technology incorporates an internet connection, which facilitates spoken web browsing and the use of apps.
The ultimate form will be achieved with autonomous cars.
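
One way to picture these tiers is as a lookup from a recognized command to the lowest tier that supports it. The sketch below is purely illustrative: the intent names, the `Tier` enum, and the `is_supported` helper are hypothetical and do not reflect any manufacturer's actual software.

```python
from enum import IntEnum


class Tier(IntEnum):
    """Hypothetical capability tiers, ordered so higher tiers include lower ones."""
    BASIC = 1
    INTERMEDIATE = 2
    ADVANCED = 3


# Illustrative mapping of command intents to the lowest tier that offers them,
# following the groupings described in the article.
MINIMUM_TIER = {
    "switch_station": Tier.BASIC,
    "adjust_volume": Tier.BASIC,
    "skip_track": Tier.BASIC,
    "make_call": Tier.INTERMEDIATE,
    "set_destination": Tier.INTERMEDIATE,
    "adjust_air_conditioning": Tier.INTERMEDIATE,
    "browse_web": Tier.ADVANCED,
    "open_app": Tier.ADVANCED,
}


def is_supported(intent: str, car_tier: Tier) -> bool:
    """Return True if a car at `car_tier` can handle `intent`."""
    required = MINIMUM_TIER.get(intent)
    # Unknown intents are never supported; known ones need a high enough tier.
    return required is not None and car_tier >= required
```

Under these assumptions, a basic system would accept `skip_track` but reject `make_call`, while an advanced system would accept both.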

At the advanced level, for example, a driver may ask for directions, book a restaurant, and locate a parking space without taking their eyes off the road or their hands off the wheel.

If inspiration strikes, they can call up a notes app and dictate their thoughts. Then, they can request a weather update and have their text messages read out to them. Eventually, they’ll be able to request a show on their favorite streaming service.

In-Car Speech Recognition Applications

Now that we understand why the technology came about, here are some of the most popular systems that enable drivers to find directions, send emails, make phone calls, and play music, all by using the sound of their voice.

Apple CarPlay

CarPlay brings a stripped-down, safety-focused version of iOS to your car’s touch-screen display. Siri is fully integrated into CarPlay: connect your iPhone, and your car’s factory-installed entertainment interface is replaced by Apple’s familiar icons.

Press the voice button on your steering wheel and Siri will be there to help you switch between playlists, navigate to the nearest gas station, send text messages, and even email your boss with a stellar excuse about being stuck behind a school bus instead of at your early morning project meeting.

When you receive a text or email, for example, you’ll see a notification on the CarPlay infotainment screen. When you tap it, Siri will read the message aloud.

You can reply right away by dictating a message for Siri to transcribe so you don’t have to take your eyes off the road to type out a message.

Google Android Auto

Android still dominates the global smartphone market, so Android fans will feel right at home with the pared-down version of their phone screen on their vehicle’s dashboard.

Unlike CarPlay, though, Android Auto could originally connect only via a USB cable, relying on Bluetooth for voice calls through the car. Over 500 models are now supported wirelessly, with even more coming soon.

Like CarPlay, Android Auto displays information like music and podcasts, calls, text messaging, GPS maps, and more.

Once your phone is connected to the car, Android Auto activates a unique and important safety feature – it renders your phone basically useless to ensure you will not use it while driving. There’s no need to pick it up, look at it, or even adjust the volume.

In case you’re wondering, there’s not much difference between using Siri on CarPlay and Google Assistant on Android Auto, as both systems have similar response times and functionality.

Eventually, Android Auto is going to be replaced with Google Assistant Driving Mode, set to become the dominant UI available in the car.

After launching first in the U.S., the mode is finally making its way to other parts of the world.

Manufacturer-Specific Setups

Along with these broader, universal systems, car manufacturers have experimented with their own brand-specific systems. Ford’s Sync and GM’s OnStar, for example, are proprietary on-board digital systems.

They incorporate the driver’s phone and voice into navigation, entertainment, and other limited features. Unfortunately, as many automakers have learned, the complete suite of features isn’t always easy to incorporate.

As a result, early incarnations of in-car voice control didn’t really live up to the hype. Only a very small number of commands were recognized and the chance of being misunderstood was frustratingly high.

Manufacturers and drivers both soon discovered that a ‘talking car’ is not much use if it doesn’t understand the driver’s language or accent, or only functions when there is absolutely no background noise.

More recent advances in AI, however, have enabled voice recognition technology to improve immensely.

Nuance & BMW

Nuance may be less of a household name than Apple or Android, but that doesn’t make them any less of a game-changer.

They provide a prime example of an AI developer and a car manufacturer coming together to create effective and innovative in-car speech recognition capabilities.

Available first in the BMW 3 Series, BMW Intelligent Personal Assistant is an “AI-powered digital companion that enables drivers to operate their car and access its functions and information simply by speaking.”

Nuance’s conversational AI-powered mobility assistant platform is key to BMW’s personal assistant, powering a multitude of features that are core to the in-car experience:

Customizable wake word – Drivers can use the standard “Hey BMW” wake-up word or change the name of the assistant to one of their choosing for a more personalized experience.
Voice-powered interaction – Nuance developed natural language understanding and generation, enabling drivers to use their natural way of speaking to control key in-car functions, including point of interest search, navigation, temperature control, radio control, and weather.
Smart, voice-activated car manual – Drivers will be able to access the entire car manual using their voice. The feature is available in US English, German, and Mandarin to start, with more languages coming.
Voice-triggered experience and caring modes – Drivers can express their emotional and cognitive states, like being stressed or tired, using natural language. The BMW Intelligent Personal Assistant responds by switching several of the car systems into a mode more appropriate for the situation.

Microsoft announced its acquisition of Nuance in 2021, so there’s certainly more to come.

The Road Ahead for Voice-Controlled Cars

Recent research suggests that 73% of drivers expect to use voice assistants built into their cars for one or more purposes by 2022. According to Automotive World, by 2028, in-car voice control will be embedded in nearly 90% of new vehicles sold globally.

So, whether a driver wants to “Play ‘Shut Up and Drive’ by Rihanna” or ask, “Where’s the nearest Starbucks?”, speech recognition is going to be an essential feature in most new cars, particularly as we move towards autonomous cars, which will give drivers a fully hands-off experience.

Natural language processing is the game-changer for voice control, and its effectiveness is dependent on machine learning. The foundation of any AI technology is data: the more it has, the smarter and more personalized the experience will be.

As a result, annotated data sets containing pre-recorded voices speaking in different languages, in several accents, with a range of speaking styles and with a variety of background noises (such as the car radio or chatter in the car) are used by speech-recognition manufacturers to ensure their systems can understand and respond to natural-language speech.
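
To make this concrete, the sketch below shows one way such annotations might be recorded and checked for gaps. It is an assumption-laden illustration: the `Utterance` fields, the locale codes, and the `missing_combinations` helper are all hypothetical, and real speech corpora typically carry far richer metadata.

```python
from collections import Counter
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Utterance:
    """One annotated recording; every field name here is illustrative."""
    audio_path: str
    transcript: str
    language: str          # e.g. "en-US"
    accent: str            # free-text accent label
    speaking_style: str    # e.g. "command", "conversational"
    background_noise: str  # e.g. "none", "radio", "chatter"


def coverage(utterances):
    """Count recordings per (language, background_noise) condition."""
    return Counter((u.language, u.background_noise) for u in utterances)


def missing_combinations(utterances, languages, noises):
    """List the requested conditions that have no recordings yet."""
    seen = coverage(utterances)
    return [combo for combo in product(languages, noises) if combo not in seen]
```

A collection team could run a check like `missing_combinations(samples, ["en-US", "de-DE"], ["none", "radio"])` to see which language and noise conditions still need recordings before training.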

Nuance, for example, required support with speech data collection: hundreds of hours of voice data across various languages, demographics, and locations around the world.

They needed a precise and comprehensive collection of all the terms, accents, and phrases that might be used to communicate in the vehicle, and the data was used to teach in-car systems to communicate with human beings.

With the data we collected, Nuance was able to build its research base and continue innovating in human-vehicle interaction.

For the time being, the full adoption of in-car speech recognition technology depends on its capabilities meeting drivers’ expectations.

If you say, “Find a restaurant, but not Chinese,” the difference between getting the correct information and getting a list of Chinese restaurants is down to the quality and quantity of the phrases stored on the system’s database.

Similarly, getting a useful response to a follow-up question, such as “Will it be open at 5pm?” comes down to the exhaustiveness of the data.
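
The restaurant example hinges on recognizing that "not" reverses the request. The toy sketch below shows what handling that negation involves, using a hand-written keyword rule. It is not how production assistants work (they generally rely on trained language-understanding models), and the cuisine list, `parse_query`, and `filter_restaurants` are hypothetical names chosen for illustration.

```python
import re
from dataclasses import dataclass, field

# Illustrative vocabulary; a real system would learn this from data.
KNOWN_CUISINES = {"chinese", "italian", "mexican", "thai"}
NEGATION_WORDS = {"not", "no", "except"}


@dataclass
class RestaurantQuery:
    include: set = field(default_factory=set)
    exclude: set = field(default_factory=set)


def parse_query(text: str) -> RestaurantQuery:
    """Sort mentioned cuisines into wanted and unwanted sets."""
    query = RestaurantQuery()
    negated = False
    for word in re.findall(r"[a-z]+", text.lower()):
        if word in NEGATION_WORDS:
            negated = True  # the next cuisine mentioned is unwanted
        elif word in KNOWN_CUISINES:
            (query.exclude if negated else query.include).add(word)
            negated = False
    return query


def filter_restaurants(restaurants, query: RestaurantQuery):
    """Keep names of (name, cuisine) pairs that satisfy the query."""
    results = []
    for name, cuisine in restaurants:
        cuisine = cuisine.lower()
        if cuisine in query.exclude:
            continue
        if query.include and cuisine not in query.include:
            continue
        results.append(name)
    return results
```

A system that ignored the word "not" would place "chinese" in the wanted set instead, which is exactly the failure described above.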

Let Us Help with In-Car Speech Recognition Data

Summa Linguae Technologies provides custom speech data collection and video & image data collection services to train your self-driving car AI or in-car speech recognition technology.
