Case Study: Voice Data Collection

Nuance takes the stress out of human-to-tech communication. Their innovations in voice, natural language understanding, and systems integration work together to create more human-oriented technology: tech that adapts to the way people communicate instead of forcing people to adapt to machines.

The Challenge: In-Car Speech Recognition

The challenge was developing the next generation of in-car speech recognition technology. Nuance needed support with voice data collection: hundreds of hours of voice data across various languages, demographics, and locations around the world.

The data would be used to teach in-car systems to communicate with human beings. Our client therefore needed a precise and comprehensive collection of the terms, accents, and phrases that people would use to communicate in the vehicle.

The Solution: Voice Data Collection

To collect high-quality voice data in the right environment and conditions, Summa Linguae traveled to 10 countries and collected data from more than 2,000 participants over three months. The project began with voice data collections in China, Russia, Japan, Korea, Poland, Italy, Turkey, and Spain.

We presented the participants with various loosely structured scenarios. In response to these scenarios, participants phrased their requests however they liked. Natural language data is critically important because terminology and sentence structure vary between participants. Culture, education, dialect, social environment, and many other attributes affect how a user will articulate a request.

When our project team returned to our home base in Vancouver with suitcases full of valuable data, we conducted data collection in another 15 languages locally. Thanks to the multicultural nature of Vancouver, we were able to find speakers of almost every language we needed within the city limits. Summa Linguae collected Russian, Dutch, Korean, and more from over 40 participants per language. With the data we collected, our client was able to build their research base and continue innovating in human-machine interaction.






Our Data Collection Services

Our data collection services include more than just in-field voice data collection. We also offer remote speech collection, terminology and lexicon development, multilingual transcription, and linguistic analysis. Find details on our data collection services page or reach out to us below.

Want to be our next success story?

Reach out to us today to learn how we can tailor a data collection project to your specific requirements.
