Speech Data Collection

Remote and in-person speech data collection services for voice-enabled technology.

High-quality data

High-quality speech data on an easy-to-access data management platform

Any audio requirements

Enjoy complete flexibility in acoustic and scenario setup. From inside a car to a dinner party, you name it and we’ll set it up.

Any demographic

Collect data in 35+ languages. We gather voice data locally and abroad, at small scale or with hundreds of participants.

Custom Speech Data Collection Solutions

Summa Linguae Technologies offers end-to-end speech data collection solutions to ensure your voice-enabled technology is ready for a diverse and multilingual audience.

We can take on projects of any scope, from building a natural language corpus to managing in-field data collection, transcription, and semantic analysis.

Using Summa Linguae’s custom-built multilingual data management platform, clients can access their data and its associated metadata quickly and efficiently through an easy-to-use API.

Speech Data to Fit Your Needs

We know that data collection projects for speech recognition come in all shapes and sizes.

Some projects have highly specific acoustic or participant requirements that call for extensive planning, creativity, and innovation. We love those projects.

Other projects have simpler specifications but may require a fast turnaround or an extremely high volume of speech samples. We love those projects too.

Whatever the scope and scale of your speech data collection project, we can tailor a collection solution to your needs.

Ready to start collecting speech data? Just let us know what you need.

Download Our Sample Speech Data Sets

Download our free speech data sample sets to see whether our data is the right fit for your project.


Alexa Wake Word Samples

24 custom audio samples / 4 languages / Varying ages and genders


Phone Conversation Samples

Natural phone conversations / 3 languages / Transcriptions included

Martin Sander

Manager of Research Data, Nuance Communications

Summa Linguae Technologies has provided exceptional services to the Data Collection team at Nuance Communications, Inc. They have supervised large scale data collection simultaneously in three different countries, consistently delivering quality data on or ahead of schedule. And this was done twice in short order – in Europe and in Asia. Our continuing relationship with Summa Linguae is a great asset to the company.

What type of speech data do you need?

Tell us about your project and we’ll tailor a data collection plan to your exact needs.
