Conversational Data Collection

Conversational data collection services for chatbots, voice assistants, and speech-enabled devices.

Get Started

High-quality data

Pre-packaged or custom-collected conversational speech data delivered through an easy-access data management platform.

Complete flexibility

Choose your acoustic scenario, audio requirements, speaker demographics, annotation specifications, and more.

Any language or dialect

Need conversations in a specific language or dialect? We can collect data quickly and efficiently in any language, whether at a small scale or with thousands of participants.

Conversational Data Collection Services

Whether you’re building a chatbot, voice assistant, or speech-enabled device, understanding natural conversational speech is crucial for a seamless user experience. Summa Linguae Technologies offers pre-packaged and custom-collected conversational data to help power your conversational interfaces.

We offer phone conversations, text chat transcripts, or any other unique scenario you may require. And we do more than collection: we also provide full annotation, classification, and labeling services.

Want to learn more about our data collection and processing services? Check out our Data Solutions page.

In-field or Crowd-sourced Collection

Need conversational data in a specific bitrate, accent, or dialect? We offer both in-field and crowd-sourced data collection services to capture the exact conversations you need.

Or maybe you need a large volume of data with a fast turnaround? Summa Linguae has built a proprietary data collection app that connects us to thousands of participants worldwide.

Need a unique scenario? We can tailor in-country data collection of any complexity. We have traveled to more than 15 countries for our data collection projects, collecting speech data from more than 10,000 participants. Contact us to learn more.

Ready to start collecting speech data? Just let us know what you need.

Contact Us


hours of data collected in-house, crowd-sourced, and in the field

of speech recognition data collected locally and abroad

when it comes to acoustic and scenario setup

Pre-packaged Conversational Speech Data

Looking for affordable, readily available conversational speech data? Summa Linguae has hundreds of hours of pre-recorded, natural phone conversations—fully annotated—available in English, Dutch, Japanese, and more.

Download our free speech data sample sets below to see if our pre-packaged data sets are a fit for your solution.

Martin Sander

Manager of Research Data, Nuance Communications

Summa Linguae Technologies has provided exceptional services to the Data Collection team at Nuance Communications, Inc. They have supervised large scale data collection simultaneously in three different countries, consistently delivering quality data on or ahead of schedule. And this was done twice in short order – in Europe and in Asia. Our continuing relationship with Summa Linguae is a great asset to the company.

Contact us now for conversational speech data

Tell us about your project and we’ll recommend a data set or tailor a data collection plan to your exact needs.
