Download the Phone Conversation Data Sample

In need of natural speech data for your chatbot or speech-enabled device? This data sample features transcribed audio recordings of two-party phone conversations in three languages.

Building a conversational interface?

Summa Linguae Technologies offers pre-packaged or custom-collected conversational data solutions to help power your conversational interfaces.

Our pre-packaged phone conversation data sets include:

  1. Dutch phone conversations: 500 hours, 936 conversations
  2. Japanese phone conversations: 500 hours, 787 conversations
  3. Irish English phone conversations: 50 hours, 99 conversations

Need data in another language, accent, or dialect? No problem.

We offer conversational data collection services tailored to your target demographic.

What's included in the sample data set?

This sample data set contains 5 minutes of audio in Dutch, Japanese, and Irish English. The audio is provided as .WAV files, each with a corresponding .JSON transcription file.

This data was initially collected to train a conversational Automatic Speech Recognition (ASR) system on phone call data.

Participants held phone conversations with friends and family members through our custom SIP platform. Conversations range in length from 9 to 180 minutes, averaging 30 minutes each.

Human transcribers produced timestamped, segment-level transcriptions without ASR assistance, with a strong emphasis on accuracy and quality.

This sample is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
