In need of natural speech data for your chatbot or speech-enabled device? This data sample features transcribed audio recordings of two-party phone conversations in three languages.
Summa Linguae Technologies offers pre-packaged or custom-collected conversational data solutions to help power your conversational interfaces.
Our pre-packaged phone conversation data sets include:
- Dutch phone conversations: 500 hours, 936 conversations
- Japanese phone conversations: 500 hours, 787 conversations
- Irish English phone conversations: 50 hours, 99 conversations
Need data in another language, accent, or dialect? No problem.
We offer conversational data collection services tailored to your target demographic.
What's included in the sample data set?
This sample data set contains 5 minutes of audio in Dutch, Japanese, and Irish English. The audio files are in WAV format, each with a corresponding JSON transcription file.
This data was originally collected to train a conversational automatic speech recognition (ASR) system on phone call data.
Participants held phone conversations with friends and family members through our custom SIP platform. Conversations range in length from 9 to 180 minutes, averaging 30 minutes each.
Human transcribers produced the transcriptions in timestamped segments without ASR assistance, with a strong emphasis on accuracy and quality.
This sample is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.