Off-the-Shelf Data Sets

Looking for high-quality, annotated data for your machine learning applications? Explore our collection of off-the-shelf speech, image, and video data sets below. Most data sets include a downloadable sample file so you can preview our ready-to-order and highly customizable data solutions.

Speech Data

Call Center Data in Japanese

This data set contains recordings of call center conversations in Japanese (ja_JP).

Speech Data

Call Center Data in US English

This data set contains up to 1,000 hours of recorded call center conversations in US English (en_US).

Speech Data

Google Wake Words in US English

Google wake words in US English (en_US) from 103 participants aged 19-68.

Speech Data

Siri Wake Words and Voice Commands in US English

Siri wake words and voice commands in US English (en_US) from 103 participants aged 19-68.

Speech Data

Alexa Wake Words in Mexican Spanish (Adults)

Alexa wake words in Mexican Spanish (es_MX) from 106 participants aged 16-65.

Speech Data

Phone Conversations in Japanese

500 hours of phone conversations in Japanese (ja_JP).

Speech Data

Alexa Wake Words in EU Spanish (Adults)

Alexa wake words in European Spanish (es_ES) from 104 participants aged 15-60.

Speech Data

Phone Conversations in Irish English

500 hours of phone conversations in Irish English (en_IE).

Image Data

Eye Gaze Images

347,820 eye gaze images covering 62 different people, 187 eye gaze directions, and 3 different head poses.

Video Data

Roads, Cars, and People Video

Four cameras recorded traffic (cars and pedestrians) at an intersection from either a 45-degree or a 90-degree angle.

Speech Data

Siri Wake Words in US English

US English (en_US) wake words using "Siri" from 103 participants aged 19-68.

Speech Data

Google Wake Words and Voice Commands in US English

US English (en_US) voice commands including the wake word "OK Google" from 103 participants aged 19-68.

Speech Data

Voice Commands in Canadian French (Youth)

Voice commands in Canadian French (fr_CA) from 50 participants aged 6-14.

Speech Data

Alexa Wake Words in Canadian French (Adults)

Alexa wake words and voice commands in Canadian French (fr_CA) from 100 participants aged 15-65.

Speech Data

Voice Commands in Canadian French (Adults)

Voice commands in Canadian French (fr_CA) from 100 participants aged 15-65.

Speech Data

Alexa Wake Words in Canadian French (Youth)

Alexa wake words and voice commands in Canadian French (fr_CA) from 50 participants aged 6-14.

Speech Data

Voice Commands in Italian (Youth)

Voice commands in Italian (it_IT) from 65 participants aged 6-14.

Speech Data

Voice Commands in Italian (Adults)

Voice commands in Italian (it_IT) from 135 participants aged 15-65.

Speech Data

Alexa Wake Words in Italian (Youth)

Alexa wake words in Italian (it_IT) from 65 participants aged 6-14.

Speech Data

Alexa Wake Words in Italian (Adults)

Alexa wake words in Italian (it_IT) from 135 participants aged 15-65.

Speech Data

Voice Commands in Mexican Spanish (Adults)

Voice commands in Mexican Spanish (es_MX) from 106 participants aged 16-65.

Speech Data

Voice Commands in Mexican Spanish (Youth)

Voice commands in Mexican Spanish (es_MX) from 51 participants aged 6-14.

Speech Data

Alexa Wake Words in Mexican Spanish (Youth)

Alexa wake words in Mexican Spanish (es_MX) from 51 participants aged 6-14.

Speech Data

Alexa Wake Words in EU Spanish (Youth)

Alexa wake words in European Spanish (es_ES) from 51 participants aged 6-14.

Speech Data

Voice Commands in EU Spanish (Adults)

Voice commands in European Spanish (es_ES) from 104 participants aged 15-60.

Speech Data

Voice Commands in EU Spanish (Youth)

Voice commands in European Spanish (es_ES) from 51 participants aged 6-14.

Speech Data

Phone Conversations in Dutch

50 hours of phone conversations in Dutch (nl_NL).

Want to build your own data set?

Contact us now to learn how we can collect a custom data set for your unique AI solution.
