Building an Alexa-enabled voice product? Make sure you’re ready for a global, multilingual customer base. Hear the difference that data variance makes with this sample of 24 Alexa wake word recordings in four languages.
If you’re building an Alexa-enabled product or device, you’ll need high-quality speech data to train your voice recognition model across different accents, age groups, and genders.
This sample dataset was originally collected for Amazon’s Alexa wake word functionality and contains speech data from across the world. The sample contains English, Italian, Spanish, and French speech data from speakers of varying ages and genders.
Here are a few use cases for this speech data sample:
- Identify the specific phrases and words used to wake up Amazon’s Alexa
- Hear the accents and tonal differences that need to be considered by Alexa
- Analyze metadata that gives your team and Alexa complete context
This sample dataset contains 24 .WAV audio files that were collected and labeled by Summa Linguae Technologies. The metadata for each sample is provided as well. All samples were recorded, collected, and processed online to capture intricacies in speech data from various countries and regions.
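As a quick sanity check after downloading, the audio files can be inspected with Python's standard-library `wave` module. The sketch below is only an illustration: the folder name `alexa_wake_word_sample` is hypothetical, and it assumes the recordings are uncompressed PCM WAV files, which is what the `wave` module reads. It does not touch the metadata, since the metadata format is not described here.

```python
import wave
from pathlib import Path


def summarize_wav(path):
    """Return basic audio properties of one WAV file as a dict."""
    with wave.open(str(path), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "file": Path(path).name,
            "channels": wav.getnchannels(),
            "sample_rate_hz": rate,
            "sample_width_bytes": wav.getsampwidth(),
            "duration_s": frames / rate,
        }


def summarize_folder(folder):
    """Summarize every .wav/.WAV file in a folder, sorted by file name."""
    paths = sorted(
        p for p in Path(folder).iterdir() if p.suffix.lower() == ".wav"
    )
    return [summarize_wav(p) for p in paths]


if __name__ == "__main__":
    # Hypothetical folder name; point this at wherever the sample was unzipped.
    sample_dir = Path("alexa_wake_word_sample")
    if sample_dir.is_dir():
        for info in summarize_folder(sample_dir):
            print(info)
```

Comparing sample rates and durations across the recordings is a simple way to confirm the files are consistent before feeding them to a model.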
The provided audio files are free to use and test for educational and research purposes only. This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Download the free wake word data set
Alexa Wake Word Data Set
Alexa wake word recordings in four different languages in a variety of accents, ages, and genders
Phone Conversation Data Set
Transcribed phone conversation recordings in Dutch, Japanese, and Irish English