Let’s Talk About Conversational Data Collection for AI

Last Updated December 2, 2021


From chatbots to your favorite fast-food spot, conversational data is powering modern advancements in AI. But how do we collect the data?

Already a multi-billion-dollar business, the global conversational artificial intelligence (AI) market is expected to grow by over 20 percent each year through 2025.

Central to that growth is a constant flow of conversational data collection and processing.

With the best conversational AI interfaces, it's difficult for the user to recognize whether they're chatting with a human or a virtual assistant. This is where the Turing Test comes into play – a method of determining whether the AI is capable of making us believe it's a human being.

The technology isn’t quite there yet, but conversational AI developers are working tirelessly to improve accuracy and usability.

Natural conversation data is crucial for a seamless user experience with chatbots, automated call centers, and other speech-enabled devices.

In this article, we will discuss what we mean by conversational data, how to collect it, and some of the challenges you may encounter along the way.

What is conversational data?

Conversational data refers to naturally occurring dialogue collected for the purpose of machine learning. This can be an oral or written exchange of sentiments, observations, opinions, or ideas between two or more parties in short or long form.

Conversational data trains AI to replicate the flow of human conversations. It’s not just programming different languages and dialects, but also phraseology, pronunciations, filler words, slang, and other variables.

Conversational data comes from phone calls, customer service interactions, social media chats, and even from a drive-thru order of a hamburger and fries.

These examples typify the two main types of conversational data: pre-packaged data and custom-built data.

Recorded Conversations

Recorded conversational data comes from call centers and customer service conversations, as well as podcasts and phone calls.

The goal is to collect unscripted speech data between two speakers: natural dialogue that can be time-stamped and transcribed to train a particular AI solution.

Text Chat Transcripts

Text conversational data comes from customer service chats, email threads, and social media interactions.

When you contact a TV service provider to cancel or amend a subscription, for example, your typed interaction with a real-life customer service agent will be saved as a transcript.

That text (and a multitude of other similar chats) will be annotated to develop AI versions of that service in the future.

Let’s take a closer look at how this data is gathered.

Where to Find Conversational Data

AI developers can obtain pre-packaged or custom-collected conversational data to help power conversational interfaces.

Here’s how it’s collected and what’s done with the data before it’s delivered to you.

Proprietary Data

Well, that was easy.

If you’re already in the call center business, you have access to an abundance of recordings that can be transcribed and disseminated for a variety of purposes.

The caveat here is user consent and compliance with privacy laws.

In Canada, for example, you must inform the customer they’re being recorded, clearly state the purpose of the recording, and ask for their consent.

Field Data Collection

With this method, conversational data collection is executed in person, in a specifically chosen physical location or environment.

A great example of field data collection is a "voice recognition party" – we've hosted a number of these events to collect natural speech data.

We set up microphones and cameras in participants’ homes to collect conversational data in its most natural form, with several competing voices and sounds in the room. This data was used to help train speech recognition algorithms to detect a single voice in a noisier setting.

This can be very useful for call centers where customers are calling from home with a lot of background noise, or for training smart speakers that are expected to work in a noisy room.

Field data collection is a good option if you have specific audio or equipment requirements that otherwise can’t be achieved remotely. For example, you may want to record audio from the end user device.

Crowd-Sourced Collection

Here, participants are recruited online based on their language and demographic profile. It’s also known as remote data collection.

Let’s say you need conversations in a specific language or dialect. We can collect speech data quickly and efficiently in any language on a small scale or with thousands of participants.

We accomplish this via Robson, our in-house data platform and mobile app for Android and iOS that allows us to collect and annotate data from any end user that fits your data needs.

Participants sign up to record conversations in various target languages. Don’t worry – we pay Robson participants fairly and we protect their privacy like it’s our own.

For example, phone conversation data sets are available in Japanese, Irish English, and Dutch.

The benefit here is that this solution is affordable, scalable, and highly customizable to your needs.

Data Purchasing

We partner with call centers to record, collect, and buy conversational data safely and securely.

These data sets feature real call center conversations taking place in a variety of languages for domains such as travel and tourism, retail, finance, and more.

Personally identifiable information (PII) is redacted from all recordings, so these call center data sets are ready to use for your solution.
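To give a flavor of what redaction involves, here's a minimal sketch in Python that masks two common text-based PII patterns (phone numbers and email addresses) with placeholder tags. The patterns and function name are illustrative assumptions – production redaction relies on far more robust detection (NER models, audio bleeping, and human review), not simple regexes.

```python
import re

# Illustrative patterns only; real pipelines use much more robust PII detection.
PII_PATTERNS = {
    "PHONE": re.compile(r"\b(?:\+?\d{1,2}[\s.-]?)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}

def redact(text: str) -> str:
    """Replace each PII match with a bracketed placeholder tag."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

print(redact("Call me at 555-123-4567 or email jane.doe@example.com"))
# → Call me at [PHONE] or email [EMAIL]
```

The placeholder tags keep the utterance usable for training while removing the sensitive values themselves.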

Examples include call center conversations in Japanese and US English.

Transcription and Annotation of Conversational Data

By whatever means the conversational data is collected, it can be transcribed and annotated to whatever degree you require.

Speech transcription turns conversational data into information that trains and validates voice recognition algorithms across a variety of applications.

To begin, the transcriber—either a person or a computer—records what is said, when it’s said, and by whom.

We err on the side of human speech transcription to ensure accuracy and inclusivity, and to handle complex environments and use cases.

The annotation process consists of labeling noises, repetitions, false starts, changes in language, and who is speaking. If you're looking solely for the raw data, it can be delivered as is.
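To make those annotation layers concrete, here is a minimal sketch of what a single annotated, time-stamped segment of a two-speaker call might look like. The field names are assumptions for illustration, not an industry-standard schema.

```python
import json

# Illustrative schema: one annotated segment of a two-speaker call.
segment = {
    "start": 12.48,           # seconds from start of recording
    "end": 15.10,
    "speaker": "agent",       # speaker tag from annotation
    "language": "en-US",      # captures mid-call changes in language
    "transcript": "Sure, I can help you with that.",
    "tags": ["false_start"],  # noises, repetitions, false starts, etc.
}

print(json.dumps(segment, indent=2))
```

A full transcript is then simply an ordered list of segments like this one, which downstream training code can filter by speaker, language, or tag.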

Here’s an example of how this works in practice:

We previously worked on client projects where we transcribed call center data in Japanese and US English with the purpose of training conversational AI. In collaboration with the client, we established the following speech annotation and transcription process:

  1. Annotation – One annotator works on segmentation, speaker tagging, and metadata.
  2. Partial QA – One of our team members QAs a sampling of the annotated files to ensure they're ready for transcription.
  3. Transcription – A different transcriber inserts the transcription and any necessary tags.
  4. Full QA – The same QA team member reviews 100% of the transcribed files.

By following this multi-step process – beginning with annotating the speech data and then performing partial QA – we ensure that the transcription step is as efficient as possible.

This helps our client save on costs without sacrificing quality.

Our data solutions experts work with you to understand exactly what level of transcription you need. And if your requirements aren't yet fully defined, we can help you choose the right solution.

Conversational Data Challenges

As you may imagine, conversational audio data comes with a lot of variables. Not only are there two ends to the conversation, but outside factors pop up as well if the chat is recorded in an uncontrolled environment.

Here are a few of the challenges that have emerged.

  • File quality – If the recording itself is botched or unlistenable, it won't yield usable training data.
  • Speech vs. audio hours – Ideally, the data is heavy on conversation and light on ring tones, keypad dialing, and dead air.
  • Mono vs. stereo (1 vs. 2 channels) – If the agent and caller are on different lines, the audio needs to be synced into one file so that both sides of the conversation are captured; the transcriptionist doesn't want to cross-reference separate files.
  • Domain – As mentioned above, conversational data can be industry-specific. General-purpose voice assistants are impressive, but there are industries where general AI isn't enough; Siri won't be able to answer the domain-specific questions that a person in the financial industry may have, for example.
  • Background noise – There could be dogs barking and doorbells ringing. You may want that scrubbed, or you may be perfectly content working around it.
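On the mono-vs-stereo point above: when the agent and caller arrive as two separate mono recordings, their samples can be interleaved into one stereo file so both sides share a single timeline. Here's a minimal sketch using Python's standard `wave` module; it assumes matching sample rates and 16-bit PCM audio, and real pipelines would typically reach for a dedicated tool such as sox or ffmpeg instead.

```python
import wave

def merge_mono_to_stereo(agent_path, caller_path, out_path):
    """Interleave two mono WAV files (agent = left, caller = right)
    into a single stereo file so both channels share one timeline."""
    with wave.open(agent_path, "rb") as a, wave.open(caller_path, "rb") as c:
        assert a.getframerate() == c.getframerate(), "sample rates must match"
        assert a.getsampwidth() == c.getsampwidth() == 2, "expects 16-bit PCM"
        n = min(a.getnframes(), c.getnframes())
        left = a.readframes(n)   # raw 16-bit little-endian samples
        right = c.readframes(n)
        # Interleave sample-by-sample: L0 R0 L1 R1 ...
        stereo = bytearray()
        for i in range(0, len(left), 2):
            stereo += left[i:i + 2] + right[i:i + 2]
        with wave.open(out_path, "wb") as out:
            out.setnchannels(2)
            out.setsampwidth(2)
            out.setframerate(a.getframerate())
            out.writeframes(bytes(stereo))
```

With both voices in one synced file, the transcriptionist hears the whole exchange in order rather than flipping between two recordings.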

With a background in language services, we have first-hand experience with the challenges of developing AI technology, and rely on our “four Ps” approach to speech data collection to work through them.

Get the Conversational Data You Need

We have hundreds of hours of pre-recorded, natural phone conversation data—fully annotated—available in English, Dutch, Japanese, and more.

We also provide high-quality, customizable conversational data in any language or dialect.

Download our free speech data sample sets below to see if our pre-packaged data sets are a fit for your solution.

Ready to start collecting customized speech data? Just let us know what you need. Contact us today.
