Off-the-Shelf AI Training Datasets

Datasets for Training Chatbots, Healthcare, and Conversational AI Models

Sample datasets

Access High-Quality, Scalable Datasets to Train Chatbot, Conversational AI, & Healthcare Apps

The sample datasets include an hour of conversational AI training data in languages such as Australian English, UK English, Danish, Hindi, Indonesian, Malay, Afrikaans, Arabic, Irish, and more. The healthcare data consists of physician-dictated audio describing patients’ clinical conditions and care plans, along with verbatim transcriptions and clinical documents.

Dataset | File | Use Case | Description
--- | --- | --- | ---
Physician Dictation | Audio Files | Healthcare | Audio recordings in which physicians describe patients' clinical conditions and plans of care in hospital and clinical settings.
Physician Dictation | Verbatim Transcribed Text Files | Healthcare | Transcribed documents corresponding to the dictation audio dataset. The transcription is verbatim, as required to train speech recognition acoustic and vocabulary models.
Physician Clinical Notes | Dictation Notes | Healthcare | Clinical documents, as dictated by physicians, describing patients' clinical conditions.
Physician Clinical Notes | De-identified Dictation Notes | Healthcare | Formatted clinical documents, as dictated by physicians, for training medical AI models.
Human-Bot Conversations | Australian English | Conversational AI | A sample audio conversation and the corresponding transcribed JSON files.
Human-Bot Conversations | UK English | Conversational AI | A sample audio conversation and the corresponding transcribed JSON files.
Conversation Datasets | Danish | Conversational AI | A sample audio conversation and the corresponding transcribed JSON files.
Conversation Datasets | Hindi | Conversational AI | A sample audio conversation and the corresponding transcribed JSON files.
Conversation Datasets | Telugu | Conversational AI | A sample audio conversation and the corresponding transcribed JSON files.
Conversation Datasets | Indonesian | Conversational AI | A sample audio conversation and the corresponding transcribed JSON files.
Conversation Datasets | Hebrew | Conversational AI | A sample audio conversation and the corresponding transcribed JSON files.
Conversation Datasets | Malay | Conversational AI | A sample audio conversation and the corresponding transcribed JSON files.
Conversation Datasets | Afrikaans | Conversational AI | A sample audio conversation and the corresponding transcribed JSON files.
Conversation Datasets | Arabic | Conversational AI | A sample audio conversation and the corresponding transcribed JSON files.
Conversation Datasets | Irish | Conversational AI | A sample audio conversation and the corresponding transcribed JSON files.
Conversation Datasets | Scottish | Conversational AI | A sample audio conversation and the corresponding transcribed JSON files.
Conversation Datasets | Welsh | Conversational AI | A sample audio conversation and the corresponding transcribed JSON files.