Off-the-Shelf AI Training Datasets
Datasets for Training Chatbots, Healthcare, and Conversational AI Models
Access High-Quality, Scalable Datasets to Train Chatbot, Conversational AI, & Healthcare Apps
The datasets include an hour of conversational AI training data in languages and language varieties such as Australian English, UK English, Danish, Hindi, Indonesian, Malay, Afrikaans, Arabic, Irish, and more. The healthcare data consists of physician-dictated audio describing patients’ clinical conditions and care plans, along with transcribed conversations and clinical documents.
Dataset | File Type / Language | Use Case | Description | Download |
---|---|---|---|---|
Physician Dictation | Audio Files | Healthcare | An audio recording dictated by physicians, describing patients’ clinical condition and plan of care in a hospital or clinical setting. | Download |
Physician Dictation | Verbatim Transcribed Text Files | Healthcare | A set of verbatim transcripts corresponding to the dictation audio dataset, as required to train speech recognition acoustic and vocabulary models. | Download |
Physician Clinical Notes | Dictation Notes | Healthcare | A set of clinical documents, as dictated by physicians, describing patients’ clinical condition. | Download |
Physician Clinical Notes | De-identified Dictation Notes | Healthcare | A set of formatted clinical documents, as dictated by physicians, for training medical AI models. | Download |
Human-Bot Conversations | Australian English | Conversational AI | A sample audio conversation and the corresponding transcript JSON files. | Download |
Human-Bot Conversations | UK English | Conversational AI | A sample audio conversation and the corresponding transcript JSON files. | Download |
Conversations Datasets | Danish | Conversational AI | A sample audio conversation and the corresponding transcript JSON files. | Download |
Conversations Datasets | Hindi | Conversational AI | A sample audio conversation and the corresponding transcript JSON files. | Download |
Conversations Datasets | Telugu | Conversational AI | A sample audio conversation and the corresponding transcript JSON files. | Download |
Conversations Datasets | Indonesian | Conversational AI | A sample audio conversation and the corresponding transcript JSON files. | Download |
Conversations Datasets | Hebrew | Conversational AI | A sample audio conversation and the corresponding transcript JSON files. | Download |
Conversations Datasets | Malay | Conversational AI | A sample audio conversation and the corresponding transcript JSON files. | Download |
Conversations Datasets | Afrikaans | Conversational AI | A sample audio conversation and the corresponding transcript JSON files. | Download |
Conversations Datasets | Arabic | Conversational AI | A sample audio conversation and the corresponding transcript JSON files. | Download |
Conversations Datasets | Irish | Conversational AI | A sample audio conversation and the corresponding transcript JSON files. | Download |
Conversations Datasets | Scottish | Conversational AI | A sample audio conversation and the corresponding transcript JSON files. | Download |
Conversations Datasets | Welsh | Conversational AI | A sample audio conversation and the corresponding transcript JSON files. | Download |