Chinese English Dataset
Overview
Title: Chinese English Language Dataset
Dataset Type: Call-Center
Description: Unscripted, synthetic telephone conversations between an “agent” and a “customer”. Approximate audio duration (range): 5-15 minutes.
Use Case: ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling
Data Set Details
Total hours: 169
Sample Rate: 8 kHz
Audio Channel: Dual
Recording Platform: Desktop
Audio Format: .wav
Transcription Format: .json
WER (%): 5
Data Set Demographics
Country: China
Language: Chinese English
Gender: Female 1,790; Male 523; Unknown 13
Number of Speakers: 2,326
Age: 18-50
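A delivered recording can be checked against the listed audio specification (8 kHz sample rate, dual channel, .wav) before it enters a training pipeline. The sketch below uses only Python's standard `wave` module; the function name `check_wav` and the dictionary keys are illustrative assumptions, not part of any tooling shipped with the dataset, and it assumes the files are uncompressed PCM WAV.

```python
import wave

# Expected values copied from the listing; the key names are hypothetical.
CALL_CENTER_SPEC = {"sample_rate_hz": 8000, "channels": 2}


def check_wav(path, spec):
    """Return a list of mismatches between a PCM WAV file and a spec dict."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()

    problems = []
    if rate != spec["sample_rate_hz"]:
        problems.append(
            f"sample rate is {rate} Hz, expected {spec['sample_rate_hz']} Hz"
        )
    if channels != spec["channels"]:
        problems.append(
            f"channel count is {channels}, expected {spec['channels']}"
        )
    return problems
```

An empty list means the file matches the spec. The same function can be reused for other listings by passing a different spec dictionary.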
Overview
Title: Chinese English Language Dataset
Dataset Type: Media Audio
Description: Licensable public-domain audio/video files such as interviews and podcasts, each featuring 1 to 5 people. Approximate audio duration (range): 15-60 minutes.
Use Case: ASR, Virtual Assistant, Chatbot, Conversational AI, Speech Analytics, TTS, Language Modelling
Data Set Details
Total hours: 249
Sample Rate: 16 kHz
Audio Channel: Mono
Recording Platform: Web Sourcing
Audio Format: .wav
Transcription Format: .json
WER (%): 5
Data Set Demographics
Country: China
Language: Chinese English
Gender: Female 126; Male 346; Unknown 6
Number of Speakers: 478
Age: 18-50
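Both listings report a WER of 5% but do not say how it was measured. As a point of reference, word error rate is conventionally the word-level edit distance (substitutions, deletions, and insertions) divided by the number of reference words. The sketch below implements that conventional definition; it assumes whitespace-separated tokens, which may not match the tokenization actually used for these transcripts (for Chinese text, character-level scoring is common), and the function name is illustrative.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

Under this definition, a WER of 5% corresponds to one word error per 20 reference words on average.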
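Overview

Title: Wuhan Language Dataset
Dataset Type: Spontaneous Dialogue
Total hours: 500
Sample Rate: 16 kHz

Title: Chinese Language Dataset
Dataset Type: TTS
Total hours: 300
Sample Rate: 16 kHz

Title: Hokkienese Language Dataset
Dataset Type: Spontaneous Dialogue
Total hours: 100
Sample Rate: 16 kHz

Title: Shanghai Language Dataset
Dataset Type: Spontaneous Dialogue
Total hours: 500
Sample Rate: 16 kHz

Title: Sichuan Language Dataset
Dataset Type: Spontaneous Dialogue
Total hours: 500
Sample Rate: 16 kHz

Title: English (Chinese) Language Dataset
Dataset Type: Scripted Spontaneous
Total hours: 2,000
Sample Rate: 16 kHz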
Can’t find what you are looking for?
New off-the-shelf datasets are being collected across all data types.
Contact us to discuss your audio and speech training data collection needs.