Arabic & Thai & Vietnamese & Hindi & English & Chinese Language Dataset
Bounding box+Text
Use Case: OCR
Format: Image
Count: 150k
Annotation: Yes
Description: Arabic & Thai & Vietnamese & Hindi & English & Chinese Language Dataset
Arabic Text Dataset
Bounding box+Text
Use Case: OCR
Format: Image
Count: 1k
Annotation: Yes
Description: The Arabic Text Dataset contains a collection of text samples written in Arabic. It includes various forms of content, such as news articles, social media posts, literature, and dialogue, spanning different topics and writing styles. This dataset is used for tasks such as natural language processing (NLP), text classification, sentiment analysis, and machine translation in Arabic language applications.
Chinese & English & Tibetan & Uyghur Language Dataset
Bounding box+Text
Use Case: OCR
Format: Image
Count: 38k
Annotation: Yes
Description: Chinese & English & Tibetan & Uyghur Language Dataset
Chinese and English Menu Dataset
Bounding box+Text
Use Case: OCR
Format: Image
Count: 60k
Annotation: Yes
Description: The Chinese and English Menu Dataset contains images of restaurant menus printed in both Chinese and English. It covers a range of fonts, layouts, and menu structures, with bilingual dish names, descriptions, and prices. This dataset is useful for tasks such as optical character recognition (OCR), machine translation, and menu digitization in multilingual settings.
Chinese Handwritten Composition Dataset
Bounding box+Text
Use Case: OCR
Format: Image
Count: 3k
Annotation: Yes
Description: The Chinese Handwritten Composition Dataset contains samples of handwritten Chinese text, including compositions, essays, and other long-form text. It features various handwriting styles and levels of complexity, and is used for tasks such as handwriting recognition, text analysis, and machine learning model training.
Chinese WIFI Prompt Dataset
Bounding box+Text
Use Case: OCR
Format: Image
Count: 1k
Annotation: Yes
Description: The Chinese WIFI Prompt Dataset consists of text samples from Wi-Fi prompts and login screens written in Chinese. It typically includes prompts, instructions, and error messages related to connecting to or managing Wi-Fi networks. This dataset is used for tasks such as text recognition, natural language processing, and improving user interfaces for network connectivity.
English & Chinese Handwriting Dataset
Bounding box+Text
Use Case: OCR
Format: Image
Count: 12k
Annotation: Yes
Description: The English & Chinese Handwriting Dataset contains handwritten samples in both English and Chinese, showcasing various writing styles and character complexities. It is typically used for training and evaluating handwriting recognition models, supporting multilingual text analysis, and other related research. The dataset includes a diverse range of characters, digits, words, and sentences in both languages.
English & Chinese Shopsign Dataset
Bounding box+Text
Use Case: OCR
Format: Image
Count: 30k
Annotation: Yes
Description: The English & Chinese Shopsign Dataset includes images of shop signs that feature both English and Chinese text. It captures various signage elements such as store names, advertisements, promotions, and directions, displayed in diverse fonts, styles, and formats. This dataset is used for tasks like text detection and recognition, multilingual scene understanding, and improving computer vision models for interpreting bilingual signage.
English & Chinese Special Angle Text Dataset
Bounding box+Text
Use Case: OCR
Format: Image
Count: 50k
Annotation: Yes
Description: The English & Chinese Special Angle Text Dataset contains images of text displayed at various angles and orientations in both English and Chinese. It includes text from sources like signs, advertisements, and documents that are not presented in standard horizontal formats. This dataset is used for training and evaluating text detection and recognition models, particularly those capable of handling text in non-traditional orientations and perspectives.
English Menu Dataset
Bounding box+Text
Use Case: OCR
Format: Image
Count: 20k
Annotation: Yes
Description: The English Menu Dataset includes images of restaurant menus written in English. It features a variety of fonts, layouts, and formatting styles, with content ranging from dish names to descriptions and prices. This dataset is often used for tasks such as optical character recognition (OCR), text extraction, and menu digitization in food-related applications.
English Scenes Text Dataset
Bounding box+Text
Use Case: OCR
Format: Image
Count: 33k
Annotation: Yes
Description: The English Scenes Text Dataset consists of images containing natural scenes with embedded English text. The text appears in various forms, such as signs, billboards, and posters, often in diverse fonts, sizes, and orientations. This dataset is commonly used for training and testing models in text detection, recognition, and scene understanding tasks.
Handwritten Text Dataset
Use Case: Document AI
Format: HEIC (images) & MOV (videos)
Count: 94053
Annotation: No
Description: Live Photos of handwritten text in Japanese, Korean, and Russian.
Recording Device: iPhone & iPad Camera
Recording Condition:
- Aggressive Lighting/Glare
- Camera Flash On
- Colored Light
- Low Light, No Camera Flash
- Normal
Japanese & Korean Language Dataset
Bounding box+Text
Use Case: OCR
Format: Image
Count: 40k
Annotation: Yes
Description: The Japanese & Korean Language Dataset includes text samples in both Japanese and Korean. It features a range of content such as sentences, phrases, and words, encompassing various contexts and styles. This dataset is used for tasks like natural language processing (NLP), machine translation, and text analysis in multilingual applications.