Arabic & Thai & Vietnamese & Hindi & English & Chinese Language Dataset

Bounding box+Text

Use Case: OCR

Format: Image

Count: 150k

Annotation: Yes

Description: Images containing Arabic, Thai, Vietnamese, Hindi, English, and Chinese text, annotated with bounding boxes and text for OCR.

Arabic Text Dataset

Bounding box+Text

Use Case: OCR

Format: Image

Count: 1k

Annotation: Yes

Description: The Arabic Text Dataset contains a collection of text samples written in Arabic. It includes various forms of content, such as news articles, social media posts, literature, and dialogue, spanning different topics and writing styles. This dataset is used for tasks such as natural language processing (NLP), text classification, sentiment analysis, and machine translation in Arabic language applications.

Chinese & English & Tibetan & Uyghur Language Dataset

Bounding box+Text

Use Case: OCR

Format: Image

Count: 38k

Annotation: Yes

Description: Images containing Chinese, English, Tibetan, and Uyghur text, annotated with bounding boxes and text for OCR.

Chinese and English Menu Dataset

Bounding box+Text

Use Case: OCR

Format: Image

Count: 60k

Annotation: Yes

Description: The Chinese and English Menu Dataset contains images or text samples of restaurant menus that feature both Chinese and English languages. It includes various fonts, layouts, and menu structures, presenting bilingual dish names, descriptions, and prices. This dataset is useful for tasks such as optical character recognition (OCR), machine translation, and menu digitization in multilingual settings.

Chinese Handwritten Composition Dataset

Bounding box+Text

Use Case: OCR

Format: Image

Count: 3k

Annotation: Yes

Description: The Chinese Handwritten Composition Dataset contains samples of handwritten Chinese text, including compositions, essays, and other long-form text. It features various handwriting styles and levels of complexity, and is used for tasks such as handwriting recognition, text analysis, and machine learning model training.

Chinese WIFI Prompt Dataset

Bounding box+Text

Use Case: OCR

Format: Image

Count: 1k

Annotation: Yes

Description: The Chinese WIFI Prompt Dataset consists of Chinese text samples from Wi-Fi prompts and login screens. It typically includes prompts, instructions, and error messages related to connecting to or managing Wi-Fi networks. This dataset is used for tasks like text recognition, natural language processing, and improving user interfaces for network connectivity.

English & Chinese Handwriting Dataset

Bounding box+Text

Use Case: OCR

Format: Image

Count: 12k

Annotation: Yes

Description: The English & Chinese Handwriting Dataset contains handwritten samples in both English and Chinese, showcasing various writing styles and character complexities. It is typically used for training and evaluating handwriting recognition models, supporting multilingual text analysis, and other related research. The dataset includes a diverse range of characters, digits, words, and sentences in both languages.

English & Chinese Shopsign Dataset

Bounding box+Text

Use Case: OCR

Format: Image

Count: 30k

Annotation: Yes

Description: The English & Chinese Shopsign Dataset includes images of shop signs that feature both English and Chinese text. It captures various signage elements such as store names, advertisements, promotions, and directions, displayed in diverse fonts, styles, and formats. This dataset is used for tasks like text detection and recognition, multilingual scene understanding, and improving computer vision models for interpreting bilingual signage.

English & Chinese Special Angle Text Dataset

Bounding box+Text

Use Case: OCR

Format: Image

Count: 50k

Annotation: Yes

Description: The English & Chinese Special Angle Text Dataset contains images of text displayed at various angles and orientations in both English and Chinese. It includes text from sources like signs, advertisements, and documents that are not presented in standard horizontal formats. This dataset is used for training and evaluating text detection and recognition models, particularly those capable of handling text in non-traditional orientations and perspectives.

English Menu Dataset

Bounding box+Text

Use Case: OCR

Format: Image

Count: 20k

Annotation: Yes

Description: The English Menu Dataset includes images or text samples of restaurant menus written in English. It features a variety of fonts, layouts, and formatting styles, with content ranging from dish names to descriptions and prices. This dataset is often used for tasks like optical character recognition (OCR), text extraction, and menu digitization in food-related applications.

English Scenes Text Dataset

Bounding box+Text

Use Case: OCR

Format: Image

Count: 33k

Annotation: Yes

Description: The English Scenes Text Dataset consists of images containing natural scenes with embedded English text. The text appears in various forms, such as signs, billboards, and posters, often in diverse fonts, sizes, and orientations. This dataset is commonly used for training and testing models in text detection, recognition, and scene understanding tasks.

Handwritten Text Dataset

Use Case: Document AI

Format: HEIC (images) & .mov (videos)

Count: 94053

Annotation: No

Description: Live Photos of handwritten text in Japanese, Korean, and Russian

Recording Device: iPhone & iPad Camera

Recording Condition:
- Aggressive Lighting/Glare
- Camera Flash On
- Colored Light
- Low Light, No Camera Flash
- Normal

Japanese & Korean Language Dataset

Bounding box+Text

Use Case: OCR

Format: Image

Count: 40k

Annotation: Yes

Description: The Japanese & Korean Language Dataset includes text samples in both Japanese and Korean. It features a range of content such as sentences, phrases, and words, encompassing various contexts and styles. This dataset is used for tasks like natural language processing (NLP), machine translation, and text analysis in multilingual applications.

Train AI Models with Multilingual Text Detection & OCR Datasets