A Case Study on Facial Recognition Model
Anti-Spoofing Video Dataset for Fraud Detection AI Models
Discover how Shaip delivered a high-quality anti-spoofing dataset of 25,000 videos, featuring real and replay attack scenarios, to train AI models for fraud detection.
Project Overview
Shaip partnered with a leading AI security company to provide a high-quality, off-the-shelf anti-spoofing video dataset designed to enhance AI model training for fraud detection. The dataset included 25,000 videos capturing both real and replay attack scenarios, ensuring robust training data for anti-spoofing models.
Each of the 12,500 participants contributed two videos, one real and one replay attack, recorded at a resolution of 720p or higher and a frame rate of 26 FPS or higher.
The project’s goal was to deliver authentic and diverse datasets that would enable AI models to effectively distinguish between real and spoofed biometric videos, thereby reducing fraud risks in biometric authentication systems.
Key Stats
25,000 total videos (12,500 real videos, 12,500 replay attack videos)
12,500 unique participants
5 ethnicity groups represented in the dataset
Phased delivery: 4 batches of 6,250 videos each
Metadata attributes: 12 key parameters for enhanced dataset usability
Anti-Spoofing Biometric Dataset Scope
Dataset Curation: The project focused on delivering high-quality anti-spoofing video datasets consisting of real and replay attack videos. Key aspects included:
- 12,500 participants contributing two videos each (1 real, 1 spoofed).
- Diversity in recording devices to enhance model adaptability.
- Balanced ethnic representation to ensure dataset inclusivity.
Metadata Collection: Each video was accompanied by 12 metadata attributes to enhance dataset usability.
Video Data Collection Challenges
Maintaining a balanced distribution across ethnic groups while sourcing high-quality videos.
Ensuring that each participant contributed exactly one real and one replay attack video to maintain dataset integrity.
Adhering to strict guidelines for frame rate (≥ 26 FPS), resolution (≥ 720p), and timestamp accuracy (±0.5 ms).
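The frame-rate, resolution, and pairing constraints above lend themselves to automated checks. The following is a minimal Python sketch, not Shaip's actual validation pipeline. The function names, the "real"/"replay" labels, and the reading of "720p" as a shorter frame side of at least 720 pixels are all assumptions made for illustration. The timestamp tolerance is left out because the case study does not say how it was measured.

```python
from collections import Counter, defaultdict

MIN_FPS = 26.0        # from the case study: FPS >= 26
MIN_SHORT_SIDE = 720  # assumption: "720p" read as shorter frame side >= 720 px


def meets_spec(fps, width, height):
    """Return True if a video satisfies the frame-rate and resolution floors."""
    return fps >= MIN_FPS and min(width, height) >= MIN_SHORT_SIDE


def incomplete_participants(records):
    """Return sorted IDs of participants without exactly one real and one replay video.

    `records` is an iterable of (person_id, attack_type) tuples, where
    attack_type is expected to be "real" or "replay" (hypothetical labels).
    """
    counts = defaultdict(Counter)
    for person_id, attack_type in records:
        counts[person_id][attack_type] += 1
    return sorted(
        pid
        for pid, c in counts.items()
        # Exactly two videos in total, one of each type.
        if c["real"] != 1 or c["replay"] != 1 or sum(c.values()) != 2
    )
```

A check like `incomplete_participants` could be run on each delivery batch so that a missing or extra video is caught before the batch is accepted.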
How We Solved It
Shaip provided a structured and high-quality dataset to meet the project’s requirements. The solution included:
Dataset Curation & Quality Control
- 25,000 videos collected across 4 phases to ensure a steady and structured data flow, avoiding bottlenecks.
- Rigorous validation process to ensure compliance with the FPS and resolution requirements and to verify metadata accuracy. Each video underwent multiple quality checks before final acceptance.
- Comprehensive metadata tagging with 12 attributes:
  - File ID/Name
  - Type of Attack (Real/Replay)
  - Person ID
  - Video Resolution
  - Video Duration
  - Ethnicity of the Subject
  - Gender of the Subject
  - Whether Video is Original or Spoofed
  - Device Name/Model
  - Person Speaking or Not
  - Timestamp Start Time
  - Timestamp End Time
- Balanced Ethnic Group Distribution: The dataset was meticulously curated to maintain a balanced ethnic representation. The distribution includes Hispanic (33%), South Asian (21%), Caucasian (20%), African (15%), and East Asian & Middle Eastern populations (each comprising up to 6%).
- No duplicate entries to maintain dataset uniqueness and prevent biases in AI training.
- Ethnically diverse participant selection to create a dataset that reflects real-world user variations, improving AI model adaptability and fairness.
- Varied recording conditions, including multiple smartphone models, cameras, and lighting conditions, to enhance the model’s robustness across different environmental settings.
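To make the metadata structure concrete, the 12 attributes listed above could be modeled as a single record type, with the uniqueness and distribution properties checked on top of it. This is a hedged Python sketch. The field names, types, and helper functions are hypothetical and are not taken from the delivered dataset's actual schema.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class VideoMetadata:
    """One record per video; field names are illustrative assumptions."""
    file_id: str        # File ID/Name
    attack_type: str    # Type of Attack (Real/Replay)
    person_id: str      # Person ID
    resolution: str     # Video Resolution, e.g. "1280x720"
    duration_s: float   # Video Duration (seconds assumed)
    ethnicity: str      # Ethnicity of the Subject
    gender: str         # Gender of the Subject
    is_spoofed: bool    # Whether Video is Original or Spoofed
    device_model: str   # Device Name/Model
    is_speaking: bool   # Person Speaking or Not
    start_time: str     # Timestamp Start Time
    end_time: str       # Timestamp End Time


def duplicate_file_ids(records):
    """Return sorted file IDs that appear more than once."""
    counts = Counter(r.file_id for r in records)
    return sorted(fid for fid, n in counts.items() if n > 1)


def ethnicity_shares(records):
    """Return the percentage of distinct participants per ethnicity."""
    by_person = {r.person_id: r.ethnicity for r in records}
    counts = Counter(by_person.values())
    total = sum(counts.values())
    if total == 0:
        return {}
    return {eth: 100.0 * n / total for eth, n in counts.items()}
```

Counting shares per participant rather than per video avoids double counting, since each participant contributes two videos.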
Outcome
The high-quality, diverse anti-spoofing video dataset provided by Shaip enabled the client to train AI models to accurately differentiate between real and spoofed videos in various biometric authentication scenarios. The dataset contributed to:
Enhanced AI performance in detecting fraudulent biometric attacks.
A stronger ability of the model to recognize replay attacks across different ethnicities, devices, and environmental conditions.
A foundation for future anti-spoofing model enhancements and expansions.
“Shaip’s dataset has been instrumental in enhancing our AI-driven anti-spoofing models. The diversity, quality, and structured metadata provided a strong foundation for improving fraud detection in biometric authentication systems.”