Voice-Based UPI Payment Prompts: Capturing Diversity for Enhanced AI Models

Leveraging Shaip’s expertise in prompt creation and diverse audio recordings to support voice-based UPI payment systems with high-quality, culturally diverse data.

Project Overview

Shaip partnered with a leading fintech company to develop a voice-based payment application by creating and recording diverse UPI payment prompts. The project involved creating 2,500 unique prompts and more than 87,000 diversified prompts across 13 payment-related intents, such as sending money, requesting money, checking a balance, and paying bills. The prompts were recorded by 45 speakers from diverse regions, backgrounds, and age groups, yielding 200 hours of audio with wide linguistic and environmental diversity.

The project’s goal was to develop high-quality training data for an AI model that can recognize and respond to voice commands related to UPI payments in real-world settings.

Key Stats

  • 200 audio hours of UPI payment prompts recorded
  • 45 speakers from diverse backgrounds (age, education, region)
  • 13 intents covered, with 87,000+ diversified prompts
  • Languages: English, with speakers from various native language backgrounds (Kumaoni, Bengali, Malayalam, Gujarati, Hindi, Marathi, etc.)

Project Scope

Prompt Creation

The scope included creating unique prompts for a voice-based UPI payment system. Prompts were designed to cover multiple intents, ensuring they were diverse in structure, vocabulary, and named entities. Some key aspects included:

13 Key Intents, including:

  • Send Money: 65,653 unique and diversified prompts
  • Balance Inquiry: 3,052 prompts
  • Request Money: 26,972 prompts
  • Transaction History, Recharge, Bill Payment, etc.
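
The case study does not describe how the unique prompts were diversified. As one possible illustration only, the Python sketch below expands a single prompt template by varying its vocabulary and named entities. The template, slot names, and slot values are hypothetical and are not taken from the project data.

```python
from itertools import product

# Hypothetical template and slot values, for illustration only.
TEMPLATE = "{verb} {amount} rupees to {payee} for {purpose}"
SLOTS = {
    "verb": ["Send", "Pay", "Transfer"],
    "amount": ["three hundred", "twenty one hundred"],
    "payee": ["Sumatri", "Raji"],
    "purpose": ["house rent", "an emergency"],
}


def diversify(template, slots):
    """Expand one template into every combination of its slot values."""
    keys = list(slots)
    return [
        template.format(**dict(zip(keys, combo)))
        for combo in product(*(slots[key] for key in keys))
    ]


prompts = diversify(TEMPLATE, SLOTS)
print(len(prompts))  # 3 * 2 * 2 * 2 = 24 variants from one template
print(prompts[0])    # Send three hundred rupees to Sumatri for house rent
```

A real pipeline would also vary sentence structure (questions, commands, polite forms) and filter out unnatural combinations. This sketch covers only vocabulary and named-entity substitution.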

Audio Recording

To ensure authenticity and real-world applicability, prompts were recorded by 45 speakers from different linguistic backgrounds. The diversity captured through different native languages, regional dialects, and environments (indoor and outdoor) helped enhance the training data.

  • Language Diversity: Speakers fluent in English but with varied native languages, such as Kumaoni, Gujarati, Hindi, Bangla, Marathi, and Malayalam.
  • Age, Gender, and Educational Background: Data captured a broad range of demographics.
  • Urban & Rural Speakers: To reflect real-world use, both urban and rural speakers were included.
  • Recording Environment: Audio recordings were conducted in both indoor and outdoor settings, with a variety of background noises included.

Challenges

Linguistic and Regional Diversity

Ensuring prompts reflect diverse regional dialects and speaker characteristics required careful planning and execution.

Natural Audio Variations

Handling background noises and environmental conditions (indoor vs. outdoor) was crucial for real-world application.

Diverse Speaker Profiles

The inclusion of speakers from different age groups, educational backgrounds, and rural/urban regions introduced complexity in capturing authentic data.

Solution

Shaip delivered a solution that addressed the project’s challenges by implementing advanced NLP techniques and careful planning in both prompt creation and recording. Key aspects of the solution included:

Prompt Creation

  • 2,500 unique prompts were created, each diversified by structure and vocabulary.
  • 13 intents were covered, ranging from basic payment requests to more complex inquiries such as transaction history and bill payments.

Audio Recording

  • 200 hours of audio were recorded by 45 speakers, ensuring diversity across native languages, environments, and speaker demographics.
  • Both indoor and outdoor environments were used for recording to ensure natural audio variation.
  • Speakers represented a range of regional dialects, ensuring accurate linguistic representation.

Sample prompts by language and intent (Send, Balance Enquiry, Request Money, Transaction History):

English
  • Send: Do a payment of twenty one hundred to Sumatri for house rent
  • Balance Enquiry: I want to know my current balance in my savings account.
  • Request Money: Can you request Raji, three hundred and eighteen rupees for an emergency?
  • Transaction History: Show me my debit card transaction history.

Gujarati
  • Send: સુમાત્રીને ઘરના ભાડા પેટે એકવીસસો ચૂકવો
  • Balance Enquiry: હું મારા બચત ખાતામાં મારી વર્તમાન બેલેન્સ જાણવા માંગુ છું.
  • Request Money: શું તમે રાજી પાસેથી ઇમર્જન્સી માટે ત્રણસો અઢાર રૂપિયા માંગી શકો છો?
  • Transaction History: મને મારા ડેબિટ કાર્ડના વ્યવહાર દેખાડો.

Hindi
  • Send: सुमात्री को मकान किराए के लिए इक्कीस सौ रुपए का भुगतान करें।
  • Balance Enquiry: मैं अपने बचत खाते में वर्तमान शेष राशि जानना चाहता हूँ।
  • Request Money: क्या आप राजी से किसी इमरजेंसी के लिए तीन सौ अठारह रुपये मांग सकते हैं?
  • Transaction History: मुझे मेरा डेबिट कार्ड का लेनदेन ब्यौरा दिखाओ।

Malayalam
  • Send: വീട്ടുവാടകയായി സുമത്രിക്ക് രണ്ടായിരത്തിഒരുന്നൂറ് നൽകൂ.
  • Balance Enquiry: എൻ്റെ സേവിംഗ്സ് അക്കൗണ്ടിലെ നിലവിലെ തുക അറിയാൻ ഞാൻ ആഗ്രഹിക്കുന്നു.
  • Request Money: രാജിയോട് മുന്നൂറ്റി പതിനെട്ട് രൂപ അടിയന്തരാവശ്യത്തിന് ആവശ്യപ്പെടാമോ?
  • Transaction History: എൻ്റെ ഡെബിറ്റ് കാർഡ് ഇടപാട് വിവരണം കാണിക്കൂ.

Telugu
  • Send: ఇంటి అద్దె కోసం సుమత్రికి ఇరవై ఒక్క వంద చెల్లించండి
  • Balance Enquiry: నేను నా సేవింగ్స్ అకౌంట్ లో నా ప్రస్తుత బ్యాలెన్స్ ను తెలుసుకోవాలనుకుంటున్నాను.
  • Request Money: ఎమర్జెన్సీ కోసం రాజిని మూడు వందల పద్దెనిమిది రూపాయలు అడగగలరా?
  • Transaction History: నా డెబిట్ కార్డ్ లావాదేవీ చరిత్రను నాకు చూపించండి.

Bangla (বাংলা)
  • Send: বাড়ি ভাড়ার জন্য সুমাত্রিকে ২১,০০০ টাকা পরিশোধ করুন
  • Balance Enquiry: আমি আমার সঞ্চয় অ্যাকাউন্টে বর্তমান ব্যালেন্স জানতে চাই।
  • Request Money: আপনি রাজির কাছে তাৎক্ষণিক অবস্থার জন্য তিনশো আঠারো টাকাচেয়ে নিতে পারেন?
  • Transaction History: আমার ডেবিট কার্ডের লেনদেনের ইতিহাস দেখান।

Marathi
  • Send: सुमात्री ला घरा चे रेंट साठी दोन हजार एक संभर रुपये चुकवा
  • Balance Enquiry: मला माये बचत खाते मधी चालू बॅलन्स जाणा च आहे
  • Request Money: काय तुम्ही राजी पासून तीन सो अठराह रुपय मांगु शकते इमरजेंसी साठी ?
  • Transaction History: मला माझे डेबिट कार्ड चे लेन देन दाखवा .

The Outcome

The high-quality, diverse audio data delivered by Shaip allowed the client to develop an AI-driven voice-based UPI payment system capable of recognizing commands in various dialects, environments, and contexts. The data helped enhance:

  • Real-time voice recognition in complex environments.
  • More accurate UPI transaction handling for a broader range of users.
  • Scalability: The project sets a strong foundation for expanding into other Indian languages.

Deliverables

  • 200 hours of audio files (8 kHz PCM WAV format, mono)
  • 87,000+ diversified prompts annotated with unique intents
  • Metadata: Speaker profiles, environment details, and transcription accuracy
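
For teams receiving audio in the stated format, a quick conformance check can catch mismatched files before training. The sketch below uses Python's standard `wave` module to compare a file against 8 kHz mono PCM. The 16-bit sample width is an assumption, since the deliverables list does not state a bit depth, and the helper names `check_wav` and `write_silence` are hypothetical.

```python
import os
import tempfile
import wave


def check_wav(path, rate=8000, channels=1, sample_width=2):
    """Return a list of mismatches between a WAV file and the expected format.

    sample_width is in bytes; 2 (16-bit) is an assumption, not a stated spec.
    """
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getframerate() != rate:
            problems.append(f"sample rate {wav.getframerate()} Hz, expected {rate}")
        if wav.getnchannels() != channels:
            problems.append(f"{wav.getnchannels()} channels, expected {channels}")
        if wav.getsampwidth() != sample_width:
            problems.append(f"{wav.getsampwidth()} bytes/sample, expected {sample_width}")
    return problems


def write_silence(path, seconds=1, rate=8000):
    """Write a silent 16-bit mono WAV file, used here only as demo input."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * rate * seconds)


with tempfile.TemporaryDirectory() as tmp:
    good = os.path.join(tmp, "good.wav")
    bad = os.path.join(tmp, "bad.wav")
    write_silence(good, rate=8000)
    write_silence(bad, rate=16000)
    print(check_wav(good))  # no mismatches reported
    print(check_wav(bad))   # reports the sample-rate mismatch
```

An empty list means the file matches the expected sample rate, channel count, and sample width. The check says nothing about audio quality or transcription accuracy.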

Shaip’s ability to capture the diversity of India through unique prompts and authentic audio recordings has been a game-changer for our voice-based UPI payment system. Their team ensured that every aspect of the project – from prompt creation to recording quality – was handled with precision, helping us build a more inclusive, robust voice recognition model.
