Language and Script Guidelines
Mixed Language Formatting
When working with mixed-language content, particularly English and Hindi, choose the correct script for each language so the text is processed accurately:
- English text must be written in Latin script
- Hindi text must be written in Devanagari script
- Avoid transliteration of Hindi words into Latin script
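To review mixed text against these rules, one option is to label each word by the script it uses, based on the Unicode Devanagari block (U+0900 to U+097F). The sketch below is a minimal, hypothetical helper (word_scripts is not part of any API described here). It only reports scripts. It cannot tell whether a Latin-script word is English or transliterated Hindi, because that would need a dictionary.

```python
# Unicode block for Devanagari (U+0900 to U+097F), covering Hindi letters,
# vowel signs, and the danda.
DEVANAGARI = range(0x0900, 0x0980)


def word_scripts(text: str) -> list[tuple[str, str]]:
    """Label each whitespace-separated word as Devanagari, Latin, mixed, or other."""
    labels = []
    for word in text.split():
        devanagari = any(ord(ch) in DEVANAGARI for ch in word)
        latin = any(ch.isascii() and ch.isalpha() for ch in word)
        if devanagari and latin:
            label = "mixed"  # e.g. a Latin word fused with Hindi characters
        elif devanagari:
            label = "Devanagari"
        elif latin:
            label = "Latin"
        else:
            label = "other"  # digits, punctuation, other scripts
        labels.append((word, label))
    return labels
```

For example, in "मैं दिल्ली से London जा रहा हूँ" every word is labeled Devanagari except "London", which is labeled Latin. That matches the proper-noun rules: Indian city names use Devanagari, and non-Indian city names use Latin.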
Proper Nouns Handling
For Indian proper nouns, follow these rules to maintain cultural and linguistic accuracy:
City Names:
- Use Devanagari script for Indian city names
- Maintain Latin script for non-Indian city names
Personal Names:
- Use Devanagari script for Indian personal names
- Maintain original script for non-Indian names
Text Chunking
Character Limit Guidelines
To optimize real-time processing and reduce latency, follow these chunking practices:
Size Constraints:
- Maximum chunk size: 250 characters (140 for the lightning-large model)
- Break at natural punctuation points
- Maintain sentence coherence when possible
Breaking Points Priority:
- First priority: Sentence-ending punctuation (., !, ?)
- Second priority: Other punctuation (;, :)
- Third priority: Natural word breaks
Chunking Implementation
When implementing text chunking in Python, set max_chunk_size according to the model:
- For the lightning-large model, set max_chunk_size=140.
- For the lightning model, set max_chunk_size=250.
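Below is a minimal Python sketch of the chunking rules above. It is not the original code this guide referred to. The names chunk_text and _last_break are hypothetical, and treating the Devanagari danda (।) as sentence-ending punctuation is an added assumption for Hindi text. Break points are tried in the stated priority order: sentence-ending punctuation, then ";" or ":", then a word break. A hard cut is used only when none of these exists.

```python
import re

# First priority: sentence-ending punctuation followed by whitespace.
# The danda "।" is an assumption for Hindi; it is not in the guideline list.
SENTENCE_END = re.compile(r"[.!?।](?=\s)")
# Second priority: other punctuation followed by whitespace.
CLAUSE_END = re.compile(r"[;:](?=\s)")
# Third priority: any whitespace, i.e. a natural word break.
WORD_BREAK = re.compile(r"\s")


def _last_break(window: str, pattern: re.Pattern) -> int:
    """Return the index just past the last match of pattern, or 0 if none."""
    cut = 0
    for match in pattern.finditer(window):
        cut = match.end()
    return cut


def chunk_text(text: str, max_chunk_size: int = 250) -> list[str]:
    """Split text into chunks of at most max_chunk_size characters."""
    chunks: list[str] = []
    remaining = text.strip()
    while len(remaining) > max_chunk_size:
        # Look one character past the limit so that punctuation sitting exactly
        # at the limit (and followed by a space) still counts as a break point.
        window = remaining[: max_chunk_size + 1]
        cut = (
            _last_break(window, SENTENCE_END)
            or _last_break(window, CLAUSE_END)
            or _last_break(window, WORD_BREAK)
            or max_chunk_size  # no break point at all: hard cut at the limit
        )
        chunks.append(remaining[:cut].strip())
        remaining = remaining[cut:].strip()
    if remaining:
        chunks.append(remaining)
    return chunks
```

Following the model settings above, you would call chunk_text(text, max_chunk_size=140) for lightning-large and chunk_text(text) (default 250) for lightning. With max_chunk_size=20, "First sentence. Second one; third part and more words" splits into "First sentence.", "Second one;", "third part and more", and "words". Each cut uses the highest-priority break point available.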
Handling Numbers
Order IDs and Large Numbers
When handling order IDs or large numbers:
- Send them as separate requests
- Split the text around the number
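The splitting step can be sketched as follows. This guide does not define what counts as a "large number", so the threshold here is an assumption: a run of at least six consecutive digits (min_digits). The sketch also assumes order IDs are purely numeric. The helper split_around_numbers is hypothetical. Each segment it returns would then be sent as its own request.

```python
import re


def split_around_numbers(text: str, min_digits: int = 6) -> list[str]:
    """Split text so that every run of at least min_digits digits is its own segment."""
    # The capturing group makes re.split keep the numbers in the output, in order.
    pattern = re.compile(rf"(\d{{{min_digits},}})")
    parts = pattern.split(text)
    # Drop empty pieces and surrounding whitespace.
    return [part.strip() for part in parts if part.strip()]
```

For example, "Your order 1234567890 has shipped." becomes three segments: "Your order", "1234567890", and "has shipped.". Shorter numbers, such as the "5" in "Call in 5 minutes", stay inside the surrounding text.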
Phone Numbers
Default Grouping
- Numbers are automatically grouped in 3-4-3 format
- Example: “9876543210” is read as “987-6543-210”
Custom Formatting
For specific reading patterns:
- Format numbers explicitly in text
- Write out the exact pronunciation desired
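One way to control grouping is to rewrite the digits yourself before sending the text. The helpers below are hypothetical sketches:

- group_digits regroups digits with spaces. Using spaces rather than hyphens is an assumption, made so that groups are not confused with ranges such as "10-20". Whether the engine pauses at spaces should be verified.
- spell_digits writes each digit out as a word, for cases where you want to dictate the exact pronunciation.

```python
import re

DIGIT_WORDS = ("zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine")


def group_digits(number: str, groups: list[int], sep: str = " ") -> str:
    """Regroup the digits of number into the given group sizes."""
    digits = re.sub(r"\D", "", number)  # drop spaces, hyphens, "+", etc.
    if sum(groups) != len(digits):
        raise ValueError(f"groups {groups} do not cover {len(digits)} digits")
    parts, start = [], 0
    for size in groups:
        parts.append(digits[start:start + size])
        start += size
    return sep.join(parts)


def spell_digits(number: str) -> str:
    """Write every digit out as a word, e.g. '987' -> 'nine eight seven'."""
    return " ".join(DIGIT_WORDS[int(d)] for d in re.sub(r"\D", "", number))
```

For example, group_digits("9876543210", [3, 4, 3]) mirrors the default 3-4-3 grouping as "987 6543 210". Passing [5, 5] gives "98765 43210".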
Date and Time Formatting Guidelines
Date Formats
You may use any of the following formats when writing dates:
- DD/MM/YYYY → 12/02/2025 → “twelve, two, twenty twenty-five”
- DD-MM-YYYY → 12-02-2025 → “twelve, two, twenty twenty-five”
- DD Month YYYY → 12 February 2025 → “twelve February twenty twenty-five”
- Month DD YYYY → February 12th 2025 → “February, twelfth, twenty twenty-five”
- DD-MM-YY → 12-02-25 → “twelve, two, twenty-five”
- DD/MM/YY → 12/02/25 → “twelve, two, twenty-five”
Note: Ordinal suffixes (st, nd, rd, th) can be used in dates.
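If dates come from code rather than hand-written text, a small formatter can produce the listed formats. This is a minimal sketch with hypothetical names (format_date, ordinal). Month names are hard-coded in English so that the output does not depend on the system locale. Unpadded days in the textual formats are a stylistic choice.

```python
from datetime import date

# English month names, hard-coded so output does not depend on the system locale.
MONTHS = ("January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December")


def ordinal(n: int) -> str:
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 12 -> '12th'."""
    if 11 <= n % 100 <= 13:
        suffix = "th"  # 11th, 12th, 13th are exceptions to the last-digit rule
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


def format_date(d: date, style: str) -> str:
    """Render d in one of the listed styles; raises KeyError for any other style."""
    dd, mm = f"{d.day:02d}", f"{d.month:02d}"
    yyyy, yy = f"{d.year:04d}", f"{d.year % 100:02d}"
    month = MONTHS[d.month - 1]
    styles = {
        "DD/MM/YYYY": f"{dd}/{mm}/{yyyy}",
        "DD-MM-YYYY": f"{dd}-{mm}-{yyyy}",
        "DD Month YYYY": f"{d.day} {month} {yyyy}",
        "Month DD YYYY": f"{month} {ordinal(d.day)} {yyyy}",
        "DD-MM-YY": f"{dd}-{mm}-{yy}",
        "DD/MM/YY": f"{dd}/{mm}/{yy}",
    }
    return styles[style]
```

For example, format_date(date(2025, 2, 12), "Month DD YYYY") returns "February 12th 2025", which matches the example above.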
Time Formats
You may use the following formats when specifying time:
- HH:MM:SS → 14:30:15 → “fourteen thirty fifteen”
- HH:MM → 14:30 → “fourteen thirty”
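Both formats use a 24-hour clock. A minimal sketch for producing them with Python's standard datetime module follows. The helper names are hypothetical, and accepting 12-hour input such as "2:30 PM" is an assumption about where your times come from.

```python
from datetime import datetime, time


def to_24_hour(text: str) -> time:
    """Parse a 12-hour time such as '2:30 PM' into a time object."""
    return datetime.strptime(text.strip(), "%I:%M %p").time()


def format_time(t: time, with_seconds: bool = True) -> str:
    """Return t as HH:MM:SS, or as HH:MM when with_seconds is False."""
    return t.strftime("%H:%M:%S" if with_seconds else "%H:%M")
```

For example, format_time(to_24_hour("2:30 PM"), with_seconds=False) gives "14:30".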
Mathematical Expressions
Express mathematical operations in words for clarity (for example, write “plus” instead of “+”), and break complex expressions down into simpler components.
Approximate Values
When expressing approximate values:
- Write out the full words
- Avoid using symbols for approximation
- Be explicit about the approximation
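The operator and approximation rules in this section can be sketched as a small pre-processing step. The spoken forms in OPERATOR_WORDS, the choice of "approximately" for "~" and "≈", and the name verbalize_math are all assumptions. The guide asks for words but does not fix the wording. Numbers are left as digits. Because "-" and "/" also appear in dates, hyphenated words, and URLs, apply this only to text you know is a mathematical expression.

```python
import re

# Assumed spoken forms for common operators.
OPERATOR_WORDS = {
    "+": "plus", "-": "minus", "*": "times", "×": "times",
    "/": "divided by", "÷": "divided by", "=": "equals",
}
# An operator with any surrounding spaces, so the output has single spaces.
OPERATOR_RE = re.compile(r"\s*([+\-*×/÷=])\s*")
# "~" or "≈" directly before a number marks an approximate value.
APPROX_RE = re.compile(r"[~≈]\s*(?=\d)")


def verbalize_math(expr: str) -> str:
    """Spell out approximation markers and arithmetic operators in words."""
    expr = APPROX_RE.sub("approximately ", expr)
    expr = OPERATOR_RE.sub(lambda m: f" {OPERATOR_WORDS[m.group(1)]} ", expr)
    return expr.strip()
```

For example, "5 + 3 = 8" becomes "5 plus 3 equals 8", and "The total is ~50" becomes "The total is approximately 50". Parentheses are left in place. A long expression is still best split into simpler sentences by hand.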
Units and Measurements
When expressing measurements, write out the units in full words (for example, “5 kilometers” rather than “5 km”) to ensure clear understanding.
Symbols and Special Characters
Basic Symbols
Spell out special characters and symbols in all contexts (for example, write “and” instead of “&”).
Digital Content Formatting
1. URLs
Range and Interval Notation
Always write out ranges and relationships explicitly to avoid ambiguity (for example, “10 to 20” rather than “10-20”).
- Consistency is key: use the same format throughout your content
- When in doubt, write out the full words
- For complex URLs or handles, break them into smaller, manageable chunks
- Avoid using symbols that could have multiple interpretations
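The spell-out rules for units, symbols, ranges, and URLs in this guide can be combined into one pre-processing pass. The sketch below rests on several assumptions:

- The unit, symbol, and URL-part wordings are illustrative.
- "m" is read as meters, not minutes.
- URLs and handles are split into chunks at each "/".

The names spell_out and verbalize_url are hypothetical. The range pattern deliberately skips chains such as "12-02-2025" or "987-6543-210", so dates and phone numbers keep the formats described earlier. Even so, run spell_out on prose, not on arithmetic.

```python
import re

UNITS = {  # abbreviation -> (singular, plural); illustrative wording
    "km": ("kilometer", "kilometers"),
    "kg": ("kilogram", "kilograms"),
    "cm": ("centimeter", "centimeters"),
    "mm": ("millimeter", "millimeters"),
    "m": ("meter", "meters"),
    "g": ("gram", "grams"),
    "°C": ("degree Celsius", "degrees Celsius"),
}
SYMBOLS = {"&": "and", "%": "percent"}
COMPARISONS = (("<=", "less than or equal to"), (">=", "greater than or equal to"),
               ("<", "less than"), (">", "greater than"))
URL_PARTS = {".": "dot", "/": "slash", "@": "at", "_": "underscore",
             "-": "dash", ":": "colon"}

# A number, optional space, then a unit abbreviation ending at a word boundary.
UNIT_RE = re.compile(r"(\d+(?:\.\d+)?)\s*(km|kg|cm|mm|m|g|°C)\b")
# "10-20" or "10–20", but not pieces of chains such as "12-02-2025".
RANGE_RE = re.compile(r"(?<![\d/-])(\d+(?:\.\d+)?)\s*[-–]\s*(\d+(?:\.\d+)?)(?![\d/-])")


def _unit_words(match: re.Match) -> str:
    value, unit = match.groups()
    singular, plural = UNITS[unit]
    return f"{value} {singular if value == '1' else plural}"


def spell_out(text: str) -> str:
    """Replace ranges, comparisons, units, and common symbols with words."""
    text = RANGE_RE.sub(r"\1 to \2", text)
    for symbol, word in COMPARISONS:  # "<=" and ">=" before "<" and ">"
        text = re.sub(r"\s*" + re.escape(symbol) + r"\s*", f" {word} ", text)
    text = UNIT_RE.sub(_unit_words, text)
    text = re.sub(r"\s*([&%])\s*", lambda m: f" {SYMBOLS[m.group(1)]} ", text)
    text = re.sub(r" {2,}", " ", text).strip()  # collapse doubled spaces
    return re.sub(r" ([.,!?;:])", r"\1", text)  # no space before punctuation


def verbalize_url(url: str) -> list[str]:
    """Read a URL or handle piece by piece, starting a new chunk at each '/'."""
    url = re.sub(r"^https?://", "", url)
    chunks: list[list[str]] = [[]]
    for token in re.findall(r"[A-Za-z0-9]+|[^A-Za-z0-9]", url):
        if token == "/" and chunks[-1]:
            chunks.append([])
        chunks[-1].append(URL_PARTS.get(token, token))
    return [" ".join(chunk) for chunk in chunks if chunk]
```

For example, "The trail is 5-8 km long." becomes "The trail is 5 to 8 kilometers long.". The URL "https://example.com/docs/text-to-speech" becomes three short chunks: "example dot com", "slash docs", and "slash text dash to dash speech".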