Text Formatting Best Practices
This guide outlines recommended practices for formatting input text so that it is processed accurately and produces consistent output across use cases. Following these guidelines improves the quality of the generated audio and reduces errors.
Text Elements
Language and Script Guidelines
Mixed Language Formatting
When working with mixed language content, particularly English and Hindi, proper script selection is crucial for accurate processing:
- English text must be written in Latin script
- Hindi text must be written in Devanagari script
- Avoid transliteration of Hindi words into Latin script
Examples:
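As an illustration, the script of each word in a mixed sentence can be checked before the text is submitted. This is a minimal sketch, not part of any official tooling: the names `script_of` and `tag_scripts` are hypothetical, and the sample sentence is made up for the demonstration. It relies only on the Unicode Devanagari block (U+0900 to U+097F) and ASCII letters. Note that it cannot detect Hindi words transliterated into Latin script; those are reported as `latin` and still need manual review.

```python
def script_of(word: str) -> str:
    """Classify a word by the scripts of its characters (illustrative helper)."""
    has_devanagari = any("\u0900" <= ch <= "\u097F" for ch in word)
    has_latin = any(ch.isascii() and ch.isalpha() for ch in word)
    if has_devanagari and has_latin:
        return "mixed"
    if has_devanagari:
        return "devanagari"
    if has_latin:
        return "latin"
    return "other"  # digits, punctuation, or other scripts


def tag_scripts(text: str) -> list:
    """Return (word, script) pairs for every whitespace-separated word."""
    return [(word, script_of(word)) for word in text.split()]


if __name__ == "__main__":
    # Hindi words in Devanagari, the English word in Latin script.
    for word, script in tag_scripts("मुझे meeting में जाना है"):
        print(word, script)
```

A word tagged `mixed` usually points to a typing or copy-paste error and is worth fixing before synthesis.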
Proper Nouns Handling
For Indian proper nouns, maintain cultural and linguistic accuracy by following these rules:
- City Names:
  - Use Devanagari script for Indian city names
  - Maintain Latin script for non-Indian city names
- Personal Names:
  - Use Devanagari script for Indian personal names
  - Maintain original script for non-Indian names
Examples:
Text Chunking
Character Limit Guidelines
To optimize real-time processing and reduce latency, implement these chunking practices:
- Size Constraints:
  - Maximum chunk size: 250 characters
  - Break at natural punctuation points
  - Maintain sentence coherence when possible
- Breaking Points Priority:
  - First priority: Sentence-ending punctuation (., !, ?)
  - Second priority: Other punctuation (;, :)
  - Third priority: Natural word breaks
Chunking Implementation
Use the following Python code for implementing text chunking:
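A minimal sketch of the chunking rules above, assuming plain-string input. The function names `chunk_text` and `_last_break` are illustrative, not part of any published API. Within each 250-character window it looks for the last sentence-ending mark, then the last `;` or `:`, then the last space, and falls back to a hard cut only when the window contains no break point at all.

```python
SENTENCE_END = ".!?"   # first priority
OTHER_PUNCT = ";:"     # second priority


def _last_break(window: str) -> int:
    """Return the index at which to cut the window, by break-point priority."""
    for marks in (SENTENCE_END, OTHER_PUNCT):
        idx = max(window.rfind(mark) for mark in marks)
        if idx >= 0:
            return idx + 1          # keep the punctuation with its chunk
    idx = window.rfind(" ")         # third priority: natural word break
    if idx > 0:
        return idx
    return len(window)              # no break point found: hard cut


def chunk_text(text: str, max_chars: int = 250) -> list:
    """Split text into chunks of at most max_chars characters."""
    chunks = []
    remaining = text.strip()
    while len(remaining) > max_chars:
        cut = _last_break(remaining[:max_chars])
        chunks.append(remaining[:cut].strip())
        remaining = remaining[cut:].strip()
    if remaining:
        chunks.append(remaining)
    return chunks


if __name__ == "__main__":
    sample = "This is a sentence. " * 30
    for chunk in chunk_text(sample):
        print(len(chunk), chunk[:40])
```

This sketch treats every `.` as a sentence end, so decimals and abbreviations can produce early breaks; a production version would likely need a smarter sentence detector.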
Handling Numbers
Order IDs and Large Numbers
When handling order IDs or large numbers:
- Send them as separate requests
- Split the text around the number
Example:
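One way to split text around a long number is sketched below. It assumes that an "order ID or large number" means a run of at least six digits; that threshold, the function name `split_around_numbers`, and the sample sentence are all assumptions made for illustration. Each returned piece would then be sent as its own request.

```python
import re


def split_around_numbers(text: str, min_digits: int = 6) -> list:
    """Split text so that each long digit run becomes its own segment."""
    # The capturing group makes re.split keep the numbers in the result.
    pattern = re.compile(r"(\d{%d,})" % min_digits)
    parts = pattern.split(text)
    return [part.strip() for part in parts if part.strip()]


if __name__ == "__main__":
    segments = split_around_numbers("Your order 9876543210123 has shipped.")
    for segment in segments:
        print(segment)  # each segment would be sent as a separate request
```

Alphanumeric order IDs are not covered by this pattern and would need their own rule.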
Phone Numbers
Default Grouping
- Numbers are automatically grouped in 3-4-3 format
- Example: “9876543210” is read as “987-6543-210”
Custom Formatting
For specific reading patterns:
- Format numbers explicitly in text
- Write out the exact pronunciation desired
Example:
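To write out an exact pronunciation, a number can be converted to digit words in the desired grouping before it is placed in the text. The following is a sketch under stated assumptions: the helper `spell_digits`, the 5-5 grouping, and the comma used as a pause marker are illustrative choices, not documented behavior of any engine.

```python
DIGIT_WORDS = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]


def spell_digits(number: str, groups: tuple = (5, 5)) -> str:
    """Spell a number digit by digit, with a comma between groups."""
    digits = [ch for ch in number if ch in "0123456789"]  # drop separators
    if sum(groups) != len(digits):
        raise ValueError("group sizes must add up to the number of digits")
    spoken = []
    pos = 0
    for size in groups:
        group = digits[pos:pos + size]
        spoken.append(" ".join(DIGIT_WORDS[int(d)] for d in group))
        pos += size
    return ", ".join(spoken)


if __name__ == "__main__":
    print(spell_digits("9876543210"))
    # → nine eight seven six five, four three two one zero
```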
Mathematical Expressions
Express mathematical operations in words for clarity. For complex mathematical expressions, break them down into simpler components:
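A small sketch of how simple arithmetic could be verbalized. The word choices in `MATH_WORDS` (for example "times" rather than "multiplied by") and the function name `verbalize_math` are assumptions; the sketch handles only numbers and the basic operators listed, and raises an error on anything else rather than guessing.

```python
import re

MATH_WORDS = {
    "+": "plus", "-": "minus",
    "*": "times", "×": "times",
    "/": "divided by", "÷": "divided by",
    "=": "equals",
}

# One token: optional leading spaces, then a number or a single operator.
_TOKEN = re.compile(r"\s*(\d+(?:\.\d+)?|[+\-*/×÷=])")


def verbalize_math(expr: str) -> str:
    """Replace basic operators in a simple expression with words."""
    expr = expr.strip()
    words = []
    pos = 0
    while pos < len(expr):
        match = _TOKEN.match(expr, pos)
        if match is None:
            raise ValueError(f"unsupported input at position {pos}")
        token = match.group(1)
        words.append(MATH_WORDS.get(token, token))
        pos = match.end()
    return " ".join(words)


if __name__ == "__main__":
    print(verbalize_math("2 + 3 = 5"))   # → 2 plus 3 equals 5
```

For a longer expression, each simpler component can be passed through the function separately and sent as its own sentence.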
Approximate Values
When expressing approximate values:
- Write out the full words
- Avoid using symbols for approximation
- Be explicit about the approximation
Examples:
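For instance, approximation symbols placed directly before a number can be replaced with a word. This is a hedged sketch: the choice of "approximately" and the function name `spell_out_approximation` are illustrative, and only `~` and `≈` followed by a digit are handled.

```python
import re


def spell_out_approximation(text: str) -> str:
    """Replace '~' or '≈' before a number with the word 'approximately'."""
    return re.sub(r"[~≈]\s*(?=\d)", "approximately ", text)


if __name__ == "__main__":
    print(spell_out_approximation("The trip takes ~45 minutes"))
    # → The trip takes approximately 45 minutes
```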
Units and Measurements
When expressing measurements, write out the units in full words to ensure clear understanding:
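A sketch of unit expansion for a handful of metric abbreviations. The `UNIT_WORDS` table is deliberately tiny and the function name `expand_units` is hypothetical; the pattern is case-sensitive and only matches an abbreviation that directly follows a number.

```python
import re

# Abbreviation → singular unit name (illustrative subset only).
UNIT_WORDS = {
    "km": "kilometer",
    "kg": "kilogram",
    "cm": "centimeter",
    "ml": "milliliter",
}

_UNIT_PATTERN = re.compile(
    r"(\d+(?:\.\d+)?)\s*(" + "|".join(UNIT_WORDS) + r")\b"
)


def expand_units(text: str) -> str:
    """Write out unit abbreviations that follow a number."""
    def replace(match):
        value, unit = match.group(1), match.group(2)
        word = UNIT_WORDS[unit]
        if float(value) != 1:
            word += "s"             # pluralize unless the value is exactly 1
        return f"{value} {word}"
    return _UNIT_PATTERN.sub(replace, text)


if __name__ == "__main__":
    print(expand_units("The parcel weighs 5kg and travels 1 km"))
    # → The parcel weighs 5 kilograms and travels 1 kilometer
```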
Symbols and Special Characters
Basic Symbols
Spell out special characters and symbols in all contexts:
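The rule above can be sketched as a simple lookup. The `SYMBOL_WORDS` table is an assumption covering a few unambiguous symbols; symbols whose reading depends on context (such as `#` or `-`) are left out on purpose. Note that the helper also collapses runs of whitespace into single spaces.

```python
SYMBOL_WORDS = {
    "&": "and",
    "%": "percent",
    "@": "at",
    "+": "plus",
    "=": "equals",
}


def spell_out_symbols(text: str) -> str:
    """Replace each known symbol with its word and normalize spacing."""
    for symbol, word in SYMBOL_WORDS.items():
        text = text.replace(symbol, f" {word} ")
    return " ".join(text.split())


if __name__ == "__main__":
    print(spell_out_symbols("Tom & Jerry got 50% off"))
    # → Tom and Jerry got 50 percent off
```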
Digital Content Formatting
1. URLs:
2. Email Addresses:
3. Social Media:
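URLs, email addresses, and handles can all be spelled out with the same kind of substitution. The sketch below is illustrative only: the spoken forms in `ADDRESS_WORDS`, the function name `spell_out_address`, and the `example.com` addresses are assumptions, not a documented format.

```python
ADDRESS_WORDS = {
    "@": "at",
    ".": "dot",
    "/": "slash",
    "_": "underscore",
    "-": "dash",
}


def spell_out_address(address: str) -> str:
    """Spell out the separators in a URL, email address, or handle."""
    spoken = address
    for symbol, word in ADDRESS_WORDS.items():
        spoken = spoken.replace(symbol, f" {word} ")
    return " ".join(spoken.split())


if __name__ == "__main__":
    print(spell_out_address("www.example.com/help"))
    # → www dot example dot com slash help
    print(spell_out_address("support@example.com"))
    # → support at example dot com
    print(spell_out_address("@example_handle"))
    # → at example underscore handle
```

The function is meant for the address alone, not a whole sentence, since it would also replace ordinary full stops and hyphens.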
Range and Interval Notation
Always write out ranges and relationships explicitly to avoid ambiguity:
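A sketch of writing out numeric ranges and comparison symbols. The wording in `RELATION_WORDS` and the function name `spell_out_ranges` are assumptions. The range rule treats any hyphen or en dash between two digits as "to", so it will misread dates, phone numbers, and subtraction; apply it only to text known to contain ranges.

```python
import re

RELATION_WORDS = {
    ">=": "greater than or equal to",
    "<=": "less than or equal to",
    "≥": "greater than or equal to",
    "≤": "less than or equal to",
    ">": "greater than",
    "<": "less than",
}


def spell_out_ranges(text: str) -> str:
    """Write out numeric ranges and comparison symbols in words."""
    # "10-20" or "10–20" (en dash, U+2013) becomes "10 to 20".
    text = re.sub(r"(\d)\s*[-\u2013]\s*(\d)", r"\1 to \2", text)
    # Replace two-character symbols first so ">=" is not read as ">" then "=".
    for symbol in sorted(RELATION_WORDS, key=len, reverse=True):
        text = text.replace(symbol, f" {RELATION_WORDS[symbol]} ")
    return " ".join(text.split())


if __name__ == "__main__":
    print(spell_out_ranges("Ages 18-25, score >= 90"))
    # → Ages 18 to 25, score greater than or equal to 90
```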
Note:
- Consistency is key: use the same format throughout your content
- When in doubt, write out the full words
- For complex URLs or handles, break them into smaller, manageable chunks
- Avoid using symbols that could have multiple interpretations