Lightning v3.1 is a high-fidelity, low-latency text-to-speech model that produces natural, expressive speech at a native sample rate of 44.1 kHz. It is optimized for real-time applications, supports voice cloning, and delivers broadcast-quality audio with conversational characteristics.

44.1 kHz

Native sample rate

200ms

Latency at 20 concurrent requests

4 Languages

English, Hindi, Spanish, Tamil

3.3x

Real-time factor (faster than playback)

Model Overview

Developed by: Smallest AI
Model type: Text-to-Speech / Speech Synthesis
Languages: English, Hindi, Spanish, Tamil
License: Proprietary
Version: v3.1
Native sample rate: 44,100 Hz

Key Capabilities

Real-Time Optimized

Ultra-low latency architecture designed for conversational AI and live streaming.

Voice Cloning

Instant voice cloning with just 5-15 seconds of audio. Professional cloning available on demand.

Streaming

HTTP, SSE, and WebSocket support for real-time applications.
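
As a rough illustration of consuming the SSE transport, the sketch below parses the generic `data:` framing used by server-sent events. The payload that Lightning v3.1 places inside each event is not described on this page, so the sample events are invented; only the framing rules are standard.

```python
from typing import Iterable, Iterator


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each server-sent event.

    Generic SSE framing: "data:" lines accumulate, a blank line dispatches
    the event, and lines starting with ":" are comments (keep-alives).
    """
    buffer = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            if buffer:
                yield "\n".join(buffer)
                buffer = []
            continue
        if line.startswith(":"):
            continue  # comment line, ignored
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # one leading space is not part of the value
        if field == "data":
            buffer.append(value)


# Hypothetical events, for illustration only.
sample = ["data: chunk-1\n", "\n", ": keep-alive\n",
          "data: chunk-2\n", "data: more\n", "\n"]
events = list(iter_sse_data(sample))
```

In a real client the `lines` iterable would come from the streaming HTTP response rather than a list.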

Performance & Benchmarks

In blind listening tests against OpenAI GPT-4o-mini-TTS, Lightning v3.1 was preferred by listeners 76.2% of the time — a 3.4x preference ratio.
Evaluation: Seed TTS dataset, 1,088 samples across English, Hindi, Spanish, and Tamil. LLM-as-a-Judge framework with ASR-based intelligibility testing.
Category | Metric | Score | Notes
Audio Quality | WVMOS | 5.06 | Broadcast-quality audio
Audio Quality | Naturalness | 4.33 | Predominantly human-like
Audio Quality | Overall Quality | 4.42 | Premium-tier experience
Audio Quality | Native Sample Rate | 44.1 kHz | Highest fidelity among Lightning models
Intelligibility | Word Error Rate (WER) | 6.3% | 93.7% word accuracy
Intelligibility | Character Error Rate (CER) | 1.6% | Excellent character-level accuracy
Latency & Speed | Latency | 200ms | At 20 concurrent requests
Latency & Speed | Real-Time Factor (RTF) | 0.3 | 3.3x faster than playback
Latency & Speed | Speed Control | 0.5x - 2.0x | Adjustable playback speed
Latency & Speed | Max Chunk Size | 250 chars | Optimal: 140 characters per request
Prosody | Pronunciation | 4.70 / 5.0 | Near-perfect articulation
Prosody | Intonation | 4.71 / 5.0 | Highly expressive pitch variation
Prosody | Prosody | 4.47 / 5.0 | Natural conversational rhythm
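
The derived figures in the table follow from simple arithmetic, sketched below. It assumes the usual definition of real-time factor (synthesis time divided by audio duration) and ignores network and queueing time; the function names are illustrative only.

```python
def speedup_from_rtf(rtf: float) -> float:
    """With RTF = synthesis time / audio duration, the speedup is 1 / RTF."""
    if rtf <= 0:
        raise ValueError("rtf must be positive")
    return 1.0 / rtf


def synthesis_seconds(audio_seconds: float, rtf: float) -> float:
    """Estimated compute time needed to synthesize a clip of the given length."""
    return audio_seconds * rtf


def word_accuracy(wer_percent: float) -> float:
    """Word accuracy is the complement of the word error rate."""
    return 100.0 - wer_percent


speedup = speedup_from_rtf(0.3)            # about 3.3x faster than playback
ten_second_clip = synthesis_seconds(10.0, 0.3)
accuracy = word_accuracy(6.3)
```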

Supported Languages

Automatic Language Detection & Language Switching: Set language to "auto" (default) and Lightning v3.1 will automatically detect the language from input text. The model also supports language switching within a single session — no need to restart or reconnect when switching between supported languages.
Language | Code | Status
English | en | Available
Hindi | hi | Available
Spanish | es | Available
Tamil | ta | Available
Italian | it | Coming soon
French | fr | Coming soon
Portuguese | pt | Coming soon
Swedish | sv | Coming soon
Dutch | nl | Coming soon
German | de | Coming soon
Telugu | te | Coming soon
Malayalam | ml | Coming soon
Kannada | kn | Coming soon
Marathi | mr | Coming soon
Gujarati | gu | Coming soon

Voice Catalog

English Voices

Voice ID | Name | Gender | Accent | Languages
magnus | Magnus | Male | American | English
olivia | Olivia | Female | American | English
daniel | Daniel | Male | American | English
rachel | Rachel | Female | American | English
nicole | Nicole | Female | American | English
elizabeth | Elizabeth | Female | American | English
kyle | Kyle | Male | American | English

Hindi Voices

Voice ID | Name | Gender | Accent | Languages
aarush | Aarush | Male | Indian | English, Hindi
sakshi | Sakshi | Female | Indian | English, Hindi
parth | Parth | Male | Indian | English, Hindi
sana | Sana | Female | Indian | English, Hindi
vivaan | Vivaan | Male | Indian | English, Hindi

Voice Cloning

Instant Voice Cloning

Audio required: 5-15 seconds
Self-serve voice cloning available via API and console. Captures core voice characteristics for quick replication.

Professional Voice Cloning

Audio required: 45+ minutes (high-quality)
Near-perfect voice match capturing intonation, accent, emotions, and vocal nuances. Available on demand; contact support@smallest.ai to get started.

API Reference

Endpoints

Endpoint | Method | Use Case
/waves/v1/lightning-v3.1/get_speech | POST | Synchronous synthesis
/waves/v1/lightning-v3.1/stream | POST (SSE) | Server-sent events streaming
/waves/v1/lightning-v3.1/get_speech/stream | WebSocket | Real-time streaming
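
A minimal sketch of preparing a call to the synchronous endpoint, using only the Python standard library. The API host, the JSON request body, and the bearer-token `Authorization` header are assumptions that this page does not confirm; `BASE_URL` is a placeholder and must be replaced with the real host from the official documentation.

```python
import json
import urllib.request

# Assumptions (not stated on this page): the host, the JSON body encoding,
# and the bearer-token auth scheme. BASE_URL is a placeholder.
BASE_URL = "https://api.example.invalid"
ENDPOINT = "/waves/v1/lightning-v3.1/get_speech"


def build_speech_request(text: str, voice_id: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for synchronous synthesis."""
    body = json.dumps({"text": text, "voice_id": voice_id}).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + ENDPOINT,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        },
    )


request = build_speech_request("Hello there.", "magnus", api_key="YOUR_API_KEY")
# To send it: audio_bytes = urllib.request.urlopen(request).read()
```

Building the request separately from sending it makes the payload easy to inspect before any network traffic happens.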

Request Parameters

Parameter | Type | Required | Default | Description
text | string | Yes | (none) | Text to synthesize
voice_id | string | Yes | (none) | Voice identifier
sample_rate | integer | No | 44100 | Output sample rate (Hz)
speed | float | No | 1.0 | Speech speed (0.5-2.0)
language | string | No | "auto" | Language code (en, hi, es, ta)
output_format | string | No | "pcm" | Audio format
pronunciation_dicts | array | No | (none) | Custom pronunciation IDs (WebSocket only)
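
The parameter table can be mirrored in a small client-side helper that fills in the documented defaults and rejects out-of-range values before a request is sent. This is a sketch, not part of any official SDK: the function name is hypothetical, the sample-rate set is taken from the Technical Specifications section, and `output_format` is passed through unchecked because the accepted strings other than "pcm" are not listed here.

```python
SUPPORTED_SAMPLE_RATES = {8000, 16000, 24000, 44100}
SUPPORTED_LANGUAGES = {"auto", "en", "hi", "es", "ta"}


def build_payload(text, voice_id, sample_rate=44100, speed=1.0,
                  language="auto", output_format="pcm"):
    """Return a request payload using the documented defaults, validated locally."""
    if not text:
        raise ValueError("text is required")
    if not voice_id:
        raise ValueError("voice_id is required")
    if sample_rate not in SUPPORTED_SAMPLE_RATES:
        raise ValueError(f"unsupported sample_rate: {sample_rate}")
    if not 0.5 <= speed <= 2.0:
        raise ValueError("speed must be between 0.5 and 2.0")
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language: {language}")
    return {
        "text": text,
        "voice_id": voice_id,
        "sample_rate": sample_rate,
        "speed": speed,
        "language": language,
        "output_format": output_format,  # not validated; accepted values unknown
    }


payload = build_payload("Hello there.", "olivia", speed=1.25)
```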

Quickstart

Get started in minutes with synchronous or streaming synthesis.

Technical Specifications

Audio Output

Specification | Details
Native sample rate | 44,100 Hz
Supported sample rates | 8,000 / 16,000 / 24,000 / 44,100 Hz
Output formats | PCM, MP3, WAV, mulaw
Audio channels | Mono
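
Raw PCM output has no header, so most players need it wrapped in a container first. The sketch below does this with Python's standard `wave` module. It assumes 16-bit samples, which this page does not state (only mono output and the sample rate are documented), and it uses one second of silence as stand-in data.

```python
import io
import wave


def pcm_to_wav_bytes(pcm: bytes, sample_rate: int = 44100, sample_width: int = 2) -> bytes:
    """Wrap raw mono PCM in a WAV container.

    sample_width=2 (16-bit) is an assumption; adjust it if the API
    documents a different bit depth.
    """
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)              # mono, per the table above
        wav.setsampwidth(sample_width)
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buffer.getvalue()


silence = b"\x00\x00" * 44100  # stand-in for real synthesized PCM
wav_bytes = pcm_to_wav_bytes(silence)
```

The sample rate passed here has to match the `sample_rate` used in the synthesis request, otherwise playback speed and pitch will be wrong.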

Text Formatting Guidelines

Aspect | Recommendation
Language scripts | English and Spanish in Latin script, Hindi in Devanagari
Break points | Natural punctuation (. ! ? ,)
Mixed language | Avoid transliteration; use native script for each language
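
One way to apply the break-point guidance together with the 250-character limit and the 140-character recommendation from the benchmarks table is a greedy chunker like the sketch below. The function name and the packing strategy are illustrative choices, not a documented algorithm.

```python
import re

MAX_CHARS = 250      # maximum chunk size from the benchmarks table
TARGET_CHARS = 140   # recommended characters per request


def chunk_text(text: str, target: int = TARGET_CHARS, limit: int = MAX_CHARS) -> list[str]:
    """Greedily pack punctuation-delimited pieces into chunks of about `target` chars."""
    # Split after . ! ? or , when followed by whitespace, keeping the punctuation.
    pieces = [p.strip() for p in re.split(r"(?<=[.!?,])\s+", text) if p.strip()]
    chunks: list[str] = []
    current = ""
    for piece in pieces:
        # Hard-wrap any single piece that exceeds the limit on its own.
        while len(piece) > limit:
            cut = piece.rfind(" ", 0, limit + 1)
            cut = cut if cut > 0 else limit
            head, piece = piece[:cut].strip(), piece[cut:].strip()
            if current:
                chunks.append(current)
                current = ""
            chunks.append(head)
        candidate = f"{current} {piece}".strip()
        if current and len(candidate) > target:
            chunks.append(current)
            current = piece
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


sample = "Welcome to the demo. " * 12
chunks = chunk_text(sample)
```

Each chunk can then be sent as its own request, with the resulting audio concatenated in order.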

Number & Date Handling

Type | Format
Phone numbers | Default 3-4-3 grouping
Dates | DD/MM/YYYY or DD-MM-YYYY
Time | HH:MM or HH:MM:SS
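
The helpers below sketch how input text might be normalized to the formats in the table before synthesis. They are illustrative only: the space separator in the phone grouping is an assumption, and this page does not say whether pre-grouping a number changes how the model reads it.

```python
from datetime import date, time


def format_phone_3_4_3(number: str) -> str:
    """Group a 10-digit number as 3-4-3 (space separator is an assumption)."""
    digits = "".join(ch for ch in number if ch.isdigit())
    if len(digits) != 10:
        raise ValueError("expected exactly 10 digits")
    return f"{digits[:3]} {digits[3:7]} {digits[7:]}"


def format_date(d: date) -> str:
    """Render a date as DD/MM/YYYY."""
    return d.strftime("%d/%m/%Y")


def format_time(t: time, with_seconds: bool = False) -> str:
    """Render a time as HH:MM, or HH:MM:SS when seconds are requested."""
    return t.strftime("%H:%M:%S" if with_seconds else "%H:%M")


phone = format_phone_3_4_3("98765-43210")
day = format_date(date(2025, 3, 7))
clock = format_time(time(9, 5))
```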

Hardware

  • Recommended GPU: NVIDIA L40S
  • Recommended VRAM: 48 GB

Software

  • Server regions (AWS): India (Hyderabad), USA (Oregon)
  • Automatic geolocation-based routing for lowest latency

Use Cases

Direct Use

  • Voice assistants and conversational AI
  • Interactive chatbots with voice output
  • Real-time narration and live streaming
  • Accessibility tools and screen readers
  • Gaming (dynamic character voices)
  • Customer service automation

Downstream Use

  • Multi-turn conversational agents
  • Audio content generation pipelines
  • Telephony and IVR systems
  • Podcast and audiobook generation

Limitations & Safety

Known Limitations

  • Mixed-language text (transliteration) may produce suboptimal results. Hindi text should be in Devanagari script (e.g., “नमस्ते”), not Latin (e.g., “Namaste”). English text should be in Latin script, not Devanagari.
Recommendations: Use proper script for each language. Break long text at natural punctuation points. Use pronunciation dictionaries for specialized vocabulary. Test voice selection for your specific use case.
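
The script recommendation can be checked before sending text. The sketch below flags Hindi input that contains only Latin letters and English input that contains Devanagari. It is a heuristic with hypothetical function names, and it deliberately leaves Spanish and Tamil unchecked.

```python
def is_devanagari(ch: str) -> bool:
    """True for characters in the Devanagari Unicode block (U+0900 to U+097F)."""
    return "\u0900" <= ch <= "\u097F"


def is_latin_letter(ch: str) -> bool:
    """True for unaccented ASCII letters."""
    return ch.isascii() and ch.isalpha()


def script_warnings(text: str, language: str) -> list[str]:
    """Return warnings when the text's script does not match the language code."""
    has_devanagari = any(is_devanagari(ch) for ch in text)
    has_latin = any(is_latin_letter(ch) for ch in text)
    warnings = []
    if language == "hi" and has_latin and not has_devanagari:
        warnings.append("Hindi text appears transliterated; use Devanagari script.")
    if language == "en" and has_devanagari:
        warnings.append("English text contains Devanagari; use Latin script.")
    return warnings


transliterated = script_warnings("Namaste", "hi")
native = script_warnings("नमस्ते", "hi")
```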
Lightning v3.1 must not be used for impersonation or fraud, generating deceptive audio content (deepfakes), creating content that violates consent or privacy, harassment or abuse, or any illegal or unethical purposes.

Safety & Compliance

  • Voice cloning requires explicit consent
  • No retention of synthesized audio
  • No storage of personal voice data beyond cloning scope
  • Usage monitoring for policy compliance
For compliance documentation (GDPR, SOC2, HIPAA), contact support@smallest.ai.
Channel | Details
Support | support@smallest.ai
Documentation | waves-docs.smallest.ai
Console | app.smallest.ai
Community | Discord