
Overview

The transcription endpoint converts audio files to text using Lightning ASR. It supports both batch processing and streaming.

Endpoint

POST /v1/listen

Authentication

Requires token authentication with your license key, sent in the Authorization header using the Token scheme:
Authorization: Token YOUR_LICENSE_KEY
See Authentication for details.

Request

From URL

Transcribe audio from a publicly accessible URL:
{
  "url": "https://example.com/audio.wav"
}
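For readers calling the endpoint from Python rather than curl, the request above might be assembled as in the following sketch. It uses only the standard library and builds the request without sending it. `BASE_URL` is an assumption taken from the curl examples on this page, and `build_url_request` is a hypothetical helper name, not part of any SDK.

```python
import json
import urllib.request

# Assumption: host and port are taken from the curl examples on this page.
BASE_URL = "http://localhost:7100"


def build_url_request(audio_url: str, license_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/listen request for a remote audio URL."""
    body = json.dumps({"url": audio_url}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/v1/listen",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Token {license_key}",
            "Content-Type": "application/json",
        },
    )


req = build_url_request("https://example.com/audio.wav", "YOUR_LICENSE_KEY")
# To send it: json.load(urllib.request.urlopen(req, timeout=60))
```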

From File Upload

Upload audio directly:
curl -X POST http://localhost:7100/v1/listen \
  -H "Authorization: Token ${LICENSE_KEY}" \
  -F "audio=@/path/to/audio.wav"

Parameters

url
string
URL to audio file (mutually exclusive with file upload). Supported protocols: http://, https://, s3://
audio
file
Audio file upload (mutually exclusive with URL). Supported formats: WAV, MP3, FLAC, OGG, M4A
language
string
default:"en"
Language code (ISO 639-1). Examples: en, es, fr, de, zh
punctuate
boolean
default:"true"
Add punctuation to transcript
diarize
boolean
default:"false"
Enable speaker diarization (identify different speakers)
num_speakers
integer
Expected number of speakers (for diarization). If not specified, the number is detected automatically.
timestamps
boolean
default:"false"
Include word-level timestamps
callback_url
string
Webhook URL for async results delivery. If provided, the request returns immediately with a job ID.
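The parameter rules above can be checked on the client before a request is sent. The sketch below is one possible way to do that, not an official client: `build_listen_payload` and its `has_file` flag are hypothetical names, and the defaults mirror the table above. The `num_speakers >= 1` check is an added assumption.

```python
from typing import Any, Dict, Optional


def build_listen_payload(
    url: Optional[str] = None,
    has_file: bool = False,  # True when the audio is sent as a file upload
    language: str = "en",
    punctuate: bool = True,
    diarize: bool = False,
    num_speakers: Optional[int] = None,
    timestamps: bool = False,
    callback_url: Optional[str] = None,
) -> Dict[str, Any]:
    """Validate the documented parameters and return the non-file fields."""
    # url and audio are mutually exclusive, and one of them is required.
    if bool(url) == has_file:
        raise ValueError("provide exactly one of url or an audio file upload")
    if num_speakers is not None and num_speakers < 1:
        raise ValueError("num_speakers must be a positive integer")

    payload: Dict[str, Any] = {
        "language": language,
        "punctuate": punctuate,
        "diarize": diarize,
        "timestamps": timestamps,
    }
    if url:
        payload["url"] = url
    if num_speakers is not None:
        payload["num_speakers"] = num_speakers
    if callback_url:
        payload["callback_url"] = callback_url
    return payload


payload = build_listen_payload(
    url="https://example.com/conversation.wav", diarize=True, num_speakers=2
)
```

For a file upload the same fields would be sent as form fields alongside `audio`, with booleans written as `true` or `false` strings as in the curl example.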

Response

Successful Response

{
  "request_id": "req_abc123",
  "text": "Hello, this is a sample transcription.",
  "confidence": 0.95,
  "duration": 3.2,
  "language": "en",
  "words": [
    {
      "word": "Hello",
      "start": 0.0,
      "end": 0.5,
      "confidence": 0.98
    },
    {
      "word": "this",
      "start": 0.6,
      "end": 0.8,
      "confidence": 0.97
    }
  ]
}

Response Fields

request_id
string
Unique identifier for this transcription request
text
string
Complete transcription text
confidence
float
Overall confidence score (0.0 to 1.0)
duration
float
Audio duration in seconds
language
string
Detected or specified language
words
array
Word-level details (if timestamps: true). Each word object contains:
  • word: The word text
  • start: Start time in seconds
  • end: End time in seconds
  • confidence: Word confidence score
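As an illustration of how the word-level fields might be used, the sketch below flags words whose confidence falls under a threshold so they can be reviewed. The helper name `low_confidence_words` and the threshold value are assumptions for this example. The sample data is the response shown above.

```python
from typing import Any, Dict, List


def low_confidence_words(
    response: Dict[str, Any], threshold: float = 0.9
) -> List[Dict[str, Any]]:
    """Return the word objects whose confidence is below the threshold."""
    # "words" is only present when the request set timestamps: true.
    return [w for w in response.get("words", []) if w["confidence"] < threshold]


sample = {
    "text": "Hello, this is a sample transcription.",
    "words": [
        {"word": "Hello", "start": 0.0, "end": 0.5, "confidence": 0.98},
        {"word": "this", "start": 0.6, "end": 0.8, "confidence": 0.97},
    ],
}

flagged = low_confidence_words(sample, threshold=0.975)
print([w["word"] for w in flagged])  # prints ['this']
```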

Examples

Basic Transcription

curl -X POST http://localhost:7100/v1/listen \
  -H "Authorization: Token ${LICENSE_KEY}" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/audio.wav"
  }'

With Punctuation and Timestamps

{
  "url": "https://example.com/audio.wav",
  "punctuate": true,
  "timestamps": true
}
Response:
{
  "request_id": "req_abc123",
  "text": "Hello, this is a sample transcription.",
  "confidence": 0.95,
  "duration": 3.2,
  "words": [
    {"word": "Hello", "start": 0.0, "end": 0.5, "confidence": 0.98},
    {"word": ",", "start": 0.5, "end": 0.5, "confidence": 1.0},
    {"word": "this", "start": 0.6, "end": 0.8, "confidence": 0.97}
  ]
}

With Speaker Diarization

{
  "url": "https://example.com/conversation.wav",
  "diarize": true,
  "num_speakers": 2
}
Response:
{
  "request_id": "req_abc123",
  "text": "Hello. Hi there!",
  "speakers": [
    {
      "speaker": "SPEAKER_00",
      "text": "Hello.",
      "start": 0.0,
      "end": 0.8
    },
    {
      "speaker": "SPEAKER_01",
      "text": "Hi there!",
      "start": 1.0,
      "end": 1.8
    }
  ]
}
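A diarized response like the one above can be turned into a readable transcript with a few lines of Python. This is a minimal sketch that assumes the `speakers` array has exactly the shape shown in the example. `format_turns` is a hypothetical helper name.

```python
from typing import Any, Dict


def format_turns(response: Dict[str, Any]) -> str:
    """Render each speaker turn as '[start-end] SPEAKER: text'."""
    lines = []
    for turn in response.get("speakers", []):
        lines.append(
            f"[{turn['start']:.1f}-{turn['end']:.1f}] {turn['speaker']}: {turn['text']}"
        )
    return "\n".join(lines)


diarized = {
    "text": "Hello. Hi there!",
    "speakers": [
        {"speaker": "SPEAKER_00", "text": "Hello.", "start": 0.0, "end": 0.8},
        {"speaker": "SPEAKER_01", "text": "Hi there!", "start": 1.0, "end": 1.8},
    ],
}

print(format_turns(diarized))
# [0.0-0.8] SPEAKER_00: Hello.
# [1.0-1.8] SPEAKER_01: Hi there!
```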

File Upload

curl -X POST http://localhost:7100/v1/listen \
  -H "Authorization: Token ${LICENSE_KEY}" \
  -F "audio=@recording.wav" \
  -F "punctuate=true" \
  -F "language=en"

Async with Callback

{
  "url": "https://example.com/long-audio.wav",
  "callback_url": "https://myapp.com/webhook/transcription"
}
Immediate response:
{
  "job_id": "job_xyz789",
  "status": "processing"
}
Later, webhook receives:
{
  "job_id": "job_xyz789",
  "status": "completed",
  "result": {
    "text": "...",
    "confidence": 0.95
  }
}
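On the receiving side, the webhook body can be matched against the jobs your application submitted. The sketch below shows only the payload handling, independent of any web framework. `handle_callback` and the `pending` dictionary are hypothetical names, and since only the `processing` and `completed` statuses appear on this page, any other status is treated as "not finished", which is an assumption.

```python
import json
from typing import Dict, Optional


def handle_callback(raw_body: bytes, pending: Dict[str, str]) -> Optional[str]:
    """Return the transcript text for a completed, known job; otherwise None."""
    payload = json.loads(raw_body)
    job_id = payload.get("job_id")
    if job_id not in pending:
        return None  # not a job this application submitted
    if payload.get("status") != "completed":
        return None  # assumption: anything else means no result yet
    pending.pop(job_id)  # a job is delivered at most once
    return payload["result"]["text"]


# job_id -> source URL, recorded when the immediate response arrived
pending_jobs = {"job_xyz789": "https://example.com/long-audio.wav"}
body = json.dumps(
    {
        "job_id": "job_xyz789",
        "status": "completed",
        "result": {"text": "...", "confidence": 0.95},
    }
).encode("utf-8")
text = handle_callback(body, pending_jobs)
```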

Error Responses

400 Bad Request

{
  "error": "Missing required parameter: url or audio file",
  "code": "MISSING_PARAMETER"
}

415 Unsupported Media Type

{
  "error": "Unsupported audio format",
  "code": "UNSUPPORTED_FORMAT",
  "supported_formats": ["wav", "mp3", "flac", "ogg", "m4a"]
}

422 Unprocessable Entity

{
  "error": "Audio file too large",
  "code": "FILE_TOO_LARGE",
  "max_size_mb": 100
}

503 Service Unavailable

{
  "error": "No ASR workers available",
  "code": "SERVICE_UNAVAILABLE",
  "retry_after": 30
}
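Of the errors above, only the 503 response signals a condition that may clear up without changing the request. A client could encode that distinction as in the sketch below. `retry_delay` is a hypothetical helper, the unit of `retry_after` is assumed to be seconds, and the fallback delay is an arbitrary choice for illustration.

```python
from typing import Any, Dict, Optional


def retry_delay(
    status_code: int, body: Dict[str, Any], default_delay: float = 30.0
) -> Optional[float]:
    """Return how long to wait before retrying, or None if a retry will not help."""
    if status_code == 503:
        # Assumption: retry_after is expressed in seconds.
        return float(body.get("retry_after", default_delay))
    # 400, 415 and 422 describe a problem with the request itself,
    # so it has to be corrected rather than resent.
    return None


delay = retry_delay(503, {"code": "SERVICE_UNAVAILABLE", "retry_after": 30})
```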

Audio Format Requirements

Supported Formats

Format  Extension  Notes
WAV     .wav       Recommended for best quality
MP3     .mp3       Widely supported
FLAC    .flac      Lossless compression
OGG     .ogg       Open format
M4A     .m4a       Apple format
  • Sample Rate: 16 kHz or higher (44.1 kHz recommended)
  • Bit Depth: 16-bit or higher
  • Channels: Mono or stereo
  • Max Duration: 2 hours
  • Max File Size: 100 MB
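The limits listed above can be verified locally before uploading. The following sketch does this for uncompressed PCM WAV data using Python's standard `wave` module. Other formats would need a different decoder. `check_wav` is a hypothetical helper, and treating 100 MB as 100 × 1024 × 1024 bytes is an assumption.

```python
import io
import wave
from typing import List

MIN_SAMPLE_RATE_HZ = 16_000
MIN_SAMPLE_WIDTH_BYTES = 2          # 16-bit
MAX_CHANNELS = 2                    # mono or stereo
MAX_DURATION_S = 2 * 60 * 60        # 2 hours
MAX_SIZE_BYTES = 100 * 1024 * 1024  # assumption: 1 MB = 1024 * 1024 bytes


def check_wav(data: bytes) -> List[str]:
    """Return a list of problems; an empty list means the file passes."""
    problems = []
    if len(data) > MAX_SIZE_BYTES:
        problems.append("file is larger than 100 MB")
    with wave.open(io.BytesIO(data), "rb") as wav:
        rate = wav.getframerate()
        if rate < MIN_SAMPLE_RATE_HZ:
            problems.append(f"sample rate {rate} Hz is below 16 kHz")
        if wav.getsampwidth() < MIN_SAMPLE_WIDTH_BYTES:
            problems.append("bit depth is below 16-bit")
        if wav.getnchannels() > MAX_CHANNELS:
            problems.append("more than two channels")
        if wav.getnframes() / rate > MAX_DURATION_S:
            problems.append("audio is longer than 2 hours")
    return problems
```

A typical call would be `check_wav(open("recording.wav", "rb").read())`, uploading only when the returned list is empty.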

Audio Preprocessing

For best results:
  • Remove background noise
  • Normalize audio levels
  • Use mono audio when possible
  • Encode at 16 kHz or 44.1 kHz
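As one example of the preprocessing suggested above, a stereo recording can be downmixed to mono by averaging the left and right samples. The sketch below does this with the standard library only and handles 16-bit PCM WAV data. `downmix_to_mono` is a hypothetical helper, and noise removal or level normalization would need a dedicated audio tool.

```python
import array
import io
import sys
import wave


def downmix_to_mono(data: bytes) -> bytes:
    """Average the two channels of a 16-bit PCM WAV; mono input is returned as is."""
    with wave.open(io.BytesIO(data), "rb") as src:
        channels = src.getnchannels()
        width = src.getsampwidth()
        rate = src.getframerate()
        frames = src.readframes(src.getnframes())
    if width != 2:
        raise ValueError("this sketch only handles 16-bit PCM")
    if channels == 1:
        return data
    if channels != 2:
        raise ValueError("expected mono or stereo audio")

    samples = array.array("h")  # signed 16-bit integers
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian

    # Stereo frames are interleaved: left, right, left, right, ...
    mono = array.array(
        "h",
        ((samples[i] + samples[i + 1]) // 2 for i in range(0, len(samples), 2)),
    )
    if sys.byteorder == "big":
        mono.byteswap()

    out = io.BytesIO()
    with wave.open(out, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(2)
        dst.setframerate(rate)
        dst.writeframes(mono.tobytes())
    return out.getvalue()
```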

Rate Limits

Default rate limits:
  • Requests per minute: 60
  • Concurrent requests: 10
  • Audio hours per day: 100
Contact support@smallest.ai to increase limits for your license.
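To stay under the requests-per-minute limit, a client can throttle itself before the server has to reject anything. The sketch below keeps a sliding 60-second window of send times. `MinuteRateLimiter` is a hypothetical class, and whether the server counts requests in a sliding or a fixed window is not stated here, so the approach is an assumption.

```python
import time
from collections import deque
from typing import Callable, Deque


class MinuteRateLimiter:
    """Track send times and report how long to wait before the next request."""

    def __init__(
        self, max_per_minute: int = 60, clock: Callable[[], float] = time.monotonic
    ) -> None:
        self.max_per_minute = max_per_minute
        self.clock = clock
        self.sent: Deque[float] = deque()

    def wait_time(self) -> float:
        """Seconds to wait until another request fits in the window (0.0 if none)."""
        now = self.clock()
        # Forget requests that are at least 60 seconds old.
        while self.sent and now - self.sent[0] >= 60.0:
            self.sent.popleft()
        if len(self.sent) < self.max_per_minute:
            return 0.0
        return 60.0 - (now - self.sent[0])

    def record(self) -> None:
        """Call right after a request has been sent."""
        self.sent.append(self.clock())
```

Before each call to the endpoint, the client would run `time.sleep(limiter.wait_time())`, send the request, and then call `limiter.record()`.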

Performance

Typical performance metrics:
Metric                 Value
Real-time Factor       0.05-0.15x
Latency (1 min audio)  3-9 seconds
Concurrent capacity    100+ requests
Throughput             100+ hours/hour
Performance varies based on:
  • Audio duration and complexity
  • Number of speakers
  • GPU instance type
  • Current load
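The real-time factor and the latency figure in the table are related: processing time is roughly audio duration multiplied by the real-time factor, so one minute of audio at 0.05-0.15x gives 3-9 seconds. The sketch below applies the same arithmetic to other durations. `estimate_processing_seconds` is a hypothetical helper, and the result is only a rough guide given the factors listed above.

```python
from typing import Tuple


def estimate_processing_seconds(
    audio_seconds: float, rtf_low: float = 0.05, rtf_high: float = 0.15
) -> Tuple[float, float]:
    """Return the (best case, worst case) processing time for the given audio length."""
    return audio_seconds * rtf_low, audio_seconds * rtf_high


# One minute of audio: about 3 to 9 seconds, matching the table.
low, high = estimate_processing_seconds(60)

# A one-hour recording: about 180 to 540 seconds (3 to 9 minutes).
hour_low, hour_high = estimate_processing_seconds(3600)
```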

Best Practices

  • Use lossless formats (WAV, FLAC) when possible
  • Ensure clear audio with minimal background noise
  • Use appropriate sample rate (16 kHz minimum)
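Implement retry logic with exponential backoff:
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

session = requests.Session()
retry = Retry(
    total=3,
    backoff_factor=1,  # wait grows exponentially between attempts
    status_forcelist=[429, 500, 502, 503, 504],
    allowed_methods=["POST"],  # POST is not retried by default (urllib3 >= 1.26)
)
adapter = HTTPAdapter(max_retries=retry)
# Mount the adapter for both schemes so HTTPS deployments are covered too
session.mount("http://", adapter)
session.mount("https://", adapter)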
For audio longer than 5 minutes, use callback URL:
{
  "url": "https://example.com/podcast.mp3",
  "callback_url": "https://myapp.com/webhook"
}
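Cache transcription results to avoid duplicate processing:
import hashlib

cache = {}  # in-memory store; use a persistent store in production

def get_cache_key(audio_url):
    # MD5 serves only as a compact lookup key here, not for security
    return hashlib.md5(audio_url.encode()).hexdigest()

def transcribe_cached(audio_url):
    cache_key = get_cache_key(audio_url)
    if cache_key in cache:
        return cache[cache_key]

    result = transcribe(audio_url)  # your function that calls POST /v1/listen
    cache[cache_key] = result
    return result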
