Transcription

Overview

The transcription endpoint converts audio files to text using Lightning ASR. It supports both batch processing and streaming.

Endpoint

POST /v1/listen

Authentication

Requires token authentication with your license key, sent in the Authorization header using the Token scheme:

Authorization: Token YOUR_LICENSE_KEY

See Authentication for details.

Request

From URL

Transcribe audio from a publicly accessible URL:

{
  "url": "https://example.com/audio.wav"
}

From File Upload

Upload audio directly:

curl -X POST http://localhost:7100/v1/listen \
  -H "Authorization: Token ${LICENSE_KEY}" \
  -F "audio=@/path/to/audio.wav"

Parameters

url

string

URL to audio file (mutually exclusive with file upload). Supported protocols: http://, https://, s3://

audio

file

Audio file upload (mutually exclusive with URL). Supported formats: WAV, MP3, FLAC, OGG, M4A

language

string

default:"en"

Language code (ISO 639-1). Examples: en, es, fr, de, zh

punctuate

boolean

default:"true"

Add punctuation to transcript

diarize

boolean

default:"false"

Enable speaker diarization (identify different speakers)

num_speakers

integer

Expected number of speakers (for diarization). If not specified, the number of speakers is detected automatically.

timestamps

boolean

default:"false"

Include word-level timestamps

callback_url

string

Webhook URL for asynchronous delivery of results. If provided, the request returns immediately with a job ID.
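
The parameter rules above (exactly one of url or audio, plus the optional flags) can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API: the function name build_listen_payload and the has_audio_file flag are hypothetical, and rejecting unknown parameter names is a client-side choice rather than documented server behavior.

```python
from typing import Any, Dict, Optional

# Optional parameters listed in the Parameters section.
ALLOWED_OPTIONS = {
    "language", "punctuate", "diarize",
    "num_speakers", "timestamps", "callback_url",
}


def build_listen_payload(
    url: Optional[str] = None,
    has_audio_file: bool = False,
    **options: Any,
) -> Dict[str, Any]:
    """Validate the documented parameters and return a dict of request fields."""
    # url and audio are mutually exclusive, and one of them is required.
    if bool(url) == has_audio_file:
        raise ValueError("provide exactly one of: url, audio file")

    # Client-side strictness (an assumption): catch typos in parameter names early.
    unknown = set(options) - ALLOWED_OPTIONS
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")

    payload = dict(options)
    if url:
        payload["url"] = url
    return payload
```

For a URL request the returned dict can be sent as the JSON body; for an upload, the same fields would go alongside the audio file as form fields.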

Response

Successful Response

{
  "request_id": "req_abc123",
  "text": "Hello, this is a sample transcription.",
  "confidence": 0.95,
  "duration": 3.2,
  "language": "en",
  "words": [
    {
      "word": "Hello",
      "start": 0.0,
      "end": 0.5,
      "confidence": 0.98
    },
    {
      "word": "this",
      "start": 0.6,
      "end": 0.8,
      "confidence": 0.97
    }
  ]
}

Response Fields

request_id

string

Unique identifier for this transcription request

text

string

Complete transcription text

confidence

float

Overall confidence score (0.0 to 1.0)

duration

float

Audio duration in seconds

language

string

Detected or specified language

words

array

Word-level details (present when timestamps is true). Each word object contains:

word: The word text
start: Start time in seconds
end: End time in seconds
confidence: Word confidence score
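
As an illustration of how the words array might be consumed, the sketch below turns each word object into a timestamped line and flags words below a confidence threshold. The helper names and the threshold are hypothetical; only the field names (word, start, end, confidence) come from the response described above.

```python
from typing import List


def format_timestamp(seconds: float) -> str:
    """Render a time offset in seconds as MM:SS.mmm."""
    total_ms = round(seconds * 1000)
    minutes, ms = divmod(total_ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{minutes:02d}:{secs:02d}.{ms:03d}"


def word_timeline(response: dict, min_confidence: float = 0.0) -> List[str]:
    """Return one line per word; an empty list if the response has no words."""
    lines = []
    for w in response.get("words", []):
        flag = "" if w["confidence"] >= min_confidence else " (low confidence)"
        span = f"{format_timestamp(w['start'])} - {format_timestamp(w['end'])}"
        lines.append(f"[{span}] {w['word']}{flag}")
    return lines
```

Because words is only present when timestamps is true, the helper uses response.get("words", []) so it also works on responses without word-level details.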

Examples

Basic Transcription

cURL:

curl -X POST http://localhost:7100/v1/listen \
  -H "Authorization: Token ${LICENSE_KEY}" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/audio.wav"
  }'

Python:

import os
import requests

# Read the license key from the environment, as in the cURL example
LICENSE_KEY = os.environ["LICENSE_KEY"]

response = requests.post(
    "http://localhost:7100/v1/listen",
    headers={
        "Authorization": f"Token {LICENSE_KEY}",
        "Content-Type": "application/json"
    },
    json={
        "url": "https://example.com/audio.wav"
    }
)
response.raise_for_status()  # fail loudly on 4xx/5xx responses

result = response.json()
print(result['text'])

JavaScript:

const response = await fetch('http://localhost:7100/v1/listen', {
  method: 'POST',
  headers: {
    'Authorization': `Token ${LICENSE_KEY}`,
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    url: 'https://example.com/audio.wav'
  })
});

const result = await response.json();
console.log(result.text);

With Punctuation and Timestamps

{
  "url": "https://example.com/audio.wav",
  "punctuate": true,
  "timestamps": true
}

Response:

{
  "request_id": "req_abc123",
  "text": "Hello, this is a sample transcription.",
  "confidence": 0.95,
  "duration": 3.2,
  "words": [
    {"word": "Hello", "start": 0.0, "end": 0.5, "confidence": 0.98},
    {"word": ",", "start": 0.5, "end": 0.5, "confidence": 1.0},
    {"word": "this", "start": 0.6, "end": 0.8, "confidence": 0.97}
  ]
}

With Speaker Diarization

{
  "url": "https://example.com/conversation.wav",
  "diarize": true,
  "num_speakers": 2
}

Response:

{
  "request_id": "req_abc123",
  "text": "Hello. Hi there!",
  "speakers": [
    {
      "speaker": "SPEAKER_00",
      "text": "Hello.",
      "start": 0.0,
      "end": 0.8
    },
    {
      "speaker": "SPEAKER_01",
      "text": "Hi there!",
      "start": 1.0,
      "end": 1.8
    }
  ]
}
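
A diarized response like the one above could be turned into a readable dialogue, or summarized per speaker, along these lines. This is a sketch under the assumption that every entry in speakers carries the speaker, text, start, and end fields shown in the example; the function names are hypothetical.

```python
from typing import Dict


def format_dialogue(response: dict) -> str:
    """Render each speaker segment as 'SPEAKER [start-end]: text'."""
    lines = []
    for seg in response.get("speakers", []):
        span = f"{seg['start']:.1f}s-{seg['end']:.1f}s"
        lines.append(f"{seg['speaker']} [{span}]: {seg['text']}")
    return "\n".join(lines)


def speaking_time(response: dict) -> Dict[str, float]:
    """Sum the duration of all segments for each speaker, in seconds."""
    totals: Dict[str, float] = {}
    for seg in response.get("speakers", []):
        duration = seg["end"] - seg["start"]
        totals[seg["speaker"]] = totals.get(seg["speaker"], 0.0) + duration
    return totals
```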

File Upload

curl -X POST http://localhost:7100/v1/listen \
  -H "Authorization: Token ${LICENSE_KEY}" \
  -F "audio=@recording.wav" \
  -F "punctuate=true" \
  -F "language=en"

Async with Callback

{
  "url": "https://example.com/long-audio.wav",
  "callback_url": "https://myapp.com/webhook/transcription"
}

Immediate response:

{
  "job_id": "job_xyz789",
  "status": "processing"
}

Later, the webhook receives:

{
  "job_id": "job_xyz789",
  "status": "completed",
  "result": {
    "text": "...",
    "confidence": 0.95
  }
}
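
On the receiving side, a webhook endpoint has to accept the POST and store the result by job_id. The following is a minimal sketch using only the Python standard library. Everything not shown in the payload above is an assumption: the port, the in-memory results dict, and the 200/400 status codes the handler returns (this page does not say how the service treats the webhook's response or whether it signs requests).

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# job_id -> result dict. In-memory storage, for illustration only.
results = {}


def handle_callback(payload: dict) -> bool:
    """Store a completed job. Returns False if the payload has no job_id."""
    job_id = payload.get("job_id")
    if not job_id:
        return False
    if payload.get("status") == "completed":
        results[job_id] = payload.get("result", {})
    return True


class CallbackHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length) or b"{}")
            ok = isinstance(payload, dict) and handle_callback(payload)
        except json.JSONDecodeError:
            ok = False
        # Assumed convention: 200 acknowledges the delivery, 400 rejects it.
        self.send_response(200 if ok else 400)
        self.end_headers()


def serve(port: int = 8080) -> None:
    """Run the receiver until interrupted (hypothetical port)."""
    HTTPServer(("0.0.0.0", port), CallbackHandler).serve_forever()
```

Calling serve() starts the listener; the URL that reaches it is what would be passed as callback_url.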

Error Responses

400 Bad Request

{
  "error": "Missing required parameter: url or audio file",
  "code": "MISSING_PARAMETER"
}

415 Unsupported Media Type

{
  "error": "Unsupported audio format",
  "code": "UNSUPPORTED_FORMAT",
  "supported_formats": ["wav", "mp3", "flac", "ogg", "m4a"]
}

422 Unprocessable Entity

{
  "error": "Audio file too large",
  "code": "FILE_TOO_LARGE",
  "max_size_mb": 100
}

503 Service Unavailable

{
  "error": "No ASR workers available",
  "code": "SERVICE_UNAVAILABLE",
  "retry_after": 30
}
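
The error bodies above suggest a simple split: 400, 415, and 422 call for fixing the request, while 503 carries a retry_after hint. One way to encode that decision is sketched below. The function name is hypothetical, the set of retryable statuses mirrors the retry example under Best Practices, and treating retry_after as seconds is an assumption.

```python
from typing import Optional

# Statuses worth retrying; everything else needs a corrected request.
RETRYABLE_STATUSES = {429, 500, 502, 503, 504}


def retry_delay(status: int, body: dict, attempt: int) -> Optional[float]:
    """Return how long to wait before retrying, or None if a retry won't help.

    attempt is zero-based: 0 for the first retry, 1 for the second, and so on.
    """
    if status not in RETRYABLE_STATUSES:
        return None
    # Prefer the server's hint when present (assumed to be in seconds).
    if "retry_after" in body:
        return float(body["retry_after"])
    # Otherwise fall back to exponential backoff: 1, 2, 4, ... seconds.
    return float(2 ** attempt)
```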

Audio Format Requirements

Supported Formats

Format	Extension	Notes
WAV	`.wav`	Recommended for best quality
MP3	`.mp3`	Widely supported
FLAC	`.flac`	Lossless compression
OGG	`.ogg`	Open format
M4A	`.m4a`	Apple format

Recommended Specifications

Sample Rate: 16 kHz or higher (44.1 kHz recommended)
Bit Depth: 16-bit or higher
Channels: Mono or stereo
Max Duration: 2 hours
Max File Size: 100 MB
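
For WAV files, these specifications can be checked locally before uploading. The sketch below uses Python's standard wave module; the function name is hypothetical, and it assumes 1 MB means 1024 * 1024 bytes, which this page does not specify. Other formats (MP3, FLAC, OGG, M4A) would need a different reader.

```python
import os
import wave
from typing import List

MIN_SAMPLE_RATE_HZ = 16_000
MIN_SAMPLE_WIDTH_BYTES = 2            # 16-bit
MAX_DURATION_SECONDS = 2 * 60 * 60    # 2 hours
MAX_FILE_SIZE_BYTES = 100 * 1024 * 1024  # assumes 1 MB = 1024 * 1024 bytes


def check_wav(path: str) -> List[str]:
    """Return a list of problems; an empty list means the file meets the specs."""
    problems = []
    if os.path.getsize(path) > MAX_FILE_SIZE_BYTES:
        problems.append("file size exceeds 100 MB")
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        duration = wav.getnframes() / rate if rate else 0.0
        if rate < MIN_SAMPLE_RATE_HZ:
            problems.append(f"sample rate {rate} Hz is below 16 kHz")
        if wav.getsampwidth() < MIN_SAMPLE_WIDTH_BYTES:
            problems.append(f"bit depth {8 * wav.getsampwidth()} is below 16-bit")
        if wav.getnchannels() not in (1, 2):
            problems.append(f"{wav.getnchannels()} channels (expected mono or stereo)")
        if duration > MAX_DURATION_SECONDS:
            problems.append("duration exceeds 2 hours")
    return problems
```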

Audio Preprocessing

For best results:

Remove background noise
Normalize audio levels
Use mono audio when possible
Encode at 16 kHz or 44.1 kHz
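
Some of these steps can be scripted. The sketch below assumes the ffmpeg command-line tool is installed, which this page does not mention or require; it downmixes to mono, resamples to 16 kHz, writes 16-bit PCM WAV, and optionally applies ffmpeg's loudnorm filter for level normalization. It does not remove background noise. The function names are hypothetical.

```python
import subprocess
from typing import List


def ffmpeg_command(src: str, dst: str, sample_rate: int = 16_000,
                   normalize: bool = False) -> List[str]:
    """Build an ffmpeg invocation that converts src to mono 16-bit PCM WAV."""
    cmd = ["ffmpeg", "-y", "-i", src,   # -y: overwrite dst if it exists
           "-ac", "1",                  # one audio channel (mono)
           "-ar", str(sample_rate)]     # output sample rate in Hz
    if normalize:
        cmd += ["-af", "loudnorm"]      # loudness normalization filter
    cmd += ["-c:a", "pcm_s16le", dst]   # 16-bit little-endian PCM
    return cmd


def convert(src: str, dst: str, sample_rate: int = 16_000,
            normalize: bool = False) -> None:
    """Run the conversion; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(ffmpeg_command(src, dst, sample_rate, normalize), check=True)
```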

Rate Limits

Default rate limits:

Requests per minute: 60
Concurrent requests: 10
Audio hours per day: 100

Contact support@smallest.ai to increase limits for your license.
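
A client can stay under the requests-per-minute limit by throttling itself rather than waiting to be rejected. The following is a sketch of a sliding-window limiter; the class and its methods are hypothetical, and it assumes the server counts requests over a rolling 60-second window, which this page does not specify.

```python
import time
from collections import deque
from typing import Callable, Deque


class RequestRateLimiter:
    """Allow at most `limit` requests in any `window`-second span."""

    def __init__(self, limit: int = 60, window: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self.sent: Deque[float] = deque()  # send times, oldest first

    def wait_time(self) -> float:
        """Seconds to wait before the next request is allowed (0.0 if none)."""
        now = self.clock()
        # Drop send times that have left the window.
        while self.sent and now - self.sent[0] >= self.window:
            self.sent.popleft()
        if len(self.sent) < self.limit:
            return 0.0
        return self.window - (now - self.sent[0])

    def record(self) -> None:
        """Note that a request is being sent now."""
        self.sent.append(self.clock())

    def acquire(self) -> None:
        """Block until a request is allowed, then record it."""
        delay = self.wait_time()
        if delay > 0:
            time.sleep(delay)
        self.record()
```

Calling acquire() before each request keeps a single-threaded client within the limit. The concurrent-request limit could be handled separately, for example with threading.BoundedSemaphore(10) around each call.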

Performance

Typical performance metrics:

Metric	Value
Real-time Factor	0.05-0.15x
Latency (1 min audio)	3-9 seconds
Concurrent capacity	100+ requests
Throughput	100+ hours/hour

Performance varies based on:

Audio duration and complexity
Number of speakers
GPU instance type
Current load
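
The real-time factor and latency rows in the table are related by simple arithmetic: processing time is roughly audio duration multiplied by the real-time factor, so one minute of audio at 0.05-0.15x works out to about 3 to 9 seconds. The sketch below applies the same calculation to other durations. The function name is hypothetical, and the estimate ignores queueing, upload time, and the other factors listed above.

```python
from typing import Tuple


def estimate_processing_seconds(audio_seconds: float,
                                rtf_low: float = 0.05,
                                rtf_high: float = 0.15) -> Tuple[float, float]:
    """Return (best case, worst case) processing time in seconds."""
    return (audio_seconds * rtf_low, audio_seconds * rtf_high)


# One hour of audio: between roughly 3 and 9 minutes of processing.
low, high = estimate_processing_seconds(60 * 60)
print(f"{low / 60:.0f} to {high / 60:.0f} minutes")
```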

Best Practices

Optimize Audio Quality

Use lossless formats (WAV, FLAC) when possible
Ensure clear audio with minimal background noise
Use appropriate sample rate (16 kHz minimum)

Handle Errors Gracefully

Implement retry logic with exponential backoff:

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

session = requests.Session()
retry = Retry(
    total=3,
    backoff_factor=1,  # exponential backoff between attempts
    status_forcelist=[429, 500, 502, 503, 504],
    allowed_methods=["POST"],  # POST is not retried by default (urllib3 1.26+)
)
adapter = HTTPAdapter(max_retries=retry)
session.mount('http://', adapter)
session.mount('https://', adapter)

Use Async for Long Audio

For audio longer than 5 minutes, use a callback URL:

{
  "url": "https://example.com/podcast.mp3",
  "callback_url": "https://myapp.com/webhook"
}

Cache Results

Cache transcription results to avoid duplicate processing:

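import hashlib

cache = {}  # simple in-memory cache, keyed by a hash of the audio URL

def get_cache_key(audio_url):
    # MD5 is used only to derive a compact key, not for security
    return hashlib.md5(audio_url.encode()).hexdigest()

def transcribe_cached(audio_url):
    cache_key = get_cache_key(audio_url)
    if cache_key in cache:
        return cache[cache_key]

    # transcribe() stands for your own function that calls POST /v1/listen
    result = transcribe(audio_url)
    cache[cache_key] = result
    return result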
