Performance comparison of streaming transcription models across accuracy, latency, cost, and multilingual capabilities
Model Name | Provider | Type |
---|---|---|
smallestai_streaming | Smallest AI | WebSocket Streaming |
gpt4o_mini_streaming | OpenAI | WebSocket Streaming |
gpt4o_streaming | OpenAI | WebSocket Streaming |
assemblyai_streaming | AssemblyAI | WebSocket Streaming |
deepgram_nova3_streaming | Deepgram | WebSocket Streaming |

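
The WER columns in the results table report word error rate: the word-level edit distance between the reference transcript and the model output (substitutions + deletions + insertions), divided by the number of reference words, so lower is better. The benchmark's text normalization and tokenization are not described here. The minimal Python sketch below assumes plain whitespace tokenization with no normalization, and `wer` is a hypothetical helper name, not part of any provider's API.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate (S + D + I) / N via word-level Levenshtein distance.

    Assumption: words are split on whitespace; no casing or punctuation
    normalization is applied.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the first i-1 reference
    # words and the first j hypothesis words (i = current outer index).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # substitution, or match when cost == 0
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


# One substitution (sat -> sit) and one deletion (the) over 6 reference words.
print(f"{wer('the cat sat on the mat', 'the cat sit on mat'):.2%}")  # 33.33%
```

Because insertions count as errors, WER can exceed 100% when a model emits many spurious words.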
Rank | Model | English WER | Hindi WER | Code-Switched WER | Disfluency Terms | Noisy WER | Overall WER |
---|---|---|---|---|---|---|---|
1 | smallestai_streaming | 2.10% | 22.74% | 12.33% | 9.99% | 15.52% | 12.53% |
2 | deepgram_nova3_streaming | 2.05% | 23.10% | 10.90% | 10.20% | 15.90% | 12.66% |
3 | gpt4o_streaming | 10.19% | 9.93% | 29.58% | 12.00% | 22.06% | 16.75% |
4 | gpt4o_mini_streaming | 11.11% | 12.28% | 36.97% | 15.19% | 20.47% | 19.20% |
5 | assemblyai_streaming | 3.94% | - | - | 14.01% | 14.56% | 10.83% |
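
The table does not state how Overall WER is aggregated. An unweighted mean over the categories a model was scored on fits most rows: it gives 16.75% for gpt4o_streaming, 19.20% for gpt4o_mini_streaming, and 10.84% for assemblyai_streaming (listed as 10.83%), whose figure averages only its three available categories and is therefore not directly comparable with the five-category rows. It does not reproduce deepgram_nova3_streaming, whose five category values average 12.43% against the listed 12.66%, so treat the formula as an assumption rather than the benchmark's documented method. The sketch below applies it to three rows copied from the table, with `None` standing in for a `-` entry; `overall_wer` and `RESULTS` are hypothetical names.

```python
from statistics import mean
from typing import Optional

# Category WERs (%) copied from the results table, in column order:
# English, Hindi, Code-Switched, Disfluency Terms, Noisy.
# None marks a "-" entry (category not evaluated).
RESULTS: dict[str, list[Optional[float]]] = {
    "gpt4o_streaming": [10.19, 9.93, 29.58, 12.00, 22.06],
    "gpt4o_mini_streaming": [11.11, 12.28, 36.97, 15.19, 20.47],
    "assemblyai_streaming": [3.94, None, None, 14.01, 14.56],
}


def overall_wer(category_wers: list[Optional[float]]) -> float:
    """Assumed aggregation: unweighted mean of the categories actually scored."""
    available = [w for w in category_wers if w is not None]
    if not available:
        raise ValueError("no category results available")
    return mean(available)


for model, wers in RESULTS.items():
    print(f"{model}: {overall_wer(wers):.2f}%")
# gpt4o_streaming: 16.75%
# gpt4o_mini_streaming: 19.20%
# assemblyai_streaming: 10.84%
```

Skipping missing categories instead of treating them as errors keeps the mean defined, but it also means a model evaluated on fewer (and easier) categories can post a lower Overall WER.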