Google Cloud STT
Google Cloud Speech-to-Text is the enterprise choice for teams that are deeply embedded in GCP, need the broadest language coverage (125+ locales with Chirp 3), or are building multimodal AI workflows in which transcripts feed directly into Gemini-powered systems. Chirp 3 HD (2026) substantially narrowed the quality gap with top-tier APIs. There are four model tiers: Standard/WaveNet ($0.024/min), Chirp 2 ($0.02/min, batch), Chirp 3 HD ($0.03/min, flagship quality), and the custom Speech API ($0.02/min, with domain adaptation). Speaker diarization is available, as is custom model training with domain-specific vocabulary. The free tier covers 60 minutes/month on standard models. Note: production workloads often require other GCP infrastructure (Cloud Storage, Pub/Sub), which adds engineering overhead compared with API-first providers.
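To make the diarization, vocabulary, and Cloud Storage points concrete, here is a minimal sketch that only builds (does not send) a request body for the v1 REST method `speech:longrunningrecognize`. It assumes 16 kHz LINEAR16 audio already uploaded to a hypothetical bucket `gs://example-bucket`, and it uses v1 phrase hints (`speechContexts`), a lightweight form of vocabulary biasing rather than the custom model training mentioned above. It does not select any of the Chirp tiers; authentication and the HTTP call are left out.

```python
import json

# v1 REST endpoint for asynchronous (long-running) recognition.
V1_ENDPOINT = "https://speech.googleapis.com/v1/speech:longrunningrecognize"


def build_request(gcs_uri: str, phrases: list[str], speakers: int = 2,
                  language_code: str = "en-US",
                  sample_rate_hz: int = 16000) -> dict:
    """Build a v1 longrunningrecognize body (sketch; not sent anywhere)."""
    # Design choice of this sketch: only audio already in Cloud Storage.
    if not gcs_uri.startswith("gs://"):
        raise ValueError("expected a Cloud Storage URI (gs://...)")
    return {
        "config": {
            "encoding": "LINEAR16",
            "sampleRateHertz": sample_rate_hz,
            "languageCode": language_code,
            # Speaker diarization: words in the result carry a speaker tag.
            "diarizationConfig": {
                "enableSpeakerDiarization": True,
                "minSpeakerCount": speakers,
                "maxSpeakerCount": speakers,
            },
            # Phrase hints bias recognition toward domain vocabulary.
            "speechContexts": [{"phrases": phrases}],
        },
        "audio": {"uri": gcs_uri},
    }


if __name__ == "__main__":
    body = build_request("gs://example-bucket/call.wav", ["Chirp", "WaveNet"])
    print("POST", V1_ENDPOINT)
    print(json.dumps(body, indent=2))
```

Keeping the body a plain dictionary means it can be inspected or logged before an authenticated client posts it.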
Free: 60 min/mo (standard model). Standard/WaveNet: $0.024/min. Chirp 2: $0.02/min batch. Chirp 3 HD: $0.03/min. Custom speech: $0.02/min. Video model: higher. Volume discounts via GCP committed use.
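The per-minute rates above turn monthly estimates into simple arithmetic. The sketch below assumes flat per-minute billing at exactly these list prices, applies the 60 free minutes to the standard tier only, and ignores the video model and committed-use discounts. The tier keys are hypothetical labels, not API identifiers.

```python
from decimal import Decimal

# List prices quoted above, in USD per minute (assumed flat, no rounding
# increments; video model and committed-use discounts omitted).
PRICE_PER_MIN = {
    "standard": Decimal("0.024"),
    "chirp_2": Decimal("0.02"),
    "chirp_3_hd": Decimal("0.03"),
    "custom": Decimal("0.02"),
}

# Free tier: 60 minutes/month, standard model only.
FREE_MINUTES = {"standard": Decimal("60")}


def monthly_cost(tier: str, minutes: float) -> Decimal:
    """Estimate one month's bill for a tier, rounded to cents."""
    if tier not in PRICE_PER_MIN:
        raise ValueError(f"unknown tier: {tier}")
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    free = FREE_MINUTES.get(tier, Decimal("0"))
    billable = max(Decimal(str(minutes)) - free, Decimal("0"))
    return (billable * PRICE_PER_MIN[tier]).quantize(Decimal("0.01"))


if __name__ == "__main__":
    for tier in PRICE_PER_MIN:
        print(tier, monthly_cost(tier, 10_000))
```

For 10,000 minutes a month this prints 238.56 for standard (9,940 billable minutes at $0.024), 200.00 for Chirp 2 and custom, and 300.00 for Chirp 3 HD. `Decimal` keeps sub-cent rates like $0.024 exact instead of accumulating float error.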
Related platforms
Amazon Transcribe
Amazon Web Services
AWS's managed STT service, with the deepest AWS ecosystem integration, HIPAA eligibility, call analytics, and a medical model.
AssemblyAI
AssemblyAI
Speech AI platform for transcription and audio intelligence.
Deepgram
Deepgram
Enterprise speech-to-text and voice AI platform.
Gladia
Gladia
Claims #1 async STT accuracy in 2026: Solaria-1 with 29% lower WER, 100+ languages, and EU data residency.