What models are used for Speech-to-text?

This page lists 10 AI and NLP models for the Speech-to-text task, from a range of providers. Each links to its details and source.

How many Speech-to-text models are there?

We catalog 10 Speech-to-text models in one searchable directory.

Which provider has the most Speech-to-text models?

OpenAI has the most, with 5 of the 10 listed models (all GPT-4o releases). Browse the list below to compare providers, or explore models by provider to see a single organization's full catalog.

Speech-to-text Models

Speech-to-text is a machine-learning task covered in our directory, which currently lists 10 AI and NLP models for it. Browse the full list below, or explore models by provider.

Updated July 2026

GPT-4o (Mar 2025)
Tasks: Chat, Image generation, Audio generation, Vision-language generation, Table tasks, Language modeling/generation, Question answering, Speech recognition (ASR), Speech-to-text
Provider: OpenAI

GPT-4o (Jan 2025)
Tasks: Chat, Image generation, Audio generation, Vision-language generation, Table tasks, Language modeling/generation, Question answering, Speech recognition (ASR), Speech-to-text
Provider: OpenAI

GPT-4o (Nov 2024)
Tasks: Chat, Image generation, Audio generation, Vision-language generation, Table tasks, Language modeling/generation, Question answering, Speech recognition (ASR), Speech-to-text
Provider: OpenAI

GPT-4o (Aug 2024)
Tasks: Chat, Image generation, Audio generation, Vision-language generation, Table tasks, Language modeling/generation, Question answering, Speech recognition (ASR), Speech-to-text
Provider: OpenAI

GPT-4o
Tasks: Chat, Image generation, Audio generation, Vision-language generation, Table tasks, Language modeling/generation, Question answering, Speech recognition (ASR), Speech-to-text
Provider: OpenAI

SauTech
Tasks: Speech recognition (ASR), Speech-to-text
Providers: Saudi Data and Artificial Intelligence Authority, Saudi Company for Artificial Intelligence

Qwen3-Omni-30B-A3B
Tasks: Language modeling/generation, Question answering, Visual question answering, Image captioning, Video description, Speech recognition (ASR), Speech synthesis, Speech-to-text, Text-to-speech (TTS)
Provider: Alibaba

Gemini 2.5 Deep Think
Tasks: Language modeling/generation, Mathematical reasoning, Code generation, Visual question answering, Question answering, Visual puzzles, Video description, Speech recognition (ASR), Speech-to-text
Providers: Google, Google DeepMind

Reka Core
Tasks: Chat, Language modeling/generation, Image captioning, Code generation, Code autocompletion, Question answering, Visual question answering, Video description, Speech recognition (ASR), Speech-to-text, Quantitative reasoning
Provider: Reka AI

SeamlessM4T
Tasks: Translation, Speech synthesis, Speech recognition (ASR), Speech-to-text, Speech-to-speech
Providers: Facebook, INRIA, University of California (UC) Berkeley

Browse models by provider

openai (60), Qwen (51), google (42), Google DeepMind (40), NVIDIA (33), Alibaba (32), Meta AI (24), microsoft (23), Stanford University (19), meta-llama (16), Anthropic (16), Tsinghua University (16), deepseek-ai (15), DeepMind (14), ByteDance (13), facebook (13)

Explore other model tasks

Language modeling/generation (222), Question answering (154), text-generation (143), Code generation (95), image-text-to-text (80), Chat (80), Visual question answering (64), Quantitative reasoning (60), translation (55), text-to-image (53), Language modeling (42), Image captioning (36)
