Speech-to-text Models
There are 10 AI and NLP models for Speech-to-text in our directory. Browse the full list below, or explore models by provider.
Updated June 2026
- GPT-4o (Mar 2025). Tasks: Chat, Image generation, Audio generation, Vision-language generation, Table tasks, Language modeling/generation, Question answering, Speech recognition (ASR), Speech-to-text. Provider: OpenAI
- GPT-4o (Jan 2025). Tasks: Chat, Image generation, Audio generation, Vision-language generation, Table tasks, Language modeling/generation, Question answering, Speech recognition (ASR), Speech-to-text. Provider: OpenAI
- GPT-4o (Nov 2024). Tasks: Chat, Image generation, Audio generation, Vision-language generation, Table tasks, Language modeling/generation, Question answering, Speech recognition (ASR), Speech-to-text. Provider: OpenAI
- GPT-4o (Aug 2024). Tasks: Chat, Image generation, Audio generation, Vision-language generation, Table tasks, Language modeling/generation, Question answering, Speech recognition (ASR), Speech-to-text. Provider: OpenAI
- GPT-4o. Tasks: Chat, Image generation, Audio generation, Vision-language generation, Table tasks, Language modeling/generation, Question answering, Speech recognition (ASR), Speech-to-text. Provider: OpenAI
- SauTech. Tasks: Speech recognition (ASR), Speech-to-text. Providers: Saudi Data and Artificial Intelligence Authority, Saudi Company for Artificial Intelligence
- Qwen3-Omni-30B-A3B. Tasks: Language modeling/generation, Question answering, Visual question answering, Image captioning, Video description, Speech recognition (ASR), Speech synthesis, Speech-to-text, Text-to-speech (TTS). Provider: Alibaba
- Gemini 2.5 Deep Think. Tasks: Language modeling/generation, Mathematical reasoning, Code generation, Visual question answering, Question answering, Visual puzzles, Video description, Speech recognition (ASR), Speech-to-text. Providers: Google, Google DeepMind
- Reka Core. Tasks: Chat, Language modeling/generation, Image captioning, Code generation, Code autocompletion, Question answering, Visual question answering, Video description, Speech recognition (ASR), Speech-to-text, Quantitative reasoning. Provider: Reka AI
- SeamlessM4T. Tasks: Translation, Speech synthesis, Speech recognition (ASR), Speech-to-text, Speech-to-speech. Providers: Facebook, INRIA, University of California (UC) Berkeley