Speech & Voice AI
Discover the early-stage Speech & Voice AI ecosystem: investors, accelerators, incubators, fellowships, grants, and global hubs powering next-gen Speech & Voice AI startups.
Scouts
Share promising startups in this sector and get rewarded if they raise. No prior track record needed.
Investors
Access qualified startups curated by Superscout across pre-seed to seed.
Supporters
Work at a company or lab, or for a city? Connect with builders in your space.
Speech and voice AI encompasses the technologies that enable machines to understand, generate, and interact through spoken language: automatic speech recognition (ASR), text-to-speech (TTS), voice cloning, speaker identification, and the conversational voice interfaces that power virtual assistants, call centers, and accessibility tools. Deep learning has transformed the sector. On clean benchmark audio, the best modern ASR systems report word error rates below 5%, approaching human transcribers, and neural TTS can produce speech that listeners often struggle to distinguish from human recordings. ElevenLabs' rapid growth demonstrates the commercial demand for high-quality voice synthesis, while Deepgram, AssemblyAI, and OpenAI (with its Whisper model) compete for the speech-to-text infrastructure layer. Voice AI for call centers represents the largest enterprise market, with companies like PolyAI, Replicant, and Bland AI automating phone conversations that previously required human agents.