Audio

Audio AI splits into three sub-markets: TTS and voice agents (ElevenLabs leads), music generation (Suno, Udio, Lyria), and transcription APIs (Whisper, AssemblyAI, Deepgram). All carry usage-based pricing in some form (metered minutes, tokens, or credit caps), and Flowstate has no direct coverage of any of them today.

What's tracked

| Tool | Vendor | Pricing model | Coverage today | Notes |
| --- | --- | --- | --- | --- |
| ElevenLabs | ElevenLabs | Per-seat + heavily usage-based (TTS, voice cloning, voice agents) | Invisible | Easy to overspend on voice agents. Track via finance contract or AI Agent. |
| Suno | Suno | Per-seat subscription with credit caps | Invisible | Music generation. Manual AI Agent entry. |
| Udio | Udio | Per-seat subscription with credit caps | Invisible | Music generation. |
| Lyria 3 | Google | Per-token via Vertex | Invisible | Could surface via Gemini connector if bundled into Vertex billing. |
| Whisper | OpenAI | Per-token via OpenAI API | Invisible (rolls into OpenAI API spend) | Spend lands on the OpenAI bill; see Foundation APIs. |
| AssemblyAI | AssemblyAI | Per-minute API | Invisible | Per-minute pricing means usage spikes. Manual AI Agent entry. |
| Deepgram | Deepgram | Per-minute API | Invisible | Same as AssemblyAI: track as contract SaaS or AI Agent. |

What Flowstate misses today

All of it. The main economic risk in audio is voice agents: ElevenLabs, in particular, has burned holes in budgets when teams ship 24/7 voice surfaces. If you're running anything in production, model it as an AI Agent with a deliberately high monthly cost estimate derived from observed minutes per day, and revisit the number monthly.
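
As a rough illustration of turning observed minutes per day into a monthly AI Agent figure, here is a minimal sketch. The per-minute rate, the 1.5x headroom multiplier, and the sample usage numbers are all hypothetical placeholders rather than vendor pricing or Flowstate defaults; substitute the rate from your own contract or invoice.

```python
def estimate_monthly_cost(
    minutes_per_day: float,
    rate_per_minute: float,
    headroom: float = 1.5,
    days_per_month: int = 31,
) -> float:
    """Return a deliberately high monthly cost estimate for a voice agent.

    headroom > 1 pads the estimate so that usage spikes do not
    immediately exceed the number entered for the AI Agent.
    """
    if minutes_per_day < 0 or rate_per_minute < 0:
        raise ValueError("minutes_per_day and rate_per_minute must be non-negative")
    if headroom < 1:
        raise ValueError("headroom should be at least 1 for a padded estimate")
    return round(minutes_per_day * rate_per_minute * days_per_month * headroom, 2)


# Hypothetical observed usage over five days, in minutes of voice-agent audio.
observed_minutes = [410, 455, 390, 520, 480]
average_minutes_per_day = sum(observed_minutes) / len(observed_minutes)

# 0.10 per minute is an assumed rate for illustration only.
monthly_estimate = estimate_monthly_cost(average_minutes_per_day, rate_per_minute=0.10)
print(f"Monthly cost to enter for the AI Agent: {monthly_estimate:.2f}")
```

Rerunning this each month with fresh observed minutes is one way to do the monthly revisit; assuming a 31-day month and a headroom factor above 1 keeps the figure on the high side.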

Whisper spend is already on your OpenAI bill, so the right place to look is your foundation API line item rather than a separate Whisper entry.
