Audio

Audio AI splits into three sub-markets: TTS / voice agents (ElevenLabs leads), music generation (Suno, Udio, Lyria), and transcription APIs (Whisper, AssemblyAI, Deepgram). All have a usage-based component, and Flowstate has no coverage of any of them today.

What's tracked

Tool	Vendor	Pricing model	Coverage today	Notes
ElevenLabs	ElevenLabs	Per-seat + heavily usage-based (TTS, voice cloning, voice agents)	Invisible	Easy to overspend on voice agents. Track via finance contract or AI Agent.
Suno	Suno	Per-seat subscription with credit caps	Invisible	Music generation. Manual AI Agent entry.
Udio	Udio	Per-seat subscription with credit caps	Invisible	Music generation.
Lyria 3	Google	Per-token via Vertex	Invisible	Could surface via Gemini connector if bundled into Vertex billing.
Whisper	OpenAI	Per-minute via OpenAI API	Invisible (rolls into OpenAI API spend)	Spend lands on the OpenAI bill; see Foundation APIs.
AssemblyAI	AssemblyAI	Per-minute API	Invisible	Per-minute pricing means cost spikes whenever usage does. Manual AI Agent entry.
Deepgram	Deepgram	Per-minute API	Invisible	Same as AssemblyAI — track as contract SaaS or AI Agent.

What Flowstate misses today

All of it. The main economic risk in audio is voice agents: ElevenLabs in particular has blown through budgets when teams ship 24/7 voice surfaces. If you're running anything in production, model it as an AI Agent with a deliberately high monthly cost estimate based on observed minutes per day, and revisit the number monthly.
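As a rough illustration of the estimate described above, the sketch below turns observed minutes per day into a monthly figure you could enter on an AI Agent. Everything in it is an assumption for illustration only: the function name `monthly_voice_budget`, the per-minute rate, the choice to budget from the busiest observed day, and the 1.25 headroom multiplier. None of it is a Flowstate feature or real vendor pricing; take the rate from your own contract or invoice.

```python
def monthly_voice_budget(minutes_per_day, rate_per_minute, days=30, headroom=1.25):
    """Estimate a deliberately high monthly cost for a voice agent.

    minutes_per_day: observed daily usage in minutes (one number per day).
    rate_per_minute: hypothetical blended price per minute from your contract.
    days:            days in the budgeting month (assumed 30).
    headroom:        assumed safety multiplier on top of the busiest day.
    """
    if not minutes_per_day:
        raise ValueError("need at least one observed day")
    if any(m < 0 for m in minutes_per_day) or rate_per_minute < 0:
        raise ValueError("minutes and rate must be non-negative")

    # Budget from the busiest observed day rather than the average,
    # so a 24/7 surface that ramps up does not outrun the estimate.
    peak_minutes = max(minutes_per_day)
    return round(peak_minutes * rate_per_minute * days * headroom, 2)


# Example with made-up numbers: four observed days, 0.10 per minute.
observed = [400, 520, 610, 480]
estimate = monthly_voice_budget(observed, rate_per_minute=0.10)
print(f"Monthly AI Agent cost to enter: {estimate:.2f}")
```

With these made-up inputs the estimate comes to about 2,287.50 per month. Rerun it with fresh observations when you revisit the number, and lower the headroom once usage has stabilized.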

Whisper spend is already on your OpenAI bill, so the right place to look is your foundation API line item rather than a separate Whisper entry.
