Audio
Audio AI splits into three sub-markets: text-to-speech (TTS) and voice agents (ElevenLabs leads), music generation (Suno, Udio, Lyria), and transcription APIs (Whisper, AssemblyAI, Deepgram). All are usage-based or credit-capped, and Flowstate has no coverage of any of them today.
What's tracked
| Tool | Vendor | Pricing model | Coverage today | Notes |
|---|---|---|---|---|
| ElevenLabs | ElevenLabs | Per-seat + heavily usage-based (TTS, voice cloning, voice agents) | Invisible | Easy to overspend on voice agents. Track via finance contract or AI Agent. |
| Suno | Suno | Per-seat subscription with credit caps | Invisible | Music generation. Manual AI Agent entry. |
| Udio | Udio | Per-seat subscription with credit caps | Invisible | Music generation. |
| Lyria 3 | Google | Per-token via Vertex | Invisible | Could surface via Gemini connector if bundled into Vertex billing. |
| Whisper | OpenAI | Per-minute via OpenAI API | Invisible (rolls into OpenAI API spend) | Spend lands on the OpenAI bill; see Foundation APIs. |
| AssemblyAI | AssemblyAI | Per-minute API | Invisible | Per-minute pricing means spend spikes whenever usage does. Manual AI Agent entry. |
| Deepgram | Deepgram | Per-minute API | Invisible | Same as AssemblyAI — track as contract SaaS or AI Agent. |
What Flowstate misses today
All of it. The main economic risk in audio is voice agents: ElevenLabs in particular has burned holes in budgets when teams ship 24/7 voice surfaces. If you're running anything in production, model it as an AI Agent with an aggressive (that is, deliberately high) monthly cost estimate based on observed minutes per day, and revisit the number monthly.
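As a rough illustration of that estimate, the sketch below turns observed minutes per day into a padded monthly figure. The function name, the 30-day month, the 25% headroom, and the per-minute rate in the usage example are all assumptions made for illustration. None of them are Flowstate features or vendor prices, so substitute the rate from your own contract or invoice.

```python
def monthly_voice_agent_budget(minutes_per_day, rate_per_minute,
                               days_per_month=30, headroom=0.25):
    """Estimate a padded monthly cost for a usage-billed voice agent.

    All parameters are illustrative assumptions:
      minutes_per_day  -- observed average billed minutes per day
      rate_per_minute  -- your contracted price per minute (hypothetical here)
      days_per_month   -- 30 by default, since a 24/7 surface bills every day
      headroom         -- extra fraction added so the estimate errs high
    """
    if minutes_per_day < 0 or rate_per_minute < 0 or headroom < 0:
        raise ValueError("inputs must be non-negative")
    base = minutes_per_day * rate_per_minute * days_per_month
    # Pad the base cost so the AI Agent entry is deliberately pessimistic.
    return round(base * (1 + headroom), 2)


# Hypothetical numbers: 600 billed minutes/day at 0.10 per minute.
estimate = monthly_voice_agent_budget(600, 0.10)
print(estimate)
```

Re-running this each month with fresh minutes-per-day observations is one way to keep the AI Agent entry current. The headroom can be tightened once usage stabilizes.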
Whisper spend is already on your OpenAI bill, so the right place to look is your foundation API line item rather than a separate Whisper entry.