Varosity

Voice over your video

Add narration with ElevenLabs, OpenAI TTS, Deepgram, or Cartesia.


Four providers, one workflow. Pick the right one:

| Provider | Strength | Pricing tier |
| --- | --- | --- |
| ElevenLabs | Highest quality + voice cloning | $$ |
| OpenAI TTS-1 | Fast, reliable, six built-in voices | $ |
| Deepgram Aura 2 | Lowest latency, cheapest | $ |
| Cartesia Sonic 2 | Ultra-low latency, large voice library | $ |
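To make the trade-offs concrete, here is a minimal sketch that encodes the table as data and returns the providers that meet a given need, cheapest tier first. The field names, the `candidates` helper, and the "cheapest first" ordering are illustrative assumptions, not part of Varosity.

```typescript
// Hypothetical encoding of the provider table; names are display labels,
// not Varosity API identifiers.
type Provider = "ElevenLabs" | "OpenAI TTS-1" | "Deepgram Aura 2" | "Cartesia Sonic 2";

interface ProviderInfo {
  name: Provider;
  cloning: boolean;    // table lists voice cloning as a strength
  lowLatency: boolean; // table names (ultra-)low latency as the strength
  priceTier: 1 | 2;    // number of "$" signs in the table
}

const PROVIDERS: ProviderInfo[] = [
  { name: "ElevenLabs", cloning: true, lowLatency: false, priceTier: 2 },
  { name: "OpenAI TTS-1", cloning: false, lowLatency: false, priceTier: 1 },
  { name: "Deepgram Aura 2", cloning: false, lowLatency: true, priceTier: 1 },
  { name: "Cartesia Sonic 2", cloning: false, lowLatency: true, priceTier: 1 },
];

interface Needs {
  cloning?: boolean;
  lowLatency?: boolean;
}

// Keep only providers that satisfy every requested need, then order them
// by price tier (stable sort, so ties keep the table's order).
function candidates(needs: Needs): Provider[] {
  return PROVIDERS
    .filter((p) => (!needs.cloning || p.cloning) && (!needs.lowLatency || p.lowLatency))
    .sort((a, b) => a.priceTier - b.priceTier)
    .map((p) => p.name);
}
```

For example, `candidates({ lowLatency: true })` returns Deepgram Aura 2 and Cartesia Sonic 2, while `candidates({ cloning: true })` returns only ElevenLabs.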

Agents: discover valid voice ids first

Call list_voices before generate_voice. It returns every built-in voice with its voiceModel and exact voiceId (e.g. openai-tts-1 → nova, deepgram-aura-2 → aura-2-thalia-en). Voice ids are provider-specific. Passing an id from the wrong scheme (e.g. aura to Deepgram) safely falls back to that provider's default voice, but only the exact id gets you the voice you want.
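An agent can guard against the silent fallback by checking its chosen id against the list_voices result before calling generate_voice. The sketch below assumes each entry exposes at least `voiceModel` and `voiceId` as described above; the `resolveVoiceId` helper is hypothetical, and how you actually invoke list_voices and generate_voice depends on your agent setup.

```typescript
// Assumed shape of one list_voices entry (only the two fields named in
// this guide; any other fields are ignored).
interface VoiceEntry {
  voiceModel: string; // e.g. "openai-tts-1"
  voiceId: string;    // e.g. "nova"
}

// Hypothetical helper: return the exact id if it exists for this model,
// otherwise throw with the valid ids instead of relying on the fallback.
function resolveVoiceId(voices: VoiceEntry[], voiceModel: string, wanted: string): string {
  const match = voices.find((v) => v.voiceModel === voiceModel && v.voiceId === wanted);
  if (match) return match.voiceId;

  const valid = voices.filter((v) => v.voiceModel === voiceModel).map((v) => v.voiceId);
  throw new Error(`Unknown voice "${wanted}" for ${voiceModel}; valid ids: ${valid.join(", ")}`);
}
```

Failing loudly is a design choice: the fallback keeps generation from breaking, but an agent usually wants to know it picked the wrong id rather than silently get a default voice.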

Pull your voice library

1. Add the provider's key in Settings → Keys.
2. Visit /voices and hit Sync on the provider's row. Your library lands in the grid.
3. Click ▶ on any voice to preview it.

Generate audio for a shot

In the shot inspector, expand Avatar. The audio row has two tabs:

  • Upload: drop in an mp3/wav you already have.
  • From script: paste text, pick a voice, and hit Generate. The selected provider synthesizes the audio (through your own key on BYOK, or with Varosity Credits); the result is uploaded to Storage and attached to the shot in one click.

Voice cloning (ElevenLabs only)

A cloning UI in /voices is coming in v2. Until then, clone the voice in ElevenLabs's own UI, and /voices will sync the cloned voice on the next refresh.

Tips

  • Keep narration scripts under 30 seconds per shot. Longer reads run past the video's duration and get clipped awkwardly.
  • For dialogue (lip-sync to the on-screen face), use **Veo 3.1 + Full avatar mode** instead of TTS over a separate clip. Veo synthesizes the lip motion from the audio.
  • TTS audio attached to a shot becomes its primary audio track in the stitch. Background music ducks under it automatically (see the music guide).
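A rough way to apply the 30-second guideline before generating is to estimate read time from word count. This sketch assumes an average pace of about 150 words per minute, a common narration rule of thumb rather than a figure from any provider; actual pace varies by voice and punctuation, so treat the result as an estimate.

```typescript
// Assumption: ~150 words per minute narration pace (rule of thumb).
const WORDS_PER_MINUTE = 150;
// Guideline from the tip above.
const MAX_SECONDS_PER_SHOT = 30;

// Estimated read time in seconds, based on whitespace-separated words.
function estimateSeconds(script: string, wpm: number = WORDS_PER_MINUTE): number {
  const words = script.trim().split(/\s+/).filter((w) => w.length > 0).length;
  return (words / wpm) * 60;
}

// True if the estimated read fits within maxSeconds (default: 30 s).
function fitsInShot(script: string, maxSeconds: number = MAX_SECONDS_PER_SHOT): boolean {
  return estimateSeconds(script) <= maxSeconds;
}
```

At 150 words per minute, 30 seconds is about 75 words. If you know the shot's actual video duration, pass it as `maxSeconds` instead of the default.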