Picking the right model
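A cheat sheet of strengths, caveats, and per-second pricing for every video, image, voice, and music model on Varosity. Each entry lists the model ID, display name, category and provider, strengths (+), caveats (−), and the price under BYOK (bring your own key) and Varosity Credits billing. Use it as a static reference, or query the same logic via POST /v1/route.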
Let your agent pick automatically
POST /v1/route takes a shot description and returns a ranked recommendation filtered to your configured providers.
44 models
veo-3.1
Veo 3.1
cinematic · fal.ai
- +best lip sync
- +native 4K
- +synced audio
$0.15/s BYOK · $0.158/s Credits
kling-3.0
Kling 3.0 Pro
cinematic · fal.ai
- +multi-shot consistency
- +cinematic motion
- +dialogue close-ups
$0.10/s BYOK · $0.105/s Credits
seedance-4.5
Seedance 4.5
audio-native · fal.ai
- +unified audio-video generation
- +multi-shot from one prompt
- +phoneme-level lip-sync
$0.14/s BYOK · $0.147/s Credits
omnihuman
OmniHuman 1.5
avatar · fal.ai
- +talking avatars
- +from single photo
- +lip-sync from audio
- −needs photo + audio
$0.08/s BYOK · $0.084/s Credits
kling-3.0-replicate
Kling 3.0 Pro (Replicate)
cinematic · Replicate
- +multi-shot consistency
- +cinematic motion
- +cheaper than direct
- −queue latency varies
$0.09/s BYOK · $0.095/s Credits
pika-2.5
Pika 2.5
creator · fal.ai
- +stylized motion
- +character consistency
- +fast
- −less photoreal than Kling
$0.08/s BYOK · $0.084/s Credits
hailuo-02
MiniMax Hailuo 02
cinematic · MiniMax Hailuo
- +physics realism
- +complex motion
- +long prompts
- −international API region latency
$0.11/s BYOK · $0.116/s Credits
elevenlabs-tts
ElevenLabs Multilingual v2
tts · ElevenLabs
- +29 languages
- +emotion control
- +voice cloning
- −streaming latency vs Cartesia
$0.005/s BYOK · $0.005/s Credits
elevenlabs-tts-v3
ElevenLabs v3 (Alpha)
tts · ElevenLabs
- +dialogue style
- +highest expressiveness
- −alpha — quality varies
$0.008/s BYOK · $0.008/s Credits
elevenlabs-sts
ElevenLabs Speech-to-Speech
tts · ElevenLabs
- +voice conversion
- +preserves delivery/performance
$0.002/s BYOK · $0.002/s Credits
elevenlabs-dubbing
ElevenLabs Dubbing
tts · ElevenLabs
- +translate + dub to 30+ languages
- +speaker-aware
$0.0084/s BYOK · $0.009/s Credits
elevenlabs-scribe
ElevenLabs Scribe (Speech-to-Text)
tts · ElevenLabs
- +transcription
- +speaker diarization
- +word-level timestamps
$0.0000611/s BYOK · $0/s Credits
elevenlabs-music
ElevenLabs Music
music · ElevenLabs
- +commercially licensed training data
- +low-latency
- +fast
- −fewer genres than Suno
$0.02/s BYOK · $0.021/s Credits
lyria-2
Google Lyria 2
music · fal.ai
- +instrumental detail
- +long-form composition
- −no vocals
$0.015/s BYOK · $0.016/s Credits
flux-1-schnell
FLUX.1 [schnell]
creator · fal.ai
- +fast (1–3s)
- +good prompt adherence
- +low cost
- −no negative prompts
$0.003/s BYOK · $0.003/s Credits
flux-1.1-pro
FLUX 1.1 Pro
cinematic · Replicate
- +highest-quality Flux
- +strong prompt adherence
- +fine detail
- −~5–10s per image
$0.04/s BYOK · $0.042/s Credits
imagen-4
Imagen 4
cinematic · Google AI Studio
- +photorealism
- +fine detail
- +accurate anatomy
- −slower than Imagen 4 Fast (~10s)
$0.04/s BYOK · $0.042/s Credits
imagen-4-fast
Imagen 4 Fast
cinematic · Google AI Studio
- +fast (~3–5s)
- +photorealism
- +good for drafts + iterations
- −slightly lower detail than Imagen 4
$0.02/s BYOK · $0.021/s Credits
ideogram-v3
Ideogram V3
creator · fal.ai
- +best text-in-image
- +legible typography
- +poster / cover art
- −aspect ratios limited to fixed presets
$0.04/s BYOK · $0.042/s Credits
recraft-v3
Recraft V3
creator · fal.ai
- +long detailed prompts
- +brand-consistent style
- +vector-style output
- −slower than Flux Schnell (~10–20s)
$0.04/s BYOK · $0.042/s Credits
dall-e-3
DALL-E 3
creator · OpenAI
- +strong text-in-image
- +creative compositions
- +photorealism
- −square-leaning aspect ratios
$0.04/s BYOK · $0.042/s Credits
nano-banana
Nano Banana
creator · Replicate
- +very fast
- +good identity preservation
- +cheap
- −less photoreal than Flux Pro
$0.005/s BYOK · $0.005/s Credits
suno-v4
Suno v4
music · muapi.ai
- +best vocal generation
- +genre + mood tags
- +full song structure
- −review licensing before commercial use
$0.025/s · Varosity Credits only
muapi-flux-dev
Flux Dev
creator · muapi.ai
- +12B parameter model
- +strong prompt adherence
- +fast guided distillation
- −slower than Flux Schnell
$0.015/s · Varosity Credits only
muapi-wan-effects
WAN Video Effects
creator · muapi.ai
- +named effect catalog (Cakeify, Squish, VHS, Samurai…)
- +frame consistency
- +platform-funded
- −short clips only (≤10s)
$0.06/s · Varosity Credits only
muapi-latentsync
LatentSync Lip-Sync
avatar · muapi.ai
- +smooth temporal consistency
- +fast inference
- +any video + audio
- −needs pre-existing video
$0.05/s · Varosity Credits only
muapi-wan-t2v
WAN 2.1 Text-to-Video
creator · muapi.ai
- +platform-funded (no BYOK)
- +up to 720p / high quality
- +reliable fallback
- −$0.30/video flat rate
$0.03/s · Varosity Credits only
openai-tts-1
OpenAI TTS-1
tts · OpenAI
- +fast
- +6 voices
- +low latency
- −slightly lower quality than TTS-1 HD
$0.004/s BYOK · $0.004/s Credits
openai-tts-1-hd
OpenAI TTS-1 HD
tts · OpenAI
- +highest OpenAI voice quality
- +6 voices
- +natural prosody
- −2× cost of TTS-1
$0.008/s BYOK · $0.008/s Credits
deepgram-aura-2
Deepgram Aura 2
tts · Deepgram
- +ultra-low latency
- +natural prosody
- +cheap ($0.030/1K chars)
- −English-only in Aura 2
$0.002/s BYOK · $0.002/s Credits
cartesia-sonic-2
Cartesia Sonic 2
tts · Cartesia
- +ultra-low latency (~90ms)
- +natural prosody
- +large public voice library
- −voice ids are library-specific — call list_voices
$0.006/s BYOK · $0.006/s Credits
fish-audio-tts
Fish Audio
tts · Fish Audio
- +multilingual
- +cheap
- +large community voice library
- −voice ids are library-specific — call list_voices
$0.004/s BYOK · $0.004/s Credits
heygen-avatar-4
HeyGen Avatar (Digital Twin)
avatar · HeyGen
- +studio-grade lip sync
- +Avatar IV / Avatar V motion engines
- +voice emotion + speed control
- −requires a pre-trained avatar look id
$0.08/s BYOK · $0.084/s Credits
heygen-photo-avatar
HeyGen Photo Avatar (Avatar IV)
avatar · HeyGen
- +animate ANY photo as the speaker
- +Avatar IV motion engine
- +motion prompt + expressiveness control
- −needs a clear front-facing photo
$0.08/s BYOK · $0.084/s Credits
heygen-cinematic
HeyGen Cinematic Avatar
cinematic · HeyGen
- +prompt-driven cinematic shots
- +blends 1–3 avatar looks into a scene
- +reference videos/images for style
- −4–15s per clip
$0.10/s BYOK · $0.105/s Credits
heygen-video-agent
HeyGen Video Agent
avatar · HeyGen
- +prompt → finished video
- +agent writes script, picks avatar & scenes
- +accepts reference files
- −least granular control
$0.12/s BYOK · $0.126/s Credits
heygen-video-translate
HeyGen Video Translate
avatar · HeyGen
- +multilingual lip-sync dubbing
- +preserves original speaker appearance
- +supports 40+ languages
- −requires source video with clear speech
$0.10/s BYOK · $0.105/s Credits
d-id-talks
D-ID AI Presenter
avatar · D-ID
- +talking avatars from any photo
- +text-to-presenter
- +fast render
- −fixed framing
$0.05/s BYOK · $0.053/s Credits
hunyuan-video
Hunyuan Video
cinematic · fal.ai
- +open-source quality
- +long coherent motion
- +strong physics
- −slow render time (60–120s)
$0.09/s BYOK · $0.095/s Credits
ltx-video
LTX Video
creator · fal.ai
- +fastest open video model (<5s)
- +image-to-video
- +good for iteration
- −lower detail than Kling/Veo
$0.04/s BYOK · $0.042/s Credits
ws-luma-ray-2
Luma Ray 2
cinematic · WaveSpeed
- +fluid motion
- +cinematic quality
- +strong prompt adherence
- −higher cost than Pika
$0.08/s · Varosity Credits only
ws-pika-2.2
Pika 2.2
creator · WaveSpeed
- +fast generation
- +stylized output
- +good character consistency
- −shorter max duration
$0.04/s · Varosity Credits only
ws-hailuo-02
Hailuo 02
cinematic · WaveSpeed
- +physics realism
- +complex motion
- +high resolution
- −fixed 6s duration
$0.08/s · Varosity Credits only
ws-runway-gen4
Runway Gen 4
cinematic · WaveSpeed
- +camera control
- +cinematic motion brush
- +video-to-video
- −requires reference image for best results
$0.01/s · Varosity Credits only
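Because every entry is priced per second, a rough cost check can be done locally before calling the router. The sketch below is illustrative only: the `Price` class, the `PRICES` table, both function names, and the `"credits"` billing-mode string are hypothetical, and the prices are hand-copied from a few of the entries above. Rounding to the nearest cent is an assumption, not documented router behavior.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Price:
    """Per-second prices in USD, copied by hand from the cheat sheet."""
    byok_per_s: Optional[float]  # None for "Varosity Credits only" models
    credits_per_s: float


# Hypothetical local table: a small subset of the entries listed above.
PRICES = {
    "veo-3.1": Price(0.15, 0.158),
    "kling-3.0": Price(0.10, 0.105),
    "seedance-4.5": Price(0.14, 0.147),
    "ws-luma-ray-2": Price(None, 0.08),
}


def estimate_cost_cents(model_id: str, duration_s: float,
                        billing_mode: str = "byok") -> int:
    """Estimate the cost of one clip in cents from the per-second price."""
    price = PRICES[model_id]
    if billing_mode == "byok":
        rate = price.byok_per_s
    elif billing_mode == "credits":  # assumed name for Credits billing
        rate = price.credits_per_s
    else:
        raise ValueError(f"unknown billing mode: {billing_mode}")
    if rate is None:
        raise ValueError(f"{model_id} is available with Varosity Credits only")
    # Assumption: round to the nearest cent.
    return round(rate * duration_s * 100)


def within_budget(duration_s: float, budget_cents: int,
                  billing_mode: str = "byok") -> List[str]:
    """Return model IDs that fit the budget, cheapest first."""
    costs = []
    for model_id, price in PRICES.items():
        if billing_mode == "byok" and price.byok_per_s is None:
            continue  # Credits-only models cannot be billed as BYOK
        cost = estimate_cost_cents(model_id, duration_s, billing_mode)
        if cost <= budget_cents:
            costs.append((cost, model_id))
    return [model_id for _, model_id in sorted(costs)]
```

For an 8-second clip, `estimate_cost_cents("veo-3.1", 8)` gives 120, which agrees with the `estimated_cost_cents` value in the Smart Route API example response.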
Smart Route API
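Skip the cheat sheet. Pass a shot description to /v1/route and get a ranked recommendation filtered to your configured providers. The response names a primary model, an ordered list of fallbacks, and any models that were filtered out, each with the reason.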
// Request
POST https://varosity.ai/api/v1/route
Authorization: Bearer vsk_…
Content-Type: application/json

{
  "shot_description": "close-up of a barista pouring milk, dialogue, cinematic",
  "modality": "video",
  "duration_s": 8,
  "budget_cents": 200
}

// Response
{
  "ok": true,
  "primary": {
    "model_id": "veo-3.1",
    "vendor": "fal",
    "billing_mode": "byok",
    "estimated_cost_cents": 120,
    "reasoning": "Dialogue + close-up + cinematic → lip-sync wins",
    "confidence": 0.95
  },
  "fallbacks": [
    {
      "model_id": "seedance-4.5",
      "billing_mode": "byok",
      "reasoning": "Audio-native, similar quality"
    }
  ],
  "filtered_out": [
    { "model_id": "sora-2", "reason": "no_byok_key" }
  ]
}
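A minimal Python sketch of calling the endpoint, using only the standard library. The URL, headers, and field names come from the request and response above. The function names are hypothetical, and treating a missing or false `ok` as a failure is an assumption, since the error shape is not documented here.

```python
import json
import urllib.request
from typing import Any, Dict, List

ROUTE_URL = "https://varosity.ai/api/v1/route"  # taken from the request example


def build_route_request(api_key: str, shot_description: str, modality: str,
                        duration_s: int, budget_cents: int) -> urllib.request.Request:
    """Build the POST request without sending it."""
    body = {
        "shot_description": shot_description,
        "modality": modality,
        "duration_s": duration_s,
        "budget_cents": budget_cents,
    }
    return urllib.request.Request(
        ROUTE_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def ranked_models(response: Dict[str, Any]) -> List[str]:
    """Return model IDs in the order to try them: primary first, then fallbacks."""
    # Assumption: a response without "ok": true should be treated as a failure.
    if not response.get("ok"):
        raise RuntimeError("route request did not succeed")
    ranked = [response["primary"]["model_id"]]
    ranked += [item["model_id"] for item in response.get("fallbacks", [])]
    return ranked


def route(api_key: str, **params: Any) -> Dict[str, Any]:
    """Send the request and decode the JSON response (performs network I/O)."""
    request = build_route_request(api_key, **params)
    with urllib.request.urlopen(request, timeout=30) as resp:
        return json.load(resp)
```

Keeping request construction and response parsing separate from the network call lets an agent unit-test its routing logic against a saved response. Applied to the example response above, `ranked_models` returns `["veo-3.1", "seedance-4.5"]`.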