
Picking the right model

A cheat sheet of strengths for every video, image, voice, and music model on Varosity. Use it as a static reference, or query the same routing logic through POST /v1/route.

70 models available · BYOK or Varosity Credits · zero markup on BYOK
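The BYOK and Credits prices on the cards below follow a consistent pattern: each Credits rate equals the BYOK rate plus 5%, rounded half-up to three decimal places (for example, $0.15/s BYOK becomes $0.158/s Credits). This relationship is inferred from the listed numbers and is not a formula Varosity documents, so treat the sketch below as a way to sanity-check a rate, not as the billing rule. At this precision, very small BYOK rates round to zero.

```python
from decimal import Decimal, ROUND_HALF_UP

# ASSUMPTION: inferred from the rates listed on this page, not a documented
# Varosity formula. Credits rate = BYOK rate * 1.05, rounded half-up to 3 decimals.
CREDITS_MARKUP = Decimal("1.05")


def credits_rate(byok_rate: str) -> Decimal:
    """Return the per-unit Credits rate implied by a BYOK rate given as a string."""
    raw = Decimal(byok_rate) * CREDITS_MARKUP
    return raw.quantize(Decimal("0.001"), rounding=ROUND_HALF_UP)


# Spot-check a few (BYOK, listed Credits) pairs copied from the cards.
for byok, listed in [("0.15", "0.158"), ("0.2419", "0.254"), ("0.0084", "0.009")]:
    print(byok, credits_rate(byok), credits_rate(byok) == Decimal(listed))
```

Using `Decimal` with an explicit rounding mode avoids binary floating-point surprises on half-way cases such as 0.05 × 1.05 = 0.0525.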

Let your agent pick automatically

POST /v1/route takes a shot description and returns a ranked recommendation filtered to your configured providers.
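A minimal sketch of calling POST /v1/route with only the Python standard library. The URL, headers, and body fields are copied from the example request in the Smart Route API section; the helper names and the `VAROSITY_API_KEY` environment variable are hypothetical, and any optional request fields beyond those four are not covered.

```python
import json
import os
import urllib.request

ROUTE_URL = "https://varosity.ai/api/v1/route"


def build_route_request(shot_description: str, modality: str,
                        duration_s: int, budget_cents: int,
                        api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/route request with a JSON body."""
    body = {
        "shot_description": shot_description,
        "modality": modality,
        "duration_s": duration_s,
        "budget_cents": budget_cents,
    }
    return urllib.request.Request(
        ROUTE_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def route(req: urllib.request.Request, timeout_s: float = 30.0) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(req, timeout=timeout_s) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # VAROSITY_API_KEY is a hypothetical variable name; nothing is sent without it.
    key = os.environ.get("VAROSITY_API_KEY")
    if key:
        req = build_route_request(
            "close-up of a barista pouring milk, dialogue, cinematic",
            "video", 8, 200, key)
        print(route(req)["primary"]["model_id"])
```

Keeping request construction separate from sending makes the request easy to inspect or log before any credits are spent.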



veo-3.1

Veo 3.1

video · Best lip sync

cinematic · fal.ai

lip-sync · audio-native · cinematic · talking-heads

+best lip sync
+native 4K
+synced audio

max 8s · native audio · 16:9 · 9:16 · 1:1

$0.15/s BYOK · $0.158/s Credits
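With per-second pricing, the cost of a clip is the rate multiplied by the number of seconds. The sketch below assumes the listed rate is charged for exactly the requested duration, with no minimums or extra rounding on Varosity's side, and rejects durations above the card's max. For veo-3.1 at $0.15/s, an 8s shot comes to 120 cents, which matches `estimated_cost_cents` in the example /v1/route response.

```python
from decimal import Decimal, ROUND_HALF_UP


def estimate_cost_cents(rate_per_s: str, duration_s: float, max_s: float) -> int:
    """Estimate one clip's cost in cents from a per-second rate.

    ASSUMPTION: cost = rate * requested seconds, nothing else.
    Raises ValueError if the duration exceeds the model's listed max.
    """
    if duration_s <= 0 or duration_s > max_s:
        raise ValueError(f"duration {duration_s}s outside 0-{max_s}s")
    dollars = Decimal(rate_per_s) * Decimal(str(duration_s))
    cents = (dollars * 100).quantize(Decimal("1"), rounding=ROUND_HALF_UP)
    return int(cents)


# veo-3.1 at $0.15/s BYOK for an 8s shot (card max is 8s)
print(estimate_cost_cents("0.15", 8, 8))  # 120
```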

veo-3.1-quality

Veo 3.1 (Quality)

video · Highest quality

cinematic · fal.ai

lip-sync · audio-native · cinematic

+state-of-the-art fidelity
+best lip sync
+native 48kHz audio

max 8s · native audio · 16:9 · 9:16 · 1:1

$0.4/s BYOK · $0.42/s Credits

veo-3.1-lite

Veo 3.1 Lite

video · Budget Veo

cinematic · fal.ai

audio-native · cinematic · budget

+cheapest Veo tier
+native 48kHz audio
+high-volume iteration

max 8s · native audio · 16:9 · 9:16 · 1:1

$0.05/s BYOK · $0.053/s Credits

kling-3.0

Kling 3.0 Pro

video · Best value

cinematic · fal.ai

cinematic · audio-native · budget · multi-shot

+multi-shot consistency
+cinematic motion
+dialogue close-ups

max 10s · native audio · 16:9 · 9:16 · 1:1

$0.1/s BYOK · $0.105/s Credits

grok-imagine-video

Grok Imagine Video

video · Cheapest w/ audio

audio-native · fal.ai

audio-native · budget · fast

+cheapest frontier video with native audio
+synchronized sound + dialogue
+flexible 1–15s duration

max 15s · native audio · 16:9 · 9:16 · 1:1

$0.07/s BYOK · $0.074/s Credits

seedance-4.5

Seedance 4.5

video · Audio-native

audio-native · fal.ai

audio-native · lip-sync · multi-shot

+unified audio-video generation
+multi-shot from one prompt
+phoneme-level lip-sync

max 12s · native audio · 16:9 · 9:16 · 1:1 · 4:5

$0.14/s BYOK · $0.147/s Credits

seedance-2.0

Seedance 2.0

video · Top ranked

cinematic · fal.ai

cinematic · audio-native · physics · camera-control

+top-ranked motion + physics
+cinematic camera control
+native synced audio

max 15s · native audio · 16:9 · 9:16 · 1:1 · 21:9

$0.3024/s BYOK · $0.318/s Credits

seedance-2.0-fast

Seedance 2.0 Fast

video · Best value

cinematic · fal.ai

cinematic · audio-native · budget · physics

+near-flagship quality at lower cost
+faster renders
+native synced audio

max 15s · native audio · 16:9 · 9:16 · 1:1 · 21:9

$0.2419/s BYOK · $0.254/s Credits

sora-2

Sora 2

video · Sunsetting

cinematic · OpenAI

cinematic · physics · premium · deprecating

+cinematic quality
+strong prompt adherence
+complex physics

max 12s · native audio · 16:9 · 9:16

$0.15/s BYOK · $0.158/s Credits

omnihuman

OmniHuman 1.5

video · Avatar layer

avatar · fal.ai

talking-avatar · lip-sync

+talking avatars
+from single photo
+lip-sync from audio
−needs photo + audio

max 60s · 16:9 · 9:16 · 1:1

$0.08/s BYOK · $0.084/s Credits

fal-live-portrait

LivePortrait (Performance Transfer)

video · Performance transfer

avatar · fal.ai

performance-transfer · video-to-video · expression-transfer

+performance transfer from a driving video
+transfers head motion + facial expressions + lip movement
+onto a different person's photo
−face & head only (not full body/hands)

max 60s · 16:9 · 9:16 · 1:1

$0.05/s BYOK · $0.053/s Credits

fal-wan-animate

Wan 2.2 Animate (Full-Body Transfer)

video · Full-body (live)

creator · fal.ai

performance-transfer · video-to-video · full-body · wan-animate

+FULL-BODY motion transfer from a driving video
+transfers whole-body movement + face + expression
+onto a different character image
−~3–4 min render

max 120s · 16:9 · 9:16 · 1:1

$0.1/s BYOK · $0.105/s Credits

kling-3.0-replicate

Kling 3.0 Pro (Replicate)

video

cinematic · Replicate

+multi-shot consistency
+cinematic motion
+cheaper than the kling-3.0 listing on fal.ai
−queue latency varies

max 10s · native audio · 16:9 · 9:16 · 1:1

$0.09/s BYOK · $0.095/s Credits

happyhorse-1.0

HappyHorse 1.0

video · Top ranked

audio-native · fal.ai

cinematic · audio-native · lip-sync

+#1-ranked motion + prompt adherence
+joint audio-video generation
+multilingual lip-sync

max 15s · native audio · 16:9 · 9:16 · 1:1

$0.14/s BYOK · $0.147/s Credits

luma-ray-3-fal

Luma Ray3 (fal)

video

cinematic · fal.ai

cinematic · physics

+native 16-bit HDR
+high-fidelity motion
+strong realism
−price estimated; verify live rate

max 10s · 16:9 · 9:16 · 1:1

$0.12/s BYOK · $0.126/s Credits

runway-gen-4.5-replicate

Runway Gen-4.5 (Replicate)

video

creator · Replicate

camera-control · video-to-video · cinematic

+camera control
+motion brush
+scene consistency
−price estimated; verify live rate

max 10s · 16:9 · 9:16 · 1:1

$0.05/s BYOK · $0.053/s Credits

pika-2.5

Pika 2.5

video

creator · fal.ai

+stylized motion
+character consistency
+fast
−less photoreal than Kling

max 10s · 16:9 · 9:16 · 1:1

$0.08/s BYOK · $0.084/s Credits

hailuo-02

MiniMax Hailuo 02

video

cinematic · MiniMax Hailuo

physics · complex-motion

+physics realism
+complex motion
+long prompts
−added latency from the international API region

max 10s · 16:9 · 9:16 · 1:1

$0.11/s BYOK · $0.116/s Credits

elevenlabs-tts

ElevenLabs Multilingual v2

voice

tts · ElevenLabs

+29 languages
+emotion control
+voice cloning
−higher streaming latency than Cartesia

max 600s · native audio

$0.005/s BYOK · $0.005/s Credits

elevenlabs-tts-v3

ElevenLabs v3 (Alpha)

voice

tts · ElevenLabs

+dialogue style
+highest expressiveness
−alpha; quality varies

max 600s · native audio

$0.008/s BYOK · $0.008/s Credits

elevenlabs-sts

ElevenLabs Speech-to-Speech

voice

tts · ElevenLabs

+voice conversion
+preserves delivery/performance

max 600s · native audio

$0.002/s BYOK · $0.002/s Credits

elevenlabs-dubbing

ElevenLabs Dubbing

voice

tts · ElevenLabs

+translate + dub to 30+ languages
+speaker-aware

max 2700s · native audio

$0.0084/s BYOK · $0.009/s Credits

elevenlabs-scribe

ElevenLabs Scribe (Speech-to-Text)

voice

tts · ElevenLabs

+transcription
+speaker diarization
+word-level timestamps

max 7200s · native audio

$0.0000611/s BYOK · <$0.001/s Credits

elevenlabs-music

ElevenLabs Music

music · Commercially safe

music · ElevenLabs

+commercially licensed training data
+low-latency
+fast
−fewer genres than Suno

max 300s · native audio

$0.02/s BYOK · $0.021/s Credits

lyria-2

Google Lyria 2

music

music · fal.ai

+instrumental detail
+long-form composition
−no vocals

max 120s · native audio

$0.015/s BYOK · $0.016/s Credits

flux-1-schnell

FLUX.1 [schnell]

image

creator · fal.ai

+fast (1–3s)
+good prompt adherence
+low cost
−no negative prompts

16:9 · 9:16 · 1:1 · 4:5 · 21:9

$0.003/image BYOK · $0.003/image Credits

flux-2-pro

FLUX.2 [pro]

image

creator · fal.ai

+current-gen FLUX quality
+strong prompt adherence
+multi-reference editing
−pricier than schnell

16:9 · 9:16 · 1:1 · 4:5 · 21:9

$0.03/image BYOK · $0.032/image Credits

flux-2-klein

FLUX.2 [klein]

image

creator · fal.ai

budget · fast · open-weight

+current-gen FLUX at draft price
+crisper text than FLUX.1
+fast (4-step)
−below flux-2-pro fidelity

16:9 · 9:16 · 1:1 · 4:5 · 21:9

$0.006/image BYOK · $0.006/image Credits

z-image-turbo

Z-Image Turbo

image · Cheapest

creator · fal.ai

open-weight · budget · fast

+ultra-cheap (~$0.01)
+~1s render
+open-weight
−draft quality; not for hero shots

16:9 · 9:16 · 1:1 · 4:5

$0.01/image BYOK · $0.011/image Credits

seedream-4

Seedream 4.0

image

creator · fal.ai

photorealism · budget

+photorealism
+strong composition
+low cost

16:9 · 9:16 · 1:1 · 4:5

$0.03/image BYOK · $0.032/image Credits

flux-1.1-pro

FLUX 1.1 Pro

image

cinematic · Replicate

+highest-quality Flux
+strong prompt adherence
+fine detail
−~5–10s per image

1:1 · 16:9 · 9:16 · 4:5 · 21:9

$0.04/image BYOK · $0.042/image Credits

grok-imagine-image

Grok Imagine Image

image

creator · fal.ai

budget · fast · photorealism

+cheap ($0.02/image)
+photorealism
+fast

16:9 · 9:16 · 1:1 · 4:5 · 21:9

$0.02/image BYOK · $0.021/image Credits

gemini-3-pro-image

Nano Banana Pro (Gemini 3 Pro Image)

image · Recommended

creator · Google AI Studio

typography · text-in-image · recommended · photoreal

+best-in-class legible in-image text
+multilingual text
+reasoning-driven composition
−base64 only; no hosted URL

1:1 · 16:9 · 9:16 · 4:5 · 21:9

$0.134/image BYOK · $0.141/image Credits

gemini-3.1-flash-image

Nano Banana 2 (Gemini 3.1 Flash Image)

image

creator · Google AI Studio

photoreal · typography · fast

+production-scale quality at flash speed
+legible in-image text
+up to 4K
−base64 only; no hosted URL

1:1 · 16:9 · 9:16 · 4:5 · 21:9

$0.067/image BYOK · $0.07/image Credits

gemini-3.1-flash-lite-image

Nano Banana 2 Lite (Gemini 3.1 Flash Lite Image)

image · Fast

creator · Google AI Studio

budget · fast

+ultra-low latency
+cheapest Google tier
+high-volume iteration
−base64 only; no hosted URL

1:1 · 16:9 · 9:16 · 4:5 · 21:9

$0.034/image BYOK · $0.036/image Credits
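The three Gemini image models above return base64 only, with no hosted URL, so the caller has to decode and store (or upload) the bytes itself. A minimal sketch, assuming the image arrives either as a plain base64 string or as a `data:` URI; where that string sits in the response is not documented on this page, and `decode_image_b64` and `save_image` are hypothetical helper names.

```python
import base64
import binascii
from pathlib import Path

PNG_MAGIC = b"\x89PNG\r\n\x1a\n"
JPEG_MAGIC = b"\xff\xd8\xff"


def decode_image_b64(payload: str) -> bytes:
    """Decode a base64 image string, accepting an optional data: URI prefix."""
    if payload.startswith("data:"):
        # e.g. "data:image/png;base64,iVBOR..." -> keep only the part after the comma
        payload = payload.split(",", 1)[1]
    try:
        # validate=True rejects characters outside the base64 alphabet
        return base64.b64decode(payload, validate=True)
    except binascii.Error as exc:
        raise ValueError("not valid base64 image data") from exc


def save_image(payload: str, stem: str) -> Path:
    """Write the decoded bytes to <stem>.png / .jpg, chosen from the magic bytes."""
    data = decode_image_b64(payload)
    if data.startswith(PNG_MAGIC):
        suffix = ".png"
    elif data.startswith(JPEG_MAGIC):
        suffix = ".jpg"
    else:
        suffix = ".bin"  # unknown format; inspect before using
    path = Path(stem).with_suffix(suffix)
    path.write_bytes(data)
    return path
```

Picking the extension from the file's magic bytes, rather than trusting a declared MIME type, keeps saved files openable even if the payload lacks a `data:` prefix.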

imagen-4

Imagen 4

image

cinematic · Google AI Studio

deprecating

+photorealism
+fine detail
+accurate anatomy
−DEPRECATED: the Imagen 4 line shuts down 2026-08-17; use gemini-3-pro-image

1:1 · 16:9 · 9:16 · 4:5

$0.04/image BYOK · $0.042/image Credits

imagen-4-fast

Imagen 4 Fast

image · Fast

cinematic · Google AI Studio

deprecating

+fast (~3–5s)
+photorealism
+good for drafts + iterations
−slightly lower detail than Imagen 4

1:1 · 16:9 · 9:16 · 4:5

$0.02/image BYOK · $0.021/image Credits

ideogram-v3

Ideogram V3

image

creator · fal.ai

typography

+best text-in-image
+legible typography
+poster / cover art
−aspect ratios limited to fixed presets

1:1 · 16:9 · 9:16 · 4:5

$0.04/image BYOK · $0.042/image Credits

recraft-v3

Recraft V3

image

creator · fal.ai

+long detailed prompts
+brand-consistent style
+vector-style output
−slower than Flux Schnell (~10–20s)

1:1 · 16:9 · 9:16 · 4:5

$0.04/image BYOK · $0.042/image Credits

dall-e-3

DALL-E 3

image

creator · OpenAI

typography · creative

+strong text-in-image
+creative compositions
+photorealism
−square-leaning aspect ratios

1:1 · 16:9 · 9:16

$0.04/image BYOK · $0.042/image Credits

gpt-image-2

GPT Image 2

image

creator · OpenAI

typography · creative · photorealism

+best-in-class instruction following
+accurate in-image text
+photorealistic detail
−rate-limited per OpenAI plan

1:1 · 16:9 · 9:16

$0.08/image BYOK · $0.084/image Credits

nano-banana

Nano Banana

image

creator · Replicate

+very fast
+good identity preservation
+cheap
−less photoreal than Flux Pro

1:1 · 16:9 · 9:16 · 4:5

$0.005/image BYOK · $0.005/image Credits

suno-v4

Suno v4

music · Varosity Credits

music · muapi.ai

+best vocal generation
+genre + mood tags
+full song structure
−review licensing before commercial use

max 240s · native audio

$0.025/s · Varosity Credits only
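Cards marked "Varosity Credits only", such as Suno v4 above, cannot run on your own key, while most other cards accept either BYOK or Credits. The sketch below encodes one plausible client-side rule: prefer BYOK when you hold the provider's key (BYOK carries zero markup), otherwise fall back to Credits if you allow it. The rule, the `ModelBilling` type, and the literal `"credits"` value are assumptions for illustration; only `"byok"` appears as a `billing_mode` value in the example /v1/route response.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelBilling:
    model_id: str
    provider: str          # provider shown on the card, e.g. "fal.ai"
    byok_supported: bool   # False for "Varosity Credits only" cards


def pick_billing_mode(m: ModelBilling, byok_providers: set[str],
                      credits_enabled: bool) -> Optional[str]:
    """Hypothetical rule: BYOK if you hold the provider key, else Credits if allowed.

    Returns None when the model cannot be used under either mode.
    """
    if m.byok_supported and m.provider in byok_providers:
        return "byok"
    if credits_enabled:
        return "credits"  # assumed label; not shown in the documented response
    return None


suno = ModelBilling("suno-v4", "muapi.ai", byok_supported=False)
print(pick_billing_mode(suno, {"fal.ai"}, credits_enabled=True))
```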

muapi-flux-dev

Flux Dev

image · Varosity Credits

creator · muapi.ai

+12B-parameter model
+strong prompt adherence
+fast guided distillation
−slower than Flux Schnell

16:9 · 9:16 · 1:1 · 4:5 · 21:9

$0.015/image · Varosity Credits only

muapi-wan-effects

WAN Video Effects

video · Varosity Credits

creator · muapi.ai

+named effect catalog (Cakeify, Squish, VHS, Samurai…)
+frame consistency
+platform-funded
−short clips only (≤10s)

max 10s · 16:9 · 9:16

$0.06/s · Varosity Credits only

muapi-latentsync

LatentSync Lip-Sync

video · Varosity Credits

avatar · muapi.ai

lip-sync · video-to-video · budget

+smooth temporal consistency
+fast inference
+any video + audio
−needs pre-existing video

max 60s · native audio · 16:9 · 9:16 · 1:1

$0.05/s · Varosity Credits only

muapi-wan-t2v

WAN 2.1 Text-to-Video

video · Varosity Credits

creator · muapi.ai

+platform-funded (no BYOK)
+up to 720p / high quality
+reliable fallback
−$0.30/video flat rate

max 10s · 16:9 · 9:16

$0.03/s · Varosity Credits only

openai-tts-1

OpenAI TTS-1

voice

tts · OpenAI

+fast
+6 voices
+low latency
−slightly lower quality than TTS-1 HD

max 600s · native audio

$0.004/s BYOK · $0.004/s Credits

openai-tts-1-hd

OpenAI TTS-1 HD

voice · HD quality

tts · OpenAI

+highest OpenAI voice quality
+6 voices
+natural prosody
−2× cost of TTS-1

max 600s · native audio

$0.008/s BYOK · $0.008/s Credits

deepgram-aura-2

Deepgram Aura 2

voice · Lowest latency

tts · Deepgram

+ultra-low latency
+natural prosody
+cheap ($0.030/1K chars)
−English-only in Aura 2

max 600s · native audio

$0.002/s BYOK · $0.002/s Credits

cartesia-sonic-2

Cartesia Sonic 2

voice

tts · Cartesia

+ultra-low latency (~90ms)
+natural prosody
+large public voice library
−voice ids are library-specific; call list_voices

max 600s · native audio

$0.006/s BYOK · $0.006/s Credits

cartesia-sonic-3

Cartesia Sonic 3

voice

tts · Cartesia

+~90ms time to first audio (TTFA)
+42 languages
+AI laughter + emotion
−voice ids are library-specific; call list_voices

max 600s · native audio

$0.006/s BYOK · $0.006/s Credits

fal-kokoro

Kokoro (fal)

voice

tts · fal.ai

open-weight · budget

+open-weight (MIT)
+runs on CPU
+very cheap
−fewer expressive controls than ElevenLabs

max 600s · native audio

$0.0008/s BYOK · $0.001/s Credits

fal-chatterbox

Chatterbox (fal)

voice

tts · fal.ai

open-weight

+open-weight (MIT)
+expressive + emotion control
+instant voice cloning
−voiceId is a clone-sample URL, not a library id

max 600s · native audio

$0.001/s BYOK · $0.001/s Credits

fish-audio-tts

Fish Audio

voice

tts · Fish Audio

+multilingual
+cheap
+large community voice library
−voice ids are library-specific; call list_voices

max 600s · native audio

$0.004/s BYOK · $0.004/s Credits

heygen-avatar-4

HeyGen Avatar (Digital Twin)

video

avatar · HeyGen

talking-avatar · lip-sync · premium

+studio-grade lip sync
+Avatar IV / Avatar V motion engines
+voice emotion + speed control
−requires a pre-trained avatar look id

max 300s · native audio · 16:9 · 9:16 · 1:1 · 4:5

$0.08/s BYOK · $0.084/s Credits

heygen-photo-avatar

HeyGen Photo Avatar (Avatar IV)

video

avatar · HeyGen

talking-avatar · lip-sync · photo-to-video

+animate ANY photo as the speaker
+Avatar IV motion engine
+motion prompt + expressiveness control
−needs a clear front-facing photo

max 300s · native audio · 16:9 · 9:16 · 1:1 · 4:5

$0.08/s BYOK · $0.084/s Credits

heygen-cinematic

HeyGen Cinematic Avatar

video

cinematic · HeyGen

talking-avatar · cinematic

+prompt-driven cinematic shots
+blends 1–3 avatar looks into a scene
+reference videos/images for style
−clip length limited to 4–15s

max 15s · native audio · 16:9 · 9:16 · 1:1

$0.1/s BYOK · $0.105/s Credits

heygen-video-agent

HeyGen Video Agent

video

avatar · HeyGen

talking-avatar · agent

+prompt → finished video
+agent writes script, picks avatar & scenes
+accepts reference files
−least granular control

max 600s · native audio · 16:9 · 9:16

$0.12/s BYOK · $0.126/s Credits

heygen-video-translate

HeyGen Video Translate

video

avatar · HeyGen

talking-avatar · lip-sync · translation · video-to-video

+multilingual lip-sync dubbing
+preserves original speaker appearance
+supports 40+ languages
−requires source video with clear speech

max 600s · native audio · 16:9 · 9:16 · 1:1

$0.1/s BYOK · $0.105/s Credits

d-id-talks

D-ID AI Presenter

video

avatar · D-ID

talking-avatar · budget

+talking avatars from any photo
+text-to-presenter
+fast render
−fixed framing

max 300s · native audio · 16:9 · 9:16 · 1:1

$0.05/s BYOK · $0.053/s Credits

hunyuan-video

Hunyuan Video

video

cinematic · fal.ai

physics · cinematic

+open-source quality
+long coherent motion
+strong physics
−slow render time (60–120s)

max 10s · 16:9 · 9:16 · 1:1

$0.09/s BYOK · $0.095/s Credits

ltx-video

LTX Video

video · Fastest

creator · fal.ai

fast

+fastest open video model (<5s)
+image-to-video
+good for iteration
−lower detail than Kling/Veo

max 5s · 16:9 · 9:16 · 1:1

$0.04/s BYOK · $0.042/s Credits

wan-2.6

Wan 2.6

video

creator · fal.ai

open-weight · budget

+open-weight (Apache-2.0)
+strong motion
+self-hostable

max 8s · 16:9 · 9:16 · 1:1

$0.06/s BYOK · $0.063/s Credits

hunyuan-video-1.5

HunyuanVideo 1.5

video

cinematic · fal.ai

open-weight · physics

+open-weight (Apache-2.0)
+long coherent motion
+strong physics
−slower render

max 10s · 16:9 · 9:16 · 1:1

$0.09/s BYOK · $0.095/s Credits

ltx-2

LTX-2

video · Fastest open

creator · fal.ai

open-weight · fast

+open-weight (Apache-2.0)
+very fast
+image-to-video

max 8s · 16:9 · 9:16 · 1:1

$0.05/s BYOK · $0.053/s Credits

ws-luma-ray-2

Luma Ray 2

video · Varosity Credits

cinematic · WaveSpeed

+fluid motion
+cinematic quality
+strong prompt adherence
−higher cost than Pika

max 10s · 16:9 · 9:16

$0.08/s · Varosity Credits only

ws-pika-2.2

Pika 2.2

video · Varosity Credits

creator · WaveSpeed

+fast generation
+stylized output
+good character consistency
−shorter max duration

max 10s · 16:9 · 9:16

$0.04/s · Varosity Credits only

ws-hailuo-02

Hailuo 02

video · Varosity Credits

cinematic · WaveSpeed

physics · complex-motion

+physics realism
+complex motion
+high resolution
−fixed 6s duration

max 6s · 16:9 · 9:16

$0.08/s · Varosity Credits only

ws-runway-gen4

Runway Gen 4

video · Varosity Credits

cinematic · WaveSpeed

camera-control · video-to-video

+camera control
+cinematic motion brush
+video-to-video
−requires reference image for best results

max 10s · 16:9 · 9:16 · 1:1

$0.01/s · Varosity Credits only
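To use the cards above as a static reference in code, one option is to copy the fields you need into a small table and filter it locally. The sketch below is a deliberately naive heuristic (hard filters on tags, max duration, budget, and the providers you hold keys for, then cheapest first) and is not the ranking /v1/route uses. The `VideoCard` type and the four-card subset are illustrative, with values copied from the cards.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class VideoCard:
    model_id: str
    provider: str
    tags: frozenset[str]
    max_s: int
    byok_rate: Decimal  # $/s, from the card's BYOK price


# Hand-copied subset of the cards above.
CARDS = [
    VideoCard("veo-3.1", "fal.ai",
              frozenset({"lip-sync", "audio-native", "cinematic", "talking-heads"}), 8, Decimal("0.15")),
    VideoCard("seedance-4.5", "fal.ai",
              frozenset({"audio-native", "lip-sync", "multi-shot"}), 12, Decimal("0.14")),
    VideoCard("kling-3.0", "fal.ai",
              frozenset({"cinematic", "audio-native", "budget", "multi-shot"}), 10, Decimal("0.1")),
    VideoCard("sora-2", "OpenAI",
              frozenset({"cinematic", "physics", "premium", "deprecating"}), 12, Decimal("0.15")),
]


def shortlist(cards: list[VideoCard], needed_tags: set[str], duration_s: int,
              budget_cents: int, providers: set[str]) -> list[str]:
    """Keep cards with every needed tag that fit duration, budget, and providers.

    Sorted by estimated cost (then model id); ignores quality entirely.
    """
    picks = []
    for c in cards:
        cost_cents = c.byok_rate * duration_s * 100
        if (needed_tags <= c.tags and duration_s <= c.max_s
                and cost_cents <= budget_cents and c.provider in providers):
            picks.append((cost_cents, c.model_id))
    return [model_id for _, model_id in sorted(picks)]


print(shortlist(CARDS, {"lip-sync"}, 8, 200, {"fal.ai"}))
```

For the barista example (lip-sync, 8s, 200-cent budget, fal.ai key only) this puts seedance-4.5 ahead of veo-3.1 purely on price, whereas the example /v1/route response ranks veo-3.1 first. Quality signals such as "Best lip sync" are exactly what a cost-only sort misses.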

Smart Route API

Skip the cheat sheet. Pass a shot description to /v1/route and get a ranked recommendation filtered to your configured providers.

// Request

POST https://varosity.ai/api/v1/route
Authorization: Bearer vsk_…
Content-Type: application/json

{
  "shot_description": "close-up of a barista pouring milk, dialogue, cinematic",
  "modality": "video",
  "duration_s": 8,
  "budget_cents": 200
}

// Response

{
  "ok": true,
  "primary": {
    "model_id": "veo-3.1",
    "vendor": "fal",
    "billing_mode": "byok",
    "estimated_cost_cents": 120,
    "reasoning": "Dialogue + close-up + cinematic → lip-sync wins",
    "confidence": 0.95
  },
  "fallbacks": [
    { "model_id": "seedance-4.5", "billing_mode": "byok",
      "reasoning": "Audio-native, similar quality" }
  ],
  "filtered_out": [
    { "model_id": "sora-2", "reason": "no_byok_key" }
  ]
}
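A client will usually try the primary model first and walk the fallbacks in order if a render fails. A minimal sketch of reading the response above, using a trimmed copy of it (the `reasoning` strings are omitted). What an `"ok": false` response contains is not shown on this page, so the error handling here is an assumption, and the helper names are hypothetical.

```python
import json
from typing import Any


def try_order(resp: dict[str, Any]) -> list[str]:
    """Model ids in the order to try them: primary first, then fallbacks."""
    if not resp.get("ok"):
        # ASSUMPTION: the shape of an error response is undocumented here.
        raise RuntimeError(f"route failed: {resp}")
    order = [resp["primary"]["model_id"]]
    order += [fb["model_id"] for fb in resp.get("fallbacks", [])]
    return order


def skipped_reasons(resp: dict[str, Any]) -> dict[str, str]:
    """Map each filtered-out model id to the reason it was excluded."""
    return {f["model_id"]: f["reason"] for f in resp.get("filtered_out", [])}


EXAMPLE = json.loads("""
{"ok": true,
 "primary": {"model_id": "veo-3.1", "vendor": "fal", "billing_mode": "byok",
             "estimated_cost_cents": 120, "confidence": 0.95},
 "fallbacks": [{"model_id": "seedance-4.5", "billing_mode": "byok"}],
 "filtered_out": [{"model_id": "sora-2", "reason": "no_byok_key"}]}
""")

print(try_order(EXAMPLE))        # ['veo-3.1', 'seedance-4.5']
print(skipped_reasons(EXAMPLE))  # {'sora-2': 'no_byok_key'}
```

Surfacing `filtered_out` reasons such as `no_byok_key` lets you tell users which key to add to unlock a model.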

Full API docs · llms.txt · routing.json