Varosity


Picking the right model

A strengths cheat sheet across every video, image, voice, and music model on Varosity. Use it as a static reference, or query the same logic via POST /v1/route.

44 models available · BYOK or Varosity Credits · Zero markup on BYOK
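For static-reference use, each card below reduces to a handful of fields: model ID, modality, tags, maximum duration, and price. The sketch below copies four video cards into a small table and filters them locally. The `ModelCard` type and `shortlist` function are hypothetical names introduced for illustration, and the cheapest-first ordering is an assumption, not the ranking that POST /v1/route applies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelCard:
    """One cheat-sheet card, reduced to the fields used for filtering."""
    model_id: str
    modality: str          # "video", "image", "voice", or "music"
    tags: frozenset
    max_seconds: int
    byok_usd_per_s: float


# Four video cards copied from the listing below; the rest are omitted for brevity.
CARDS = [
    ModelCard("veo-3.1", "video",
              frozenset({"lip-sync", "audio-native", "cinematic", "talking-heads"}), 8, 0.15),
    ModelCard("kling-3.0", "video",
              frozenset({"cinematic", "audio-native", "budget", "multi-shot"}), 10, 0.10),
    ModelCard("seedance-4.5", "video",
              frozenset({"audio-native", "lip-sync", "multi-shot"}), 12, 0.14),
    ModelCard("ltx-video", "video", frozenset({"fast"}), 5, 0.04),
]


def shortlist(cards, modality, required_tags, duration_s):
    """Keep cards of the given modality that carry every required tag and
    support the requested duration, sorted cheapest first (an assumed ordering)."""
    wanted = set(required_tags)
    matches = [
        c for c in cards
        if c.modality == modality
        and wanted <= c.tags
        and c.max_seconds >= duration_s
    ]
    return sorted(matches, key=lambda c: c.byok_usd_per_s)


print([c.model_id for c in shortlist(CARDS, "video", {"lip-sync"}, 8)])
# ['seedance-4.5', 'veo-3.1']
```

A local filter like this only narrows by hard constraints. It does not weigh the per-model strengths and caveats listed on each card.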

Let your agent pick automatically

POST /v1/route takes a shot description and returns a ranked recommendation filtered to your configured providers.

44 models

veo-3.1

Veo 3.1

video · Best lip sync

cinematic · fal.ai

lip-sync · audio-native · cinematic · talking-heads
  • +best lip sync
  • +native 4K
  • +synced audio
max 8s · native audio · 16:9 · 9:16 · 1:1

$0.15/s BYOK  ·  $0.158/s Credits

kling-3.0

Kling 3.0 Pro

video · Best value

cinematic · fal.ai

cinematic · audio-native · budget · multi-shot
  • +multi-shot consistency
  • +cinematic motion
  • +dialogue close-ups
max 10s · native audio · 16:9 · 9:16 · 1:1

$0.10/s BYOK  ·  $0.105/s Credits

seedance-4.5

Seedance 4.5

video · Audio-native

audio-native · fal.ai

audio-native · lip-sync · multi-shot
  • +unified audio-video generation
  • +multi-shot from one prompt
  • +phoneme-level lip-sync
max 12s · native audio · 16:9 · 9:16 · 1:1 · 4:5

$0.14/s BYOK  ·  $0.147/s Credits

omnihuman

OmniHuman 1.5

video · Avatar layer

avatar · fal.ai

talking-avatar · lip-sync
  • +talking avatars
  • +from single photo
  • +lip-sync from audio
  • needs photo + audio
max 60s · 16:9 · 9:16 · 1:1

$0.08/s BYOK  ·  $0.084/s Credits

kling-3.0-replicate

Kling 3.0 Pro (Replicate)

video

cinematic · Replicate

  • +multi-shot consistency
  • +cinematic motion
  • +cheaper than direct
  • queue latency varies
max 10s · native audio · 16:9 · 9:16 · 1:1

$0.09/s BYOK  ·  $0.095/s Credits

pika-2.5

Pika 2.5

video

creator · fal.ai

  • +stylized motion
  • +character consistency
  • +fast
  • less photoreal than Kling
max 10s · 16:9 · 9:16 · 1:1

$0.08/s BYOK  ·  $0.084/s Credits

hailuo-02

MiniMax Hailuo 02

video

cinematic · MiniMax Hailuo

physics · complex-motion
  • +physics realism
  • +complex motion
  • +long prompts
  • international API region latency
max 10s · 16:9 · 9:16 · 1:1

$0.11/s BYOK  ·  $0.116/s Credits

elevenlabs-tts

ElevenLabs Multilingual v2

voice

tts · ElevenLabs

  • +29 languages
  • +emotion control
  • +voice cloning
  • higher streaming latency than Cartesia
max 600s · native audio

$0.005/s BYOK  ·  $0.005/s Credits

elevenlabs-tts-v3

ElevenLabs v3 (Alpha)

voice

tts · ElevenLabs

  • +dialogue style
  • +highest expressiveness
  • alpha — quality varies
max 600s · native audio

$0.008/s BYOK  ·  $0.008/s Credits

elevenlabs-sts

ElevenLabs Speech-to-Speech

voice

tts · ElevenLabs

  • +voice conversion
  • +preserves delivery/performance
max 600s · native audio

$0.002/s BYOK  ·  $0.002/s Credits

elevenlabs-dubbing

ElevenLabs Dubbing

voice

tts · ElevenLabs

  • +translate + dub to 30+ languages
  • +speaker-aware
max 2700s · native audio

$0.0084/s BYOK  ·  $0.009/s Credits

elevenlabs-scribe

ElevenLabs Scribe (Speech-to-Text)

voice

tts · ElevenLabs

  • +transcription
  • +speaker diarization
  • +word-level timestamps
max 7200s · native audio

$0.0000611/s BYOK  ·  $0/s Credits

elevenlabs-music

ElevenLabs Music

music · Commercially safe

music · ElevenLabs

  • +commercially licensed training data
  • +low-latency
  • +fast
  • fewer genres than Suno
max 300s · native audio

$0.02/s BYOK  ·  $0.021/s Credits

lyria-2

Google Lyria 2

music

music · fal.ai

  • +instrumental detail
  • +long-form composition
  • no vocals
max 120s · native audio

$0.015/s BYOK  ·  $0.016/s Credits

flux-1-schnell

FLUX.1 [schnell]

image

creator · fal.ai

  • +fast (1–3s)
  • +good prompt adherence
  • +low cost
  • no negative prompts
16:9 · 9:16 · 1:1 · 4:5 · 21:9

$0.003/s BYOK  ·  $0.003/s Credits

flux-1.1-pro

FLUX 1.1 Pro

image

cinematic · Replicate

  • +highest-quality Flux
  • +strong prompt adherence
  • +fine detail
  • ~5–10s per image
1:1 · 16:9 · 9:16 · 4:5 · 21:9

$0.04/s BYOK  ·  $0.042/s Credits

imagen-4

Imagen 4

image

cinematic · Google AI Studio

  • +photorealism
  • +fine detail
  • +accurate anatomy
  • slower than Imagen 4 Fast (~10s)
1:1 · 16:9 · 9:16 · 4:5

$0.04/s BYOK  ·  $0.042/s Credits

imagen-4-fast

Imagen 4 Fast

image · Fast

cinematic · Google AI Studio

fast
  • +fast (~3–5s)
  • +photorealism
  • +good for drafts + iterations
  • slightly lower detail than Imagen 4
1:1 · 16:9 · 9:16 · 4:5

$0.02/s BYOK  ·  $0.021/s Credits

ideogram-v3

Ideogram V3

image

creator · fal.ai

typography
  • +best text-in-image
  • +legible typography
  • +poster / cover art
  • aspect ratios limited to fixed presets
1:1 · 16:9 · 9:16 · 4:5

$0.04/s BYOK  ·  $0.042/s Credits

recraft-v3

Recraft V3

image

creator · fal.ai

  • +long detailed prompts
  • +brand-consistent style
  • +vector-style output
  • slower than Flux Schnell (~10–20s)
1:1 · 16:9 · 9:16 · 4:5

$0.04/s BYOK  ·  $0.042/s Credits

dall-e-3

DALL-E 3

image

creator · OpenAI

typography · creative
  • +strong text-in-image
  • +creative compositions
  • +photorealism
  • square-leaning aspect ratios
1:1 · 16:9 · 9:16

$0.04/s BYOK  ·  $0.042/s Credits

nano-banana

Nano Banana

image

creator · Replicate

  • +very fast
  • +good identity preservation
  • +cheap
  • less photoreal than Flux Pro
1:1 · 16:9 · 9:16 · 4:5

$0.005/s BYOK  ·  $0.005/s Credits

suno-v4

Suno v4

music · Varosity Credits

music · muapi.ai

  • +best vocal generation
  • +genre + mood tags
  • +full song structure
  • review licensing before commercial use
max 240s · native audio

$0.025/s · Varosity Credits only

muapi-flux-dev

Flux Dev

image · Varosity Credits

creator · muapi.ai

  • +12B parameter model
  • +strong prompt adherence
  • +fast guided distillation
  • slower than Flux Schnell
16:9 · 9:16 · 1:1 · 4:5 · 21:9

$0.015/s · Varosity Credits only

muapi-wan-effects

WAN Video Effects

video · Varosity Credits

creator · muapi.ai

  • +named effect catalog (Cakeify, Squish, VHS, Samurai…)
  • +frame consistency
  • +platform-funded
  • short clips only (≤10s)
max 10s · 16:9 · 9:16

$0.06/s · Varosity Credits only

muapi-latentsync

LatentSync Lip-Sync

video · Varosity Credits

avatar · muapi.ai

lip-sync · video-to-video · budget
  • +smooth temporal consistency
  • +fast inference
  • +any video + audio
  • needs pre-existing video
max 60s · native audio · 16:9 · 9:16 · 1:1

$0.05/s · Varosity Credits only

muapi-wan-t2v

WAN 2.1 Text-to-Video

video · Varosity Credits

creator · muapi.ai

  • +platform-funded (no BYOK)
  • +up to 720p / high quality
  • +reliable fallback
  • $0.30/video flat rate
max 10s · 16:9 · 9:16

$0.03/s · Varosity Credits only

openai-tts-1

OpenAI TTS-1

voice

tts · OpenAI

  • +fast
  • +6 voices
  • +low latency
  • slightly lower quality than TTS-1 HD
max 600s · native audio

$0.004/s BYOK  ·  $0.004/s Credits

openai-tts-1-hd

OpenAI TTS-1 HD

voice · HD quality

tts · OpenAI

  • +highest OpenAI voice quality
  • +6 voices
  • +natural prosody
  • 2× cost of TTS-1
max 600s · native audio

$0.008/s BYOK  ·  $0.008/s Credits

deepgram-aura-2

Deepgram Aura 2

voice · Lowest latency

tts · Deepgram

  • +ultra-low latency
  • +natural prosody
  • +cheap ($0.030/1K chars)
  • English-only in Aura 2
max 600s · native audio

$0.002/s BYOK  ·  $0.002/s Credits

cartesia-sonic-2

Cartesia Sonic 2

voice

tts · Cartesia

  • +ultra-low latency (~90ms)
  • +natural prosody
  • +large public voice library
  • voice ids are library-specific — call list_voices
max 600s · native audio

$0.006/s BYOK  ·  $0.006/s Credits

fish-audio-tts

Fish Audio

voice

tts · Fish Audio

  • +multilingual
  • +cheap
  • +large community voice library
  • voice ids are library-specific — call list_voices
max 600s · native audio

$0.004/s BYOK  ·  $0.004/s Credits

heygen-avatar-4

HeyGen Avatar (Digital Twin)

video

avatar · HeyGen

talking-avatar · lip-sync · premium
  • +studio-grade lip sync
  • +Avatar IV / Avatar V motion engines
  • +voice emotion + speed control
  • requires a pre-trained avatar look id
max 300s · native audio · 16:9 · 9:16 · 1:1 · 4:5

$0.08/s BYOK  ·  $0.084/s Credits

heygen-photo-avatar

HeyGen Photo Avatar (Avatar IV)

video

avatar · HeyGen

talking-avatar · lip-sync · photo-to-video
  • +animate ANY photo as the speaker
  • +Avatar IV motion engine
  • +motion prompt + expressiveness control
  • needs a clear front-facing photo
max 300s · native audio · 16:9 · 9:16 · 1:1 · 4:5

$0.08/s BYOK  ·  $0.084/s Credits

heygen-cinematic

HeyGen Cinematic Avatar

video

cinematic · HeyGen

talking-avatar · cinematic
  • +prompt-driven cinematic shots
  • +blends 1–3 avatar looks into a scene
  • +reference videos/images for style
  • 4–15s per clip
max 15s · native audio · 16:9 · 9:16 · 1:1

$0.10/s BYOK  ·  $0.105/s Credits

heygen-video-agent

HeyGen Video Agent

video

avatar · HeyGen

talking-avatar · agent
  • +prompt → finished video
  • +agent writes script, picks avatar & scenes
  • +accepts reference files
  • least granular control
max 600s · native audio · 16:9 · 9:16

$0.12/s BYOK  ·  $0.126/s Credits

heygen-video-translate

HeyGen Video Translate

video

avatar · HeyGen

talking-avatar · lip-sync · translation · video-to-video
  • +multilingual lip-sync dubbing
  • +preserves original speaker appearance
  • +supports 40+ languages
  • requires source video with clear speech
max 600s · native audio · 16:9 · 9:16 · 1:1

$0.10/s BYOK  ·  $0.105/s Credits

d-id-talks

D-ID AI Presenter

video

avatar · D-ID

talking-avatar · budget
  • +talking avatars from any photo
  • +text-to-presenter
  • +fast render
  • fixed framing
max 300s · native audio · 16:9 · 9:16 · 1:1

$0.05/s BYOK  ·  $0.053/s Credits

hunyuan-video

Hunyuan Video

video

cinematic · fal.ai

physics · cinematic
  • +open-source quality
  • +long coherent motion
  • +strong physics
  • slow render time (60–120s)
max 10s · 16:9 · 9:16 · 1:1

$0.09/s BYOK  ·  $0.095/s Credits

ltx-video

LTX Video

video · Fastest

creator · fal.ai

fast
  • +fastest open video model (<5s)
  • +image-to-video
  • +good for iteration
  • lower detail than Kling/Veo
max 5s · 16:9 · 9:16 · 1:1

$0.04/s BYOK  ·  $0.042/s Credits

ws-luma-ray-2

Luma Ray 2

video · Varosity Credits

cinematic · WaveSpeed

  • +fluid motion
  • +cinematic quality
  • +strong prompt adherence
  • higher cost than Pika
max 10s · 16:9 · 9:16

$0.08/s · Varosity Credits only

ws-pika-2.2

Pika 2.2

video · Varosity Credits

creator · WaveSpeed

  • +fast generation
  • +stylized output
  • +good character consistency
  • shorter max duration
max 10s · 16:9 · 9:16

$0.04/s · Varosity Credits only

ws-hailuo-02

Hailuo 02

video · Varosity Credits

cinematic · WaveSpeed

physics · complex-motion
  • +physics realism
  • +complex motion
  • +high resolution
  • fixed 6s duration
max 6s · 16:9 · 9:16

$0.08/s · Varosity Credits only

ws-runway-gen4

Runway Gen 4

video · Varosity Credits

cinematic · WaveSpeed

camera-control · video-to-video
  • +camera control
  • +cinematic motion brush
  • +video-to-video
  • requires reference image for best results
max 10s · 16:9 · 9:16 · 1:1

$0.01/s · Varosity Credits only
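The per-second prices on the cards above convert to a clip cost by multiplying by the clip duration. A minimal sketch of that arithmetic in cents, using the veo-3.1 rates from its card. The function name `clip_cost_cents` is hypothetical, and rounding to the nearest whole cent is an assumption, not documented Varosity billing behavior.

```python
from decimal import Decimal, ROUND_HALF_UP


def clip_cost_cents(usd_per_second: str, duration_s: int) -> int:
    """Per-second rate times duration, rounded to the nearest whole cent.

    The rate is passed as a string so Decimal keeps it exact.
    Nearest-cent rounding is an assumption made for this sketch.
    """
    cents = Decimal(usd_per_second) * duration_s * 100
    return int(cents.quantize(Decimal("1"), rounding=ROUND_HALF_UP))


# veo-3.1 for an 8-second clip, using the two rates on its card.
byok = clip_cost_cents("0.15", 8)       # 8 x $0.15  = $1.20
credits = clip_cost_cents("0.158", 8)   # 8 x $0.158 = $1.264

print(byok, credits)
# 120 126
```

Comparing the result against a budget in cents is then a plain integer comparison, for example `byok <= 200`.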

Smart Route API

Skip the cheat sheet. Pass a shot description to /v1/route and get a ranked recommendation filtered to your configured providers.

// Request

POST https://varosity.ai/api/v1/route
Authorization: Bearer vsk_…
Content-Type: application/json

{
  "shot_description": "close-up of a barista pouring milk, dialogue, cinematic",
  "modality": "video",
  "duration_s": 8,
  "budget_cents": 200
}

// Response

{
  "ok": true,
  "primary": {
    "model_id": "veo-3.1",
    "vendor": "fal",
    "billing_mode": "byok",
    "estimated_cost_cents": 120,
    "reasoning": "Dialogue + close-up + cinematic → lip-sync wins",
    "confidence": 0.95
  },
  "fallbacks": [
    { "model_id": "seedance-4.5", "billing_mode": "byok",
      "reasoning": "Audio-native, similar quality" }
  ],
  "filtered_out": [
    { "model_id": "sora-2", "reason": "no_byok_key" }
  ]
}
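The request and response above can be wired into a client along these lines. This is a minimal sketch using only the Python standard library: `build_route_request` assembles the POST shown above without sending it, and `candidate_models` lists model IDs in the order to try them (primary first, then fallbacks). Both function names are hypothetical. Treating a missing or false `ok` as a failure is an assumption, since the error response shape is not shown here.

```python
import json
import urllib.request

ROUTE_URL = "https://varosity.ai/api/v1/route"  # taken from the request example above


def build_route_request(api_key, shot_description, modality, duration_s, budget_cents):
    """Build (but do not send) the POST request from the example above."""
    body = {
        "shot_description": shot_description,
        "modality": modality,
        "duration_s": duration_s,
        "budget_cents": budget_cents,
    }
    return urllib.request.Request(
        ROUTE_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def candidate_models(response):
    """Return model IDs in try order: the primary pick, then each fallback."""
    if not response.get("ok"):
        # Assumption: a falsy or missing "ok" means the call failed.
        raise ValueError("route call did not succeed")
    ordered = [response["primary"]["model_id"]]
    ordered += [fb["model_id"] for fb in response.get("fallbacks", [])]
    return ordered


# Trimmed copy of the sample response above, used to exercise the parser offline.
SAMPLE_RESPONSE = {
    "ok": True,
    "primary": {"model_id": "veo-3.1", "billing_mode": "byok"},
    "fallbacks": [{"model_id": "seedance-4.5", "billing_mode": "byok"}],
}

print(candidate_models(SAMPLE_RESPONSE))
# ['veo-3.1', 'seedance-4.5']
```

To send the request, pass the object to `urllib.request.urlopen` and decode the body with `json.load`. The sketch stops short of that so it runs without a key or network access.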