# Varosity

> Your agent. Your accounts. Every model.

Varosity is the API agents call when they need to generate video, voice, music, or images. The differentiator is BYOK (bring your own provider keys) at zero markup, combined with MCP-native access and multi-vendor aggregation across forty-plus frontier models.

## What Varosity does

Varosity routes agent-initiated generation requests across providers including fal.ai, Runway, Luma, ElevenLabs, Suno, OpenAI, Replicate, Pika, Hailuo, and others. The customer can use their own provider accounts (zero markup, billed directly by the provider) or Varosity credits (no provider accounts needed; a small service fee applies). The MCP server, REST API, and CLI all use the same `vsk_` API key.

## Who Varosity is for

Primary customer: any AI agent that needs to generate video, voice, music, or images. The agent reads tool descriptions, picks the right model for the task, calls the API, handles async jobs, and returns results to the operator.

Secondary customer: the human operator who pays the bill. Operators use Varosity through Brand Studio for observability and approval gates, or through direct API integration for headless workflows.

## What makes Varosity different

1. BYOK at zero markup. Bring fal, Runway, ElevenLabs, or any supported provider account. Varosity does not take a percentage. Competitors either don't support BYOK or charge 5% on it.
2. Agent-runtime-agnostic. Works with Claude Desktop, Claude Code, OpenClaw, Hermes, Cowork, Cursor, Windsurf, Codex, ChatGPT Apps, and any MCP-compatible runtime. The protocol is the contract.
3. Multi-vendor aggregation. Forty-plus frontier models across video, voice, music, and image. Single-vendor MCP servers (Higgsfield, MiniMax, Pictory) are walled gardens; Varosity routes across all of them.
4. Multi-shot stitching as a primitive. Chain different models into one MP4 render — Veo for one shot, Kling for the next, Runway for the closer — without managing ffmpeg yourself.
5. Skills for coding agents. The Varosity skills package installs into Claude Code, Cursor, OpenCode, Codex, and others via `npx -y skills add varosity-ai/varosity`. The skill teaches the agent when to reach for Varosity and how to use it correctly.

## How agents use Varosity

MCP (recommended): Add `https://varosity.ai/api/mcp` as a custom MCP connector. The server exposes tools for generate_video, generate_voice, generate_music, generate_image, render_storyboard, and others. Each tool returns a job ID; agents poll get_job until complete.

REST:

- `POST https://varosity.ai/api/v1/images` — Generate an image (SYNCHRONOUS; returns `{ imageUrl }` directly, no polling). Body: `{ prompt, model?, aspect_ratio? }`. Default model flux-1-schnell; e.g. nano-banana, imagen-4, flux-1.1-pro, dall-e-3, ideogram-v3.
- `POST https://varosity.ai/api/v1/video/generate` — Submit a video generation job; returns jobId
- `GET https://varosity.ai/api/v1/jobs/{jobId}` — Poll job status until succeeded or failed
- `POST https://varosity.ai/api/v1/route` — Smart Route: recommend the best model for a shot description

Authorization: `Bearer vsk_...` (image generation requires the generate:image scope)

CLI (for agent shells): `varosity video generate --model veo-3.1 --prompt "..." --json`

## Documentation

- API reference: https://varosity.ai/docs
- Agent integration guide: https://varosity.ai/agent-guide
- Model catalog with capabilities and pricing: https://varosity.ai/models
- Skills for coding agents: https://varosity.ai/skills
- OpenAPI 3.1 spec: https://varosity.ai/api/openapi.json
- Agents manifest: https://varosity.ai/.well-known/agents.json
- MCP endpoint: https://varosity.ai/api/mcp — Streamable HTTP, JSON-RPC 2.0
- [5-minute first render](https://varosity.ai/docs/quick-start) — Sign in, add a key, render a clip — end-to-end in five minutes.
- [Agent mode (Claude / Cursor / MCP)](https://varosity.ai/docs/agent-mode) — Drive Varosity from any MCP host: 11 tools, bearer auth, JSON-RPC over Streamable HTTP.
- [Avatar on any background](https://varosity.ai/docs/avatar-on-any-background) — Composite a talking head on a Veo / Kling / Runway background.
- [Background music + auto-ducking](https://varosity.ai/docs/background-music-and-ducking) — Add a soundtrack to a project; voice ducks the music automatically.
- [BYOK setup per provider](https://varosity.ai/docs/byok-setup) — Where to get each provider's key + how to add it to Varosity.
- [Chaining clips across models](https://varosity.ai/docs/multi-shot-storyboard) — Use the right model for each shot type, then stitch them into one MP4.
- [Director Mode](https://varosity.ai/docs/director-mode) — Plan, visualise, and approve video campaigns before a single frame renders.
- [Managing renders + costs](https://varosity.ai/docs/managing-renders) — Where renders live, how to download, and how to keep BYOK costs under control.
- [Multi-shot consistency with locked references](https://varosity.ai/docs/multi-shot-consistency) — Enforce visual continuity across multi-shot video sequences. Lock one reference image, vary model and prompt per shot.
- [Picking the right model](https://varosity.ai/docs/picking-the-right-model) — Strengths and weaknesses cheat sheet across every video, voice, and music model.
- [Prompting cheatsheet](https://varosity.ai/docs/prompting-cheatsheet) — Camera, lighting, and motion language that lands consistently across models.
- [Provider options (advanced knobs)](https://varosity.ai/docs/provider-options) — Every per-model providerOptions field across image, video, and voice — exact names, allowed values, and examples.
- [Save as workflow template](https://varosity.ai/docs/workflows) — Turn a project into a reusable template; instantiate from UI, CLI, or MCP.
- [Using the CLI](https://varosity.ai/docs/using-the-cli) — Install @varosity/cli; render videos and music from the terminal.
- [Varosity Agent Video SDK](https://varosity.ai/docs/agent-video-sdk) — Production-grade video generation for any agent. One API key, all platforms, no external dependencies.
- [Voice over your video](https://varosity.ai/docs/voice-over-your-video) — Add narration with ElevenLabs, OpenAI TTS, or Deepgram.

## Pricing

BYOK is zero markup — customers pay providers directly. Varosity credits cover calls with no provider accounts needed (a small service fee applies), sold in packs starting at $10. $5 of credit on signup.

Pricing page: https://varosity.ai/pricing

## Agent Skills

Self-updating skill files for Hermes, Claude Code, Claude Desktop, and other MCP hosts.

- [varosity-mcp-agent-integration](https://varosity.ai/api/v1/skills/varosity-mcp-agent-integration) — Full integration reference: auth, all 35 tools, connection examples for every major MCP host.
- [varosity-video-orchestration](https://varosity.ai/api/v1/skills/varosity-video-orchestration) — Director-mode orchestration: storyboard → keyframes → parallel renders → approval gate → final stitch.
- [varosity-agent-video-sdk](https://varosity.ai/api/v1/skills/varosity-agent-video-sdk) — SDK-level reference for custom video agents built on the Varosity engine.
- [custom-video-generation](https://varosity.ai/api/v1/skills/custom-video-generation) — On-demand single-video pipeline: reference image pre-flight → render → raw delivery.

## Models

### Video

- `veo-3.1` (fal) — best lip sync, native 4K, synced audio, talking heads. ~$0.15/s, max 8s.
- `kling-3.0` (fal) — multi-shot consistency, cinematic motion, dialogue close-ups, 4K. ~$0.1/s, max 10s.
- `seedance-4.5` (fal) — unified audio-video generation, multi-shot from one prompt, phoneme-level lip-sync. ~$0.14/s, max 12s.
- `omnihuman` (fal) — Drives a talking avatar from one photo + an audio clip. Drop on any shot in a brand agent's storyboard. ~$0.08/s, max 60s.
- `kling-3.0-replicate` (replicate) — multi-shot consistency, cinematic motion, cheaper than direct. ~$0.09/s, max 10s.
- `pika-2.5` (fal) — stylized motion, character consistency, fast. ~$0.08/s, max 10s.
- `hailuo-02` (hailuo) — physics realism, complex motion, long prompts. ~$0.11/s, max 10s.
- `muapi-wan-effects` (muapi) — Apply Cakeify, VHS, Samurai, Film Noir and 20+ other AI effects to images. Billed in Varosity Credits. ~$0.06/s, max 10s.
- `muapi-latentsync` (muapi) — Sync lip movements to any audio track on an existing video. Billed in Varosity Credits. ~$0.05/s, max 60s.
- `muapi-wan-t2v` (muapi) — WAN 2.1 text-to-video via muapi — platform-funded fallback when WaveSpeed is unavailable. No provider key needed. ~$0.03/s, max 10s.
- `heygen-avatar-4` (heygen) — HeyGen's flagship talking-avatar (Digital Twin) on the v3 API — the highest-quality lip-sync available. Drives a pre-trained avatar look with a script (or your own audio), with the Avatar IV motion engine, voice emotion/speed/pitch, captions, and custom backgrounds. ~$0.08/s, max 300s.
- `heygen-photo-avatar` (heygen) — Turn a single photo into a talking presenter with HeyGen's Avatar IV engine. Pass a photo URL + script and HeyGen animates the face with natural motion — no pre-training required. Control motion via a prompt and expressiveness level. ~$0.08/s, max 300s.
- `heygen-cinematic` (heygen) — Prompt-driven cinematic shots featuring your avatar — describe a scene (camera, setting, action) and HeyGen renders a documentary-style clip from 1–3 avatar looks, optionally guided by reference clips/images. ~$0.1/s, max 15s.
- `heygen-video-agent` (heygen) — HeyGen's flagship agent: give it a prompt and it produces a complete video end-to-end — writing the script, choosing the avatar, and composing scenes. Attach reference files (images, docs) to ground the output. ~$0.12/s, max 600s.
- `heygen-video-translate` (heygen) — Translate and dub an existing video into any of 40+ languages with perfectly lip-synced audio. Ideal for brands producing multilingual content from a single master video. Speed or precision mode, optional captions and speech enhancement. ~$0.1/s, max 600s.
- `d-id-talks` (d-id) — talking avatars from any photo, text-to-presenter, fast render. ~$0.05/s, max 300s.
- `hunyuan-video` (fal) — open-source quality, long coherent motion, strong physics. ~$0.09/s, max 10s.
- `ltx-video` (fal) — fastest open video model (<5s), image-to-video, good for iteration. ~$0.04/s, max 5s.
- `ws-luma-ray-2` (wavespeed) — Luma Ray 2 text-to-video via WaveSpeed. Fluid, cinematic motion. Billed in Varosity Credits. ~$0.08/s, max 10s.
- `ws-pika-2.2` (wavespeed) — Pika 2.2 text-to-video via WaveSpeed. Fast and stylized. Billed in Varosity Credits. ~$0.04/s, max 10s.
- `ws-hailuo-02` (wavespeed) — Hailuo 02 Pro via WaveSpeed. Best-in-class physics and complex motion. Billed in Varosity Credits. ~$0.08/s, max 6s.
- `ws-runway-gen4` (wavespeed) — Runway Gen 4 Turbo via WaveSpeed. Precise camera control and motion brush. Billed in Varosity Credits. ~$0.01/s, max 10s.

### Voice

- `elevenlabs-tts` (elevenlabs) — 29 languages, emotion control, voice cloning.
- `elevenlabs-tts-v3` (elevenlabs) — dialogue style, highest expressiveness.
- `openai-tts-1` (openai) — fast, 6 voices, low latency.
- `openai-tts-1-hd` (openai) — highest OpenAI voice quality, 6 voices, natural prosody.
- `deepgram-aura-2` (deepgram) — ultra-low latency, natural prosody, cheap ($0.030/1K chars).
- `cartesia-sonic-2` (cartesia) — ultra-low latency (~90ms), natural prosody, large public voice library.
- `fish-audio-tts` (fish-audio) — multilingual, cheap, large community voice library.

### Music

- `elevenlabs-music` (elevenlabs) — commercially licensed training data, low-latency, fast.
- `lyria-2` (fal) — instrumental detail, long-form composition.
- `suno-v4` (muapi) — Full vocal music generation via muapi — no provider key required, billed in Varosity Credits.

### Image

- `flux-1-schnell` (fal) — fast (1–3s), good prompt adherence, low cost.
- `flux-1.1-pro` (replicate) — highest-quality Flux, strong prompt adherence, fine detail.
- `imagen-4` (google) — photorealism, fine detail, accurate anatomy, scene coherence.
- `imagen-4-fast` (google) — fast (~3–5s), photorealism, good for drafts + iterations.
- `ideogram-v3` (fal) — best text-in-image, legible typography, poster / cover art.
- `recraft-v3` (fal) — long detailed prompts, brand-consistent style, vector-style output.
- `dall-e-3` (openai) — strong text-in-image, creative compositions, photorealism.
- `nano-banana` (replicate) — very fast, good identity preservation, cheap, image editing.
- `muapi-flux-dev` (muapi) — 12B rectified flow transformer — higher quality than Schnell, faster than Flux Pro. No provider key needed.

## Full content dump

See [/llms-full.txt](https://varosity.ai/llms-full.txt) for the docs + full model registry inlined.
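
## Example: submit and poll a video job

The REST flow described under "How agents use Varosity" (submit a video job, then poll the job until it reports succeeded or failed) can be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not an official client: the request body fields (`model`, `prompt`) are assumed to mirror the CLI flags, the `status` field name on the job response is an assumption, and `call_api`, `wait_for_job`, and `generate_video` are hypothetical helper names. The OpenAPI spec is the place to confirm the actual schema.

```python
import json
import time
import urllib.request

BASE_URL = "https://varosity.ai/api/v1"


def call_api(method, path, api_key, body=None):
    """Send one JSON request with the bearer key and return the decoded JSON reply."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    request = urllib.request.Request(
        BASE_URL + path,
        data=data,
        method=method,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def wait_for_job(job_id, fetch_job, interval_s=5.0, max_polls=120, sleep=time.sleep):
    """Poll until the job reports succeeded or failed, or give up after max_polls."""
    for _ in range(max_polls):
        job = fetch_job(job_id)
        # ASSUMPTION: the job payload exposes its state under a "status" key.
        if job.get("status") in ("succeeded", "failed"):
            return job
        sleep(interval_s)
    raise TimeoutError(f"job {job_id} did not finish after {max_polls} polls")


def generate_video(api_key, prompt, model="veo-3.1"):
    """Submit a video job and block until it reaches a terminal state."""
    # ASSUMPTION: the body fields mirror the CLI flags --model and --prompt.
    submitted = call_api(
        "POST", "/video/generate", api_key, {"model": model, "prompt": prompt}
    )
    return wait_for_job(
        submitted["jobId"],
        lambda job_id: call_api("GET", f"/jobs/{job_id}", api_key),
    )
```

Passing `fetch_job` and `sleep` in as parameters keeps the polling loop testable without network access, and the `max_polls` cap stops an agent from waiting forever on a stuck job.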