Avatar on any background

Composite a talking head on a Veo / Kling / Runway background.

HeyGen ships avatars on a fixed pipeline. Varosity composites a talking head onto a background rendered by any of the frontier video models. There are three modes:

Mode: Off. Ignores the attached avatar even if one is set. Useful when iterating on the background without spending more OmniHuman credits.

Mode: Full frame. OmniHuman replaces the shot entirely. Good for plain talking heads. Required: an avatar photo plus an audio clip (or a TTS script).

Mode: Overlay (PiP). The killer feature:

1. The background renders via your chosen text-to-video model (Veo for lip sync when the shot has dialogue, Kling for cinematic shots, Seedance for native audio…).
2. In parallel, OmniHuman renders the talking head from a photo and audio.
3. Once both finish, Varosity runs ffmpeg.wasm in your browser to overlay the avatar in the chosen corner at the chosen size.
4. The composite replaces the shot's render_url and stitches into the final MP4 like any other shot.
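The overlay in step 3 comes down to an ffmpeg filter graph. The sketch below is an illustration, not Varosity's actual compositor: the function names, the 24-pixel margin, the file names, and the reading of "size" as a fraction of the background width are all assumptions. It only builds the filter string and the argument list, using ffmpeg's standard scale and overlay filters.

```typescript
// Hypothetical helpers: none of these names come from Varosity's code.
type Corner = "top-left" | "top-right" | "bottom-left" | "bottom-right";

/**
 * Build an ffmpeg -filter_complex graph that scales input 1 (the avatar)
 * to a fraction of the background width and overlays it on input 0.
 */
function buildOverlayFilter(
  corner: Corner,
  sizeFraction: number, // e.g. 0.25 for 25% of the background width (assumed meaning)
  bgWidth: number, // background width in pixels
  margin: number = 24, // gap from the frame edge in pixels (assumed default)
): string {
  // Round to an even width; many H.264 encoders reject odd dimensions.
  const avatarWidth = Math.max(2, Math.round((bgWidth * sizeFraction) / 2) * 2);

  // main_w/main_h and overlay_w/overlay_h are variables of ffmpeg's overlay filter.
  const near = String(margin);
  const farX = "main_w-overlay_w-" + margin;
  const farY = "main_h-overlay_h-" + margin;

  const positions: Record<Corner, { x: string; y: string }> = {
    "top-left": { x: near, y: near },
    "top-right": { x: farX, y: near },
    "bottom-left": { x: near, y: farY },
    "bottom-right": { x: farX, y: farY },
  };
  const pos = positions[corner];

  // scale=W:-2 keeps the aspect ratio and forces an even height.
  return (
    "[1:v]scale=" + avatarWidth + ":-2[av];" +
    "[0:v][av]overlay=x=" + pos.x + ":y=" + pos.y + "[out]"
  );
}

/** Argument list for one ffmpeg run; the file names are placeholders. */
function buildOverlayArgs(filter: string): string[] {
  return [
    "-i", "background.mp4",
    "-i", "avatar.mp4",
    "-filter_complex", filter,
    "-map", "[out]",
    "-map", "1:a?", // assumption: keep the avatar's speech track if one exists
    "output.mp4",
  ];
}
```

For a 1920-pixel-wide background, `buildOverlayFilter("bottom-right", 0.25, 1920)` scales the avatar to 480 pixels wide and pins it 24 pixels from the bottom-right edges. With ffmpeg.wasm 0.12 the argument array would typically be passed to `exec` after both inputs are written into its virtual file system; older releases use a different call.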

How to set it up

1. Upload a high-resolution front-facing photo at /avatars.
2. In the shot inspector, expand Avatar.
3. Pick the avatar.
4. Either upload an audio clip OR switch to From script and paste text; ElevenLabs synthesizes the audio (BYOK).
5. Switch Layer mode to Overlay.
6. Choose corner + size in the position picker.
7. Hit Render this shot. Two jobs go out; the compositor fires when both land.
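The "fires when both land" behavior in the last step can be pictured as a small gate that waits for two completion events. This is a minimal sketch under assumptions: `createCompositeGate`, `JobKind`, and `OverlayInputs` are hypothetical names, and how Varosity actually tracks its render jobs is not documented here.

```typescript
// Hypothetical names throughout; this is not Varosity's actual job API.
type JobKind = "background" | "avatar";

interface OverlayInputs {
  backgroundUrl: string;
  avatarUrl: string;
}

/**
 * Returns a callback to invoke whenever a render job finishes.
 * onReady fires exactly once, after both jobs have reported a URL,
 * regardless of which job lands first.
 */
function createCompositeGate(
  onReady: (inputs: OverlayInputs) => void,
): (kind: JobKind, url: string) => void {
  const landed: { background?: string; avatar?: string } = {};
  let fired = false;

  return function land(kind: JobKind, url: string): void {
    // A repeated report for the same job overwrites the earlier URL.
    landed[kind] = url;

    if (!fired && landed.background !== undefined && landed.avatar !== undefined) {
      fired = true;
      onReady({ backgroundUrl: landed.background, avatarUrl: landed.avatar });
    }
  };
}
```

A caller would create one gate per shot, hand `land` to whatever reports job completion, and start the compositor inside `onReady`.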

Tips

Overlay corner matters more than you think. Bottom-right reads as presenter, top-right as branding or sponsorship, top-left as primary caller. Pick deliberately.

Avatar size: 25–30% is the sweet spot for most shots. Smaller reads as ambient; larger steals focus from the background.

Audio length sets shot duration in Full frame mode. OmniHuman's output is bound by the audio length, so you can ignore the duration slider.
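
The size and duration tips can be read as simple checks on a shot's avatar settings. The sketch below is only an illustration: `ShotAvatarSettings`, its field names, and `reviewAvatarSettings` are hypothetical, and the thresholds are the 25–30% range quoted in the tips above.

```typescript
// Hypothetical settings shape; field names are illustrative only.
type LayerMode = "off" | "full" | "overlay";

interface ShotAvatarSettings {
  mode: LayerMode;
  sizePercent: number; // avatar size, relevant in Overlay mode
  sliderSeconds: number; // value of the duration slider
  audioSeconds: number; // length of the uploaded or synthesized audio
}

/** Returns human-readable notes; an empty array means nothing to flag. */
function reviewAvatarSettings(s: ShotAvatarSettings): string[] {
  const notes: string[] = [];

  if (s.mode === "overlay") {
    if (s.sizePercent < 25) {
      notes.push("Avatar below 25%: reads as ambient.");
    } else if (s.sizePercent > 30) {
      notes.push("Avatar above 30%: may steal focus from the background.");
    }
  }

  if (s.mode === "full" && s.sliderSeconds !== s.audioSeconds) {
    notes.push(
      "Full frame mode: audio length (" + s.audioSeconds +
        "s) sets the shot duration; the slider value is ignored.",
    );
  }

  return notes;
}
```

An overlay shot at 28% returns no notes, while a Full frame shot with a 5-second slider and 8 seconds of audio returns a single note about the slider.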