Provider options (advanced knobs)

Every per-model providerOptions field across image, video, and voice — exact names, allowed values, and examples.

Every generation endpoint accepts an optional providerOptions object that is forwarded verbatim to the underlying model. It is how you reach a provider's full capability — resolution, negative prompts, voice emotion, output format, and more — beyond the common fields (prompt, aspectRatio, durationSec).

Rules of the road:

- Optional. Omit providerOptions entirely and you get sensible defaults.
- Per-provider. Each model reads only the fields it understands. Fields a model doesn't support are ignored (or rejected by that provider with a clear error); they never silently corrupt a request.
- Forwarded on three endpoints: POST /api/v1/images, POST /api/v1/video/generate, and POST /api/tts (voice also takes top-level format and speed). POST /api/music uses its own fields (tags, instrumental, lyrics, title) instead.

All field names below are confirmed against each provider's live API.
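
As a minimal sketch of the rules above, the helper below assembles a request body and adds providerOptions only when the caller supplies some. The helper name `build_body` is hypothetical and not part of the API; only the JSON field names (prompt, modelId, aspectRatio, providerOptions) come from this page.

```python
import json


def build_body(prompt, model_id, aspect_ratio=None, provider_options=None):
    """Return a request body; providerOptions is left out when not given."""
    body = {"prompt": prompt, "modelId": model_id}
    if aspect_ratio is not None:
        body["aspectRatio"] = aspect_ratio
    if provider_options:  # None or {} means: rely on the provider defaults
        body["providerOptions"] = dict(provider_options)
    return body


# No providerOptions at all: the model falls back to its defaults.
plain = build_body("a glossy red paperclip on white", "flux-1.1-pro")

# Same request with two advanced knobs for flux-1.1-pro.
tuned = build_body(
    "a glossy red paperclip on white",
    "flux-1.1-pro",
    aspect_ratio="1:1",
    provider_options={"outputFormat": "jpg", "outputQuality": 80},
)
print(json.dumps(tuned, indent=2))
```

The resulting dictionary can be sent as the JSON body of POST /api/v1/images with any HTTP client; the host and authentication are outside the scope of this sketch.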

---

Images — `POST /api/v1/images`

{
  "prompt": "a glossy red paperclip on white, studio light",
  "modelId": "flux-1.1-pro",
  "aspectRatio": "1:1",
  "providerOptions": { "outputFormat": "jpg", "outputQuality": 80 }
}

OpenAI — `dall-e-3` (gpt-image-1)

| Field | Type / values | Notes |
|---|---|---|
| `quality` | `"low"` \| `"medium"` \| `"high"` \| `"auto"` | render quality |
| `background` | `"transparent"` \| `"opaque"` \| `"auto"` | transparent needs png/webp |
| `outputFormat` | `"png"` \| `"jpeg"` \| `"webp"` | |
| `outputCompression` | `0`–`100` | jpeg/webp only |
| `moderation` | `"low"` \| `"auto"` | |
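
The two cross-field constraints in the OpenAI notes (transparent backgrounds need png/webp, compression applies to jpeg/webp only) can be checked on the client before sending. This is a sketch under assumptions: the function name is hypothetical, and because the default outputFormat is not stated here, the check expects the format to be set explicitly whenever either constraint is in play.

```python
def check_openai_image_options(opts):
    """Return a list of problems found in an OpenAI providerOptions dict."""
    problems = []
    fmt = opts.get("outputFormat")  # default format is unknown, so None fails below

    if opts.get("background") == "transparent" and fmt not in ("png", "webp"):
        problems.append("background 'transparent' needs outputFormat png or webp")

    if "outputCompression" in opts:
        if fmt not in ("jpeg", "webp"):
            problems.append("outputCompression applies to jpeg/webp only")
        elif not 0 <= opts["outputCompression"] <= 100:
            problems.append("outputCompression must be within 0-100")

    return problems


ok = check_openai_image_options({"background": "transparent", "outputFormat": "png"})
bad = check_openai_image_options({"background": "transparent", "outputFormat": "jpeg"})
```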

fal — `flux-1-schnell`, `ideogram-v3`, `recraft-v3`

| Field | Applies to | Type / values |
|---|---|---|
| `numInferenceSteps` | flux | integer (schnell 1–12, optimal 4) |
| `enableSafetyChecker` | flux | boolean |
| `guidanceScale` | flux (image-to-image edit) | number |
| `style` | ideogram, recraft | string (recraft: `"realistic_image"`, `"digital_illustration"`, `"vector_illustration"`, …) |
| `negativePrompt` | ideogram | string (flux/recraft have no negative prompt) |
| `renderingSpeed` | ideogram | `"TURBO"` \| `"BALANCED"` \| `"QUALITY"` |
| `expandPrompt` | ideogram | boolean |
| `strength` | any (image-to-image edit) | `0`–`1`, how far to deviate from the source |
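
Because each fal image model reads a different subset of fields, a client may want to see which options would have no effect before sending. The sketch below transcribes the "Applies to" column into a lookup; the names `FAL_IMAGE_FIELDS` and `split_fal_options` are hypothetical, and the server remains the authority on what is accepted.

```python
# Transcribed from the "Applies to" column; "any" fields appear under every model.
FAL_IMAGE_FIELDS = {
    "flux-1-schnell": {"numInferenceSteps", "enableSafetyChecker",
                       "guidanceScale", "strength"},
    "ideogram-v3": {"style", "negativePrompt", "renderingSpeed",
                    "expandPrompt", "strength"},
    "recraft-v3": {"style", "strength"},
}


def split_fal_options(model_id, opts):
    """Split opts into (fields the model reads, names it does not read)."""
    allowed = FAL_IMAGE_FIELDS[model_id]
    kept = {key: value for key, value in opts.items() if key in allowed}
    dropped = sorted(set(opts) - allowed)
    return kept, dropped


kept, dropped = split_fal_options(
    "recraft-v3",
    {"style": "vector_illustration", "negativePrompt": "blurry"},
)
```

Here `kept` holds only `style`, and `dropped` names `negativePrompt`, which recraft does not support.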

Replicate — `flux-1.1-pro`

| Field | Type / values | Notes |
|---|---|---|
| `outputFormat` | `"png"` \| `"jpg"` \| `"webp"` | |
| `outputQuality` | `0`–`100` | jpg/webp |
| `safetyTolerance` | `1`–`6` | 1 = strict, 6 = lax |
| `promptUpsampling` | boolean | |
| `imagePrompt` | image URL | Flux Redux image-to-image (also falls back to `referenceImageUrl`) |
| `width`, `height` | `256`–`1440` | both required; overrides the aspect preset |

> Note: flux-1.1-pro has no steps/guidance fields (those belong to flux-dev).
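
The width/height rule for flux-1.1-pro (both or neither, each within 256–1440) can be expressed as a small client-side check. This is an illustrative sketch; the function name is hypothetical.

```python
def check_flux_pro_size(opts):
    """Return (width, height) if a custom size is set, else None."""
    has_w, has_h = "width" in opts, "height" in opts
    if not (has_w or has_h):
        return None  # no custom size: the aspect preset applies
    if not (has_w and has_h):
        raise ValueError("width and height must be supplied together")
    for name in ("width", "height"):
        if not 256 <= opts[name] <= 1440:
            raise ValueError(f"{name} must be within 256-1440, got {opts[name]}")
    return opts["width"], opts["height"]


size = check_flux_pro_size({"width": 1024, "height": 768, "outputFormat": "png"})
```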

Google — `gemini-3-pro-image`, `gemini-3.1-flash-image`, `gemini-3.1-flash-lite-image`

The Gemini (Nano Banana) image models expose no provider options today — aspect_ratio is passed natively via imageConfig. No seed or negativePrompt.

> Imagen 4 sunset: Google retires imagen-4 / imagen-4-fast on 2026-08-17. Requests for those ids are transparently aliased (imagen-4 → gemini-3-pro-image, imagen-4-fast → gemini-3.1-flash-lite-image) and billed at the successor's price. Because the Gemini image models expose no provider options, the old Imagen personGeneration option is ignored.
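
The aliasing described in the sunset note amounts to a two-entry lookup. The sketch below mirrors it on the client side, for example to predict which model a request will be billed as; the names `IMAGEN_ALIASES` and `resolve_image_model` are hypothetical, and the real aliasing happens on the server.

```python
# Retired Imagen ids and the successors named in the sunset note.
IMAGEN_ALIASES = {
    "imagen-4": "gemini-3-pro-image",
    "imagen-4-fast": "gemini-3.1-flash-lite-image",
}


def resolve_image_model(model_id):
    """Return the model id that will actually serve the request."""
    return IMAGEN_ALIASES.get(model_id, model_id)


print(resolve_image_model("imagen-4-fast"))
```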

---

Video — `POST /api/v1/video/generate`

{
  "prompt": "a red paper airplane glides over a city, smooth aerial motion",
  "modelId": "seedance-4.5",
  "aspectRatio": "16:9",
  "durationSec": 5,
  "providerOptions": { "resolution": "1080p", "cameraFixed": true }
}

fal — `veo-3.1`, `kling-3.0`, `seedance-4.5`

| Field | Applies to | Type / values |
|---|---|---|
| `resolution` | veo, seedance | veo: `"720p"` \| `"1080p"` \| `"4k"`; seedance: `"480p"` \| `"720p"` \| `"1080p"` |
| `safetyTolerance` | veo | `"1"`–`"6"` (string) |
| `autoFix` | veo | boolean (rewrite prompt to pass moderation) |
| `generateAudio` | veo, kling | boolean (native audio, default true; set `false` for silent) |
| `cfgScale` | kling | number (guidance) |
| `shotType` | kling | `"customize"` \| `"intelligent"` |
| `cameraFixed` | seedance | boolean (locked camera) |
| `enableSafetyChecker` | seedance | boolean |
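
The allowed `resolution` values differ between veo and seedance, and kling does not list the field at all. A minimal sketch of a lookup that captures this, with hypothetical names `FAL_VIDEO_RESOLUTIONS` and `resolution_ok`:

```python
# Per-model resolution values transcribed from the fal video table.
FAL_VIDEO_RESOLUTIONS = {
    "veo-3.1": {"720p", "1080p", "4k"},
    "seedance-4.5": {"480p", "720p", "1080p"},
}


def resolution_ok(model_id, resolution):
    """True if the model lists a resolution field and accepts this value."""
    allowed = FAL_VIDEO_RESOLUTIONS.get(model_id)
    return allowed is not None and resolution in allowed


print(resolution_ok("veo-3.1", "4k"))
print(resolution_ok("seedance-4.5", "4k"))
```

The first call reports that veo accepts `"4k"`; the second reports that seedance does not.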

WaveSpeed (Varosity Credits) — `ws-pika-2.2`, `ws-luma-ray-2`, `ws-hailuo-02`, `ws-runway-gen4`

| Field | Applies to | Type / values |
|---|---|---|
| `seed` | all | number |
| `negativePrompt` | all | string |
| `enablePromptExpansion` | hailuo | boolean |
| `loop` | luma | boolean |
| `size` | pika/luma (text-to-video) | `"1280720"` \| `"7201280"` |
| `image` | pika/luma/hailuo | image URL (auto-routes to image-to-video) |

> Passing a referenceImageUrl on pika/luma/hailuo also triggers image-to-video automatically.
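
The routing rule for WaveSpeed (an `image` option or a referenceImageUrl on pika, luma, or hailuo switches the request to image-to-video) can be sketched as a predicate over the request body. This is an assumption-laden illustration: the function name is hypothetical, and it assumes referenceImageUrl is a top-level field of the body, which this page does not state explicitly.

```python
# WaveSpeed models that the table lists as accepting an image.
I2V_MODELS = {"ws-pika-2.2", "ws-luma-ray-2", "ws-hailuo-02"}


def routes_to_image_to_video(body):
    """Guess whether a request body would be routed to image-to-video."""
    if body.get("modelId") not in I2V_MODELS:
        return False
    options = body.get("providerOptions") or {}
    # Assumption: referenceImageUrl sits at the top level of the body.
    return bool(options.get("image")) or bool(body.get("referenceImageUrl"))


text_only = {"modelId": "ws-luma-ray-2", "prompt": "waves at dusk"}
with_image = {
    "modelId": "ws-luma-ray-2",
    "prompt": "waves at dusk",
    "providerOptions": {"image": "https://example.invalid/frame.png"},
}
```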

muapi (Varosity Credits) — `muapi-wan-t2v`, `muapi-wan-effects`

| Field | Applies to | Type / values |
|---|---|---|
| `resolution` | both | `"480p"` \| `"720p"` |
| `quality` | both | `"medium"` \| `"high"` |
| `negativePrompt` | wan-t2v | string |
| `effect` | wan-effects | effect name, e.g. `"Cakeify"`, `"Inflate"`, `"VHS Footage"`, `"Samurai It"`, `"Film Noir"` (validated against muapi's catalog; unknown names error) |

Replicate — `kling-3.0-replicate`

| Field | Type / values |
|---|---|
| `generateAudio` | boolean (default true; `false` = silent) |
| `endImage` | image URL (last-frame keyframe) |

Hailuo direct — `hailuo-02` (gated; Credits route uses `ws-hailuo-02`)

| Field | Type / values |
|---|---|
| `resolution` | `"768P"` \| `"1080P"` (1080P is 6s-only) |
| `promptOptimizer` | boolean |
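
The "1080P is 6s-only" note couples `resolution` to the top-level durationSec. A minimal sketch of that check follows, reading the note as "durationSec must equal 6 when resolution is 1080P"; that reading and the function name are assumptions.

```python
HAILUO_RESOLUTIONS = ("768P", "1080P")


def check_hailuo_request(duration_sec, opts):
    """Raise ValueError if resolution and duration cannot be combined."""
    resolution = opts.get("resolution")
    if resolution is None:
        return  # provider default applies
    if resolution not in HAILUO_RESOLUTIONS:
        raise ValueError(f"unknown resolution {resolution!r}")
    # Assumed reading of "1080P is 6s-only".
    if resolution == "1080P" and duration_sec != 6:
        raise ValueError("1080P is only available for 6-second clips")


check_hailuo_request(6, {"resolution": "1080P", "promptOptimizer": True})
```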

---

Voice / TTS — `POST /api/tts`

Voice takes two top-level fields plus providerOptions:

{
  "text": "Hello from Varosity.",
  "voiceModel": "cartesia-sonic-2",
  "format": "wav",                // mp3 | wav | ogg | opus | aac | flac | pcm
  "speed": 1.1,                    // OpenAI 0.25–4.0; ElevenLabs also reads it; other providers ignore it
  "providerOptions": { "language": "en", "emotion": ["positivity:high"] }
}

OpenAI — `openai-tts-1`, `openai-tts-1-hd`

Uses the top-level `speed` (0.25–4.0) and `format` (mp3/opus/aac/flac/wav/pcm). No extra `providerOptions`.

Cartesia — `cartesia-sonic-2`

| Field | Type / values |
|---|---|
| `language` | `"en"`, `"es"`, `"fr"`, … (sonic-2 is multilingual) |
| `speed` | `"slowest"` \| `"slow"` \| `"normal"` \| `"fast"` \| `"fastest"` |
| `emotion` | string[] e.g. `["positivity:high", "curiosity:low"]` |
| `container` | `"mp3"` \| `"wav"` \| `"raw"` |
| `sampleRate` | number (default 44100) |
| `bitRate` | number (mp3, default 128000) |
| `encoding` | `"pcm_s16le"`, `"pcm_f32le"`, … (wav/raw) |
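
Cartesia's `emotion` field is a list of `name:level` strings. The sketch below builds that list from a mapping; the helper name `emotion_tags` is hypothetical, and only the names and levels shown in the table's example are used here.

```python
def emotion_tags(levels):
    """Turn {"positivity": "high"} into ["positivity:high"], keeping order."""
    return [f"{name}:{level}" for name, level in levels.items()]


provider_options = {
    "language": "en",
    "emotion": emotion_tags({"positivity": "high", "curiosity": "low"}),
}
print(provider_options["emotion"])
```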

Fish Audio — `fish-audio-tts`

| Field | Type / values |
|---|---|
| `model` | `"speech-1.5"` \| `"speech-1.6"` \| `"s1"` (default `speech-1.6`) |
| `mp3Bitrate` | `64` \| `128` \| `192` |
| `normalize` | boolean |
| `latency` | `"normal"` \| `"balanced"` |
| `speed` | number (prosody, 1.0 = normal) |
| `volume` | number (prosody) |
| `temperature`, `topP` | number |

Deepgram — `deepgram-aura-2`

| Field | Type / values |
|---|---|
| `encoding` | `"mp3"` \| `"linear16"` \| `"flac"` \| `"opus"` \| `"aac"` \| `"mulaw"` \| `"alaw"` |
| `sampleRate` | number (required for linear16) |
| `bitRate` | number (mp3/opus/aac) |
| `container` | `"none"` \| `"wav"` \| `"ogg"` |

> Tip: passing top-level "format": "wav" is enough. Deepgram maps it to linear16 in a WAV container for you.
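
To make the tip concrete, the sketch below puts the shorthand body next to a hand-written one that spells out linear16 in a WAV container. The helper name and the 24000 Hz sample rate are assumptions chosen for illustration; the table only says that sampleRate is required for linear16, not which value the shorthand picks.

```python
def explicit_wav_options(sample_rate):
    """providerOptions for linear16 audio in a WAV container."""
    return {"encoding": "linear16", "container": "wav", "sampleRate": sample_rate}


# Shorthand: let the top-level format choose encoding and container.
shorthand = {
    "text": "Hello from Varosity.",
    "voiceModel": "deepgram-aura-2",
    "format": "wav",
}

# Explicit form; 24000 is an assumed sample rate, not a documented default.
explicit = {
    "text": "Hello from Varosity.",
    "voiceModel": "deepgram-aura-2",
    "providerOptions": explicit_wav_options(24000),
}
```

The explicit form is only worth the extra fields when a specific sample rate matters; otherwise the shorthand is simpler.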

ElevenLabs — `elevenlabs-tts`

Uses top-level `speed`, `format`, plus `stability`, `similarity`, `style`, `speakerBoost` on the request body.

---

Music — `POST /api/music`

Music does not use providerOptions. Use the dedicated fields: tags (genre/mood), instrumental (boolean), lyrics (custom-lyrics mode), and title.
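
Since music requests carry their settings in dedicated fields, a body builder for POST /api/music looks different from the other endpoints. The following is a sketch only: the function name is hypothetical, the value types (a string for tags, for instance) are assumptions, and any other fields the endpoint may require are not covered on this page.

```python
def build_music_body(tags=None, instrumental=None, lyrics=None, title=None):
    """Build a music request body from the dedicated fields, skipping unset ones."""
    fields = {
        "tags": tags,                  # genre/mood; a string is assumed here
        "instrumental": instrumental,  # boolean
        "lyrics": lyrics,              # custom-lyrics mode
        "title": title,
    }
    return {name: value for name, value in fields.items() if value is not None}


body = build_music_body(tags="lo-fi, mellow", instrumental=True, title="Night Bus")
print(body)
```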
