Veo 3 JSON Prompt Generator: Structured Prompting Guide & Templates (2026)

Stop hoping for good Veo 3 results — engineer them. The complete 2026 guide to JSON prompting in Veo 3: field-by-field anatomy, 4 copy-paste templates, image-to-video, and the consistency trick for multi-shot videos.

Emma Chen · 11 min read · Jun 25, 2026

Most people type a sentence into Veo 3, cross their fingers, and hope the model guesses what they meant. Sometimes it nails it. More often the camera drifts, the lighting changes shot to shot, the character's jacket switches color, and the audio is nothing like what you pictured. The fix that advanced creators have settled on in 2026 is not a longer sentence — it is a JSON prompt. Instead of one run-on paragraph, you hand Veo 3 a structured object that names every dial separately: subject, action, camera, lens, lighting, color, audio, and style. The model stops guessing and starts following instructions.

This guide is the complete, copy-paste reference for JSON prompting in Veo 3. You will learn what a JSON prompt actually is, why the structured format produces more consistent results than plain text, the full field-by-field anatomy of a Veo 3 prompt object, and four ready-to-use templates you can paste straight into Veo 3 today. If you are still writing prompts as paragraphs, this is the single upgrade that will change your hit rate the most. New to prompting in general? Skim our Veo 3 prompt engineering guide first, then come back here to level up to structured prompts.

What is a JSON prompt?

JSON (JavaScript Object Notation) is a simple, readable way to write structured data as key–value pairs. A JSON prompt for Veo 3 takes the same idea you would normally cram into one sentence and breaks it into labeled fields:

{
  "shot": "medium close-up",
  "subject": "a barista with curly red hair and a green apron",
  "action": "steams milk, then looks up and smiles at the camera",
  "setting": "a sunlit specialty coffee shop, morning",
  "camera": "slow push-in on a 50mm lens, shallow depth of field",
  "lighting": "warm window light from the left, soft shadows",
  "audio": "hiss of the steam wand, low cafe chatter, no music",
  "style": "photorealistic, cinematic, 35mm film grain"
}

Compare that to the paragraph version: "A barista with curly red hair and a green apron steams milk in a sunlit coffee shop, then looks up and smiles, shot on a 50mm lens with a slow push-in and warm window light, with steam-wand sounds and cafe chatter." Both describe the same scene. But the JSON version separates every instruction so the model cannot blur "warm window light from the left" into the subject description or forget the audio cue buried at the end of a long sentence. Each key is a clean channel of intent.

Veo 3 does not require a formal schema — it reads the natural-language values inside the object — but the act of structuring forces you to be explicit about things you would otherwise leave vague. That explicitness is where the quality jump comes from.
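If you assemble prompts in a script rather than by hand, a few lines of Python can produce the object for you. The sketch below is a minimal illustration, not an official Veo 3 tool: `to_prompt_text` is a hypothetical helper name, and it relies only on Python's standard `json` module to drop empty fields and emit valid JSON that you would then paste into the prompt box.

```python
import json

# Field names follow the barista example above. They are conventions
# from this guide, not a schema that Veo 3 enforces.
barista_prompt = {
    "shot": "medium close-up",
    "subject": "a barista with curly red hair and a green apron",
    "action": "steams milk, then looks up and smiles at the camera",
    "setting": "a sunlit specialty coffee shop, morning",
    "camera": "slow push-in on a 50mm lens, shallow depth of field",
    "lighting": "warm window light from the left, soft shadows",
    "audio": "hiss of the steam wand, low cafe chatter, no music",
    "style": "photorealistic, cinematic, 35mm film grain",
}


def to_prompt_text(fields: dict) -> str:
    """Serialize prompt fields to indented JSON, dropping empty values."""
    cleaned = {key: value for key, value in fields.items() if value}
    return json.dumps(cleaned, indent=2, ensure_ascii=False)


print(to_prompt_text(barista_prompt))
```

Building the text with `json.dumps` instead of string concatenation means quotes inside values are escaped for you, which is the most common way hand-written prompts become invalid.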

Why JSON prompts work better in Veo 3

There are three concrete reasons structured prompts beat paragraphs, especially on a model like Veo 3 that generates picture and native audio in a single pass:

1. Nothing gets lost. In a long sentence, the details placed mid-paragraph (audio cues, lens choices, color grading) are the ones most often ignored in practice. In JSON every instruction sits in its own labeled field, so your audio and lighting keys are far harder to overlook than a clause buried in a run-on sentence.

2. Consistency across shots. This is the big one. If you keep the subject, style, and lighting fields identical across multiple generations and only change action and camera, you get a recognizably consistent character and look from clip to clip. That is the foundation of stitching shots into a longer sequence — see our walkthrough on extending Veo 3 beyond 8 seconds, where stable JSON blocks are what stop the character from morphing halfway through.

3. Repeatability and iteration. Because the prompt is structured, you can change one variable at a time and actually learn what each field does. Bump lighting from "warm" to "high-key" and rerun. Swap camera from "static" to "slow dolly-in" and rerun. You build a controlled feedback loop instead of rewriting a whole paragraph and wondering which word moved the needle.
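The one-variable-at-a-time loop described in point 3 is easy to script. This is a rough sketch under stated assumptions: `vary_one_field` is a hypothetical helper, the prompt is held as a plain Python dict, and you still paste or submit each variant yourself.

```python
import json


def vary_one_field(base: dict, field: str, values: list[str]) -> list[dict]:
    """Return one copy of `base` per value, differing only in `field`."""
    if field not in base:
        raise KeyError(f"{field!r} is not in the base prompt")
    # {**base, field: value} copies the dict, so `base` itself is never mutated.
    return [{**base, field: value} for value in values]


base = {
    "subject": "a barista with curly red hair and a green apron",
    "action": "steams milk, then looks up and smiles at the camera",
    "camera": "static",
    "lighting": "warm window light from the left, soft shadows",
}

variants = vary_one_field(
    base, "lighting", ["warm window light", "high-key daylight"]
)

for variant in variants:
    print(json.dumps(variant, indent=2))
```

Because every other key is copied unchanged, any difference between the resulting clips can be attributed to the one field you varied (plus the model's normal run-to-run variation).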

The anatomy of a Veo 3 JSON prompt

Here is the full field set that works reliably with Veo 3 in 2026. You do not need every field on every prompt — use what the shot requires — but this is the master list, grouped by purpose.

Core scene

shot — framing: extreme wide, wide, medium, medium close-up, close-up, macro.
subject — who or what, described with specific, durable visual detail (hair, wardrobe, age, build). Reuse this verbatim for consistency.
action — what happens, in time order. Use sequencing words: "first… then…".
setting — location plus time of day and weather.

Camera and lens

camera — movement: static, slow push-in, dolly-out, handheld, crane up, orbit. See our camera control prompts guide for the full vocabulary.
lens — 18mm wide, 35mm, 50mm, 85mm portrait, plus shallow or deep depth of field.

Light and color

lighting — direction, quality, and source: "soft key from the right, rim light behind, practical neon".
color / color_grade — palette and mood: "teal and orange", "muted pastel", "high-contrast noir".

Audio (Veo 3's superpower)

audio — describe the full mix: ambient sound, sound effects, and music separately. Be explicit when you want silence or no music.
dialogue — the exact spoken line in quotes. Veo 3 will lip-sync it. Keep lines short for an 8-second clip. Our native audio prompt guide goes deep on this field.

Style and finish

style — overall aesthetic: photorealistic, cinematic, documentary, claymation, anime, 35mm film.
aspect_ratio — 16:9, 9:16 for vertical, 1:1.
negative — what to avoid: "no text overlay, no warped hands, no extra fingers". For more, see the negative prompt guide.
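Since Veo 3 does not enforce a schema, a misspelled key such as "lightning" will not raise any error; it simply becomes a less meaningful label. A small check against the field list above can catch that before you generate. The sketch assumes the prompt is a Python dict; `KNOWN_FIELDS` and `unknown_fields` are hypothetical names, and the set reflects this guide's anatomy only, not an official list.

```python
# Field names taken from the anatomy list in this guide (not an official schema).
KNOWN_FIELDS = {
    "shot", "subject", "action", "setting",
    "camera", "lens",
    "lighting", "color", "color_grade",
    "audio", "dialogue",
    "style", "aspect_ratio", "negative",
}


def unknown_fields(prompt: dict) -> list[str]:
    """Return top-level keys that are not in the guide's field list, sorted."""
    return sorted(key for key in prompt if key not in KNOWN_FIELDS)


draft = {
    "subject": "a weary detective in a damp trench coat",
    "lightning": "magenta and cyan neon practicals",  # typo for "lighting"
    "audio": "steady rain, distant traffic",
}

print(unknown_fields(draft))
```

An unknown key is a warning rather than an error: you are free to add your own fields, so treat the result as a prompt to double-check spelling.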

Four copy-paste JSON templates

These are complete, working starting points. Paste one into Veo 3, then swap the values for your scene.

1. Product reveal (e-commerce / ads)

{
  "shot": "macro to medium pull-back",
  "subject": "a matte-black wireless earbud case on a wet slate surface",
  "action": "the lid slowly opens, the earbuds glow, then the camera pulls back to reveal the full product",
  "setting": "minimalist studio, dark background",
  "camera": "slow dolly-out on a 100mm macro lens, rack focus",
  "lighting": "single soft top light, subtle blue rim light",
  "color_grade": "high-contrast, cool blues and silver",
  "audio": "a soft mechanical click as the lid opens, a low ascending synth swell, no voiceover",
  "style": "premium commercial, photorealistic, glossy reflections",
  "aspect_ratio": "16:9",
  "negative": "no text, no logos, no hands"
}

2. Cinematic dialogue (lip-synced)

{
  "shot": "medium close-up",
  "subject": "a weary detective in a damp trench coat, 50s, stubble",
  "action": "he leans against a brick wall, exhales, and speaks directly to camera",
  "setting": "a rain-soaked alley at night, neon signs reflecting in puddles",
  "camera": "slow handheld push-in on an 85mm lens, shallow depth of field",
  "lighting": "magenta and cyan neon practicals, hard rim light, deep shadows",
  "dialogue": "\"Everybody in this city is running from something. Tonight, it's my turn.\"",
  "audio": "steady rain, distant traffic, a low ominous drone, no music bed",
  "style": "neo-noir, cinematic, 35mm film grain",
  "aspect_ratio": "16:9"
}

3. Vertical social clip (TikTok / Reels)

{
  "shot": "medium",
  "subject": "an energetic fitness coach in bright activewear",
  "action": "demonstrates a kettlebell swing with perfect form, then points at the camera and gives a thumbs up",
  "setting": "a sunlit home gym with plants",
  "camera": "static tripod, eye level, then a quick snap zoom on the thumbs up",
  "lighting": "bright natural daylight, clean and high-key",
  "audio": "upbeat energetic background music, a rhythmic exhale on each swing",
  "dialogue": "\"Three sets of fifteen — let's go!\"",
  "style": "vibrant, modern, social-media polish",
  "aspect_ratio": "9:16",
  "negative": "no warped equipment, no extra limbs"
}

4. Character-consistent series shot

Lock subject, style, and lighting; change only action and camera between runs.

{
  "shot": "medium",
  "subject": "Mira, a young astronaut with a buzzcut and a scar over her left eyebrow, wearing a worn orange flight suit",
  "action": "checks a wrist console, frowns, then looks off-screen toward an alarm",
  "setting": "the cramped cockpit of a derelict spaceship, red emergency lighting",
  "camera": "slow orbit to the right on a 35mm lens",
  "lighting": "pulsing red emergency light, faint blue glow from the console",
  "audio": "low hum of failing systems, an intermittent alarm beep, tense ambient drone",
  "style": "sci-fi, cinematic, photorealistic, film grain",
  "aspect_ratio": "16:9"
}

JSON prompts with a reference image (image-to-video)

JSON structuring shines just as much when you start from an image instead of pure text. When you upload a reference frame, the image already locks the subject's appearance, wardrobe, and setting — so you can drop the heavy subject and setting description and spend your fields on motion and camera instead. The image handles the "what it looks like"; the JSON handles the "what it does".

{
  "input": "uploaded reference image of the character",
  "action": "the character turns their head toward the window, then breaks into a slow smile",
  "camera": "gentle handheld drift, slight push-in on a 50mm lens",
  "lighting": "match the soft window light in the reference image",
  "audio": "quiet room tone, a soft inhale, distant birdsong, no music",
  "style": "preserve the photographic style of the reference image",
  "negative": "do not change the character's face, hair, or clothing"
}

Notice how lighting and style say match and preserve rather than redescribing — that keeps Veo 3 anchored to your image instead of reinventing it. The negative field doing identity-protection work ("do not change the face") is one of the highest-leverage lines you can write for image-to-video. For the full reference-image workflow, see our Veo 3 image-to-video guide.
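If you already have a text-to-video prompt and want an image-to-video version of it, the conversion described above can be sketched in a few lines. Everything here is illustrative: `for_reference_image` is a hypothetical helper, and the replacement phrases are simply the ones used in the template above.

```python
def for_reference_image(prompt: dict) -> dict:
    """Derive an image-to-video prompt: keep motion fields, defer the look to the image."""
    # The uploaded image already defines appearance and setting, so drop those fields.
    derived = {k: v for k, v in prompt.items() if k not in ("subject", "setting")}
    # Ask the model to match the image rather than redescribing the look.
    derived["lighting"] = "match the lighting in the reference image"
    derived["style"] = "preserve the photographic style of the reference image"
    # Append an identity-protection line to any existing negative field.
    identity_guard = "do not change the character's face, hair, or clothing"
    existing = derived.get("negative", "")
    derived["negative"] = f"{existing}, {identity_guard}" if existing else identity_guard
    return derived


text_prompt = {
    "subject": "a barista with curly red hair and a green apron",
    "action": "looks up and smiles at the camera",
    "setting": "a sunlit specialty coffee shop, morning",
    "camera": "slow push-in on a 50mm lens",
    "lighting": "warm window light from the left",
    "style": "photorealistic, cinematic",
    "negative": "no text overlay",
}

image_prompt = for_reference_image(text_prompt)
print(image_prompt)
```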

Advanced: ordering and grouping your fields

Two refinements separate decent JSON prompts from great ones. First, field order is a soft signal — put the most important instruction near the top. If character identity matters most, lead with subject; if the camera move is the hero of the shot, lead with camera. Second, for complex scenes you can group related values into nested objects so each cluster reads cleanly:

{
  "subject": "a street violinist, late 20s, fingerless gloves",
  "action": "plays an energetic solo, eyes closed",
  "camera": { "movement": "slow arc left", "lens": "35mm", "depth_of_field": "shallow" },
  "lighting": { "key": "golden hour backlight", "fill": "soft bounce from the right" },
  "audio": { "music": "a fast, emotional solo violin", "ambient": "city street, faint applause", "sfx": "none" }
}

Veo 3 reads the nested values fine, and the grouping makes it obvious at a glance what you have specified versus what you have left to the model. Use flat fields for simple shots and nested objects only when a section genuinely has several sub-values — over-nesting a simple prompt just adds noise.
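The "lead with what matters most" refinement can be applied mechanically when prompts are generated in code. The sketch below assumes Python 3.7 or later, where dicts preserve insertion order and `json.dumps` writes keys in that order; `lead_with` is a hypothetical helper name.

```python
import json


def lead_with(prompt: dict, field: str) -> dict:
    """Return a copy of `prompt` with `field` first; other keys keep their order."""
    if field not in prompt:
        raise KeyError(f"{field!r} is not in the prompt")
    reordered = {field: prompt[field]}
    reordered.update({k: v for k, v in prompt.items() if k != field})
    return reordered


violinist = {
    "subject": "a street violinist, late 20s, fingerless gloves",
    "action": "plays an energetic solo, eyes closed",
    "camera": {"movement": "slow arc left", "lens": "35mm", "depth_of_field": "shallow"},
    "lighting": {"key": "golden hour backlight", "fill": "soft bounce from the right"},
}

# The camera move is the hero of this shot, so it goes first.
camera_first = lead_with(violinist, "camera")
print(json.dumps(camera_first, indent=2))
```

Nested values are moved as a unit, so the grouping inside `camera` and `lighting` is untouched.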

How to use a JSON prompt in Veo 3

You do not need a special mode. Veo 3 accepts the JSON object directly in the same prompt box you would type a sentence into:

1. Open Veo 3 on veo3ai.io (or Google Flow if you prefer the storyboard view).
2. Paste your complete JSON object into the prompt field. Keep it valid JSON: matched braces, quoted strings, commas between fields.
3. Set your clip length and aspect ratio if the interface exposes them separately; otherwise the aspect_ratio field inside the JSON does the work.
4. Generate, review, then iterate by changing one field at a time.
5. To build a sequence, duplicate the JSON, keep subject/style/lighting frozen, and edit only action and camera for the next shot.

That frozen-block discipline is exactly how creators keep a character on-model across a multi-shot video, which is the prerequisite for anything longer than a single clip.
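When a sequence grows past a couple of shots, it is easy to edit a frozen field by accident. A simple check can flag that drift before you spend generations on it. This is a minimal sketch with hypothetical names (`FROZEN_FIELDS`, `drifted_fields`), assuming each shot is a Python dict and the first shot is the reference.

```python
FROZEN_FIELDS = ("subject", "style", "lighting")


def drifted_fields(sequence: list[dict]) -> list[tuple[int, str]]:
    """Return (shot_index, field) pairs where a frozen field differs from shot 0."""
    if not sequence:
        return []
    reference = sequence[0]
    return [
        (index, field)
        for index, shot in enumerate(sequence)
        for field in FROZEN_FIELDS
        if shot.get(field) != reference.get(field)
    ]


shot_1 = {
    "subject": "Mira, a young astronaut with a buzzcut, wearing a worn orange flight suit",
    "action": "checks a wrist console, frowns",
    "camera": "slow orbit to the right on a 35mm lens",
    "lighting": "pulsing red emergency light, faint blue glow from the console",
    "style": "sci-fi, cinematic, photorealistic, film grain",
}
# Shot 2 changes only action and camera, as the guide recommends.
shot_2 = {**shot_1, "action": "unbuckles and pushes off toward the hatch", "camera": "handheld follow"}
# Shot 3 contains an accidental edit to the subject (suit color changed).
shot_3 = {**shot_1, "subject": shot_1["subject"].replace("orange", "green"), "action": "opens the hatch"}

print(drifted_fields([shot_1, shot_2, shot_3]))
```

An empty result means the frozen block is identical across all shots; any pair in the output tells you which shot and which field to fix.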

JSON prompt generators: do you need one?

A wave of free "Veo 3 JSON prompt generator" tools appeared in 2026 — they give you a form with dropdowns for shot, camera, and lighting, then export the JSON for you. They are handy for beginners who do not want to hand-write braces, and for browsing pre-built prompt libraries. But once you understand the field anatomy above, writing the JSON yourself is faster and far more flexible than clicking through someone else's form. The schema in this guide is the generator — keep it open in a tab, copy the template closest to your shot, and edit the values. The real skill is not the tool; it is knowing which fields move which dials, which is what you now have.

Common JSON prompting mistakes

Invalid JSON. A missing comma or an unmatched brace can make the model fall back to reading it as messy text. Paste your object into any free JSON validator before generating if you are unsure.
Over-stuffing one field. Do not write a paragraph inside action. Split distinct ideas into the right keys — movement goes in camera, mood goes in lighting and color_grade.
Forgetting audio. Veo 3's native audio is its biggest edge over older models. An empty or missing audio field wastes it. Always specify ambient sound, effects, and whether you want music.
Dialogue too long. An 8-second clip fits roughly one to two short sentences of speech. Cram in a monologue and the lip-sync rushes or cuts off.
Changing everything at once. When a result is close but not perfect, resist rewriting the whole object. Change one field, rerun, learn.
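Several of these mistakes can be caught before you generate. The sketch below uses Python's standard `json` and `re` modules; `check_prompt` is a hypothetical helper, and the two-sentence dialogue limit is just this guide's rule of thumb for an 8-second clip, not a documented Veo 3 constraint.

```python
import json
import re


def check_prompt(text: str) -> list[str]:
    """Return a list of human-readable problems found in a JSON prompt string."""
    try:
        prompt = json.loads(text)
    except json.JSONDecodeError as err:
        # lineno/colno point at where the parser gave up, e.g. a missing comma.
        return [f"invalid JSON at line {err.lineno}, column {err.colno}: {err.msg}"]
    if not isinstance(prompt, dict):
        return ["top-level value is not a JSON object"]

    problems = []
    if not prompt.get("audio"):
        problems.append("audio field is missing or empty")

    dialogue = str(prompt.get("dialogue", ""))
    # Count sentence-like chunks; ignore fragments with no letters or digits
    # (such as a trailing quote mark).
    sentences = [s for s in re.split(r"[.!?]+", dialogue) if re.search(r"\w", s)]
    if len(sentences) > 2:
        problems.append(f"dialogue has {len(sentences)} sentences; aim for one or two")
    return problems


print(check_prompt('{"subject": "a detective" "audio": "steady rain"}'))
print(check_prompt('{"subject": "a detective", "dialogue": "One. Two. Three."}'))
```

The first call reports the missing comma; the second reports both the missing audio field and the over-long dialogue.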

JSON vs plain text: when to use which

Plain-text prompts are still perfectly good for quick, simple, single-shot ideas where you do not care about exact control — "a golden retriever running on a beach at sunset" will look great either way. Reach for JSON when you need precision (specific lens, lighting, color), consistency (the same character or look across multiple clips), or repeatable iteration (changing one variable at a time). In practice: experiment loosely in text, then lock your winning idea into a JSON object so you can reproduce and extend it reliably. For a broader library of plain-text ideas to convert into JSON, browse our best Veo 3 prompts guide.

Frequently asked questions

Does Veo 3 officially support JSON prompts? Veo 3 does not enforce a formal JSON schema and has no separate JSON mode. It reads the object as text, and the natural-language values inside each field are what carry the instruction. The structure is for your benefit: it forces explicit, separated instructions, and in practice it produces noticeably more controllable results than paragraphs.

Is JSON prompting better than a detailed sentence? For complex or repeatable shots, yes. The format prevents instructions from getting lost and makes consistency across clips far easier. For a one-off simple idea, a good sentence is fine.

What fields matter most for consistency? subject, style, and lighting. Keep those three identical across generations and only vary action and camera to keep a character and look on-model from shot to shot.

Can I use JSON prompts for vertical TikTok and Reels videos? Yes — set "aspect_ratio": "9:16" inside the object. Template 3 above is a ready-made vertical starting point.

Do I need a JSON prompt generator tool? No. The templates and field list in this guide cover what the generators output. Hand-editing a template is faster and more flexible once you know the fields.

Can I use JSON prompts when starting from an image? Yes, and it is often the cleanest workflow. Let the uploaded image define appearance and setting, then use your JSON fields for action, camera, and a negative line that protects the character's identity. See the image-to-video template above.

How long can the dialogue in a JSON prompt be? Keep spoken lines short — roughly one to two brief sentences for a standard 8-second clip. Longer lines force the lip-sync to rush or get cut off. If you need more dialogue, split it across consecutive clips with a frozen subject block.

Will the same JSON prompt always produce the same video? Not exactly — Veo 3 still introduces variation between runs. But a well-structured prompt dramatically narrows that variation, and freezing subject, style, and lighting keeps the look consistent enough to stitch clips together.

Start prompting with structure

JSON prompting is the difference between hoping for a good Veo 3 result and engineering one. Pick the template closest to your shot, paste it into Veo 3, and change one field at a time until it sings. Once you are fluent in the field anatomy, you will never go back to wrestling with run-on paragraphs — and your characters, lighting, and audio will finally stay exactly where you put them. For the next level, pair this with our cinematic prompts guide and start building multi-shot sequences that actually hold together.
