
Learn how to generate videos using structured JSON. This guide walks through practical examples, including subtitles, AI voiceovers, and multi-scene videos.
JSON-to-video is a programmatic approach to video production. It allows you to describe the entire video in JSON (scenes, text, images, audio, animations, and timing), and the API renders it into a file. Think of it as the source code for video.
This makes JSON a natural fit for any workflow that requires video to be produced automatically: personalized clips from a dataset, automated social content, or pipelines that run without human input. Because the format is flexible and easy to generate, it works particularly well when producing video at scale.
This guide walks through four practical examples you can adapt to your own use case, starting with a quick overview of how JSON-to-video works and how Creatomate's editor makes working with JSON easier. In each example, the JSON and the generated video are shown side-by-side:
curl -X POST https://api.creatomate.com/v2/renders \
-H "Content-Type: application/json" \
-H "Authorization: Bearer YOUR_API_KEY_HERE" \
-d '{
"output_format": "mp4",
"width": 720,
"height": 1280,
"elements": [
{
"type": "composition",
"track": 1,
"time": 0,
"duration": 4.5,
"fill_color": "rgba(243,207,86,1)",
"elements": [
{
"type": "text",
"text": "This example video is entirely generated through code.",
"track": 1,
"time": 0,
"width": "95.2506%",
"height": "32.6602%",
"x_alignment": "50%",
"y_alignment": "50%",
"font_family": "Inter",
"font_weight": "700",
"line_height": "100%",
"fill_color": "rgba(0,0,0,1)",
"color_filter": "hue",
"animations": [
{
"easing": "linear",
"type": "scale",
"fade": false,
"scope": "element",
"start_scale": "70%"
},
{
"time": 0,
"duration": 1,
"easing": "back-in-out",
"type": "text-scale",
"split": "letter",
"track": 0
}
]
}
]
},
{
"type": "composition",
"track": 1,
"time": 3.5,
"duration": 4.5,
"fill_color": "rgba(73,128,241,1)",
"animations": [
{
"time": 0,
"duration": 1,
"transition": true,
"type": "circular-wipe"
}
],
"elements": [
{
"type": "text",
"text": "Define your entire video in code: scenes, layers, animations, audio, and timing.",
"track": 1,
"time": 0.5,
"width": "79.7626%",
"height": "37.8471%",
"x_alignment": "50%",
"y_alignment": "50%",
"font_family": "Inter",
"font_weight": "700",
"line_height": "100%",
"fill_color": "#ffffff",
"animations": [
{
"easing": "linear",
"type": "scale",
"fade": false,
"scope": "element",
"start_scale": "70%"
},
{
"time": 0,
"duration": 1,
"easing": "back-in-out",
"type": "text-scale",
"split": "letter",
"track": 0
}
]
}
]
},
{
"type": "composition",
"track": 1,
"duration": 4.5,
"fill_color": "rgba(60,174,163,1)",
"animations": [
{
"time": 0,
"duration": 0.87,
"transition": true,
"type": "circular-wipe"
}
],
"elements": [
{
"type": "text",
"text": "Build fully automated video pipelines driven by JSON.",
"track": 1,
"time": 0.5,
"width": "79.7626%",
"height": "37.8471%",
"x_alignment": "50%",
"y_alignment": "50%",
"font_family": "Inter",
"font_weight": "700",
"line_height": "100%",
"fill_color": "#ffffff",
"animations": [
{
"easing": "linear",
"type": "scale",
"fade": false,
"scope": "element",
"start_scale": "70%"
},
{
"time": 0,
"duration": 1,
"easing": "back-in-out",
"type": "text-scale",
"split": "letter",
"track": 0
}
]
}
]
}
]
}'
Let's get started!
JSON-to-video is a way to generate videos programmatically by describing the entire video structure in JSON format. Instead of modifying placeholders inside a reusable template, you write structured instructions that define exactly what your video should contain and how it should behave.
With JSON, you can define:
- Scenes and their timing
- Text, image, video, and audio elements
- Animations and transitions
- Layout, styling, and layering
This gives you complete control over every aspect of your video, making it a particularly good fit for developers and technical users building dynamic video workflows. Much as HTML describes a web page, this JSON describes your video.
At its core, JSON-to-video works like this:
1. You describe the video as a JSON document.
2. You send that JSON to the render API.
3. The API renders the video and returns a URL to the finished file.
Because the API is REST-based, you can call it from any language. Creatomate provides ready-made code snippets in Node.js, Python, PHP, Ruby, C#, and more.
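As a sketch of that request flow in Python, using only the standard library (the endpoint and headers are taken from the curl examples in this post; the `build_render_request` helper name is made up for illustration):

```python
import json
import urllib.request

API_URL = "https://api.creatomate.com/v2/renders"

def build_render_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but don't send) a POST request for the render endpoint."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )

payload = {
    "output_format": "mp4",
    "width": 720,
    "height": 1280,
    "elements": [{"type": "text", "text": "Hello from JSON!"}],
}
request = build_render_request("YOUR_API_KEY_HERE", payload)
# Sending it is one more line: urllib.request.urlopen(request)
```

The same three steps translate directly to any HTTP client in any language.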
Terminology note: Within Creatomate, the JSON format used to define videos is called "RenderScript". The full RenderScript reference documents every available property and element type.
You don't need to write your JSON from scratch.
Creatomate's template editor lets you design your video visually and then export the underlying JSON. This is the fastest way to get started: create a video design, copy its JSON, then programmatically modify it for each render.
To view the JSON behind a template, open the Source Editor or press F12:
The examples below skip the editor and show the final JSON directly. In practice, many users take a hybrid approach: start with a template, export the JSON, and dynamically replace text, images, or data before sending it to the API.
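A minimal sketch of that hybrid workflow in Python (the `Headline` element name and the `set_text` helper are made up for illustration; the JSON shape follows the examples in this post):

```python
import json

# JSON exported from the template editor, trimmed to one named text element.
template = json.loads("""
{
  "output_format": "mp4",
  "width": 720,
  "height": 1280,
  "elements": [
    { "name": "Headline", "type": "text", "text": "Placeholder" }
  ]
}
""")

def set_text(node: dict, element_name: str, new_text: str) -> None:
    """Walk the element tree and replace the text of every matching element."""
    for element in node.get("elements", []):
        if element.get("name") == element_name:
            element["text"] = new_text
        set_text(element, element_name, new_text)  # recurse into compositions

set_text(template, "Headline", "Hello, Alice!")
payload = json.dumps(template)  # ready to send to the render endpoint
```

Because the edit happens on a plain dictionary, the same pattern works for swapping image URLs, colors, or any other property per render.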
Tip: If you want to see how visual changes translate into JSON structure, refer to the How to Create Videos from JSON tutorial.
Here are the use cases covered in this post:
1. A simple video with animated text
2. A video with automatically transcribed subtitles
3. A video with an AI-generated voiceover and synced subtitles
4. A multi-scene video that adapts its length to its content
The examples build on each other, so following them in order will give you the clearest sense of how the pieces fit together.
Let's start with the simplest possible JSON-to-video example: generating a video with a single animated text element.
The video below displays the message "This video is generated from JSON! 👋" with a simple entrance animation:
curl -X POST https://api.creatomate.com/v2/renders \
-H "Content-Type: application/json" \
-H "Authorization: Bearer YOUR_API_KEY_HERE" \
-d '{
"output_format": "mp4",
"width": 720,
"height": 1280,
"elements": [
{
"type": "text",
"track": 1,
"width": "75%",
"height": "25%",
"x_alignment": "50%",
"y_alignment": "50%",
"text": "This video is generated from JSON! 👋",
"font_family": "Noto Sans",
"fill_color": "#ffffff",
"animations": [
{
"time": 0,
"duration": 1,
"easing": "quadratic-out",
"type": "text-slide",
"scope": "element",
"split": "line",
"distance": "200%",
"direction": "right",
"background_effect": "disabled"
}
]
}
]
}'
Tip: If you're already logged in to Creatomate, the code snippet above includes your API key, so you can copy the JSON and test it right away. If not, simply create a free account and refresh this page to continue.
Understanding the text element
This example uses a single text element, which is one of the most commonly used element types in RenderScript. Text elements are used for titles, headlines, subtitles, and any dynamic text generated from your data.
The most important property is the text itself:
{
"text": "This video is generated from JSON! 👋"
}
This defines the content displayed in the video. In practice, this value will come from your dataset or application logic.
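For instance, a personalized clip might build the text value from a dataset row (the field names here are hypothetical):

```python
customer = {"first_name": "Alice", "plan": "Pro"}  # one row from your dataset

text_element = {
    "type": "text",
    "text": f"Welcome, {customer['first_name']}! Your {customer['plan']} plan is now active.",
    "font_family": "Noto Sans",
    "fill_color": "#ffffff",
}
```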
Positioning the text
The following properties control the size and placement of the text element:
{
"width": "75%",
"height": "25%",
"x_alignment": "50%",
"y_alignment": "50%"
}
These properties define the size of the text container and center it within the video frame.
Styling the text
The appearance of the text can be customized using standard typography properties:
{
"font_family": "Noto Sans",
"fill_color": "#ffffff"
}
In this example, we set the font family and text color. RenderScript also supports many additional styling options, such as font weight, font size, letter spacing, line height, and background styling.
Adding animation
To make the text enter the frame, we added a simple animation:
{
"type": "text-slide",
"direction": "right"
}
This animation slides the text into the frame when the video starts. Other animations let you fade, scale, or animate the text line by line.
For more details on text positioning and styling, see the documentation. You can also experiment with text animations directly in the editor using the Animations panel (see step 4 of this post).
In this example, we move beyond the basics and introduce automatic transcription.
Creatomate can transcribe audio from a video file and generate fully synchronized subtitles. You define the subtitle styling in your JSON, and Creatomate handles the transcription, timing, and word-by-word synchronization automatically.
We'll take a video file and add styled captions that highlight each word as it's spoken, a format commonly used for social media content.
Here's what we're generating (unmute to hear the audio):
curl -X POST https://api.creatomate.com/v2/renders \
-H "Content-Type: application/json" \
-H "Authorization: Bearer YOUR_API_KEY_HERE" \
-d '{
"output_format": "mp4",
"width": 720,
"height": 1280,
"snapshot_time": 0.63,
"elements": [
{
"name": "Main-Video",
"type": "video",
"track": 1,
"time": 0,
"source": "https://cdn.creatomate.com/demo/the-daily-stoic-podcast.mp4"
},
{
"type": "text",
"track": 2,
"time": 0,
"y": "82.3743%",
"width": "81.4383%",
"height": "35.2513%",
"x_alignment": "50%",
"y_alignment": "50%",
"font_family": "Montserrat",
"font_weight": "700",
"font_size": "9.29 vmin",
"background_color": "rgba(216,216,216,0)",
"background_x_padding": "31%",
"background_y_padding": "17%",
"background_border_radius": "31%",
"transcript_source": "Main-Video",
"transcript_effect": "highlight",
"transcript_maximum_length": 14,
"fill_color": "#ffffff",
"stroke_color": "#000000",
"stroke_width": "1.6 vmin"
}
]
}'
How automatic transcription works
The key property is:
{
"transcript_source": "Main-Video"
}
This value references the name of the video element. By linking transcript_source to the video element, Creatomate knows which media file to transcribe.
When the render starts, Creatomate extracts the audio from the referenced video, transcribes the speech, generates subtitle segments, and synchronizes each word with the timeline. No separate transcription request is needed.
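Because transcript_source is matched by element name, a typo silently breaks the link. A small client-side sanity check can catch this before you submit the render (this is a sketch, not an official Creatomate tool):

```python
def collect_names(elements):
    """Yield the name of every element, including those nested in compositions."""
    for element in elements:
        if "name" in element:
            yield element["name"]
        yield from collect_names(element.get("elements", []))

def missing_transcript_sources(source: dict) -> list:
    """Return transcript_source values that reference no existing element name."""
    known = set(collect_names(source.get("elements", [])))
    missing = []

    def walk(elements):
        for element in elements:
            ref = element.get("transcript_source")
            if ref is not None and ref not in known:
                missing.append(ref)
            walk(element.get("elements", []))

    walk(source.get("elements", []))
    return missing

source = {
    "elements": [
        {"name": "Main-Video", "type": "video", "source": "clip.mp4"},
        {"type": "text", "transcript_source": "Main-Vdieo"},  # typo!
    ]
}
print(missing_transcript_sources(source))  # ['Main-Vdieo']
```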
Styling the subtitles
You have full control over the visual appearance of the subtitles: font family and weight, font size, fill and stroke colors, background styling, and positioning.
The transcript_effect: "highlight" property enables word-by-word highlighting during playback, while transcript_maximum_length: 14 limits the maximum number of characters per subtitle segment, keeping captions short and readable.
All other styling properties work exactly like those on a regular text element.
Tip: Not sure which subtitle style works best? Use the template editor to experiment with fonts, colors, and effects. Once you're happy with the result, press F12 to open the source editor and copy the subtitle element's JSON. You can then reuse that styling directly in your API requests.
This example uses a single video file as input. If you want to stitch multiple short clips together and add subtitles across the entire video, that's where compositions come in. The following examples cover how they work in more detail.
In this example, we move from transcription to content generation.
Instead of using existing audio, we'll generate speech from plain text using ElevenLabs, then combine a background image and an AI-generated voiceover with automatically synchronized subtitles, all defined in JSON.
Here's what we're creating (unmute to hear the voiceover):
curl -X POST https://api.creatomate.com/v2/renders \
-H "Content-Type: application/json" \
-H "Authorization: Bearer YOUR_API_KEY_HERE" \
-d '{
"output_format": "mp4",
"width": 720,
"height": 1280,
"snapshot_time": 1.28,
"elements": [
{
"name": "Scene-1",
"type": "composition",
"track": 1,
"time": 0,
"elements": [
{
"name": "Image-1",
"type": "image",
"track": 1,
"time": 0,
"source": "https://cdn.creatomate.com/demo/better-sleep-1.jpg",
"color_overlay": "rgba(0,0,0,0.15)",
"animations": [
{
"easing": "linear",
"type": "pan",
"end_x": "50%",
"scope": "element",
"track": 0,
"start_x": "50%",
"end_scale": "100%",
"start_scale": "120%"
}
]
},
{
"name": "Subtitles-1",
"type": "text",
"track": 2,
"time": 0,
"width": "86.66%",
"height": "37.71%",
"x_alignment": "50%",
"y_alignment": "50%",
"font_family": "Montserrat",
"font_weight": "700",
"font_size": "8 vmin",
"background_color": "rgba(216,216,216,0)",
"background_x_padding": "26%",
"background_y_padding": "7%",
"background_border_radius": "28%",
"transcript_source": "Voiceover-1",
"transcript_effect": "highlight",
"transcript_maximum_length": 35,
"transcript_color": "#ff0040",
"fill_color": "#ffffff",
"stroke_color": "#333333",
"stroke_width": "1.05 vmin"
},
{
"name": "Voiceover-1",
"type": "audio",
"track": 3,
"time": 0,
"source": "Here are three simple ways to improve your sleep quality tonight.",
"provider": "elevenlabs model_id=eleven_multilingual_v2 voice_id=XrExE9yKIg1WjnnlVkGX stability=0.75"
}
]
}
]
}'
How this composition works
We use a composition to group three child elements:
- A background image with a subtle pan animation
- A text element that renders the subtitles
- An audio element that generates the voiceover
The composition has no fixed duration. By default, it expands to match the longest child element, which in this case is the voiceover. Because the voiceover's duration depends on the input text, the entire scene adapts automatically.
Let's break down how each piece works.
The background image
The image element loads any publicly accessible image URL. You can replace it with your own image source:
{
"type": "image",
"source": "https://cdn.creatomate.com/demo/better-sleep-1.jpg"
}
We've added a subtle pan animation to make the scene feel more dynamic, but this is optional and does not affect duration.
The voiceover
The audio element generates speech via ElevenLabs:
{
"type": "audio",
"source": "Here are three simple ways to improve your sleep quality tonight.",
"provider": "elevenlabs model_id=eleven_multilingual_v2 voice_id=XrExE9yKIg1WjnnlVkGX stability=0.75"
}
The source property contains the text to be spoken. Replace it with any text you want to convert to speech.
The provider property defines which text-to-speech service is used and how the voice is configured. The provider string must start with "elevenlabs", followed by the configuration parameters:
- model_id – the ElevenLabs model to use (here, eleven_multilingual_v2)
- voice_id – the ElevenLabs voice to speak with
- stability – controls how consistent the voice sounds
You can modify these parameters to adjust the voice's characteristics. For a full list of options, see the ElevenLabs documentation.
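Since the provider string is just space-separated key=value pairs after the service name, it is easy to assemble from variables (the format follows the example above; the `elevenlabs_provider` helper is made up for illustration):

```python
def elevenlabs_provider(voice_id: str,
                        model_id: str = "eleven_multilingual_v2",
                        stability: float = 0.75) -> str:
    """Assemble the provider string for an ElevenLabs voiceover element."""
    return f"elevenlabs model_id={model_id} voice_id={voice_id} stability={stability}"

provider = elevenlabs_provider("XrExE9yKIg1WjnnlVkGX")
# 'elevenlabs model_id=eleven_multilingual_v2 voice_id=XrExE9yKIg1WjnnlVkGX stability=0.75'
```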
Before using ElevenLabs, connect your API key in your Creatomate Project Settings. Follow steps 1 and 2 of this tutorial for guidance.
The audio is generated during rendering, so no separate TTS API request is required.
The subtitles
The text element for the subtitles works exactly like in Example 2, but instead of transcribing a video file, it transcribes the generated voiceover:
{
"transcript_source": "Voiceover-1"
}
This references the audio element by name. Creatomate transcribes the generated voiceover and produces fully synchronized subtitles automatically.
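To generate many such scenes from data, the whole composition can be produced by one function. Here's a sketch: the `make_scene` helper and its parameters are made up, the styling is trimmed from the example above, and the provider string mirrors the one shown earlier:

```python
def make_scene(index: int, image_url: str, script: str) -> dict:
    """Build one image + subtitles + voiceover composition."""
    voiceover_name = f"Voiceover-{index}"
    return {
        "name": f"Scene-{index}",
        "type": "composition",
        "track": 1,
        "elements": [
            {"type": "image", "track": 1, "source": image_url},
            {
                "type": "text",
                "track": 2,
                "transcript_source": voiceover_name,  # links subtitles to the voiceover
                "transcript_effect": "highlight",
                "fill_color": "#ffffff",
            },
            {
                "name": voiceover_name,
                "type": "audio",
                "track": 3,
                "source": script,  # the text to be spoken
                "provider": "elevenlabs model_id=eleven_multilingual_v2 "
                            "voice_id=XrExE9yKIg1WjnnlVkGX stability=0.75",
            },
        ],
    }

scene = make_scene(1, "https://cdn.creatomate.com/demo/better-sleep-1.jpg",
                   "Here are three simple ways to improve your sleep quality tonight.")
```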
This example shows a single composition. To build multi-scene videos where each scene has its own image, voiceover, and subtitles, you can chain multiple compositions together. Example 4 shows how to automatically adapt each scene's duration to its content.
So far, we've worked with single compositions. Now we'll chain multiple compositions together to build a multi-scene video, where each scene contains its own content and automatically adapts its duration accordingly.
In this example, we'll create a video with three scenes. Each scene contains a video clip and a text overlay. The total video length adjusts automatically based on the duration of the video files used in each scene.
The three video URLs used here are publicly accessible, so you can reuse them directly in your own tests.
Here's what we're generating:
curl -X POST https://api.creatomate.com/v2/renders \
-H "Content-Type: application/json" \
-H "Authorization: Bearer YOUR_API_KEY_HERE" \
-d '{
"output_format": "mp4",
"width": 720,
"height": 1280,
"elements": [
{
"name": "Composition-1",
"type": "composition",
"track": 1,
"time": 0,
"elements": [
{
"name": "Video-1",
"type": "video",
"track": 1,
"time": 0,
"source": "https://cdn.creatomate.com/demo/video1.mp4"
},
{
"name": "Text-1",
"type": "text",
"track": 2,
"time": 0,
"y": "55.1793%",
"width": "86.4609%",
"height": "20.4292%",
"x_alignment": "50%",
"y_alignment": "50%",
"text": "Building videos with JSON.",
"font_family": "Noto Sans",
"font_size_maximum": "6.5 vmin",
"fill_color": "#ffffff",
"stroke_color": "#333333",
"stroke_width": "1 vmin"
}
]
},
{
"name": "Composition-2",
"type": "composition",
"track": 1,
"elements": [
{
"name": "Video-2",
"type": "video",
"track": 1,
"time": 0,
"source": "https://cdn.creatomate.com/demo/video2.mp4"
},
{
"name": "Text-2",
"type": "text",
"track": 2,
"time": 0,
"y": "55.1793%",
"width": "86.4609%",
"height": "20.4292%",
"x_alignment": "50%",
"y_alignment": "50%",
"text": "Each scene adapts to its content.",
"font_family": "Noto Sans",
"font_size_maximum": "6.5 vmin",
"fill_color": "#ffffff",
"stroke_color": "#333333",
"stroke_width": "1 vmin"
}
]
},
{
"name": "Composition-3",
"type": "composition",
"track": 1,
"elements": [
{
"name": "Video-3",
"type": "video",
"track": 1,
"time": 0,
"source": "https://cdn.creatomate.com/demo/video4.mp4"
},
{
"name": "Text-3",
"type": "text",
"track": 2,
"time": 0,
"y": "55.1793%",
"width": "86.4609%",
"height": "20.4292%",
"x_alignment": "50%",
"y_alignment": "50%",
"text": "The video length is calculated automatically.",
"font_family": "Noto Sans",
"font_size_maximum": "6.5 vmin",
"fill_color": "#ffffff",
"stroke_color": "#333333",
"stroke_width": "1 vmin"
}
]
}
]
}'
How scene duration is determined
Notice that none of the compositions define a duration property. When duration is omitted, it defaults to "auto", meaning the composition automatically expands to match the duration of its longest child element.
Each composition contains:
- A video element with a fixed runtime
- A text overlay without an explicit duration
Because the video element has a fixed runtime, the composition adopts that duration. Creatomate reads the media metadata and resolves the timeline automatically during rendering.
How scenes are sequenced
All three compositions are placed on the same track ("track": 1).
Elements on the same track are sequenced one after another:
- Composition-1 plays first
- Composition-2 starts when Composition-1 ends
- Composition-3 starts when Composition-2 ends
If compositions were placed on different tracks, they would overlap and play simultaneously.
The total video duration is therefore the sum of all individual scene durations.
In this example, the three clips run for 3, 3, and 5 seconds respectively.
Total duration: 3 + 3 + 5 = 11 seconds
The timeline is resolved automatically based on the media files.
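The same arithmetic is easy to reproduce client-side if you already know each clip's runtime, for example to predict where each scene will start on the timeline:

```python
from itertools import accumulate

scene_durations = [3.0, 3.0, 5.0]  # seconds, as resolved from the clips' metadata

total = sum(scene_durations)                            # 11.0
start_times = [0.0, *accumulate(scene_durations)][:-1]  # [0.0, 3.0, 6.0]
```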
Replacing video clips with AI voiceovers
This same pattern works perfectly with AI-generated voiceovers from Example 3.
Instead of a video and text element, each composition would contain:
- A background image or video
- An audio element that generates the voiceover
- A text element that subtitles the voiceover via transcript_source
Because compositions expand to match their longest child element, each scene automatically adapts to the duration of the spoken text.
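Putting the two ideas together, a full multi-scene voiceover video can be generated from a list of script lines. In this sketch, the provider string mirrors Example 3, while the image URLs and everything else are illustrative:

```python
script_lines = [
    "Here are three simple ways to improve your sleep quality tonight.",
    "First, keep a consistent bedtime, even on weekends.",
    "Second, dim your screens an hour before bed.",
]

source = {
    "output_format": "mp4",
    "width": 720,
    "height": 1280,
    "elements": [
        {
            "type": "composition",
            "track": 1,  # same track, so the scenes play back to back
            "elements": [
                {"type": "image", "track": 1,
                 "source": f"https://example.com/background-{i}.jpg"},
                {"type": "text", "track": 2,
                 "transcript_source": f"Voiceover-{i}",
                 "fill_color": "#ffffff"},
                {"name": f"Voiceover-{i}", "type": "audio", "track": 3,
                 "source": line,
                 "provider": "elevenlabs model_id=eleven_multilingual_v2 "
                             "voice_id=XrExE9yKIg1WjnnlVkGX stability=0.75"},
            ],
        }
        for i, line in enumerate(script_lines, start=1)
    ],
}
```

Each composition's duration resolves to the length of its generated voiceover, so adding or rewording a script line automatically changes the video's total length.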
Tip: Because voiceovers are a bit more complex, it may help to explore them interactively. Open the Short-Form Voice Over template from the gallery and press F12 to open the source editor. You'll see how multiple compositions are structured, and any changes you make in the editor will immediately be reflected in the JSON.
These examples show how structured JSON can define not just visual elements, but the behavior of an entire video.
Instead of manually stitching clips together or calculating durations, you describe scenes, layers, and relationships, and the renderer takes care of the rest.
From here, you can apply the same principles to more advanced workflows. For detailed reference, check the RenderScript documentation, which covers every property and element type.
Since JSON is the source of every template, another practical way to learn is to open a template in the editor and press F12 to inspect its generated JSON. Modify the design visually and observe how those changes translate into structured code. It's one of the fastest ways to understand how everything connects.
Start simple, experiment, and gradually combine concepts. Once you understand how elements, tracks, and compositions interact, you can build fully automated video pipelines with surprisingly little code.