How to Transcribe Video & Audio Files using Whisper and Make

Laura van Sinderen | 18 July 2024 | 6 min read


In this tutorial, you'll learn how to use OpenAI's Whisper with Make.com to automatically generate transcripts from a video or audio clip.

Imagine you have a 10-minute video clip that you need to transform into a concise written summary. In the past, this would typically require hiring a transcriber to convert the audio into text, followed by hiring an editor to summarize the transcription.

Fortunately, with OpenAI's advanced AI models, this is a thing of the past. It's easier than ever to convert long-form audio recordings into text using AI-powered auto-transcription. In this guide, we'll focus on using OpenAI's Whisper to transcribe recordings into subtitles, which can then be integrated into a larger automation workflow using Make.com.

This video has been automatically transcribed using speech-to-text AI, with animated subtitles added by Creatomate.

One common application of AI transcription is creating subtitles for social media videos. With this specific need in mind, we'll also explore an alternative approach using Creatomate. This video API has built-in support for generating both transcriptions and animated subtitles.

Let's start with how to use Whisper with Make.com. Then we'll see how Creatomate handles the same scenario in a real-life application: transcribing videos for social media.

Prerequisites

Before we begin, make sure you have:

A Make.com account. Sign up for free if you don't have one yet.
An OpenAI account with an API key. You need it to use Whisper.

How to create transcripts using Whisper and Make.com

In real-world applications, you might use any of Make.com's 2,000+ supported apps to create your own custom automation workflows. But for simplicity, we'll set up a very basic Make.com scenario that demonstrates Whisper in the most straightforward way possible:

Let's dive in and start building the transcription workflow!

1. Input the video

The goal of this step is to provide the scenario with a video to transcribe. Here we're using a "Basic trigger" purely for testing; it's the simplest way to run a scenario without integrating anything else. If you're already familiar with the platform, feel free to use any other app right away. As long as you have a video URL or file to use in the next step, you're good to go.

Add the Tools app with the Basic trigger. Next, create an item named Input-Video and insert the following URL: https://cdn.creatomate.com/demo/the-daily-stoic-podcast.mp4

Then, click OK:

Right-click the module and select Run this module only:

We can now use this video to set up the rest of the workflow.

2. Download the file

In this step, we'll get the video file from the URL provided in step 1. This is necessary because the Whisper module can't handle URLs; it needs an actual file.
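Outside Make.com, this step amounts to streaming the URL's response body into a local file. The Python sketch below uses only the standard library; the function names `download` and `copy_with_limit` and the `max_bytes` cap (standing in for the plan-dependent file size limit of Make's HTTP module) are illustrative assumptions, not part of Make.com.

```python
import urllib.request
from typing import BinaryIO

BLOCK_SIZE = 64 * 1024  # read the response in 64 KiB pieces


def copy_with_limit(src: BinaryIO, dst: BinaryIO, max_bytes: int) -> int:
    """Copy src to dst, raising ValueError once more than max_bytes are read."""
    total = 0
    while True:
        block = src.read(BLOCK_SIZE)
        if not block:
            return total
        total += len(block)
        if total > max_bytes:
            # Stop early instead of downloading everything; dst may hold a partial copy.
            raise ValueError(f"file exceeds the {max_bytes}-byte limit")
        dst.write(block)


def download(url: str, path: str, max_bytes: int) -> int:
    """Stream url into path (hypothetical helper) and return the bytes written."""
    with urllib.request.urlopen(url) as response, open(path, "wb") as out:
        return copy_with_limit(response, out, max_bytes)
```

For example, `download("https://cdn.creatomate.com/demo/the-daily-stoic-podcast.mp4", "podcast.mp4", 100_000_000)` would save the demo video locally, assuming an arbitrary 100 MB cap.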

Click + to add another module. Search for the HTTP app and select the Get a file action.

In the URL field, select Tools - Basic trigger -> Input-Video. Then, click OK:

To make sure Make.com can download the video file, let's click the Run once button located in the bottom left corner:

Note: The maximum file size depends on your Make subscription plan. If you encounter an error indicating that the HTTP response exceeds the allowed maximum file size, try using a smaller file or consider upgrading your Make subscription.

Once the test is successful (when the module turns green), we're ready to move on.

3. Create a transcript using Whisper

Now let's move on to the task at hand: transcribing the video and generating written text.

Add the OpenAI (ChatGPT, Whisper, DALL-E) app with the Create a Transcription (Whisper) action.

Configure the module as follows:

Connection: Select your OpenAI connection or add a new one using your API key.
File: Choose HTTP - Get a file (it should be pre-selected by Make.com)
Model: Set to Whisper-1
Prompt: We don't need it, so leave this field blank.
Response Format: Select Text

Once you're done, click OK:
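To see what this module configures, it helps to know that Whisper's API expects a multipart/form-data POST to `https://api.openai.com/v1/audio/transcriptions` with `file`, `model`, and `response_format` fields. The standard-library sketch below builds that request; it illustrates the API call, not how Make.com implements its module, and the helper names `build_multipart` and `transcribe` are hypothetical. OpenAI's official `openai` Python package wraps the same call if you prefer a client library.

```python
import os
import urllib.request
import uuid

API_URL = "https://api.openai.com/v1/audio/transcriptions"


def build_multipart(fields: dict[str, str], file_field: str, file_name: str,
                    file_bytes: bytes, boundary: str) -> bytes:
    """Encode text fields plus one file as a multipart/form-data body."""
    crlf = b"\r\n"
    sep = b"--" + boundary.encode("ascii") + crlf
    parts = []
    for name, value in fields.items():
        parts += [sep,
                  f'Content-Disposition: form-data; name="{name}"'.encode() + crlf + crlf,
                  value.encode("utf-8") + crlf]
    parts += [sep,
              f'Content-Disposition: form-data; name="{file_field}"; '
              f'filename="{file_name}"'.encode() + crlf,
              b"Content-Type: application/octet-stream" + crlf + crlf,
              file_bytes + crlf,
              b"--" + boundary.encode("ascii") + b"--" + crlf]  # closing boundary
    return b"".join(parts)


def transcribe(path: str, api_key: str, model: str = "whisper-1",
               response_format: str = "text") -> str:
    """POST a local file to the transcription endpoint and return the response body."""
    boundary = uuid.uuid4().hex
    with open(path, "rb") as f:
        body = build_multipart({"model": model, "response_format": response_format},
                               "file", os.path.basename(path), f.read(), boundary)
    request = urllib.request.Request(
        API_URL, data=body, method="POST",
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": f"multipart/form-data; boundary={boundary}"})
    with urllib.request.urlopen(request) as response:
        # With response_format="text" the body is the plain transcript.
        return response.read().decode("utf-8")
```

A call such as `transcribe("podcast.mp4", os.environ["OPENAI_API_KEY"])` mirrors the module settings above: model `whisper-1`, no prompt, and a text response.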

Note: Whisper only accepts files up to 25 MB. If your file is larger, split it into chunks of 25 MB or less. For more information, refer to OpenAI's documentation.
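Cutting a media file at arbitrary byte offsets produces unplayable pieces, so chunks are better cut by time. The sketch below assumes a roughly constant bitrate, estimates how many seconds fit under the limit (with a 10% safety margin, reading 25 MB conservatively as 25,000,000 bytes), and builds `ffmpeg` commands that stream-copy each time range. Because stream copying cuts at keyframes, actual chunk sizes will vary, so check each output before uploading. The function names are hypothetical, and `ffmpeg` must be installed separately.

```python
import math

LIMIT_BYTES = 25_000_000  # conservative reading of the 25 MB upload limit


def plan_chunks(file_size_bytes: int, duration_s: float,
                limit_bytes: int = LIMIT_BYTES,
                safety: float = 0.9) -> list[tuple[float, float]]:
    """Return (start, length) pairs in seconds, assuming a constant bitrate."""
    if file_size_bytes <= limit_bytes:
        return [(0.0, duration_s)]
    bytes_per_second = file_size_bytes / duration_s
    # Whole seconds per chunk that keep the estimated size under the margin.
    chunk_seconds = math.floor(limit_bytes * safety / bytes_per_second)
    if chunk_seconds < 1:
        raise ValueError("bitrate too high to fit one second per chunk")
    chunks = []
    start = 0.0
    while start < duration_s:
        chunks.append((start, min(chunk_seconds, duration_s - start)))
        start += chunk_seconds
    return chunks


def ffmpeg_commands(src: str, chunks: list[tuple[float, float]],
                    pattern: str = "chunk_{:03d}.mp4") -> list[list[str]]:
    """Build one ffmpeg invocation per chunk; -c copy avoids re-encoding."""
    return [
        ["ffmpeg", "-ss", str(start), "-i", src, "-t", str(length),
         "-c", "copy", pattern.format(i)]
        for i, (start, length) in enumerate(chunks)
    ]
```

For a hypothetical 60,000,000-byte, 10-minute video, `plan_chunks(60_000_000, 600)` yields three ranges of 225, 225, and 150 seconds, and each command from `ffmpeg_commands` could be run with `subprocess.run`.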

In the next step, we'll do a test run to see how the transcript looks and how you can use it with other apps.

4. Further process the transcript

By now, everything should be set up correctly. So let's click the Run once button to execute the entire workflow:

Once the test succeeds, you can view Whisper's transcript by clicking the magnifying glass icon:

Next, you can save the transcript as a text file in Google Drive or Dropbox, send it by email, or use it in any other way. Just add another app to your scenario and map OpenAI (ChatGPT, Whisper, DALL-E) - Create a Transcription (Whisper) -> Text into it, as shown in the screenshot below:
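If you were handling the transcript in your own code rather than in another Make.com module, emailing it could start with Python's standard `email` package. This sketch only builds the message; the `transcript_email` name and any addresses you pass in are placeholders, and actually sending it (for example with `smtplib`) depends on your mail server.

```python
from email.message import EmailMessage


def transcript_email(transcript: str, sender: str, recipient: str,
                     source_name: str) -> EmailMessage:
    """Build a plain-text email carrying the transcript (sending is left to smtplib)."""
    msg = EmailMessage()
    msg["Subject"] = f"Transcript of {source_name}"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(transcript)  # plain-text body, UTF-8 by default
    return msg
```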

And that's all there is to it! You've learned how to create transcripts from video or audio files with Whisper and Make.com in just a few easy steps.

If you want to add subtitles to your videos too, keep reading.

Automatically add subtitles to your videos

While transcripts are incredibly useful, you might want to take your video content to the next level with stylish, animated subtitles. This is where Creatomate comes in: a video and image generation API that includes auto-transcription among many other tools:

A handful of animated subtitle styles supported by Creatomate.

Creatomate is officially supported by Make.com, so all you need to do is add the "Creatomate" module to your Make scenario. The API will take a video file, transcribe it, and add subtitles according to your template – all from one Make.com module.

A video that has been automatically transcribed using Creatomate and Make.com.

We've written a detailed, step-by-step tutorial on how to set up an automatic subtitle generation workflow. It will guide you through creating a template and integrating Creatomate with Make.com:

👉 Automatically Add Subtitles to Videos using Make.com