Do you enjoy being trapped in a room and listening to your own voice? Me neither. That's our fate, though, if we need to transcribe videos or audio. Captions and transcripts are vital for making your online presence accessible. Many potential customers browse the internet with the sound off, while others have specific accessibility requirements. In the olden days - i.e. a couple of years ago - transcribing meant painstakingly working our way through the audio, typing everything, and trying not to fall asleep in the process. Now, technology has made it much easier. Although the robots are coming for our jobs, they're also willing to take dictation.
These transcription tools have improved dramatically over the last year. Getting the right one will save you a huge amount of time and increase your audience. You'll probably save quite a bit of money, too. As the tools are constantly emerging, evolving, and disappearing, it can be hard to know which one to choose. In this post, I'll review my three favourite transcription tools - Descript, Sonix.ai, and Otter.ai - and explain their strengths and weaknesses. You'll then be in a better position to choose one that's right for you.
Although aimed mainly at podcasters, Descript is a great tool for any type of transcription. Unlike most other transcription tools, you need to download a desktop app to use most of the features. The app is a lot more stable than a web-based tool because you’re not relying on a browser to handle large files. However, the software needs at least 20 GB of free disk space to work properly.
So far, I've used it for creating video transcripts and captions. I found the accuracy to be about 90%. It’s affected (naturally) by the quality of the recording and also your accent. Occasionally, I achieved better results by adopting an (atrocious) American accent. I also used Descript for a collaboration and the team features were very effective. It was easy to share specific projects and also keep track of progress.
There are heaps of other features to make editing much easier for you, such as overdubbing and creating speech from text. When you spot a mistake in your audio recording, you can correct it by typing. Descript uses other examples of your speech to recreate the words. Yes, it’s very clever! There's also a library of voices for you to deploy. With a couple of clicks, you can sound like the gravelly-voiced guy who does the Hollywood trailers.
A recent addition is the ability to record and edit videos. It's simple to make a screencast, add some extra images, then introduce a few different voices. The simple tools mean you can quickly remove ums and ahhs and fix any bloopers. Although this is an exciting development, it still needs a bit of work. Unless you're lucky enough to have a quiet recording space, you'd need to export the video and apply noise reduction techniques in another program. For now, I think Screencast-o-Matic remains the leader for screencasting. I reckon Descript isn't far behind, though. The developers are constantly innovating and they're highly responsive to customer feedback. These are the benefits of a small and nimble company over some of the behemoths in the market.
Descript's impressive features give you a lot of power and versatility at a low price. The cost of the basic package is $12 per month, which includes 10 hours of transcription. I’ve looked at a lot of AI transcription tools and this is an amazing deal. There’s also a generous free trial with three hours’ transcription included. For overdubbing, batch export, and some other fancy features, it'll cost you $24 - still very reasonable.
If you have existing transcripts and want to sync them with a video to produce captions, Descript will do this for free.
Unfortunately, Descript has one major drawback for me: there’s no custom dictionary. This means you’ll have to go through and manually correct any specialist vocabulary or words that Descript routinely mistranscribes. This is unlikely to be a problem for many users, but it’s a limitation for more technical people. I think they’ll add this feature quite soon. Sadly, in the meantime, it slows me down too much on intensive projects. It's especially frustrating when I have to correct the spelling of my own name in every transcript.
Update: Descript now has a custom dictionary, but I've not yet tried it. This is a great example of them responding to feedback from users.
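If you're stuck with a tool that has no custom dictionary, one workaround is to fix recurring mistakes in the exported transcript with a short script. The sketch below is only an illustration in Python: the `CORRECTIONS` entries and the `apply_corrections` helper are made-up examples, not part of Descript or any other tool, and you'd swap in the words that your own transcripts routinely get wrong.

```python
import re

# Hypothetical examples: mistranscribed form -> intended spelling.
# Replace these with the errors you see in your own transcripts.
CORRECTIONS = {
    "sonics": "Sonix",
    "de script": "Descript",
}


def apply_corrections(text, corrections):
    """Replace each known mistranscription with its intended spelling.

    Matching is case-insensitive and limited to whole words, so a
    correction for "sonics" will not touch a word like "supersonics".
    """
    for wrong, right in corrections.items():
        pattern = re.compile(r"\b" + re.escape(wrong) + r"\b", re.IGNORECASE)
        # A function is used as the replacement so that `right` is
        # inserted literally, with no backslash processing.
        text = pattern.sub(lambda match, fixed=right: fixed, text)
    return text


if __name__ == "__main__":
    print(apply_corrections("I tried sonics and De Script.", CORRECTIONS))
```

This only helps with plain-text exports. For caption files, you'd want to check that the timestamps are left untouched.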
In terms of straightforward transcription, Sonix works in a similar way to Descript, but it has one big advantage: a custom dictionary. You can preload up to 400 words and Sonix transcribes them perfectly. It can also handle 30 different languages. I achieve around 93% accuracy and the excellent web-based software makes correcting transcripts really speedy. Occasionally, it hangs - as you might expect with anything that relies on an internet connection. I've never lost any work, though.
I mainly use Sonix for transcribing videos and creating captions. It suits my workflow really well and I can even export my transcripts and captions in 30 different languages – it magically translates all the text in a matter of minutes. This is an extraordinary feature that’ll be a huge benefit to some users. Unlike Descript, Sonix don't charge extra for batch exporting. I let the transcription elves churn through my videos, then download all the files in one go at the end. However, Sonix do charge the standard transcription rate for syncing to an existing transcript. With Descript, it's free.
Sonix.ai is currently my primary transcription tool. Its speed and accuracy mean I can generate captions and transcripts for a 90-minute course in less than an hour. A couple of years ago, this would have taken an entire (tedious) week.
The pay-as-you-go option costs $10 per hour. In return for a monthly subscription of $22, you’ll pay only $5 per hour, and you get various other features for organising and sharing your transcripts. There’s a free trial with 30 minutes’ transcription (if you subscribe after clicking this link, we both get 100 minutes free).
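To see when the Sonix subscription pays for itself, here's a quick back-of-the-envelope calculation in Python. It uses only the prices quoted above ($10 per hour pay-as-you-go, or $22 per month plus $5 per hour) and assumes you're billed for exactly the hours you transcribe. The function names are my own, and prices may well have changed since I wrote this.

```python
PAYG_RATE = 10.0        # dollars per hour, pay-as-you-go
SUBSCRIPTION_FEE = 22.0  # dollars per month
SUBSCRIBER_RATE = 5.0    # dollars per hour with a subscription


def monthly_cost(hours, subscribed):
    """Return the monthly cost in dollars for a given number of hours."""
    if subscribed:
        return SUBSCRIPTION_FEE + SUBSCRIBER_RATE * hours
    return PAYG_RATE * hours


def break_even_hours():
    """Hours per month at which both plans cost the same."""
    return SUBSCRIPTION_FEE / (PAYG_RATE - SUBSCRIBER_RATE)


if __name__ == "__main__":
    print(f"Break-even: {break_even_hours():.1f} hours per month")
    for hours in (2, 5, 10):
        payg = monthly_cost(hours, subscribed=False)
        sub = monthly_cost(hours, subscribed=True)
        print(f"{hours} h: pay-as-you-go ${payg:.2f}, subscription ${sub:.2f}")
```

On these figures, the subscription works out cheaper once you transcribe more than about four and a half hours a month. Below that, pay-as-you-go is the better deal.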
I gave Otter.ai a quick try last year and quickly abandoned it. The accuracy was terrible and the transcript was a garbled mess. When I tried it again last week, I was amazed - the transcript was near perfect. This is the power of machine learning-based transcription: the more people use it, the better it becomes. Of course, it depends on who is using it. Tools that are trained exclusively on North American accents will struggle to understand anybody else.
I needed to provide live transcription for a webinar and Otter.ai is currently the only tool that can do this with Zoom. Although the transcription was excellent, the integration isn't ideal. Participants have to open Otter in a separate window and position it alongside the Zoom window. This is tricky if they have a small screen. In this case, I was giving a software tutorial, so it was important for everyone to see the details. However, Otter.ai still improved accessibility for several participants and quickly generated captions that I added to the replay afterwards. You can do this without Otter if you have a Zoom Pro account and enable cloud recording.
The integration between Otter.ai and Zoom will undoubtedly become more elegant, or perhaps other players will develop their own add-ons. In the meantime, the live transcription in Google Meet is pretty good, although they're only just adding Zoom-like features, such as breakout rooms. For me, Zoom remains the best choice for workshop-style webinars.
Like Sonix.ai, Otter.ai benefits from a custom dictionary. If you're collaborating with a big team, you can add everyone's name, thereby improving the accuracy of meeting transcriptions. With the Pro account, you can add up to 200 names and 200 specialist terms. Otter.ai also does a good job of identifying different speakers and flagging them in the transcripts.
The Otter.ai interface is easy to navigate, although it's not quite as streamlined as Sonix.ai. A handy additional feature, though, is the automatic generation of keywords at the top of every transcript. This is useful if you want to get a quick sense of what was covered, without having to read the whole thing again.
There are dozens - maybe hundreds - of other transcription tools out there. Although I've dabbled with many of them, here I'm just showcasing those that I've used extensively and that I believe will still be around in a few years' time. Everything is moving so quickly that certain tools can be rendered obsolete in a matter of months.
There's no overall winner for me: I use all of them for different purposes. Fortunately, they're all relatively inexpensive and offer flexible pricing plans. Sonix.ai is available on a pay-as-you-go basis, while all of them offer a free trial or a basic free account.
Give them a try and see what works for you. Whichever one you choose, it'll be an awful lot faster than doing it yourself.