Note to Self: Transcribing Podcasts

I’ve been listening to a lot of podcasts lately, and I wanted to try making transcripts of one series, because, well, podcasts are a terrible way to store any information that you actually want to retrieve. And then a friend on Twitter was lamenting how much the process of transcription sucks, and another Twitter friend pointed out glitchdigital/video-transcriber, and I decided to try it.


Installation was a bit of a headache, though the “Getting Started” instructions for video-transcriber were mostly pretty thorough:

  • Register for an IBM Bluemix Account.
  • Install Node.js and FFmpeg, using Homebrew, which I already had installed for something else.
  • “Install Node.js dependences in the usual way,” an instruction which incensed me. I think I figured out what “the usual way” was via StackOverflow, but I now see that Dave MacFarland of Treehouse has written a nicely detailed tutorial: How to Install Node.js and NPM on a Mac.

Here are the Terminal commands that I used during installation (not including installing Homebrew):
$ brew install node ffmpeg
$ cd /Applications/video-transcriber-master
$ npm install


Here is the command (all on one line) that I use every time I need to start the transcription server:
$ WATSON_SPEECH_TO_TEXT_API_USERNAME="xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" WATSON_SPEECH_TO_TEXT_API_PASSWORD="xxxxxxxxxxxx" npm start

The software turned out to have a bug in it—or maybe I installed it wrong?—that causes it to crash after it processes more than about 18 minutes of audio. But once I chopped my MP3s into <18-minute chunks, it worked okay, and it’s kind of fun to watch.
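
One way to do the chopping, since FFmpeg is already installed for video-transcriber, is its segment muxer. This is a rough sketch, not necessarily how the clips in this post were cut. The 15-minute chunk length, the function names, and the output file naming are all assumptions made for illustration.

```shell
#!/bin/sh
# Split an MP3 into pieces short enough for video-transcriber.
# Assumption: 900 seconds (15 minutes) per piece, to stay under the ~18-minute crash point.
CHUNK_SECONDS=900

# chunk_count TOTAL_SECONDS
# Prints how many pieces a file of that length becomes (rounds up).
chunk_count() {
  echo $(( ($1 + CHUNK_SECONDS - 1) / CHUNK_SECONDS ))
}

# split_mp3 FILE.mp3
# Writes FILE-part000.mp3, FILE-part001.mp3, ... next to the original.
# "-c copy" cuts without re-encoding, so piece lengths are approximate.
split_mp3() {
  base="${1%.mp3}"
  ffmpeg -i "$1" -f segment -segment_time "$CHUNK_SECONDS" -c copy "${base}-part%03d.mp3"
}

# Example ("episode.mp3" is a placeholder filename):
# split_mp3 episode.mp3
```

Running chunk_count 2700 (a 45-minute episode) prints 3, so that episode would come out as three files to feed to the transcriber one at a time.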

I recorded a short video so you can see/hear the app in action. Note that it starts transcribing before the audio starts (around 0:38), runs a bit faster than the sound, and adjusts its transcription as new words are added.

A 16-minute clip takes about 8 minutes to transcribe.


You can’t edit the text while the transcription is occurring. When it finishes, however, the tool switches into editing mode.

All the words it decided were suspect are highlighted in yellow, and you can click into the text to edit it. Most of the time when you click, the audio will start playing from around that location, so you can verify what was actually said.

[Screenshot: video-transcriber showing highlighted suspect words]

Once you’re done editing, you can click “Transcript reviewed,” which does…nothing. Except prevent you from editing the transcript any further? There’s no way to re-enter editing mode.

You can then select your transcribed text, copy, and paste into a real text editor. The timestamps do not get included on your clipboard, alas; one of the three issues filed on GitHub as of this writing is a request for an option to export with the timestamps.

This app is very MVP, and rather fragile. It crashes on clips longer than about 18 minutes, and sometimes it just dies for no apparent reason, and you have to restart it from Terminal and refresh the page. If this happens, your previously transcribed text is not saved, so I quickly got into the habit of copying chunks of transcript out early and often while I worked on it.


Basically, it works, and it’s less tedious than typing it all out by hand, or starting and stopping some more typical MP3 player a million times while you make corrections.

The two Code Newbie transcripts I’ve created so far are on GitHub. Each one took me perhaps twice as long to edit as the actual running time of the audio, but I expect to get faster at it.
