So you just published your conference talk videos on YouTube, and you want to add captions. YouTube will use speech to text and try to add its own captions for you, but it's pretty hit or miss, especially for technical talks where there are a lot of proper nouns and jargon. The good news is there's a much better solution, and one that provides benefits to your in-person audience as well!
You can hire a captioning service to transcribe the speakers' presentations live during the event and show the captions on a separate display at the conference.
There's a real person at the other end of the screen, listening to everything the speakers are saying and typing really quickly. (There's a special keyboard and years of training that go into this, but there are plenty of resources online if you're interested in learning!)
I've worked with a few people who do captioning for these events, and they're always fantastic. If you run a conference, check out StenoKnight and White Coat Captioning and hire them for your next event! They can either send someone out in person to be at the event and do the captioning right there, or they can connect via Skype and do it all remotely.
Typically they'll provide you with a web page that you can pull up on a large monitor at the venue to display the captions in real time for the in-person audience. This is of course great for people who are deaf or hard of hearing, but it's also really helpful for audience members who are not native English speakers, since presenters often talk too quickly to be easily understood.
At the end of the conference, ask the captioner to send you the plain text files of everything they typed. This will just be a regular text file, no fancy formatting. Here's where the magic happens.
Adding Captions to YouTube Videos
YouTube has a special feature which will match up typed text with the speech it recognizes in videos, doing all the work of syncing your captions with the video timings automatically. If you have a clear audio recording and accurate text, it does a surprisingly good job. Here are the detailed steps to take your plain text file and turn it into captions on YouTube.
First, launch YouTube Studio.
Click on "Videos" to show the list of all the videos uploaded.
Choose the video you want to add captions to.
Click the sidebar option "Transcriptions".
Then click "Add Language". We don’t want to use the autogenerated captions at all, since the live captionist does a way better job.
That adds a new row to this table. Click "Add" under "Subtitles".
A new window opens. Choose the "transcribe and auto-sync" option. That will let us paste in the transcribed text.
Copy the presenter's captions into the box. Make sure the text you paste starts and ends with the same words the video starts and ends with, or YouTube gets confused trying to line things up.
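If the captioner's text file covers more than one talk, cutting out the right slice by hand gets tedious. Here's a minimal Python sketch of that trimming step. It assumes you can spot a distinctive opening phrase and closing phrase for the talk by watching the start and end of the video. The function and parameter names (`extract_talk`, `first_words`, `last_words`) are made up for this example, and the matching is exact and case-sensitive.

```python
def extract_talk(transcript: str, first_words: str, last_words: str) -> str:
    """Return the part of the transcript from first_words through last_words.

    Both phrases are matched exactly (case-sensitive), so copy them
    straight out of the captioner's text file.
    """
    start = transcript.find(first_words)
    if start == -1:
        raise ValueError(f"Opening phrase not found: {first_words!r}")

    # Look for the closing phrase only after the opening phrase.
    end = transcript.find(last_words, start + len(first_words))
    if end == -1:
        raise ValueError(f"Closing phrase not found: {last_words!r}")

    return transcript[start:end + len(last_words)].strip()


if __name__ == "__main__":
    # Hypothetical sample text, standing in for the captioner's file.
    full_day = (
        "MC: Welcome back everyone. "
        "Hi, I'm Alex and today we talk about OAuth. Thank you very much! "
        "MC: Next up is our lunch break."
    )
    print(extract_talk(full_day, "Hi, I'm Alex", "Thank you very much!"))
```

If the closing phrase is something common like "Thank you", pick a longer phrase so the first match after the opening is really the end of the talk.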
Once you do that, click "Set Timings".
This part takes a few minutes, so go make a coffee or tea, since you’ll need it for the next step. You can refresh this page to check if it’s done. It will look like this while YouTube is busy with it.
Finally, when YouTube has finished thinking, the captions will appear as a draft you can edit.
Now YouTube has done its magic, and matched up the typed text with the spoken words! It usually does a pretty good job of it. Click on the draft and you can see what it's done.
You could probably publish this at this point, but I like to do a manual review of everything to make sure it looks good. You can play the video to review the captions and timings (check out the keyboard shortcuts which can really help speed up this step).
- SHIFT+SPACE - start/stop
- SHIFT+LEFT / SHIFT+RIGHT - skip backward or forward by a second
While reviewing, I’m mainly looking for the following:
Did YouTube leave any dangling words that could otherwise fit into the previous caption?
For example, this would look better if the word "fit," was in the previous caption frame instead of starting a new caption frame with the end of that phrase.
In that case, just move the word to the other caption frame.
Are there any obvious typos on technical terms or proper nouns?
The live captionists do a pretty good job, but occasionally some typos slip through.
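If the same mistyped term keeps showing up, you could also clean up the plain text file before pasting it into YouTube. This is a rough Python sketch of that idea, not something the captioning services provide. The entries in `CORRECTIONS` are invented examples, so build your own list from the typos you actually see.

```python
import re

# Hypothetical examples of mistyped terms and their corrections.
CORRECTIONS = {
    "oh auth": "OAuth",
    "java script": "JavaScript",
}


def fix_terms(text: str, corrections: dict[str, str]) -> str:
    """Replace each mistyped term with its correction, ignoring case."""
    for wrong, right in corrections.items():
        # \b keeps the match on whole words, so a term inside a longer
        # word is left alone.
        pattern = re.compile(r"\b" + re.escape(wrong) + r"\b", re.IGNORECASE)
        # A function replacement avoids backslash handling in `right`.
        text = pattern.sub(lambda match, fixed=right: fixed, text)
    return text


if __name__ == "__main__":
    print(fix_terms("We use oh auth and Java Script here.", CORRECTIONS))
```

I'd still read through the result afterwards, since a blanket replace can change a phrase that was actually typed correctly.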
If the presenter has any long gaps in between sentences, sometimes YouTube gets confused about the timing.
You can find some of these spots by visually looking at the waveform compared to the length of text in the caption. (In this particular example this happens to be accurate but it usually looks similar to this when it’s wrong.)
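For long talks, a script can point you at the spots worth checking first. The sketch below assumes you've exported the draft captions as an .srt file, and it flags any caption that stays on screen for a long time relative to how much text it holds. The threshold of 5 characters per second is a guess on my part, not anything YouTube documents, and the function name `slow_cues` is made up for this example.

```python
import re

# Matches SRT timestamps such as 00:01:23,456
TIME = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})")


def to_seconds(parts: tuple[str, str, str, str]) -> float:
    """Convert (hours, minutes, seconds, millis) strings to seconds."""
    h, m, s, ms = (int(p) for p in parts)
    return h * 3600 + m * 60 + s + ms / 1000


def slow_cues(srt_text: str, min_chars_per_second: float = 5.0):
    """Return (start_seconds, text) for captions with little text per second."""
    flagged = []
    # Captions in an SRT file are separated by blank lines.
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.splitlines()
        timing = next((line for line in lines if "-->" in line), None)
        if timing is None:
            continue
        stamps = TIME.findall(timing)
        if len(stamps) < 2:
            continue
        start, end = to_seconds(stamps[0]), to_seconds(stamps[1])
        # Everything after the timing line is the caption text.
        text = " ".join(lines[lines.index(timing) + 1:]).strip()
        duration = end - start
        if duration > 0 and len(text) / duration < min_chars_per_second:
            flagged.append((start, text))
    return flagged


if __name__ == "__main__":
    sample = """1
00:00:01,000 --> 00:00:03,000
Hello everyone, welcome to the talk.

2
00:00:03,000 --> 00:00:23,000
So.
"""
    for start, text in slow_cues(sample):
        print(f"Check around {start:.0f}s: {text!r}")
```

A flagged caption isn't necessarily wrong (the speaker may really have paused), so treat the output as a list of places to look at in the editor.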
Once you’re happy with the transcript, click "Save Changes".
Now you need to delete the auto transcript so that only the good transcript is left.
You first have to click "Unpublish".
Then you can click "Delete Draft".
Now there is only one set of captions, the good one! You are finished, congrats!
Now when people watch the video on YouTube and enable captions, they'll be seeing what the live captionist typed during the event!