Creating metadata with Speech Search

One of Premiere Pro CS4's most impressive new features is its ability to look inside an audio file, recognize the spoken dialog, and convert it into text metadata. You can then search that metadata in the Project panel to locate clips more easily. It sounds impossible, and I've got to be honest with you: it's one of the more frustrating features. It works better than you would think, but it's very buggy, and a lot of the time you can't get it to work at all. When you select a file, if the Transcribe button is available in the Metadata panel, you're in good shape.

Then it will work for you. But there are a lot of file formats, including the very popular AIF format, that you can't do this with; you can see that Transcribe is grayed out. With many of the video formats I've worked with, you can't transcribe the audio either. And even with the exact files you can normally transcribe, the button is often grayed out for no apparent reason. I've actually had to record this tutorial four or five times, because sometimes it works and sometimes it doesn't. I'll practice it before I record and it will work, and then when I go to record the movie, the Transcribe button is grayed out, with the exact same files in the exact same project.

So I've got my fingers crossed right now, and I'm really grateful that, for whatever reason, it's choosing to work at this second. You can transcribe the text by selecting the clip and clicking the Transcribe button, or by selecting the clip in the Project panel and choosing Clip > Audio Options > Transcribe to Text. I'm going to choose that option, and in the Speech Transcription Options dialog, I'll click OK. What's interesting is that this actually adds the job to the Adobe Media Encoder queue.

So all we have to do is click the Start Queue button. With this clip, it takes about 7 seconds, so it does take a little while. In fact, it takes more time to load the waveform into the queue than it does to process it. Give it a few seconds, and there we go. Back in Premiere Pro, you can see that the clip has been transcribed. This clip is from the dream job movie I mentioned in Chapter 1, whose assets we're using throughout this training series.

I actually had the opportunity to do some of the voiceover work, and this is me saying: "Your service sucks. I'm paying you to do a job, so do it!" I'm playing an angry customer who's complaining. It transcribed the line as "Your service sucks. I'm paying you to do a job, so do it," except that it rendered "sucks" as "Sox," capital S-o-x, as in the White Sox or the Red Sox. I'm not really sure why. But I can click in that word and just type "sucks." You can double-click any of these words and change what they say. What's really cool about this feature, too, is that when I click a word, it shows me the exact timecode where that word is spoken, down to the sample level, and also its duration, meaning how long that word lasts.

So not only is it good for metadata and for searching for a particular word or phrase, it's also really useful for seeing exactly where a word is said. If you need to perform a precise edit, it already tells you exactly where to go. In my experience, with very clear dialog, it gets about 60% of the words right, and that's not the problem. The problem is that the feature itself often doesn't work. So I'd give this feature about 2 stars out of 10, because it so often doesn't work; if it worked reliably, I'd probably give it 6 or 7 stars out of 10.

It's a really phenomenal idea, though, and hopefully we'll see it improve in future versions of Premiere Pro.

