Join Jeff Sengstack for an in-depth discussion in this video, Turning spoken dialogue into searchable metadata, part of Soundbooth CS5 Essential Training.
Soundbooth has a speech recognition module. It can analyze spoken words and convert them with reasonable accuracy into text. It can also differentiate between individuals to associate dialog with each person doing the speaking. This is a fast way to create a transcript, to create text that people can use inside Closed Captioning software or for subtitles. So, let me show you how that works. We have a dialog here with the Declaration of Independence in it; that's just one person speaking, me. We are going to have Soundbooth analyze this and try to convert the spoken word into text.
To do that, we need to go to what's called the Metadata panel, and that's not visible here. It's not visible in the default view. So, you go Window, Metadata. Now, metadata is information that's stored inside the file as text, and typically you don't see it unless you look at the metadata using a panel like this, and it does not interfere with the audio. So, here is an audio file, but inside of this file, there's a little sort of capsule of metadata. And right now, there is very little metadata associated with this because we have not added any. So, what I want to do now is analyze this audio.
In the Metadata panel, there is this one thing, it's Speech Analysis, and at the bottom there is a little button. Whenever you see a button or menu command that has ... after it, just for information, that means another menu is going to open if I click on it, and there is a little dialog box. It asks what the options are: which language are you going to use? Well actually, there are three different English-language versions that it does searching on. We will take the English - U.S. version as opposed to the UK or Canada version. It asks for a reference script. If there was one, it would show up here. This is something that you would create inside Premiere Pro.
And then it says, what kind of quality do you want to search on? High quality, which takes longer, or medium quality, which is faster. Well, we will go with high quality because we want to try to get the best job done here the first time through. There is a little option here to identify speakers. Since there is only one person speaking, we don't need to check that box. If there was more than one person speaking, you would check that so it will try to decide that this is person one, and this is person two, based upon the analysis of how their waveforms fit. Now I will click OK. When we do this, the analysis will begin. Typically, the analysis takes about as long as the clip is, and if you know this little segment you are going to say that doesn't look quite right.
I am going to play this, and as it plays you will notice that each word will be highlighted as I go through here. So, you actually can search on words; Happiness would be over there, for example. Well, let's go to the beginning and play and see how those words compare to the actual words, and you will see that they don't necessarily completely fit. (Audio playing.) It did pretty well, but it missed a few. A little bit of irony here: "that they are endowed" came out as "that they aren't owned" by their creator. Sometimes funny things are stated inadvertently.
But what you can do now is you can fix this, and the thing is you can't, let's say, copy all of this into a text editing program, fix it, and then paste it back. That's because each word here is connected to a time inside this file. So, copying and pasting an entire file back in here won't work. We need to stick with what we have here. So, you have to edit literally one word at a time. So, I am going to show you how that works. You click a word, and it turns blue. You click again, and it becomes editable. There is one, We hold, and I could keep on going like this, but what I want it to say is We hold these truths to be self-evident.
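To picture why the words have to be fixed one at a time, here is a minimal Python sketch of words that each carry their own time. It is not Soundbooth's actual storage format; the `TimedWord` class, the `rename_word` helper, and the timings are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class TimedWord:
    text: str        # what the recognizer heard
    start: float     # seconds from the start of the clip
    duration: float  # seconds


def rename_word(words, index, new_text):
    """Replace one word's text in place; its timing is left alone."""
    words[index].text = new_text


# Made-up timings for illustration only.
words = [
    TimedWord("truths", 2.10, 0.45),
    TimedWord("solved", 2.90, 0.40),   # misheard "self"
    TimedWord("evident", 3.30, 0.55),
]

rename_word(words, 1, "self")
print([(w.text, w.start) for w in words])
```

Pasting in a block of plain text would throw away the `start` and `duration` values, which is why the fix is applied to one entry at a time.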
So, I am missing a word here, to be. Actually, I guess I can turn solved to self, self-evident. Hold these truths, and I want to fix these other guys. I want to show you one thing: you can add words ahead of or after one. So, and the pursuit here is missing a word. We need the word the, so I click on that word and then right-click, say Insert Word Before. I can do Before, After, Delete or Merge, but we will do Insert Word Before, and the pursuit, and the pursuit of happiness.
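The right-click commands (Insert Word Before, Delete, Merge) can be thought of as small list operations on timed words. The Python sketch below is only an illustration under stated assumptions: the names are hypothetical, the timings are invented, and how Soundbooth really assigns a time to an inserted or merged word is not covered here, so the choices in the comments are guesses.

```python
from dataclasses import dataclass


@dataclass
class TimedWord:
    text: str
    start: float  # seconds
    end: float    # seconds


def insert_before(words, index, text):
    """Insert a new word ahead of words[index].

    Assumption: the new word gets a zero-length slot at the
    start time of the word it is inserted before.
    """
    anchor = words[index]
    words.insert(index, TimedWord(text, anchor.start, anchor.start))


def delete_word(words, index):
    """Remove one word; the others keep their times."""
    del words[index]


def merge_with_next(words, index):
    """Join a word with the one after it.

    Assumption: the merged word spans both original time ranges.
    """
    first, second = words[index], words[index + 1]
    merged = TimedWord(f"{first.text} {second.text}", first.start, second.end)
    words[index:index + 2] = [merged]


# Invented timings for illustration only.
phrase = [
    TimedWord("and", 10.0, 10.2),
    TimedWord("pursuit", 10.3, 10.8),
    TimedWord("of", 10.8, 10.9),
    TimedWord("happiness", 10.9, 11.6),
]
insert_before(phrase, 1, "the")
print(" ".join(w.text for w in phrase))
```

Each operation touches only the clicked word and its neighbor, so the rest of the transcript stays lined up with the audio.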
So, let me edit the rest of these guys so you can see the finished product when I am done. So, now I am almost done. I want to show you one more thing. There is with certain unalienable, and it's one word that it made into three. Right-click on this one and just delete that one, right-click on this one, delete that one, and then replace legal with unalienable. And now we are done: among these are Life, Liberty and the pursuit of Happiness. Now that we are done, we can always jump right to a word. Or if we have a very long dialog or narration, which is something that you want to do the text analysis on first, go get a coffee while it's doing the analysis, come back, and then you want to be able to jump right to a word or phrase, just type in the word you are looking for, like happiness, and it will search the entire set of metadata to find any instances of that word, and in this case, it jumps right to happiness.
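The search described here amounts to looking up a typed word in the list of timed words and jumping to its time. A minimal Python sketch, assuming each word is stored as a (text, start time) pair; the `find_word` function and the timings are hypothetical, not Soundbooth's own.

```python
import string


def find_word(words, query):
    """Return the start time of every word that matches the query.

    Matching ignores case and any punctuation attached to the word.
    """
    target = query.strip().lower()
    hits = []
    for text, start in words:
        cleaned = text.strip(string.punctuation).lower()
        if cleaned == target:
            hits.append(start)
    return hits


# (text, start time in seconds); timings are invented for illustration.
transcript = [
    ("Life,", 8.0),
    ("Liberty", 8.6),
    ("and", 9.2),
    ("the", 9.4),
    ("pursuit", 9.5),
    ("of", 10.0),
    ("Happiness.", 10.2),
]

print(find_word(transcript, "happiness"))
```

On a long interview the same lookup would return several times, one for each place the word was spoken.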
If I click on it, I will click play. I loop it. It goes over and over again. Enough of that. So you can see how advantageous it can be to do speech analysis, because if you have a very long dialog or a long interview or something, you really don't want to sit down there and transcribe it or remind yourself what the person said. You just know that maybe 15 minutes in or so the person had that pithy sound bite that you want to use, as well as one at, say, 30 seconds and one at 30 minutes.
You just do the whole speech analysis creation, and then later on you can go find that word that jumped out at you to help you track that segment down right away so you can edit it. Let me just tell you one other thing you can do. You can right-click on any one of these words and say Copy All, and then when you Copy All you can go to another program, let's say like Word, and then Paste, which would be Paste or Ctrl+V or Command+V, and you can put in all the text, and this could be used, let's say, for Closed Captioning or for subtitles.
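As a rough picture of the Copy All step and the subtitle use mentioned here, the Python sketch below joins the words into plain text and also groups them into cues in the SubRip (SRT) subtitle layout. This is an assumption about one way the text and times could be used, not something Soundbooth does for you; the function names and timings are hypothetical.

```python
def srt_time(ms):
    """Format milliseconds as an SRT timestamp, HH:MM:SS,mmm."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"{hours:02}:{minutes:02}:{seconds:02},{millis:03}"


def copy_all(words):
    """Plain text only, like pasting the transcript into a word processor."""
    return " ".join(text for text, _, _ in words)


def to_srt(words, words_per_cue=4):
    """Group words into fixed-size cues and render them as SRT text."""
    cues = []
    for number, i in enumerate(range(0, len(words), words_per_cue), start=1):
        chunk = words[i:i + words_per_cue]
        start, end = chunk[0][1], chunk[-1][2]
        text = " ".join(word[0] for word in chunk)
        cues.append(f"{number}\n{srt_time(start)} --> {srt_time(end)}\n{text}\n")
    return "\n".join(cues)


# (text, start ms, end ms); timings are invented for illustration.
words = [
    ("We", 0, 200), ("hold", 200, 550), ("these", 550, 800),
    ("truths", 800, 1300), ("to", 1300, 1450), ("be", 1450, 1600),
    ("self-evident", 1600, 2500),
]

print(copy_all(words))
print(to_srt(words))
```

Splitting every four words is an arbitrary choice to keep the sketch short; real captions would usually break on phrases or pauses.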
Then we go back to Soundbooth, I just have that open. So, that's basically how you can do speech analysis, plus you can see, I think, the advantages of speech analysis, knowing that it isn't perfect.
- Setting up recording hardware
- Recording vocals and instruments
- Viewing audio waveforms and spectral frequency displays
- Copying, cutting and pasting audio
- Stretching time and shifting pitch
- Looping tracks
- Identifying and removing noise
- Enhancing audio with Soundbooth effects
- Mixing audio in multitrack mode
- Customizing prebuilt scores
- Working with Soundbooth files in Premiere Pro projects