Here's a page that I scanned from printed material, converted to a PDF, and ran Recognize Text on with the Searchable Image option. So we are seeing a view of the actual scan on top, but there is text behind it, so we can select things, search for words and find them, and so on. Are we good to go? Almost, but not quite. It may look good, but it might not be accurate, because it is just a computer after all, and it's making guesses about what these letters are.
There's no little human being saying, "Oh, this is the word 'plants'." It's just guessing at it. So if you want to make sure about the text that's behind these letters, you want to run through OCR Suspects. That's over here in the Recognize Text section of the Tools pane. So go to OCR Suspects and choose Find First Suspect. It has selected the item number of the Bonsai Tree down here, with its weird little selection preview, and in the Find Element dialog box, it's saying, "Here is what the scan looks like. Here's a close-up of it." And if you click inside here, you'll see what the text looks like without the image in front of it, in other words, as though you had chosen Clear Scan as the method for OCR.
Let's zoom in a bit with Command+Plus or Ctrl+Plus, so we can see this a little closer, because it looks like HP10-CP1 to me. But actually, do you notice that it is HP, maybe a lowercase L and then an O, and then there is a lowercase L and then an O over there? So if you were doing a search for HP10 in this catalog, you would not find this one as a match. So it's pretty good that Acrobat recognizes that it might have guessed wrong on this word. When you click in it, you see the actual text, and it gives you a chance to correct it, so this should actually be HP, then 1 and 0.
I am typing by hand here, and this is CP. I want to make sure that's a 1. Then you click Accept and Find, and go on to the next one. So it's saying that it's not sure about the word "Blue" with the quote mark. Again, you click in it to see the text that it's using. Oh yeah, it guessed right. It does read "Blue" with the quote mark. That's fine. Accept and Find, and so on. You go on through the entire document, clicking inside here and checking whether it's correct. In my experience, if you choose the Clear Scan method when you run OCR, Acrobat doesn't do a good job of detecting where the suspects are.
So to ensure that the text is as accurate as possible, I would recommend that whenever you run OCR, you choose the Searchable Image option. When you do so, Acrobat has something to compare against when you ask it to find possible suspects, that is, words that may not be quite accurate. It's a great help.
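To make the HP10 problem concrete, here is a minimal Python sketch of why a look-alike misread breaks an exact search. It is not how Acrobat finds suspects. The OCR string `HPlO-CPl` is my assumption about what was recognized on this page, and the names `LOOKALIKES`, `normalize`, `naive_search`, and `tolerant_search` are made up for the illustration.

```python
# Hypothetical illustration only; this is not Acrobat's suspect detection.
# Map characters that OCR commonly confuses with digits onto those digits.
LOOKALIKES = str.maketrans({"l": "1", "I": "1", "O": "0", "o": "0"})


def normalize(text):
    """Replace look-alike letters with the digits they resemble."""
    return text.translate(LOOKALIKES)


def naive_search(text, query):
    """Exact substring search, like a plain Find in the document."""
    return query in text


def tolerant_search(text, query):
    """Substring search after folding look-alikes on both sides."""
    return normalize(query) in normalize(text)


# Assumed OCR output: lowercase L and capital O where 1 and 0 were printed.
ocr_text = "Bonsai Tree HPlO-CPl"

print(naive_search(ocr_text, "HP10"))     # False: the exact search misses it
print(tolerant_search(ocr_text, "HP10"))  # True: folding look-alikes finds it
```

Folding look-alikes like this also changes ordinary words, so it is only a way to see the problem. Correcting the hidden text through OCR Suspects is still what makes the document itself searchable.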