Now, let's say that you are starting with a scan, just a regular scan saved as a JPEG or a TIF, and you want to convert it to a PDF and also convert the text in that scan to searchable text. Let's see the best settings to set up in Acrobat to make that happen. Let's take a look at the scans that we will be working with in this chapter. I have them open here in Photoshop. I scanned this one as a bitmap, and it should look familiar to you. It's just one page of that employee manual that we've been working with. Notice its size.
It's almost 2 MB. And then I have a page from the magazine, or catalog, whatever you would call it. Unfortunately, it got a little tilted while I was scanning it. This page is RGB, and it is 24 MB. The resolution for this one, in case you are wondering, is 300 pixels per inch, and the letter scan is 400 pixels per inch. But because the letter scan is a bitmap and not RGB, it's a smaller file. In either case, they are pretty big files. So I am going to close these up, and we will convert them to PDFs in Acrobat.
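To see why the higher-resolution bitmap is still so much smaller than the RGB scan, here is a rough back-of-the-envelope sketch in Python. It assumes both pages are US Letter size (8.5 x 11 inches), which the video does not state for the magazine page, and it counts only raw pixel data, ignoring file headers and any compression. The function name `uncompressed_bytes` is made up for this illustration.

```python
def uncompressed_bytes(width_in, height_in, ppi, bits_per_pixel):
    """Raw pixel-data size of a scan, ignoring headers and compression."""
    width_px = round(width_in * ppi)
    height_px = round(height_in * ppi)
    return width_px * height_px * bits_per_pixel // 8


# Assumption: both pages are 8.5 x 11 inches.
# Bitmap = 1 bit per pixel; RGB = 24 bits per pixel (8 per channel).
letter_bitmap = uncompressed_bytes(8.5, 11, 400, 1)
magazine_rgb = uncompressed_bytes(8.5, 11, 300, 24)

print(f"Bitmap at 400 ppi: {letter_bitmap:,} bytes")  # 1,870,000 bytes
print(f"RGB at 300 ppi:    {magazine_rgb:,} bytes")   # 25,245,000 bytes
```

Under those assumptions the numbers land close to the "almost 2 MB" and "24 MB" sizes shown in Photoshop: the RGB page stores 24 times as much data per pixel, which far outweighs the bitmap's extra resolution.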
So I switch to Acrobat. What we want to do is go down to Create > PDF from File, because these are already files, and locate the file. I'll start with magazinescan.tif. That's the color one. So I select it here in the Open dialog box, and I want to access the settings before we open it. This is, I think, a bug in Adobe Acrobat: unless you change Files of type (on a Mac it would say Format) to just the kind of format that you are looking for, the Settings button is inaccessible.
So switch it to TIFF. Or of course, if you're looking at JPEG scans, switch this to JPEG. Now you can click the Settings button, and let's look at the different optimization options. When we convert this TIF to a PDF, we can choose to also have it run OCR and optimize the scan, so that's what that check mark is for. I suggest you turn it on, because that will save you a bunch of steps down the line. Then go to Settings for that, and let's look at these settings here. So, Optimize Scanned PDF. First of all, it's going to apply adaptive compression, which is very intelligent compression that treats different parts of the page differently, depending on the content.
So in the Color/Grayscale parts of it, it will apply JPEG2000 compression. For Monochrome, it will apply this kind of compression, and it might lose some image data, but that usually doesn't make much difference in monochrome. You can choose a different kind of compression if you want. Or just use the slider to say, "These are very important historic documents. I want you to use high quality when you convert them to PDF." Or, "These are just some receipts that I am going to send in with my expense report.
You can make them small size." Then look at the Filters, and let's click Edit. Here are the default settings for the filters when you choose to optimize a scanned PDF. First of all, Deskew: deskew means that if you happen to have tilted the page and didn't get a perfectly straight scan, it will straighten it up, which is something you always want to turn on. Now, it's not that smart. It will only correct up to about 10 to 15 degrees off center, so if the page is rotated a lot, it's not going to work. However, remember that if you scan something, say, sideways or upside down, you can always rotate the page in 90-degree increments.
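The two-step idea here (rotate in 90-degree increments by hand, then let Deskew handle the small leftover tilt) can be sketched as simple arithmetic. This is only an illustration, not how Acrobat implements Deskew: the function name `plan_straightening` is hypothetical, and the default limit of 10 degrees is an assumption taken from the low end of the "about 10 to 15 degrees" mentioned above.

```python
def plan_straightening(tilt_degrees, deskew_limit=10.0):
    """Split a page tilt into a 90-degree rotation plus a leftover tilt.

    Returns (rotation, residual, deskew_ok):
      rotation  - nearest multiple of 90 degrees (0, 90, 180, or 270)
      residual  - tilt remaining after that rotation is accounted for
      deskew_ok - True if the residual is within the assumed deskew limit
    """
    tilt = tilt_degrees % 360
    nearest_quarter = round(tilt / 90)
    rotation = (nearest_quarter % 4) * 90
    residual = tilt - nearest_quarter * 90
    return rotation, residual, abs(residual) <= deskew_limit


# A slightly tilted page: no manual rotation needed, Deskew can handle it.
print(plan_straightening(3))    # (0, 3, True)
# Scanned sideways and slightly tilted: rotate 90 degrees first.
print(plan_straightening(93))   # (90, 3, True)
# Badly tilted: too far off for the assumed limit.
print(plan_straightening(45))   # (0, 45, False)
```

The point of the split is that a page scanned sideways or upside down is not a lost cause: only the part of the tilt that is left over after the 90-degree rotation has to fall within what Deskew can fix.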
So I will leave Deskew turned on to fix the slight tilting of the pages as I scan them in. Then Background Removal: if your pages have a lot of dirt and dust and scratches and stuff on them, or maybe there's some see-through from a color image on the other side of the page, you might want to turn that on. The default is for it to be off. And Descreen is on. Descreen is for when you take something that's been printed and has a halftone screen. Sometimes when you scan it, you get a really weird pattern known as a moiré pattern; this can descreen it as it scans it.
Sometimes this kind of degrades the image a little bit, so if your scan was not from printed material, you might want to turn this off. I will leave it on for now. And then Text Sharpening, so the text sharpening means, sometimes when you scan something the characters get like a little halo around them, so this will sharpen that up and remove that halo. Sometimes it goes a little overboard, and they don't look like letters anymore, which means that your recognized text isn't going to work that well. I am just going to leave that at the defaults, but these are what the filters are for. And if you make a scan using these filters and convert it to a PDF and it's not right, then come back and try again with some different settings here.
So those are the filters, and let's look at the OCR options. First of all, we definitely want it to Make Searchable, which means apply OCR. Adobe is trying to use more English-like language; not everyone understands what OCR means. So Make Searchable means that you can actually search for a word in the PDF, and it will find it. The primary language is English, and the PDF output style is Searchable Image. If you want to change either one of those, you just click the Edit button. So the primary language there is English. What that means is that it's expecting the text to be in English.
If the text is in a different language, definitely choose that language here. And then the PDF Output Style: should it be Searchable Image or ClearScan? These are two different kinds of output, and I am going to show that in more detail in the next video. We can just leave it at Searchable Image. Either one of these will make for a searchable PDF; one just gives truer results than the other. I will click Cancel to leave all these things as is, and then I can just click OK. And now it's retained all those settings for Scan Optimization and OCR.
We can leave all the Color Management settings as is, and now I am going to click OK, and it will convert that TIF file to a searchable PDF. So it went through its little engine, and the PDF opens up. It still looks somewhat like a scan, which is good. I mean, sometimes when you convert a scan to a PDF, you want it to look like the original. However, it is a PDF. Let's zoom out with Fit in Window. We can select text, so it actually has done the OCR. We could do, say, a search. I am going to press Ctrl+F and search for "plant," and it found "plants" here and "plants" over here.
We can continue searching if we wanted to, and let's look at the file size. If I go to File > Properties, the size of this PDF is 3.57K. Do you remember what it was in Photoshop? It was like 24 MB. But it looks just like it did. Didn't it? And it's straight this time, so it does a fantastic job of converting scanned documents into searchable PDFs.