Something that I need to do a lot of the time with PDFs is to get text out of them to use elsewhere, maybe to email somebody what the manual said, or a client sends me a PDF of a previous issue and says, "We don't know where the original file is, but can you grab text out of this PDF? We need to pick up an article for the project that you are working on." Sure, no problem. So whether it is just a few sentences or a caption or all the text in a PDF, whether it's small or large, there are different ways that you can grab text out of here, and you have to remember that once text is part of a PDF, it sometimes morphs into different shapes. Sometimes it's not even text any longer because it got rasterized, but I do have a few tips to share with you that I think will help you in successfully extracting a lot of different kinds of text.
So I have opened up a typical document, this is the newsletter that I have been working with for various videos in this title and I have jumped to page 2. Let's say that I just need to grab this first paragraph worth of text and paste it elsewhere. So the tool that you want to use to select text and copy it, as long as you don't want to edit it, is you just use the plain old Select tool next to Mr. Spanky up over here and then you drag and select it. Now hopefully as you drag, it will select the text and then you choose Copy.
Sometimes when you drag, it selects something way over here or way down here and then you know that you are sunk. But in this case, we just select the text. Now, I'm going to paste it into Word and show you my first gripe with this, which is that usually every single line ends with a paragraph return, and so you are forever doing this, space, and then click and then this, space, and so on. Now you may be very good at find/change and be able to quickly get rid of those end-of-line returns and replace them with a space, but why bother? You know what I do? I love this little trick. You can clean it up right in Acrobat before you make your copy. You do that by going to the Advanced menu > Accessibility and choosing TouchUp Reading Order. The Reading Order and all of these commands have to do with making PDFs accessible to people who are using screen readers or some other kind of assistive devices to access the content therein. But one very useful part of TouchUp Reading Order is this little trick.
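For text that has already been pasted with a return at the end of every line, the find/change cleanup mentioned above can be sketched in a few lines of Python. This is only an illustration and not something Acrobat or Word provides: the function name `unwrap_lines` is made up for this example, and it assumes that real paragraph breaks show up as blank lines in the pasted text.

```python
import re


def unwrap_lines(text: str) -> str:
    """Join hard-wrapped lines into paragraphs.

    Assumption: a blank line marks a real paragraph break; every other
    line ending is an unwanted end-of-line return.
    """
    # Normalize Windows and old Mac line endings to plain newlines.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Split on blank lines so genuine paragraph breaks survive.
    paragraphs = re.split(r"\n\s*\n", text.strip())
    # Inside each paragraph, collapse every run of whitespace
    # (including the end-of-line returns) into a single space.
    return "\n\n".join(" ".join(p.split()) for p in paragraphs)


if __name__ == "__main__":
    pasted = "Every line of this\npasted text ends\nwith a return.\n\nSecond paragraph\nhere."
    print(unwrap_lines(pasted))
```

If the pasted text has no blank lines between paragraphs, everything is joined into one paragraph, so you may still have to put the occasional return back in by hand.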
Once you open up this TouchUp Reading Order dialog box, your cursor changes to a crosshair. All you need to do is drag the crosshair in a rectangle around the text that you are trying to copy. Now notice how it sort of outlines each one of these things with a little tiny text frame. This is often what happens to text in a PDF: it gets broken up. Once this is all selected, come over here to TouchUp Reading Order and say, hey, this is all text. Click the Text button and it turns into one large box, so that's all. Click Close and now let's try and select that text again.
I select the text, copy it, we will jump back over to Word and paste and it comes in nice and clean. It's picked up some formatting from something there but to me the most important thing is that I don't have to deal with all of the returns. So let's get rid of this and try something else. I found another bad boy further down. Let's say this article. Let's say that we want to grab all this text. Watch what happens as I start to drag my Selection tool across the top. Why did it select this thing below this bar? How do you fix this? Again, with the Reading Order tool.
Let's see what would happen if we didn't have the Reading Order tool. Let me see if I can grab it again. Here is what we want. Let's just try that, see what we got. Edit > Copy. Switch to Word, Paste. 'Hello C3ers, we will offer for the latest information.' All right, so it skipped these two things for some reason, I'm not quite sure. We want to grab all this text and create a new Word document, so that we are not picking up any formatting by mistake. And this time, let's again go to Advanced > Accessibility. We are going to use the same exact thing that we just did. I'm going to move this guy out of the way a bit.
What we want to do is drag a rectangle around the contents that we want and say this is a text frame. So that's text and this is text, and sometimes what will happen is that you say this is text and it takes over everything else. Don't worry about it. You can overlap these things. So I'm just going to start right here, come up to this part and say that is text too. When you are done, close the TouchUp Reading Order dialog box. All right. So now we switch back to our Select tool and we will go ahead and select all of this content and Copy, switch back to Word, Paste, and it comes in complete without any end-of-line returns. It has got a little-- looks like Word's interpreting that as a section break. There are our dates and the final bit after that last horizontal line.
Now, one casualty is that sometimes you will lose returns, so you do have to come in here and enter the occasional return, but I find that less onerous than having to go and delete the returns at the end of every single line. So it's that little feature, the TouchUp Reading Order dialog box. It's under Advanced > Accessibility. That can really save you a whole lot of tedious cleanup time. Now what if you wanted to export all the text here? All right, you could click somewhere in this document and then go to Edit > Select All and then Copy and Paste, but your results are going to be not too exciting. By the way, I don't know if you remember, but I had an earlier video where I mentioned that if you choose Select All with the Selection tool, it will get all of the text in the entire document as long as you are viewing the document as continuous pages.
If you are viewing the document as single pages, meaning that you get this little guy happening on the right as you drag down, and you click inside of a page and you choose Select All, it only selects all the text on that one single page, so keep that in mind if it's not working how you think it should be working. I am going to put it back to Continuous. But regardless of whether you are looking at continuous or single pages, the best way to get all the text out is to let Acrobat do the heavy lifting. Go to the File menu, go to Export and choose one of these guys. I have had the most success with Rich Text Format. It does a better job in my eyes than Word, but if it was a PDF created from Word, then the way to go is to export it right back to Word, and most likely it will look exactly like it did when you first exported it. It would be like having an Export to InDesign command and having it, boom, resurrected in InDesign.
That would be so cool. But we don't have that. So if you want to get the text out, you want to choose Rich Text Format. Now, you do have some settings available to you. I'm going to switch over to the Desktop and notice that it says RTF, for RTF file. That's what it's going to save it as, but I want to show you Settings. Here you can say I just want to get all the text. I really don't care about the page layout. If you say Retain Page Layout, it's going to put in tabs and spaces to try to get stuff aligned on the right and try to make two columns. That is normally not what you want. I usually don't want to include any comments that are in there, and then under Images, I usually do not want to Include Images in the export format, so I turn those off and then I just click OK, and then you say Save and it does all the pages, and let's see what that looks like in Word. So in Word, I'm going to open up that RTF file and the first page. Wow, that's exciting, isn't it? But we know that there is something in here because look down here. There are actually 10 pages in this document, and there are probably strange characters forcing a break in Word, like some sort of Page Break or so.
You see that the text came over without the returns. Looks like it did a pretty good job. Even-- I believe that's a table. There are also a number of plug-ins, and there are probably scripts around that you can drop into Acrobat, that offer a little bit more feature-wise as far as exporting your text out to RTF, and you might want to keep an eye out for those. The one last thing I want to mention about this is that you can export multiple files. If it is something that you do all the time, or you have a backlog of 5000 documents that you need to export all the text out from, you can choose Export Multiple Files. Let's add the open files, this one here, and let's say we had about 30 more of them. And then when you say OK, you get some Output Options about how you want them exported: should they overwrite existing files, or do you want them to be automatically renamed. They do not concatenate into a single file. Each PDF gets exported to its own RTF file, or whichever format you choose down here. So you can export them all to Rich Text Format, as we have been talking about, or to any of these other formats.
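To make the one-file-per-PDF behavior and the overwrite-or-rename choice concrete, here is a minimal sketch in Python. It does not call Acrobat at all; it only plans output file names. The function name `plan_exports`, its parameters, and the `_1`, `_2` renaming scheme are assumptions made for this illustration, not Acrobat's actual naming rule.

```python
from pathlib import Path


def plan_exports(pdf_paths, out_dir, existing_names=(), overwrite=False, ext=".rtf"):
    """Give every PDF its own output file; nothing is merged into one file.

    existing_names: file names already present in out_dir (hypothetical input).
    overwrite: if False, pick a new name instead of replacing an existing file.
    """
    existing = set(existing_names)
    planned = set()  # names already handed out during this batch
    plan = {}
    for pdf in pdf_paths:
        stem = Path(pdf).stem
        name = stem + ext
        n = 1
        # Rename when the name was already used in this batch, or when it
        # exists on disk and overwriting is turned off.
        while name in planned or (name in existing and not overwrite):
            name = f"{stem}_{n}{ext}"  # assumed suffix scheme, for illustration
            n += 1
        planned.add(name)
        plan[pdf] = Path(out_dir, name).as_posix()
    return plan


if __name__ == "__main__":
    batch = ["newsletter.pdf", "manual.pdf"]
    for src, dst in plan_exports(batch, "out", existing_names={"newsletter.rtf"}).items():
        print(src, "->", dst)
```

With overwriting turned off, a PDF whose output name is already taken gets a new name; with it turned on, the existing file would simply be replaced.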
So, the Export commands are pretty powerful in Acrobat and I encourage you to use them. I mean, why bother doing it manually when Acrobat has it built right in?