
Text file helpers

From: Unix for Mac OS X Users


In this movie we will take a look at some Unix commands that can help us when working with text files. Now when I say text file, I am really talking about a file that has nothing in it but text. That's very different from a Microsoft Word document, which contains both formatting information and text. These commands would not help us in those cases. For example, if we tried to do a word count of a Microsoft Word document, it would try to count all of the formatting information as well as the actual text of the document. So in all these cases we are really talking about working with pure text files. The three that we are going to be looking at are wc, which is short for word count; sort, for sorting lines; and uniq, which is pronounced "unique" but has no "ue" on the end of it, and that's for filtering repeated lines in or out.

Let's take a look at all three and see how they work. So in Terminal, notice that I am already inside my user directory and I'm inside my unix_files directory. I can do ls -la there and we will see that I've added a new file here called fruit.txt. Let's take a look at the contents of that file. It's a very simple file. It just contains a list of fruit. Notice a couple of things about it though. Notice that the lines are not sorted. Notice that there are repeats in there, and notice in particular that strawberry is repeated immediately, one after the other. So you can create your own. You can pause the movie and copy this list down if you want. It's also included in the Exercise Files.

The important thing is to make sure that it's not sorted, that it contains duplicates, and that at least one of those duplicates immediately follows the line it repeats. So let's try this out now. Let's look at our first one. We have wc. That's for word count, and all we do is give it the file name that we want. wc fruit.txt. It comes up and it gives me three numbers followed by the name of the file. The first number is the number of lines in the file. The second number is the number of words in the file, and a word is a set of characters with whitespace on either side of it. That's what it considers a word. And then the third value is the number of characters in the file (strictly speaking, bytes), and that count includes spaces and line returns, not just letters.

So again, if we take a look at the file, cat fruit.txt, you can see that there are 13 lines, each line has one word, so there are also 13 words, and if we add it all up, there are 99 characters in that file. Let's try it with something a little more useful. Remember we have this lorem ipsum text, which is just fake Latin text, just a long document. Let's do wc lorem_ipsum.txt. You will see that it comes up and it tells me that that file has 523 lines, 5,289 words, and 36,232 characters.
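As a minimal sketch of the three numbers wc reports, the commands below build a small stand-in file. The name fruit_sample.txt and its contents are assumptions made for illustration; the course's own fruit.txt is longer. The -l, -w, and -c options, which are not shown in the movie, ask wc for one count at a time.

```shell
# Hypothetical sample file; it is NOT the course's fruit.txt.
printf 'banana\napple\nstrawberry\nstrawberry\npear\napple\n' > fruit_sample.txt

# Default output: lines, words, bytes, then the file name.
wc fruit_sample.txt

# One count at a time: -l for lines, -w for words, -c for bytes.
wc -l fruit_sample.txt
wc -w fruit_sample.txt
wc -c fruit_sample.txt
```

For this sample the counts are 6 lines, 6 words, and 46 bytes, because each line return counts as one byte on top of the letters.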

Now I just want to point out something here. If we do head lorem_ipsum.txt, we can take a peek at the beginning. Each one of these lines actually has a line return after it. I did that on purpose when I created this file. Now, with a lot of paragraphs, that would not be true, right? They would not be wrapped like this; the text would just keep going with no line return until the end of the paragraph. So every paragraph would be considered one line. So keep that in mind: when we are talking about lines, we're not necessarily talking about the lines that you see on your screen; we are talking about everything up until the next line return.
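A small sketch of that distinction, using two made-up files that hold the same five words. The file names and text are assumptions for illustration only. One file has a line return in the middle, the other runs on as a single paragraph.

```shell
# Same words, different line returns (hypothetical files).
printf 'lorem ipsum dolor\nsit amet\n' > wrapped_sample.txt
printf 'lorem ipsum dolor sit amet\n' > paragraph_sample.txt

wc -l wrapped_sample.txt     # reports 2 lines
wc -l paragraph_sample.txt   # reports 1 line

# The word count is the same for both files.
wc -w wrapped_sample.txt paragraph_sample.txt
```

However wide the Terminal window is, the line count only changes when the file itself contains another line return.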

So that's really all there is to doing word count. It's a very simple utility that can be very useful. Let's try out the next one, which is sort, and all we do for sort is say sort the lines in a file. sort fruit.txt. So now it gives us that same list back, but it's been sorted. Now notice it did not actually change the file itself. If I do cat fruit.txt, the original file is untouched. What it did was just take that file, sort it, and output the results to me on the screen. Now I could copy and paste it or, as we will see in the next chapter, I could then send that to a file.

We will see how to do that. I want to make sure you realize that it's not actually sorting the contents of the text file itself. All right! Let's try that with our lorem_ipsum. Let's try sort lorem_ipsum.txt. Again, each one of these has a line return after it, so each one is considered a line. If they were paragraphs, then this would not behave the same way. Notice that we are seeing all of the Vs down here at the bottom. I just want to scroll up a bit through the alphabet, so we get up here to the As. Here we have the As. Now, notice this. We have lowercase a, and then right above it we start over with capital V. All of the capitalized lines were sorted ahead of the lowercase ones. Capital letters and lowercase letters are treated differently by sort by default.

Now we can pass in the -f option and we can change that. So let's do -f. I will clear the screen, just so we have everything gone from before. And now look. You see that the uppercase and lowercase lines are mixed together. That's the -f option. There is also a reverse sort. Let's take out the -f option and do -r. That's for reverse. Now we see a lot of blank lines here. If we scroll up, you'll see the blank lines count as lines. They get sorted before the As. So there are the As, and we keep going up. Here are the Cs, and the Ds, and so on. So that's reverse sort, and it also helps you to see that these blank lines are still included as lines. And then one last thing: let's reverse sort fruit.txt, just so you can see that it reverse sorts those.
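Here is a short sketch of plain sort next to the -f and -r options on a made-up file with mixed capitalization. The file name and contents are assumptions for illustration. Setting LC_ALL=C is also an assumption added here: it forces plain byte ordering so the capitals-first behavior described above shows up the same way on any system, since the default ordering can depend on locale settings.

```shell
# Hypothetical sample file with mixed capitalization (not from the course).
printf 'banana\nApple\ncherry\napple\nBanana\n' > mixed_sample.txt

# LC_ALL=C is an assumption added so the byte-order result is reproducible.
LC_ALL=C sort mixed_sample.txt      # Apple, Banana, apple, banana, cherry
LC_ALL=C sort -f mixed_sample.txt   # case ignored: both apples, both bananas, cherry
LC_ALL=C sort -r mixed_sample.txt   # cherry, banana, apple, Banana, Apple

# The file itself is unchanged; sort only writes its result to the screen.
cat mixed_sample.txt
```

The final cat still shows banana on the first line, which confirms that none of the sort commands rewrote the file.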

That's nice and easy to see. And let's do sort -u, and that's the option for sorted and unique. So we get rid of all those duplicates at the same time. Sorted and unique. Now each entry occurs just once. Notice what it did there. It sorted them, and it made sure we only had each one, one time. So sort has its own built-in unique option. There are a couple of other options. You can look at the man pages to see what those are, but sort is a really, really useful tool. It actually works well with other commands. We'll see that later. We'll see how to use two commands together. sort is really useful for that.
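A minimal sketch of sort -u on a small stand-in list. The file name and contents are assumptions for illustration, not the course's fruit.txt. Both apple and strawberry appear twice in the input.

```shell
# Hypothetical sample file with duplicates, some adjacent and some not.
printf 'banana\napple\nstrawberry\nstrawberry\npear\napple\n' > fruit_sample.txt

# Sort the lines and keep only one copy of each.
sort -u fruit_sample.txt
# prints: apple, banana, pear, strawberry (one per line)
```

Because the lines are sorted first, it does not matter whether the duplicates started out next to each other.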

Another one that works well with other commands is uniq. uniq allows you to filter repeated lines in the file in or out. The default is to de-dupe the repeated lines. Let me show you by using it on our fruit file, where it's easy to see. uniq fruit.txt. Notice that apple is still in there twice. It did not de-dupe that, but strawberry is only in there once, because the second strawberry came immediately after the first. So it does not sort them and de-dupe them. It only goes down the line and says, ah, is this line the same as the next line? If it is, reduce it down to one; de-dupe it until we only have one line.

But it does not jump ahead and look around at other lines. We would have to sort the file if we wanted that behavior. But one of the nice features of uniq is that it has a couple of other options we can pass in. uniq -d fruit.txt will return the lines that are repeated. So it tells me, oh, there was one line that was repeated in here. It was strawberry. That's a nice handy tool. There is also uniq -u, and this just shows me the unduplicated lines. That's why we have the u. So now I am seeing everything but strawberry; strawberry has been completely taken out.

Not just de-duped, but actually removed. So those are complements of each other. Between uniq -d and uniq -u you get the same thing that you would get from the total list from uniq. Having no option shows you the entire list, both duplicated and non-duplicated lines, but with no duplicates immediately following each other. So again, if your goal is really to completely de-dupe it, then you either need to use sort with the unique option or you will need to use uniq in combination with other commands, which we will learn how to do a little bit later.
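The three uniq behaviors can be sketched side by side on a small stand-in list. The file name and contents are assumptions for illustration, not the course's fruit.txt. In this list strawberry repeats on adjacent lines, while the two apples are separated. The last command uses a pipe, which the course only covers later, so treat it as a preview of one way to combine sort and uniq.

```shell
# Hypothetical sample file: strawberry repeats back to back, apple does not.
printf 'banana\napple\nstrawberry\nstrawberry\npear\napple\n' > fruit_sample.txt

uniq fruit_sample.txt      # banana, apple, strawberry, pear, apple
uniq -d fruit_sample.txt   # strawberry (only the adjacent repeat)
uniq -u fruit_sample.txt   # banana, apple, pear, apple (strawberry dropped entirely)

# Preview: sorting first puts every duplicate side by side, so uniq removes them all.
sort fruit_sample.txt | uniq   # apple, banana, pear, strawberry
```

On this sample, the piped form gives the same list as sort -u.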

So all three of these are really simple tools, but they can be very powerful when you're working with lots of data in text files.

Kevin Skoglund
Author

 