Unix for Mac OS X Users
Illustration by John Hersey

Text file helpers


From: Unix for Mac OS X Users, with Kevin Skoglund

Online video course: Unix for Mac OS X Users
6h 35m, Beginner, Apr 29, 2011

Unix for Mac OS X Users unlocks the powerful capabilities of Unix that underlie Mac OS X, teaching how to use command-line syntax to perform common tasks such as file management, data entry, and text manipulation. The course teaches Unix from the ground up, starting with the basics of the command line and graduating to powerful, advanced tools like grep, sed, and xargs. The course shows how to enter commands in Terminal to create, move, copy, and delete files and folders; change file ownership and permissions; view and stop command and application processes; find and edit data within files; and use command-line shortcuts to speed up workflow. Exercise files accompany the course.

Topics include:
  • Moving around the file system
  • Creating and reading files
  • Copying, moving, renaming, and deleting files and directories
  • Creating hard links and symbolic links
  • Understanding user identity, file ownership, and sudo
  • Setting file permissions with alpha and octal notation
  • Changing the PATH variable
  • Using the command history
  • Directing input and output
  • Configuring the Unix working environment
  • Searching and replacing using grep and regular expressions
  • Manipulating text with tr, sed, and cut
  • Integrating with the Finder, Spotlight, and AppleScript
Subject: IT
Software: Mac OS X Unix
Author: Kevin Skoglund

Text file helpers

In this movie, we will look at some Unix commands that help when working with text files. By text file, I mean a file that contains nothing but plain text. That's very different from a Microsoft Word document, which contains formatting information as well as text, and these commands won't help in that case. For example, if we ran a word count on a Word document, it would count all of the formatting information along with the actual text. So throughout this movie, we are talking about pure text files. The three commands we will look at are wc, which is short for word count; sort, for sorting lines; and uniq (pronounced "unique," but spelled without the "ue" at the end), for filtering repeated lines in or out.

Let's look at all three and see how they work. In Terminal, I'm already in my user directory, inside my unix_files directory. If I run ls -la, we'll see that I've added a new file here called fruit.txt. Let's take a look at the contents of that file. It's a very simple file that just contains a list of fruit. Notice a few things about it, though. The lines are not sorted, there are repeats, and, in particular, strawberry is repeated immediately, one line right after the other. You can pause the movie and copy this list down to create your own file if you want; it's also included in the Exercise Files.

The important thing is to make sure the list is not sorted, that it contains duplicates, and that at least one of those duplicates immediately follows another. Now let's try the commands out. The first one is wc, for word count, and all we do is give it the name of the file we want: wc fruit.txt. It returns three numbers followed by the name of the file. The first number is the number of lines in the file. The second is the number of words; wc considers a word to be any run of characters separated by whitespace (spaces, tabs, or line returns). The third value is the number of characters in the file. Strictly speaking, wc counts bytes, which for plain ASCII text is the same as the number of characters, and that count includes the invisible line return at the end of each line, not just the letters.
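Here is a minimal sketch of the wc behavior just described, assuming a POSIX shell such as the bash that runs in Terminal. sample_fruit.txt is a hypothetical stand-in for the course's fruit.txt; its six lines are made up for illustration, so the counts differ from those in the movie.

```shell
# Hypothetical stand-in for fruit.txt: unsorted, with duplicates,
# including one pair of identical adjacent lines (strawberry).
printf '%s\n' banana apple strawberry strawberry cherry apple > sample_fruit.txt

# No options: lines, words, and bytes, followed by the file name.
wc sample_fruit.txt

# One count at a time. Reading from stdin (<) leaves off the file name,
# and tr strips the leading spaces that some wc versions (e.g. macOS) add.
lines=$(wc -l < sample_fruit.txt | tr -d ' ')
words=$(wc -w < sample_fruit.txt | tr -d ' ')
bytes=$(wc -c < sample_fruit.txt | tr -d ' ')
echo "lines=$lines words=$words bytes=$bytes"   # lines=6 words=6 bytes=48
```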

So if we look at the file again with cat fruit.txt, you can see that there are 13 lines, and each line has one word, so there are also 13 words. Add up the letters plus one line return per line, and there are 99 characters in that file. Let's try it with something a little more useful. Remember the lorem ipsum text, which is just placeholder text in fake Latin, a long document. If we run wc lorem_ipsum.txt, it tells us that the file has 523 lines, 5,289 words, and 36,232 characters.
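The 99 in the fruit example only adds up once you count the line return at the end of each line. This small sketch, using made-up strings rather than the exercise files, shows wc counting that invisible character, and shows that wc -l really counts line returns.

```shell
# "kiwi" has four letters, but the trailing line return makes it 5 bytes.
kiwi_bytes=$(printf 'kiwi\n' | wc -c | tr -d ' ')
echo "$kiwi_bytes"        # 5

# wc -l counts line-return characters, so a final line that lacks one
# is not counted: this two-line text reports 1.
no_trailing=$(printf 'apple\nkiwi' | wc -l | tr -d ' ')
echo "$no_trailing"       # 1
```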

I want to point something out here. If we run head lorem_ipsum.txt, we can take a peek at the beginning of the file. Each of these lines has a line return at the end of it; I did that on purpose when I created this file. With a lot of text, that would not be true. Paragraphs usually aren't broken up with line returns like this; the text just keeps going, and only the display wraps it. In that case, each paragraph would count as one line. So keep in mind that when we talk about lines, we don't necessarily mean the lines you see on your screen; we mean the text up to each line return.
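To see the difference between lines on the screen and lines as wc counts them, this sketch feeds wc a single made-up paragraph, first as one unbroken line and then after fold has inserted a line return every 20 columns. The paragraph text is only an example.

```shell
para='Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor.'

# One line return at the very end: wc sees one line, however the
# terminal happens to wrap it on screen.
unbroken=$(printf '%s\n' "$para" | wc -l | tr -d ' ')

# fold -w 20 inserts a line return every 20 columns, so wc now sees several lines.
folded=$(printf '%s\n' "$para" | fold -w 20 | wc -l | tr -d ' ')

echo "unbroken=$unbroken folded=$folded"   # unbroken=1 folded=4
```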

That's really all there is to wc. It's a very simple utility, but it can be very useful. The next one is sort, which sorts the lines in a file. All we do is give it the file name: sort fruit.txt. It gives us the same list back, but sorted. Notice that it did not change the file itself. If I run cat fruit.txt again, the original file is untouched. sort just read the file, sorted the lines, and printed the results on the screen. I could copy and paste that output or, as we will see in the next chapter, send it to a file.
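A minimal sketch of the point above, that sort prints a sorted copy and leaves the file alone, using a hypothetical three-line file. Saving the result with > is the redirection covered in the next chapter; the comment on sort -o reflects the POSIX definition of that option.

```shell
printf '%s\n' banana apple cherry > unsorted.txt

# Prints apple, banana, cherry to the screen...
sort unsorted.txt
# ...but the file itself still reads banana, apple, cherry.
cat unsorted.txt

# To keep the result, send it to a different file.
sort unsorted.txt > sorted.txt

# Don't write "sort unsorted.txt > unsorted.txt": the shell empties the
# file before sort reads it. To sort a file in place, use -o instead:
#   sort -o unsorted.txt unsorted.txt
```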

For now, I just want to make sure you realize that sort does not change the contents of the text file itself. All right, let's try it with our lorem ipsum file: sort lorem_ipsum.txt. Again, each of these lines ends with a line return, so each one counts as a line; if the file contained unbroken paragraphs, sort would reorder whole paragraphs instead. Notice that the lines beginning with v are down here at the bottom. If I scroll up through the alphabet to the lines beginning with a, notice that right above the lowercase a lines, the list starts over with capital V. By default, sort treats capital and lowercase letters differently: every line that starts with a capital letter sorts ahead of every line that starts with a lowercase letter.
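The uppercase-before-lowercase ordering can be reproduced with a few made-up words. How sort compares letters depends on your locale settings, so this sketch sets LC_ALL=C to force plain byte (ASCII) order, where all capital letters come before all lowercase ones; with another locale, your default output may differ.

```shell
printf '%s\n' banana Cherry apple Vanilla > mixed_case.txt

# In the C locale, sort compares byte values: A-Z (65-90) before a-z (97-122).
LC_ALL=C sort mixed_case.txt
# Cherry
# Vanilla
# apple
# banana
```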

We can change that by passing in the -f option, which folds lowercase and uppercase together when comparing: sort -f lorem_ipsum.txt. I'll clear the screen first so everything from before is gone. Now look: the uppercase and lowercase lines are mixed together. That's what -f does. There is also a reverse sort. Let's take out the -f option and use -r, for reverse. Now we see a lot of blank lines at the bottom. Blank lines count as lines too; an empty line sorts before the As, so in a reverse sort the blank lines end up at the very end. If we scroll up, there are the As, and as we keep going up, the Cs, the Ds, and so on. So that's reverse sort, and it also shows that blank lines are included as lines. Finally, let's reverse sort the fruit file, sort -r fruit.txt, just so you can see it reverse sorts those too.
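Here are -f and -r on a hypothetical file that includes one blank line. LC_ALL=C is set only to make the output predictable across systems, since the default ordering depends on the locale.

```shell
printf '%s\n' banana Cherry '' apple Vanilla > mixed_with_blank.txt

# -f folds lowercase to uppercase for the comparison, so case no longer
# splits the list. The blank line sorts first.
LC_ALL=C sort -f mixed_with_blank.txt
# (blank line)
# apple
# banana
# Cherry
# Vanilla

# -r reverses the default order, so the blank line moves to the very end.
LC_ALL=C sort -r mixed_with_blank.txt
# banana
# apple
# Vanilla
# Cherry
# (blank line)
```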

That's nice and easy to see. Now let's try sort -u fruit.txt. The -u option stands for unique, so sort sorts the lines and gets rid of the duplicates at the same time. Now each entry occurs just once: it sorted the list and made sure each fruit appears only one time. So sort has its own built-in uniq option. There are a few other options, which you can find in the man page, but sort is a really useful tool. It also works well with other commands; later, we'll see how to use two commands together, and sort is really useful for that.
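sort -u on a hypothetical stand-in for fruit.txt (the same made-up six lines used for wc). Note that it removes every duplicate, not just the adjacent strawberry pair, because sorting brings identical lines together first.

```shell
printf '%s\n' banana apple strawberry strawberry cherry apple > sample_fruit.txt

# -u: sort the lines and keep only one copy of each, adjacent or not.
sort -u sample_fruit.txt
# apple
# banana
# cherry
# strawberry
```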

Another command that works well with other commands is uniq. uniq lets you filter repeated lines in or out of a file. By default, it de-dupes repeated lines. Let me show you by running it on our fruit file, uniq fruit.txt, where it's easy to see. Notice that apple is still in there twice; uniq did not de-dupe it. But strawberry is only in there once, because the two strawberry lines were right next to each other. So uniq does not sort the lines and then de-dupe them. It just goes down the file and asks, is this line the same as the next one? If it is, it reduces that run down to a single line.
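A minimal sketch of uniq's default behavior on the hypothetical stand-in file. Because uniq only compares each line with the one right before it, the adjacent strawberry pair collapses while the two separated apple lines both survive.

```shell
printf '%s\n' banana apple strawberry strawberry cherry apple > sample_fruit.txt

# uniq compares each line only with the line just before it.
uniq sample_fruit.txt
# banana
# apple
# strawberry   <- the adjacent pair collapsed to one line
# cherry
# apple        <- the non-adjacent repeat survives
```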

uniq does not look ahead or elsewhere in the file for other matches; we would have to sort the file first if we wanted that behavior. One of the nice features of uniq is that it has a couple of other options we can pass in. uniq -d fruit.txt returns only the lines that are repeated one right after another. Here it tells me there was one such line, strawberry. That's a handy tool. There is also uniq -u, which shows only the unduplicated lines; that's what the u stands for. Now I see everything except strawberry, which has been completely taken out.

It wasn't just de-duped; it was removed entirely. So the two options complement each other: together, uniq -d and uniq -u give you the same lines as plain uniq. With no option, uniq shows the entire list, both duplicated and unduplicated lines, but with no duplicates immediately following each other. So again, if your goal is to completely de-dupe a file, you need to either use sort with its -u option or use uniq in combination with other commands, which we will learn how to do a little later.
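The -d and -u options, their complementary relationship, and the full de-dupe can all be checked on the same hypothetical stand-in file. The last command uses a pipe (|) to send sort's output into uniq, the technique for combining commands that the course covers later; combined.txt and plain.txt are scratch file names chosen for this sketch.

```shell
printf '%s\n' banana apple strawberry strawberry cherry apple > sample_fruit.txt

# -d: one copy of each line that repeats immediately after itself.
uniq -d sample_fruit.txt      # strawberry

# -u: only the lines that do NOT repeat immediately after themselves.
uniq -u sample_fruit.txt      # banana, apple, cherry, apple

# Complementary: together, -d and -u produce the same lines as plain uniq.
# Both lists are sorted here only so they can be compared line by line.
{ uniq -d sample_fruit.txt; uniq -u sample_fruit.txt; } | sort > combined.txt
uniq sample_fruit.txt | sort > plain.txt
cmp -s combined.txt plain.txt && echo "same lines"

# Full de-dupe: sort first so every duplicate becomes adjacent, then uniq.
# Prints apple, banana, cherry, strawberry, the same as sort -u.
sort sample_fruit.txt | uniq
```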

These are three really simple tools, but they can be very powerful when you're working with lots of data in text files.

Frequently asked questions about Unix for Mac OS X Users


Q: The exercise files for the following movies appear to be broken:
07_02_files
07_03_files
07_04_files
07_05_files
08_03_files

Is there something wrong with them?
A: These exercises include one or more "dot files," whose file names start with a period. The Finder normally hides these files, so the period has been removed from their file names to make them show up. Additionally, "_example" has been added to the end of each file name to make it clear that the file will not work as-is. To make a dot file usable, either:

1) Open the file in a text editor to view its contents. Note that it may not be possible to double-click the file to open it because there is no file extension (such as .txt).
2) Resave the file under a new name (usually by choosing File > Save As), adding a "." to the beginning of the file name and removing "_example" from the end.

OR

1) Copy and rename the file from the Unix command line using the techniques discussed in this course. Rename the file by adding a "." to the start and removing "_example" from the end. Include the "-i" option to prevent overwriting an existing file unexpectedly.
Example: cp -i ~/Desktop/Exercise\ Files/Chapter_07/07_02_files/bashrc_example ~/.bashrc
 