Join Scott Simpson for an in-depth discussion in this video, "Working with text: cut and sort," part of Linux Tips Weekly.
- [Narrator] Two helpful tools for working with text on Linux are cut and sort. Cut lets us pick out a particular piece of information from a text string, and sort lets us reorder lines of text in particular ways. Let's take a look at cut first. If you're working with a file like a log, or something that has information presented in a predictable way, you can use cut to, well, cut out a predetermined piece of information and return it to you, or send it along to another program. Let's use cut to pull the names of users on our system out of the password file.
This might be something you could do as part of a periodic audit of accounts. Let's take a look at the /etc/passwd file again to remind ourselves of how it's formatted. The username is at the beginning of each line, followed by a colon. That colon acts as a delimiter between fields, and with cut, we can use the delimiter and the field position to pull out a particular field. All right: cut, dash d for delimiter with a colon, then dash f and one for field number one, and then the path to the file, /etc/passwd.
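The command just described can be sketched as follows. To keep the result predictable, this sketch runs against a small made-up file in the same colon-separated format rather than the real /etc/passwd; the accounts in it and the variable names `sample` and `users` are assumptions for illustration only.

```shell
# Hypothetical sample in /etc/passwd format (made-up accounts).
sample=$(mktemp)
cat > "$sample" <<'EOF'
root:x:0:0:root:/root:/bin/bash
alice:x:1000:1000:Alice:/home/alice:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
EOF

# -d: uses a colon as the delimiter; -f1 keeps only the first field (the username).
users=$(cut -d: -f1 "$sample")
echo "$users"

# Against the real file, the command from the video would be:
#   cut -d: -f1 /etc/passwd

# Clean up the temporary sample file.
rm -f "$sample"
```

Each line of output is one username, in the same order as the lines of the file.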
And there we go. We can do the same thing with, say, field three here, the user ID, or any other field that we'd want to use. While we could use awk or sed to grab this information, cut is specialized for this kind of thing, so on large inputs it'll usually be faster. It's not necessarily more or less correct to use one tool over the other for this kind of thing. It's just about efficiency and what you're familiar with. You can also cut output by byte position with the dash b option, or by character position with the dash c option.
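Here is a rough sketch of the options mentioned above: field three with cut, one possible awk equivalent, and the byte and character ranges. The sample file and the variable names are made up for illustration. Because the sample is plain ASCII, the byte and character ranges give the same result here.

```shell
# Hypothetical sample in /etc/passwd format (made-up accounts).
sample=$(mktemp)
cat > "$sample" <<'EOF'
root:x:0:0:root:/root:/bin/bash
alice:x:1000:1000:Alice:/home/alice:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
EOF

# Field 3 is the numeric user ID.
uids=$(cut -d: -f3 "$sample")
echo "$uids"

# One awk equivalent: -F: sets the field separator, $3 is the third field.
uids_awk=$(awk -F: '{ print $3 }' "$sample")
echo "$uids_awk"

# -b selects by byte position: bytes 1 through 10 of each line.
first_bytes=$(cut -b 1-10 "$sample")
echo "$first_bytes"

# -c selects by character position: characters 1 through 10 of each line.
first_chars=$(cut -c 1-10 "$sample")
echo "$first_chars"

# Clean up the temporary sample file.
rm -f "$sample"
```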
Let's get the first ten bytes of each line of the password file with cut dash b, one through ten, /etc/passwd. That works, but it's not too useful here. I can see how it would be helpful in some situations, though. There's more in the man page for cut if you want to explore that.
When you're working with lines of text, or other lists of items, sometimes you want them to be in a particular order. A list of file sizes in a folder, for example, or usernames pulled from a configuration file, might need to be sorted to be easier to use.
The sort command by itself takes a list of items and returns them sorted, alphabetically by default. I'll write sort here, and I'll give it a few things to work with, let's say, penguin, Linux, terminal, and command. When I'm done, I'll press Control-D, and then I'll see an alphabetically sorted list of the things that I wrote. Sort can be used in a piped command, or with a filename. A minute ago, we pulled the usernames out of the password file.
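A non-interactive way to try the same four words is to pipe them in with printf instead of typing them and pressing Control-D. One caveat, which goes slightly beyond what the video covers: how sort treats the capital L in Linux depends on the system locale. In many locales you get the dictionary order seen in the video, but this sketch pins the locale with `LC_ALL=C` so the result is predictable, and uses the `-f` option to ignore case. The variable names are made up for illustration.

```shell
# LC_ALL=C compares raw bytes, so capital letters sort before lowercase ones.
byte_order=$(printf 'penguin\nLinux\nterminal\ncommand\n' | LC_ALL=C sort)
echo "$byte_order"

# -f folds lowercase to uppercase while comparing, giving a case-insensitive order.
alphabetical=$(printf 'penguin\nLinux\nterminal\ncommand\n' | LC_ALL=C sort -f)
echo "$alphabetical"
```

With the byte-order comparison, Linux comes first. With `-f`, the list reads command, Linux, penguin, terminal.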
They're in there in an order that isn't immediately intuitive, especially if we wanted to be able to quickly scan the list to look for a particular name. So let's recall that command and pipe it to sort. Now, these usernames are sorted alphabetically, ignoring special characters. We can also reverse the sort with sort dash r. And if you check out the man pages, you'll see there are other specialty types of sorting you can do. One option I find really helpful is the human-numeric sort, which we get with dash h.
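Piping cut into sort, and reversing the result with dash r, might look something like this. Again, this is a sketch against a made-up sample file in /etc/passwd format, and the variable names are assumptions for illustration.

```shell
# Hypothetical sample in /etc/passwd format (made-up accounts).
sample=$(mktemp)
cat > "$sample" <<'EOF'
root:x:0:0:root:/root:/bin/bash
alice:x:1000:1000:Alice:/home/alice:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
EOF

# Pull out the usernames, then sort them alphabetically.
sorted=$(cut -d: -f1 "$sample" | sort)
echo "$sorted"

# -r reverses the sort order.
reversed=$(cut -d: -f1 "$sample" | sort -r)
echo "$reversed"

# Against the real file:
#   cut -d: -f1 /etc/passwd | sort

# Clean up the temporary sample file.
rm -f "$sample"
```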
This feature can order file sizes from smallest to largest rather than trying to alphabetize the size suffixes, which doesn't make any sense. I like to use it when I have a list of items and their sizes, to help identify large items more quickly by putting them at the end. So if I were to write du dash h, d, one here (du -hd1), to see the sizes of directories in my home directory, they're listed in the order du encounters the folders, not by size. If I pipe that into sort dash h, though, then they're reordered by size instead, which can be helpful.
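To see what dash h changes, here is a small sketch that sorts a made-up list in the style of du output, first with plain sort and then with sort dash h. The sizes and folder names are invented for illustration, and `LC_ALL=C` is set only to make the comparison predictable across systems. The real command from the video is shown as a comment, since its output depends on what is in your home directory.

```shell
# Made-up "size<TAB>name" lines in the style of du -h output.
listing=$(printf '300M\tmusic\n4.0K\tempty\n1.5G\tvideos\n12K\tnotes\n')

# Plain sort compares the text character by character, so 1.5G lands before 12K.
as_text=$(printf '%s\n' "$listing" | LC_ALL=C sort)
echo "$as_text"

# -h understands the K, M, and G suffixes and orders by actual size.
by_size=$(printf '%s\n' "$listing" | LC_ALL=C sort -h)
echo "$by_size"

# The real command from the video:
#   du -hd1 | sort -h
```

With dash h, the largest item ends up at the bottom of the list, which is the behavior described above.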
Cut and sort aren't big complicated tools that have their own language or a million options. They're small, focused tools that make it easy to work with text in the command line. Take some time to get familiar with them, so you can rely on them when you need to.