Suppose you have been elected secretary of a bowling league, and one of your jobs is to update the membership list on the league's private web page with the members' names, contact information, bowling statistics, and other information. But new information is always coming in, and it's messy: text files, Excel spreadsheets, web forms, email messages. It's also inconsistent. You'd like to find a way to format that varied information into something clean and consistent that you can use as a database, and you'd like to automate that process as much as possible. Enter AWK.
So, why the name? AWK (A, W, K) is named after its inventors, Al Aho, Peter Weinberger, and Brian Kernighan, who first developed it at Bell Labs in the late 1970s. It was designed to perform simple data filtering and manipulation tasks, but it quickly became popular and gained more power in subsequent versions of the UNIX operating system. Where is AWK available? You'll find AWK preinstalled on most UNIX variants, including Mac OS X. Just open a command line and type awk, and you're good to go.
If AWK is not preinstalled on your system, you may be able to download it as an install package. To use AWK under Windows, you can install Cygwin from cygwin.com, which provides a complete UNIX-style shell environment, or you can install UnxUtils from unxutils.sourceforge.net, which provides individual UNIX tools that run directly at the Windows Command Prompt. Both of these are free, but many other alternatives, both commercial and open source, are available. So, what is AWK good for? AWK is great for manipulating text files that are divided into records, or lines, and in which each line is divided into fields.
It doesn't require that every line have the same format. It's very happy with plain English text. But if the file is more structured, like a spreadsheet or database, there are more things you can do with it. You can easily write small one-line AWK programs right on the command line to do things like find interesting records in a data file, output only selected fields, and perform manipulations like swapping the order of fields or combining multiple fields into one. More sophisticated data operations, including joins and merges, are possible as well. So what is AWK not so good for? AWK is not so good for manipulating binary data such as Word and Excel files, though if you can export the data into a text format, you might find that AWK is useful.
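The one-line programs described above (finding records, selecting fields, swapping fields, and combining fields) might look something like the following sketch. The file name `members.txt` and its three whitespace-separated columns (name, team, average) are made up for illustration and are not part of the course materials.

```shell
# Hypothetical sample data: name, team, bowling average.
cat > members.txt <<'EOF'
Alice Strikers 182
Bob Gutterballs 145
Carol Strikers 201
EOF

# Find interesting records: print lines whose third field exceeds 150.
awk '$3 > 150' members.txt

# Output only selected fields: the name and the average.
awk '{ print $1, $3 }' members.txt

# Swap the order of the first two fields.
awk '{ print $2, $1, $3 }' members.txt

# Combine the first two fields into one, joined by a hyphen.
awk '{ print $1 "-" $2, $3 }' members.txt
```

Each program is a single pattern, a single action in braces, or both; a pattern with no action prints the matching line unchanged.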
It is not a web programming language and is not well suited for parsing HTML, although it is very good at generating HTML. So, what versions of AWK exist? The original version of AWK, just called awk, was developed at Bell Labs and released with Version 7 UNIX in 1979. A version called nawk, for New AWK, with several additional features, was released with System V UNIX in 1985. And the GNU Project developed its own version of AWK, called gawk, in the late 1980s. And as Linux has grown in popularity, gawk has become more and more powerful.
gawk is the most common version of AWK in use today. On many Linux systems, typing awk on the command line in fact runs gawk, although some systems, Mac OS X among them, ship a different implementation under that name. The examples in this course were all developed using gawk, but I've tried to restrict myself to features that are also available in older versions of AWK.
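To illustrate the earlier remark that AWK is good at generating HTML, here is a minimal sketch that turns a two-column text file into an HTML table. The file name `scores.txt` and its contents are hypothetical, and the sketch assumes the data contains no characters that need HTML escaping (such as `<` or `&`). It uses only `BEGIN`, `END`, and `print`, which should be available in older versions of AWK as well as gawk.

```shell
# Hypothetical sample data: name and bowling average.
cat > scores.txt <<'EOF'
Alice 182
Bob 145
EOF

# BEGIN runs before any input is read, END after the last line;
# the middle action runs once per input line and emits one table row.
awk 'BEGIN { print "<table>" }
     { print "  <tr><td>" $1 "</td><td>" $2 "</td></tr>" }
     END { print "</table>" }' scores.txt
```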
In this course, award-winning author and teacher David D. Levine shows you how to use AWK to read and write data in a variety of formats, produce reports, and automate repetitive tasks. He reviews the nuts and bolts of the language, such as field separators, pattern matching, variables, operators, expressions, and control structures; functions available for manipulating data; and integration with other programs like Excel.
- Determine what AWK is.
- Recognize how to write an AWK program.
- Determine how to use AWK command-line flags.
- Identify how to specify field and record separators with variables.
- Distinguish how to change a CSV file to a tab-separated one.
- Break down how to work with operators and arrays.
- Discover how to format output.
- Interpret how to manipulate string data with functions.
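As a rough preview of two of the objectives above (setting field separators with variables and converting a CSV file to a tab-separated one), the following sketch may help. The file name `members.csv` is hypothetical, and the approach assumes a simple CSV file with no quoted fields or embedded commas; real-world CSV often needs more care.

```shell
# Hypothetical sample data in simple comma-separated form.
printf 'name,team,average\nAlice,Strikers,182\n' > members.csv

# FS is the input field separator, OFS the output field separator.
# Assigning $1 to itself makes awk rebuild the line using OFS.
awk 'BEGIN { FS = ","; OFS = "\t" } { $1 = $1; print }' members.csv

# The same conversion using the -F and -v command-line flags.
awk -F, -v OFS='\t' '{ $1 = $1; print }' members.csv
```

Without the `$1 = $1` assignment, `print` would output each line exactly as it was read, commas included, because awk only applies `OFS` when it reconstructs the record.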