J Cole Morrison

Developer Advocate @HashiCorp, DevOps Enthusiast, Startup Lover, Teaching at awsdevops.io




Understanding Enough awk to Search Piles of Files and Text

Posted by J Cole Morrison on .


Command line tools are obviously useful, but more often than not they're jam-packed with so much functionality that it can be hard to get started. "Well, check the man page, fool." Oh, okay. Thanks, that's helpful. Don't get me wrong, man pages are an excellent resource, but a lot of the time they're just an overwhelming alphabet soup.

The thing is, for a lot of these tools, you don't need to know 100% of it to be productive. You can learn the small 20% that lets you get 80% of the work done. And so today, let's talk about awk - the smarter brother of grep (depending upon who you ask. Their mother loves them both, though).

What is awk?

It's just a text processing tool. It does a ton. But what it's going to help you do day-to-day is more than likely:

  1. Searching through long lists of files
  2. Searching through long files
  3. Getting and molding output from those long files

A more practical idea of what it is can be explained through a scenario:

Suppose you have a long, rotating log file named entries.log with tons of ... logs. On the other hand, you have Jerry the Developer, who's been wreaking havoc in the cloud with his mighty blade of code, leaving errors left and right. You need to search through your entries.log and grab the most important ones so that you can yell at him.

Obviously there's a ton of ways to analyze this, but this is an awk post. So let's move on to what it could do for you in this scenario, and by the end you'll know how it works and be able to imagine all sorts of things you can do with it. We'll use the following file as our example to work on:

entries.log

Wed Jul 16 2019 12:35:23 GMT-0700 Success: Some Message  
Wed Jul 17 2019 12:35:23 GMT-0700 Error: Some Message  
Wed Jul 17 2019 12:37:14 GMT-0700 Error: Some Important Message  
Wed Jul 18 2019 12:37:14 GMT-0700 Success: Some Message  

Using awk

So, awk separates lines of text into "columns." Those columns are made via a delimiter. By default, that delimiter is whitespace: any run of spaces or tabs. SO. For this line:

Wed Jul 17 2019 12:37:14 GMT-0700 Error: Some Message  

There are 9 columns. Why? Because there are 9 different pieces of text, divided by spaces. awk recognizes each of those as its own column:

Wed Jul 17 2019 12:37:14 GMT-0700 Error: Some Message  
^   ^   ^  ^    ^        ^        ^      ^    ^
$1  $2  $3 $4   $5       $6       $7     $8   $9

To see this in action, if you ran:

$ echo 'Wed Jul 17 2019 12:37:14 GMT-0700 Error: Some Message' | awk '{print $1}'

You'd get back:

Wed  

The {print $1} is an "action statement." Basically, what actions do you want awk to take on the text that we feed it? In this case, we want it to print the first column of our input. There's a ton of other things you can do with these action statements and functions, but let's not get too far from foundations here.
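
If you want to poke at the columns a bit more, here's a small sketch using the same line. NF is awk's built-in count of the columns on the current line, which also makes $NF the last column. The variable name `line` is just something picked for this example.

```shell
# The same sample line from above, stored in a variable (name is arbitrary).
line='Wed Jul 17 2019 12:37:14 GMT-0700 Error: Some Message'

# NF holds the number of columns awk found on the line.
echo "$line" | awk '{print NF}'
# 9

# Print several columns at once; the comma puts a space between them.
echo "$line" | awk '{print $2, $3, $4}'
# Jul 17 2019

# $NF is "column number NF", i.e. the last column.
echo "$line" | awk '{print $NF}'
# Message
```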

How it's actually useful

Okay, that's nice and all, but how does this help? Well, it lets you search through files in a much more organized way. So let's revisit our entries.log file that has these contents:

Wed Jul 16 2019 12:35:23 GMT-0700 Success: Some Message  
Wed Jul 17 2019 12:35:23 GMT-0700 Error: Some Message  
Wed Jul 17 2019 12:37:14 GMT-0700 Error: Some Important Message  
Wed Jul 18 2019 12:37:14 GMT-0700 Success: Some Message  

And let's say that this file is constantly being populated with entries and we want to search for only the error messages. How would we do it?

Well, so far we've seen an "action statement" that prints a column of whatever we feed in. But the other part of awk is the "pattern." What "pattern" do we want awk to search for? Well, if we wanted it to only look for lines with the word "Error" in them...

$ cat entries.log | awk '/Error/'

And here, as you can see, the "pattern" is just a regular expression. Granted it can be other things, but we'll stick with this one type for now.

So in plain English this is saying, "Hey cat, print the contents of the entries.log file and pipe them to awk." And then awk gets them and says, "Okay, now I'm going to look for all lines that match the regular expression /Error/."

The result?

$ cat entries.log | awk '/Error/'
Wed Jul 17 2019 12:35:23 GMT-0700 Error: Some Message  
Wed Jul 17 2019 12:37:14 GMT-0700 Error: Some Important Message  
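
One of the "other things" a pattern can be is a comparison against a single column. Here's a hedged sketch of why you might care. It recreates the sample log in a temp file, but the last line is a made-up variation (a Success entry that happens to mention the word Error) so the difference shows up. It also hands the file straight to awk instead of going through cat, which works the same way.

```shell
# Scratch copy of the sample log. The last line is invented for this sketch.
log="$(mktemp)"
cat > "$log" <<'EOF'
Wed Jul 16 2019 12:35:23 GMT-0700 Success: Some Message
Wed Jul 17 2019 12:35:23 GMT-0700 Error: Some Message
Wed Jul 17 2019 12:37:14 GMT-0700 Error: Some Important Message
Wed Jul 18 2019 12:37:14 GMT-0700 Success: Fixed the Error page
EOF

# /Error/ matches anywhere on the line, so the last Success line sneaks in.
awk '/Error/' "$log"

# Comparing only column 7 keeps just the lines whose status really is Error:
awk '$7 == "Error:"' "$log"
```

The first command prints three lines, the second only the two real errors.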

What if you just wanted to see the messages themselves, without the timestamps? Well, then you'd need to combine a pattern AND an action statement like so:

$ cat entries.log | awk -F "Error: " '/Error/ {print $2}'
Some Message  
Some Important Message  

In plain English - cat does its usual bit. It outputs the contents of the entries.log file. The | once again pipes the output to our awk command. Once awk gets the output, it says,

"Okay, given this file, I'm going to use Error: (with the space) as the delimiter for my columns. I know I usually use a normal space, but the -F option tells me to use whatever this gung-ho developer passes in as the delimiter."

Which means that awk is now going to see each error line and its columns like this (the Error: delimiter itself isn't part of either column):

Wed Jul 17 2019 12:37:14 GMT-0700 Error: Some Message  
^                                        ^
$1                                       $2

"I'm going to search through all of the output from entries.log and look for lines that match the expression /Error/. For any matches, I'm going to print the second column of that line."

Meaning that our command and its output will look like so:

$ cat entries.log | awk -F "Error: " '/Error/ {print $2}'
Some Message  
Some Important Message  
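
To see what -F "Error: " actually does to a line, here's a quick sketch that wraps each column in brackets so the edges are visible. The variable names `err` and `ok` are just picked for the example. Notice that a line without "Error: " never gets split at all, which is why the /Error/ pattern is doing real work in the command above.

```shell
# One error line and one success line from the sample log.
err='Wed Jul 17 2019 12:37:14 GMT-0700 Error: Some Message'
ok='Wed Jul 16 2019 12:35:23 GMT-0700 Success: Some Message'

# Brackets show where each column starts and ends; NF is the column count.
echo "$err" | awk -F "Error: " '{print "[" $1 "]"; print "[" $2 "]"; print NF}'
# [Wed Jul 17 2019 12:37:14 GMT-0700 ]
# [Some Message]
# 2

# No "Error: " on this line, so it stays one big column and $2 is empty.
echo "$ok" | awk -F "Error: " '{print NF; print "[" $2 "]"}'
# 1
# []
```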

And then of course, if you were looking for a specific message, maybe an important one, you can chain patterns together:

$ cat entries.log | awk -F "Error: " '/Error/ && /Important/ {print $2}'
Some Important Message  

Granted, you can always write a more complex regular expression. This one is just more readable for our purposes.
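
Chaining isn't limited to &&. As a rough sketch, patterns can also be combined with || (or) and flipped with ! (not). The temp file below is just a scratch copy of the sample log so the commands can run on their own.

```shell
# Scratch copy of the sample entries.log.
log="$(mktemp)"
cat > "$log" <<'EOF'
Wed Jul 16 2019 12:35:23 GMT-0700 Success: Some Message
Wed Jul 17 2019 12:35:23 GMT-0700 Error: Some Message
Wed Jul 17 2019 12:37:14 GMT-0700 Error: Some Important Message
Wed Jul 18 2019 12:37:14 GMT-0700 Success: Some Message
EOF

# Errors that are NOT marked Important.
awk -F "Error: " '/Error/ && !/Important/ {print $2}' "$log"
# Some Message

# Lines that are either a Success or Important (prints three lines).
awk '/Success/ || /Important/' "$log"
```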

Okay, so that's neat. We can now take a file and search it as if each line in the file is a row. And each row has columns that are delimited by whitespace by default, or by whatever you want by using -F. We also know that when handing this stuff over to awk, we can have it use patterns to filter the input and then action statements to do stuff with the lines that match.

So, one more bonus thing. What if you just want to count all the errors in our file so that you can yell at Jerry the Developer for his incompetence?

$ cat entries.log | awk -F "Error: " '/Error/ {print $2}' | wc -l
2  

The wc -l part of the command takes the lines that awk found and counts them. In our case, there's 2 lines that match our awk criteria as noted by wc.
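
For what it's worth, awk can also do the counting itself. This is an alternative sketch, not a replacement for the wc approach: the action bumps a counter on every matching line, and an END block (which runs once after all the input has been read) prints it. The name `count` is arbitrary, and the `+ 0` makes sure a 0 gets printed when nothing matches.

```shell
# Scratch copy of the sample entries.log.
log="$(mktemp)"
cat > "$log" <<'EOF'
Wed Jul 16 2019 12:35:23 GMT-0700 Success: Some Message
Wed Jul 17 2019 12:35:23 GMT-0700 Error: Some Message
Wed Jul 17 2019 12:37:14 GMT-0700 Error: Some Important Message
Wed Jul 18 2019 12:37:14 GMT-0700 Success: Some Message
EOF

# Increment count for each Error line, then print the total at the end.
awk '/Error/ {count++} END {print count + 0}' "$log"
# 2
```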

Summary

So the process of using awk to search through files (or directories) is as follows:

  1. Give it some text output.

    $ cat entries.log

  2. Pipe it to awk

    $ cat entries.log | awk

  3. Tell awk what you want to use as the delimiter for columns in the lines

    $ cat entries.log | awk -F "Error: "

    Now it'll treat the full string of Error: as its delimiter for columns instead of whitespace.

  4. Give it a pattern to search for

    $ cat entries.log | awk -F "Error: " '/Error/'

  5. Give it something to do with the found output

    $ cat entries.log | awk -F "Error: " '/Error/ {print $2}'

    Now it'll both find the lines in entries.log that have the word Error in them and print the second column of each.

  6. Yell at Jerry the Developer:

    "Dammit Jerry, there's 2 errors today."

--

Alrighty, there's our quick, practical overview of awk. Yes, there's a ton more that you can do with it (as outlined in the awk man page). But even with just this basic knowledge you can get done 80% of what you need to.


http://start.jcolemorrison.com
