@co89757
Created July 19, 2017 19:23
awk tutorial

Conditional Filter Rules in awk

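The general form of an awk rule is

pattern { action }

The action runs for every input record that matches the pattern. If the pattern is omitted, the action runs for every record; if the action is omitted, matching records are printed.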

Filter based on regex

/<regex>/ {
	#actions 
}

Note: you can combine multiple patterns with the Boolean operators (&&, ||, !), e.g.

/mary/ && !(/lamb/ || /had/) {
	a = $0
	print "this is a test. a is " a
}
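
To see how such a combined pattern behaves, here is a minimal sketch that runs the same pattern over three made-up input lines. The shell function name filter_mary and the sample text are assumptions for illustration only; the awk program is wrapped in a shell function just to make it runnable from a terminal.

```shell
# Hypothetical helper: report lines that mention "mary"
# but neither "lamb" nor "had".
filter_mary() {
    awk '/mary/ && !(/lamb/ || /had/) { print "matched: " $0 }'
}

# Made-up sample input; only the second line satisfies the pattern.
printf 'mary had a little lamb\nmary went home\njohn went home\n' | filter_mary
```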

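Note: variables in awk scripts need no declaration. Like Python variables, they are created on first use. An unset variable evaluates to the empty string, or to 0 in a numeric context.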

Expression Ranges

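As in sed, you can limit a rule to a range of records between two patterns: /<start_regex>/, /<end_regex>/ { #action }. The range is inclusive: it starts at a record that matches the first pattern and ends at the next record that matches the second.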
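
A small sketch of a range pattern, assuming made-up marker lines BEGIN_BLOCK and END_BLOCK (these markers and the function name print_block are hypothetical, chosen only for this example):

```shell
# Hypothetical helper: print every record from a line matching BEGIN_BLOCK
# through the next line matching END_BLOCK, with its record number.
print_block() {
    awk '/BEGIN_BLOCK/, /END_BLOCK/ { print NR ": " $0 }'
}

# Made-up input: lines 2 through 4 fall inside the range.
printf 'before\nBEGIN_BLOCK\ninside\nEND_BLOCK\nafter\n' | print_block
```

Both marker lines are printed along with the line between them, while "before" and "after" are skipped.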

Relational expressions

Relational expressions allow finer-grained matching, such as testing a particular field:

  • expr ~ /regex/ --- expr matches the regex
  • expr !~ /regex/ --- expr does not match the regex
  • expr in array_name --- expr is a key in the array
  • expr comp_operator expr --- basic string or numeric comparison (==, !=, <, <=, >, >=)
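
As an illustration, the sketch below matches on individual fields rather than on the whole record. The input format ("name score" pairs), the sample data, and the function name high_scores are assumptions made up for this example.

```shell
# Hypothetical helper: print names that start with "a" and whose
# second field is at least 50.
high_scores() {
    awk '$1 ~ /^a/ && $2 >= 50 { print $1 }'
}

# Made-up input: adam fails the numeric test, bob fails the regex test.
printf 'alice 72\nadam 31\nbob 90\n' | high_scores
```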

Control flow in awk (just like C)

awk borrows C's control-flow syntax: if/else, while, do/while, for (pre_expr; while_expr; post_expr) {}, break and continue. It also adds for (key in array) {} to iterate over the keys of an array.
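
A minimal sketch combining a C-style for loop, if/else and continue. It walks the fields of each record; the function name classify and the input are made up for illustration.

```shell
# Hypothetical helper: label each non-zero field as odd or even.
classify() {
    awk '{
        for (i = 1; i <= NF; i++) {
            if ($i == 0) {
                continue            # skip zero fields entirely
            }
            if ($i % 2 == 0) {
                print $i " is even"
            } else {
                print $i " is odd"
            }
        }
    }'
}

printf '1 0 4\n' | classify
```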

Skipping records and files

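Use next to stop processing the current record and move on to the next one; no further rules run for the skipped record. Use nextfile to skip the rest of the current input file. nextfile is available in gawk and other modern implementations, but some older awks lack it.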
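
The sketch below shows next used as an early exit, so that later rules never see comment or blank lines. The function name strip_comments and the sample input are assumptions for illustration; nextfile is not shown because it needs real input files.

```shell
# Hypothetical helper: drop comment and blank lines, number the rest.
strip_comments() {
    awk '
        /^#/    { next }             # comment line: skip all remaining rules
        NF == 0 { next }             # blank line: skip as well
                { print NR ": " $0 } # reached only by the surviving records
    '
}

printf '# header\nalpha\n\nbeta\n' | strip_comments
```

NR still counts the skipped records, so the surviving lines keep their original line numbers.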

Awk functions

You can define your own functions at the top level of a script, alongside the rules:

function func_name(param1 [, param2, ...]) {
	action 
}
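
For a concrete case, here is a sketch of a user-defined function called from a rule. The function max, the wrapper max_per_line and the input are hypothetical names and data chosen for this example.

```shell
# Hypothetical helper: print the larger of the first two fields on each line.
max_per_line() {
    awk '
        # User-defined function; adding 0 forces a numeric comparison.
        function max(a, b) {
            return (a + 0 > b + 0) ? a : b
        }
        { print max($1, $2) }
    '
}

printf '3 7\n10 2\n' | max_per_line
```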

Arrays in awk

Arrays in AWK scripts are associative: each array element is stored as a key-value pair. This results in three major differences when compared to C:

  • Arrays are allocated and grow dynamically as space is needed.
  • Arrays can be sparse; you can have an array with a value at index 711 and a value at index 1116 with nothing between them.
  • Keys can be strings as well as numbers, such as my_array["David"] in the example below.

Note: you cannot populate an array in a single operation except by splitting a string (see split below). Otherwise, elements are assigned one at a time, e.g.

BEGIN {
        my_array[0] = "Partridge";
        my_array[1] = "pear";
        my_array[2] = "tree";
        my_array["David"] = "Cassidy";
 
        for ( my_index in my_array ) {
                print my_index "=" my_array[my_index];
        }
}

The array-iterator-style for statement in AWK scripts iterates through the array keys (indices) rather than through the array values. The order in which keys are visited is unspecified.

Note: keys are always stored as strings, even if they contain only digits. To compare them as numbers, add 0 (for example key + 0) to force a numeric value.
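
The following sketch shows why the + 0 matters when looking for the largest numeric key. The array name value and the function name largest_key are made up for illustration.

```shell
# Hypothetical helper: find the largest numeric key of an array.
largest_key() {
    awk 'BEGIN {
        value[9] = "nine"; value[10] = "ten"
        max = -1
        for (k in value) {
            # k is a string; k + 0 forces a numeric comparison.
            # Compared as strings, "9" would rank above "10".
            if (k + 0 > max) max = k + 0
        }
        print max
    }'
}

largest_key
```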

Create Arrays with split

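split is a built-in function that splits a string into an array on a given separator. It has the form: count = split(string, arrayname, regexp). The pieces are stored in arrayname[1] through arrayname[count], and the return value is the number of pieces.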

e.g.

BEGIN {
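        # Split on single spaces: my_array[1] = "Mary", my_array[2] = "lamb",
        # my_array[3] = "freezer", and arr_len is 3.
        arr_len = split( "Mary lamb freezer", my_array, / / );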
}
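
Because split numbers the elements from 1, a counted loop is a simple way to read them back in their original order. A minimal sketch, using the same string as above (the names words and split_words are assumptions for this example):

```shell
# Hypothetical helper: split a string and print its pieces in order.
split_words() {
    awk 'BEGIN {
        n = split("Mary lamb freezer", words, / /)
        # split() fills words[1]..words[n]; a counted loop keeps the order,
        # whereas for (k in words) visits keys in an unspecified order.
        for (i = 1; i <= n; i++) {
            print i "=" words[i]
        }
    }'
}

split_words
```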

String concatenation in awk has no operator: place two expressions next to each other, separated by a space, e.g. stringadd = string1 string2

To delete an array element, use delete arrayname[key]
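
Both points can be tried together in one short sketch. The variable names, the array fruit and the function name concat_and_delete are made up for illustration.

```shell
# Hypothetical helper: demonstrate concatenation and element deletion.
concat_and_delete() {
    awk 'BEGIN {
        first = "pear"; second = "tree"
        joined = first " " second       # concatenation: adjacent expressions
        print joined

        fruit["a"] = "apple"; fruit["b"] = "banana"
        delete fruit["a"]               # remove a single element
        if (!("a" in fruit) && ("b" in fruit)) {
            print "a deleted, b kept"
        }
    }'
}

concat_and_delete
```

Testing membership with the in operator is preferable to reading fruit["a"], because merely referencing a missing element creates it with an empty value.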
