tapyu/.awk_cheatsheet.md

## .awk_cheatsheet.md

      
    AWK cheatsheet

Cheatsheet of the AWK programming language

  
## cheatsheet.md

      
    Basic syntax


awk 'BEGIN { actions_before_processing }; condition { actions_for_matching_lines }; END { actions_after_processing }' input_file


BEGIN { actions_before_processing } (optional): actions executed once, before any line of the input file is read.
condition (optional): a test evaluated against each input record. The associated actions run only for the records that satisfy it, and for every record when the condition is omitted. Examples of conditions:

A pattern. For instance, /pattern/ { actions_for_matching_lines }.
A boolean expression. For instance, NR==1 {for(i=1;i<=NF;i++) print i}.


END { actions_after_processing } (optional): actions executed once, after all lines of the input file have been processed.
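
The three-part structure above can be sketched as follows. This is a minimal illustration, not part of the original cheatsheet: the helper name `count_errors` and the sample input are hypothetical.

```shell
# Hypothetical helper: count the lines that contain "error" and report the total.
count_errors() {
    awk '
        BEGIN   { count = 0 }               # runs once, before any input is read
        /error/ { count++ }                 # runs for every record matching the pattern
        END     { print "errors:", count }  # runs once, after the last record
    '
}

printf 'ok\nerror: disk\nok\nerror: net\n' | count_errors
```

With the sample input, two lines match the pattern, so the END block prints `errors: 2`.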


CAVEAT: function parameters are local to the function and exist only while it runs. Every other variable is global, including variables that are first assigned inside a function body.
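
A small sketch of this scoping rule, using the common idiom of declaring extra parameters to obtain local variables. The names `scope_demo`, `bump`, `tmp`, and `total` are hypothetical and only serve the illustration.

```shell
# Hypothetical demo: parameters are local, everything else is global.
scope_demo() {
    awk '
        # "tmp" is declared as an extra parameter, so it is local to bump().
        function bump(x,    tmp) {
            tmp = x + 1      # local: vanishes when bump() returns
            total = tmp      # not a parameter, hence global
            return tmp
        }
        BEGIN {
            bump(41)
            print "total=" total     # assigned inside bump(), still visible here
            print "tmp=[" tmp "]"    # empty: the caller never sees the local tmp
        }
    '
}

scope_demo
```

The extra spaces before `tmp` in the parameter list are only a convention that separates real arguments from locals; awk itself does not require them.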


Special (built-in) variables


FS: Field Separator. The character or regular expression that separates fields in an input record. The default value is a single space (" "), which is treated specially: fields are separated by runs of spaces, tabs, and newlines, and leading and trailing whitespace is ignored.
OFS: Output Field Separator. The string placed between fields in the output, e.g., between the arguments of print $1, $2. The default value is also a single space (" ").
RS: Record Separator. Determines how input records are separated. By default, RS is a newline character (\n), so each line is treated as a separate record.
ORS: Output Record Separator. The string appended to each output record (the default value is a newline, "\n").
NR: Number of records read so far, i.e., the index of the current record. It is commonly used as a condition in front of an action, e.g., NR==1 { ... }, so that the action runs only on that specific line.
NF: Number of fields that the current record contains. Hence, print $NF prints the last field of the line.
$n: nth field of the current record. $0 is the whole record.
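
As a quick illustration of these variables working together, here is a sketch that assumes colon-separated input. The helper name `show_fields` and the sample records are hypothetical.

```shell
# Hypothetical helper: print record number, field count, first and last field.
show_fields() {
    # -F: sets FS to ":"; -v sets OFS before the first record is processed.
    awk -F: -v OFS=' | ' '{ print NR, NF, $1, $NF }'
}

printf 'root:x:0:0\nbin:x:1\n' | show_fields
```

Because the arguments of `print` are separated by commas, OFS is inserted between them. Writing `print NR NF` (no comma) would concatenate the values instead.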


Built-in keywords


BEGIN { commands }: the special BEGIN block, which runs once, before any input is read.
END { commands }: the special END block, which runs once, after all input has been read.
next: skips the rest of the processing for the current record and moves on to the next one.
getline: reads the next record from the input stream into $0 (updating NF and NR). It returns 1 on success, 0 at end of input, and -1 on error.
break: exits the innermost enclosing loop.
function: defines a user function, e.g., function name(parameters) { statements }.
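
A possible way to see `next` and `getline` side by side, assuming a hypothetical input where a line starting with `key` is followed by its value. The helper name `pair_lines` is not from the original.

```shell
# Hypothetical helper: skip comments, pair each "key" line with the line after it.
pair_lines() {
    awk '
        /^#/ { next }               # skip comment lines entirely
        /^key/ {
            key = $0
            if ((getline) > 0)      # replace $0 with the following record
                print key "=" $0
            next                    # do not fall through to the last rule
        }
        { print "other: " $0 }
    '
}

printf '# header\nkey\nvalue\nplain\n' | pair_lines
```

Testing `(getline) > 0` rather than the bare return value avoids treating the error value -1 as success.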


Conditional operators


==: Equal to
!=: Not equal to
~: Matches a regular expression
!~: Does not match a regular expression
&&: And operator
||: Or operator
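
The operators can be combined in a single condition. The sketch below assumes whitespace-separated input with a name and a number per line; both helper names are hypothetical.

```shell
# Hypothetical helper: names starting with "a" whose second field is not zero.
a_and_nonzero() {
    awk '$1 ~ /^a/ && $2 != 0 { print $1 }'
}

# Hypothetical helper: names not starting with "a", or whose second field is zero.
not_a_or_zero() {
    awk '$1 !~ /^a/ || $2 == 0 { print $1 }'
}

printf 'alice 10\nadam 0\nbob 7\n' | a_and_nonzero
printf 'alice 10\nadam 0\nbob 7\n' | not_a_or_zero
```

The two conditions are logical complements, so every input line is printed by exactly one of the helpers.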


Functions


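match: returns the position (starting at 1) of the first match of a regular expression within a string, or 0 if there is no match. E.g., match(string, /regex/). It also sets RSTART and RLENGTH to the position and length of the match.
index: finds the position of the first occurrence of a substring within a string. E.g., index(string, substring). If the substring is not found, the function returns 0.
split: splits a string into an array of strings on a separator and returns the number of elements. E.g., split(str, arr, " ").
sub: substitutes, in place, the first occurrence of a pattern with a replacement string. E.g., sub(/apple/, "orange", fruit). If the third argument is omitted, $0 is modified.
gsub: same as sub, but substitutes all occurrences of the pattern.
substr: returns part of a string. E.g., substr(string, start, length), where start is 1-based. If length is omitted, the rest of the string is returned.
toupper: returns the string converted to uppercase.
tolower: returns the string converted to lowercase.
length: returns the number of characters in a string (of $0 if no argument is given).
print: prints the arguments passed to it. If no arguments are passed, it prints the whole line. print is usually used without parentheses, e.g., print $2.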
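
The following sketch exercises the string functions listed above on one hypothetical string. The helper name `string_demo` and the sample data are illustrative only.

```shell
# Hypothetical demo of the built-in string functions.
string_demo() {
    awk 'BEGIN {
        s = "apple,banana,apple"
        print index(s, "banana")        # position of the first occurrence
        n = split(s, parts, ",")        # n is the number of pieces
        print n, parts[2]
        sub(/apple/, "orange", s)       # replaces the first match only
        print s
        gsub(/apple/, "kiwi", s)        # replaces every remaining match
        print s
        if (match(s, /an+/))            # sets RSTART and RLENGTH on success
            print RSTART, RLENGTH
        print toupper(substr(s, 1, 6)), length(s)
    }'
}

string_demo
```

Note that `sub` and `gsub` modify `s` itself and return the number of replacements, so their return value is not the new string.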


| Element | Action |
| --- | --- |
| SEARCH | |
| Define separator (Field Separator, FS) | `awk 'BEGIN { FS = "," } ; { print $2 }'` |
| Define separator (Field Separator, FS) | `awk -v FS=: '{print $1}'` |
| Define separator (Field Separator, FS) | `awk '{print $1}' FS=:` |
| Define separator (Field Separator, FS) | `awk -F: '{print $1}'` |
| Matching patterns (ternary operator) | `'{ print (($0 ~ /pattern/) ? "text_for_true" : "text_for_false") }'` • `$0` represents the whole line. • `$0 ~ /pattern/` means "does `$0` match `pattern`?" • The else branch (the part after `:`) is mandatory. |
| For loop | `'{ for (i = 1; i <= NF; i++) { loop_statements } }'` |
| Save an element for future use | `'{ split($n, a, "delimiter") }'` • `$n` means the nth field. • The content of the nth field is stored in the array `a`. • The value of `"delimiter"` is used to split the element into an array. |
| Matching patterns (if-else statement) | `'{ if ($0 ~ /pattern/) { then_actions } else { else_actions } }'` • The else branch is optional. |
| REPLACE | |
| Search and replace | `awk '{ sub(/OLD_TERM/, "NEW_TERM"); print }' file` • `sub` replaces only the first match in each line; use `gsub` to replace all of them. |
| Search and replace in-place | `awk -i inplace ...` (GNU awk only) |
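
Two of the table entries, sketched as runnable commands. The helper names `label_lines` and `replace_first`, the pattern `ERROR`, and the terms `foo`/`bar` are hypothetical placeholders.

```shell
# Hypothetical helper: prefix each line depending on whether it matches a pattern.
label_lines() {
    # The whole ternary is parenthesized so it is parsed the same way by every awk.
    awk '{ print (($0 ~ /ERROR/) ? "bad: " : "ok: ") $0 }'
}

# Hypothetical helper: replace the first "foo" of each line with "bar".
replace_first() {
    awk '{ sub(/foo/, "bar"); print }'
}

printf 'all good\nERROR here\n' | label_lines
printf 'foo foo\nbaz\n' | replace_first
```

Lines without a match pass through `replace_first` unchanged, because `print` runs regardless of whether `sub` replaced anything.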
  

## example1.awk
awk '{ORS=/ION/ ? "\n" : " "; print $0}'

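# Set the output record separator (ORS) to a space when the current line does not contain "ION" and to a newline when it does, thus removing unwanted line breaks: consecutive lines are joined until a line containing "ION" ends the output line.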

## prob1.tex
\newglossaryentry{gnss}{
  type=\acronymtype,
  name={GNSS},
  description={Global navigation satellite system},
  first={global navigation satellite system (GNSS)},
  sort={GNSS},
  long={global navigation satellite system},
  short={GNSS}
}

## sol1.awk
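#!/usr/bin/gawk -f
# Requires GNU awk (gawk): the script uses the three-argument match() and the \w regex operator.
# Adjust the interpreter path above if gawk is installed elsewhere.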

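function capitalize(word) {
    # The regex is anchored so that only the exact words match, not words that merely contain them (e.g., "other").
    if (word ~ /^(and|of|the)$/) { # if `word` is one of these words, don't capitalize it
        return word
    } else if (substr(word, 1, 1) == "(") {
        # skip the opening parenthesis and capitalize the letter after it
        return "(" toupper(substr(word, 2, 1)) substr(word, 3)
    } else {
        return toupper(substr(word, 1, 1)) substr(word, 2)
    }
}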

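# Join the elements of `arr` with `sep` (a space by default).
# `result` and `k` are extra parameters so that they are local to the function.
function join(arr, sep,    result, k) {
    if (sep == "")  sep = " "
    result = arr[1]
    for (k = 2; k <= length(arr); k++)
        result = result sep arr[k]
    return result
}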

# `first_letter` is an extra parameter so that it is local to the function.
function fix(word, initials,    first_letter) {
    # get the first letter, skipping an opening parenthesis if there is one
    first_letter = substr(word, 1, 1) == "(" ? substr(word, 2, 1) : substr(word, 1, 1)
    # index returns the position of the second argument within the first one, or 0 if it is absent
    if (index(initials, toupper(first_letter)) > 0) {
        return capitalize(word)
    } else {
        return word
    }
}

/^%/ {
    print $0
}

/^\\newglossaryentry/ {
    print $0
    # consume the body of the entry one line at a time; stop at end of input or on a read error
    while ((getline) > 0) {
        # end of \newglossaryentry
        if ($0 == "}") {
            print $0
            break
        # the acronym itself: its letters decide which words get capitalized
        } else if ($0 ~ /name=/) {
            match($0, /name=\{([^}]*)/, arr)
            initials = arr[1]
            print $0
        # entries that must be corrected
        } else if ($0 ~ /description=|first=|plural=|firstplural=/) {
            match($0, /\w+=\{([^}]*)/, arr)
            content = arr[1]
            n = split(content, words, " ")

            for (i = 1; i <= n; i++) {
                # if it is a compound noun (e.g., `signal-to-noise`), split it up again
                if (words[i] ~ /-/) {
                    m = split(words[i], parts, "-")
                    for (j = 1; j <= m; j++) {
                        parts[j] = fix(parts[j], initials)
                    }
                    words[i] = join(parts, "-")
                } else {
                    words[i] = fix(words[i], initials)
                }
            }
            final = join(words, " ")
            # descriptions must end with a period
            if ($0 ~ /description/ && substr(final, length(final)) != ".") {
                final = final "."
            }
            # rebuild the line as `  key={content},` (a trailing comma is always emitted)
            match($0, /\w+=/, arr)
            print "  " arr[0] "{" final "},"
        # entries that must not be corrected
        } else {
            print $0
        }
    }
}