@tapyu
Last active May 27, 2024 18:16
AWK cheatsheet

Cheatsheet of the AWK programming language

Basic syntax

awk 'BEGIN { actions_before_processing }; condition { actions_for_matching_lines }; END { actions_after_processing }' input_file

  • BEGIN { actions_before_processing } (optional): Contains actions to be executed before processing any lines from the input file.
  • condition (optional): a test evaluated against each input record; the actions run only for the records that satisfy it. With no condition, the actions run for every record. Examples of conditions:
    • A pattern. For instance, /pattern/ { actions_for_matching_lines }.
    • A boolean conditional. For instance, NR==1 {for(i=1;i<=NF;i++) print i}.
  • END { actions_after_processing } (optional): Contains actions to be executed after processing all lines from the input file.
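The three parts can be seen together in a small sketch. The input lines and the variable names `summary` and `total` are made up for illustration.

```shell
# Sum the second column of the lines that contain "ok" (sample data is invented).
summary=$(printf 'ok 3\nfail 9\nok 4\n' | awk '
BEGIN { print "start" }           # runs once, before any input is read
/ok/  { total += $2 }             # runs for each record that matches /ok/
END   { print "total:", total }   # runs once, after the last record
')
printf '%s\n' "$summary"
```

This should print `start` followed by `total: 7`, since only the two `ok` lines contribute to the sum.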

CAVEAT: function parameters are local to the function. Every other variable is global, even when it is first assigned inside a function, so it remains visible outside the function's scope.
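A minimal sketch of this scoping rule, using a hypothetical function `bump` and made-up variable names. Declaring an extra, unused parameter (`tmp` here) is the usual way to get a local variable in AWK.

```shell
scope=$(awk '
# "tmp" is declared as an extra parameter, which makes it local to bump().
function bump(n, tmp) {
    tmp = n + 1      # local: discarded when bump() returns
    counter = tmp    # not a parameter, so this is a global
    n = 100          # changes only the local copy of the argument
}
BEGIN {
    x = 1
    bump(x)
    print x, counter, (tmp == "" ? "tmp unset" : tmp)
}')
printf '%s\n' "$scope"
```

The output should be `1 2 tmp unset`: `x` is untouched, `counter` leaked out of the function, and `tmp` never existed at the global level.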


Special (built-in) variables

  • FS: Field Separator - This variable is used to specify the input field separator, i.e., the character or pattern that separates fields in a line. The default value for FS is a single space (" "), which is a special case: AWK then splits on any run of whitespace and ignores leading and trailing whitespace.
  • OFS: Output Field Separator - This variable is used to specify the output field separator, i.e., the character or pattern that separates fields in the output. The default value for OFS (Output Field Separator) is also a space (" ").
  • RS: Record Separator. Determines how input records are separated. By default, RS is set to a newline character (\n), so each line is treated as a separate record.
  • ORS: Output record separator. Define which character will separate each record at the output (the default value is a newline, "\n").
  • NR: Number of the current record, counted from 1. It is often used as a condition in front of an action so that the action runs only on a specific line, e.g., NR==1 { ... }.
  • NF: Number of fields that the current record contains. Hence, print $NF prints the last field of the line.
  • $n: nth field of the current record. $0 is the whole record.
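As a rough illustration of these variables working together, the sketch below reads colon-separated records (invented sample data) and prints the record number, the field count, the first field, and the last field.

```shell
fields=$(printf 'alice:30:admin\nbob:25:dev\n' | awk '
BEGIN { FS = ":"; OFS = " | " }   # split input on ":", join output with " | "
{ print NR, NF, $1, $NF }')
printf '%s\n' "$fields"
```

Each output line should look like `1 | 3 | alice | admin`: the commas in `print` are replaced by OFS, and `$NF` picks the last field regardless of how many there are.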

Built-in keywords

  • BEGIN { commands }: the special BEGIN block, which runs once, before any input is read.
  • END { commands }: the special END block, which runs once, after the last record has been processed.
  • next: skips the rest of the processing for this record and moves on to the next record.
  • getline: reads the next input record into $0, updating NF and NR. It returns 1 on success, 0 at end of input, and -1 on error.
  • break: exits the innermost enclosing loop.
  • function: defines a user function, e.g., function name(args) { body }.
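The difference between `next` and `getline` is easier to see side by side. The sketch below uses invented input in which `#` marks a comment line and `--` marks a line whose successor should be treated specially.

```shell
kept=$(printf '# header\none\n--\ntwo\nthree\n' | awk '
/^#/   { next }                                    # drop comment lines
/^--$/ { if ((getline) > 0) print "after marker: " $0; next }
       { print "line: " $0 }')
printf '%s\n' "$kept"
```

Here `next` abandons the current record, while `getline` pulls the following record into `$0` without restarting the rule list. Comparing its return value with 0 avoids looping or acting on stale data at end of input.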

Conditional operators

  • ==: Equal to
  • !=: Not equal to
  • ~: Matches a regular expression
  • !~: Does not match a regular expression
  • &&: And operator
  • ||: Or operator
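A short sketch combining these operators in rule conditions. The fruit data and the `A:`/`B:` labels are made up for illustration.

```shell
matches=$(printf 'apple 5\nbanana 12\ncherry 7\n' | awk '
$1 ~ /an/ || $2 == 5    { print "A:", $1 }   # name contains "an", or count is 5
$1 !~ /^a/ && $2 != 12  { print "B:", $1 }   # name does not start with "a", and count is not 12
')
printf '%s\n' "$matches"
```

This should print `A: apple`, `A: banana`, and `B: cherry`, one per line.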

Functions

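  • match: returns the position of the first occurrence of the specified regular expression pattern within the string, or 0 if there is no match. It also sets RSTART and RLENGTH. E.g., match(string, /pattern/).
  • index: finds the position of the first occurrence of a specified substring within a string. E.g., index(string, substring). If the substring is not found, the function returns 0.
  • split: based on a string separator, splits a scalar string into an array of strings and returns the number of pieces. E.g., split(str, arr, " ").
  • sub: substitutes in place the first occurrence of a pattern with a replacement string and returns the number of substitutions made. E.g., sub(/apple/, "orange", fruit). Without a third argument, it operates on $0.
  • gsub: like sub, but substitutes all occurrences of the pattern.
  • substr: returns part of a string. E.g., substr(string, start, length); positions start at 1, and without length the result runs to the end of the string.
  • toupper: returns a copy of the string converted to upper case.
  • tolower: returns a copy of the string converted to lower case.
  • length: returns the number of characters in a string. With no argument, it returns the length of $0.
  • print: prints the arguments passed to it. If no arguments are passed, it prints the whole record. print is usually used without parentheses, e.g., print $2.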
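The sketch below exercises most of these functions on one made-up string. It sticks to POSIX forms, so it should behave the same in gawk, mawk, and BusyBox awk.

```shell
strings=$(awk 'BEGIN {
    s = "one apple, two apples"
    print index(s, "apple")                      # position of the substring
    print match(s, /ap+le/), RSTART, RLENGTH     # position, then the two side-effect variables
    n = split("a-b-c", parts, "-")               # n is the number of pieces
    print n, parts[1], parts[3]
    t = s; sub(/apple/, "orange", t); print t    # first occurrence only
    u = s; c = gsub(/apple/, "pear", u); print c, u
    print toupper(substr(s, 1, 3)), length(s)
}')
printf '%s\n' "$strings"
```

Note that `sub` and `gsub` modify their third argument, which is why the sketch copies `s` into `t` and `u` first.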

Element

Action

SEARCH

Define separator (Field Separator, FS) 'BEGIN { FS = "," } ; { print $2 }'
Define separator (Field Separator, FS) awk -v FS=: '{print $1}'
Define separator (Field Separator, FS) awk '{print $1}' FS=:
Define separator (Field Separator, FS) awk -F: '{print $1}'
Matching patterns (ternary operator) '{print ($0 ~ /pattern/ ? text_for_true : text_for_false)}'
• $0 represents the whole line.
• $0 ~ /pattern/ means "does $0 match pattern?"
• The false branch (after the colon) is mandatory.
For loop 'for(i=1;i<=NF;i++){ loop statement }'
Split a field into an array for later use 'split($n,a,"delimiter")'
• $n means the nth field.
• The pieces of the nth field are stored in the array a.
• The value of "delimiter" determines where the field is split.
Matching patterns (if-else statement) '{if ($0 ~ /pattern/) {then_actions} else {else_actions}}'
• The else statement is optional.
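Several of these recipes can be combined in one action. The following is only a sketch on invented input: it labels each line with a ternary, splits the first field on `;`, and loops over the pieces.

```shell
report=$(printf 'x=1;y=2 ERROR\nz=3 ok\n' | awk '{
    label = ($0 ~ /ERROR/) ? "bad" : "good"    # ternary: both branches are required
    n = split($1, pairs, ";")                  # split the first field on ";"
    for (i = 1; i <= n; i++) print label, pairs[i]
}')
printf '%s\n' "$report"
```

Using the count returned by `split` as the loop bound avoids relying on `length(array)`, which not every AWK implementation supports.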

REPLACE

search and replace awk '{sub(/{OLD_TERM}/, "{NEW_TERM}"); print}' {file}
• The replacement is a string, so it must be quoted.
search and replace in-place awk -i inplace ... (GNU awk only)
awk '{ORS=/ION/ ? "\n" : " "; print $0}'
# set the output record separator (ORS) to a space if the current line does not contain "ION", thus eliminating unwanted line breaks
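To see the effect of this one-liner, here is a sketch run on made-up input in which only every second line contains "ION". The parentheses around the ternary are added for readability and do not change the behavior.

```shell
joined=$(printf 'first part\nof SECTION\nsecond part\nof DIVISION\n' |
    awk '{ ORS = (/ION/ ? "\n" : " "); print $0 }')
printf '%s\n' "$joined"
```

The four input lines should collapse into two: `first part of SECTION` and `second part of DIVISION`. ORS is reassigned before each `print`, so it decides what follows that particular record.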
\newglossaryentry{gnss}{
type=\acronymtype,
name={GNSS},
description={Global navigation satellite system},
first={global navigation satellite system (GNSS)},
sort={GNSS},
long={global navigation satellite system},
short={GNSS}
}
#!/bin/awk -f
# Requires GNU awk: the script uses the three-argument match() and the \w regex operator.

# Capitalize the first letter of `word`, looking past a leading "(".
function capitalize(word) {
    if (word ~ /^(and|of|the)$/) { # if `word` is one of these words, don't capitalize it
        return word
    } else if (substr(word, 1, 1) == "(") {
        return "(" toupper(substr(word, 2, 1)) substr(word, 3)
    } else {
        return toupper(substr(word, 1, 1)) substr(word, 2)
    }
}

# Join the elements of `arr` with `sep` (default: a space).
# `result` and `k` are declared as extra parameters to keep them local.
function join(arr, sep,    result, k) {
    if (sep == "") sep = " "
    result = arr[1]
    for (k = 2; k <= length(arr); k++)
        result = result sep arr[k]
    return result
}

# Capitalize `word` only if its first letter is one of the acronym's `initials`.
function fix(word, initials,    first_letter) {
    # get the first letter, looking past a leading "("
    first_letter = substr(word, 1, 1) == "(" ? substr(word, 2, 1) : substr(word, 1, 1)
    # index() returns the position of the second argument within the first, or 0 if absent
    if (index(initials, toupper(first_letter)) > 0) {
        return capitalize(word)
    } else {
        return word
    }
}

# comment lines are passed through unchanged
/^%/ {
    print $0
}

/^\\newglossaryentry/ {
    print $0
    # consume the entry body line by line; stop at end of input or at the closing brace
    while ((getline) > 0) {
        if ($0 == "}") {
            # end of \newglossaryentry
            print $0
            break
        } else if ($0 ~ /^[ \t]*name=/) {
            # the acronym itself provides the initials to capitalize
            match($0, /name=\{([^}]*)/, arr)
            initials = arr[1]
            print $0
        # entries that must be corrected
        } else if ($0 ~ /description=|first=|plural=|firstplural=/) {
            match($0, /\w+=\{([^}]*)/, arr)
            content = arr[1]
            split(content, arr, " ")
            for (i = 1; i <= length(arr); i++) {
                # if it is a compound noun (e.g., `signal-to-noise`), split it up again
                if (arr[i] ~ /-/) {
                    split(arr[i], arr1, "-")
                    for (j = 1; j <= length(arr1); j++) {
                        arr1[j] = fix(arr1[j], initials)
                    }
                    arr[i] = join(arr1, "-")
                } else {
                    arr[i] = fix(arr[i], initials)
                }
            }
            final = join(arr, " ")
            # descriptions end with a period
            if ($0 ~ /description/ && substr(final, length(final)) != ".") {
                final = final "."
            }
            match($0, /\w+=/, arr)
            print " " arr[0] "{" final "},"
        # entries that must not be corrected
        } else {
            print $0
        }
    }
}