Cheatsheet of the AWK programming language
Last active: May 27, 2024 18:16
AWK cheatsheet
awk 'BEGIN { actions_before_processing }; condition { actions_for_matching_lines }; END { actions_after_processing }' input_file
- BEGIN { actions_before_processing } (optional): contains actions executed before any line of the input file is processed.
- condition: defines the condition under which the current record is processed. Examples of conditions:
  - A pattern. For instance, /pattern/ { actions_for_matching_lines }.
  - A boolean expression. For instance, NR==1 {for(i=1;i<=NF;i++) print i}.
- END { actions_after_processing } (optional): contains actions executed after all lines of the input file have been processed.
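The three-part structure above can be sketched with a small pattern counter. The sample input and the variable names `count` and `out` are made up for illustration.

```shell
# Hypothetical sample input: three lines, two of which contain "error".
out=$(printf 'error: disk full\nok\nerror: timeout\n' |
  awk 'BEGIN { count = 0 }          # runs once, before any input is read
       /error/ { count++ }          # runs for every record that matches
       END { print count " matching lines" }')
printf '%s\n' "$out"
```

The BEGIN block is not strictly needed here, since an unset AWK variable already behaves as 0 in a numeric context; it is kept only to show where initialization goes.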
CAVEAT: function parameters are local to the function. Every other variable is global, so a variable assigned inside a function remains visible outside of it.
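A minimal sketch of this scoping rule, using a hypothetical function `bump` and made-up variable names:

```shell
out=$(awk 'function bump(n) {
             n = n + 1        # n is a parameter: this change stays local
             total = n        # total is not a parameter: it is global
           }
           BEGIN {
             x = 1
             bump(x)
             print x, total
           }')
printf '%s\n' "$out"
```

Here `x` keeps its value because scalars are passed by value, while `total` is visible after the call. A common idiom for getting local variables is to declare them as extra, unused parameters.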
Built-in variables:
- FS: Field Separator. Specifies the input field separator, i.e., the character or pattern that separates fields in a line. The default value is a single space (" "), which makes AWK split on runs of spaces and tabs.
- OFS: Output Field Separator. Specifies the character or string that separates fields in the output. The default value is also a single space (" ").
- RS: Record Separator. Determines how input records are separated. By default, RS is a newline ("\n"), so each line is treated as a separate record.
- ORS: Output Record Separator. Defines the string written after each output record. The default value is a newline ("\n").
- NR: Number of the current record. It is often used as a condition in front of an action so that the action runs only on a specific line, e.g., NR==1.
- NF: Number of fields in the current record. Hence, print $NF prints the last field of the line.
- $n: the nth field of the current record.
- $0: the whole record.
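As a rough illustration of these variables working together, the sketch below reads colon-separated records and prints them with a different output separator. The sample data is invented.

```shell
# Hypothetical input: two records with three colon-separated fields each.
out=$(printf 'alice:30:paris\nbob:25:rome\n' |
  awk 'BEGIN { FS = ":"; OFS = " | " }   # input and output separators
       { print NR, NF, $1, $NF }')       # record number, field count, first and last field
printf '%s\n' "$out"
```

Note that OFS is only inserted between arguments separated by commas in print; writing print $1 $NF would concatenate the two fields with nothing in between.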
Keywords and special blocks:
- BEGIN { commands }: the special BEGIN block, which runs once, before the first input record is read.
- END { commands }: the special END block, which runs once, after the last input record has been processed.
- next: skips the rest of the processing for the current record and moves on to the next record.
- getline: reads the next input record from the input stream into $0, updating NF and NR.
- break: exits the innermost loop.
- function: defines a user function, e.g., function name(args) { body }.
Operators:
- ==: equal to
- !=: not equal to
- ~: matches a regular expression
- !~: does not match a regular expression
- &&: logical AND
- ||: logical OR
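Built-in functions:
- match: returns the position of the first occurrence of the specified regular expression within the string, or 0 if there is no match. It also sets RSTART and RLENGTH. E.g., match(string, /regex/).
- index: returns the position of the first occurrence of a substring within a string. E.g., index(string, substring). If the substring is not found, the function returns 0.
- split: splits a string into an array of strings based on a separator, and returns the number of pieces. E.g., split(str, arr, " ").
- sub: replaces, in place, the first occurrence of a pattern with a replacement string. E.g., sub(/apple/, "orange", fruit). If the third argument is omitted, it operates on $0.
- gsub: same as sub, but replaces all occurrences of the pattern.
- substr: returns part of a string. E.g., substr(string, start, length), where start is 1-based and length is optional.
- toupper: returns the string converted to upper case.
- tolower: returns the string converted to lower case.
- length: returns the number of characters of a string. Without an argument, it returns the length of $0.
- print: prints the arguments passed to it. If no arguments are passed, it prints the whole record. print is usually used without parentheses, e.g., print $2.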
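The string functions above can be tried in a single BEGIN block, with no input file needed. This is only a sketch: the sample string and the variable names are made up, and only POSIX features are used.

```shell
out=$(awk 'BEGIN {
  s = "apple pie, apple tart"
  print index(s, "pie")                  # position of the substring
  pos = match(s, /t[a-z]+/)              # also sets RSTART and RLENGTH
  print pos, RLENGTH
  n = split(s, parts, ", ")              # n is the number of pieces
  print n, parts[2]
  t = s; sub(/apple/, "orange", t);  print t   # first occurrence only
  u = s; gsub(/apple/, "orange", u); print u   # every occurrence
  print toupper(substr(s, 1, 5)), length(s)
}')
printf '%s\n' "$out"
```

Because sub and gsub modify their third argument in place, the sketch copies `s` into `t` and `u` first so the original string stays intact.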
| Task | Command | Notes |
| --- | --- | --- |
| Define separator (Field Separator, FS) | awk 'BEGIN { FS = "," } ; { print $2 }' | |
| Define separator (Field Separator, FS) | awk -v FS=: '{print $1}' | |
| Define separator (Field Separator, FS) | awk '{print $1}' FS=: | FS is assigned after BEGIN runs, before the input is read. |
| Define separator (Field Separator, FS) | awk -F: '{print $1}' | |
| Matching patterns (ternary operator) | awk '{print (($0 ~ /pattern/) ? text_for_true : text_for_false)}' | $0 represents the whole line. $0 ~ /pattern/ means "does $0 match pattern?". The else branch (after the colon) is mandatory. |
| For loop | awk '{ for (i=1; i<=NF; i++) { loop_statement } }' | |
| Save an element for future use | awk '{ split($n, a, "delimiter") }' | $n is the nth field. Its content is split on "delimiter" and the pieces are stored in the array a. |
| Matching patterns (if-else statement) | awk '{if ($0 ~ /pattern/) {then_actions} else {else_actions}}' | The else branch is optional. |
| Search and replace | awk '{sub(/OLD_TERM/, "NEW_TERM"); print}' file | sub replaces only the first occurrence on each line; use gsub to replace all of them. |
| Search and replace in place | awk -i inplace ... | GNU awk 4.1 or later only. |
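To see the ternary operator and the for loop from the table in action, here is a small sketch. The input lines and the variable `label` are invented for illustration.

```shell
# Hypothetical input: one line with three fields, one line with a single field.
out=$(printf 'a b c\nfoo\n' |
  awk '{
    label = (NF > 1) ? "multi" : "single"        # ternary: both branches are required
    for (i = NF; i >= 1; i--) printf "%s ", $i   # fields in reverse order
    print "(" label ")"
  }')
printf '%s\n' "$out"
```

The loop uses printf rather than print so that the fields stay on one line; the final print then ends the record with ORS.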
awk '{ORS = /ION/ ? "\n" : " "; print $0}'
# Set the output record separator (ORS) to a space when the current line does not contain "ION", thus removing unwanted line breaks
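A quick way to check this one-liner is to feed it a few lines by hand. The sample text below is made up; the condition is written as ($0 ~ /ION/), which is equivalent to the bare /ION/ form.

```shell
# Hypothetical input: every second line contains "ION" and should end the output line.
out=$(printf 'the first\nSECTION\nthe second\nSECTION\n' |
  awk '{ ORS = ($0 ~ /ION/) ? "\n" : " "; print $0 }')
printf '%s\n' "$out"
```

Lines without "ION" are followed by a space instead of a newline, so each one is joined to the line that comes after it.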
\newglossaryentry{gnss}{
    type=\acronymtype,
    name={GNSS},
    description={Global navigation satellite system},
    first={global navigation satellite system (GNSS)},
    sort={GNSS},
    long={global navigation satellite system},
    short={GNSS}
}
#!/usr/bin/gawk -f
# Requires GNU awk: uses the three-argument match(), \w and length(array).

BEGIN {
}

# Capitalize the first letter of `word`, skipping a leading parenthesis.
function capitalize(word) {
    if (word ~ /^(and|of|the)$/) {  # if `word` is one of these words, don't capitalize it
        return word
    } else if (substr(word, 1, 1) == "(") {
        return "(" toupper(substr(word, 2, 1)) substr(word, 3)
    } else {
        return toupper(substr(word, 1, 1)) substr(word, 2)
    }
}

# Join the elements of `arr` with `sep`. `result` and `k` are local variables.
function join(arr, sep,    result, k) {
    if (sep == "") sep = " "
    result = arr[1]
    for (k = 2; k <= length(arr); k++)
        result = result sep arr[k]
    return result
}

# Capitalize `word` only if its first letter is one of `initials`.
function fix(word, initials,    first_letter) {
    # get the first letter
    first_letter = (substr(word, 1, 1) == "(") ? substr(word, 2, 1) : substr(word, 1, 1)
    # index returns the position of the second argument within the first one, or 0 if it is absent
    if (index(initials, toupper(first_letter)) > 0) {
        return capitalize(word)
    } else {
        return word
    }
}

# comment lines are copied unchanged
/^%/ {
    print $0
}

/^\\newglossaryentry/ {
    print $0
    # `> 0` stops the loop on end of input and on read errors (getline returns -1)
    while ((getline) > 0) {
        if ($0 == "}") {  # end of \newglossaryentry
            print $0
            break
        } else if ($0 ~ /name=/) {
            match($0, /name=\{([^}]*)/, arr)
            initials = arr[1]
            print $0
        # entries that must be corrected
        } else if ($0 ~ /description=|first=|plural=|firstplural=/) {
            match($0, /\w+=\{([^}]*)/, arr)
            content = arr[1]
            n = split(content, words, " ")
            for (i = 1; i <= n; i++) {
                # if it is a compound word (e.g., `signal-to-noise`), split it up again
                if (words[i] ~ /-/) {
                    split(words[i], parts, "-")
                    for (j = 1; j <= length(parts); j++) {
                        parts[j] = fix(parts[j], initials)
                    }
                    words[i] = join(parts, "-")
                } else {
                    words[i] = fix(words[i], initials)
                }
            }
            final = join(words, " ")
            if ($0 ~ /description/ && substr(final, length(final)) != ".") {
                final = final "."
            }
            match($0, /\w+=/, arr)
            print "    " arr[0] "{" final "},"
        # entries that must not be corrected
        } else {
            print $0
        }
    }
}