
@murraycadzow
Created February 17, 2019 21:53

grep/sed/awk

tags: resbaz resbaz2019

data

1 hour

What we said people would come away being able to do:

  • extract text from files that match patterns
  • find and replace text using patterns
  • rearrange columns in files

brief recap of regular expressions

wildcards

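  • . matches any single character (so .* matches zero or more of anything)
  • ? matches zero or one of the preceding character (in shell filename globs, ? instead matches exactly one character)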

extending matches

  • * zero or more of the preceding character
  • + one or more of the preceding character

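sometimes these need to be escaped with a \ to work, depending on the tool: in the default (basic) regular expressions of grep and sed, + and ? are literal characters unless written as \+ and \? (a GNU extension), whereas grep -E and sed -E (extended regular expressions) accept them unescaped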
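A minimal sketch of the escaping difference, assuming GNU grep (where \+ in a basic regex is a GNU extension). The sample words are made up and piped in with printf instead of read from a file:

```shell
# Made-up sample lines for illustration
text='color
colour
colouur
colou+r'

# Basic regex (grep's default): a bare + is a literal plus sign
printf '%s\n' "$text" | grep 'colou+r'      # colou+r

# GNU grep treats \+ as "one or more" in a basic regex
printf '%s\n' "$text" | grep 'colou\+r'     # colour, colouur

# With -E (extended regex), + means "one or more" without escaping
printf '%s\n' "$text" | grep -E 'colou+r'   # colour, colouur
```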

difference between ' and "

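  • ' (single quotes): bash passes everything inside them through literally, as is
  • " (double quotes): bash expands variables ($var) and command substitutions ($(...)) inside them before running the command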

this makes a difference if you want to use the contents of a bash variable as a pattern
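A short sketch of the quoting difference, using a hypothetical variable name holding a search pattern and made-up input lines:

```shell
name='Kelvin'   # hypothetical variable holding a pattern

echo 'pattern is $name'   # prints: pattern is $name
echo "pattern is $name"   # prints: pattern is Kelvin

# so to search for the variable's contents, put it in double quotes
printf '%s\n' Calvin Kelvin Melvin | grep "$name"   # prints: Kelvin
```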

extract text from files that match patterns

basics of grep

grep 'pattern' file

make the pattern case insensitive

grep -i 'pattern' file

invert the search

grep -v 'pattern' file

count how many lines match

grep -c 'pattern' file
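The options above can be tried on a few made-up lines piped in with printf in place of a real file. Note that -c counts matching lines, not individual matches:

```shell
# Made-up sample lines for illustration
lines='Elizabeth Bennet
Mr Darcy
elizabeth
Jane Bennet'

printf '%s\n' "$lines" | grep 'Elizabeth'      # Elizabeth Bennet
printf '%s\n' "$lines" | grep -i 'elizabeth'   # Elizabeth Bennet, elizabeth
printf '%s\n' "$lines" | grep -v 'Bennet'      # Mr Darcy, elizabeth
printf '%s\n' "$lines" | grep -c 'Bennet'      # 2
```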

view context of results

# show the line number of the results
grep -n 'pattern' file

# show one extra line AFTER results (-A)
grep -A 1 'pattern' file

# show one extra line BEFORE results (-B)
grep -B 1 'pattern' file
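In GNU grep the context options are -A (lines after), -B (lines before) and -C (lines on both sides). A quick sketch, using seq 10 as a stand-in for a ten-line file:

```shell
# seq 10 prints the numbers 1 to 10, one per line
seq 10 | grep -n '5'      # 5:5  (line number, then the line)
seq 10 | grep -A 1 '5'    # 5, 6
seq 10 | grep -B 1 '5'    # 4, 5
seq 10 | grep -C 1 '5'    # 4, 5, 6
```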

find matches that form whole words only

grep -w 'pattern' file
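A small sketch with made-up lines showing the difference -w makes (a "word" here is a run of letters, digits and underscores):

```shell
printf '%s\n' 'the cat sat' 'concatenate' 'cats' | grep 'cat'      # all three lines
printf '%s\n' 'the cat sat' 'concatenate' 'cats' | grep -w 'cat'   # the cat sat
```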

find patterns that are stored in a file

grep -f pattern_file file
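A minimal sketch of -f, assuming a throwaway pattern file created with mktemp (one pattern per line; a line matches if it matches any of them):

```shell
patterns=$(mktemp)                        # temporary pattern file
printf '%s\n' Calvin Kelvin > "$patterns"

printf '%s\n' Calvin Kelvin Melvin | grep -f "$patterns"   # Calvin, Kelvin

rm "$patterns"
```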

Challenges

Pride and Prejudice:

  • P and P: find number of lines that mention ???

Names:

  • find the results for 1999
  • find all entries for your favourite name
  • find all entries for Calvin or Kelvin from the 1980s

find and replace

sed (stream editor)

  • read line
  • execute command
  • display result of line

view specific lines in a file

sed -n '5,10p' file
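Here -n turns off sed's automatic printing of every line, so only the lines selected by p appear. A sketch using seq 20 as a stand-in for a file:

```shell
seq 20 | sed -n '5,10p'         # 5 through 10

# without -n, every line is printed once and lines 5-10 a second time
seq 20 | sed '5,10p' | wc -l    # 26
```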

delete a specific line, e.g. the 10th line

sed -e '10d' file

or delete a range of lines

sed -e '5,10d' file
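The two delete commands can be checked on seq 12 standing in for a twelve-line file:

```shell
seq 12 | sed -e '10d'     # 1-9, 11, 12
seq 12 | sed -e '5,10d'   # 1-4, 11, 12
```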

find and replace basic syntax

sed -e 's/find_pattern/replacement/g' file
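The trailing g controls whether every match on a line is replaced or only the first one. A sketch on a made-up line:

```shell
echo 'cat and cat' | sed -e 's/cat/dog/'    # dog and cat  (first match on each line only)
echo 'cat and cat' | sed -e 's/cat/dog/g'   # dog and dog  (g = every match on the line)
```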

using back references

  • groups are started with \( and ended with \) (with sed -E, plain ( and ) are used instead)
  • \1 refers back to the text matched by the first group, \2 the second, and so on, so the matched pieces can be substituted back in as part of the replacement
sed -e 's/\(group1\)/\1/g' file
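As an illustration, two groups can swap a made-up "first last" name into "last, first". [A-Za-z]* here means "zero or more letters"; the second line is the same substitution in extended (-E) syntax:

```shell
echo 'Elizabeth Bennet' | sed -e 's/\([A-Za-z]*\) \([A-Za-z]*\)/\2, \1/'   # Bennet, Elizabeth
echo 'Elizabeth Bennet' | sed -E 's/([A-Za-z]+) ([A-Za-z]+)/\2, \1/'       # Bennet, Elizabeth
```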

convert from upper case to lower case (gnu sed)

sed -e 's/\(.*\)/\L\1/' input.txt > output.txt

from lower to upper

sed -e 's/\(.*\)/\U\1/' input.txt > output.txt
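A quick check of both conversions on a made-up line, assuming GNU sed (\L and \U are GNU extensions and will not work in BSD/macOS sed):

```shell
echo 'Pride and Prejudice' | sed -e 's/\(.*\)/\L\1/'   # pride and prejudice
echo 'Pride and Prejudice' | sed -e 's/\(.*\)/\U\1/'   # PRIDE AND PREJUDICE
```

Where GNU sed is not available, tr '[:upper:]' '[:lower:]' does the same case conversion portably.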

Challenges

Names

  • change the field separator from tab to comma
  • remove the first line
  • change all years from 1960 to 1969 to '1960s'

rearrange columns

basic syntax for awk

awk '{print}' < file

can refer to specific columns using $

e.g. $1 for the first column, $2 for the second, etc.

$0 refers to the whole line

example to print first 2 columns

awk '{print $1, $2}' < file
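A sketch on made-up input: by default awk splits each line on runs of spaces or tabs, the comma in print puts a single space between the outputs, and listing the columns in a different order rearranges them. -F sets a different input separator:

```shell
printf 'a b c\nd e f\n' | awk '{print $1, $2}'       # a b / d e
printf 'a b c\nd e f\n' | awk '{print $3, $1, $2}'   # c a b / f d e
printf 'a,b,c\n' | awk -F',' '{print $2}'            # b
```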

we can also use conditionals:

example to print an entry from the first column if it is above 10

awk '{if($1 > 10){print $1}}' < file
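The same filter on made-up numbers, followed by the shorter pattern-action form that awk also accepts (the action runs only on lines where the pattern is true):

```shell
printf '%s\n' 5 12 8 30 | awk '{if($1 > 10){print $1}}'   # 12, 30
printf '%s\n' 5 12 8 30 | awk '$1 > 10 {print $1}'        # 12, 30
```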

other built-in variables that awk sets include:

NR: the current record (line) number

NF: the number of fields on the current line

example of how to print out a specific line

awk '{if(NR == 3){print}}' < file

or we can find out how many fields we have per line:

awk '{print NF}' < file
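NR and NF on a few made-up lines. A pattern with no action, such as NR == 3, prints the matching line, and $NF is the last field on each line:

```shell
t='one
two words
three little words'

printf '%s\n' "$t" | awk '{print NR, NF}'   # 1 1 / 2 2 / 3 3
printf '%s\n' "$t" | awk 'NR == 3'          # three little words
printf '%s\n' "$t" | awk '{print $NF}'      # one / words / words
```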

Challenges:

Names:

  • Make the column first
  • Print all the names that occurred more than 100 times in a year