murraycadzow/grep_sed_awk.md

## grep_sed_awk.md

      
    grep/sed/awk

tags: resbaz resbaz2019

data

Pride and Prejudice
NZ popular baby names

1 hour
What we said people would come away being able to do:

extract text from files that match patterns
find and replace text using patterns
rearrange columns in files

brief recap of regular expressions
wildcards

. (any single character)
? (zero or one of the preceding character)

extending matches

* zero or more
+ one or more

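with basic regular expressions, the default for grep and sed, + and ? usually need to be escaped with a \ for them to work (as do grouping parentheses); with grep -E or sed -E they work unescaped. The details depend on your environment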
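As a quick check of the recap above, here is a minimal sketch using a made-up word list (the words are an assumption for illustration). It uses grep -E so that + and ? do not need escaping.

```shell
# Hypothetical sample words, one per line
words=$(printf '%s\n' ct cat caat cot)

# . matches exactly one character between c and t
dot=$(printf '%s\n' "$words" | grep -E '^c.t$')

# ? allows zero or one of the preceding character
optional=$(printf '%s\n' "$words" | grep -E '^ca?t$')

# * allows zero or more of the preceding character
star=$(printf '%s\n' "$words" | grep -E '^ca*t$')

# + requires one or more of the preceding character
plus=$(printf '%s\n' "$words" | grep -E '^ca+t$')

printf 'dot:\n%s\n\noptional:\n%s\n\nstar:\n%s\n\nplus:\n%s\n' \
    "$dot" "$optional" "$star" "$plus"
```

The ^ and $ anchors pin the pattern to the whole line, which makes the difference between the four operators easier to see.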
difference between ' and "

' is a literal quote: bash passes everything inside it as is
" lets bash substitute variables (and commands) inside it before the text is passed on

this makes a difference if you want to use the contents of a bash variable as a pattern
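A small sketch of the quoting difference, assuming a hypothetical variable called name and a made-up line of text:

```shell
name="Darcy"    # hypothetical search term
line="Mr Darcy soon drew the attention of the room"    # made-up sample text

# Single quotes: the text $name is passed through untouched
single=$(echo 'pattern is: $name')

# Double quotes: bash replaces $name with its contents first
double=$(echo "pattern is: $name")

echo "$single"
echo "$double"

# So double quotes are what you want when the pattern lives in a variable
match=$(echo "$line" | grep "$name")
echo "$match"
```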
extract text from files that match patterns

basics of grep
grep 'pattern' file

make the pattern case insensitive
grep -i 'pattern' file

invert the search
grep -v 'pattern' file


count how many lines match
grep -c 'pattern' file
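To see -i, -v and -c side by side, here is a minimal sketch on a throwaway file. The file contents are made up for illustration and are not taken from the workshop data.

```shell
# Build a small hypothetical sample file
sample=$(mktemp)
printf '%s\n' 'Elizabeth Bennet' 'Mr Darcy' 'elizabeth again' 'Mr Bingley' > "$sample"

# -i: match regardless of case
insensitive=$(grep -i 'elizabeth' "$sample")

# -v: keep only the lines that do NOT match
inverted=$(grep -v 'Mr' "$sample")

# -c: print the number of matching lines instead of the lines
count=$(grep -c 'Mr' "$sample")

printf '%s\n---\n%s\n---\n%s\n' "$insensitive" "$inverted" "$count"
rm -f "$sample"
```

Note that -c counts matching lines, not individual matches, so a line containing the pattern twice still counts once.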

view context of results
# show the line number of the results
grep -n 'pattern' file

# show one extra line AFTER results (-A)
grep -A 1 'pattern' file

# show one extra line BEFORE results (-B)
grep -B 1 'pattern' file
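A short sketch of line numbers and context on a five-line throwaway file (the contents are made up). It uses the upper-case -A and -B options, which are the ones that control context in GNU and BSD grep; the lower-case -a and -b do something different.

```shell
# Hypothetical five-line sample file
sample=$(mktemp)
printf '%s\n' one two three four five > "$sample"

# -n: prefix each match with its line number
numbered=$(grep -n 'three' "$sample")

# -A 1: the match plus one line AFTER it
after=$(grep -A 1 'three' "$sample")

# -B 1: the match plus one line BEFORE it
before=$(grep -B 1 'three' "$sample")

printf '%s\n---\n%s\n---\n%s\n' "$numbered" "$after" "$before"
rm -f "$sample"
```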

find matches that are entire words only, not part of a longer word
grep -w 'pattern' file
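A minimal sketch of what -w changes, using a made-up list of names:

```shell
# Hypothetical list of names, one per line
names=$(printf '%s\n' 'Ann' 'Anna' 'Joanne' 'Ann Marie')

# Without -w, Ann also matches inside Anna
loose=$(printf '%s\n' "$names" | grep 'Ann')

# With -w, Ann must stand alone as a whole word
whole=$(printf '%s\n' "$names" | grep -w 'Ann')

printf 'without -w:\n%s\n\nwith -w:\n%s\n' "$loose" "$whole"
```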

find patterns that are stored in a file
grep -f pattern_file file
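The pattern file holds one pattern per line, and a line is printed if it matches any of them. A small sketch, where both files and their year/name/count layout are assumptions made up for illustration:

```shell
# Hypothetical pattern file: one pattern per line
patterns=$(mktemp)
printf '%s\n' 'Emma' 'Olivia' > "$patterns"

# Hypothetical data file: year, name, count
names=$(mktemp)
printf '%s\n' '1999 Emma 120' '1999 Liam 95' '2000 Olivia 88' > "$names"

# Print every line that matches at least one pattern from the file
matches=$(grep -f "$patterns" "$names")

echo "$matches"
rm -f "$patterns" "$names"
```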

Challenges
Pride and Prejudice:

P and P: find number of lines that mention ???

Names:

find the results for 1999
find all entries for your favourite name
find all entries for Calvin or Kelvin from the 1980s

find and replace

sed (stream editor)

read line
execute command
display result of line

view specific lines in a file
sed -n '5,10p' file

delete a specific line, eg the 10th line
sed -e '10d' file

or delete a range of lines
sed -e '5,10d' file
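These line-based commands are easy to try on numbered input. A minimal sketch using seq to generate twelve lines (the input is an assumption for illustration):

```shell
# Twelve lines containing the numbers 1 to 12
numbers=$(seq 1 12)

# -n suppresses the default output, p prints only lines 5 to 10
shown=$(printf '%s\n' "$numbers" | sed -n '5,10p')

# d deletes line 10, everything else is printed as usual
one_gone=$(printf '%s\n' "$numbers" | sed -e '10d')

# d with a range deletes lines 5 to 10
range_gone=$(printf '%s\n' "$numbers" | sed -e '5,10d')

printf '%s\n---\n%s\n---\n%s\n' "$shown" "$one_gone" "$range_gone"
```

None of these change the input itself; sed writes the result to standard output.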

find and replace basic syntax
sed -e 's/find_pattern/replacement/g' file
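The trailing g means "every match on the line"; without it only the first match on each line is replaced. A short sketch on a made-up sentence:

```shell
text='the cat sat on the mat'    # hypothetical input

# Without g: only the first "the" on the line is replaced
first_only=$(echo "$text" | sed -e 's/the/a/')

# With g: every "the" on the line is replaced
every=$(echo "$text" | sed -e 's/the/a/g')

echo "$first_only"
echo "$every"
```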

using back references

groups are started with \( and ended with \) (plain ( and ) when using sed -E)
enables you to reference the bits that match each pattern and substitute them back in as part of the replacement

sed -e 's/\(group1\)/\1/g' file
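Back references become useful once there is more than one group. A minimal sketch that swaps two words, using a made-up input line; the second command shows the same substitution with sed -E, where the parentheses are not escaped:

```shell
input='Bennet Elizabeth'    # hypothetical "surname firstname" line

# \1 is the first group, \2 the second; the replacement swaps them
swapped=$(echo "$input" | sed -e 's/\([A-Za-z]*\) \([A-Za-z]*\)/\2 \1/')

# Same idea with extended regular expressions
swapped_ere=$(echo "$input" | sed -E 's/([A-Za-z]+) ([A-Za-z]+)/\2 \1/')

echo "$swapped"
echo "$swapped_ere"
```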

convert from upper case to lower case (gnu sed)
sed -e 's/\(.*\)/\L\1/' file

from lower to upper
sed -e 's/\(.*\)/\U\1/' input.txt > output.txt

Challenges
Names

change file separator from tab to comma
remove the first line
change all dates from 1960 - 1969 to be '1960s'

rearrange columns

basic syntax for awk
awk '{print}' < file

can refer to specific columns using $
eg $1 for the first column, $2 for second etc
$0 refers to the original line
example to print first 2 columns
awk '{print $1, $2}' < file
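By default awk splits on whitespace and joins printed fields with a single space. For tab-separated data it can help to set the input and output separators explicitly. A minimal sketch, assuming a hypothetical year/name/count layout that may not match the workshop file:

```shell
# Hypothetical tab-separated data: year, name, count
data=$(printf '1999\tEmma\t120\n1999\tLiam\t95\n')

# Default behaviour: fields are printed separated by a single space
first_two=$(printf '%s\n' "$data" | awk '{print $1, $2}')

# FS is the input separator, OFS the output separator; here both are a tab,
# and the first two columns are printed in swapped order
reordered=$(printf '%s\n' "$data" | awk 'BEGIN {FS = OFS = "\t"} {print $2, $1, $3}')

printf '%s\n---\n%s\n' "$first_two" "$reordered"
```

The order of the $ references in the print statement is what rearranges the columns.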

we can also use conditionals:
example to print an entry from the first column if it is above 10
awk '{if($1 > 10){print $1}}' < file
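The same filter can also be written in awk's pattern-action form, where the condition sits in front of the braces. A small sketch on made-up input showing that both forms give the same result:

```shell
# Hypothetical input: a number followed by a label
values=$(printf '%s\n' '5 a' '12 b' '40 c')

# Explicit if inside the action
with_if=$(printf '%s\n' "$values" | awk '{if($1 > 10){print $1}}')

# Equivalent pattern-action form: the action runs only when the condition holds
with_pattern=$(printf '%s\n' "$values" | awk '$1 > 10 {print $1}')

printf '%s\n---\n%s\n' "$with_if" "$with_pattern"
```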

other automatic variables that awk uses include:
NR: the row (record) number of the current line
NF: the number of fields on the current line
example of how to print out a specific line
awk '{if(NR == 3){print}}' < file

or we can find out how many fields we have per line:
awk '{print NF}' < file
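A quick sketch of NR and NF on three made-up lines with different numbers of fields:

```shell
# Hypothetical input with 3, 2 and 4 fields per line
lines=$(printf '%s\n' 'a b c' 'd e' 'f g h i')

# NR == 3 is true only for the third line
third=$(printf '%s\n' "$lines" | awk '{if(NR == 3){print}}')

# NF is recalculated for every line
fields=$(printf '%s\n' "$lines" | awk '{print NF}')

printf '%s\n---\n%s\n' "$third" "$fields"
```

Printing NF for every line is a handy way to spot rows with missing or extra columns before rearranging them.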

Challenges:
Names:

Make the column first
Print all the names that occurred more than 100 times in a year