@Stephen-ONeil
Last active May 1, 2019 22:39
Quick and dirty bash script for running a multi-find-and-replace on a csv (using another csv containing mapping definitions)
#!/bin/bash
# Usage: pass the mapping csv as the first argument and the target csv as the second.
# The mapping csv holds the replacement values in its first column and the patterns to be replaced in its 2nd to nth columns.
# Each cell of the target document is matched against every pattern (as an anchored regex), and a cell that matches is
# replaced in the output (mfr_output.csv) by the replacement value of the first mapping row it matches.
# Known issues:
# Seems to be bad at matching certain accented chars, possibly an encoding issue. Deburring (stripping accents) before matching may help.
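mapping_file="$1"
target_file="$2"
# Line count of the mapping file; gawk uses it to tell mapping records apart from target records.
# wc -l counts newlines, so the mapping file needs a trailing newline or its last row is treated as target data.
mapping_file_length=$( wc -l < "$mapping_file" )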
# You will likely need to change the field separator (-F) option to suit your data; splitting on a bare comma does not handle quoted fields that contain commas
gawk -v map_length="${mapping_file_length}" -F "," '
BEGIN {
    target_file_string = ""
}
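# Mapping records: build one anchored alternation regex per row from columns 2..n
NR <= map_length {
    pattern = "^"$2"$"
    for (i = 3; i <= NF; i++){
        if ($i != ""){
            pattern = pattern"|^"$i"$"
        }
    }
    pattern_map[NR] = pattern
    replace_map[NR] = $1
}
# Target records: replace each cell by the value of the first mapping row it matches
NR > map_length {
    mapped_record = ""
    for (i = 1; i <= NF; i++){
        mapped_cell = $(i)
        # strip any record separator left in the cell; sub() takes (regexp, replacement, target)
        sub(RS, "", mapped_cell)
        for (j = 1; j <= map_length; j++){
            if ( mapped_cell ~ pattern_map[j] ){
                mapped_cell = replace_map[j]
                # first match wins, stop checking further mapping rows
                break
            }
        }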
        # join cells with the field separator and end the record with the record separator
        if (i < NF) {
            mapped_record = mapped_record mapped_cell FS
        } else {
            mapped_record = mapped_record mapped_cell RS
        }
    }
    target_file_string = target_file_string mapped_record
}
END {
    # every record already ends with RS, so avoid print, which would append an extra blank line
    printf "%s", target_file_string
}
' "$mapping_file" "$target_file" > mfr_output.csv
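
To see the mapping behaviour on concrete data, here is a minimal, self-contained sketch of the same idea. It is not the script above: it uses plain `awk` with the `NR == FNR` idiom rather than a precomputed line count, and the temporary directory, file names, and sample rows are all made up for illustration.

```shell
#!/bin/bash
# Illustrative sketch only: the sample data and file names below are hypothetical.
workdir=$(mktemp -d)

# Mapping csv: column 1 is the replacement value, columns 2..n are the patterns.
cat > "$workdir/mapping.csv" <<'EOF'
USA,US,United States
UK,GB,Great Britain
EOF

# Target csv whose cells should be normalised.
cat > "$workdir/target.csv" <<'EOF'
id,country
1,United States
2,GB
3,France
EOF

awk -F "," '
# First file (mapping): NR == FNR only holds while reading the first file.
# Build one anchored alternation per row, e.g. ^(US|United States)$
NR == FNR {
    pattern = "^(" $2
    for (i = 3; i <= NF; i++) if ($i != "") pattern = pattern "|" $i
    patterns[FNR] = pattern ")$"
    replacements[FNR] = $1
    rows = FNR
    next
}
# Second file (target): replace each cell by the first mapping row it matches.
{
    line = ""
    for (i = 1; i <= NF; i++) {
        cell = $i
        for (j = 1; j <= rows; j++) {
            if (cell ~ patterns[j]) { cell = replacements[j]; break }
        }
        line = line cell (i < NF ? FS : "")
    }
    print line
}
' "$workdir/mapping.csv" "$workdir/target.csv" > "$workdir/mfr_output.csv"

cat "$workdir/mfr_output.csv"
# id,country
# 1,USA
# 2,UK
# 3,France
```

Printing each record as it is read, instead of accumulating the whole file in one string, keeps memory use flat on large targets. Cells that match no pattern (the header row and `France` here) pass through unchanged.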