Skip to content

Instantly share code, notes, and snippets.

@kks32
Last active June 14, 2018 21:47
Show Gist options
  • Star 1 You must be signed in to star a gist
  • Fork 1 You must be signed in to fork a gist
  • Save kks32/1abb7ade6ec6895a1c6e1990ff2fecba to your computer and use it in GitHub Desktop.
Save kks32/1abb7ade6ec6895a1c6e1990ff2fecba to your computer and use it in GitHub Desktop.
AWK command to compare two columns of different files and print required columns from both files.
awk -F',' 'NR==FNR{label[$1]=$1;date[$1]=$2;next}; ($2==label[$2]){print $0 "," date[$2]}' <(sort -k1 file2.csv) <(sort -k2 file1.csv) &> file3.csv
#Question
> I need to match strings between the two files and print to a third file. Data look like this:
#File 1
dbID labnumber myID Status
CMV_1235 LAB06 56-1 Fail
CMV_1236 LAB14 57-1 Fail
CMV_2137 LAB84 54-4 Pass
CMV_2238 LAB85 50-3
CMV_C131 LAB21 51-2 Pass
#File 2
labnumber date
LAB06 18/01/2016
LAB14 27/04/2016
LAB18 10/01/2016
LAB21 9/02/2016
LAB69 4/03/2016
LAB84 18/02/2016
LAB22 18/03/2016
LAB85 27/03/2016
(Not totally overlapping: there may be samples in file 1 but not file 2 and vice versa)
I want to print to file 3:
dbID labnumber myID Status date
CMV_1235 LAB06 56-1 Fail 18/01/2016
CMV_1236 LAB14 57-1 Fail 27/04/2016
CMV_2137 LAB84 54-4 Pass 18/02/2016
CMV_2238 LAB85 50-3 27/03/2016
So, If labnumber matches in file 1 and file 2, print all of that line in file 2 then print relevant date from that line in file 1, into a third file
dbID labnumber myID Status
CMV_1235 LAB06 56-1 Fail
CMV_1236 LAB14 57-1 Fail
CMV_2137 LAB84 54-4 Pass
CMV_2238 LAB85 50-3
CMV_C131 LAB21 51-2 Pass
labnumber date
LAB06 18/01/2016
LAB14 27/04/2016
LAB18 10/01/2016
LAB21 9/02/2016
LAB69 4/03/2016
LAB84 18/02/2016
LAB22 18/03/2016
LAB85 27/03/2016
@kks32
Copy link
Author

kks32 commented Jul 7, 2016

awk -F',' 'NR==FNR{label[$1]=$1;date[$1]=$2;next}; ($2==label[$2]){print $0 "," date[$2]}' <(sort -k1 file2.csv) <(sort -k2 file1.csv) &> file3.csv

Explanation

  • -F',': Sets the field separator to ,
  • NR==FNR : NR is the current input line number and FNR the current file's line number. The two will be equal only while the 1st file is being read.
  • label[$1]=$1: Save column 1 (label) from the first file (argument) file2.csv in hash-array using column 1 as the key
  • date[$1]=$2: Save column 2 (date) from the first file file2.csv in hash-array using column 1 as the key
  • next: Then, skip to the next line so that this is only applied on the 1st file.
  • ($2==label[$2]): the else block will only be executed if this is the second file (file1.csv), so we check whether field 2 of this file is in array label with the field $2 as the key ($2==label[$2]).
  • {print $0 "," date[$2]}: If that's true print the entire file1.csv and append the date column from file2.csv.
  • sort -k1 file2.csv: Sorts file2.csv column 1 by setting k1. k2 sorts column 2

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment