Last active
June 14, 2018 21:47
-
-
Save kks32/1abb7ade6ec6895a1c6e1990ff2fecba to your computer and use it in GitHub Desktop.
AWK command to compare two columns of different files and print required columns from both files.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
awk -F',' 'NR==FNR{label[$1]=$1;date[$1]=$2;next}; ($2==label[$2]){print $0 "," date[$2]}' <(sort -k1 file2.csv) <(sort -k2 file1.csv) &> file3.csv |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
#Question | |
> I need to match strings between the two files and print to a third file. Data look like this: | |
#File 1 | |
dbID labnumber myID Status | |
CMV_1235 LAB06 56-1 Fail | |
CMV_1236 LAB14 57-1 Fail | |
CMV_2137 LAB84 54-4 Pass | |
CMV_2238 LAB85 50-3 | |
CMV_C131 LAB21 51-2 Pass | |
#File 2 | |
labnumber date | |
LAB06 18/01/2016 | |
LAB14 27/04/2016 | |
LAB18 10/01/2016 | |
LAB21 9/02/2016 | |
LAB69 4/03/2016 | |
LAB84 18/02/2016 | |
LAB22 18/03/2016 | |
LAB85 27/03/2016 | |
(Not totally overlapping: there may be samples in file 1 but not file 2 and vice versa) | |
I want to print to file 3: | |
dbID labnumber myID Status date | |
CMV_1235 LAB06 56-1 Fail 18/01/2016 | |
CMV_1236 LAB14 57-1 Fail 27/04/2016 | |
CMV_2137 LAB84 54-4 Pass 18/02/2016 | |
CMV_2238 LAB85 50-3 27/03/2016 | |
So, If labnumber matches in file 1 and file 2, print all of that line in file 2 then print relevant date from that line in file 1, into a third file |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
dbID | labnumber | myID | Status | |
---|---|---|---|---|
CMV_1235 | LAB06 | 56-1 | Fail | |
CMV_1236 | LAB14 | 57-1 | Fail | |
CMV_2137 | LAB84 | 54-4 | Pass | |
CMV_2238 | LAB85 | 50-3 | ||
CMV_C131 | LAB21 | 51-2 | Pass |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
labnumber | date | |
---|---|---|
LAB06 | 18/01/2016 | |
LAB14 | 27/04/2016 | |
LAB18 | 10/01/2016 | |
LAB21 | 9/02/2016 | |
LAB69 | 4/03/2016 | |
LAB84 | 18/02/2016 | |
LAB22 | 18/03/2016 | |
LAB85 | 27/03/2016 |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
awk -F',' 'NR==FNR{label[$1]=$1;date[$1]=$2;next}; ($2==label[$2]){print $0 "," date[$2]}' <(sort -k1 file2.csv) <(sort -k2 file1.csv) &> file3.csv
Explanation
-F','
: Sets the field separator to,
NR==FNR
:NR
is the current input line number andFNR
the current file's line number. The two will be equal only while the 1st file is being read.label[$1]=$1
: Save column 1 (label) from the first file (argument)file2.csv
in hash-array using column 1 as the keydate[$1]=$2
: Save column 2 (date) from the first filefile2.csv
in hash-array using column 1 as the keynext
: Then, skip to the next line so that this is only applied on the 1st file.($2==label[$2])
: the else block will only be executed if this is the second file (file1.csv
), so we check whether field 2 of this file is in array label with the field$2
as the key($2==label[$2])
.{print $0 "," date[$2]}
: If that's true print the entirefile1.csv
and append the date column fromfile2.csv
.sort -k1 file2.csv
: Sortsfile2.csv
column1
by settingk1
.k2
sorts column2