@Tabea-K
Last active July 9, 2020 08:17
Counts the values in a chosen column that are unique to each of two csv files and the values that appear in both. The first argument is the column number to compare; the second and third arguments are the two files. For example, you can compare the IDs given in two csv files. Mainly a wrapper around the comm command.
#!/usr/bin/env bash
# Counts the values in a chosen column that are unique to each of two
# csv files and the values that appear in both. The first argument is the
# column number to compare; the second and third arguments are the files.
# For example, you can compare the IDs given in two csv files.
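# Note: cut splits on TAB by default; for comma-separated input add -d ','.
# Arguments are quoted so that file names containing spaces still work.
cut -f "$1" "$2" | sort > .file1
cut -f "$1" "$3" | sort > .file2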
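# With no options, comm produces three-column output.
# Column one contains lines unique to FILE1, column
# two contains lines unique to FILE2, and column three
# contains lines common to both files.
# The options -1, -2 and -3 suppress the corresponding column, so
# -23 leaves only column one, -13 only column two and -12 only column
# three. Counting the lines of that output gives the size of each group
# (piping the full output through cut also counts empty fields).
UNIQUEINFILE1=$(comm -23 .file1 .file2 | wc -l)
UNIQUEINFILE2=$(comm -13 .file1 .file2 | wc -l)
INBOTHFILES=$(comm -12 .file1 .file2 | wc -l)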
echo "There are $UNIQUEINFILE1 lines that are only found in $2"
echo "There are $UNIQUEINFILE2 lines that are only found in $3"
echo "There are $INBOTHFILES lines found in both files"
# Remove the temporary files created above.
rm -f .file1 .file2
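
The counting step can be checked on a small input. The following is a minimal sketch, not part of the original script: the sample data, the file names a.tsv and b.tsv, and the variable names are made up for illustration, and it assumes tab-separated input with the IDs in column 1. It uses comm with -23, -13 and -12 to select one group at a time, and keeps its temporary files in a mktemp directory instead of the working directory.

```shell
#!/usr/bin/env bash
# Sketch with hypothetical sample data: two tab-separated files, IDs in column 1.
set -euo pipefail

# Keep scratch files out of the working directory and clean up on exit.
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT

printf 'id1\tfoo\nid2\tbar\nid3\tbaz\n' > "$tmpdir/a.tsv"
printf 'id2\tx\nid3\ty\nid4\tz\nid5\tw\n' > "$tmpdir/b.tsv"

# comm requires sorted input.
cut -f 1 "$tmpdir/a.tsv" | sort > "$tmpdir/col_a"
cut -f 1 "$tmpdir/b.tsv" | sort > "$tmpdir/col_b"

# -23 keeps lines only in the first file, -13 lines only in the second,
# -12 lines present in both.
only_a=$(comm -23 "$tmpdir/col_a" "$tmpdir/col_b" | wc -l)
only_b=$(comm -13 "$tmpdir/col_a" "$tmpdir/col_b" | wc -l)
both=$(comm -12 "$tmpdir/col_a" "$tmpdir/col_b" | wc -l)

# $(( )) strips the padding some wc implementations add around the number.
echo "only in a: $((only_a)), only in b: $((only_b)), in both: $((both))"
# prints: only in a: 1, only in b: 2, in both: 2
```

Here id1 is only in the first file, id4 and id5 are only in the second, and id2 and id3 are shared, which matches the three counts.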