@inodb
Created September 11, 2015 14:18
#!/bin/bash
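# Deduplicates all .txt files under the current directory and its subdirectories.
# For each file that has duplicate lines, writes a copy with a .dedup suffix in which
# the header is kept as-is and the remaining lines are sorted and made unique.
# The header is one line, or two lines if the file contains comments (lines starting with '#').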
dedup() {
    local f=$1 n=1
    sort "$f" | uniq -d | grep -q '^' || return 0  # skip files without duplicate lines
    grep -q '^#' "$f" && n=2                       # two header lines if the file has comments
    { head -n "$n" "$f"; tail -n +"$((n + 1))" "$f" | sort | uniq; } > "$f.dedup"
}
export -f dedup
find . -name '*.txt' | parallel -k dedup {}
# to replace the original files with their .dedup versions, run:
#find . -name '*.dedup' | parallel mv {} {.}
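To check the header rule on a single file without GNU parallel, the per-file logic can be sketched as a standalone function. This is a minimal sketch and not part of the original gist: the function name `dedup_file` and the sample data are hypothetical, and it prints the result to stdout instead of writing a `.dedup` file or skipping files that have no duplicates.

```shell
#!/bin/bash
# Hypothetical helper: deduplicate one file while keeping its header line(s) in place.
# Same rule as the script: 2 header lines if any line starts with '#', otherwise 1.
dedup_file() {
    local f=$1 n=1
    grep -q '^#' "$f" && n=2
    # print the header unchanged, then the sorted, unique remaining lines
    head -n "$n" "$f"
    tail -n +"$((n + 1))" "$f" | sort | uniq
}

# Hypothetical example input: a comment line, a column header, and a duplicated row
tmp=$(mktemp)
printf '#comment\nGENE\tVALUE\nTP53\t1\nBRCA1\t2\nTP53\t1\n' > "$tmp"
dedup_file "$tmp"
rm -f "$tmp"
```

Running it prints `#comment`, the `GENE	VALUE` header, then `BRCA1	2` and `TP53	1`: the two header lines stay on top and only the body is sorted and deduplicated.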