+----------------------------------------------------------------------------------------------------------------------------------------+
| Task | One-Liner |
+----------------------------------------------------------------------------------------------------------------------------------------+
| Count the number of lines in a file | wc -l filename |
| ----------------------------------------------------------------------------------------------------------------------------------------
| Show the field names, one in each line, preceded by line numbers | head -1 filename | tr '\t' '\n' | nl    (tab-delimited file) |
| (helpful when you have numerous fields in a new file and want to | head -1 filename | tr ',' '\n' | nl     (comma-delimited file) |
| get the lay of the land) | head -1 filename | tr ' ' '\n' | nl     (space-delimited file) |
| ----------------------------------------------------------------------------------------------------------------------------------------
| Page through the file with line numbers showing | less -N filename    (or: nl filename | less) |
| ----------------------------------------------------------------------------------------------------------------------------------------
| Show the first line | head -1 filename
| Show the first few lines | head filename
| Show the last few lines | tail filename
| ----------------------------------------------------------------------------------------------------------------------------------------
| Show line #4212   | sed '4212q;d' filename
| (very useful when you are trying to load the file into a | (this will quit after printing the 4212th line;
| database and the load fails at line #4212)   | very considerate if your file has a million lines!)
| ----------------------------------------------------------------------------------------------------------------------------------------
| Show lines with "foo" in any field | grep 'foo' filename
| Show lines with "foo" in any field, ignoring foo's case | grep -i 'foo' filename
| ----------------------------------------------------------------------------------------------------------------------------------------
| Show lines with "foo" in field #18 | awk -F'\t' '$18 == "foo"' filename  (tab-delimited file)
| | awk -F, '$18 == "foo"' filename  (comma-delimited file)
| | awk '$18 == "foo"' filename  (space-delimited file)
| ---------------------------------------------------------------------------------------------------------------------------------------
| Select every 10th row and save into a new file | awk 'NR%10 == 0' filename > newfile
| (great for sampling and train-test splitting; see the combined example below the table)
| ---------------------------------------------------------------------------------------------------------------------------------------
| Show rows that have fewer fields than the header row  | awk 'NR==1 {x=NF}; NF < x' filename
| (to check if any rows are incomplete)
| ----------------------------------------------------------------------------------------------------------------------------------------
| Remove lines with "foo" in any field and save into a new file | sed '/foo/d' filename > newfile
| Remove lines with "foo" in field #18 and save into a new file | awk -F'\t' '$18 != "foo"' filename > newfile  (tab-delimited file)
| | awk -F, '$18 != "foo"' filename > newfile  (comma-delimited file)
| | awk '$18 != "foo"' filename > newfile   (space-delimited file)
| ----------------------------------------------------------------------------------------------------------------------------------------
| Remove the first line and save into a new file | sed '1d' filename > newfile
| (great for stripping a header row before further processing)
| Remove the first 8 lines and save the rest into a new file | sed '1,8d' filename > newfile
| Remove line #42 and save the rest into a new file | sed '42d' filename > newfile
| Remove lines 233 to 718 and save the rest into a new file | sed '233,718d' filename > newfile
| Remove the last line and save the rest into a new file | sed '$d' filename > newfile
| Remove the last 8 lines and save the rest into a new file | sed -e :a -e '$d;N;2,8ba' -e 'P;D' filename > newfile
| | (Ugh! Let me know if you know of a better way; with GNU head, head -n -8 filename > newfile also works)
| ----------------------------------------------------------------------------------------------------------------------------------------
| Remove blank lines from the file and save into a new file | sed '/^$/d' filename > newfile
| ----------------------------------------------------------------------------------------------------------------------------------------
| Remove duplicate lines and save into a new file | awk '!seen[$0]++' filename > newfile
| | (if you want the original order preserved)
| | sort -u filename > newfile 
| | (if you don’t need the original order preserved)
| ---------------------------------------------------------------------------------------------------------------------------------------
| Remove lines with a missing value in field #18 and | awk -F'\t' '$18 != ""' filename > newfile  (tab-delimited file)
| save into a new file | awk -F, '$18 != ""' filename > newfile  (comma-delimited file)
| | awk '$18 != ""' filename > newfile  (space-delimited file)
| ---------------------------------------------------------------------------------------------------------------------------------------
| Show just col #42 | cut -f42 filename  (tab-delimited file)
| | cut -d, -f42 filename (comma-delimited file)
| | cut -d' ' -f42 filename (space-delimited file)
| ----------------------------------------------------------------------------------------------------------------------------------------
| Show the unique values in column #42 with counts | cut -f42 filename | sort | uniq -c  (tab-delimited file)
| (a histogram, sorta. Useful for listing the distinct | cut -d, -f42 filename | sort | uniq -c (comma-delimited file)
| values in a categorical field) | cut -d' ' -f42 filename | sort | uniq -c (space-delimited file)
| -----------------------------------------------------------------------------------------------------------------------------------------
| Remove the 1st field and save into a new file | cut -f2- filename > newfile
| Remove field #42 and save the rest into a new file | cut -f1-41,43- filename > newfile
| Remove fields #19-42 and save the rest into a new file | cut -f1-18,43- filename > newfile
| (the above 3 examples assume that the file is tab-delimited. |
| For comma- and space-delimited files, modify as shown in |
| earlier examples involving cut) |
| ----------------------------------------------------------------------------------------------------------------------------------------
| Stack files row-wise | cat file1 file2 > newfile
| (useful if you have two or more files with the same columns |
| and you need to stack them vertically) |
| ----------------------------------------------------------------------------------------------------------------------------------------
| Stack files column-wise  | paste file1 file2  > newfile
| (useful if you have two or more files with the same rows |
| and you need to combine them side-by-side) |
| ----------------------------------------------------------------------------------------------------------------------------------------
| Randomly shuffle the rows of a file and save to a new file | awk 'BEGIN{srand();}{print rand()"\t"$0}' filename | sort -k1 -n | cut -f2- > newfile |
+----------------------------------------------------------------------------------------------------------------------------------------+
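A quick end-to-end sketch showing how several of the one-liners above chain together on a single dataset. The file name data.tsv, the field number #3, and the output file names are hypothetical and purely for illustration; for comma- or space-delimited data, swap the delimiter flags as shown in the table.

    head -1 data.tsv | tr '\t' '\n' | nl                    # list the field names with line numbers
    awk 'NR==1 {x=NF}; NF < x' data.tsv                     # flag rows with fewer fields than the header
    sed '1d' data.tsv > body.tsv                            # strip the header row
    awk -F'\t' '$3 != ""' body.tsv > complete.tsv           # drop rows with a missing value in field #3
    awk 'BEGIN{srand();}{print rand()"\t"$0}' complete.tsv | sort -k1 -n | cut -f2- > shuffled.tsv    # shuffle the rows
    awk 'NR%10 == 0' shuffled.tsv > test.tsv                # every 10th row becomes the test set
    awk 'NR%10 != 0' shuffled.tsv > train.tsv               # the remaining rows become the training set

(On systems with GNU coreutils, shuf complete.tsv > shuffled.tsv does the same shuffle in one step.)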