@NicMcPhee
Last active September 4, 2020 17:22
Using pipes to eliminate temporary files

This illustrates using pipes to eliminate temporary files. We start with a bash script that takes the name of a file of demographic data (see MOCK_DATA.csv) as a command-line argument. The script then outputs a count of how many people come from each state, sorted from the largest count to the smallest. The output on the included data file is:

  49 MN
  21 IA
  20 WI
   7 ND
   3 SD

The first version generates a lot of temporary text files (one for each step); the second does the same thing but uses pipes (|) to turn the output of each command into the input of the next command. This avoids creating a ton of extra temporary files, each of which we have to name, and naming is hard. We should also delete each of the temporary files when we're done so we don't clutter up the world. Thus not having them is Very Nice.
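To make that bookkeeping concrete, here is a minimal sketch (not one of the scripts in this gist) of what the temporary-file approach might look like once the cleanup is included. The function name `count_states_with_temp_files` and the variable `tmp_dir` are hypothetical, and the sketch assumes `mktemp -d` is available to create a private scratch directory.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: the temporary-file approach with the cleanup made explicit.
# The function name and the tmp_dir variable are illustrative assumptions.
count_states_with_temp_files() {
  data_file="$1"

  # Put every intermediate file in one fresh private directory, so there is
  # a single thing to delete at the end and no clash with existing files.
  tmp_dir="$(mktemp -d)" || return 1

  # The same five steps as the temporary-file script, one file per step.
  tail -n +2 "$data_file" > "$tmp_dir/no_header.txt"
  awk -F ',' '{ print $5 }' "$tmp_dir/no_header.txt" > "$tmp_dir/just_states.txt"
  sort "$tmp_dir/just_states.txt" > "$tmp_dir/sorted_states.txt"
  uniq -c "$tmp_dir/sorted_states.txt" > "$tmp_dir/state_counts.txt"
  sort -nr "$tmp_dir/state_counts.txt"

  # Delete the intermediate files now that the result has been printed.
  rm -rf "$tmp_dir"
}
```

Calling `count_states_with_temp_files MOCK_DATA.csv` should print the same counts as the scripts below. Everything involving `tmp_dir` is overhead that the piped version simply does not need.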

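Note that each line of the pipeline (including the comments) ends with a backslash (\), which escapes the newline. (The backslash needs to be the very last character on the line.) This causes bash to "ignore" the newline and see the pipeline as one (very long) line. Strictly speaking, bash already keeps reading onto the next line when a line ends with a pipe (|), and a backslash at the end of a comment is just part of the comment, so the backslashes here are optional; they simply make the continuation explicit. We could actually write the whole thing as one long line, but breaking it up like this is a lot more readable.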
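As a small illustration of how a pipeline can be spread over several lines, the following sketch runs the counting steps on a few made-up state codes supplied by `printf`. The function names `with_backslashes` and `with_trailing_pipes` and the input values are hypothetical, chosen only for this example. The first function ends each line with a backslash; the second relies on the fact that the shell treats a line ending in `|` as incomplete and keeps reading. Both should produce the same counts.

```shell
#!/usr/bin/env bash
# Hypothetical demo: two ways to spread one pipeline over several lines.

# Explicit continuation: the backslash is the very last character on each line.
with_backslashes() {
  printf '%s\n' MN WI MN IA MN WI | \
    sort | \
    uniq -c | \
    sort -nr
}

# Implicit continuation: a line ending in `|` is an unfinished command,
# so the shell keeps reading. Comments may sit between the stages.
with_trailing_pipes() {
  printf '%s\n' MN WI MN IA MN WI |
    # Sort so that equal values end up next to each other
    sort |
    # Count each run of equal values
    uniq -c |
    # Largest count first
    sort -nr
}
```

Running either function should list MN first (three occurrences), then WI (two), then IA (one).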

The mock data was generated using Mockaroo.

See this StackOverflow answer from Jonathan Leffler for more on the advantages of pipes vs temporary files.

#!/usr/bin/env bash
# Takes some demographic data (see MOCK_DATA.csv below)
# specified as a file name as a command line argument.
# Outputs a count of how many people come from different
# states. The output on the data file below is:
#
# 49 MN
# 21 IA
# 20 WI
# 7 ND
# 3 SD
data_file="$1"

# Get rid of the header line
tail -n +2 "$data_file" > no_header.txt

# Extract just the state column
# The `-F ','` tells `awk` to use `,` as the field separator.
# That's necessary here because the fields are separated by
# commas and not spaces, which is `awk`'s default field
# separator.
awk -F ',' '{ print $5 }' no_header.txt > just_states.txt

# Sort the states so I can use `uniq` to count
sort just_states.txt > sorted_states.txt

# Now count with uniq
uniq -c sorted_states.txt > state_counts.txt

# Sort by occurrences. `-n` tells `sort` to sort numerically
# (instead of alphabetically), and `-r` tells it to reverse
# the order so the biggest values end up at the top.
# This sends the output to standard output.
sort -nr state_counts.txt
#!/usr/bin/env bash
# Takes some demographic data (see MOCK_DATA.csv below)
# specified as a file name as a command line argument.
# Outputs a count of how many people come from different
# states. The output on the data file below is:
#
# 49 MN
# 21 IA
# 20 WI
# 7 ND
# 3 SD
# This is the same as before, but we use pipes (`|`) to turn the
# output of one command into the input of the next command. This
# avoids creating a ton of extra temporary files, each of which we
# have to name, and naming is hard. We should also delete each of the
# temporary files when we're done so we don't clutter up the
# world. Thus not having them is nice.
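# We escape the newline at the end of each line (all the
# backslashes `\`) so the shell sees this as one (very long) line.
# Strictly speaking the shell already keeps reading after a trailing
# `|`, and a backslash at the end of a comment is just part of the
# comment, so the backslashes are optional here.
# We could actually write it as one long line, but this is a _lot_
# more readable.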
data_file="$1"
# Get rid of the header line
tail -n +2 "$data_file" | \
# Extract just the state column \
# The `-F ','` tells `awk` to use `,` as the field separator. \
# That's necessary here because the fields are separated by \
# commas and not spaces, which is `awk`'s default field \
# separator. \
awk -F ',' '{ print $5 }' | \
# Sort the states so I can use `uniq` to count \
sort | \
# Now count with uniq \
uniq -c | \
# Sort by occurrences. `-n` tells `sort` to sort numerically \
# (instead of alphabetically), and `-r` tells it to reverse \
# the order so the biggest values end up at the top. \
# This sends the output to standard output. \
sort -nr
id,first_name,last_name,email,State,ZIP
1,Nilson,Kurt,nkurt0@photobucket.com,WI,53790
2,Gregory,Lethby,glethby1@google.co.jp,MN,55805
3,Wendy,Domanek,wdomanek2@biblegateway.com,MN,55458
4,Emmet,Peracco,eperacco3@cloudflare.com,IA,52410
5,Elizabet,O'Heaney,eoheaney4@ted.com,WI,53205
6,Randolf,Ullyott,rullyott5@netvibes.com,WI,54305
7,Costanza,Orred,corred6@lulu.com,MN,55446
8,Garry,Ousby,gousby7@seesaa.net,IA,50310
9,Gery,Kirrens,gkirrens8@amazon.co.uk,MN,55551
10,Marquita,Gingle,mgingle9@army.mil,MN,55428
11,Erinn,Zanotti,ezanottia@blogtalkradio.com,MN,55172
12,Garret,Kimbley,gkimbleyb@google.co.jp,IA,52410
13,Jacki,Aizkovitch,jaizkovitchc@cocolog-nifty.com,WI,53705
14,Rozanna,Lohden,rlohdend@sun.com,MN,55146
15,Lib,Hellier,lhelliere@epa.gov,WI,53405
16,Hiram,Trimme,htrimmef@flickr.com,MN,55487
17,Lilllie,Handsheart,lhandsheartg@goo.ne.jp,MN,55805
18,Tobiah,Holsey,tholseyh@cam.ac.uk,MN,55585
19,Padraig,Acey,paceyi@statcounter.com,MN,55114
20,Dulcy,Ellaway,dellawayj@engadget.com,MN,55407
21,Karena,Costigan,kcostigank@ucoz.ru,MN,55428
22,Patti,MacAllen,pmacallenl@sphinn.com,MN,56372
23,Sibilla,Benion,sbenionm@ed.gov,IA,50315
24,Allister,Player,aplayern@shareasale.com,WI,53726
25,Cross,Shanks,cshankso@blogger.com,WI,53234
26,Cassy,Orris,corrisp@hostgator.com,ND,58207
27,Concordia,Paolini,cpaoliniq@ucla.edu,MN,55458
28,Madlin,Sansome,msansomer@domainmarket.com,IA,50981
29,Jarrett,Raddon,jraddons@studiopress.com,MN,55428
30,Georgette,Thorpe,gthorpet@independent.co.uk,MN,55407
31,Darlene,Dowbekin,ddowbekinu@unblog.fr,WI,53225
32,Reggie,Roches,rrochesv@storify.com,ND,58207
33,Leigh,Duthy,lduthyw@globo.com,MN,55428
34,Kirsteni,Querrard,kquerrardx@unicef.org,MN,55166
35,Kalli,Whooley,kwhooleyy@businesswire.com,WI,54305
36,Dag,Cheshire,dcheshirez@slate.com,SD,57188
37,Ferdinand,Sier,fsier10@soundcloud.com,MN,55172
38,Vanya,Bim,vbim11@slideshare.net,MN,55551
39,Blondy,Hitchens,bhitchens12@yellowbook.com,MN,55590
40,Alis,Websdale,awebsdale13@squidoo.com,IA,50315
41,Leeland,Windridge,lwindridge14@icq.com,ND,58106
42,Dionysus,Maude,dmaude15@loc.gov,IA,52809
43,Hyacintha,Helm,hhelm16@upenn.edu,IA,51110
44,Brenda,Rounsefull,brounsefull17@patch.com,MN,55487
45,Greer,Pohlke,gpohlke18@mapy.cz,MN,55572
46,Craig,Bottrill,cbottrill19@mac.com,MN,55585
47,Zacharie,Dellow,zdellow1a@dyndns.org,IA,50320
48,Maryanne,Broker,mbroker1b@ebay.co.uk,ND,58207
49,Elvis,O'Monahan,eomonahan1c@nsw.gov.au,MN,55108
50,Felecia,Butterworth,fbutterworth1d@merriam-webster.com,WI,53790
51,Brook,Hulk,bhulk1e@yale.edu,MN,55441
52,Merry,O'Caherny,mocaherny1f@reverbnation.com,SD,57198
53,Laryssa,Sleit,lsleit1g@ucsd.edu,MN,55103
54,Therine,Croydon,tcroydon1h@forbes.com,MN,55407
55,Ainslie,Bowne,abowne1i@ask.com,ND,58207
56,Tammie,Leavry,tleavry1j@wp.com,MN,55146
57,Filmore,Cuerda,fcuerda1k@desdev.cn,MN,55585
58,Ronnie,Truitt,rtruitt1l@shop-pro.jp,WI,53215
59,Cyndie,Patty,cpatty1m@domainmarket.com,IA,51105
60,Lorelei,Handslip,lhandslip1n@acquirethisname.com,SD,57193
61,Ingra,Belasco,ibelasco1o@de.vu,IA,52410
62,Page,Stockell,pstockell1p@amazonaws.com,WI,53263
63,Dareen,Strevens,dstrevens1q@harvard.edu,IA,51105
64,Hurleigh,Kynforth,hkynforth1r@eepurl.com,MN,55441
65,Moira,Moggach,mmoggach1s@domainmarket.com,MN,55108
66,Elroy,Bowerman,ebowerman1t@studiopress.com,WI,54313
67,Gayler,Vsanelli,gvsanelli1u@soundcloud.com,ND,58505
68,Valli,Hellyar,vhellyar1v@google.co.uk,WI,53215
69,Etan,Claris,eclaris1w@addtoany.com,MN,55436
70,Sam,Laker,slaker1x@cornell.edu,IA,50981
71,Meier,Oliveira,moliveira1y@furl.net,MN,55480
72,Rosalie,Fahy,rfahy1z@yellowpages.com,MN,55407
73,Binni,Veasey,bveasey20@parallels.com,MN,55407
74,Chaddy,Aronin,caronin21@dedecms.com,IA,50315
75,Nelson,Burd,nburd22@weebly.com,WI,53785
76,Hilda,Licence,hlicence23@ask.com,WI,53220
77,Ennis,Dearnaley,edearnaley24@bloglines.com,IA,50369
78,Claribel,Leads,cleads25@aol.com,MN,55127
79,Kiley,De Mico,kdemico26@yellowbook.com,MN,55564
80,Emogene,Long,elong27@wufoo.com,IA,50330
81,Bellanca,Ritch,britch28@techcrunch.com,IA,51105
82,Lana,Studdeard,lstuddeard29@usatoday.com,ND,58505
83,Cynthie,Dowdell,cdowdell2a@state.gov,IA,50706
84,Willy,Noore,wnoore2b@ovh.net,IA,52245
85,Luci,Barrasse,lbarrasse2c@wired.com,MN,55446
86,Viv,Sowood,vsowood2d@ucoz.ru,WI,54313
87,Horacio,Pilmoor,hpilmoor2e@spiegel.de,WI,53210
88,Nikita,Madine,nmadine2f@eepurl.com,WI,53790
89,Kally,Klees,kklees2g@1und1.de,MN,55417
90,Taddeusz,Christou,tchristou2h@samsung.com,MN,55123
91,Margarita,Lafont,mlafont2i@livejournal.com,MN,56398
92,Sebastiano,Sarginson,ssarginson2j@webnode.com,MN,55598
93,Salome,Howitt,showitt2k@dot.gov,MN,55565
94,Hilda,Gethins,hgethins2l@skype.com,MN,55441
95,Wayland,Ilett,wilett2m@taobao.com,IA,50369
96,Brook,Nashe,bnashe2n@wordpress.com,IA,50369
97,Corrie,Bleackley,cbleackley2o@usnews.com,WI,53705
98,Cheslie,O' Markey,comarkey2p@hhs.gov,MN,55811
99,Valeria,Heeron,vheeron2q@google.ca,MN,55565
100,Rosalind,Gordon-Giles,rgordongiles2r@mapy.cz,MN,55402