@zacoppotamus
Created November 17, 2016 09:26
Filtering CSV records by column values listed in external files.
3002 - Manufacture computers & process equipment
7210 - Hardware consultancy
7220 - Software consultancy and supply
7221 - Software publishing
7222 - Other software consultancy and supply
7230 - Data processing
7240 - Data base activities
7250 - Maintenance office & computing mach
7260 - Other computer related activities
9211 - Motion picture and video production
9212 - Motion picture & video distribution
9213 - Motion picture projection
9220 - Radio and television activities
18201 - Reproduction of sound recording
18202 - Reproduction of video recording
18203 - Reproduction of computer media
26110 - Manufacture of electronic components
26120 - Manufacture of loaded electronic boards
26200 - Manufacture of computers and peripheral equipment
26301 - Manufacture of telegraph and telephone apparatus and equipment
26309 - Manufacture of communication equipment other than telegraph, and telephone apparatus and equipment
26400 - Manufacture of consumer electronics
47410 - Retail sale of computers, peripheral units and software in specialised stores
58290 - Other software publishing
59111 - Motion picture production activities
59112 - Video production activities
59113 - Television programme production activities
59120 - Motion picture, video and television programme post-production activities
59131 - Motion picture distribution activities
59132 - Video distribution activities
59133 - Television programme distribution activities
59140 - Motion picture projection activities
59200 - Sound recording and music publishing activities
60100 - Radio broadcasting
61100 - Wired telecommunications activities
61200 - Wireless telecommunications activities
61300 - Satellite telecommunications activities
61900 - Other telecommunications activities
62011 - Ready-made interactive leisure and entertainment software development
62012 - Business and domestic software development
62020 - Information technology consultancy activities
62030 - Computer facilities management activities
62090 - Other information technology service activities
63110 - Data processing, hosting and related activities
63120 - Web portals
63910 - News agency activities
63990 - Other information service activities not elsewhere classified
81100 - Combined facilities support activities
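#!/usr/bin/gawk -f
# Print the records whose city is London, whose SIC category appears in the
# first file, and whose postcode matches a pattern from the second file.
# SIC category: $27, postcode: $10, city: $7
# Possible field separator for quoted CSV input: '","'
BEGIN {
    OFS = ",";
    # FS = ",";
    # if (length(ARGV) != 4) {
    #     printf "Usage: %s categories.txt postcodes.txt records.csv\n", ARGV[0];
    #     exit(1);
    # }
}
# First file: one category per line, stored as array keys for exact lookup.
FILENAME == ARGV[1] { categories[$0]++; next }
# Second file: one postcode regular expression per line (e.g. ^EC1).
FILENAME == ARGV[2] { postcodes[$0]++; next }
# Remaining input: the records to filter.
{
    if ((toupper($7) == "LONDON") && ($27 in categories)) {
        for (i in postcodes) {
            if ($10 ~ i) {
                print;
                break;  # print each record once, even if several patterns match
            }
        }
    }
}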
^E1
^E2
^EC1
^EC2
^N1
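
A minimal sketch of how the three files above might fit together on a tiny, made-up data set. The file names `categories.txt`, `postcodes.txt`, and `records.csv` come from the script's commented-out usage line. Everything else is an assumption for illustration: the `make_record` helper is hypothetical, the sample rows are invented, and the `","` field separator is a guess based on the script's comment. The awk program is inlined so the example runs without the script file, and it uses only POSIX awk features.

```shell
#!/bin/sh
# Work in a scratch directory so nothing existing is overwritten.
tmp=$(mktemp -d)
cd "$tmp" || exit 1

# Hypothetical helper: emit one quoted CSV row with 28 fields, placing the
# town in field 7, the postcode in field 10, and the SIC text in field 27.
# A 28th filler field keeps field 27 clear of the row's closing quote.
make_record() {
  awk -v town="$1" -v pc="$2" -v sic="$3" 'BEGIN {
    for (i = 1; i <= 28; i++) {
      v = "x"
      if (i == 7)  v = town
      if (i == 10) v = pc
      if (i == 27) v = sic
      printf "%s\"%s\"", (i > 1 ? "," : ""), v
    }
    print ""
  }'
}

# Shortened copies of the two lookup files.
printf '%s\n' \
  '62012 - Business and domestic software development' \
  '63120 - Web portals' > categories.txt
printf '%s\n' '^E1' '^N1' > postcodes.txt

# Invented sample rows: only the first and last satisfy all three conditions.
{
  make_record 'London'     'E1 6AN'   '62012 - Business and domestic software development'
  make_record 'LONDON'     'SW1A 1AA' '62012 - Business and domestic software development'
  make_record 'Manchester' 'M1 1AE'   '62012 - Business and domestic software development'
  make_record 'London'     'EC2A 4BX' '56101 - Licensed restaurants'
  make_record 'london'     'N1 9GU'   '63120 - Web portals'
} > records.csv

# Same filtering logic as the script, with the separator passed via -F.
matches=$(awk -F'","' '
  FILENAME == ARGV[1] { categories[$0]++; next }
  FILENAME == ARGV[2] { postcodes[$0]++; next }
  toupper($7) == "LONDON" && ($27 in categories) {
    for (p in postcodes) if ($10 ~ p) { print; break }
  }
' categories.txt postcodes.txt records.csv)

printf '%s\n' "$matches"
```

Because each line of the postcode file is used as a regular expression, `^E1` is a prefix match and would also accept districts such as E14. A pattern with a trailing space, such as `^E1 `, is one way to narrow it if the postcodes in the data contain a space.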