@miraculixx
Last active August 8, 2016 05:15
A simple, extensible Python file parser with high processing speed and low memory overhead (lines are streamed one at a time through a generator, so the whole file is never held in memory)

Written in response to this question on Stack Overflow.

Installation:

$ chmod +x csvparse.py

Usage:

$ cat sample.csv | ./csvparse.py

Add more filters:

Modify filter() accordingly. E.g. to filter on the first column, change the existing condition:

if fields[0] == 'some other value':
    return True

To add more conditions, extend the filters in any way you like. Here is one example derived from a decision table ($<n> refers to field n, zero-indexed):

Conditions:   R1     R2     R3      R_else  
        $0    foo    abc    <any>   <else>
        $1    !xyz   !xyz   xyz      
Actions:
    include   X      X
    exclude                 X        X
def filter(fields, line):
    # R_else
    should_include = False
    # R1
    if fields[0] == "foo" and fields[1] != "xyz":
        should_include = True
    # R2
    if fields[0] == "abc" and fields[1] != "xyz":
        should_include = True
    # R3
    if fields[1] == "xyz":
        should_include = False
    return should_include

Note that you could also write the same logic as a single conditional expression; however, this quickly becomes unmaintainable as more rules are added.

return (fields[0] in ['foo', 'abc']) and fields[1] != "xyz"
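
One way to check that the rule-by-rule form and the one-line form agree is to run both over every combination of a few representative values. The following is only a sketch: the function names filter_table and filter_oneliner and the test values are made up for illustration and are not part of csvparse.py.

```python
from itertools import product


def filter_table(fields, line=None):
    # decision-table form: one block per rule, later rules override earlier ones
    should_include = False                          # R_else
    if fields[0] == "foo" and fields[1] != "xyz":   # R1
        should_include = True
    if fields[0] == "abc" and fields[1] != "xyz":   # R2
        should_include = True
    if fields[1] == "xyz":                          # R3
        should_include = False
    return should_include


def filter_oneliner(fields, line=None):
    # the same rules collapsed into a single expression
    return (fields[0] in ["foo", "abc"]) and fields[1] != "xyz"


# hypothetical values covering every column of the decision table
first_values = ["foo", "abc", "bla"]
second_values = ["xyz", "123"]
cases = [list(pair) for pair in product(first_values, second_values)]

mismatches = [c for c in cases if filter_table(c) != filter_oneliner(c)]
included = [c for c in cases if filter_table(c)]

print("mismatches:", mismatches)
print("included:", included)
```

If a rule is later added to only one of the two forms, the mismatches list shows which inputs are affected.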

csvparse.py:

#!/usr/bin/env python
import sys


def filter(fields, line):
    """
    Put your conditions here.
    Return True to include the line in the output.
    fields holds all fields of line, split on the separator.
    """
    if fields[0] == 'foo':
        return True
    return False


def parsed(infile, sep=','):
    """ helper function to call filter() on each input line """
    for line in infile:
        # strip the line ending so the last field compares cleanly
        fields = line.rstrip('\r\n').split(sep)
        if filter(fields, line):
            yield line


# only output lines that pass the filter
for output in parsed(sys.stdin):
    sys.stdout.write(output)

sample.csv:

foo,abc
bla,cde
foo,fgh
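
To see what the default filter would emit for these sample rows, the same logic can be run against an in-memory file. This is a sketch under two assumptions: that the rows in sample.csv are comma-separated, and that io.StringIO is an acceptable stand-in for stdin. The name keep is hypothetical and mirrors the default condition of filter().

```python
import io


def keep(fields, line):
    # same condition as the default filter() in csvparse.py
    return fields[0] == "foo"


def parsed(infile, sep=","):
    # yield only the lines whose fields pass keep()
    for line in infile:
        fields = line.rstrip("\r\n").split(sep)
        if keep(fields, line):
            yield line


# assumed contents of sample.csv, comma-separated
sample = io.StringIO("foo,abc\nbla,cde\nfoo,fgh\n")
result = list(parsed(sample))
print("".join(result), end="")
```

Only the two rows starting with foo pass the filter; the bla row is dropped.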