@miraculixx miraculixx/
Last active Aug 8, 2016

A simple, extendible Python file parser with high processing speed and low memory overhead

Written in response to this question on Stack Overflow.


$ chmod +x


$ cat sample.csv | 

Add more filters:

Modify filter() accordingly. E.g. to filter on the first column, change the existing condition:

if fields[0] == 'some other value':
    return True

To add more conditions, extend the filters in any way you like. Here is one example derived from a decision table ($<n> refers to field n, zero-indexed):

Conditions:   R1     R2     R3      R_else
        $0    foo    abc    <any>   <else>
        $1    !xyz   !xyz   xyz
    include   X      X
    exclude                 X       X

def filter(fields, line):
    # R_else
    should_include = False
    # R1
    if fields[0] == "foo" and fields[1] != "xyz":
        should_include = True
    # R2
    if fields[0] == "abc" and fields[1] != "xyz":
        should_include = True
    # R3
    if fields[1] == "xyz":
        should_include = False
    return should_include
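
If the table keeps growing, one option is to hold the rules as data and evaluate them in order. The following is only a sketch of that idea: `RULES` and `filter_by_table` are hypothetical names that do not appear in the script, and the "first match wins" ordering is an assumption chosen so that R3 is checked before R1 and R2.

```python
# Hypothetical table-driven variant; RULES and filter_by_table are
# illustrative names, not part of the original script.
RULES = [
    # (predicate on fields, include?) evaluated top to bottom, first match wins
    (lambda f: f[1] == "xyz", False),  # R3: exclude whenever $1 is xyz
    (lambda f: f[0] == "foo", True),   # R1: $1 != xyz is implied by R3 above
    (lambda f: f[0] == "abc", True),   # R2: $1 != xyz is implied by R3 above
]


def filter_by_table(fields, line=None):
    for predicate, include in RULES:
        if predicate(fields):
            return include
    return False  # R_else: nothing matched, exclude the line
```

Adding a rule then means adding one row to `RULES`, at the cost of having to keep the row order meaningful.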

Note that the same logic can be written as a single conditional expression, but this quickly becomes unmaintainable as rules are added:

return (fields[0] in ['foo', 'abc']) and fields[1] != "xyz"
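
When collapsing rules into one expression, it can help to check that both forms still agree. The sketch below compares the rule-by-rule version with the one-liner over a handful of representative values. The names `filter_rules` and `filter_oneliner` and the list `values` are made up for this illustration.

```python
from itertools import product


def filter_rules(fields, line=None):
    should_include = False                         # R_else
    if fields[0] == "foo" and fields[1] != "xyz":  # R1
        should_include = True
    if fields[0] == "abc" and fields[1] != "xyz":  # R2
        should_include = True
    if fields[1] == "xyz":                         # R3
        should_include = False
    return should_include


def filter_oneliner(fields, line=None):
    return (fields[0] in ['foo', 'abc']) and fields[1] != "xyz"


# representative values for $0 and $1 (an assumption, not an exhaustive set)
values = ["foo", "abc", "xyz", "other"]
mismatches = [
    pair for pair in product(values, repeat=2)
    if filter_rules(list(pair)) != filter_oneliner(list(pair))
]
print(mismatches)  # an empty list means both forms agree on these inputs
```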
#!/usr/bin/env python
import sys


def filter(fields, line):
    """
    put your conditions here
    return True to include the line in the output
    fields are all fields in line
    """
    if fields[0] == 'foo':
        return True
    return False


def parsed(infile, sep=','):
    """ helper function to call filter() """
    for line in infile:
        # strip the line ending so the last field compares cleanly
        fields = line.rstrip('\r\n').split(sep)
        if filter(fields, line):
            yield line


# only output lines for which filter() returns True
for output in parsed(sys.stdin):
    sys.stdout.write(output)
foo,abc
bla,cde
foo,fgh
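
To try the filter without a shell pipeline, the sample rows can be fed through an in-memory file. This is a minimal sketch: it repeats `filter()` and `parsed()` so that it runs on its own, and it assumes the sample rows are comma-separated, matching the default `sep=','`.

```python
import io


def filter(fields, line):
    # same condition as the script: keep lines whose first field is 'foo'
    return fields[0] == 'foo'


def parsed(infile, sep=','):
    for line in infile:
        fields = line.rstrip('\r\n').split(sep)
        if filter(fields, line):
            yield line


# assumption: the sample rows are comma-separated
sample = io.StringIO("foo,abc\nbla,cde\nfoo,fgh\n")
kept = list(parsed(sample))
print("".join(kept), end="")
```

Because `parsed()` is a generator, lines are read and tested one at a time, so the same code works on inputs much larger than memory when given a real file or `sys.stdin`.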