@bsweger
Created July 13, 2016 04:13
Apply a padding function to .csv columns (Pandas)
# example of applying a parameterized padding function to columns after reading a .csv in pandas
import pandas as pd
# a function that will be used to pad dataframe column values to a specified length
# (some incoming values are multiple spaces; those should convert to None)
padFunction = lambda field, padTo: str(field).strip().zfill(padTo) if len(str(field).strip()) else None
# read file w/o using converters and display list of unique alloc_id values
pa = pd.read_csv(
    'https://raw.githubusercontent.com/fedspendingtransparency/data-act-broker-backend/master/dataactvalidator/config/program_activity.csv'
)
print(pa.alloc_id.unique())
# apply padding function to pad columns after file is read in
pa.account = pa.account.apply(padFunction, padTo=4)
pa.pa_code = pa.pa_code.apply(padFunction, padTo=4)
pa.alloc_id = pa.alloc_id.apply(padFunction, padTo=3)
pa.agency_id = pa.agency_id.apply(padFunction, padTo=3)
print(pa.alloc_id.unique())
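The post-read approach above relies on `Series.apply` forwarding extra keyword arguments (here `padTo`) to the function. Below is a minimal offline sketch of that behavior, using small in-memory Series rather than the real file; `pad_field` and the sample values are hypothetical stand-ins for `padFunction` and the CSV data. It also hints at a caveat worth checking against your own data: if pandas has already inferred a float column (for example because some fields were empty), `str()` sees values such as `7.0` and `nan`, so padding after the read may not produce the intended codes.

```python
import pandas as pd

# hypothetical named equivalent of padFunction (not part of the original gist)
def pad_field(field, pad_to):
    text = str(field).strip()
    return text.zfill(pad_to) if text else None

# string values: extra keyword arguments given to apply() are passed on to pad_field
raw = pd.Series(["5", "   ", "20"])
padded = raw.apply(pad_field, pad_to=3)
print(padded.tolist())

# float values, as in a column that pandas inferred as numeric with gaps;
# str() is applied to the floats, so the text being padded is "7.0" and "nan"
inferred = pd.Series([7.0, float("nan")])
padded_floats = inferred.apply(pad_field, pad_to=3)
print(padded_floats.tolist())
```

If the second case matters for a given file, applying the padding at read time, before type inference, is one way to avoid it.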
# example of using a parameterized function as a converter when reading .csv in pandas
import pandas as pd
# a function that will be used to pad dataframe column values to a specified length
# (some incoming values are multiple spaces; those should convert to None)
padFunction = lambda field, padTo: str(field).strip().zfill(padTo) if len(str(field).strip()) else None
converters = {
    'agency_id': lambda x: padFunction(x, 3),
    'alloc_id': lambda x: padFunction(x, 3),
    'pa_code': lambda x: padFunction(x, 4),
    'account': lambda x: padFunction(x, 4),
}
# read file w/o using converters and display list of unique alloc_id values
pa = pd.read_csv(
    'https://raw.githubusercontent.com/fedspendingtransparency/data-act-broker-backend/master/dataactvalidator/config/program_activity.csv'
)
print(pa.alloc_id.unique())
# now read file using converters and display list of unique alloc_id values (should be zero-padded to length of 3)
pa = pd.read_csv(
    'https://raw.githubusercontent.com/fedspendingtransparency/data-act-broker-backend/master/dataactvalidator/config/program_activity.csv',
    converters=converters
)
print(pa.alloc_id.unique())
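To try the converter approach without a network connection, here is a minimal sketch that reads a small in-memory CSV. The sample data, the `pad_field` helper, and the use of `functools.partial` in place of the wrapping lambdas are assumptions for illustration, not part of the original gist. Converters receive each raw field as a string before type inference, which is why whitespace-only fields can be mapped to a missing value here.

```python
import io
from functools import partial

import pandas as pd

# hypothetical named equivalent of padFunction (not part of the original gist)
def pad_field(field, pad_to):
    text = str(field).strip()
    return text.zfill(pad_to) if text else None

# made-up sample rows; the second field of the first row is whitespace only
sample_csv = (
    "agency_id,alloc_id,account\n"
    "5,  ,12\n"
    "20,7,305\n"
)

# partial() fixes pad_to, leaving a one-argument callable as read_csv expects
converters = {
    "agency_id": partial(pad_field, pad_to=3),
    "alloc_id": partial(pad_field, pad_to=3),
    "account": partial(pad_field, pad_to=4),
}

df = pd.read_csv(io.StringIO(sample_csv), converters=converters)
print(df)
```

`partial` is only a stylistic alternative to `lambda x: padFunction(x, 3)`; either form gives `read_csv` a callable that takes the single field value.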