@brunobord
Created November 2, 2011 13:56
Chained string transformations
### My question is: is this solution elegant enough?
### I mean: if I'm adding several other functions to "clean" my "cell", will it still be "Pythonic"?
### e.g.: for f in (func1, func2, func3, func..): stuff = f(stuff)
def strip(cell):
    return cell.strip()

def removedblspaces(cell):
    return u' '.join(cell.split())

def clean(cell):
    if isinstance(cell, unicode):
        for f in (removedblspaces, strip):
            cell = f(cell)
        return cell
    return cell
@revolunet

what about regexps ?
example usage ?

@brunobord
Author

my question was not about the "inner functions"; they're string transformations here, but they could be something else.

my point was to find out whether Python has a magic "Okay, I'm taking a list of functions, applying each one of them to your item in turn, and spitting out the result".

regexps are overkill in my opinion, and the inner functions can be more complicated than "replace this stuff with the other stuff".

And I still trust my motto: avoid regexp if you can. :op
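
For reference, the closest thing to that "magic" in the standard library is probably `functools.reduce`. The sketch below is written for Python 3 (the thread itself uses Python 2), and `apply_chain` is a hypothetical helper name, not something from the gist:

```python
from functools import reduce


def strip(cell):
    return cell.strip()


def removedblspaces(cell):
    return ' '.join(cell.split())


def apply_chain(value, funcs):
    """Hypothetical helper: feed value through each function in funcs, left to right."""
    return reduce(lambda acc, f: f(acc), funcs, value)


result = apply_chain('  hello,   world  ', (removedblspaces, strip))
print(result)  # hello, world
```

This does the same thing as the explicit `for` loop; whether it reads better is a matter of taste.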

@revolunet

is the desired usage something like this:

txt = 'hello, world'
clean_functions = [strip, removedblspaces, capitalize]
print clean(txt, clean_functions)

? if so, here's a very similar example:

# -*- encoding: UTF-8 -*-

def a(s):
    return s.rjust(20, '-').ljust(30, '-')

def b(s):
    return s.replace('-', '_')

def c(s):
    return r'_//%s\\_' % s

def clean(s, chain=(a, b, c)):
    if isinstance(s, (unicode, str)):
        for f in chain:
            s = f(s)
    return s

print clean(u'hello, world')

@brunobord
Author

yes, right. both solutions (yours and mine) are quite similar: iteration over a list of functions. that reassures me that my approach is reasonable, thx.

@davidbgk

davidbgk commented Nov 3, 2011

What about:

def clean(cell):
    if isinstance(cell, unicode):
        return ' '.join(part.strip() for part in cell.split())
    return cell

To me this one-liner is more readable than a loop over functions, where you have to work out which function does what, in which order and so on.

@revolunet: never put a mutable as a default argument, please ;-)

@revolunet

@davidbgk: yes thanks, I changed the list to a tuple ;)

the one-liner is readable but doesn't provide the chained function calls as requested, or did I miss something?

@brunobord
Author

@revolunet is right. @davidbgk has missed the point.

@revolunet

@davidbgk pointed out something important (and weird) about mutable default arguments in functions.

check this example:

def test1(a=[1,2,3]):
    a.append(4)
    return a

print test1()
print test1(a=[5,6,7])
print test1()
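
To spell out what happens: the default list is created once, when the function is defined, so every call that relies on the default mutates the same object. A small Python 3 sketch of the effect and of the usual `None` sentinel workaround (`append_shared` and `append_fresh` are illustrative names, not from the thread):

```python
def append_shared(a=[1, 2, 3]):
    # The default list is built once, at definition time, and reused on every call.
    a.append(4)
    return a


def append_fresh(a=None):
    # Common idiom: use None as a sentinel and build a new list per call.
    if a is None:
        a = [1, 2, 3]
    a.append(4)
    return a


first = list(append_shared())   # copy, so the next call does not alter what we inspect
second = list(append_shared())
print(first)            # [1, 2, 3, 4]
print(second)           # [1, 2, 3, 4, 4]
print(append_fresh())   # [1, 2, 3, 4]
print(append_fresh())   # [1, 2, 3, 4]
```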

@davidbgk

davidbgk commented Nov 3, 2011

@brunobord if you want some kind of pipeline to generate a workflow, I advise you to use generators. I even wonder if you can use decorators in your case:

@remove_blank_spaces
@strip
def clean_cell(data):
    return data
...

But it really depends on your data, as always!
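
One way the decorator idea could be made to work, as a rough Python 3 sketch: plain functions such as `strip` cannot be used directly as decorators here, because a decorator receives the function rather than the data. The sketch therefore assumes a hypothetical `pipeline_step` factory that wraps a transformation so it is applied to the decorated function's return value.

```python
from functools import wraps


def pipeline_step(transform):
    """Hypothetical decorator factory: apply transform to the decorated function's result."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            return transform(func(*args, **kwargs))
        return wrapper
    return decorator


def strip(cell):
    return cell.strip()


def remove_blank_spaces(cell):
    return ' '.join(cell.split())


# Decorators apply bottom-up: strip runs first, then remove_blank_spaces.
@pipeline_step(remove_blank_spaces)
@pipeline_step(strip)
def clean_cell(data):
    return data


cleaned = clean_cell('  hello,   world  ')
print(cleaned)  # hello, world
```

Compared with looping over a tuple of functions, this fixes the chain at definition time, so it suits cases where the set of steps never changes at runtime.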
