Python string multireplacement
import re


def multireplace(string, replacements):
    """
    Given a string and a replacement map, return the replaced string.

    :param str string: string to execute replacements on
    :param dict replacements: replacement dictionary {value to find: value to replace}
    :rtype: str
    """
    # Place longer keys first to keep shorter substrings from matching where the longer ones should.
    # For instance, given the replacements {'ab': 'AB', 'abc': 'ABC'} against the string 'hey abc',
    # it should produce 'hey ABC' and not 'hey ABc'.
    substrs = sorted(replacements, key=len, reverse=True)
    # Create a big OR regex that matches any of the substrings to replace
    regexp = re.compile('|'.join(map(re.escape, substrs)))
    # For each match, look up the new string in the replacements
    return regexp.sub(lambda match: replacements[match.group(0)], string)
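
A quick usage sketch, assuming Python 3. The function is repeated so the snippet runs on its own, and the sample strings are made up for illustration. It shows the two properties the comments in the code describe: longer keys take precedence, and all replacements happen in a single pass.

```python
import re


def multireplace(string, replacements):
    # Same logic as the gist: longest keys first, one combined regex, single pass.
    substrs = sorted(replacements, key=len, reverse=True)
    regexp = re.compile('|'.join(map(re.escape, substrs)))
    return regexp.sub(lambda match: replacements[match.group(0)], string)


# Longer keys win over their own prefixes.
longest_first = multireplace('hey abc', {'ab': 'AB', 'abc': 'ABC'})

# Replacements happen in one pass, so the output of one replacement is never
# replaced again: 'a' -> 'b' and 'b' -> 'a' swap the letters.
swapped = multireplace('aabb', {'a': 'b', 'b': 'a'})

print(longest_first)  # hey ABC
print(swapped)        # bbaa
```

One edge case to be aware of: an empty replacements dict yields an empty pattern, which matches the empty string at every position, so the lookup raises KeyError. Callers may want to return the string unchanged in that case.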

derdav3 commented Apr 6, 2017

How would you modify this to have a case-insensitive match?

HatScripts commented Apr 27, 2017

@derdav3

def multi_replace(string, replacements, ignore_case=False):
    """
    Given a string and a dict, replaces occurrences of the dict keys found in the 
    string, with their corresponding values. The replacements will occur in "one pass", 
    i.e. there should be no clashes.
    :param str string: string to perform replacements on
    :param dict replacements: replacement dictionary {str_to_find: str_to_replace_with}
    :param bool ignore_case: whether to ignore case when looking for matches
    :rtype: str the replaced string
    """
    rep_sorted = sorted(replacements, key=lambda s: len(s[0]), reverse=True)
    rep_escaped = [re.escape(replacement) for replacement in rep_sorted]
    pattern = re.compile("|".join(rep_escaped), re.I if ignore_case else 0)
    return pattern.sub(lambda match: replacements[match.group(0)], string)

Owner

bgusach commented May 10, 2017

@derdav3, as @HatScripts suggested, just pass the ignore-case flag to re.compile.

@HatScripts, I haven't tested your proposal, but... aren't you sorting the strings by the length of their first character (i.e. always 1)?
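
To make the sorting concern concrete, here is a small check, assuming Python 3. The sample keys are made up. Since every sort key comes out as 1, the stable sort leaves the input order as it was, so a shorter key can end up ahead of a longer one in the regex.

```python
keys = ['ab', 'abc', 'a']

# len(s[0]) is the length of the first character, which is 1 for any non-empty key.
first_char_lengths = [len(s[0]) for s in keys]
print(first_char_lengths)  # [1, 1, 1]

# With every sort key equal, the stable sort keeps the original order.
by_first_char = sorted(keys, key=lambda s: len(s[0]), reverse=True)
print(by_first_char)  # ['ab', 'abc', 'a']

# Sorting by the full length puts the longest key first, as intended.
by_length = sorted(keys, key=len, reverse=True)
print(by_length)  # ['abc', 'ab', 'a']
```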

sidscry commented Feb 6, 2018

@HatScripts This case fails:

    string = "original text is here"
    replacements = {
        "original": "text",
        "text": "fake",
        "Is hEre": "was there",
    }
    ignore_case = True
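
A minimal sketch of why this input breaks, assuming Python 3. With re.I the pattern matches "is here" in the text, but match.group(0) returns the text in the casing of the input string, not the dictionary key "Is hEre", so the dictionary lookup in the replacement callback raises KeyError.

```python
import re

replacements = {"original": "text", "text": "fake", "Is hEre": "was there"}
pattern = re.compile("|".join(map(re.escape, replacements)), re.I)

# group(0) is the matched text from the input, in the input's casing.
matched = [m.group(0) for m in pattern.finditer("original text is here")]
print(matched)  # ['original', 'text', 'is here']

# 'is here' is not a key of the dict; only 'Is hEre' is.
missing = [text for text in matched if text not in replacements]
print(missing)  # ['is here']
```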

thorfi commented Jul 11, 2018

@sidscry @HatScripts @bgusach:
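Bug fixes for the above: replace the rep_sorted = ... line with the following, and update the return statement as shown (items() works on both Python 2 and 3):

    if ignore_case:
        replacements = dict((pair[0].lower(), pair[1]) for pair in sorted(replacements.items()))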
    rep_sorted = sorted(replacements, key=lambda s: (len(s), s), reverse=True)
    ...
    return pattern.sub(lambda match: replacements[match.group(0).lower() if ignore_case else match.group(0)], string)
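
Putting the thread's fixes together, one possible consolidated version might look like the sketch below. It assumes Python 3. The early return for an empty dict and the helper name lookup are additions for illustration, not something proposed in the comments above.

```python
import re


def multi_replace(string, replacements, ignore_case=False):
    """Replace every key of `replacements` found in `string` in one pass."""
    if not replacements:
        # Addition: an empty pattern would match everywhere and fail the lookup.
        return string
    if ignore_case:
        # Normalize keys so lookups by lowercased match text succeed.
        replacements = {key.lower(): value for key, value in replacements.items()}
    # Longest keys first so a short key cannot shadow a longer one.
    rep_sorted = sorted(replacements, key=len, reverse=True)
    pattern = re.compile(
        "|".join(re.escape(key) for key in rep_sorted),
        re.I if ignore_case else 0,
    )

    def lookup(match):
        # Hypothetical helper: map the matched text back to its dict key.
        found = match.group(0)
        return replacements[found.lower() if ignore_case else found]

    return pattern.sub(lookup, string)


result = multi_replace(
    "original text is here",
    {"original": "text", "text": "fake", "Is hEre": "was there"},
    ignore_case=True,
)
print(result)  # text fake was there
```

If two keys differ only by case, the one that comes later in the dict wins after lowercasing. For non-ASCII text, re.I and str.lower() may not agree on every character, so this is best treated as a sketch for ASCII-style keys.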