Skip to content

Instantly share code, notes, and snippets.

@idiomer
Created March 20, 2020 04:16
Show Gist options
  • Save idiomer/2929afd8908649fb2d06dbb0ef3397c8 to your computer and use it in GitHub Desktop.
Save idiomer/2929afd8908649fb2d06dbb0ef3397c8 to your computer and use it in GitHub Desktop.
元素排重,且保持原有顺序
def unique_everseen(iterable, key=None):
"List unique elements, preserving order. Remember all elements ever seen."
# unique_everseen('AAAABBBCCDAABBB') --> A B C D
# unique_everseen('ABBCcAD', str.lower) --> A B C D
from itertools import filterfalse
seen = set()
seen_add = seen.add
if key is None:
for element in filterfalse(seen.__contains__, iterable):
seen_add(element)
yield element
else:
for element in iterable:
k = key(element)
if k not in seen:
seen_add(k)
yield element
@idiomer
Copy link
Author

idiomer commented Mar 20, 2020

Why assign seen.add to seen_add instead of just calling seen.add? Python is a dynamic language, and resolving seen.add each iteration is more costly than resolving a local variable. seen.add could have changed between iterations, and the runtime isn't smart enough to rule that out. To play it safe, it has to check the object each time.

(refer: https://stackoverflow.com/questions/480214/how-do-you-remove-duplicates-from-a-list-whilst-preserving-order)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment