Skip to content

Instantly share code, notes, and snippets.

@dting
Created June 5, 2015 07:38
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save dting/c28fb161de7b6287491b to your computer and use it in GitHub Desktop.
Save dting/c28fb161de7b6287491b to your computer and use it in GitHub Desktop.
{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"224\n",
"180\n"
]
}
],
"source": [
"masterlist = \"Lorem ipsum dolor sit amet, agam assum nemore id vis, decore commune oporteat in has. Ne sit malis instructior, no mazim ancillae moderatius pri. Admodum albucius pri cu, duo nihil nonumy appellantur id, essent alienum invenire ea duo. Ius cu justo intellegebat, vix integre suscipiantur et, vix in error aliquam periculis. Solum deleniti accusamus ea pri, ex saperet honestatis eos. Ut vero justo eruditi nam. Quot vide minim his no, at eam graeco complectitur. Nemore expetenda nam in. Vel torquatos consetetur concludaturque et, ad duo dicam ponderum, ad eos populo suscipit. An per enim erant, mei et quot invenire abhorreant, et qui soleat comprehensam. Audire menandri conceptam eu ius, at eum vitae dolorem erroribus. Ei his soluta invidunt, cu sit vide disputationi. Diam alienum elaboraret per id. Eam novum nobis pertinax cu, ad sed justo possit. Idque quando partiendo vix ut. Nominavi inimicus similique et est. Eam quas idque deseruisse eu, sea ut appetere invenire definitionem. Efficiendi ullamcorper has ei, an has liber ignota. Est an oblique appetere volutpat, ad lobortis democritum sea. An nec soluta corpora, eu lorem vidisse sea, id mutat facilisi insolens ius. Consul tamquam qui id, ex alienum interpretaris eum, te tota posse eam. Cum lobortis sententiae ei, nam et feugiat recusabo persequeris. Sit choro voluptatibus ne. Ea mel quaeque vivendo. Eam case hendrerit no, eu harum ceteros invenire sed.\".split()\n",
"print(len(masterlist))\n",
"print(len(set(masterlist)))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# `O(n)`"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"def counter_sub_non_mut(mylist):\n",
" from collections import Counter, defaultdict\n",
"\n",
" # not modifying original list\n",
" counts = {k:v for k,v in Counter(mylist).items() if v > 1}\n",
" newlist = mylist[:]\n",
" for i in reversed(range(len(mylist))):\n",
" item = mylist[i]\n",
" if item in counts and counts[item]:\n",
" newlist[i] += str(counts[item])\n",
" counts[item]-=1\n",
" return newlist"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"['Lorem', 'ipsum', 'dolor', 'sit1', 'amet,', 'agam', 'assum', 'nemore', 'id1', 'vis,']\n",
"10000 loops, best of 3: 142 µs per loop\n"
]
}
],
"source": [
"print(counter_sub_non_mut(masterlist[:])[:10])\n",
"%timeit counter_sub_non_mut(masterlist[:])"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"['a1', 'a2', 'a3', 'a4', 'a5', 'a6', 'a7', 'a8', 'a9', 'a10']\n",
"100 loops, best of 3: 8.08 ms per loop\n"
]
}
],
"source": [
"print(counter_sub_non_mut([\"a\"]*10**4)[:10])\n",
"%timeit counter_sub_non_mut([\"a\"]*10**4)"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']\n",
"100 loops, best of 3: 5.39 ms per loop\n"
]
}
],
"source": [
"print(counter_sub_non_mut(list(map(str, range(10**4))))[:10])\n",
"%timeit counter_sub_non_mut(list(map(str, range(10**4))))"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"def counter_sub_mut(mylist):\n",
" from collections import Counter, defaultdict\n",
" \n",
" # modifying original list\n",
" counts = {k:v for k,v in Counter(mylist).items() if v > 1} \n",
" for i in reversed(range(len(mylist))):\n",
" item = mylist[i]\n",
" if item in counts and counts[item]:\n",
" mylist[i] += str(counts[item])\n",
" counts[item]-=1\n",
" return mylist"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"['Lorem', 'ipsum', 'dolor', 'sit1', 'amet,', 'agam', 'assum', 'nemore', 'id1', 'vis,']\n",
"10000 loops, best of 3: 139 µs per loop\n"
]
}
],
"source": [
"print(counter_sub_mut(masterlist[:])[:10])\n",
"%timeit counter_sub_mut(masterlist[:])"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"['a1', 'a2', 'a3', 'a4', 'a5', 'a6', 'a7', 'a8', 'a9', 'a10']\n",
"100 loops, best of 3: 7.94 ms per loop\n"
]
}
],
"source": [
"print(counter_sub_mut([\"a\"]*10**4)[:10])\n",
"%timeit counter_sub_mut([\"a\"]*10**4)"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']\n",
"100 loops, best of 3: 5.38 ms per loop\n"
]
}
],
"source": [
"print(counter_sub_mut(list(map(str, range(10**4))))[:10])\n",
"%timeit counter_sub_mut(list(map(str, range(10**4))))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# `O(n^2)`"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"http://stackoverflow.com/a/30651963/635411\n",
"\n",
"`O(n^2)` using `.index()`\n",
"\n",
"This has excellent performance when all the items in the list are difference since it doesn't do any index lookups."
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"def counter_with_index(mylist):\n",
" from collections import Counter # Counter counts the number of occurrences of each item\n",
" \n",
" counts = Counter(mylist) # so we have: {'name':3, 'state':1, 'city':1, 'zip':2}\n",
" for s,num in counts.items():\n",
" if num > 1: # ignore strings that only appear once\n",
" for suffix in range(1, num + 1): # suffix starts at 1 and increases by 1 each time\n",
" mylist[mylist.index(s)] = s + str(suffix) # replace each appearance of s\n",
" return mylist"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"['Lorem', 'ipsum', 'dolor', 'sit1', 'amet,', 'agam', 'assum', 'nemore', 'id1', 'vis,']\n",
"1000 loops, best of 3: 310 µs per loop\n"
]
}
],
"source": [
"print(counter_with_index(masterlist[:])[:10])\n",
"%timeit counter_with_index(masterlist[:])"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"['a1', 'a2', 'a3', 'a4', 'a5', 'a6', 'a7', 'a8', 'a9', 'a10']\n",
"1 loops, best of 3: 883 ms per loop\n"
]
}
],
"source": [
"print(counter_with_index([\"a\"]*10**4)[:10])\n",
"%timeit counter_with_index([\"a\"]*10**4)"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']\n",
"100 loops, best of 3: 4.19 ms per loop\n"
]
}
],
"source": [
"print(counter_with_index(list(map(str, range(10**4))))[:10])\n",
"%timeit counter_with_index(list(map(str, range(10**4))))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"http://stackoverflow.com/a/30650847/635411\n",
" \n",
"`O(n^2)` using `.count()`"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"def one_liner_count(mylist):\n",
" return list(map(lambda x: x[1] + str(mylist[:x[0]].count(x[1]) + 1) if mylist.count(x[1]) > 1 else x[1], enumerate(mylist)))"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"['Lorem', 'ipsum', 'dolor', 'sit1', 'amet,', 'agam', 'assum', 'nemore', 'id1', 'vis,']\n",
"1000 loops, best of 3: 1.31 ms per loop\n"
]
}
],
"source": [
"print(one_liner_count(masterlist[:])[:10])\n",
"%timeit one_liner_count(masterlist[:])"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"['a1', 'a2', 'a3', 'a4', 'a5', 'a6', 'a7', 'a8', 'a9', 'a10']\n",
"1 loops, best of 3: 737 ms per loop\n"
]
}
],
"source": [
"print(one_liner_count([\"a\"]*10**4)[:10])\n",
"%timeit one_liner_count([\"a\"]*10**4)"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']\n",
"1 loops, best of 3: 2.18 s per loop\n"
]
}
],
"source": [
"print(one_liner_count(list(map(str, range(10**4))))[:10])\n",
"%timeit one_liner_count(list(map(str, range(10**4))))"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.4.3"
}
},
"nbformat": 4,
"nbformat_minor": 0
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment