@DGalt
Last active December 15, 2016 15:56
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Basic Usage\n",
"- Python's built-in *sorted()* returns a lexicographically sorted list\n",
"- This is often not ideal for lists of alphanumeric strings, where embedded numbers are compared character by character"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"['c002', 'c01', 'c02', 'c1', 'c2']"
]
},
"execution_count": 1,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"l1 = ['c2', 'c1', 'c01', 'c002', 'c02']\n",
"sorted(l1)"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"['Cell02-protocol1-sweep1',\n",
" 'Cell02-protocol1-sweep2',\n",
" 'Cell1-protocol02-sweep03',\n",
" 'Cell1-protocol02-sweep1',\n",
" 'Cell1-protocol1-sweep1',\n",
" 'Cell1-protocol1-sweep10',\n",
" 'Cell1-protocol1-sweep2']"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"l2 = ['Cell1-protocol1-sweep1'\n",
" , 'Cell1-protocol1-sweep2'\n",
" , 'Cell1-protocol1-sweep10'\n",
" , 'Cell1-protocol02-sweep1'\n",
" , 'Cell1-protocol02-sweep03'\n",
" , 'Cell02-protocol1-sweep1'\n",
" , 'Cell02-protocol1-sweep2']\n",
"\n",
"sorted(l2)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**_natsort_ provides 'natural' sorting of lists, tuples, and other iterables**"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"from natsort import natsorted"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"['c1', 'c01', 'c2', 'c002', 'c02']"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"natsorted(l1)"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"['Cell1-protocol1-sweep1',\n",
" 'Cell1-protocol1-sweep2',\n",
" 'Cell1-protocol1-sweep10',\n",
" 'Cell1-protocol02-sweep1',\n",
" 'Cell1-protocol02-sweep03',\n",
" 'Cell02-protocol1-sweep1',\n",
" 'Cell02-protocol1-sweep2']"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"natsorted(l2)"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"## Controlling behavior of *natsorted()*\n",
"\n",
"- The *alg* keyword changes the algorithm *natsorted()* uses to sort\n",
"- The *ns* enum provides a set of algorithm variations, which can be combined with the bitwise OR operator (|)"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"from natsort import ns"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"***Dealing with letter case***"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
"['A', 'B', 'a', 'b']"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"l = ['a', 'B', 'A', 'b']\n",
"sorted(l)"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"['A', 'B', 'a', 'b']"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"natsorted(l)"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"['a', 'A', 'B', 'b']"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"natsorted(l, alg=ns.IGNORECASE)  # ns.IC is the shorthand"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"['a', 'b', 'A', 'B']"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"natsorted(l, alg=ns.LOWERCASEFIRST)  # ns.LF is the shorthand"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"['A', 'a', 'B', 'b']"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"natsorted(l, alg=ns.GROUPLETTERS)  # ns.G is the shorthand"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"***Sorting floats***"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"['0,5', '5', '5.0', '5.1']"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"l = ['5.1', '5', '5.0', '0,5']\n",
"sorted(l)"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"['0,5', '5', '5.0', '5.1']"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"natsorted(l, alg=ns.FLOAT)"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"l = ['-43.21', '1.0e2', '1.1e1', '+103.2']"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"['-43.21', '1.1e1', '1.0e2', '+103.2']"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"natsorted(l, alg=ns.FLOAT | ns.SIGNED)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"***Sorting file paths***"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"['..\\\\folder 2\\\\file.txt', '..\\\\folder\\\\file.txt']"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"l = ['..\\\\folder\\\\file.txt', '..\\\\folder 2\\\\file.txt']\n",
"sorted(l)"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"['..\\\\folder\\\\file.txt', '..\\\\folder 2\\\\file.txt']"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"natsorted(l, alg=ns.PATH) # ns.P is the shorthand"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Misc. Behavior\n",
"\n",
"***Mixed types***"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"collapsed": false
},
"outputs": [
{
"ename": "TypeError",
"evalue": "unorderable types: int() < str()",
"output_type": "error",
"traceback": [
"\u001b[1;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[1;31mTypeError\u001b[0m Traceback (most recent call last)",
"\u001b[1;32m<ipython-input-18-47b29de7bde2>\u001b[0m in \u001b[0;36m<module>\u001b[1;34m()\u001b[0m\n\u001b[0;32m 1\u001b[0m \u001b[0ml\u001b[0m \u001b[1;33m=\u001b[0m \u001b[1;33m[\u001b[0m\u001b[1;34m'a'\u001b[0m\u001b[1;33m,\u001b[0m \u001b[1;34m'c'\u001b[0m\u001b[1;33m,\u001b[0m \u001b[1;36m2\u001b[0m\u001b[1;33m,\u001b[0m \u001b[1;34m'1.2'\u001b[0m\u001b[1;33m]\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m----> 2\u001b[1;33m \u001b[0msorted\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0ml\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m",
"\u001b[1;31mTypeError\u001b[0m: unorderable types: int() < str()"
]
}
],
"source": [
"l = ['a', 'c', 2, '1.2']\n",
"sorted(l)"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"['1.2', 2, 'a', 'c']"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"natsorted(l)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"***Reverse***"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"['c', 'a', 2, '1.2']"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"natsorted(l, reverse=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Getting a natsort key"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"***Aside: the 'key' keyword in the built-in sorted() and list.sort()***\n",
"\n",
"- Specifies a function that is applied to each element to produce the value used for comparison during sorting"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"['John', 'Mike', 'kate', 'sally']"
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"l = ['John', 'sally', 'Mike', 'kate']\n",
"sorted(l)"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"['John', 'kate', 'Mike', 'sally']"
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"sorted(l, key=str.lower)"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"[['Kate', 8], ['Tim', 9], ['John', 12]]"
]
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"l = [['John', 12], ['Kate', 8], ['Tim', 9]]\n",
"sorted(l, key=lambda x: x[1])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"***natsort*** provides a utility function, *natsort_keygen()*, that generates a key for use with, e.g., the in-place ***sort***"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"from natsort import natsort_keygen"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"['c1', 'c01', 'c2', 'c002', 'c02']"
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"l = ['c2', 'c1', 'c01', 'c002', 'c02']\n",
"nskey = natsort_keygen()\n",
"l.sort(key=nskey)\n",
"l"
]
},
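{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Getting the indices of a sorted list\n",
"\n",
"- *index_natsorted()* is a utility function that returns the indices that would sort the sequence, rather than the sorted values\n",
"- *order_by_index()* applies those indices to reorder a sequence"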
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"from natsort import index_natsorted, order_by_index"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"[1, 2, 0, 3, 4]"
]
},
"execution_count": 27,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"l = ['c2', 'c1', 'c01', 'c002', 'c02']\n",
"ixs = index_natsorted(l)\n",
"ixs"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"['c1', 'c01', 'c2', 'c002', 'c02']"
]
},
"execution_count": 28,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"order_by_index(l, ixs)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"***Working with pandas***\n",
"\n",
"Adapted from the example in the natsort docs and [this Stack Overflow answer](http://stackoverflow.com/a/29582718/1399279)"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>a</th>\n",
" <th>b</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0hr</th>\n",
" <td>a5</td>\n",
" <td>b1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>128hr</th>\n",
" <td>a1</td>\n",
" <td>b1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>72hr</th>\n",
" <td>a10</td>\n",
" <td>b2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>48hr</th>\n",
" <td>a2</td>\n",
" <td>b2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>96hr</th>\n",
" <td>a12</td>\n",
" <td>b1</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" a b\n",
"0hr a5 b1\n",
"128hr a1 b1\n",
"72hr a10 b2\n",
"48hr a2 b2\n",
"96hr a12 b1"
]
},
"execution_count": 29,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from pandas import DataFrame\n",
"df = DataFrame({'a': ['a5', 'a1', 'a10', 'a2', 'a12']\n",
" , 'b': ['b1', 'b1', 'b2', 'b2', 'b1']}\n",
" , index=['0hr', '128hr', '72hr', '48hr', '96hr'])\n",
"\n",
"df"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>a</th>\n",
" <th>b</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0hr</th>\n",
" <td>a5</td>\n",
" <td>b1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>48hr</th>\n",
" <td>a2</td>\n",
" <td>b2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>72hr</th>\n",
" <td>a10</td>\n",
" <td>b2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>96hr</th>\n",
" <td>a12</td>\n",
" <td>b1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>128hr</th>\n",
" <td>a1</td>\n",
" <td>b1</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" a b\n",
"0hr a5 b1\n",
"48hr a2 b2\n",
"72hr a10 b2\n",
"96hr a12 b1\n",
"128hr a1 b1"
]
},
"execution_count": 30,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.reindex(index=natsorted(df.index))"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>a</th>\n",
" <th>b</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>128hr</th>\n",
" <td>a1</td>\n",
" <td>b1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>48hr</th>\n",
" <td>a2</td>\n",
" <td>b2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>0hr</th>\n",
" <td>a5</td>\n",
" <td>b1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>72hr</th>\n",
" <td>a10</td>\n",
" <td>b2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>96hr</th>\n",
" <td>a12</td>\n",
" <td>b1</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" a b\n",
"128hr a1 b1\n",
"48hr a2 b2\n",
"0hr a5 b1\n",
"72hr a10 b2\n",
"96hr a12 b1"
]
},
"execution_count": 31,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.iloc[index_natsorted(df.a)]"
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>a</th>\n",
" <th>b</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>128hr</th>\n",
" <td>a1</td>\n",
" <td>b1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>0hr</th>\n",
" <td>a5</td>\n",
" <td>b1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>96hr</th>\n",
" <td>a12</td>\n",
" <td>b1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>48hr</th>\n",
" <td>a2</td>\n",
" <td>b2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>72hr</th>\n",
" <td>a10</td>\n",
" <td>b2</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" a b\n",
"128hr a1 b1\n",
"0hr a5 b1\n",
"96hr a12 b1\n",
"48hr a2 b2\n",
"72hr a10 b2"
]
},
"execution_count": 32,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.iloc[index_natsorted(zip(df.b, df.a))]"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>a</th>\n",
" <th>b</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0hr</th>\n",
" <td>a5</td>\n",
" <td>b1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>96hr</th>\n",
" <td>a12</td>\n",
" <td>b1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>128hr</th>\n",
" <td>a1</td>\n",
" <td>b1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>48hr</th>\n",
" <td>a2</td>\n",
" <td>b2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>72hr</th>\n",
" <td>a10</td>\n",
" <td>b2</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" a b\n",
"0hr a5 b1\n",
"96hr a12 b1\n",
"128hr a1 b1\n",
"48hr a2 b2\n",
"72hr a10 b2"
]
},
"execution_count": 33,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.iloc[index_natsorted(zip(df.b, df.index))]"
]
}
],
"metadata": {
"anaconda-cloud": {},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.2"
}
},
"nbformat": 4,
"nbformat_minor": 0
}