Skip to content

Instantly share code, notes, and snippets.

@dfolch
Created July 31, 2016 12:52
Show Gist options
  • Save dfolch/b42734ee5d1de01cea6480ccdd326359 to your computer and use it in GitHub Desktop.
Save dfolch/b42734ee5d1de01cea6480ccdd326359 to your computer and use it in GitHub Desktop.
Testing tie breaking in SciPy cKDTree
Display the source blob
Display the rendered blob
Raw
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Testing tie breaking in SciPy's cKDTree\n",
"\n",
"When two points are at exactly the same location, deciding which is \"nearest\" to the queried point is ambiguous. Below are some tests of when the ordering changes. "
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"from scipy.spatial import cKDTree as kdt"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0.17.1\n"
]
}
],
"source": [
"import scipy\n",
"print scipy.__version__"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Create 1000 points, each with an X coord, Y coord and ID. 20 of the point locations have 5 IDs at the location. "
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"np.random.seed(4444)\n",
"points = np.random.normal(1000, 100, (1000,2))\n",
"points[20:40,:] = points[0:20,:]\n",
"points[40:60,:] = points[0:20,:]\n",
"points[60:80,:] = points[0:20,:]\n",
"points[80:100,:] = points[0:20,:]\n",
"pdf = pd.DataFrame(points, columns=['x','y'])\n",
"pdf.index = range(88, 1088)"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"total points: 1000\n",
"unique locations: 920\n"
]
}
],
"source": [
"print \"total points:\", pdf.shape[0]\n",
"print \"unique locations:\", pdf.x.unique().shape[0]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Create the kd-tree"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"tree = kdt(pdf.values)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We'll test the nearest neighbors on ID 135 (this point shares its location with four other points). When we ask for the nearest neighbors to 135, the first five \"neighbors\" could be in any order."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Test 1: If you change the number of neighbors (k) the order changes."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"k = 630 [115 175 155 135 95 371]\n",
"k = 244 [155 95 175 135 115 371]\n",
"k = 981 [175 135 155 115 95 371]\n",
"k = 501 [115 175 135 155 95 371]\n",
"k = 993 [135 175 115 95 155 371]\n",
"k = 938 [155 175 95 115 135 371]\n",
"k = 289 [115 155 95 135 175 371]\n",
"k = 778 [ 95 115 155 175 135 371]\n",
"k = 173 [155 135 115 175 95 371]\n",
"k = 777 [155 135 115 95 175 371]\n"
]
}
],
"source": [
"for i in range(10):\n",
" query = pdf.ix[[135]]\n",
" k = np.random.randint(100,1000)\n",
" results = tree.query(query, k=k)[1]\n",
" results = pdf.index[results].values # convert positions to IDs\n",
" print 'k =', k, results[0,0:6]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Test 2: The number of other observations queried doesn't matter."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"queried = 121 [135 175 155 95 115 371]\n",
"queried = 688 [135 175 155 95 115 371]\n",
"queried = 502 [135 175 155 95 115 371]\n",
"queried = 132 [135 175 155 95 115 371]\n",
"queried = 187 [135 175 155 95 115 371]\n",
"queried = 892 [135 175 155 95 115 371]\n",
"queried = 131 [135 175 155 95 115 371]\n",
"queried = 519 [135 175 155 95 115 371]\n",
"queried = 744 [135 175 155 95 115 371]\n",
"queried = 831 [135 175 155 95 115 371]\n"
]
}
],
"source": [
"for i in range(10):\n",
" select_count = np.random.randint(10,1000)\n",
" get_these = [135] + np.random.permutation(range(88,1088)).tolist()[0:select_count]\n",
" query = pdf.ix[get_these]\n",
" results = tree.query(query, k=401)[1]\n",
" results = pdf.index[results].values # convert positions to IDs\n",
" print 'queried =', select_count, results[0,0:6]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Test 3: Position of ID 135 doesn't matter."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"selected = 45 position = 44 [135 175 155 95 115 371]\n",
"selected = 808 position = 379 [135 175 155 95 115 371]\n",
"selected = 843 position = 475 [135 175 155 95 115 371]\n",
"selected = 995 position = 648 [135 175 155 95 115 371]\n",
"selected = 567 position = 374 [135 175 155 95 115 371]\n",
"selected = 513 position = 364 [135 175 155 95 115 371]\n",
"selected = 538 position = 217 [135 175 155 95 115 371]\n",
"selected = 755 position = 21 [135 175 155 95 115 371]\n",
"selected = 565 position = 89 [135 175 155 95 115 371]\n",
"selected = 382 position = 63 [135 175 155 95 115 371]\n"
]
}
],
"source": [
"for i in range(10):\n",
" select_count = np.random.randint(10,1000)\n",
" get_these = [135] + np.random.permutation(range(88,1088)).tolist()[0:select_count]\n",
" get_these = list(set(get_these))\n",
" get_these = np.random.permutation(get_these).tolist()\n",
" query = pdf.ix[get_these]\n",
" results = tree.query(query, k=401)[1]\n",
" results = pdf.index[results].values # convert positions to IDs\n",
" print 'selected =', select_count, 'position =', query.index.get_loc(135), results[query.index.get_loc(135),0:6]"
]
}
],
"metadata": {
"anaconda-cloud": {},
"kernelspec": {
"display_name": "Python [Root]",
"language": "python",
"name": "Python [Root]"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.12"
}
},
"nbformat": 4,
"nbformat_minor": 0
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment