Skip to content

Instantly share code, notes, and snippets.

@RahulDas-dev
Last active November 13, 2022 06:43
Show Gist options
  • Save RahulDas-dev/948c024ad3ae3857f11281ed2ba3d4fd to your computer and use it in GitHub Desktop.
Save RahulDas-dev/948c024ad3ae3857f11281ed2ba3d4fd to your computer and use it in GitHub Desktop.
Pairwise Distances Matrix using numpy
Display the source blob
Display the rendered blob
Raw
{
"cells": [
{
"cell_type": "markdown",
"id": "dd1d0afa",
"metadata": {},
"source": [
"## Distance Matrix Computation"
]
},
{
"cell_type": "code",
"execution_count": 25,
"id": "db93338d",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np"
]
},
{
"cell_type": "code",
"execution_count": 28,
"id": "86eefba3",
"metadata": {},
"outputs": [],
"source": [
"def eucleiden_distance(a: np.ndarray,b: np.ndarray)-> np.ndarray:\n",
" diff = a - b\n",
" ssd = np.sum(diff**2, axis=1)\n",
" return np.sqrt(ssd)\n",
"\n",
"def distance_matrix(x: np.ndarray) -> np.ndarray:\n",
" no_of_obs,no_of_feature = x.shape\n",
" i, j = np.triu_indices(no_of_obs, k=1) # Upeer Traingular index Without Diagonal index\n",
" a = x[i] # Selecting elements for upper triangular distance computation\n",
" b = x[j] # Selecting elements for upper triangular distance computation \n",
" upper_triangle_distance = eucleiden_distance(a,b)\n",
" d_mat = np.zeros((no_of_obs, no_of_obs)) # Distance Matrix with all 0\n",
" d_mat[i,j] = upper_triangle_distance # Filling Up Upper Triangular Matrix\n",
" d_mat = d_mat + d_mat.T # Filling Up lower Triangular Matrix\n",
" return d_mat \n"
]
},
{
"cell_type": "markdown",
"id": "3329177c",
"metadata": {},
"source": [
"Let $X^{i}$ is Vector such that $\\forall X^{i}\\in \\mathbb{R}^{n}$\n",
"\n",
"And $D(X^{i},X^{j})$ is the distance between $X^{i},X^{j}$"
]
},
{
"cell_type": "markdown",
"id": "5da2ccf4",
"metadata": {},
"source": [
"\n",
"$\\large \\begin{bmatrix}\n",
"X^{1}\\\\\n",
"X^{2}\\\\\n",
"X^{3}\\\\\n",
"\\vdots\\\\\n",
"X^{m}\\\\\n",
"\\end{bmatrix}\n",
"\\text{Pairwise Distance} \\Rightarrow \\begin{bmatrix} \n",
"D(X^{1},X^{1}) & D(X^{1},X^{2}) & D(X^{1},X^{3}) & \\dots & D(X^{1},X^{m}) \\\\\n",
"D(X^{2},X^{1}) & D(X^{2},X^{2}) & D(X^{2},X^{3}) & \\dots & D(X^{2},X^{m}) \\\\\n",
"D(X^{3},X^{1}) & D(X^{3},X^{2}) & D(X^{3},X^{3}) & \\dots & D(X^{3},X^{m}) \\\\\n",
"\\vdots \\\\\n",
"D(X^{m},X^{1}) & D(X^{m},X^{2}) & D(X^{m},X^{3}) & \\dots & D(X^{m},X^{m}) \\\\\n",
"\\end{bmatrix}\n",
"$\n",
"\n",
"\n",
"Total number of distance calculation is $\\large m^{2}$\n",
"\n",
"if $\\large D(X^{i},X^{j})$ is commutative or $\\large D(X^{i},X^{j}) = D(X^{j},X^{i})$, then Distance Matrix will be symmetric\n",
"\n",
"So we need to calculate only upper triangle matrix element , which is $\\large \\frac{m^{2}}{2}$ computation\n",
"\n",
"and if $\\large D(X^{i},X^{i}) = 0 $ then diagonal elements will be always 0 so we have todo $\\large \\frac{m^{2}}{2} - m$ computation\n",
"\n",
"total computation = $\\large \\frac{m(m-1)}{2}$\n",
"\n"
]
},
{
"cell_type": "markdown",
"id": "4ea609d4",
"metadata": {},
"source": [
"### Data has 3 observation m=3 and n =2"
]
},
{
"cell_type": "code",
"execution_count": 30,
"id": "f22b7d5a",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[0 0]\n",
" [3 4]\n",
" [3 5]]\n"
]
}
],
"source": [
"x = np.array([[0,0],[3,4],[3,5]])\n",
"print(x)"
]
},
{
"cell_type": "markdown",
"id": "852c731e",
"metadata": {},
"source": [
"### Upper Traingular matrix indicies"
]
},
{
"cell_type": "code",
"execution_count": 37,
"id": "b4f00863",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[0 0 1] [1 2 2]\n"
]
}
],
"source": [
"m,n = x.shape\n",
"#i, j = np.triu_indices(m,) #With Diagonal elements \n",
"i, j = np.triu_indices(m, k=1) #Without Diagonal elements \n",
"print(i,j)"
]
},
{
"cell_type": "markdown",
"id": "d54ffee0",
"metadata": {},
"source": [
"$\\begin{bmatrix} \n",
"D(0,0) & D(0,1) & D(0,2)\\\\\n",
"D(1,0) & D(1,1) & D(1,2)\\\\\n",
"D(2,0) & D(2,1) & D(2,2)\\\\\n",
"\\end{bmatrix} \\Rightarrow\n",
"\\begin{bmatrix}\n",
". & D(0,1) & D(0,2)\\\\\n",
". & . & D(1,2)\\\\\n",
". & . & .\\\\\n",
"\\end{bmatrix}\n",
"$"
]
},
{
"cell_type": "markdown",
"id": "f04282bb",
"metadata": {},
"source": [
"### Upper traingle elements for distance computations"
]
},
{
"cell_type": "code",
"execution_count": 39,
"id": "4d806b94",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[0 0]\n",
" [0 0]\n",
" [3 4]]\n",
"[[3 4]\n",
" [3 5]\n",
" [3 5]]\n"
]
}
],
"source": [
"a = x[i] # x[i] \n",
"b = x[j]\n",
"\n",
"print(a)\n",
"print(b)"
]
},
{
"cell_type": "markdown",
"id": "444df6fd",
"metadata": {},
"source": [
"### Euclide Distance computation using for upper traingular elements"
]
},
{
"cell_type": "code",
"execution_count": 40,
"id": "9519d74f",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[5. 5.83095189 1. ]\n"
]
}
],
"source": [
"def eucleiden_distance(a,b):\n",
" diff = a-b\n",
" ssd = np.sum(diff**2, axis=1)\n",
" return np.sqrt(ssd)\n",
"\n",
"upper_distance = eucleiden_distance(a,b)\n",
"print(upper_distance)"
]
},
{
"cell_type": "markdown",
"id": "f1e30221",
"metadata": {},
"source": [
"### creating distance matrix with all 0"
]
},
{
"cell_type": "code",
"execution_count": 34,
"id": "661a81f7",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[0., 0., 0.],\n",
" [0., 0., 0.],\n",
" [0., 0., 0.]])"
]
},
"execution_count": 34,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"d_mat = np.zeros((m, m))\n",
"d_mat"
]
},
{
"cell_type": "markdown",
"id": "a780f20a",
"metadata": {},
"source": [
"### Filling upper traingular matrix"
]
},
{
"cell_type": "code",
"execution_count": 35,
"id": "70f5001b",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[0. , 5. , 5.83095189],\n",
" [0. , 0. , 1. ],\n",
" [0. , 0. , 0. ]])"
]
},
"execution_count": 35,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"d_mat[i,j] = upper_distance\n",
"d_mat"
]
},
{
"cell_type": "markdown",
"id": "842ab024",
"metadata": {},
"source": [
"### Filling up lowe trangular matrix"
]
},
{
"cell_type": "code",
"execution_count": 36,
"id": "3e0547c9",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[0. , 5. , 5.83095189],\n",
" [5. , 0. , 1. ],\n",
" [5.83095189, 1. , 0. ]])"
]
},
"execution_count": 36,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"d_mat = d_mat + d_mat.T\n",
"d_mat"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "cde0934d",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.5"
},
"toc": {
"base_numbering": 1,
"nav_menu": {},
"number_sections": true,
"sideBar": true,
"skip_h1_title": false,
"title_cell": "Table of Contents",
"title_sidebar": "Contents",
"toc_cell": false,
"toc_position": {},
"toc_section_display": true,
"toc_window_display": false
}
},
"nbformat": 4,
"nbformat_minor": 5
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment