{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 94-775/95-865: Manifold Learning Demo\n",
"\n",
"Author: George H. Chen (georgechen [at symbol] cmu.edu)"
]
},
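{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"%matplotlib inline\n",
"import matplotlib.pyplot as plt\n",
"# the 'seaborn' style was renamed to 'seaborn-v0_8' in newer matplotlib releases\n",
"plt.style.use('seaborn-v0_8' if 'seaborn-v0_8' in plt.style.available else 'seaborn')\n",
"np.set_printoptions(precision=2, suppress=True)"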
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Multi-dimensional scaling (MDS)\n",
"\n",
"Given a table of pairwise distances, MDS finds points in a low-dimensional space whose pairwise distances approximately match the table. MDS is used as a step within the Isomap algorithm."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[ 6.4]\n",
" [ 6.6]\n",
" [ 0. ]\n",
" [-4.6]\n",
" [-8.4]]\n"
]
},
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"point_names = ['A', 'B', 'C', 'D', 'E']\n",
"distances = np.array([[0, 5, 8, 13, 16],\n",
"                      [5, 0, 5, 10, 13],\n",
"                      [8, 5, 0, 5, 8],\n",
"                      [13, 10, 5, 0, 5],\n",
"                      [16, 13, 8, 5, 0]])\n",
"\n",
"from sklearn.manifold import MDS\n",
"\n",
"# removing random_state=0 and re-running gives you different 1D representations of A, B, C, D, E\n",
"mds = MDS(n_components=1, dissimilarity='precomputed', random_state=0)\n",
"low_dimensional_points = mds.fit_transform(distances)\n",
"print(low_dimensional_points)\n",
"\n",
"# draw the 1D points on a horizontal line and label each one with its name\n",
"plt.scatter(low_dimensional_points, np.zeros(len(low_dimensional_points)))\n",
"for idx, name in enumerate(point_names):\n",
"    plt.annotate(name, (low_dimensional_points[idx, 0], 0))"
]
},
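{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick sanity check on the cell above (a sketch that is not part of the original demo), we can measure how well the 1D points reproduce the requested distances. It reuses `distances` and `low_dimensional_points` from the previous cell; `pdist` and `squareform` come from SciPy, and the name `embedded_distances` is made up for this example. MDS only approximates the table, and this table cannot be matched exactly by points on a line (A to B is 5 and B to C is 5, but A to C is 8), so the gap is not expected to be zero."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from scipy.spatial.distance import pdist, squareform\n",
"\n",
"# pairwise distances between the 1D points that MDS returned\n",
"embedded_distances = squareform(pdist(low_dimensional_points))\n",
"print(embedded_distances)\n",
"\n",
"# largest absolute gap between the requested and the recovered distances\n",
"print(np.abs(embedded_distances - distances).max())"
]
},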
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Isomap\n",
"\n",
"In practice, if you want to use Isomap, you don't need to write the MDS code yourself. Instead, you would use Isomap as follows (and Isomap's fitting procedure will do MDS under the hood without telling you about it):"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"swiss_roll_2d = np.array([[479, -231],\n",
"                          [515, -237],\n",
"                          [551, -233],\n",
"                          [581, -255],\n",
"                          [597, -284],\n",
"                          [610, -313],\n",
"                          [619, -341],\n",
"                          [623, -368],\n",
"                          [617, -407],\n",
"                          [591, -434],\n",
"                          [573, -468],\n",
"                          [542, -478],\n",
"                          [507, -490],\n",
"                          [471, -482],\n",
"                          [437, -462],\n",
"                          [398, -446],\n",
"                          [383, -408],\n",
"                          [403, -373],\n",
"                          [430, -349],\n",
"                          [470, -326],\n",
"                          [507, -320],\n",
"                          [531, -352],\n",
"                          [527, -385],\n",
"                          [487, -397]])"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"(24, 2)\n"
]
}
],
"source": [
"print(swiss_roll_2d.shape)"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(370.98850059737157,\n",
" 635.0114994026284,\n",
" -502.96770603384846,\n",
" -218.03229396615157)"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"plt.scatter(swiss_roll_2d[:, 0], swiss_roll_2d[:, 1], c=list(range(len(swiss_roll_2d))), cmap='Spectral')\n",
"plt.axis('equal')"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.decomposition import PCA\n",
"\n",
"pca = PCA(n_components=1)\n",
"swiss_roll_1d_pca = pca.fit_transform(swiss_roll_2d)"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [],
"source": [
"def plot_1d(data_1d):\n",
"    # scatter the 1D points along a horizontal line, colored by their original index\n",
"    plt.scatter(data_1d, np.zeros(len(data_1d)),\n",
"                c=list(range(len(data_1d))), cmap='Spectral')"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"plot_1d(swiss_roll_1d_pca)"
]
},
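{
"cell_type": "markdown",
"metadata": {},
"source": [
"A small aside added for illustration: PCA projects every point onto a single straight line, so it can place points from different parts of the spiral next to each other. The fitted `pca` object from the earlier cell reports which direction it projected onto and what fraction of the total variance that direction retains."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# fraction of the total variance kept by the single principal component\n",
"print(pca.explained_variance_ratio_)\n",
"\n",
"# direction in the original 2D space that the points were projected onto\n",
"print(pca.components_)"
]
},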
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.manifold import Isomap\n",
"\n",
"isomap = Isomap(n_neighbors=2, n_components=1)\n",
"swiss_roll_1d_isomap = isomap.fit_transform(swiss_roll_2d)"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"plot_1d(swiss_roll_1d_isomap)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In case you are wondering what distance matrix Isomap computed (using the nearest neighbor graph) to pass to MDS, you can print it out as follows:"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[ 0. 36.5 72.03 109.23 142.35 174.13 203.54 230.84 270.3 307.78\n",
" 346.25 378.82 415.82 452.7 492.15 534.3 575.16 615.47 651.59 697.73\n",
" 735.22 775.22 808.46 838.15]\n",
" [ 36.5 0. 36.22 73.42 106.54 138.33 167.74 195.03 234.49 271.97\n",
" 310.44 343.02 380.02 416.89 456.34 498.5 539.35 579.66 615.78 661.93\n",
" 699.41 739.41 772.65 802.35]\n",
" [ 72.03 36.22 0. 37.2 70.32 102.1 131.51 158.81 198.27 235.75\n",
" 274.22 306.8 343.8 380.67 420.12 462.27 503.13 543.44 579.56 625.7\n",
" 663.19 703.19 736.43 766.12]\n",
" [109.23 73.42 37.2 0. 33.12 64.9 94.31 121.61 161.07 198.55\n",
" 237.02 269.59 306.59 343.47 382.92 425.07 465.93 506.24 542.36 588.5\n",
" 625.99 665.99 699.23 728.92]\n",
" [142.35 106.54 70.32 33.12 0. 31.78 61.19 88.49 127.94 165.43\n",
" 203.9 236.47 273.47 310.35 349.8 391.95 432.8 473.12 509.24 555.38\n",
" 592.86 632.86 666.11 695.8 ]\n",
" [174.13 138.33 102.1 64.9 31.78 0. 29.41 56.71 96.16 133.65\n",
" 172.12 204.69 241.69 278.57 318.02 360.17 401.02 441.34 477.46 523.6\n",
" 561.08 601.08 634.33 664.02]\n",
" [203.54 167.74 131.51 94.31 61.19 29.41 0. 27.29 66.75 104.24\n",
" 142.71 175.28 212.28 249.16 288.6 330.76 371.61 411.92 448.05 494.19\n",
" 531.67 571.67 604.91 634.61]\n",
" [230.84 195.03 158.81 121.61 88.49 56.71 27.29 0. 39.46 76.94\n",
" 115.41 147.99 184.99 221.86 261.31 303.46 344.32 384.63 420.75 466.9\n",
" 504.38 544.38 577.62 607.32]\n",
" [270.3 234.49 198.27 161.07 127.94 96.16 66.75 39.46 0. 37.48\n",
" 75.95 108.53 145.53 182.41 221.85 264.01 304.86 345.17 381.3 427.44\n",
" 464.92 504.92 538.16 567.86]\n",
" [307.78 271.97 235.75 198.55 165.43 133.65 104.24 76.94 37.48 0.\n",
" 38.47 71.04 108.04 144.92 184.37 226.52 267.38 307.69 343.81 389.95\n",
" 427.44 467.44 500.68 530.37]\n",
" [346.25 310.44 274.22 237.02 203.9 172.12 142.71 115.41 75.95 38.47\n",
" 0. 32.57 69.57 106.45 145.9 188.05 228.91 269.22 305.34 351.48\n",
" 388.97 428.97 462.21 491.9 ]\n",
" [378.82 343.02 306.8 269.59 236.47 204.69 175.28 147.99 108.53 71.04\n",
" 32.57 0. 37. 73.88 113.32 155.48 196.33 236.64 272.77 318.91\n",
" 356.39 396.39 429.63 459.33]\n",
" [415.82 380.02 343.8 306.59 273.47 241.69 212.28 184.99 145.53 108.04\n",
" 69.57 37. 0. 36.88 76.32 118.48 159.33 199.64 235.77 281.91\n",
" 319.39 359.39 392.63 422.33]\n",
" [452.7 416.89 380.67 343.47 310.35 278.57 249.16 221.86 182.41 144.92\n",
" 106.45 73.88 36.88 0. 39.45 81.6 122.45 162.77 198.89 245.03\n",
" 282.51 322.51 355.76 385.45]\n",
" [492.15 456.34 420.12 382.92 349.8 318.02 288.6 261.31 221.85 184.37\n",
" 145.9 113.32 76.32 39.45 0. 42.15 83.01 123.32 159.44 205.59\n",
" 243.07 283.07 316.31 346. ]\n",
" [534.3 498.5 462.27 425.07 391.95 360.17 330.76 303.46 264.01 226.52\n",
" 188.05 155.48 118.48 81.6 42.15 0. 40.85 81.16 117.29 163.43\n",
" 200.91 240.91 274.16 303.85]\n",
" [575.16 539.35 503.13 465.93 432.8 401.02 371.61 344.32 304.86 267.38\n",
" 228.91 196.33 159.33 122.45 83.01 40.85 0. 40.31 76.44 122.58\n",
" 160.06 200.06 233.3 263. ]\n",
" [615.47 579.66 543.44 506.24 473.12 441.34 411.92 384.63 345.17 307.69\n",
" 269.22 236.64 199.64 162.77 123.32 81.16 40.31 0. 36.12 82.27\n",
" 119.75 159.75 192.99 222.69]\n",
" [651.59 615.78 579.56 542.36 509.24 477.46 448.05 420.75 381.3 343.81\n",
" 305.34 272.77 235.77 198.89 159.44 117.29 76.44 36.12 0. 46.14\n",
" 83.62 123.62 156.87 186.56]\n",
" [697.73 661.93 625.7 588.5 555.38 523.6 494.19 466.9 427.44 389.95\n",
" 351.48 318.91 281.91 245.03 205.59 163.43 122.58 82.27 46.14 0.\n",
" 37.48 77.48 110.72 140.42]\n",
" [735.22 699.41 663.19 625.99 592.86 561.08 531.67 504.38 464.92 427.44\n",
" 388.97 356.39 319.39 282.51 243.07 200.91 160.06 119.75 83.62 37.48\n",
" 0. 40. 73.24 102.94]\n",
" [775.22 739.41 703.19 665.99 632.86 601.08 571.67 544.38 504.92 467.44\n",
" 428.97 396.39 359.39 322.51 283.07 240.91 200.06 159.75 123.62 77.48\n",
" 40. 0. 33.24 62.94]\n",
" [808.46 772.65 736.43 699.23 666.11 634.33 604.91 577.62 538.16 500.68\n",
" 462.21 429.63 392.63 355.76 316.31 274.16 233.3 192.99 156.87 110.72\n",
" 73.24 33.24 0. 41.76]\n",
" [838.15 802.35 766.12 728.92 695.8 664.02 634.61 607.32 567.86 530.37\n",
" 491.9 459.33 422.33 385.45 346. 303.85 263. 222.69 186.56 140.42\n",
" 102.94 62.94 41.76 0. ]]\n"
]
}
],
"source": [
"print(isomap.dist_matrix_)"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.colorbar.Colorbar at 0x11ee5c650>"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 2 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"plt.imshow(isomap.dist_matrix_)\n",
"plt.colorbar()"
]
},
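{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the nearest neighbor graph step concrete, here is a rough sketch (an illustration, not scikit-learn's exact implementation) that builds a 2-nearest-neighbor graph with `kneighbors_graph`, treats it as undirected, and computes shortest path lengths with SciPy's `shortest_path`. For this small dataset the result should agree closely with `isomap.dist_matrix_`. The variable names `knn_graph` and `geodesic_distances` are made up for this example."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.neighbors import kneighbors_graph\n",
"from scipy.sparse.csgraph import shortest_path\n",
"\n",
"# sparse graph linking each point to its 2 nearest neighbors, weighted by Euclidean distance\n",
"knn_graph = kneighbors_graph(swiss_roll_2d, n_neighbors=2, mode='distance')\n",
"\n",
"# shortest path lengths along the graph, ignoring edge direction\n",
"geodesic_distances = shortest_path(knn_graph, directed=False)\n",
"\n",
"# largest absolute difference from the matrix that Isomap stored\n",
"print(np.abs(geodesic_distances - isomap.dist_matrix_).max())"
]
},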
{
"cell_type": "markdown",
"metadata": {},
"source": [
"What happens if you choose the number of nearest neighbors to be too large?"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"bad_isomap = Isomap(n_neighbors=23, n_components=1)\n",
"swiss_roll_1d_bad_isomap = bad_isomap.fit_transform(swiss_roll_2d)\n",
"\n",
"plot_1d(swiss_roll_1d_bad_isomap)"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.colorbar.Colorbar at 0x11ed8bcd0>"
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 2 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"plt.imshow(bad_isomap.dist_matrix_)\n",
"plt.colorbar()"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.colorbar.Colorbar at 0x11c8735d0>"
]
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 4 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"plt.subplot(1, 2, 1)\n",
"plt.imshow(isomap.dist_matrix_)\n",
"plt.colorbar()\n",
"\n",
"plt.subplot(1, 2, 2)\n",
"plt.imshow(bad_isomap.dist_matrix_)\n",
"plt.colorbar()"
]
},
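{
"cell_type": "markdown",
"metadata": {},
"source": [
"One way to check what went wrong (a sketch added for illustration): with 24 points, `n_neighbors=23` links every point directly to every other point, so the shortest path between two points is just the straight line between them. The cell below compares `bad_isomap.dist_matrix_` against plain Euclidean distances computed with SciPy's `pdist`. If they match, Isomap is no longer following the spiral and is effectively running MDS on ordinary distances. The name `euclidean_distances` is made up for this example."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from scipy.spatial.distance import pdist, squareform\n",
"\n",
"# ordinary straight-line distances between all pairs of points\n",
"euclidean_distances = squareform(pdist(swiss_roll_2d))\n",
"\n",
"# True when the graph distances are numerically the same as the straight-line ones\n",
"print(np.allclose(bad_isomap.dist_matrix_, euclidean_distances))"
]
},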
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## t-SNE\n",
"\n",
"The other manifold learning algorithms that scikit-learn provides are used in much the same way as PCA and Isomap; however, each method has its own parameters that you need to tune. One of the most popular manifold learning methods in practice is t-SNE. Here is an example:"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[t-SNE] Computing 7 nearest neighbors...\n",
"[t-SNE] Indexed 24 samples in 0.000s...\n",
"[t-SNE] Computed neighbors for 24 samples in 0.001s...\n",
"[t-SNE] Computed conditional probabilities for sample 24 / 24\n",
"[t-SNE] Mean sigma: 23.784069\n",
"[t-SNE] KL divergence after 250 iterations with early exaggeration: 50.924473\n",
"[t-SNE] KL divergence after 1000 iterations: 0.463246\n"
]
},
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"from sklearn.manifold import TSNE\n",
"\n",
"# *WARNING*: *despite* what the sklearn documentation says for t-SNE, the perplexity parameter\n",
"# does matter! a helpful article to read: https://distill.pub/2016/misread-tsne/\n",
"tsne = TSNE(n_components=1, perplexity=2, learning_rate=.01, init='random', verbose=1, random_state=0)\n",
"swiss_roll_1d_tsne = tsne.fit_transform(swiss_roll_2d)\n",
"plot_1d(swiss_roll_1d_tsne)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.4"
}
},
"nbformat": 4,
"nbformat_minor": 2
}