@mateuszbaran
Created March 10, 2020 09:15
Laboratory 3
{"nbformat":4,"nbformat_minor":0,"metadata":{"colab":{"name":"Zajecia3.ipynb","provenance":[{"file_id":"1cJ_SNJCfd3Z58wJKKll6BVRgGuaCiCjh","timestamp":1583829844798}],"collapsed_sections":[],"toc_visible":true,"authorship_tag":"ABX9TyNWzr/gej7goxsKOgsaXwew"},"kernelspec":{"name":"python3","display_name":"Python 3"}},"cells":[{"cell_type":"markdown","metadata":{"id":"gFHSNZ-iTU2p","colab_type":"text"},"source":["# Laboratory 3: feature transformations and dimensionality reduction\n","\n","This laboratory covers popular methods for transforming sample features and for reducing dimensionality."]},{"cell_type":"code","metadata":{"id":"doIptr5uTULW","colab_type":"code","colab":{}},"source":["from sklearn import datasets\n","from sklearn.model_selection import train_test_split\n","import numpy as np\n","\n","# Note: load_boston was removed in scikit-learn 1.2, so this cell needs an older release.\n","boston = datasets.load_boston()\n","print(boston.DESCR)\n","\n","digits = datasets.load_digits()\n","\n","print(\"boston (shape): \", boston.data.shape)\n","\n","#print(boston.target)\n","\n","# Hold out part of the data for testing:\n","\n","X_train, X_test, y_train, y_test = train_test_split(\n","    boston.data, boston.target, test_size=0.2, random_state=421, shuffle=True)\n","\n","print(X_train.shape)"],"execution_count":0,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"t4LmOf37v2hu","colab_type":"text"},"source":["## PCA\n","\n","Run PCA separately for each class in the `digits` dataset. What fraction of the variance does the first principal component explain?\n","\n","For the digit 2, plot a few elements of the eigenspace. Do they resemble that digit?\n","\n","The last figure shows the coordinates of the projections onto the eigenplane spanned by the first two principal vectors. 
Color the markers in the figure by the norm of the component of each vector that is orthogonal to the eigenplane."]},{"cell_type":"code","metadata":{"id":"MQxwO46lwJB_","colab_type":"code","colab":{}},"source":["from sklearn import decomposition\n","import matplotlib.pyplot as plt\n","\n","pca = decomposition.PCA(n_components = 10)\n","#dig_pca = pca.fit()\n","digit_no = 2\n","X = digits.data[digits.target == digit_no,:]\n","\n","def show_digit(dgt):\n","    plt.figure(1, figsize=(3, 3))\n","    plt.imshow(dgt.reshape(8, 8), cmap=plt.cm.gray_r, interpolation='nearest')\n","    plt.show()\n","\n","show_digit(X[1,:])\n","\n","pca.fit(X)\n","\n","print(\"explained variance ratio: \", 0) # to be modified\n","\n","pcatr = pca.transform(digits.data)\n","zs = np.linalg.norm(digits.data, axis=1) # to be modified\n","\n","plt.figure(2, figsize = (10, 10))\n","other_digits = digits.target != digit_no\n","plt.scatter(pcatr[other_digits,0], pcatr[other_digits,1], c = zs[other_digits])\n","\n","plt.scatter(pcatr[digits.target == digit_no,0], pcatr[digits.target == digit_no,1], c = zs[digits.target == digit_no], marker = 'x')\n","plt.colorbar()\n","plt.show()"],"execution_count":0,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"m_v_q_-KlfL2","colab_type":"text"},"source":["## Nonlinear dimensionality reduction\n","\n","UMAP assumes that the data lie approximately in a single two-dimensional space (a manifold). 
Fitting consists of finding that space, which then allows, for example, the data to be visualized.\n","\n","Using 5-fold cross-validation, compare the _accuracy_ of an SVC classifier with an RBF kernel before and after the UMAP transformation."]},{"cell_type":"code","metadata":{"id":"q2009N2NlltG","colab_type":"code","colab":{}},"source":["import umap\n","from sklearn.svm import SVC\n","from sklearn import metrics\n","\n","reducer = umap.UMAP(random_state=42)\n","\n","def svm_test():\n","    X_train, X_test, y_train, y_test = train_test_split(\n","        digits.data, digits.target, test_size=0.2, random_state=421, shuffle=True)\n","\n","    reducer.fit(X_train)\n","\n","    embedding = reducer.transform(X_test)\n","\n","    plt.figure(2, (10, 10))\n","    plt.scatter(embedding[:, 0], embedding[:, 1], c=y_test, cmap='Spectral', s=5)\n","    plt.gca().set_aspect('equal', 'datalim')\n","    plt.colorbar(boundaries=np.arange(11)-0.5).set_ticks(np.arange(10))\n","    plt.title('UMAP projection', fontsize=14);\n","\n","svm_test()"],"execution_count":0,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"_p21tVotgvm6","colab_type":"text"},"source":["## Pipeline with feature selection\n","\n","Modify the pipeline below so that the feature selection step is optional. 
To do so, consult the documentation: https://scikit-learn.org/stable/modules/compose.html\n","\n","Does feature selection improve the MSE?\n","Try another feature selection method of your choice."]},{"cell_type":"code","metadata":{"id":"rP123T70g2Q_","colab_type":"code","colab":{}},"source":["from sklearn.model_selection import GridSearchCV\n","\n","from sklearn.feature_selection import SelectFromModel\n","from sklearn import linear_model\n","from sklearn.pipeline import Pipeline\n","from sklearn import tree\n","from sklearn.metrics import mean_squared_error\n","\n","boston = datasets.load_boston()\n","\n","# Hold out part of the data for testing:\n","\n","X_b_train, X_b_test, y_b_train, y_b_test = train_test_split(\n","    boston.data, boston.target, test_size=0.2, random_state=421, shuffle=True)\n","\n","pipe = Pipeline(steps=[\n","    ('feature_selection', SelectFromModel(linear_model.Lasso(alpha=0.5))),\n","    ('tree', tree.DecisionTreeRegressor()),\n","])\n","\n","param_grid = {\n","    'tree__max_depth': [1, 2, 3, 4, None],\n","    'tree__min_samples_leaf': [1, 2, 4, 8]\n","}\n","search = GridSearchCV(pipe, param_grid, scoring='???')  # placeholder: choose a scoring metric\n","\n","search.fit(X_b_train, y_b_train)\n","print(search.best_params_)\n","print(search.best_score_)\n","\n","print(\"MSE train: \", mean_squared_error(y_b_train, search.predict(X_b_train)))\n","print(\"MSE test: \", mean_squared_error(y_b_test, search.predict(X_b_test)))"],"execution_count":0,"outputs":[]}]}
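
One possible way to approach the last exercise is sketched below. It is not the notebook author's solution. It assumes that the feature selection step can be switched off by listing the string `"passthrough"` as an alternative value for that step in the parameter grid, as described in the scikit-learn compose documentation linked in the notebook, and that `neg_mean_squared_error` is an acceptable choice for the `scoring` placeholder. The data come from `make_regression`, a synthetic stand-in chosen here so the sketch runs on current scikit-learn releases; the notebook itself uses the Boston housing data.

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (an assumption; the notebook uses the Boston housing set).
X, y = make_regression(n_samples=300, n_features=13, n_informative=5,
                       noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=421, shuffle=True)

pipe = Pipeline(steps=[
    ("feature_selection", SelectFromModel(Lasso(alpha=0.5))),
    ("tree", DecisionTreeRegressor(random_state=0)),
])

param_grid = {
    # "passthrough" turns the step into a no-op, so the search compares
    # the pipeline with and without feature selection.
    "feature_selection": [SelectFromModel(Lasso(alpha=0.5)), "passthrough"],
    "tree__max_depth": [2, 4, None],
    "tree__min_samples_leaf": [1, 4, 8],
}

# scikit-learn maximizes scores, so MSE is exposed with a negative sign.
search = GridSearchCV(pipe, param_grid, scoring="neg_mean_squared_error", cv=5)
search.fit(X_train, y_train)

print(search.best_params_)
print("CV MSE:", -search.best_score_)
print("MSE test:", mean_squared_error(y_test, search.predict(X_test)))
```

Whether the selected configuration keeps the selector or uses `"passthrough"` can be read from `search.best_params_["feature_selection"]`, which gives a direct answer to the question of whether feature selection improves the MSE on a given dataset.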