Skip to content

Instantly share code, notes, and snippets.

@hotchpotch
Created March 30, 2022 01:37
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save hotchpotch/e54962ed4931c621da5b7ecd24f27a9f to your computer and use it in GitHub Desktop.
Save hotchpotch/e54962ed4931c621da5b7ecd24f27a9f to your computer and use it in GitHub Desktop.
QR分解で最小二乗問題の近似解を求める
Display the source blob
Display the rendered blob
Raw
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"name": "QR分解で最小二乗問題の近似解を求める",
"provenance": [],
"collapsed_sections": [],
"authorship_tag": "ABX9TyNtOQoUWnST8t3pCa5fqSwj",
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/gist/hotchpotch/e54962ed4931c621da5b7ecd24f27a9f/qr.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"id": "0MAIY7ouQ_7M"
},
"outputs": [],
"source": [
"import numpy as np\n",
"import numpy.linalg as npl\n",
"\n",
"from sklearn.datasets import load_iris\n",
"iris = load_iris()"
]
},
{
"cell_type": "code",
"source": [
"print(iris['DESCR'])\n"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "ox2wBqNrdOq6",
"outputId": "7f7cd1a9-dd22-431d-eb0b-5d1f082a2e90"
},
"execution_count": 19,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
".. _iris_dataset:\n",
"\n",
"Iris plants dataset\n",
"--------------------\n",
"\n",
"**Data Set Characteristics:**\n",
"\n",
" :Number of Instances: 150 (50 in each of three classes)\n",
" :Number of Attributes: 4 numeric, predictive attributes and the class\n",
" :Attribute Information:\n",
" - sepal length in cm\n",
" - sepal width in cm\n",
" - petal length in cm\n",
" - petal width in cm\n",
" - class:\n",
" - Iris-Setosa\n",
" - Iris-Versicolour\n",
" - Iris-Virginica\n",
" \n",
" :Summary Statistics:\n",
"\n",
" ============== ==== ==== ======= ===== ====================\n",
" Min Max Mean SD Class Correlation\n",
" ============== ==== ==== ======= ===== ====================\n",
" sepal length: 4.3 7.9 5.84 0.83 0.7826\n",
" sepal width: 2.0 4.4 3.05 0.43 -0.4194\n",
" petal length: 1.0 6.9 3.76 1.76 0.9490 (high!)\n",
" petal width: 0.1 2.5 1.20 0.76 0.9565 (high!)\n",
" ============== ==== ==== ======= ===== ====================\n",
"\n",
" :Missing Attribute Values: None\n",
" :Class Distribution: 33.3% for each of 3 classes.\n",
" :Creator: R.A. Fisher\n",
" :Donor: Michael Marshall (MARSHALL%PLU@io.arc.nasa.gov)\n",
" :Date: July, 1988\n",
"\n",
"The famous Iris database, first used by Sir R.A. Fisher. The dataset is taken\n",
"from Fisher's paper. Note that it's the same as in R, but not as in the UCI\n",
"Machine Learning Repository, which has two wrong data points.\n",
"\n",
"This is perhaps the best known database to be found in the\n",
"pattern recognition literature. Fisher's paper is a classic in the field and\n",
"is referenced frequently to this day. (See Duda & Hart, for example.) The\n",
"data set contains 3 classes of 50 instances each, where each class refers to a\n",
"type of iris plant. One class is linearly separable from the other 2; the\n",
"latter are NOT linearly separable from each other.\n",
"\n",
".. topic:: References\n",
"\n",
" - Fisher, R.A. \"The use of multiple measurements in taxonomic problems\"\n",
" Annual Eugenics, 7, Part II, 179-188 (1936); also in \"Contributions to\n",
" Mathematical Statistics\" (John Wiley, NY, 1950).\n",
" - Duda, R.O., & Hart, P.E. (1973) Pattern Classification and Scene Analysis.\n",
" (Q327.D83) John Wiley & Sons. ISBN 0-471-22361-1. See page 218.\n",
" - Dasarathy, B.V. (1980) \"Nosing Around the Neighborhood: A New System\n",
" Structure and Classification Rule for Recognition in Partially Exposed\n",
" Environments\". IEEE Transactions on Pattern Analysis and Machine\n",
" Intelligence, Vol. PAMI-2, No. 1, 67-71.\n",
" - Gates, G.W. (1972) \"The Reduced Nearest Neighbor Rule\". IEEE Transactions\n",
" on Information Theory, May 1972, 431-433.\n",
" - See also: 1988 MLC Proceedings, 54-64. Cheeseman et al\"s AUTOCLASS II\n",
" conceptual clustering system finds 3 classes in the data.\n",
" - Many, many more ...\n"
]
}
]
},
{
"cell_type": "code",
"source": [
"data = iris['data']\n",
"data.shape"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "uiRQeqbDRP4y",
"outputId": "0bc6231f-bc4c-484d-c7c4-60a843c865c5"
},
"execution_count": 10,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"(150, 4)"
]
},
"metadata": {},
"execution_count": 10
}
]
},
{
"cell_type": "code",
"source": [
"# X はガクの長さと幅、yは花弁の長さ\n",
"X = data[:, 0:2]\n",
"X.shape"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "OWavIdD7Rg1_",
"outputId": "12121ae5-3bfd-4955-88db-0dfccab52aaf"
},
"execution_count": 11,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"(150, 2)"
]
},
"metadata": {},
"execution_count": 11
}
]
},
{
"cell_type": "code",
"source": [
"y = data[:, 2:3]\n",
"y.shape"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "zoUzHNhyRmDl",
"outputId": "cfd87fd2-49a8-47f7-b779-c5c2301bbd00"
},
"execution_count": 12,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"(150, 1)"
]
},
"metadata": {},
"execution_count": 12
}
]
},
{
"cell_type": "code",
"source": [
"X[0:5], y[0:5]"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "TBMJhkgaTr1A",
"outputId": "7cb580a7-1cd5-48fb-a6e4-8b69dc313ce8"
},
"execution_count": 13,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"(array([[5.1, 3.5],\n",
" [4.9, 3. ],\n",
" [4.7, 3.2],\n",
" [4.6, 3.1],\n",
" [5. , 3.6]]), array([[1.4],\n",
" [1.4],\n",
" [1.3],\n",
" [1.5],\n",
" [1.4]]))"
]
},
"metadata": {},
"execution_count": 13
}
]
},
{
"cell_type": "code",
"source": [
"# sklearn の線形回帰で係数切片を求める\n",
"from sklearn.linear_model import LinearRegression\n",
"lr = LinearRegression()\n",
"lr.fit(X, y)\n",
"print(lr.coef_, lr.intercept_)"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "7nPdcCznRDgy",
"outputId": "44363f8c-d959-42f8-c3dd-9ad2f1bd7997"
},
"execution_count": 14,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"[[ 1.77559255 -1.33862329]] [-2.52476151]\n"
]
}
]
},
{
"cell_type": "code",
"source": [
"# QR 分解で最小二乗問題の近似解を求める\n",
"\n",
"# 切片となる1の列を追加\n",
"X_ = np.concatenate([X, np.ones((X.shape[0], 1))], axis=1)\n",
"print(X_.shape)\n",
"print(X[0:2], X_[0:2])\n",
"\n",
"Q,R = npl.qr(X_)\n",
"npl.inv(R) @ (Q.T @ y)\n"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "JKi4vBk6SKbn",
"outputId": "93dbf5f4-b9e1-4d1e-8e7b-9da08efdec86"
},
"execution_count": 15,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"(150, 3)\n",
"[[5.1 3.5]\n",
" [4.9 3. ]] [[5.1 3.5 1. ]\n",
" [4.9 3. 1. ]]\n"
]
},
{
"output_type": "execute_result",
"data": {
"text/plain": [
"array([[ 1.77559255],\n",
" [-1.33862329],\n",
" [-2.52476151]])"
]
},
"metadata": {},
"execution_count": 15
}
]
},
{
"cell_type": "code",
"source": [
"# 擬似逆行列から求める\n",
"\n",
"npl.pinv(X_) @ y"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "65jKFSFHWLjh",
"outputId": "bf5cadba-da06-4500-f010-a650b9e6c6cd"
},
"execution_count": 16,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"array([[ 1.77559255],\n",
" [-1.33862329],\n",
" [-2.52476151]])"
]
},
"metadata": {},
"execution_count": 16
}
]
},
{
"cell_type": "markdown",
"source": [
"# 行列最小二乗問題の近似解"
],
"metadata": {
"id": "vYBBCJ5Jd3l8"
}
},
{
"cell_type": "code",
"source": [
"# y をガクの長さだけでなく、幅も入れて求める。\n",
"y = data[:, 2:4]\n",
"y.shape\n"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "UcktmtNNdXBQ",
"outputId": "b7390d04-05a6-4079-c3a6-f19b48ab597a"
},
"execution_count": 20,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"(150, 2)"
]
},
"metadata": {},
"execution_count": 20
}
]
},
{
"cell_type": "code",
"source": [
"lr = LinearRegression()\n",
"lr.fit(X, y)\n",
"print(lr.coef_, lr.intercept_)"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "1PWTidx5dmUI",
"outputId": "067631d3-3324-4c89-f382-0fa5b28fb5dd"
},
"execution_count": 21,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"[[ 1.77559255 -1.33862329]\n",
" [ 0.723292 -0.47872132]] [-2.52476151 -1.56349227]\n"
]
}
]
},
{
"cell_type": "code",
"source": [
"Q,R = npl.qr(X_)\n",
"npl.inv(R) @ (Q.T @ y)\n"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "-Ttdp4zodri6",
"outputId": "59134228-5f48-4be9-dd6d-e403bf487a0f"
},
"execution_count": 22,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"(150, 3)\n",
"[[5.1 3.5]\n",
" [4.9 3. ]] [[5.1 3.5 1. ]\n",
" [4.9 3. 1. ]]\n"
]
},
{
"output_type": "execute_result",
"data": {
"text/plain": [
"array([[ 1.77559255, 0.723292 ],\n",
" [-1.33862329, -0.47872132],\n",
" [-2.52476151, -1.56349227]])"
]
},
"metadata": {},
"execution_count": 22
}
]
},
{
"cell_type": "code",
"source": [
"npl.pinv(X_) @ y"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "ZJ6RXhNadt9N",
"outputId": "3ddac11d-8899-4825-da5a-222daf379feb"
},
"execution_count": 23,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"array([[ 1.77559255, 0.723292 ],\n",
" [-1.33862329, -0.47872132],\n",
" [-2.52476151, -1.56349227]])"
]
},
"metadata": {},
"execution_count": 23
}
]
}
]
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment