Created Jan 22, 2019
 { "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Nonparametric test\n", "- http://sphweb.bumc.bu.edu/otlt/mph-modules/bs/bs704_nonparametric/index.html\n", "- http://hs-www.hyogo-dai.ac.jp/~kawano/HStat/?plugin=cssj&page=2009%2F13th%2FSign_Test\n", "- https://machinelearningmastery.com/nonparametric-statistical-significance-tests-in-python/" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This notebook works through nonparametric tests. \n", "Existing library implementations serve as the reference answers, and each test is also computed by hand." ] }, { "cell_type": "code", "execution_count": 314, "metadata": {}, "outputs": [], "source": [ "from numpy.random import seed\n", "from numpy.random import randn\n", "seed(1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Mann-Whitney U Test\n", "For small samples, up to about 20 observations." ] }, { "cell_type": "code", "execution_count": 326, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
    A  B\n",
"0   7  3\n",
"1   5  6\n",
"2   6  4\n",
"3   4  2\n",
"4  12  1\n",
"
   index   A   B  diff\n",
"0      7  20  25    -5\n",
"1      0  85  75    10\n",
"2      2  40  50   -10\n",
"3      5  75  65    10\n",
"4      6  55  40    15\n",
"5      1  70  50    20\n",
"6      3  65  40    25\n",
"7      4  80  20    60
\n" ], "text/plain": [ "   index   A   B  diff\n", "0      7  20  25    -5\n", "1      0  85  75    10\n", "2      2  40  50   -10\n", "3      5  75  65    10\n", "4      6  55  40    15\n", "5      1  70  50    20\n", "6      3  65  40    25\n", "7      4  80  20    60" ] }, "execution_count": 301, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_s = df.reindex(df[\"diff\"].abs().sort_values().index)\n", "df_s = df_s.reset_index()\n", "df_s" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Check the absolute differences for ties, because tied values are each assigned the average of the ranks they occupy." ] }, { "cell_type": "code", "execution_count": 302, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "5     1\n", "10    3\n", "15    1\n", "20    1\n", "25    1\n", "60    1\n", "Name: diff, dtype: int64\n", "[1 4 5 6 7 8]\n" ] } ], "source": [ "vs = abs(df_s[\"diff\"]).value_counts()\n", "vs = vs.sort_index()\n", "print(vs)\n", "print(np.cumsum([vs]))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now build the rank column." ] }, { "cell_type": "code", "execution_count": 303, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
   index   A   B  diff  rank\n",
"0      7  20  25    -5     1\n",
"1      0  85  75    10     3\n",
"2      2  40  50   -10     3\n",
"3      5  75  65    10     3\n",
"4      6  55  40    15     5\n",
"5      1  70  50    20     6\n",
"6      3  65  40    25     7\n",
"7      4  80  20    60     8