# kiwamizamurai/nonpara_test.ipynb

Created Jan 22, 2019
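The notebook below ranks the absolute paired differences between columns `A` and `B`, giving tied values their average rank. As a minimal sketch of that step, assuming the eight `A`/`B` pairs shown in the notebook's sorted table, the pure-Python helper `average_ranks` (a hypothetical name, not part of the notebook) reproduces the rank column and the two signed rank sums:

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


# Paired observations in their original row order (taken from the notebook's table).
a = [85, 70, 40, 65, 80, 75, 55, 20]
b = [75, 50, 50, 40, 20, 65, 40, 25]

diff = [x - y for x, y in zip(a, b)]
ranks = average_ranks([abs(d) for d in diff])

# Sum the ranks separately for positive and negative differences.
w_plus = sum(r for r, d in zip(ranks, diff) if d > 0)
w_minus = sum(r for r, d in zip(ranks, diff) if d < 0)

print(ranks)
print(w_plus, w_minus)
```

The three differences with absolute value 10 occupy sorted positions 2, 3 and 4, so each receives rank 3. `scipy.stats.rankdata` applies the same averaging by default and could replace the helper when SciPy is available.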
 { "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Nonparametric test\n", "- http://sphweb.bumc.bu.edu/otlt/mph-modules/bs/bs704_nonparametric/index.html\n", "- http://hs-www.hyogo-dai.ac.jp/~kawano/HStat/?plugin=cssj&page=2009%2F13th%2FSign_Test\n", "- https://machinelearningmastery.com/nonparametric-statistical-significance-tests-in-python/" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This notebook works through nonparametric tests. \n", "The existing libraries serve as the reference answers while I also implement each test by hand." ] }, { "cell_type": "code", "execution_count": 314, "metadata": {}, "outputs": [], "source": [ "from numpy.random import seed\n", "from numpy.random import randn\n", "seed(1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Mann-Whitney U Test\n", "For sample sizes up to about 20" ] }, { "cell_type": "code", "execution_count": 326, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", "
    A  B
0   7  3
1   5  6
2   6  4
3   4  2
4  12  1

   index   A   B  diff
0      7  20  25    -5
1      0  85  75    10
2      2  40  50   -10
3      5  75  65    10
4      6  55  40    15
5      1  70  50    20
6      3  65  40    25
7      4  80  20    60
\n", "\n", "" ], "text/plain": [ " index A B diff\n", "0 7 20 25 -5\n", "1 0 85 75 10\n", "2 2 40 50 -10\n", "3 5 75 65 10\n", "4 6 55 40 15\n", "5 1 70 50 20\n", "6 3 65 40 25\n", "7 4 80 20 60" ] }, "execution_count": 301, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_s = df.reindex(df[\"diff\"].abs().sort_values().index)\n", "df_s = df_s.reset_index()\n", "df_s" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Check for ties among the absolute values, because tied values are assigned their average rank." ] }, { "cell_type": "code", "execution_count": 302, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "5 1\n", "10 3\n", "15 1\n", "20 1\n", "25 1\n", "60 1\n", "Name: diff, dtype: int64\n", "[1 4 5 6 7 8]\n" ] } ], "source": [ "vs = abs(df_s[\"diff\"]).value_counts()\n", "vs = vs.sort_index()\n", "print(vs)\n", "print(np.cumsum([vs]))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Create the rank column." ] }, { "cell_type": "code", "execution_count": 303, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", "
   index   A   B  diff  rank
0      7  20  25    -5     1
1      0  85  75    10     3
2      2  40  50   -10     3
3      5  75  65    10     3
4      6  55  40    15     5
5      1  70  50    20     6
6      3  65  40    25     7
7      4  80  20    60     8