@Elijas
Created March 21, 2021 18:20
Sprendimų medis (Decision tree)
{
"cells": [
{
"cell_type": "markdown",
"id": "massive-static",
"metadata": {},
"source": [
"Elijas Dapšauskas TSf-17"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "provincial-reform",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "occupied-slovenia",
"metadata": {},
"outputs": [],
"source": [
"# Data from the slides example (used as a sanity check)\n",
"pvz_duomenys = \"\"\"0 0 0 0\n",
"0 1 1 0\n",
"1 1 1 1\n",
"1 1 0 0\n",
"1 1 1 1\"\"\"\n",
"pvz_stulp_pavad = list('SILY')\n",
"\n",
"# Data from the assignment\n",
"duomenys = \"\"\"1 1 1 1 0\n",
"0 1 1 0 1\n",
"0 0 1 1 1\n",
"1 0 1 0 1\n",
"0 1 0 0 0\n",
"1 1 1 1 0\n",
"0 0 1 1 0\n",
"0 1 0 0 1\n",
"1 0 1 0 0\"\"\"\n",
"stulp_pavad = ['Namuose', 'Lietus', 'PalaikymoKom', 'Lyderiai', 'Pergalė']"
]
},
{
"cell_type": "markdown",
"id": "higher-group",
"metadata": {},
"source": [
"Load the data."
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "later-approach",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Namuose</th>\n",
" <th>Lietus</th>\n",
" <th>PalaikymoKom</th>\n",
" <th>Lyderiai</th>\n",
" <th>Pergalė</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1.0</td>\n",
" <td>1.0</td>\n",
" <td>1.0</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>1.0</td>\n",
" <td>1.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>1.0</td>\n",
" <td>1.0</td>\n",
" <td>1.0</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
"   Namuose  Lietus  PalaikymoKom  Lyderiai  Pergalė\n",
"0      1.0     1.0           1.0       1.0      0.0\n",
"1      0.0     1.0           1.0       0.0      1.0\n",
"2      0.0     0.0           1.0       1.0      1.0\n",
"3      1.0     0.0           1.0       0.0      1.0\n",
"4      0.0     1.0           0.0       0.0      0.0\n",
"5      1.0     1.0           1.0       1.0      0.0\n",
"6      0.0     0.0           1.0       1.0      0.0\n",
"7      0.0     1.0           0.0       0.0      1.0\n",
"8      1.0     0.0           1.0       0.0      0.0"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"def nuskaityti_duomenis(duomenys, stulp_pavad):\n",
"    # One row per line, whitespace-separated 0/1 values\n",
"    df = pd.DataFrame([k.split() for k in duomenys.split('\\n')],\n",
"                      columns=stulp_pavad,\n",
"                      dtype=float)\n",
"    Y = df.iloc[:, -1]   # last column (target)\n",
"    X = df.iloc[:, :-1]  # remaining columns (features)\n",
"    return df, X, Y\n",
"\n",
"# Example\n",
"XY, X, Y = nuskaityti_duomenis(duomenys, stulp_pavad)\n",
"XY"
]
},
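{
"cell_type": "markdown",
"id": "welcome-superior",
"metadata": {},
"source": [
"We compute the Gini coefficient (the size-weighted Gini impurity of the two branches) for each column and sort the results in ascending order. The first split is made on the column with the smallest coefficient, the next split on the column with the second smallest, and so on."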
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "dimensional-elimination",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Namuose 0.433333\n",
"Lyderiai 0.433333\n",
"Lietus 0.488889\n",
"PalaikymoKom 0.492063\n",
"dtype: float64"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"def apskaiciuoti_gini(x):\n",
"    # Note: the target column is read from the global Y\n",
"    # Share of rows in each branch (x == 0 and x == 1)\n",
"    Pr_s0 = sum(x==0) / len(x)\n",
"    Pr_s1 = sum(x==1) / len(x)\n",
"    # Class proportions of Y within each branch\n",
"    Pr_y0_s0 = sum(Y.loc[x==0]==0) / sum(x==0)\n",
"    Pr_y1_s0 = sum(Y.loc[x==0]==1) / sum(x==0)\n",
"    Pr_y0_s1 = sum(Y.loc[x==1]==0) / sum(x==1)\n",
"    Pr_y1_s1 = sum(Y.loc[x==1]==1) / sum(x==1)\n",
"    gini_impurity_s0 = 1 - Pr_y0_s0**2 - Pr_y1_s0**2\n",
"    gini_impurity_s1 = 1 - Pr_y0_s1**2 - Pr_y1_s1**2\n",
"    # Branch impurities weighted by branch size\n",
"    gini_s = Pr_s0 * gini_impurity_s0 + Pr_s1 * gini_impurity_s1\n",
"    return gini_s\n",
"\n",
"# Example\n",
"X.apply(apskaiciuoti_gini).sort_values()"
]
},
{
"cell_type": "markdown",
"id": "extreme-functionality",
"metadata": {},
"source": [
"First, we create a function that traverses the graph of possible splits."
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "adult-orlando",
"metadata": {},
"outputs": [],
"source": [
"def keliauti_per_grafa(stulpeliai, ar_baigti_paieska, _istorija=None):\n",
"    # _istorija holds the (column, value) pairs chosen so far\n",
"    _istorija = _istorija or []\n",
"    # Stop when the callback says so or when no columns remain\n",
"    if ar_baigti_paieska(_istorija) or not len(stulpeliai):\n",
"        return\n",
"    # Branch on both values of the next column\n",
"    for v in (0, 1):\n",
"        nauja_istorija = _istorija.copy()\n",
"        nauja_istorija.append((stulpeliai[0], v))\n",
"        keliauti_per_grafa(stulpeliai[1:],\n",
"                           ar_baigti_paieska,\n",
"                           nauja_istorija)\n",
"\n",
"## Example\n",
"# def ar_baigti_paieska(istorija):\n",
"#     if len(istorija) == 3:\n",
"#         print(istorija)\n",
"#         return True\n",
"#     return False\n",
"# keliauti_per_grafa(['A', 'B', 'C'], ar_baigti_paieska)"
]
},
{
"cell_type": "markdown",
"id": "complicated-particle",
"metadata": {},
"source": [
"Next, we create a function that checks whether a given row filter yields a homogeneous Y column (i.e. after filtering, either all Y=1 or all Y=0)."
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "spanish-alignment",
"metadata": {},
"outputs": [],
"source": [
"def spausdinti_atsakyma(filtrai, y0, tikimybe=None):\n",
"    salyga = ' ir '.join(\n",
"        f'[{k} = {\"Taip\" if v else \"Ne\"}]' for k, v in filtrai)\n",
"    txt = (\n",
"        f'Jei {salyga}:\\n '\n",
"        f'tada Y = {int(y0) if tikimybe != 0.5 else \"0 arba 1\"}'\n",
"    )\n",
"    if tikimybe:\n",
"        txt = f\"{txt}\\n (tikimybė: {100*tikimybe:.2f}%)\"\n",
"    print(f'{txt}\\n')\n",
"\n",
"def filtruoti(df, filtrai):\n",
"    # Keep only the rows that match every (column, value) pair\n",
"    for stulp, reiksme in filtrai:\n",
"        df = df.loc[df[stulp] == reiksme]\n",
"    return df\n",
"\n",
"def trumpiausia_filtru_israiska(XY, filtrai):\n",
"    # Drop trailing filters that do not change the selected rows\n",
"    for i in reversed(range(len(filtrai))):\n",
"        if (filtruoti(XY, filtrai[:i]).shape[0]\n",
"                != filtruoti(XY, filtrai[:i+1]).shape[0]):\n",
"            return filtrai[:i+1]\n",
"\n",
"def ar_Y_homogeniskas(XY, filtrai, viso_stulpeliu):\n",
"    Y = filtruoti(XY, filtrai).iloc[:,-1]\n",
"    if not len(Y):\n",
"        return False\n",
"    y0 = Y.iloc[0]\n",
"    if not (y0 == Y).all():\n",
"        # Mixed leaf with no columns left: report the share of y0\n",
"        if len(filtrai) == viso_stulpeliu:\n",
"            spausdinti_atsakyma(\n",
"                trumpiausia_filtru_israiska(XY, filtrai),\n",
"                y0, np.mean(y0 == Y)\n",
"            )\n",
"        return False\n",
"    spausdinti_atsakyma(trumpiausia_filtru_israiska(XY, filtrai), y0)\n",
"    return True\n",
"\n",
"## Example (expects the slides data, with columns S, I, L, Y)\n",
"#filtrai = [('S', 1), ('I', 1)]\n",
"#filtruoti(XY, filtrai), ar_Y_homogeniskas(XY, filtrai, len(X.columns))"
]
},
{
"cell_type": "markdown",
"id": "christian-dutch",
"metadata": {},
"source": [
"We combine the previous two functions."
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "featured-worcester",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Jei [Namuose = Ne] ir [Lyderiai = Ne] ir [Lietus = Taip] ir [PalaikymoKom = Ne]:\n",
" tada Y = 0 arba 1\n",
" (tikimybė: 50.00%)\n",
"\n",
"Jei [Namuose = Ne] ir [Lyderiai = Ne] ir [Lietus = Taip] ir [PalaikymoKom = Taip]:\n",
" tada Y = 1\n",
"\n",
"Jei [Namuose = Ne] ir [Lyderiai = Taip]:\n",
" tada Y = 0 arba 1\n",
" (tikimybė: 50.00%)\n",
"\n",
"Jei [Namuose = Taip] ir [Lyderiai = Ne]:\n",
" tada Y = 0 arba 1\n",
" (tikimybė: 50.00%)\n",
"\n",
"Jei [Namuose = Taip] ir [Lyderiai = Taip]:\n",
" tada Y = 0\n",
"\n"
]
}
],
"source": [
"XY, X, Y = nuskaityti_duomenis(duomenys, stulp_pavad)\n",
"stulpeliu_pav = X.apply(apskaiciuoti_gini).sort_values().index\n",
"keliauti_per_grafa(\n",
"    stulpeliu_pav,\n",
"    lambda istorija: ar_Y_homogeniskas(XY, istorija, len(stulpeliu_pav))\n",
")"
]
},
{
"attachments": {
"image.png": {
"image/png": ""
}
},
"cell_type": "markdown",
"id": "ordinary-spoke",
"metadata": {},
"source": [
"![image.png](attachment:image.png)"
]
},
{
"cell_type": "markdown",
"id": "extreme-logan",
"metadata": {},
"source": [
"We verify that the code works correctly by solving the example problem from the slides:"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "narrative-right",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Jei [S = Ne]:\n",
" tada Y = 0\n",
"\n",
"Jei [S = Taip] ir [L = Ne]:\n",
" tada Y = 0\n",
"\n",
"Jei [S = Taip] ir [L = Taip]:\n",
" tada Y = 1\n",
"\n"
]
}
],
"source": [
"XY, X, Y = nuskaityti_duomenis(pvz_duomenys, pvz_stulp_pavad)\n",
"stulpeliu_pav = X.apply(apskaiciuoti_gini).sort_values().index\n",
"keliauti_per_grafa(\n",
"    stulpeliu_pav,\n",
"    lambda istorija: ar_Y_homogeniskas(XY, istorija, len(stulpeliu_pav))\n",
")"
]
},
{
"attachments": {
"image.png": {
"image/png": ""
}
},
"cell_type": "markdown",
"id": "spread-greece",
"metadata": {},
"source": [
"![image.png](attachment:image.png)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
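
The Gini step in `apskaiciuoti_gini` reads the target from the global `Y` and assumes that every feature takes both values 0 and 1. As a minimal standalone sketch of the same size-weighted Gini impurity, assuming only pandas, the hypothetical helper `weighted_gini` below (not part of the notebook) takes the target as an explicit argument and loops over whichever feature values actually occur:

```python
import pandas as pd


def weighted_gini(x: pd.Series, y: pd.Series) -> float:
    """Size-weighted Gini impurity of target y after splitting on feature x.

    Hypothetical helper for illustration; not the notebook's own function.
    """
    total = 0.0
    for value in x.unique():
        subset = y[x == value]           # target values in this branch
        weight = len(subset) / len(x)    # share of rows in this branch
        proportions = subset.value_counts(normalize=True)
        impurity = 1.0 - (proportions ** 2).sum()
        total += weight * impurity
    return total


# Assignment data, copied from the notebook
rows = """1 1 1 1 0
0 1 1 0 1
0 0 1 1 1
1 0 1 0 1
0 1 0 0 0
1 1 1 1 0
0 0 1 1 0
0 1 0 0 1
1 0 1 0 0"""
columns = ['Namuose', 'Lietus', 'PalaikymoKom', 'Lyderiai', 'Pergalė']

df = pd.DataFrame([r.split() for r in rows.split('\n')], columns=columns).astype(int)
target = df[columns[-1]]
scores = df[columns[:-1]].apply(lambda col: weighted_gini(col, target)).sort_values()
print(scores)
```

On the assignment data this should agree with the coefficients shown in the notebook output (for example, about 0.4333 for `Namuose`). Passing the target explicitly avoids the hidden dependence on a global, and iterating over `x.unique()` avoids a division by zero when a column is constant.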