@HyeongWookKim
Created June 4, 2020 14:14
[Pandas Basics] from "판다스 10분 완성 - 데잇걸즈 2" (Pandas in 10 Minutes, DATAITGIRLS 2)
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 판다스(Pandas) 기초\n",
"<참고자료 및 코드 출처>\n",
"1. DANDYRILLA님의 github \"판다스(pandas) 기본 사용법 익히기\"\n",
"2. 파이썬 머신러닝 완벽 가이드(권철민 저)\n",
"3. 김도형의 데이터 사이언스 스쿨\n",
"4. R, Python 분석과 프로그래밍의 친구(by R Friend)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**pandas 외에도 배열 구조나 랜덤 값 생성 등의 기능을 활용하기 위한 numpy 와 그래프를 그리기 위한 matplotlib 패키지들도 함께 import**"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"ExecuteTime": {
"end_time": "2020-05-21T06:00:58.751811Z",
"start_time": "2020-05-21T06:00:58.023935Z"
}
},
"outputs": [],
"source": [
"import pandas as pd\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 데이터 object 생성하기\n",
"- 데이터 object는 \"데이터를 담고 있는 그릇\"이라고 생각하면 이해하기 쉽다\n",
"- 판다스에서 자주 사용하게 될 데이터 object는 \"Series\" 와 \"DataFrame\"이 있다\n",
"- Series는 데이터를 1차원 배열(칼럼이 하나)로 담고 있고, DataFrame은 데이터를 2차원 배열(칼럼이 여러 개)로 담고 있다\n",
"- DataFrame은 여러 개의 Series로 이루어졌다"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Series는 다음과 같이 값의 리스트를 넘겨줘서 만들 수 있다\n",
"- index의 경우, 0 부터 시작해서 1씩 증가하는 정수 index가 사용된다"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"ExecuteTime": {
"end_time": "2020-05-21T06:01:02.757601Z",
"start_time": "2020-05-21T06:01:02.748625Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"0 1.0\n",
"1 3.0\n",
"2 5.0\n",
"3 NaN\n",
"4 6.0\n",
"5 8.0\n",
"dtype: float64"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"s = pd.Series([1, 3, 5, np.nan, 6, 8]) # np.nan 은 \"NaN(Na와 Null을 모두 표현)\"을 의미\n",
"s"
]
},
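{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch, not part of the original tutorial: the integer index is only the default, and explicit labels can be passed through the `index` argument. The name `s_labeled` and the labels 'a' to 'f' are made up for illustration, and `pd` and `np` are assumed to come from the import cell above. `.isna()` then marks which entries are missing."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Hypothetical labels 'a'..'f', chosen only for this illustration\n",
"s_labeled = pd.Series([1, 3, 5, np.nan, 6, 8], index = list('abcdef'))\n",
"\n",
"# True where the value is missing (NaN), False elsewhere\n",
"s_labeled.isna()"
]
},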
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### DataFrame은 여러 형태의 데이터를 받아 생성할 수 있다\n",
"- numpy array를 받아서 DataFrame 생성이 가능하다\n",
"- pd.DataFrame() 이라는 클래스 생성자를 사용한다"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"ExecuteTime": {
"end_time": "2020-05-21T06:01:03.359363Z",
"start_time": "2020-05-21T06:01:03.338420Z"
}
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>A</th>\n",
" <th>B</th>\n",
" <th>C</th>\n",
" <th>D</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>2013-01-01</th>\n",
" <td>0.472836</td>\n",
" <td>-0.000761</td>\n",
" <td>-1.040429</td>\n",
" <td>0.152076</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2013-01-02</th>\n",
" <td>0.461397</td>\n",
" <td>-0.669242</td>\n",
" <td>-0.225294</td>\n",
" <td>-0.952451</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2013-01-03</th>\n",
" <td>0.324977</td>\n",
" <td>0.175157</td>\n",
" <td>-0.839797</td>\n",
" <td>-0.093737</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2013-01-04</th>\n",
" <td>0.941575</td>\n",
" <td>-1.211694</td>\n",
" <td>-0.811266</td>\n",
" <td>-0.076963</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2013-01-05</th>\n",
" <td>1.458312</td>\n",
" <td>-0.885358</td>\n",
" <td>-1.660743</td>\n",
" <td>2.021500</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2013-01-06</th>\n",
" <td>-1.035226</td>\n",
" <td>1.476234</td>\n",
" <td>0.109521</td>\n",
" <td>-0.435898</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" A B C D\n",
"2013-01-01 0.472836 -0.000761 -1.040429 0.152076\n",
"2013-01-02 0.461397 -0.669242 -0.225294 -0.952451\n",
"2013-01-03 0.324977 0.175157 -0.839797 -0.093737\n",
"2013-01-04 0.941575 -1.211694 -0.811266 -0.076963\n",
"2013-01-05 1.458312 -0.885358 -1.660743 2.021500\n",
"2013-01-06 -1.035226 1.476234 0.109521 -0.435898"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# 날짜 값들을 만들기\n",
"dates = pd.date_range('20130101', periods = 6)\n",
"\n",
"# 행에 해당하는 기준인 인덱스를 index라는 인수로 전달\n",
"# 열에 해당하는 기준인 컬럼을 columns라는 인수로 전달\n",
"# <참고> np.random.randn(m, n): 평균 0, 표준편차 1의 표준정규분포 난수로 이루어진 matrix array(m, n)를 생성\n",
"df = pd.DataFrame(np.random.randn(6, 4), index = dates, columns = list('ABCD'))\n",
"df"
]
},
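{
"cell_type": "markdown",
"metadata": {},
"source": [
"Because np.random.randn draws new numbers on every run, the values shown above will differ each time the cell is executed. As a hedged sketch (an addition, not from the original tutorial), a seeded generator makes the table reproducible. The seed 0 and the names `rng` and `df_seeded` are arbitrary choices, `dates` is the variable defined in the cell above, and `np.random.default_rng` assumes numpy 1.17 or newer."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Seeded generator: the seed value 0 is an arbitrary assumption\n",
"rng = np.random.default_rng(0)\n",
"\n",
"# Same shape, index, and columns as df, but reproducible across runs\n",
"df_seeded = pd.DataFrame(rng.standard_normal((6, 4)), index = dates, columns = list('ABCD'))\n",
"df_seeded"
]
},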
{
"cell_type": "markdown",
"metadata": {},
"source": [
"- 여러 종류의 자료들이 담긴 딕셔너리(dict)를 받아서 DataFrame을 만들 수 있다\n",
" - 딕셔너리의 key 값이 열을 정의하는 column\n",
" - 행을 정의하는 index는 0 부터 시작해서 1씩 증가하는 정수 index"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"ExecuteTime": {
"end_time": "2020-05-21T06:01:03.800274Z",
"start_time": "2020-05-21T06:01:03.783322Z"
}
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>A</th>\n",
" <th>B</th>\n",
" <th>C</th>\n",
" <th>D</th>\n",
" <th>E</th>\n",
" <th>F</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1.0</td>\n",
" <td>2013-01-02</td>\n",
" <td>1.0</td>\n",
" <td>3</td>\n",
" <td>test</td>\n",
" <td>foo</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1.0</td>\n",
" <td>2013-01-02</td>\n",
" <td>1.0</td>\n",
" <td>3</td>\n",
" <td>train</td>\n",
" <td>foo</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>1.0</td>\n",
" <td>2013-01-02</td>\n",
" <td>1.0</td>\n",
" <td>3</td>\n",
" <td>test</td>\n",
" <td>foo</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>1.0</td>\n",
" <td>2013-01-02</td>\n",
" <td>1.0</td>\n",
" <td>3</td>\n",
" <td>train</td>\n",
" <td>foo</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" A B C D E F\n",
"0 1.0 2013-01-02 1.0 3 test foo\n",
"1 1.0 2013-01-02 1.0 3 train foo\n",
"2 1.0 2013-01-02 1.0 3 test foo\n",
"3 1.0 2013-01-02 1.0 3 train foo"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df2 = pd.DataFrame({'A': 1.,\n",
" 'B': pd.Timestamp('20130102'), # pd.Timestamp() 는 주어진 값을 'yyyy-mm-dd hh:mm:ss' 형태로 반환\n",
" 'C': pd.Series(1, index = list(range(4)), dtype = 'float32'),\n",
" 'D': np.array([3] * 4, dtype = 'int32'),\n",
" 'E': pd.Categorical(['test', 'train', 'test', 'train']),\n",
" 'F': 'foo'})\n",
"df2"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"ExecuteTime": {
"end_time": "2020-05-21T06:01:03.973351Z",
"start_time": "2020-05-21T06:01:03.967369Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"A float64\n",
"B datetime64[ns]\n",
"C float32\n",
"D int32\n",
"E category\n",
"F object\n",
"dtype: object"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# 위에서 만든 df2의 데이터 타입 확인\n",
"df2.dtypes"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**<참고> 주피터에서 'df2.'를 입력하고 탭을 누르면, dtypes 외에 다른 속성들을 확인할 수 있다**"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 데이터 확인하기\n",
"- 데이터 프레임에 있는 자료들 중, 몇 개의 자료들만 확인해보고 싶다면 .head() 와 .tail() 메소드를 사용하면 된다"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"ExecuteTime": {
"end_time": "2020-05-21T06:01:04.536026Z",
"start_time": "2020-05-21T06:01:04.519074Z"
}
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>A</th>\n",
" <th>B</th>\n",
" <th>C</th>\n",
" <th>D</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>2013-01-01</th>\n",
" <td>0.472836</td>\n",
" <td>-0.000761</td>\n",
" <td>-1.040429</td>\n",
" <td>0.152076</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2013-01-02</th>\n",
" <td>0.461397</td>\n",
" <td>-0.669242</td>\n",
" <td>-0.225294</td>\n",
" <td>-0.952451</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2013-01-03</th>\n",
" <td>0.324977</td>\n",
" <td>0.175157</td>\n",
" <td>-0.839797</td>\n",
" <td>-0.093737</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2013-01-04</th>\n",
" <td>0.941575</td>\n",
" <td>-1.211694</td>\n",
" <td>-0.811266</td>\n",
" <td>-0.076963</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2013-01-05</th>\n",
" <td>1.458312</td>\n",
" <td>-0.885358</td>\n",
" <td>-1.660743</td>\n",
" <td>2.021500</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" A B C D\n",
"2013-01-01 0.472836 -0.000761 -1.040429 0.152076\n",
"2013-01-02 0.461397 -0.669242 -0.225294 -0.952451\n",
"2013-01-03 0.324977 0.175157 -0.839797 -0.093737\n",
"2013-01-04 0.941575 -1.211694 -0.811266 -0.076963\n",
"2013-01-05 1.458312 -0.885358 -1.660743 2.021500"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# 괄호 안에 아무런 숫자도 안 넣으면, default = 5가 적용됨\n",
"df.head()"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"ExecuteTime": {
"end_time": "2020-05-21T06:01:04.725020Z",
"start_time": "2020-05-21T06:01:04.708065Z"
},
"scrolled": true
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>A</th>\n",
" <th>B</th>\n",
" <th>C</th>\n",
" <th>D</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>2013-01-04</th>\n",
" <td>0.941575</td>\n",
" <td>-1.211694</td>\n",
" <td>-0.811266</td>\n",
" <td>-0.076963</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2013-01-05</th>\n",
" <td>1.458312</td>\n",
" <td>-0.885358</td>\n",
" <td>-1.660743</td>\n",
" <td>2.021500</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2013-01-06</th>\n",
" <td>-1.035226</td>\n",
" <td>1.476234</td>\n",
" <td>0.109521</td>\n",
" <td>-0.435898</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" A B C D\n",
"2013-01-04 0.941575 -1.211694 -0.811266 -0.076963\n",
"2013-01-05 1.458312 -0.885358 -1.660743 2.021500\n",
"2013-01-06 -1.035226 1.476234 0.109521 -0.435898"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.tail(3)"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"ExecuteTime": {
"end_time": "2020-05-21T06:01:04.906534Z",
"start_time": "2020-05-21T06:01:04.899552Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04',\n",
" '2013-01-05', '2013-01-06'],\n",
" dtype='datetime64[ns]', freq='D')"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# 데이터 프레임의 인덱스 확인: .index\n",
"df.index"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"ExecuteTime": {
"end_time": "2020-05-21T06:01:05.087050Z",
"start_time": "2020-05-21T06:01:05.079078Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"Index(['A', 'B', 'C', 'D'], dtype='object')"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# 데이터 프레임의 컬럼 확인: .columns\n",
"df.columns"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"ExecuteTime": {
"end_time": "2020-05-21T06:01:05.272553Z",
"start_time": "2020-05-21T06:01:05.265580Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"array([[ 4.72835714e-01, -7.60948916e-04, -1.04042872e+00,\n",
" 1.52076047e-01],\n",
" [ 4.61396680e-01, -6.69242026e-01, -2.25293813e-01,\n",
" -9.52451303e-01],\n",
" [ 3.24977169e-01, 1.75157096e-01, -8.39796781e-01,\n",
" -9.37373381e-02],\n",
" [ 9.41575171e-01, -1.21169435e+00, -8.11265968e-01,\n",
" -7.69632027e-02],\n",
" [ 1.45831196e+00, -8.85357733e-01, -1.66074277e+00,\n",
" 2.02150000e+00],\n",
" [-1.03522590e+00, 1.47623394e+00, 1.09521461e-01,\n",
" -4.35897845e-01]])"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# 데이터 프레임 안에 들어있는 numpy 데이터 확인: .values\n",
"df.values"
]
},
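{
"cell_type": "markdown",
"metadata": {},
"source": [
"A small addition to the original: the pandas documentation recommends `DataFrame.to_numpy()` over `.values` for getting the underlying array. The sketch below assumes the `df` defined earlier; the name `arr` is made up for illustration."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# to_numpy() returns the DataFrame's data as a numpy array, like .values\n",
"arr = df.to_numpy()\n",
"\n",
"# The array has one row per index entry and one column per DataFrame column\n",
"arr.shape"
]
},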
{
"cell_type": "markdown",
"metadata": {},
"source": [
"- 데이터 프레임의 기초통계량 확인: .describe() 메소드 사용\n",
" - count\n",
" - mean\n",
" - std\n",
" - min\n",
" - 4분위수(25%, 50%, 75%)\n",
" - max"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"ExecuteTime": {
"end_time": "2020-05-21T06:01:05.662513Z",
"start_time": "2020-05-21T06:01:05.629600Z"
}
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>A</th>\n",
" <th>B</th>\n",
" <th>C</th>\n",
" <th>D</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>count</th>\n",
" <td>6.000000</td>\n",
" <td>6.000000</td>\n",
" <td>6.000000</td>\n",
" <td>6.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>mean</th>\n",
" <td>0.437312</td>\n",
" <td>-0.185944</td>\n",
" <td>-0.744668</td>\n",
" <td>0.102421</td>\n",
" </tr>\n",
" <tr>\n",
" <th>std</th>\n",
" <td>0.834212</td>\n",
" <td>0.969788</td>\n",
" <td>0.622823</td>\n",
" <td>1.015729</td>\n",
" </tr>\n",
" <tr>\n",
" <th>min</th>\n",
" <td>-1.035226</td>\n",
" <td>-1.211694</td>\n",
" <td>-1.660743</td>\n",
" <td>-0.952451</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25%</th>\n",
" <td>0.359082</td>\n",
" <td>-0.831329</td>\n",
" <td>-0.990271</td>\n",
" <td>-0.350358</td>\n",
" </tr>\n",
" <tr>\n",
" <th>50%</th>\n",
" <td>0.467116</td>\n",
" <td>-0.335001</td>\n",
" <td>-0.825531</td>\n",
" <td>-0.085350</td>\n",
" </tr>\n",
" <tr>\n",
" <th>75%</th>\n",
" <td>0.824390</td>\n",
" <td>0.131178</td>\n",
" <td>-0.371787</td>\n",
" <td>0.094816</td>\n",
" </tr>\n",
" <tr>\n",
" <th>max</th>\n",
" <td>1.458312</td>\n",
" <td>1.476234</td>\n",
" <td>0.109521</td>\n",
" <td>2.021500</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" A B C D\n",
"count 6.000000 6.000000 6.000000 6.000000\n",
"mean 0.437312 -0.185944 -0.744668 0.102421\n",
"std 0.834212 0.969788 0.622823 1.015729\n",
"min -1.035226 -1.211694 -1.660743 -0.952451\n",
"25% 0.359082 -0.831329 -0.990271 -0.350358\n",
"50% 0.467116 -0.335001 -0.825531 -0.085350\n",
"75% 0.824390 0.131178 -0.371787 0.094816\n",
"max 1.458312 1.476234 0.109521 2.021500"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.describe()"
]
},
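{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a hedged sketch beyond the original tutorial: `.describe()` accepts a `percentiles` argument when quantiles other than the quartiles are wanted. The 10% and 90% levels below are arbitrary example values; pandas includes the median (50%) in the result as well."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Example percentile levels (arbitrary choice for illustration)\n",
"df.describe(percentiles = [0.1, 0.9])"
]
},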
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"ExecuteTime": {
"end_time": "2020-05-21T06:01:05.829170Z",
"start_time": "2020-05-21T06:01:05.815223Z"
}
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>2013-01-01</th>\n",
" <th>2013-01-02</th>\n",
" <th>2013-01-03</th>\n",
" <th>2013-01-04</th>\n",
" <th>2013-01-05</th>\n",
" <th>2013-01-06</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>A</th>\n",
" <td>0.472836</td>\n",
" <td>0.461397</td>\n",
" <td>0.324977</td>\n",
" <td>0.941575</td>\n",
" <td>1.458312</td>\n",
" <td>-1.035226</td>\n",
" </tr>\n",
" <tr>\n",
" <th>B</th>\n",
" <td>-0.000761</td>\n",
" <td>-0.669242</td>\n",
" <td>0.175157</td>\n",
" <td>-1.211694</td>\n",
" <td>-0.885358</td>\n",
" <td>1.476234</td>\n",
" </tr>\n",
" <tr>\n",
" <th>C</th>\n",
" <td>-1.040429</td>\n",
" <td>-0.225294</td>\n",
" <td>-0.839797</td>\n",
" <td>-0.811266</td>\n",
" <td>-1.660743</td>\n",
" <td>0.109521</td>\n",
" </tr>\n",
" <tr>\n",
" <th>D</th>\n",
" <td>0.152076</td>\n",
" <td>-0.952451</td>\n",
" <td>-0.093737</td>\n",
" <td>-0.076963</td>\n",
" <td>2.021500</td>\n",
" <td>-0.435898</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" 2013-01-01 2013-01-02 2013-01-03 2013-01-04 2013-01-05 2013-01-06\n",
"A 0.472836 0.461397 0.324977 0.941575 1.458312 -1.035226\n",
"B -0.000761 -0.669242 0.175157 -1.211694 -0.885358 1.476234\n",
"C -1.040429 -0.225294 -0.839797 -0.811266 -1.660743 0.109521\n",
"D 0.152076 -0.952451 -0.093737 -0.076963 2.021500 -0.435898"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# 데이터 프레임을 transpose 시키기: .T 속성(메소드가 아니라 속성)\n",
"df.T"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"ExecuteTime": {
"end_time": "2020-05-21T06:01:06.041650Z",
"start_time": "2020-05-21T06:01:05.960868Z"
}
},
"outputs": [
{
"ename": "TypeError",
"evalue": "'DataFrame' object is not callable",
"output_type": "error",
"traceback": [
"\u001b[1;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[1;31mTypeError\u001b[0m Traceback (most recent call last)",
"\u001b[1;32m<ipython-input-13-4dd480cba40b>\u001b[0m in \u001b[0;36m<module>\u001b[1;34m\u001b[0m\n\u001b[0;32m 1\u001b[0m \u001b[1;31m# .T를 \"메소드\"로 착각하면 안된다!! .T는 \"속성\"이다!!\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 2\u001b[0m \u001b[1;31m# 따라서 .T()로 호출하는 경우 에러가 발생한다\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m----> 3\u001b[1;33m \u001b[0mdf\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mT\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m",
"\u001b[1;31mTypeError\u001b[0m: 'DataFrame' object is not callable"
]
}
],
"source": [
"# .T를 \"메소드\"로 착각하면 안된다!! .T는 \"속성\"이다!!\n",
"# 따라서 .T()로 호출하는 경우 에러가 발생한다\n",
"df.T()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"- 행과 열 이름을 정렬하기: .sort_index() 메소드\n",
" - axis = 0 : index를 기준으로 정렬\n",
" - axis = 1 : column을 기준으로 정렬\n",
" - ascending = True : 오름차순 정렬\n",
" - ascending = False : 내림차순 정렬"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"ExecuteTime": {
"end_time": "2020-05-21T06:01:06.341965Z",
"start_time": "2020-05-21T06:01:06.326009Z"
}
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>D</th>\n",
" <th>C</th>\n",
" <th>B</th>\n",
" <th>A</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>2013-01-01</th>\n",
" <td>0.152076</td>\n",
" <td>-1.040429</td>\n",
" <td>-0.000761</td>\n",
" <td>0.472836</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2013-01-02</th>\n",
" <td>-0.952451</td>\n",
" <td>-0.225294</td>\n",
" <td>-0.669242</td>\n",
" <td>0.461397</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2013-01-03</th>\n",
" <td>-0.093737</td>\n",
" <td>-0.839797</td>\n",
" <td>0.175157</td>\n",
" <td>0.324977</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2013-01-04</th>\n",
" <td>-0.076963</td>\n",
" <td>-0.811266</td>\n",
" <td>-1.211694</td>\n",
" <td>0.941575</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2013-01-05</th>\n",
" <td>2.021500</td>\n",
" <td>-1.660743</td>\n",
" <td>-0.885358</td>\n",
" <td>1.458312</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2013-01-06</th>\n",
" <td>-0.435898</td>\n",
" <td>0.109521</td>\n",
" <td>1.476234</td>\n",
" <td>-1.035226</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" D C B A\n",
"2013-01-01 0.152076 -1.040429 -0.000761 0.472836\n",
"2013-01-02 -0.952451 -0.225294 -0.669242 0.461397\n",
"2013-01-03 -0.093737 -0.839797 0.175157 0.324977\n",
"2013-01-04 -0.076963 -0.811266 -1.211694 0.941575\n",
"2013-01-05 2.021500 -1.660743 -0.885358 1.458312\n",
"2013-01-06 -0.435898 0.109521 1.476234 -1.035226"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.sort_index(axis = 1, ascending = False)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"- 데이터 프레임 내부에 있는 값으로 정렬하기: .sort_values() 메소드"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"ExecuteTime": {
"end_time": "2020-05-21T06:01:06.708965Z",
"start_time": "2020-05-21T06:01:06.694006Z"
}
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>A</th>\n",
" <th>B</th>\n",
" <th>C</th>\n",
" <th>D</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>2013-01-04</th>\n",
" <td>0.941575</td>\n",
" <td>-1.211694</td>\n",
" <td>-0.811266</td>\n",
" <td>-0.076963</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2013-01-05</th>\n",
" <td>1.458312</td>\n",
" <td>-0.885358</td>\n",
" <td>-1.660743</td>\n",
" <td>2.021500</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2013-01-02</th>\n",
" <td>0.461397</td>\n",
" <td>-0.669242</td>\n",
" <td>-0.225294</td>\n",
" <td>-0.952451</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2013-01-01</th>\n",
" <td>0.472836</td>\n",
" <td>-0.000761</td>\n",
" <td>-1.040429</td>\n",
" <td>0.152076</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2013-01-03</th>\n",
" <td>0.324977</td>\n",
" <td>0.175157</td>\n",
" <td>-0.839797</td>\n",
" <td>-0.093737</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2013-01-06</th>\n",
" <td>-1.035226</td>\n",
" <td>1.476234</td>\n",
" <td>0.109521</td>\n",
" <td>-0.435898</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" A B C D\n",
"2013-01-04 0.941575 -1.211694 -0.811266 -0.076963\n",
"2013-01-05 1.458312 -0.885358 -1.660743 2.021500\n",
"2013-01-02 0.461397 -0.669242 -0.225294 -0.952451\n",
"2013-01-01 0.472836 -0.000761 -1.040429 0.152076\n",
"2013-01-03 0.324977 0.175157 -0.839797 -0.093737\n",
"2013-01-06 -1.035226 1.476234 0.109521 -0.435898"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# 'B' column에 대해 정렬\n",
"df.sort_values(by = \"B\")"
]
},
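{
"cell_type": "markdown",
"metadata": {},
"source": [
"A short sketch that is not in the original: `.sort_values()` also takes the `ascending` argument, and `by` can be a list of columns so that later columns break ties. The name `sorted_desc` and the choice of columns 'A' and 'B' are only for illustration; `df` is the DataFrame defined earlier."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sort by 'B' from largest to smallest (hypothetical variable name)\n",
"sorted_desc = df.sort_values(by = 'B', ascending = False)\n",
"\n",
"# Sort by 'A' descending, then by 'B' ascending where 'A' values tie\n",
"df.sort_values(by = ['A', 'B'], ascending = [False, True])"
]
},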
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 데이터 선택하기\n",
"- [] 슬라이싱 사용"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"ExecuteTime": {
"end_time": "2020-05-21T06:01:07.402471Z",
"start_time": "2020-05-21T06:01:07.394494Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"2013-01-01 0.472836\n",
"2013-01-02 0.461397\n",
"2013-01-03 0.324977\n",
"2013-01-04 0.941575\n",
"2013-01-05 1.458312\n",
"2013-01-06 -1.035226\n",
"Freq: D, Name: A, dtype: float64"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# 'A'라는 이름을 가진 column의 데이터만 추출\n",
"df['A']"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"ExecuteTime": {
"end_time": "2020-05-21T06:01:07.659636Z",
"start_time": "2020-05-21T06:01:07.652659Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"pandas.core.series.Series"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"type(df['A'])"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"ExecuteTime": {
"end_time": "2020-05-21T06:01:07.847042Z",
"start_time": "2020-05-21T06:01:07.836059Z"
}
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>A</th>\n",
" <th>B</th>\n",
" <th>C</th>\n",
" <th>D</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>2013-01-01</th>\n",
" <td>0.472836</td>\n",
" <td>-0.000761</td>\n",
" <td>-1.040429</td>\n",
" <td>0.152076</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2013-01-02</th>\n",
" <td>0.461397</td>\n",
" <td>-0.669242</td>\n",
" <td>-0.225294</td>\n",
" <td>-0.952451</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2013-01-03</th>\n",
" <td>0.324977</td>\n",
" <td>0.175157</td>\n",
" <td>-0.839797</td>\n",
" <td>-0.093737</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" A B C D\n",
"2013-01-01 0.472836 -0.000761 -1.040429 0.152076\n",
"2013-01-02 0.461397 -0.669242 -0.225294 -0.952451\n",
"2013-01-03 0.324977 0.175157 -0.839797 -0.093737"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# 0, 1, 2행을 추출 (특정 '행 범위'의 데이터 추출)\n",
"df[0:3]"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
"ExecuteTime": {
"end_time": "2020-05-21T06:01:08.079289Z",
"start_time": "2020-05-21T06:01:08.061337Z"
}
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>A</th>\n",
" <th>B</th>\n",
" <th>C</th>\n",
" <th>D</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>2013-01-02</th>\n",
" <td>0.461397</td>\n",
" <td>-0.669242</td>\n",
" <td>-0.225294</td>\n",
" <td>-0.952451</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2013-01-03</th>\n",
" <td>0.324977</td>\n",
" <td>0.175157</td>\n",
" <td>-0.839797</td>\n",
" <td>-0.093737</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2013-01-04</th>\n",
" <td>0.941575</td>\n",
" <td>-1.211694</td>\n",
" <td>-0.811266</td>\n",
" <td>-0.076963</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" A B C D\n",
"2013-01-02 0.461397 -0.669242 -0.225294 -0.952451\n",
"2013-01-03 0.324977 0.175157 -0.839797 -0.093737\n",
"2013-01-04 0.941575 -1.211694 -0.811266 -0.076963"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# index명에 해당하는 값들 추출\n",
"df['20130102':'20130104']"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {
"ExecuteTime": {
"end_time": "2020-05-21T06:01:08.405847Z",
"start_time": "2020-05-21T06:01:08.353988Z"
}
},
"outputs": [
{
"ename": "KeyError",
"evalue": "'20130102'",
"output_type": "error",
"traceback": [
"\u001b[1;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[1;31mKeyError\u001b[0m Traceback (most recent call last)",
"\u001b[1;32m~\\anaconda3\\lib\\site-packages\\pandas\\core\\indexes\\base.py\u001b[0m in \u001b[0;36mget_loc\u001b[1;34m(self, key, method, tolerance)\u001b[0m\n\u001b[0;32m 2645\u001b[0m \u001b[1;32mtry\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m-> 2646\u001b[1;33m \u001b[1;32mreturn\u001b[0m \u001b[0mself\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0m_engine\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mget_loc\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mkey\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m 2647\u001b[0m \u001b[1;32mexcept\u001b[0m \u001b[0mKeyError\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n",
"\u001b[1;32mpandas\\_libs\\index.pyx\u001b[0m in \u001b[0;36mpandas._libs.index.IndexEngine.get_loc\u001b[1;34m()\u001b[0m\n",
"\u001b[1;32mpandas\\_libs\\index.pyx\u001b[0m in \u001b[0;36mpandas._libs.index.IndexEngine.get_loc\u001b[1;34m()\u001b[0m\n",
"\u001b[1;32mpandas\\_libs\\hashtable_class_helper.pxi\u001b[0m in \u001b[0;36mpandas._libs.hashtable.PyObjectHashTable.get_item\u001b[1;34m()\u001b[0m\n",
"\u001b[1;32mpandas\\_libs\\hashtable_class_helper.pxi\u001b[0m in \u001b[0;36mpandas._libs.hashtable.PyObjectHashTable.get_item\u001b[1;34m()\u001b[0m\n",
"\u001b[1;31mKeyError\u001b[0m: '20130102'",
"\nDuring handling of the above exception, another exception occurred:\n",
"\u001b[1;31mKeyError\u001b[0m Traceback (most recent call last)",
"\u001b[1;32m<ipython-input-20-1c27f7d17bf2>\u001b[0m in \u001b[0;36m<module>\u001b[1;34m\u001b[0m\n\u001b[0;32m 1\u001b[0m \u001b[1;31m# 특정 행 하나를 가져오고 싶은 경우\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 2\u001b[0m \u001b[1;31m# 다음과 같이 입력하면, '20130102'라는 이름의 'index'가 아니라 'column'을 갖고 있는지 찾게 되므로 에러 발생!!\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m----> 3\u001b[1;33m \u001b[0mdf\u001b[0m\u001b[1;33m[\u001b[0m\u001b[1;34m'20130102'\u001b[0m\u001b[1;33m]\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m",
"\u001b[1;32m~\\anaconda3\\lib\\site-packages\\pandas\\core\\frame.py\u001b[0m in \u001b[0;36m__getitem__\u001b[1;34m(self, key)\u001b[0m\n\u001b[0;32m 2798\u001b[0m \u001b[1;32mif\u001b[0m \u001b[0mself\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mcolumns\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mnlevels\u001b[0m \u001b[1;33m>\u001b[0m \u001b[1;36m1\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 2799\u001b[0m \u001b[1;32mreturn\u001b[0m \u001b[0mself\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0m_getitem_multilevel\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mkey\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m-> 2800\u001b[1;33m \u001b[0mindexer\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mself\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mcolumns\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mget_loc\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mkey\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m 2801\u001b[0m \u001b[1;32mif\u001b[0m \u001b[0mis_integer\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mindexer\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 2802\u001b[0m \u001b[0mindexer\u001b[0m \u001b[1;33m=\u001b[0m \u001b[1;33m[\u001b[0m\u001b[0mindexer\u001b[0m\u001b[1;33m]\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n",
"\u001b[1;32m~\\anaconda3\\lib\\site-packages\\pandas\\core\\indexes\\base.py\u001b[0m in \u001b[0;36mget_loc\u001b[1;34m(self, key, method, tolerance)\u001b[0m\n\u001b[0;32m 2646\u001b[0m \u001b[1;32mreturn\u001b[0m \u001b[0mself\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0m_engine\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mget_loc\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mkey\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 2647\u001b[0m \u001b[1;32mexcept\u001b[0m \u001b[0mKeyError\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m-> 2648\u001b[1;33m \u001b[1;32mreturn\u001b[0m \u001b[0mself\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0m_engine\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mget_loc\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mself\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0m_maybe_cast_indexer\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mkey\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m 2649\u001b[0m \u001b[0mindexer\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mself\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mget_indexer\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;33m[\u001b[0m\u001b[0mkey\u001b[0m\u001b[1;33m]\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mmethod\u001b[0m\u001b[1;33m=\u001b[0m\u001b[0mmethod\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mtolerance\u001b[0m\u001b[1;33m=\u001b[0m\u001b[0mtolerance\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 2650\u001b[0m \u001b[1;32mif\u001b[0m \u001b[0mindexer\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mndim\u001b[0m \u001b[1;33m>\u001b[0m \u001b[1;36m1\u001b[0m \u001b[1;32mor\u001b[0m \u001b[0mindexer\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0msize\u001b[0m \u001b[1;33m>\u001b[0m \u001b[1;36m1\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n",
"\u001b[1;32mpandas\\_libs\\index.pyx\u001b[0m in \u001b[0;36mpandas._libs.index.IndexEngine.get_loc\u001b[1;34m()\u001b[0m\n",
"\u001b[1;32mpandas\\_libs\\index.pyx\u001b[0m in \u001b[0;36mpandas._libs.index.IndexEngine.get_loc\u001b[1;34m()\u001b[0m\n",
"\u001b[1;32mpandas\\_libs\\hashtable_class_helper.pxi\u001b[0m in \u001b[0;36mpandas._libs.hashtable.PyObjectHashTable.get_item\u001b[1;34m()\u001b[0m\n",
"\u001b[1;32mpandas\\_libs\\hashtable_class_helper.pxi\u001b[0m in \u001b[0;36mpandas._libs.hashtable.PyObjectHashTable.get_item\u001b[1;34m()\u001b[0m\n",
"\u001b[1;31mKeyError\u001b[0m: '20130102'"
]
}
],
"source": [
"# To fetch a single row\n",
"# The expression below looks for a column named '20130102', not an index label, so it raises a KeyError\n",
"df['20130102']"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {
"ExecuteTime": {
"end_time": "2020-05-21T06:01:08.510874Z",
"start_time": "2020-05-21T06:01:08.500903Z"
}
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>A</th>\n",
" <th>B</th>\n",
" <th>C</th>\n",
" <th>D</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>2013-01-02</th>\n",
" <td>0.461397</td>\n",
" <td>-0.669242</td>\n",
" <td>-0.225294</td>\n",
" <td>-0.952451</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" A B C D\n",
"2013-01-02 0.461397 -0.669242 -0.225294 -0.952451"
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# To select a single row, slice with the same label as both the start and the end\n",
"df['20130102':'20130102']"
]
},
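{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Summary** \n",
"- Indexing a DataFrame directly with [] works in the following forms:\n",
" - df[column name]\n",
" - df[start position:end position + 1] (positional slice; the stop position is excluded)\n",
" - df[start label:end label] (label slice; the end label is included)"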
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Selecting data by label (label-based indexing): .loc\n",
"- Pass DataFrame index labels in the row position and column names in the column position\n",
"- **Note:** a slice passed to loc[] includes the end value instead of stopping at (end - 1), because labels are not necessarily numeric\n"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {
"ExecuteTime": {
"end_time": "2020-05-21T06:01:09.057737Z",
"start_time": "2020-05-21T06:01:09.050788Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"A 0.472836\n",
"B -0.000761\n",
"C -1.040429\n",
"D 0.152076\n",
"Name: 2013-01-01 00:00:00, dtype: float64"
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Extract the values of every column for the first index label, '2013-01-01'\n",
"df.loc[dates[0]]\n",
"\n",
"# The following forms behave the same way\n",
"# df.loc['20130101']\n",
"# df.loc['2013-01-01']"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {
"ExecuteTime": {
"end_time": "2020-05-21T06:01:09.208593Z",
"start_time": "2020-05-21T06:01:09.198589Z"
}
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>A</th>\n",
" <th>B</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>2013-01-01</th>\n",
" <td>0.472836</td>\n",
" <td>-0.000761</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2013-01-02</th>\n",
" <td>0.461397</td>\n",
" <td>-0.669242</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2013-01-03</th>\n",
" <td>0.324977</td>\n",
" <td>0.175157</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2013-01-04</th>\n",
" <td>0.941575</td>\n",
" <td>-1.211694</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2013-01-05</th>\n",
" <td>1.458312</td>\n",
" <td>-0.885358</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2013-01-06</th>\n",
" <td>-1.035226</td>\n",
" <td>1.476234</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" A B\n",
"2013-01-01 0.472836 -0.000761\n",
"2013-01-02 0.461397 -0.669242\n",
"2013-01-03 0.324977 0.175157\n",
"2013-01-04 0.941575 -1.211694\n",
"2013-01-05 1.458312 -0.885358\n",
"2013-01-06 -1.035226 1.476234"
]
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Extract all rows of columns 'A' and 'B'\n",
"df.loc[:, ['A', 'B']]"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {
"ExecuteTime": {
"end_time": "2020-05-21T06:01:09.397633Z",
"start_time": "2020-05-21T06:01:09.386664Z"
}
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>A</th>\n",
" <th>B</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>2013-01-02</th>\n",
" <td>0.461397</td>\n",
" <td>-0.669242</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2013-01-03</th>\n",
" <td>0.324977</td>\n",
" <td>0.175157</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2013-01-04</th>\n",
" <td>0.941575</td>\n",
" <td>-1.211694</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" A B\n",
"2013-01-02 0.461397 -0.669242\n",
"2013-01-03 0.324977 0.175157\n",
"2013-01-04 0.941575 -1.211694"
]
},
"execution_count": 24,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Extract columns 'A' and 'B' for index labels '2013-01-02' through '2013-01-04' (end label included)\n",
"df.loc['20130102':'20130104', ['A', 'B']]"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {
"ExecuteTime": {
"end_time": "2020-05-21T06:01:09.540713Z",
"start_time": "2020-05-21T06:01:09.530748Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"A 0.472836\n",
"B -0.000761\n",
"Name: 2013-01-01 00:00:00, dtype: float64"
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Extract columns 'A' and 'B' for a single index label\n",
"df.loc[dates[0], ['A', 'B']]"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {
"ExecuteTime": {
"end_time": "2020-05-21T06:01:09.981556Z",
"start_time": "2020-05-21T06:01:09.973613Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"0.4728357137416509"
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Extract the single value at a given index label and column\n",
"df.at[dates[0], 'A']\n",
"\n",
"# The line above returns the same result as the line below\n",
"# df.loc[dates[0], 'A']"
]
},
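{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Selecting data by integer position (position-based indexing): .iloc\n",
"- Rows and columns are given as integers, integer slices, or lists of integers (fancy indexing)\n",
"- Passing a label instead of a position to iloc[] raises an error\n",
" - ex) data_df.iloc[0, 'Name']\n",
"- Passing a string index label in the row position of iloc[] also raises an error\n",
" - ex) data_df.iloc['one', 0]\n",
"- iloc[] supports slicing and fancy indexing. Because it is strictly positional, it rejects a boolean Series such as df.A > 0 (a Series carries index labels); a plain boolean array such as (df.A > 0).values is accepted"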
]
},
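{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of how iloc[] treats boolean masks, using a small made-up frame (`demo` and its values are illustrative assumptions, not data from this notebook). The exception type raised for a boolean Series has differed between pandas versions, so the sketch catches both candidates:\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Hypothetical demo frame; not the df used elsewhere in this notebook\n",
"demo = pd.DataFrame({'A': [1.0, -2.0, 3.0], 'B': [10, 20, 30]},\n",
"                    index=['x', 'y', 'z'])\n",
"\n",
"mask = demo['A'] > 0  # boolean Series that carries index labels\n",
"\n",
"# A plain boolean array is purely positional, so iloc accepts it\n",
"print(demo.iloc[mask.to_numpy()])\n",
"\n",
"# A boolean Series is rejected by iloc\n",
"try:\n",
"    demo.iloc[mask]\n",
"except (ValueError, NotImplementedError) as err:\n",
"    print(type(err).__name__, err)\n",
"\n",
"# loc aligns on labels, so it accepts the Series directly\n",
"print(demo.loc[mask])\n",
"```\n",
"\n",
"For condition-based row selection, df[condition] or df.loc[condition] is usually the simpler choice."
]
},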
{
"cell_type": "code",
"execution_count": 27,
"metadata": {
"ExecuteTime": {
"end_time": "2020-05-21T06:01:10.310473Z",
"start_time": "2020-05-21T06:01:10.302496Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"A 0.941575\n",
"B -1.211694\n",
"C -0.811266\n",
"D -0.076963\n",
"Name: 2013-01-04 00:00:00, dtype: float64"
]
},
"execution_count": 27,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Select the row at position 3 (positions start at 0)\n",
"df.iloc[3]"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {
"ExecuteTime": {
"end_time": "2020-05-21T06:01:10.464152Z",
"start_time": "2020-05-21T06:01:10.451186Z"
}
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>A</th>\n",
" <th>B</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>2013-01-04</th>\n",
" <td>0.941575</td>\n",
" <td>-1.211694</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2013-01-05</th>\n",
" <td>1.458312</td>\n",
" <td>-0.885358</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" A B\n",
"2013-01-04 0.941575 -1.211694\n",
"2013-01-05 1.458312 -0.885358"
]
},
"execution_count": 28,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Row positions 3 and 4 select the fourth and fifth rows (the stop position 5 is excluded)\n",
"# Column positions 0 and 1 select the first and second columns\n",
"df.iloc[3:5, 0:2]"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {
"ExecuteTime": {
"end_time": "2020-05-21T06:01:10.616793Z",
"start_time": "2020-05-21T06:01:10.603832Z"
}
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>A</th>\n",
" <th>C</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>2013-01-02</th>\n",
" <td>0.461397</td>\n",
" <td>-0.225294</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2013-01-03</th>\n",
" <td>0.324977</td>\n",
" <td>-0.839797</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2013-01-05</th>\n",
" <td>1.458312</td>\n",
" <td>-1.660743</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" A C\n",
"2013-01-02 0.461397 -0.225294\n",
"2013-01-03 0.324977 -0.839797\n",
"2013-01-05 1.458312 -1.660743"
]
},
"execution_count": 29,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Pass the row and column positions as lists\n",
"# Selects the second, third, and fifth rows and the first and third columns\n",
"df.iloc[[1, 2, 4], [0, 2]]"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {
"ExecuteTime": {
"end_time": "2020-05-21T06:01:10.758966Z",
"start_time": "2020-05-21T06:01:10.749979Z"
}
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>A</th>\n",
" <th>B</th>\n",
" <th>C</th>\n",
" <th>D</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>2013-01-02</th>\n",
" <td>0.461397</td>\n",
" <td>-0.669242</td>\n",
" <td>-0.225294</td>\n",
" <td>-0.952451</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2013-01-03</th>\n",
" <td>0.324977</td>\n",
" <td>0.175157</td>\n",
" <td>-0.839797</td>\n",
" <td>-0.093737</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" A B C D\n",
"2013-01-02 0.461397 -0.669242 -0.225294 -0.952451\n",
"2013-01-03 0.324977 0.175157 -0.839797 -0.093737"
]
},
"execution_count": 30,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Use a bare slice (:) to take every row or every column; here, all columns of rows 1 and 2\n",
"df.iloc[1:3, :]"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {
"ExecuteTime": {
"end_time": "2020-05-21T06:01:10.945932Z",
"start_time": "2020-05-21T06:01:10.935945Z"
}
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>B</th>\n",
" <th>C</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>2013-01-01</th>\n",
" <td>-0.000761</td>\n",
" <td>-1.040429</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2013-01-02</th>\n",
" <td>-0.669242</td>\n",
" <td>-0.225294</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2013-01-03</th>\n",
" <td>0.175157</td>\n",
" <td>-0.839797</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2013-01-04</th>\n",
" <td>-1.211694</td>\n",
" <td>-0.811266</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2013-01-05</th>\n",
" <td>-0.885358</td>\n",
" <td>-1.660743</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2013-01-06</th>\n",
" <td>1.476234</td>\n",
" <td>0.109521</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" B C\n",
"2013-01-01 -0.000761 -1.040429\n",
"2013-01-02 -0.669242 -0.225294\n",
"2013-01-03 0.175157 -0.839797\n",
"2013-01-04 -1.211694 -0.811266\n",
"2013-01-05 -0.885358 -1.660743\n",
"2013-01-06 1.476234 0.109521"
]
},
"execution_count": 31,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.iloc[:, 1:3]"
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {
"ExecuteTime": {
"end_time": "2020-05-21T06:01:11.087208Z",
"start_time": "2020-05-21T06:01:11.080235Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"-0.6692420264543495"
]
},
"execution_count": 32,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Select a single value\n",
"df.iloc[1, 1]\n",
"\n",
"# The line below returns the same result\n",
"# df.iat[1, 1]"
]
},
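{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Selecting data with boolean indexing (conditions)\n",
"- This is the most frequently used selection method, so it is worth remembering\n",
"- Build a condition from the values of a column and select only the rows that satisfy it"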
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {
"ExecuteTime": {
"end_time": "2020-05-21T06:01:11.720411Z",
"start_time": "2020-05-21T06:01:11.705449Z"
}
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>A</th>\n",
" <th>B</th>\n",
" <th>C</th>\n",
" <th>D</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>2013-01-01</th>\n",
" <td>0.472836</td>\n",
" <td>-0.000761</td>\n",
" <td>-1.040429</td>\n",
" <td>0.152076</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2013-01-02</th>\n",
" <td>0.461397</td>\n",
" <td>-0.669242</td>\n",
" <td>-0.225294</td>\n",
" <td>-0.952451</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2013-01-03</th>\n",
" <td>0.324977</td>\n",
" <td>0.175157</td>\n",
" <td>-0.839797</td>\n",
" <td>-0.093737</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2013-01-04</th>\n",
" <td>0.941575</td>\n",
" <td>-1.211694</td>\n",
" <td>-0.811266</td>\n",
" <td>-0.076963</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2013-01-05</th>\n",
" <td>1.458312</td>\n",
" <td>-0.885358</td>\n",
" <td>-1.660743</td>\n",
" <td>2.021500</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" A B C D\n",
"2013-01-01 0.472836 -0.000761 -1.040429 0.152076\n",
"2013-01-02 0.461397 -0.669242 -0.225294 -0.952451\n",
"2013-01-03 0.324977 0.175157 -0.839797 -0.093737\n",
"2013-01-04 0.941575 -1.211694 -0.811266 -0.076963\n",
"2013-01-05 1.458312 -0.885358 -1.660743 2.021500"
]
},
"execution_count": 33,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Select the rows whose value in column 'A' is positive\n",
"df[df.A > 0]"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {
"ExecuteTime": {
"end_time": "2020-05-21T06:01:11.915480Z",
"start_time": "2020-05-21T06:01:11.888552Z"
}
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>A</th>\n",
" <th>B</th>\n",
" <th>C</th>\n",
" <th>D</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>2013-01-01</th>\n",
" <td>0.472836</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>0.152076</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2013-01-02</th>\n",
" <td>0.461397</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2013-01-03</th>\n",
" <td>0.324977</td>\n",
" <td>0.175157</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2013-01-04</th>\n",
" <td>0.941575</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2013-01-05</th>\n",
" <td>1.458312</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>2.021500</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2013-01-06</th>\n",
" <td>NaN</td>\n",
" <td>1.476234</td>\n",
" <td>0.109521</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" A B C D\n",
"2013-01-01 0.472836 NaN NaN 0.152076\n",
"2013-01-02 0.461397 NaN NaN NaN\n",
"2013-01-03 0.324977 0.175157 NaN NaN\n",
"2013-01-04 0.941575 NaN NaN NaN\n",
"2013-01-05 1.458312 NaN NaN 2.021500\n",
"2013-01-06 NaN 1.476234 0.109521 NaN"
]
},
"execution_count": 34,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Apply the condition to every individual value (element-wise selection)\n",
"# With a 'positive' condition, every other value (zero or negative) becomes NaN\n",
"df[df > 0]"
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {
"ExecuteTime": {
"end_time": "2020-05-21T06:01:12.065243Z",
"start_time": "2020-05-21T06:01:12.040345Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" A B C D E\n",
"2013-01-01 0.472836 -0.000761 -1.040429 0.152076 one\n",
"2013-01-02 0.461397 -0.669242 -0.225294 -0.952451 one\n",
"2013-01-03 0.324977 0.175157 -0.839797 -0.093737 two\n",
"2013-01-04 0.941575 -1.211694 -0.811266 -0.076963 three\n",
"2013-01-05 1.458312 -0.885358 -1.660743 2.021500 four\n",
"2013-01-06 -1.035226 1.476234 0.109521 -0.435898 three\n"
]
},
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>A</th>\n",
" <th>B</th>\n",
" <th>C</th>\n",
" <th>D</th>\n",
" <th>E</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>2013-01-03</th>\n",
" <td>0.324977</td>\n",
" <td>0.175157</td>\n",
" <td>-0.839797</td>\n",
" <td>-0.093737</td>\n",
" <td>two</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2013-01-05</th>\n",
" <td>1.458312</td>\n",
" <td>-0.885358</td>\n",
" <td>-1.660743</td>\n",
" <td>2.021500</td>\n",
" <td>four</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" A B C D E\n",
"2013-01-03 0.324977 0.175157 -0.839797 -0.093737 two\n",
"2013-01-05 1.458312 -0.885358 -1.660743 2.021500 four"
]
},
"execution_count": 35,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Filtering by membership: use the isin() method\n",
"# Add a new column 'E', then select rows based on the values in that column\n",
"df2 = df.copy() # work on a copy so the original stays intact\n",
"df2['E'] = ['one', 'one', 'two', 'three', 'four', 'three']\n",
"print(df2)\n",
"\n",
"df2[df2['E'].isin(['two', 'four'])]"
]
},
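{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of combining several conditions, using a small made-up frame (`demo` and its values are illustrative assumptions, not data from this notebook). Conditions are combined with `&`, `|`, and `~` rather than the Python keywords `and`, `or`, and `not`, and each comparison needs its own parentheses:\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Hypothetical demo frame\n",
"demo = pd.DataFrame({'A': [1.0, -2.0, 3.0, 4.0],\n",
"                     'E': ['one', 'two', 'two', 'four']})\n",
"\n",
"# AND: positive A and E is either 'two' or 'four'\n",
"both = demo[(demo['A'] > 0) & demo['E'].isin(['two', 'four'])]\n",
"\n",
"# OR: negative A or E equals 'one'\n",
"either = demo[(demo['A'] < 0) | (demo['E'] == 'one')]\n",
"\n",
"# NOT: every row whose E is not 'two'\n",
"not_two = demo[~demo['E'].isin(['two'])]\n",
"\n",
"print(both)\n",
"print(either)\n",
"print(not_two)\n",
"```"
]
},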
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Modifying data\n",
"- Specific values in a DataFrame can be replaced with other values"
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {
"ExecuteTime": {
"end_time": "2020-05-21T06:01:12.412526Z",
"start_time": "2020-05-21T06:01:12.405546Z"
}
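},
"outputs": [],
"source": [
"# The Series index starts on 2013-01-02, one day after the first index label of df\n",
"s1 = pd.Series([1, 2, 3, 4, 5, 6], index = pd.date_range('20130102', periods = 6))\n",
"\n",
"# Column assignment aligns on index labels: 2013-01-01 has no match and becomes NaN, and the value for 2013-01-07 is discarded\n",
"df['F'] = s1"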
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {
"ExecuteTime": {
"end_time": "2020-05-21T06:01:12.597289Z",
"start_time": "2020-05-21T06:01:12.589311Z"
}
},
"outputs": [],
"source": [
"# Select a single value in the DataFrame and replace it\n",
"df.at[dates[0], 'A'] = 0\n",
"\n",
"# Replacing a value by position also works\n",
"df.iloc[0, 1] = 0\n",
"\n",
"# To change many values at once, the new data only has to match the length of the target\n",
"df.loc[:, 'D'] = np.array([5] * len(df))"
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {
"ExecuteTime": {
"end_time": "2020-05-21T06:01:12.969902Z",
"start_time": "2020-05-21T06:01:12.955973Z"
}
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>A</th>\n",
" <th>B</th>\n",
" <th>C</th>\n",
" <th>D</th>\n",
" <th>F</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>2013-01-01</th>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>-1.040429</td>\n",
" <td>5</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2013-01-02</th>\n",
" <td>0.461397</td>\n",
" <td>-0.669242</td>\n",
" <td>-0.225294</td>\n",
" <td>5</td>\n",
" <td>1.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2013-01-03</th>\n",
" <td>0.324977</td>\n",
" <td>0.175157</td>\n",
" <td>-0.839797</td>\n",
" <td>5</td>\n",
" <td>2.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2013-01-04</th>\n",
" <td>0.941575</td>\n",
" <td>-1.211694</td>\n",
" <td>-0.811266</td>\n",
" <td>5</td>\n",
" <td>3.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2013-01-05</th>\n",
" <td>1.458312</td>\n",
" <td>-0.885358</td>\n",
" <td>-1.660743</td>\n",
" <td>5</td>\n",
" <td>4.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2013-01-06</th>\n",
" <td>-1.035226</td>\n",
" <td>1.476234</td>\n",
" <td>0.109521</td>\n",
" <td>5</td>\n",
" <td>5.0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" A B C D F\n",
"2013-01-01 0.000000 0.000000 -1.040429 5 NaN\n",
"2013-01-02 0.461397 -0.669242 -0.225294 5 1.0\n",
"2013-01-03 0.324977 0.175157 -0.839797 5 2.0\n",
"2013-01-04 0.941575 -1.211694 -0.811266 5 3.0\n",
"2013-01-05 1.458312 -0.885358 -1.660743 5 4.0\n",
"2013-01-06 -1.035226 1.476234 0.109521 5 5.0"
]
},
"execution_count": 38,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Check the DataFrame after all of the changes above\n",
"df"
]
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {
"ExecuteTime": {
"end_time": "2020-05-21T06:01:13.128301Z",
"start_time": "2020-05-21T06:01:13.105365Z"
}
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>A</th>\n",
" <th>B</th>\n",
" <th>C</th>\n",
" <th>D</th>\n",
" <th>F</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>2013-01-01</th>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>-1.040429</td>\n",
" <td>-5</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2013-01-02</th>\n",
" <td>-0.461397</td>\n",
" <td>-0.669242</td>\n",
" <td>-0.225294</td>\n",
" <td>-5</td>\n",
" <td>-1.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2013-01-03</th>\n",
" <td>-0.324977</td>\n",
" <td>-0.175157</td>\n",
" <td>-0.839797</td>\n",
" <td>-5</td>\n",
" <td>-2.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2013-01-04</th>\n",
" <td>-0.941575</td>\n",
" <td>-1.211694</td>\n",
" <td>-0.811266</td>\n",
" <td>-5</td>\n",
" <td>-3.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2013-01-05</th>\n",
" <td>-1.458312</td>\n",
" <td>-0.885358</td>\n",
" <td>-1.660743</td>\n",
" <td>-5</td>\n",
" <td>-4.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2013-01-06</th>\n",
" <td>-1.035226</td>\n",
" <td>-1.476234</td>\n",
" <td>-0.109521</td>\n",
" <td>-5</td>\n",
" <td>-5.0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" A B C D F\n",
"2013-01-01 0.000000 0.000000 -1.040429 -5 NaN\n",
"2013-01-02 -0.461397 -0.669242 -0.225294 -5 -1.0\n",
"2013-01-03 -0.324977 -0.175157 -0.839797 -5 -2.0\n",
"2013-01-04 -0.941575 -1.211694 -0.811266 -5 -3.0\n",
"2013-01-05 -1.458312 -0.885358 -1.660743 -5 -4.0\n",
"2013-01-06 -1.035226 -1.476234 -0.109521 -5 -5.0"
]
},
"execution_count": 39,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Use boolean indexing to change only the values that satisfy a condition\n",
"# ex) flip every positive value to its negative\n",
"df2 = df.copy()\n",
"df2[df2 > 0] = -df2\n",
"df2"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Converting between DataFrame and list, dictionary, and NumPy ndarray\n",
"\n",
"**List or ndarray to DataFrame**"
]
},
{
"cell_type": "code",
"execution_count": 40,
"metadata": {
"ExecuteTime": {
"end_time": "2020-05-21T06:01:13.413091Z",
"start_time": "2020-05-21T06:01:13.400159Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"array1 shape: (3,)\n",
"DataFrame built from a 1-D list:\n",
" col1\n",
"0 1\n",
"1 2\n",
"2 3\n",
"DataFrame built from a 1-D ndarray:\n",
" col1\n",
"0 1\n",
"1 2\n",
"2 3\n"
]
}
],
"source": [
"import numpy as np\n",
"\n",
"col_name1 = ['col1']\n",
"list1 = [1, 2, 3]\n",
"array1 = np.array(list1)\n",
"print('array1 shape:', array1.shape)\n",
"\n",
"df_list1 = pd.DataFrame(list1, columns = col_name1)\n",
"print('DataFrame built from a 1-D list:\\n', df_list1)\n",
"\n",
"df_array1 = pd.DataFrame(array1, columns = col_name1)\n",
"print('DataFrame built from a 1-D ndarray:\\n', df_array1)"
]
},
{
"cell_type": "code",
"execution_count": 41,
"metadata": {
"ExecuteTime": {
"end_time": "2020-05-21T06:01:13.596302Z",
"start_time": "2020-05-21T06:01:13.582341Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"array2 shape: (2, 3)\n",
"DataFrame built from a 2-D list:\n",
" col1 col2 col3\n",
"0 1 2 3\n",
"1 11 12 13\n",
"DataFrame built from a 2-D ndarray:\n",
" col1 col2 col3\n",
"0 1 2 3\n",
"1 11 12 13\n"
]
}
],
"source": [
"# Three column names are needed\n",
"col_name2 = ['col1', 'col2', 'col3']\n",
"\n",
"# Create a 2-row x 3-column list and ndarray, then convert each to a DataFrame\n",
"list2 = [[1, 2, 3],\n",
" [11, 12, 13]]\n",
"array2 = np.array(list2)\n",
"print('array2 shape:', array2.shape)\n",
"\n",
"df_list2 = pd.DataFrame(list2, columns = col_name2)\n",
"print('DataFrame built from a 2-D list:\\n', df_list2)\n",
"\n",
"df_array2 = pd.DataFrame(array2, columns = col_name2)\n",
"print('DataFrame built from a 2-D ndarray:\\n', df_array2)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Dictionary (dict) to DataFrame**"
]
},
{
"cell_type": "code",
"execution_count": 42,
"metadata": {
"ExecuteTime": {
"end_time": "2020-05-21T06:01:14.142279Z",
"start_time": "2020-05-21T06:01:14.133272Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"DataFrame built from a dictionary:\n",
" col1 col2 col3\n",
"0 1 2 3\n",
"1 11 22 33\n"
]
}
],
"source": [
"# Keys map to column names; values are lists (or ndarrays)\n",
"data_dict = {'col1':[1, 11], 'col2':[2, 22], 'col3':[3, 33]} # avoid naming the variable dict, which would shadow the built-in\n",
"df_dict = pd.DataFrame(data_dict)\n",
"print('DataFrame built from a dictionary:\\n', df_dict)"
]
},
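{
"cell_type": "markdown",
"metadata": {},
"source": [
"**DataFrame to ndarray**\n",
"- Converting to an ndarray through the values attribute is very common, so it is worth remembering (the pandas documentation recommends to_numpy() for the same purpose in newer code)"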
]
},
{
"cell_type": "code",
"execution_count": 43,
"metadata": {
"ExecuteTime": {
"end_time": "2020-05-21T06:01:14.805140Z",
"start_time": "2020-05-21T06:01:14.799158Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"df_dict.values type: <class 'numpy.ndarray'> df_dict.values shape: (2, 3)\n",
"[[ 1 2 3]\n",
" [11 22 33]]\n"
]
}
],
"source": [
"# Convert the DataFrame to an ndarray\n",
"array3 = df_dict.values\n",
"print('df_dict.values type:', type(array3), 'df_dict.values shape:', array3.shape)\n",
"print(array3)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**DataFrame to list and dictionary**\n",
"- To get a list, call tolist() on the ndarray obtained from values\n",
"- To get a dictionary, call the DataFrame's to_dict() method; passing 'list' as the argument returns the dictionary values as lists"
]
},
{
"cell_type": "code",
"execution_count": 44,
"metadata": {
"ExecuteTime": {
"end_time": "2020-05-21T06:01:15.209860Z",
"start_time": "2020-05-21T06:01:15.203878Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"df_dict.values.tolist() type: <class 'list'>\n",
"[[1, 2, 3], [11, 22, 33]]\n",
"\n",
" df_dict.to_dict() type: <class 'dict'>\n",
"{'col1': [1, 11], 'col2': [2, 22], 'col3': [3, 33]}\n"
]
}
],
"source": [
"# Convert the DataFrame to a list\n",
"list3 = df_dict.values.tolist()\n",
"print('df_dict.values.tolist() type:', type(list3))\n",
"print(list3)\n",
"\n",
"# Convert the DataFrame to a dictionary\n",
"dict3 = df_dict.to_dict('list') # without 'list', the default orient returns nested dicts such as {'col1': {0: 1, 1: 11}, ...}\n",
"print('\\n df_dict.to_dict() type:', type(dict3))\n",
"print(dict3)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Deleting data from a DataFrame\n",
"- Use the drop() method\n",
"- DataFrame.drop(labels = None, axis = 0, index = None, columns = None, level = None, inplace = False, errors = 'raise')\n",
"- The most important parameters above are labels, axis, and inplace\n",
" - axis = 0: drops rows --> mainly used to remove outlier records\n",
" - axis = 1: drops columns --> typically used to remove the original column after deriving a new column from it (axis = 1 is the more common case)\n",
" - inplace = False: keeps the original DataFrame unchanged and returns the result as a new DataFrame\n",
" - inplace = True: applies the drop to the original DataFrame itself and returns None\n",
"\n",
"**Pick either inplace = True or inplace = False and use it consistently to avoid confusion.**\n",
"- Personally, I stick to inplace = False"
]
},
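{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of the drop() options listed above, using a small made-up frame (`demo` and its values are illustrative assumptions, not data from this notebook):\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Hypothetical demo frame\n",
"demo = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})\n",
"\n",
"# axis=1 drops a column; inplace defaults to False, so demo is unchanged\n",
"no_c = demo.drop('C', axis=1)\n",
"\n",
"# axis=0 drops rows by index label; a list drops several at once\n",
"no_rows = demo.drop([0, 2], axis=0)\n",
"\n",
"# inplace=True modifies the object itself and returns None\n",
"copy_df = demo.copy()\n",
"result = copy_df.drop('B', axis=1, inplace=True)\n",
"\n",
"print(list(demo.columns))\n",
"print(list(no_c.columns))\n",
"print(list(no_rows.index))\n",
"print(result, list(copy_df.columns))\n",
"```\n",
"\n",
"Because inplace = True returns None, writing `df = df.drop(..., inplace = True)` would replace the DataFrame with None."
]
},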
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Index objects\n",
"- To extract only the Index object from a DataFrame or Series, use the DataFrame.index or Series.index attribute\n",
"- An Index is used only for identification\n",
"- The Index of an existing DataFrame or Series is immutable: individual elements cannot be reassigned\n",
"- Calling reset_index() on a DataFrame or Series assigns a new index of consecutive integers and adds the previous index as a new column named 'index'\n",
" - It is mainly used to turn an index that is not a run of consecutive integers back into one\n",
" - Applying reset_index() to a Series returns a DataFrame, not a Series\n",
" - With drop = True, the previous index is discarded instead of being added as a new column"
]
},
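{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of the reset_index() behaviour described above, using a small made-up Series (`demo_s` and its values are illustrative assumptions, not data from this notebook):\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Hypothetical demo Series with a non-integer index\n",
"demo_s = pd.Series([10, 20, 30], index=['a', 'b', 'c'])\n",
"\n",
"# The old index becomes a column named 'index'; the result is a DataFrame\n",
"as_frame = demo_s.reset_index()\n",
"print(type(as_frame).__name__, list(as_frame.columns))\n",
"\n",
"# drop=True discards the old index, so the result stays a Series\n",
"as_series = demo_s.reset_index(drop=True)\n",
"print(type(as_series).__name__, list(as_series.index))\n",
"\n",
"# An Index is immutable: assigning to one element raises TypeError\n",
"try:\n",
"    demo_s.index[0] = 'z'\n",
"except TypeError as err:\n",
"    print('TypeError:', err)\n",
"```"
]
},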
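{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Handling missing data\n",
"- Missing data means a column has no value for a row, i.e. the value is NULL\n",
"- Most machine learning algorithms do not handle NaN values, so they need to be replaced with other values\n",
" - NaN values are skipped by aggregate functions such as mean and sum\n",
"- pandas represents missing values as np.nan and excludes them from computations by default"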
]
},
{
"cell_type": "code",
"execution_count": 45,
"metadata": {
"ExecuteTime": {
"end_time": "2020-05-21T06:01:17.540241Z",
"start_time": "2020-05-21T06:01:17.521291Z"
}
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>A</th>\n",
" <th>B</th>\n",
" <th>C</th>\n",
" <th>D</th>\n",
" <th>F</th>\n",
" <th>E</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>2013-01-01</th>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>-1.040429</td>\n",
" <td>5</td>\n",
" <td>NaN</td>\n",
" <td>1.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2013-01-02</th>\n",
" <td>0.461397</td>\n",
" <td>-0.669242</td>\n",
" <td>-0.225294</td>\n",
" <td>5</td>\n",
" <td>1.0</td>\n",
" <td>1.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2013-01-03</th>\n",
" <td>0.324977</td>\n",
" <td>0.175157</td>\n",
" <td>-0.839797</td>\n",
" <td>5</td>\n",
" <td>2.0</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2013-01-04</th>\n",
" <td>0.941575</td>\n",
" <td>-1.211694</td>\n",
" <td>-0.811266</td>\n",
" <td>5</td>\n",
" <td>3.0</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" A B C D F E\n",
"2013-01-01 0.000000 0.000000 -1.040429 5 NaN 1.0\n",
"2013-01-02 0.461397 -0.669242 -0.225294 5 1.0 1.0\n",
"2013-01-03 0.324977 0.175157 -0.839797 5 2.0 NaN\n",
"2013-01-04 0.941575 -1.211694 -0.811266 5 3.0 NaN"
]
},
"execution_count": 45,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# 재 인덱싱(reindex)은 해당 축에 대하여 index를 변경/추가/삭제\n",
"df1 = df.reindex(index = dates[0:4], columns = list(df.columns) + ['E'])\n",
"df1.loc[dates[0]:dates[1], 'E'] = 1\n",
"df1"
]
},
{
"cell_type": "code",
"execution_count": 46,
"metadata": {
"ExecuteTime": {
"end_time": "2020-05-21T06:01:17.758121Z",
"start_time": "2020-05-21T06:01:17.741201Z"
}
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>A</th>\n",
" <th>B</th>\n",
" <th>C</th>\n",
" <th>D</th>\n",
" <th>F</th>\n",
" <th>E</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>2013-01-01</th>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>True</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2013-01-02</th>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2013-01-03</th>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>True</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2013-01-04</th>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>True</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" A B C D F E\n",
"2013-01-01 False False False False True False\n",
"2013-01-02 False False False False False False\n",
"2013-01-03 False False False False False True\n",
"2013-01-04 False False False False False True"
]
},
"execution_count": 46,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# 해당 값이 결측치인지 아닌지 확인하고자 하는 경우: isna() 메소드 사용\n",
"# 결측치이면 True, 값이 있으면 False를 반환\n",
"pd.isna(df1)"
]
},
{
"cell_type": "code",
"execution_count": 47,
"metadata": {
"ExecuteTime": {
"end_time": "2020-05-21T06:01:17.898955Z",
"start_time": "2020-05-21T06:01:17.889018Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"A 0\n",
"B 0\n",
"C 0\n",
"D 0\n",
"F 1\n",
"E 2\n",
"dtype: int64"
]
},
"execution_count": 47,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# 결측치의 개수는 isna() 결과에 sum() 함수를 추가해서 구할 수 있다\n",
"df1.isna().sum()"
]
},
{
"cell_type": "code",
"execution_count": 48,
"metadata": {
"ExecuteTime": {
"end_time": "2020-05-21T06:01:18.087908Z",
"start_time": "2020-05-21T06:01:18.071951Z"
}
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>A</th>\n",
" <th>B</th>\n",
" <th>C</th>\n",
" <th>D</th>\n",
" <th>F</th>\n",
" <th>E</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>2013-01-02</th>\n",
" <td>0.461397</td>\n",
" <td>-0.669242</td>\n",
" <td>-0.225294</td>\n",
" <td>5</td>\n",
" <td>1.0</td>\n",
" <td>1.0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" A B C D F E\n",
"2013-01-02 0.461397 -0.669242 -0.225294 5 1.0 1.0"
]
},
"execution_count": 48,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# 결측치가 하나라도 존재하는 행들을 버리고 싶은 경우: dropna() 메소드 사용\n",
"df1.dropna(how = 'any')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"*<주의>* - *inplace 파라미터*\n",
"- fillna()를 이용해 반환 값을 다시 받거나, inplace = True 파라미터를 fillna()에 추가해야 실제 데이터 세트 값이 변경된다!!"
]
},
{
"cell_type": "code",
"execution_count": 49,
"metadata": {
"ExecuteTime": {
"end_time": "2020-05-21T06:01:18.384527Z",
"start_time": "2020-05-21T06:01:18.369569Z"
}
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>A</th>\n",
" <th>B</th>\n",
" <th>C</th>\n",
" <th>D</th>\n",
" <th>F</th>\n",
" <th>E</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>2013-01-01</th>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>-1.040429</td>\n",
" <td>5</td>\n",
" <td>5.0</td>\n",
" <td>1.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2013-01-02</th>\n",
" <td>0.461397</td>\n",
" <td>-0.669242</td>\n",
" <td>-0.225294</td>\n",
" <td>5</td>\n",
" <td>1.0</td>\n",
" <td>1.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2013-01-03</th>\n",
" <td>0.324977</td>\n",
" <td>0.175157</td>\n",
" <td>-0.839797</td>\n",
" <td>5</td>\n",
" <td>2.0</td>\n",
" <td>5.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2013-01-04</th>\n",
" <td>0.941575</td>\n",
" <td>-1.211694</td>\n",
" <td>-0.811266</td>\n",
" <td>5</td>\n",
" <td>3.0</td>\n",
" <td>5.0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" A B C D F E\n",
"2013-01-01 0.000000 0.000000 -1.040429 5 5.0 1.0\n",
"2013-01-02 0.461397 -0.669242 -0.225294 5 1.0 1.0\n",
"2013-01-03 0.324977 0.175157 -0.839797 5 2.0 5.0\n",
"2013-01-04 0.941575 -1.211694 -0.811266 5 3.0 5.0"
]
},
"execution_count": 49,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# 결측치가 있는 부분을 다른 값으로 채우고 싶은 경우: fillna() 메소드 사용\n",
"df1.fillna(value = 5)"
]
},
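{
"cell_type": "markdown",
"metadata": {},
"source": [
"The caution about fillna() can be checked with a small sketch. The DataFrame `frame` is hypothetical; filling with the column mean is just one possible replacement strategy, not something the original notes prescribe.\n",
"\n",
"```python\n",
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"frame = pd.DataFrame({'x': [1.0, np.nan, 3.0]})  # hypothetical data\n",
"\n",
"# fillna() returns a new DataFrame; frame itself is unchanged here\n",
"frame.fillna(0)\n",
"still_missing = int(frame['x'].isna().sum())\n",
"\n",
"# Assign the result back to keep it (mean of the non-missing values is used)\n",
"frame = frame.fillna(frame['x'].mean())\n",
"\n",
"print(still_missing, frame['x'].tolist())\n",
"```\n",
"\n",
"Assigning the result back keeps the code consistent with the inplace = False habit mentioned earlier."
]
},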
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 연산(Operations)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 통계적 지표들 (Stats)"
]
},
{
"cell_type": "code",
"execution_count": 50,
"metadata": {
"ExecuteTime": {
"end_time": "2020-05-21T06:01:18.854345Z",
"start_time": "2020-05-21T06:01:18.846367Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"A 0.358506\n",
"B -0.185817\n",
"C -0.744668\n",
"D 5.000000\n",
"F 3.000000\n",
"dtype: float64"
]
},
"execution_count": 50,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# 평균 구하기 (default인 axis = 0을 기준으로 평균이 구해진다)\n",
"df.mean()"
]
},
{
"cell_type": "code",
"execution_count": 51,
"metadata": {
"ExecuteTime": {
"end_time": "2020-05-21T06:01:19.040207Z",
"start_time": "2020-05-21T06:01:19.032263Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"2013-01-01 0.989893\n",
"2013-01-02 1.113372\n",
"2013-01-03 1.332067\n",
"2013-01-04 1.383723\n",
"2013-01-05 1.582442\n",
"2013-01-06 2.110106\n",
"Freq: D, dtype: float64"
]
},
"execution_count": 51,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# 열(axis = 1) 축에 대해서 평균 구하기\n",
"df.mean(axis = 1)"
]
},
{
"cell_type": "code",
"execution_count": 52,
"metadata": {
"ExecuteTime": {
"end_time": "2020-05-21T06:01:19.192384Z",
"start_time": "2020-05-21T06:01:19.177427Z"
}
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>A</th>\n",
" <th>B</th>\n",
" <th>C</th>\n",
" <th>D</th>\n",
" <th>F</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>2013-01-01</th>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2013-01-02</th>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2013-01-03</th>\n",
" <td>-0.675023</td>\n",
" <td>-0.824843</td>\n",
" <td>-1.839797</td>\n",
" <td>4.0</td>\n",
" <td>1.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2013-01-04</th>\n",
" <td>-2.058425</td>\n",
" <td>-4.211694</td>\n",
" <td>-3.811266</td>\n",
" <td>2.0</td>\n",
" <td>0.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2013-01-05</th>\n",
" <td>-3.541688</td>\n",
" <td>-5.885358</td>\n",
" <td>-6.660743</td>\n",
" <td>0.0</td>\n",
" <td>-1.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2013-01-06</th>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" A B C D F\n",
"2013-01-01 NaN NaN NaN NaN NaN\n",
"2013-01-02 NaN NaN NaN NaN NaN\n",
"2013-01-03 -0.675023 -0.824843 -1.839797 4.0 1.0\n",
"2013-01-04 -2.058425 -4.211694 -3.811266 2.0 0.0\n",
"2013-01-05 -3.541688 -5.885358 -6.660743 0.0 -1.0\n",
"2013-01-06 NaN NaN NaN NaN NaN"
]
},
"execution_count": 52,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# 서로 차원이 달라서 index를 맞추어야 하는 두 object 간의 연산\n",
"# pandas는 맞추어야 할 축만 지정해주면, 자동으로 해당 축을 기준으로 맞추어 연산을 수행\n",
"\n",
"# 기존 데이터 프레임의 index가 2013-01-03, 04, 05인 모든 column에 해당하는 값에, 각 1.0, 3.0, 5.0 을 빼준 값이 결과로 나온다\n",
"# 결측치가 존재하는 경우에는 계산이 불가능하므로 NaN 으로 표시된다\n",
"s = pd.Series([1, 3, 5, np.nan, 6, 8], index = dates).shift(2) # shift를 사용하면, index는 그대로 두고 데이터만 이동!\n",
"df.sub(s, axis = 'index') # index를 기준으로 '빼기' 연산을 수행"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 데이터 프레임에 함수 적용하기(Apply)\n",
"- 판다스는 apply 함수에 lambda 식을 결합해 DataFrame이나 Series의 레코드 별로 데이터를 가공하는 기능을 제공한다\n",
"- apply lambda 식으로 데이터를 가공하는 방법은 자주 사용하므로 잘 기억해두자!!\n",
" - ex) lambda x: x ** 2\n",
" - 위 예시에서 앞의 x 는 '입력 인자', 뒤의 x ** 2 는 입력 인자를 기반으로 한 계산식이며, 호출 시 계산 결과가 반환된다\n",
"- lambda 식을 이용할 때, 여러 개의 값을 입력 인자로 사용해야 할 경우, 보통 map() 함수를 결합해서 사용한다\n",
" - ex) a = [1, 2, 3]\n",
" squares = map(lambda x: x ** 2, a)\n",
" list(squares)"
]
},
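{
"cell_type": "markdown",
"metadata": {},
"source": [
"A short sketch of the three ways apply can be used with a lambda. The DataFrame `frame` and its column names are assumptions made for this example only.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"frame = pd.DataFrame({'a': [1, 2, 3], 'b': [10, 20, 30]})  # hypothetical data\n",
"\n",
"# DataFrame.apply, default axis=0: the lambda receives each column as a Series\n",
"col_range = frame.apply(lambda col: col.max() - col.min())\n",
"\n",
"# DataFrame.apply with axis=1: the lambda receives each row as a Series\n",
"row_sum = frame.apply(lambda row: row['a'] + row['b'], axis=1)\n",
"\n",
"# Series.apply: the lambda receives one element at a time\n",
"squared = frame['a'].apply(lambda x: x ** 2)\n",
"\n",
"print(col_range.tolist(), row_sum.tolist(), squared.tolist())\n",
"```\n",
"\n",
"For simple arithmetic like this, vectorized expressions such as frame['a'] ** 2 are usually faster; apply is most useful when the logic does not fit a built-in operation."
]
},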
{
"cell_type": "code",
"execution_count": 53,
"metadata": {
"ExecuteTime": {
"end_time": "2020-05-21T06:01:19.491892Z",
"start_time": "2020-05-21T06:01:19.469933Z"
}
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>A</th>\n",
" <th>B</th>\n",
" <th>C</th>\n",
" <th>D</th>\n",
" <th>F</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>2013-01-01</th>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>-1.040429</td>\n",
" <td>5</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2013-01-02</th>\n",
" <td>0.461397</td>\n",
" <td>-0.669242</td>\n",
" <td>-1.265723</td>\n",
" <td>10</td>\n",
" <td>1.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2013-01-03</th>\n",
" <td>0.786374</td>\n",
" <td>-0.494085</td>\n",
" <td>-2.105519</td>\n",
" <td>15</td>\n",
" <td>3.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2013-01-04</th>\n",
" <td>1.727949</td>\n",
" <td>-1.705779</td>\n",
" <td>-2.916785</td>\n",
" <td>20</td>\n",
" <td>6.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2013-01-05</th>\n",
" <td>3.186261</td>\n",
" <td>-2.591137</td>\n",
" <td>-4.577528</td>\n",
" <td>25</td>\n",
" <td>10.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2013-01-06</th>\n",
" <td>2.151035</td>\n",
" <td>-1.114903</td>\n",
" <td>-4.468007</td>\n",
" <td>30</td>\n",
" <td>15.0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" A B C D F\n",
"2013-01-01 0.000000 0.000000 -1.040429 5 NaN\n",
"2013-01-02 0.461397 -0.669242 -1.265723 10 1.0\n",
"2013-01-03 0.786374 -0.494085 -2.105519 15 3.0\n",
"2013-01-04 1.727949 -1.705779 -2.916785 20 6.0\n",
"2013-01-05 3.186261 -2.591137 -4.577528 25 10.0\n",
"2013-01-06 2.151035 -1.114903 -4.468007 30 15.0"
]
},
"execution_count": 53,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.apply(np.cumsum) # np.cumsum(): 누적 합계"
]
},
{
"cell_type": "code",
"execution_count": 54,
"metadata": {
"ExecuteTime": {
"end_time": "2020-05-21T06:01:19.629251Z",
"start_time": "2020-05-21T06:01:19.620278Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"A 2.493538\n",
"B 2.687928\n",
"C 1.770264\n",
"D 0.000000\n",
"F 4.000000\n",
"dtype: float64"
]
},
"execution_count": 54,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.apply(lambda x: x.max() - x.min())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 히스토그램 구하기(Histogramming)"
]
},
{
"cell_type": "code",
"execution_count": 55,
"metadata": {
"ExecuteTime": {
"end_time": "2020-05-21T06:01:19.924417Z",
"start_time": "2020-05-21T06:01:19.913470Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0 2\n",
"1 6\n",
"2 2\n",
"3 1\n",
"4 2\n",
"5 4\n",
"6 0\n",
"7 0\n",
"8 4\n",
"9 4\n",
"dtype: int32\n"
]
},
{
"data": {
"text/plain": [
"4 3\n",
"2 3\n",
"0 2\n",
"6 1\n",
"1 1\n",
"dtype: int64"
]
},
"execution_count": 55,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# 데이터의 값들의 빈도를 조사하여 히스토그램 만들기\n",
"s = pd.Series(np.random.randint(0, 7, size = 10))\n",
"print(s)\n",
"\n",
"s.value_counts()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 문자열 관련 메소드들(String Methods)\n",
"- Series는 배열의 각 요소에 쉽게 적용이 가능하도록 str 이라는 속성에 문자열을 처리할 수 있는 여러 가지 메소드들을 갖추고 있다\n",
"- 문자열 내에서의 패턴을 찾기 위한 작업들은 일반적으로 '정규표현식'을 사용하는 것에 유의!"
]
},
{
"cell_type": "code",
"execution_count": 56,
"metadata": {
"ExecuteTime": {
"end_time": "2020-05-21T06:01:20.213517Z",
"start_time": "2020-05-21T06:01:20.205541Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"0 a\n",
"1 b\n",
"2 c\n",
"3 aaba\n",
"4 baca\n",
"5 NaN\n",
"6 caba\n",
"7 dog\n",
"8 cat\n",
"dtype: object"
]
},
"execution_count": 56,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"s = pd.Series(['A', 'B', 'C', 'Aaba', 'Baca', np.nan, 'CABA', 'dog', 'cat'])\n",
"s.str.lower() # 각 요소 값들을 모두 소문자로 변경"
]
},
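{
"cell_type": "markdown",
"metadata": {},
"source": [
"To illustrate the note about regular expressions, here is a small sketch using str.contains. The Series `words` and the patterns are chosen for illustration and do not come from the original notebook.\n",
"\n",
"```python\n",
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"words = pd.Series(['Aaba', 'Baca', np.nan, 'dog', 'cat'])\n",
"\n",
"# The pattern is a regular expression: starts with a or b, ignoring case\n",
"# na=False makes missing values count as no match\n",
"starts_with_a_or_b = words.str.contains('^[ab]', case=False, na=False)\n",
"\n",
"# regex=False treats the pattern as a literal string, so '.' only matches a dot\n",
"literal_dot = pd.Series(['a.b', 'axb']).str.contains('.', regex=False)\n",
"\n",
"print(starts_with_a_or_b.tolist(), literal_dot.tolist())\n",
"```\n",
"\n",
"Passing regex=False is a simple way to avoid surprises when the search text contains characters such as '.' or '+'."
]
},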
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 합치기(Merging)\n",
"- 다양한 정보를 담은 자료들을 합쳐서, 새로운 자료를 만들어야 하는 경우에 사용"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Concat\n",
"- 같은 형태의 자료들을 이어 하나로 만들어준다\n",
"- concat 메소드 관련 사이트: https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html#concatenating-objects"
]
},
{
"cell_type": "code",
"execution_count": 57,
"metadata": {
"ExecuteTime": {
"end_time": "2020-05-21T06:01:20.746108Z",
"start_time": "2020-05-21T06:01:20.726164Z"
}
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>0</th>\n",
" <th>1</th>\n",
" <th>2</th>\n",
" <th>3</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>-0.808655</td>\n",
" <td>-0.178923</td>\n",
" <td>0.487119</td>\n",
" <td>-0.427400</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>0.191072</td>\n",
" <td>0.071646</td>\n",
" <td>0.095561</td>\n",
" <td>1.174140</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>-0.432226</td>\n",
" <td>1.027590</td>\n",
" <td>1.516764</td>\n",
" <td>-0.386455</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>-1.719182</td>\n",
" <td>0.094227</td>\n",
" <td>-0.819980</td>\n",
" <td>-0.504760</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>0.645324</td>\n",
" <td>-0.074010</td>\n",
" <td>-0.533842</td>\n",
" <td>0.407079</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>-0.879944</td>\n",
" <td>-0.024651</td>\n",
" <td>-1.181698</td>\n",
" <td>-0.670168</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>-0.281970</td>\n",
" <td>-0.677789</td>\n",
" <td>0.265645</td>\n",
" <td>-0.675350</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>0.093248</td>\n",
" <td>-1.028898</td>\n",
" <td>0.506544</td>\n",
" <td>-0.694781</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>-0.939844</td>\n",
" <td>-1.400401</td>\n",
" <td>-0.799056</td>\n",
" <td>1.473795</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>-1.470519</td>\n",
" <td>0.741310</td>\n",
" <td>0.634705</td>\n",
" <td>-0.391076</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" 0 1 2 3\n",
"0 -0.808655 -0.178923 0.487119 -0.427400\n",
"1 0.191072 0.071646 0.095561 1.174140\n",
"2 -0.432226 1.027590 1.516764 -0.386455\n",
"3 -1.719182 0.094227 -0.819980 -0.504760\n",
"4 0.645324 -0.074010 -0.533842 0.407079\n",
"5 -0.879944 -0.024651 -1.181698 -0.670168\n",
"6 -0.281970 -0.677789 0.265645 -0.675350\n",
"7 0.093248 -1.028898 0.506544 -0.694781\n",
"8 -0.939844 -1.400401 -0.799056 1.473795\n",
"9 -1.470519 0.741310 0.634705 -0.391076"
]
},
"execution_count": 57,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# 임의의 수를 담고있는 10 x 4 형태의 데이터 프레임 생성\n",
"df = pd.DataFrame(np.random.randn(10, 4))\n",
"\n",
"# 만들어진 데이터 프레임을 세 부분(row 기준)으로 분할\n",
"pieces = [df[:3], df[3:7], df[7:]]\n",
"\n",
"# pandas에 있는 concat 메소드를 사용하여, 데이터 프레임을 원래대로 다시 합치기\n",
"pd.concat(pieces)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Join\n",
"- 데이터베이스에서 사용하는 SQL 스타일의 합치기 기능\n",
"- merge 메소드를 통해 이루어진다"
]
},
{
"cell_type": "code",
"execution_count": 58,
"metadata": {
"ExecuteTime": {
"end_time": "2020-05-21T06:01:21.083207Z",
"start_time": "2020-05-21T06:01:21.056280Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" key lval\n",
"0 foo 1\n",
"1 foo 2\n",
"\n",
" key rval\n",
"0 foo 4\n",
"1 foo 5\n"
]
},
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>key</th>\n",
" <th>lval</th>\n",
" <th>rval</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>foo</td>\n",
" <td>1</td>\n",
" <td>4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>foo</td>\n",
" <td>1</td>\n",
" <td>5</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>foo</td>\n",
" <td>2</td>\n",
" <td>4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>foo</td>\n",
" <td>2</td>\n",
" <td>5</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" key lval rval\n",
"0 foo 1 4\n",
"1 foo 1 5\n",
"2 foo 2 4\n",
"3 foo 2 5"
]
},
"execution_count": 58,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# key 값을 중복으로 가질 때, merge 메소드의 작동 방식\n",
"# 보통 key로 사용하는 값은 중복일 경우가 잘 없지만, 만약 중복된 값이 있을 경우에는 모든 경우의 수를 만들어내는 작동 방식!!\n",
"left = pd.DataFrame({'key': ['foo', 'foo'], 'lval': [1, 2]})\n",
"right = pd.DataFrame({'key': ['foo', 'foo'], 'rval': [4, 5]})\n",
"print(left)\n",
"print()\n",
"print(right)\n",
"\n",
"merged = pd.merge(left, right, on = 'key') # merge 기준을 'key'로 설정\n",
"merged"
]
},
{
"cell_type": "code",
"execution_count": 59,
"metadata": {
"ExecuteTime": {
"end_time": "2020-05-21T06:01:21.228820Z",
"start_time": "2020-05-21T06:01:21.203885Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" key lval\n",
"0 foo 1\n",
"1 bar 2\n",
"\n",
" key rval\n",
"0 foo 4\n",
"1 bar 5\n"
]
},
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>key</th>\n",
" <th>lval</th>\n",
" <th>rval</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>foo</td>\n",
" <td>1</td>\n",
" <td>4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>bar</td>\n",
" <td>2</td>\n",
" <td>5</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" key lval rval\n",
"0 foo 1 4\n",
"1 bar 2 5"
]
},
"execution_count": 59,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# key 값을 중복으로 가지지 않을 때, merge 메소드의 작동 방식\n",
"left = pd.DataFrame({'key': ['foo', 'bar'], 'lval': [1, 2]})\n",
"right = pd.DataFrame({'key': ['foo', 'bar'], 'rval': [4, 5]})\n",
"print(left)\n",
"print()\n",
"print(right)\n",
"\n",
"merged = pd.merge(left, right, on = 'key')\n",
"merged"
]
},
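{
"cell_type": "markdown",
"metadata": {},
"source": [
"Both merge examples above use keys that exist on both sides. As a supplementary sketch (the key 'baz' and the variable names are assumptions for illustration), the how parameter controls what happens to keys found on only one side:\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"left = pd.DataFrame({'key': ['foo', 'bar'], 'lval': [1, 2]})\n",
"right = pd.DataFrame({'key': ['foo', 'baz'], 'rval': [4, 5]})\n",
"\n",
"# Default how='inner': only keys present in both frames survive\n",
"inner = pd.merge(left, right, on='key')\n",
"\n",
"# how='outer': keys from either frame are kept, gaps are filled with NaN\n",
"outer = pd.merge(left, right, on='key', how='outer')\n",
"\n",
"print(len(inner), len(outer))\n",
"```\n",
"\n",
"how = 'left' and how = 'right' work the same way but keep only the keys of one side, much like LEFT JOIN and RIGHT JOIN in SQL."
]
},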
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Append\n",
"- 데이터 프레임의 맨 뒤에 행을 추가한다\n",
"- append 메소드 관련 사이트: https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html#merging-concatenation"
]
},
{
"cell_type": "code",
"execution_count": 60,
"metadata": {
"ExecuteTime": {
"end_time": "2020-05-21T06:01:21.561925Z",
"start_time": "2020-05-21T06:01:21.539985Z"
}
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>A</th>\n",
" <th>B</th>\n",
" <th>C</th>\n",
" <th>D</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>0.548845</td>\n",
" <td>0.979803</td>\n",
" <td>-1.466482</td>\n",
" <td>0.164177</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>-1.020548</td>\n",
" <td>1.370382</td>\n",
" <td>-2.089888</td>\n",
" <td>0.430789</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>-0.462215</td>\n",
" <td>-0.466806</td>\n",
" <td>0.395783</td>\n",
" <td>-1.695677</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>0.232238</td>\n",
" <td>-0.633204</td>\n",
" <td>0.302578</td>\n",
" <td>-0.309603</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>-0.347692</td>\n",
" <td>-1.017691</td>\n",
" <td>1.191371</td>\n",
" <td>1.473029</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>0.946739</td>\n",
" <td>0.060351</td>\n",
" <td>1.521950</td>\n",
" <td>1.294958</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>-0.297713</td>\n",
" <td>-0.526382</td>\n",
" <td>-0.109674</td>\n",
" <td>-0.343933</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>-0.553328</td>\n",
" <td>0.241240</td>\n",
" <td>-0.811586</td>\n",
" <td>0.408618</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>0.232238</td>\n",
" <td>-0.633204</td>\n",
" <td>0.302578</td>\n",
" <td>-0.309603</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" A B C D\n",
"0 0.548845 0.979803 -1.466482 0.164177\n",
"1 -1.020548 1.370382 -2.089888 0.430789\n",
"2 -0.462215 -0.466806 0.395783 -1.695677\n",
"3 0.232238 -0.633204 0.302578 -0.309603\n",
"4 -0.347692 -1.017691 1.191371 1.473029\n",
"5 0.946739 0.060351 1.521950 1.294958\n",
"6 -0.297713 -0.526382 -0.109674 -0.343933\n",
"7 -0.553328 0.241240 -0.811586 0.408618\n",
"8 0.232238 -0.633204 0.302578 -0.309603"
]
},
"execution_count": 60,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# 4번째 행을 기존의 데이터 프레임의 맨 뒤에 한 번 더 추가\n",
"df = pd.DataFrame(np.random.randn(8, 4), columns = ['A', 'B', 'C', 'D'])\n",
"\n",
"s = df.iloc[3] # index 번호가 3. 즉, 4번째 행을 의미\n",
"df.append(s, ignore_index = True) # ignore_index = True 를 설정해주면, index 이름을 무시한다"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 그룹화 = 묶기(Grouping)\n",
"- 어떠한 기준을 바탕으로 데이터를 나누는 일(splitting)\n",
"- 각 그룹에 어떤 함수를 독립적으로 적용시키는 일(applying)\n",
"- 적용되어 나온 결과들을 통합하는 일(combining)\n",
"- grouping 관련 사이트: https://pandas.pydata.org/pandas-docs/stable/user_guide/groupby.html#groupby"
]
},
{
"cell_type": "code",
"execution_count": 61,
"metadata": {
"ExecuteTime": {
"end_time": "2020-05-21T06:01:21.875152Z",
"start_time": "2020-05-21T06:01:21.862187Z"
}
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>A</th>\n",
" <th>B</th>\n",
" <th>C</th>\n",
" <th>D</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>foo</td>\n",
" <td>one</td>\n",
" <td>-1.062168</td>\n",
" <td>-1.658541</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>bar</td>\n",
" <td>one</td>\n",
" <td>-0.537435</td>\n",
" <td>0.294346</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>foo</td>\n",
" <td>two</td>\n",
" <td>-0.457747</td>\n",
" <td>0.457779</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>bar</td>\n",
" <td>three</td>\n",
" <td>0.215592</td>\n",
" <td>-0.538010</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>foo</td>\n",
" <td>two</td>\n",
" <td>0.897209</td>\n",
" <td>-2.013092</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>bar</td>\n",
" <td>two</td>\n",
" <td>-0.518347</td>\n",
" <td>0.628899</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>foo</td>\n",
" <td>one</td>\n",
" <td>-0.122072</td>\n",
" <td>1.753600</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>foo</td>\n",
" <td>three</td>\n",
" <td>0.441977</td>\n",
" <td>-1.818391</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" A B C D\n",
"0 foo one -1.062168 -1.658541\n",
"1 bar one -0.537435 0.294346\n",
"2 foo two -0.457747 0.457779\n",
"3 bar three 0.215592 -0.538010\n",
"4 foo two 0.897209 -2.013092\n",
"5 bar two -0.518347 0.628899\n",
"6 foo one -0.122072 1.753600\n",
"7 foo three 0.441977 -1.818391"
]
},
"execution_count": 61,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar',\n",
" 'foo', 'bar', 'foo', 'foo'],\n",
" 'B': ['one', 'one', 'two', 'three',\n",
" 'two', 'two', 'one', 'three'],\n",
" 'C': np.random.randn(8),\n",
" 'D': np.random.randn(8)}) # np.random.randn(): 표준 정규 분포에서 난수 matrix array 생성\n",
"df"
]
},
{
"cell_type": "code",
"execution_count": 62,
"metadata": {
"ExecuteTime": {
"end_time": "2020-05-21T06:01:22.023962Z",
"start_time": "2020-05-21T06:01:22.011996Z"
}
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>C</th>\n",
" <th>D</th>\n",
" </tr>\n",
" <tr>\n",
" <th>A</th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>bar</th>\n",
" <td>-0.840191</td>\n",
" <td>0.385235</td>\n",
" </tr>\n",
" <tr>\n",
" <th>foo</th>\n",
" <td>-0.302801</td>\n",
" <td>-3.278645</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" C D\n",
"A \n",
"bar -0.840191 0.385235\n",
"foo -0.302801 -3.278645"
]
},
"execution_count": 62,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# 'A' column 값을 기준으로 그룹을 묶고, 각 그룹의 합계 구하기\n",
"df.groupby('A').sum()"
]
},
{
"cell_type": "code",
"execution_count": 63,
"metadata": {
"ExecuteTime": {
"end_time": "2020-05-21T06:01:22.207875Z",
"start_time": "2020-05-21T06:01:22.193913Z"
}
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th></th>\n",
" <th>C</th>\n",
" <th>D</th>\n",
" </tr>\n",
" <tr>\n",
" <th>A</th>\n",
" <th>B</th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th rowspan=\"3\" valign=\"top\">bar</th>\n",
" <th>one</th>\n",
" <td>-0.537435</td>\n",
" <td>0.294346</td>\n",
" </tr>\n",
" <tr>\n",
" <th>three</th>\n",
" <td>0.215592</td>\n",
" <td>-0.538010</td>\n",
" </tr>\n",
" <tr>\n",
" <th>two</th>\n",
" <td>-0.518347</td>\n",
" <td>0.628899</td>\n",
" </tr>\n",
" <tr>\n",
" <th rowspan=\"3\" valign=\"top\">foo</th>\n",
" <th>one</th>\n",
" <td>-1.184240</td>\n",
" <td>0.095059</td>\n",
" </tr>\n",
" <tr>\n",
" <th>three</th>\n",
" <td>0.441977</td>\n",
" <td>-1.818391</td>\n",
" </tr>\n",
" <tr>\n",
" <th>two</th>\n",
" <td>0.439463</td>\n",
" <td>-1.555313</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" C D\n",
"A B \n",
"bar one -0.537435 0.294346\n",
" three 0.215592 -0.538010\n",
" two -0.518347 0.628899\n",
"foo one -1.184240 0.095059\n",
" three 0.441977 -1.818391\n",
" two 0.439463 -1.555313"
]
},
"execution_count": 63,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# 여러 column을 기준으로 그룹을 묶기\n",
"df.groupby(['A', 'B']).sum()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### groupby( )에서 agg( ) 활용법\n",
"- SQL의 경우, 서로 다른 aggregation 함수를 적용할 경우에는 Select 절에 나열하기만 하면 된다\n",
" - aggregation 함수: ex) min(), max(), sum(), count() 등...\n",
" - ex) Select count(PassengerId), count(Survived), ...\n",
" from titanic_table\n",
" group by Pclass\n",
"- 여러 개의 column이 서로 다른 aggregation 함수를 groupby에서 호출하려면, agg()를 이용해서 SQL과 같은 처리가 가능\n",
" - DataFrame groupby()의 경우, 적용하려는 여러 개의 aggregation 함수명을 DataFrameGroupBy 객체의 agg() 내에 인자로 입력해서 사용\n",
" - agg() 내에 입력 값으로, '딕셔너리' 형태로 aggregation이 적용될 column들과 aggregation 함수를 입력"
]
},
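{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of the dictionary form of agg() mentioned above. The DataFrame `frame` and the choice of max for C and sum for D are assumptions made for illustration.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"frame = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar'],\n",
"                      'C': [1.0, 2.0, 3.0, 4.0],\n",
"                      'D': [10, 20, 30, 40]})  # hypothetical data\n",
"\n",
"# Map each column to its own aggregation function\n",
"result = frame.groupby('A').agg({'C': 'max', 'D': 'sum'})\n",
"\n",
"print(result)\n",
"```\n",
"\n",
"This roughly corresponds to SELECT max(C), sum(D) ... GROUP BY A in SQL; the group keys become the index of the result."
]
},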
{
"cell_type": "code",
"execution_count": 64,
"metadata": {
"ExecuteTime": {
"end_time": "2020-05-21T06:01:22.557403Z",
"start_time": "2020-05-21T06:01:22.526483Z"
},
"scrolled": true
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead tr th {\n",
" text-align: left;\n",
" }\n",
"\n",
" .dataframe thead tr:last-of-type th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr>\n",
" <th></th>\n",
" <th colspan=\"2\" halign=\"left\">B</th>\n",
" <th colspan=\"2\" halign=\"left\">C</th>\n",
" <th colspan=\"2\" halign=\"left\">D</th>\n",
" </tr>\n",
" <tr>\n",
" <th></th>\n",
" <th>max</th>\n",
" <th>min</th>\n",
" <th>max</th>\n",
" <th>min</th>\n",
" <th>max</th>\n",
" <th>min</th>\n",
" </tr>\n",
" <tr>\n",
" <th>A</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>bar</th>\n",
" <td>two</td>\n",
" <td>one</td>\n",
" <td>0.215592</td>\n",
" <td>-0.537435</td>\n",
" <td>0.628899</td>\n",
" <td>-0.538010</td>\n",
" </tr>\n",
" <tr>\n",
" <th>foo</th>\n",
" <td>two</td>\n",
" <td>one</td>\n",
" <td>0.897209</td>\n",
" <td>-1.062168</td>\n",
" <td>1.753600</td>\n",
" <td>-2.013092</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" B C D \n",
" max min max min max min\n",
"A \n",
"bar two one 0.215592 -0.537435 0.628899 -0.538010\n",
"foo two one 0.897209 -1.062168 1.753600 -2.013092"
]
},
"execution_count": 64,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.groupby(['A']).agg([max, min])"
]
},
{
"cell_type": "code",
"execution_count": 65,
"metadata": {
"ExecuteTime": {
"end_time": "2020-05-21T06:01:22.683782Z",
"start_time": "2020-05-21T06:01:22.671816Z"
}
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>C</th>\n",
" <th>D</th>\n",
" </tr>\n",
" <tr>\n",
" <th>A</th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>bar</th>\n",
" <td>0.215592</td>\n",
" <td>-0.538010</td>\n",
" </tr>\n",
" <tr>\n",
" <th>foo</th>\n",
" <td>0.897209</td>\n",
" <td>-2.013092</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" C D\n",
"A \n",
"bar 0.215592 -0.538010\n",
"foo 0.897209 -2.013092"
]
},
"execution_count": 65,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"agg_format = {'C':'max', 'D':'min'}\n",
"df.groupby(['A']).agg(agg_format)"
]
},
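{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of the dictionary idea above, written with keyword-style named aggregation so that each output column gets an explicit name. The `demo` frame and the output names `c_max` and `d_min` are made up for illustration and are not part of the original notebook."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical demo data; named `demo` so the notebook's own df is left untouched\n",
"demo = pd.DataFrame({\n",
"    'A': ['foo', 'bar', 'foo', 'bar'],\n",
"    'C': [1.0, 2.0, 3.0, 4.0],\n",
"    'D': [10.0, 20.0, 30.0, 40.0],\n",
"})\n",
"\n",
"# Named aggregation: output_column = (input_column, aggregate function)\n",
"named = demo.groupby('A').agg(c_max = ('C', 'max'), d_min = ('D', 'min'))\n",
"named"
]
},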
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 변형하기(Reshaping)\n",
"- 데이터 프레임을 다른 형태로 변환\n",
"- Reshaping 관련 참고 사이트: https://pandas.pydata.org/pandas-docs/stable/user_guide/reshaping.html#reshaping-stacking"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Stack 메소드\n",
"- 데이터 프레임의 column들을 index의 레벨로 만들며, 이를 '압축'한다고 표현한다"
]
},
{
"cell_type": "code",
"execution_count": 66,
"metadata": {
"ExecuteTime": {
"end_time": "2020-05-21T06:01:23.172003Z",
"start_time": "2020-05-21T06:01:23.152056Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" A B\n",
"first second \n",
"bar one -0.672399 -0.583972\n",
" two 0.656727 -0.070697\n",
"baz one 0.935536 -1.169262\n",
" two -0.413284 -0.813473\n"
]
},
{
"data": {
"text/plain": [
"first second \n",
"bar one A -0.672399\n",
" B -0.583972\n",
" two A 0.656727\n",
" B -0.070697\n",
"baz one A 0.935536\n",
" B -1.169262\n",
" two A -0.413284\n",
" B -0.813473\n",
"dtype: float64"
]
},
"execution_count": 66,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"tuples = list(zip(*[['bar', 'bar', 'baz', 'baz',\n",
" 'foo', 'foo', 'qux', 'qux'],\n",
" ['one', 'two', 'one', 'two',\n",
" 'one', 'two', 'one', 'two']]))\n",
"\n",
"index = pd.MultiIndex.from_tuples(tuples, names = ['first', 'second'])\n",
"df = pd.DataFrame(np.random.randn(8, 2), index = index, columns = ['A', 'B'])\n",
"df2 = df[:4]\n",
"print(df2)\n",
"\n",
"# stack 메소드를 통해 A와 B라는 값을 가지는 index 레벨이 하나 더 추가된 형태로 변환\n",
"stacked = df2.stack()\n",
"stacked"
]
},
{
"cell_type": "code",
"execution_count": 67,
"metadata": {
"ExecuteTime": {
"end_time": "2020-05-21T06:01:23.347693Z",
"start_time": "2020-05-21T06:01:23.334729Z"
}
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th></th>\n",
" <th>A</th>\n",
" <th>B</th>\n",
" </tr>\n",
" <tr>\n",
" <th>first</th>\n",
" <th>second</th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th rowspan=\"2\" valign=\"top\">bar</th>\n",
" <th>one</th>\n",
" <td>-0.672399</td>\n",
" <td>-0.583972</td>\n",
" </tr>\n",
" <tr>\n",
" <th>two</th>\n",
" <td>0.656727</td>\n",
" <td>-0.070697</td>\n",
" </tr>\n",
" <tr>\n",
" <th rowspan=\"2\" valign=\"top\">baz</th>\n",
" <th>one</th>\n",
" <td>0.935536</td>\n",
" <td>-1.169262</td>\n",
" </tr>\n",
" <tr>\n",
" <th>two</th>\n",
" <td>-0.413284</td>\n",
" <td>-0.813473</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" A B\n",
"first second \n",
"bar one -0.672399 -0.583972\n",
" two 0.656727 -0.070697\n",
"baz one 0.935536 -1.169262\n",
" two -0.413284 -0.813473"
]
},
"execution_count": 67,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# stack 메소드를 통해 압축된 수준을 갖는 데이터 프레임을 다시 unstack 메소드로 원복시키기\n",
"# stack 메소드를 통해 압축되었던 마지막 수준부터 풀어주는 기능\n",
"stacked.unstack()"
]
},
{
"cell_type": "code",
"execution_count": 68,
"metadata": {
"ExecuteTime": {
"end_time": "2020-05-21T06:01:23.500643Z",
"start_time": "2020-05-21T06:01:23.486686Z"
}
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>first</th>\n",
" <th>bar</th>\n",
" <th>baz</th>\n",
" </tr>\n",
" <tr>\n",
" <th>second</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th rowspan=\"2\" valign=\"top\">one</th>\n",
" <th>A</th>\n",
" <td>-0.672399</td>\n",
" <td>0.935536</td>\n",
" </tr>\n",
" <tr>\n",
" <th>B</th>\n",
" <td>-0.583972</td>\n",
" <td>-1.169262</td>\n",
" </tr>\n",
" <tr>\n",
" <th rowspan=\"2\" valign=\"top\">two</th>\n",
" <th>A</th>\n",
" <td>0.656727</td>\n",
" <td>-0.413284</td>\n",
" </tr>\n",
" <tr>\n",
" <th>B</th>\n",
" <td>-0.070697</td>\n",
" <td>-0.813473</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
"first bar baz\n",
"second \n",
"one A -0.672399 0.935536\n",
" B -0.583972 -1.169262\n",
"two A 0.656727 -0.413284\n",
" B -0.070697 -0.813473"
]
},
"execution_count": 68,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# unstack 메소드에서 '해제할 수준'을 지정해주기\n",
"\n",
"# 첫 번째 수준을 해제 --> bar와 baz라는 column이 생김\n",
"stacked.unstack(0)"
]
},
{
"cell_type": "code",
"execution_count": 69,
"metadata": {
"ExecuteTime": {
"end_time": "2020-05-21T06:01:23.685278Z",
"start_time": "2020-05-21T06:01:23.668315Z"
},
"scrolled": true
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>second</th>\n",
" <th>one</th>\n",
" <th>two</th>\n",
" </tr>\n",
" <tr>\n",
" <th>first</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th rowspan=\"2\" valign=\"top\">bar</th>\n",
" <th>A</th>\n",
" <td>-0.672399</td>\n",
" <td>0.656727</td>\n",
" </tr>\n",
" <tr>\n",
" <th>B</th>\n",
" <td>-0.583972</td>\n",
" <td>-0.070697</td>\n",
" </tr>\n",
" <tr>\n",
" <th rowspan=\"2\" valign=\"top\">baz</th>\n",
" <th>A</th>\n",
" <td>0.935536</td>\n",
" <td>-0.413284</td>\n",
" </tr>\n",
" <tr>\n",
" <th>B</th>\n",
" <td>-1.169262</td>\n",
" <td>-0.813473</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
"second one two\n",
"first \n",
"bar A -0.672399 0.656727\n",
" B -0.583972 -0.070697\n",
"baz A 0.935536 -0.413284\n",
" B -1.169262 -0.813473"
]
},
"execution_count": 69,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# 두 번째 수준을 해제 --> one과 two라는 column이 생김\n",
"stacked.unstack(1)"
]
},
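{
"cell_type": "markdown",
"metadata": {},
"source": [
"A small sketch of the stack/unstack round trip on fixed numbers, which may be easier to follow than random values. It also shows that a level can be selected by name instead of by position. The names `demo_index`, `wide`, `long`, `by_name` and `round_trip` are illustrative only."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical demo data with a two-level row index\n",
"demo_index = pd.MultiIndex.from_product([['bar', 'baz'], ['one', 'two']],\n",
"                                        names = ['first', 'second'])\n",
"wide = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [5, 6, 7, 8]}, index = demo_index)\n",
"\n",
"# stack(): the columns A and B become the innermost index level\n",
"long = wide.stack()\n",
"\n",
"# unstack() accepts a level name as well as a level number\n",
"by_name = long.unstack('second')\n",
"\n",
"# unstack() with no argument undoes the stack() above\n",
"round_trip = long.unstack()\n",
"round_trip"
]
},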
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Pivot Tables\n",
"- pivot_table(data, values = None, index = None, columns = None, aggfunc = 'mean', fill_value = None, margins = False, margins_name = 'All')\n",
" - data: 분석할 데이터 프레임 (메서드일 때는 필요하지 않음)\n",
" - values: 분석할 데이터 프레임에서 분석할 열\n",
" - index: 행 인덱스로 들어갈 키 열 또는 키 열의 리스트\n",
" - columns: 열 인덱스로 들어갈 키 열 또는 키 열의 리스트\n",
" - aggfunc: 분석 메서드 설정\n",
" - fill_value: NaN 대체 값 설정\n",
" - margins: 모든 데이터를 분석한 결과를 오른쪽과 아래에 붙일지 여부\n",
" - margins_name: 마진 열(행)의 이름\n",
"- Pivot Table 관련 참고 사이트: https://datascienceschool.net/view-notebook/76dcd63bba2c4959af15bec41b197e7c/"
]
},
{
"cell_type": "code",
"execution_count": 70,
"metadata": {
"ExecuteTime": {
"end_time": "2020-05-21T06:01:23.972010Z",
"start_time": "2020-05-21T06:01:23.961042Z"
},
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
"( A B C D E\n",
" 0 one A foo -0.447158 -0.016947\n",
" 1 one B foo -0.011032 -0.747219\n",
" 2 two C foo 0.160751 0.157437\n",
" 3 three A bar 1.808888 0.753999\n",
" 4 one B bar -0.003062 1.526580\n",
" 5 one C bar -1.231371 0.055387\n",
" 6 two A foo -0.433729 -1.438124\n",
" 7 three B foo -0.225208 1.376454\n",
" 8 one C foo -1.132540 -0.100790\n",
" 9 one A bar -0.692918 -0.311524\n",
" 10 two B bar -0.579238 -0.803641\n",
" 11 three C bar 0.344959 0.238237,)"
]
},
"execution_count": 70,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df = pd.DataFrame({'A': ['one', 'one', 'two', 'three'] * 3,\n",
" 'B': ['A', 'B', 'C'] * 4,\n",
" 'C': ['foo', 'foo', 'foo', 'bar', 'bar', 'bar'] * 2,\n",
" 'D': np.random.randn(12),\n",
" 'E': np.random.randn(12)})\n",
"df, "
]
},
{
"cell_type": "code",
"execution_count": 71,
"metadata": {
"ExecuteTime": {
"end_time": "2020-05-21T06:01:24.135336Z",
"start_time": "2020-05-21T06:01:24.109404Z"
}
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>C</th>\n",
" <th>bar</th>\n",
" <th>foo</th>\n",
" </tr>\n",
" <tr>\n",
" <th>A</th>\n",
" <th>B</th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th rowspan=\"3\" valign=\"top\">one</th>\n",
" <th>A</th>\n",
" <td>-0.692918</td>\n",
" <td>-0.447158</td>\n",
" </tr>\n",
" <tr>\n",
" <th>B</th>\n",
" <td>-0.003062</td>\n",
" <td>-0.011032</td>\n",
" </tr>\n",
" <tr>\n",
" <th>C</th>\n",
" <td>-1.231371</td>\n",
" <td>-1.132540</td>\n",
" </tr>\n",
" <tr>\n",
" <th rowspan=\"3\" valign=\"top\">three</th>\n",
" <th>A</th>\n",
" <td>1.808888</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>B</th>\n",
" <td>NaN</td>\n",
" <td>-0.225208</td>\n",
" </tr>\n",
" <tr>\n",
" <th>C</th>\n",
" <td>0.344959</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th rowspan=\"3\" valign=\"top\">two</th>\n",
" <th>A</th>\n",
" <td>NaN</td>\n",
" <td>-0.433729</td>\n",
" </tr>\n",
" <tr>\n",
" <th>B</th>\n",
" <td>-0.579238</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>C</th>\n",
" <td>NaN</td>\n",
" <td>0.160751</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
"C bar foo\n",
"A B \n",
"one A -0.692918 -0.447158\n",
" B -0.003062 -0.011032\n",
" C -1.231371 -1.132540\n",
"three A 1.808888 NaN\n",
" B NaN -0.225208\n",
" C 0.344959 NaN\n",
"two A NaN -0.433729\n",
" B -0.579238 NaN\n",
" C NaN 0.160751"
]
},
"execution_count": 71,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# 피벗 테이블 기능을 이용하여 위의 데이터 프레임을 변형\n",
"# 찾기 못한 값은 NaN 으로 표시\n",
"pd.pivot_table(df, values = 'D', index = ['A', 'B'], columns = ['C'])"
]
},
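{
"cell_type": "markdown",
"metadata": {},
"source": [
"The call above relies on the defaults for most parameters. As a rough sketch of the remaining ones (`aggfunc`, `fill_value`, `margins`, `margins_name`), the example below pivots a tiny hand-made table. The `sales` frame and its column names are hypothetical and only serve the illustration."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical demo data: there is no (west, y) row, so that cell would be NaN\n",
"sales = pd.DataFrame({\n",
"    'region': ['east', 'east', 'west', 'west'],\n",
"    'product': ['x', 'y', 'x', 'x'],\n",
"    'amount': [10, 20, 30, 50],\n",
"})\n",
"\n",
"table = pd.pivot_table(sales, values = 'amount', index = 'region', columns = 'product',\n",
"                       aggfunc = 'sum',          # sum instead of the default mean\n",
"                       fill_value = 0,           # show 0 instead of NaN for the missing combination\n",
"                       margins = True,           # add a totals row and a totals column\n",
"                       margins_name = 'Total')   # label used for the totals\n",
"table"
]
},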
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 시계열 데이터 다루기(Time Series)\n",
"- Pandas는 시계열 단위인 주기(frequency)를 다시 샘플링할 수 있다\n",
"- 특히 금융 데이터를 다룰 때 많이 사용한다\n",
"- Time Series 관련 참고 사이트: https://datascienceschool.net/view-notebook/8959673a97214e8fafdb159f254185e9/"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### pd.to_datetime() 함수\n",
"- 날짜/시간을 나타내는 문자열을 자동으로 datetime 자료형으로 바꾼 후, DatetimeIndex 자료형 index를 생성"
]
},
{
"cell_type": "code",
"execution_count": 72,
"metadata": {
"ExecuteTime": {
"end_time": "2020-05-21T06:01:24.555618Z",
"start_time": "2020-05-21T06:01:24.549633Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"DatetimeIndex(['2018-01-01', '2018-01-04', '2018-01-05', '2018-01-06'], dtype='datetime64[ns]', freq=None)"
]
},
"execution_count": 72,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"date_str = [\"2018, 1, 1\", \"2018, 1, 4\", \"2018, 1, 5\", \"2018, 1, 6\"]\n",
"idx = pd.to_datetime(date_str)\n",
"idx"
]
},
{
"cell_type": "code",
"execution_count": 73,
"metadata": {
"ExecuteTime": {
"end_time": "2020-05-21T06:01:24.708900Z",
"start_time": "2020-05-21T06:01:24.698925Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"2018-01-01 1.764052\n",
"2018-01-04 0.400157\n",
"2018-01-05 0.978738\n",
"2018-01-06 2.240893\n",
"dtype: float64"
]
},
"execution_count": 73,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# 위에서 만들어진 index를 사용하여, Series나 DataFrame을 생성\n",
"np.random.seed(0)\n",
"s = pd.Series(np.random.randn(4), index = idx)\n",
"s"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### pd.date_range() 함수\n",
"- 모든 날짜/시간을 일일히 입력할 필용없이, 시작일과 종료일 또는 시작일과 기간을 입력하면 범위 내의 index를 생성한다\n",
"- freq 인수로 특정한 날짜만 생성되도록 할 수도 있으며, 많이 사용되는 freq 인수값은 다음과 같다\n",
" - s: 초\n",
" - T: 분\n",
" - H: 시간\n",
" - D: 일(day)\n",
" - B: 주말이 아닌 평일\n",
" - W: 주(일요일)\n",
" - W-MON: 주(월요일)\n",
" - M: 각 달(month)의 마지막 날\n",
" - MS: 각 달의 첫날\n",
" - BM: 주말이 아닌 평일 중에서 각 달의 마지막 날\n",
" - BMS: 주말이 아닌 평일 중에서 각 달의 첫날\n",
" - WOM-2THU: 각 달의 두번째 목요일\n",
" - Q-JAN: 각 분기의 첫달의 마지막 날\n",
" - Q-DEC: 각 분기의 마지막 달의 마지막 날"
]
},
{
"cell_type": "code",
"execution_count": 74,
"metadata": {
"ExecuteTime": {
"end_time": "2020-05-21T06:01:25.034569Z",
"start_time": "2020-05-21T06:01:25.027592Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"DatetimeIndex(['2018-04-01', '2018-04-02', '2018-04-03', '2018-04-04',\n",
" '2018-04-05', '2018-04-06', '2018-04-07', '2018-04-08',\n",
" '2018-04-09', '2018-04-10', '2018-04-11', '2018-04-12',\n",
" '2018-04-13', '2018-04-14', '2018-04-15', '2018-04-16',\n",
" '2018-04-17', '2018-04-18', '2018-04-19', '2018-04-20',\n",
" '2018-04-21', '2018-04-22', '2018-04-23', '2018-04-24',\n",
" '2018-04-25', '2018-04-26', '2018-04-27', '2018-04-28',\n",
" '2018-04-29', '2018-04-30'],\n",
" dtype='datetime64[ns]', freq='D')"
]
},
"execution_count": 74,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.date_range(\"2018-4-1\", \"2018-4-30\")"
]
},
{
"cell_type": "code",
"execution_count": 75,
"metadata": {
"ExecuteTime": {
"end_time": "2020-05-21T06:01:25.183467Z",
"start_time": "2020-05-21T06:01:25.176521Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"DatetimeIndex(['2018-04-01', '2018-04-02', '2018-04-03', '2018-04-04',\n",
" '2018-04-05', '2018-04-06', '2018-04-07', '2018-04-08',\n",
" '2018-04-09', '2018-04-10', '2018-04-11', '2018-04-12',\n",
" '2018-04-13', '2018-04-14', '2018-04-15', '2018-04-16',\n",
" '2018-04-17', '2018-04-18', '2018-04-19', '2018-04-20',\n",
" '2018-04-21', '2018-04-22', '2018-04-23', '2018-04-24',\n",
" '2018-04-25', '2018-04-26', '2018-04-27', '2018-04-28',\n",
" '2018-04-29', '2018-04-30'],\n",
" dtype='datetime64[ns]', freq='D')"
]
},
"execution_count": 75,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.date_range(start = \"2018-4-1\", periods = 30)"
]
},
{
"cell_type": "code",
"execution_count": 76,
"metadata": {
"ExecuteTime": {
"end_time": "2020-05-21T06:01:25.368631Z",
"start_time": "2020-05-21T06:01:25.361653Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"DatetimeIndex(['2018-04-02', '2018-04-03', '2018-04-04', '2018-04-05',\n",
" '2018-04-06', '2018-04-09', '2018-04-10', '2018-04-11',\n",
" '2018-04-12', '2018-04-13', '2018-04-16', '2018-04-17',\n",
" '2018-04-18', '2018-04-19', '2018-04-20', '2018-04-23',\n",
" '2018-04-24', '2018-04-25', '2018-04-26', '2018-04-27',\n",
" '2018-04-30'],\n",
" dtype='datetime64[ns]', freq='B')"
]
},
"execution_count": 76,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.date_range(\"2018-4-1\", \"2018-4-30\", freq = \"B\") # freq=\"B\" 는 \"주말이 아닌 평일\""
]
},
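{
"cell_type": "markdown",
"metadata": {},
"source": [
"A short sketch of a few more freq values from the list above. Note that pandas 2.2 and later deprecate some of the listed aliases in favor of `min`, `h`, `ME`, `BME` and `QE`; this sketch sticks to aliases that are spelled the same in older and newer versions. The variable names are illustrative only."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Every Monday in April 2018\n",
"mondays = pd.date_range('2018-04-01', '2018-04-30', freq = 'W-MON')\n",
"\n",
"# First day of three consecutive months\n",
"month_starts = pd.date_range('2018-01-01', periods = 3, freq = 'MS')\n",
"\n",
"# Second Thursday of each month in the first quarter of 2018\n",
"second_thursdays = pd.date_range('2018-01-01', '2018-03-31', freq = 'WOM-2THU')\n",
"\n",
"mondays, month_starts, second_thursdays"
]
},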
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### shift 연산\n",
"- 시계열 데이터의 index는 시간이나 날짜를 나태나므로, 날짜 이동 등의 다양한 연산이 가능하다\n",
"- shift 연산을 사용하면, index는 그대로 두고 데이터만 이동할 수 있다"
]
},
{
"cell_type": "code",
"execution_count": 77,
"metadata": {
"ExecuteTime": {
"end_time": "2020-05-21T06:01:25.662270Z",
"start_time": "2020-05-21T06:01:25.653286Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"2018-01-31 1.764052\n",
"2018-02-28 0.400157\n",
"2018-03-31 0.978738\n",
"2018-04-30 2.240893\n",
"Freq: M, dtype: float64"
]
},
"execution_count": 77,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.random.seed(0)\n",
"ts = pd.Series(np.random.randn(4), \n",
" index = pd.date_range(\"2018-1-1\", periods = 4, freq = \"M\")) # freq = \"M\" 는 \"각 달(month)의 마지막 날\"\n",
"ts"
]
},
{
"cell_type": "code",
"execution_count": 78,
"metadata": {
"ExecuteTime": {
"end_time": "2020-05-21T06:01:25.847953Z",
"start_time": "2020-05-21T06:01:25.837976Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"2018-01-31 NaN\n",
"2018-02-28 1.764052\n",
"2018-03-31 0.400157\n",
"2018-04-30 0.978738\n",
"Freq: M, dtype: float64"
]
},
"execution_count": 78,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ts.shift(1)"
]
},
{
"cell_type": "code",
"execution_count": 79,
"metadata": {
"ExecuteTime": {
"end_time": "2020-05-21T06:01:25.994287Z",
"start_time": "2020-05-21T06:01:25.984311Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"2018-01-31 0.400157\n",
"2018-02-28 0.978738\n",
"2018-03-31 2.240893\n",
"2018-04-30 NaN\n",
"Freq: M, dtype: float64"
]
},
"execution_count": 79,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# shift 인자로 '-1'을 입력하면, 맨 아래의 데이터를 기준으로 이동\n",
"ts.shift(-1)"
]
},
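{
"cell_type": "markdown",
"metadata": {},
"source": [
"Two common uses of shift, sketched on a tiny hand-made series: computing the change from the previous row, and passing `freq` to move the index itself instead of the data. The `prices` series and the other names are hypothetical."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical demo data\n",
"prices = pd.Series([100.0, 110.0, 99.0],\n",
"                   index = pd.date_range('2018-01-01', periods = 3, freq = 'D'))\n",
"\n",
"# Change from the previous row; the first row has no predecessor, so it is NaN\n",
"change = prices - prices.shift(1)\n",
"\n",
"# With freq, the index is moved by one day and the values stay attached to their rows\n",
"moved = prices.shift(1, freq = 'D')\n",
"\n",
"change, moved"
]
},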
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### resample 연산\n",
"- 시간 간격을 재조정하는 리샘플링이 가능\n",
"- 시간 구간이 작아지면 데이터 양이 증가한다 --> \"업-샘플링\"\n",
"- 시간 구간이 커지면 데이터 양이 감소한다 --> \"다운-샘플링\""
]
},
{
"cell_type": "code",
"execution_count": 80,
"metadata": {
"ExecuteTime": {
"end_time": "2020-05-21T06:01:26.325538Z",
"start_time": "2020-05-21T06:01:26.316565Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"2018-01-01 1.867558\n",
"2018-01-02 -0.977278\n",
"2018-01-03 0.950088\n",
"2018-01-04 -0.151357\n",
"2018-01-05 -0.103219\n",
"2018-01-06 0.410599\n",
"2018-01-07 0.144044\n",
"2018-01-08 1.454274\n",
"2018-01-09 0.761038\n",
"2018-01-10 0.121675\n",
"Freq: D, dtype: float64"
]
},
"execution_count": 80,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ts = pd.Series(np.random.randn(100), \n",
" index = pd.date_range(\"2018-1-1\", periods = 100, freq = \"D\"))\n",
"ts.head(10)"
]
},
{
"cell_type": "code",
"execution_count": 81,
"metadata": {
"ExecuteTime": {
"end_time": "2020-05-21T06:01:26.482529Z",
"start_time": "2020-05-21T06:01:26.467536Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"2018-01-07 0.305776\n",
"2018-01-14 0.629064\n",
"2018-01-21 -0.006910\n",
"2018-01-28 0.277065\n",
"2018-02-04 -0.144972\n",
"2018-02-11 -0.496299\n",
"2018-02-18 -0.474473\n",
"2018-02-25 -0.201222\n",
"2018-03-04 -0.775142\n",
"2018-03-11 0.052868\n",
"2018-03-18 -0.450379\n",
"2018-03-25 0.601892\n",
"2018-04-01 0.334893\n",
"2018-04-08 0.509605\n",
"2018-04-15 -0.150544\n",
"Freq: W-SUN, dtype: float64"
]
},
"execution_count": 81,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# 다운-샘플링의 경우, 원래의 데이터가 그룹으로 묶이기 때문에\n",
"# groupby 때와 같이 그룹 연산을 해서 대표값을 구해야 한다\n",
"ts.resample('W').mean()"
]
},
{
"cell_type": "code",
"execution_count": 82,
"metadata": {
"ExecuteTime": {
"end_time": "2020-05-21T06:01:26.621250Z",
"start_time": "2020-05-21T06:01:26.611276Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"2018-01-31 1.867558\n",
"2018-02-28 0.156349\n",
"2018-03-31 -1.726283\n",
"2018-04-30 0.356366\n",
"Freq: M, dtype: float64"
]
},
"execution_count": 82,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# 달(month) 단위 구간 별로 각 column의 첫 번째 값(first value) 구하기\n",
"ts.resample('M').first()"
]
},
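{
"cell_type": "markdown",
"metadata": {},
"source": [
"The cells above only show down-sampling. As a hedged sketch of up-sampling, the example below resamples two daily values to 12-hour intervals. New time points have no data, so they are either left as NaN (`asfreq`) or filled from the previous value (`ffill`). The interval is given as a `pd.Timedelta` to avoid depending on a particular alias spelling; the names `daily`, `with_gaps` and `filled` are illustrative."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical demo data: two daily observations\n",
"daily = pd.Series([1.0, 2.0],\n",
"                  index = pd.date_range('2018-01-01', periods = 2, freq = 'D'))\n",
"\n",
"half_day = pd.Timedelta(hours = 12)\n",
"\n",
"# Up-sampling creates a new 12:00 time point that has no observation\n",
"with_gaps = daily.resample(half_day).asfreq()   # new point stays NaN\n",
"filled = daily.resample(half_day).ffill()       # new point copies the previous value\n",
"\n",
"with_gaps, filled"
]
},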
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 범주형 데이터 다루기(Categoricals)"
]
},
{
"cell_type": "code",
"execution_count": 83,
"metadata": {
"ExecuteTime": {
"end_time": "2020-05-21T06:01:26.956093Z",
"start_time": "2020-05-21T06:01:26.943131Z"
}
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>id</th>\n",
" <th>raw_grade</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1</td>\n",
" <td>a</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2</td>\n",
" <td>b</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>3</td>\n",
" <td>b</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>4</td>\n",
" <td>a</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>5</td>\n",
" <td>a</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>6</td>\n",
" <td>e</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" id raw_grade\n",
"0 1 a\n",
"1 2 b\n",
"2 3 b\n",
"3 4 a\n",
"4 5 a\n",
"5 6 e"
]
},
"execution_count": 83,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df = pd.DataFrame({\"id\": [1, 2, 3, 4, 5, 6],\n",
" \"raw_grade\": ['a', 'b', 'b', 'a', 'a', 'e']})\n",
"df"
]
},
{
"cell_type": "code",
"execution_count": 84,
"metadata": {
"ExecuteTime": {
"end_time": "2020-05-21T06:01:27.139492Z",
"start_time": "2020-05-21T06:01:27.128523Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"0 a\n",
"1 b\n",
"2 b\n",
"3 a\n",
"4 a\n",
"5 e\n",
"Name: grade, dtype: category\n",
"Categories (3, object): [a, b, e]"
]
},
"execution_count": 84,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# 단순한 문자로 되어있는 raw 데이터의 grade column을 범주형으로 변환\n",
"df['grade'] = df['raw_grade'].astype('category')\n",
"df['grade']"
]
},
{
"cell_type": "code",
"execution_count": 85,
"metadata": {
"ExecuteTime": {
"end_time": "2020-05-21T06:01:27.281582Z",
"start_time": "2020-05-21T06:01:27.273588Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"0 very good\n",
"1 good\n",
"2 good\n",
"3 very good\n",
"4 very good\n",
"5 very bad\n",
"Name: grade, dtype: category\n",
"Categories (3, object): [very good, good, very bad]"
]
},
"execution_count": 85,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# 범주들의 이름을 원하는대로 바꿔주기\n",
"# Series.cat.categories 에 이름들을 할당\n",
"df['grade'].cat.categories = [\"very good\", \"good\", \"very bad\"]\n",
"df['grade']"
]
},
{
"cell_type": "code",
"execution_count": 86,
"metadata": {
"ExecuteTime": {
"end_time": "2020-05-21T06:01:27.430592Z",
"start_time": "2020-05-21T06:01:27.419676Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"0 very good\n",
"1 good\n",
"2 good\n",
"3 very good\n",
"4 very good\n",
"5 very bad\n",
"Name: grade, dtype: category\n",
"Categories (5, object): [very bad, bad, medium, good, very good]"
]
},
"execution_count": 86,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Series.cat 아래의 메소드들은 기본적으로 새로운 시리즈를 반환\n",
"df[\"grade\"] = df[\"grade\"].cat.set_categories([\"very bad\", \"bad\", \"medium\",\n",
" \"good\", \"very good\"])\n",
"\n",
"df[\"grade\"]"
]
},
{
"cell_type": "code",
"execution_count": 87,
"metadata": {
"ExecuteTime": {
"end_time": "2020-05-21T06:01:27.579325Z",
"start_time": "2020-05-21T06:01:27.567357Z"
}
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>id</th>\n",
" <th>raw_grade</th>\n",
" <th>grade</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>6</td>\n",
" <td>e</td>\n",
" <td>very bad</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2</td>\n",
" <td>b</td>\n",
" <td>good</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>3</td>\n",
" <td>b</td>\n",
" <td>good</td>\n",
" </tr>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1</td>\n",
" <td>a</td>\n",
" <td>very good</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>4</td>\n",
" <td>a</td>\n",
" <td>very good</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>5</td>\n",
" <td>a</td>\n",
" <td>very good</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" id raw_grade grade\n",
"5 6 e very bad\n",
"1 2 b good\n",
"2 3 b good\n",
"0 1 a very good\n",
"3 4 a very good\n",
"4 5 a very good"
]
},
"execution_count": 87,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# 범주의 순서를 정렬하기\n",
"\n",
"# 범주 이름의 어휘적 순서가 아닌, 범주에 이미 매겨진 값의 순서대로 정렬\n",
"# 즉, 범주형 자료를 만들거나 범주들을 재정의할 때 이루어진 순서가 범주에 매겨진 값이다\n",
"df.sort_values(by = 'grade')"
]
},
{
"cell_type": "code",
"execution_count": 88,
"metadata": {
"ExecuteTime": {
"end_time": "2020-05-21T06:01:27.699004Z",
"start_time": "2020-05-21T06:01:27.690056Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"grade\n",
"very bad 1\n",
"bad 0\n",
"medium 0\n",
"good 2\n",
"very good 3\n",
"dtype: int64"
]
},
"execution_count": 88,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# 범주형 자료를 담고있는 column을 그룹으로 묶고, 각 범주에 해당하는 값의 빈도수 확인\n",
"# 이 과정을 통해 비어있는 범주가 무엇인지 알 수 있다\n",
"df.groupby('grade').size()"
]
},
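{
"cell_type": "markdown",
"metadata": {},
"source": [
"The category order used for sorting above can also be declared as an explicit ranking. A minimal sketch, assuming an ordered `CategoricalDtype`: with `ordered = True`, comparisons and `max()` follow the category order as well. The `levels` list mirrors the labels used above; `grades` and the other names are illustrative."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Category order from worst to best\n",
"levels = ['very bad', 'bad', 'medium', 'good', 'very good']\n",
"grade_type = pd.CategoricalDtype(categories = levels, ordered = True)\n",
"\n",
"# Hypothetical demo data\n",
"grades = pd.Series(['good', 'very bad', 'very good']).astype(grade_type)\n",
"\n",
"# Sorting follows the category order, not the alphabet\n",
"sorted_grades = grades.sort_values()\n",
"\n",
"# Ordered categoricals support comparisons against a category label\n",
"at_least_good = grades[grades >= 'good']\n",
"\n",
"# max() returns the highest-ranked category that occurs in the data\n",
"best = grades.max()\n",
"\n",
"sorted_grades, at_least_good, best"
]
},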
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 그래프로 표현하기(Plotting)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### plot() 메소드"
]
},
{
"cell_type": "code",
"execution_count": 89,
"metadata": {
"ExecuteTime": {
"end_time": "2020-05-21T06:01:28.397747Z",
"start_time": "2020-05-21T06:01:28.118901Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.axes._subplots.AxesSubplot at 0x293d4e99788>"
]
},
"execution_count": 89,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAXkAAAEECAYAAADNv0QiAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjMsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+AADFEAAAgAElEQVR4nO3dd5hU1fnA8e+Z2d47LCyw9F5FRBQVxQYqGhOjRkWNmhh/MaZoNJaYRA0aNbEkMfYSa6LGhqggIlV6r7vUZZdtbO+7c35/3DuzM+xsn7az7+d59mHm1rNceOfMKe9RWmuEEEIEJ4u/CyCEEMJ7JMgLIUQQkyAvhBBBTIK8EEIEMQnyQggRxCTICyFEEAvxdwGcpaSk6MzMTH8XQwghepQNGzYUaa1T3e0LqCCfmZnJ+vXr/V0MIYToUZRSh1rbJ801QggRxCTICyFEEJMgL4QQQUyCvBBCBDEJ8kIIEcQkyAshRBALiiDf2GTzdxGEECIg9fgg/9yybIbd+zlznlrO22sP+7s4QggRUHp8kF/w+W4AduaVc88H29weU9vQRG1Dky+LJYQQAaFHB/mqusYW2+oaWwbzc55Yxuwnl/miSEIIEVACKq1BZ+WW1rTYNvK+RTx15SSq65vonxDJzOEpHDWPa2yyEWLt0Z9rQgjRKT0q4u3MLcdmM9akraxrpKymAYDk6DCX4/706S7u+WAb1728lkXbjzm23/Xfrb4rrBBCBIAeU5NftreQ+S+vxaLg0kn9+WDTUa6aNgCAx6+YyI6jZTz+5V4AiirrHOfd+uZGx+sPNh3lyR9O8m3BhRDCj3pMTX7p7gIAbNoI1gCLdxnbBiZF8X9nD+fxH0x0e25EaPOvWez0ASCEEMGuxwT5rTmlLbYVVhgBOzbC+ELy/ZMyuHxKRovjTh+WynPXnATA9txyL5ZSCCECS8AH+Sab5qUVB9iXX8kVU5sDeJ+4cMfruIhQx+uBSVEAxISHMKpvLABzxvfltGHJWC2KxxbtRmvdqTJ09nghhAgUAR3ktdY89sVu/vTpTirqGkmKDmfxr87gvrmjmZCRAMDQ1GgiQq2Oc6YNTgKMjtn6RmMm7KDkKGIjQpmWmcSO3HLW7D/e4TJsOHSckfct4oONOR78zYQQwjcCOsh/uOko/1q23/E+KTqUYWmx3DRzCAVmU82c8eku55ycmeh43T8xEoC+8cafL86fCsCmIyVu76e15p/fZHOgqAqtNbmlNew+VkF9k41PtuR67hcTQggfCejRNSuzil3eV9c3T3SaPiSJLUdK+fHpg12OCbFaeOG6qUSHWRmVHseKrCL6JxhBPjo8hLiIEI6V1bq9X3ZhFY8u2s1XO4/x/ZMG8LsPtzF3gvEhUigdtkKIHiiga/J5Zc2TnWYOT+Gyyf0d739z3kjW3TubhKiwFuedO6YPM4alkBQdxiUT+7nsS4+PJK+VIL/psFHDD7FYWL3f+ID5bGseANuPlnPOE9906/cRQghfC9iafHFlHauyiwm1Kl69YRqnDUtx2R9qtZAaG97K2a0bmBxFVkGl230bzSAfFxnC9qNlLfZnF1Z1+n5CCOFPAVuTf/CTnQA8fOn4FgG+O6YMTORAURVl1Q0u21dlF/H22iOAMf7+QFFzQB+SGk2a+YHyj2+yJLWxEKLHCNggn282qVw2pX87R3ZOenwEAEVVrm3sV7/wXYtjZ49OA2DSgAR+fs5wAB5btIdV2cUtjhVCiEAUsEH+QHEV3z8pg1APJxSLjzTG1Nvz3pzoulMHOV4/evkEFnxvPHdfOIopAxNIMnPkfLpVRtoIIXqGgGyTf25ZNoUVdQxIjPL4tePMIP/t3kKmDDSGW9psmjCrhTNHpvK7OaP58emD6ZcQSajVwpXTBgKQFhvBxvvP5f/e2sjyfUUeL5cQQnhDQNbkP9xo5KY5Y4Tn2uLt7DX5vy3e59hWXttAfZON6UOSiQ
i1Mig5utVvEJnJ0RRU1FFR28CLy/fTZJPZsEKIwBVwNfnahib25FeQEhPO5IGJ7Z/QSQlRoS22lZqdsIlu9p2oT1w4TTbN3R9s47OteWQkRnLBuPR2zxNCCH8IuJr8x+bM0iIvTT5KiQlnykAjJUKDOUqm1Gyft9fy25Jh5sb5dm8hYKRAltw2QohAFXBBvqSq3uv3uGKqkYd+VXYx9Y02Ryesu1r+iaZlGrlxKmqNpQffXnuEpXsKvFRSIYTonoAL8vYl/Z65arLX7tHHHEY5/+W1PPDRdkqrjQ+WjtTko8NbtnBtOdJy4pQQQgSCgAvyRVX1DE6J5uIT0hF4kn2sPMDSPQXklBgfLGlxEa2d4uL9W091ef/OusMs3V3A/8zFTIQQIlAEXJA/XlnfYs1WT8twGpqZX17Hi8v3k5EY6ZKXvi2TBxgdwhGhFh65bDwFFXXc8Oo67nh3s2MNWiGECAQBN7qmuKqOzORor94jJjyEzOQoDhZXA1BS3cBUs629IywWxf5H5qAUKKVYsiufJebyhEdKqhnk5fILIURHBV5Nvqqe5JjOJx7rrNdvPIXrZ2Q63ttXlOooi0WhlAJgaFqMY/vh49UM/d1CFny+2yPlFEKI7gjMIO/l5howslE+eMlYx71+dtbQLl/LYgZ7gLvf30aTTfPcsuxOX6e4so6XVxwg8+7P+KOZoE0IIbojoJprGm0am4bkGO8Hebslvz6TJpvu1reHmvpGx+ujpTVtHNk6rTUnPbTY8f7llQe49tRBDE6Rph8hRNcFVE2+qcnotEzyQU3eLiEqrNvNQ1edMpDYiJaflw0dTEm8bG8h5zyxrMX2za0sUyiEEB3l9Zq8UuoC4CnACryotV7Q2rFVZo04xQdt8p40qm8c2x48H4Dl+wr5x9JsVu8vpqiyjnRzfdm2zH95rdvtx8pkyUEhRPd4tSavlLICfwcuBMYAVymlxrR2vL2pw5fNNZ42c3iqY93Z/PL2g7RzSoQwMyla37gI4iJC2JVXziXPriCroMI7hRVCBD1vN9dMA7K01vu11vXAO8C89k7yZXONN/QxJ1Xll7tfS9bZIwt3OV7PGJbMit/OYtEdM0mPj+TjLblszSnjqSVZXiurECK4eTvI9weOOL3PMbc5KKVuUUqtV0qtt29LcrM4d0/SJ85obvrJGxvaPfaF5Qccrx++bDwZiVEkRIXRYGtuz7cod2cKIUT7vB3k3YUnlymhWuvntdZTtdZTAc4d04cQD68G5WvOHbntdb4OSm4en98/obn9/pA5UQugsrYRIYToCm9H0xxggNP7DKDVtfMU8Py1J3m5SN5ntSiiw6wAHCtru8mmtqEJgEcuG++y/c+XjScjMZKLJqSzPbdM0hkLIbrE20F+HTBcKTVYKRUGXAl83NrBiVFhjlmkPd3L158MwN781jtN9+ZXkF9ex/0XjeHqUwa67Lvi5AGs+O3ZnDYshfzyOk5/dCnV9VKjF0J0jleDvNa6Efg/4AtgF/Ce1npHa8f3T2x/uGFPMXFAAuEhFlZmFbvdr7XmvL9+C8DYfnGtXueSif2YPTqNo6U1XPX8GkmAJoToFK83fmutF2qtR2ith2qtH/b2/QJFRKiVaYOTePO7QxS7WeUqq6DS8TqjjQ+36PAQXpx/Mj87ayhbcso4fNxoq/9gYw5f7873fMGFEEGlZ/dwBriLJqRT12hje255i33vb2zOPd+3A3nsLzTXkd1hXutX723hxlfXk1NS3dZpQoheToK8F9nTFx8squLN7w6x/WjzClLLzDViH7x4TIdGE43oa2S6vO2tjY5zAR5dtMeTRRZCBJmASlAWbBLN8f6//7i5G+L9W2dQXd/IrjyjRn79aYM7dK3wEKvjtXMaBBlDL4Roi9TkvcjdmrGX/3MVn27J69L13rr5lBbb6ho6lgRNCNE7SZD3IqtF0TcugikDE1y2v7vemAT8jx9N6dT1hqbGuLyfOiiRspqG7hVSCBHUJMh72d
LfnMV/fzqjxfYhqdHMGZ/eqWulxYYTYrbP3Hn+SGIjQqisM8bOy2QpIYQ7EuS9LDLMisWp4fzCcX0BuGBs305fSylF33hjJE5UmJWYiFAq6xp5Z+1hBt+zkIpaqdULIVxJkPex0enGxKeu1rt/fd4IwMh5ExMewoGiKl5Yvh+Ab/cWeaKIQoggIkHeR84YkQp0fzTMZZMz2HDfbGaNTGNnrjEkM7uwCoD1h4537+JCiKAjQyh95KX5U6lrtJFfXsvjX+5l3qR+Xb6WPctldLjr47MPyxRCCDupyftIqNVCTHgIQ1NjOLhgLqP6tp6vpqOevmqyy/uuLiIuhAheEuR7sBPXws0rrWVbThm//2i7JDITQgAS5INGbEQIjTbNxc+u4LXVh9hfVNn+SUKIoCdBvofLevhCfjl7BAtvn0motblXd/aT37a7KpUQIvhJkO/hQqwWfjF7OAOSorj1rGEu+zYfKfVTqYQQgUKCfBC545zhrLnnHMf7jYdK/FgaIUQgkCAfRCyW5hmxAOsOyrh5IXo7CfJB6E+XjkMpWHewBJtNU9vQxPsbcjheVe/vogkhfEwmQwWha6cPAq25/6MdFFbW8fO3NrH24HFmDE3mrZun+7t4Qggfkpp8kLLPit2ZV85as9lmVXYxu4+VU9fYxKpsyXMjRG8gQT5I2VelWrTtGAD/NHPXf727gLv+u5WrX/hOZsgK0QtIkA9SyTFGkLcvUHLKkGQAHlu0h4825wLwxfZj/imcEMJnJMgHqT5xzaNs5k5IJyk6rMUxf/x0J4eLq31ZLCGEj0mQD1LxkaEocwLsxROMjJf3zR3d4rjj1Z0bcdPQZGPR9jxZiUqIHkKCfBC7fkYmAFMGGWvM/uCkAYCxdKDd+k6Opf/zwt389N8bWXdQJloJ0RNIkA9i980dw7d3ziIt1mi6iY8KZccfzudnZw1lya/PBOChz3Z16ppL9xQAUNPQ5NnCCiG8QoJ8ELNaFAOTo1y2RYeHoJRiQGJUK2e1zb6O7E2vrePsx7/pbhGFEF4mQb6XCguxcO30QSRGhXbqvNoGI7NlQ5Nmf1GVN4omhPAgCfK9WHxkKOW1jR3uRNVaSzONED2MBPleLC4yhCabprKusUPHl9c00nTCilONkrNeiIAmQb4XSzBnxXY0cdmx8loAnrlqMvdfNAaAqrq2a/bvrT/C1hzJay+Ev0iQ78UGJhmdr4c6OCEqr8xIg9A3PoLYcCO3XUVdQ5vn3PXfrVzy7MpOlevnb2/iRy+ukbH4QniABPlebEhKNAAPfLS9Q8fnmzX5vnERxEQYQb6wog6AlVlFXP3CGvLLa8kqqKSspoH6xq415XyyJZeVWcXkltV26XwhRDMJ8r1Ympn6oO6EYLw1p5QV+1pmqTxWZgT0PnERDDY/IC77xyq01jz51V5WZRfz+492MPvJZdz67w2UOs2mPbEtvyOW7Slk1uPf8PXu/E6fK4QwSJDv5W48bTDlNa5NLpc8u5JrXvrOUUu3yyqsJDk6jLAQCyP7xDq23/HuZjaYSw0u2mEkPVuzv5iS6ubr2sfXt8fm9GHw9trDHCiq4sZX13fulxJCOEiQ7+VSYsOoqm+iqq6RhiabSzv4yQ8vdqQjzi+v5fNtecwYlgIYSw1+8LMZAI6sls7iIkPJdUplXFrdQE19EwXlbTfBVNU3j/TZdrTM8frIcUmkJkRXdCvIK6X+opTarZTaqpT6UCmV4LTvHqVUllJqj1Lq/O4XVXhDqrm4yI9e/I6z/vJNi5E2py34mheX7+fr3QU02jS3nz3Msa+vU6ZLML4VgNGhW1rdwKOLdjv2XfGv1fzk3xuY9siSNjtUWxvOuemIjNARoiu6W5P/ChintZ4A7AXuAVBKjQGuBMYCFwD/UEpZu3kv4QWZZtv65iOlHC2t4Y01h1oc87fF+9hypJSEqFCGpcU4tqfGhjteTx6YwJ3nj+SVG07mb1dOAmD3sQq+N6U/AAUVdXy7txCAPK
cO1YYmG88ty6barMFX1Bp/3jZrKDOHp/Ch+W3hWFkNH27KoaZeJmMJ0RndWuNVa/2l09s1wPfN1/OAd7TWdcABpVQWMA1Y3Z37Cc+zj7CxW7qnsMUxlXWNbDtaxqDkaJQ9fzEQarXwx3ljGZAYxYxhyYSHWJk1Ms3l3OtnZHLT6UOY8/Ryx7a9+RX0S4gE4OPNuSz4fDcvLj/AyrtnOUbwzByeyp3nj0JrTWSolY8257Ijt5wtM8o4OTOJuz/YypJfnenoPBZCuOfJNvkbgc/N1/2BI077csxtIsA4LyaSFB3GNnPi0pe/PMPluB255aS4WXjkulMzmTUqjfAQ91/URvSJdYzEsduXX+l4XVRZ5/jz5tc38PgXewDISDQ+BJRS9IkLZ29+BQA5JTV8ujWXitpGRyevEKJ17dbklVKLgb5udt2rtf7IPOZeoBF4036am+PdNsQqpW4BbgEYOHBgB4osPMm5Zp4eH8HxqnqsFsXglGgW3TGTqrpGLv+n8QUsMqzjLW4vXjeVTUdKiAg1zokItTiSmzknNjvoNBHL3pwDru39aXERjuNCLIqoMOOfbVZB84eFEMK9doO81np2W/uVUvOBi4BzdHOPWg4wwOmwDKDlEAzj+s8DzwNMnTpVpjj6UZ+4CHbklpORGEmo1cKovnEuwyidPxDaM3tMH2aP6eN4/7OzhvHkV3sBXMbPZxVUuD0/xNr8JdN5KcO6xibqGo12+YOydKEQ7eru6JoLgN8Cl2itnf/HfQxcqZQKV0oNBoYDa7tzL+E9j10+gfvmjuaCccYXNufOzWSziWZMepzb5QM76vZzhjtWpCo3x8zXNjSxI7ecq6a1/Q0uzamD93h1gyO4F1XUsS2nzNGUI4RoqVsdr8CzQDjwlVnLW6O1/qnWeodS6j1gJ0Yzzm1aaxkWEaCuONn40nXAbEYJsTTX2C0WxcEFcz1yn9tmDWPDoRIKKozO1eteWkt1fRNnj0rj7bWHAUiJCeeSif1czusT1xzktzgNpSyuquPiZ1cAeKyMQgSb7o6uGdbGvoeBh7tzfeFbg5KiuO7UQVw+JcOr99l+tJyymgbWmuvLDk1t7phdd+85LZqF+rgZQTNzeAqrs4u9Wk4hgkF3a/IiiFgsij/OG+fVexwsNr4t/G/TUce2/omRDEmNZlpmktt2f/satZGhVseiJdOHJLPcTX4dIYQrSWsgfOov358AwB6zHX1aZhLhIVa+/vVZLLh8gttzhqZFExFq4QdTm79hTBmY6HLM59vyvFRiIXo2CfLCp0aYic225Rh5aR66rP1vDmmxEWx/8Hx+cc5w+sVH8LcfTmJ8RrzLMbe+uZE3Vh/0dHGF6PEkyAufigkPIcSiHMnH+pszX9sTYrWQHBPOqnvO4dLJ/YkJD+HCca7TN+7/aIfHyytETydBXviUUopEc1hmfGQo0eFd7xZ64oqJvHXzKZ4qmhBBSYK88LmJGUay0iQ3aRI6IyoshBlDU1y21TbISF0hnEmQFz536WRjHPwBp/QGnrLxcInHrylETyZBXvjciZkqu+vdW6bzxo+nERcRwp8X7ubLHce4/J+ryLz7s1bz0wvRW8g4eeFz0eEhXD4lg1OHJnvkeqcMMa7z+4vH8uv/bOGWNzY49hWU1/Ly1jye/GovWx88j7iIUI/cU4ieQoK88Isnrpjo8Wt+b0p/Pt2a65IT/7ll2by3PgeA/LJaCfKi15HmGhE0lFLcef4ol232AA9QLatKiV5IgrwIKicuUOLMnv1SiN5EgrwIKpFhVj79+ek8d80Ux7abZxoLjJfXSCes6H0kyIugM65/PCdnJgHwkzOG8OPThwBw53+3cNEzy1lvZr8UojeQjlcRlJJjwll9z9mkxUZgX7Csur6J7UfLWbjtGDklNdzx7mY2P3AuCVHdm5QlRCCTmrwIWunxkVgtihCrhetOHeTY3mSzcdf7WwGY9MeveHH5fn8VUQivkyAveoW+8c0LjyzPKqK+0e
Z4/9Bnu/xRJCF8QoK86BUGJTWPutlf2DKdgoy8EcFKgrzoFYb3iXF5P+KE9xMe/JKjpTW+LJIQPiEdr6JXGJ4Ww5+/N56DRVX869v93DZrGPnltVTWNvL011kA5Byv7nB+eyF6CgnyoldQSnHVtIHUNjQxe0wfxxDLwoo6R5CvljTFIghJc43oVSJCrY4AD5ASE8ZJg4z1Ykur6/1VLCG8RoK86NWUUrw0fyoAx6uk81UEHwnyoteLiwglNiKErIJKfxdFCI+TIC96PYtFMWlAAm+vPUxZjdTmRXCRIC8EkGyuN/vjV9f5uSRCeJYEeSGAyjpjZM36QyV8siXXz6URwnMkyAsB3HXBSMfrn7+9iSpZG1YECQnyQgAj+sSy7cHzHO8PH6/2Y2mE8BwJ8kKYYiNCee3GaQBU10tNXgQHCfJCOIkOswJQVSezX0VwkCAvhJPocCPTh7TJi2AhQV4IJ9FhRpC/9c2NbD9a5ufSCNF9EuSFcBIdbnW8vuiZFayT9WBFDydBXggn9uYau5tfX++nkgjhGRLkhXASEWrlF+cMd7wvrW5gf6HktBE9l0eCvFLqN0oprZRKMd8rpdTTSqkspdRWpdQUT9xHCF/45bkj+PbOWTx6+XgAdh+r8HOJhOi6bgd5pdQA4FzgsNPmC4Hh5s8twD+7ex8hfGlgchQXT+yHRcG3ewv9XRwhuswTNfm/AncB2mnbPOB1bVgDJCil0j1wLyF8JioshLNH9WGZBHnRg3UryCulLgGOaq23nLCrP3DE6X2OuU2IHmVYWgx5ZbXYbLr9g4UIQO0GeaXUYqXUdjc/84B7gQfcneZmm9v/JUqpW5RS65VS6wsLpcYkAktshDHa5uGFu/xcEiG6pt0gr7WerbUed+IPsB8YDGxRSh0EMoCNSqm+GDX3AU6XyQDc5m/VWj+vtZ6qtZ6ampra3d9HCI+y55l/acUBqc2LHqnLzTVa621a6zStdabWOhMjsE/RWh8DPgauM0fZTAfKtNZ5nimyEL7zg6nNdZUtOaV+LIkQXeOtcfILMWr6WcALwM+8dB8hvMpqUSz4njGUMr+8zs+lEaLzQto/pGPM2rz9tQZu89S1hfCn2WP6wAfbyC+v9XdRhOg0mfEqRDuSosIID7FwsLiqU+c1SRu+CAAS5IVoh8WimDIwkU+35lFT37E88xsPlzD0dwvZcOg4JVX1ktFS+I0EeSE64NLJ/SisqGNHbseC9Td7jOHAS3YVMPXhxVz0zApvFk+IVkmQF6IDxvWPB6CosoOdr9poqmmyaUezzaLtx1izv9gr5RM9k82muf3tTYy493Pyymq8cg8J8kJ0QGpMOABZBUZGyoraBh5btJu6RvfNNwUVxofBv77d79j2039v4Mrn16C1tNULw76CSj7ekkt9k40bXlnn2L4tp4zGJptH7iFBXogOSDInRT3+5V6abJqnl+zjH99k89Fmt3P8OFraeq2spLrBK2UUPUdjkw2bTVNR2/xvYfexCrbmlLIqq4iLn13B66sPeeReEuSF6IAQq4U54/sCsDO3nFIzUD+2aE+LYwsqalm+r6jVa3nra7noGbTWnPLIEn79ny1UndCR/98NOTzw8Q4ADh+v9sj9JMgL0UE/OWMoAMfKa2kwv0oXVdaRVVDJjtwyR61s/cESAMab7fiXTTZy8502LBmALUdkpE1vtiO3nOKqej7cdJRdeeUAXDDWqEC8vvqQo0mww/0/7fDYZCghgl1anNEuf6CokpXZzR2oP/zXaoqr6gH47PbTySszJk29duM0osKshIdYuGRiP84YkcqZf1nKiqxCrj5loO9/AREQvtqZ73j96Vajue93c0azaMcxAO6dM5qvdxc4/h11lwR5ITooJSYcpeCRhbsBeOrKSezILed5p87VuU+vYGy/OMJCLCRGhaKUkZB11qg0wEhdfKjYM1/DRc90+Hg1/RMiqWloYvtRoyYfFW7l61+fSUSolX4JkezMK2ftAc8sIi
/NNUJ0UKjV9b/LvEn9uXBc3xbH7cgtZ1BSlCPAO8tMjuZQcbWMsOnFjpbW0D8hklF9Yx3bosNCGJIaQ7+ESAD6xEVwtLSG6Y8s4SdvdG8xeQnyQnSCPTa/duM0APonRjr2bX3wPMfrRHM0zokyk6OorGt0NO+I3ievrIZ+CRGMTo8DQCmICHUNxWeMSAGM/p8vduS3uEZnSJAXogvG9jP+g6ZEhzu2xUWE8r/bTgPgR620uQ9KiQbgUCfz4Ijg0GTTHCurJd2pJq81Lb71nZyZ5PK+O2PmJcgL0Qk/OmUgKTFhpJiToywWxVs3ncKqu88GYNKABLY8cB7zJrlf7TIz2Qjy17+yjnfWHpacNkHkQFEVZe3MgSiqrKOhSdMvIdIxi7pffESL405sGvxg41Gq6xu7VC4J8kJ0wsOXjWf9fee6bJsxLMXRlgoQHxXa6vn9zeMqahu5+4NtktMmiMx6/Bsufrbt55ldaAyPHJAYyej0OP5wyVge+/5Et8fOHJ7ieH3X+1uZ89TyLpVLgrwQPhQWYnEEetHz1TY08dyybKrqjFq28wSmEye97S+s5M7/bEUpmDwwEYD5MzI53SmYO3tx/lR2/OF8x/uDXRyVJUFeCB9b8dtZfG+y++acziioqJWc9T5WVdfI6uxiNh0uocmmefyLPSz4fLdLCoLjVfWc/PBiTv3z145x8ABnP7GMo6U1nD4shfjI1r/t2YWHWIkOD2HvQxc6tnWlyUbGyQvhY0opfn/JWD7YdBQwkp4NS4tpcdxtb25keJ8Y7pg9wmX7qqwiPtmay9trjzAwKYqXr5/KsLTYFueLrttwqIRhqTGOpreGJhvXv7KWlVnus4jam2EAznxsKRVmzf7rXQVcNKGfy4dx37iWbfBtCQux8M4t07ny+TUs3V3I3AnpnTpfavJC+EF8ZCgTByQA8Nqqg47tBeW1fL07H601n23L42+L97mcV1JVz9Uvfsfba48ARvPA7Ce/9Vm5ewObTXP5P1dx7l+XObYt3JbXaoAHI+eMnT3AA5RUG0NlnVcVczN9ol1TBiZitSh25nW+o16CvBB+8tw1UwAormrOUfLj19Zz46vr2ZNf4fac/UWVbrcf8VAyKwHlZg6igoo6SnIXzVwAABRUSURBVMz5DCVO8xpOzkx0vL7z/JGO1wlRoS77wFj8vbHJ5shHA2DpQpQPC7EwKDnK5Tonlrc1EuSF8JP0+EimZSZRUtX8n3SnmbDKXXZLMGbTunPTa+tlFq2HOKeC3pxTyrNf7+OLHfmEWhWb7j+Xt2+e7th/7amDsJgxu7S6ocV495155dz8+nqXeRFdzVs0LDWmRZA/cryaCQ9+2eZ5EuSF8KOEqFDHV3poXvz7690FgPHV3nkizFNm883mB87ly1+e4Qg4e/IrWH+oxFfFDmrOz+PRz3fz+Jd7Wb2/mKToMBKjwwixWnj9xmncO2c0cRGhbLq/eaazfcz7Wzefws/OMrKWLt1TyIGiahKjQjm4YC4TMhK6VK5haTFkF1bx/LfZaG1Mqpr52NJ2z5MgL4QfJUWHcdxsCqhw87Vbaxz7bTZNaU0DJ2cmkhAVxog+sZw6NJkPfjYDgC1HSn1X8CBW6hTkdx9rbjbLL29uVjtjRCo3nzEEMOZFXDC2L4//YCJ/uGQcX/7yDGYMTSE2onkEzdtrD5NpznbuqhF9jM71RxbuJqekhk2HO/ahLkFeCD9KjA6jpLoerTXPfJ3l9phL/76SzLs/47XVB2myaS49Yfjl5AEJhFktHDlezaV/X8mq7NYXLPGnI8er250RGggKzaUbX5o/1WW7vQ/FneeuPYnvn5RBfFSoIxhfdsJzss927qohqc3nbzhUwq1vbuzQeRLkhfCjpKgwGpo0x8prXVIWO8s184o/tywbaF5v1k4pRUpMGF/tzGfzkVKufuE7Mu/+rN0OOV/SWjPzsaXMfaZrszZ96ViZEeRnDk/l4IK5rL7nbA4umMsF4zo3dLFvfA
QHF8zlqmlGG3xkmLVb5RrfP55fn2sMp73j3c2O7fZRWq2RIC+EHyWY47BXZ7c+PM+u2lwqLiGqZYbL1Nhwx4eBXV6pZxad8IRHzY7knJKagO8gPlpaTXJ0GGEhRnhMj+/eDOXbZg0lPT6C+admdus6Sil+fs5wZo1MdWzb8vvz+MhMitcaCfJC+JF9gXD7MnCv3HCyY98f5411Obai1hh/neAmN84gN00BdY1NLbb5i/NcgBVZgdmcBJBVUMF763McaQc8ISMxitX3nMPIvp6ZsGZPjnfemD4dmjkrQV4IP7Lnnf/fZmP6+5QBiZw1MpVbzhjCdadmsuWB81h4+0xG9GmeEZvg5j/2cHPGrNXSPITvyPHAWTDcouAsswb67rojfi5N6+wTy84b08fPJWldaqwR5Pt0cOasBHkh/GhAYhTQ3NkXFxnCqzdM43dzRgPGyI0x/eKYZLa7hlqV2yyX9rQI9jz3ALe91bGOOW9raLJRVd/ElIGJDEqOorYhcL5hOHNuRkqNC2/jSP9KM4N8R7+pSZAXwo9SYlzb190tGQjN7cKzRqYRHtKyA29AkvFh0dm8KL5QVmN0ACdEhdInLoLy2q7lRfc2+8LZw9JiOHN4ajtH+4/9WTsP6WyLBHkh/Egpxd9+OAmgzfbVgeZ/7Na6LMf2M3KTP/K98S6pjN1lqSyqrPNp9spSc9hkfGQocREhVAZokN9pziZ+9PIJWCxdSDDjI6cNS2Hu+HTumTOqQ8dLkBfCzy6d3J9vfnMWi391ZqvH2FcROnVIstv9Sinmz8gkJSacb++axRM/MBaicE51C8aEq6kPLebRRbsd295dd5jDXcxV3paa+ibyy2sdk7kSo8KIjQiloi5whnY6yys3avIDkgI7339EqJW//2gKo/rGtX8wEuSFCAiZKdGODjV3RvaNZflds7jhtMx2r2W1KC6d3B+LosXygtuPGrXV57/dT2OTjcq6Rn77/jbO+Ev70+M766bX13HKI0u44l+rAWMyUEx4YNbkP9iYwwMfbceiIDk6cNvju0KCvBA9xICkqFbb7E9ktSiGpsZwyKyhL91dQFl1Aztym4P+P7/JZtzvv3C8/3BTTovrdNWh4qoWqXn7J0aSFB1GaU0D9Y1dX5jakw4UVXHtS9/xq/e2oDXERoS6jFAKBhLkhQhSA5Oi+HJnPs8ty+aGV9fxuw+38dBnuxz7n/hqr8vxr6w86LF722vvdm/ddApWiyIjMRKtIbc0MIZ3PrZoN8v3NY/bnzHUfXNYTyYrQwkRpAYmG521Cz432t8/25YHwKDkKEcN39nWnDLW7C9meivt/h31zZ4C8svrmDUyFZuGWSNTmTHMWMfUPjIku7Cy2wm7PCEmvDkEvvHjacwM4FE1XSU1eSGC1CAzoJ7ojtnDXd5bFDx6+XgAVnpgNur1r6wD4OpTBvHajdO4/rTBjn2TBiQQFxHClzvyu30fTzhS0vxhN21wkh9L4j3dDvJKqZ8rpfYopXYopR5z2n6PUirL3Hd+W9cQQnievSbfYntSFNFmsqz1981m9T3n8MOTB5ISE0ZRZXOa3er6xk5njaxyWvpu+pCWQTMi1MrIvrEccFpEw18q6xpZs/+44727+QfBoFvNNUqpWcA8YILWuk4plWZuHwNcCYwF+gGLlVIjtNaBOdVNiCDUN84YCjg0NZrEqDDHoiLJ0eEs/MVMahtsjjwoYM9tX0dBRS3PfbOfz7blkl9ex8EFc9u8T0F5LQUVdYzrH88xcxji3ReOcsmn7iwjMYq1B4673edLe52WWPzijjP8WBLv6m5N/lZggda6DkBrXWBunwe8o7Wu01ofALKAad28lxCiE4akRjOqbyx/uGQc/77pFMf2gUlRDEqObpEwKyk6jH35ldz9/jZeXnnAMaOyvayRlzy7koueWYHWmnwzyE8wx/W7k5EYybHyWpcVr3ytrLqBq19YAxgzXD2VPCwQdbfjdQQwUyn1MFAL/EZrvQ7oD6
xxOi7H3NaCUuoW4BaAgQO7tvahEKKliFAri5xqqB//32nUNthanc05OCWGNfuPs7/ItSmlvKbRbb4cO3vt/a+L9zHUXNjCXTpku4zESJpsmryyWkdHrC9V1Dbw+Jd7qG0wPmTamoQWDNqtySulFiultrv5mYfxIZEITAfuBN5TxkBed/+K3FYHtNbPa62naq2npqYGX8+2EIFiQkZCm52LD1w0BnfD8F9ZdaDVpGIbnNaVfXrJPj4ys2m29aGQYSZlc+709KVfvruZN9YcAuCF66a2c3TP126Q11rP1lqPc/PzEUYN/QNtWAvYgBRz+wCny2QAuS2vLoQIFJFhVqxOUf65a6YQHWblb4v3Mer+RS7HHjlejc2m2egU5EMsihXmmPO28vCMTo/DalEeGcnTWfWNNhbvKnC8PzeAUwp7Snfb5P8HnA2glBoBhAFFwMfAlUqpcKXUYGA4sLab9xJCeJnNbH9/7popXDAu3WXJO3tt/ps9Bcx8bClPLdlHiNX4ULj/ojGM6BNLfZMNq0U5Ru+4kxQdxvC0GPbmV3rxN3HP3mfQm3Q3yL8MDFFKbQfeAeabtfodwHvATmARcJuMrBEi8EWHGd10qbFGyuLrZ2Q69mUXGkH5Y7NJZmdeOX/4ZCcAN8zIdKw1mhQd1m76hZSYcIoq67DZNA99upMPN+VQUlXvMgTTGwoqjCA/rn8cn/78dK/eK1B0q+NVa10PXNPKvoeBh7tzfSGEb70wfypvfXeYMelGhkPnUSdLdhUwJj2OJbuN5o6vdjZPaLJYFKPMYzuSGiAlJoy92RXM+/tKth0tI8xqob7JxoCkSJbfdbYnfyUXOSVGOoW/fH8io9M7lsWxp5O0BkIIh+lDkl3SGoSFWFh4+0zmPL2cJ7/ay6urDjoWAbGzj7W/ZGI/tuaUdSjPeVpcBAUVdRSYK2LVm8Mpjxyvob7R5lhE25MOFFXxi3c2kx4fwZBU/6dU8BVJayCEaNOYfnHMGd8XwJEbfohT3plFd8wEjPVqn7hiossEq9ZMHdS8UPbNMwe77Bv9wKJ2x+Z3hb2j97ZZw4J2dqs7EuSFEO1yHoVy5ckDePUGY25jqFV1KKifyDkR2G/OH+myr8mmKazs2NJ2nWFv779sstspO0FLmmuEEO2aOqh5fP0vzx1Bn7gI7p0zmqmZiW2c1brIMCt/v3oKtQ1NhIdYSYgKdSwTCHCouJq0WM+uV1tZ14hSENXGyJ9gJDV5IUS7nGem2mvuN58xhMkDuxbkAeZOSOfykzIAePOmU/jNeSNYeLvR9LPQTIvcHS+vOEDm3Z+xZr+xeEllXSMxYSEdXnglWEhNXgjRKd5YOWlsv3jG9jPy3Zw+LIVVJ6wq1RV//NQY3vnG6kNMH5JMZW0jMRG9L+T1vt9YCNEly+48yyUVsbcMTolmRVYRq7OLObUbKzWlxoZTWFHnaJ6prGt0WSSkt5DmGiFEhwxKjuakQV1vnumoK6YaGVGuemENDV3MVKm1prTa+EDKLTPGxmcXVtI33rPt/D2BBHkhREAZnd48Acs+eamzSqsbaGgyhmGuzCrmtjc3sje/khlDUzxSxp5EgrwQIqCEWJvD0m1vbuzSNexDMK+YanTsOq9v29tIkBdCBKydeeWOCVhaaw4VV/Hmd4fanCxls2l25pYDcNow15p7v4RI7xU2QEmQF0IEnI9uO40HLx4DwJQ/fcWR49X84ZOdnPmXb7j3w+1sPGykOF6xz+igdfafDUe4493NAIw/YYWqzF5Yk+99Xc1CiIA3cUAC/RIiedDMcjnzsaUu+7fmlJEaE8E1L30H4LIO7YGi5sVInMf33372sDZXrApWEuSFEAEpNbb1dAk7cpvTHAM0NNkItbo2TLx7y3RCrRZeuf5k6hqbXHLj9ybSXCOECFh/mjeW04a1HCu/NafU5f29H25zvC6vbSAlJpxTzGyas0al9doADxLkhRAB7NpTM7nr/Japi09cVeq99TmO1+U1Dc
RFSiOFnQR5IURAm5ARz1NXTmLuhPZr4+W1DXy+/RhxEa2vMdvbyMedECKgKaWYN6k/8yb1Z0x6Fvnltby++pDbYz/adJQmm+ba6YN8XMrAJUFeCNFj3DZrGAB3XziKnbnlTMhI4NmlWTzz9T6abJojJTWEhVj43pTelTO+LRLkhRA9TlRYCFMzjRz3SVGhaA1TH/qKkuoGBiVH9bp0wm2RNnkhRI/WN96YxVpiLjoS4oVUyD2ZBHkhRI+WfkJmSS8sD9ujSZAXQvRomeai4j8+3VwQXCryLiTICyF6tPjIUA4umMsvZg8H4OaZQ/xcosAiHa9CiKAQFxHqksNGGKQmL4QQQUyCvBBCBDEJ8kIIEcQkyAshRBCTIC+EEEFMgrwQQgQxCfJCCBHEJMgLIUQQUzqAEj0opSqAPR08PB4o88AxnT3WX8f5897e+F1SgCI/3Fuen2+v2dHn3NFrBtPfjSfvPVJrHet2j9Y6YH6A9Z049nlPHNPZY/11XE8oYyd/lw4960D/XYLp+Xnp3n75P91D/m48du+2/p57cnPNJx46prPH+us4f97bG79LRwX67xJMz89b1/TkvYPp78Yb924h0Jpr1mutp/q7HML75Fn3DvKcfaOtv+dAq8k/7+8CCJ+RZ907yHP2jVb/ngOqJi+EEMKzAq0mH/SUUpXt7P9GKSVfb3s4ec69Q094zhLkhRAiiPklyLf36RfslFJnKaU+dXr/rFLqej8WyWt687OW59w7BPpzlpq8EEIEMb8FeaVUjFJqiVJqo1Jqm1Jqnrk9Uym1Syn1glJqh1LqS6VUpL/KKbpPnnXvIM85MPmzJl8LXKa1ngLMAp5QStnXWR8O/F1rPRYoBS73Uxm9pRHXv/sIfxXER3rrs5bnLM/Z7/wZ5BXwiFJqK7AY6A/0Mfcd0FpvNl9vADJ9XzyvOgSMUUqFK6XigXP8XSAv663PWp6zPGe/C/HjvX8EpAInaa0blFIHaf4ErHM6rgkIiq92SqkQoE5rfUQp9R6wFdgHbPJvybyuVz1rec7ynP1bMlf+DPLxQIH5j2EWMMiPZfGVsUA2gNb6LuCuEw/QWp/l4zL5Qm971vKc5Tljbj/Lx2VqwedB3v7pB7wJfKKUWg9sBnb7uiy+pJT6KXA7cIe/y+IrvfFZy3OW5xxofJ7WQCk1EXhBaz3NpzcWPifPuneQ5xzYfNrxan76vQ3c58v7Ct+TZ907yHMOfJKgTAghgphXa/JKqQFKqaXmRIgdSqlfmNuTlFJfKaX2mX8mmtuVUupppVSWUmqrUmqK07Xmm8fvU0rN92a5Red5+FkvUkqVOk8VF4HBU89ZKTVJKbXavMZWpdQP/fl7BbWOLj/VlR8gHZhivo4F9gJjgMeAu83tdwOPmq/nAJ9jjLedDnxnbk8C9pt/JpqvE71Zdvnxz7M2950DXAx86u/fS36885yBEcBw83U/IA9I8PfvF4w/Xq3Ja63ztNYbzdcVwC6MCRLzgNfMw14DLjVfzwNe14Y1QIJSKh04H/hKa31ca10CfAVc4M2yi87x4LNGa70EqPBl+UXHeOo5a633aq33mdfJBQowxtgLD/NZx6tSKhOYDHwH9NFa54HxjwZIMw/rDxxxOi3H3NbadhGAuvmsRQ/hqeeslJoGhGGOORee5ZMgr5SKAd4H7tBal7d1qJttuo3tIsB44FmLHsBTz9n89vYGcIPW2ubZUgrwQZBXSoVi/GN4U2v9gbk53/7V3PyzwNyeAwxwOj0DyG1juwggHnrWIsB56jkrpeKAz4D7zKYc4QXeHl2jgJeAXVrrJ512fQzYR8jMBz5y2n6d2SM/HSgzv/p9AZynlEo0e+3PM7eJAOHBZy0CmKees1IqDPgQo73+Pz4qfu/kzV5d4HSMr2ZbMaY5b8bobU8GlmAk81kCJJnHK+DvGG1z24CpTte6Ecgyf27wd4+1/Hj1WS8HCoEajJrg+f7+/eTHs88ZuAZocLrGZmCSv3+/YP
yRyVBCCBHEZPk/IYQIYhLkhRAiiEmQF0KIICZBXgghgpgEeSGECGIS5IUQIohJkBdCiCAmQV4IIYLY/wMmiH7ke8/ywgAAAABJRU5ErkJggg==\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"ts = pd.Series(np.random.randn(1000), index = pd.date_range('1/1/2000', periods = 1000))\n",
"ts = ts.cumsum() # cumulative sum\n",
"ts.plot()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Calling plot() on a DataFrame draws all of its columns at once**"
]
},
{
"cell_type": "code",
"execution_count": 90,
"metadata": {
"ExecuteTime": {
"end_time": "2020-05-21T06:01:28.740862Z",
"start_time": "2020-05-21T06:01:28.414735Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.legend.Legend at 0x293d5e43748>"
]
},
"execution_count": 90,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"text/plain": [
"<Figure size 432x288 with 0 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"df = pd.DataFrame(np.random.randn(1000, 4), index = ts.index, columns = ['A', 'B', 'C', 'D'])\n",
"df = df.cumsum()\n",
"\n",
"# plt.figure() opens a new, empty matplotlib figure\n",
"# (df.plot() below creates its own figure, so this one stays empty: the '0 Axes' output)\n",
"plt.figure()\n",
"df.plot()\n",
"# loc = 'best' lets matplotlib choose the legend position that overlaps the plotted data the least\n",
"plt.legend(loc = 'best')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Data Input/Output (Getting Data In/Out)\n",
"- pandas can read data in the following formats into a DataFrame\n",
" - CSV\n",
" - Excel\n",
" - HTML\n",
" - JSON\n",
" - HDF5\n",
" - SAS\n",
" - STATA\n",
" - SQL\n",
"- References on data input/output:\n",
" - 1. CSV file input/output: https://datascienceschool.net/view-notebook/c5ccddd6716042ee8be3e5436081778b/\n",
" - 2. Excel file input/output: https://rfriend.tistory.com/464"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### CSV File Input\n",
"- A CSV file is a text file in which data values are separated by commas (,)\n",
"- pd.read_csv('data.csv', names = None, index_col = None, sep = None, skiprows = None, na_values = None)\n",
" - names: column names to use when the data file has no header row --> ex) names = ['c1', 'c2', 'c3']\n",
" - index_col: the column of the table to use as the row index --> ex) index_col = 'c1'\n",
" - sep: the delimiter, when the values are separated by something other than a comma (,) --> ex) sep = '\\s+' => whitespace of variable length\n",
" - skiprows: rows of the file that should be skipped --> ex) skiprows = [0, 1]\n",
" - na_values: values that should be treated as NaN --> ex) na_values = ['missing']"
]
},
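{
"cell_type": "markdown",
"metadata": {},
"source": [
"**A minimal sketch of the read_csv options listed above, not a definitive recipe.** It reads from an in-memory string instead of a real data.csv. The sample text, the names raw and sample, and the 'missing' marker are all made up for illustration."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import io\n",
"import pandas as pd\n",
"\n",
"# Hypothetical whitespace-delimited text: two junk lines, no header row,\n",
"# and the token 'missing' standing in for absent values\n",
"raw = io.StringIO(\n",
"    'exported by a hypothetical tool\\n'\n",
"    'second line to skip\\n'\n",
"    'a   1   1.5\\n'\n",
"    'b   2   missing\\n'\n",
"    'c   3   3.5\\n'\n",
")\n",
"\n",
"sample = pd.read_csv(\n",
"    raw,\n",
"    names = ['c1', 'c2', 'c3'],  # the text has no header row\n",
"    index_col = 'c1',            # use column c1 as the row index\n",
"    sep = r'\\s+',                # whitespace of variable length\n",
"    skiprows = [0, 1],           # skip the first two lines\n",
"    na_values = ['missing'],     # treat 'missing' as NaN\n",
")\n",
"sample"
]
},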
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### CSV File Output\n",
"- Use the to_csv() method\n",
" - ex) df.to_csv('data.csv', sep = None, na_rep = None, index = None, header = None)\n",
" - sep: change the delimiter --> ex) sep = '|'\n",
" - na_rep: the text to write in place of NaN values --> ex) na_rep = 'missing'\n",
" - index: whether to write the index --> ex) index = False\n",
" - header: whether to write the header --> ex) header = False"
]
},
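{
"cell_type": "markdown",
"metadata": {},
"source": [
"**A minimal sketch of the to_csv() options listed above.** The frame out_df and its values are made up for illustration. No path is passed, so to_csv() returns the CSV text instead of writing a file, which makes the effect of each option easy to inspect."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"# Hypothetical frame with one missing value\n",
"out_df = pd.DataFrame({'c1': [1, 2], 'c2': [1.5, np.nan]})\n",
"\n",
"# '|' as the delimiter, NaN written as 'missing', no index column, no header row\n",
"text = out_df.to_csv(sep = '|', na_rep = 'missing', index = False, header = False)\n",
"print(text)"
]
},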
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Excel File Input/Output\n",
"- pd.read_excel('data.xlsx', ...)\n",
"- Works almost the same way as CSV file input/output\n",
"- See the Excel file input/output reference listed above for details"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.6"
},
"toc": {
"base_numbering": 1,
"nav_menu": {},
"number_sections": true,
"sideBar": true,
"skip_h1_title": false,
"title_cell": "Table of Contents",
"title_sidebar": "Contents",
"toc_cell": false,
"toc_position": {},
"toc_section_display": true,
"toc_window_display": false
}
},
"nbformat": 4,
"nbformat_minor": 4
}