@mrbkdad
Created January 27, 2019 10:49
Python basic course 3
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Cell commands\n",
"\n",
"- Start a line with `!` to run it as a shell command"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!dir"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"!type \"data\\GDPR Ticketing Report.csv\""
]
},
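{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `!dir` and `!type` cells above call Windows `cmd` commands, so they fail on macOS or Linux, where `!ls` and `!cat` play the same roles. Below is a minimal, portable sketch in plain Python using `pathlib`. The helper names `list_dir` and `head_file` are hypothetical and not part of any library:\n",
"\n",
"```python\n",
"from pathlib import Path\n",
"\n",
"\n",
"def list_dir(path='.'):\n",
"    # Portable stand-in for !dir / !ls: sorted names of the entries in a folder\n",
"    return sorted(p.name for p in Path(path).iterdir())\n",
"\n",
"\n",
"def head_file(path, n=5, encoding='utf-8'):\n",
"    # Portable stand-in for !type / !cat, limited to the first n lines of a text file\n",
"    lines = Path(path).read_text(encoding=encoding).splitlines()\n",
"    return lines[:n]\n",
"\n",
"\n",
"# Usage (assumes the notebook's data folder exists):\n",
"# list_dir('data')\n",
"# head_file('data/GDPR Ticketing Report.csv')\n",
"```\n",
"\n",
"If the CSV was saved in a different encoding, pass it explicitly, for example `encoding='cp949'` or `encoding='latin-1'`."
]
},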
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Pandas\n",
"1. A Python library for spreadsheet-style (Excel-like) data work\n",
"2. Sample: GDPR ticketing data\n",
"3. Core objects: DataFrame and Series"
]
},
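{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here is a small sketch of the two core objects. It uses made-up ticket rows in place of the real GDPR export, so all values are hypothetical. A `DataFrame` is a table of labeled columns, and selecting a single column returns a `Series`:\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Made-up rows standing in for the GDPR ticket export\n",
"demo = pd.DataFrame({\n",
"    'IssueKey': ['GDPR-1', 'GDPR-2', 'GDPR-3'],\n",
"    'RequestType': ['Right to erasure', 'Right of access', 'Right to erasure'],\n",
"})\n",
"\n",
"col = demo['RequestType']   # one column name -> Series\n",
"same = demo.RequestType     # attribute access also works when the name is a valid identifier\n",
"sub = demo[['IssueKey']]    # a list of column names -> DataFrame (even with one name)\n",
"\n",
"print(type(demo).__name__, type(col).__name__, type(sub).__name__)\n",
"# DataFrame Series DataFrame\n",
"```"
]
},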
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"df = pd.read_csv(\"data/GDPR Ticketing 2019-01-20T22_51_51+0000.csv\")\n",
"df.head()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"df.columns.values"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
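"# Keep only the columns of interest; .copy() avoids SettingWithCopyWarning when columns are added later\n",
"p_ehs = df[['Summary','Issue key','Custom field (Email Address (Mirrored))','Custom field (First Name (Mirrored))',\n",
"             'Custom field (Last Name (Mirrored))','Custom field (Request Type (Mirrored))']].copy()\n",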
"p_ehs.columns = ['Summary','IssueKey','EmailAddress','FirstName', 'LastName','RequestType']\n",
"p_ehs.head(3)"
]
},
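{
"cell_type": "markdown",
"metadata": {},
"source": [
"Assigning a list to `.columns` depends on the columns being in the expected order. One alternative is to keep and rename columns through a single mapping. In this sketch, the frame `raw` and its `Status` column are hypothetical stand-ins for the export:\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Hypothetical export with a few of the long Jira-style column names\n",
"raw = pd.DataFrame({\n",
"    'Issue key': ['GDPR-1', 'GDPR-2'],\n",
"    'Custom field (Email Address (Mirrored))': ['a@example.com', 'b@example.org'],\n",
"    'Custom field (Request Type (Mirrored))': ['Right to erasure', 'Right of access'],\n",
"    'Status': ['Done', 'Open'],\n",
"})\n",
"\n",
"# Long name -> short name; the keys also decide which columns are kept\n",
"rename_map = {\n",
"    'Issue key': 'IssueKey',\n",
"    'Custom field (Email Address (Mirrored))': 'EmailAddress',\n",
"    'Custom field (Request Type (Mirrored))': 'RequestType',\n",
"}\n",
"\n",
"# .copy() gives an independent frame, so adding a column below\n",
"# does not raise SettingWithCopyWarning\n",
"p = raw[list(rename_map)].rename(columns=rename_map).copy()\n",
"p['RequestCnt'] = 1\n",
"```\n",
"\n",
"Because each new name sits next to its old name, reordering the source columns cannot silently mislabel the data."
]
},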
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"p_ehs.describe()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Add a constant column so that each row counts as one request\n",
"p_ehs['RequestCnt'] = 1\n",
"p_ehs.head(3)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"p_ehs.describe()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"type(p_ehs),type(p_ehs.IssueKey)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"p_ehs['IssueKey'][:5]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"p_ehs.IssueKey[:5]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"p_ehs.RequestType.unique()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"p_ehs[p_ehs.RequestType == 'Right to erasure']"
]
},
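{
"cell_type": "markdown",
"metadata": {},
"source": [
"`p_ehs.RequestType == 'Right to erasure'` produces a boolean Series, and indexing with it keeps only the rows where it is True. The sketch below shows how to combine conditions, using made-up rows (the `tickets` frame and its values are hypothetical):\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Made-up tickets for illustration\n",
"tickets = pd.DataFrame({\n",
"    'IssueKey': ['GDPR-1', 'GDPR-2', 'GDPR-3', 'GDPR-4'],\n",
"    'RequestType': ['Right to erasure', 'Right of access', 'Right to erasure', 'Right to rectification'],\n",
"    'EmailDomain': ['example.com', 'example.com', 'example.org', 'example.org'],\n",
"})\n",
"\n",
"# A boolean Series: True where the condition holds\n",
"is_erasure = tickets.RequestType == 'Right to erasure'\n",
"\n",
"# Combine conditions with & and | (not 'and'/'or'), wrapping each comparison in parentheses\n",
"erasure_com = tickets[is_erasure & (tickets.EmailDomain == 'example.com')]\n",
"\n",
"# isin() keeps rows whose value appears in the given list\n",
"access_or_fix = tickets[tickets.RequestType.isin(['Right of access', 'Right to rectification'])]\n",
"```"
]
},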
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"p_ehs.EmailAddress.str.split('@')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"p_ehs.EmailAddress.map(lambda x:x.split('@')[1])"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"p_ehs['EmailDomain']=p_ehs.EmailAddress.map(lambda x:x.split('@')[1])"
]
},
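{
"cell_type": "markdown",
"metadata": {},
"source": [
"The lambda above raises an error if an address is missing (NaN) or contains no `@`. A vectorized alternative uses the `.str` accessor, which returns NaN for such rows instead. The sample addresses in this sketch are made up:\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"emails = pd.Series(['alice@example.com', 'bob@example.org', None, 'no-at-sign'])\n",
"\n",
"# .str.split gives a list per row; .str[1] picks the part after '@',\n",
"# or NaN when the value is missing or has no '@'\n",
"domains = emails.str.split('@').str[1]\n",
"\n",
"# Regex alternative: capture everything after the last '@'\n",
"domains_re = emails.str.extract(r'@([^@]+)$', expand=False)\n",
"```\n",
"\n",
"Rows that come back as NaN can then be inspected with `emails[domains.isna()]` rather than stopping the whole column transformation."
]
},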
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"p_ehs.head(3)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"p_ehs.EmailDomain.unique()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"pd.DataFrame(p_ehs.groupby('EmailDomain').count()['RequestType'])"
]
},
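{
"cell_type": "markdown",
"metadata": {},
"source": [
"`groupby(...).count()` counts the non-missing values in every column, which is why the cell above picks out `RequestType`. The sketch below shows ways to count rows per group directly, on hypothetical data that includes one missing request type to make the difference visible:\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"p = pd.DataFrame({\n",
"    'EmailDomain': ['example.com', 'example.org', 'example.com', 'example.com'],\n",
"    'RequestType': ['Right to erasure', 'Right of access', None, 'Right to erasure'],\n",
"})\n",
"\n",
"# size() counts rows per group, including rows with missing values\n",
"per_domain = p.groupby('EmailDomain').size()\n",
"\n",
"# count() counts only the non-missing values of the chosen column\n",
"non_null = p.groupby('EmailDomain')['RequestType'].count()\n",
"\n",
"# value_counts() gives row counts sorted from most to least frequent\n",
"ranked = p['EmailDomain'].value_counts()\n",
"```"
]
},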
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"p_ehs['cat']=p_ehs['EmailDomain']+'-'+p_ehs['RequestType']\n",
"p_ehs.head(3)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"cat_df = pd.DataFrame(p_ehs.groupby('cat').count()['RequestType'])\n",
"cat_df"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"cat_df = cat_df.reset_index()\n",
"cat_df.columns = ['cat','cnt']\n",
"cat_df"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"cat_df = cat_df.sort_values('cat')\n",
"cat_df"
]
},
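{
"cell_type": "markdown",
"metadata": {},
"source": [
"The steps above build a `cat` key by string concatenation, then count, reset the index, rename, and sort. The same result can also be written as one chain that groups on both columns. This sketch uses hypothetical rows; the column names follow the notebook:\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"p = pd.DataFrame({\n",
"    'EmailDomain': ['example.com', 'example.org', 'example.com', 'example.com'],\n",
"    'RequestType': ['Right to erasure', 'Right of access', 'Right of access', 'Right to erasure'],\n",
"})\n",
"\n",
"# Group on both keys, turn group sizes into a 'cnt' column, then sort by the keys\n",
"cat_df = (p.groupby(['EmailDomain', 'RequestType'])\n",
"           .size()\n",
"           .reset_index(name='cnt')\n",
"           .sort_values(['EmailDomain', 'RequestType']))\n",
"```\n",
"\n",
"Keeping the two keys as separate columns makes later filtering easier than splitting a combined string back apart."
]
},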
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"cat_df.to_csv('data/GDPR Ticketing Report.csv', index=False)"
]
},
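{
"cell_type": "markdown",
"metadata": {},
"source": [
"This round-trip sketch shows why `index=False` matters when writing a summary table: without it, `to_csv` also writes the row labels as an unnamed first column. The temporary path and the rows are hypothetical:\n",
"\n",
"```python\n",
"import os\n",
"import tempfile\n",
"\n",
"import pandas as pd\n",
"\n",
"summary = pd.DataFrame({'cat': ['example.com-Right of access', 'example.com-Right to erasure'],\n",
"                        'cnt': [1, 2]})\n",
"\n",
"path = os.path.join(tempfile.mkdtemp(), 'report.csv')  # hypothetical output location\n",
"\n",
"summary.to_csv(path)                # row labels written as an extra column\n",
"with_index = pd.read_csv(path)\n",
"\n",
"summary.to_csv(path, index=False)   # only the data columns\n",
"without_index = pd.read_csv(path)\n",
"```"
]
},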
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%matplotlib inline"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"import matplotlib.pyplot as plt\n",
"# Use the category labels as the index so that they label the x axis\n",
"cat_df = cat_df.set_index('cat')\n",
"cat_df"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"cat_df.plot()\n",
"#plt.legend()\n",
"plt.xlabel('category')\n",
"plt.show()"
]
},
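{
"cell_type": "markdown",
"metadata": {},
"source": [
"A line plot joins the categories as if they were ordered values. For counts per category, a bar chart is one alternative. This sketch uses made-up counts; the `Agg` backend line is only needed outside Jupyter, where no display is available:\n",
"\n",
"```python\n",
"import matplotlib\n",
"matplotlib.use('Agg')  # non-interactive backend; leave this line out inside the notebook\n",
"import matplotlib.pyplot as plt\n",
"import pandas as pd\n",
"\n",
"counts = pd.DataFrame(\n",
"    {'cnt': [1, 2, 1]},\n",
"    index=pd.Index(['example.com-Right of access', 'example.com-Right to erasure',\n",
"                    'example.org-Right of access'], name='cat'),\n",
")\n",
"\n",
"# One bar per category label in the index\n",
"ax = counts['cnt'].plot(kind='bar')\n",
"ax.set_xlabel('category')\n",
"ax.set_ylabel('requests')\n",
"plt.tight_layout()\n",
"```"
]
},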
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Using data from the internet\n",
"\n",
"- Use `DataReader` from the `pandas_datareader` package\n",
"- https://pandas-datareader.readthedocs.io/en/latest/index.html\n",
"- Supported sources include FRED, Fama/French, World Bank, OECD, Eurostat, EDGAR Index, TSP Fund Data,\n",
"  Oanda historical currency rates, and Nasdaq Trader symbol definitions\n",
"- https://fred.stlouisfed.org/series/GDP\n",
"- https://fred.stlouisfed.org/series/CPIAUCSL\n",
"- https://fred.stlouisfed.org/series/CPILFESL\n",
"<pre>\n",
"import pandas_datareader as pdr\n",
"import datetime\n",
"dt_start = datetime.datetime(2015, 1, 1)\n",
"dt_end = datetime.datetime(2016, 6, 30)\n",
"gdp = pdr.get_data_fred('GDP', dt_start, dt_end)\n",
"inflation = pdr.get_data_fred([\"CPIAUCSL\", \"CPILFESL\"], dt_start, dt_end)\n",
"</pre>\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"df = pd.read_csv(\"https://raw.githubusercontent.com/datascienceschool/docker_rpython/master/data/titanic.csv\")\n",
"df.head()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"df = pd.read_csv(\"data/titanic.txt\")\n",
"df.head()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"df.tail()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.5"
}
},
"nbformat": 4,
"nbformat_minor": 2
}