@VintageAppMaker
Created March 19, 2024 10:56
python-pandas.ipynb
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/gist/VintageAppMaker/f62a4e65927b4270dfd7e48680e5f34d/python-pandas.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"source": [
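"## Python Pandas Overview\n",
"\n",
"**Pandas** is a powerful and popular Python library for data analysis. It provides a wide range of features for manipulating, cleaning, analyzing, and visualizing data efficiently.\n",
"\n",
"**Core features:**\n",
"\n",
"* **Data structures:**\n",
"    * **Series:** Similar to a one-dimensional array, with an index label attached to each item.\n",
"    * **DataFrame:** A tabular, two-dimensional data structure that can carry an index on both rows and columns.\n",
"* **Loading data:** Data can be read from many sources, including CSV, Excel, SQL, and JSON.\n",
"* **Data cleaning:** Supports tasks such as handling missing values, removing outliers, and converting data types.\n",
"* **Data analysis:** Supports tasks such as summary statistics, group-by operations, filtering, and computing sums and means.\n",
"* **Data visualization:** Works together with libraries such as Matplotlib to produce graphs and charts.\n",
"\n",
"## Essential Examples\n",
"\n",
"**1. Loading data and creating a DataFrame**"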
],
"metadata": {
"id": "yew1sRxzSozb"
}
},
{
"cell_type": "code",
"source": [
"import pandas as pd\n",
"\n",
"# Load a CSV file\n",
"df = pd.read_csv(\"data.csv\")\n",
"\n",
"# Create a DataFrame\n",
"df = pd.DataFrame({\"Name\": [\"Alice\", \"Bob\", \"Carol\"], \"Age\": [25, 30, 35]})"
],
"outputs": [],
"execution_count": null,
"metadata": {
"id": "GhHLoRxfSozc"
}
},
{
"cell_type": "markdown",
"source": [
"**2. Inspecting the data**"
],
"metadata": {
"id": "JBmTg9lPSozc"
}
},
{
"cell_type": "code",
"source": [
"# Show DataFrame info\n",
"print(df.info())\n",
"\n",
"# Print the first 5 rows\n",
"print(df.head())\n",
"\n",
"# Print the last 5 rows\n",
"print(df.tail())"
],
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"<class 'pandas.core.frame.DataFrame'>\n",
"RangeIndex: 3 entries, 0 to 2\n",
"Data columns (total 2 columns):\n",
" # Column Non-Null Count Dtype \n",
"--- ------ -------------- ----- \n",
" 0 Name 3 non-null object\n",
" 1 Age 3 non-null int64 \n",
"dtypes: int64(1), object(1)\n",
"memory usage: 176.0+ bytes\n",
"None\n",
" Name Age\n",
"0 Alice 25\n",
"1 Bob 30\n",
"2 Carol 35\n",
" Name Age\n",
"0 Alice 25\n",
"1 Bob 30\n",
"2 Carol 35\n"
]
}
],
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "OrJwn-RFSozc",
"outputId": "5182e7a3-88dc-44e8-eec0-769ff092c11f"
}
},
{
"cell_type": "markdown",
"source": [
"**3. Data cleaning**"
],
"metadata": {
"id": "ESYi35s5Sozd"
}
},
{
"cell_type": "code",
"source": [
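"# Handle missing values (assign the result; inplace=True on a selected column is deprecated)\n",
"df[\"Age\"] = df[\"Age\"].fillna(0)\n",
"\n",
"# Remove outliers\n",
"df = df[df[\"Age\"] < 100]\n",
"\n",
"# Convert the data type\n",
"df[\"Age\"] = df[\"Age\"].astype(\"int\")"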
],
"outputs": [],
"execution_count": null,
"metadata": {
"id": "Q9uN3FAuSozd"
}
},
{
"cell_type": "markdown",
"source": [
"**4. Data analysis**"
],
"metadata": {
"id": "-JsB44GqSozd"
}
},
{
"cell_type": "code",
"source": [
"# Summary statistics\n",
"print(df.describe())\n",
"\n",
"# Group-by operation\n",
"df_grouped = df.groupby(\"Name\")\n",
"print(df_grouped[\"Age\"].mean())\n",
"\n",
"# Filtering\n",
"df_filtered = df[df[\"Age\"] > 30]\n",
"\n",
"# Compute the sum and the mean\n",
"total_age = df[\"Age\"].sum()\n",
"average_age = df[\"Age\"].mean()"
],
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
" Age\n",
"count 3.0\n",
"mean 30.0\n",
"std 5.0\n",
"min 25.0\n",
"25% 27.5\n",
"50% 30.0\n",
"75% 32.5\n",
"max 35.0\n",
"Name\n",
"Alice 25.0\n",
"Bob 30.0\n",
"Carol 35.0\n",
"Name: Age, dtype: float64\n"
]
}
],
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "BGH3I41hSozd",
"outputId": "b8ad2e34-1a3b-4dbe-860f-a8d4afbbedb8"
}
},
{
"cell_type": "markdown",
"source": [
"**5. Data visualization**"
],
"metadata": {
"id": "eIYcyTDuSozd"
}
},
{
"cell_type": "code",
"source": [
"import matplotlib.pyplot as plt\n",
"\n",
"# Histogram\n",
"plt.hist(df[\"Age\"])\n",
"plt.show()\n",
"\n",
"# Scatter plot (df has no \"Height\" column, so this line raises KeyError)\n",
"plt.scatter(df[\"Age\"], df[\"Height\"])\n",
"plt.show()"
],
"outputs": [
{
"output_type": "display_data",
"data": {
"text/plain": [
"<Figure size 640x480 with 1 Axes>"
],
"image/png": "\n"
},
"metadata": {}
},
{
"output_type": "error",
"ename": "KeyError",
"evalue": "'Height'",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mKeyError\u001b[0m Traceback (most recent call last)",
"\u001b[0;32m/usr/local/lib/python3.10/dist-packages/pandas/core/indexes/base.py\u001b[0m in \u001b[0;36mget_loc\u001b[0;34m(self, key, method, tolerance)\u001b[0m\n\u001b[1;32m 3801\u001b[0m \u001b[0;32mtry\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 3802\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_engine\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mget_loc\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mcasted_key\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 3803\u001b[0m \u001b[0;32mexcept\u001b[0m \u001b[0mKeyError\u001b[0m \u001b[0;32mas\u001b[0m \u001b[0merr\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;32m/usr/local/lib/python3.10/dist-packages/pandas/_libs/index.pyx\u001b[0m in \u001b[0;36mpandas._libs.index.IndexEngine.get_loc\u001b[0;34m()\u001b[0m\n",
"\u001b[0;32m/usr/local/lib/python3.10/dist-packages/pandas/_libs/index.pyx\u001b[0m in \u001b[0;36mpandas._libs.index.IndexEngine.get_loc\u001b[0;34m()\u001b[0m\n",
"\u001b[0;32mpandas/_libs/hashtable_class_helper.pxi\u001b[0m in \u001b[0;36mpandas._libs.hashtable.PyObjectHashTable.get_item\u001b[0;34m()\u001b[0m\n",
"\u001b[0;32mpandas/_libs/hashtable_class_helper.pxi\u001b[0m in \u001b[0;36mpandas._libs.hashtable.PyObjectHashTable.get_item\u001b[0;34m()\u001b[0m\n",
"\u001b[0;31mKeyError\u001b[0m: 'Height'",
"\nThe above exception was the direct cause of the following exception:\n",
"\u001b[0;31mKeyError\u001b[0m Traceback (most recent call last)",
"\u001b[0;32m<ipython-input-10-885281bcd3f0>\u001b[0m in \u001b[0;36m<cell line: 8>\u001b[0;34m()\u001b[0m\n\u001b[1;32m 6\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 7\u001b[0m \u001b[0;31m# 산점도\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 8\u001b[0;31m \u001b[0mplt\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mscatter\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mdf\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m\"Age\"\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mdf\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m\"Height\"\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 9\u001b[0m \u001b[0mplt\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mshow\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;32m/usr/local/lib/python3.10/dist-packages/pandas/core/frame.py\u001b[0m in \u001b[0;36m__getitem__\u001b[0;34m(self, key)\u001b[0m\n\u001b[1;32m 3805\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mcolumns\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mnlevels\u001b[0m \u001b[0;34m>\u001b[0m \u001b[0;36m1\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 3806\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_getitem_multilevel\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mkey\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 3807\u001b[0;31m \u001b[0mindexer\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mcolumns\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mget_loc\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mkey\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 3808\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mis_integer\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mindexer\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 3809\u001b[0m \u001b[0mindexer\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m[\u001b[0m\u001b[0mindexer\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;32m/usr/local/lib/python3.10/dist-packages/pandas/core/indexes/base.py\u001b[0m in \u001b[0;36mget_loc\u001b[0;34m(self, key, method, tolerance)\u001b[0m\n\u001b[1;32m 3802\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_engine\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mget_loc\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mcasted_key\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 3803\u001b[0m \u001b[0;32mexcept\u001b[0m \u001b[0mKeyError\u001b[0m \u001b[0;32mas\u001b[0m \u001b[0merr\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 3804\u001b[0;31m \u001b[0;32mraise\u001b[0m \u001b[0mKeyError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mkey\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;32mfrom\u001b[0m \u001b[0merr\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 3805\u001b[0m \u001b[0;32mexcept\u001b[0m \u001b[0mTypeError\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 3806\u001b[0m \u001b[0;31m# If we have a listlike key, _check_indexing_error will raise\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;31mKeyError\u001b[0m: 'Height'"
]
}
],
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 891
},
"id": "mFrap8wUSozd",
"outputId": "40a5a6e8-9191-455b-ea03-15d910a890ae"
}
},
{
"cell_type": "markdown",
"source": [
"**References:**\n",
"\n",
"* Pandas official documentation: [https://pandas.pydata.org/pandas-docs/stable/](https://pandas.pydata.org/pandas-docs/stable/)\n",
"* Pandas tutorial: [invalid URL removed]\n",
"* Python Pandas 실무 데이터 분석: [https://wikidocs.net/book/4639](https://wikidocs.net/book/4639)"
],
"metadata": {
"id": "RS173uaNSozd"
}
},
{
"cell_type": "markdown",
"source": [
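"---\n",
"# 🎃 The code above has bugs!!!\n",
"> Running it raises errors, so the user needs enough Python literacy to spot and fix them. In the end, the problem was worked through in the following order.\n",
"\n",
"1. The example above has bugs.\n",
"2. The CSV file does not exist.\n",
"3. So the following prompt was used.\n",
"\n",
"~~~\n",
"Create a CSV to use as a Pandas example.\n",
"Make Name a string and Age a number. Create the file for me.\n",
"~~~\n",
"That produced the following code."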
],
"metadata": {
"id": "a7BEWyvxTMoV"
}
},
{
"cell_type": "code",
"source": [
"import pandas as pd\n",
"\n",
"data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [20, 25, 30]}\n",
"\n",
"# Create a DataFrame\n",
"df = pd.DataFrame(data)\n",
"\n",
"# Save the DataFrame to a CSV file\n",
"df.to_csv('example.csv', index=False)"
],
"metadata": {
"id": "jCEznZQUTRDk"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
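"After running the code above and confirming that example.csv has been created, go back to step 1 above, change the example file name in the source to \"example.csv\", and then run the cells in order.\n"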
],
"metadata": {
"id": "wg_ZHj00Txpj"
}
}
],
"metadata": {
"colab": {
"provenance": [],
"include_colab_link": true
},
"kernelspec": {
"display_name": "Python 3",
"name": "python3"
}
},
"nbformat": 4,
"nbformat_minor": 0
}
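
The fix described in the notebook's final cells can be sketched as one self-contained script. This is only an illustration under a few assumptions: the CSV is written to a temporary directory rather than the working directory, the `Age > 24` threshold is chosen here so the sample data returns rows, and the `missing` check is a hypothetical guard that is not part of the original notebook.

```python
import os
import tempfile

import pandas as pd

# Same sample data as the generated helper cell in the notebook.
data = {"Name": ["Alice", "Bob", "Charlie"], "Age": [20, 25, 30]}

# Assumption: write to a temporary directory so the sketch leaves no files behind.
csv_path = os.path.join(tempfile.mkdtemp(), "example.csv")
pd.DataFrame(data).to_csv(csv_path, index=False)

# Step 1: load the file that now actually exists.
df = pd.read_csv(csv_path)

# Step 3: cleaning, assigning results back instead of using inplace=True.
df["Age"] = df["Age"].fillna(0)
df = df[df["Age"] < 100].copy()
df["Age"] = df["Age"].astype("int64")

# Step 4: the same kind of aggregates the notebook computes.
total_age = int(df["Age"].sum())
average_age = float(df["Age"].mean())
older_than_24 = df[df["Age"] > 24]["Name"].tolist()

# Step 5 (hypothetical guard): list the columns a plot needs that are absent,
# instead of indexing df["Height"] directly and hitting a KeyError.
missing = [col for col in ("Age", "Height") if col not in df.columns]

print(total_age, average_age, older_than_24, missing)
```

With this sample data, `missing` contains `"Height"`, which is the column the scatter-plot cell tries to read. A scatter plot would need that column to be added to the CSV first.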