Created
March 14, 2021 20:11
{ | |
"nbformat": 4, | |
"nbformat_minor": 0, | |
"metadata": { | |
"kernelspec": { | |
"display_name": "Python 3", | |
"language": "python", | |
"name": "python3" | |
}, | |
"language_info": { | |
"codemirror_mode": { | |
"name": "ipython", | |
"version": 3 | |
}, | |
"file_extension": ".py", | |
"mimetype": "text/x-python", | |
"name": "python", | |
"nbconvert_exporter": "python", | |
"pygments_lexer": "ipython3", | |
"version": "3.7.7" | |
}, | |
"colab": { | |
"name": "Actividad_IncendiosAustralia.ipynb", | |
"provenance": [], | |
"collapsed_sections": [] | |
}, | |
"accelerator": "GPU" | |
}, | |
"cells": [ | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"id": "yRn9TM4FAYu_" | |
}, | |
"source": [ | |
"# Data Science: Australia Fires\n", | |
"**Activity**: Study of the fires that occurred in Australia between August and September 2019<br>\n", | |
"**Objective**: Find the relationship among the different fires.\n" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"id": "PvGpXCj5AYvF" | |
}, | |
"source": [ | |
"### Practice Objectives\n", | |
"\n", | |
"#### Video 1\n", | |
"* Load the dataset\n", | |
"* Carry out a data-cleaning study\n", | |
"    * Locate null values\n", | |
"    * Discard values\n", | |
"    * Boolean variables\n", | |
" \n", | |
"#### Video 2\n", | |
"* Study the data through plots (explained and labeled)\n", | |
"    * Histogram\n", | |
"    * Bar chart\n", | |
"    * Scatter plot\n", | |
"    * Geolocation\n", | |
" \n", | |
"#### Video 3\n", | |
"* Build a clustering model\n", | |
"    * Normalize the data\n", | |
"    * Search for the optimal number of clusters\n", | |
"    * Apply the model (KMeans)\n", | |
"    * Plot the model results using **PCA**\n", | |
"    * Show the clusters geolocated on a map" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"id": "YgIIigeKAYvG" | |
}, | |
"source": [ | |
"### Dataset Information" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"id": "8l4NLsnhAYvG" | |
}, | |
"source": [ | |
"#### Columns\n", | |
"\n", | |
"* Latitude: Center of 1km fire pixel but not necessarily the actual location of the fire as one or more fires can be detected within the 1km pixel.\n", | |
"\n", | |
"* Longitude: Center of 1km fire pixel but not necessarily the actual location of the fire as one or more fires can be detected within the 1km pixel.\n", | |
"\n", | |
"* Brightness temperature 21 (Kelvin): Channel 21/22 brightness temperature of the fire pixel measured in Kelvin.\n", | |
"\n", | |
"* Scan pixel size: The algorithm produces 1km fire pixels but MODIS pixels get bigger toward the edge of scan. Scan and track reflect actual pixel size.\n", | |
"\n", | |
"* Track pixel size: The algorithm produces 1km fire pixels but MODIS pixels get bigger toward the edge of scan. Scan and track reflect actual pixel size.\n", | |
"\n", | |
"* Acquisition Date: Date of MODIS acquisition.\n", | |
"\n", | |
"* Acquisition Time: Time of acquisition/overpass of the satellite (in UTC).\n", | |
"\n", | |
"* Satellite: A = Aqua and T = Terra.\n", | |
"\n", | |
"* Instrument: Constant value for MODIS.\n", | |
"\n", | |
"* Confidence (0-100%): This value is based on a collection of intermediate algorithm quantities used in the detection process. It is intended to help users gauge the quality of individual hotspot/fire pixels. Confidence estimates range between 0 and 100% and are assigned one of the three fire classes (low-confidence fire, nominal-confidence fire, or high-confidence fire).\n", | |
"\n", | |
"* Version (Collection and source): Version identifies the collection (e.g. MODIS Collection 6) and source of data processing: Near Real-Time (NRT suffix added to collection) or Standard Processing (collection only). \"6.0NRT\" - Collection 6 NRT processing. \"6.0\" - Collection 6 Standard processing. Find out more on collections and on the differences between FIRMS data sourced from LANCE FIRMS and University of Maryland.\n", | |
"\n", | |
"* Brightness temperature 31 (Kelvin): Channel 31 brightness temperature of the fire pixel measured in Kelvin.\n", | |
"\n", | |
"* Fire Radiative Power: Depicts the pixel-integrated fire radiative power in MW (megawatts).\n", | |
"\n", | |
"* Day / Night: D = Daytime, N = Nighttime" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"id": "QdmGiTjFAYvG" | |
}, | |
"source": [ | |
"# The work uses a dataset on the fires that occurred in Australia between August and September 2019" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"metadata": { | |
"id": "E8YfLWkXmVw8" | |
}, | |
"source": [ | |
"import pandas as pd\r\n", | |
"import numpy as np\r\n", | |
"import matplotlib.pyplot as plt" | |
], | |
"execution_count": null, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "code", | |
"metadata": { | |
"id": "_dY-rdI1meV6" | |
}, | |
"source": [ | |
"australia=pd.read_csv(\"Fire_Australia.csv\")\r\n", | |
"australia.head()" | |
], | |
"execution_count": null, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "code", | |
"metadata": { | |
"id": "LhVzTTjWmxV2" | |
}, | |
"source": [ | |
"# Get the dimensions of the dataset (rows, columns)\r\n", | |
"australia.shape" | |
], | |
"execution_count": null, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "code", | |
"metadata": { | |
"id": "GePpf3mBnAOk" | |
}, | |
"source": [ | |
"australia.info()" | |
], | |
"execution_count": null, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "code", | |
"metadata": { | |
"id": "R_vIBtFYnOwd" | |
}, | |
"source": [ | |
"australia.describe()" | |
], | |
"execution_count": null, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "code", | |
"metadata": { | |
"id": "k3kK1dyjnirG" | |
}, | |
"source": [ | |
"" | |
], | |
"execution_count": null, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"id": "7GXcGfXwAYvH" | |
}, | |
"source": [ | |
"## First, clean the dataset" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"id": "UbhGjMpsAYvL" | |
}, | |
"source": [ | |
"#### Since there are no null values, the next step is to decide whether all the information in the dataset is needed" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"metadata": { | |
"id": "MypkExN2ntrC" | |
}, | |
"source": [ | |
"australia.isnull().sum()" | |
], | |
"execution_count": null, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"id": "TBWq49anAYvM" | |
}, | |
"source": [ | |
"Together with the column descriptions given earlier, this shows that one column, `instrument`, is constant (always MODIS) and therefore carries no useful information, so it is dropped" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"metadata": { | |
"id": "8P_J3IBmoEtH" | |
}, | |
"source": [ | |
"australia[\"instrument\"].unique()\r\n", | |
"australia=australia.drop([\"instrument\"],axis=1)" | |
], | |
"execution_count": null, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "code", | |
"metadata": { | |
"id": "7VtP_vLPoZJ-" | |
}, | |
"source": [ | |
"australia.head()" | |
], | |
"execution_count": null, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"id": "B6qHl8WIAYvM" | |
}, | |
"source": [ | |
"The `version` field is also constant throughout the table, so it is dropped" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"metadata": { | |
"id": "prE-FYN3oij1" | |
}, | |
"source": [ | |
"australia[\"version\"].unique()\r\n", | |
"australia=australia.drop([\"version\"],axis=1)" | |
], | |
"execution_count": null, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"id": "rIRmqBbCAYvN" | |
}, | |
"source": [ | |
"* Since `daynight` and `satellite` each take only two values, they are encoded numerically (as 0/1 dummy variables) to make later analysis easier" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"metadata": { | |
"id": "kJhI4fGHo4M7" | |
}, | |
"source": [ | |
"australia[\"daynight\"].unique()\r\n", | |
"australia[\"daynight\"]=australia[\"daynight\"].replace(\"D\",1)\r\n", | |
"australia[\"daynight\"]=australia[\"daynight\"].replace(\"N\",0)\r\n", | |
"australia.tail()" | |
], | |
"execution_count": null, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "code", | |
"metadata": { | |
"id": "r4w8C0pfpjXl" | |
}, | |
"source": [ | |
"australia[\"satellite\"].unique()\r\n", | |
"australia[\"satellite\"]=australia[\"satellite\"].replace(\"Terra\",1)\r\n", | |
"australia[\"satellite\"]=australia[\"satellite\"].replace(\"Aqua\",0)" | |
], | |
"execution_count": null, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "code", | |
"metadata": { | |
"id": "cz6bQBVqp0IS" | |
}, | |
"source": [ | |
"australia.head()" | |
], | |
"execution_count": null, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"id": "WrB8BIzSAYvQ" | |
}, | |
"source": [ | |
"#### Study of the correlations in the fire data" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"metadata": { | |
"id": "sjAjxh_lp-2R" | |
}, | |
"source": [ | |
"australia.corr()" | |
], | |
"execution_count": null, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "code", | |
"metadata": { | |
"id": "OhnE8YfPqZrB" | |
}, | |
"source": [ | |
"australia.corr().idxmin()" | |
], | |
"execution_count": null, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "code", | |
"metadata": { | |
"id": "4TwGCo_Bqrep" | |
}, | |
"source": [ | |
"positive_corr=australia.corr()\r\n", | |
"np.fill_diagonal(positive_corr.values,0)\r\n", | |
"positive_corr.idxmax()" | |
], | |
"execution_count": null, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"id": "bAtmgyWIAYvR" | |
}, | |
"source": [ | |
"## Exploring the data through plots" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"id": "c6jjORPBAYvR" | |
}, | |
"source": [ | |
"#### A histogram is drawn" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"metadata": { | |
"id": "HSGfiBDbwThZ" | |
}, | |
"source": [ | |
"plt.hist(australia.confidence, label=\"Confidence\", color=\"red\")\r\n", | |
"plt.legend()\r\n", | |
"plt.xlabel(\"Confidence level\")\r\n", | |
"plt.ylabel(\"Number of fires\")\r\n", | |
"plt.title(\"Confidence Histogram for Australia Fires\")" | |
], | |
"execution_count": null, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"id": "gftfb5WBAYvR" | |
}, | |
"source": [ | |
"#### A scatter plot is drawn\n", | |
"* Using the most strongly correlated variables" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"metadata": { | |
"id": "eR-ihix9xEXo" | |
}, | |
"source": [ | |
"plt.scatter(australia.brightness,australia.frp, label=\"Fire\", color=\"orange\", alpha=0.7)\r\n", | |
"plt.legend()\r\n", | |
"plt.xlabel(\"Brightness\")\r\n", | |
"plt.ylabel(\"FRP\")\r\n", | |
"plt.title(\"Relationship between Brightness and FRP\")" | |
], | |
"execution_count": null, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"id": "RVBsFr8qAYvS" | |
}, | |
"source": [ | |
"##### A bar chart is drawn" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"metadata": { | |
"id": "yXxFNOTpx6kQ" | |
}, | |
"source": [ | |
"pd.crosstab(australia.daynight,australia.satellite).plot(kind=\"bar\")\r\n", | |
"plt.title(\"Relationship between Day/Night and Satellite\")" | |
], | |
"execution_count": null, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"id": "jWLPVshqAYvU" | |
}, | |
"source": [ | |
"## Map Plots\n", | |
"### Using the folium library\n", | |
"* Within this library, the MarkerCluster plugin is also used to group the data points" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"metadata": { | |
"id": "FvPUl7Hky2Sr" | |
}, | |
"source": [ | |
"import folium" | |
], | |
"execution_count": null, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "code", | |
"metadata": { | |
"id": "d7w1EoiRy5e_" | |
}, | |
"source": [ | |
"mapa=folium.Map([-25.274398,133.775136], zoom_start=4.5, tiles=\"Stamen Terrain\")\r\n", | |
"mapa" | |
], | |
"execution_count": null, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "code", | |
"metadata": { | |
"id": "69AkYFRszgCz" | |
}, | |
"source": [ | |
"from folium.plugins import MarkerCluster" | |
], | |
"execution_count": null, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "code", | |
"metadata": { | |
"id": "FJOZHEiDzwAj" | |
}, | |
"source": [ | |
"australia_cluster=MarkerCluster().add_to(mapa)" | |
], | |
"execution_count": null, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "code", | |
"metadata": { | |
"id": "siH0derMz7wT" | |
}, | |
"source": [ | |
"for lat,long,frp in zip(australia.latitude,australia.longitude,australia.frp):\r\n", | |
"    folium.Marker([lat,long],tooltip=(\"FRP: \" + str(frp))).add_to(australia_cluster)" | |
], | |
"execution_count": null, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "code", | |
"metadata": { | |
"id": "s1P8rLJ20nDn" | |
}, | |
"source": [ | |
"mapa" | |
], | |
"execution_count": null, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"id": "SS74PLiJAYvW" | |
}, | |
"source": [ | |
"# Building the model\n", | |
"#### Machine learning libraries are used to group the fires logically and look for patterns of similarity\n", | |
"\n", | |
"#### This part of the exercise follows these steps\n", | |
"* Normalize the dataset so the model can analyze it more easily\n", | |
"* Use the KMeans model from the sklearn package as the machine learning technique\n", | |
"* Find the number of clusters that groups the fires best\n", | |
"* Run the analysis and classification." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"metadata": { | |
"id": "h39NY6Tp3Unv" | |
}, | |
"source": [ | |
"from sklearn.preprocessing import MinMaxScaler\r\n", | |
"from sklearn.cluster import KMeans" | |
], | |
"execution_count": null, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"id": "oFgqGm8AAYvY" | |
}, | |
"source": [ | |
"### Normalize the data\n", | |
"* Some variables are normalized to make the subsequent analysis easier" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"metadata": { | |
"id": "chJmY4OW3g1f" | |
}, | |
"source": [ | |
"australia.head()" | |
], | |
"execution_count": null, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "code", | |
"metadata": { | |
"id": "SJaWFANQG1jX" | |
}, | |
"source": [ | |
"australia_norm=australia.iloc[:,2:]\r\n", | |
"australia_norm.head()" | |
], | |
"execution_count": null, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "code", | |
"metadata": { | |
"id": "fIjoq1FtHF6x" | |
}, | |
"source": [ | |
"australia_norm=australia_norm.drop([\"acq_date\"],axis=1)\r\n", | |
"australia_norm.head()" | |
], | |
"execution_count": null, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "code", | |
"metadata": { | |
"id": "n6i82vqtHVMY" | |
}, | |
"source": [ | |
"list_columns=australia_norm.columns.values.tolist()\r\n", | |
"list_columns\r\n", | |
"scaler=MinMaxScaler()\r\n", | |
"scaler_australia=scaler.fit_transform(australia_norm)\r\n", | |
"australia_norm=pd.DataFrame(scaler_australia,columns=list_columns)\r\n", | |
"australia_norm.head()" | |
], | |
"execution_count": null, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"id": "0GvfU1ivAYvY" | |
}, | |
"source": [ | |
"### Finding the optimal number of clusters\n", | |
"* The **elbow method** (Jambu's elbow) is used: it measures how similar the fires within each cluster are" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"metadata": { | |
"id": "9IjiRueCIOAp" | |
}, | |
"source": [ | |
"sc=[]\r\n", | |
"for i in range(1,15):\r\n", | |
"    kmeans=KMeans(n_clusters=i,max_iter=300)\r\n", | |
"    kmeans.fit(australia_norm)\r\n", | |
"    sc.append(kmeans.inertia_)" | |
], | |
"execution_count": null, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "code", | |
"metadata": { | |
"id": "L_ln8vfrJBuz" | |
}, | |
"source": [ | |
"plt.plot(range(1,15),sc)\r\n", | |
"plt.title(\"Elbow Method (Jambu)\")\r\n", | |
"plt.xlabel(\"Number of clusters\")\r\n", | |
"plt.ylabel(\"Inertia (sum of squares)\")\r\n", | |
"plt.show()" | |
], | |
"execution_count": null, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"id": "nl7DkUvRAYvZ" | |
}, | |
"source": [ | |
"### Applying the KMeans model for n clusters" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"metadata": { | |
"id": "HFLgkiGWJg73" | |
}, | |
"source": [ | |
"fire_cluster=KMeans(n_clusters=4,max_iter=300)\r\n", | |
"fire_cluster.fit(australia_norm)" | |
], | |
"execution_count": null, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "code", | |
"metadata": { | |
"id": "K5pNdTwFJrqI" | |
}, | |
"source": [ | |
"australia_norm[\"Cluster\"]=fire_cluster.labels_\r\n", | |
"australia_norm.tail(100)" | |
], | |
"execution_count": null, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "code", | |
"metadata": { | |
"id": "oLpiZrZyKDoS" | |
}, | |
"source": [ | |
"australia_final=pd.concat([australia,australia_norm.Cluster],axis=1)\r\n", | |
"australia_final.tail(100)" | |
], | |
"execution_count": null, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "code", | |
"metadata": { | |
"id": "JI3rwjBUKd3N" | |
}, | |
"source": [ | |
"australia_final.Cluster=australia_final.Cluster.replace(0,\"Blue\")\r\n", | |
"australia_final.Cluster=australia_final.Cluster.replace(1,\"Green\")\r\n", | |
"australia_final.Cluster=australia_final.Cluster.replace(2,\"Red\")\r\n", | |
"australia_final.Cluster=australia_final.Cluster.replace(3,\"Yellow\")\r\n", | |
"\r\n", | |
"australia_final.head()" | |
], | |
"execution_count": null, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"id": "exy7CCFFAYvZ" | |
}, | |
"source": [ | |
"### Plotting the clusters\n" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"metadata": { | |
"id": "PbbI-8RCC8y4" | |
}, | |
"source": [ | |
"def draw_mapa_cluster(dataset):\r\n", | |
"    # Base map centered on Australia\r\n", | |
"    mapa=folium.Map([-25.274398,133.775136],zoom_start=4.5,tiles='Stamen Terrain')\r\n", | |
"\r\n", | |
"    # One circle per fire, colored by its cluster; the tooltip shows the fire radiative power\r\n", | |
"    for lat,long,grupo,fire in zip(dataset[\"latitude\"],dataset[\"longitude\"],dataset[\"Cluster\"],dataset[\"frp\"]):\r\n", | |
"        folium.vector_layers.CircleMarker([lat,long],radius=5,tooltip=(\"Radiative: \" + str(fire)),color=grupo,fill=True,fill_color=grupo,fill_opacity=0.7).add_to(mapa)\r\n", | |
"\r\n", | |
"    return mapa" | |
], | |
"execution_count": null, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "code", | |
"metadata": { | |
"id": "FKh8yTzVLZxn" | |
}, | |
"source": [ | |
"draw_mapa_cluster(australia_final[australia_final[\"satellite\"]==1])" | |
], | |
"execution_count": null, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "code", | |
"metadata": { | |
"id": "nrUPcVawLpCF" | |
}, | |
"source": [ | |
"" | |
], | |
"execution_count": null, | |
"outputs": [] | |
} | |
] | |
} |
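
The notebook's clustering step depends on two computations that the library calls hide: min-max normalization (`MinMaxScaler`) and the within-cluster sum of squares that `KMeans` exposes as `inertia_`, which the elbow plot tracks. The following is a minimal pure-Python sketch of both, not the notebook's or scikit-learn's actual implementation. The helper names (`min_max_scale`, `kmeans`, `inertia`) and the four sample rows of (brightness, frp) values are assumptions made up for illustration.

```python
# Hypothetical helpers illustrating min-max scaling and K-means inertia.
# The sample rows are invented (brightness, frp) pairs, not values from the dataset.

def min_max_scale(rows):
    """Rescale every column to [0, 1]: (x - min) / (max - min)."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    # A constant column has a zero range; divide by 1 so it maps to zeros.
    spans = [(max(c) - min(c)) or 1.0 for c in cols]
    return [[(v - lo) / sp for v, lo, sp in zip(row, lows, spans)] for row in rows]

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda k: sq_dist(p, centroids[k]))
            for p in points]

def kmeans(points, centroids, max_iter=300):
    """Plain Lloyd iterations starting from the given initial centroids."""
    for _ in range(max_iter):
        labels = assign(points, centroids)
        updated = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            # Keep the old centroid if a cluster ends up empty.
            updated.append([sum(c) / len(members) for c in zip(*members)]
                           if members else old)
        if updated == centroids:  # converged: centroids stopped moving
            break
        centroids = updated
    return assign(points, centroids), centroids

def inertia(points, labels, centroids):
    """Sum of squared distances from each point to its assigned centroid."""
    return sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))

rows = [[300.0, 10.0], [310.0, 20.0], [400.0, 200.0], [410.0, 210.0]]
scaled = min_max_scale(rows)

labels_1, cents_1 = kmeans(scaled, [scaled[0]])
labels_2, cents_2 = kmeans(scaled, [scaled[0], scaled[3]])
inertia_1 = inertia(scaled, labels_1, cents_1)
inertia_2 = inertia(scaled, labels_2, cents_2)
print("k=1 inertia:", inertia_1)
print("k=2 inertia:", inertia_2)
```

Inertia can only fall or stay flat as clusters are added, so the elbow method looks for the value of k after which the drop becomes small rather than for the minimum itself.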