@jmcalvomartin
Created March 14, 2021 20:11
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.7"
},
"colab": {
"name": "Actividad_IncendiosAustralia.ipynb",
"provenance": [],
"collapsed_sections": []
},
"accelerator": "GPU"
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "yRn9TM4FAYu_"
},
"source": [
"# Data Science: Australia Fires\n",
"**Activity**: Study of the fires that occurred in Australia between August and September 2019<br>\n",
"**Goal**: Find how the different fires relate to one another.\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "PvGpXCj5AYvF"
},
"source": [
"### Practice Objectives\n",
"\n",
"#### Video 1\n",
"* Load the dataset\n",
"* Perform data cleaning\n",
"    * Locate null values\n",
"    * Discard values that add no information\n",
"    * Boolean variables\n",
" \n",
"#### Video 2\n",
"* Study the data through plots (explained and labeled)\n",
"    * Histogram\n",
"    * Bar chart\n",
"    * Scatter plot\n",
"    * Geolocation\n",
" \n",
"#### Video 3\n",
"* Build a clustering model\n",
"    * Normalize the data\n",
"    * Search for the optimal number of clusters\n",
"    * Apply the model (KMeans)\n",
"    * Plot the model results using **PCA**\n",
"    * Show the clusters geolocated on a map"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "YgIIigeKAYvG"
},
"source": [
"### Dataset Information"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "8l4NLsnhAYvG"
},
"source": [
"#### Columns\n",
"\n",
"* Latitude: Center of 1km fire pixel but not necessarily the actual location of the fire as one or more fires can be detected within the 1km pixel.\n",
"\n",
"* Longitude: Center of 1km fire pixel but not necessarily the actual location of the fire as one or more fires can be detected within the 1km pixel.\n",
"\n",
"* Brightness temperature 21 (Kelvin): Channel 21/22 brightness temperature of the fire pixel measured in Kelvin.\n",
"\n",
"* Scan pixel size: The algorithm produces 1km fire pixels but MODIS pixels get bigger toward the edge of scan. Scan and track reflect actual pixel size.\n",
"\n",
"* Track pixel size: The algorithm produces 1km fire pixels but MODIS pixels get bigger toward the edge of scan. Scan and track reflect actual pixel size.\n",
"\n",
"* Acquisition Date: Date of MODIS acquisition.\n",
"\n",
"* Acquisition Time: Time of acquisition/overpass of the satellite (in UTC).\n",
"\n",
"* Satellite: A = Aqua and T = Terra.\n",
"\n",
"* Instrument: Constant value for MODIS.\n",
"\n",
"* Confidence (0-100%): This value is based on a collection of intermediate algorithm quantities used in the detection process. It is intended to help users gauge the quality of individual hotspot/fire pixels. Confidence estimates range between 0 and 100% and are assigned one of the three fire classes (low-confidence fire, nominal-confidence fire, or high-confidence fire).\n",
"\n",
"* Version (Collection and source): Version identifies the collection (e.g. MODIS Collection 6) and source of data processing: Near Real-Time (NRT suffix added to collection) or Standard Processing (collection only). \"6.0NRT\" - Collection 6 NRT processing. \"6.0\" - Collection 6 Standard processing. Find out more on collections and on the differences between FIRMS data sourced from LANCE FIRMS and University of Maryland.\n",
"\n",
"* Brightness temperature 31 (Kelvin): Channel 31 brightness temperature of the fire pixel measured in Kelvin.\n",
"\n",
"* Fire Radiative Power: Depicts the pixel-integrated fire radiative power in MW (megawatts).\n",
"\n",
"* Day / Night: D = Daytime, N = Nighttime"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "QdmGiTjFAYvG"
},
"source": [
"# This notebook works with a dataset of the fires that occurred in Australia between August and September 2019"
]
},
{
"cell_type": "code",
"metadata": {
"id": "E8YfLWkXmVw8"
},
"source": [
"import pandas as pd\r\n",
"import numpy as np\r\n",
"import matplotlib.pyplot as plt"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "_dY-rdI1meV6"
},
"source": [
"australia=pd.read_csv(\"Fire_Australia.csv\")\r\n",
"australia.head()"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "LhVzTTjWmxV2"
},
"source": [
"# Get the dimensions of the dataset (rows, columns)\r\n",
"australia.shape"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "GePpf3mBnAOk"
},
"source": [
"australia.info()"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "R_vIBtFYnOwd"
},
"source": [
"australia.describe()"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "k3kK1dyjnirG"
},
"source": [
""
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "7GXcGfXwAYvH"
},
"source": [
"## First, clean the dataset"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "UbhGjMpsAYvL"
},
"source": [
"#### Check for null values; since there are none, the next step is to decide whether every column in the dataset is needed"
]
},
{
"cell_type": "code",
"metadata": {
"id": "MypkExN2ntrC"
},
"source": [
"australia.isnull().sum()"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "TBWq49anAYvM"
},
"source": [
"Together with the column descriptions above, this shows that one column, `instrument`, is constant (always MODIS) and therefore carries no information, so it is dropped"
]
},
{
"cell_type": "code",
"metadata": {
"id": "8P_J3IBmoEtH"
},
"source": [
"# Print the distinct values first to confirm the column is constant, then drop it\r\n",
"print(australia[\"instrument\"].unique())\r\n",
"australia=australia.drop([\"instrument\"],axis=1)"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "7VtP_vLPoZJ-"
},
"source": [
"australia.head()"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "B6qHl8WIAYvM"
},
"source": [
"The `version` field is also constant throughout the table, so it is dropped as well"
]
},
{
"cell_type": "code",
"metadata": {
"id": "prE-FYN3oij1"
},
"source": [
"# Print the distinct values first to confirm the column is constant, then drop it\r\n",
"print(australia[\"version\"].unique())\r\n",
"australia=australia.drop([\"version\"],axis=1)"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "rIRmqBbCAYvN"
},
"source": [
"* The `daynight` column (and likewise `satellite`) has only two values, so each is encoded numerically as 1/0 to simplify later analysis"
]
},
{
"cell_type": "code",
"metadata": {
"id": "kJhI4fGHo4M7"
},
"source": [
"australia[\"daynight\"].unique()\r\n",
"australia[\"daynight\"]=australia[\"daynight\"].replace(\"D\",1)\r\n",
"australia[\"daynight\"]=australia[\"daynight\"].replace(\"N\",0)\r\n",
"australia.tail()"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "r4w8C0pfpjXl"
},
"source": [
"australia[\"satellite\"].unique()\r\n",
"# Encode Terra as 1 and Aqua as 0; the short codes T/A from the column description are mapped too, in case the file uses them\r\n",
"australia[\"satellite\"]=australia[\"satellite\"].replace({\"Terra\":1,\"T\":1,\"Aqua\":0,\"A\":0})"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "cz6bQBVqp0IS"
},
"source": [
"australia.head()"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "WrB8BIzSAYvQ"
},
"source": [
"#### Study of the correlations in the fire data"
]
},
{
"cell_type": "code",
"metadata": {
"id": "sjAjxh_lp-2R"
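},
"source": [
"# Keep only numeric columns: acq_date is text and recent pandas versions refuse to correlate it\r\n",
"australia.select_dtypes(\"number\").corr()"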
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "OhnE8YfPqZrB"
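},
"source": [
"# For each column, the column it is most negatively correlated with\r\n",
"australia.select_dtypes(\"number\").corr().idxmin()"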
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "4TwGCo_Bqrep"
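},
"source": [
"# Zero out the diagonal (self-correlation) so idxmax returns the most correlated *other* column\r\n",
"positive_corr=australia.select_dtypes(\"number\").corr()\r\n",
"positive_corr=positive_corr.mask(np.eye(len(positive_corr),dtype=bool),0)\r\n",
"positive_corr.idxmax()"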
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "bAtmgyWIAYvR"
},
"source": [
"## Exploring the data through plots"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "c6jjORPBAYvR"
},
"source": [
"#### Histogram of the confidence level"
]
},
{
"cell_type": "code",
"metadata": {
"id": "HSGfiBDbwThZ"
},
"source": [
"plt.hist(australia.confidence, label=\"Confidence\", color=\"red\")\r\n",
"plt.legend()\r\n",
"plt.xlabel(\"Confidence level\")\r\n",
"plt.ylabel(\"Number of fire detections\")\r\n",
"plt.title(\"Confidence Histogram for the Australia Fires\")"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "gftfb5WBAYvR"
},
"source": [
"#### Scatter plot\n",
"* Using the most strongly correlated variables"
]
},
{
"cell_type": "code",
"metadata": {
"id": "eR-ihix9xEXo"
},
"source": [
"plt.scatter(australia.brightness,australia.frp, label=\"Fire\", color=\"orange\", alpha=0.7)\r\n",
"plt.legend()\r\n",
"plt.xlabel(\"Brightness\")\r\n",
"plt.ylabel(\"FRP\")\r\n",
"plt.title(\"Relationship between Brightness and FRP\")"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "RVBsFr8qAYvS"
},
"source": [
"##### Bar chart"
]
},
{
"cell_type": "code",
"metadata": {
"id": "yXxFNOTpx6kQ"
},
"source": [
"pd.crosstab(australia.daynight,australia.satellite).plot(kind=\"bar\")\r\n",
"plt.title(\"Day/Night Detections by Satellite\")"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "jWLPVshqAYvU"
},
"source": [
"## Map Plots\n",
"### Using the folium library\n",
"* Within this library, the MarkerCluster plugin is also used to group the data points"
]
},
{
"cell_type": "code",
"metadata": {
"id": "FvPUl7Hky2Sr"
},
"source": [
"import folium"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "d7w1EoiRy5e_"
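},
"source": [
"# Stamen Terrain tiles are no longer bundled with recent folium releases, so OpenStreetMap tiles are used here\r\n",
"mapa=folium.Map([-25.274398,133.775136], zoom_start=4.5, tiles=\"OpenStreetMap\")\r\n",
"mapa"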
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "69AkYFRszgCz"
},
"source": [
"from folium.plugins import MarkerCluster"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "FJOZHEiDzwAj"
},
"source": [
"australia_cluster=MarkerCluster().add_to(mapa)"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "siH0derMz7wT"
},
"source": [
"for lat,long,frp in zip(australia.latitude,australia.longitude,australia.frp):\r\n",
" folium.Marker([lat,long],tooltip=(\"FRP: \" + str(frp))).add_to(australia_cluster)"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "s1P8rLJ20nDn"
},
"source": [
"mapa"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "SS74PLiJAYvW"
},
"source": [
"# Building the model\n",
"#### Machine learning libraries are used to group the fires logically and look for patterns of similarity\n",
"\n",
"#### This part of the practice follows these steps\n",
"* Normalize the dataset so the model can analyze it more easily\n",
"* Use the KMeans model from the sklearn package as the machine learning technique\n",
"* Find the number of clusters that groups the fires best\n",
"* Run the analysis and assign each fire to a cluster."
]
},
{
"cell_type": "code",
"metadata": {
"id": "h39NY6Tp3Unv"
},
"source": [
"from sklearn.preprocessing import MinMaxScaler\r\n",
"from sklearn.cluster import KMeans"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "oFgqGm8AAYvY"
},
"source": [
"### Normalize the data\n",
"* Some variables are normalized so that they share a common scale for the subsequent analysis with the model"
]
},
{
"cell_type": "code",
"metadata": {
"id": "chJmY4OW3g1f"
},
"source": [
"australia.head()"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "SJaWFANQG1jX"
},
"source": [
"australia_norm=australia.iloc[:,2:]\r\n",
"australia_norm.head()"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "fIjoq1FtHF6x"
},
"source": [
"australia_norm=australia_norm.drop([\"acq_date\"],axis=1)\r\n",
"australia_norm.head()"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "n6i82vqtHVMY"
},
"source": [
"# Scale every remaining column to the [0, 1] range, keeping the column names\r\n",
"list_columns=australia_norm.columns.values.tolist()\r\n",
"scaler=MinMaxScaler()\r\n",
"scaler_australia=scaler.fit_transform(australia_norm)\r\n",
"australia_norm=pd.DataFrame(scaler_australia,columns=list_columns)\r\n",
"australia_norm.head()"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "0GvfU1ivAYvY"
},
"source": [
"### Finding the optimal number of clusters\n",
"* The **Jambu elbow** method is used: for each candidate number of clusters, the inertia (within-cluster sum of squares) measures how similar the fires inside each cluster are"
]
},
{
"cell_type": "code",
"metadata": {
"id": "9IjiRueCIOAp"
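},
"source": [
"sc=[]\r\n",
"for i in range(1,15):\r\n",
"    # n_init and random_state are set explicitly so the inertia curve is reproducible\r\n",
"    kmeans=KMeans(n_clusters=i,max_iter=300,n_init=10,random_state=42)\r\n",
"    kmeans.fit(australia_norm)\r\n",
"    sc.append(kmeans.inertia_)"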
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "L_ln8vfrJBuz"
},
"source": [
"plt.plot(range(1,15),sc)\r\n",
"plt.title(\"Jambu Elbow\")\r\n",
"plt.xlabel(\"Number of clusters\")\r\n",
"plt.ylabel(\"Inertia (within-cluster sum of squares)\")\r\n",
"plt.show()"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "nl7DkUvRAYvZ"
},
"source": [
"### Apply the KMeans model with the chosen number of clusters (4)"
]
},
{
"cell_type": "code",
"metadata": {
"id": "HFLgkiGWJg73"
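},
"source": [
"# A fixed random_state keeps the cluster labels stable between runs\r\n",
"fire_cluster=KMeans(n_clusters=4,max_iter=300,n_init=10,random_state=42)\r\n",
"fire_cluster.fit(australia_norm)"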
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "K5pNdTwFJrqI"
},
"source": [
"australia_norm[\"Cluster\"]=fire_cluster.labels_\r\n",
"australia_norm.tail(100)"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "oLpiZrZyKDoS"
},
"source": [
"australia_final=pd.concat([australia,australia_norm.Cluster],axis=1)\r\n",
"australia_final.tail(100)"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "JI3rwjBUKd3N"
},
"source": [
"# Map each cluster label to a color name used when drawing the map\r\n",
"australia_final[\"Cluster\"]=australia_final[\"Cluster\"].map({0:\"Blue\",1:\"Green\",2:\"Red\",3:\"Yellow\"})\r\n",
"\r\n",
"australia_final.head()"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "exy7CCFFAYvZ"
},
"source": [
"### Plot the clusters\n"
]
},
{
"cell_type": "code",
"metadata": {
"id": "PbbI-8RCC8y4"
},
"source": [
"def draw_mapa_cluster(dataset):\r\n",
"    # Draw one circle per fire detection, colored by its cluster\r\n",
"    # OpenStreetMap tiles are used because recent folium releases no longer bundle Stamen Terrain\r\n",
"    mapa=folium.Map([-25.274398,133.775136],zoom_start=4.5,tiles='OpenStreetMap')\r\n",
"\r\n",
"    for lat,long,grupo,fire in zip(dataset[\"latitude\"],dataset[\"longitude\"],dataset[\"Cluster\"],dataset[\"frp\"]):\r\n",
"        folium.CircleMarker([lat,long],radius=5,tooltip=(\"FRP: \" + str(fire)),color=grupo,fill=True,fill_color=grupo,fill_opacity=0.7).add_to(mapa)\r\n",
"\r\n",
"    return mapa"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "FKh8yTzVLZxn"
},
"source": [
"draw_mapa_cluster(australia_final[australia_final[\"satellite\"]==1])"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "nrUPcVawLpCF"
},
"source": [
""
],
"execution_count": null,
"outputs": []
}
]
}