maryamalqasmi/EDA Visualization.ipynb

## EDA Visualization.ipynb
{
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": "<center>\n    <img src=\"https://gitlab.com/ibm/skills-network/courses/placeholder101/-/raw/master/labs/module%201/images/IDSNlogo.png\" width=\"300\" alt=\"cognitiveclass.ai logo\"  />\n</center>\n"
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": "# **SpaceX  Falcon 9 First Stage Landing Prediction**\n"
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": "## Assignment: Exploring and Preparing\u00a0Data\n"
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": "Estimated time needed: **70** minutes\n"
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": "In this assignment, we will predict if the Falcon 9 first stage will land successfully. SpaceX advertises Falcon 9 rocket launches on its website with a cost of 62 million dollars; other providers cost upward of 165 million dollars each, much of the savings is due to the fact that SpaceX can reuse the first stage.\n\nIn this lab, you will perform Exploratory Data Analysis and Feature Engineering.\n"
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": "Falcon 9 first stage will land successfully\n"
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": "![](https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DS0701EN-SkillsNetwork/api/Images/landing\\_1.gif)\n"
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": "Several examples of an unsuccessful landing are shown here:\n"
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": "![](https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DS0701EN-SkillsNetwork/api/Images/crash.gif)\n"
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": "Most unsuccessful landings are planned. Space X performs a controlled landing in the oceans.\n"
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": "## Objectives\n\nPerform exploratory Data Analysis and Feature Engineering using `Pandas` and `Matplotlib`\n\n*   Exploratory Data Analysis\n*   Preparing\u00a0Data  Feature Engineering\n"
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": "***\n"
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": "### Import Libraries and Define Auxiliary Functions\n"
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": "We will import the following libraries the lab\n"
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {},
            "outputs": [],
            "source": "# andas is a software library written for the Python programming language for data manipulation and analysis.\nimport pandas as pd\n#NumPy is a library for the Python programming language, adding support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays\nimport numpy as np\n# Matplotlib is a plotting library for python and pyplot gives us a MatLab like plotting framework. We will use this in our plotter function to plot data.\nimport matplotlib.pyplot as plt\n#Seaborn is a Python data visualization library based on matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics\nimport seaborn as sns"
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": "## Exploratory Data Analysis\n"
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": "First, let's read the SpaceX dataset into a Pandas dataframe and print its summary\n"
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {},
            "outputs": [],
            "source": "df=pd.read_csv(\"https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DS0321EN-SkillsNetwork/datasets/dataset_part_2.csv\")\n\n# If you were unable to complete the previous lab correctly you can uncomment and load this csv\n\n# df = pd.read_csv('https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DS0701EN-SkillsNetwork/api/dataset_part_2.csv')\n\ndf.head(5)"
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": "First, let's try to see how the `FlightNumber` (indicating the continuous launch attempts.) and `Payload` variables would affect the launch outcome.\n\nWe can plot out the <code>FlightNumber</code> vs. <code>PayloadMass</code>and overlay the outcome of the launch. We see that as the flight number increases, the first stage is more likely to land successfully. The payload mass is also important; it seems the more massive the payload, the less likely the first stage will return.\n"
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {},
            "outputs": [],
            "source": "sns.catplot(y=\"PayloadMass\", x=\"FlightNumber\", hue=\"Class\", data=df, aspect = 5)\nplt.xlabel(\"Flight Number\",fontsize=20)\nplt.ylabel(\"Pay load Mass (kg)\",fontsize=20)\nplt.show()"
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": "We see that different launch sites have different success rates.  <code>CCAFS LC-40</code>, has a success rate of 60 %, while  <code>KSC LC-39A</code> and <code>VAFB SLC 4E</code> has a success rate of 77%.\n"
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": "Next, let's drill down to each site visualize its detailed launch records.\n"
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": "### TASK 1: Visualize the relationship between Flight Number and Launch Site\n"
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": "Use the function <code>catplot</code> to plot <code>FlightNumber</code> vs <code>LaunchSite</code>, set the  parameter <code>x</code>  parameter to <code>FlightNumber</code>,set the  <code>y</code> to <code>Launch Site</code> and set the parameter <code>hue</code> to <code>'class'</code>\n"
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {},
            "outputs": [],
            "source": "# Plot a scatter point chart with x axis to be Flight Number and y axis to be the launch site, and hue to be the class value\n"
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": "Now try to explain the patterns you found in the Flight Number vs. Launch Site scatter point plots.\n"
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": "### TASK 2: Visualize the relationship between Payload and Launch Site\n"
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": "We also want to observe if there is any relationship between launch sites and their payload mass.\n"
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {},
            "outputs": [],
            "source": "# Plot a scatter point chart with x axis to be Pay Load Mass (kg) and y axis to be the launch site, and hue to be the class value\n"
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": "Now try to explain any patterns you found in the Payload Vs. Launch Site scatter point chart.\n"
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {},
            "outputs": [],
            "source": ""
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": "### TASK  3: Visualize the relationship between success rate of each orbit type\n"
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": "Next, we want to visually check if there are any relationship between success rate and orbit type.\n"
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": "Let's create a `bar chart` for the sucess rate of each orbit\n"
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {},
            "outputs": [],
            "source": "# HINT use groupby method on Orbit column and get the mean of Class column\n"
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": "Analyze the ploted bar chart try to find which orbits have high sucess rate.\n"
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": "### TASK  4: Visualize the relationship between FlightNumber and Orbit type\n"
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": "For each orbit, we want to see if there is any relationship between FlightNumber and Orbit type.\n"
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {},
            "outputs": [],
            "source": "# Plot a scatter point chart with x axis to be FlightNumber and y axis to be the Orbit, and hue to be the class value\n"
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": "You should see that in the LEO orbit the Success appears related to the number of flights; on the other hand, there seems to be no relationship between flight number when in GTO orbit.\n"
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": "### TASK  5: Visualize the relationship between Payload and Orbit type\n"
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": "Similarly, we can plot the Payload vs. Orbit scatter point charts to reveal the relationship between Payload and Orbit type\n"
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {},
            "outputs": [],
            "source": "# Plot a scatter point chart with x axis to be Payload and y axis to be the Orbit, and hue to be the class value\n"
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": "You should observe that Heavy payloads have a negative influence on GTO orbits and positive on GTO and Polar LEO (ISS) orbits.\n"
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": "### TASK  6: Visualize the launch success yearly trend\n"
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": "You can plot a line chart with x axis to be <code>Year</code> and y axis to be average success rate, to get the average launch success trend.\n"
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": "The function will help you get the year from the date:\n"
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {},
            "outputs": [],
            "source": "# A function to Extract years from the date \nyear=[]\ndef Extract_year(date):\n    for i in df[\"Date\"]:\n        year.append(i.split(\"-\")[0])\n    return year\n    "
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {},
            "outputs": [],
            "source": "# Plot a line chart with x axis to be the extracted year and y axis to be the success rate\n"
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": "you can observe that the sucess rate since 2013 kept increasing till 2020\n"
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": "## Features Engineering\n"
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": "By now, you should obtain some preliminary insights about how each important variable would affect the success rate, we will select the features that will be used in success prediction in the future module.\n"
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {},
            "outputs": [],
            "source": "features = df[['FlightNumber', 'PayloadMass', 'Orbit', 'LaunchSite', 'Flights', 'GridFins', 'Reused', 'Legs', 'LandingPad', 'Block', 'ReusedCount', 'Serial']]\nfeatures.head()"
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": "### TASK  7: Create dummy variables to categorical columns\n"
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": "Use the function <code>get_dummies</code> and <code>features</code> dataframe to apply OneHotEncoder to the column <code>Orbits</code>, <code>LaunchSite</code>, <code>LandingPad</code>, and <code>Serial</code>. Assign the value to the variable <code>features_one_hot</code>, display the results using the method head. Your result dataframe must include all features including the encoded ones.\n"
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {},
            "outputs": [],
            "source": "# HINT: Use get_dummies() function on the categorical columns\n"
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": "### TASK  8: Cast all numeric columns to `float64`\n"
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": "Now that our <code>features_one_hot</code> dataframe only contains numbers cast the entire dataframe to variable type <code>float64</code>\n"
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {},
            "outputs": [],
            "source": "# HINT: use astype function\n"
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": "We can now export it to a <b>CSV</b> for the next section,but to make the answers consistent, in the next lab we will provide data in a pre-selected date range.\n"
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": "<code>features_one_hot.to_csv('dataset_part\\_3.csv', index=False)</code>\n"
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": "## Authors\n"
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": "<a href=\"https://www.linkedin.com/in/joseph-s-50398b136/?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMDS0321ENSkillsNetwork26802033-2021-01-01\">Joseph Santarcangelo</a> has a PhD in Electrical Engineering, his research focused on using machine learning, signal processing, and computer vision to determine how videos impact human cognition. Joseph has been working for IBM since he completed his PhD.\n"
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": "<a href=\"https://www.linkedin.com/in/nayefaboutayoun/?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMDS0321ENSkillsNetwork26802033-2021-01-01\">Nayef Abou Tayoun</a> is a Data Scientist at IBM and pursuing a Master of Management in Artificial intelligence degree at Queen's University.\n"
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": "## Change Log\n"
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": "| Date (YYYY-MM-DD) | Version | Changed By | Change Description      |\n| ----------------- | ------- | ---------- | ----------------------- |\n| 2020-09-20        | 1.0     | Joseph     | Modified Multiple Areas |\n| 2020-11-10        | 1.1     | Nayef      | updating the input data |\n"
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": "Copyright \u00a9 2020 IBM Corporation. All rights reserved.\n"
        }
    ],
    "metadata": {
        "kernelspec": {
            "display_name": "Python 3",
            "language": "python",
            "name": "python3"
        },
        "language_info": {
            "codemirror_mode": {
                "name": "ipython",
                "version": 3
            },
            "file_extension": ".py",
            "mimetype": "text/x-python",
            "name": "python",
            "nbconvert_exporter": "python",
            "pygments_lexer": "ipython3",
            "version": "3.8.8"
        }
    },
    "nbformat": 4,
    "nbformat_minor": 4
}