Created April 2, 2021 13:39
"cells": [
"metadata": {},
"cell_type": "markdown",
"source": "<center>\n <img src=\"\" width=\"300\" alt=\" logo\" />\n</center>\n"
"metadata": {},
"cell_type": "markdown",
"source": "# **Survey Dataset Exploration Lab**\n"
"metadata": {},
"cell_type": "markdown",
"source": "Estimated time needed: **30** minutes\n"
"metadata": {},
"cell_type": "markdown",
"source": "## Objectives\n"
"metadata": {},
"cell_type": "markdown",
"source": "After completing this lab you will be able to:\n"
"metadata": {},
"cell_type": "markdown",
"source": "- Load the dataset that will be used throughout the capstone project.\n- Explore the dataset.\n- Get familiar with the data types.\n"
"metadata": {},
"cell_type": "markdown",
"source": "## Load the dataset\n"
"metadata": {},
"cell_type": "markdown",
"source": "Import the required libraries.\n"
"metadata": {},
"cell_type": "code",
"source": "import pandas as pd",
"execution_count": 1,
"outputs": []
"metadata": {},
"cell_type": "markdown",
"source": "The dataset is available on the IBM Cloud at the URL below.\n"
"metadata": {},
"cell_type": "code",
"source": "dataset_url = \"\"",
"execution_count": 2,
"outputs": []
"metadata": {},
"cell_type": "markdown",
"source": "Load the data available at dataset_url into a dataframe.\n"
"metadata": {},
"cell_type": "code",
"source": "# your code goes here\ndf = pd.read_csv(dataset_url)",
"execution_count": 3,
"outputs": []
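As a minimal sketch of what the load step does, the snippet below reads a tiny in-memory CSV instead of the real file. The three rows and the names `sample_csv` and `sample_df` are hypothetical stand-ins for illustration; only `pd.read_csv` comes from the lab itself.

```python
import io

import pandas as pd

# Hypothetical stand-in for the survey CSV; the real file lives at dataset_url.
sample_csv = io.StringIO(
    "Respondent,Country,Age\n"
    "4,United States,22\n"
    "9,New Zealand,23\n"
    "13,United States,\n"
)

# read_csv accepts a URL, a local path, or any file-like object.
sample_df = pd.read_csv(sample_csv)

# The empty Age field in the last row is read as NaN (a missing value).
print(sample_df)
```

Because `read_csv` treats an empty field as missing, the `Age` column here is stored as floating point, which is consistent with ages such as `22.0` appearing in the survey output.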
"metadata": {},
"cell_type": "markdown",
"source": "## Explore the data set\n"
"metadata": {},
"cell_type": "markdown",
"source": "It is a good idea to print the first 5 rows of the dataset to get a feel for how it looks.\n"
"metadata": {},
"cell_type": "markdown",
"source": "Display the top 5 rows and columns from your dataset.\n"
"metadata": {},
"cell_type": "code",
"source": "# your code goes here\ndf.head()",
"execution_count": 5,
"outputs": [
"output_type": "execute_result",
"execution_count": 5,
"data": {
"text/plain": " Respondent MainBranch Hobbyist \\\n0 4 I am a developer by profession No \n1 9 I am a developer by profession Yes \n2 13 I am a developer by profession Yes \n3 16 I am a developer by profession Yes \n4 17 I am a developer by profession Yes \n\n OpenSourcer \\\n0 Never \n1 Once a month or more often \n2 Less than once a month but more than once per ... \n3 Never \n4 Less than once a month but more than once per ... \n\n OpenSource Employment \\\n0 The quality of OSS and closed source software ... Employed full-time \n1 The quality of OSS and closed source software ... Employed full-time \n2 OSS is, on average, of HIGHER quality than pro... Employed full-time \n3 The quality of OSS and closed source software ... Employed full-time \n4 The quality of OSS and closed source software ... Employed full-time \n\n Country Student EdLevel \\\n0 United States No Bachelor\u2019s degree (BA, BS, B.Eng., etc.) \n1 New Zealand No Some college/university study without earning ... \n2 United States No Master\u2019s degree (MA, MS, M.Eng., MBA, etc.) \n3 United Kingdom No Master\u2019s degree (MA, MS, M.Eng., MBA, etc.) \n4 Australia No Bachelor\u2019s degree (BA, BS, B.Eng., etc.) \n\n UndergradMajor ... \\\n0 Computer science, computer engineering, or sof... ... \n1 Computer science, computer engineering, or sof... ... \n2 Computer science, computer engineering, or sof... ... \n3 NaN ... \n4 Computer science, computer engineering, or sof... ... \n\n WelcomeChange \\\n0 Just as welcome now as I felt last year \n1 Just as welcome now as I felt last year \n2 Somewhat more welcome now than last year \n3 Just as welcome now as I felt last year \n4 Just as welcome now as I felt last year \n\n SONewContent Age Gender Trans \\\n0 Tech articles written by other developers;Indu... 22.0 Man No \n1 NaN 23.0 Man No \n2 Tech articles written by other developers;Cour... 28.0 Man No \n3 Tech articles written by other developers;Indu... 
26.0 Man No \n4 Tech articles written by other developers;Indu... 29.0 Man No \n\n Sexuality Ethnicity Dependents \\\n0 Straight / Heterosexual White or of European descent No \n1 Bisexual White or of European descent No \n2 Straight / Heterosexual White or of European descent Yes \n3 Straight / Heterosexual White or of European descent No \n4 Straight / Heterosexual Hispanic or Latino/Latina;Multiracial No \n\n SurveyLength SurveyEase \n0 Appropriate in length Easy \n1 Appropriate in length Neither easy nor difficult \n2 Appropriate in length Easy \n3 Appropriate in length Neither easy nor difficult \n4 Appropriate in length Easy \n\n[5 rows x 85 columns]",
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>Respondent</th>\n <th>MainBranch</th>\n <th>Hobbyist</th>\n <th>OpenSourcer</th>\n <th>OpenSource</th>\n <th>Employment</th>\n <th>Country</th>\n <th>Student</th>\n <th>EdLevel</th>\n <th>UndergradMajor</th>\n <th>...</th>\n <th>WelcomeChange</th>\n <th>SONewContent</th>\n <th>Age</th>\n <th>Gender</th>\n <th>Trans</th>\n <th>Sexuality</th>\n <th>Ethnicity</th>\n <th>Dependents</th>\n <th>SurveyLength</th>\n <th>SurveyEase</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>4</td>\n <td>I am a developer by profession</td>\n <td>No</td>\n <td>Never</td>\n <td>The quality of OSS and closed source software ...</td>\n <td>Employed full-time</td>\n <td>United States</td>\n <td>No</td>\n <td>Bachelor\u2019s degree (BA, BS, B.Eng., etc.)</td>\n <td>Computer science, computer engineering, or sof...</td>\n <td>...</td>\n <td>Just as welcome now as I felt last year</td>\n <td>Tech articles written by other developers;Indu...</td>\n <td>22.0</td>\n <td>Man</td>\n <td>No</td>\n <td>Straight / Heterosexual</td>\n <td>White or of European descent</td>\n <td>No</td>\n <td>Appropriate in length</td>\n <td>Easy</td>\n </tr>\n <tr>\n <th>1</th>\n <td>9</td>\n <td>I am a developer by profession</td>\n <td>Yes</td>\n <td>Once a month or more often</td>\n <td>The quality of OSS and closed source software ...</td>\n <td>Employed full-time</td>\n <td>New Zealand</td>\n <td>No</td>\n <td>Some college/university study without earning ...</td>\n <td>Computer science, computer engineering, or sof...</td>\n <td>...</td>\n <td>Just as welcome now as I felt last year</td>\n <td>NaN</td>\n <td>23.0</td>\n <td>Man</td>\n <td>No</td>\n <td>Bisexual</td>\n <td>White 
or of European descent</td>\n <td>No</td>\n <td>Appropriate in length</td>\n <td>Neither easy nor difficult</td>\n </tr>\n <tr>\n <th>2</th>\n <td>13</td>\n <td>I am a developer by profession</td>\n <td>Yes</td>\n <td>Less than once a month but more than once per ...</td>\n <td>OSS is, on average, of HIGHER quality than pro...</td>\n <td>Employed full-time</td>\n <td>United States</td>\n <td>No</td>\n <td>Master\u2019s degree (MA, MS, M.Eng., MBA, etc.)</td>\n <td>Computer science, computer engineering, or sof...</td>\n <td>...</td>\n <td>Somewhat more welcome now than last year</td>\n <td>Tech articles written by other developers;Cour...</td>\n <td>28.0</td>\n <td>Man</td>\n <td>No</td>\n <td>Straight / Heterosexual</td>\n <td>White or of European descent</td>\n <td>Yes</td>\n <td>Appropriate in length</td>\n <td>Easy</td>\n </tr>\n <tr>\n <th>3</th>\n <td>16</td>\n <td>I am a developer by profession</td>\n <td>Yes</td>\n <td>Never</td>\n <td>The quality of OSS and closed source software ...</td>\n <td>Employed full-time</td>\n <td>United Kingdom</td>\n <td>No</td>\n <td>Master\u2019s degree (MA, MS, M.Eng., MBA, etc.)</td>\n <td>NaN</td>\n <td>...</td>\n <td>Just as welcome now as I felt last year</td>\n <td>Tech articles written by other developers;Indu...</td>\n <td>26.0</td>\n <td>Man</td>\n <td>No</td>\n <td>Straight / Heterosexual</td>\n <td>White or of European descent</td>\n <td>No</td>\n <td>Appropriate in length</td>\n <td>Neither easy nor difficult</td>\n </tr>\n <tr>\n <th>4</th>\n <td>17</td>\n <td>I am a developer by profession</td>\n <td>Yes</td>\n <td>Less than once a month but more than once per ...</td>\n <td>The quality of OSS and closed source software ...</td>\n <td>Employed full-time</td>\n <td>Australia</td>\n <td>No</td>\n <td>Bachelor\u2019s degree (BA, BS, B.Eng., etc.)</td>\n <td>Computer science, computer engineering, or sof...</td>\n <td>...</td>\n <td>Just as welcome now as I felt last year</td>\n <td>Tech articles written by other 
developers;Indu...</td>\n <td>29.0</td>\n <td>Man</td>\n <td>No</td>\n <td>Straight / Heterosexual</td>\n <td>Hispanic or Latino/Latina;Multiracial</td>\n <td>No</td>\n <td>Appropriate in length</td>\n <td>Easy</td>\n </tr>\n </tbody>\n</table>\n<p>5 rows \u00d7 85 columns</p>\n</div>"
"metadata": {}
"metadata": {},
"cell_type": "markdown",
"source": "## Find out the number of rows and columns\n"
"metadata": {},
"cell_type": "markdown",
"source": "Start by exploring the number of rows and columns in the dataset.\n"
"metadata": {},
"cell_type": "markdown",
"source": "Print the number of rows in the dataset.\n"
"metadata": {},
"cell_type": "code",
"source": "# your code goes here\ndf.shape[0]",
"execution_count": 7,
"outputs": [
"output_type": "execute_result",
"execution_count": 7,
"data": {
"text/plain": "11552"
"metadata": {}
"metadata": {},
"cell_type": "markdown",
"source": "Print the number of columns in the dataset.\n"
"metadata": {},
"cell_type": "code",
"source": "# your code goes here\ndf.shape[1]",
"execution_count": 8,
"outputs": [
"output_type": "execute_result",
"execution_count": 8,
"data": {
"text/plain": "85"
"metadata": {}
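The two `shape` lookups above can also be done in one step by unpacking the tuple. This is a small sketch on a hypothetical three-row frame named `toy`, not on the survey data.

```python
import pandas as pd

# Hypothetical toy frame; the survey DataFrame works the same way.
toy = pd.DataFrame(
    {
        "Respondent": [4, 9, 13],
        "Country": ["United States", "New Zealand", "United States"],
    }
)

# DataFrame.shape is a (rows, columns) tuple, so it unpacks directly.
n_rows, n_cols = toy.shape
print(f"{n_rows} rows x {n_cols} columns")

# len() on a DataFrame is another way to get the row count.
assert len(toy) == n_rows
```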
"metadata": {},
"cell_type": "markdown",
"source": "## Identify the data types of each column\n"
"metadata": {},
"cell_type": "markdown",
"source": "Explore the dataset and identify the data types of each column.\n"
"metadata": {},
"cell_type": "markdown",
"source": "Print the data type of every column.\n"
"metadata": {},
"cell_type": "code",
"source": "# your code goes here\ndf.dtypes",
"execution_count": 9,
"outputs": [
"output_type": "execute_result",
"execution_count": 9,
"data": {
"text/plain": "Respondent int64\nMainBranch object\nHobbyist object\nOpenSourcer object\nOpenSource object\n ... \nSexuality object\nEthnicity object\nDependents object\nSurveyLength object\nSurveyEase object\nLength: 85, dtype: object"
"metadata": {}
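With 85 columns, the `dtypes` listing is truncated, so a summary can be easier to read. The sketch below, on a hypothetical frame named `toy`, counts how many columns share each dtype and lists the numeric ones; the column values are made up for illustration.

```python
import pandas as pd

# Hypothetical toy frame mixing integer, text, and float columns.
toy = pd.DataFrame(
    {
        "Respondent": [4, 9, 13],
        "Hobbyist": ["No", "Yes", "Yes"],
        "Age": [22.0, 23.0, None],
    }
)

# Count how many columns have each dtype.
dtype_counts = toy.dtypes.value_counts()
print(dtype_counts)

# List only the numeric columns (integers and floats).
numeric_columns = toy.select_dtypes(include="number").columns.tolist()
print(numeric_columns)
```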
"metadata": {},
"cell_type": "markdown",
"source": "Print the mean age of the survey participants.\n"
"metadata": {},
"cell_type": "code",
"source": "# your code goes here\ndf['Age'].mean()",
"execution_count": 11,
"outputs": [
"output_type": "execute_result",
"execution_count": 11,
"data": {
"text/plain": "30.77239449133718"
"metadata": {}
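One detail worth keeping in mind: `mean()` skips missing values by default, so the average covers only respondents who reported an age. The sketch below shows this on a hypothetical four-value Series; the numbers are illustrative, not taken from the survey.

```python
import pandas as pd

# Hypothetical ages with one missing answer.
ages = pd.Series([22.0, 23.0, None, 29.0], name="Age")

# skipna=True is the default, so the missing value is ignored.
mean_age = ages.mean()

# count() returns the number of non-missing values the mean is based on.
answered = ages.count()

print(f"mean age {mean_age:.2f} over {answered} of {len(ages)} respondents")
```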
"metadata": {},
"cell_type": "markdown",
"source": "The dataset is the result of a worldwide survey. Print how many unique countries there are in the Country column.\n"
"metadata": {},
"cell_type": "code",
"source": "# your code goes here\ndf2 = df.groupby(['Country']).count()\nlen(df2.index)-1",
"execution_count": 13,
"outputs": [
"output_type": "execute_result",
"execution_count": 13,
"data": {
"text/plain": "134"
"metadata": {}
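An alternative way to count distinct countries is `Series.nunique()`. The sketch below uses a hypothetical five-value Series to show that both `nunique()` and `groupby` leave out missing values by default, so subtracting 1 from a `groupby` count is only appropriate if one of the groups should genuinely be excluded. Whether that applies to the survey data is not something this sketch can confirm.

```python
import pandas as pd

# Hypothetical country answers, including one missing value.
countries = pd.Series(
    ["United States", "New Zealand", "United States", None, "Australia"],
    name="Country",
)

# nunique() ignores missing values unless dropna=False is passed.
n_unique = countries.nunique()
n_unique_with_missing = countries.nunique(dropna=False)

# groupby also drops missing keys by default, so it matches nunique().
n_groups = len(countries.to_frame().groupby("Country").size())

print(n_unique, n_unique_with_missing, n_groups)
```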
"metadata": {},
"cell_type": "markdown",
"source": "## Authors\n"
"metadata": {},
"cell_type": "markdown",
"source": "Ramesh Sannareddy\n"
"metadata": {},
"cell_type": "markdown",
"source": "### Other Contributors\n"
"metadata": {},
"cell_type": "markdown",
"source": "Rav Ahuja\n"
"metadata": {},
"cell_type": "markdown",
"source": "## Change Log\n"
"metadata": {},
"cell_type": "markdown",
"source": "| Date (YYYY-MM-DD) | Version | Changed By | Change Description |\n| ----------------- | ------- | ----------------- | ---------------------------------- |\n| 2020-10-17 | 0.1 | Ramesh Sannareddy | Created initial version of the lab |\n"
"metadata": {},
"cell_type": "markdown",
"source": " Copyright \u00a9 2020 IBM Corporation. This notebook and its source code are released under the terms of the [MIT License](\n"
"metadata": {
"kernelspec": {
"name": "python3",
"display_name": "Python 3.7",
"language": "python"
"language_info": {
"name": "python",
"version": "3.7.10",
"mimetype": "text/x-python",
"codemirror_mode": {
"name": "ipython",
"version": 3
"pygments_lexer": "ipython3",
"nbconvert_exporter": "python",
"file_extension": ".py"
"nbformat": 4,
"nbformat_minor": 4