@redpanda-ai
Created February 21, 2019 01:51
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Introduction\n",
"\n",
"My friend [Johannes Giorgis] and I are developing a series of [Data Science Challenges] to help others\n",
"become better data scientists by presenting a series of challenges. Why did we do this?\n",
"\n",
"\n",
"> Because that's what heroes do!\n",
">\n",
"> --Johannes Giorgis\n",
"\n",
"I now present my response to the first challenge, Exploring the Meetup API in my the city of my choice.\n",
"\n",
"**San Francisco, CA**, I choose you!\n",
"\n",
"\n",
"\n",
"# Challenge 01: Explore the Meetup API\n",
"Use the [Meetup API] to explore meetups in your city of choice.\n",
"\n",
"\n",
"**Guide Questions**:\n",
"\n",
"Below are some guide line questions to get you started:\n",
"\n",
"- What is the largest meetup in your location of choice (city, cities, country...etc)?\n",
"- How many meetups of a certain category (e.g. Tech, Art...etc) are in your city?\n",
"- Basic statistics of meetups\n",
"\t- What is the average size of meetups?\n",
"\t- How frequently do meetups host events?\n",
"\n",
"\n",
"\n",
"\n",
"## Prerequisites:\n",
"Add a [Meetup API Key] to your environment.\n",
"\n",
"[//]: # (References)\n",
"\n",
"[Meetup API]: https://www.meetup.com/meetup_api/\n",
"[Meetup API Key]: https://secure.meetup.com/meetup_api/key/\n",
"[Johannes Giorgis]: http://johannesgiorgis.com/\n",
"[Data Science Challenges]: https://medium.com/red-panda-ai/introducing-data-science-challenges-4ae4a103d67b"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import datetime\n",
"import json\n",
"import math\n",
"import meetup.api\n",
"import os\n",
"import pprint\n",
"import requests\n",
"\n",
"import matplotlib.pyplot as plt\n",
"import pandas as pd\n",
"import numpy as np\n",
"import seaborn as sb\n",
"\n",
"from tqdm import tnrange, tqdm_notebook\n",
"\n",
"%matplotlib inline"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Double check your environment\n",
"\n",
"Nothing works without **MEETUP_API_KEY**."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"assert 'MEETUP_API_KEY' in os.environ, (\n",
" \"You need a MEETUP_API_KEY in your environment please look at the \"\n",
" \"README for instructions.\")\n",
"client = meetup.api.Client()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Categories\n",
"\n",
"There are multiple categories of groups in Meetup, let's use Python's meetup.api to [GetCategories](https://meetup-api.readthedocs.io/en/latest/meetup_api.html#meetup.api.meetup.api.Client.GetCategories)."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"%%capture --no-display\n",
"categories = client.GetCategories()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Exploring the response object\n",
"\n",
"By looking at the **meta** member of the response, we can see that there are 33 different categories."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### What are the attributes of the response object?\n",
"We can find out by using __dict__ to get the attribute dictionary. First, \n",
"let's create some helper functions"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"# Helper functions\n",
"def parse_meta(response):\n",
" \"\"\"Return a vertically aligned dataframe, where each row is an element \n",
" of the response.meta dictionary\"\"\"\n",
" return pd.DataFrame.from_dict(response.meta, orient='index')\n",
"\n",
"def parse_results(response):\n",
" \"\"\"Return a horizontally aligned dataframe, where each column is an\n",
" element of the response.results dictionary \"\"\"\n",
" return pd.DataFrame.from_dict(response.results)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Now we can easily inspect our categories response object\n",
"\n",
"#### Firstly, let's look at the meta-data"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>0</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>next</th>\n",
" <td></td>\n",
" </tr>\n",
" <tr>\n",
" <th>method</th>\n",
" <td>Categories</td>\n",
" </tr>\n",
" <tr>\n",
" <th>total_count</th>\n",
" <td>33</td>\n",
" </tr>\n",
" <tr>\n",
" <th>link</th>\n",
" <td>https://api.meetup.com/2/categories</td>\n",
" </tr>\n",
" <tr>\n",
" <th>count</th>\n",
" <td>33</td>\n",
" </tr>\n",
" <tr>\n",
" <th>description</th>\n",
" <td>Returns a list of Meetup group categories</td>\n",
" </tr>\n",
" <tr>\n",
" <th>lon</th>\n",
" <td>None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>title</th>\n",
" <td>Categories</td>\n",
" </tr>\n",
" <tr>\n",
" <th>url</th>\n",
" <td>https://api.meetup.com/2/categories?offset=0&amp;f...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>id</th>\n",
" <td></td>\n",
" </tr>\n",
" <tr>\n",
" <th>updated</th>\n",
" <td>1450292956000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>lat</th>\n",
" <td>None</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" 0\n",
"next \n",
"method Categories\n",
"total_count 33\n",
"link https://api.meetup.com/2/categories\n",
"count 33\n",
"description Returns a list of Meetup group categories\n",
"lon None\n",
"title Categories\n",
"url https://api.meetup.com/2/categories?offset=0&f...\n",
"id \n",
"updated 1450292956000\n",
"lat None"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cat_meta_df = parse_meta(categories)\n",
"cat_meta_df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can see from the meta that there are 33 categories available to us.\n",
"I wonder what they are. \n",
"\n",
"Also, notice that the value of **next** (above) is an empty string. Meetup API v2 response payloads come in **pages**, one at a time, but provide the URI of the **next** API call in the sequence. We can use this to programmatically get each next **page** in **response.meta\\[\"next\"\\]**. until the complete result is returned.\n",
"\n",
"As we can see, the **response.meta\\[\"next\"\\]** for this page is an empty string, so all of the categories\n",
"fit into our first API call.\n",
"\n",
"#### Secondly, let's review the categories themselves"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>id</th>\n",
" <th>name</th>\n",
" <th>shortname</th>\n",
" <th>sort_name</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1</td>\n",
" <td>Arts &amp; Culture</td>\n",
" <td>Arts</td>\n",
" <td>Arts &amp; Culture</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>18</td>\n",
" <td>Book Clubs</td>\n",
" <td>Book Clubs</td>\n",
" <td>Book Clubs</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>2</td>\n",
" <td>Career &amp; Business</td>\n",
" <td>Business</td>\n",
" <td>Career &amp; Business</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>3</td>\n",
" <td>Cars &amp; Motorcycles</td>\n",
" <td>Auto</td>\n",
" <td>Cars &amp; Motorcycles</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>4</td>\n",
" <td>Community &amp; Environment</td>\n",
" <td>Community</td>\n",
" <td>Community &amp; Environment</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>5</td>\n",
" <td>Dancing</td>\n",
" <td>Dancing</td>\n",
" <td>Dancing</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>6</td>\n",
" <td>Education &amp; Learning</td>\n",
" <td>Education</td>\n",
" <td>Education &amp; Learning</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>8</td>\n",
" <td>Fashion &amp; Beauty</td>\n",
" <td>Fashion</td>\n",
" <td>Fashion &amp; Beauty</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>9</td>\n",
" <td>Fitness</td>\n",
" <td>Fitness</td>\n",
" <td>Fitness</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>10</td>\n",
" <td>Food &amp; Drink</td>\n",
" <td>Food &amp; Drink</td>\n",
" <td>Food &amp; Drink</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10</th>\n",
" <td>11</td>\n",
" <td>Games</td>\n",
" <td>Games</td>\n",
" <td>Games</td>\n",
" </tr>\n",
" <tr>\n",
" <th>11</th>\n",
" <td>13</td>\n",
" <td>Movements &amp; Politics</td>\n",
" <td>Movements</td>\n",
" <td>Movements &amp; Politics</td>\n",
" </tr>\n",
" <tr>\n",
" <th>12</th>\n",
" <td>14</td>\n",
" <td>Health &amp; Wellbeing</td>\n",
" <td>Well-being</td>\n",
" <td>Health &amp; Wellbeing</td>\n",
" </tr>\n",
" <tr>\n",
" <th>13</th>\n",
" <td>15</td>\n",
" <td>Hobbies &amp; Crafts</td>\n",
" <td>Crafts</td>\n",
" <td>Hobbies &amp; Crafts</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14</th>\n",
" <td>16</td>\n",
" <td>Language &amp; Ethnic Identity</td>\n",
" <td>Languages</td>\n",
" <td>Language &amp; Ethnic Identity</td>\n",
" </tr>\n",
" <tr>\n",
" <th>15</th>\n",
" <td>12</td>\n",
" <td>LGBT</td>\n",
" <td>LGBT</td>\n",
" <td>LGBT</td>\n",
" </tr>\n",
" <tr>\n",
" <th>16</th>\n",
" <td>17</td>\n",
" <td>Lifestyle</td>\n",
" <td>Lifestyle</td>\n",
" <td>Lifestyle</td>\n",
" </tr>\n",
" <tr>\n",
" <th>17</th>\n",
" <td>20</td>\n",
" <td>Movies &amp; Film</td>\n",
" <td>Films</td>\n",
" <td>Movies &amp; Film</td>\n",
" </tr>\n",
" <tr>\n",
" <th>18</th>\n",
" <td>21</td>\n",
" <td>Music</td>\n",
" <td>Music</td>\n",
" <td>Music</td>\n",
" </tr>\n",
" <tr>\n",
" <th>19</th>\n",
" <td>22</td>\n",
" <td>New Age &amp; Spirituality</td>\n",
" <td>Spirituality</td>\n",
" <td>New Age &amp; Spirituality</td>\n",
" </tr>\n",
" <tr>\n",
" <th>20</th>\n",
" <td>23</td>\n",
" <td>Outdoors &amp; Adventure</td>\n",
" <td>Outdoors</td>\n",
" <td>Outdoors &amp; Adventure</td>\n",
" </tr>\n",
" <tr>\n",
" <th>21</th>\n",
" <td>24</td>\n",
" <td>Paranormal</td>\n",
" <td>Paranormal</td>\n",
" <td>Paranormal</td>\n",
" </tr>\n",
" <tr>\n",
" <th>22</th>\n",
" <td>25</td>\n",
" <td>Parents &amp; Family</td>\n",
" <td>Moms &amp; Dads</td>\n",
" <td>Parents &amp; Family</td>\n",
" </tr>\n",
" <tr>\n",
" <th>23</th>\n",
" <td>26</td>\n",
" <td>Pets &amp; Animals</td>\n",
" <td>Pets</td>\n",
" <td>Pets &amp; Animals</td>\n",
" </tr>\n",
" <tr>\n",
" <th>24</th>\n",
" <td>27</td>\n",
" <td>Photography</td>\n",
" <td>Photography</td>\n",
" <td>Photography</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25</th>\n",
" <td>28</td>\n",
" <td>Religion &amp; Beliefs</td>\n",
" <td>Beliefs</td>\n",
" <td>Religion &amp; Beliefs</td>\n",
" </tr>\n",
" <tr>\n",
" <th>26</th>\n",
" <td>29</td>\n",
" <td>Sci-Fi &amp; Fantasy</td>\n",
" <td>Sci fi</td>\n",
" <td>Sci-Fi &amp; Fantasy</td>\n",
" </tr>\n",
" <tr>\n",
" <th>27</th>\n",
" <td>30</td>\n",
" <td>Singles</td>\n",
" <td>Singles</td>\n",
" <td>Singles</td>\n",
" </tr>\n",
" <tr>\n",
" <th>28</th>\n",
" <td>31</td>\n",
" <td>Socializing</td>\n",
" <td>Social</td>\n",
" <td>Socializing</td>\n",
" </tr>\n",
" <tr>\n",
" <th>29</th>\n",
" <td>32</td>\n",
" <td>Sports &amp; Recreation</td>\n",
" <td>Sports</td>\n",
" <td>Sports &amp; Recreation</td>\n",
" </tr>\n",
" <tr>\n",
" <th>30</th>\n",
" <td>33</td>\n",
" <td>Support</td>\n",
" <td>Support</td>\n",
" <td>Support</td>\n",
" </tr>\n",
" <tr>\n",
" <th>31</th>\n",
" <td>34</td>\n",
" <td>Tech</td>\n",
" <td>Tech</td>\n",
" <td>Tech</td>\n",
" </tr>\n",
" <tr>\n",
" <th>32</th>\n",
" <td>36</td>\n",
" <td>Writing</td>\n",
" <td>Writing</td>\n",
" <td>Writing</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" id name shortname sort_name\n",
"0 1 Arts & Culture Arts Arts & Culture\n",
"1 18 Book Clubs Book Clubs Book Clubs\n",
"2 2 Career & Business Business Career & Business\n",
"3 3 Cars & Motorcycles Auto Cars & Motorcycles\n",
"4 4 Community & Environment Community Community & Environment\n",
"5 5 Dancing Dancing Dancing\n",
"6 6 Education & Learning Education Education & Learning\n",
"7 8 Fashion & Beauty Fashion Fashion & Beauty\n",
"8 9 Fitness Fitness Fitness\n",
"9 10 Food & Drink Food & Drink Food & Drink\n",
"10 11 Games Games Games\n",
"11 13 Movements & Politics Movements Movements & Politics\n",
"12 14 Health & Wellbeing Well-being Health & Wellbeing\n",
"13 15 Hobbies & Crafts Crafts Hobbies & Crafts\n",
"14 16 Language & Ethnic Identity Languages Language & Ethnic Identity\n",
"15 12 LGBT LGBT LGBT\n",
"16 17 Lifestyle Lifestyle Lifestyle\n",
"17 20 Movies & Film Films Movies & Film\n",
"18 21 Music Music Music\n",
"19 22 New Age & Spirituality Spirituality New Age & Spirituality\n",
"20 23 Outdoors & Adventure Outdoors Outdoors & Adventure\n",
"21 24 Paranormal Paranormal Paranormal\n",
"22 25 Parents & Family Moms & Dads Parents & Family\n",
"23 26 Pets & Animals Pets Pets & Animals\n",
"24 27 Photography Photography Photography\n",
"25 28 Religion & Beliefs Beliefs Religion & Beliefs\n",
"26 29 Sci-Fi & Fantasy Sci fi Sci-Fi & Fantasy\n",
"27 30 Singles Singles Singles\n",
"28 31 Socializing Social Socializing\n",
"29 32 Sports & Recreation Sports Sports & Recreation\n",
"30 33 Support Support Support\n",
"31 34 Tech Tech Tech\n",
"32 36 Writing Writing Writing"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cats_df = parse_results(categories)\n",
"cats_df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### So, if we want to work with a particular category\n",
"In this case, I want **Tech**. Let's query the dataframe for categories named **Tech**."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>id</th>\n",
" <th>name</th>\n",
" <th>shortname</th>\n",
" <th>sort_name</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>31</th>\n",
" <td>34</td>\n",
" <td>Tech</td>\n",
" <td>Tech</td>\n",
" <td>Tech</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" id name shortname sort_name\n",
"31 34 Tech Tech Tech"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"tech_df = cats_df.loc[cats_df['name'] == 'Tech']\n",
"tech_df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Let's store the category ID number for later use"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"34"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"tech_category_id = tech_df['id'].values[0]\n",
"tech_category_id"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Explore Cities\n",
"### Now let's look at cities in the United States named San Francisco"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"%%capture --no-display\n",
"cities_resp = client.GetCities(country='United States', query='San Francisco')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here we used the [GetCities] method of the [Python Meetup API client]\n",
"\n",
"I used a query for cities in **United States** called **San Francisco**.\n",
"\n",
"[//]: # (References)\n",
"\n",
"[GetCities]: https://meetup-api.readthedocs.io/en/latest/meetup_api.html#meetup.api.meetup.api.Client.GetCities\n",
"[Python Meetup API client]: https://meetup-api.readthedocs.io/en/latest/meetup_api.html#api-client-details\n",
"\n",
"Now let's take a look at the **meta** for our results."
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>0</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>next</th>\n",
" <td></td>\n",
" </tr>\n",
" <tr>\n",
" <th>method</th>\n",
" <td>Cities</td>\n",
" </tr>\n",
" <tr>\n",
" <th>total_count</th>\n",
" <td>10</td>\n",
" </tr>\n",
" <tr>\n",
" <th>link</th>\n",
" <td>https://api.meetup.com/2/cities</td>\n",
" </tr>\n",
" <tr>\n",
" <th>count</th>\n",
" <td>10</td>\n",
" </tr>\n",
" <tr>\n",
" <th>description</th>\n",
" <td>Returns Meetup cities. This method supports se...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>lon</th>\n",
" <td>None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>title</th>\n",
" <td>Cities</td>\n",
" </tr>\n",
" <tr>\n",
" <th>url</th>\n",
" <td>https://api.meetup.com/2/cities?offset=0&amp;query...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>id</th>\n",
" <td></td>\n",
" </tr>\n",
" <tr>\n",
" <th>updated</th>\n",
" <td>1526855850000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>lat</th>\n",
" <td>None</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" 0\n",
"next \n",
"method Cities\n",
"total_count 10\n",
"link https://api.meetup.com/2/cities\n",
"count 10\n",
"description Returns Meetup cities. This method supports se...\n",
"lon None\n",
"title Cities\n",
"url https://api.meetup.com/2/cities?offset=0&query...\n",
"id \n",
"updated 1526855850000\n",
"lat None"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cities_meta_df = parse_meta(cities_resp)\n",
"cities_meta_df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Hmm, a count of 10 cities is suspicious...\n",
"\n",
"I only know of the one San Francisco, why are there 10 cities?"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>city</th>\n",
" <th>country</th>\n",
" <th>id</th>\n",
" <th>lat</th>\n",
" <th>localized_country_name</th>\n",
" <th>lon</th>\n",
" <th>member_count</th>\n",
" <th>name_string</th>\n",
" <th>ranking</th>\n",
" <th>state</th>\n",
" <th>zip</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>San Francisco</td>\n",
" <td>us</td>\n",
" <td>94101</td>\n",
" <td>37.779999</td>\n",
" <td>USA</td>\n",
" <td>-122.419998</td>\n",
" <td>60351</td>\n",
" <td>San Francisco, California, USA</td>\n",
" <td>0</td>\n",
" <td>CA</td>\n",
" <td>94101</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Bosque</td>\n",
" <td>us</td>\n",
" <td>87006</td>\n",
" <td>34.560001</td>\n",
" <td>USA</td>\n",
" <td>-106.779999</td>\n",
" <td>5</td>\n",
" <td>San Francisco, New Mexico, USA</td>\n",
" <td>1</td>\n",
" <td>NM</td>\n",
" <td>87006</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>San Luis</td>\n",
" <td>us</td>\n",
" <td>81152</td>\n",
" <td>37.080002</td>\n",
" <td>USA</td>\n",
" <td>-105.620003</td>\n",
" <td>4</td>\n",
" <td>San Francisco, Colorado, USA</td>\n",
" <td>2</td>\n",
" <td>CO</td>\n",
" <td>81152</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>San Francisco de Macorís</td>\n",
" <td>do</td>\n",
" <td>1009671</td>\n",
" <td>19.299999</td>\n",
" <td>Dominican Republic</td>\n",
" <td>-70.250000</td>\n",
" <td>4</td>\n",
" <td>San Francisco de Macorís, Dominican Republic</td>\n",
" <td>3</td>\n",
" <td>NaN</td>\n",
" <td>meetup6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>Reserve</td>\n",
" <td>us</td>\n",
" <td>87830</td>\n",
" <td>33.650002</td>\n",
" <td>USA</td>\n",
" <td>-108.769997</td>\n",
" <td>1</td>\n",
" <td>San Francisco Plaza, New Mexico, USA</td>\n",
" <td>4</td>\n",
" <td>NM</td>\n",
" <td>87830</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>San Francisco de Yojoa</td>\n",
" <td>hn</td>\n",
" <td>1016115</td>\n",
" <td>15.020000</td>\n",
" <td>Honduras</td>\n",
" <td>-87.970001</td>\n",
" <td>1</td>\n",
" <td>San Francisco de Yojoa, Honduras</td>\n",
" <td>5</td>\n",
" <td>NaN</td>\n",
" <td>meetup213</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>San Francisco del Mar</td>\n",
" <td>mx</td>\n",
" <td>1028671</td>\n",
" <td>16.230000</td>\n",
" <td>Mexico</td>\n",
" <td>-94.650002</td>\n",
" <td>1</td>\n",
" <td>San Francisco del Mar, Mexico</td>\n",
" <td>6</td>\n",
" <td>NaN</td>\n",
" <td>meetup1676</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>San Francisco Menéndez</td>\n",
" <td>sv</td>\n",
" <td>1038344</td>\n",
" <td>13.850000</td>\n",
" <td>El Salvador</td>\n",
" <td>-90.019997</td>\n",
" <td>0</td>\n",
" <td>San Francisco Menéndez, El Salvador</td>\n",
" <td>7</td>\n",
" <td>NaN</td>\n",
" <td>meetup110</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>San Francisco</td>\n",
" <td>ar</td>\n",
" <td>1111239</td>\n",
" <td>-31.430000</td>\n",
" <td>Argentina</td>\n",
" <td>-62.080002</td>\n",
" <td>0</td>\n",
" <td>San Francisco, Argentina</td>\n",
" <td>8</td>\n",
" <td>NaN</td>\n",
" <td>meetup95</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>San Francisco El Alto</td>\n",
" <td>gt</td>\n",
" <td>1015524</td>\n",
" <td>14.950000</td>\n",
" <td>Guatemala</td>\n",
" <td>-91.449997</td>\n",
" <td>0</td>\n",
" <td>San Francisco El Alto, Guatemala</td>\n",
" <td>9</td>\n",
" <td>NaN</td>\n",
" <td>meetup24</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" city country id lat \\\n",
"0 San Francisco us 94101 37.779999 \n",
"1 Bosque us 87006 34.560001 \n",
"2 San Luis us 81152 37.080002 \n",
"3 San Francisco de Macorís do 1009671 19.299999 \n",
"4 Reserve us 87830 33.650002 \n",
"5 San Francisco de Yojoa hn 1016115 15.020000 \n",
"6 San Francisco del Mar mx 1028671 16.230000 \n",
"7 San Francisco Menéndez sv 1038344 13.850000 \n",
"8 San Francisco ar 1111239 -31.430000 \n",
"9 San Francisco El Alto gt 1015524 14.950000 \n",
"\n",
" localized_country_name lon member_count \\\n",
"0 USA -122.419998 60351 \n",
"1 USA -106.779999 5 \n",
"2 USA -105.620003 4 \n",
"3 Dominican Republic -70.250000 4 \n",
"4 USA -108.769997 1 \n",
"5 Honduras -87.970001 1 \n",
"6 Mexico -94.650002 1 \n",
"7 El Salvador -90.019997 0 \n",
"8 Argentina -62.080002 0 \n",
"9 Guatemala -91.449997 0 \n",
"\n",
" name_string ranking state zip \n",
"0 San Francisco, California, USA 0 CA 94101 \n",
"1 San Francisco, New Mexico, USA 1 NM 87006 \n",
"2 San Francisco, Colorado, USA 2 CO 81152 \n",
"3 San Francisco de Macorís, Dominican Republic 3 NaN meetup6 \n",
"4 San Francisco Plaza, New Mexico, USA 4 NM 87830 \n",
"5 San Francisco de Yojoa, Honduras 5 NaN meetup213 \n",
"6 San Francisco del Mar, Mexico 6 NaN meetup1676 \n",
"7 San Francisco Menéndez, El Salvador 7 NaN meetup110 \n",
"8 San Francisco, Argentina 8 NaN meetup95 \n",
"9 San Francisco El Alto, Guatemala 9 NaN meetup24 "
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cities_df = parse_results(cities_resp)\n",
"cities_df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Let's filter the dataframe with a query give us only cities in California, US"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>city</th>\n",
" <th>country</th>\n",
" <th>id</th>\n",
" <th>lat</th>\n",
" <th>localized_country_name</th>\n",
" <th>lon</th>\n",
" <th>member_count</th>\n",
" <th>name_string</th>\n",
" <th>ranking</th>\n",
" <th>state</th>\n",
" <th>zip</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>San Francisco</td>\n",
" <td>us</td>\n",
" <td>94101</td>\n",
" <td>37.779999</td>\n",
" <td>USA</td>\n",
" <td>-122.419998</td>\n",
" <td>60351</td>\n",
" <td>San Francisco, California, USA</td>\n",
" <td>0</td>\n",
" <td>CA</td>\n",
" <td>94101</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" city country id lat localized_country_name lon \\\n",
"0 San Francisco us 94101 37.779999 USA -122.419998 \n",
"\n",
" member_count name_string ranking state zip \n",
"0 60351 San Francisco, California, USA 0 CA 94101 "
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"single_city_df = cities_df.loc[\n",
" (cities_df['state'] == 'CA') & \n",
" (cities_df['country'] == 'us')]\n",
"\n",
"single_city_df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"One San Francisco, perfect! \n",
"\n",
"### Let's store the latitude and longitude for later use as well"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(37.779998779296875, -122.41999816894531)"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"latitude = single_city_df['lat'].values[0]\n",
"longitude = single_city_df['lon'].values[0]\n",
"latitude, longitude"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Now let's look at groups in San Francisco, CA"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Since we are going to grab lots of groups, lets make a function to help us call the API\n",
"\n",
"Note, this function will use the tech_category_id, latitude, and longitude values that we \n",
"found eariler."
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [],
"source": [
"def get_a_group(page_number, category_id=tech_category_id, lat=latitude,\n",
" lon=longitude):\n",
" group = None\n",
" retry_counter, retry_max = 0, 3\n",
" print(f\"Getting page {page_number}\")\n",
" while retry_counter < retry_max:\n",
" try:\n",
" group = client.GetGroups(\n",
" category_id=category_id, lat=lat, lon=lon, offset=page_number)\n",
" return group\n",
" except:\n",
" print(f\"Fetch failure {retry_counter + 1}\")\n",
" retry_counter += 1\n",
"\n",
" raise Exception(f\"Unable to fetch page after {retry_counter} attempts\")\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Now, After grabbing the first group\n",
"\n",
"Let's review the **meta** to help us see what we are getting into"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>0</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>next</th>\n",
" <td>https://api.meetup.com/2/groups?offset=1&amp;forma...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>method</th>\n",
" <td>Groups</td>\n",
" </tr>\n",
" <tr>\n",
" <th>total_count</th>\n",
" <td>2186</td>\n",
" </tr>\n",
" <tr>\n",
" <th>link</th>\n",
" <td>https://api.meetup.com/2/groups</td>\n",
" </tr>\n",
" <tr>\n",
" <th>count</th>\n",
" <td>200</td>\n",
" </tr>\n",
" <tr>\n",
" <th>description</th>\n",
" <td>None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>lon</th>\n",
" <td>-122.42</td>\n",
" </tr>\n",
" <tr>\n",
" <th>title</th>\n",
" <td>Meetup Groups v2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>url</th>\n",
" <td>https://api.meetup.com/2/groups?offset=0&amp;forma...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>id</th>\n",
" <td></td>\n",
" </tr>\n",
" <tr>\n",
" <th>updated</th>\n",
" <td>1550616065000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>lat</th>\n",
" <td>37.78</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" 0\n",
"next https://api.meetup.com/2/groups?offset=1&forma...\n",
"method Groups\n",
"total_count 2186\n",
"link https://api.meetup.com/2/groups\n",
"count 200\n",
"description None\n",
"lon -122.42\n",
"title Meetup Groups v2\n",
"url https://api.meetup.com/2/groups?offset=0&forma...\n",
"id \n",
"updated 1550616065000\n",
"lat 37.78"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%%capture --no-display\n",
"group_resp = get_a_group(0)\n",
"group_meta = parse_meta(group_resp)\n",
"group_meta"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Wait! There's a meta[\\\"next\\\"].\n",
"\n",
"Remember earlier when I spoke about **response.meta\\[\"next\"\\]**? \n",
"\n",
"I seems as though our result will span mulitple API calls, each returning 200 new groups.\n",
"\n",
"Let's make a new helper that will grab each payload in a series of API calls until we obtain the entire data set:"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [],
"source": [
"def get_all_groups_as_a_df():\n",
" \"\"\"Returns a single dataframe composed from data from multiple\n",
" successive calls to get_a_group.\n",
" \n",
" We will loop through get_a_group pages while our page.meta['next'] is \n",
" not the empty string.\n",
" \"\"\"\n",
" page_df_list = []\n",
" next_page = None\n",
" page_number = 0\n",
" while next_page != '': \n",
" page = get_a_group(page_number)\n",
" next_page = page.meta[\"next\"]\n",
" frame = parse_results(page)\n",
" page_number += 1\n",
" page_df_list.append(frame)\n",
" \n",
" return pd.concat(page_df_list, ignore_index=True)\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Getting page 0\n",
"28/30 (6 seconds remaining)\n",
"Getting page 1\n",
"27/30 (6 seconds remaining)\n",
"Getting page 2\n",
"26/30 (4 seconds remaining)\n",
"Getting page 3\n",
"25/30 (4 seconds remaining)\n",
"Getting page 4\n",
"24/30 (3 seconds remaining)\n",
"Getting page 5\n",
"23/30 (1 seconds remaining)\n",
"Getting page 6\n",
"22/30 (0 seconds remaining)\n",
"Getting page 7\n",
"29/30 (10 seconds remaining)\n",
"Getting page 8\n",
"28/30 (9 seconds remaining)\n",
"Getting page 9\n",
"27/30 (8 seconds remaining)\n",
"Getting page 10\n",
"26/30 (7 seconds remaining)\n"
]
},
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>category</th>\n",
" <th>city</th>\n",
" <th>country</th>\n",
" <th>created</th>\n",
" <th>description</th>\n",
" <th>group_photo</th>\n",
" <th>id</th>\n",
" <th>join_mode</th>\n",
" <th>lat</th>\n",
" <th>link</th>\n",
" <th>...</th>\n",
" <th>name</th>\n",
" <th>organizer</th>\n",
" <th>rating</th>\n",
" <th>state</th>\n",
" <th>timezone</th>\n",
" <th>topics</th>\n",
" <th>urlname</th>\n",
" <th>utc_offset</th>\n",
" <th>visibility</th>\n",
" <th>who</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>{'name': 'tech', 'id': 34, 'shortname': 'tech'}</td>\n",
" <td>San Francisco</td>\n",
" <td>US</td>\n",
" <td>1034097740000</td>\n",
" <td>&lt;p&gt;The SF PHP Community Meetup is an open foru...</td>\n",
" <td>{'highres_link': 'https://secure.meetupstatic....</td>\n",
" <td>120903</td>\n",
" <td>open</td>\n",
" <td>37.77</td>\n",
" <td>https://www.meetup.com/sf-php/</td>\n",
" <td>...</td>\n",
" <td>SF PHP Community</td>\n",
" <td>{'member_id': 126468982, 'name': 'Andre Marigo...</td>\n",
" <td>4.38</td>\n",
" <td>CA</td>\n",
" <td>US/Pacific</td>\n",
" <td>[{'urlkey': 'php', 'name': 'PHP', 'id': 455}, ...</td>\n",
" <td>sf-php</td>\n",
" <td>-28800000</td>\n",
" <td>public</td>\n",
" <td>PHP Developers</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>1 rows × 22 columns</p>\n",
"</div>"
],
"text/plain": [
" category city country \\\n",
"0 {'name': 'tech', 'id': 34, 'shortname': 'tech'} San Francisco US \n",
"\n",
" created description \\\n",
"0 1034097740000 <p>The SF PHP Community Meetup is an open foru... \n",
"\n",
" group_photo id join_mode lat \\\n",
"0 {'highres_link': 'https://secure.meetupstatic.... 120903 open 37.77 \n",
"\n",
" link ... name \\\n",
"0 https://www.meetup.com/sf-php/ ... SF PHP Community \n",
"\n",
" organizer rating state timezone \\\n",
"0 {'member_id': 126468982, 'name': 'Andre Marigo... 4.38 CA US/Pacific \n",
"\n",
" topics urlname utc_offset \\\n",
"0 [{'urlkey': 'php', 'name': 'PHP', 'id': 455}, ... sf-php -28800000 \n",
"\n",
" visibility who \n",
"0 public PHP Developers \n",
"\n",
"[1 rows x 22 columns]"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Collect all groups into a single dataframe\n",
"all_groups_df = get_all_groups_as_a_df()\n",
"\n",
"# Show the first row in the dataframe\n",
"all_groups_df.head(1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### That's too many columns\n",
"I really only care about a small list of columns, let's exclude the unneeded columns."
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>id</th>\n",
" <th>name</th>\n",
" <th>members</th>\n",
" <th>rating</th>\n",
" <th>join_mode</th>\n",
" <th>urlname</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>120903</td>\n",
" <td>SF PHP Community</td>\n",
" <td>2696</td>\n",
" <td>4.38</td>\n",
" <td>open</td>\n",
" <td>sf-php</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" id name members rating join_mode urlname\n",
"0 120903 SF PHP Community 2696 4.38 open sf-php"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"column_list = ['id', 'name', 'members', 'rating', 'join_mode', 'urlname']\n",
"all_groups_df = all_groups_df[column_list]\n",
"all_groups_df.head(1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Let's double check the size of our new dataframe"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(2186, 6)"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"all_groups_df.shape"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"That looks just about right:\n",
"* 2186 rows\n",
"* 6 columns\n",
"\n",
"---\n",
"## Explore Members per Group\n",
"\n",
"Each group has a different sized membership, let's explore this first!\n",
"\n",
"\n",
"### Let's start with with a histogram\n",
"\n",
"This visualization should give us a basic idea of how big our groups are."
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 792x432 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"# Using seaborn's distplot function\n",
"plt.rcParams['figure.figsize'] = [11, 6]\n",
"sb.distplot(all_groups_df['members'], kde=False, color=\"g\");"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### It appears that most groups are relatively small\n",
"\n",
"Let's take a closer look at some basic stats for our data in a tabular \n",
"format for some hard numbers:"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>members</th>\n",
" <th>rating</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>count</th>\n",
" <td>2,186.00</td>\n",
" <td>2,186.00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>mean</th>\n",
" <td>814.32</td>\n",
" <td>2.75</td>\n",
" </tr>\n",
" <tr>\n",
" <th>std</th>\n",
" <td>1,743.11</td>\n",
" <td>2.34</td>\n",
" </tr>\n",
" <tr>\n",
" <th>min</th>\n",
" <td>1.00</td>\n",
" <td>0.00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25%</th>\n",
" <td>87.25</td>\n",
" <td>0.00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>50%</th>\n",
" <td>257.00</td>\n",
" <td>4.48</td>\n",
" </tr>\n",
" <tr>\n",
" <th>75%</th>\n",
" <td>781.75</td>\n",
" <td>4.84</td>\n",
" </tr>\n",
" <tr>\n",
" <th>max</th>\n",
" <td>36,050.00</td>\n",
" <td>5.00</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" members rating\n",
"count 2,186.00 2,186.00\n",
"mean 814.32 2.75\n",
"std 1,743.11 2.34\n",
"min 1.00 0.00\n",
"25% 87.25 0.00\n",
"50% 257.00 4.48\n",
"75% 781.75 4.84\n",
"max 36,050.00 5.00"
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.options.display.float_format = '{:20,.2f}'.format\n",
"all_groups_df[[\"members\", \"rating\"]].describe()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a table I can see some numbers:\n",
"\n",
"1. It looks like the average group size is about 814 persons.\n",
"2. Half of the group sits at or under 257 members.\n",
"3. The smallest group has a single person.\n",
"4. **the largest group has 36,000+ members!**\n",
"\n",
"What an outlier! But are there other **mega-groups** like this?\n",
"\n",
"### Maybe a box and whisker plot can visualize these stats for us"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x1440 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"plt.rcParams['figure.figsize'] = [6, 20]\n",
"\n",
"#sb.set(style=\"whitegrid\")\n",
"#ax = sb.boxplot(y=\"members\", data=all_groups_df, palette=\"Set2\")\n",
"all_groups_df['members'].plot.box();"
]
},
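{
"cell_type": "markdown",
"metadata": {},
"source": [
"For reference, matplotlib's default box plot draws a circle for each point lying more than 1.5 times the interquartile range (IQR) beyond the box. A quick sketch of that cutoff for the plot above, using the 25% and 75% values for **members** from the `describe()` table earlier (the variable names are mine):\n",
"\n",
"```python\n",
"# Quartiles of the members column, copied from the describe() output\n",
"q1 = 87.25\n",
"q3 = 781.75\n",
"\n",
"iqr = q3 - q1                 # interquartile range\n",
"upper_fence = q3 + 1.5 * iqr  # points above this are drawn as circles\n",
"print(iqr, upper_fence)\n",
"```\n",
"\n",
"Under that default rule, any group with more than roughly 1,800 members shows up as an outlier."
]
},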
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Wow, there are quite a few **mega-groups**, as indicated by the circles above our top whisker!\n",
"\n",
"---\n",
"\n",
"Why are the groups so big?\n",
"\n",
"In fact...\n",
"\n",
"### What are the 10 biggest tech groups in the area?"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>id</th>\n",
" <th>name</th>\n",
" <th>members</th>\n",
" <th>rating</th>\n",
" <th>join_mode</th>\n",
" <th>urlname</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>19</th>\n",
" <td>844726</td>\n",
" <td>Silicon Valley Entrepreneurs &amp; Startups</td>\n",
" <td>36050</td>\n",
" <td>4.58</td>\n",
" <td>open</td>\n",
" <td>sventrepreneurs</td>\n",
" </tr>\n",
" <tr>\n",
" <th>107</th>\n",
" <td>1619955</td>\n",
" <td>SFHTML5</td>\n",
" <td>17656</td>\n",
" <td>4.67</td>\n",
" <td>open</td>\n",
" <td>sfhtml5</td>\n",
" </tr>\n",
" <tr>\n",
" <th>106</th>\n",
" <td>1615633</td>\n",
" <td>Designers + Geeks</td>\n",
" <td>15433</td>\n",
" <td>4.46</td>\n",
" <td>open</td>\n",
" <td>designersgeeks</td>\n",
" </tr>\n",
" <tr>\n",
" <th>426</th>\n",
" <td>9226282</td>\n",
" <td>SF Data Science</td>\n",
" <td>14864</td>\n",
" <td>4.49</td>\n",
" <td>open</td>\n",
" <td>SF-Data-Science</td>\n",
" </tr>\n",
" <tr>\n",
" <th>28</th>\n",
" <td>1060260</td>\n",
" <td>The SF JavaScript Meetup</td>\n",
" <td>13354</td>\n",
" <td>4.54</td>\n",
" <td>open</td>\n",
" <td>jsmeetup</td>\n",
" </tr>\n",
" <tr>\n",
" <th>250</th>\n",
" <td>3483762</td>\n",
" <td>Tech in Motion Events: San Francisco</td>\n",
" <td>13073</td>\n",
" <td>4.48</td>\n",
" <td>approval</td>\n",
" <td>TechinMotionSF</td>\n",
" </tr>\n",
" <tr>\n",
" <th>540</th>\n",
" <td>13402242</td>\n",
" <td>Docker Online Meetup</td>\n",
" <td>12471</td>\n",
" <td>4.37</td>\n",
" <td>open</td>\n",
" <td>Docker-Online-Meetup</td>\n",
" </tr>\n",
" <tr>\n",
" <th>191</th>\n",
" <td>2065031</td>\n",
" <td>SF Data Mining</td>\n",
" <td>12378</td>\n",
" <td>4.64</td>\n",
" <td>open</td>\n",
" <td>Data-Mining</td>\n",
" </tr>\n",
" <tr>\n",
" <th>201</th>\n",
" <td>2252591</td>\n",
" <td>Women Who Code SF</td>\n",
" <td>12326</td>\n",
" <td>4.78</td>\n",
" <td>open</td>\n",
" <td>Women-Who-Code-SF</td>\n",
" </tr>\n",
" <tr>\n",
" <th>705</th>\n",
" <td>18354966</td>\n",
" <td>SF Big Analytics</td>\n",
" <td>11858</td>\n",
" <td>4.53</td>\n",
" <td>open</td>\n",
" <td>SF-Big-Analytics</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" id name members \\\n",
"19 844726 Silicon Valley Entrepreneurs & Startups 36050 \n",
"107 1619955 SFHTML5 17656 \n",
"106 1615633 Designers + Geeks 15433 \n",
"426 9226282 SF Data Science 14864 \n",
"28 1060260 The SF JavaScript Meetup 13354 \n",
"250 3483762 Tech in Motion Events: San Francisco 13073 \n",
"540 13402242 Docker Online Meetup 12471 \n",
"191 2065031 SF Data Mining 12378 \n",
"201 2252591 Women Who Code SF 12326 \n",
"705 18354966 SF Big Analytics 11858 \n",
"\n",
" rating join_mode urlname \n",
"19 4.58 open sventrepreneurs \n",
"107 4.67 open sfhtml5 \n",
"106 4.46 open designersgeeks \n",
"426 4.49 open SF-Data-Science \n",
"28 4.54 open jsmeetup \n",
"250 4.48 approval TechinMotionSF \n",
"540 4.37 open Docker-Online-Meetup \n",
"191 4.64 open Data-Mining \n",
"201 4.78 open Women-Who-Code-SF \n",
"705 4.53 open SF-Big-Analytics "
]
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"biggest_ten_df = all_groups_df.sort_values('members', ascending=False).head(10)\n",
"biggest_ten_df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Get Group Events"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### We need to do some data shaping before the next api call\n",
"\n",
"Mostly we need to:\n",
"1. pass in a string with group ids from our 10 biggest groups\n",
"2. convert our human-readable date ranges to milliseconds since Jan 1, 1970\n",
"3. Call the GetEvents API filtering for past events using our group IDs and our date range\n",
"\n",
"#### First, let's make that string of group ids\n"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'844726,1619955,1615633,9226282,1060260,3483762,13402242,2065031,2252591,18354966'"
]
},
"execution_count": 24,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"id_list = biggest_ten_df['id'].tolist()\n",
"id_list\n",
"ids = ','.join(str(x) for x in id_list)\n",
"ids"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Now, let's get the epoch milliseconds for a date range between now and 9 months ago"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Now: 1550587334649, nine months ago: 1535035334649\n"
]
}
],
"source": [
"def to_millis(dt):\n",
" return int(pd.to_datetime(dt).value / 1000000)\n",
"\n",
"right_now = to_millis(datetime.datetime.now())\n",
"nine_months_ago = int(right_now - 180 * 24 * 60 * 60 * 1000)\n",
"print(f\"Now: {right_now}, nine months ago: {nine_months_ago}\")"
]
},
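{
"cell_type": "markdown",
"metadata": {},
"source": [
"One caveat with the cell above: `datetime.datetime.now()` returns a naive local time, and pandas treats naive timestamps as if they were UTC, so the result can be off by the machine's UTC offset. A minimal alternative sketch, assuming a timezone-aware UTC timestamp is what we want; `utc_millis` and `days_ago_millis` are hypothetical helper names, not part of this notebook:\n",
"\n",
"```python\n",
"import datetime\n",
"\n",
"def utc_millis(dt):\n",
"    # Epoch milliseconds for a timezone-aware datetime\n",
"    return int(dt.timestamp() * 1000)\n",
"\n",
"def days_ago_millis(now_ms, days):\n",
"    # Subtract a whole number of days, expressed in milliseconds\n",
"    return now_ms - days * 24 * 60 * 60 * 1000\n",
"\n",
"now_utc = datetime.datetime.now(datetime.timezone.utc)\n",
"right_now_ms = utc_millis(now_utc)\n",
"start_ms = days_ago_millis(right_now_ms, 180)\n",
"\n",
"# Same 'start,end' shape as the time filter passed to the events call\n",
"print(f'{start_ms},{right_now_ms}')\n",
"```\n",
"\n",
"For a six-month window the offset of a few hours hardly matters, but it is worth knowing about for narrower ranges."
]
},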
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Finally, let's look at those events."
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [],
"source": [
"%%capture --no-display\n",
"events_resp = client.GetEvents(group_id=ids, status='past',\n",
" time=f\"{nine_months_ago},{right_now}\");"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>0</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>next</th>\n",
" <td></td>\n",
" </tr>\n",
" <tr>\n",
" <th>method</th>\n",
" <td>Events</td>\n",
" </tr>\n",
" <tr>\n",
" <th>total_count</th>\n",
" <td>106</td>\n",
" </tr>\n",
" <tr>\n",
" <th>link</th>\n",
" <td>https://api.meetup.com/2/events</td>\n",
" </tr>\n",
" <tr>\n",
" <th>count</th>\n",
" <td>106</td>\n",
" </tr>\n",
" <tr>\n",
" <th>description</th>\n",
" <td>Access Meetup events using a group, member, or...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>lon</th>\n",
" <td></td>\n",
" </tr>\n",
" <tr>\n",
" <th>title</th>\n",
" <td>Meetup Events v2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>url</th>\n",
" <td>https://api.meetup.com/2/events?offset=0&amp;forma...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>id</th>\n",
" <td></td>\n",
" </tr>\n",
" <tr>\n",
" <th>updated</th>\n",
" <td>1550349112000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>lat</th>\n",
" <td></td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" 0\n",
"next \n",
"method Events\n",
"total_count 106\n",
"link https://api.meetup.com/2/events\n",
"count 106\n",
"description Access Meetup events using a group, member, or...\n",
"lon \n",
"title Meetup Events v2\n",
"url https://api.meetup.com/2/events?offset=0&forma...\n",
"id \n",
"updated 1550349112000\n",
"lat "
]
},
"execution_count": 27,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"parse_meta(events_resp)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Again, our raw results dataframe has extra columns that I don't care about"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>created</th>\n",
" <th>description</th>\n",
" <th>duration</th>\n",
" <th>event_url</th>\n",
" <th>group</th>\n",
" <th>headcount</th>\n",
" <th>how_to_find_us</th>\n",
" <th>id</th>\n",
" <th>maybe_rsvp_count</th>\n",
" <th>name</th>\n",
" <th>...</th>\n",
" <th>rating</th>\n",
" <th>rsvp_limit</th>\n",
" <th>status</th>\n",
" <th>time</th>\n",
" <th>updated</th>\n",
" <th>utc_offset</th>\n",
" <th>venue</th>\n",
" <th>visibility</th>\n",
" <th>waitlist_count</th>\n",
" <th>yes_rsvp_count</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1531451567000</td>\n",
" <td>&lt;p&gt;We are going to have two talks, one from Go...</td>\n",
" <td>10800000</td>\n",
" <td>https://www.meetup.com/SF-Big-Analytics/events...</td>\n",
" <td>{'join_mode': 'open', 'created': 1421705588000...</td>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" <td>252731150</td>\n",
" <td>0</td>\n",
" <td>TensorFlow Probability and how Yelp use ML to ...</td>\n",
" <td>...</td>\n",
" <td>{'count': 0, 'average': 0}</td>\n",
" <td>nan</td>\n",
" <td>past</td>\n",
" <td>1535072400000</td>\n",
" <td>1535085901000</td>\n",
" <td>-25200000</td>\n",
" <td>{'zip': '94105', 'country': 'us', 'localized_c...</td>\n",
" <td>public</td>\n",
" <td>0</td>\n",
" <td>646</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>1 rows × 21 columns</p>\n",
"</div>"
],
"text/plain": [
" created description duration \\\n",
"0 1531451567000 <p>We are going to have two talks, one from Go... 10800000 \n",
"\n",
" event_url \\\n",
"0 https://www.meetup.com/SF-Big-Analytics/events... \n",
"\n",
" group headcount \\\n",
"0 {'join_mode': 'open', 'created': 1421705588000... 0 \n",
"\n",
" how_to_find_us id maybe_rsvp_count \\\n",
"0 NaN 252731150 0 \n",
"\n",
" name ... \\\n",
"0 TensorFlow Probability and how Yelp use ML to ... ... \n",
"\n",
" rating rsvp_limit status time \\\n",
"0 {'count': 0, 'average': 0} nan past 1535072400000 \n",
"\n",
" updated utc_offset \\\n",
"0 1535085901000 -25200000 \n",
"\n",
" venue visibility \\\n",
"0 {'zip': '94105', 'country': 'us', 'localized_c... public \n",
"\n",
" waitlist_count yes_rsvp_count \n",
"0 0 646 \n",
"\n",
"[1 rows x 21 columns]"
]
},
"execution_count": 28,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"raw_results_df = parse_results(events_resp)\n",
"raw_results_df.head(1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### So again, let's filter down to just what's relevant"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>group</th>\n",
" <th>time</th>\n",
" <th>duration</th>\n",
" <th>yes_rsvp_count</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>{'join_mode': 'open', 'created': 1421705588000...</td>\n",
" <td>1535072400000</td>\n",
" <td>10800000</td>\n",
" <td>646</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" group time duration \\\n",
"0 {'join_mode': 'open', 'created': 1421705588000... 1535072400000 10800000 \n",
"\n",
" yes_rsvp_count \n",
"0 646 "
]
},
"execution_count": 29,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"raw_results_df = parse_results(events_resp)\n",
"column_list = ['group', 'time', 'duration', 'yes_rsvp_count']\n",
"raw_results_df = raw_results_df[column_list]\n",
"raw_results_df.head(1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### The group column\n",
"\n",
"The **group** column is actually a JSON object full of metadata about the group.\n",
"\n",
"I really only need the **group\\[\"id\"\\]** for now, so let's focus on that."
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>id</th>\n",
" <th>time</th>\n",
" <th>duration</th>\n",
" <th>yes_rsvp_count</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>18354966</td>\n",
" <td>1535072400000</td>\n",
" <td>10800000</td>\n",
" <td>646</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" id time duration yes_rsvp_count\n",
"0 18354966 1535072400000 10800000 646"
]
},
"execution_count": 30,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"def get_id(my_dict):\n",
" \"\"\"Extract the id member of a python dictionary\"\"\"\n",
" return my_dict[\"id\"]\n",
"\n",
"raw_results_df[\"id\"] = raw_results_df[\"group\"].apply(get_id)\n",
"\n",
"# Let's \n",
"columns = ['id', 'time', 'duration', 'yes_rsvp_count']\n",
"raw_results_df = raw_results_df[columns]\n",
"raw_results_df.head(1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Hmm, it seems that our **time** is numeric\n",
"\n",
"The **time** is stored in **Epoch milliseconds** format.\n",
"\n",
"This is great if you want to see time as the number of milliseconds since Jan 1, 1970.\n",
"\n",
"This is not-so-great if you just want to see a human-readable date and time equivalent.\n",
"\n",
"Let's make a new human-readable column called **time_dt**"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>id</th>\n",
" <th>time</th>\n",
" <th>time_dt</th>\n",
" <th>duration</th>\n",
" <th>yes_rsvp_count</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>18354966</td>\n",
" <td>1535072400000</td>\n",
" <td>08/24/18 01:00</td>\n",
" <td>10800000</td>\n",
" <td>646</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" id time time_dt duration yes_rsvp_count\n",
"0 18354966 1535072400000 08/24/18 01:00 10800000 646"
]
},
"execution_count": 31,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"for field in [\"time\"]:\n",
" raw_results_df[\"time_dt\"] = pd.to_datetime(\n",
" raw_results_df[field], unit='ms').dt.strftime('%m/%d/%y %H:%M')\n",
" \n",
"columns = ['id', 'time','time_dt', 'duration', 'yes_rsvp_count']\n",
"raw_results_df = raw_results_df[columns]\n",
"raw_results_df.head(1)"
]
},
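{
"cell_type": "markdown",
"metadata": {},
"source": [
"The **time_dt** values above are in UTC. As a sketch, assuming we would rather see San Francisco local time, pandas can mark the timestamps as UTC and convert them before formatting. The `America/Los_Angeles` zone name is my assumption for this city, and the sample value is the first event's **time** from the table above.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# First event's start time, in epoch milliseconds\n",
"times_ms = pd.Series([1535072400000])\n",
"\n",
"local_times = (\n",
"    pd.to_datetime(times_ms, unit='ms', utc=True)  # timezone-aware UTC\n",
"    .dt.tz_convert('America/Los_Angeles')          # shift to Pacific time\n",
"    .dt.strftime('%m/%d/%y %H:%M')\n",
")\n",
"print(local_times[0])\n",
"```\n",
"\n",
"This turns the 01:00 UTC start on 08/24/18 into 18:00 on 08/23/18, which agrees with the event's **utc_offset** of -25200000 ms (seven hours)."
]
},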
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Now let's convert the duration column to something human-readable\n",
"\n",
"Let's convert the column to a string that shows hours and minutes."
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>id</th>\n",
" <th>time</th>\n",
" <th>time_dt</th>\n",
" <th>duration</th>\n",
" <th>yes_rsvp_count</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>18354966</td>\n",
" <td>1535072400000</td>\n",
" <td>08/24/18 01:00</td>\n",
" <td>3 hours, 0 minutes</td>\n",
" <td>646</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" id time time_dt duration yes_rsvp_count\n",
"0 18354966 1535072400000 08/24/18 01:00 3 hours, 0 minutes 646"
]
},
"execution_count": 32,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"def millis_2_hours_and_minutes(ms):\n",
" \"\"\"Converts milliseconds to hours and minutes.\"\"\"\n",
" seconds = ms / 1000\n",
" minutes, seconds = divmod(seconds, 60)\n",
" hours, minutes = divmod(minutes, 60)\n",
"\n",
" return f\"{int(hours)} hours, {int(minutes)} minutes\" \n",
"\n",
"raw_results_df[\"duration\"] = raw_results_df[\"duration\"].apply(\n",
" millis_2_hours_and_minutes)\n",
"\n",
"raw_results_df.head(1)"
]
},
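{
"cell_type": "markdown",
"metadata": {},
"source": [
"The helper above assumes every event has a duration. As a hedged sketch, if some events came back without one (NaN), a guard like the following would avoid the `ValueError` that `int(float('nan'))` raises. `format_duration` and the `'unknown'` placeholder are hypothetical choices for illustration, not part of this notebook.\n",
"\n",
"```python\n",
"import math\n",
"\n",
"def format_duration(ms):\n",
"    # Return 'H hours, M minutes', or 'unknown' when the duration is missing\n",
"    if ms is None or (isinstance(ms, float) and math.isnan(ms)):\n",
"        return 'unknown'\n",
"    total_minutes = int(ms) // 60000\n",
"    hours, minutes = divmod(total_minutes, 60)\n",
"    return f'{hours} hours, {minutes} minutes'\n",
"\n",
"print(format_duration(10800000))\n",
"print(format_duration(float('nan')))\n",
"```\n",
"\n",
"It could be applied the same way as the helper above, with `raw_results_df['duration'].apply(format_duration)`."
]
},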
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Now let's join our top ten mega-groups dataframe with our events dataframe\n",
"\n",
"If you are familiar with SQL this is similar to a left join from **raw_results_df**\n",
"to **biggest_ten_df** on **id**\n",
"\n",
"Then we sort the output by **name** ascending and then **time** descending."
]
},
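{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a rough sketch of that join on toy data (the frames and values below are made up for illustration, and the notebook's own merge call may differ), `pd.merge` with `how='left'` keeps every event row and attaches the matching group name:\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Toy stand-ins for raw_results_df and biggest_ten_df\n",
"events = pd.DataFrame({\n",
"    'id': [2, 1, 2],\n",
"    'time': [300, 100, 200],\n",
"    'yes_rsvp_count': [30, 10, 20],\n",
"})\n",
"groups = pd.DataFrame({'id': [1, 2], 'name': ['Alpha', 'Beta']})\n",
"\n",
"joined = (\n",
"    pd.merge(events, groups[['id', 'name']], on='id', how='left')\n",
"    .sort_values(['name', 'time'], ascending=[True, False])\n",
")\n",
"print(joined[['name', 'time', 'yes_rsvp_count']])\n",
"```\n",
"\n",
"Passing a list to `ascending` gives the mixed ordering: names from A to Z, and within each name the most recent event first."
]
},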
{
"cell_type": "code",
"execution_count": 33,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>name</th>\n",
" <th>time</th>\n",
" <th>time_dt</th>\n",
" <th>duration</th>\n",
" <th>yes_rsvp_count</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>60</th>\n",
" <td>Designers + Geeks</td>\n",
" <td>1542337200000</td>\n",
" <td>11/16/18 03:00</td>\n",
" <td>2 hours, 0 minutes</td>\n",
" <td>71</td>\n",
" </tr>\n",
" <tr>\n",
" <th>51</th>\n",
" <td>Designers + Geeks</td>\n",
" <td>1541124000000</td>\n",
" <td>11/02/18 02:00</td>\n",
" <td>2 hours, 0 minutes</td>\n",
" <td>71</td>\n",
" </tr>\n",
" <tr>\n",
" <th>37</th>\n",
" <td>Designers + Geeks</td>\n",
" <td>1539914400000</td>\n",
" <td>10/19/18 02:00</td>\n",
" <td>2 hours, 0 minutes</td>\n",
" <td>41</td>\n",
" </tr>\n",
" <tr>\n",
" <th>29</th>\n",
" <td>Designers + Geeks</td>\n",
" <td>1538704800000</td>\n",
" <td>10/05/18 02:00</td>\n",
" <td>2 hours, 0 minutes</td>\n",
" <td>29</td>\n",
" </tr>\n",
" <tr>\n",
" <th>19</th>\n",
" <td>Designers + Geeks</td>\n",
" <td>1537495200000</td>\n",
" <td>09/21/18 02:00</td>\n",
" <td>2 hours, 0 minutes</td>\n",
" <td>37</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>Designers + Geeks</td>\n",
" <td>1536285600000</td>\n",
" <td>09/07/18 02:00</td>\n",
" <td>2 hours, 0 minutes</td>\n",
" <td>87</td>\n",
" </tr>\n",
" <tr>\n",
" <th>45</th>\n",
" <td>Docker Online Meetup</td>\n",
" <td>1540915200000</td>\n",
" <td>10/30/18 16:00</td>\n",
" <td>1 hours, 0 minutes</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>102</th>\n",
" <td>SF Big Analytics</td>\n",
" <td>1549591200000</td>\n",
" <td>02/08/19 02:00</td>\n",
" <td>2 hours, 30 minutes</td>\n",
" <td>418</td>\n",
" </tr>\n",
" <tr>\n",
" <th>85</th>\n",
" <td>SF Big Analytics</td>\n",
" <td>1547690400000</td>\n",
" <td>01/17/19 02:00</td>\n",
" <td>3 hours, 0 minutes</td>\n",
" <td>450</td>\n",
" </tr>\n",
" <tr>\n",
" <th>83</th>\n",
" <td>SF Big Analytics</td>\n",
" <td>1547604000000</td>\n",
" <td>01/16/19 02:00</td>\n",
" <td>3 hours, 0 minutes</td>\n",
" <td>5</td>\n",
" </tr>\n",
" <tr>\n",
" <th>69</th>\n",
" <td>SF Big Analytics</td>\n",
" <td>1544061600000</td>\n",
" <td>12/06/18 02:00</td>\n",
" <td>3 hours, 0 minutes</td>\n",
" <td>7</td>\n",
" </tr>\n",
" <tr>\n",
" <th>58</th>\n",
" <td>SF Big Analytics</td>\n",
" <td>1542247200000</td>\n",
" <td>11/15/18 02:00</td>\n",
" <td>3 hours, 0 minutes</td>\n",
" <td>16</td>\n",
" </tr>\n",
" <tr>\n",
" <th>56</th>\n",
" <td>SF Big Analytics</td>\n",
" <td>1542160800000</td>\n",
" <td>11/14/18 02:00</td>\n",
" <td>2 hours, 30 minutes</td>\n",
" <td>478</td>\n",
" </tr>\n",
" <tr>\n",
" <th>55</th>\n",
" <td>SF Big Analytics</td>\n",
" <td>1541782800000</td>\n",
" <td>11/09/18 17:00</td>\n",
" <td>57 hours, 0 minutes</td>\n",
" <td>16</td>\n",
" </tr>\n",
" <tr>\n",
" <th>49</th>\n",
" <td>SF Big Analytics</td>\n",
" <td>1541120400000</td>\n",
" <td>11/02/18 01:00</td>\n",
" <td>3 hours, 0 minutes</td>\n",
" <td>152</td>\n",
" </tr>\n",
" <tr>\n",
" <th>34</th>\n",
" <td>SF Big Analytics</td>\n",
" <td>1539909000000</td>\n",
" <td>10/19/18 00:30</td>\n",
" <td>2 hours, 30 minutes</td>\n",
" <td>390</td>\n",
" </tr>\n",
" <tr>\n",
" <th>21</th>\n",
" <td>SF Big Analytics</td>\n",
" <td>1538010000000</td>\n",
" <td>09/27/18 01:00</td>\n",
" <td>2 hours, 45 minutes</td>\n",
" <td>245</td>\n",
" </tr>\n",
" <tr>\n",
" <th>13</th>\n",
" <td>SF Big Analytics</td>\n",
" <td>1536800400000</td>\n",
" <td>09/13/18 01:00</td>\n",
" <td>2 hours, 30 minutes</td>\n",
" <td>218</td>\n",
" </tr>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>SF Big Analytics</td>\n",
" <td>1535072400000</td>\n",
" <td>08/24/18 01:00</td>\n",
" <td>3 hours, 0 minutes</td>\n",
" <td>646</td>\n",
" </tr>\n",
" <tr>\n",
" <th>15</th>\n",
" <td>SF Data Mining</td>\n",
" <td>1537200000000</td>\n",
" <td>09/17/18 16:00</td>\n",
" <td>104 hours, 0 minutes</td>\n",
" <td>6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>98</th>\n",
" <td>SF Data Science</td>\n",
" <td>1548988200000</td>\n",
" <td>02/01/19 02:30</td>\n",
" <td>2 hours, 0 minutes</td>\n",
" <td>94</td>\n",
" </tr>\n",
" <tr>\n",
" <th>94</th>\n",
" <td>SF Data Science</td>\n",
" <td>1548383400000</td>\n",
" <td>01/25/19 02:30</td>\n",
" <td>2 hours, 0 minutes</td>\n",
" <td>48</td>\n",
" </tr>\n",
" <tr>\n",
" <th>87</th>\n",
" <td>SF Data Science</td>\n",
" <td>1547778600000</td>\n",
" <td>01/18/19 02:30</td>\n",
" <td>2 hours, 0 minutes</td>\n",
" <td>26</td>\n",
" </tr>\n",
" <tr>\n",
" <th>86</th>\n",
" <td>SF Data Science</td>\n",
" <td>1547692200000</td>\n",
" <td>01/17/19 02:30</td>\n",
" <td>2 hours, 0 minutes</td>\n",
" <td>59</td>\n",
" </tr>\n",
" <tr>\n",
" <th>65</th>\n",
" <td>SF Data Science</td>\n",
" <td>1543539600000</td>\n",
" <td>11/30/18 01:00</td>\n",
" <td>3 hours, 30 minutes</td>\n",
" <td>18</td>\n",
" </tr>\n",
" <tr>\n",
" <th>57</th>\n",
" <td>SF Data Science</td>\n",
" <td>1542211200000</td>\n",
" <td>11/14/18 16:00</td>\n",
" <td>10 hours, 0 minutes</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>43</th>\n",
" <td>SF Data Science</td>\n",
" <td>1540774800000</td>\n",
" <td>10/29/18 01:00</td>\n",
" <td>4 hours, 0 minutes</td>\n",
" <td>21</td>\n",
" </tr>\n",
" <tr>\n",
" <th>35</th>\n",
" <td>SF Data Science</td>\n",
" <td>1539910800000</td>\n",
" <td>10/19/18 01:00</td>\n",
" <td>2 hours, 0 minutes</td>\n",
" <td>95</td>\n",
" </tr>\n",
" <tr>\n",
" <th>31</th>\n",
" <td>SF Data Science</td>\n",
" <td>1539306000000</td>\n",
" <td>10/12/18 01:00</td>\n",
" <td>3 hours, 0 minutes</td>\n",
" <td>15</td>\n",
" </tr>\n",
" <tr>\n",
" <th>104</th>\n",
" <td>SFHTML5</td>\n",
" <td>1550075400000</td>\n",
" <td>02/13/19 16:30</td>\n",
" <td>12 hours, 30 minutes</td>\n",
" <td>39</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>97</th>\n",
" <td>Women Who Code SF</td>\n",
" <td>1548815400000</td>\n",
" <td>01/30/19 02:30</td>\n",
" <td>2 hours, 0 minutes</td>\n",
" <td>37</td>\n",
" </tr>\n",
" <tr>\n",
" <th>91</th>\n",
" <td>Women Who Code SF</td>\n",
" <td>1548210600000</td>\n",
" <td>01/23/19 02:30</td>\n",
" <td>2 hours, 0 minutes</td>\n",
" <td>29</td>\n",
" </tr>\n",
" <tr>\n",
" <th>84</th>\n",
" <td>Women Who Code SF</td>\n",
" <td>1547605800000</td>\n",
" <td>01/16/19 02:30</td>\n",
" <td>2 hours, 0 minutes</td>\n",
" <td>32</td>\n",
" </tr>\n",
" <tr>\n",
" <th>81</th>\n",
" <td>Women Who Code SF</td>\n",
" <td>1547001000000</td>\n",
" <td>01/09/19 02:30</td>\n",
" <td>2 hours, 0 minutes</td>\n",
" <td>57</td>\n",
" </tr>\n",
" <tr>\n",
" <th>76</th>\n",
" <td>Women Who Code SF</td>\n",
" <td>1545271200000</td>\n",
" <td>12/20/18 02:00</td>\n",
" <td>2 hours, 0 minutes</td>\n",
" <td>199</td>\n",
" </tr>\n",
" <tr>\n",
" <th>74</th>\n",
" <td>Women Who Code SF</td>\n",
" <td>1545186600000</td>\n",
" <td>12/19/18 02:30</td>\n",
" <td>1 hours, 30 minutes</td>\n",
" <td>50</td>\n",
" </tr>\n",
" <tr>\n",
" <th>75</th>\n",
" <td>Women Who Code SF</td>\n",
" <td>1545186600000</td>\n",
" <td>12/19/18 02:30</td>\n",
" <td>2 hours, 0 minutes</td>\n",
" <td>23</td>\n",
" </tr>\n",
" <tr>\n",
" <th>72</th>\n",
" <td>Women Who Code SF</td>\n",
" <td>1544581800000</td>\n",
" <td>12/12/18 02:30</td>\n",
" <td>2 hours, 0 minutes</td>\n",
" <td>22</td>\n",
" </tr>\n",
" <tr>\n",
" <th>70</th>\n",
" <td>Women Who Code SF</td>\n",
" <td>1544149800000</td>\n",
" <td>12/07/18 02:30</td>\n",
" <td>2 hours, 0 minutes</td>\n",
" <td>17</td>\n",
" </tr>\n",
" <tr>\n",
" <th>68</th>\n",
" <td>Women Who Code SF</td>\n",
" <td>1543977000000</td>\n",
" <td>12/05/18 02:30</td>\n",
" <td>2 hours, 0 minutes</td>\n",
" <td>29</td>\n",
" </tr>\n",
" <tr>\n",
" <th>67</th>\n",
" <td>Women Who Code SF</td>\n",
" <td>1543975200000</td>\n",
" <td>12/05/18 02:00</td>\n",
" <td>3 hours, 0 minutes</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>63</th>\n",
" <td>Women Who Code SF</td>\n",
" <td>1542767400000</td>\n",
" <td>11/21/18 02:30</td>\n",
" <td>2 hours, 0 minutes</td>\n",
" <td>31</td>\n",
" </tr>\n",
" <tr>\n",
" <th>59</th>\n",
" <td>Women Who Code SF</td>\n",
" <td>1542335400000</td>\n",
" <td>11/16/18 02:30</td>\n",
" <td>2 hours, 0 minutes</td>\n",
" <td>50</td>\n",
" </tr>\n",
" <tr>\n",
" <th>54</th>\n",
" <td>Women Who Code SF</td>\n",
" <td>1541557800000</td>\n",
" <td>11/07/18 02:30</td>\n",
" <td>2 hours, 0 minutes</td>\n",
" <td>27</td>\n",
" </tr>\n",
" <tr>\n",
" <th>50</th>\n",
" <td>Women Who Code SF</td>\n",
" <td>1541122200000</td>\n",
" <td>11/02/18 01:30</td>\n",
" <td>2 hours, 0 minutes</td>\n",
" <td>50</td>\n",
" </tr>\n",
" <tr>\n",
" <th>48</th>\n",
" <td>Women Who Code SF</td>\n",
" <td>1540949400000</td>\n",
" <td>10/31/18 01:30</td>\n",
" <td>2 hours, 0 minutes</td>\n",
" <td>29</td>\n",
" </tr>\n",
" <tr>\n",
" <th>44</th>\n",
" <td>Women Who Code SF</td>\n",
" <td>1540859400000</td>\n",
" <td>10/30/18 00:30</td>\n",
" <td>2 hours, 30 minutes</td>\n",
" <td>64</td>\n",
" </tr>\n",
" <tr>\n",
" <th>39</th>\n",
" <td>Women Who Code SF</td>\n",
" <td>1540400400000</td>\n",
" <td>10/24/18 17:00</td>\n",
" <td>1 hours, 0 minutes</td>\n",
" <td>2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>38</th>\n",
" <td>Women Who Code SF</td>\n",
" <td>1540344600000</td>\n",
" <td>10/24/18 01:30</td>\n",
" <td>2 hours, 0 minutes</td>\n",
" <td>36</td>\n",
" </tr>\n",
" <tr>\n",
" <th>36</th>\n",
" <td>Women Who Code SF</td>\n",
" <td>1539912600000</td>\n",
" <td>10/19/18 01:30</td>\n",
" <td>2 hours, 0 minutes</td>\n",
" <td>50</td>\n",
" </tr>\n",
" <tr>\n",
" <th>30</th>\n",
" <td>Women Who Code SF</td>\n",
" <td>1539135000000</td>\n",
" <td>10/10/18 01:30</td>\n",
" <td>2 hours, 0 minutes</td>\n",
" <td>25</td>\n",
" </tr>\n",
" <tr>\n",
" <th>28</th>\n",
" <td>Women Who Code SF</td>\n",
" <td>1538703000000</td>\n",
" <td>10/05/18 01:30</td>\n",
" <td>2 hours, 0 minutes</td>\n",
" <td>50</td>\n",
" </tr>\n",
" <tr>\n",
" <th>26</th>\n",
" <td>Women Who Code SF</td>\n",
" <td>1538530200000</td>\n",
" <td>10/03/18 01:30</td>\n",
" <td>2 hours, 0 minutes</td>\n",
" <td>10</td>\n",
" </tr>\n",
" <tr>\n",
" <th>27</th>\n",
" <td>Women Who Code SF</td>\n",
" <td>1538530200000</td>\n",
" <td>10/03/18 01:30</td>\n",
" <td>2 hours, 0 minutes</td>\n",
" <td>27</td>\n",
" </tr>\n",
" <tr>\n",
" <th>18</th>\n",
" <td>Women Who Code SF</td>\n",
" <td>1537493400000</td>\n",
" <td>09/21/18 01:30</td>\n",
" <td>2 hours, 0 minutes</td>\n",
" <td>33</td>\n",
" </tr>\n",
" <tr>\n",
" <th>16</th>\n",
" <td>Women Who Code SF</td>\n",
" <td>1537320600000</td>\n",
" <td>09/19/18 01:30</td>\n",
" <td>2 hours, 0 minutes</td>\n",
" <td>14</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14</th>\n",
" <td>Women Who Code SF</td>\n",
" <td>1537030800000</td>\n",
" <td>09/15/18 17:00</td>\n",
" <td>3 hours, 0 minutes</td>\n",
" <td>10</td>\n",
" </tr>\n",
" <tr>\n",
" <th>12</th>\n",
" <td>Women Who Code SF</td>\n",
" <td>1536715800000</td>\n",
" <td>09/12/18 01:30</td>\n",
" <td>2 hours, 0 minutes</td>\n",
" <td>25</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>Women Who Code SF</td>\n",
" <td>1535506200000</td>\n",
" <td>08/29/18 01:30</td>\n",
" <td>2 hours, 0 minutes</td>\n",
" <td>35</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Women Who Code SF</td>\n",
" <td>1535074200000</td>\n",
" <td>08/24/18 01:30</td>\n",
" <td>2 hours, 0 minutes</td>\n",
" <td>50</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>106 rows × 5 columns</p>\n",
"</div>"
],
"text/plain": [
" name time time_dt \\\n",
"60 Designers + Geeks 1542337200000 11/16/18 03:00 \n",
"51 Designers + Geeks 1541124000000 11/02/18 02:00 \n",
"37 Designers + Geeks 1539914400000 10/19/18 02:00 \n",
"29 Designers + Geeks 1538704800000 10/05/18 02:00 \n",
"19 Designers + Geeks 1537495200000 09/21/18 02:00 \n",
"9 Designers + Geeks 1536285600000 09/07/18 02:00 \n",
"45 Docker Online Meetup 1540915200000 10/30/18 16:00 \n",
"102 SF Big Analytics 1549591200000 02/08/19 02:00 \n",
"85 SF Big Analytics 1547690400000 01/17/19 02:00 \n",
"83 SF Big Analytics 1547604000000 01/16/19 02:00 \n",
"69 SF Big Analytics 1544061600000 12/06/18 02:00 \n",
"58 SF Big Analytics 1542247200000 11/15/18 02:00 \n",
"56 SF Big Analytics 1542160800000 11/14/18 02:00 \n",
"55 SF Big Analytics 1541782800000 11/09/18 17:00 \n",
"49 SF Big Analytics 1541120400000 11/02/18 01:00 \n",
"34 SF Big Analytics 1539909000000 10/19/18 00:30 \n",
"21 SF Big Analytics 1538010000000 09/27/18 01:00 \n",
"13 SF Big Analytics 1536800400000 09/13/18 01:00 \n",
"0 SF Big Analytics 1535072400000 08/24/18 01:00 \n",
"15 SF Data Mining 1537200000000 09/17/18 16:00 \n",
"98 SF Data Science 1548988200000 02/01/19 02:30 \n",
"94 SF Data Science 1548383400000 01/25/19 02:30 \n",
"87 SF Data Science 1547778600000 01/18/19 02:30 \n",
"86 SF Data Science 1547692200000 01/17/19 02:30 \n",
"65 SF Data Science 1543539600000 11/30/18 01:00 \n",
"57 SF Data Science 1542211200000 11/14/18 16:00 \n",
"43 SF Data Science 1540774800000 10/29/18 01:00 \n",
"35 SF Data Science 1539910800000 10/19/18 01:00 \n",
"31 SF Data Science 1539306000000 10/12/18 01:00 \n",
"104 SFHTML5 1550075400000 02/13/19 16:30 \n",
".. ... ... ... \n",
"97 Women Who Code SF 1548815400000 01/30/19 02:30 \n",
"91 Women Who Code SF 1548210600000 01/23/19 02:30 \n",
"84 Women Who Code SF 1547605800000 01/16/19 02:30 \n",
"81 Women Who Code SF 1547001000000 01/09/19 02:30 \n",
"76 Women Who Code SF 1545271200000 12/20/18 02:00 \n",
"74 Women Who Code SF 1545186600000 12/19/18 02:30 \n",
"75 Women Who Code SF 1545186600000 12/19/18 02:30 \n",
"72 Women Who Code SF 1544581800000 12/12/18 02:30 \n",
"70 Women Who Code SF 1544149800000 12/07/18 02:30 \n",
"68 Women Who Code SF 1543977000000 12/05/18 02:30 \n",
"67 Women Who Code SF 1543975200000 12/05/18 02:00 \n",
"63 Women Who Code SF 1542767400000 11/21/18 02:30 \n",
"59 Women Who Code SF 1542335400000 11/16/18 02:30 \n",
"54 Women Who Code SF 1541557800000 11/07/18 02:30 \n",
"50 Women Who Code SF 1541122200000 11/02/18 01:30 \n",
"48 Women Who Code SF 1540949400000 10/31/18 01:30 \n",
"44 Women Who Code SF 1540859400000 10/30/18 00:30 \n",
"39 Women Who Code SF 1540400400000 10/24/18 17:00 \n",
"38 Women Who Code SF 1540344600000 10/24/18 01:30 \n",
"36 Women Who Code SF 1539912600000 10/19/18 01:30 \n",
"30 Women Who Code SF 1539135000000 10/10/18 01:30 \n",
"28 Women Who Code SF 1538703000000 10/05/18 01:30 \n",
"26 Women Who Code SF 1538530200000 10/03/18 01:30 \n",
"27 Women Who Code SF 1538530200000 10/03/18 01:30 \n",
"18 Women Who Code SF 1537493400000 09/21/18 01:30 \n",
"16 Women Who Code SF 1537320600000 09/19/18 01:30 \n",
"14 Women Who Code SF 1537030800000 09/15/18 17:00 \n",
"12 Women Who Code SF 1536715800000 09/12/18 01:30 \n",
"6 Women Who Code SF 1535506200000 08/29/18 01:30 \n",
"1 Women Who Code SF 1535074200000 08/24/18 01:30 \n",
"\n",
" duration yes_rsvp_count \n",
"60 2 hours, 0 minutes 71 \n",
"51 2 hours, 0 minutes 71 \n",
"37 2 hours, 0 minutes 41 \n",
"29 2 hours, 0 minutes 29 \n",
"19 2 hours, 0 minutes 37 \n",
"9 2 hours, 0 minutes 87 \n",
"45 1 hours, 0 minutes 1 \n",
"102 2 hours, 30 minutes 418 \n",
"85 3 hours, 0 minutes 450 \n",
"83 3 hours, 0 minutes 5 \n",
"69 3 hours, 0 minutes 7 \n",
"58 3 hours, 0 minutes 16 \n",
"56 2 hours, 30 minutes 478 \n",
"55 57 hours, 0 minutes 16 \n",
"49 3 hours, 0 minutes 152 \n",
"34 2 hours, 30 minutes 390 \n",
"21 2 hours, 45 minutes 245 \n",
"13 2 hours, 30 minutes 218 \n",
"0 3 hours, 0 minutes 646 \n",
"15 104 hours, 0 minutes 6 \n",
"98 2 hours, 0 minutes 94 \n",
"94 2 hours, 0 minutes 48 \n",
"87 2 hours, 0 minutes 26 \n",
"86 2 hours, 0 minutes 59 \n",
"65 3 hours, 30 minutes 18 \n",
"57 10 hours, 0 minutes 1 \n",
"43 4 hours, 0 minutes 21 \n",
"35 2 hours, 0 minutes 95 \n",
"31 3 hours, 0 minutes 15 \n",
"104 12 hours, 30 minutes 39 \n",
".. ... ... \n",
"97 2 hours, 0 minutes 37 \n",
"91 2 hours, 0 minutes 29 \n",
"84 2 hours, 0 minutes 32 \n",
"81 2 hours, 0 minutes 57 \n",
"76 2 hours, 0 minutes 199 \n",
"74 1 hours, 30 minutes 50 \n",
"75 2 hours, 0 minutes 23 \n",
"72 2 hours, 0 minutes 22 \n",
"70 2 hours, 0 minutes 17 \n",
"68 2 hours, 0 minutes 29 \n",
"67 3 hours, 0 minutes 1 \n",
"63 2 hours, 0 minutes 31 \n",
"59 2 hours, 0 minutes 50 \n",
"54 2 hours, 0 minutes 27 \n",
"50 2 hours, 0 minutes 50 \n",
"48 2 hours, 0 minutes 29 \n",
"44 2 hours, 30 minutes 64 \n",
"39 1 hours, 0 minutes 2 \n",
"38 2 hours, 0 minutes 36 \n",
"36 2 hours, 0 minutes 50 \n",
"30 2 hours, 0 minutes 25 \n",
"28 2 hours, 0 minutes 50 \n",
"26 2 hours, 0 minutes 10 \n",
"27 2 hours, 0 minutes 27 \n",
"18 2 hours, 0 minutes 33 \n",
"16 2 hours, 0 minutes 14 \n",
"14 3 hours, 0 minutes 10 \n",
"12 2 hours, 0 minutes 25 \n",
"6 2 hours, 0 minutes 35 \n",
"1 2 hours, 0 minutes 50 \n",
"\n",
"[106 rows x 5 columns]"
]
},
"execution_count": 33,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"merged_df = pd.merge(\n",
" raw_results_df,\n",
" biggest_ten_df[['id', 'name']],\n",
" on='id',\n",
" how='left')\n",
"\n",
"columns = ['name', 'time', 'time_dt', 'duration', 'yes_rsvp_count']\n",
"final_df = merged_df[columns]\n",
"\n",
"# Sort the output by name and time\n",
"final_df.sort_values(by=['name', 'time'], ascending=[True, False])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Ah, this is interesting\n",
"\n",
"From the top 10 mega-groups I can see that **SF Data Science** meets pretty regularly, and that their attendance is pretty consistent. \n",
"\n",
"Let's take a closer look at just this group by using another dataframe query."
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>name</th>\n",
" <th>time</th>\n",
" <th>time_dt</th>\n",
" <th>duration</th>\n",
" <th>yes_rsvp_count</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>31</th>\n",
" <td>SF Data Science</td>\n",
" <td>1539306000000</td>\n",
" <td>10/12/18 01:00</td>\n",
" <td>3 hours, 0 minutes</td>\n",
" <td>15</td>\n",
" </tr>\n",
" <tr>\n",
" <th>35</th>\n",
" <td>SF Data Science</td>\n",
" <td>1539910800000</td>\n",
" <td>10/19/18 01:00</td>\n",
" <td>2 hours, 0 minutes</td>\n",
" <td>95</td>\n",
" </tr>\n",
" <tr>\n",
" <th>43</th>\n",
" <td>SF Data Science</td>\n",
" <td>1540774800000</td>\n",
" <td>10/29/18 01:00</td>\n",
" <td>4 hours, 0 minutes</td>\n",
" <td>21</td>\n",
" </tr>\n",
" <tr>\n",
" <th>57</th>\n",
" <td>SF Data Science</td>\n",
" <td>1542211200000</td>\n",
" <td>11/14/18 16:00</td>\n",
" <td>10 hours, 0 minutes</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>65</th>\n",
" <td>SF Data Science</td>\n",
" <td>1543539600000</td>\n",
" <td>11/30/18 01:00</td>\n",
" <td>3 hours, 30 minutes</td>\n",
" <td>18</td>\n",
" </tr>\n",
" <tr>\n",
" <th>86</th>\n",
" <td>SF Data Science</td>\n",
" <td>1547692200000</td>\n",
" <td>01/17/19 02:30</td>\n",
" <td>2 hours, 0 minutes</td>\n",
" <td>59</td>\n",
" </tr>\n",
" <tr>\n",
" <th>87</th>\n",
" <td>SF Data Science</td>\n",
" <td>1547778600000</td>\n",
" <td>01/18/19 02:30</td>\n",
" <td>2 hours, 0 minutes</td>\n",
" <td>26</td>\n",
" </tr>\n",
" <tr>\n",
" <th>94</th>\n",
" <td>SF Data Science</td>\n",
" <td>1548383400000</td>\n",
" <td>01/25/19 02:30</td>\n",
" <td>2 hours, 0 minutes</td>\n",
" <td>48</td>\n",
" </tr>\n",
" <tr>\n",
" <th>98</th>\n",
" <td>SF Data Science</td>\n",
" <td>1548988200000</td>\n",
" <td>02/01/19 02:30</td>\n",
" <td>2 hours, 0 minutes</td>\n",
" <td>94</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" name time time_dt duration \\\n",
"31 SF Data Science 1539306000000 10/12/18 01:00 3 hours, 0 minutes \n",
"35 SF Data Science 1539910800000 10/19/18 01:00 2 hours, 0 minutes \n",
"43 SF Data Science 1540774800000 10/29/18 01:00 4 hours, 0 minutes \n",
"57 SF Data Science 1542211200000 11/14/18 16:00 10 hours, 0 minutes \n",
"65 SF Data Science 1543539600000 11/30/18 01:00 3 hours, 30 minutes \n",
"86 SF Data Science 1547692200000 01/17/19 02:30 2 hours, 0 minutes \n",
"87 SF Data Science 1547778600000 01/18/19 02:30 2 hours, 0 minutes \n",
"94 SF Data Science 1548383400000 01/25/19 02:30 2 hours, 0 minutes \n",
"98 SF Data Science 1548988200000 02/01/19 02:30 2 hours, 0 minutes \n",
"\n",
" yes_rsvp_count \n",
"31 15 \n",
"35 95 \n",
"43 21 \n",
"57 1 \n",
"65 18 \n",
"86 59 \n",
"87 26 \n",
"94 48 \n",
"98 94 "
]
},
"execution_count": 34,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"interesting_group_df = final_df[final_df['name'] == \"SF Data Science\"]\n",
"interesting_group_df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### A simple bar chart shows us something\n",
"\n",
"I will use the **time_dt** column vs **yes_rsvp_count** to see if people\n",
"are looking forward to new events."
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 792x432 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"plt.rcParams['figure.figsize'] = [11, 6]\n",
"\n",
"lines = interesting_group_df[['yes_rsvp_count', 'time_dt']].plot(\n",
" kind='bar', x='time_dt', y='yes_rsvp_count', style='.-')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---\n",
"\n",
"This is nice, but what is the trend, is the group active? It's tough to tell\n",
"from a simple bar chart.\n",
"\n",
"### Let's visualize the trend line\n",
"\n",
"Let's see the trend using a seaborn regplot. \n",
"\n",
"I'll use the epoch milliseconds **time** column of our dataframe, since it is numeric and can be used to generate a trend line."
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.axes._subplots.AxesSubplot at 0x7fe27cd0cb38>"
]
},
"execution_count": 36,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 792x432 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"sb.regplot(x='time', y='yes_rsvp_count', data=interesting_group_df,\n",
" color=\"green\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Goal Achieved!\n",
"\n",
"At last! We found a Tech group in San Francisco, CA that:\n",
"1. is among the 10 biggest in the city\n",
"2. has a trend of more and more people RSVP-ing to their events\n",
"3. holds events occuring every few weeks.\n",
"\n",
"## Conclusion\n",
"\n",
"We achieved our objectives and demonstrated several useful techniques along the way we :\n",
"1. worked with the [Python Meetup API client]\n",
"2. built helper functions to deal with our response objects' **meta** and **results** dictionaries\n",
"3. loaded pages of **results** from multiple API calls into Pandas dataframes\n",
"4. Used pandas.DataFrame.[query] to sort and filter data of interest\n",
"5. Used pandas.DataFrame.[apply] to clean columns of data using custom helper functions\n",
"6. Used pandas.DataFrame.[describe] to get descriptive statistics that summarize\n",
" * the central tendency\n",
" * dispersion\n",
" * shape of our dataset's distribution\n",
"7. Used pandas.DataFrame.[merge] to join the **events** and **groups** dataframes to create a report of the events for our 10 biggest mega-groups in technology\n",
"8. Found a mega-group that shows enduring and increasing interest by its membership\n",
"\n",
"\n",
"[//]: # (References)\n",
"\n",
"[Python Meetup API client]: https://meetup-api.readthedocs.io/en/latest/meetup_api.html#api-client-details\n",
"[query]: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.query.html\n",
"[apply]: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.apply.html\n",
"[describe]: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.describe.html\n",
"[merge]: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python (datascience_challenges)",
"language": "python",
"name": "datascience_challenges"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.2"
},
"toc": {
"base_numbering": 1,
"nav_menu": {},
"number_sections": false,
"sideBar": true,
"skip_h1_title": false,
"title_cell": "Table of Contents",
"title_sidebar": "Contents",
"toc_cell": true,
"toc_position": {},
"toc_section_display": true,
"toc_window_display": false
}
},
"nbformat": 4,
"nbformat_minor": 2
}