@redpanda-ai
Created February 21, 2019 12:13
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Introduction\n",
"\n",
"My friend [Johannes Giorgis] and I are developing a series of [Data Science Challenges] to help others\n",
"become better data scientists. Why did we do this?\n",
"\n",
"\n",
"> Because that's what heroes do!\n",
">\n",
"> --Johannes Giorgis\n",
"\n",
"I now present my response to the first challenge, Exploring the Meetup API in the city of my choice.\n",
"\n",
"**San Francisco, CA**, I choose you!\n",
"\n",
"\n",
"\n",
"# Challenge 01: Explore the Meetup API\n",
"Use the [Meetup API] to explore meetups in your city of choice.\n",
"\n",
"\n",
"**Guide Questions**:\n",
"\n",
"Below are some guideline questions to get you started:\n",
"\n",
"- What is the largest meetup in your location of choice (city, cities, country...etc)?\n",
"- How many meetups of a certain category (e.g. Tech, Art...etc) are in your city?\n",
"- Basic statistics of meetups\n",
"\t- What is the average size of meetups?\n",
"\t- How frequently do meetups host events?\n",
" \n",
"## Okay, but what I really want to know is\n",
"\n",
"What is the biggest Tech Group in San Francisco that meets regularly and has a growing and enthusiastic membership?\n",
"\n",
"\n",
"## Prerequisites:\n",
"Add a [Meetup API Key] to your environment.\n",
"\n",
"[//]: # (References)\n",
"\n",
"[Meetup API]: https://www.meetup.com/meetup_api/\n",
"[Meetup API Key]: https://secure.meetup.com/meetup_api/key/\n",
"[Johannes Giorgis]: http://johannesgiorgis.com/\n",
"[Data Science Challenges]: https://medium.com/red-panda-ai/introducing-data-science-challenges-4ae4a103d67b"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import datetime\n",
"import json\n",
"import math\n",
"import meetup.api\n",
"import os\n",
"import pprint\n",
"import requests\n",
"\n",
"import matplotlib.pyplot as plt\n",
"import pandas as pd\n",
"import numpy as np\n",
"import seaborn as sb\n",
"\n",
"from tqdm import tnrange, tqdm_notebook\n",
"\n",
"%matplotlib inline"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Double check your environment\n",
"\n",
"Nothing works without **MEETUP_API_KEY**."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"assert 'MEETUP_API_KEY' in os.environ, (\n",
"    \"You need a MEETUP_API_KEY in your environment. Please look at the \"\n",
"    \"README for instructions.\")\n",
"client = meetup.api.Client()"
]
},
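{
"cell_type": "markdown",
"metadata": {},
"source": [
"An `assert` is skipped when Python runs with the `-O` flag, so here is a minimal alternative sketch that raises explicitly. The helper name `require_env` and the `DEMO_API_KEY` variable are hypothetical, made up for illustration.\n",
"\n",
"```python\n",
"import os\n",
"\n",
"\n",
"def require_env(name):\n",
"    # Return the variable's value or fail with a clear message.\n",
"    value = os.environ.get(name)\n",
"    if not value:\n",
"        raise RuntimeError(f'Set {name} in your environment first.')\n",
"    return value\n",
"\n",
"\n",
"# Demonstration with a throwaway variable so the sketch runs anywhere.\n",
"os.environ['DEMO_API_KEY'] = 'not-a-real-key'\n",
"print(require_env('DEMO_API_KEY'))\n",
"```"
]
},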
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Categories\n",
"\n",
"Meetup groups fall into multiple categories. Let's use Python's meetup.api to [GetCategories](https://meetup-api.readthedocs.io/en/latest/meetup_api.html#meetup.api.meetup.api.Client.GetCategories)."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"%%capture --no-display\n",
"categories = client.GetCategories()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### What are the attributes of the response object? \n",
"\n",
"First, let's create a helper function to parse out the two most useful\n",
"pieces:\n",
"\n",
"1. **meta**: an object containing metadata about the response object itself\n",
"2. **results**: a page of actual data from the entire result of our API call"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"def parse_response(response):\n",
"    \"\"\"Returns two dataframes, meta and results:\n",
"    meta: a vertically aligned dataframe, where each row is an element\n",
"        of the response.meta dictionary\n",
"    results: a horizontally aligned dataframe, where each column is\n",
"        an element of the response.results dictionary\"\"\"\n",
"    meta = pd.DataFrame.from_dict(response.meta, orient='index')\n",
"    results = pd.DataFrame.from_dict(response.results)\n",
"    return meta, results\n"
]
},
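{
"cell_type": "markdown",
"metadata": {},
"source": [
"To see what `parse_response` returns without calling the API, here is a minimal sketch that feeds it a stand-in object. The `SimpleNamespace` response and its sample values are assumptions made up for illustration; the real object comes from `meetup.api`.\n",
"\n",
"```python\n",
"from types import SimpleNamespace\n",
"\n",
"import pandas as pd\n",
"\n",
"\n",
"def parse_response(response):\n",
"    # Same logic as the helper above, repeated so the sketch runs alone.\n",
"    meta = pd.DataFrame.from_dict(response.meta, orient='index')\n",
"    results = pd.DataFrame.from_dict(response.results)\n",
"    return meta, results\n",
"\n",
"\n",
"# Hypothetical stand-in for a Meetup API response object.\n",
"fake_response = SimpleNamespace(\n",
"    meta={'next': '', 'count': 2, 'total_count': 2},\n",
"    results=[\n",
"        {'id': 1, 'name': 'Arts & Culture'},\n",
"        {'id': 34, 'name': 'Tech'},\n",
"    ],\n",
")\n",
"\n",
"meta_df, results_df = parse_response(fake_response)\n",
"print(meta_df.shape)     # one row per meta key, one column named 0\n",
"print(results_df.shape)  # one row per result, one column per field\n",
"```"
]
},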
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Exploring the response object\n",
"\n",
"We received a response object when we called ```client.GetCategories()```.\n",
"\n",
"By looking at the categories **meta** dataframe, we can see that there are 33 different categories."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>0</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>next</th>\n",
" <td></td>\n",
" </tr>\n",
" <tr>\n",
" <th>method</th>\n",
" <td>Categories</td>\n",
" </tr>\n",
" <tr>\n",
" <th>total_count</th>\n",
" <td>33</td>\n",
" </tr>\n",
" <tr>\n",
" <th>link</th>\n",
" <td>https://api.meetup.com/2/categories</td>\n",
" </tr>\n",
" <tr>\n",
" <th>count</th>\n",
" <td>33</td>\n",
" </tr>\n",
" <tr>\n",
" <th>description</th>\n",
" <td>Returns a list of Meetup group categories</td>\n",
" </tr>\n",
" <tr>\n",
" <th>lon</th>\n",
" <td>None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>title</th>\n",
" <td>Categories</td>\n",
" </tr>\n",
" <tr>\n",
" <th>url</th>\n",
" <td>https://api.meetup.com/2/categories?offset=0&amp;f...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>id</th>\n",
" <td></td>\n",
" </tr>\n",
" <tr>\n",
" <th>updated</th>\n",
" <td>1450292956000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>lat</th>\n",
" <td>None</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" 0\n",
"next \n",
"method Categories\n",
"total_count 33\n",
"link https://api.meetup.com/2/categories\n",
"count 33\n",
"description Returns a list of Meetup group categories\n",
"lon None\n",
"title Categories\n",
"url https://api.meetup.com/2/categories?offset=0&f...\n",
"id \n",
"updated 1450292956000\n",
"lat None"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cats_meta_df, cats_df = parse_response(categories)\n",
"cats_meta_df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"With 33 categories available to us, I wonder what they are.\n",
"\n",
"### How the Meetup API works\n",
"\n",
"Notice that the value of **next** (above) is an empty string. Meetup API v2 response payloads come in **pages**, one at a time, and each page provides the URI of the **next** API call in the sequence in **response.meta\\[\"next\"\\]**. We can use this to programmatically fetch each **page** until the complete result is returned.\n",
"\n",
"As we can see, the **response.meta\\[\"next\"\\]** for this page is an empty string, so all of the categories\n",
"fit into our first API call.\n",
"\n",
"#### Secondly, let's review the categories results dataframe"
]
},
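{
"cell_type": "markdown",
"metadata": {},
"source": [
"The paging idea can be sketched without touching the network. This is a minimal illustration, assuming a made-up `FAKE_PAGES` table and a hypothetical `fetch_page` function in place of a real API call; only the empty-string `next` convention comes from the response shown above.\n",
"\n",
"```python\n",
"# Hypothetical pages keyed by offset; a real client would call the API.\n",
"FAKE_PAGES = {\n",
"    0: {'meta': {'next': 'offset=1'}, 'results': ['a', 'b']},\n",
"    1: {'meta': {'next': 'offset=2'}, 'results': ['c', 'd']},\n",
"    2: {'meta': {'next': ''}, 'results': ['e']},\n",
"}\n",
"\n",
"\n",
"def fetch_page(offset):\n",
"    # Stand-in for one API call.\n",
"    return FAKE_PAGES[offset]\n",
"\n",
"\n",
"def fetch_all():\n",
"    results = []\n",
"    offset = 0\n",
"    while True:\n",
"        page = fetch_page(offset)\n",
"        results.extend(page['results'])\n",
"        # An empty 'next' marks the last page.\n",
"        if page['meta']['next'] == '':\n",
"            break\n",
"        offset += 1\n",
"    return results\n",
"\n",
"\n",
"all_results = fetch_all()\n",
"print(all_results)\n",
"```"
]
},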
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>id</th>\n",
" <th>name</th>\n",
" <th>shortname</th>\n",
" <th>sort_name</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1</td>\n",
" <td>Arts &amp; Culture</td>\n",
" <td>Arts</td>\n",
" <td>Arts &amp; Culture</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>18</td>\n",
" <td>Book Clubs</td>\n",
" <td>Book Clubs</td>\n",
" <td>Book Clubs</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>2</td>\n",
" <td>Career &amp; Business</td>\n",
" <td>Business</td>\n",
" <td>Career &amp; Business</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>3</td>\n",
" <td>Cars &amp; Motorcycles</td>\n",
" <td>Auto</td>\n",
" <td>Cars &amp; Motorcycles</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>4</td>\n",
" <td>Community &amp; Environment</td>\n",
" <td>Community</td>\n",
" <td>Community &amp; Environment</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>5</td>\n",
" <td>Dancing</td>\n",
" <td>Dancing</td>\n",
" <td>Dancing</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>6</td>\n",
" <td>Education &amp; Learning</td>\n",
" <td>Education</td>\n",
" <td>Education &amp; Learning</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>8</td>\n",
" <td>Fashion &amp; Beauty</td>\n",
" <td>Fashion</td>\n",
" <td>Fashion &amp; Beauty</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>9</td>\n",
" <td>Fitness</td>\n",
" <td>Fitness</td>\n",
" <td>Fitness</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>10</td>\n",
" <td>Food &amp; Drink</td>\n",
" <td>Food &amp; Drink</td>\n",
" <td>Food &amp; Drink</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10</th>\n",
" <td>11</td>\n",
" <td>Games</td>\n",
" <td>Games</td>\n",
" <td>Games</td>\n",
" </tr>\n",
" <tr>\n",
" <th>11</th>\n",
" <td>13</td>\n",
" <td>Movements &amp; Politics</td>\n",
" <td>Movements</td>\n",
" <td>Movements &amp; Politics</td>\n",
" </tr>\n",
" <tr>\n",
" <th>12</th>\n",
" <td>14</td>\n",
" <td>Health &amp; Wellbeing</td>\n",
" <td>Well-being</td>\n",
" <td>Health &amp; Wellbeing</td>\n",
" </tr>\n",
" <tr>\n",
" <th>13</th>\n",
" <td>15</td>\n",
" <td>Hobbies &amp; Crafts</td>\n",
" <td>Crafts</td>\n",
" <td>Hobbies &amp; Crafts</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14</th>\n",
" <td>16</td>\n",
" <td>Language &amp; Ethnic Identity</td>\n",
" <td>Languages</td>\n",
" <td>Language &amp; Ethnic Identity</td>\n",
" </tr>\n",
" <tr>\n",
" <th>15</th>\n",
" <td>12</td>\n",
" <td>LGBT</td>\n",
" <td>LGBT</td>\n",
" <td>LGBT</td>\n",
" </tr>\n",
" <tr>\n",
" <th>16</th>\n",
" <td>17</td>\n",
" <td>Lifestyle</td>\n",
" <td>Lifestyle</td>\n",
" <td>Lifestyle</td>\n",
" </tr>\n",
" <tr>\n",
" <th>17</th>\n",
" <td>20</td>\n",
" <td>Movies &amp; Film</td>\n",
" <td>Films</td>\n",
" <td>Movies &amp; Film</td>\n",
" </tr>\n",
" <tr>\n",
" <th>18</th>\n",
" <td>21</td>\n",
" <td>Music</td>\n",
" <td>Music</td>\n",
" <td>Music</td>\n",
" </tr>\n",
" <tr>\n",
" <th>19</th>\n",
" <td>22</td>\n",
" <td>New Age &amp; Spirituality</td>\n",
" <td>Spirituality</td>\n",
" <td>New Age &amp; Spirituality</td>\n",
" </tr>\n",
" <tr>\n",
" <th>20</th>\n",
" <td>23</td>\n",
" <td>Outdoors &amp; Adventure</td>\n",
" <td>Outdoors</td>\n",
" <td>Outdoors &amp; Adventure</td>\n",
" </tr>\n",
" <tr>\n",
" <th>21</th>\n",
" <td>24</td>\n",
" <td>Paranormal</td>\n",
" <td>Paranormal</td>\n",
" <td>Paranormal</td>\n",
" </tr>\n",
" <tr>\n",
" <th>22</th>\n",
" <td>25</td>\n",
" <td>Parents &amp; Family</td>\n",
" <td>Moms &amp; Dads</td>\n",
" <td>Parents &amp; Family</td>\n",
" </tr>\n",
" <tr>\n",
" <th>23</th>\n",
" <td>26</td>\n",
" <td>Pets &amp; Animals</td>\n",
" <td>Pets</td>\n",
" <td>Pets &amp; Animals</td>\n",
" </tr>\n",
" <tr>\n",
" <th>24</th>\n",
" <td>27</td>\n",
" <td>Photography</td>\n",
" <td>Photography</td>\n",
" <td>Photography</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25</th>\n",
" <td>28</td>\n",
" <td>Religion &amp; Beliefs</td>\n",
" <td>Beliefs</td>\n",
" <td>Religion &amp; Beliefs</td>\n",
" </tr>\n",
" <tr>\n",
" <th>26</th>\n",
" <td>29</td>\n",
" <td>Sci-Fi &amp; Fantasy</td>\n",
" <td>Sci fi</td>\n",
" <td>Sci-Fi &amp; Fantasy</td>\n",
" </tr>\n",
" <tr>\n",
" <th>27</th>\n",
" <td>30</td>\n",
" <td>Singles</td>\n",
" <td>Singles</td>\n",
" <td>Singles</td>\n",
" </tr>\n",
" <tr>\n",
" <th>28</th>\n",
" <td>31</td>\n",
" <td>Socializing</td>\n",
" <td>Social</td>\n",
" <td>Socializing</td>\n",
" </tr>\n",
" <tr>\n",
" <th>29</th>\n",
" <td>32</td>\n",
" <td>Sports &amp; Recreation</td>\n",
" <td>Sports</td>\n",
" <td>Sports &amp; Recreation</td>\n",
" </tr>\n",
" <tr>\n",
" <th>30</th>\n",
" <td>33</td>\n",
" <td>Support</td>\n",
" <td>Support</td>\n",
" <td>Support</td>\n",
" </tr>\n",
" <tr>\n",
" <th>31</th>\n",
" <td>34</td>\n",
" <td>Tech</td>\n",
" <td>Tech</td>\n",
" <td>Tech</td>\n",
" </tr>\n",
" <tr>\n",
" <th>32</th>\n",
" <td>36</td>\n",
" <td>Writing</td>\n",
" <td>Writing</td>\n",
" <td>Writing</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" id name shortname sort_name\n",
"0 1 Arts & Culture Arts Arts & Culture\n",
"1 18 Book Clubs Book Clubs Book Clubs\n",
"2 2 Career & Business Business Career & Business\n",
"3 3 Cars & Motorcycles Auto Cars & Motorcycles\n",
"4 4 Community & Environment Community Community & Environment\n",
"5 5 Dancing Dancing Dancing\n",
"6 6 Education & Learning Education Education & Learning\n",
"7 8 Fashion & Beauty Fashion Fashion & Beauty\n",
"8 9 Fitness Fitness Fitness\n",
"9 10 Food & Drink Food & Drink Food & Drink\n",
"10 11 Games Games Games\n",
"11 13 Movements & Politics Movements Movements & Politics\n",
"12 14 Health & Wellbeing Well-being Health & Wellbeing\n",
"13 15 Hobbies & Crafts Crafts Hobbies & Crafts\n",
"14 16 Language & Ethnic Identity Languages Language & Ethnic Identity\n",
"15 12 LGBT LGBT LGBT\n",
"16 17 Lifestyle Lifestyle Lifestyle\n",
"17 20 Movies & Film Films Movies & Film\n",
"18 21 Music Music Music\n",
"19 22 New Age & Spirituality Spirituality New Age & Spirituality\n",
"20 23 Outdoors & Adventure Outdoors Outdoors & Adventure\n",
"21 24 Paranormal Paranormal Paranormal\n",
"22 25 Parents & Family Moms & Dads Parents & Family\n",
"23 26 Pets & Animals Pets Pets & Animals\n",
"24 27 Photography Photography Photography\n",
"25 28 Religion & Beliefs Beliefs Religion & Beliefs\n",
"26 29 Sci-Fi & Fantasy Sci fi Sci-Fi & Fantasy\n",
"27 30 Singles Singles Singles\n",
"28 31 Socializing Social Socializing\n",
"29 32 Sports & Recreation Sports Sports & Recreation\n",
"30 33 Support Support Support\n",
"31 34 Tech Tech Tech\n",
"32 36 Writing Writing Writing"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cats_df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### So, if we want to work with a particular category\n",
"In this case, I want **Tech**. Let's query the dataframe for categories named **Tech**."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>id</th>\n",
" <th>name</th>\n",
" <th>shortname</th>\n",
" <th>sort_name</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>31</th>\n",
" <td>34</td>\n",
" <td>Tech</td>\n",
" <td>Tech</td>\n",
" <td>Tech</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" id name shortname sort_name\n",
"31 34 Tech Tech Tech"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"tech_df = cats_df.loc[cats_df['name'] == 'Tech']\n",
"tech_df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Let's store the category ID number for later use"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"34"
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"tech_category_id = tech_df['id'].values[0]\n",
"tech_category_id"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Explore Cities\n",
"### Now let's look at cities in the United States named San Francisco"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [],
"source": [
"%%capture --no-display\n",
"cities_resp = client.GetCities(country='us', query='San Francisco')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here we used the [GetCities] method of the [Python Meetup API client].\n",
"\n",
"I used a query for cities in the **United States** called **San Francisco**.\n",
"\n",
"[//]: # (References)\n",
"\n",
"[GetCities]: https://meetup-api.readthedocs.io/en/latest/meetup_api.html#meetup.api.meetup.api.Client.GetCities\n",
"[Python Meetup API client]: https://meetup-api.readthedocs.io/en/latest/meetup_api.html#api-client-details\n",
"\n",
"Now let's take a look at the **meta** for our results."
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>0</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>next</th>\n",
" <td></td>\n",
" </tr>\n",
" <tr>\n",
" <th>method</th>\n",
" <td>Cities</td>\n",
" </tr>\n",
" <tr>\n",
" <th>total_count</th>\n",
" <td>4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>link</th>\n",
" <td>https://api.meetup.com/2/cities</td>\n",
" </tr>\n",
" <tr>\n",
" <th>count</th>\n",
" <td>4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>description</th>\n",
" <td>Returns Meetup cities. This method supports se...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>lon</th>\n",
" <td>None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>title</th>\n",
" <td>Cities</td>\n",
" </tr>\n",
" <tr>\n",
" <th>url</th>\n",
" <td>https://api.meetup.com/2/cities?country=us&amp;off...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>id</th>\n",
" <td></td>\n",
" </tr>\n",
" <tr>\n",
" <th>updated</th>\n",
" <td>1263132740000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>lat</th>\n",
" <td>None</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" 0\n",
"next \n",
"method Cities\n",
"total_count 4\n",
"link https://api.meetup.com/2/cities\n",
"count 4\n",
"description Returns Meetup cities. This method supports se...\n",
"lon None\n",
"title Cities\n",
"url https://api.meetup.com/2/cities?country=us&off...\n",
"id \n",
"updated 1263132740000\n",
"lat None"
]
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cities_meta_df, cities_df = parse_response(cities_resp)\n",
"cities_meta_df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Hmm, a count of 4 cities is suspicious...\n",
"\n",
"I only know of one San Francisco. Why are there 4 cities?"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>city</th>\n",
" <th>country</th>\n",
" <th>id</th>\n",
" <th>lat</th>\n",
" <th>localized_country_name</th>\n",
" <th>lon</th>\n",
" <th>member_count</th>\n",
" <th>name_string</th>\n",
" <th>ranking</th>\n",
" <th>state</th>\n",
" <th>zip</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>San Francisco</td>\n",
" <td>us</td>\n",
" <td>94101</td>\n",
" <td>37.779999</td>\n",
" <td>USA</td>\n",
" <td>-122.419998</td>\n",
" <td>60351</td>\n",
" <td>San Francisco, California, USA</td>\n",
" <td>0</td>\n",
" <td>CA</td>\n",
" <td>94101</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Bosque</td>\n",
" <td>us</td>\n",
" <td>87006</td>\n",
" <td>34.560001</td>\n",
" <td>USA</td>\n",
" <td>-106.779999</td>\n",
" <td>5</td>\n",
" <td>San Francisco, New Mexico, USA</td>\n",
" <td>1</td>\n",
" <td>NM</td>\n",
" <td>87006</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>San Luis</td>\n",
" <td>us</td>\n",
" <td>81152</td>\n",
" <td>37.080002</td>\n",
" <td>USA</td>\n",
" <td>-105.620003</td>\n",
" <td>4</td>\n",
" <td>San Francisco, Colorado, USA</td>\n",
" <td>2</td>\n",
" <td>CO</td>\n",
" <td>81152</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Reserve</td>\n",
" <td>us</td>\n",
" <td>87830</td>\n",
" <td>33.650002</td>\n",
" <td>USA</td>\n",
" <td>-108.769997</td>\n",
" <td>1</td>\n",
" <td>San Francisco Plaza, New Mexico, USA</td>\n",
" <td>3</td>\n",
" <td>NM</td>\n",
" <td>87830</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" city country id lat localized_country_name lon \\\n",
"0 San Francisco us 94101 37.779999 USA -122.419998 \n",
"1 Bosque us 87006 34.560001 USA -106.779999 \n",
"2 San Luis us 81152 37.080002 USA -105.620003 \n",
"3 Reserve us 87830 33.650002 USA -108.769997 \n",
"\n",
" member_count name_string ranking state zip \n",
"0 60351 San Francisco, California, USA 0 CA 94101 \n",
"1 5 San Francisco, New Mexico, USA 1 NM 87006 \n",
"2 4 San Francisco, Colorado, USA 2 CO 81152 \n",
"3 1 San Francisco Plaza, New Mexico, USA 3 NM 87830 "
]
},
"execution_count": 24,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cities_df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Oh, there are lots of San Franciscos!\n",
"\n",
"### Let's filter the dataframe with a query to give us only cities in California, US"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>city</th>\n",
" <th>country</th>\n",
" <th>id</th>\n",
" <th>lat</th>\n",
" <th>localized_country_name</th>\n",
" <th>lon</th>\n",
" <th>member_count</th>\n",
" <th>name_string</th>\n",
" <th>ranking</th>\n",
" <th>state</th>\n",
" <th>zip</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>San Francisco</td>\n",
" <td>us</td>\n",
" <td>94101</td>\n",
" <td>37.779999</td>\n",
" <td>USA</td>\n",
" <td>-122.419998</td>\n",
" <td>60351</td>\n",
" <td>San Francisco, California, USA</td>\n",
" <td>0</td>\n",
" <td>CA</td>\n",
" <td>94101</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" city country id lat localized_country_name lon \\\n",
"0 San Francisco us 94101 37.779999 USA -122.419998 \n",
"\n",
" member_count name_string ranking state zip \n",
"0 60351 San Francisco, California, USA 0 CA 94101 "
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"single_city_df = cities_df.loc[\n",
" (cities_df['state'] == 'CA')]\n",
"\n",
"single_city_df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"One San Francisco, perfect! \n",
"\n",
"### Let's store the latitude and longitude for later use as well"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(37.779998779296875, -122.41999816894531)"
]
},
"execution_count": 28,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Use positional indexing so this works whatever label the row carries\n",
"latitude = single_city_df['lat'].iloc[0]\n",
"longitude = single_city_df['lon'].iloc[0]\n",
"latitude, longitude"
]
},
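{
"cell_type": "markdown",
"metadata": {},
"source": [
"A filtered dataframe keeps the row labels of the original, so selecting by position is the safer way to pull out a single value. The rows below are made up for illustration and deliberately place the California row second.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Made-up rows shaped like the cities dataframe above.\n",
"cities = pd.DataFrame({\n",
"    'state': ['NM', 'CA'],\n",
"    'lat': [34.56, 37.78],\n",
"    'lon': [-106.78, -122.42],\n",
"})\n",
"\n",
"ca = cities.loc[cities['state'] == 'CA']\n",
"\n",
"# The surviving row keeps its original label, 1, so a lookup by label 0\n",
"# would raise a KeyError here. iloc selects by position instead.\n",
"lat = ca['lat'].iloc[0]\n",
"lon = ca['lon'].iloc[0]\n",
"print(lat, lon)\n",
"```"
]
},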
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Now let's look at groups in San Francisco, CA"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Since we are going to grab lots of groups, let's make a function to help us call the API\n",
"\n",
"Note that this function uses the tech_category_id, latitude, and longitude values that we\n",
"found earlier."
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [],
"source": [
"def get_a_group(page_number, category_id=tech_category_id, lat=latitude,\n",
"                lon=longitude):\n",
"    \"\"\"Fetch one page of groups, retrying up to three times.\"\"\"\n",
"    retry_counter, retry_max = 0, 3\n",
"    print(f\"Getting page {page_number}\")\n",
"    while retry_counter < retry_max:\n",
"        try:\n",
"            return client.GetGroups(\n",
"                category_id=category_id, lat=lat, lon=lon, offset=page_number)\n",
"        except Exception:\n",
"            # Avoid a bare except so KeyboardInterrupt can still stop the loop\n",
"            print(f\"Fetch failure {retry_counter + 1}\")\n",
"            retry_counter += 1\n",
"\n",
"    raise Exception(f\"Unable to fetch page after {retry_counter} attempts\")"
]
},
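{
"cell_type": "markdown",
"metadata": {},
"source": [
"The retry loop in `get_a_group` can be generalized. Below is a minimal sketch, assuming a hypothetical helper named `call_with_retries` and a simulated flaky function; neither is part of `meetup.api`.\n",
"\n",
"```python\n",
"import time\n",
"\n",
"\n",
"def call_with_retries(func, retry_max=3, delay_seconds=0.0):\n",
"    # Call func() up to retry_max times, chaining the last error on failure.\n",
"    last_error = None\n",
"    for attempt in range(1, retry_max + 1):\n",
"        try:\n",
"            return func()\n",
"        except Exception as error:  # a real client might catch narrower types\n",
"            last_error = error\n",
"            print(f'Fetch failure {attempt}')\n",
"            time.sleep(delay_seconds)\n",
"    raise RuntimeError(\n",
"        f'Unable to fetch page after {retry_max} attempts') from last_error\n",
"\n",
"\n",
"# Hypothetical flaky call: fails twice, then succeeds.\n",
"attempts = {'count': 0}\n",
"\n",
"\n",
"def flaky_fetch():\n",
"    attempts['count'] += 1\n",
"    if attempts['count'] < 3:\n",
"        raise ConnectionError('simulated network error')\n",
"    return 'page data'\n",
"\n",
"\n",
"result = call_with_retries(flaky_fetch)\n",
"print(result)\n",
"```\n",
"\n",
"Passing a small `delay_seconds` gives a rate-limited API time to recover between attempts."
]
},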
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Now, after grabbing the first page of groups\n",
"\n",
"Let's review the **meta** to help us see what we are getting into"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>0</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>next</th>\n",
" <td>https://api.meetup.com/2/groups?offset=1&amp;forma...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>method</th>\n",
" <td>Groups</td>\n",
" </tr>\n",
" <tr>\n",
" <th>total_count</th>\n",
" <td>2189</td>\n",
" </tr>\n",
" <tr>\n",
" <th>link</th>\n",
" <td>https://api.meetup.com/2/groups</td>\n",
" </tr>\n",
" <tr>\n",
" <th>count</th>\n",
" <td>200</td>\n",
" </tr>\n",
" <tr>\n",
" <th>description</th>\n",
" <td>None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>lon</th>\n",
" <td>-122.42</td>\n",
" </tr>\n",
" <tr>\n",
" <th>title</th>\n",
" <td>Meetup Groups v2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>url</th>\n",
" <td>https://api.meetup.com/2/groups?offset=0&amp;forma...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>id</th>\n",
" <td></td>\n",
" </tr>\n",
" <tr>\n",
" <th>updated</th>\n",
" <td>1550723256000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>lat</th>\n",
" <td>37.78</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" 0\n",
"next https://api.meetup.com/2/groups?offset=1&forma...\n",
"method Groups\n",
"total_count 2189\n",
"link https://api.meetup.com/2/groups\n",
"count 200\n",
"description None\n",
"lon -122.42\n",
"title Meetup Groups v2\n",
"url https://api.meetup.com/2/groups?offset=0&forma...\n",
"id \n",
"updated 1550723256000\n",
"lat 37.78"
]
},
"execution_count": 30,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%%capture --no-display\n",
"group_resp = get_a_group(0)\n",
"group_meta, _ = parse_response(group_resp)\n",
"group_meta"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Wait! There's a meta[\\\"next\\\"].\n",
"\n",
"Remember earlier when I spoke about **response.meta\\[\"next\"\\]**? \n",
"\n",
"It seems as though our result will span multiple API calls, each returning up to 200 new groups in\n",
"a **page**.\n",
"\n",
"Let's make a new helper that will grab each **page** in a series of API calls until we obtain the entire data set.\n",
"\n",
"We will use the pandas [concat] function to collate all pages into a single useful dataframe.\n",
"\n",
"[//]: # (References)\n",
"\n",
"[concat]: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.concat.html"
]
},
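{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick sanity check, the number of pages to expect follows from the `total_count` and `count` values in the meta above. This is a small worked calculation, assuming every page except the last is full.\n",
"\n",
"```python\n",
"import math\n",
"\n",
"total_count = 2189  # total_count from the meta above\n",
"page_size = 200     # count from the meta above\n",
"\n",
"expected_pages = math.ceil(total_count / page_size)\n",
"last_page_size = total_count - (expected_pages - 1) * page_size\n",
"print(expected_pages, last_page_size)\n",
"```"
]
},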
{
"cell_type": "code",
"execution_count": 31,
"metadata": {},
"outputs": [],
"source": [
"def get_all_groups_as_a_df():\n",
"    \"\"\"Returns a single dataframe composed from data from multiple\n",
"    successive calls to get_a_group.\n",
"\n",
"    We will loop through get_a_group pages while our page.meta['next'] is\n",
"    not the empty string.\n",
"    \"\"\"\n",
"    page_df_list = []\n",
"    next_page = None\n",
"    page_number = 0\n",
"    while next_page != '':\n",
"        page = get_a_group(page_number)\n",
"        next_page = page.meta[\"next\"]\n",
"        _, frame = parse_response(page)\n",
"        page_number += 1\n",
"        page_df_list.append(frame)\n",
"\n",
"    return pd.concat(page_df_list, ignore_index=True)\n",
"\n"
]
},
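{
"cell_type": "markdown",
"metadata": {},
"source": [
"Why pass `ignore_index=True` to `pd.concat`? Each page arrives with its own 0-based index, so without it the row labels would repeat. A minimal sketch with two made-up pages:\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Two made-up pages, each with its own 0-based index.\n",
"page_0 = pd.DataFrame({'id': [1, 2], 'name': ['a', 'b']})\n",
"page_1 = pd.DataFrame({'id': [3], 'name': ['c']})\n",
"\n",
"kept = pd.concat([page_0, page_1])\n",
"renumbered = pd.concat([page_0, page_1], ignore_index=True)\n",
"\n",
"print(list(kept.index))        # labels repeat across pages\n",
"print(list(renumbered.index))  # one continuous range\n",
"```"
]
},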
{
"cell_type": "code",
"execution_count": 32,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Getting page 0\n",
"29/30 (10 seconds remaining)\n",
"Getting page 1\n",
"28/30 (9 seconds remaining)\n",
"Getting page 2\n",
"27/30 (8 seconds remaining)\n",
"Getting page 3\n",
"26/30 (7 seconds remaining)\n",
"Getting page 4\n",
"25/30 (5 seconds remaining)\n",
"Getting page 5\n",
"24/30 (4 seconds remaining)\n",
"Getting page 6\n",
"23/30 (4 seconds remaining)\n",
"Getting page 7\n",
"22/30 (3 seconds remaining)\n",
"Getting page 8\n",
"21/30 (2 seconds remaining)\n",
"Getting page 9\n",
"20/30 (1 seconds remaining)\n",
"Getting page 10\n",
"29/30 (10 seconds remaining)\n"
]
},
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>category</th>\n",
" <th>city</th>\n",
" <th>country</th>\n",
" <th>created</th>\n",
" <th>description</th>\n",
" <th>group_photo</th>\n",
" <th>id</th>\n",
" <th>join_mode</th>\n",
" <th>lat</th>\n",
" <th>link</th>\n",
" <th>...</th>\n",
" <th>name</th>\n",
" <th>organizer</th>\n",
" <th>rating</th>\n",
" <th>state</th>\n",
" <th>timezone</th>\n",
" <th>topics</th>\n",
" <th>urlname</th>\n",
" <th>utc_offset</th>\n",
" <th>visibility</th>\n",
" <th>who</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>{'name': 'tech', 'id': 34, 'shortname': 'tech'}</td>\n",
" <td>San Francisco</td>\n",
" <td>US</td>\n",
" <td>1034097740000</td>\n",
" <td>&lt;p&gt;The SF PHP Community Meetup is an open foru...</td>\n",
" <td>{'highres_link': 'https://secure.meetupstatic....</td>\n",
" <td>120903</td>\n",
" <td>open</td>\n",
" <td>37.77</td>\n",
" <td>https://www.meetup.com/sf-php/</td>\n",
" <td>...</td>\n",
" <td>SF PHP Community</td>\n",
" <td>{'member_id': 126468982, 'name': 'Andre Marigo...</td>\n",
" <td>4.38</td>\n",
" <td>CA</td>\n",
" <td>US/Pacific</td>\n",
" <td>[{'urlkey': 'php', 'name': 'PHP', 'id': 455}, ...</td>\n",
" <td>sf-php</td>\n",
" <td>-28800000</td>\n",
" <td>public</td>\n",
" <td>PHP Developers</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>1 rows × 22 columns</p>\n",
"</div>"
],
"text/plain": [
" category city country \\\n",
"0 {'name': 'tech', 'id': 34, 'shortname': 'tech'} San Francisco US \n",
"\n",
" created description \\\n",
"0 1034097740000 <p>The SF PHP Community Meetup is an open foru... \n",
"\n",
" group_photo id join_mode lat \\\n",
"0 {'highres_link': 'https://secure.meetupstatic.... 120903 open 37.77 \n",
"\n",
" link ... name \\\n",
"0 https://www.meetup.com/sf-php/ ... SF PHP Community \n",
"\n",
" organizer rating state timezone \\\n",
"0 {'member_id': 126468982, 'name': 'Andre Marigo... 4.38 CA US/Pacific \n",
"\n",
" topics urlname utc_offset \\\n",
"0 [{'urlkey': 'php', 'name': 'PHP', 'id': 455}, ... sf-php -28800000 \n",
"\n",
" visibility who \n",
"0 public PHP Developers \n",
"\n",
"[1 rows x 22 columns]"
]
},
"execution_count": 32,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Collect all groups into a single dataframe\n",
"all_groups_df = get_all_groups_as_a_df()\n",
"\n",
"# Show the first row in the dataframe\n",
"all_groups_df.head(1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### That's too many columns\n",
"I only care about a handful of columns, so let's drop the rest."
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>id</th>\n",
" <th>name</th>\n",
" <th>members</th>\n",
" <th>rating</th>\n",
" <th>join_mode</th>\n",
" <th>urlname</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>120903</td>\n",
" <td>SF PHP Community</td>\n",
" <td>2699</td>\n",
" <td>4.38</td>\n",
" <td>open</td>\n",
" <td>sf-php</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" id name members rating join_mode urlname\n",
"0 120903 SF PHP Community 2699 4.38 open sf-php"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"column_list = ['id', 'name', 'members', 'rating', 'join_mode', 'urlname']\n",
"all_groups_df = all_groups_df[column_list]\n",
"all_groups_df.head(1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Let's double check the size of our new dataframe"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(2189, 6)"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"all_groups_df.shape"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"That looks just about right:\n",
"* 2189 rows\n",
"* 6 columns\n",
"\n",
"---\n",
"## Explore Members per Group\n",
"\n",
"Membership varies from group to group, so let's explore that first!\n",
"\n",
"\n",
"### Let's start with a histogram\n",
"\n",
"This visualization should give us a basic idea of how big our groups are."
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 792x432 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"# Using seaborn's distplot function\n",
"plt.rcParams['figure.figsize'] = [11, 6]\n",
"sb.distplot(all_groups_df['members'], kde=False, color=\"g\");"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### It appears that most groups are relatively small\n",
"\n",
"Let's take a closer look at some basic stats for our data in a tabular \n",
"format for some hard numbers:"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>members</th>\n",
" <th>rating</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>count</th>\n",
" <td>2,189.00</td>\n",
" <td>2,189.00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>mean</th>\n",
" <td>813.80</td>\n",
" <td>2.75</td>\n",
" </tr>\n",
" <tr>\n",
" <th>std</th>\n",
" <td>1,742.79</td>\n",
" <td>2.34</td>\n",
" </tr>\n",
" <tr>\n",
" <th>min</th>\n",
" <td>1.00</td>\n",
" <td>0.00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25%</th>\n",
" <td>87.00</td>\n",
" <td>0.00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>50%</th>\n",
" <td>257.00</td>\n",
" <td>4.47</td>\n",
" </tr>\n",
" <tr>\n",
" <th>75%</th>\n",
" <td>782.00</td>\n",
" <td>4.84</td>\n",
" </tr>\n",
" <tr>\n",
" <th>max</th>\n",
" <td>36,057.00</td>\n",
" <td>5.00</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" members rating\n",
"count 2,189.00 2,189.00\n",
"mean 813.80 2.75\n",
"std 1,742.79 2.34\n",
"min 1.00 0.00\n",
"25% 87.00 0.00\n",
"50% 257.00 4.47\n",
"75% 782.00 4.84\n",
"max 36,057.00 5.00"
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.options.display.float_format = '{:20,.2f}'.format\n",
"all_groups_df[[\"members\", \"rating\"]].describe()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The table gives us some hard numbers:\n",
"\n",
"1. The average group size is about 814 members.\n",
"2. Half of the groups have 257 members or fewer, so a few very large groups are pulling the average up.\n",
"3. The smallest group has a single member.\n",
"4. **The largest group has 36,000+ members!**\n",
"\n",
"What an outlier! But are there other **mega-groups** like this?\n",
"\n",
"### Maybe a box and whisker plot can visualize these stats for us"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x1440 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"plt.rcParams['figure.figsize'] = [6, 20]\n",
"all_groups_df['members'].plot.box();"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Wow, there are quite a few **mega-groups**, as indicated by the circles above our top whisker!\n",
"\n",
"---\n",
"\n",
"Why are the groups so big?\n",
"\n",
"In fact...\n",
"\n",
"### What are the 10 biggest tech groups in the area?"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>id</th>\n",
" <th>name</th>\n",
" <th>members</th>\n",
" <th>rating</th>\n",
" <th>join_mode</th>\n",
" <th>urlname</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>19</th>\n",
" <td>844726</td>\n",
" <td>Silicon Valley Entrepreneurs &amp; Startups</td>\n",
" <td>36057</td>\n",
" <td>4.58</td>\n",
" <td>open</td>\n",
" <td>sventrepreneurs</td>\n",
" </tr>\n",
" <tr>\n",
" <th>107</th>\n",
" <td>1619955</td>\n",
" <td>SFHTML5</td>\n",
" <td>17671</td>\n",
" <td>4.67</td>\n",
" <td>open</td>\n",
" <td>sfhtml5</td>\n",
" </tr>\n",
" <tr>\n",
" <th>106</th>\n",
" <td>1615633</td>\n",
" <td>Designers + Geeks</td>\n",
" <td>15443</td>\n",
" <td>4.46</td>\n",
" <td>open</td>\n",
" <td>designersgeeks</td>\n",
" </tr>\n",
" <tr>\n",
" <th>426</th>\n",
" <td>9226282</td>\n",
" <td>SF Data Science</td>\n",
" <td>14868</td>\n",
" <td>4.49</td>\n",
" <td>open</td>\n",
" <td>SF-Data-Science</td>\n",
" </tr>\n",
" <tr>\n",
" <th>28</th>\n",
" <td>1060260</td>\n",
" <td>The SF JavaScript Meetup</td>\n",
" <td>13356</td>\n",
" <td>4.54</td>\n",
" <td>open</td>\n",
" <td>jsmeetup</td>\n",
" </tr>\n",
" <tr>\n",
" <th>250</th>\n",
" <td>3483762</td>\n",
" <td>Tech in Motion Events: San Francisco</td>\n",
" <td>13078</td>\n",
" <td>4.48</td>\n",
" <td>approval</td>\n",
" <td>TechinMotionSF</td>\n",
" </tr>\n",
" <tr>\n",
" <th>540</th>\n",
" <td>13402242</td>\n",
" <td>Docker Online Meetup</td>\n",
" <td>12475</td>\n",
" <td>4.37</td>\n",
" <td>open</td>\n",
" <td>Docker-Online-Meetup</td>\n",
" </tr>\n",
" <tr>\n",
" <th>191</th>\n",
" <td>2065031</td>\n",
" <td>SF Data Mining</td>\n",
" <td>12378</td>\n",
" <td>4.64</td>\n",
" <td>open</td>\n",
" <td>Data-Mining</td>\n",
" </tr>\n",
" <tr>\n",
" <th>201</th>\n",
" <td>2252591</td>\n",
" <td>Women Who Code SF</td>\n",
" <td>12331</td>\n",
" <td>4.78</td>\n",
" <td>open</td>\n",
" <td>Women-Who-Code-SF</td>\n",
" </tr>\n",
" <tr>\n",
" <th>705</th>\n",
" <td>18354966</td>\n",
" <td>SF Big Analytics</td>\n",
" <td>11869</td>\n",
" <td>4.53</td>\n",
" <td>open</td>\n",
" <td>SF-Big-Analytics</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" id name members \\\n",
"19 844726 Silicon Valley Entrepreneurs & Startups 36057 \n",
"107 1619955 SFHTML5 17671 \n",
"106 1615633 Designers + Geeks 15443 \n",
"426 9226282 SF Data Science 14868 \n",
"28 1060260 The SF JavaScript Meetup 13356 \n",
"250 3483762 Tech in Motion Events: San Francisco 13078 \n",
"540 13402242 Docker Online Meetup 12475 \n",
"191 2065031 SF Data Mining 12378 \n",
"201 2252591 Women Who Code SF 12331 \n",
"705 18354966 SF Big Analytics 11869 \n",
"\n",
" rating join_mode urlname \n",
"19 4.58 open sventrepreneurs \n",
"107 4.67 open sfhtml5 \n",
"106 4.46 open designersgeeks \n",
"426 4.49 open SF-Data-Science \n",
"28 4.54 open jsmeetup \n",
"250 4.48 approval TechinMotionSF \n",
"540 4.37 open Docker-Online-Meetup \n",
"191 4.64 open Data-Mining \n",
"201 4.78 open Women-Who-Code-SF \n",
"705 4.53 open SF-Big-Analytics "
]
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"biggest_ten_df = all_groups_df.sort_values('members', ascending=False).head(10)\n",
"biggest_ten_df"
]
},
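{
"cell_type": "markdown",
"metadata": {},
"source": [
"The circles in the box plot are the groups above the top whisker's cutoff, which by default in matplotlib is the third quartile plus 1.5 times the interquartile range. With the quartiles from the stats table (87 and 782), that cutoff works out to `782 + 1.5 * (782 - 87) = 1824.5` members. Here is a minimal sketch of counting such groups; the `upper_whisker_limit` helper and the toy numbers are my own illustration, not Meetup data.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"def upper_whisker_limit(values):\n",
"    # Q3 + 1.5 * IQR: points above this are drawn as circles (fliers)\n",
"    q1 = values.quantile(0.25)\n",
"    q3 = values.quantile(0.75)\n",
"    return q3 + 1.5 * (q3 - q1)\n",
"\n",
"# Toy membership counts, made up for illustration\n",
"members = pd.Series([10, 50, 90, 250, 300, 800, 1200, 36000])\n",
"\n",
"limit = upper_whisker_limit(members)\n",
"mega_groups = members[members > limit]\n",
"print(limit, len(mega_groups))\n",
"```\n",
"\n",
"Applied to `all_groups_df['members']`, the same two lines would give the number of mega-groups instead of an eyeballed count of circles."
]
},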
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Get Group Events"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### We need to do some data shaping before the next API call\n",
"\n",
"Mostly we need to:\n",
"1. Build a comma-separated string of group ids from our 10 biggest groups\n",
"2. Convert our human-readable date range to milliseconds since Jan 1, 1970\n",
"3. Call the GetEvents API, filtering for past events using our group ids and our date range\n",
"\n",
"#### First, let's make that string of group ids\n"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'844726,1619955,1615633,9226282,1060260,3483762,13402242,2065031,2252591,18354966'"
]
},
"execution_count": 24,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Build a comma-separated string of group ids for the events request\n",
"id_list = biggest_ten_df['id'].tolist()\n",
"ids = ','.join(str(x) for x in id_list)\n",
"ids"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Now, let's get the epoch milliseconds for a date range from 180 days (about six months) ago to now"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Now: 1550687560221, nine months ago: 1535135560221\n"
]
}
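],
"source": [
"def to_millis(dt):\n",
"    \"\"\"Convert a datetime to integer milliseconds since the Unix epoch.\"\"\"\n",
"    # Timestamp.value is in nanoseconds; floor-divide to stay in integer math\n",
"    return pd.to_datetime(dt).value // 1000000\n",
"\n",
"right_now = to_millis(datetime.datetime.now())\n",
"# NOTE: 180 days is about six months, not nine; the variable name is kept\n",
"# because later cells refer to it\n",
"nine_months_ago = right_now - 180 * 24 * 60 * 60 * 1000\n",
"print(f\"Now: {right_now}, nine months ago: {nine_months_ago}\")"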
]
},
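{
"cell_type": "markdown",
"metadata": {},
"source": [
"One caveat: `datetime.datetime.now()` returns a naive local time, and as far as I know pandas reads naive timestamps as if they were UTC, so the epoch value can be off by the local UTC offset. A minimal sketch of the same 180-day window using timezone-aware datetimes from the standard library; `EPOCH`, `to_epoch_millis`, and the fixed end time are my own choices for illustration.\n",
"\n",
"```python\n",
"import datetime\n",
"\n",
"EPOCH = datetime.datetime(1970, 1, 1, tzinfo=datetime.timezone.utc)\n",
"\n",
"def to_epoch_millis(dt):\n",
"    # Exact integer milliseconds between the Unix epoch and an aware datetime\n",
"    return (dt - EPOCH) // datetime.timedelta(milliseconds=1)\n",
"\n",
"# Fixed end time so the result is reproducible; in practice this would be\n",
"# datetime.datetime.now(datetime.timezone.utc)\n",
"end = datetime.datetime(2019, 2, 20, 18, 32, 40, tzinfo=datetime.timezone.utc)\n",
"start = end - datetime.timedelta(days=180)\n",
"\n",
"# Comma-separated range, the same shape as the time filter passed to the API\n",
"time_range = f'{to_epoch_millis(start)},{to_epoch_millis(end)}'\n",
"print(time_range)\n",
"```\n",
"\n",
"Dividing one `timedelta` by another with `//` keeps everything in integer math, so there is no float rounding to worry about."
]
},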
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Finally, let's look at those events."
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [],
"source": [
"%%capture --no-display\n",
"events_resp = client.GetEvents(group_id=ids, status='past',\n",
" time=f\"{nine_months_ago},{right_now}\");\n",
"\n",
"events_meta, events_df = parse_response(events_resp)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Again, our events_df dataframe has extra columns that I don't care about"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>created</th>\n",
" <th>description</th>\n",
" <th>duration</th>\n",
" <th>event_url</th>\n",
" <th>group</th>\n",
" <th>headcount</th>\n",
" <th>how_to_find_us</th>\n",
" <th>id</th>\n",
" <th>maybe_rsvp_count</th>\n",
" <th>name</th>\n",
" <th>...</th>\n",
" <th>rating</th>\n",
" <th>rsvp_limit</th>\n",
" <th>status</th>\n",
" <th>time</th>\n",
" <th>updated</th>\n",
" <th>utc_offset</th>\n",
" <th>venue</th>\n",
" <th>visibility</th>\n",
" <th>waitlist_count</th>\n",
" <th>yes_rsvp_count</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1532649759000</td>\n",
" <td>&lt;p&gt;Join us for an evening of TensorFlow.js tal...</td>\n",
" <td>14400000</td>\n",
" <td>https://www.meetup.com/sfhtml5/events/253185437/</td>\n",
" <td>{'join_mode': 'open', 'created': 1269470260000...</td>\n",
" <td>0</td>\n",
" <td>Event will be on the 7th Floor</td>\n",
" <td>253185437</td>\n",
" <td>0</td>\n",
" <td>TensorFlow.js with Nick Kreeger and Ping Yu</td>\n",
" <td>...</td>\n",
" <td>{'count': 0, 'average': 0}</td>\n",
" <td>470.00</td>\n",
" <td>past</td>\n",
" <td>1535155200000</td>\n",
" <td>1535171121000</td>\n",
" <td>-25200000</td>\n",
" <td>{'zip': '94105', 'country': 'us', 'localized_c...</td>\n",
" <td>public</td>\n",
" <td>0</td>\n",
" <td>454</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>1 rows × 21 columns</p>\n",
"</div>"
],
"text/plain": [
" created description duration \\\n",
"0 1532649759000 <p>Join us for an evening of TensorFlow.js tal... 14400000 \n",
"\n",
" event_url \\\n",
"0 https://www.meetup.com/sfhtml5/events/253185437/ \n",
"\n",
" group headcount \\\n",
"0 {'join_mode': 'open', 'created': 1269470260000... 0 \n",
"\n",
" how_to_find_us id maybe_rsvp_count \\\n",
"0 Event will be on the 7th Floor 253185437 0 \n",
"\n",
" name ... \\\n",
"0 TensorFlow.js with Nick Kreeger and Ping Yu ... \n",
"\n",
" rating rsvp_limit status time \\\n",
"0 {'count': 0, 'average': 0} 470.00 past 1535155200000 \n",
"\n",
" updated utc_offset \\\n",
"0 1535171121000 -25200000 \n",
"\n",
" venue visibility \\\n",
"0 {'zip': '94105', 'country': 'us', 'localized_c... public \n",
"\n",
" waitlist_count yes_rsvp_count \n",
"0 0 454 \n",
"\n",
"[1 rows x 21 columns]"
]
},
"execution_count": 27,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"events_df.head(1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### So again, let's filter down to just what's relevant"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>group</th>\n",
" <th>time</th>\n",
" <th>duration</th>\n",
" <th>yes_rsvp_count</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>{'join_mode': 'open', 'created': 1269470260000...</td>\n",
" <td>1535155200000</td>\n",
" <td>14400000</td>\n",
" <td>454</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" group time duration \\\n",
"0 {'join_mode': 'open', 'created': 1269470260000... 1535155200000 14400000 \n",
"\n",
" yes_rsvp_count \n",
"0 454 "
]
},
"execution_count": 28,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"column_list = ['group', 'time', 'duration', 'yes_rsvp_count']\n",
"events_df = events_df[column_list]\n",
"events_df.head(1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### The group column\n",
"\n",
"Each value in the **group** column is actually a dictionary full of metadata about the group.\n",
"\n",
"I really only need the **group\\[\"id\"\\]** for now, so let's focus on that."
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>id</th>\n",
" <th>time</th>\n",
" <th>duration</th>\n",
" <th>yes_rsvp_count</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1619955</td>\n",
" <td>1535155200000</td>\n",
" <td>14400000</td>\n",
" <td>454</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" id time duration yes_rsvp_count\n",
"0 1619955 1535155200000 14400000 454"
]
},
"execution_count": 29,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"def get_id(my_dict):\n",
"    \"\"\"Extract the id member of a python dictionary\"\"\"\n",
"    return my_dict[\"id\"]\n",
"\n",
"events_df[\"id\"] = events_df[\"group\"].apply(get_id)\n",
"\n",
"# Keep only the columns we need, replacing group with its id\n",
"columns = ['id', 'time', 'duration', 'yes_rsvp_count']\n",
"events_df = events_df[columns]\n",
"events_df.head(1)"
]
},
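{
"cell_type": "markdown",
"metadata": {},
"source": [
"As an aside, when more than one field of the group metadata is needed, flattening the dictionaries into their own columns may be handier than writing one helper per key. A sketch, assuming pandas 1.0 or newer (where `json_normalize` is available at the top level) and using a made-up two-row frame named `toy_events` in place of `events_df`:\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Toy stand-in for events_df with a nested group column\n",
"toy_events = pd.DataFrame({\n",
"    'group': [{'id': 1619955, 'join_mode': 'open'},\n",
"              {'id': 2252591, 'join_mode': 'open'}],\n",
"    'yes_rsvp_count': [454, 35],\n",
"})\n",
"\n",
"# One column per dictionary key, one row per event\n",
"group_meta = pd.json_normalize(toy_events['group'].tolist())\n",
"\n",
"# Attach the id and drop the nested column\n",
"flat_events = toy_events.assign(id=group_meta['id'].to_numpy()).drop(columns='group')\n",
"print(flat_events)\n",
"```\n",
"\n",
"For a single key like **id**, the small helper function above is just as good; this only pays off when several keys are wanted."
]
},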
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Hmm, it seems that our **time** is numeric\n",
"\n",
"The **time** is stored in **Epoch milliseconds** format.\n",
"\n",
"This is great if you want to see time as the number of milliseconds since Jan 1, 1970.\n",
"\n",
"This is not-so-great if you just want to see a human-readable date and time equivalent.\n",
"\n",
"Let's make a new human-readable column called **time_dt**. Note that the converted values are in UTC, not Pacific time."
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>id</th>\n",
" <th>time</th>\n",
" <th>time_dt</th>\n",
" <th>duration</th>\n",
" <th>yes_rsvp_count</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1619955</td>\n",
" <td>1535155200000</td>\n",
" <td>08/25/18 00:00</td>\n",
" <td>14400000</td>\n",
" <td>454</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" id time time_dt duration yes_rsvp_count\n",
"0 1619955 1535155200000 08/25/18 00:00 14400000 454"
]
},
"execution_count": 30,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"events_df[\"time_dt\"] = pd.to_datetime(\n",
"    events_df[\"time\"], unit='ms').dt.strftime('%m/%d/%y %H:%M')\n",
"\n",
"columns = ['id', 'time','time_dt', 'duration', 'yes_rsvp_count']\n",
"events_df = events_df[columns]\n",
"events_df.head(1)"
]
},
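{
"cell_type": "markdown",
"metadata": {},
"source": [
"`pd.to_datetime(..., unit='ms')` interprets epoch values as UTC, which is why a 5 PM San Francisco event shows up at 00:00 the next day. A sketch of converting to local time instead, assuming the `America/Los_Angeles` zone name (the Meetup data labels the zone `US/Pacific`); the variable names are my own.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Two event start times in epoch milliseconds, copied from the events data\n",
"times_ms = pd.Series([1535155200000, 1542337200000])\n",
"\n",
"# Parse as UTC, then convert to San Francisco local time\n",
"utc_times = pd.to_datetime(times_ms, unit='ms', utc=True)\n",
"local_times = utc_times.dt.tz_convert('America/Los_Angeles')\n",
"\n",
"# Same display format as the time_dt column\n",
"local_text = local_times.dt.strftime('%m/%d/%y %H:%M')\n",
"print(local_text.tolist())\n",
"```\n",
"\n",
"The first event lands at 17:00 on 08/24/18 Pacific time, consistent with the `utc_offset` of -25200000 ms (7 hours) in the raw event row."
]
},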
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Now let's convert the duration column to something human-readable\n",
"\n",
"Let's convert the column to a string that shows hours and minutes."
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>id</th>\n",
" <th>time</th>\n",
" <th>time_dt</th>\n",
" <th>duration</th>\n",
" <th>yes_rsvp_count</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1619955</td>\n",
" <td>1535155200000</td>\n",
" <td>08/25/18 00:00</td>\n",
" <td>4 hours, 0 minutes</td>\n",
" <td>454</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" id time time_dt duration yes_rsvp_count\n",
"0 1619955 1535155200000 08/25/18 00:00 4 hours, 0 minutes 454"
]
},
"execution_count": 31,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"def millis_2_hours_and_minutes(ms):\n",
"    \"\"\"Converts milliseconds to hours and minutes.\"\"\"\n",
"    seconds = ms / 1000\n",
"    minutes, seconds = divmod(seconds, 60)\n",
"    hours, minutes = divmod(minutes, 60)\n",
"\n",
"    return f\"{int(hours)} hours, {int(minutes)} minutes\"\n",
"\n",
"events_df[\"duration\"] = events_df[\"duration\"].apply(\n",
"    millis_2_hours_and_minutes)\n",
"\n",
"events_df.head(1)"
]
},
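{
"cell_type": "markdown",
"metadata": {},
"source": [
"Overwriting **duration** with text makes it readable, but it also means the column can no longer be averaged or sorted numerically. If that matters later, one option is to keep a numeric copy in hours alongside the text. A minimal sketch with three example durations (4 hours, 2.5 hours, and 57 hours); the variable names are my own.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Example durations in milliseconds\n",
"durations_ms = pd.Series([14400000, 9000000, 205200000])\n",
"\n",
"# Convert to timedeltas, then to fractional hours\n",
"duration_td = pd.to_timedelta(durations_ms, unit='ms')\n",
"duration_hours = duration_td.dt.total_seconds() / 3600\n",
"print(duration_hours.tolist())\n",
"```\n",
"\n",
"A numeric column like this makes it easy to spot multi-day listings, such as anything longer than 24 hours."
]
},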
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Now let's join our top ten mega-groups dataframe with our events dataframe\n",
"\n",
"If you are familiar with SQL, this is similar to a left join from **raw_results_df**\n",
"to **biggest_ten_df** on **id**.\n",
"\n",
"Then we sort the output by **name** ascending and then by **time** descending."
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>name</th>\n",
" <th>time</th>\n",
" <th>time_dt</th>\n",
" <th>duration</th>\n",
" <th>yes_rsvp_count</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>57</th>\n",
" <td>Designers + Geeks</td>\n",
" <td>1542337200000</td>\n",
" <td>11/16/18 03:00</td>\n",
" <td>2 hours, 0 minutes</td>\n",
" <td>71</td>\n",
" </tr>\n",
" <tr>\n",
" <th>48</th>\n",
" <td>Designers + Geeks</td>\n",
" <td>1541124000000</td>\n",
" <td>11/02/18 02:00</td>\n",
" <td>2 hours, 0 minutes</td>\n",
" <td>71</td>\n",
" </tr>\n",
" <tr>\n",
" <th>34</th>\n",
" <td>Designers + Geeks</td>\n",
" <td>1539914400000</td>\n",
" <td>10/19/18 02:00</td>\n",
" <td>2 hours, 0 minutes</td>\n",
" <td>41</td>\n",
" </tr>\n",
" <tr>\n",
" <th>26</th>\n",
" <td>Designers + Geeks</td>\n",
" <td>1538704800000</td>\n",
" <td>10/05/18 02:00</td>\n",
" <td>2 hours, 0 minutes</td>\n",
" <td>29</td>\n",
" </tr>\n",
" <tr>\n",
" <th>16</th>\n",
" <td>Designers + Geeks</td>\n",
" <td>1537495200000</td>\n",
" <td>09/21/18 02:00</td>\n",
" <td>2 hours, 0 minutes</td>\n",
" <td>37</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>Designers + Geeks</td>\n",
" <td>1536285600000</td>\n",
" <td>09/07/18 02:00</td>\n",
" <td>2 hours, 0 minutes</td>\n",
" <td>87</td>\n",
" </tr>\n",
" <tr>\n",
" <th>42</th>\n",
" <td>Docker Online Meetup</td>\n",
" <td>1540915200000</td>\n",
" <td>10/30/18 16:00</td>\n",
" <td>1 hours, 0 minutes</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>99</th>\n",
" <td>SF Big Analytics</td>\n",
" <td>1549591200000</td>\n",
" <td>02/08/19 02:00</td>\n",
" <td>2 hours, 30 minutes</td>\n",
" <td>418</td>\n",
" </tr>\n",
" <tr>\n",
" <th>82</th>\n",
" <td>SF Big Analytics</td>\n",
" <td>1547690400000</td>\n",
" <td>01/17/19 02:00</td>\n",
" <td>3 hours, 0 minutes</td>\n",
" <td>450</td>\n",
" </tr>\n",
" <tr>\n",
" <th>80</th>\n",
" <td>SF Big Analytics</td>\n",
" <td>1547604000000</td>\n",
" <td>01/16/19 02:00</td>\n",
" <td>3 hours, 0 minutes</td>\n",
" <td>5</td>\n",
" </tr>\n",
" <tr>\n",
" <th>66</th>\n",
" <td>SF Big Analytics</td>\n",
" <td>1544061600000</td>\n",
" <td>12/06/18 02:00</td>\n",
" <td>3 hours, 0 minutes</td>\n",
" <td>7</td>\n",
" </tr>\n",
" <tr>\n",
" <th>55</th>\n",
" <td>SF Big Analytics</td>\n",
" <td>1542247200000</td>\n",
" <td>11/15/18 02:00</td>\n",
" <td>3 hours, 0 minutes</td>\n",
" <td>16</td>\n",
" </tr>\n",
" <tr>\n",
" <th>53</th>\n",
" <td>SF Big Analytics</td>\n",
" <td>1542160800000</td>\n",
" <td>11/14/18 02:00</td>\n",
" <td>2 hours, 30 minutes</td>\n",
" <td>478</td>\n",
" </tr>\n",
" <tr>\n",
" <th>52</th>\n",
" <td>SF Big Analytics</td>\n",
" <td>1541782800000</td>\n",
" <td>11/09/18 17:00</td>\n",
" <td>57 hours, 0 minutes</td>\n",
" <td>16</td>\n",
" </tr>\n",
" <tr>\n",
" <th>46</th>\n",
" <td>SF Big Analytics</td>\n",
" <td>1541120400000</td>\n",
" <td>11/02/18 01:00</td>\n",
" <td>3 hours, 0 minutes</td>\n",
" <td>152</td>\n",
" </tr>\n",
" <tr>\n",
" <th>31</th>\n",
" <td>SF Big Analytics</td>\n",
" <td>1539909000000</td>\n",
" <td>10/19/18 00:30</td>\n",
" <td>2 hours, 30 minutes</td>\n",
" <td>390</td>\n",
" </tr>\n",
" <tr>\n",
" <th>18</th>\n",
" <td>SF Big Analytics</td>\n",
" <td>1538010000000</td>\n",
" <td>09/27/18 01:00</td>\n",
" <td>2 hours, 45 minutes</td>\n",
" <td>245</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10</th>\n",
" <td>SF Big Analytics</td>\n",
" <td>1536800400000</td>\n",
" <td>09/13/18 01:00</td>\n",
" <td>2 hours, 30 minutes</td>\n",
" <td>218</td>\n",
" </tr>\n",
" <tr>\n",
" <th>12</th>\n",
" <td>SF Data Mining</td>\n",
" <td>1537200000000</td>\n",
" <td>09/17/18 16:00</td>\n",
" <td>104 hours, 0 minutes</td>\n",
" <td>6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>95</th>\n",
" <td>SF Data Science</td>\n",
" <td>1548988200000</td>\n",
" <td>02/01/19 02:30</td>\n",
" <td>2 hours, 0 minutes</td>\n",
" <td>94</td>\n",
" </tr>\n",
" <tr>\n",
" <th>91</th>\n",
" <td>SF Data Science</td>\n",
" <td>1548383400000</td>\n",
" <td>01/25/19 02:30</td>\n",
" <td>2 hours, 0 minutes</td>\n",
" <td>48</td>\n",
" </tr>\n",
" <tr>\n",
" <th>84</th>\n",
" <td>SF Data Science</td>\n",
" <td>1547778600000</td>\n",
" <td>01/18/19 02:30</td>\n",
" <td>2 hours, 0 minutes</td>\n",
" <td>26</td>\n",
" </tr>\n",
" <tr>\n",
" <th>83</th>\n",
" <td>SF Data Science</td>\n",
" <td>1547692200000</td>\n",
" <td>01/17/19 02:30</td>\n",
" <td>2 hours, 0 minutes</td>\n",
" <td>59</td>\n",
" </tr>\n",
" <tr>\n",
" <th>62</th>\n",
" <td>SF Data Science</td>\n",
" <td>1543539600000</td>\n",
" <td>11/30/18 01:00</td>\n",
" <td>3 hours, 30 minutes</td>\n",
" <td>18</td>\n",
" </tr>\n",
" <tr>\n",
" <th>54</th>\n",
" <td>SF Data Science</td>\n",
" <td>1542211200000</td>\n",
" <td>11/14/18 16:00</td>\n",
" <td>10 hours, 0 minutes</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>40</th>\n",
" <td>SF Data Science</td>\n",
" <td>1540774800000</td>\n",
" <td>10/29/18 01:00</td>\n",
" <td>4 hours, 0 minutes</td>\n",
" <td>21</td>\n",
" </tr>\n",
" <tr>\n",
" <th>32</th>\n",
" <td>SF Data Science</td>\n",
" <td>1539910800000</td>\n",
" <td>10/19/18 01:00</td>\n",
" <td>2 hours, 0 minutes</td>\n",
" <td>95</td>\n",
" </tr>\n",
" <tr>\n",
" <th>28</th>\n",
" <td>SF Data Science</td>\n",
" <td>1539306000000</td>\n",
" <td>10/12/18 01:00</td>\n",
" <td>3 hours, 0 minutes</td>\n",
" <td>15</td>\n",
" </tr>\n",
" <tr>\n",
" <th>101</th>\n",
" <td>SFHTML5</td>\n",
" <td>1550075400000</td>\n",
" <td>02/13/19 16:30</td>\n",
" <td>12 hours, 30 minutes</td>\n",
" <td>39</td>\n",
" </tr>\n",
" <tr>\n",
" <th>92</th>\n",
" <td>SFHTML5</td>\n",
" <td>1548464400000</td>\n",
" <td>01/26/19 01:00</td>\n",
" <td>3 hours, 30 minutes</td>\n",
" <td>419</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>97</th>\n",
" <td>Women Who Code SF</td>\n",
" <td>1549420200000</td>\n",
" <td>02/06/19 02:30</td>\n",
" <td>2 hours, 0 minutes</td>\n",
" <td>47</td>\n",
" </tr>\n",
" <tr>\n",
" <th>94</th>\n",
" <td>Women Who Code SF</td>\n",
" <td>1548815400000</td>\n",
" <td>01/30/19 02:30</td>\n",
" <td>2 hours, 0 minutes</td>\n",
" <td>37</td>\n",
" </tr>\n",
" <tr>\n",
" <th>88</th>\n",
" <td>Women Who Code SF</td>\n",
" <td>1548210600000</td>\n",
" <td>01/23/19 02:30</td>\n",
" <td>2 hours, 0 minutes</td>\n",
" <td>29</td>\n",
" </tr>\n",
" <tr>\n",
" <th>81</th>\n",
" <td>Women Who Code SF</td>\n",
" <td>1547605800000</td>\n",
" <td>01/16/19 02:30</td>\n",
" <td>2 hours, 0 minutes</td>\n",
" <td>32</td>\n",
" </tr>\n",
" <tr>\n",
" <th>78</th>\n",
" <td>Women Who Code SF</td>\n",
" <td>1547001000000</td>\n",
" <td>01/09/19 02:30</td>\n",
" <td>2 hours, 0 minutes</td>\n",
" <td>57</td>\n",
" </tr>\n",
" <tr>\n",
" <th>73</th>\n",
" <td>Women Who Code SF</td>\n",
" <td>1545271200000</td>\n",
" <td>12/20/18 02:00</td>\n",
" <td>2 hours, 0 minutes</td>\n",
" <td>199</td>\n",
" </tr>\n",
" <tr>\n",
" <th>71</th>\n",
" <td>Women Who Code SF</td>\n",
" <td>1545186600000</td>\n",
" <td>12/19/18 02:30</td>\n",
" <td>1 hours, 30 minutes</td>\n",
" <td>50</td>\n",
" </tr>\n",
" <tr>\n",
" <th>72</th>\n",
" <td>Women Who Code SF</td>\n",
" <td>1545186600000</td>\n",
" <td>12/19/18 02:30</td>\n",
" <td>2 hours, 0 minutes</td>\n",
" <td>23</td>\n",
" </tr>\n",
" <tr>\n",
" <th>69</th>\n",
" <td>Women Who Code SF</td>\n",
" <td>1544581800000</td>\n",
" <td>12/12/18 02:30</td>\n",
" <td>2 hours, 0 minutes</td>\n",
" <td>22</td>\n",
" </tr>\n",
" <tr>\n",
" <th>67</th>\n",
" <td>Women Who Code SF</td>\n",
" <td>1544149800000</td>\n",
" <td>12/07/18 02:30</td>\n",
" <td>2 hours, 0 minutes</td>\n",
" <td>17</td>\n",
" </tr>\n",
" <tr>\n",
" <th>65</th>\n",
" <td>Women Who Code SF</td>\n",
" <td>1543977000000</td>\n",
" <td>12/05/18 02:30</td>\n",
" <td>2 hours, 0 minutes</td>\n",
" <td>29</td>\n",
" </tr>\n",
" <tr>\n",
" <th>64</th>\n",
" <td>Women Who Code SF</td>\n",
" <td>1543975200000</td>\n",
" <td>12/05/18 02:00</td>\n",
" <td>3 hours, 0 minutes</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>60</th>\n",
" <td>Women Who Code SF</td>\n",
" <td>1542767400000</td>\n",
" <td>11/21/18 02:30</td>\n",
" <td>2 hours, 0 minutes</td>\n",
" <td>31</td>\n",
" </tr>\n",
" <tr>\n",
" <th>56</th>\n",
" <td>Women Who Code SF</td>\n",
" <td>1542335400000</td>\n",
" <td>11/16/18 02:30</td>\n",
" <td>2 hours, 0 minutes</td>\n",
" <td>50</td>\n",
" </tr>\n",
" <tr>\n",
" <th>51</th>\n",
" <td>Women Who Code SF</td>\n",
" <td>1541557800000</td>\n",
" <td>11/07/18 02:30</td>\n",
" <td>2 hours, 0 minutes</td>\n",
" <td>27</td>\n",
" </tr>\n",
" <tr>\n",
" <th>47</th>\n",
" <td>Women Who Code SF</td>\n",
" <td>1541122200000</td>\n",
" <td>11/02/18 01:30</td>\n",
" <td>2 hours, 0 minutes</td>\n",
" <td>50</td>\n",
" </tr>\n",
" <tr>\n",
" <th>45</th>\n",
" <td>Women Who Code SF</td>\n",
" <td>1540949400000</td>\n",
" <td>10/31/18 01:30</td>\n",
" <td>2 hours, 0 minutes</td>\n",
" <td>29</td>\n",
" </tr>\n",
" <tr>\n",
" <th>41</th>\n",
" <td>Women Who Code SF</td>\n",
" <td>1540859400000</td>\n",
" <td>10/30/18 00:30</td>\n",
" <td>2 hours, 30 minutes</td>\n",
" <td>64</td>\n",
" </tr>\n",
" <tr>\n",
" <th>36</th>\n",
" <td>Women Who Code SF</td>\n",
" <td>1540400400000</td>\n",
" <td>10/24/18 17:00</td>\n",
" <td>1 hours, 0 minutes</td>\n",
" <td>2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>35</th>\n",
" <td>Women Who Code SF</td>\n",
" <td>1540344600000</td>\n",
" <td>10/24/18 01:30</td>\n",
" <td>2 hours, 0 minutes</td>\n",
" <td>36</td>\n",
" </tr>\n",
" <tr>\n",
" <th>33</th>\n",
" <td>Women Who Code SF</td>\n",
" <td>1539912600000</td>\n",
" <td>10/19/18 01:30</td>\n",
" <td>2 hours, 0 minutes</td>\n",
" <td>50</td>\n",
" </tr>\n",
" <tr>\n",
" <th>27</th>\n",
" <td>Women Who Code SF</td>\n",
" <td>1539135000000</td>\n",
" <td>10/10/18 01:30</td>\n",
" <td>2 hours, 0 minutes</td>\n",
" <td>25</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25</th>\n",
" <td>Women Who Code SF</td>\n",
" <td>1538703000000</td>\n",
" <td>10/05/18 01:30</td>\n",
" <td>2 hours, 0 minutes</td>\n",
" <td>50</td>\n",
" </tr>\n",
" <tr>\n",
" <th>23</th>\n",
" <td>Women Who Code SF</td>\n",
" <td>1538530200000</td>\n",
" <td>10/03/18 01:30</td>\n",
" <td>2 hours, 0 minutes</td>\n",
" <td>10</td>\n",
" </tr>\n",
" <tr>\n",
" <th>24</th>\n",
" <td>Women Who Code SF</td>\n",
" <td>1538530200000</td>\n",
" <td>10/03/18 01:30</td>\n",
" <td>2 hours, 0 minutes</td>\n",
" <td>27</td>\n",
" </tr>\n",
" <tr>\n",
" <th>15</th>\n",
" <td>Women Who Code SF</td>\n",
" <td>1537493400000</td>\n",
" <td>09/21/18 01:30</td>\n",
" <td>2 hours, 0 minutes</td>\n",
" <td>33</td>\n",
" </tr>\n",
" <tr>\n",
" <th>13</th>\n",
" <td>Women Who Code SF</td>\n",
" <td>1537320600000</td>\n",
" <td>09/19/18 01:30</td>\n",
" <td>2 hours, 0 minutes</td>\n",
" <td>14</td>\n",
" </tr>\n",
" <tr>\n",
" <th>11</th>\n",
" <td>Women Who Code SF</td>\n",
" <td>1537030800000</td>\n",
" <td>09/15/18 17:00</td>\n",
" <td>3 hours, 0 minutes</td>\n",
" <td>10</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>Women Who Code SF</td>\n",
" <td>1536715800000</td>\n",
" <td>09/12/18 01:30</td>\n",
" <td>2 hours, 0 minutes</td>\n",
" <td>25</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Women Who Code SF</td>\n",
" <td>1535506200000</td>\n",
" <td>08/29/18 01:30</td>\n",
" <td>2 hours, 0 minutes</td>\n",
" <td>35</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>103 rows × 5 columns</p>\n",
"</div>"
],
"text/plain": [
" name time time_dt \\\n",
"57 Designers + Geeks 1542337200000 11/16/18 03:00 \n",
"48 Designers + Geeks 1541124000000 11/02/18 02:00 \n",
"34 Designers + Geeks 1539914400000 10/19/18 02:00 \n",
"26 Designers + Geeks 1538704800000 10/05/18 02:00 \n",
"16 Designers + Geeks 1537495200000 09/21/18 02:00 \n",
"6 Designers + Geeks 1536285600000 09/07/18 02:00 \n",
"42 Docker Online Meetup 1540915200000 10/30/18 16:00 \n",
"99 SF Big Analytics 1549591200000 02/08/19 02:00 \n",
"82 SF Big Analytics 1547690400000 01/17/19 02:00 \n",
"80 SF Big Analytics 1547604000000 01/16/19 02:00 \n",
"66 SF Big Analytics 1544061600000 12/06/18 02:00 \n",
"55 SF Big Analytics 1542247200000 11/15/18 02:00 \n",
"53 SF Big Analytics 1542160800000 11/14/18 02:00 \n",
"52 SF Big Analytics 1541782800000 11/09/18 17:00 \n",
"46 SF Big Analytics 1541120400000 11/02/18 01:00 \n",
"31 SF Big Analytics 1539909000000 10/19/18 00:30 \n",
"18 SF Big Analytics 1538010000000 09/27/18 01:00 \n",
"10 SF Big Analytics 1536800400000 09/13/18 01:00 \n",
"12 SF Data Mining 1537200000000 09/17/18 16:00 \n",
"95 SF Data Science 1548988200000 02/01/19 02:30 \n",
"91 SF Data Science 1548383400000 01/25/19 02:30 \n",
"84 SF Data Science 1547778600000 01/18/19 02:30 \n",
"83 SF Data Science 1547692200000 01/17/19 02:30 \n",
"62 SF Data Science 1543539600000 11/30/18 01:00 \n",
"54 SF Data Science 1542211200000 11/14/18 16:00 \n",
"40 SF Data Science 1540774800000 10/29/18 01:00 \n",
"32 SF Data Science 1539910800000 10/19/18 01:00 \n",
"28 SF Data Science 1539306000000 10/12/18 01:00 \n",
"101 SFHTML5 1550075400000 02/13/19 16:30 \n",
"92 SFHTML5 1548464400000 01/26/19 01:00 \n",
".. ... ... ... \n",
"97 Women Who Code SF 1549420200000 02/06/19 02:30 \n",
"94 Women Who Code SF 1548815400000 01/30/19 02:30 \n",
"88 Women Who Code SF 1548210600000 01/23/19 02:30 \n",
"81 Women Who Code SF 1547605800000 01/16/19 02:30 \n",
"78 Women Who Code SF 1547001000000 01/09/19 02:30 \n",
"73 Women Who Code SF 1545271200000 12/20/18 02:00 \n",
"71 Women Who Code SF 1545186600000 12/19/18 02:30 \n",
"72 Women Who Code SF 1545186600000 12/19/18 02:30 \n",
"69 Women Who Code SF 1544581800000 12/12/18 02:30 \n",
"67 Women Who Code SF 1544149800000 12/07/18 02:30 \n",
"65 Women Who Code SF 1543977000000 12/05/18 02:30 \n",
"64 Women Who Code SF 1543975200000 12/05/18 02:00 \n",
"60 Women Who Code SF 1542767400000 11/21/18 02:30 \n",
"56 Women Who Code SF 1542335400000 11/16/18 02:30 \n",
"51 Women Who Code SF 1541557800000 11/07/18 02:30 \n",
"47 Women Who Code SF 1541122200000 11/02/18 01:30 \n",
"45 Women Who Code SF 1540949400000 10/31/18 01:30 \n",
"41 Women Who Code SF 1540859400000 10/30/18 00:30 \n",
"36 Women Who Code SF 1540400400000 10/24/18 17:00 \n",
"35 Women Who Code SF 1540344600000 10/24/18 01:30 \n",
"33 Women Who Code SF 1539912600000 10/19/18 01:30 \n",
"27 Women Who Code SF 1539135000000 10/10/18 01:30 \n",
"25 Women Who Code SF 1538703000000 10/05/18 01:30 \n",
"23 Women Who Code SF 1538530200000 10/03/18 01:30 \n",
"24 Women Who Code SF 1538530200000 10/03/18 01:30 \n",
"15 Women Who Code SF 1537493400000 09/21/18 01:30 \n",
"13 Women Who Code SF 1537320600000 09/19/18 01:30 \n",
"11 Women Who Code SF 1537030800000 09/15/18 17:00 \n",
"9 Women Who Code SF 1536715800000 09/12/18 01:30 \n",
"3 Women Who Code SF 1535506200000 08/29/18 01:30 \n",
"\n",
" duration yes_rsvp_count \n",
"57 2 hours, 0 minutes 71 \n",
"48 2 hours, 0 minutes 71 \n",
"34 2 hours, 0 minutes 41 \n",
"26 2 hours, 0 minutes 29 \n",
"16 2 hours, 0 minutes 37 \n",
"6 2 hours, 0 minutes 87 \n",
"42 1 hours, 0 minutes 1 \n",
"99 2 hours, 30 minutes 418 \n",
"82 3 hours, 0 minutes 450 \n",
"80 3 hours, 0 minutes 5 \n",
"66 3 hours, 0 minutes 7 \n",
"55 3 hours, 0 minutes 16 \n",
"53 2 hours, 30 minutes 478 \n",
"52 57 hours, 0 minutes 16 \n",
"46 3 hours, 0 minutes 152 \n",
"31 2 hours, 30 minutes 390 \n",
"18 2 hours, 45 minutes 245 \n",
"10 2 hours, 30 minutes 218 \n",
"12 104 hours, 0 minutes 6 \n",
"95 2 hours, 0 minutes 94 \n",
"91 2 hours, 0 minutes 48 \n",
"84 2 hours, 0 minutes 26 \n",
"83 2 hours, 0 minutes 59 \n",
"62 3 hours, 30 minutes 18 \n",
"54 10 hours, 0 minutes 1 \n",
"40 4 hours, 0 minutes 21 \n",
"32 2 hours, 0 minutes 95 \n",
"28 3 hours, 0 minutes 15 \n",
"101 12 hours, 30 minutes 39 \n",
"92 3 hours, 30 minutes 419 \n",
".. ... ... \n",
"97 2 hours, 0 minutes 47 \n",
"94 2 hours, 0 minutes 37 \n",
"88 2 hours, 0 minutes 29 \n",
"81 2 hours, 0 minutes 32 \n",
"78 2 hours, 0 minutes 57 \n",
"73 2 hours, 0 minutes 199 \n",
"71 1 hours, 30 minutes 50 \n",
"72 2 hours, 0 minutes 23 \n",
"69 2 hours, 0 minutes 22 \n",
"67 2 hours, 0 minutes 17 \n",
"65 2 hours, 0 minutes 29 \n",
"64 3 hours, 0 minutes 1 \n",
"60 2 hours, 0 minutes 31 \n",
"56 2 hours, 0 minutes 50 \n",
"51 2 hours, 0 minutes 27 \n",
"47 2 hours, 0 minutes 50 \n",
"45 2 hours, 0 minutes 29 \n",
"41 2 hours, 30 minutes 64 \n",
"36 1 hours, 0 minutes 2 \n",
"35 2 hours, 0 minutes 36 \n",
"33 2 hours, 0 minutes 50 \n",
"27 2 hours, 0 minutes 25 \n",
"25 2 hours, 0 minutes 50 \n",
"23 2 hours, 0 minutes 10 \n",
"24 2 hours, 0 minutes 27 \n",
"15 2 hours, 0 minutes 33 \n",
"13 2 hours, 0 minutes 14 \n",
"11 3 hours, 0 minutes 10 \n",
"9 2 hours, 0 minutes 25 \n",
"3 2 hours, 0 minutes 35 \n",
"\n",
"[103 rows x 5 columns]"
]
},
"execution_count": 32,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"merged_df = pd.merge(\n",
" events_df,\n",
" biggest_ten_df[['id', 'name']],\n",
" on='id',\n",
" how='left')\n",
"\n",
"columns = ['name', 'time', 'time_dt', 'duration', 'yes_rsvp_count']\n",
"final_df = merged_df[columns]\n",
"\n",
"# Sort the output by name and time\n",
"final_df.sort_values(by=['name', 'time'], ascending=[True, False])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Ah, this is interesting\n",
"\n",
"From the top 10 mega-groups I can see that **SF Data Science** meets pretty regularly, and that their attendance is pretty consistent. \n",
"\n",
"Let's take a closer look at just this group by using another dataframe query."
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>name</th>\n",
" <th>time</th>\n",
" <th>time_dt</th>\n",
" <th>duration</th>\n",
" <th>yes_rsvp_count</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>28</th>\n",
" <td>SF Data Science</td>\n",
" <td>1539306000000</td>\n",
" <td>10/12/18 01:00</td>\n",
" <td>3 hours, 0 minutes</td>\n",
" <td>15</td>\n",
" </tr>\n",
" <tr>\n",
" <th>32</th>\n",
" <td>SF Data Science</td>\n",
" <td>1539910800000</td>\n",
" <td>10/19/18 01:00</td>\n",
" <td>2 hours, 0 minutes</td>\n",
" <td>95</td>\n",
" </tr>\n",
" <tr>\n",
" <th>40</th>\n",
" <td>SF Data Science</td>\n",
" <td>1540774800000</td>\n",
" <td>10/29/18 01:00</td>\n",
" <td>4 hours, 0 minutes</td>\n",
" <td>21</td>\n",
" </tr>\n",
" <tr>\n",
" <th>54</th>\n",
" <td>SF Data Science</td>\n",
" <td>1542211200000</td>\n",
" <td>11/14/18 16:00</td>\n",
" <td>10 hours, 0 minutes</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>62</th>\n",
" <td>SF Data Science</td>\n",
" <td>1543539600000</td>\n",
" <td>11/30/18 01:00</td>\n",
" <td>3 hours, 30 minutes</td>\n",
" <td>18</td>\n",
" </tr>\n",
" <tr>\n",
" <th>83</th>\n",
" <td>SF Data Science</td>\n",
" <td>1547692200000</td>\n",
" <td>01/17/19 02:30</td>\n",
" <td>2 hours, 0 minutes</td>\n",
" <td>59</td>\n",
" </tr>\n",
" <tr>\n",
" <th>84</th>\n",
" <td>SF Data Science</td>\n",
" <td>1547778600000</td>\n",
" <td>01/18/19 02:30</td>\n",
" <td>2 hours, 0 minutes</td>\n",
" <td>26</td>\n",
" </tr>\n",
" <tr>\n",
" <th>91</th>\n",
" <td>SF Data Science</td>\n",
" <td>1548383400000</td>\n",
" <td>01/25/19 02:30</td>\n",
" <td>2 hours, 0 minutes</td>\n",
" <td>48</td>\n",
" </tr>\n",
" <tr>\n",
" <th>95</th>\n",
" <td>SF Data Science</td>\n",
" <td>1548988200000</td>\n",
" <td>02/01/19 02:30</td>\n",
" <td>2 hours, 0 minutes</td>\n",
" <td>94</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" name time time_dt duration \\\n",
"28 SF Data Science 1539306000000 10/12/18 01:00 3 hours, 0 minutes \n",
"32 SF Data Science 1539910800000 10/19/18 01:00 2 hours, 0 minutes \n",
"40 SF Data Science 1540774800000 10/29/18 01:00 4 hours, 0 minutes \n",
"54 SF Data Science 1542211200000 11/14/18 16:00 10 hours, 0 minutes \n",
"62 SF Data Science 1543539600000 11/30/18 01:00 3 hours, 30 minutes \n",
"83 SF Data Science 1547692200000 01/17/19 02:30 2 hours, 0 minutes \n",
"84 SF Data Science 1547778600000 01/18/19 02:30 2 hours, 0 minutes \n",
"91 SF Data Science 1548383400000 01/25/19 02:30 2 hours, 0 minutes \n",
"95 SF Data Science 1548988200000 02/01/19 02:30 2 hours, 0 minutes \n",
"\n",
" yes_rsvp_count \n",
"28 15 \n",
"32 95 \n",
"40 21 \n",
"54 1 \n",
"62 18 \n",
"83 59 \n",
"84 26 \n",
"91 48 \n",
"95 94 "
]
},
"execution_count": 33,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"interesting_group_df = final_df[final_df['name'] == \"SF Data Science\"]\n",
"interesting_group_df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### A simple bar chart shows us something\n",
"\n",
"I will use the **time_dt** column vs **yes_rsvp_count** to see if people\n",
"are looking forward to new events."
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 792x432 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"plt.rcParams['figure.figsize'] = [11, 6]\n",
"\n",
"lines = interesting_group_df[['yes_rsvp_count', 'time_dt']].plot(\n",
" kind='bar', x='time_dt', y='yes_rsvp_count', style='.-')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---\n",
"\n",
"This is nice, but what is the trend, is the group active? It's tough to tell\n",
"from a simple bar chart.\n",
"\n",
"### Let's visualize the trend line\n",
"\n",
"Let's see the trend using a seaborn regplot. \n",
"\n",
"I'll use the epoch milliseconds **time** column of our dataframe, since it is numeric and can be used to generate a trend line."
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.axes._subplots.AxesSubplot at 0x7f57a5df52e8>"
]
},
"execution_count": 35,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 792x432 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"sb.regplot(x='time', y='yes_rsvp_count', data=interesting_group_df,\n",
" color=\"green\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Goal Achieved!\n",
"\n",
"At last! We found a Tech group in San Francisco, CA that:\n",
"1. is among the 10 biggest in the city\n",
"2. has a trend of more and more people RSVP-ing to their events\n",
"3. holds events occuring every few weeks.\n",
"\n",
"## Conclusion\n",
"\n",
"We achieved our objectives and demonstrated several useful techniques along the way we :\n",
"1. worked with the [Python Meetup API client]\n",
"2. built a helper function to parse response objects into **meta** and **results** dataframes\n",
"3. built a helper function to loop through multiple API calls and [concat]-enate a list of pages into a single useful dataframe\n",
"4. Used pandas.DataFrame.[query] to sort and filter data of interest\n",
"5. Used pandas.DataFrame.[apply] to clean columns of data using custom helper functions\n",
"6. Used pandas.DataFrame.[describe] to get descriptive statistics that summarize\n",
" * the central tendency\n",
" * dispersion\n",
" * shape of our dataset's distribution\n",
"7. Used pandas.DataFrame.[merge] to join the **events** and **groups** dataframes to create a report of the events for our 10 biggest mega-groups in technology\n",
"8. Found a mega-group that shows enduring and increasing interest by its membership\n",
"\n",
"\n",
"[//]: # (References)\n",
"\n",
"[Python Meetup API client]: https://meetup-api.readthedocs.io/en/latest/meetup_api.html#api-client-details\n",
"[concat]: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.concat.html\n",
"[query]: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.query.html\n",
"[apply]: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.apply.html\n",
"[describe]: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.describe.html\n",
"[merge]: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python (datascience_challenges)",
"language": "python",
"name": "datascience_challenges"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.2"
},
"toc": {
"base_numbering": 1,
"nav_menu": {},
"number_sections": false,
"sideBar": true,
"skip_h1_title": false,
"title_cell": "Table of Contents",
"title_sidebar": "Contents",
"toc_cell": true,
"toc_position": {},
"toc_section_display": true,
"toc_window_display": false
}
},
"nbformat": 4,
"nbformat_minor": 2
}