@tathagata
Created November 17, 2017 16:22
# coding: utf-8
# ## Data Analysis using Pandas
# Pandas has become the de facto package for data analysis. In this workshop, we are going to use the basics of pandas to analyze the interests of today's group. We are going to use meetup.com's API to fetch the list of interests listed in each of our meetup.com profiles. We will compute which interests are common and which are uncommon, and find out how we can use topics of common interest to form teams for project night.
#
# Let's get started by importing the essentials. You will need meetup.com's Python API and pandas installed.
# In[ ]:
get_ipython().system('pip install meetup-api')
import meetup.api
import pandas as pd
from IPython.display import Image, display, HTML
from itertools import combinations
# Next we need your meetup.com API key. You will find it at https://secure.meetup.com/meetup_api/key/. We also need today's event id. Tonight's event id is 239174132.
# In[7]:
API_KEY = '3f6d3275d3b6314e73453c4aa27'
event_id='239174132'
# The following functions use the API and load the data into a pandas dataframe: one row per member, with a 0/1 column for every topic.
# In[8]:
def get_members(event_id):
    # Fetch the RSVPs for the event, then look up the full profile of each member.
    client = meetup.api.Client(API_KEY)
    rsvps = client.GetRsvps(event_id=event_id, urlname='_ChiPy_')
    member_id = ','.join([str(i['member']['member_id']) for i in rsvps.results])
    return client.GetMembers(member_id=member_id)
def get_topics(members):
    # Collect the distinct topic names across all members.
    topics = set()
    for member in members.results:
        # Some profiles list no topics; treat them as an empty list.
        for t in member.get('topics', []):
            topics.add(t['name'])
    return list(topics)
def df_topics(event_id):
    # Build one row per member: name, id, thumbnail link, then a 0/1 flag per topic.
    members = get_members(event_id=event_id)
    topics = get_topics(members)
    columns = ['name', 'id', 'thumb_link'] + topics
    data = []
    for member in members.results:
        topic_vector = [0] * len(topics)
        for topic in member.get('topics', []):
            # Set the flag in the column that belongs to this topic.
            index = topics.index(topic['name'])
            topic_vector[index] = 1
        try:
            data.append([member['name'], member['id'], member['photo']['thumb_link']] + topic_vector)
        except KeyError:
            # Members without a photo (or another missing field) are left out.
            continue
    return pd.DataFrame(data=data, columns=columns)
#df.to_csv('output.csv', sep=";")
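# For trying the exercises without an API key, here is a minimal offline sketch of the table layout that `df_topics` builds. The member records are hypothetical stand-ins shaped like the dictionaries the code above reads (`name`, `id`, `photo`, `topics`); they are not real meetup.com data, and `sample_df` is a name introduced only for this illustration.

```python
import pandas as pd

# Hypothetical member records (illustrative only, not fetched from meetup.com).
members = [
    {"name": "Ada", "id": 1, "photo": {"thumb_link": "a.jpg"},
     "topics": [{"name": "Python"}, {"name": "Data Science"}]},
    {"name": "Bob", "id": 2, "photo": {"thumb_link": "b.jpg"},
     "topics": [{"name": "Python"}]},
]

# Distinct topic names, sorted so the column order is reproducible.
topics = sorted({t["name"] for m in members for t in m.get("topics", [])})

rows = []
for m in members:
    interested = {t["name"] for t in m.get("topics", [])}
    # One 0/1 flag per topic column, in the same order as `topics`.
    flags = [int(topic in interested) for topic in topics]
    rows.append([m["name"], m["id"], m["photo"]["thumb_link"]] + flags)

sample_df = pd.DataFrame(rows, columns=["name", "id", "thumb_link"] + topics)
print(sample_df)
```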
# ### Q1: Load data from meetup.com into a dataframe by calling df_topics.
# You'll need to call the `df_topics` function with the `event_id` and assign it to a variable to use it for the following questions.
# ### Q2: What are the column names of the dataframe?
# ### Q3: How do you check the index of the dataframe? Can you set the index of the dataframe to be the `name` column?
# ### Q4: How would you get the transpose of the dataframe?
# ### Q5: What do the first and last 10 rows of the dataset look like?
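# One possible way to approach Q2 to Q5, sketched on a small hypothetical frame with the same column layout as the output of `df_topics` (the names and topics are made up for illustration).

```python
import pandas as pd

# Hypothetical stand-in for the frame returned by df_topics.
df = pd.DataFrame(
    [
        ["Ada", 1, "a.jpg", 1, 1, 0],
        ["Bob", 2, "b.jpg", 1, 0, 1],
        ["Cy", 3, "c.jpg", 1, 1, 1],
    ],
    columns=["name", "id", "thumb_link", "Python", "Data Science", "Web"],
)

print(df.columns.tolist())       # Q2: column names
print(df.index)                  # Q3: the default index is a RangeIndex
by_name = df.set_index("name")   # Q3: returns a new frame indexed by name
print(by_name.T)                 # Q4: transpose (topics become rows)
print(by_name.head(10))          # Q5: first 10 rows (fewer if the frame is short)
print(by_name.tail(10))          # Q5: last 10 rows
```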
# ### Q6: Write the data out to a csv file. Only include names and topics. Do not include the member id and thumbnail link.
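# A hedged sketch for Q6, again on a hypothetical frame. Calling `to_csv` without a path returns the CSV text, which keeps the example free of file writes; pass a file name such as 'output.csv' to write to disk instead.

```python
import pandas as pd

# Hypothetical stand-in for the frame returned by df_topics.
df = pd.DataFrame(
    [
        ["Ada", 1, "a.jpg", 1, 1, 0],
        ["Bob", 2, "b.jpg", 1, 0, 1],
    ],
    columns=["name", "id", "thumb_link", "Python", "Data Science", "Web"],
)

# Drop the columns we do not want, and leave out the numeric row index.
csv_text = df.drop(columns=["id", "thumb_link"]).to_csv(index=False)
print(csv_text)
```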
# ### Q7: How many unique topics of interest do we have?
# ### Q8: Write a function that takes a name and gives back all of that member's interests.
# ### Q9: Write a function that takes a topic of interest and gives back the names of the members who are interested in it.
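# One way Q8 and Q9 could be written, assuming member names are unique. The function names `interests_of` and `members_interested_in` and the sample frame are hypothetical and only serve the illustration.

```python
import pandas as pd

META_COLUMNS = ["name", "id", "thumb_link"]

# Hypothetical stand-in for the frame returned by df_topics.
df = pd.DataFrame(
    [
        ["Ada", 1, "a.jpg", 1, 1, 0],
        ["Bob", 2, "b.jpg", 1, 0, 1],
        ["Cy", 3, "c.jpg", 1, 1, 1],
    ],
    columns=META_COLUMNS + ["Python", "Data Science", "Web"],
)


def interests_of(df, name):
    """Return the topics flagged 1 for the first member called `name`."""
    topic_flags = df.loc[df["name"] == name].drop(columns=META_COLUMNS)
    if topic_flags.empty:
        return []
    row = topic_flags.iloc[0]
    return row[row == 1].index.tolist()


def members_interested_in(df, topic):
    """Return the names of all members whose flag for `topic` is 1."""
    return df.loc[df[topic] == 1, "name"].tolist()


print(interests_of(df, "Ada"))
print(members_interested_in(df, "Web"))
```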
# ### Q10: Who has the highest number of topics? How many topics is he/she interested in?
# ### Q11: Which is the most common topic of interest? Which is the least popular topic of interest?
# ### Q12: Which names are associated with the topics of interest found in the previous question?
# ### Q13: Draw a plot that shows the frequency of each topic.
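# Q10 to Q12 come down to row sums and column sums of the 0/1 topic flags. The sketch below shows one way to do it on a hypothetical frame; `fans` is a helper name introduced here for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the frame returned by df_topics.
df = pd.DataFrame(
    [
        ["Ada", 1, "a.jpg", 1, 1, 0],
        ["Bob", 2, "b.jpg", 1, 0, 0],
        ["Cy", 3, "c.jpg", 1, 1, 1],
    ],
    columns=["name", "id", "thumb_link", "Python", "Data Science", "Web"],
)

# Keep only the 0/1 topic columns, indexed by member name.
topic_flags = df.set_index("name").drop(columns=["id", "thumb_link"])

topics_per_member = topic_flags.sum(axis=1)   # Q10: one count per member
busiest = topics_per_member.idxmax()

members_per_topic = topic_flags.sum(axis=0)   # Q11: one count per topic
most_common = members_per_topic.idxmax()
least_common = members_per_topic.idxmin()


def fans(topic):
    """Q12: names of the members flagged for `topic`."""
    return topic_flags.index[topic_flags[topic] == 1].tolist()


print(busiest, topics_per_member[busiest])
print(most_common, fans(most_common))
print(least_common, fans(least_common))
```

# Note that `idxmax` and `idxmin` return only the first label when there is a tie. For Q13, the same `members_per_topic` series can be drawn with `members_per_topic.plot(kind="bar")`, provided matplotlib is installed.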
# ### Q14: Are there topic(s) common to all the members of your team?
# ### Q14: Write a function that takes the names of your team members and ranks every pair by the number of topics they have in common. For a team of A, B, C and D, the output could look like:
#
# A, B - 6
# A, C - 5
# A, D - 4
# B, C - 3
# B, D - 2
# C, D - 1
#
# ### Note that the pairs are sorted by the number of topics they have in common.
#
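# A sketch of the pair ranking described above, using `itertools.combinations` to enumerate the pairs. It assumes names are unique; `rank_pairs` and the sample data are hypothetical.

```python
from itertools import combinations

import pandas as pd

# Hypothetical stand-in for the frame returned by df_topics.
df = pd.DataFrame(
    [
        ["Ada", 1, "a.jpg", 1, 1, 1, 0],
        ["Bob", 2, "b.jpg", 1, 1, 0, 0],
        ["Cy", 3, "c.jpg", 1, 0, 0, 1],
        ["Dee", 4, "d.jpg", 0, 0, 1, 1],
    ],
    columns=["name", "id", "thumb_link", "Python", "Data Science", "Web", "ML"],
)


def rank_pairs(df, names):
    """Return (name_a, name_b, common_topic_count) tuples, highest count first."""
    flags = df.set_index("name").drop(columns=["id", "thumb_link"])
    scores = []
    for a, b in combinations(names, 2):
        # A topic is shared only when both flags are 1, so the product is 1.
        common = int((flags.loc[a] * flags.loc[b]).sum())
        scores.append((a, b, common))
    # sorted() is stable, so tied pairs keep their enumeration order.
    return sorted(scores, key=lambda s: s[2], reverse=True)


for a, b, common in rank_pairs(df, ["Ada", "Bob", "Cy", "Dee"]):
    print(f"{a}, {b} - {common}")
```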
# ### Q15: Implement an algorithm that creates groups of four from the given dataframe such that each team has as many topics in common as possible.
# For example, let's say we have A, B, C, D, E, F. We can create the following teams:
#
# (A, B, C, D - 11), (E, F - 8)
# (B, C, D, E - 10), (A, F - 7)
# (C, D, E, A - 9), (B, F - 6)
# ...
#
# where each number represents the number of topics common to all members of that team.
# We then pick the first row, since it shows the team composition with the maximum number of common topics.
#
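# A brute-force sketch of one possible reading of Q15. It assumes that a split is scored by the sum of each team's common-topic count, which the exercise does not state explicitly, and that names are unique. It enumerates every split into teams of four (plus one smaller leftover team), so it is only practical for small groups. All function names and the sample data are hypothetical.

```python
from itertools import combinations

import pandas as pd


def common_topics(flags, team):
    """Count the topics that every member of `team` is flagged for."""
    return int(flags.loc[list(team)].min(axis=0).sum())


def full_partitions(names, size):
    """Yield every split of `names` into teams of exactly `size` members."""
    if not names:
        yield []
        return
    first, rest = names[0], names[1:]
    # Fixing the first name in the first team avoids duplicate splits.
    for mates in combinations(rest, size - 1):
        remaining = [n for n in rest if n not in mates]
        for tail in full_partitions(remaining, size):
            yield [(first,) + mates] + tail


def partitions(names, size):
    """Yield every split into full teams plus one smaller leftover team."""
    leftover = len(names) % size
    for small in combinations(names, leftover):
        rest = [n for n in names if n not in small]
        for split in full_partitions(rest, size):
            yield split + ([small] if small else [])


def rank_team_splits(df, size=4):
    """Return (total_score, [(team, score), ...]) tuples, best split first."""
    flags = df.set_index("name").drop(columns=["id", "thumb_link"])
    scored = []
    for split in partitions(flags.index.tolist(), size):
        scores = [common_topics(flags, team) for team in split]
        scored.append((sum(scores), list(zip(split, scores))))
    return sorted(scored, key=lambda s: s[0], reverse=True)


# Hypothetical data: A to D share two topics, E and F share one.
df = pd.DataFrame(
    [[n, i, f"{n}.jpg", 1, 1, 0] for i, n in enumerate("ABCD")]
    + [[n, i + 4, f"{n}.jpg", 0, 0, 1] for i, n in enumerate("EF")],
    columns=["name", "id", "thumb_link", "Python", "Data Science", "Web"],
)

best_total, best_teams = rank_team_splits(df)[0]
print(best_total, best_teams)
```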