@kikiliu
Created April 8, 2014 06:11
Extract_posts
{
"metadata": {
"name": ""
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
"cell_type": "code",
"collapsed": false,
"input": [
"import pandas as pd\n",
"from pandas import Series, DataFrame"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 5
},
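{
"cell_type": "code",
"collapsed": false,
"input": [
"import facebook\n",
"import requests\n",
"# fbtoken appears to be a local module (not a published package) that defines `token`, the Graph API access token\n",
"import fbtoken\n",
"# Graph API Explorer query this notebook is based on:\n",
"# https://developers.facebook.com/tools/explorer/145634995501895/?method=GET&path=me%2Fposts%3Ffields%3Dmessage%26limit%3D50&version="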
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 6
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"assert fbtoken.token is not None"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 7
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import urlparse\n",
"import urllib\n",
"import json"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 14
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"def set_url_parameter(url, name, value):\n",
"    \"\"\"Return `url` with the query parameter `name` set to `value`, replacing any existing value.\"\"\"\n",
"    uri = urlparse.urlparse(url)\n",
"\n",
"    # Parse uri.query from a string into a dict that maps each name to a list of values.\n",
"    # parse_qs drops blank values by default, so empty placeholders such as 'limit=' are discarded here.\n",
"    query_dict = urlparse.parse_qs(uri.query)\n",
"    query_dict[name] = value\n",
"\n",
"    # See e.g. http://stackoverflow.com/a/10233141/7782\n",
"    # doseq=True makes urlencode emit one name=value pair per element of each list\n",
"    # instead of percent-encoding the string form of the whole list.\n",
"    query = urllib.urlencode(query_dict, doseq=True)\n",
"\n",
"    # Rebuild the 6-tuple by field name rather than by index (avoids the magic number [4] for the query).\n",
"    uri = (uri.scheme, uri.netloc, uri.path, uri.params, query, uri.fragment)\n",
"\n",
"    # Construct a URL from a tuple as returned by urlparse()\n",
"    return urlparse.urlunparse(uri)"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 9
},
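{
"cell_type": "code",
"collapsed": false,
"input": [
"def get_posts(posts_limit):\n",
"    \"\"\"Fetch up to `posts_limit` of the current user's posts (message field only) from the Graph API.\"\"\"\n",
"    # Template URL; the empty 'limit' and 'access_token' values are filled in below.\n",
"    url_posts = \"https://graph.facebook.com/me/posts?fields=message&limit=&method=GET&format=json&suppress_http_code=1&access_token=\"\n",
"    url = set_url_parameter(url_posts, 'access_token', fbtoken.token)\n",
"    url = set_url_parameter(url, 'limit', posts_limit)\n",
"    response = requests.get(url).content\n",
"    json_data = json.loads(response)\n",
"\n",
"    # With suppress_http_code=1 the API answers HTTP 200 even on failure, so a missing\n",
"    # \"data\" key (KeyError below) usually means the body carries an error object instead.\n",
"    return json_data[\"data\"]\n",
"\n",
"posts = get_posts(50)"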
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 19
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"pd.DataFrame(posts)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>created_time</th>\n",
" <th>id</th>\n",
" <th>message</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0 </th>\n",
" <td> 2014-04-08T01:36:32+0000</td>\n",
" <td> 615201382_10152400022766383</td>\n",
" <td> Thanks to Chris Moshe and Lily, I finally make...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1 </th>\n",
" <td> 2014-04-01T21:43:39+0000</td>\n",
" <td> 615201382_10152387632741383</td>\n",
" <td> I just learn the news from my open data class ...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2 </th>\n",
" <td> 2014-04-01T03:19:20+0000</td>\n",
" <td> 615201382_10152386174281383</td>\n",
" <td> Thank you all for your warm wishes and your wo...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3 </th>\n",
" <td> 2014-03-29T15:13:55+0000</td>\n",
" <td> 615201382_10152380835206383</td>\n",
" <td> I'm always curious about where programmers go ...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4 </th>\n",
" <td> 2014-03-29T02:12:39+0000</td>\n",
" <td> 615201382_10152379917316383</td>\n",
" <td> One thing we can frequently see: Rainbow. On o...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5 </th>\n",
" <td> 2014-03-24T17:58:43+0000</td>\n",
" <td> 615201382_10152371207451383</td>\n",
" <td> A sad day #RIP It may take years to investigat...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6 </th>\n",
" <td> 2014-03-18T17:03:33+0000</td>\n",
" <td> 615201382_10152358426636383</td>\n",
" <td> I don't trust goldman but have faith in Musk. ...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7 </th>\n",
" <td> 2014-03-13T05:56:58+0000</td>\n",
" <td> 615201382_10152346519641383</td>\n",
" <td> Kind of obsessed on this. Pls be aliens. And p...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8 </th>\n",
" <td> 2014-03-11T05:56:03+0000</td>\n",
" <td> 615201382_10152342641006383</td>\n",
" <td> What do you guys think of the new design?</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9 </th>\n",
" <td> 2014-03-10T21:17:35+0000</td>\n",
" <td> 615201382_10152341715456383</td>\n",
" <td> Glad that they both escape unscratched. @my Sk...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10</th>\n",
" <td> 2014-03-09T04:04:31+0000</td>\n",
" <td> 615201382_10152338035136383</td>\n",
" <td> Expecting a wonderful performance from Vienna ...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>11</th>\n",
" <td> 2014-03-03T18:07:35+0000</td>\n",
" <td> 615201382_10152325357666383</td>\n",
" <td> Honestly, I desperately need a skydiving after...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>12</th>\n",
" <td> 2014-02-28T15:53:57+0000</td>\n",
" <td> 615201382_10152318529656383</td>\n",
" <td> NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>13</th>\n",
" <td> 2014-02-28T06:12:32+0000</td>\n",
" <td> 615201382_10152317794896383</td>\n",
" <td> Self-destruct dude! Beat iPhone 6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14</th>\n",
" <td> 2014-02-26T05:19:00+0000</td>\n",
" <td> 615201382_10152313649851383</td>\n",
" <td> \"#Datamining is Art\" #quotefromSheyas</td>\n",
" </tr>\n",
" <tr>\n",
" <th>15</th>\n",
" <td> 2014-02-25T22:32:30+0000</td>\n",
" <td> 615201382_10152313027176383</td>\n",
" <td> Thinking about \"Dark internet\" in House of Car...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>16</th>\n",
" <td> 2014-02-23T22:10:00+0000</td>\n",
" <td> 615201382_10152308461996383</td>\n",
" <td> Hey family and friends: I'm setting up a group...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>17</th>\n",
" <td> 2014-02-19T23:21:57+0000</td>\n",
" <td> 615201382_10152300323941383</td>\n",
" <td> NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>18</th>\n",
" <td> 2014-02-15T16:31:48+0000</td>\n",
" <td> 615201382_10152290609641383</td>\n",
" <td> Allow me to forwards it to MS people: Your chi...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>19</th>\n",
" <td> 2014-02-15T00:49:40+0000</td>\n",
" <td> 615201382_10152289283606383</td>\n",
" <td> Anyone applied Bahamas visa before? Need some ...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>20</th>\n",
" <td> 2014-02-08T06:18:39+0000</td>\n",
" <td> 615201382_10152274887236383</td>\n",
" <td> My photoshop homework for Valentine's Day! Fee...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>21</th>\n",
" <td> 2014-02-08T03:00:22+0000</td>\n",
" <td> 615201382_10152274613241383</td>\n",
" <td> NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>22</th>\n",
" <td> 2014-02-07T03:43:03+0000</td>\n",
" <td> 615201382_10152272546291383</td>\n",
" <td> NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>23</th>\n",
" <td> 2014-02-06T06:09:44+0000</td>\n",
" <td> 615201382_10152270244311383</td>\n",
" <td> It reminds me of the bizarre double toilets in...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>24</th>\n",
" <td> 2014-02-04T16:25:01+0000</td>\n",
" <td> 615201382_10152265357156383</td>\n",
" <td> Everyone's happy? Not as suggested in stock pr...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25</th>\n",
" <td> 2014-02-01T17:24:20+0000</td>\n",
" <td> 615201382_10152258404711383</td>\n",
" <td> NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>26</th>\n",
" <td> 2014-01-30T00:04:32+0000</td>\n",
" <td> 615201382_10152252791841383</td>\n",
" <td> freaking awesome. Henry</td>\n",
" </tr>\n",
" <tr>\n",
" <th>27</th>\n",
" <td> 2014-01-19T20:11:40+0000</td>\n",
" <td> 615201382_10152231090566383</td>\n",
" <td> Due to runway closure at OAK, Alaska has to la...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>28</th>\n",
" <td> 2014-01-16T06:48:57+0000</td>\n",
" <td> 615201382_10152223330066383</td>\n",
" <td> NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>29</th>\n",
" <td> 2014-01-10T07:44:42+0000</td>\n",
" <td> 615201382_10152211081026383</td>\n",
" <td> My oh my. Does foreflight the app know this? W...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>30</th>\n",
" <td> 2014-01-10T04:21:43+0000</td>\n",
" <td> 615201382_10152210882651383</td>\n",
" <td> crowdsourcing always comes with integrity prob...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>31</th>\n",
" <td> 2014-01-09T23:43:09+0000</td>\n",
" <td> 615201382_10152210487506383</td>\n",
" <td> Man, I was at those places years ago. How can ...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>32</th>\n",
" <td> 2013-12-29T18:53:34+0000</td>\n",
" <td> 615201382_10152184886461383</td>\n",
" <td> NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>33</th>\n",
" <td> 2013-12-25T20:01:32+0000</td>\n",
" <td> 615201382_10152175537386383</td>\n",
" <td> Merry Christmas my friends! Thanks for sharing...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>34</th>\n",
" <td> 2013-12-23T21:33:07+0000</td>\n",
" <td> 615201382_10152170961036383</td>\n",
" <td> Embrace civilization (4G/LTE) after 3-day escape.</td>\n",
" </tr>\n",
" <tr>\n",
" <th>35</th>\n",
" <td> 2013-12-23T07:49:04+0000</td>\n",
" <td> 615201382_523230197791392</td>\n",
" <td> NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>36</th>\n",
" <td> 2013-12-20T20:27:12+0000</td>\n",
" <td> 615201382_10152164760221383</td>\n",
" <td> Let it snow, let it snow, let it snow...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>37</th>\n",
" <td> 2013-12-19T00:04:31+0000</td>\n",
" <td> 615201382_10152161121676383</td>\n",
" <td> NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>38</th>\n",
" <td> 2013-12-18T03:43:38+0000</td>\n",
" <td> 615201382_10152159506081383</td>\n",
" <td> NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>39</th>\n",
" <td> 2013-12-18T03:15:58+0000</td>\n",
" <td> 615201382_376501735820465</td>\n",
" <td> NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>40</th>\n",
" <td> 2013-12-16T21:34:28+0000</td>\n",
" <td> 615201382_701178206574019</td>\n",
" <td> NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>41</th>\n",
" <td> 2013-12-14T00:46:32+0000</td>\n",
" <td> 615201382_10152149250041383</td>\n",
" <td> Be good to santa: use URL Shortener. Good job ...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>42</th>\n",
" <td> 2013-12-13T06:04:05+0000</td>\n",
" <td> 615201382_10152147542591383</td>\n",
" <td> NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>43</th>\n",
" <td> 2013-12-11T18:55:32+0000</td>\n",
" <td> 615201382_10152144356726383</td>\n",
" <td> Hate to debug, no matter what language</td>\n",
" </tr>\n",
" <tr>\n",
" <th>44</th>\n",
" <td> 2013-12-07T05:15:09+0000</td>\n",
" <td> 615201382_10152133179656383</td>\n",
" <td> yeah, import svm will do;)</td>\n",
" </tr>\n",
" <tr>\n",
" <th>45</th>\n",
" <td> 2013-12-04T15:27:12+0000</td>\n",
" <td> 615201382_10152126221951383</td>\n",
" <td> yesterday when we talked about bit coin as FX,...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>46</th>\n",
" <td> 2013-12-04T02:16:14+0000</td>\n",
" <td> 615201382_10152125011516383</td>\n",
" <td> Less and less people show up at class, esp. ev...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>47</th>\n",
" <td> 2013-11-30T20:47:15+0000</td>\n",
" <td> 615201382_10152117442236383</td>\n",
" <td> Two days in row. Not impressive:(</td>\n",
" </tr>\n",
" <tr>\n",
" <th>48</th>\n",
" <td> 2013-11-30T00:48:37+0000</td>\n",
" <td> 615201382_10152115742191383</td>\n",
" <td> NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>49</th>\n",
" <td> 2013-11-28T23:05:50+0000</td>\n",
" <td> 615201382_10152113398016383</td>\n",
" <td> Ready for the mountain!</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>50 rows \u00d7 3 columns</p>\n",
"</div>"
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 21,
"text": [
" created_time id \\\n",
"0 2014-04-08T01:36:32+0000 615201382_10152400022766383 \n",
"1 2014-04-01T21:43:39+0000 615201382_10152387632741383 \n",
"2 2014-04-01T03:19:20+0000 615201382_10152386174281383 \n",
"3 2014-03-29T15:13:55+0000 615201382_10152380835206383 \n",
"4 2014-03-29T02:12:39+0000 615201382_10152379917316383 \n",
"5 2014-03-24T17:58:43+0000 615201382_10152371207451383 \n",
"6 2014-03-18T17:03:33+0000 615201382_10152358426636383 \n",
"7 2014-03-13T05:56:58+0000 615201382_10152346519641383 \n",
"8 2014-03-11T05:56:03+0000 615201382_10152342641006383 \n",
"9 2014-03-10T21:17:35+0000 615201382_10152341715456383 \n",
"10 2014-03-09T04:04:31+0000 615201382_10152338035136383 \n",
"11 2014-03-03T18:07:35+0000 615201382_10152325357666383 \n",
"12 2014-02-28T15:53:57+0000 615201382_10152318529656383 \n",
"13 2014-02-28T06:12:32+0000 615201382_10152317794896383 \n",
"14 2014-02-26T05:19:00+0000 615201382_10152313649851383 \n",
"15 2014-02-25T22:32:30+0000 615201382_10152313027176383 \n",
"16 2014-02-23T22:10:00+0000 615201382_10152308461996383 \n",
"17 2014-02-19T23:21:57+0000 615201382_10152300323941383 \n",
"18 2014-02-15T16:31:48+0000 615201382_10152290609641383 \n",
"19 2014-02-15T00:49:40+0000 615201382_10152289283606383 \n",
"20 2014-02-08T06:18:39+0000 615201382_10152274887236383 \n",
"21 2014-02-08T03:00:22+0000 615201382_10152274613241383 \n",
"22 2014-02-07T03:43:03+0000 615201382_10152272546291383 \n",
"23 2014-02-06T06:09:44+0000 615201382_10152270244311383 \n",
"24 2014-02-04T16:25:01+0000 615201382_10152265357156383 \n",
"25 2014-02-01T17:24:20+0000 615201382_10152258404711383 \n",
"26 2014-01-30T00:04:32+0000 615201382_10152252791841383 \n",
"27 2014-01-19T20:11:40+0000 615201382_10152231090566383 \n",
"28 2014-01-16T06:48:57+0000 615201382_10152223330066383 \n",
"29 2014-01-10T07:44:42+0000 615201382_10152211081026383 \n",
"30 2014-01-10T04:21:43+0000 615201382_10152210882651383 \n",
"31 2014-01-09T23:43:09+0000 615201382_10152210487506383 \n",
"32 2013-12-29T18:53:34+0000 615201382_10152184886461383 \n",
"33 2013-12-25T20:01:32+0000 615201382_10152175537386383 \n",
"34 2013-12-23T21:33:07+0000 615201382_10152170961036383 \n",
"35 2013-12-23T07:49:04+0000 615201382_523230197791392 \n",
"36 2013-12-20T20:27:12+0000 615201382_10152164760221383 \n",
"37 2013-12-19T00:04:31+0000 615201382_10152161121676383 \n",
"38 2013-12-18T03:43:38+0000 615201382_10152159506081383 \n",
"39 2013-12-18T03:15:58+0000 615201382_376501735820465 \n",
"40 2013-12-16T21:34:28+0000 615201382_701178206574019 \n",
"41 2013-12-14T00:46:32+0000 615201382_10152149250041383 \n",
"42 2013-12-13T06:04:05+0000 615201382_10152147542591383 \n",
"43 2013-12-11T18:55:32+0000 615201382_10152144356726383 \n",
"44 2013-12-07T05:15:09+0000 615201382_10152133179656383 \n",
"45 2013-12-04T15:27:12+0000 615201382_10152126221951383 \n",
"46 2013-12-04T02:16:14+0000 615201382_10152125011516383 \n",
"47 2013-11-30T20:47:15+0000 615201382_10152117442236383 \n",
"48 2013-11-30T00:48:37+0000 615201382_10152115742191383 \n",
"49 2013-11-28T23:05:50+0000 615201382_10152113398016383 \n",
"\n",
" message \n",
"0 Thanks to Chris Moshe and Lily, I finally make... \n",
"1 I just learn the news from my open data class ... \n",
"2 Thank you all for your warm wishes and your wo... \n",
"3 I'm always curious about where programmers go ... \n",
"4 One thing we can frequently see: Rainbow. On o... \n",
"5 A sad day #RIP It may take years to investigat... \n",
"6 I don't trust goldman but have faith in Musk. ... \n",
"7 Kind of obsessed on this. Pls be aliens. And p... \n",
"8 What do you guys think of the new design? \n",
"9 Glad that they both escape unscratched. @my Sk... \n",
"10 Expecting a wonderful performance from Vienna ... \n",
"11 Honestly, I desperately need a skydiving after... \n",
"12 NaN \n",
"13 Self-destruct dude! Beat iPhone 6 \n",
"14 \"#Datamining is Art\" #quotefromSheyas \n",
"15 Thinking about \"Dark internet\" in House of Car... \n",
"16 Hey family and friends: I'm setting up a group... \n",
"17 NaN \n",
"18 Allow me to forwards it to MS people: Your chi... \n",
"19 Anyone applied Bahamas visa before? Need some ... \n",
"20 My photoshop homework for Valentine's Day! Fee... \n",
"21 NaN \n",
"22 NaN \n",
"23 It reminds me of the bizarre double toilets in... \n",
"24 Everyone's happy? Not as suggested in stock pr... \n",
"25 NaN \n",
"26 freaking awesome. Henry \n",
"27 Due to runway closure at OAK, Alaska has to la... \n",
"28 NaN \n",
"29 My oh my. Does foreflight the app know this? W... \n",
"30 crowdsourcing always comes with integrity prob... \n",
"31 Man, I was at those places years ago. How can ... \n",
"32 NaN \n",
"33 Merry Christmas my friends! Thanks for sharing... \n",
"34 Embrace civilization (4G/LTE) after 3-day escape. \n",
"35 NaN \n",
"36 Let it snow, let it snow, let it snow... \n",
"37 NaN \n",
"38 NaN \n",
"39 NaN \n",
"40 NaN \n",
"41 Be good to santa: use URL Shortener. Good job ... \n",
"42 NaN \n",
"43 Hate to debug, no matter what language \n",
"44 yeah, import svm will do;) \n",
"45 yesterday when we talked about bit coin as FX,... \n",
"46 Less and less people show up at class, esp. ev... \n",
"47 Two days in row. Not impressive:( \n",
"48 NaN \n",
"49 Ready for the mountain! \n",
"\n",
"[50 rows x 3 columns]"
]
}
],
"prompt_number": 21
},
{
"cell_type": "code",
"collapsed": false,
"input": [],
"language": "python",
"metadata": {},
"outputs": []
}
],
"metadata": {}
}
]
}
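
The notebook above targets Python 2: the `urlparse` module and `urllib.urlencode` do not exist under those names in Python 3. As a rough sketch only, and not the author's code, the same `set_url_parameter` helper could be written against `urllib.parse` as shown below. The shortened template URL and the `PLACEHOLDER_TOKEN` value are assumptions made for illustration; no request is sent.

```python
from urllib.parse import parse_qs, urlencode, urlparse, urlunparse


def set_url_parameter(url, name, value):
    """Return url with the query parameter `name` set to `value`."""
    parts = urlparse(url)
    # parse_qs drops blank values by default, so empty placeholders
    # such as "limit=" disappear here and are re-added only when set.
    query = parse_qs(parts.query)
    query[name] = [str(value)]
    # urlparse returns a namedtuple, so _replace swaps in the new query
    # without indexing into the tuple by position.
    return urlunparse(parts._replace(query=urlencode(query, doseq=True)))


# Hypothetical values for illustration only.
template = "https://graph.facebook.com/me/posts?fields=message&limit=&access_token="
url = set_url_parameter(template, "limit", 50)
url = set_url_parameter(url, "access_token", "PLACEHOLDER_TOKEN")
print(url)
```

Because blank values are dropped during parsing, the parameters appear in the order they are set rather than in the order of the template.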