@justinlittman
justinlittman / weibo.json
Created May 12, 2017 14:03
A sample weibo
{
"created_at": "Thu May 11 15:43:13 +0800 2017",
"id": 4106245138832598,
"mid": "4106245138832598",
"idstr": "4106245138832598",
"text": "#民生服务# 【中央机关公开遴选选调360名公务员】2017年中央机关公开遴选和公开选调公务员工作今日开始报名。此次公开遴选和公开选调共有56个中央机关参加,计划选拔360名公务员。[心]报名时间截止5月22日18:00。[心]笔试时间为2017年6月25日,[心]考试地点设在北京、上海、西安、兰州等17个城市。详情 ​",
"textLength": 314,
"source_allowclick": 0,
"source_type": 1,
"source": "<a href=\"http://app.weibo.com/t/feed/6ghA0p\" rel=\"nofollow\">搜狗高速浏览器</a>",
{
"contributors": null,
"truncated": true,
"text": "@justin_littman Some of the changes went live. This is going to be an example for a blog post I'm writing that will… https://t.co/Hq4h61I3FX",
"is_quote_status": false,
"in_reply_to_status_id": 839526473534959600,
"id": 847804888365117400,
"favorite_count": 0,
"source": "<a href=\"http://twitter.com\" rel=\"nofollow\">Twitter Web Client</a>",
"retweeted": false,
{
"contributors": null,
"truncated": false,
"is_quote_status": false,
"in_reply_to_status_id": 839526473534959600,
"id": 847804888365117400,
"favorite_count": 0,
"full_text": "@justin_littman Some of the changes went live. This is going to be an example for a blog post I'm writing that will be available at: https://t.co/MfQy5wTWBc",
"source": "<a href=\"http://twitter.com\" rel=\"nofollow\">Twitter Web Client</a>",
"retweeted": false,
{
"contributors": null,
"truncated": true,
"text": "@justin_littman Some of the changes went live. This is going to be an example for a blog post I'm writing that will… https://t.co/Hq4h61I3FX",
"is_quote_status": false,
"in_reply_to_status_id": 839526473534959600,
"id": 847804888365117400,
"favorite_count": 0,
"source": "<a href=\"http://twitter.com\" rel=\"nofollow\">Twitter Web Client</a>",
"retweeted": false,
justinlittman / urls.txt
Created July 11, 2016 15:59
List of URLs extracted from #PulseNightclub Twitter collection
This file has been truncated, but you can view the full file.
33417 https://trumprally.org/
23026 http://www.snappytv.com/tc/2149352
21674 http://www.snappytv.com/tc/2148868
10665 https://www.oneblood.org/
10339 https://amp.twimg.com/v/e902fbec-bc94-4d9d-bf12-182a7f617057
10289 https://www.gofundme.com/PulseVictimsFund
8949 http://www.dailymail.co.uk/news/article-3638622/Orlando-clubber-30-sent-heartbreaking-texts-mother-trapped-bathroom-Pulse-massacre-confirmed-one-50-killed.html?ito=social-twitter_dailymailus
8764 http://www.snappytv.com/tc/2137321
8655 http://www.cbsnews.com/live/
7956 https://amp.twimg.com/v/b533d5ff-27f5-435d-9111-c0598b7e709c
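Each line of the list above is a count followed by a URL. A minimal sketch of reading that layout and totalling the counts per host is shown here. It assumes every line is `<count> <url>` separated by whitespace, as in the sample; `parse_line` and `counts_by_host` are hypothetical helper names, and `SAMPLE` repeats only the first four lines.

```python
from collections import Counter
from urllib.parse import urlsplit

# First four lines of the list above.
SAMPLE = """\
33417 https://trumprally.org/
23026 http://www.snappytv.com/tc/2149352
21674 http://www.snappytv.com/tc/2148868
10665 https://www.oneblood.org/
"""


def parse_line(line: str) -> tuple[int, str]:
    """Split one '<count> <url>' line into an integer and a URL."""
    count, url = line.split(maxsplit=1)
    return int(count), url.strip()


def counts_by_host(lines) -> Counter:
    """Sum the per-URL counts for each hostname."""
    totals = Counter()
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        count, url = parse_line(line)
        totals[urlsplit(url).hostname] += count
    return totals


host_totals = counts_by_host(SAMPLE.splitlines())
for host, total in host_totals.most_common():
    print(total, host)
```

Grouping by host shows when several distinct URLs from one site together outweigh the single most frequent URL.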
justinlittman / WARC request record
Created May 21, 2016 00:03
WARC record header and HTTP message from a WARC request record.
WARC/1.0
WARC-Type: request
WARC-Record-ID: <urn:uuid:d150a270-041b-49d5-845a-58f0f334ea80>
WARC-Date: 2016-05-20T19:28:18Z
WARC-Target-URI: https://api.twitter.com/1.1/statuses/user_timeline.json?count=200&max_id=682336123457155073&user_id=216776631
WARC-Concurrent-To: <urn:uuid:6a8b0aaa-d40c-47ac-a4d0-35cddb7cbe83>
WARC-Block-Digest: sha1:05a8973dda59c50b8c0c1343d2286443a1352387
Content-Type: application/http;msgtype=request
Content-Length: 575
justinlittman / WARC response record
Last active June 13, 2016 15:50
WARC record header and HTTP message header from a WARC response record.
WARC/1.0
WARC-Type: response
WARC-Record-ID: <urn:uuid:6a8b0aaa-d40c-47ac-a4d0-35cddb7cbe83>
WARC-Date: 2016-05-20T19:28:18Z
WARC-Target-URI: https://api.twitter.com/1.1/statuses/user_timeline.json?count=200&max_id=682336123457155073&user_id=216776631
WARC-IP-Address: 199.16.156.199
Content-Type: application/http;msgtype=response
Content-Length: 12584
WARC-Block-Digest: sha1:d4f5ddcfbe1c814fdee445ff145abebf22411bf8
WARC-Payload-Digest: sha1:c952c2176ccf15f7ecb604be6b58390491bbfe40
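The request and response records above are linked: the request's `WARC-Concurrent-To` value is the response's `WARC-Record-ID`. A minimal sketch of checking that link follows. `parse_warc_header` is a hypothetical helper that handles only simple `Name: value` lines like the ones shown; it does not read record bodies or folded header lines, so a dedicated WARC library is a better fit for real files. The two header strings are abbreviated copies of the samples.

```python
def parse_warc_header(block: str) -> dict:
    """Parse a WARC record header block into a dict of field values."""
    lines = block.strip().splitlines()
    fields = {"version": lines[0].strip()}  # first line, e.g. "WARC/1.0"
    for line in lines[1:]:
        if not line.strip():
            break  # a blank line ends the header
        # Split on the first colon only; values such as
        # <urn:uuid:...> contain colons of their own.
        name, _, value = line.partition(":")
        fields[name.strip()] = value.strip()
    return fields


REQUEST_HEADER = """\
WARC/1.0
WARC-Type: request
WARC-Record-ID: <urn:uuid:d150a270-041b-49d5-845a-58f0f334ea80>
WARC-Date: 2016-05-20T19:28:18Z
WARC-Concurrent-To: <urn:uuid:6a8b0aaa-d40c-47ac-a4d0-35cddb7cbe83>
Content-Length: 575
"""

RESPONSE_HEADER = """\
WARC/1.0
WARC-Type: response
WARC-Record-ID: <urn:uuid:6a8b0aaa-d40c-47ac-a4d0-35cddb7cbe83>
WARC-Date: 2016-05-20T19:28:18Z
Content-Length: 12584
"""

request = parse_warc_header(REQUEST_HEADER)
response = parse_warc_header(RESPONSE_HEADER)

# The request points at the response it was recorded alongside.
print(request["WARC-Concurrent-To"] == response["WARC-Record-ID"])
```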
{
"contributors": null,
"truncated": false,
"text": "We have got to tell corporate America that if they want us to buy their products, they damn well better manufacture them in America.",
"is_quote_status": false,
"in_reply_to_status_id": null,
"id": 733689250588168192,
"favorite_count": 3003,
"source": "<a href=\"https://about.twitter.com/products/tweetdeck\" rel=\"nofollow\">TweetDeck</a>",
"retweeted": false,
justinlittman / weibo.json
Created March 18, 2016 13:30
A post from Sina Weibo
{
"reposts_count": 543,
"biz_feature": 0,
"truncated": false,
"text": "\u771f\u597d //@\u7cbd\u7cbd\u7cbd\u7cbd\u7cbd\u7cbd\u7cbd:\u7fa1\u6155 //@\u5c38\u59d0:\u771f\u597d\u554a\uff0c\u591a\u5c11\u4eba\u66fe\u548c\u597d\u59d0\u59b9\u8bb8\u8fc7\u8fd9\u6837\u7684\u613f\u671b\uff0c\u6709\u51e0\u4e2a\u771f\u6b63\u5b9e\u73b0\u4e86\u5462 //@\u9ec4\u660f\u5c11\u5973\u82b1\u8a00\u521d:\u7fa1\u6155 //@\u5c81\u534e\u897f\u98ce:\u7fa1\u6155",
"pid": 3952964537875184,
"visible": {
"type": 0,
"list_id": 0
},
justinlittman / output.txt
Created October 14, 2015 19:31
Recording Flickr API calls to WARC using warcprox
2015-10-14 15:24:45,556 90037 INFO MainThread warcprox.dedup.DedupDb.__init__(dedup.py:25) creating new deduplication database ./warcprox-dedup.db
2015-10-14 15:24:45,563 90037 INFO MainThread warcprox.warcprox.WarcProxy.server_activate(warcprox.py:265) WarcProxy listening on 127.0.0.1:8000
2015-10-14 15:24:45,564 90037 INFO MainThread warcprox.warcwriter.WarcWriter.__init__(warcwriter.py:50) warc destination directory ./warcs doesn't exist, creating it
2015-10-14 15:24:45,564 90037 INFO MainThread warcprox.controller.WarcproxController.run_until_shutdown(controller.py:58) SIGTERM will initiate graceful shutdown
2015-10-14 15:24:45,565 90037 INFO WarcWriterThread warcprox.warcwriter.WarcWriterThread.run(warcwriter.py:273) WarcWriterThread starting, directory=/Users/justinlittman/Data/sfm3/blog_examples/warcs gzip=False rollover_size=1000000000 rollover_idle_time=None prefix=WARCPROX port=8000
2015-10-14 15:24:50,601 90037 INFO Thread-1 warcprox.warcprox.WarcProxyHandler.log_message(mitmproxy.py:140) WarcProxy
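The log lines above share one layout: timestamp, process ID, level, thread name, `function(file:line)`, then the message. A minimal sketch of splitting a line into those parts is shown here. The pattern is an assumption inferred from the sample output, not taken from warcprox itself, and `parse_log_line` is a hypothetical helper name.

```python
import re
from datetime import datetime

# Layout inferred from the sample output above (an assumption):
# "<date> <time>,<ms> <pid> <LEVEL> <thread> <function>(<file>:<line>) <message>"
LOG_PATTERN = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<pid>\d+) (?P<level>[A-Z]+) (?P<thread>\S+) "
    r"(?P<function>\S+)\((?P<file>[^:()]+):(?P<line>\d+)\) "
    r"(?P<message>.*)$"
)


def parse_log_line(line: str):
    """Return a dict of log fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["timestamp"] = datetime.strptime(
        fields["timestamp"], "%Y-%m-%d %H:%M:%S,%f"
    )
    fields["pid"] = int(fields["pid"])
    fields["line"] = int(fields["line"])
    return fields


# Second line of the sample output above.
SAMPLE_LINE = (
    "2015-10-14 15:24:45,563 90037 INFO MainThread "
    "warcprox.warcprox.WarcProxy.server_activate(warcprox.py:265) "
    "WarcProxy listening on 127.0.0.1:8000"
)

parsed = parse_log_line(SAMPLE_LINE)
print(parsed["level"], parsed["function"], parsed["message"])
```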