@scrapehero
Last active June 2, 2021 07:02
How to scrape Historical Tweet Data from Twitter using Web Scraper Extension
{
"_id":"twitter_feed",
"startUrl":[
"https://twitter.com/search?l=&q=web%20scraping%20since%3A2018-10-01%20until%3A2018-10-05&src=typd&lang=en"
],
"selectors":[
{
"id":"tweet",
"type":"SelectorElementScroll",
"parentSelectors":[
"_root"
],
"selector":"div.tweet",
"multiple":true,
"delay":"2000"
},
{
"id":"handle",
"type":"SelectorText",
"parentSelectors":[
"tweet"
],
"selector":"span.username",
"multiple":false,
"regex":"",
"delay":0
},
{
"id":"name",
"type":"SelectorText",
"parentSelectors":[
"tweet"
],
"selector":"strong.fullname",
"multiple":false,
"regex":"",
"delay":0
},
{
"id":"content",
"type":"SelectorText",
"parentSelectors":[
"tweet"
],
"selector":".tweet-text",
"multiple":false,
"regex":"",
"delay":0
},
{
"id":"replies",
"type":"SelectorText",
"parentSelectors":[
"tweet"
],
"selector":"div.ProfileTweet-action.ProfileTweet-action--reply span.ProfileTweet-actionCountForPresentation",
"multiple":false,
"regex":"",
"delay":0
},
{
"id":"retweets",
"type":"SelectorText",
"parentSelectors":[
"tweet"
],
"selector":"div.ProfileTweet-action.ProfileTweet-action--retweet button.ProfileTweet-actionButton span.ProfileTweet-actionCountForPresentation",
"multiple":false,
"regex":"",
"delay":0
},
{
"id":"favorites",
"type":"SelectorText",
"parentSelectors":[
"tweet"
],
"selector":"div.ProfileTweet-action.ProfileTweet-action--favorite button.ProfileTweet-actionButton span.ProfileTweet-actionCountForPresentation",
"multiple":false,
"regex":"",
"delay":0
},
{
"id":"unix_timestamp",
"type":"SelectorElementAttribute",
"parentSelectors":[
"tweet"
],
"selector":"span._timestamp",
"multiple":false,
"extractAttribute":"data-time-ms",
"delay":0
},
{
"id":"published_date",
"type":"SelectorElementAttribute",
"parentSelectors":[
"tweet"
],
"selector":".time a.tweet-timestamp",
"multiple":false,
"extractAttribute":"title",
"delay":0
},
{
"id":"url",
"type":"SelectorElementAttribute",
"parentSelectors":[
"tweet"
],
"selector":"a.tweet-timestamp",
"multiple":false,
"extractAttribute":"href",
"delay":0
}
]
}
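
The startUrl above hard-codes one search (the phrase "web scraping" between 2018-10-01 and 2018-10-05), and the unix_timestamp selector returns the raw data-time-ms attribute. The following is a minimal Python sketch, not part of the original sitemap, for generating a startUrl for a different query or date range and for reading that timestamp once the data is exported. The function names build_search_url and ms_to_utc are hypothetical, and treating data-time-ms as milliseconds since the Unix epoch is an assumption based on the attribute name.

```python
from datetime import datetime, timezone
from urllib.parse import quote


def build_search_url(query, since, until, lang="en"):
    """Build a search URL in the same shape as the sitemap's startUrl.

    `since` and `until` are YYYY-MM-DD strings. The query-string layout
    (l=, q=, src=typd, lang=) is copied from the startUrl in the sitemap.
    """
    full_query = f"{query} since:{since} until:{until}"
    # safe="" percent-encodes ":" as %3A and spaces as %20, as in the sitemap.
    encoded = quote(full_query, safe="")
    return f"https://twitter.com/search?l=&q={encoded}&src=typd&lang={lang}"


def ms_to_utc(value):
    """Convert a scraped data-time-ms value to a timezone-aware UTC datetime.

    Assumption: the value is milliseconds since the Unix epoch.
    """
    return datetime.fromtimestamp(int(value) / 1000, tz=timezone.utc)


if __name__ == "__main__":
    print(build_search_url("web scraping", "2018-10-01", "2018-10-05"))
    print(ms_to_utc("1538352000000").isoformat())
```

Paste the generated URL into the "startUrl" array before importing the sitemap into the extension; the selectors do not need to change.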