@gmag11
Created December 31, 2019 13:45
Hashtag processing from www.tweetstats.com
from xml.dom import minidom
import re
import math

# Parse the saved hashtag cloud and collect every <a> element.
mydoc = minidom.parse('TweetCloud.xml')
items = mydoc.getElementsByTagName('a')

hashtags = []
for item in items:
    # The tooltip text sits between single quotes in the onmouseover attribute.
    hashtag_info = item.attributes['onmouseover'].value
    re_search = re.search("'(.*)'", hashtag_info).group(1).split(' ')
    # The first field is a number; its integer square root becomes the weight.
    re_search[0] = int(math.sqrt(int(re_search[0])))
    hashtags.append(re_search)

# Highest weight first.
hashtags.sort(reverse=True)

# The fourth field is used as the hashtag text.
hashtag_weights = {}
for hashtag in hashtags:
    hashtag_weights[hashtag[3]] = hashtag[0]

# Write one "weight hashtag" pair per line; the file is closed automatically.
with open('TweetCloud.txt', 'w') as ht_file:
    for hashtag, weight in hashtag_weights.items():
        content = str(weight) + ' ' + hashtag
        print(content)
        ht_file.write(content + '\n')
gmag11 commented Dec 31, 2019

Go to http://www.tweetstats.com/graphs/<your_twitter_id>/zoom/2019#tcloud and save the markup of the hashtag section to a file called TweetCloud.xml.
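
Since the script reads the file with `minidom`, the capture has to be well-formed XML with a single root element. A minimal sketch for checking a capture before running the script, assuming the links carry the `onmouseover` attribute the script reads. The helper name `count_tooltip_links` and the sample markup are made up for illustration; the real tweetstats markup may differ.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def count_tooltip_links(xml_text):
    """Return the number of <a> elements that carry an onmouseover attribute.

    Raises ValueError if the text is not well-formed XML.
    """
    try:
        doc = minidom.parseString(xml_text)
    except ExpatError as err:
        raise ValueError("capture is not well-formed XML: %s" % err)
    links = doc.getElementsByTagName('a')
    return sum(1 for link in links if link.hasAttribute('onmouseover'))


# Hypothetical capture (assumed tooltip format, not taken from the site).
sample = """<div>
  <a href="#" onmouseover="tip('25 tweets with #python')">#python</a>
  <a href="#" onmouseover="tip('9 tweets with #esp32')">#esp32</a>
  <a href="#">no tooltip</a>
</div>"""

print(count_tooltip_links(sample))
```

If the helper raises, the capture probably contains unclosed tags or more than one top-level element and needs to be cleaned up by hand before the script can parse it.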

The script then generates TweetCloud.txt, with one weight and hashtag per line, ready to be imported into https://www.wordclouds.com
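
To illustrate the kind of lines that end up in TweetCloud.txt, here is a small sketch that applies the same steps to an in-memory sample instead of a file. The tooltip wording (`'25 tweets with #python'`) and the helper name `hashtag_lines` are assumptions for illustration; the only things taken from the script are that the first space-separated field is a number and the fourth is the hashtag. Unlike the script, this sketch skips links without a quoted tooltip and does not merge duplicate hashtags.

```python
import math
import re
from xml.dom import minidom


def hashtag_lines(xml_text):
    """Turn tooltip markup into 'weight hashtag' lines, heaviest first."""
    doc = minidom.parseString(xml_text)
    rows = []
    for link in doc.getElementsByTagName('a'):
        tooltip = link.getAttribute('onmouseover')  # '' when absent
        match = re.search("'(.*)'", tooltip)
        if match is None:
            continue  # no quoted tooltip text on this link
        fields = match.group(1).split(' ')
        # Same fields the script reads: 0 is the number, 3 is the hashtag.
        weight = int(math.sqrt(int(fields[0])))
        rows.append((weight, fields[3]))
    rows.sort(reverse=True)
    return ["%d %s" % (weight, tag) for weight, tag in rows]


# Hypothetical capture (assumed tooltip format, not taken from the site).
sample = """<div>
  <a href="#" onmouseover="tip('9 tweets with #esp32')">#esp32</a>
  <a href="#" onmouseover="tip('25 tweets with #python')">#python</a>
  <a href="#" onmouseover="tip('2 tweets with #iot')">#iot</a>
</div>"""

for line in hashtag_lines(sample):
    print(line)
# 5 #python
# 3 #esp32
# 1 #iot
```

The square root compresses the range of weights, so a hashtag used 25 times is drawn about five times as heavy as one used once or twice, not 25 times.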