@sneg55
Created August 2, 2016 23:32
from collections import defaultdict

import pandas as pd

# Nested dictionary: counter[tld][name_length] -> number of domain names
counter = defaultdict(lambda: defaultdict(int))

# Read every domain name and count names per TLD, grouped by name length
with open('gtld.csv') as f:
    for line in f:
        domain = line.rstrip('\r\n')
        # Keep only second-level domains (exactly one dot); this skips names under co.uk and the like
        if domain.count('.') == 1:
            name, tld = domain.split('.', 1)
            counter[tld][len(name)] += 1

# Columns are TLDs, rows are name lengths
count_df = pd.DataFrame(counter)
# Keep name lengths 1 through 15 (label-based slice, inclusive on both ends)
count_df = count_df.sort_index().loc[1:15, :]
# Replace missing values (NaN) with zeros for later processing
count_df = count_df.fillna(0)

# Check how many gTLDs we have in total
print(len(counter))
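
The counting step can be tried without `gtld.csv` on a small in-memory list. The sketch below is only an illustration: the helper name `count_name_lengths` and the sample domains are made up for this example and are not part of the original gist. It assumes the same rule as the script, namely that only names with exactly one dot are counted and that the name length is the number of characters before the dot.

```python
from collections import defaultdict


def count_name_lengths(domains):
    """Count second-level domain names per TLD, grouped by name length."""
    counts = defaultdict(lambda: defaultdict(int))
    for domain in domains:
        domain = domain.strip()
        # Exactly one dot means a plain second-level name such as abc.club
        if domain.count('.') == 1:
            name, tld = domain.split('.', 1)
            counts[tld][len(name)] += 1
    return counts


# Hypothetical sample input, not taken from gtld.csv
sample = ['abc.club', 'xyz.club', 'hello.xyz', 'shop.co.uk', 'a.club']
result = count_name_lengths(sample)

# Convert the nested defaultdicts to plain dicts for readable output
print({tld: dict(lengths) for tld, lengths in result.items()})
# prints {'club': {3: 2, 1: 1}, 'xyz': {5: 1}}
```

Here `shop.co.uk` is skipped because it contains two dots, so only `club` and `xyz` appear in the result.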