@stantonk
Last active July 11, 2022 22:36
Updated list of valid TLDs (Top Level Domains). Python script grabs the IANA database and formats it into a set datatype.
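# requirements.txt
# pip install requests
# pip install BeautifulSoup4
#
# Note: this script is written for Python 2.
import codecs

import requests
from bs4 import BeautifulSoup

PER_LINE = 12  # number of TLDs written per line of the generated file

# Fetch the IANA root zone database page and collect the link text
# (e.g. u'.com') from every anchor in the TLD table.
text = requests.get('http://www.iana.org/domains/root/db').text
soup = BeautifulSoup(text, 'html.parser')
x = soup.find('table', {'id': 'tld-table'})
tlds = [anchor.text for anchor in x.find_all('a')]

# Write the TLDs out as a Python module that defines a TLDS set.
with codecs.open('legal_tlds.py', 'w', encoding='utf8') as f:
    f.write(u'# -*- coding: utf-8 -*-\n')
    f.write(u'TLDS = set([\n')
    for i, tld in enumerate(tlds):
        print tld.encode('utf8')
        tld = tld.lstrip('.')
        # Start of a new row in the generated file.
        if i % PER_LINE == 0:
            f.write(' ')
        f.write(u'\'%s\',' % tld)
        # End the row after PER_LINE entries, otherwise separate with a space.
        if i % PER_LINE == (PER_LINE - 1):
            f.write('\n')
        else:
            f.write(' ')
    f.write(u'])\n')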
@Jiali-Qi

print tld.encode('utf8')
^
SyntaxError: invalid syntax

Do you know why the error occurs?

@stantonk
Author

It's Python 2 syntax. In Python 3, print is a function, so the call needs parentheses.
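For anyone hitting the same error, here is a rough sketch of how the write loop might look on Python 3. It is not part of the gist: the helper name write_tlds, its per_line parameter, the four-space row indent, and the sample list are all made up for illustration.

```python
import io

PER_LINE = 12  # TLDs per row, as in the script above


def write_tlds(tlds, f, per_line=PER_LINE):
    """Write a `TLDS = set([...])` literal for `tlds` to the text file `f`."""
    f.write('TLDS = set([\n')
    for i, tld in enumerate(tlds):
        print(tld)  # Python 3: print is a function; no .encode() needed
        tld = tld.lstrip('.')
        if i % per_line == 0:
            f.write('    ')  # indent the start of each row
        f.write("'%s'," % tld)
        # End the row after per_line entries, otherwise separate with a space.
        f.write('\n' if i % per_line == per_line - 1 else ' ')
    f.write('])\n')


# Demo with a hand-written sample list instead of the scraped one.
buf = io.StringIO()
write_tlds(['.com', '.net', '.org'], buf, per_line=2)
print(buf.getvalue())
```

To produce a real file, pass open('legal_tlds.py', 'w', encoding='utf8') in place of the StringIO buffer. Python 3 source files are UTF-8 by default, so the coding header line is no longer needed.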

@apla

apla commented Oct 7, 2021

Please don't use the script above. The resulting list will contain test, retired, and unassigned TLDs.
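One possible way around this is to skip the HTML table and read IANA's plain-text list at https://data.iana.org/TLD/tlds-alpha-by-domain.txt, which, to my knowledge, lists only TLDs currently delegated in the root zone. This is a different data source from the one the gist uses, and the sketch below is only that: the function names are made up, and the assumed file format (a '#' header line followed by one uppercase TLD per line, with internationalized TLDs in their xn-- form) is worth checking against the live file.

```python
import urllib.request

# Assumed source: IANA's plain-text TLD list (not used by the original gist).
TLD_LIST_URL = 'https://data.iana.org/TLD/tlds-alpha-by-domain.txt'


def parse_tld_list(text):
    """Return a set of lowercase TLDs from the plain-text list.

    Blank lines and '#' comment lines (such as the version header) are skipped.
    """
    tlds = set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('#'):
            continue
        tlds.add(line.lower())
    return tlds


def fetch_tld_list(url=TLD_LIST_URL):
    """Download the list and parse it (performs a network request)."""
    with urllib.request.urlopen(url) as resp:
        return parse_tld_list(resp.read().decode('utf-8'))
```

Calling fetch_tld_list() should then give a set that can be used the same way as TLDS from the generated legal_tlds.py.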
