@flavioamieiro
Created March 20, 2012 03:08
Script to fetch a page's title and format a tweet
#!/usr/bin/python3
# -*- coding: utf-8 -*-
import html
import re
import sys
import urllib.request


def usage():
    sys.stderr.write("usage: {0} <url>\n".format(sys.argv[0]))
    sys.exit(1)


try:
    url = sys.argv[1]
except IndexError:
    usage()

# [^>]* stops the opening tag from greedily running on to a later '>'
title_regex = rb'<title[^>]*>(.*?)</title>'
content = urllib.request.urlopen(url).read()
# re.DOTALL makes . match newlines
matches = re.search(title_regex, content, re.IGNORECASE | re.DOTALL)
if matches is None:
    sys.exit("no <title> found in {0}".format(url))
# Assumes the page is encoded as UTF-8
title = matches.group(1).decode('utf-8').replace('\n', '').strip()
# HTMLParser().unescape was removed in Python 3.9; html.unescape replaces it
title = html.unescape(title)
# End with a newline only when writing to a terminal
endchar = '\n' if sys.stdout.isatty() else ''
sys.stdout.write('{0} - {1}{2}'.format(title, url, endchar))
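
The title extraction above can also be sketched as a pair of small functions that run without a network connection, which makes the regex easier to check. This is only an illustrative sketch: `extract_title` and `format_tweet` are hypothetical names that do not appear in the script, and unlike the script it collapses whitespace to single spaces instead of deleting newlines outright.

```python
import html
import re

# Non-greedy body and [^>]* in the opening tag, so the first <title> wins
TITLE_RE = re.compile(rb"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def extract_title(content, encoding="utf-8"):
    """Return the text of the first <title> in content, or None if absent."""
    match = TITLE_RE.search(content)
    if match is None:
        return None
    raw = match.group(1).decode(encoding, errors="replace")
    # Collapse runs of whitespace (newlines included) to single spaces,
    # then turn entities such as &amp; back into characters
    return html.unescape(" ".join(raw.split()))


def format_tweet(title, url):
    """Join title and URL in the same '<title> - <url>' shape the script prints."""
    return "{0} - {1}".format(title, url)
```

For example, `format_tweet(extract_title(content), url)` would stand in for the last few lines of the script once `content` has been downloaded.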
@turicas

turicas commented Sep 25, 2012

urllib.request.urlopen(url).read() could be urllib.urlopen(url).read()
The requests library already takes care of some of the small problems, such as detecting the encoding and so on.
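
As a rough illustration of the encoding point, the charset can also be read from the `Content-Type` header using only the standard library. Under Python 3, which the script targets, `urlopen` lives in `urllib.request`, and the response's `headers` object offers `get_content_charset()`. The helper names `decode_body` and `fetch_text` below are hypothetical, and falling back to UTF-8 is an assumption rather than anything the gist prescribes.

```python
import urllib.request
from email.message import Message


def decode_body(body, headers, default="utf-8"):
    """Decode body using the charset declared in Content-Type, if any."""
    charset = headers.get_content_charset() or default
    try:
        return body.decode(charset)
    except (LookupError, UnicodeDecodeError):
        # Unknown or mislabelled charset: fall back, replacing bad bytes
        return body.decode(default, errors="replace")


def fetch_text(url):
    """Hypothetical helper: download a URL and decode it per its headers."""
    with urllib.request.urlopen(url) as response:
        # response.headers is an http.client.HTTPMessage, a Message subclass
        return decode_body(response.read(), response.headers)
```

This only honours the charset sent in the HTTP header. A page that declares its encoding solely in a `<meta>` tag would still be decoded with the default.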
