@mortymacs
Created May 25, 2013 07:05
Simple regex for fetching the href attribute and the link text (captured as "title") of an <a> tag
import re

# Raw string so the backslashes reach the regex engine untouched.
pattern = re.compile(r'.*<a.*href="(?P<link>[a-zA-Z0-9+/.?#:_-]+)".*>(?P<title>\w+)</a>.*')
data = '<div class="bla"><a href="http://link1.domain/index.php" class="test">Link1</a></div>'
# findall returns one (link, title) tuple per match.
print(pattern.findall(data))
## output: [('http://link1.domain/index.php', 'Link1')]
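
Because the pattern starts and ends with a greedy `.*`, it consumes the whole line and so reports at most one link per line. The following is a minimal sketch of one way to collect several links, not part of the original gist: the names `LINK_RE`, `extract_links`, and `sample` are hypothetical, and the pattern assumes double-quoted href values and link text with no nested tags. For arbitrary HTML a real parser is usually the safer choice.

```python
import re

# Hypothetical variant of the pattern above: no leading/trailing ".*",
# and "[^>]*" instead of ".*" so a match cannot run past the end of its own tag.
LINK_RE = re.compile(r'<a\b[^>]*\bhref="(?P<link>[^"]+)"[^>]*>(?P<title>[^<]+)</a>')


def extract_links(html):
    """Return a list of (link, title) tuples, one per simple <a> tag."""
    return [(m.group("link"), m.group("title")) for m in LINK_RE.finditer(html)]


# Assumed sample input with two links on the same line.
sample = (
    '<div class="bla">'
    '<a href="http://link1.domain/index.php" class="test">Link1</a>'
    '<a class="x" href="http://link2.domain/">Link 2</a>'
    '</div>'
)
print(extract_links(sample))
```

Using `finditer` with named groups keeps the `link` and `title` names from the original pattern while letting each `<a>` tag produce its own match.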