@simoncos
Created October 9, 2017 06:28
Recognize a specific URL in HTML and extract its id part
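#!/usr/bin/python
# -*- coding: utf-8 -*-
import re

# Group 1 is the fixed URL prefix (including the opening quote); group 2 is the numeric id.
# A raw string keeps the regex escapes (\. and \?) from being read as string escapes.
p = re.compile(r'("http://hk\.centadata\.com/ccichart/estate_info\.aspx\?id=)([0-9]+)')
s = '<a href="http://hk.centadata.com/ccichart/estate_info.aspx?id=008600" target="_top"> sdfasdfasd <a href="http://hk.centadata.com/ccichart/estate_info.aspx?id=05600" target="_top">'
# findall returns one (prefix, id) tuple per match; print only the id.
for match in p.findall(s):
    print(match[1])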
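The regex above depends on the exact quoting and attribute layout of the link. As an alternative, here is a minimal sketch that parses the HTML with the standard library and reads the `id` query parameter from each matching `href`. The names `EstateIdParser` and `extract_estate_ids` are hypothetical, introduced only for this illustration, and the sketch is not the method used in the original script.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse, parse_qs


class EstateIdParser(HTMLParser):
    """Collect the `id` query parameter from matching <a href="..."> links."""

    def __init__(self):
        super().__init__()
        self.ids = []

    def handle_starttag(self, tag, attrs):
        # Only anchor tags carry the links we care about.
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        url = urlparse(href)
        # Same host and path as the regex version (assumed to be the only target).
        if url.netloc == "hk.centadata.com" and url.path == "/ccichart/estate_info.aspx":
            # parse_qs keeps values as strings, so leading zeros are preserved.
            self.ids.extend(parse_qs(url.query).get("id", []))


def extract_estate_ids(html):
    """Return the id values of all matching links, in document order."""
    parser = EstateIdParser()
    parser.feed(html)
    parser.close()
    return parser.ids


s = '<a href="http://hk.centadata.com/ccichart/estate_info.aspx?id=008600" target="_top"> sdfasdfasd <a href="http://hk.centadata.com/ccichart/estate_info.aspx?id=05600" target="_top">'
for estate_id in extract_estate_ids(s):
    print(estate_id)
```

Unlike the regex, this version does not require the id to be numeric or to be the first query parameter, so it may accept links the original pattern would skip.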