@mnbi
Created September 15, 2010 10:37
#! /usr/bin/env python
# -*- coding: utf-8 -*-
# get_tag_and_class.py: extract tag and class names from an HTML document.
# Reads HTML on stdin and prints one "LINE: TAG.CLASS" entry per matching tag.
import sys
import re

# Group 1 is the tag name; group 2 is the quoted value of its class attribute.
regex = re.compile(r'<(\w+)[^<>]*class=[\'\"]([^\'\"]+)[\'\"][^<>]*>')

result = []
line_number = 0
for line in sys.stdin:
    line_number = line_number + 1
    offset = 0
    while offset < len(line):
        md = regex.search(line, offset)
        if md:
            result.append((line_number, md.group(1), md.group(2)))
            # Resume exactly at the end of the match so an adjacent tag is not skipped.
            offset = md.end(0)
        else:
            break

for t in result:
    line = "%d: %s.%s\n" % (t[0], t[1], t[2])
    sys.stdout.write(line)
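
The script reads from standard input, which makes it awkward to try on a small snippet. Below is a minimal sketch of the same pattern wrapped in a function, so it can be called on a string directly. The names `TAG_CLASS_RE` and `extract_tag_classes` are hypothetical and not part of the original gist. The sketch uses `re.finditer` instead of manual offset tracking, which is one way to make sure adjacent tags on the same line are all reported.

```python
import re

# Same pattern as the script: tag name, then the quoted value of its class attribute.
TAG_CLASS_RE = re.compile(r'<(\w+)[^<>]*class=[\'\"]([^\'\"]+)[\'\"][^<>]*>')


def extract_tag_classes(text):
    """Return (line_number, tag, class_value) tuples for tags that carry a class.

    Hypothetical helper for illustration only; line numbers start at 1.
    """
    result = []
    for line_number, line in enumerate(text.splitlines(), 1):
        # finditer yields every non-overlapping match on the line, left to right.
        for md in TAG_CLASS_RE.finditer(line):
            result.append((line_number, md.group(1), md.group(2)))
    return result


if __name__ == "__main__":
    sample = (
        '<div class="box"><p class="note lead">hi</p></div>\n'
        '<span id="x">no class here</span>\n'
    )
    for line_number, tag, cls in extract_tag_classes(sample):
        print("%d: %s.%s" % (line_number, tag, cls))
```

Like the script, this matches line by line, so a tag whose attributes span several lines will not be found; a real HTML parser would be a better fit for such input.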