@twneale
Last active August 29, 2015 13:56
Basic uscode citation-to-neoid function.
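import re


def citations(text):
    """Return (start, end, ident) for each U.S. Code citation found in text."""
    # Groups: title number, section, optional run of subsections such as "(a)(1)".
    # \xa7 is the section sign; \u2013 is an en dash, allowed alongside "-".
    rgx = u'(\\d+)\\s+U\\.?S\\.?C\\.?\\s*\xa7*\\s\\s*([\\d\\w\\-\\\u2013\\.\u2013]+)(\\([\\d\\w\\-\\\u2013\\.\u2013()]+\\))*'
    matches = []
    for match in re.finditer(rgx, text):
        title, section, path = match.groups()
        # Trim stray dashes, periods, and spaces from the ends of the section.
        section = section.strip(u'\u2013.- ')
        start, end = match.span()
        ident = '/us/usc/t%s/s%s' % (title, section)
        if path:
            # "(a)(1)" -> "a)(1" -> ["a", "", "1"] -> ["a", "1"]
            pathsegs = path.strip('()')
            pathsegs = re.split(r'[ ()]', pathsegs)
            pathsegs = [seg for seg in pathsegs if seg]
            ident += '/' + '/'.join(pathsegs)
        matches.append((start, end, ident))
    return matches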
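
The function returns character offsets along with each identifier, so the results can be used to mark up the original text. Below is a minimal sketch of one way to do that. The `annotate` helper and its bracket markup are hypothetical and not part of the gist. The sample tuple is written by hand in the `(start, end, ident)` shape that `citations()` returns.

```python
def annotate(text, matches):
    """Wrap each cited span in text as [citation](ident).

    `matches` is a list of (start, end, ident) tuples, the shape that
    citations() returns. Hypothetical helper; the markup is an arbitrary choice.
    """
    out = []
    pos = 0
    # Walk the spans left to right, copying the untouched text between them.
    for start, end, ident in sorted(matches):
        out.append(text[pos:start])
        out.append('[%s](%s)' % (text[start:end], ident))
        pos = end
    out.append(text[pos:])
    return ''.join(out)


text = 'See 42 U.S.C. 1983 for details.'
# Hand-written sample span covering "42 U.S.C. 1983" (assumed, not computed here).
sample = [(4, 18, '/us/usc/t42/s1983')]
print(annotate(text, sample))
```

Building the output from slices, rather than replacing substrings, keeps the offsets valid even when the same citation appears more than once.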