Some useful entity RegEx patterns
by Christopher de Beer
27 July 2010
They're not perfect;
I'll keep fixing the leaks as I go. (Next: adding lookarounds.)
He'll, I'll, She'd, etc. : ([a-zA-Z]+['’][a-z]{1,3})
Social-media, T-shirt, movie-star, 2010-02-04 : ([a-zA-Z0-9]+(\-[0-9a-zA-Z]+)+)
Acronyms: U.S., L.S.D. : ([A-Z]\.)+([A-Z]\.)+
20:32:00am, 8,000km, 20:32 : ([0-9]+[.,:])+[0-9]+(km|cm|mm|m|am|pm|px|pt|ft)?
2010-02-04, 27/12/2010 : [0-9]{1,4}[/\-][0-9]{1,4}[/\-][0-9]{1,4}
@twitterName : [^a-zA-Z0-9]@([a-zA-Z0-9_]+)
@twitterName, twitter.com/twitterName : (?:twitter\.com\/([a-zA-Z0-9_]+))|(?:[^a-zA-Z0-9]@([a-zA-Z0-9_]+))
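
To try a few of these out, here is a minimal Python sketch. The contraction, hyphenated-word, acronym and date patterns are copied from the list; the handle pattern is my own guess at a lookaround version of the @twitterName rule, with tidied character classes and an underscore allowed, so treat it as an assumption. The names PATTERNS and find_entities, the labels and the sample sentence are made up for illustration.

```python
import re

# Patterns from the list above, keyed by illustrative labels.
PATTERNS = {
    "contraction": r"([a-zA-Z]+['’][a-z]{1,3})",
    "hyphenated": r"([a-zA-Z0-9]+(\-[0-9a-zA-Z]+)+)",
    "acronym": r"([A-Z]\.)+([A-Z]\.)+",
    "date": r"[0-9]{1,4}[/\-][0-9]{1,4}[/\-][0-9]{1,4}",
    # Assumption: a negative lookbehind instead of consuming the
    # character before "@"; not one of the original patterns.
    "handle": r"(?<![a-zA-Z0-9])@([a-zA-Z0-9_]+)",
}


def find_entities(text):
    """Return {label: [full matched strings]} for every pattern."""
    found = {}
    for label, pattern in PATTERNS.items():
        # group(0) is the whole match, regardless of inner capture groups.
        found[label] = [m.group(0) for m in re.finditer(pattern, text)]
    return found


if __name__ == "__main__":
    sample = "She'd wear a T-shirt in the U.S. on 27/12/2010, says @twitterName."
    for label, hits in find_entities(sample).items():
        print(label, hits)
```

The patterns overlap: a date such as 2010-02-04 is picked up by both the hyphenated and the date pattern, so the order in which you apply them matters if each token should end up with a single label.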
