@luztak
Created July 2, 2012 18:56
RegEx for [at] and urls.
import re
# Matches @user at the start of the string or right after a non-word character.
at_user_filter = re.compile(r'(?:^|\W)@(\w+)')
# The dot before the suffix is escaped so it only matches a literal ".".
email_filter = re.compile(r'(\w{1,63})@([A-Za-z0-9.\-]+)\.(com|net|org|me|in|fm|co|biz|info|mobi|cc)')
# You can add suffixes by asking Google for domain suffix information.
url_filter = re.compile(r'((http(s|)|ftp)://|)(\w{0,}\.|)(\w{1,63})\.(\w{2,4})((/((.*)|)|))')
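A minimal usage sketch for the mention pattern, assuming Python 3. The helper name `find_mentions` and the sample strings are made up for illustration and are not part of the gist.

```python
import re

# Same mention pattern as in the gist.
at_user_filter = re.compile(r'(?:^|\W)@(\w+)')

def find_mentions(text):
    """Return the user names that follow an @ in text (hypothetical helper)."""
    return at_user_filter.findall(text)

# Mentions at the start of the string and after a space are both found.
print(find_mentions("@luztak thanks, cc @whtsky!"))

# The @ inside an email address follows a word character, so it is skipped.
print(find_mentions("mail a@b.com"))
```

Because the pattern has exactly one capturing group, `findall` returns plain strings rather than tuples.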

whtsky commented Jul 16, 2012

http://codepad.org/hRXp9ZOw

It looks like the @ pattern doesn't handle an @ at the end of the string very well.


whtsky commented Jul 16, 2012

http://codepad.org/SO6tQ8Mr

It also captures one extra character.


luztak commented Jul 16, 2012

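@whtsky Maybe wrapping it as (original pattern|) would make up for that. As for the second one... I haven't thought of anything yet. I didn't know Unicode could get cut off like that. Or maybe slice with a[x:y-1]? That would take care of both problems.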


whtsky commented Jul 16, 2012

Did you write all of these regexes yourself? Let me dig up the @user pattern I wrote a while back...
Why not write (s|) as s? instead?
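A quick check of the point raised above: `(s|)` and `s?` should accept the same input. This is only a sketch; the variable names and sample strings are assumptions chosen for illustration.

```python
import re

# (s|) reads as "s or nothing"; s? reads as "zero or one s".
with_empty_alternative = re.compile(r'http(s|)://')
with_optional = re.compile(r'https?://')

samples = ["http://", "https://", "httpss://", "ftp://"]
for sample in samples:
    a = with_empty_alternative.fullmatch(sample) is not None
    b = with_optional.fullmatch(sample) is not None
    # Both columns agree for every sample.
    print(sample, a, b)
```

The `s?` form is shorter and does not add a capturing group to the pattern.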


luztak commented Jul 16, 2012

@whtsky - -|| I forgot...
Yes, I wrote these myself, testing on codepad as I went.


whtsky commented Jul 16, 2012

http://codepad.org/odRA3wrM
There are still a few small issues, but it is usable for a Chinese-language community now.


luztak commented Jul 16, 2012

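@whtsky English-language communities get word segmentation for free, you know...
Honestly, I really don't know what the colon is for... and the $ means end of string, right?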


whtsky commented Jul 16, 2012

http://codepad.org/VCRL7ECZ This version is perfect. ^ matches the start of the string and $ matches the end of the string.
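To illustrate the two anchors, here is a hypothetical mention pattern that uses `^` on the left and `$` on the right. It is not the pattern behind the codepad link, whose contents are not reproduced in this thread; the name `mention` and the sample strings are assumptions.

```python
import re

# Hypothetical pattern: a mention must start at the beginning of the string
# or after whitespace, and must be followed by whitespace or the end of the
# string. The lookahead (?=...) checks what follows without consuming it.
mention = re.compile(r'(?:^|\s)@(\w+)(?=\s|$)')

print(mention.findall("hi @whtsky"))   # mention at the end of the string
print(mention.findall("@a and @b"))    # mentions at the start and at the end
print(mention.findall("trailing @"))   # a bare @ has no name to capture
```

Using a lookahead on the right keeps the separating space available as the left boundary of the next mention.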


whtsky commented Jul 16, 2012

http://codepad.org/g32CQYSo Trimmed it down a bit. I'll deploy it to PBB the day after tomorrow.


luztak commented Jul 16, 2012

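@whtsky What is ?: ...
Btw, you can usually drop the parentheses around anything that isn't a target. Otherwise you have to index the result with [0] when you use it. I learned that the hard way while debugging.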
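For context on `?:` and the `[0]` indexing mentioned above: in Python's `re` module, `(?:...)` groups without capturing, and `findall` returns tuples as soon as a pattern has more than one capturing group. The sketch below uses simplified, made-up URL patterns, not the gist's `url_filter`.

```python
import re

text = "see http://example.com and https://example.org"

# Two capturing groups: findall returns one tuple per match.
capturing = re.compile(r'(http(s|)://\w+\.\w+)')
# (?:...) groups without capturing: findall returns the whole match as a string.
non_capturing = re.compile(r'http(?:s|)://\w+\.\w+')

print(capturing.findall(text))
print(non_capturing.findall(text))
```

With the capturing version, each result has to be indexed with `[0]` to get the URL; the non-capturing version yields the URLs directly.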


luztak commented Jul 16, 2012

@whtsky Any problems with the mail/url patterns? It feels like with the url pattern in place we don't need Markdown inline links at all.


whtsky commented Jul 16, 2012

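Ugh, I really can't explain ?: properly, and I'm on my phone, so please look it up yourself...
The mail pattern discriminates against a whole bunch of less common suffixes. As for url, that regex looked too painful to read, so I didn't... the parentheses look like Lisp...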


luztak commented Jul 16, 2012

@whtsky - -|| Then I'll just handle the email domain suffix the same way as in url... As for url, I'll try replacing |) with ?.
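One way the email change described above might look, as a sketch only: the fixed suffix list is swapped for the `\w{2,4}` suffix that `url_filter` uses. The name `email_filter_generic` and the sample address are assumptions, and suffixes longer than four characters would still not match in full.

```python
import re

# Assumed variant: any 2-4 word-character suffix instead of a fixed list.
email_filter_generic = re.compile(r'(\w{1,63})@([A-Za-z0-9.\-]+)\.(\w{2,4})')

match = email_filter_generic.search("contact: user@mail.example.io")
# Groups are (local part, domain without suffix, suffix).
print(match.groups())
```

The domain group is greedy, so it backtracks to the last dot and the final label ends up in the suffix group.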


luztak commented Jul 16, 2012

@whtsky The url one really makes my head spin... I'll leave it alone for now.


luztak commented Jul 16, 2012

@whtsky I get ?: now...
