Gist: antoniofrignani/1c26a710858e1850f0ed8e4c85eb495f
PHP Twitter Hashtag Validation Regex
<?php
/**
* PHP Regex to validate a Twitter hashtag
*
* Useful for validating a text input (an HTML form in your CMS or custom application) that must be a valid Twitter hashtag.
* Valid examples: #a, #_, #_1, #_a, #1a, #áéìôü, #123hàsh_täg446
* Invalid examples: #1, ##hashtag, #hash-tag, #hash.tag, #hash tag, #hashtag!, (any hashtag that is more than 140 characters long, hash symbol included)
*
* Regex explanation:
* First, the lookahead assertion (?=.{2,140}$) checks the minimum and maximum length (2 to 140 characters), as explained in http://stackoverflow.com/a/4223213/1441613
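* A hash symbol must be the first character: either the ASCII number sign # (U+0023) or the fullwidth number sign ＃ (U+FF03). With the 'u' modifier this is expressed as (#|\x{ff03}), or equivalently with the literal character as (#|＃). The {1} quantifier in the pattern is redundant but harmless.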
* A hashtag can contain any Unicode letter, any ASCII digit (0-9) and the underscore. That's expressed with the character class [0-9_\p{L}]*, based on http://stackoverflow.com/a/5767106/1441613
* A hashtag can't be only numeric: it must contain at least one letter or underscore. That condition is checked by ([0-9_\p{L}]*[_\p{L}][0-9_\p{L}]*), similar to http://stackoverflow.com/a/1051998/1441613
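* Finally, the 'u' modifier makes PCRE treat both the pattern and the subject as UTF-8, so \x{ff03} and \p{L} work on Unicode code points and the length check counts characters rather than bytes.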
*
* More info:
* https://github.com/twitter/twitter-text-conformance
* https://github.com/nojimage/twitter-text-php
* https://github.com/ngnpope/twitter-text-php
*/
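// 'D' makes $ match only at the very end of the subject; without it, "#hashtag\n" would also match.
$isValid = preg_match('/^(?=.{2,140}$)(#|\x{ff03}){1}([0-9_\p{L}]*[_\p{L}][0-9_\p{L}]*)$/uD', '#hashtag') === 1;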
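To check the pattern against the valid and invalid examples listed in the docblock, you can wrap it in a small function. The following is a minimal sketch and assumes PHP 7.0 or later. The function name isValidHashtag() is hypothetical. The sketch adds the 'D' modifier so that a trailing newline is rejected, and it drops the redundant {1}. The \u{...} escapes make the sketch independent of the source file's encoding.

```php
<?php
// Hypothetical helper around the gist's pattern (the name is illustrative).
// 'D' is added so $ matches only at the very end, rejecting "#hashtag\n".
function isValidHashtag(string $tag): bool
{
    $pattern = '/^(?=.{2,140}$)(#|\x{ff03})([0-9_\p{L}]*[_\p{L}][0-9_\p{L}]*)$/uD';

    // preg_match() returns 1 on a match, 0 on no match and false on error
    // (e.g. when $tag is not valid UTF-8 and the 'u' modifier is set).
    return preg_match($pattern, $tag) === 1;
}

// Examples from the docblock.
$valid = [
    '#a', '#_', '#_1', '#_a', '#1a',
    "#\u{e1}\u{e9}\u{ec}\u{f4}\u{fc}",   // #áéìôü
    "#123h\u{e0}sh_t\u{e4}g446",         // #123hàsh_täg446
    "\u{ff03}hashtag",                   // fullwidth number sign
];
$invalid = [
    '#1', '##hashtag', '#hash-tag', '#hash.tag', '#hash tag', '#hashtag!',
    '#' . str_repeat('a', 140),          // 141 characters
    "#hashtag\n",                        // trailing newline
];

foreach ($valid as $tag) {
    echo (isValidHashtag($tag) ? 'ok  ' : 'FAIL') . ' valid   ' . json_encode($tag) . PHP_EOL;
}
foreach ($invalid as $tag) {
    echo (isValidHashtag($tag) ? 'FAIL' : 'ok  ') . ' invalid ' . json_encode($tag) . PHP_EOL;
}
```

Each line should start with "ok". A "FAIL" means the pattern disagrees with the documented examples.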
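The regex explanation in the docblock maps onto the pattern piece by piece. The sketch below writes the same pattern with the 'x' (extended) modifier, so each part carries its own comment. It assumes PHP 7.0 or later. It uses a nowdoc and the ~ delimiter so that no extra escaping is needed, adds 'D' as above, and drops the redundant {1}. In 'x' mode '#' starts a comment, so the literal ASCII number sign has to be written as \#.

```php
<?php
// Same pattern as the gist, laid out with the 'x' (extended) modifier.
// Whitespace is ignored and '#' starts a comment, hence \# for the hash sign.
$pattern = <<<'REGEX'
~
^
(?=.{2,140}$)        # lookahead: 2 to 140 characters in total
(\#|\x{ff03})        # ASCII number sign or fullwidth number sign U+FF03
(
    [0-9_\p{L}]*     # any ASCII digits, underscores or letters
    [_\p{L}]         # at least one letter or underscore, so not only digits
    [0-9_\p{L}]*     # any ASCII digits, underscores or letters
)
$
~uxD
REGEX;

// With 'u', an accented letter such as "\u{e9}" counts as one character
// for the length check, even though it takes two bytes in UTF-8.
$samples = [
    '#hashtag',
    '#1',
    "#hashtag\n",
    '#' . str_repeat("\u{e9}", 139),   // 140 characters, 279 bytes
    '#' . str_repeat("\u{e9}", 140),   // 141 characters
];

foreach ($samples as $tag) {
    printf("%3d bytes -> %d\n", strlen($tag), preg_match($pattern, $tag));
}
```

The 140-character sample of accented letters still matches, even though it is 279 bytes long. This shows that the lookahead counts characters under 'u'.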