@mylamour
Last active January 18, 2017 13:52
Regex: useful regexes from http://www.regexr.com/v1/RegExr.php. Programming languages differ in how completely they support regex features, so check each pattern against your target engine before relying on it.
  • name="UniProt+Fastaheader"
`/^>[^\|]*\|([^\|]*)\|.*OS=([^=]*).*GN=([^ ]*).*$/g`

Captures the UniProt accession number, organism and gene name from a UniProt FASTA header.
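A minimal Python sketch of how the three capture groups could be read out. The function name `parse_header` and the sample header are made up for illustration; only the pattern comes from the entry above.

```python
import re

# Pattern from the entry above, without the /.../g delimiters and flag.
UNIPROT_HEADER = re.compile(r"^>[^\|]*\|([^\|]*)\|.*OS=([^=]*).*GN=([^ ]*).*$")


def parse_header(header):
    """Return (accession, organism, gene), or None if the header does not match."""
    m = UNIPROT_HEADER.match(header)
    if m is None:
        return None
    accession, organism, gene = m.groups()
    # Group 2 keeps the space that precedes "GN=", so trim it.
    return accession, organism.strip(), gene


# Hypothetical header written for this example.
example = ">sp|P69905|HBA_HUMAN Hemoglobin subunit alpha OS=Homo sapiens GN=HBA1 PE=1 SV=2"
print(parse_header(example))
```

Note that group 2 stops at the next `=`, so if another `KEY=` field sits between `OS=` and `GN=`, the organism group also picks up that field's key.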

  • name="E-mail+validator+for+International+Domain"
`/^[\w-\._\+%]+@(?:[\w-]+\.)+[\w]{2,6}$/g`

It's an E-mail validator that supports International Domains with Country codes

  • name="0x+HexColor"
`/\b0x(?:[0-9A-Fa-f]{6}|[0-9A-Fa-f]{8})\b/g`

Matches a 24 or 32 bit hex color in the format 0xFFFFFF or 0xFFFFFFFF.

  • name="Byte"
`/[01]{8}/g`

Finds bytes in a string. A byte consists of 8 binary digits (0s and 1s).

  • name="Telephone+Digits+Pattern"
`/^(\(?(\d{3})\)?\s?-?\s?(\d{3})\s?-?\s?(\d{4}))$/gm`

Validates the different ways of writing a 10-digit telephone number.
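As a rough Python illustration, groups 2 to 4 of this pattern hold the three digit blocks, so they can be used to normalize a number. The helper name `normalize` and the output format are assumptions for this sketch.

```python
import re

# Pattern from the entry above; fullmatch() makes the m flag unnecessary here.
PHONE = re.compile(r"^(\(?(\d{3})\)?\s?-?\s?(\d{3})\s?-?\s?(\d{4}))$")


def normalize(number):
    """Return the number as 'AAA-BBB-CCCC', or None if it does not validate."""
    m = PHONE.fullmatch(number)
    if m is None:
        return None
    # Groups 2, 3 and 4 are the area code, prefix and line number.
    return "-".join(m.group(2, 3, 4))


for candidate in ["(555) 123-4567", "555 123 4567", "5551234567", "555.123.4567"]:
    print(candidate, "->", normalize(candidate))
```

Dots are not among the accepted separators, so the last candidate is rejected.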

  • name="UK+Driving+licence+-+v.2"
`/([a-zA-Z]{2}[9]{3}|[a-zA-Z]{3}[9]{2}|[a-zA-Z]{4}[9]{1}|[a-zA-Z]{5})\d[0156]\d([0][1-9]|[12]\d|3[01])\d([a-zA-Z]{2}|[a-zA-Z][9])[789][a-zA-Z]{2}/g`

Checks the legitimacy of UK driving licence numbers (version 2).

  • name="UK+Postcode+Validation"
`/\b(GIR ?0AA|SAN ?TA1|(?:[A-PR-UWYZ](?:\d{0,2}|[A-HK-Y]\d|[A-HK-Y]\d\d|\d[A-HJKSTUW]|[A-HK-Y]\d[ABEHMNPRV-Y])) ?\d[ABD-HJLNP-UW-Z]{2})\b/gim`

This regex matches all valid, current UK Postcodes, including Girobank and non-geographic postcodes, irrespective of whether they contain a space. It does not include overseas territories. Adapted from the BS7666 postcode rules at: http://www.cabinetoffice.gov.uk/govtalk/schemasstandards/e-gif/datastandards/address/postcode.aspx

  • name="Phone+Number+Match"
`/[(]?\d{3}[)]?\s?-?\s?\d{3}\s?-?\s?\d{4}/`

Matches all 10-digit phone numbers.

  • name="Thousand+Separator"
`/\d{1,3}(?=(\d{3})+(?!\d))/g`

Separates every 3 digits. Replace with a period, comma, space, etc.
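One way to apply this in Python is to re-emit each match followed by the separator. This is only a sketch; `group_digits` and its `sep` parameter are names chosen for the example.

```python
import re

# Pattern from the entry above: 1-3 digits that are followed by one or more
# complete groups of three digits and then a non-digit (or the end).
THOUSANDS = re.compile(r"\d{1,3}(?=(\d{3})+(?!\d))")


def group_digits(text, sep=","):
    """Insert sep after every matched digit block."""
    return THOUSANDS.sub(lambda m: m.group(0) + sep, text)


print(group_digits("1234567"))
print(group_digits("Total: 25000 units"))
print(group_digits("1000", sep=" "))
```

The pattern does not know about decimal points, so digits after a decimal point get grouped as well; it is safest on integers.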

  • name="Hex+Color+(3+or+6+hexdigit,+#+optional)"
`/^#?([0-9a-f]{3}){1,2}$/i`

Matches valid hexadecimal colors, 3 or 6 hexdigits only. # sign is optional. Matches both lower and upper case.
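A small Python check, assuming the `i` flag maps to `re.IGNORECASE`. The function name `is_hex_color` is invented for this sketch.

```python
import re

# Pattern from the entry above; re.IGNORECASE stands in for the i flag.
HEX_COLOR = re.compile(r"^#?([0-9a-f]{3}){1,2}$", re.IGNORECASE)


def is_hex_color(value):
    """True for 3- or 6-digit hex colors, with or without a leading #."""
    return HEX_COLOR.fullmatch(value) is not None


for value in ["#FFF", "a1b2c3", "#abcd", "#12345g"]:
    print(value, is_hex_color(value))
```

`fullmatch` is used instead of `match` so that a trailing newline is not silently accepted by `$`.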

  • name="Find+Hex+Val+without+(00|FF)"
`/\b((?!(?>(00)|(FF)))[0-9A-F]{2})\b/g`

Find Hex Val without (00|FF). Separators not needed.

  • name="Skype+Username"
`/[a-zA-Z][a-zA-Z0-9]{5,31}/g`

Validates usernames for Skype's format following these rules:

  • name="dns+validation"

(a-zA-Z0-9?.)+[a-zA-Z]{2,6} [Matches]

  • name="(?<=href=")[^]+?(?=")"
`/(?<=href\=")[^]+?(?=")/g`

Extracts the links inside href attributes.

  • name="SQL+to+Java+String"
`/^\s*(.*?);?\s*$/gm`

Wraps a SQL string with double quotes and concatenates them together to form a valid Java String.

  • name="XML/HTML+tags"
`/<(?:([a-zA-Z\?][\w:\-]*)(\s(?:\s*[a-zA-Z][\w:\-]*(?:\s*=(?:\s*"(?:\\"|[^"])*"|\s*'(?:\\'|[^'])*'|[^\s>]+))?)*)?(\s*[\/\?]?)|\/([a-zA-Z][\w:\-]*)\s*|!--((?:[^\-]|-(?!->))*)--|!\[CDATA\[((?:[^\]]|\](?!\]>))*)\]\])>/g`

Version 1.0. Supports:

  • name="URL+Similar"
`/[-a-zA-Z0-9@:%_\+.~#?&//=]{2,256}\.[a-z]{2,4}\b(\/[-a-zA-Z0-9@:%_\+.~#?&//=]*)?/gi`

Matches things that look like URLs, even small things.

  • name="Get+Text+Between+HTML+Tags+with+attributes"
`/(?<=[>])[^<>]+(?=[<])/g`

Grabs the text between two HTML tags (e.g. it gets "hello" from <p>hello</p>).

  • name="Skype+Username+2.0"
`/[a-zA-Z][a-zA-Z0-9.\-_]{5,31}/g`

UPDATE:

  • name="Email+Validator"

`/[a-zA-Z0-9-]{1,}@([a-zA-Z\.])?[a-zA-Z]{1,}\.[a-zA-Z]{1,4}/gi`

Hello, Russians :) Anyone who studied with Evgeny Popov, leave a like! A regex for finding email addresses.

  • name="Non-word+chars"
`/[^\w \xC0-\xFF]/g`

Matches everything that isn't an alphanumeric character [\w], an accented character [\xC0-\xFF] or a space [ ].

  • name="Image+URL+Match+(fixed)"
`/(http|https):\/\/(www\.)?[\w-_\.]+\.[a-zA-Z]+\/((([\w-_\/]+)\/)?[\w-_\.]+\.(png|gif|jpg))/g`

Matches an image URL/link like: http://example.com/image.png

  • name="Advanced+URL+v5"
`/([a-z0-9_\-]{1,5}:\/\/)?(([a-z0-9_\-]{1,}):([a-z0-9_\-]{1,})\@)?((www\.)|([a-z0-9_\-]{1,}\.)+)?([a-z0-9_\-]{3,})(\.[a-z]{2,4})(\/([a-z0-9_\-]{1,}\/)+)?([a-z0-9_\-]{1,})?(\.[a-z]{2,})?(\?)?(((\&)?[a-z0-9_\-]{1,}(\=[a-z0-9_\-]{1,})?)+)?/gi`

A RegExp that splits a URL into protocol, user, password, subdomain, domain, top-level domain, path, filename, file type and GET parameters (with or without values).

  • name="Google+Docs+Viewer+link+creator"
`/<[ ]*a.*href="(.*\.pdf)".*>.*<[ ]*/a[ ]*>/g`

Adds a google docs viewer link to anchors with .PDF in their href.

  • name="CSV+Grouper:+""+or+,+inside+"string""
`/(?:"((?:[^"]|"")*+)"|([^"\,\r\n]*+))(,|\r\n?|\n|$)/g`

Matches and groups:

  • name="ip+address"
`/(([2]([0-4][0-9]|[5][0-5])|[0-1]?[0-9]?[0-9])[.]){3}(([2]([0-4][0-9]|[5][0-5])|[0-1]?[0-9]?[0-9]))/gi`

ip address
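A hedged Python sketch of using this pattern as a validator. Because the pattern has no anchors, the example relies on `fullmatch`; the name `is_ipv4` is an assumption.

```python
import re

# Pattern from the entry above (it has no ^ or $ of its own).
IPV4 = re.compile(
    r"(([2]([0-4][0-9]|[5][0-5])|[0-1]?[0-9]?[0-9])[.]){3}"
    r"(([2]([0-4][0-9]|[5][0-5])|[0-1]?[0-9]?[0-9]))"
)


def is_ipv4(text):
    """True only if the whole string is a dotted quad with octets 0-255."""
    return IPV4.fullmatch(text) is not None


for candidate in ["192.168.1.1", "255.255.255.255", "256.0.0.1", "1.2.3"]:
    print(candidate, is_ipv4(candidate))

# Without anchoring, a search still finds an address inside an invalid one.
print(IPV4.search("256.0.0.1").group(0))
```

The last line shows why anchoring matters: an unanchored search skips the leading `2` and reports `56.0.0.1`.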

  • name="Multiline+Comment"
`/\/\*[\s\S]*?\*\//gm`

Will match all multiline (/* */) comments.
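For illustration, the same pattern can list or strip block comments in Python. The sample source string and the helper name `strip_block_comments` are made up for this sketch.

```python
import re

# Pattern from the entry above; [\s\S] already spans newlines, so no flags are needed.
BLOCK_COMMENT = re.compile(r"/\*[\s\S]*?\*/")


def strip_block_comments(source):
    """Remove every /* ... */ comment from source."""
    return BLOCK_COMMENT.sub("", source)


sample = "int a; /* one */\nint b; /* two\nlines */ int c;"
print(BLOCK_COMMENT.findall(sample))
print(strip_block_comments(sample))
```

The lazy `*?` is what stops each match at the first `*/`. The pattern has no notion of string literals, so a `/*` inside a quoted string is treated as a comment start too.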

  • name="Strong+password+validator"
`/^(?=[\x21-\x7E]*[0-9])(?=[\x21-\x7E]*[A-Z])(?=[\x21-\x7E]*[a-z])(?=[\x21-\x7E]*[\x21-\x2F|\x3A-\x40|\x5B-\x60|\x7B-\x7E])[\x21-\x7E]{6,}$/g`
  • minimum one [a-z]
  • name="SRT+subtitles+parser"
`/([\d]+)\r([\d]{2}:[\d]{2}:[\d]{2}),([\d]{3})[\s]*-->[\s]*([\d]{2}:[\d]{2}:[\d]{2}),([\d]{3})\r(([^\r]+(\r|$))+)/g`

Parse an SRT subtitle file and extract each caption.

  • name="Double+Quoted+String"
`/"(?:\.|(\\\")|[^\""\n])*"/g`

Will match any string in double quotes.

  • name="Remove+BBML"
`/\[.*?\]/g`

Remove BBML (Bulletin Board Markup Language)

  • name="CSS+Property+Match"
`/background[\s]?\:([\s\S]*?(?=([;|\}])))/gm`

Searches for a CSS property (here `background`) and captures its value in any stylesheet or embedded CSS.

  • name="Parse+all+inline+JS+(<script>)+from+HTML"
`/<\s*script.*?>(.*?)<\/\s*?script[^>\w]*?>/gis`

This pattern will find all inline JS code within the current HTML page.

  • name="Optimize+SVG+coordinates+V2"
`/(\d*\.\d{3})\d*/g`

Reduces the number of decimal places in SVG coordinates. Useful for compressing SVGs.
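A short Python sketch, assuming the intended replacement is the first capture group (the number up to three decimal places). The path data and the name `trim_decimals` are invented for the example.

```python
import re

# Pattern from the entry above: group 1 is the number up to three decimals,
# and the trailing \d* swallows the digits to be dropped.
COORD = re.compile(r"(\d*\.\d{3})\d*")


def trim_decimals(path_data):
    """Cut every coordinate down to at most three decimal places."""
    return COORD.sub(r"\1", path_data)


print(trim_decimals("M10.123456 20.5 L3.14159265,2.71828"))
```

This truncates rather than rounds (3.14159265 becomes 3.141, not 3.142), and numbers with fewer than three decimals are left untouched.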

  • name="JSON+parser+2"
`/("([^"]|(?<=\\)")*")|-?\d+(.?\d+)?(e-?\d+)?|(\[((?R)(\s*,\s*(?R))*)?])|({\s*((?1)\s*:\s*(?R)(\s*,\s*(?1)\s*:\s*(?R))*)?\s*})|null|false|true/g`

Accepts valid JSON-strings

  • name="Windows+File+Path+Validation"
`/^(([a-zA-Z]{1}:|\\)(\\[^\\/<>:\|\*\?\"]+)+\.[^\\/<>:\|]{3,4})$/i`

Validates a Windows file path, allowing both UNC and standard Windows drive references. The file may have an extension of 3 to 4 characters.

  • name="YouTube+video+ID+(improved+&+fixed)"
`/(?<=v(\=|\/))([-a-zA-Z0-9_]+)|(?<=youtu\.be\/)([-a-zA-Z0-9_]+)/gm`

This will get the ID for YouTube links, written in 3 different formats, even if the "v=" is not right after the slash.
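A possible Python usage, given that both lookbehinds are fixed-width and so are accepted by the `re` module. The video ID below is a placeholder, and `video_id` is a name chosen for this sketch.

```python
import re

# Pattern from the entry above; the lookbehinds keep "v=", "v/" and
# "youtu.be/" out of the match itself.
YOUTUBE_ID = re.compile(
    r"(?<=v(\=|\/))([-a-zA-Z0-9_]+)|(?<=youtu\.be\/)([-a-zA-Z0-9_]+)"
)


def video_id(url):
    """Return the first ID-looking token in url, or None."""
    m = YOUTUBE_ID.search(url)
    # group(0) is the whole match, whichever alternative produced it.
    return m.group(0) if m else None


# "aBc123_-XyZ" is a placeholder ID, not a real video.
for url in [
    "https://www.youtube.com/watch?v=aBc123_-XyZ&t=10",
    "https://www.youtube.com/v/aBc123_-XyZ",
    "https://youtu.be/aBc123_-XyZ",
    "https://example.com/",
]:
    print(url, "->", video_id(url))
```

The pattern does not check the host, so any `v=` or `v/` in an unrelated URL would also produce a match.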

  • name="Two+decimals+number"
`/^([0-9]+(\.?[0-9]?[0-9]?)?)/g`

Useful for restricting numeric input in a text field.

  • name="YouTube+ID+(works+all)"
`/(?<=\?v=)([a-zA-Z0-9_-])+/g`

Select the video ID from YouTube. Works with all

  • name="Strict+Canadian+Postal+Code"
`/([a-z])\d([a-z])(.?)\d([a-z])\d/g`

must be letters and numbers.

  • name="Image+Url"
`/(((http://www)|(http://)|(www))[-a-zA-Z0-9@:%_\+.~#?&//=]+)\.(jpg|jpeg|gif|png|bmp|tiff|tga|svg)/g`

Captures image url

  • name="JavaScript+Floating+Number"
`/[+-]?(?=\.\d|\d)(?:\d+)?(?:\.?\d*)(?:[eE][+-]?\d+)?/g`

Integer Numbers: 1 2 3 987 +4 -8

  • name="Replace+&+if+no+valid+entity"
`/&(?!([\w\n]{2,7}|#[\d]{1,4});)/g`

This RE replaces all & characters which are not part of a valid HTML entity with &amp;
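A minimal Python sketch, assuming the intended replacement string is `&amp;`. The helper name `escape_bare_ampersands` is hypothetical.

```python
import re

# Pattern from the entry above: an & that is NOT followed by a named entity
# (2-7 word characters) or a numeric entity (# plus 1-4 digits) and a semicolon.
BARE_AMP = re.compile(r"&(?!([\w\n]{2,7}|#[\d]{1,4});)")


def escape_bare_ampersands(html):
    """Escape stray ampersands while leaving existing entities alone."""
    return BARE_AMP.sub("&amp;", html)


print(escape_bare_ampersands("Tom & Jerry &amp; friends &#169; R&D"))
```

The pattern only looks at the shape of an entity, so something like `&bogus;` is left alone even though it is not a real entity name.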

  • name="Date+Validator+DD/MM/YYYY+includes+leap+years"

^(((0[1-9]|[12][0-9]|3[01])- /.|(0[1-9]|[12][0-9]|30)- /.|(0[1-9]|1\d|2[0-8])[- /.]02)[- /.]\d{4}|29[- /.]02- /.)$

This will match valid dates in the format DD/MM/YYYY.

  • name="CSV+Split+(MS+Excel+dialect)"
`/(?:,|^)([^",]+|"(?:[^"]|"")*")?/g`

Separates fields in an MS Excel dialect CSV. O'Reilly's 'Regular Expressions Cookbook', p.466, available from http://oreilly.com/catalog/9780596520694/preview, was a very good starting point.

  • name="US+Numbers+and+Money"
`/((\$?(([0-9]{0,1})?\.[0-9]{1,2}))|(\$?([1-9]{1}[0-9]{0,2}([,][0-9]{3})*)(\.[0-9]{1,2})?))/g`

$ is optional.

  • name="Better+Canadian+Postal+Code+Validator"
`/^[ABCEGHJ-NPRSTVXY]{1}\d{1}[A-Z]{1}\s?\d{1}[A-Z]{1}\d{1}$/gim`

Case-insensitive Canadian postal code validator.
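A quick Python check of this validator, with `re.IGNORECASE` standing in for the `i` flag. The function name `is_postal_code` is an assumption for this sketch.

```python
import re

# Pattern from the entry above; the first letter is limited to the
# letters listed in the leading character class.
POSTAL = re.compile(
    r"^[ABCEGHJ-NPRSTVXY]{1}\d{1}[A-Z]{1}\s?\d{1}[A-Z]{1}\d{1}$",
    re.IGNORECASE,
)


def is_postal_code(text):
    """True if text has the letter-digit-letter digit-letter-digit shape."""
    return POSTAL.fullmatch(text) is not None


for candidate in ["K1A 0B1", "k1a0b1", "D1A 0B1", "K1A-0B1"]:
    print(candidate, is_postal_code(candidate))
```

Only the first position is restricted to that letter set; the other letter positions accept any A-Z, and a hyphen is not accepted as a separator.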

  • name="URL+validator(with+subdomains)"
`/((https?|ftp)://)?(((([a-zA-Z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:)*@)?(((\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5]))|((([a-zA-Z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(([a-zA-Z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])([a-zA-Z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])*([a-zA-Z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])))\.)+(([a-zA-Z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(([a-zA-Z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])([a-zA-Z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])*([a-zA-Z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])))\.?)(:\d*)?)(\/((([a-zA-Z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|@)+(\/(([a-zA-Z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|@)*)*)?)?(\?((([a-zA-Z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|@)|[\uE000-\uF8FF]|\/|\?)*)?(\#((([a-zA-Z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|@)|\/|\?)*)?/gm`

This will match urls of following types:-

  • name="ip+address:++extra+preceding+0s+allowed"
`/([2]([5][0-5]|[0-4][0-9])|[0-1]?[0-9]{1,2})(\.([2]([5][0-5]|[0-4][0-9])|[0-1]?[0-9]{1,2})){3}/g`

Rule 1:

  • name="All+HTML+tags"
`/</?\w+((\s+\w+(\s*=\s*(?:".*?"|'.*?'|[^'">\s]+))?)+\s*|\s*)/?>/g`

Complete and bullet-proof HTML tag detection. It's not mine; it was posted by Phil Haack on his blog: http://haacked.com/archive/2004/10/25/usingregularexpressionstomatchhtml.aspx

  • name="Dutch+Postal+Code"
`/[1-9]\d{3}\s?[a-zA-Z]{2}/g`

Matches Dutch postal codes, with or without single space and with lower- or uppercase letters.

  • name="METAR+Extractor"
`/^(METAR |SPECI )?([A-Z]{4})(?: ([0-3]\d)([012]\d)([0-5]\d)Z)(?: (COR))?(?: (\d{3}|VRB)(\d?\d{2})G?(\d?\d{2})?(KT|MPH|MPS|KPH)(?:( \d{3})V(\d{3}))?)(?: (?:(?:(\d?\d ?)?(M?[13]/[24])?(SM))|(\d{4}))) ?(?:(?:(.*?)(RMK)(.*))|(.*))/gm`

This reads the first 5 elements from a METAR. Works with AFMAN15-111 Chapter 14.

  • name="Hyperlinks+to+Text"
`/<a\s(?:[^\s>]*?\s)*?href="(?:mailto:)?(.*?)".*?>(.+?)</a>/gi`

Converts hyperlink tags to text, with the link in brackets.

  • name="non-ascii+characters+(imporoper+unicode)"
`/[\x80-\xFF]/g`

Finds all characters in a string with code points from 128 to 255. Useful when looking for characters that aren't properly Unicode-encoded.

  • name="Web+detector"
`/((?:https\:\/\/)|(?:http\:\/\/)|(?:www\.))?([a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(?:\??)[a-zA-Z0-9\-\._\?\,\'\/\\\+&%\$#\=~]+)/gi`

URL detector with capture groups.

  • name="/\d{5}[+]?+\d{6}/g"
`/\d{5}[ ]? \d{6}/g`

UK mobile number validation

  • name="Single+line+comment"
`/\/\/.*$/gm`

Will match all single line (// ..) comments
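In Python the `m` flag corresponds to `re.MULTILINE`, which lets `$` stop at each line end. A small sketch with made-up input strings:

```python
import re

# Pattern from the entry above; re.MULTILINE makes $ match at every line end.
LINE_COMMENT = re.compile(r"//.*$", re.MULTILINE)

source = "x = 1; // set x\ny = 2;\n// done"
print(LINE_COMMENT.findall(source))

# Caveat: the pattern cannot tell a comment from "//" inside a string literal.
tricky = 'url = "http://example.com";'
print(LINE_COMMENT.findall(tricky))
```

The second call shows the main limitation: the `//` in a URL is treated as a comment start, so the pattern is best kept for quick searches rather than real parsing.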

  • name="Password+Validator"

(?!^[0-9]*$)(?!^[a-zA-Z!@#$%^&()+=<>?]*$)^([a-zA-Z!@#$%^&()+=<>?0-9]{6,15})$

Matches good password strings. A password must contain a digit and a letter or special character, and must be 6 to 15 characters long.

  • name="HTTP+URL"
`/(((f|ht){1}tp://)[-a-zA-Z0-9@:%_\+.~#?&//=]+)/g`

Looks for URL strings starting with http:// or ftp://

  • name="regular+function+declaration.+php,+java,+etc."
`/function\s([a-z0-9]+)\((.*)\)(\t|\r|\s)*\{(.*)\}/gi`

Searches for functions in a script.

  • name="Detect+phone+numbers"
`/(?!:\A|\s)(?!(\d{1,6}\s+\D)|((\d{1,2}\s+){2,2}))(((\+\d{1,3})|(\(\+\d{1,3}\)))\s*)?((\d{1,6})|(\(\d{1,6}\)))\/?(([ -.]?)\d{1,5}){1,5}((\s*(#|x|(ext))\.?\s*)\d{1,5})?(?!:(\Z|\w|\b\s))/gm`

This will detect number patterns that appear to be phone numbers. This is a very general (although comprehensive) regex that should work with both US & International phone numbers.

  • name="link+grabber"
`/((mailto\:|(news|(ht|f)tp(s?))\://){1}\S+)/g`

user:

  • name="CPF+-+Brasil"
`/[0-9]{3}\.[0-9]{3}\.[0-9]{3}\-[0-9]{2}/g`
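A brief Python sketch of pulling formatted CPF numbers out of free text. The sample text and the digits in it are invented for illustration.

```python
import re

# Pattern from the entry above: three dot-separated blocks of three digits,
# then a hyphen and two more digits.
CPF = re.compile(r"[0-9]{3}\.[0-9]{3}\.[0-9]{3}\-[0-9]{2}")

# Made-up numbers, used only to show which shapes match.
text = "CPF: 123.456.789-09, unformatted: 98765432100"
print(CPF.findall(text))
```

The pattern checks the punctuation only. It does not verify the two check digits, and it ignores CPF numbers written without dots and hyphen.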
