supaspoida/regext.txt

## regext.txt
- any character you use matches itself literally, except the special characters
^ $ ? . / \ [ ] { } ( ) + * | - all the special characters; escape them with \ if you don't want them to be special
// - regexp literal, creates an instance of ruby's Regexp class
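
A small sketch of the escaping rule above. The sample strings and variable names are made up for illustration; Regexp.escape is ruby's built-in helper for escaping a whole string at once.

```ruby
price = "cost: $5.00 (approx)"

# Unescaped, . matches any character, so "5x00" matches too.
loose  = /5.00/
# Escaped, \. matches only a literal period.
strict = /5\.00/

loose_hit  = loose.match?("5x00")    # true
strict_hit = strict.match?("5x00")   # false

# Regexp.escape backslash-escapes every special character in a string,
# so the result can be turned into a pattern that matches it literally.
literal = Regexp.new(Regexp.escape("$5.00 (approx)"))
found   = price[literal]             # "$5.00 (approx)"
```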

Common Patterns (I authored)
/[\w+.~-]+@[\w~-]+\.[\w.]+/ - match common emails (a loose approximation, not a full RFC 5322 validator)
/\+?(\d)?[-.\s]?\(?(\d{3})\)?[-.\s]?(\d{3})[-.\s]?(\d{4})/ - match phone numbers, https://gist.github.com/1009331
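
A rough usage sketch for the two patterns. EMAIL and PHONE are names chosen here, the sample text is invented, and the variants below escape the dot before the domain suffix and leave | out of the separator classes, which is an assumption about the intent.

```ruby
# Assumed variants of the two patterns, written for this example.
EMAIL = /[\w+.~-]+@[\w~-]+\.[\w.]+/
PHONE = /\+?(\d)?[-.\s]?\(?(\d{3})\)?[-.\s]?(\d{3})[-.\s]?(\d{4})/

text = "Mail jane.doe+news@example.co.uk or call +1 (555) 867-5309."

# String#[] with a regexp returns the matched text (or nil).
email = text[EMAIL]                   # "jane.doe+news@example.co.uk"

# The phone pattern captures country code, area code, prefix and line.
phone = PHONE.match(text)
country, area, prefix, line = phone.captures
```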

Strategies

foo(?!.*foo) - negative lookahead, matches a foo that has no other foo after it on the same line. use to find the last match in a string (add /m if the string spans several lines).
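
A minimal sketch of the last-match strategy. The log string is invented for the example.

```ruby
log = "foo=1 bar=2 foo=3 baz=4 foo=5"

# Only the final foo has no later foo ahead of it.
m = /foo(?!.*foo)/.match(log)
position = m.begin(0)                        # 24, same as log.rindex("foo")

# The same idea with a capture: grab the value attached to the last foo.
value = log[/foo=(\d+)(?!.*foo)/, 1]         # "5"
```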

Anchors

^      - start of line
\A     - start of string
$      - end of line
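\Z     - end of string (also matches just before a final newline, \z is the strict end)
\b     - word boundary (a position, not a character)
\B     - non word boundary
\<     - start of word (GNU/vim syntax, not supported in ruby, use \b)
\>     - end of word (GNU/vim syntax, not supported in ruby, use \b)
/^apple/.match 'pear apple' # no match, ^ requires apple to sit at the beginning of a line
\A - like ^ but matches only at the start of the whole string, /\Aapple/ is the same as /^apple/ only for single line strings
/apple$/.match 'apple pear' # no match, $ requires apple to sit at the end of a line
\Z - like $ but matches only at the end of the whole string, /apple\Z/ is the same as /apple$/ only for single line strings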
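
A short sketch of how the line anchors and string anchors differ once a string contains a newline. The sample text is made up.

```ruby
text = "pear\napple pie\n"

line_start   = text =~ /^apple/    # 5, ^ matches right after the newline
string_start = text =~ /\Aapple/   # nil, the string starts with "pear"
line_end     = text =~ /pear$/     # 0, $ matches before the first newline
before_final = text =~ /pie\Z/     # 11, \Z tolerates one trailing newline
strict_end   = text =~ /pie\z/     # nil, \z does not

# \b is a position between a word and a non-word character,
# so "apple" inside "pineapple" is skipped.
whole_words = "an apple, a pineapple".scan(/\bapple\b/)   # ["apple"]
```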

Character Classes

[]       - character class, a quasi-wildcard, matches only characters specified
[abc]    - match a single character, a, b, or c
[^abc]   - match a single character except for a, b, or c
[a-zA-Z] - match single character in the range a-z or A-Z
.        - any character except newline (any character at all with /m)
\cx      - control character ctrl-x (e.g. \cI is tab)
\s       - any whitespace character
\S       - any non-whitespace character
\d       - any digit, shorthand for [0-9]
\D       - any non-digit
\w       - any word character, shorthand for [a-zA-Z0-9_]
\W       - any non-word character
\xhh     - character with hexadecimal code hh
\X       - extended grapheme cluster (a base character plus its combining marks)
\nnn     - character with octal code nnn

Also Note: Most special characters become literal inside a character class, escaped or not
(e.g. [.] and [\.] both match only a period). The exceptions are ] \ ^ (at the start) and - (between two characters), which still need escaping.
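
A quick sketch of the shorthand classes and of how a dot behaves inside brackets. The sample strings are invented.

```ruby
sample = "Order #42: 3.5kg, ref A_7!"

digits = sample.scan(/\d+/)   # ["42", "3", "5", "7"]
words  = sample.scan(/\w+/)   # ["Order", "42", "3", "5kg", "ref", "A_7"]

# Inside a class the dot is literal, escaped or not.
bare_dot    = "a.b axb".scan(/a[.]b/)    # ["a.b"]
escaped_dot = "a.b axb".scan(/a[\.]b/)   # ["a.b"]
# Outside a class the dot is a wildcard.
wildcard    = "a.b axb".scan(/a.b/)      # ["a.b", "axb"]

# A negated class: everything that is not a vowel or whitespace.
consonants = "hello world".scan(/[^aeiou\s]/).join   # "hllwrld"
```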

Quantifiers

a?     - nothing or a, ? marks previous character as optional
a*     - zero or more of a
a+     - one or more of a
a{3}   - exactly 3 of a
a{3,}  - 3 or more of a
a{3,6} - 3 to 6 of a
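
A sketch that runs each quantifier against the same inputs. full_matches is a hypothetical helper written for this example; it anchors the pattern with \A and \z so only whole-string matches count.

```ruby
# Hypothetical helper: keep the inputs that the pattern matches in full.
def full_matches(pattern, inputs)
  inputs.select { |s| s.match?(/\A(?:#{pattern})\z/) }
end

INPUTS = ["", "a", "aa", "aaa", "aaaaaaa"]

optional     = full_matches("a?", INPUTS)       # ["", "a"]
zero_or_more = full_matches("a*", INPUTS)       # every input
one_or_more  = full_matches("a+", INPUTS)       # everything but ""
exactly_3    = full_matches("a{3}", INPUTS)     # ["aaa"]
at_least_3   = full_matches("a{3,}", INPUTS)    # ["aaa", "aaaaaaa"]
three_to_six = full_matches("a{3,6}", INPUTS)   # ["aaa"], seven is too many
```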

Groups

(a|b)   - a or b
(...)   - contents are captured
(?:...) - passive (non-capturing) group. gain the benefits of using parens but without having to capture its match.
\1      - backreference to the 1st group/subpattern, \2 to the 2nd, and so on
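
A small sketch of alternation, capturing, non-capturing groups and backreferences. All sample strings are made up.

```ruby
# Alternation inside a group.
spellings = %w[gray grey groy].grep(/\Agr(a|e)y\z/)    # ["gray", "grey"]

# Capturing groups are numbered from the left.
m = /(\d{4})-(\d{2})/.match("released 2011-06")
year, month = m[1], m[2]                               # "2011", "06"

# A non-capturing group still groups, but takes no number.
n = /(?:Mr|Ms)\. (\w+)/.match("Ms. Lovelace")
names = n.captures                                     # ["Lovelace"]

# \1 inside the pattern must repeat whatever group 1 matched.
doubled = "this is is a test"[/\b(\w+) \1\b/]          # "is is"
```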

Ruby Matching

- there are 2 components to a ruby regexp, the pattern and the modifiers. modifiers are optional, example
  /something/i # something is the pattern, i is the modifier
- every match operation either succeeds or fails, a failed match always returns nil
"an interesting ruby string".match(/ruby/) # returns a MatchData object
"test this".match(/banana/)                # returns nil
/ruby/.match("an interesting ruby string") # returns a MatchData object
"test this" =~ /this/                      # returns 5, the beginning location of the match
/this/ =~ "test this"                      # also returns 5
- a MatchData object is truthy and a failed match is nil, making the result useful for logic operations
- class MatchData also stores information about the match
"before after before".scan(/before/) - returns an array of all matches, if the pattern contains captures, you'll get an array of arrays
"before after before".split(/before/) - returns an array of everything except the matches

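A sketch that pulls the matching calls above together. The key=value string is invented for the example; everything else reuses the strings from this section.

```ruby
str = "an interesting ruby string"

hit   = str.match(/ruby/)            # a MatchData object
miss  = str.match(/banana/)          # nil
index = "test this" =~ /this/        # 5

# Both results work directly in a condition.
label = str =~ /ruby/ ? "found" : "missing"               # "found"

all   = "before after before".scan(/before/)              # ["before", "before"]
# With captures, scan returns one array per match.
pairs = "k1=v1 k2=v2".scan(/(\w+)=(\w+)/)                 # [["k1", "v1"], ["k2", "v2"]]
# split drops the matches and any trailing empty strings.
parts = "before after before".split(/before/)             # ["", " after "]
```
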
MatchData, example methods:
  match = /(e)ject(ed)/.match 'ejected'
  match.string                      # "ejected", the string we matched against
  match[0]                          # "ejected", the entire part of the string matched
  match[1]                          # "e", first capture group
  match[2]                          # "ed", second capture group
  match.captures[0]                 # "e", first capture group
  match.captures[1]                 # "ed", second capture group
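
A slightly longer MatchData sketch, assuming a made-up input string. It also shows pre_match, post_match and begin, which are standard MatchData methods not listed above.

```ruby
match = /(\w+)@(\w+)\.com/.match("mail bob@example.com now")

input    = match.string       # "mail bob@example.com now"
whole    = match[0]           # "bob@example.com"
user     = match[1]           # "bob"
domain   = match[2]           # "example"
captures = match.captures     # ["bob", "example"]

before   = match.pre_match    # "mail "
after    = match.post_match   # " now"
offset   = match.begin(0)     # 5, where the whole match starts
```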

Modifiers

/i - case insensitive
/m - makes wildcard, . , match newlines
/x - ignore whitespace and # comments in pattern
/o - perform #{...} substitutions only once
/s - not a single line flag in ruby (that is perl, use /m instead); in ruby it sets the Windows-31J encoding
/[rd]ejected/imxo - chain multiple modifiers
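
A sketch of what each modifier changes in practice. The sample strings and the commented phone-like pattern are invented for illustration.

```ruby
case_insensitive = "REJECTED" =~ /rejected/i    # 0

dot_default = "a\nb" =~ /a.b/                   # nil, . stops at the newline
dot_multi   = "a\nb" =~ /a.b/m                  # 0

# With /x the pattern can be spread out and commented.
verbose = /
  (\d{3})   # prefix
  -
  (\d{4})   # line number
/x
parts = verbose.match("555-1234").captures      # ["555", "1234"]

# Modifiers can be chained in any order.
chained = /[rd]ejected/imxo.match?("DEJECTED")  # true
```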

Substitution

"after it all".gsub(/after/, "before") # "before it all"
"after it all".gsub(/after/, "before \\0") # "before after it all", \\0 reinserts the whole match. use \\1, \\2, ... for the capture groups

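A sketch of the common substitution forms: whole-match reinsertion, capture-group reinsertion, and a block. The name and fruit strings are made up.

```ruby
# \\0 is the whole match.
whole = "after it all".gsub(/after/, "before \\0")       # "before after it all"

# \1 and \2 are the capture groups; single quotes avoid double escaping.
swapped = "Doe, Jane".sub(/(\w+), (\w+)/, '\2 \1')       # "Jane Doe"

# A block receives each match and returns its replacement.
doubled = "3 apples and 5 pears".gsub(/\d+/) { |n| (n.to_i * 2).to_s }
# doubled is "6 apples and 10 pears"
```
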
Special Chars

\  - escape char
\n - newline
\r - carriage return
\t - tab
\v - vertical tab
\f - form feed