Relevant XKCD
Regex is built on set-theory logic: a regular expression describes a set of strings, known as a regular language. That idea is not limited to search engines. Regular expressions are part of the theory of computation itself, so they tie into how computers work at a fundamental level, and they are also used when teaching computers to process human language. At first, regex looks like cryptic code. It is a very powerful skill, but it can at times be very frustrating. Its power lies in expressing, in one short pattern, work that would otherwise take many lines of code in an IDE.
Regular expressions are used in search engines, in the search-and-replace dialogs of word processors and text editors, in text-processing utilities such as sed and AWK, and in lexical analysis. Many programming languages provide regex support, either built in or via libraries, because it is useful in so many situations.
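Since the patterns in this gist are written as JavaScript-style /pattern/flags literals, here is a small TypeScript sketch of a built-in regex engine doing the kind of find and replace an editor dialog performs. The sample sentence and variable names are made up for illustration:

```typescript
// "Replace All" in one call: the g flag makes replace() act on every match.
// colou?r matches both "colour" and "color" (the ? makes the u optional).
const draft = "The colour wheel shows each color and colour name.";
const normalized = draft.replace(/colou?r/g, "color");

// search() returns the index of the first match, like a "Find" dialog.
const firstIndex = draft.search(/colou?r/);

// match() with g returns every match, or null when there is none.
const matchCount = (draft.match(/colou?r/g) || []).length;

console.log(normalized); // The color wheel shows each color and color name.
console.log(firstIndex); // 4
console.log(matchCount); // 3
```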
This gist explains most of the syntax and commands that regex uses. I hope it helps people gain a better understanding of the concept while also searing the information into my own brain. I usually write things as informatively as possible so that I get the most out of them and can come back to them later, and since this is also an assignment, I'll put in extra effort.
As the assignment requires, this gist includes a regular expression that matches hexadecimal color values:
/^#?([a-f0-9]{6}|[a-f0-9]{3})$/
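To see the pattern in action before breaking it down, here is a minimal TypeScript sketch. The function name isHexColor and the sample inputs are hypothetical, chosen only to show which strings pass and which fail:

```typescript
// The hex color pattern from this gist, used as a regex literal.
const hexColorPattern = /^#?([a-f0-9]{6}|[a-f0-9]{3})$/;

// Hypothetical helper: true when the whole string is a 3- or 6-digit
// lowercase hex value, with or without the leading "#".
function isHexColor(value: string): boolean {
  return hexColorPattern.test(value);
}

const samples = ["#a3c113", "a3c113", "#fff", "#4d82h4", "#ffff", "#A3C113"];
for (const sample of samples) {
  // Expected: true, true, true, false (h is not hex), false (4 digits), false (uppercase)
  console.log(sample, isHexColor(sample));
}
```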
The assignment requires me to set out and explain how this expression reads, so I'll go through every character to show how it carries out its logic. The rest of the gist covers the rules, the syntax, and the what, why, and how of regex, along with some extras that can further deepen that understanding.
Regular expressions are expressions, much like mathematical expressions, that denote a regular language. The special characters that give a pattern its meaning are called "metacharacters".
I have included some reference tables below, and the rest is explained in the following sections:
- POSIX
- Pattern Modifiers
- 8 Regular Expressions You Should Know
- Solution To Assignment
- Resources
- Compiler
The caret (^) anchors a match to the beginning of the string being evaluated, while the dollar sign ($) anchors it to the end. For example, ^Let's Go!$ matches only the exact string Let's Go!, with nothing before or after it.
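Anchors control where a match is allowed to occur. Without them, a pattern can match anywhere inside a longer text; with both ^ and $, the pattern must account for the entire string, which is what makes anchors essential for validating input.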
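A quick TypeScript comparison of the same text with and without anchors, using the Let's Go! example (the variable names are illustrative):

```typescript
// Without anchors the pattern may match anywhere inside the string.
const unanchored = /Let's Go!/;
// With ^ and $ the whole string must be exactly "Let's Go!".
const anchored = /^Let's Go!$/;

const phrase = "I said Let's Go! twice";
console.log(unanchored.test(phrase));    // true: found inside the longer string
console.log(anchored.test(phrase));      // false: extra text before and after
console.log(anchored.test("Let's Go!")); // true: exact match
```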
- * : 0 or more
- + : 1 or more
- ? : 0 or 1
- {3} : Exactly 3
- {3,} : 3 or more
- {3,5} : 3, 4 or 5
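A small TypeScript check of each quantifier in this table, plus the greedy versus lazy behavior covered in the note that follows. The test strings are arbitrary; every pattern is anchored so the quantifier has to account for the whole input:

```typescript
const zeroOrMore = /^ab*c$/;    // * : 0 or more b's
const oneOrMore = /^ab+c$/;     // + : 1 or more b's
const optional = /^ab?c$/;      // ? : 0 or 1 b
const exactlyThree = /^b{3}$/;  // {3} : exactly 3
const threeOrMore = /^b{3,}$/;  // {3,} : 3 or more
const threeToFive = /^b{3,5}$/; // {3,5} : 3, 4 or 5

console.log(zeroOrMore.test("ac"), oneOrMore.test("ac"));           // true false
console.log(optional.test("abbc"));                                  // false
console.log(exactlyThree.test("bbb"), threeOrMore.test("bbbbbbb")); // true true
console.log(threeToFive.test("bb"), threeToFive.test("bbbbbb"));    // false false

// Greedy vs lazy: .+ runs to the last ">", while .+? stops at the first one.
const markup = "<b>bold</b>";
const greedy = markup.match(/<.+>/);
const lazy = markup.match(/<.+?>/);
console.log(greedy ? greedy[0] : null); // <b>bold</b>
console.log(lazy ? lazy[0] : null);     // <b>
```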
Note: Quantifiers are greedy: they match as many times as possible. Add a ? after a quantifier (for example *? or +?) to make it lazy (ungreedy), so it matches as few times as possible while still letting the rest of the pattern succeed.
"+" repeats the preceding token one or more times, as long as each repetition matches an accepted character. "{2,6}" means the preceding token (a character, character set or group) must match at least 2 times but no more than 6.
- . : Any character except newline (\n)
- (a|b) : a or b
- (…) : Group
- (?:…) : Passive (non-capturing) group
- [abc] : a, b or c
- [^abc] : Not a, b or c
- [a-z] : Letters from a to z
- [A-Z] : Uppercase letters from A to Z
- [0-9] : Digits from 0 to 9
Note: Ranges are inclusive.
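Grouping treats several tokens as a single unit, so a quantifier or an alternation can apply to the whole group. A capturing group (…) also remembers the text it matched, for use in a backreference or a replacement, while a passive group (?:…) only groups and captures nothing.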
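A TypeScript sketch contrasting capturing and passive groups. The date string and variable names are illustrative only; \d (any digit) is covered in the character-class section:

```typescript
const dateText = "2024-05-17";
const datePattern = /^(\d{4})-(\d{2})-(\d{2})$/;

// Capturing groups: exec() returns the full match at [0] and each group at [1], [2], ...
const parts = datePattern.exec(dateText);
console.log(parts ? parts.slice(1).join(" ") : null); // 2024 05 17

// Captured groups can be reused in a replacement as $1, $2, $3.
const reordered = dateText.replace(datePattern, "$3/$2/$1");
console.log(reordered); // 17/05/2024

// (?:ha) groups "ha" so + repeats the pair, but nothing is captured:
// the result array holds only the full match.
const laughMatch = /^(?:ha)+$/.exec("hahaha");
console.log(laughMatch ? laughMatch.length : 0); // 1
```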
\d matches any single digit character; it is shorthand for the character set [0-9]. Character sets (also called character classes) are one of the most commonly used features of regular expressions. They let you locate a word even if it is misspelled: for example, gr[ae]y matches both gray and grey. The order of the characters inside a character class does not matter; the results are identical. You can also combine ranges and single characters in one class. For example, [0-9a-fxA-FX] matches a hexadecimal digit or the letter x or X.
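The character-class rules above can be checked directly in TypeScript. The inputs are arbitrary examples:

```typescript
// gr[ae]y accepts either spelling; the class matches exactly one character.
const grayPattern = /gr[ae]y/;
console.log(grayPattern.test("gray"), grayPattern.test("grey"), grayPattern.test("gruy")); // true true false

// Order inside a class does not matter: [ae] and [ea] match the same characters.
console.log(/gr[ea]y/.test("grey")); // true

// Ranges and single characters can be mixed: hex digits plus the letters x and X.
const hexOrX = /^[0-9a-fxA-FX]+$/;
console.log(hexOrX.test("0x1F"), hexOrX.test("0xG1")); // true false

// \d is shorthand for [0-9].
console.log(/^\d+$/.test("2024"), /^[0-9]+$/.test("2024")); // true true
```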
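- \s : Whitespace
- \S : Not whitespace
- \w : Word character (letter, digit or underscore)
- \W : Not a word character
- \d : Digit
- \D : Not digit
- \x : Hexadecimal digit
- \O : Octal digit
Note: \x and \O are flavor-specific. In JavaScript, \x instead begins a hexadecimal escape such as \x41 (the letter A), and there is no octal-digit shorthand; use [0-7] instead.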
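A TypeScript sketch of the shorthand classes on one sample line. The helper allMatches and the sample text are hypothetical, added only to keep the output short:

```typescript
// Hypothetical helper: all matches of a global pattern, or [] when there are none.
function allMatches(text: string, pattern: RegExp): string[] {
  return text.match(pattern) || [];
}

const record = "user_42: Jane Doe";

console.log(allMatches(record, /\w+/g).join(",")); // user_42,Jane,Doe
console.log(allMatches(record, /\d/g).join(","));  // 4,2
console.log(allMatches(record, /\s/g).length);     // 2 (two spaces)
console.log(allMatches(record, /\W/g).length);     // 3 (":" and two spaces)
console.log(allMatches(record, /\D/g).length);     // 15 (17 characters minus 2 digits)

// In JavaScript, \x41 is a hex escape for the letter A, not a "hex digit" class.
console.log(/^\x41$/.test("A")); // true
```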
Flags set options that change how a search behaves. They are part of a regular expression (in JavaScript they follow the closing slash, as in /abc/gi) and can be used separately or combined.
| : Alternation
Also called the union operator, | works much like the logical OR operator "||" in a conditional: it lets one pattern accept any of several alternatives instead of requiring separate code for each condition. For example, cat|dog matches either cat or dog.
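A short TypeScript sketch of alternation. The helper firstMatch is hypothetical; the second half shows that alternatives are tried left to right:

```typescript
// Hypothetical helper: the first match of a pattern, or null.
function firstMatch(text: string, pattern: RegExp): string | null {
  const found = text.match(pattern);
  return found ? found[0] : null;
}

// | accepts either side, like || in a condition.
const petPattern = /^(cat|dog)$/;
console.log(petPattern.test("cat"), petPattern.test("dog"), petPattern.test("cow")); // true true false

// Alternatives are tried left to right, so order matters when one is a prefix of another.
console.log(firstMatch("category", /cat|category/)); // cat
console.log(firstMatch("category", /category|cat/)); // category
```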
- 1. i : Case-insensitive search.
- 2. g : Global search: find every match in the input instead of stopping after the first one.
- 3. m : Multiline mode: ^ and $ match at the start and end of each line, not only at the start and end of the whole string.
- 4. s : "dotAll" mode: the dot (.) also matches newline characters.
- 5. u : Unicode mode: the pattern is treated as a sequence of Unicode code points, so surrogate pairs (such as many emoji) count as single characters.
- 6. y : Sticky mode: a match must start exactly at the regex's lastIndex position instead of searching ahead.
These options let a single pattern adapt to many different situations, which also makes patterns easier to reuse across different inputs.
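A TypeScript sketch exercising each of the six flags on a two-line sample string (the sample text and variable names are illustrative). As an assumption about tooling, the s, u and y flags are passed to the RegExp constructor so the code still compiles when TypeScript targets an older ECMAScript version:

```typescript
const twoLines = "first line\nSecond Line";
const smiley = "\uD83D\uDE00"; // one emoji (U+1F600), stored as two UTF-16 code units

// i: case-insensitive
const caseInsensitive = /second/i.test(twoLines);
// g: every match, not just the first
const globalCount = (twoLines.match(/line/gi) || []).length;
// m: ^ can match at the start of any line
const withoutM = /^Second/.test(twoLines);
const withM = /^Second/m.test(twoLines);
// s (dotAll): "." also matches the newline
const withoutS = /line.Second/.test(twoLines);
const withS = new RegExp("line.Second", "s").test(twoLines);
// u: the surrogate pair counts as a single character
const withoutU = /^.$/.test(smiley);
const withU = new RegExp("^.$", "u").test(smiley);
// y (sticky): the match must begin exactly at lastIndex
const sticky = new RegExp("Second", "y");
sticky.lastIndex = 11; // index right after "\n"
const stickyAt11 = sticky.test(twoLines);
sticky.lastIndex = 0; // "Second" does not start at index 0, and sticky never searches ahead
const stickyAt0 = sticky.test(twoLines);

console.log(caseInsensitive, globalCount); // true 2
console.log(withoutM, withM);              // false true
console.log(withoutS, withS);              // false true
console.log(withoutU, withU);              // false true
console.log(stickyAt11, stickyAt0);        // true false
```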
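- \ : Escape the following character. Used to escape metacharacters such as { } [ ] ( ) ^ $ . | * + ? and \ itself.
- \Q : Begin literal sequence
- \E : End literal sequence
Note: \Q and \E are supported in flavors such as Perl, PCRE and Java, but not in JavaScript.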
Every escape starts with a backslash (\). When a pattern is written inside a string or template literal rather than as a regex literal, for example when building it with new RegExp, the backslash itself must be escaped, so \d has to be written as "\\d".
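Because JavaScript has no \Q...\E, a common substitute is to escape every metacharacter before building a pattern from plain text. A minimal TypeScript sketch; the helper name escapeForRegExp and the sample price string are hypothetical:

```typescript
// Hypothetical helper: put a backslash before every metacharacter so the text matches literally.
function escapeForRegExp(text: string): string {
  // $& in the replacement string inserts the matched character itself.
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

const price = "$5.00 (approx.)";
const literal = new RegExp(escapeForRegExp(price));

console.log(escapeForRegExp(price));                 // \$5\.00 \(approx\.\)
console.log(literal.test("Total: $5.00 (approx.)")); // true
console.log(literal.test("Total: $5X00 (approx.)")); // false: "." is now a literal dot

// Unescaped, $ means "end of input", so the raw text cannot even match itself.
console.log(new RegExp(price).test(price)); // false

// Inside a string the backslash itself must be escaped: "\\d+" becomes the pattern \d+.
console.log(new RegExp("\\d+").test("abc123")); // true
```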
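- [:upper:] : Uppercase letters
- [:lower:] : Lowercase letters
- [:alpha:] : All letters
- [:alnum:] : Digits and letters
- [:digit:] : Digits
- [:xdigit:] : Hexadecimal digits
- [:punct:] : Punctuation
- [:blank:] : Space and tab
- [:space:] : Blank characters
- [:cntrl:] : Control characters
- [:graph:] : Printed characters
- [:print:] : Printed characters and spaces
- [:word:] : Digits, letters and underscore
Note: POSIX classes are written inside a bracket expression, as in [[:digit:]], and are supported by tools such as grep and by PCRE, but not by JavaScript.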
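Since JavaScript lacks POSIX classes, one workaround is to substitute an equivalent bracket expression. The sketch below assumes plain ASCII text; the mapping object, the helper allInPosixClass and the sample inputs are all hypothetical, and only a few classes are covered:

```typescript
// Approximate ASCII equivalents; not locale-aware and not exhaustive.
const posixEquivalents: { [className: string]: string } = {
  upper: "[A-Z]",
  lower: "[a-z]",
  alpha: "[A-Za-z]",
  alnum: "[A-Za-z0-9]",
  digit: "[0-9]",
  xdigit: "[0-9A-Fa-f]",
  blank: "[ \\t]",
  word: "\\w",
};

// Hypothetical helper: true when every character of text belongs to the named class.
function allInPosixClass(text: string, className: string): boolean {
  const cls = posixEquivalents[className];
  if (cls === undefined) {
    throw new Error("no equivalent defined for [:" + className + ":]");
  }
  return new RegExp("^" + cls + "+$").test(text);
}

console.log(allInPosixClass("C0FFEE", "xdigit")); // true
console.log(allInPosixClass("Hello1", "alpha"));  // false (the 1 is not a letter)
```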
- g : Global match
- i : Case-insensitive
- m : Multi-line mode. Causes ^ and $ to also match the start/end of lines.
- s : Single-line mode. Causes . to match all characters, including line breaks.
- x : Allow comments and whitespace in the pattern
- e : Evaluate replacement
- U : Ungreedy mode
Note: x, e and U are PHP (PCRE) modifiers; JavaScript does not support them.
- Matching a Username:
/^[a-z0-9_-]{3,16}$/
- Matching a Password:
/^[a-z0-9_-]{6,18}$/
- Matching a Hex Value:
/^#?([a-f0-9]{6}|[a-f0-9]{3})$/
- Matching a Slug:
/^[a-z0-9-]+$/
- Matching an Email:
/^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/
- Matching a URL:
/^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/
- Matching an IP Address:
/^(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$/
- Matching an HTML Tag:
/^<([a-z]+)([^<]+)*(?:>(.*)<\/\1>|\s+\/>)$/
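These patterns can be used directly as TypeScript regex literals. A small sketch keeping six of them exactly as listed; the commonPatterns object and the sample inputs are illustrative, and the URL and HTML patterns are left out for brevity:

```typescript
// Patterns copied from the list above, stored by name.
const commonPatterns = {
  username: /^[a-z0-9_-]{3,16}$/,
  password: /^[a-z0-9_-]{6,18}$/,
  hexValue: /^#?([a-f0-9]{6}|[a-f0-9]{3})$/,
  slug: /^[a-z0-9-]+$/,
  email: /^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/,
  ipAddress: /^(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$/,
};

console.log(commonPatterns.username.test("john_doe-99"));  // true
console.log(commonPatterns.password.test("short"));        // false (fewer than 6 characters)
console.log(commonPatterns.slug.test("my-title-here"));    // true
console.log(commonPatterns.email.test("john@doe.com"));    // true
console.log(commonPatterns.ipAddress.test("192.168.1.1")); // true
console.log(commonPatterns.ipAddress.test("256.1.1.1"));   // false (256 is out of range)
```

Without the i flag these patterns reject uppercase letters, so John@doe.com would fail the email check.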
^#?([a-f0-9]{6}|[a-f0-9]{3})$
^ = Matches the beginning of the string, or the beginning of a line if the multiline flag (m) is enabled. This matches a position, not a character.
#? = Matches a literal # zero or one time; the ? quantifier makes the # optional.
([a-f0-9]{6}|[a-f0-9]{3}) = Groups multiple tokens together and creates a capture group that can be used to extract a substring or in a backreference.
([a-f0-9]{6}|[a-f0-9]{3})$
- Inside capturing group 1, the first alternative starts with the character set [a-f0-9].
- [a-f0-9] = Matches a single character in the range a to f or 0 to 9. It is case sensitive, so A to F do not match unless the i flag is set.
{6} = The quantifier: the preceding token (the character set) must match exactly 6 times.
| = Acts like a boolean OR. It matches the expression before or after the |, and it can operate within a group or on a whole expression. The alternatives are tried in order; here the | sits between the two character sets.
After the |, the second alternative is the same character set [a-f0-9], again matching one character from a to f or 0 to 9 (case sensitive).
{3} = The quantifier: the preceding token (the character set) must match exactly 3 times.
$ = Matches the end of the string, or the end of a line if the multiline flag (m) is enabled. This matches a position, not a character.
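Two details from this breakdown can be confirmed in TypeScript: capture group 1 holds only the hex digits (the optional # sits outside the group), and the set [a-f0-9] rejects uppercase unless the i flag is added. The sample values are illustrative:

```typescript
const hexPattern = /^#?([a-f0-9]{6}|[a-f0-9]{3})$/;

// Capturing group 1 holds the digits without the optional "#".
const longForm = hexPattern.exec("#a3c113");
console.log(longForm ? longForm[1] : null);   // a3c113
const shortForm = hexPattern.exec("fff");
console.log(shortForm ? shortForm[1] : null); // fff

// [a-f0-9] is case sensitive; the i flag also accepts A to F.
console.log(hexPattern.test("#A3C113"));                       // false
console.log(/^#?([a-f0-9]{6}|[a-f0-9]{3})$/i.test("#A3C113")); // true
```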
Interactive Game that is themed XKCD!
Stephen Puthenpurackal. Information obtained from various sources on the internet.