Last active
January 30, 2022 12:00
Commonly used Regular Expressions, with WebHarvy
(.*)
Selects only the first line from a block of text or HTML.
[\s]*(.*)
Selects the first line, ignoring leading white-space (spaces, line feeds and carriage returns).
[\s]* matches all white-space up to the first visible character.
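The two first-line patterns above can be tried outside WebHarvy. The following is a minimal sketch using Python's re module purely for illustration; the sample strings are made up, and WebHarvy's own regex engine may differ in details.

```python
import re

# Hypothetical sample text with no leading white-space.
clean = "First line\nSecond line"

# (.*) stops at the first line break because . does not match \n by default.
first_line = re.search(r"(.*)", clean).group(1)

# Hypothetical sample text that starts with blank lines and indentation.
padded = "\n\r\n   First line of text\nSecond line of text"

# Without [\s]*, the capture is empty: the text begins with a line break.
empty = re.search(r"(.*)", padded).group(1)

# [\s]* consumes the leading white-space, so (.*) starts at the first visible character.
trimmed = re.search(r"[\s]*(.*)", padded).group(1)

print(repr(first_line), repr(empty), repr(trimmed))
```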
href="([^"]*)
Gets the href link/URL from HTML. [^"]* matches up to the next " character.
src="([^"]*)
Gets the src link/URL from HTML.
The attribute name can be changed as required, as shown below.
zoom-image="([^"]*)
data-large-image="([^"]*)
mailto:([^"]*)
Gets the email address from a mailto: link in HTML.
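The attribute patterns above all share one shape: a fixed prefix followed by ([^"]*). A minimal sketch in Python (used here only for illustration) is shown below; the helper name attr_value and the sample HTML are hypothetical.

```python
import re

# Hypothetical HTML fragment for illustration.
html = ('<a href="https://example.com/item/42">'
        '<img src="/img/thumb.jpg" data-large-image="/img/large.jpg"></a> '
        '<a href="mailto:sales@example.com">Contact</a>')

def attr_value(name, text):
    """Return the value of the first name="..." attribute in text, or None."""
    m = re.search(re.escape(name) + r'="([^"]*)', text)
    return m.group(1) if m else None

href = attr_value("href", html)
src = attr_value("src", html)
large = attr_value("data-large-image", html)

# mailto: is a prefix inside the attribute value, so it is matched directly.
email = re.search(r'mailto:([^"]*)', html).group(1)

print(href, src, large, email)
```

Note that each pattern returns only the first match in the selected HTML, so the first href here is the product link, not the mailto: link.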
Alloy Wheels([\s\S]*?)<div class="icon">
Gets the string between 'Alloy Wheels' and <div class="icon">. This can be modified to match
any string that is guaranteed to appear between two other strings in HTML or text.
[\s\S]* matches everything (white-space and non-white-space, so all characters, including line breaks);
the trailing ? makes the match stop at the first occurrence of the ending string.
Starting Text([\s\S]*?)Ending Text
General format of the above case. Place ([\s\S]*?) between the starting and ending text,
and the text or HTML in between is matched and selected.
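The general Starting Text / Ending Text form can be wrapped in a small helper. This is a sketch in Python for illustration only; the function name between and the sample HTML are assumptions, not part of WebHarvy.

```python
import re

def between(start, end, text):
    """Return the text between the first start and the next end, or None."""
    # re.escape makes both delimiters literal, so characters like . or $ are safe.
    pattern = re.escape(start) + r"([\s\S]*?)" + re.escape(end)
    m = re.search(pattern, text)
    return m.group(1) if m else None

# Hypothetical HTML with the ending string appearing twice.
html = ('Alloy Wheels<span>17 inch</span>\n'
        '<div class="icon"></div><div class="icon"></div>')

lazy = between("Alloy Wheels", '<div class="icon">', html)

# For comparison: without the ?, the match runs to the LAST ending string.
greedy = re.search(r'Alloy Wheels([\s\S]*)<div class="icon">', html).group(1)

print(repr(lazy))
print(repr(greedy))
```

The lazy form is usually the one wanted, since the greedy form swallows everything up to the final occurrence of the ending string.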
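itemprop="name">([^<]*)<div class="line">
Gets the text between itemprop="name"> and <div class="line">. [^<]* matches all characters up to the next <,
so this works only when the text in between contains no HTML tags.
itemprop="name">([\s\S]*?)<div class="line">
Same as above, but also matches when the content in between contains HTML tags.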
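The two itemprop patterns behave the same on plain text but differ once a tag appears in between. A short Python sketch (illustration only, with made-up sample HTML) shows the difference:

```python
import re

negated = r'itemprop="name">([^<]*)<div class="line">'
lazy = r'itemprop="name">([\s\S]*?)<div class="line">'

# Hypothetical samples: one with plain text, one with a nested tag.
plain = 'itemprop="name">Blue Widget<div class="line">'
nested = 'itemprop="name"><b>Blue</b> Widget<div class="line">'

def capture(pattern, text):
    """Return the first capture group, or None if the pattern does not match."""
    m = re.search(pattern, text)
    return m.group(1) if m else None

plain_negated = capture(negated, plain)
plain_lazy = capture(lazy, plain)
nested_negated = capture(negated, nested)   # [^<]* cannot cross the <b> tag
nested_lazy = capture(lazy, nested)

print(plain_negated, plain_lazy, nested_negated, nested_lazy)
```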
(?=[^M]*MAP)[^M]*MAP: \$(.*)|List Price: \$(.*)
Conditional regular expression. Captures the MAP price if available, else captures the List Price.
RegEx special characters like $, ., ^ etc. should be escaped with \ (example: \$, \. etc.) to match them literally.
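In a general-purpose regex engine, the conditional pattern above places the MAP price in the first capture group and the List Price in the second. The Python sketch below is for illustration only; the helper name price and the sample strings are hypothetical, and how WebHarvy itself picks between the two groups is not shown here.

```python
import re

pattern = r"(?=[^M]*MAP)[^M]*MAP: \$(.*)|List Price: \$(.*)"

def price(text):
    """Return the MAP price if present, else the List Price, else None."""
    m = re.search(pattern, text)
    if m is None:
        return None
    # Group 1 is filled by the MAP branch, group 2 by the List Price branch.
    return m.group(1) if m.group(1) is not None else m.group(2)

# Hypothetical samples.
with_map = price("List Price: $120.00\nMAP: $99.50")
without_map = price("List Price: $120.00")

# re.escape adds the backslashes needed to match special characters literally.
escaped = re.escape("$19.99")

print(with_map, without_map, escaped)
```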
<img src="([^"]*)
First image URL in HTML.
<img src=[\s\S]*?<img src="([^"]*)
Second image URL in HTML (the src value of the second img tag).
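The first-image and second-image patterns can be checked side by side. This is a Python sketch for illustration with made-up HTML; the findall line is an alternative available in Python, not something the patterns above rely on.

```python
import re

# Hypothetical HTML with two img tags separated by other markup and a line break.
html = ('<div><img src="/a.jpg" alt="one"><p>text</p>\n'
        '<img src="/b.jpg" alt="two"></div>')

first = re.search(r'<img src="([^"]*)', html).group(1)

# Skip lazily from the first <img src= to the next one, then capture its URL.
second = re.search(r'<img src=[\s\S]*?<img src="([^"]*)', html).group(1)

# Alternative in plain Python: collect every src value in order.
all_srcs = re.findall(r'<img src="([^"]*)', html)

print(first, second, all_srcs)
```

Both patterns assume src is the first attribute of the img tag, as written in the original expressions.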
(In Stock)
Matches and returns the value 'In Stock' only if the selected HTML or text contains 'In Stock'.
This can be used to check whether the selected HTML or text contains a specific string.
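A presence check of this kind can be sketched in Python (illustration only; the helper name stock_status and the sample strings are hypothetical):

```python
import re

def stock_status(text):
    """Return 'In Stock' if the text contains it, else None."""
    m = re.search(r"(In Stock)", text)
    return m.group(1) if m else None

available = stock_status("Availability: In Stock")
unavailable = stock_status("Availability: Out of Stock")

print(available, unavailable)
```

The match is case-sensitive, so 'in stock' in lower case would not be captured by this pattern.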
merch_name[^>]*>([^<]*)
Matches the text that follows an HTML tag containing 'merch_name', up to the start of the next tag.
[^>]*> matches up to and including the next >.
[^<]* matches up to the next <.
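The merch_name pattern can be traced on a small sample. The Python sketch below is for illustration; the HTML fragment is made up.

```python
import re

# Hypothetical tag whose class attribute contains 'merch_name' plus other attributes.
html = '<span class="merch_name bold" id="t1">Trail Runner Shoes</span>'

# merch_name  anchors inside the opening tag,
# [^>]*>      skips the rest of that tag, including its closing >,
# ([^<]*)     captures the text up to the next tag.
m = re.search(r"merch_name[^>]*>([^<]*)", html)
name = m.group(1) if m else None

print(name)
```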