<div class="call-to-action ">
<a title="Email" class="contact contact-main contact-email "
href="mailto:info@canberraeyelaser.com.au?subject=Enquiry%2C%20sent%20from%20yellowpages.com.au&amp;
body=%0A%0A%0A%0A%0A------------------------------------------%0AEnquiry%20via%20yellowpages.com.au%0Ahttp%3A%2F%2Fyellowpages.com.au%2Fact%2Fphillip%2Fcanberra-eye-laser-15333167-listing.html%3Fcontext%3DbusinessTypeSearch"
rel="nofollow" data-email="info@canberraeyelaser.com.au">
<span class="glyph icon-email border border-dark-blue with-text"></span><span class="contact-text">Email</span>
</a>
</div>
The Man in the High Castle Kindle Edition by Philip K. Dick (Author)
@sysnucleus
sysnucleus / Commonly used Regular Expressions
Last active January 30, 2022 12:00
Commonly used Regular Expressions with WebHarvy
(.*)
Selects only the first line from a block of text or HTML.
[\s]*(.*)
Selects the first line, ignoring leading whitespace (spaces, tabs, line feeds and carriage returns).
[\s]* matches all whitespace up to the first visible character.
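To see why the second pattern exists, here is a small JavaScript sketch that applies both patterns to a made-up block of text and keeps capture group 1 of the first match, assuming that is what WebHarvy returns for these strings. The helper name `firstGroup` is hypothetical.

```javascript
// Hypothetical helper: apply a pattern once and return capture group 1,
// or null when there is no match.
function firstGroup(pattern, text) {
  const match = new RegExp(pattern).exec(text);
  return match ? match[1] : null;
}

// A made-up text block that starts with blank lines and indentation.
const block = "\n  \r\n   Example Listing\nSecond line";

// "(.*)": "." does not match line breaks, so the match stops at the first
// one. Here the text begins with "\n", so the captured line is empty.
console.log(JSON.stringify(firstGroup(/(.*)/, block))); // prints: ""

// "[\s]*(.*)": skip the leading whitespace first, then capture the first
// non-blank line.
console.log(firstGroup(/[\s]*(.*)/, block)); // prints: Example Listing
```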
href="([^"]*)
Gets the href link/URL from HTML. [^"]* matches up to the next " character.
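A minimal JavaScript sketch of the href pattern, written with straight double quotes as HTML uses them. Unlike the single-value usage described here, it adds the `g` flag to collect every link in a made-up snippet; the function name `extractHrefs` is hypothetical.

```javascript
// Hypothetical helper: collect every href="..." value from an HTML string.
// The pattern needs straight quotes; typographic quotes would never match HTML.
function extractHrefs(html) {
  const pattern = /href="([^"]*)/g;
  return Array.from(html.matchAll(pattern), (m) => m[1]);
}

const html = '<a href="https://example.com/a">A</a> <a class="x" href="/b?q=1">B</a>';
console.log(extractHrefs(html)); // prints: [ 'https://example.com/a', '/b?q=1' ]
```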
To collect 'Title' / 'Seller' / 'In Stock'
1. Click on the required text
2. Select More Options > Capture More Content 10 times (the goal is to capture the complete HTML source of the page)
3. Select More Options > Capture HTML
4. Select More Options > Apply Regular Expression. Paste one of the following RegEx strings and click Apply.
id="productTitle">[\s]*([^<]*)
by[\s\S]*?id="brand"[^>]*>[\s]*([^<]*)
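The two Amazon patterns can be tried outside WebHarvy with a short JavaScript sketch. The HTML below is a made-up snippet, not real Amazon markup, and `capture` is a hypothetical helper that trims the result, since ([^<]*) also captures any whitespace before the closing tag. Note that id="productTitle"> only matches when the id is the last attribute in the tag.

```javascript
// Made-up snippet shaped like the markup the two patterns expect.
const page = `
<span class="a-size-large" id="productTitle">
      The Man in the High Castle
</span>
<div id="bylineInfo_feature_div">
  by <a class="a-link-normal" id="brand" href="/example">Example Brand</a>
</div>`;

// Hypothetical helper: trimmed capture group 1, or null when nothing matches.
function capture(pattern, html) {
  const m = pattern.exec(html);
  return m ? m[1].trim() : null;
}

const title = capture(/id="productTitle">[\s]*([^<]*)/, page);
const brand = capture(/by[\s\S]*?id="brand"[^>]*>[\s]*([^<]*)/, page);
console.log(title); // prints: The Man in the High Castle
console.log(brand); // prints: Example Brand
```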
sysnucleus / gist:f91f8b8b84d918918b6857b3a7acff3a
Created June 22, 2018 04:47
WebHarvy Amazon extraction regular expressions
src="([^_]*)_[^\.]*\.([^"]*)
<div id="feature-bullets"[^>]*>([\s\S]*?)</div>
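Both patterns in this gist can be checked with the JavaScript sketch below. The image URL and the bullet markup are made up, and joining the two image groups into one URL is an assumption about how the result is meant to be used (it removes the "._..._" resizing segment). In a JavaScript regex literal the / in </div> has to be escaped; the string as listed above has no such escape.

```javascript
// Made-up inputs.
const img = '<img alt="cover" src="https://m.media-amazon.com/images/I/51abcXYZ._SX331_BO1,204,203,200_.jpg">';
const bullets = `<div id="feature-bullets" class="a-section">
  <ul><li>Point one</li><li>Point two</li></ul>
</div>`;

// Image pattern: group 1 is everything up to the first "_", group 2 is the
// text after the next "." (the extension). Joining them drops the "_..._" part.
function fullSizeImageUrl(html) {
  const m = /src="([^_]*)_[^\.]*\.([^"]*)/.exec(html);
  return m ? m[1] + m[2] : null;
}

// Bullet pattern: the lazy [\s\S]*? stops at the FIRST "</div>", so a nested
// <div> inside the block would cut the capture short.
function featureBulletsHtml(html) {
  const m = /<div id="feature-bullets"[^>]*>([\s\S]*?)<\/div>/.exec(html);
  return m ? m[1].trim() : null;
}

console.log(fullSizeImageUrl(img));
// prints: https://m.media-amazon.com/images/I/51abcXYZ.jpg
console.log(featureBulletsHtml(bullets));
// prints: <ul><li>Point one</li><li>Point two</li></ul>
```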
sysnucleus / yellow pages egypt.js
Created October 15, 2018 05:08
RegEx strings to extract listing name, telephone, website and address
[\s]*(.*)
tel: ([^"]*)
title="Website"[\s\S]*?href="([^"]*)
class="col-md-9 company_address"[^>]*>([^<]*)
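A rough JavaScript sketch that runs the four Yellow Pages Egypt patterns over made-up markup. The field names, sample values and the `extractListing` helper are hypothetical; the listing-name pattern is applied to the text of the name element on its own, since [\s]*(.*) on the whole page would just return its first non-blank line. Note that the telephone pattern expects a space after "tel:".

```javascript
// Made-up listing markup; the real site's attributes may differ.
const nameText = "\n    Example Trading Co.\n  ";
const listing = `
<a class="call" href="tel: 0221234567">Call</a>
<a title="Website" target="_blank" href="http://www.example.com.eg">Visit</a>
<div class="col-md-9 company_address" itemprop="address">12 Example St, Cairo</div>`;

// One entry per field, using the pattern strings as listed.
const patterns = {
  name: /[\s]*(.*)/,
  telephone: /tel: ([^"]*)/, // note the space after "tel:"
  website: /title="Website"[\s\S]*?href="([^"]*)/,
  address: /class="col-md-9 company_address"[^>]*>([^<]*)/,
};

// Hypothetical helper: the name pattern runs on the name text, the rest on
// the listing HTML; each field gets capture group 1 or null.
function extractListing(nameSource, html) {
  const result = {};
  for (const [field, pattern] of Object.entries(patterns)) {
    const m = pattern.exec(field === "name" ? nameSource : html);
    result[field] = m ? m[1] : null;
  }
  return result;
}

console.log(extractListing(nameText, listing));
```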
sysnucleus / ebay
Created June 16, 2019 06:07
RegEx to extract email/phone from an eBay seller's page
id="email"[\s\S]*?cell_value">([^<]*)
id="phone_number"[\s\S]*?cell_value">([^<]*)
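The eBay patterns rely on a lazy [\s\S]*? to skip from the row's id to the nearest cell_value. The JavaScript sketch below, using made-up seller-page markup and a hypothetical `grab` helper, shows both the normal case and what happens when the email row has no value.

```javascript
// Made-up seller-page markup with the ids and class the patterns look for.
const sellerPage = `
<div id="email"><span>Email:</span><span class="cell_value">seller@example.com</span></div>
<div id="phone_number"><span>Phone:</span><span class="cell_value">+1 555 0100</span></div>`;

const EMAIL = /id="email"[\s\S]*?cell_value">([^<]*)/;
const PHONE = /id="phone_number"[\s\S]*?cell_value">([^<]*)/;

// Hypothetical helper: trimmed capture group 1 or null. Because [\s\S]*? can
// cross element boundaries, a row with no cell_value picks up the NEXT row's value.
function grab(pattern, html) {
  const m = pattern.exec(html);
  return m ? m[1].trim() : null;
}

console.log(grab(EMAIL, sellerPage)); // prints: seller@example.com
console.log(grab(PHONE, sellerPage)); // prints: +1 555 0100
```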
sysnucleus / gist:84a0574cbf908813787d2d95b8a6c2ed
Created August 20, 2020 13:22
JS code to configure pagination (scroll) in WebHarvy for Twitter scraping
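// Container four levels above the first tweet <article> (the depth depends on Twitter's markup).
groupEl = document.getElementsByTagName('article')[0].parentElement.parentElement.parentElement.parentElement;
// Scroll its last child into view so that the next batch of tweets loads.
groupEl.children[groupEl.childElementCount-1].scrollIntoView();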
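The snippet above throws a TypeError if no <article> is on the page or one of the four ancestors is missing. One way to make the same step fail softly is sketched below in JavaScript; the helper names are hypothetical, the depth of 4 is copied from the snippet, and the boolean return value is only there so a caller can tell whether anything was scrolled.

```javascript
// Hypothetical helper: follow `levels` parentElement links, returning null
// if the chain runs out before then.
function nthAncestor(el, levels) {
  let node = el;
  for (let i = 0; i < levels && node; i++) {
    node = node.parentElement;
  }
  return node || null;
}

// Same idea as the two-line snippet: scroll the last child of the container
// four levels above the first <article>. Returns false instead of throwing.
function scrollLastTweetGroup(doc) {
  const first = doc.getElementsByTagName("article")[0];
  const group = first ? nthAncestor(first, 4) : null;
  if (!group || group.childElementCount === 0) return false;
  group.children[group.childElementCount - 1].scrollIntoView();
  return true;
}
```

In the page it would be called as scrollLastTweetGroup(document);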
sysnucleus / gist:436a2b0be80882f0ae61a391931abf5d
Created August 31, 2020 13:55
RegEx strings to extract email, phone, website and address from yellowpages.com.au
data-email="([^"]*)
tel:([^"]*)
title="([^\s]*)\s*\(opens in a new window\)
<p class="listing-address[^>]*>([^<]*)
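A minimal JavaScript sketch applying the four yellowpages.com.au patterns to made-up listing markup. The sample attributes are modelled on what the patterns look for, not copied from the live site, and `extractYellowPages` is a hypothetical helper. The website pattern returns the text before "(opens in a new window)" in the title attribute, i.e. the displayed address, not the href.

```javascript
// Made-up listing markup modelled on the attributes the patterns target.
const listingHtml = `
<a class="contact contact-email" href="mailto:info@example.com.au" data-email="info@example.com.au">Email</a>
<a class="contact contact-phone" href="tel:0262001234">Call</a>
<a class="contact contact-url" href="http://www.example.com.au" title="www.example.com.au (opens in a new window)">Website</a>
<p class="listing-address mappable-address">1 Example St, Phillip ACT 2606</p>`;

const YP_PATTERNS = {
  email: /data-email="([^"]*)/,
  phone: /tel:([^"]*)/,
  website: /title="([^\s]*)\s*\(opens in a new window\)/,
  address: /<p class="listing-address[^>]*>([^<]*)/,
};

// Apply each pattern once and keep capture group 1 (null when absent).
function extractYellowPages(html) {
  const out = {};
  for (const [field, re] of Object.entries(YP_PATTERNS)) {
    const m = re.exec(html);
    out[field] = m ? m[1] : null;
  }
  return out;
}

console.log(extractYellowPages(listingHtml));
```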
sysnucleus / tripadvisor
Created September 3, 2020 03:51
Code to extract reviewer-submitted images from TripAdvisor using WebHarvy
// RegEx to Follow links
href="([^"]*)
// More button click
document.getElementsByClassName('moreBtn')[0].click();
// Get images block