sysnucleus

## Sample HTML with Email
<div class="call-to-action ">
<a title="Email" class="contact contact-main contact-email "
href="mailto:info@canberraeyelaser.com.au?subject=Enquiry%2C%20sent%20from%20yellowpages.com.au&amp;
body=%0A%0A%0A%0A%0A------------------------------------------%0AEnquiry%20via%20yellowpages.com.au%0Ahttp%3A%2F%2Fyellowpages.com.au%2Fact%2Fphillip%2Fcanberra-eye-laser-15333167-listing.html%3Fcontext%3DbusinessTypeSearch"
rel="nofollow" data-email="info@canberraeyelaser.com.au">
<span class="glyph icon-email border border-dark-blue with-text"></span><span class="contact-text">Email</span>
</a>
</div>

## Sample Text 1.txt
The Man in the High Castle  Kindle Edition

by

Philip K. Dick (Author)

## Commonly used Regular Expressions
(.*)
Selects only the first line from a block of text or HTML.

[\s]*(.*)
Selects the first line, ignoring leading whitespace (spaces, line feeds and carriage returns).
[\s]* matches all whitespace up to the first visible character.

href="([^"]*)
Gets the href link/URL from HTML. [^"]* matches up to the next " character.
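
The patterns above can be tried outside WebHarvy with a few lines of JavaScript. This is only a minimal sketch: `firstCapture` is a hypothetical helper, the leading blank lines and the `example.com` link are invented for illustration, and WebHarvy's own regex engine may differ in details from JavaScript's.

```javascript
// Hypothetical helper: return capture group 1 of the first match, or null.
function firstCapture(pattern, text) {
  const match = pattern.exec(text);
  return match ? match[1] : null;
}

// Sample Text 1.txt, with leading blank lines and spaces added for illustration.
const sampleText =
  "\n\n  The Man in the High Castle  Kindle Edition\n\nby\n\nPhilip K. Dick (Author)";

// (.*) stops at the first line break, so here it captures an empty first line.
const rawFirstLine = firstCapture(/(.*)/, sampleText);

// [\s]*(.*) skips the leading whitespace first, then captures up to the line break.
const trimmedFirstLine = firstCapture(/[\s]*(.*)/, sampleText);

// href="([^"]*) captures everything between href=" and the closing quote.
const link = firstCapture(
  /href="([^"]*)/,
  '<a title="Website" href="http://example.com/page">Site</a>'
);

console.log(JSON.stringify(rawFirstLine), trimmedFirstLine, link);
```

The empty result for `(.*)` shows why the `[\s]*` prefix is useful when the captured block starts with blank lines.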

## Amazon WebHarvy RegEx
To collect 'Title' / 'Seller' / 'In Stock'

1. Click on the required text
2. Select More Options > Capture More Content 10 times (the objective is to capture the complete HTML source of the page)
3. Select More Options > Capture HTML
4. Select More Options > Apply Regular Expression. Paste any one of the following RegEx strings and apply it.

id="productTitle">[\s]*([^<]*)

by[\s\S]*?id="brand"[^>]*>[\s]*([^<]*)
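
To see what the two patterns above capture, they can be run against a small fragment. The markup below is invented for illustration; only the `productTitle` and `brand` ids come from the patterns, and the real page structure may differ. `captureTrimmed` is a hypothetical helper.

```javascript
// Invented fragment: only the ids "productTitle" and "brand" come from the patterns.
const productHtml =
  '<span class="title" id="productTitle">\n' +
  '    The Man in the High Castle\n' +
  '</span>\n' +
  '<div>by <a id="brand" class="link" href="/seller">\n' +
  '    Example Seller\n' +
  '</a></div>';

// Hypothetical helper: capture group 1 with surrounding whitespace removed.
// [^<]* also matches line breaks, so the raw capture ends with a newline here.
function captureTrimmed(pattern, html) {
  const match = pattern.exec(html);
  return match ? match[1].trim() : null;
}

// Title: skip whitespace after the opening tag, capture up to the next "<".
const title = captureTrimmed(/id="productTitle">[\s]*([^<]*)/, productHtml);

// Seller: from "by", lazily skip to id="brand", skip the rest of that tag,
// then capture the link text.
const seller = captureTrimmed(/by[\s\S]*?id="brand"[^>]*>[\s]*([^<]*)/, productHtml);

console.log(title, "/", seller);
```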

## gist:f91f8b8b84d918918b6857b3a7acff3a
src="([^_]*)_[^\.]*\.([^"]*)

<div id="feature-bullets"[^>]*>([\s\S]*?)</div>
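
The first pattern above has two capture groups, and the second stops at the first closing `</div>`. The sketch below shows both on invented markup (the image URL and list items are assumptions, not taken from a real page). Joining the two groups of the `src` pattern to drop the part between them is one plausible use, not something the gist states.

```javascript
// Invented markup; only the attribute names come from the patterns above.
const imageTag = '<img src="https://images.example.com/I/abc123._SX38_.jpg" alt="">';
const bulletsHtml =
  '<div id="feature-bullets" class="section">' +
  '<ul><li>First point</li><li>Second point</li></ul>' +
  '</div><div id="other">ignored</div>';

const srcMatch = /src="([^_]*)_[^\.]*\.([^"]*)/.exec(imageTag);
// Group 1 is the URL up to the first underscore; group 2 is what follows the next dot.
const urlPrefix = srcMatch[1];
const extension = srcMatch[2];
// Assumption: concatenating the groups removes the suffix that sat between them.
const rebuiltUrl = urlPrefix + extension;

// In a JavaScript regex literal the "/" in </div> must be escaped.
const bulletsMatch = /<div id="feature-bullets"[^>]*>([\s\S]*?)<\/div>/.exec(bulletsHtml);
const bulletsInner = bulletsMatch[1];

console.log(rebuiltUrl);
console.log(bulletsInner);
```

Because `[\s\S]*?` is lazy, the capture ends at the first `</div>` it meets, so a `<div>` nested inside the block would cut the result short.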

## yellow pages egypt.js
[\s]*(.*)

tel: ([^"]*)

title="Website"[\s\S]*?href="([^"]*)

class="col-md-9 company_address"[^>]*>([^<]*)

## ebay
id="email"[\s\S]*?cell_value">([^<]*)

id="phone_number"[\s\S]*?cell_value">([^<]*)
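
Both patterns above rely on the lazy `[\s\S]*?` to jump from a labelled element to the nearest following `cell_value`. A minimal sketch on an invented table fragment (the row layout, address and phone number are assumptions) shows the effect, with a greedy variant for contrast.

```javascript
// Invented rows; only id="email", id="phone_number" and cell_value come from the patterns.
const detailsHtml =
  '<tr><td id="email">Email</td><td class="cell_value">seller@example.com</td></tr>\n' +
  '<tr><td id="phone_number">Phone</td><td class="cell_value">555-0100</td></tr>';

// Hypothetical helper: capture group 1 of the first match, or null.
function cellAfter(pattern, html) {
  const match = pattern.exec(html);
  return match ? match[1] : null;
}

// Lazy skip: stops at the first cell_value after the label.
const email = cellAfter(/id="email"[\s\S]*?cell_value">([^<]*)/, detailsHtml);
const phone = cellAfter(/id="phone_number"[\s\S]*?cell_value">([^<]*)/, detailsHtml);

// Greedy skip (for contrast only): runs on to the last cell_value in the text.
const greedyEmail = cellAfter(/id="email"[\s\S]*cell_value">([^<]*)/, detailsHtml);

console.log(email, phone, greedyEmail);
```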

## gist:84a0574cbf908813787d2d95b8a6c2ed
groupEl = document.getElementsByTagName('article')[0].parentElement.parentElement.parentElement.parentElement;
groupEl.children[groupEl.childElementCount-1].scrollIntoView();

## gist:436a2b0be80882f0ae61a391931abf5d
data-email="([^"]*)

tel:([^"]*)

title="([^\s]*)\s*\(opens in a new window\)

<p class="listing-address[^>]*>([^<]*)
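
As a rough check, these four patterns can be run against the anchor from "Sample HTML with Email" above. In this sketch the `mailto:` href is shortened, and the phone link, website title and address paragraph are invented for illustration; `firstGroup` is a hypothetical helper.

```javascript
const listingHtml =
  // From the sample HTML above, with the mailto: href shortened.
  '<a title="Email" class="contact contact-main contact-email " ' +
  'href="mailto:info@canberraeyelaser.com.au?subject=Enquiry" ' +
  'rel="nofollow" data-email="info@canberraeyelaser.com.au">Email</a>' +
  // Everything below is invented for illustration.
  '<a href="tel:0255550100">Call</a>' +
  '<a title="www.example.com.au (opens in a new window)" href="#">Website</a>' +
  '<p class="listing-address mappable-address">1 Example St, Phillip ACT</p>';

// Hypothetical helper: capture group 1 of the first match, or null.
function firstGroup(pattern, html) {
  const match = pattern.exec(html);
  return match ? match[1] : null;
}

const emailAddress = firstGroup(/data-email="([^"]*)/, listingHtml);
const phoneNumber = firstGroup(/tel:([^"]*)/, listingHtml);
// title="Email" is skipped because it is not followed by "(opens in a new window)".
const website = firstGroup(/title="([^\s]*)\s*\(opens in a new window\)/, listingHtml);
const address = firstGroup(/<p class="listing-address[^>]*>([^<]*)/, listingHtml);

console.log(emailAddress, phoneNumber, website, address);
```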

## tripadvisor

// RegEx to Follow links

href="([^"]*)

// More button click

document.getElementsByClassName('moreBtn')[0].click();

// Get images block