@Netcelal
Netcelal / mini-scrape.php
Created May 26, 2012 15:04 — forked from ChrisMcKee/mini-scrape.php
Small part of a PHP page-scraping application
<?php
function get_web_page($url, $curl_data)
{
    $options = array(
        CURLOPT_RETURNTRANSFER => true,     // return the page body as a string
        CURLOPT_HEADER         => false,    // don't include response headers
        CURLOPT_FOLLOWLOCATION => true,     // follow redirects
        CURLOPT_ENCODING       => "",       // accept all encodings cURL supports
        CURLOPT_USERAGENT      => "spider", // User-Agent string to send
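The preview above stops partway through the options array, so the rest of `get_web_page()` is not shown. Below is a minimal sketch of how a cURL fetch helper of this shape is commonly completed; it is not the author's code. The name `fetch_page()`, the extra redirect and timeout limits, and the assumption that the second argument is a POST body are all illustrative choices.

```php
<?php
// Hypothetical fetch_page(): a sketch of a complete cURL fetch helper.
// Assumption: a non-null $post_data is sent as the POST body.
function fetch_page(string $url, $post_data = null): array
{
    $options = array(
        CURLOPT_RETURNTRANSFER => true,     // return the body instead of printing it
        CURLOPT_HEADER         => false,    // keep response headers out of the body
        CURLOPT_FOLLOWLOCATION => true,     // follow redirects...
        CURLOPT_MAXREDIRS      => 10,       // ...but at most 10 of them
        CURLOPT_ENCODING       => "",       // accept any encoding cURL supports
        CURLOPT_USERAGENT      => "spider", // User-Agent header to send
        CURLOPT_CONNECTTIMEOUT => 30,       // seconds allowed for connecting
        CURLOPT_TIMEOUT        => 60,       // seconds allowed for the whole transfer
    );
    if ($post_data !== null) {
        $options[CURLOPT_POST]       = true;
        $options[CURLOPT_POSTFIELDS] = $post_data;
    }

    $ch = curl_init($url);
    curl_setopt_array($ch, $options);
    $content = curl_exec($ch);              // string on success, false on failure

    // Merge transfer info (http_code, url, ...) with error details and the body.
    $result            = curl_getinfo($ch);
    $result['errno']   = curl_errno($ch);
    $result['errmsg']  = curl_error($ch);
    $result['content'] = $content;
    return $result;                         // $ch is released when it goes out of scope
}
```

For example, `fetch_page('https://example.com/search', array('q' => 'php'))` would POST the field and return the body in `['content']` next to cURL's `http_code`, `errno` and `errmsg`. Note that an array passed to `CURLOPT_POSTFIELDS` is sent as `multipart/form-data`; pass `http_build_query($fields)` instead if the server expects `application/x-www-form-urlencoded`.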
Netcelal / DomHelper
Created May 26, 2012 15:03
PHP class for parsing HTML and getting links
<?php
class helper {
    //---------------------------------------------------------------------------
    public function __construct() {
    }
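The preview shows only the empty constructor of `helper`, so the link-parsing part is not visible. As an illustration only, here is a minimal sketch of a link-extracting class built on PHP's bundled DOM extension (`DOMDocument`) rather than on the author's code; the class name `LinkExtractor` and the method `getLinks()` are hypothetical.

```php
<?php
// Hypothetical LinkExtractor: collects href values from <a> tags.
class LinkExtractor
{
    /** @return string[] non-empty href values, in document order */
    public function getLinks(string $html): array
    {
        if (trim($html) === '') {
            return array();                    // loadHTML() rejects empty input
        }
        $doc = new DOMDocument();
        // Real-world HTML is often malformed; keep libxml warnings out of the output.
        $previous = libxml_use_internal_errors(true);
        $doc->loadHTML($html);
        libxml_clear_errors();
        libxml_use_internal_errors($previous);

        $links = array();
        foreach ($doc->getElementsByTagName('a') as $anchor) {
            $href = trim($anchor->getAttribute('href'));
            if ($href !== '') {                // skip <a> tags without a usable href
                $links[] = $href;
            }
        }
        return $links;
    }
}
```

Given HTML fetched elsewhere, `(new LinkExtractor())->getLinks($html)` returns the raw `href` strings; relative links are left unresolved.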
Netcelal / dom_scraper.php
Created May 26, 2012 15:03 — forked from gati/dom_scraper.php
Quick scrape using simple_html_dom
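<?php
include('simple_html_dom.php'); // DOM parsing library.
// Target site from the query string; only plain http(s) URLs are accepted.
$url = isset($_GET['site']) ? $_GET['site'] : 'http://www.yelp.com';
if (!in_array(parse_url($url, PHP_URL_SCHEME), array('http', 'https'), true)) {
    exit('Invalid URL');
}
$dom = file_get_html($url);
if ($dom === false) {
    exit('Could not fetch ' . htmlspecialchars($url));
}
foreach ($dom->find('a') as $node) {
    // Route each link through the proxy; only relative links get the base URL prefixed.
    $target = parse_url($node->href, PHP_URL_SCHEME) ? $node->href : $url . $node->href;
    $node->href = 'http://YOURPROXYSERVER.COM?requestedurl=' . urlencode($target);
}
// Output modified DOM
echo $dom->outertext;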
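Prefixing `$url` to an `href` works for root-relative links such as `/biz/...` when `$url` is a bare host, but not for document-relative links such as `page.html` when `$url` includes a path. The sketch below assumes you want an absolute URL before handing it to the proxy, and shows a simplified resolver for absolute, protocol-relative, root-relative and document-relative links. It is not a full RFC 3986 implementation: it does not normalise `./` or `../` segments and does not special-case `?query`-only or `#fragment`-only links. `resolve_href()` and `proxy_link()` are hypothetical helpers.

```php
<?php
// Hypothetical resolve_href(): simplified relative-URL resolution (not full RFC 3986).
function resolve_href(string $base, string $href): string
{
    if (parse_url($href, PHP_URL_SCHEME) !== null) {
        return $href;                                   // already absolute
    }
    $b = parse_url($base);
    $scheme = $b['scheme'] ?? 'http';
    if (strpos($href, '//') === 0) {
        return $scheme . ':' . $href;                   // protocol-relative: reuse base scheme
    }
    $origin = $scheme . '://' . ($b['host'] ?? '')
        . (isset($b['port']) ? ':' . $b['port'] : '');
    if ($href !== '' && $href[0] === '/') {
        return $origin . $href;                         // root-relative: keep only the origin
    }
    // Document-relative: keep the base path up to and including its last slash.
    $path = $b['path'] ?? '/';
    $dir  = substr($path, 0, strrpos($path, '/') + 1);
    return $origin . $dir . $href;
}

// Hypothetical proxy_link(): the gist's proxy URL format, built from a resolved link.
function proxy_link(string $proxy, string $base, string $href): string
{
    return $proxy . '?requestedurl=' . urlencode(resolve_href($base, $href));
}
```

Inside the scraper's loop this would become `$node->href = proxy_link('http://YOURPROXYSERVER.COM', $url, $node->href);`.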
Netcelal / FindLinks.php
Created May 26, 2012 14:55 — forked from zachbrowne/FindLinks.php
Find all links on any website using PHP
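No code from FindLinks.php is shown here. As an independent sketch (not the forked gist's code), the function below uses `DOMXPath` to collect unique `href` values and splits them into links on the site's own host and links to other hosts. The name `find_links()` and the internal/external split are illustrative assumptions; fetching the HTML, for example with `file_get_contents()`, is left to the caller.

```php
<?php
// Hypothetical find_links(): unique http(s) and relative links, split by host.
function find_links(string $html, string $site_host): array
{
    $result = array('internal' => array(), 'external' => array());
    if (trim($html) === '') {
        return $result;                               // loadHTML() rejects empty input
    }
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);     // tolerate malformed markup
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//a[@href]') as $anchor) {
        $href   = trim($anchor->getAttribute('href'));
        $scheme = parse_url($href, PHP_URL_SCHEME);
        $host   = parse_url($href, PHP_URL_HOST);
        // Skip empty and malformed hrefs.
        if ($href === '' || $scheme === false || $host === false) {
            continue;
        }
        // Skip non-web links such as mailto: or javascript:.
        if ($scheme !== null && !in_array(strtolower($scheme), array('http', 'https'), true)) {
            continue;
        }
        // Relative links (no host) point back at the site itself.
        $bucket = ($host === null || strcasecmp($host, $site_host) === 0) ? 'internal' : 'external';
        if (!in_array($href, $result[$bucket], true)) {
            $result[$bucket][] = $href;               // keep first occurrence only
        }
    }
    return $result;
}
```

Matching on the exact host means `example.com` and `www.example.com` are treated as different sites; compare registrable domains instead if that distinction does not matter for your crawl.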