qtproduction / scrape.py
Created September 26, 2012 12:41
Retrieve a website from Google Cache without getting your IP blocked
#!/usr/bin/env python2
# Python 2 script (uses urllib2).
# Retrieve an old website from Google Cache. Requests are spaced out with sleep
# intervals to avoid the 504 errors Google returns when it blocks an IP that
# sends too many requests.
# Programmer: Kien Nguyen - QTPros http://qtpros.info/kiennguyen
# Change search_site and search_term to match your requirements.
# Original: http://www.guyrutenberg.com/2008/10/02/retrieving-googles-cache-for-a-whole-website/
import urllib
import urllib2
import re
import socket
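The header comment says the script avoids Google's 504 blocking by sleeping between requests; the rest of scrape.py is not shown here. The following is a minimal Python 3 sketch of that throttling idea, not the author's code. The names `default_fetch`, `fetch_with_backoff` and `fetch_all`, the pause range, the exponential-backoff-with-jitter policy and the choice to also retry on 503 are all assumptions for illustration.

```python
import random
import time
import urllib.error
import urllib.request


def default_fetch(url, timeout=30):
    """Download url and return the raw response body (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def fetch_with_backoff(url, fetch=default_fetch, sleep=time.sleep,
                       base_delay=10.0, max_retries=5, retry_statuses=(503, 504)):
    """Fetch url, retrying with exponential backoff when the server signals a block.

    504 is the status named in the gist's header comment; 503 is an assumption.
    """
    for attempt in range(max_retries + 1):
        try:
            return fetch(url)
        except urllib.error.HTTPError as err:
            if err.code not in retry_statuses or attempt == max_retries:
                raise  # not a blocking error, or out of retries
            # Wait base_delay * 2**attempt seconds plus random jitter, then retry.
            sleep(base_delay * 2 ** attempt + random.uniform(0, base_delay))


def fetch_all(urls, fetch=default_fetch, sleep=time.sleep,
              min_pause=5.0, max_pause=15.0):
    """Fetch each URL in turn, pausing a random interval between requests."""
    pages = {}
    for i, url in enumerate(urls):
        if i:  # no pause before the first request
            sleep(random.uniform(min_pause, max_pause))
        pages[url] = fetch_with_backoff(url, fetch=fetch, sleep=sleep)
    return pages
```

Passing `fetch` and `sleep` as parameters keeps the throttling logic testable without network access; in real use the defaults call `urllib.request.urlopen` and `time.sleep`.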
<div itemscope itemtype="http://schema.org/LocalBusiness">
<meta itemprop="name" content="Immigration Attorney Office NYC - Immigration Lawyer New York">
<!-- start of review #1-->
<div itemprop="review" itemscope itemtype="http://schema.org/Review">
<span itemprop="reviewRating" itemscope itemtype="http://schema.org/Rating">
<!-- Value of rating -->
<span itemprop="ratingValue">5</span>
</span>
<span itemprop="author" itemscope itemtype="http://schema.org/Person">
<!-- Review's author name -->
<div class="all_review" itemscope="" itemtype="http://schema.org/LocalBusiness">
<div class="residence_title title_our_client" style="margin-bottom:20px;"><h2>reviews</h2></div>
<!-- AggregateRating -->
<div class="top_review" itemprop="aggregateRating" itemscope="" itemtype="http://schema.org/AggregateRating">
<meta id="top_value" itemprop="ratingValue" content="5">
<meta id="top_img" itemprop="bestRating" content="5">
<meta id="top_count" itemprop="ratingCount" content="6">
</div>
<!-- End AggregateRating -->
<meta itemprop="name" content="Immigration Attorney Office NYC - Immigration Lawyer New York">
<div class="all_review" itemscope="" itemtype="http://schema.org/LocalBusiness">
<div class="residence_title title_our_client" style="margin-bottom:20px;"><h2>reviews</h2></div>
<!-- AggregateRating -->
<div id="top_review" itemprop="aggregateRating" itemscope="" itemtype="http://schema.org/AggregateRating">
<ul>
<li>
<span id="top_value" itemprop="ratingValue">5</span>
</li>
<li>
<span id="top_img" itemprop="bestRating">5</span><img src="http://www.immigrationlawyernewyork.com/images/imm/five_star.png">
<div class="view challenge-detail-index" itemscope="" itemtype="http://schema.org/CreativeWork">
<span class="challenge-category-title tracking">Well-Being</span>
<div class="challenge-detail-header">
<a class="back-button" href="/challenges/17811-Well-Being">All Well-Being Programs</a>
<div class="detail-wrapper">
<div class="detail-image">
<img src="https://cms.lifereimagined.org/sites/default/files/styles/challenge-large/public/refire-dont-retire.png" itemprop="image">
</div>
<div class="detail-description">
<div class="author-info">
<div class="container testimonials hp-b clearfix" itemscope="" itemtype="http://schema.org/Organization">
<meta itemprop="name" content="Life Reimagined">
<!-- AggregateRating -->
<span itemprop="aggregateRating" itemscope="" itemtype="http://schema.org/AggregateRating">
<meta id="top_value" itemprop="ratingValue" content="5">
<meta id="top_img" itemprop="bestRating" content="5">
<meta id="top_count" itemprop="ratingCount" content="3">
</span>
<!-- End AggregateRating -->
<!-- Review -->