Simon Pieters zcorpan

@zcorpan
zcorpan / pages_fieldset_by_rank.csv
Created Jul 28, 2018
Pages in httparchive with fieldset ordered by rank
View pages_fieldset_by_rank.csv
rank,p_url
13,http://www.tmall.com/
35,http://www.yahoo.co.jp/
36,http://www.pornhub.com/
89,http://www.so.com/
105,http://www.mozilla.org/
114,http://www.naver.com/
164,http://www.dailymail.co.uk/
180,http://www.uol.com.br/
View results-20170911-112126.csv
pageDomain pageHttpCharset pageMetaCharset pageGuessedCharset element elementSrcDomain elementCharset
www.gs4u.net iso-8859-1 utf-8 iso-8859-1 script vk.com windows-1251
www.5tu.cn gbk gbk gbk script s111.cnzz.com gb2312
www.talewiki.com euc-jp euc-jp link talewiki.com shift_jis
www.doramy.net iso-8859-1 utf-8 iso-8859-1 script vk.com windows-1251
www.great-tv.ru iso-8859-1 utf-8 iso-8859-1 script code.directadvert.ru windows-1251
www.mopedmarket.ru iso-8859-1 windows-1251 iso-8859-1 script vkontakte.ru windows-1251
www.varsity.co.uk iso-8859-1 utf-8 iso-8859-1 script cdn.datatables.net utf8
www.wowchallenges.com iso-8859-1 utf-8 iso-8859-1 script cdn.datatables.net utf8
www.tall-f.com iso-8859-1 utf-8 iso-8859-1 script rranking12.ziyu.net shift_jis
View link_script_domains_charset.csv
View option-label-20170831-112136.csv
page label content
http://www.big-m-one.com/ すべての商品 全ての商品
http://www.moro-ichikara.com/ すべての商品 全ての商品
http://www.csc.edu/ getting started ... getting started …
http://www.geno-web.jp/ すべてのカテゴリー カテゴリーを選択
http://www.amicashop.com/ すべての商品 カテゴリを指定しない
http://www.list.co.uk/ any time
http://www.kingdom.vc/ すべての商品 全ての商品
http://www.kokkaen-ec.jp/ すべての商品 全ての商品
http://www.pantiescollection.net/ すべての商品 全ての商品
View results-20170620-121659.csv
page,url,match
http://www.bezeqint.net/,https://www.bezeqint.net/bbl-log-1.0-SNAPSHOT/faq,"rtl"
http://www.acoforum.org/,http://www.acoforum.org/uc/intro.html,l2r
http://www.marieclaire.co.uk/,http://timeinc-under-20.bannerflow.com/resources/custom-resource-410bec79-ded3-44d7-841a-00ef6e225de6.html?cb=636293162136339001,==
http://www.decanter.com/,http://timeinc-under-20.bannerflow.com/resources/custom-resource-410bec79-ded3-44d7-841a-00ef6e225de6.html?cb=636293162136339001,==
http://www.gayroyal.com/,https://www.gayroyal.com/framed/menu.asp,up
http://www.ransnet.com/,https://inffuse-fbpopup.appspot.com/widget.html?cacheKiller=1495341596035&compId=i1daflzh&deviceType=desktop&height=40&instance=kX3fEmUDstGIz_ozKLWkYvKoOMAihvN3hA1aVNgUsSc.eyJpbnN0YW5jZUlkIjoiMTM5ZGUyZTItNjBmZi00YjdkLWY2MWItNWNlYjYxYzkyYTM2Iiwic2lnbkRhdGUiOiIyMDE3LTA1LTIxVDA0OjM5OjUwLjMzMVoiLCJ1aWQiOm51bGwsImlwQW5kUG9ydCI6IjE0OS4yMC42My4xMy81Nzk3NiIsInZlbmRvclByb2R1Y3RJZCI6bnVsbCwiZGVtb01vZGUiOmZhbHNlLCJhaWQiOiJkNTlmNjA4Ni1mYzdhLTQ4YzY
View results-20170530-111817.csv
page,url,match
http://www.marinador.com/,http://www.marinador.com/sites/default/files/css/css_08fO24WLFt4GeGtewYNV9sV0JSjtWFWA9MXjpa7ihOg.css,hr:after{float:right
http://www.sojospaclub.com/,http://www.sojospaclub.com/bundles/Content/css/customstyle?v=eB_H9Y5lV3wCvjiftWC0FHub9LVzGb4a6fLQUansD581,"hr:after {
float: right"
http://www.digitalremedy.com/,http://21477-presscdn.pagely.netdna-cdn.com/wp-content/themes/digitalremedy/css/style.css,"hr:after,hr:before{content:""\2022"";display:inline-block;float:left"
http://www.pmcm.ir/,http://up.3nafari.ir/view/2005556/style-end.css,"hr:before{
float:left"
http://www.ostaniha.com/,http://up.3nafari.ir/view/2005556/style-end.css,"hr:before{
float:left"
http://www.digitalprivacyalert.org/,http://www.digitalprivacyalert.org/css/style.css,"hr:after {
View results-20170511-171859.csv
elm attr
img border
img border
img border
img border
img height
img border
img border
img border
img height
View results-20170322-143829.csv
page,url,match
http://www.bankemardom.ir/,http://bankemardom.ir/,"<img src=""c:\users\jandaghi\desktop\mardombank\بانک ملت.jpg"" />"
http://www.maksimov.su/,http://www.maksimov.su/in.php?dvar=http://www.maksimov.su/titul/&var=titul.htm&tnum=1,"<img height=1 alt="""" src=""c:\ru\ru.files\top100(1).gif"" width=1 border=0>"
http://www.squarerootcalculator.co/,http://www.squarerootcalculator.co/,"<img src=""c:\users\c\pictures\2013_07_09\square-root-ti-84-plus.jpg"" alt=""test"" style=""float:right;margin:0 5px 50px 50px;"" >"
http://www.k1news.ru/,http://k1news.ru/,"<img src=""c:/users/system f/desktop/ava_banner.jpg"" width=""240"" height=""400"" />"
http://www.aceweb.com/,http://www.aceweb.com/,"<script src=""c:\websites\aceweb/wp-content/themes/satria/js/html5.js"">"
http://www.iisjed.com/,http://www.iisjed.com/,"<script
src=""c:\documents and settings\rumi.iisjed-b25af583\desktop\message from chairman global schools foundation global indian international school (giis)_files\scriptresource(1).axd""
View results-20170320-171444.csv
page url match
http://www.fanporfan.es/ http://www.fanporfan.es/ BLANK
http://www.brightonbeautysupply.com/ http://www.brightonbeautysupply.com/ TOP
http://www.over-blog.it/ https://it.over-blog.com/ BLANK
http://www.twisto.fr/ http://dev.actigraph.fr/actipages/twisto/pivk/media_Twisto2014/relais.html.php Blank
http://www.k-popped.com/ http://k-popped.com/ BLANK
http://www.initpro.ru/ http://initpro.ru/ BLANK
http://www.wizebot.tv/ https://wizebot.tv/ BLANK
http://www.siyassa.org.eg/ http://www.ahram.org.eg/newadv/a.aspx?ZoneID=308&Task=Get&PageID=16219&SiteID=1 Blank
http://www.dicolatin.com/ http://www.dicolatin.com/ TOP
@zcorpan
zcorpan / results-20170223-081749.csv
Created Feb 23, 2017
SELECT page, url, REGEXP_EXTRACT(body, r'(.{20}\bclientInformation\b.{20})') AS match
View results-20170223-081749.csv
page,url,match
http://www.elm.sa/,http://www.elm.sa/_layouts/search.js?rev=mNvuYQIlFFUBb3Q8Ktm7hw%3D%3D,lse;if(null!=window.clientInformation)d=window.clientInfo
http://www.alonely.com.cn/,http://images.sohu.com/cs/jsfile/js/c.js,"n""liebao""}if(window.clientInformation&&window.clientInfor"
http://www.mkelectric.com/,http://www.mkelectric.com/_layouts/search.js?rev=BjP0%2BmPXUFhF7kDZmHIaVg%3D%3D,lse;if(null!=window.clientInformation)d=window.clientInfo
http://www.jimmychoo.com/,http://d16fk4ms6rqz1v.cloudfront.net/capture/jimmychoo.js,"ew r(this.api),this.clientInformation=this.getClientInfor"
http://www.86y.org/,http://images.sohu.com/cs/jsfile/js/f.js,"n""liebao""}if(window.clientInformation&&window.clientInfor"
http://www.zuilxy.com/,http://images.sohu.com/cs/jsfile/js/c.js,"n""liebao""}if(window.clientInformation&&window.clientInfor"
http://www.kilimall.co.ke/,http://script.kilimall.co.ke/js/kui/babel.min.js,":!1,clearTimeout:!1,clientInformation:!1,ClientRect:!1,Cl"
http://www.zwijsen.nl/,http://wpg.blue
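
The `match` column above comes from the `REGEXP_EXTRACT` call in the gist description: the pattern captures the whole word `clientInformation` together with 20 characters of context on each side. Below is a minimal Python sketch of the same extraction, for checking the pattern locally. It is an approximation, not the original query: the function name `extract_context` and the sample string are hypothetical, and Python's `re` module stands in for the RE2 engine that BigQuery uses (the two agree on this particular pattern as far as I can tell). The rest of the query is not shown in the description, so it is not reproduced here.

```python
import re
from typing import Optional

# Same pattern as in the gist description: 20 characters of context on each
# side of the whole word `clientInformation`. `.` does not match a newline,
# so all 57 characters must sit on one line of the response body.
CONTEXT_PATTERN = re.compile(r"(.{20}\bclientInformation\b.{20})")


def extract_context(body: str) -> Optional[str]:
    """Return the first capture-group match, or None when nothing matches.

    Hypothetical helper mirroring REGEXP_EXTRACT, which yields NULL on no match.
    """
    found = CONTEXT_PATTERN.search(body)
    return found.group(1) if found else None


# Illustrative input (an assumption, not real response data), constructed so
# that it contains the snippet seen in the first result row.
sample = "x=false;if(null!=window.clientInformation)d=window.clientInformation.platform;"
print(extract_context(sample))
```

Because the pattern requires exactly 20 characters on both sides, an occurrence closer than 20 characters to the start or end of a line is not reported, and the `\b` anchors skip identifiers that merely contain `clientInformation` as a substring.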