Simon Pieters zcorpan

@zcorpan
zcorpan / pages_fieldset_by_rank.csv
Created Jul 28, 2018
Pages in httparchive with fieldset ordered by rank
View pages_fieldset_by_rank.csv
rank,p_url
13,http://www.tmall.com/
35,http://www.yahoo.co.jp/
36,http://www.pornhub.com/
89,http://www.so.com/
105,http://www.mozilla.org/
114,http://www.naver.com/
164,http://www.dailymail.co.uk/
180,http://www.uol.com.br/
View results-20170911-112126.csv
pageDomain pageHttpCharset pageMetaCharset pageGuessedCharset element elementSrcDomain elementCharset
www.gs4u.net iso-8859-1 utf-8 iso-8859-1 script vk.com windows-1251
www.5tu.cn gbk gbk gbk script s111.cnzz.com gb2312
www.talewiki.com euc-jp euc-jp link talewiki.com shift_jis
www.doramy.net iso-8859-1 utf-8 iso-8859-1 script vk.com windows-1251
www.great-tv.ru iso-8859-1 utf-8 iso-8859-1 script code.directadvert.ru windows-1251
www.mopedmarket.ru iso-8859-1 windows-1251 iso-8859-1 script vkontakte.ru windows-1251
www.varsity.co.uk iso-8859-1 utf-8 iso-8859-1 script cdn.datatables.net utf8
www.wowchallenges.com iso-8859-1 utf-8 iso-8859-1 script cdn.datatables.net utf8
www.tall-f.com iso-8859-1 utf-8 iso-8859-1 script rranking12.ziyu.net shift_jis
View link_script_domains_charset.csv
View option-label-20170831-112136.csv
page label content
http://www.big-m-one.com/ すべての商品 全ての商品
http://www.moro-ichikara.com/ すべての商品 全ての商品
http://www.csc.edu/ getting started ... getting started …
http://www.geno-web.jp/ すべてのカテゴリー カテゴリーを選択
http://www.amicashop.com/ すべての商品 カテゴリを指定しない
http://www.list.co.uk/ any time
http://www.kingdom.vc/ すべての商品 全ての商品
http://www.kokkaen-ec.jp/ すべての商品 全ての商品
http://www.pantiescollection.net/ すべての商品 全ての商品
View results-20170620-121659.csv
page,url,match
http://www.bezeqint.net/,https://www.bezeqint.net/bbl-log-1.0-SNAPSHOT/faq,"rtl"
http://www.acoforum.org/,http://www.acoforum.org/uc/intro.html,l2r
http://www.marieclaire.co.uk/,http://timeinc-under-20.bannerflow.com/resources/custom-resource-410bec79-ded3-44d7-841a-00ef6e225de6.html?cb=636293162136339001,==
http://www.decanter.com/,http://timeinc-under-20.bannerflow.com/resources/custom-resource-410bec79-ded3-44d7-841a-00ef6e225de6.html?cb=636293162136339001,==
http://www.gayroyal.com/,https://www.gayroyal.com/framed/menu.asp,up
http://www.ransnet.com/,https://inffuse-fbpopup.appspot.com/widget.html?cacheKiller=1495341596035&compId=i1daflzh&deviceType=desktop&height=40&instance=kX3fEmUDstGIz_ozKLWkYvKoOMAihvN3hA1aVNgUsSc.eyJpbnN0YW5jZUlkIjoiMTM5ZGUyZTItNjBmZi00YjdkLWY2MWItNWNlYjYxYzkyYTM2Iiwic2lnbkRhdGUiOiIyMDE3LTA1LTIxVDA0OjM5OjUwLjMzMVoiLCJ1aWQiOm51bGwsImlwQW5kUG9ydCI6IjE0OS4yMC42My4xMy81Nzk3NiIsInZlbmRvclByb2R1Y3RJZCI6bnVsbCwiZGVtb01vZGUiOmZhbHNlLCJhaWQiOiJkNTlmNjA4Ni1mYzdhLTQ4YzY
View results-20170530-111817.csv
page,url,match
http://www.marinador.com/,http://www.marinador.com/sites/default/files/css/css_08fO24WLFt4GeGtewYNV9sV0JSjtWFWA9MXjpa7ihOg.css,hr:after{float:right
http://www.sojospaclub.com/,http://www.sojospaclub.com/bundles/Content/css/customstyle?v=eB_H9Y5lV3wCvjiftWC0FHub9LVzGb4a6fLQUansD581,"hr:after {
float: right"
http://www.digitalremedy.com/,http://21477-presscdn.pagely.netdna-cdn.com/wp-content/themes/digitalremedy/css/style.css,"hr:after,hr:before{content:""\2022"";display:inline-block;float:left"
http://www.pmcm.ir/,http://up.3nafari.ir/view/2005556/style-end.css,"hr:before{
float:left"
http://www.ostaniha.com/,http://up.3nafari.ir/view/2005556/style-end.css,"hr:before{
float:left"
http://www.digitalprivacyalert.org/,http://www.digitalprivacyalert.org/css/style.css,"hr:after {
View results-20170511-171859.csv
elm attr
img border
img border
img border
img border
img height
img border
img border
img border
img height
View results-20170322-143829.csv
page,url,match
http://www.bankemardom.ir/,http://bankemardom.ir/,"<img src=""c:\users\jandaghi\desktop\mardombank\ø¨ø§ù†ú© ù…ù„øª.jpg"" />"
http://www.maksimov.su/,http://www.maksimov.su/in.php?dvar=http://www.maksimov.su/titul/&var=titul.htm&tnum=1,"<img height=1 alt="""" src=""c:\ru\ru.files\top100(1).gif"" width=1 border=0>"
http://www.squarerootcalculator.co/,http://www.squarerootcalculator.co/,"<img src=""c:\users\c\pictures\2013_07_09\square-root-ti-84-plus.jpg"" alt=""test"" style=""float:right;margin:0 5px 50px 50px;"" >"
http://www.k1news.ru/,http://k1news.ru/,"<img src=""c:/users/system f/desktop/ava_banner.jpg"" width=""240"" height=""400"" />"
http://www.aceweb.com/,http://www.aceweb.com/,"<script src=""c:\websites\aceweb/wp-content/themes/satria/js/html5.js"">"
http://www.iisjed.com/,http://www.iisjed.com/,"<script
src=""c:\documents and settings\rumi.iisjed-b25af583\desktop\message from chairman global schools foundation global indian international school (giis)_files\scriptresource(1).axd""
View results-20170320-171444.csv
page url match
http://www.fanporfan.es/ http://www.fanporfan.es/ BLANK
http://www.brightonbeautysupply.com/ http://www.brightonbeautysupply.com/ TOP
http://www.over-blog.it/ https://it.over-blog.com/ BLANK
http://www.twisto.fr/ http://dev.actigraph.fr/actipages/twisto/pivk/media_Twisto2014/relais.html.php Blank
http://www.k-popped.com/ http://k-popped.com/ BLANK
http://www.initpro.ru/ http://initpro.ru/ BLANK
http://www.wizebot.tv/ https://wizebot.tv/ BLANK
http://www.siyassa.org.eg/ http://www.ahram.org.eg/newadv/a.aspx?ZoneID=308&Task=Get&PageID=16219&SiteID=1 Blank
http://www.dicolatin.com/ http://www.dicolatin.com/ TOP
@zcorpan
zcorpan / results-20170223-081749.csv
Created Feb 23, 2017
SELECT page, url, REGEXP_EXTRACT(body, r'(.{20}\bclientInformation\b.{20})') AS match
View results-20170223-081749.csv
page,url,match
http://www.elm.sa/,http://www.elm.sa/_layouts/search.js?rev=mNvuYQIlFFUBb3Q8Ktm7hw%3D%3D,lse;if(null!=window.clientInformation)d=window.clientInfo
http://www.alonely.com.cn/,http://images.sohu.com/cs/jsfile/js/c.js,"n""liebao""}if(window.clientInformation&&window.clientInfor"
http://www.mkelectric.com/,http://www.mkelectric.com/_layouts/search.js?rev=BjP0%2BmPXUFhF7kDZmHIaVg%3D%3D,lse;if(null!=window.clientInformation)d=window.clientInfo
http://www.jimmychoo.com/,http://d16fk4ms6rqz1v.cloudfront.net/capture/jimmychoo.js,"ew r(this.api),this.clientInformation=this.getClientInfor"
http://www.86y.org/,http://images.sohu.com/cs/jsfile/js/f.js,"n""liebao""}if(window.clientInformation&&window.clientInfor"
http://www.zuilxy.com/,http://images.sohu.com/cs/jsfile/js/c.js,"n""liebao""}if(window.clientInformation&&window.clientInfor"
http://www.kilimall.co.ke/,http://script.kilimall.co.ke/js/kui/babel.min.js,":!1,clearTimeout:!1,clientInformation:!1,ClientRect:!1,Cl"
http://www.zwijsen.nl/,http://wpg.blue
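
The `match` column above comes from the `REGEXP_EXTRACT` pattern in the gist description, which captures 20 characters of context on each side of the whole word `clientInformation`. A minimal Python sketch of the same pattern follows, assuming Python's `re` module behaves like the query engine's regex dialect for this particular expression. The helper name `extract_match` and the sample body are hypothetical and are not taken from the crawl data.

```python
import re

# Same pattern as in the gist description: 20 characters of context on each
# side of the whole word "clientInformation" (\b marks the word boundaries).
PATTERN = re.compile(r"(.{20}\bclientInformation\b.{20})")


def extract_match(body):
    """Return the first context window around clientInformation, or None."""
    found = PATTERN.search(body)
    return found.group(1) if found else None


# Hypothetical response body, made up for illustration only.
sample_body = (
    "return false;if(null!=window.clientInformation)"
    "d=window.clientInformation.appName;"
)

print(extract_match(sample_body))
# Fewer than 20 characters on either side means the pattern cannot match.
print(extract_match("clientInformation"))
```

Because `.` does not match a newline by default, an occurrence with fewer than 20 same-line characters on either side produces no match in this sketch, so a count based on this pattern may understate real usage.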