Michael L. Nelson (phonedude)

% curl -s "http://web.archive.org/cdx/search/cdx?url=https://godisloving.wordpress.com&matchType=prefix" | awk '{print "https://web.archive.org/web/" $2 "/" $3};'
https://web.archive.org/web/20221028165233/https://godisloving.wordpress.com/
https://web.archive.org/web/20221028165334/http://godisloving.wordpress.com/
https://web.archive.org/web/20221028165350/https://godisloving.wordpress.com/
https://web.archive.org/web/20221028170958/https://godisloving.wordpress.com/
https://web.archive.org/web/20221028171006/http://godisloving.wordpress.com/
https://web.archive.org/web/20221028171020/http://godisloving.wordpress.com/
https://web.archive.org/web/20221028172158/https://godisloving.wordpress.com/
https://web.archive.org/web/20221028172941/https://godisloving.wordpress.com/
https://web.archive.org/web/20221028182338/https://godisloving.wordpress.com/
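The awk step depends on the default CDX column order (urlkey, timestamp, original, ...), so `$2` and `$3` are the timestamp and the URI-R. A minimal sketch of the same conversion, assuming the query instead asks the CDX server for just those two fields with `fl=timestamp,original`, so the column positions are explicit rather than implied:

```shell
#!/bin/sh
# Sketch: convert Wayback CDX lines into URI-Ms (memento URLs).
# Assumes each input line is "timestamp original", i.e. the query used
# fl=timestamp,original instead of the default seven-column output.
cdx_to_urims() {
  awk 'NF >= 2 { print "https://web.archive.org/web/" $1 "/" $2 }'
}

# Example use (needs network access):
# curl -s "http://web.archive.org/cdx/search/cdx?url=https://godisloving.wordpress.com&matchType=prefix&fl=timestamp,original" | cdx_to_urims
```

Blank or short lines are skipped by the `NF >= 2` guard rather than producing malformed URI-Ms.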
phonedude / wsdl-pubs-2021.md
Last active January 29, 2022 20:30
@WebSciDL's 2021 Publications

Web Science and Digital Libraries Research Group 2021 Publications

  1. M Aturban, ML Nelson, MC Weigle, Where Did the Web Archive Go?, International Conference on Theory and Practice of Digital Libraries, 73-84, 2021.

  2. S Rajtmajer, C Griffin, J Wu, R Fraleigh, L Balaji, A Squicciarini, ..., A Synthetic Prediction Market for Estimating Confidence in Published Work, arXiv preprint arXiv:2201.06924, 2021.

  3. TR Griffin, EJ Healy, IV Ramakrishnan, VG Ashok,
phonedude / rthk_enews-tweets.txt
Created August 4, 2021 14:16
rthk_enews tweets (deleted from the live web as of 2021-08-04)
https://web.archive.org/web//
https://web.archive.org/web/20101130004659/http://twitter.com/rthk_enews/statuses/8353089555800064
https://web.archive.org/web/20101130005004/http://twitter.com/rthk_enews/statuses/%7B%7D
https://web.archive.org/web/20101130005006/http://twitter.com/rthk_enews/statuses/%7Btime:'Sat%20Nov%2027%2002:55:06%20+0000%202010'%7D
https://web.archive.org/web/20101130005007/http://twitter.com/rthk_enews/statuses/twitter.com
https://web.archive.org/web/20101203215821/http://twitter.com/rthk_enews/statuses/8353089555800064
https://web.archive.org/web/20101203220450/http://twitter.com/rthk_enews/statuses/%7B%7D
https://web.archive.org/web/20101203220451/http://twitter.com/rthk_enews/statuses/%7Btime:'Sat%20Nov%2027%2002:55:06%20+0000%202010'%7D
https://web.archive.org/web/20101203220451/http://twitter.com/rthk_enews/statuses/twitter.com
https://web.archive.org/web/20141003220136/https://twitter.com/rthk_enews/status/517494407520456706
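Several captures above are not tweets at all: the `%7B%7D` and `%7Btime:...%7D` paths, `statuses/twitter.com`, and the empty `web//` entry. A hedged sketch for keeping only URI-Ms whose URI-R ends in a numeric status ID; the regular expression is an assumption about what counts as a valid tweet URL, not how this list was built:

```shell
#!/bin/sh
# Sketch: keep only URI-Ms ending in /status/<digits> or /statuses/<digits>.
# Assumption: anything else (e.g. %7B%7D or statuses/twitter.com) is a crawler artifact.
valid_status_urims() {
  grep -E '/status(es)?/[0-9]+$'
}

# Example use, on the file from this gist:
# valid_status_urims < rthk_enews-tweets.txt
```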
phonedude / sra-uri-ms-uniq.txt
Created July 10, 2021 16:12
archived SRA run files; duplicate URI-Rs are possible, but the list is deduplicated on URI-Ms (i.e., a URI-R could be archived more than once)
https://web.archive.org/web/20101206173116/http://trace.ncbi.nlm.nih.gov:80/Traces/sra/sra.cgi?cmd=viewer&m=data&s=viewer&run=SRR027262
https://web.archive.org/web/20110115075803/http://trace.ncbi.nlm.nih.gov:80/Traces/sra/sra.cgi?run=SRR005482&cmd=viewer&m=data&s=viewer
https://web.archive.org/web/20110116162043/http://trace.ncbi.nlm.nih.gov:80/Traces/sra/sra.cgi?run=SRR005481&cmd=viewer&m=data&s=viewer
https://web.archive.org/web/20110507075609/http://trace.ncbi.nlm.nih.gov:80/Traces/sra/sra.cgi?run=SRR005482&cmd=viewer&m=data&s=viewer
https://web.archive.org/web/20110507102439/http://trace.ncbi.nlm.nih.gov:80/Traces/sra/sra.cgi?run=SRR005481&cmd=viewer&m=data&s=viewer
https://web.archive.org/web/20110508124604/http://trace.ncbi.nlm.nih.gov:80/Traces/sra/sra.cgi?run=SRR005482&cmd=viewer&m=data&s=viewer
https://web.archive.org/web/20110508125336/http://trace.ncbi.nlm.nih.gov:80/Traces/sra/sra.cgi?run=SRR005481&cmd=viewer&m=data&s=viewer
https://web.archive.org/web/20110605000706/http://trace.ncbi.nlm.nih.gov
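Because the list is deduplicated on URI-Ms rather than URI-Rs, the same run page can appear several times (e.g. `run=SRR005482` above). A sketch for counting mementos per URI-R, assuming every line has the form `https://web.archive.org/web/<timestamp>/<URI-R>`; note that URI-Rs differing only in query-parameter order (compare the first two lines above) are counted as separate URI-Rs:

```shell
#!/bin/sh
# Sketch: print "<memento count><TAB><URI-R>", most-archived URI-R first.
# Assumes input lines look like https://web.archive.org/web/<timestamp>/<URI-R>;
# the optional [a-z_]* also strips replay flags such as id_ if present.
urir_memento_counts() {
  sed -E 's#^https?://web\.archive\.org/web/[0-9]+[a-z_]*/##' |
    awk '{ n[$0]++ } END { for (u in n) print n[u] "\t" u }' |
    LC_ALL=C sort -k1,1nr -k2
}
```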
phonedude / bytes-diff-ia-gcp-aws.txt
Created July 9, 2021 15:52
All file triples (IA raw, GCP, AWS) are identical. Files from: https://www.biorxiv.org/content/10.1101/2021.06.18.449051v1
sh-3.2$ cat uri-aws.txt uri-gcp.txt uri-ia-raw.txt
#!/bin/csh -x
curl -Ls https://sra-pub-sars-cov2.s3.amazonaws.com/run/SRR11313485/SRR11313485 > SRR11313485.aws.txt
curl -Ls https://sra-pub-sars-cov2.s3.amazonaws.com/run/SRR11313486/SRR11313486 > SRR11313486.aws.txt
curl -Ls https://sra-pub-sars-cov2.s3.amazonaws.com/run/SRR11313274/SRR11313274 > SRR11313274.aws.txt
curl -Ls https://sra-pub-sars-cov2.s3.amazonaws.com/run/SRR11313275/SRR11313275 > SRR11313275.aws.txt
curl -Ls https://sra-pub-sars-cov2.s3.amazonaws.com/run/SRR11313285/SRR11313285 > SRR11313285.aws.txt
curl -Ls https://sra-pub-sars-cov2.s3.amazonaws.com/run/SRR11313286/SRR11313286 > SRR11313286.aws.txt
curl -Ls https://sra-pub-sars-cov2.s3.amazonaws.com/run/SRR11313448/SRR11313448 > SRR11313448.aws.txt
curl -Ls https://sra-pub-sars-cov2.s3.amazonaws.com/run/SRR11313449/SRR11313449 > SRR11313449.aws.txt
sh-3.2$ more uri-aws.txt
#!/bin/csh -x
curl -iLs https://sra-pub-sars-cov2.s3.amazonaws.com/run/SRR11313485/SRR11313485 > SRR11313485.aws.txt
curl -iLs https://sra-pub-sars-cov2.s3.amazonaws.com/run/SRR11313486/SRR11313486 > SRR11313486.aws.txt
curl -iLs https://sra-pub-sars-cov2.s3.amazonaws.com/run/SRR11313274/SRR11313274 > SRR11313274.aws.txt
curl -iLs https://sra-pub-sars-cov2.s3.amazonaws.com/run/SRR11313275/SRR11313275 > SRR11313275.aws.txt
curl -iLs https://sra-pub-sars-cov2.s3.amazonaws.com/run/SRR11313285/SRR11313285 > SRR11313285.aws.txt
curl -iLs https://sra-pub-sars-cov2.s3.amazonaws.com/run/SRR11313286/SRR11313286 > SRR11313286.aws.txt
curl -iLs https://sra-pub-sars-cov2.s3.amazonaws.com/run/SRR11313448/SRR11313448 > SRR11313448.aws.txt
curl -iLs https://sra-pub-sars-cov2.s3.amazonaws.com/run/SRR11313449/SRR11313449 > SRR11313449.aws.txt
sh-3.2$ more uri-ms.txt
#!/bin/csh -x
curl -iLs http://web.archive.org/web/1/https://storage.googleapis.com/nih-sequence-read-archive/run/SRR11313485/SRR11313485 > SRR11313485.txt
curl -iLs http://web.archive.org/web/1/https://storage.googleapis.com/nih-sequence-read-archive/run/SRR11313486/SRR11313486 > SRR11313486.txt
curl -iLs http://web.archive.org/web/1/https://storage.googleapis.com/nih-sequence-read-archive/run/SRR11313274/SRR11313274 > SRR11313274.txt
curl -iLs http://web.archive.org/web/1/https://storage.googleapis.com/nih-sequence-read-archive/run/SRR11313275/SRR11313275 > SRR11313275.txt
curl -iLs http://web.archive.org/web/1/https://storage.googleapis.com/nih-sequence-read-archive/run/SRR11313285/SRR11313285 > SRR11313285.txt
curl -iLs http://web.archive.org/web/1/https://storage.googleapis.com/nih-sequence-read-archive/run/SRR11313286/SRR11313286 > SRR11313286.txt
curl -iLs http://web.archive.org/web/1/https://storage.googleapis.com/nih-sequence-read-archive/run/SRR11313448/SRR11313448 > SRR113
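A minimal sketch of the byte comparison for one file triple, using `cmp -s` (silent; only the exit status reports whether the files match). The three paths are parameters, and the `.gcp.txt` / `.ia.txt` names in the usage comment are hypothetical. One caution: files saved with `curl -i` begin with the HTTP response headers, which differ between AWS, GCP, and the Wayback Machine, so the bodies should be fetched without `-i` (or have the headers stripped) before comparing bytes.

```shell
#!/bin/sh
# Sketch: report whether three copies of one SRA run file are byte-identical.
# cmp -s prints nothing and exits 0 only when two files match exactly.
same_bytes() {
  if cmp -s "$1" "$2" && cmp -s "$1" "$3"; then
    echo "identical: $1 $2 $3"
  else
    echo "DIFFERENT: $1 $2 $3"
    return 1
  fi
}

# Example use (hypothetical file names, bodies fetched without curl -i):
# for run in SRR11313485 SRR11313486; do
#   same_bytes "$run.aws.txt" "$run.gcp.txt" "$run.ia.txt"
# done
```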
phonedude / wolf.ia.noarg.uniqids.sorted.txt
Created June 6, 2021 13:29
Internet Archive mementos for https://twitter.com/naomirwolf/status/*: (nearly) unique IDs, sorted by archive date
https://web.archive.org/web/20131028034250/https://twitter.com/naomirwolf/statuses/394197316347129856
https://web.archive.org/web/20131028040619/https://twitter.com/naomirwolf/status/394197316347129856
https://web.archive.org/web/20131204140704/http://twitter.com/naomirwolf/status/408210725148950528
https://web.archive.org/web/20140125162556/http://twitter.com/naomirwolf/status/365801181576036352
https://web.archive.org/web/20140304020125/https://twitter.com/naomirwolf/status/440177158695440386
https://web.archive.org/web/20140305172418/https://twitter.com/naomirwolf/status/422551189465481216
https://web.archive.org/web/20140306014539/https://twitter.com/naomirwolf/status/440174041283186688
https://web.archive.org/web/20140510214847/https://twitter.com/naomirwolf/status/464009863920893952
https://web.archive.org/web/20140613014902/https://twitter.com/naomirwolf/status/341670104674729985
https://web.archive.org/web/20140722085015/https://twitter.com/naomirwolf/status/483572348709117952
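The list is only "(nearly)" unique because one tweet can be archived under several URI-Rs, e.g. `statuses/` vs. `status/` and `http` vs. `https` (the first two lines share ID 394197316347129856). A sketch that keeps the earliest memento per numeric status ID, assuming every line shares the `https://web.archive.org/web/` prefix so that a byte-wise sort orders lines by their 14-digit timestamps:

```shell
#!/bin/sh
# Sketch: one URI-M per tweet ID, earliest capture first.
# Assumes every line starts with https://web.archive.org/web/<14-digit timestamp>/,
# so LC_ALL=C sort orders the lines chronologically.
earliest_per_status_id() {
  LC_ALL=C sort | awk '
    match($0, /\/status(es)?\/[0-9]+/) {
      id = substr($0, RSTART, RLENGTH)
      sub(/.*\//, "", id)              # keep only the digits after the last slash
      if (!(id in seen)) { seen[id] = 1; print }
    }'
}
```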
https://twitter.com/JacobRubashkin/status/1326266748727943169
https://twitter.com/RonHogan/status/1326269115997237249
https://twitter.com/MeidasTouch/status/1326277097480843264
https://twitter.com/DeanBrowningPA/status/1326285974305972226
https://twitter.com/DeanBrowningPA/status/1326190098086506496
http://archive.is/CgyS0
http://web.archive.org/web/20201110204901/https://twitter.com/DeanBrowningPA/status/1326190098086506496
https://twitter.com/DanPurdy322/status/1325115802606039047
% curl -s https://www.loc.gov/ | grep 'name=\"dc\.\|property=\"og:'
<meta name="dc.identifier"
<meta name="dc.language" content="eng" />
<meta name="dc.source" content="Library of Congress, Washington, D.C. 20540 USA" />
<meta property="og:site_name" content="The Library of Congress"/>
<meta property="og:type" content="article" />
<meta name="dc.title"
<meta property="og:title"
<meta property="og:description" content="The world's largest library. View historic
photos, maps, books and more. Contact experts for help with research. Plan a visit.
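Because `grep` works line by line, meta elements whose attributes continue on the next line (`dc.identifier`, `dc.title`, and `og:title` above) are printed only up to the line break. A sketch that first joins the page onto one line with `tr` and then extracts whole `<meta>` elements with `grep -oE`; it assumes no attribute value contains a literal `>`, and `grep -o`, while supported by GNU and BSD grep, is not in POSIX:

```shell
#!/bin/sh
# Sketch: print complete dc.* and og:* <meta> elements, even multi-line ones.
# tr maps CR, LF, and tab to spaces and (-s) squeezes runs of spaces, so each
# element ends up on a single line before grep extracts it.
dc_og_meta() {
  tr -s '\r\n\t' '   ' |
    grep -oE '<meta [^>]*(name="dc\.|property="og:)[^>]*>'
}

# Example use (needs network access):
# curl -s https://www.loc.gov/ | dc_og_meta
```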