#!/usr/bin/env python

from warcio.archiveiterator import ArchiveIterator

with open('archive/rec-20230722210008512613-81a34b41ee13.warc.gz', 'rb') as stream:
    for i, record in enumerate(ArchiveIterator(stream)):
        print(i, record.rec_headers.get_header('WARC-Target-URI'))
        if record.rec_type == 'response':
            content = record.content_stream().read()

from warcio.warcwriter import WARCWriter

with open('test.warc.gz', 'wb') as output:
    writer = WARCWriter(output, gzip=True)

    # write some metadata for the WARC as a warcinfo record
    rec = writer.create_warcinfo_record('test.warc.gz', {
        'software': 'warcio',
        'description': 'An example of packaging up two images in a WARC'
    })

#!/usr/bin/env python3

# run like this:
#
# $ python3 warc2mbox.py yahoo-groups-2016-03-20T12:45:19Z-nyzp9w.warc.gz
#
# and it will generate an mbox file for each Yahoo Group:
#
# $ ls -l mboxes
# -rw-r--r-- 1 edsummers staff 12522488 Jul 15 14:14 amicigranata.mbox

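The header comment above says the script writes one mbox file per Yahoo Group, but the rest of warc2mbox.py is not shown here. The following is only a sketch of that output step using the standard library `mailbox` module. The names `append_to_group_mbox` and `make_message` are hypothetical, and so is the idea that each file is named `<group>.mbox` inside one output directory (inferred from the `ls -l mboxes` listing).

```python
import mailbox
from email.message import EmailMessage
from pathlib import Path


def make_message(sender, subject, body):
    """Build a simple text message (hypothetical helper for this sketch)."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def append_to_group_mbox(mbox_dir, group, message):
    """Append one message to <mbox_dir>/<group>.mbox, creating it if needed."""
    mbox_dir = Path(mbox_dir)
    mbox_dir.mkdir(parents=True, exist_ok=True)
    path = mbox_dir / f"{group}.mbox"
    box = mailbox.mbox(str(path))  # create=True is the default
    try:
        box.add(message)
        box.flush()  # write the pending message to disk
    finally:
        box.close()
    return path
```

Calling `append_to_group_mbox("mboxes", "amicigranata", msg)` repeatedly would grow a single `amicigranata.mbox` file, one message per call.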
#!/usr/bin/env python3

import csv
import sys
import json
import time
import requests

def get_snapshots(url):
    url = f"https://swap.stanford.edu/was/cdx?url={url}&output=json"

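`get_snapshots` interpolates the target URL directly into the query string. If the target itself contains `?` or `&`, the server may read part of it as separate parameters. A minimal sketch of one way to avoid that, using `urllib.parse.urlencode` from the standard library; the helper name `build_cdx_url` is hypothetical, and the endpoint and the `url`/`output` parameters are copied from the snippet above.

```python
from urllib.parse import urlencode

# Endpoint taken from the snippet above.
CDX_ENDPOINT = "https://swap.stanford.edu/was/cdx"


def build_cdx_url(target_url):
    """Return the CDX query URL with target_url percent-encoded."""
    query = urlencode({"url": target_url, "output": "json"})
    return f"{CDX_ENDPOINT}?{query}"


print(build_cdx_url("http://example.com/?a=1&b=2"))
```

With `requests`, passing `params={"url": target_url, "output": "json"}` to `requests.get` performs the same encoding.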
#!/usr/bin/env python

for n in range(0, 1002):
    with open("files/file-{:04n}.txt".format(n), "w") as fh:
        fh.write(str(n))

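The `04` in the format spec above pads each number to four digits, so the generated names sort the same way as strings and as numbers. A small sketch that checks this without touching the filesystem; `padded_name` is a hypothetical helper, and it uses the `d` presentation type rather than the snippet's `n` so the result does not depend on the current locale.

```python
def padded_name(n, width=4):
    """Return a file name with n zero-padded to `width` digits."""
    return "file-{:0{width}d}.txt".format(n, width=width)


# Same range as the loop above: 0 through 1001.
padded = [padded_name(n) for n in range(0, 1002)]
unpadded = ["file-{}.txt".format(n) for n in range(0, 1002)]

# Padded names are already in string-sorted order; unpadded names are not,
# because "file-10.txt" sorts before "file-2.txt".
print(sorted(padded) == padded)
print(sorted(unpadded) == unpadded)
```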
#!/usr/bin/env python3

import os
import dotenv
import requests

#
# You'll need to put your Pinboard API token in a .env file in the same directory as this program.
#
# PINBOARD_KEY=abc:123

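The snippet is cut off before the token is read. In the original that job presumably falls to the imported `dotenv` package; the sketch below is a standard-library stand-in, not that package's implementation. `read_env_file` is a hypothetical name, and it handles only plain `KEY=VALUE` lines like the one in the comment (no quoting, no `export` prefix).

```python
from pathlib import Path


def read_env_file(path):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for raw in Path(path).read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so values such as abc:123
        # (or values containing "=") are kept whole.
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values
```

`read_env_file(".env").get("PINBOARD_KEY")` would then return the token string, or `None` if the key is missing.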
import csv
import time

from ipwhois import IPWhois

output = csv.DictWriter(open("blocks.csv", "w"), ["ip", "affected", "name", "country", "description"])
output.writeheader()

for line in open("blocks.txt"):
    line = line.strip()

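The snippet stops before any rows are written, so the lookup step is unknown. As a sketch of the output side only, this shows `csv.DictWriter` with the same column names writing to any file-like object; `write_rows` is a hypothetical helper and the sample row is placeholder data, not real WHOIS output.

```python
import csv
import io

# Column names copied from the snippet above.
FIELDS = ["ip", "affected", "name", "country", "description"]


def write_rows(fh, rows):
    """Write a header line and then one CSV line per dict in rows."""
    writer = csv.DictWriter(fh, fieldnames=FIELDS)
    writer.writeheader()
    for row in rows:
        writer.writerow(row)


# Placeholder row for illustration only.
sample = [{
    "ip": "192.0.2.1",
    "affected": "3",
    "name": "EXAMPLE-NET",
    "country": "ZZ",
    "description": "placeholder row",
}]

buffer = io.StringIO()
write_rows(buffer, sample)
print(buffer.getvalue())
```

When writing to a real file, opening it as `with open("blocks.csv", "w", newline="") as fh:` closes the file reliably and follows the `csv` module's recommendation for newline handling.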
<html>
  <body>
    <div id="root"></div>
    <template id="my-template">
      <div>
        <input type="text" />
        <button>Remove</button>
      </div>
    </template>

@base <https://id.loc.gov/resources/instances/7977519.nt> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .

<http://id.loc.gov/resources/instances/7977519>
    <http://id.loc.gov/ontologies/bibframe/adminMetadata> [
        <http://id.loc.gov/ontologies/bflc/encodingLevel> <http://id.loc.gov/vocabulary/menclvl/1> ;
        <http://id.loc.gov/ontologies/bibframe/assigner> <http://id.loc.gov/vocabulary/organizations/dlc> ;
        <http://id.loc.gov/ontologies/bibframe/changeDate> "2010-06-14T09:46:36"^^<http://www.w3.org/2001/XMLSchema#dateTime> ;
        <http://id.loc.gov/ontologies/bibframe/creationDate> "1973-05-11"^^<http://www.w3.org/2001/XMLSchema#date> ;
        <http://id.loc.gov/ontologies/bibframe/descriptionAuthentication> <http://id.loc.gov/vocabulary/marcauthen/premarc> ;