@mbklein
Created July 22, 2014 16:40
EZProxy Stanza screen scraping
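#!/usr/bin/env perl
# List EZProxy database stanza keys, or print the stanzas for the given keys.
use strict;
use warnings;

my $base = 'http://www.oclc.org/support/services/ezproxy/documentation';

if (@ARGV == 0) {
    # No arguments: scrape the index page and print "key<TAB>title" per database.
    open(my $data, '-|', 'curl', '-s', "$base/db.en.html") or die "Cannot run curl: $!";
    while (<$data>) {
        if (/^.+?<li.+?<a href=".+db\/(.+?)\.en\.html">(.+?)<\/a>.+$/) {
            printf "%-33s\t%s\n", $1, $2;
        }
    }
} else {
    # One or more keys: fetch each database page and print its stanza.
    while (defined(my $key = shift @ARGV)) {
        # List-form open bypasses the shell, so $key cannot inject commands.
        open(my $data, '-|', 'curl', '-s', "$base/db/${key}.en.html") or die "Cannot run curl: $!";
        local $/ = undef;    # slurp the whole response
        my $html = <$data>;
        # The stanza is the contents of the page's <code> element.
        unless (defined $html && $html =~ /<code>(.+)<\/code>/) {
            warn "No stanza found for '$key'\n";
            next;
        }
        my $stanza = $1;
        $stanza =~ s/\s*<br\s*\/>\s*/\n/g;    # <br /> separates stanza lines
        chomp $stanza;
        print "$stanza\n";
    }
}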
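The two regular expressions in the script can be exercised offline, without calling curl. The following is a minimal sketch, not part of the original gist: the helper names parse_index_line and extract_stanza are hypothetical, and both HTML samples are invented for illustration, so the real OCLC markup may differ. The dots in ".en.html" are escaped here so they match literally.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Invented index-page line (assumption: one <li><a href="...db/KEY.en.html"> per line).
my $index_line = '  <li class="x"><a href="/support/services/ezproxy/documentation/db/exampledb.en.html">Example Database</a></li>';

# Invented stanza page (assumption: the stanza sits in a single <code>
# element on one line, with <br /> tags between directives).
my $stanza_html = '<p>Intro</p><code>Title Example Database<br />URL http://www.example.org<br />DJ example.org<br /></code><p>Notes</p>';

# Return (key, title) for an index line, or an empty list if it does not match.
sub parse_index_line {
    my ($line) = @_;
    return $line =~ /^.+?<li.+?<a href=".+db\/(.+?)\.en\.html">(.+?)<\/a>.+$/
        ? ($1, $2)
        : ();
}

# Return the stanza text with <br /> tags turned into newlines,
# or undef if the page has no <code> element.
sub extract_stanza {
    my ($html) = @_;
    return undef unless $html =~ /<code>(.+)<\/code>/;
    my $stanza = $1;
    $stanza =~ s/\s*<br\s*\/>\s*/\n/g;
    chomp $stanza;    # drop the newline left by the final <br />
    return $stanza;
}

my ($key, $title) = parse_index_line($index_line);
printf "%-33s\t%s\n", $key, $title;
print extract_stanza($stanza_html), "\n";
```

Keeping the parsing in small subroutines makes it easier to notice when a change in the page markup stops the patterns from matching, since each one can be checked against a saved copy of the page.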
$ ./ezproxy_stanzas
123library 123Library
abc-clio ABC-CLIO Databases and Ebooks
academic_charts Academic Charts Online - Music Data & Analysis
accessmedicine Access Medicine
accessscience Access Science
acland Acland's Video Atlas of Human Anatomy
acls ACLS Humanities E-Book (ACLS History E-Book Project)
Acoustical_Society_America Acoustical Society of America
alexanderstreetpress Alexander Street Press
acs American Chemical Society
aota American Occupational Therapy Association
asco American Society of Clinical Oncology
annual_reviews Annual Reviews
apa_psycNET APA PsycNet
ap AP Photo Archive
adrs Applied Development Research Solutions
aps_journals APS Journals
artstor ARTstor
asme ASME Digital Collection
astm ASTM
atla ATLA Religion and ATLA Serials (ATLAS)
atomiclearning Atomic Learning
bankscope Bankscope
bestpractice Best Practice
bihr Bibliographie Internationale de l'Humanisme et de la Renaissance
biblioline BiblioLine
bioone BioOne Journals Online
bmjjournals BMJ Journals
bmjlearning BMJ Learning
boneandjoint Bone and Joint Journal
bkflix BookFlix
books24x7 Books24x7
brainfuse Brainfuse
brepolis BREPOLiS
brainpop Brainpop
brainpopjr BrainpopJr
british_standards_online British Standards Online
bna Bureau of National Affairs (BNA)
cabdirect CAB Direct
cambridge Cambridge Companions Online
pharmacists Canadian Pharmacists Association
casting_about Casting About
chadwyck Chadwyck
chestpublications CHEST Publications
clcd Children's Literature Comprehensive Database (CLCD)
choice_reviews Choice Reviews
chronicle Chronicle of Higher Education
clinicalevidence Clinical Evidence
clinicalkey ClinicalKey and EMBASE
clinicallearning Clinical Learning
cnki CNKI
cochrane Cochrane Library
collegesource CollegeSource
earthscape Columbia Earthscape
ciao Columbia International Affairs Online (CIAO)
cios Communication Institute for Online Scholarship (CIOS)
cpacanada CPA Canada
cq_researcher CQ Researcher
crcnetbase CRCnetBASE
credoreference Credo Reference (formerly Xreferplus)
csa CSA Illumina
dram Database of Recorded American Music (DRAM)
data_planet_statistical_datasets Data-Planet Statistical Datasets
data_planet_statistical_ready_ref Data-Planet Statistical Ready Reference
dawsonera dawsonera
dnsa Digital National Security Archive
doi DOI System
dukeupress Duke University Press
duxiu DUXIU
eebo Early English Books Online (EEBO)
easybib EasyBib
ebrary ebrary
ebscohost EBSCOhost
ebscoatoz EBSCO A-to-Z
ebscoejs EBSCO Electronic Journals Service
ebscolinksource EBSCO LinkSource
eiu Economist Intelligence Unit
edweek Education Week
ehrafarchaeology eHRAF Archaeology (Human Relations Area Files at Yale)
ehrafworldcultures eHRAF World Cultures (Human Relations Area Files at Yale)
elsevier Elsevier ScienceDirect and Scopus
encycbritannica Encyclopedia Britannica
ency-astro Encyclopedia of Astronomy and Astrophysics
brill Encyclopedia of Islam
engineering_village Engineering Village
esgmanager ESG Manager
exacteditions Exact Editions
factiva Factiva
Fergusons_Career_Guidance_Center Fergusons_Career_Guidance_Center
films_on_demand Films on Demand
forensicnetbase FORENSICnetBASE
freedomflix FreedomFlix
freegal Freegal Music Service
frost Frost & Sullivan
gale Gale InfoTrac
gideon GIDEON Global Infectious Diseases Epidemiology Online Network
gmid Global Market Information Database (GMID)
googlescholar Google Scholar
grolier Grolier Online
hstalks Henry Stewart Talks
human_kinetics Human Kinetics
Ibuk Ibuk
ichushiweb Ichushi-Web
ieee IEEE Xplore
IGAKU-SHOIN_Library IGAKU-SHOIN Library
IHS_Standards_Expert IHS Standards Expert
indiastat IndiaStat
informaworld informaworld
informit Informit
ingentaconnect INGENTAconnect
isi Institute for Scientific Information (ISI)
icea International Journal of Childbirth Education
imf International Monetary Fund (IMF)
iter Iter: Gateway to the Middle Ages and Renaissance
intelliconnect IntelliConnect
jama JAMA
janes Jane's Online Service
japanknowledge JapanKnowledge
jisc JISC Historical Texts
jove Journal of Visual Experiments
jstor JSTOR
kanopy Kanopy
kluwerarbitration Kluwer Arbitration
kluwercompetitionlaw Kluwer Competition Law
kluwerccdr Kluwer Corporate Counsel Dispute Resolution
kluwerlaw Kluwer Law
kluwer Kluwer Online
kluwerpatentlaw Kluwer Patent Law
knovel Knovel
lamyline Lamyline
learning_express Learning Express Library 3.0
lexisnexis LexisNexis
pressdisplay Library PressDisplay
literati Literati
livedgar LIVEDGAR
lois_law LoisLaw
lwwhealthlibrary LWW Health Library
mahealthcare MA Healthcare
Malaysiakini Malaysiakini
mangolanguages Mango Languages
mathscinet MathSciNet
matthewbender Matthew Bender Online
Mergent_Intellect Mergent Intellect
mergentonline Mergent Online
md_consult MD Consult
micromedex Micromedex
mintel Mintel Oxygen
morningstar Morningstar
myilibrary MyiLibrary
nachschlage nachschlage.NET
ntis National Technical Information Service (NTIS)
Nature_Publishing_Group Nature Publishing Group
naxos Naxos
naxos_sheet_music_library Naxos Sheet Music Library
nbclearn NBC Learn
netanatomy NetAnatomy
netlibrary NetLibrary
New_England_Journal_Medicine New England Journal of Medicine
new_left_review New Left Review
brill2 New Pauly Online
newsbank NewsBank
nfpacodes NFPA Codes Online
noodletools Noodletools
nutritioncaremanual Nutrition Care Manual
archivegrid OCLC ArchiveGrid
camio OCLC CAMIO (Catalog of Art Museum Images Online)
firstsearch OCLC FirstSearch
worldcatdiscovery OCLC WorldCat Discovery
worldcatlocal OCLC WorldCat Local
oecdilibrary OECD iLibrary
oed OED Online
onepetro OnePetro (formerly Society of Petroleum Engineers)
oma Online Music Anthology
openlibrary Open Library
openedition OpenEdition
openthesis OpenThesis
opinionarchives Opinion Archives
originsonline Origins Online
overdrive OverDrive
ovid Ovid
oxforddictionaries Oxford Dictionaries
oxfordhandbooks Oxford Handbooks Online
oxfordjournals Oxford Journals
oxfordislamicstudies Oxford Islamic Studies
oxfordsocialwork Oxford Social Work
pianomedia Piano Media
pdcc Philosophy Documentation Center Collection
picarta PiCarta
pmn Plant Management Network
praegersecurity Praeger Security International
primark Primark (including Global Access)
primatelit PrimateLit
projectmuse Project Muse
prometheus Prometheus
pronunciator Pronunciator
proquest ProQuest
psyccritiques PsycCRITIQUES
psychiatryonline PsychiatryOnline
pubmed PubMed
r2library R2 Digital Library
rand RAND State Statistics
reaxys Reaxys
redbooks Redbooks
refworks RefWorks
researchnow ResearchNow
riag RIA Checkpoint
roubiniglobaleconomics Roubini Global Economics
routledgeonline Routledge Online
Royal_Society_Chemistry Royal Society of Chemistry
safaribooks Safari Books Online
sage SAGE Publications
sagereference Sage Reference
science Science
sciencesignaling Science Signaling
scitransmed Science Translational Medicine
sciencexpress ScienceXpress
scifinder SciFinder
scitation Scitation
sheetmusicnow Sheet Music Now
sikuquanshu Siku Quanshu
SNL SNL
socialexplorer Social Explorer
SPIE_Digital_Library11 SPIE Digital Library
springerlink SpringerLink
springermaterials SpringerMaterials
springerprotocols Springer Protocols
statref STAT!Ref
STRATFOR STRATFOR
swetswise Swetswise
symptommedia SymptomMedia
taxnetpro Taxnet Pro
taylorandfrancis_online Taylor and Francis Online
techniques-ingenieur Techniques de l'Ingénieur
teen_health_and_wellness Teen Health and Wellness
The_Oklahoman The Oklahoman
thepinksheet The Pink Sheet
thiemeebooks Thieme Ebooks
visualthesaurus Thinkmap: Visual Thesaurus
The_Source_Magpies_Magazine The Source Magpies Magazine
tlg Thesaurus Linguae Graecae
thomsonone Thomson ONE
thomsonib Thomson ONE Banker
torpedo Torpedo Ultra
trueflix TrueFlix
turnitin Turnitin
uptodate UpToDate
valueline Value Line
very_short_introductions Very Short Introductions
vlex vLex
webofknowledge Web of Knowledge
westlaw Westlaw
westlawchina WestLaw China
westlawnext WestLawNext
wgsn WGSN: Creative Intelligence
wileyonlinelibrary Wiley Online Library
hwwilsonweb WilsonWeb
wbol World Book Online
zephyr Zephyr
$ ./ezproxy_stanzas webofknowledge hwwilsonweb
OPTION DOMAINCOOKIEONLY
Title Web of Knowledge
URL http://www.webofknowledge.com
Host www.webofknowledge.com
Host webofknowledge.com
Host webofscience.com
Host www.webofscience.com
DJ webofscience.com
DJ isiknowledge.com
DJ webofknowledge.com
DJ webofknowledgev4.com
DJ isiwebofknowledge.com
DJ myendnoteweb.com
Find value="http://
Replace value="http://^A
Find VALUE="http://
Replace VALUE="http://^A
Find rurl=http://
Replace rurl=http://^A
Find product_st_thomas=http://
Replace product_st_thomas=http://^A
Find return_url=http://
Replace return_url=http://^A
Find ST_URL=http://
Replace ST_URL=http://^A
Find "CIT.GATEWAY_URL">gateway.isiknowledge.com
Replace "CIT.GATEWAY_URL">^pgateway.isiknowledge.com^
Find value='gateway.isiknowledge.com/gateway/Gateway.cgi'
Replace value='^pgateway.isiknowledge.com^/gateway/Gateway.cgi'
Find "CIT.GATEWAY_URL">gateway.webofknowledge.com
Replace "CIT.GATEWAY_URL">^pgateway.webofknowledge.com^
Find value='gateway.webofknowledge.com/gateway/Gateway.cgi'
Replace value='^pgateway.webofknowledge.com^/gateway/Gateway.cgi'
OPTION COOKIE
Title WilsonWeb
URL http://vnweb.hwwilsonweb.com/hww/jumpstart.jhtml
Host wilsonweb2.hwwilson.com
DJ hwwilson.com
DJ hwwilsonweb.com
Find 'http://vnweb.hwwilsonweb.com' +
Replace 'http://^Avnweb.hwwilsonweb.com' +
Find &url=http://
Replace &url=http://^A