EZProxy Stanza screen scraping
mbklein/53bcdb18d3cc81f7e6d0, created July 22, 2014 16:40
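#!/usr/bin/env perl
# Screen-scrape EZProxy database stanzas from OCLC's documentation site.
#   ./ezproxy_stanzas            list "key<TAB>title" for every stanza
#   ./ezproxy_stanzas KEY ...    print the stanza for each KEY
# Requires curl on the PATH.
use strict;
use warnings;

my $base = 'http://www.oclc.org/support/services/ezproxy/documentation';

if (@ARGV == 0) {
    # Index page: each stanza is an <li> linking to db/KEY.en.html.
    open(my $data, '-|', 'curl', '-s', "$base/db.en.html")
        or die "Cannot run curl: $!\n";
    while (<$data>) {
        if (/^.+?<li.+?<a href=".+db\/(.+?)\.en\.html">(.+?)<\/a>.+$/) {
            printf "%-33s\t%s\n", $1, $2;
        }
    }
    close($data);
} else {
    for my $key (@ARGV) {
        # List-form open runs curl directly, so $key never reaches a shell.
        open(my $data, '-|', 'curl', '-s', "$base/db/${key}.en.html")
            or die "Cannot run curl: $!\n";
        my $page = do { local $/; <$data> };    # slurp the whole page
        close($data);
        # '.' does not match newlines here, so this expects the <code> block
        # on one line, with <br /> separating the directives.
        unless (defined $page && $page =~ /<code>(.+)<\/code>/) {
            warn "No stanza found for '$key'\n";    # don't reuse a stale $1
            next;
        }
        (my $stanza = $1) =~ s/\s*<br\s*\/>\s*/\n/g;
        $stanza =~ s/\s+\z//;    # trim trailing whitespace, then end with one newline
        print "$stanza\n";
    }
}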
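
The two regular expressions do all of the scraping work. The sketch below lifts them into hypothetical helper subs, `parse_index_line` and `stanza_from_html`. It runs them on inline HTML, so the parsing can be checked without fetching anything from oclc.org. The sample markup is an assumption shaped to fit the regexes, not a copy of OCLC's actual pages.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: return (key, title) when an index-page line links to a
# stanza page, using the same pattern as the script; otherwise return ().
sub parse_index_line {
    my ($line) = @_;
    return $line =~ /^.+?<li.+?<a href=".+db\/(.+?)\.en\.html">(.+?)<\/a>.+$/
        ? ($1, $2)
        : ();
}

# Hypothetical helper: pull the single-line <code> block out of a stanza page
# and turn each <br /> into a newline. Returns undef when there is no <code>.
sub stanza_from_html {
    my ($html) = @_;
    return undef unless $html =~ /<code>(.+)<\/code>/;
    my $stanza = $1;
    $stanza =~ s/\s*<br\s*\/>\s*/\n/g;
    $stanza =~ s/\s+\z//;
    return $stanza;
}

# Assumed sample markup (not taken from OCLC's site).
my $index_line  = '  <li><a href="/support/services/ezproxy/documentation/db/jstor.en.html">JSTOR</a></li>';
my $stanza_page = '<p>Stanza:</p><code>Title JSTOR<br />URL http://www.jstor.org<br />DJ jstor.org<br /></code>';

my ($key, $title) = parse_index_line($index_line);
printf "%-33s\t%s\n", $key, $title;
print stanza_from_html($stanza_page), "\n";
```

Keeping the patterns in small subs like this makes it easy to re-check them against a saved copy of a page if OCLC's markup changes.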
$ ./ezproxy_stanzas
123library 123Library
abc-clio ABC-CLIO Databases and Ebooks
academic_charts Academic Charts Online - Music Data & Analysis
accessmedicine Access Medicine
accessscience Access Science
acland Acland's Video Atlas of Human Anatomy
acls ACLS Humanities E-Book (ACLS History E-Book Project)
Acoustical_Society_America Acoustical Society of America
alexanderstreetpress Alexander Street Press
acs American Chemical Society
aota American Occupational Therapy Association
asco American Society of Clinical Oncology
annual_reviews Annual Reviews
apa_psycNET APA PsycNet
ap AP Photo Archive
adrs Applied Development Research Solutions
aps_journals APS Journals
artstor ARTstor
asme ASME Digital Collection
astm ASTM
atla ATLA Religion and ATLA Serials (ATLAS)
atomiclearning Atomic Learning
bankscope Bankscope
bestpractice Best Practice
bihr Bibliographie Internationale de l'Humanisme et de la Renaissance
biblioline BiblioLine
bioone BioOne Journals Online
bmjjournals BMJ Journals
bmjlearning BMJ Learning
boneandjoint Bone and Joint Journal
bkflix BookFlix
books24x7 Books24x7
brainfuse Brainfuse
brepolis BREPOLiS
brainpop Brainpop
brainpopjr BrainpopJr
british_standards_online British Standards Online
bna Bureau of National Affairs (BNA)
cabdirect CAB Direct
cambridge Cambridge Companions Online
pharmacists Canadian Pharmacists Association
casting_about Casting About
chadwyck Chadwyck
chestpublications CHEST Publications
clcd Children's Literature Comprehensive Database (CLCD)
choice_reviews Choice Reviews
chronicle Chronicle of Higher Education
clinicalevidence Clinical Evidence
clinicalkey ClinicalKey and EMBASE
clinicallearning Clinical Learning
cnki CNKI
cochrane Cochrane Library
collegesource CollegeSource
earthscape Columbia Earthscape
ciao Columbia International Affairs Online (CIAO)
cios Communication Institute for Online Scholarship (CIOS)
cpacanada CPA Canada
cq_researcher CQ Researcher
crcnetbase CRCnetBASE
credoreference Credo Reference (formerly Xreferplus)
csa CSA Illumina
dram Database of Recorded American Music (DRAM)
data_planet_statistical_datasets Data-Planet Statistical Datasets
data_planet_statistical_ready_ref Data-Planet Statistical Ready Reference
dawsonera dawsonera
dnsa Digital National Security Archive
doi DOI System
dukeupress Duke University Press
duxiu DUXIU
eebo Early English Books Online (EEBO)
easybib EasyBib
ebrary ebrary
ebscohost EBSCOhost
ebscoatoz EBSCO A-to-Z
ebscoejs EBSCO Electronic Journals Service
ebscolinksource EBSCO LinkSource
eiu Economist Intelligence Unit
edweek Education Week
ehrafarchaeology eHRAF Archaeology (Human Relations Area Files at Yale)
ehrafworldcultures eHRAF World Cultures (Human Relations Area Files at Yale)
elsevier Elsevier ScienceDirect and Scopus
encycbritannica Encyclopedia Britannica
ency-astro Encyclopedia of Astronomy and Astrophysics
brill Encyclopedia of Islam
engineering_village Engineering Village
esgmanager ESG Manager
exacteditions Exact Editions
factiva Factiva
Fergusons_Career_Guidance_Center Fergusons_Career_Guidance_Center
films_on_demand Films on Demand
forensicnetbase FORENSICnetBASE
freedomflix FreedomFlix
freegal Freegal Music Service
frost Frost & Sullivan
gale Gale InfoTrac
gideon GIDEON Global Infectious Diseases Epidemiology Online Network
gmid Global Market Information Database (GMID)
googlescholar Google Scholar
grolier Grolier Online
hstalks Henry Stewart Talks
human_kinetics Human Kinetics
Ibuk Ibuk
ichushiweb Ichushi-Web
ieee IEEE Xplore
IGAKU-SHOIN_Library IGAKU-SHOIN Library
IHS_Standards_Expert IHS Standards Expert
indiastat IndiaStat
informaworld informaworld
informit Informit
ingentaconnect INGENTAconnect
isi Institute for Scientific Information (ISI)
icea International Journal of Childbirth Education
imf International Monetary Fund (IMF)
iter Iter: Gateway to the Middle Ages and Renaissance
intelliconnect IntelliConnect
jama JAMA
janes Jane's Online Service
japanknowledge JapanKnowledge
jisc JISC Historical Texts
jove Journal of Visual Experiments
jstor JSTOR
kanopy Kanopy
kluwerarbitration Kluwer Arbitration
kluwercompetitionlaw Kluwer Competition Law
kluwerccdr Kluwer Corporate Counsel Dispute Resolution
kluwerlaw Kluwer Law
kluwer Kluwer Online
kluwerpatentlaw Kluwer Patent Law
knovel Knovel
lamyline Lamyline
learning_express Learning Express Library 3.0
lexisnexis LexisNexis
pressdisplay Library PressDisplay
literati Literati
livedgar LIVEDGAR
lois_law LoisLaw
lwwhealthlibrary LWW Health Library
mahealthcare MA Healthcare
Malaysiakini Malaysiakini
mangolanguages Mango Languages
mathscinet MathSciNet
matthewbender Matthew Bender Online
Mergent_Intellect Mergent Intellect
mergentonline Mergent Online
md_consult MD Consult
micromedex Micromedex
mintel Mintel Oxygen
morningstar Morningstar
myilibrary MyiLibrary
nachschlage nachschlage.NET
ntis National Technical Information Service (NTIS)
Nature_Publishing_Group Nature Publishing Group
naxos Naxos
naxos_sheet_music_library Naxos Sheet Music Library
nbclearn NBC Learn
netanatomy NetAnatomy
netlibrary NetLibrary
New_England_Journal_Medicine New England Journal of Medicine
new_left_review New Left Review
brill2 New Pauly Online
newsbank NewsBank
nfpacodes NFPA Codes Online
noodletools Noodletools
nutritioncaremanual Nutrition Care Manual
archivegrid OCLC ArchiveGrid
camio OCLC CAMIO (Catalog of Art Museum Images Online)
firstsearch OCLC FirstSearch
worldcatdiscovery OCLC WorldCat Discovery
worldcatlocal OCLC WorldCat Local
oecdilibrary OECD iLibrary
oed OED Online
onepetro OnePetro (formerly Society of Petroleum Engineers)
oma Online Music Anthology
openlibrary Open Library
openedition OpenEdition
openthesis OpenThesis
opinionarchives Opinion Archives
originsonline Origins Online
overdrive OverDrive
ovid Ovid
oxforddictionaries Oxford Dictionaries
oxfordhandbooks Oxford Handbooks Online
oxfordjournals Oxford Journals
oxfordislamicstudies Oxford Islamic Studies
oxfordsocialwork Oxford Social Work
pianomedia Piano Media
pdcc Philosophy Documentation Center Collection
picarta PiCarta
pmn Plant Management Network
praegersecurity Praeger Security International
primark Primark (including Global Access)
primatelit PrimateLit
projectmuse Project Muse
prometheus Prometheus
pronunciator Pronunciator
proquest ProQuest
psyccritiques PsycCRITIQUES
psychiatryonline PsychiatryOnline
pubmed PubMed
r2library R2 Digital Library
rand RAND State Statistics
reaxys Reaxys
redbooks Redbooks
refworks RefWorks
researchnow ResearchNow
riag RIA Checkpoint
roubiniglobaleconomics Roubini Global Economics
routledgeonline Routledge Online
Royal_Society_Chemistry Royal Society of Chemistry
safaribooks Safari Books Online
sage SAGE Publications
sagereference Sage Reference
science Science
sciencesignaling Science Signaling
scitransmed Science Translational Medicine
sciencexpress ScienceXpress
scifinder SciFinder
scitation Scitation
sheetmusicnow Sheet Music Now
sikuquanshu Siku Quanshu
SNL SNL
socialexplorer Social Explorer
SPIE_Digital_Library11 SPIE Digital Library
springerlink SpringerLink
springermaterials SpringerMaterials
springerprotocols Springer Protocols
statref STAT!Ref
STRATFOR STRATFOR
swetswise Swetswise
symptommedia SymptomMedia
taxnetpro Taxnet Pro
taylorandfrancis_online Taylor and Francis Online
techniques-ingenieur Techniques de l'Ingénieur
teen_health_and_wellness Teen Health and Wellness
The_Oklahoman The Oklahoman
thepinksheet The Pink Sheet
thiemeebooks Thieme Ebooks
visualthesaurus Thinkmap: Visual Thesaurus
The_Source_Magpies_Magazine The Source Magpies Magazine
tlg Thesaurus Linguae Graecae
thomsonone Thomson ONE
thomsonib Thomson ONE Banker
torpedo Torpedo Ultra
trueflix TrueFlix
turnitin Turnitin
uptodate UpToDate
valueline Value Line
very_short_introductions Very Short Introductions
vlex vLex
webofknowledge Web of Knowledge
westlaw Westlaw
westlawchina WestLaw China
westlawnext WestLawNext
wgsn WGSN: Creative Intelligence
wileyonlinelibrary Wiley Online Library
hwwilsonweb WilsonWeb
wbol World Book Online
zephyr Zephyr
$ ./ezproxy_stanzas webofknowledge hwwilsonweb
OPTION DOMAINCOOKIEONLY
Title Web of Knowledge
URL http://www.webofknowledge.com
Host www.webofknowledge.com
Host webofknowledge.com
Host webofscience.com
Host www.webofscience.com
DJ webofscience.com
DJ isiknowledge.com
DJ webofknowledge.com
DJ webofknowledgev4.com
DJ isiwebofknowledge.com
DJ myendnoteweb.com
Find value="http://
Replace value="http://^A
Find VALUE="http://
Replace VALUE="http://^A
Find rurl=http://
Replace rurl=http://^A
Find product_st_thomas=http://
Replace product_st_thomas=http://^A
Find return_url=http://
Replace return_url=http://^A
Find ST_URL=http://
Replace ST_URL=http://^A
Find "CIT.GATEWAY_URL">gateway.isiknowledge.com
Replace "CIT.GATEWAY_URL">^pgateway.isiknowledge.com^
Find value='gateway.isiknowledge.com/gateway/Gateway.cgi'
Replace value='^pgateway.isiknowledge.com^/gateway/Gateway.cgi'
Find "CIT.GATEWAY_URL">gateway.webofknowledge.com
Replace "CIT.GATEWAY_URL">^pgateway.webofknowledge.com^
Find value='gateway.webofknowledge.com/gateway/Gateway.cgi'
Replace value='^pgateway.webofknowledge.com^/gateway/Gateway.cgi'
OPTION COOKIE
Title WilsonWeb
URL http://vnweb.hwwilsonweb.com/hww/jumpstart.jhtml
Host wilsonweb2.hwwilson.com
DJ hwwilson.com
DJ hwwilsonweb.com
Find 'http://vnweb.hwwilsonweb.com' +
Replace 'http://^Avnweb.hwwilsonweb.com' +
Find &url=http://
Replace &url=http://^A
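
When several keys are passed, the stanzas are printed back to back, one directive per line. The sketch below splits that output into one record per `Title` and collects the hosts and domains each stanza covers, which helps when checking what a stanza proxies. `parse_stanzas` is a hypothetical helper. Two parts of it are assumptions rather than anything the gist states:

* It attaches `OPTION` lines to the `Title` that follows them, which matches the sample output above.
* It treats `H`/`HJ`/`D`/`DJ` as short forms of `Host`/`HostJavaScript`/`Domain`/`DomainJavaScript`.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: split concatenated stanza text into one hashref per
# Title. OPTION lines are attached to the next Title (assumption). Host-like
# and domain-like directives (assumed abbreviations) are collected; all other
# directives (URL, Find, Replace, ...) are ignored.
sub parse_stanzas {
    my ($text) = @_;
    my (@stanzas, @pending_options);
    my $current;
    for my $line (split /\n/, $text) {
        next unless $line =~ /\S/;
        my ($directive, $value) = $line =~ /^\s*(\S+)\s*(.*?)\s*$/;
        if (lc $directive eq 'option') {
            push @pending_options, $value;
        } elsif (lc $directive eq 'title') {
            $current = {
                title   => $value,
                options => [@pending_options],
                hosts   => [],
                domains => [],
            };
            @pending_options = ();
            push @stanzas, $current;
        } elsif ($current && $directive =~ /^(?:host|h|hostjavascript|hj)$/i) {
            push @{ $current->{hosts} }, $value;
        } elsif ($current && $directive =~ /^(?:domain|d|domainjavascript|dj)$/i) {
            push @{ $current->{domains} }, $value;
        }
    }
    return @stanzas;
}

# Sample copied from the WilsonWeb stanza above.
my $sample = <<'END';
OPTION COOKIE
Title WilsonWeb
URL http://vnweb.hwwilsonweb.com/hww/jumpstart.jhtml
Host wilsonweb2.hwwilson.com
DJ hwwilson.com
DJ hwwilsonweb.com
END

for my $s (parse_stanzas($sample)) {
    printf "%s (options: %s)\n", $s->{title}, join(', ', @{ $s->{options} }) || 'none';
    print "  host   $_\n" for @{ $s->{hosts} };
    print "  domain $_\n" for @{ $s->{domains} };
}
```

In practice the script's output would be piped in, e.g. by reading STDIN into `$text` instead of using the inline sample.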