@edsu
Forked from acdha/horrible-beta-markup.html
Created October 10, 2012 14:01
Experiment: adding schema.org HTML5 microdata to a World Digital Library (WDL) item page and processing it with rdflib-microdata
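#!/usr/bin/env python
# Extract the HTML5 microdata from the WDL item page and save it as JSON.
# Requires the third-party package: pip install microdata
import microdata

# Parse every top-level microdata item found in the page.
with open("horrible-beta-markup.html") as html_file:
    items = microdata.get_items(html_file)

# The page declares one top-level item: the schema.org Photograph on <body>.
with open("horrible-beta-markup.json", "w") as json_file:
    json_file.write(items[0].json())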
<!DOCTYPE html>
<html lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>Antietam, Maryland. Allan Pinkerton, President Lincoln, and Major General John A. McClernand: Another View - World Digital Library</title>
<link rel="dns-prefetch" href="http://static.wdl.org">
<link rel="dns-prefetch" href="http://content.wdl.org">
<link rel="dns-prefetch" href="http://cmon.loc.gov">
<link rel="dns-prefetch" href="http://www.google-analytics.com">
<link rel="dns-prefetch" href="https://api.twitter.com">
<link title="Dublin Core Metadata Schema" rel="schema.dc" href="http://purl.org/DC/elements/1.1/">
<link title="MODS Metadata Schema" rel="schema.mods" href="http://www.loc.gov/standards/mods/mods.xsd">
<meta name="description" content="At the outset of the U.S. Civil War, Mathew Brady dispatched a team of photographers to document the conflict. Among them was a Scottish-born immigrant named Alexander Gardner, the photographer who took this photo of Lincoln at Antietam as well as other famous wartime shots. The man to Lincoln&#39;s right is Allan Pinkerton, founder of the Pinkerton National Detective Agency, whom Lincoln had as head of a personal security detail during the war. Gardner titled another shot of Pinkerton and his brother William at Antietam “The Secret Service.” Gardner photographed Lincoln on seven separate occasions, the last one on February 5, 1865, only a few weeks before Lincoln’s assassination. In 1866 he published Gardner’s Sketchbook of the War, combining plates and text, commemorating such battles as Fredericksburg, Gettysburg, and Petersburg, but the book was a commercial failure. Photographic historians also have suggested that Gardner staged many of his photos, moving dead bodies and using a regular prop gun to create romanticized pictorial narratives.">
<link rel="canonical" href="http://testing.wdl.org/en/item/1/"><link rel="shortlink" href="http://testing.wdl.org/1"><meta name="dc.title" content="Antietam, Maryland. Allan Pinkerton, President Lincoln, and Major General John A. McClernand: Another View"><meta name="dc.date" content="1862-10-03"><meta name="dc.identifier" content="http://hdl.loc.gov/loc.wdl/wdl.1"><meta name="mods.url" content="http://hdl.loc.gov/loc.wdl/wdl.1"><link rel="image_src" href="/media/1/thumbnail/308x255.jpg">
<link rel="stylesheet" href="/static/css/main.min.00de8d2b85e9.css" type="text/css">
<!--[if lte IE 8]>
<link rel="stylesheet" href="/static/css/ie-fixes.15262e9647f1.css" type="text/css">
<![endif]-->
</head>
<body class="lang_en" itemscope itemtype="http://schema.org/Photograph">
<div id="container">
<a id="skip" href="#content">skip to page content</a>
<div id="topnav">
<div id="logo"><div><a href="/en/"><img src="/static/img/wdl_logo_en.865f671a9f7b.png" height="55" width="140" alt="World Digital Library"/></a></div></div>
<div id="language-selector">
<div>
<form action="#" method="get">
<fieldset>
<label for="language">Language</label><br/>
<select id="language" name="language">
<option dir="rtl" lang="ar" value="ar" >العربية</option>
<option dir="ltr" lang="en" value="en" selected="selected">English</option>
<option dir="ltr" lang="es" value="es">Español</option>
<option dir="ltr" lang="fr" value="fr">Français</option>
<option dir="ltr" lang="pt" value="pt">Português</option>
<option dir="ltr" lang="ru" value="ru">Русский</option>
<option dir="ltr" lang="zh" value="zh">简体中文</option>
</select>
<button type="submit"></button>
</fieldset>
</form>
</div>
</div>
<div id="browse">
<div>
<span>Browse</span>
<ul>
<li id="place"><a href="/en/place/">Place</a> |</li>
<li id="time"><a href="/en/time/">Time</a> |</li>
<li id="topic"><a href="/en/topic/">Topic</a> |</li>
<li id="item"><a href="/en/type/">Type of Item</a> |</li>
<li id="institute"><a href="/en/institution/">Institution</a></li>
</ul>
</div>
</div>
<div id="search">
<div>
<form action="/en/search/gallery/" method="get" accept-charset="UTF-8" id="search_fm">
<fieldset>
<label for="autosuggest">Search</label>
<input id="autosuggest" name="q" type="text" autocomplete="off">
<input name="qla" type="hidden" value="en">
<button type="submit"></button>
</fieldset>
</form>
</div>
</div>
</div>
<div id="content" class="content_block detail">
<div id="pg_header" class="header_footer_block"><ul class="search_nav"><li class="search_label">
&lt; <a href=''>Search Results</a>:</li><li class="prev disabled">Previous</li><li>|</li><li class="next disabled">Next</li></ul></div>
<div id="pg_content" class="content_block">
<div id="aside" class="column_block">
<div class="media_block">
<a class="item-zoom" href="/en/item/1/zoom/"><img itemprop="image" src="/media/1/thumbnail/308x255.jpg" alt="Antietam, Maryland. Allan Pinkerton, President Lincoln, and Major General John A. McClernand: Another View" width="308" height="255"></a>
<div class="itemnav">
<a class="pageturner item-zoom" href="/en/item/1/zoom/">Open</a>
</div>
</div>
<div class="item_block">
<div class="download">
<a href="/media/1.png">PNG</a>
</div>
<div class="addthis_toolbox addthis_default_style">
<a class="addthis_button_facebook" title="Facebook"></a>
<a class="addthis_button_twitter" title="Twitter"></a>
<a class="addthis_button_email" title="Email"></a>
</div>
</div>
<div class="item_block reader">
<div id="readspeaker_button1" class="rs_skip">
<a accesskey="L" href="http://app.readspeaker.com/cgi-bin/rsent?customerid=5147&amp;readid=main&amp;lang=en_us&amp;audiofilename=Antietam%2C%20Maryland.%20Allan%20Pinkerton%2C%20President%20...&amp;url=http%3A//testing.wdl.org/en/item/1/" onclick="readpage(this.href, 'xp_1'); return false;">
Listen to this page
</a>
</div>
<div id='xp_1'></div>
</div>
<div id="similar-items" class="item_block hidden">
<h2>Similar Items</h2>
<ul class="item similar"></ul>
</div>
</div>
<div id="main" class="column_block">
<!-- RSPEAK_START -->
<div id="main_header" class="header_footer_block">
<h1>Title: <span class="item_title" itemprop="name">Antietam, Maryland. Allan Pinkerton, President Lincoln, and Major General John A. McClernand: Another View</span></h1>
</div>
<div id="main_content" class="content_block">
<h2>Description</h2>
<ul><li class="description" itemprop="description">At the outset of the U.S. Civil War, Mathew Brady dispatched a team of photographers to document the conflict. Among them was a Scottish-born immigrant named Alexander Gardner, the photographer who took this photo of Lincoln at Antietam as well as other famous wartime shots. The man to Lincoln's right is Allan Pinkerton, founder of the Pinkerton National Detective Agency, whom Lincoln had as head of a personal security detail during the war. Gardner titled another shot of Pinkerton and his brother William at Antietam “The Secret Service.” Gardner photographed Lincoln on seven separate occasions, the last one on February 5, 1865, only a few weeks before Lincoln’s assassination. In 1866 he published <em>Gardner’s Sketchbook of the War</em>, combining plates and text, commemorating such battles as Fredericksburg, Gettysburg, and Petersburg, but the book was a commercial failure. Photographic historians also have suggested that Gardner staged many of his photos, moving dead bodies and using a regular prop gun to create romanticized pictorial narratives.</li></ul>
<h2>Photographer</h2>
<ul>
<li itemprop="creator" itemscope itemtype="http://schema.org/Person">
<a itemprop="url" href="/en/search/gallery/?contributors=Gardner%2C%20Alexander%20%281821-1882%29"><span itemprop="name">Gardner, Alexander (1821-1882)</span></a>
<a itemprop="url" href="http://viaf.org/viaf/57416946">VIAF</a>
</li>
</ul>
<div class="vevent">
<h2 class="summary">Date Created</h2>
<ul>
<li><time itemprop="dateCreated" datetime="1862-10-03">October 3, 1862 CE</time></li>
</ul>
</div>
<!-- RSPEAK_STOP -->
<h2>Title in Original Language</h2>
<ul class="title original">
<li class="lang_eng" lang="en">
Antietam, Maryland. Allan Pinkerton, President Lincoln, and Major General John A. McClernand
</li>
</ul>
<!-- RSPEAK_START -->
<h2>Place</h2>
<ul itemprop="contentLocation">
<li itemscope itemtype="http://schema.org/Place"><span itemprop="address" itemscope itemtype="http://schema.org/PostalAddress">
<a href="/en/search/gallery/?regions=north-america">North America</a>
&gt; <a href="/en/search/gallery/?regions=north-america&amp;countries=US"><span itemprop="addressCountry">United States of America</span></a>
&gt; <a href="/en/search/gallery/?regions=north-america&amp;countries=US&amp;provinces=maryland"><span itemprop="addressRegion">Maryland</span></a>
&gt; <a href="/en/search/gallery/?regions=north-america&amp;countries=US&amp;provinces=maryland&amp;cities=antietam"><span itemprop="addressLocality">Antietam</span></a>
</span></li>
</ul>
<h2>Time</h2>
<ul>
<li>
<a href="/en/search/gallery/?time_periods=1850-1899">1850 CE - 1899 CE</a>
</li>
</ul>
<h2>Topic</h2>
<ul class="topics">
<li itemscope itemtype="http://dewey.info/schema-terms/">
<a href="/en/search/gallery/?ddc=9"><span itemprop="keywords">History &amp; geography</span></a>
&gt;
<a href="/en/search/gallery/?ddc=97"><span itemprop="keywords">History of North America</span></a>
&gt;
<a href="/en/search/gallery/?ddc=973"><span itemprop="keywords">United States</span></a>
</li>
</ul>
<h2>Additional Subjects</h2>
<ul class="inline subjects">
<li itemscope itemtype="http://id.loc.gov/authorities/subjects">
<a href="/en/search/gallery/?additional_subjects=Antietam%2C%20Battle%20of%2C%20Maryland%2C%201862"><span itemprop="keywords">Antietam, Battle of, Maryland, 1862</span></a>;
</li>
<li itemscope itemtype="http://id.loc.gov/authorities/subjects">
<a
href="/en/search/gallery/?additional_subjects=Lincoln%2C%20Abraham%2C%201809-1865"><span itemprop="keyword">Lincoln, Abraham, 1809-1865</span></a>;
</li>
<li itemscope itemtype="http://id.loc.gov/authorities/subjects">
<a
href="/en/search/gallery/?additional_subjects=McClernand%2C%20John%20A.%20%28John%20Alexander%29%2C%201812-1900"><span itemprop="keyword">McClernand, John A. (John Alexander), 1812-1900</span></a>;
</li>
<li itemscope itemtype="http://id.loc.gov/authorities/subjects">
<a
href="/en/search/gallery/?additional_subjects=Pinkerton%2C%20Allan%2C%201819-1884"><span itemprop="keyword">Pinkerton, Allan, 1819-1884</span></a>;
</li>
<li itemscope itemtype="http://id.loc.gov/authorities/subjects">
<a
href="/en/search/gallery/?additional_subjects=United%20States--History--Civil%20War%2C%201861-1865"><span itemprop="keyword">United States--History--Civil War, 1861-1865</span></a>
</li>
</ul>
<h2>Type of Item</h2>
<ul>
<li><a href="/en/search/gallery/?item_type=print-photograph">Prints, Photographs</a></li>
</ul>
<h2>Physical Description</h2>
<ul>
<li>1 negative : glass, wet collodion</li>
</ul>
<h2>Notes</h2>
<ul>
<li>Table of Contents: European settlement of Central America, by Professor Dr. K. Sapper. European settlement of the Lesser Antilles, by Professor Dr. K. Sapper. Dutch West Indies, by Professor Dr. D. Van Blom. Comments on the investigations by the association of social policy for Dutch East Indies, by ministry director (ret.) Dr. I. A. Nederburgh.</li>
<li>From the series: Writings by the Association for Social Policy, and the series: European settlement of the Tropics.</li>
</ul>
<h2>Institution</h2>
<ul itemprop="provider" itemscope itemtype="http://schema.org/Organization">
<li itemprop="name"><a href="/en/search/gallery/?institution=library-of-congress">Library of Congress</a></li>
</ul>
<h2>External Resource</h2>
<ul>
<li itemprop="url"><a href="http://hdl.loc.gov/loc.wdl/dlc.1" rel="nofollow">http://hdl.loc.gov/loc.wdl/dlc.1</a></li>
</ul>
</div>
<!-- RSPEAK_STOP -->
</div>
</div>
<div id="pg_footer" class="header_footer_block">
<p class="last_updated">
Last updated: August 14, 2012
</p>
</div>
</div>
<div id="footer">
<ul>
<li><a href="/en/">Home</a> |</li>
<li><a href="/en/about/">About</a> |</li>
<li><a href="/en/help/">Help</a> |</li>
<li><a href="/en/contact/">Contact</a> |</li>
<li><a href="/en/legal/">Legal</a> |</li>
<li class="twitter"><a href="https://twitter.com/WDLorg" class="twitter-follow-button" data-show-count="false" data-dnt="true">Follow @WDLorg</a></li>
<li class="last">
<img src="/static/img/unesco_eng.d3bb084c14b2.png" width="227" height="55" alt="United Nations Educational, Scientific and Cultural Organization">
</li>
</ul>
</div>
</div>
</body>
</html>
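One quick way to spot inconsistent property names in markup like the page above (for example `keyword` used alongside `keywords`) is to tally every `itemprop` value. The following is a minimal sketch using only the Python 3 standard library. `ItempropCounter`, `count_itemprops`, and the `sample` fragment are illustrative names and data, not part of the gist or of the `microdata` package, and the sketch only counts attribute values; it does not implement microdata extraction.

```python
from collections import Counter
from html.parser import HTMLParser


class ItempropCounter(HTMLParser):
    """Tally each itemprop name seen on any start tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # itemprop may hold several space-separated property names.
            if name == "itemprop" and value:
                self.counts.update(value.split())


def count_itemprops(html_text):
    """Return a Counter mapping itemprop name -> number of occurrences."""
    parser = ItempropCounter()
    parser.feed(html_text)
    parser.close()
    return parser.counts


# Small fragment modelled on the subject lists in the page (illustrative only).
sample = """
<ul>
<li><span itemprop="keywords">Antietam, Battle of, Maryland, 1862</span></li>
<li><span itemprop="keyword">Lincoln, Abraham, 1809-1865</span></li>
<li><span itemprop="keyword">Pinkerton, Allan, 1819-1884</span></li>
</ul>
"""

counts = count_itemprops(sample)
for prop, n in sorted(counts.items()):
    print(prop, n)
```

To check the real file, the same function could be given the text of `horrible-beta-markup.html`; a singular and a plural spelling of the same property showing up together is a hint that one of them is a typo.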
{
"name": [
"Antietam, Maryland. Allan Pinkerton, President Lincoln, and Major General John A. McClernand: Another View"
],
"keyword": [
"Lincoln, Abraham, 1809-1865",
"McClernand, John A. (John Alexander), 1812-1900",
"Pinkerton, Allan, 1819-1884",
"United States--History--Civil War, 1861-1865"
],
"creator": [
{
"url": [
"/en/search/gallery/?contributors=Gardner%2C%20Alexander%20%281821-1882%29",
"http://viaf.org/viaf/57416946"
],
"type": "http://schema.org/Person"
}
],
"url": [
"http://hdl.loc.gov/loc.wdl/dlc.1"
],
"image": [
"/media/1/thumbnail/308x255.jpg"
],
"dateCreated": [
"1862-10-03"
],
"provider": [
{
"type": "http://schema.org/Organization",
"name": [
"Library of Congress"
]
}
],
"keywords": [
"History & geography",
"History of North America",
"United States",
"Antietam, Battle of, Maryland, 1862"
],
"type": "http://schema.org/Photograph",
"contentLocation": [
"\n \n \n \n North America\n \n > United States of America\n \n \n > Maryland\n \n \n > Antietam\n \n \n \n \n \n "
],
"description": [
"At the outset of the U.S. Civil War, Mathew Brady dispatched a team of photographers to document the conflict. Among them was a Scottish-born immigrant named Alexander Gardner, the photographer who took this photo of Lincoln at Antietam as well as other famous wartime shots. The man to Lincoln's right is Allan Pinkerton, founder of the Pinkerton National Detective Agency, whom Lincoln had as head of a personal security detail during the war. Gardner titled another shot of Pinkerton and his brother William at Antietam \u201cThe Secret Service.\u201d Gardner photographed Lincoln on seven separate occasions, the last one on February 5, 1865, only a few weeks before Lincoln\u2019s assassination. In 1866 he published Gardner\u2019s Sketchbook of the War, combining plates and text, commemorating such battles as Fredericksburg, Gettysburg, and Petersburg, but the book was a commercial failure. Photographic historians also have suggested that Gardner staged many of his photos, moving dead bodies and using a regular prop gun to create romanticized pictorial narratives."
]
}
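The extracted JSON above has two rough edges: the subject headings are split between a `keyword` and a `keywords` key, and `contentLocation` arrives as a single string padded with newlines. A minimal post-processing sketch in Python 3 follows, assuming that merging the two keys and collapsing whitespace is what a consumer wants. `tidy`, `merge_keyword`, and the `sample` dictionary are hypothetical names and data introduced here for illustration; they are not part of the gist or of the `microdata` package.

```python
import json
import re


def tidy(value):
    """Recursively collapse runs of whitespace in every string value."""
    if isinstance(value, str):
        return re.sub(r"\s+", " ", value).strip()
    if isinstance(value, list):
        return [tidy(v) for v in value]
    if isinstance(value, dict):
        return {k: tidy(v) for k, v in value.items()}
    return value


def merge_keyword(item):
    """Return a copy of item with any 'keyword' values appended to 'keywords'."""
    merged = dict(item)
    stray = merged.pop("keyword", [])
    if stray:
        merged["keywords"] = merged.get("keywords", []) + stray
    return merged


# Abbreviated stand-in for the extracted item (illustrative only).
sample = {
    "keyword": ["Pinkerton, Allan, 1819-1884"],
    "keywords": ["United States"],
    "contentLocation": ["\n \n North America\n \n > Maryland\n \n > Antietam\n \n "],
}

cleaned = merge_keyword(tidy(sample))
print(json.dumps(cleaned, indent=2))
```

This only patches the output after the fact. The location would still be a flat string rather than a nested `Place` item, so fixing the markup itself is likely the better long-term route.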