@jalperin
Created September 18, 2013 20:42
{
"metadata": {
"name": "Final Assignment"
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Web Scraping Final Assignment\n",
"By Juan Pablo Alperin (@juancommander)\n"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import re\n",
"import urllib, urllib2\n",
"from bs4 import BeautifulSoup\n",
"\n",
"import random\n",
"\n",
"titles = [\"The Lady Anatomist: The Life and Work of Anna Morandi Manzolini\",\n",
"\"Moments of Despair: Suicide, Divorce, and Debt in Civil War Era North Carolina\",\n",
"\"The illusion of free markets: punishment and the myth of natural order\",\n",
"\"Le fran\u00e7ais en Am\u00e9rique du Nord: \u00c9tat pr\u00e9sent\",\n",
"\"Single-Molecule Techniques: A Laboratory Manual\",\n",
"\"Foreign Firms, Investment, and Environmental Regulation in the People\u2019s Republic of China\",\n",
"\"The Origins of Canadian and American Political Differences\",\n",
"\"Getting Even: Why Women Don't Get Paid like Men, and What to Do about It\",\n",
"\"Fighting Against the Odds: A History of Southern Labor Since World War II\",\n",
"\"Blandford Cemetery: Death and Life at Petersburg, Virginia\",\n",
"\"The Irish General: Thomas Francis Meagher\",\n",
"\"Ming Erotic Novellas: Genre, Consumption and Religiosity in Cultural Practice\",\n",
"\"Victorian Culture and Classical Antiquity: Art, Opera, Fiction, and the Proclamation of Modernity\",\n",
"\"Anatomy of the Red Brigades: The Religious Mind-Set of Modern Terrorists\",\n",
"\"Dark Nights: The Museum Soliloquies\",\n",
"\"Inventing the Business of Opera: The Impresario and His World in Seventeenth-Century Venice\",\n",
"\"The Legacy of the Siege of Leningrad, 1941-1995: Myth, Memories, and Monuments\",\n",
"\"Social Science Research and Conservation Management in the Interior of Borneo: Unravelling Past and Present Interactions of People and Forests\",\n",
"\"Jacobite Prisoners of the 1715 Rebellion: Preventing and Punishing Insurrection in Early Hanoverian Britain\",\n",
"\"Revolutionary Mothers: Women in the Struggle for America's Independence\",\n",
"\"South Koreans in the Debt Crisis: The Creation of a Neoliberal Welfare Society\",\n",
"\"Kindai Seishigy\u014d no Koy\u014d to Keiei (Employment and Management in the Modern Silk Reeling Industry)\",\n",
"\"Tensions and Reversals in Democratic Transitions: The Kenya 2007 General Elections\",\n",
"\"A Viola da Gamba Miscellanea. Articles from and Inspired by Viol Symposiums Organized by the Ensemble Baroque de Limoges, France\",\n",
"\"D\u00daN AILINNE: EXCAVATION AT AN IRISH ROYAL SITE, 1968-1975\",\n",
"\"A Radical Approach to Real Analysis\",\n",
"\"St. Peter\u2019s in the Vatican\",\n",
"\"Sublime Historical Experience\",\n",
"\"Subjectivity and Selfhood: Investigating the First-Person Perspective\",\n",
"\"Early Medieval Bible Illumination and the Ashburnham Pentateuch\",\n",
"\"Neither with Them, nor without Them: The Russian Writer and the Jew in the Age of Realism\",\n",
"\"Metzler Lexikon Avantgarde\",\n",
"\"The Ethical Archivist\",\n",
"\"CHINESE SHAKESPEARES: TWO CENTURIES OF CULTURAL EXCHANGE\",\n",
"\"The Rhetoric of Power in Late Antiquity: Religion and Politics in Byzantium, Europe and the Early Islamic World. Library of Classical Studies 2\",\n",
"\"Ecosystems and human health: a critical approach to ecohealth research and practice\",\n",
"\"Bayesian Analysis for Population Ecology\",\n",
"\"<italic>The Conquests of Alexander the Great</italic>\",\n",
"\"Shaped by Stories: The Ethical Power of Narratives\",\n",
"\"Die politische Auslandsarbeit der DDR in Schweden: Zur Public Diplomacy der DDR gegen\u00fcber Schweden nach der diplomatischen Anerkennung (1972\u20131989)\",\n",
"\"Aesopic Conversations: Popular Tradition, Cultural Dialogue, and the Invention of Greek Prose\",\n",
"\"SUBJECTIVE CONSCIOUSNESS: A SELF-REPRESENTATIONAL THEORY\",\n",
"\"Paul's Utilization of Preformed Traditions in 1 Timothy: An Evaluation of the Apostle's Literary, Rhetorical, and Theological Tactics\",\n",
"\"RELATIVISM AND MONADIC TRUTH\",\n",
"\"Inside Toyland: Working, Shopping, and Social Inequality\",\n",
"\"The Greek Wars. The Failure of Persia\",\n",
"\"Creative Code\",\n",
"\"War at a Distance: Romanticism and the Making of Modern Wartime\",\n",
"\"Confession and Bookkeeping: The Religious, Moral, and Rhetorical Roots of Modern Accounting\",\n",
"\"Leaders: The Strategies for Taking Charge\",\n",
"\"Innovation in Strategic Philanthropy: Local and Global Perspectives\",\n",
"\"THE POLITICS OF LIFE ITSELF: Biomedicine, Power, and Subjectivity in the Twenty-First Century\",\n",
"\"Cross, Crescent and Conversion: Studies on Medieval Spain and Christendom in Memory of Richard Fletcher\",\n",
"\"Laughter and Ridicule: Towards a Social Critique of Humour\",\n",
"\"The Music Has Gone out of the Movement: Civil Rights and the Johnson Administration, 1965-1968\",\n",
"\"<italic>Everyday Life in Central Asia: Past and Present</italic>\",\n",
"\"C\u00e9zanne and the Eternal Feminine\",\n",
"\"Millennial Monsters: Japanese Toys and the Global Imagination\",\n",
"\"Theatrika: I orea thymomeni, i Pentanostimi, Ikogeniako Dikeo\",\n",
"\"Canada and the British Empire\",\n",
"\"Kant and the Early Moderns\",\n",
"\"Ireland's Huguenots and Their Refuge, 1662-1745: An Unlikely Haven\",\n",
"\"The Great War for Civilisation: The Conquest of the Middle East\",\n",
"\"Zo ver de wereld strekt: De geschiedenis van Nederland overzee vanaf 1800\",\n",
"\"Medici Women: Portraits of Power, Love, and Betrayal\",\n",
"\"Political Tourists: Travellers from Australia to the Soviet Union in the 1920s\u20131940s\",\n",
"\"Materialising Identity: The Co-construction of the Gotthard Railway and Swiss National Identity\",\n",
"\"Security in the New Europe\",\n",
"\"Made in Sheffield: an ethnography of industrial work and politics\",\n",
"\"Shaping the Industrial Century: The Remarkable Story of the Evolution of the Modern Chemical and Pharmaceutical Industries\",\n",
"\"The Last Oil Shock: A Survival Guide to the Imminent Extinction of Petroleum Man\",\n",
"\"Conjuring the Real: The Role of Architecture in Eighteenth- and Nineteenth-century Fiction\",\n",
"\"The Thirty Years War: Europe's Tragedy\",\n",
"\"Txtng: The gr8 db8\",\n",
"\"Virulence Mechanisms of Bacterial Pathogens\",\n",
"\"The Role of the European Union in Moldova's Transnistria Conflict\",\n",
"\"Empowering Women in Russia: Activism, Aid, and NGOs\",\n",
"\"Introduction to Data Envelopment Analysis and Its Uses: With DEA Solver Software and References\",\n",
"\"Grant Money through Collaborative Partnerships\",\n",
"\"Feminist Thinkers and the Demands of Femininity: The Lives and Work of Intellectual Women\",\n",
"\"Alaska\u2019s Place in the West: From the Last Frontier to the Last Great Wilderness. (Lawrence: University Press of Kansas, 2010. xi + 186 pp. Illustrations, map, notes, bibliography, index. $34.95.)\",\n",
"\"Autour des Libri Coloniarum. Colonisation et colonies dans le monde romain. Actes du Colloque International (Besan\u00e7on, 16-18 Octobre 2003)\",\n",
"\"Aesopic Conversations: Popular Tradition, Cultural Dialogue, and the Invention of Greek Prose\",\n",
"\"Soci\u00e9t\u00e9 et mentalit\u00e9s autour de Henri III\",\n",
"\"Peopling the Russian Periphery: Borderland Colonization in Eurasian History. BASEES/Routledge series on Russian and East European Studies, 38\",\n",
"\"Shaping the Shoreline: Fisheries and Tourism on the Monterey Coast. Weyerhaeuser Series in Environmental History\",\n",
"\"Symphony No. 4. (Charles Ives Society Critical Edition.)\",\n",
"\"The Holocaust and Historical Methodology\",\n",
"\"Law, City, and King: Legal Culture, Municipal Politics, and State Formation in Early Modern Dijon\",\n",
"\"Printer's Devil: Mark Twain and the American Publishing Revolution\",\n",
"\"The Muslim Empires of the Ottomans, Safavids, and Mughals. New Approaches to Asian History\",\n",
"\"A Grotesque Old Woman\",\n",
"\"Sacred Space: The Quest for Transcendence in Science Fiction Film and Television\",\n",
"\"Women Poets on Mentorship: Efforts and Affections\",\n",
"\"Hindle Wakes (1927)\",\n",
"\"Lord Henry Howard, 1540\u20131614: An Elizabethan Life\",\n",
"\"TAKE ME ALONG\",\n",
"\"A Dirty War in West Africa: The Ruf and the Destruction of Sierra Leone\",\n",
"\"The Center of a Great Empire: The Ohio Country in the Early American Republic\",\n",
"\"Isaac Taylor Tichenor: The Creation of the Baptist New South\"]\n",
"\n",
"titles = random.sample(titles, 10)\n",
"for t in titles:\n",
"    print t"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"The Great War for Civilisation: The Conquest of the Middle East\n",
"Subjectivity and Selfhood: Investigating the First-Person Perspective\n",
"Cross, Crescent and Conversion: Studies on Medieval Spain and Christendom in Memory of Richard Fletcher\n",
"The Origins of Canadian and American Political Differences\n",
"Zo ver de wereld strekt: De geschiedenis van Nederland overzee vanaf 1800\n",
"A Viola da Gamba Miscellanea. Articles from and Inspired by Viol Symposiums Organized by the Ensemble Baroque de Limoges, France\n",
"Shaping the Shoreline: Fisheries and Tourism on the Monterey Coast. Weyerhaeuser Series in Environmental History\n",
"A Radical Approach to Real Analysis\n",
"Blandford Cemetery: Death and Life at Petersburg, Virginia\n",
"Dark Nights: The Museum Soliloquies\n"
]
}
],
"prompt_number": 1
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"AMAZON_ADV_SEARCH_BASE_URL = 'http://www.amazon.com/gp/search/ref=sr_adv_b/'\n",
"\n",
"product_ids = []\n",
"\n",
"for title in titles:\n",
"    # hint: start with the URL for the search you want to perform and fetch that page\n",
"    data = urllib.urlencode({'search-alias': 'stripbooks', 'field-keywords': title})\n",
"    content = urllib2.urlopen(AMAZON_ADV_SEARCH_BASE_URL + '?' + data).read()\n",
"\n",
"    soup = BeautifulSoup(content, \"html.parser\")\n",
"    # the first search hit, if there is one, is the element with id 'result_0'\n",
"    result0 = soup.find(id='result_0')\n",
"\n",
"    p_id = None\n",
"    if result0:\n",
"        title_in_amazon = result0.find('div', {'class': 'productTitle'}).find('a').text.strip()\n",
"        # the 'name' attribute of the result element is kept as the product id\n",
"        p_id = result0.attrs['name']\n",
"        print title\n",
"        print title_in_amazon\n",
"    else:\n",
"        print 'not found:', title\n",
"    print\n",
"\n",
"    # append one entry per title (None when nothing was found) so product_ids lines up with titles\n",
"    product_ids.append(p_id)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"The Great War for Civilisation: The Conquest of the Middle East\n",
"The Great War for Civilisation: The Conquest of the Middle East\n",
"\n",
"Subjectivity and Selfhood: Investigating the First-Person Perspective"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Subjectivity and Selfhood: Investigating the First-Person Perspective (Bradford Books)\n",
"\n",
"Cross, Crescent and Conversion: Studies on Medieval Spain and Christendom in Memory of Richard Fletcher"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Cross, Crescent and Conversion: Studies on Medieval Spain and Christendom in Memory of Richard Fletcher (The Medieval Mediterranean)\n",
"\n",
"The Origins of Canadian and American Political Differences"
]
}
],
"prompt_number": "*"
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"_Bonus:_ See if you can figure out how to improve the quality of the matches. Can you identify any of the titles that were not found? Can you fix any incorrect matches?"
]
}
],
"metadata": {}
}
]
}
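
One possible way to approach the bonus question is to score how closely each Amazon title matches the title that was searched for, and to flag low scores for manual review. The sketch below is not part of the original notebook. The names `normalize_title`, `title_similarity` and `is_good_match`, and the 0.85 threshold, are assumptions chosen for illustration. It is written for Python 3, whereas the notebook above uses Python 2, where the titles would first need decoding to unicode. It strips markup such as `<italic>`, removes accents and punctuation, and drops a trailing parenthetical such as the series name Amazon appends in the output above.

```python
import re
import unicodedata
from difflib import SequenceMatcher

# Hypothetical helpers for illustration; none of these names come from the notebook.
TAG_RE = re.compile(r'<[^>]+>')                       # markup such as <italic>...</italic>
TRAILING_PAREN_RE = re.compile(r'\s*\([^)]*\)\s*$')   # e.g. " (Bradford Books)"


def normalize_title(title):
    """Lowercase a title and strip tags, accents and punctuation."""
    title = TAG_RE.sub('', title)
    decomposed = unicodedata.normalize('NFKD', title)
    no_accents = ''.join(ch for ch in decomposed if not unicodedata.combining(ch))
    # collapse every run of non-alphanumeric characters into a single space
    return re.sub(r'[^0-9a-z]+', ' ', no_accents.lower()).strip()


def title_similarity(wanted, found):
    """Return a ratio between 0 and 1 for how alike the two titles are."""
    # only the Amazon side loses its trailing parenthetical (often a series name)
    found = TRAILING_PAREN_RE.sub('', found)
    return SequenceMatcher(None, normalize_title(wanted), normalize_title(found)).ratio()


def is_good_match(wanted, found, threshold=0.85):
    """Assumed cut-off: treat anything below the threshold as a suspect match."""
    return title_similarity(wanted, found) >= threshold
```

In the search loop, `is_good_match(title, title_in_amazon)` could decide whether to keep `p_id` or record `None`. The threshold is a guess and would need tuning against matches checked by hand.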
@nazywamsiepawel

Interesting read.
