Last active
November 25, 2018 09:34
(WIP) Röstigrabendetektor #plurilinguism hack
{ | |
"cells": [ | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"# #röstigrabenism\n", | |
"\n", | |
"Explore news data to help understanding trump clichées\n", | |
"\n", | |
"#### Find more details about this project, aggregated results and discussion at the [School of Data CH forum](https://forum.schoolofdata.ch/t/explore-swiss-news-in-code-rostigrabenism/483/1)\n", | |
"\n", | |
"_This is a project started at the [#plurilinguism](https://hack.opendata.ch/event/22) hackathon, in response to challenge [#5 Röstigrabendetektor](https://hack.opendata.ch/project/265)._\n", | |
"\n", | |
"We set out to use the open web and machine learning tools to together explore the dimensions of social geography. What is the *röstigraben*? It is a kind of meme used to distinguish interests between the two largest language regions in Switzerland. One of those curious things about Switzerland that you learn about in time. Whether or not a röstigraben exists, what character it has gets [hotly debated](https://www.nzz.ch/meinung/von-wegen-halbe-schweizer-romands-normalos-wie-du-und-ich-ld.1307803) (nzz.ch), typically with statements like this:\n", | |
"\n", | |
"> \"In social and foreign policy, the Romands tend to favour government regulation (influenced by the centralistic political mentality prevailing in France) and an active foreign policy (somewhat discarding Switzerland's neutrality), especially in relation to the European Union.\" -- https://en.m.wikipedia.org/wiki/Röstigraben\n", | |
"\n", | |
"<img src=\"http://files.newsnetz.ch/story/3/0/7/30701665/11/topelement.jpg\" width=\"300\">\n", | |
"<center><small>-- Image via <a href=\"http://www.24heures.ch/suisse/Un-musee-veut-inscrire-leRoestigraben-a-l-Unesco/story/30701665\" target=\"_blank\">20minuten.ch</a></small></center>\n", | |
"\n", | |
"There is also the concept of a *Polentagraben* with the Romansh/Italian-speaking regions, which could be explored in the same way. For more background on the topic, we suggest reading the [Swissinfo article](https://www.swissinfo.ch/eng/society/german-vs-french_the--roesti-divide---a-barrier-that-binds-the-swiss/41193552), that links to further analysis and books.\n", | |
"\n", | |
"Additional inspirations for this project:\n", | |
"\n", | |
"- [Parliament Impact Project](http://make.opendata.ch/wiki/project:parliament_impact) (opendata.ch)\n", | |
"- [What one artist learned about America from 19 million dating profiles](https://ideas.ted.com/what-one-artist-learned-about-america-from-19-million-dating-profiles/) (ted.com)\n", | |
"- [How Connected Is Your Community to Everywhere Else in America?](https://www.nytimes.com/interactive/2018/09/19/upshot/facebook-county-friendships.html) (nytimes.com)\n", | |
"- [Why Journalists Should Talk About Geography](http://blogs.lse.ac.uk/polis/2015/07/24/why-journalists-should-think-about-geography/) (lse.ac.uk)\n", | |
"- [How East and West think in profoundly different ways](http://www.bbc.com/future/story/20170118-how-east-and-west-think-in-profoundly-different-ways) (bcc.com)\n", | |
"- [Wikinews on Newsworthiness](https://en.wikinews.org/wiki/Wikinews:Newsworthiness) (wikinews.org)\n", | |
"\n", | |
"[![](https://www.wikidata.org/static/images/project-logos/wikidatawiki.png)](https://www.wikidata.org/) [![](https://www.textrazor.com/img/logo.png)](https://www.textrazor.com/) [![](http://jupyter.org/assets/nav_logo.svg)](http://jupyter.org/assets/nav_logo.svg)\n", | |
"\n", | |
"### The hack\n", | |
"\n", | |
"The [Jupyter](https://jupyter.org) notebook we wrote at the event, coded in the [Python](https://python.org) programming language, explores interaction with the [TextRazor API](https://www.textrazor.com/docs/python#Entity) which performs language detection and entity extraction on free-form text. They even have support for classifiers from the [IPTC NewsCodes](http://cv.iptc.org/newscodes/) ontology, support semantic metadata out of the box, etc. It's a pretty cool API, easy to get started with, though not completely open (and there are open alternatives to explore as well), and has a fee starting from 500 requests per day. \n", | |
"\n", | |
"The output provides Freebase identifiers, which are easy to use to filter the list to people, locations and organizations, and Wikidata identifiers (such as [Q214086](https://www.wikidata.org/wiki/Q214086) for Suisse Romande). We expand these through the Wikidata open API to obtain geographic coordinates of headquarters or birthplaces. Through a simple calculation at the end we obtain a score indicating how far *röstigrabenised* the article is.\n", | |
"\n", | |
"After providing a link to the article, the tool (through a Web interface, Twitter/chat bot, etc.), runs and provides visual results of the analysis. Additionally the user should be able to see the specific entities in the text that the score is based on, and decide to ignore them in the calculation - to filter out false positives - or even add their own opinion. \n", | |
"\n", | |
"The end result should look something like this sketch:\n", | |
"\n", | |
"![](https://blog.datalets.ch/workshops/2018/dll/IMG_20181124_074118-02.jpeg)\n", | |
"\n", | |
"Ultimately we should be able to crowdsource responses about a variety of news sources, and construct a map of their polarisation towards or against a cultural bias.\n", | |
"\n", | |
"Scroll down to see our hackathon project in action.\n", | |
"\n", | |
"### Note\n", | |
"\n", | |
"The current version uses the old fashioned Wikidata API service instead of [Sparql queries](https://query.wikidata.org), and could be improved using a query like [this one](https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/queries/examples#All_museums_(including_subclass_of_museum)_in_Washington,_D.C._with_coordinates) possibly linked to [Q214086](https://www.wikidata.org/wiki/Q214086) (Suisse Romande). An example project that uses this is [wiki-climate](https://github.com/stalker314314/wiki-climate).\n", | |
"\n", | |
"We also considered expanding the reach of our classifications using a tool for Social Network analysis (see [O'Reilly](https://www.oreilly.com/library/view/social-network-analysis/9781449311377/ch04.html), [socnetv](http://socnetv.org/)).\n", | |
"\n", | |
"## Team\n", | |
"\n", | |
"- Celine Zund\n", | |
"- Karlen Kathrin\n", | |
"- Oleg Lavrovsky" | |
] | |
}, | |
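The note on SPARQL in the cell above could be explored along the following lines. This is a minimal sketch, not part of the notebook as written at the hackathon: `build_coordinate_query` and `parse_wkt_point` are hypothetical helper names, and the query assumes the conventions of the Wikidata Query Service (the `wd:` and `wdt:` prefixes, and coordinates returned as `Point(longitude latitude)` literals). It asks for each item's own coordinate location (P625) and falls back to that of the place of birth (P19), all in one request.

```python
import re

def build_coordinate_query(wikidata_ids):
    """Build a SPARQL query returning one coordinate per Wikidata item."""
    values = " ".join("wd:%s" % wid for wid in wikidata_ids)
    return (
        "SELECT ?item ?coord WHERE {\n"
        "  VALUES ?item { %s }\n"
        "  OPTIONAL { ?item wdt:P625 ?own . }\n"
        "  OPTIONAL { ?item wdt:P19/wdt:P625 ?birth . }\n"
        "  BIND(COALESCE(?own, ?birth) AS ?coord)\n"
        "}" % values
    )

def parse_wkt_point(literal):
    """Turn a 'Point(lon lat)' literal into a latitude/longitude dict."""
    match = re.match(r"Point\(([-\d.eE]+) ([-\d.eE]+)\)", literal)
    if match is None:
        return None
    longitude, latitude = (float(v) for v in match.groups())
    return {"latitude": latitude, "longitude": longitude}

# Two of the IDs found by the notebook: Toni Brunner and Canton de Vaud
query = build_coordinate_query(["Q115614", "Q12771"])
print(query)
print(parse_wkt_point("Point(6.55 46.616666666667)"))
```

The query string would then be sent to `https://query.wikidata.org/sparql`, for instance with `requests.get(url, params={"query": query, "format": "json"})`, replacing the separate birthplace lookups with a single call.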
{ | |
"cell_type": "code", | |
"execution_count": 1, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"import textrazor\n", | |
"import os\n", | |
"\n", | |
"# Enter a URL to any article in Switzerland here\n", | |
"TEST_URL = \"https://www.letemps.ch/suisse/toni-brunner-va-quitter-monde-politique\"\n", | |
"\n", | |
"textrazor.api_key = os.getenv('TEXTRAZOR_KEY', \"Get an API key and enter it here\")" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 2, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"Language --> fre\n", | |
"Entities ---\n", | |
"Keystone 0.02298 1.596 ['/organization/organization', '/business/employer', '/film/film_company', '/business/business_operation']\n", | |
"Europe 0.2611 2.325 ['/book/book_subject', '/fictional_universe/fictional_setting', '/periodicals/newspaper_circulation_area', '/location/statistical_region', '/travel/travel_destination', '/location/dated_location', '/organization/organization_scope', '/location/continent', '/government/governmental_jurisdiction', '/meteorology/cyclone_affected_area', '/cvg/computer_game_region', '/meteorology/forecast_zone', '/biology/breed_origin', '/location/location', '/location/region', '/symbols/namesake', '/symbols/name_source']\n", | |
"2018 0 0.5 []\n", | |
"2018 0 0.5 []\n", | |
"23 0 0.5 []\n", | |
"23 novembre 0 1.198 ['/time/day_of_year']\n", | |
"Toni Brunner 0.4266 3.124 ['/people/person', '/government/politician']\n", | |
"Union démocratique du centre 0.8321 9.121 ['/government/political_party', '/business/employer', '/organization/organization']\n", | |
"Union démocratique du centre 0.8321 9.121 ['/government/political_party', '/business/employer', '/organization/organization']\n", | |
"Union démocratique du centre 0.8321 9.121 ['/government/political_party', '/business/employer', '/organization/organization']\n", | |
"Toni Brunner 0.4266 3.124 ['/people/person', '/government/politician']\n", | |
"Toni Brunner 0.4266 3.124 ['/people/person', '/government/politician']\n", | |
"Agriculteur 0.2297 1.539 ['/fictional_universe/character_occupation', '/film/film_subject', '/people/profession', '/book/book_subject', '/organization/organization_sector', '/visual_art/art_subject']\n", | |
"Anthony Anex 0 0.5 ['/people/person']\n", | |
"23:54 0 0.5 ['']\n", | |
"19:00 0 0.5 ['']\n", | |
"1995 0 1.336 []\n", | |
"Conseil national (Suisse) 0.2651 2.247 ['/government/governmental_body', '/business/employer']\n", | |
"23 0 0.5 []\n", | |
"21 0 0.5 []\n", | |
"2016 0 0.5 []\n", | |
"30 0 0.5 []\n", | |
"23 0 0.5 []\n", | |
"2008 0 0.5 []\n", | |
"2 0 0.5 []\n", | |
"2015 0 0.5 []\n", | |
"44 0 0.5 []\n", | |
"2018 0 0.5 []\n", | |
"23 0 0.5 []\n", | |
"2008 0 0.5 []\n", | |
"1998 0 0.5 []\n", | |
"19:00 0 0.5 ['']\n", | |
"23:54 0 0.5 ['']\n", | |
"2018 0 0.5 []\n", | |
"Conseil fédéral (Suisse) 0.5817 10.16 ['/government/government_office_category', '/government/governmental_body']\n", | |
"Canton de Vaud 0.3365 3.899 ['/travel/travel_destination', '/location/dated_location', '/location/location', '/location/statistical_region', '/location/administrative_division', '/periodicals/newspaper_circulation_area']\n", | |
"Guy Parmelin 0.3612 5.18 ['/people/person', '/government/politician']\n", | |
"Christoph Blocher 0.5627 7.726 ['/film/actor', '/film/person_or_entity_appearing_in_film', '/government/politician', '/organization/organization_founder', '/people/person']\n", | |
"Schweiz am Wochenende 0 0.5 ['/location/location']\n", | |
"Président du Conseil (France) 0.1095 1.016 []\n", | |
"Suisse 0.4221 21.66 ['/education/school_category', '/food/beer_country_region', '/location/dated_location', '/biology/breed_origin', '/location/location', '/fictional_universe/fictional_setting', '/organization/organization_founder', '/periodicals/newspaper_circulation_area', '/travel/travel_destination', '/location/country', '/symbols/name_source', '/government/governmental_jurisdiction', '/sports/sport_country', '/film/film_location', '/olympics/olympic_participating_country', '/organization/organization_scope', '/symbols/flag_referent', '/law/court_jurisdiction_area', '/sports/sports_team_location', '/location/statistical_region', '/organization/organization_member', '/book/book_subject']\n", | |
"Parti conservateur (Royaume-Uni) 0.2517 1.283 ['/organization/organization', '/business/employer', '/government/political_party', '/fictional_universe/fictional_organization']\n", | |
"Ueli Maurer 0.7158 6.658 ['/government/politician', '/people/person']\n", | |
"Canton de Saint-Gall 0.08829 2.172 ['/location/dated_location', '/location/administrative_division', '/location/location', '/location/statistical_region']\n", | |
"23 novembre 0 1.198 ['/time/day_of_year']\n", | |
"23 novembre 0 1.198 ['/time/day_of_year']\n", | |
"23 novembre 0 1.198 ['/time/day_of_year']\n", | |
"Conseil national (Suisse) 0.2651 2.247 ['/government/governmental_body', '/business/employer']\n", | |
"Conseil national (Suisse) 0.2651 2.247 ['/government/governmental_body', '/business/employer']\n", | |
"Conseil national (Suisse) 0.2651 2.247 ['/government/governmental_body', '/business/employer']\n", | |
"Union démocratique du centre 0.8321 9.121 ['/government/political_party', '/business/employer', '/organization/organization']\n", | |
"Conseil national 0 0.5 ['/organization/organization']\n", | |
"Canton de Saint-Gall 0.08829 2.421 ['/location/dated_location', '/location/administrative_division', '/location/location', '/location/statistical_region']\n", | |
"2000 0.02159 1.048 []\n", | |
"Toni Brunner 0.4266 3.124 ['/people/person', '/government/politician']\n", | |
"Topics ---\n", | |
"Swiss People's Party 1 Q385258\n", | |
"Swiss nationalism 1 None\n", | |
"Politics 1 Q7163\n", | |
"Politics of Switzerland 1 Q688192\n", | |
"Government 1 Q7188\n", | |
"Switzerland 0.9181 Q39\n", | |
"Government of Switzerland 0.9087 None\n", | |
"Ueli Maurer 0.8695 Q123979\n", | |
"Toni Brunner 0.8466 Q115614\n", | |
"Anti-Islam political parties 0.8383 None\n", | |
"Right-wing populism 0.7769 Q436860\n", | |
"Right-wing populist parties 0.7765 None\n", | |
"Right-wing parties 0.7609 Q76074\n", | |
"Populist parties 0.7513 None\n", | |
"Eurosceptic parties 0.7342 Q223200\n", | |
"Elections 0.7231 Q40231\n", | |
"Political events 0.7162 Q30111082\n", | |
"Federal Council (Switzerland) 0.7066 Q30917\n", | |
"Swiss People's Party politicians 0.6889 None\n", | |
"Politicians 0.6859 Q82955\n", | |
"Christoph Blocher 0.6835 Q123857\n", | |
"Switzerland–European Union relations 0.6835 Q672237\n", | |
"Swiss eurosceptics 0.6625 None\n", | |
"Swiss Federal Council 0.6603 Q30917\n", | |
"Nationalisms 0.6568 None\n", | |
"Democracy 0.6435 Q7174\n", | |
"Europe 0.6361 Q46\n", | |
"Swiss agrarianists 0.6287 None\n", | |
"Swiss nationalists 0.6019 None\n", | |
"Public sphere 0.5662 Q17945\n", | |
"Euroscepticism 0.5437 Q223200\n", | |
"National Council (Switzerland) 0.5426 Q676078\n", | |
"Voting 0.5397 Q189760\n", | |
"Swiss people 0.4756 Q124216\n", | |
"Federal Assembly (Switzerland) 0.4736 None\n", | |
"Members of the National Council (Switzerland) 0.4712 Q18510612\n", | |
"Members of the Federal Assembly (Switzerland) 0.4573 Q18515554\n", | |
"Human activities 0.4565 None\n", | |
"Right-wing politics 0.452 Q76074\n", | |
"International relations 0.4496 Q166542\n", | |
"Government-related organizations 0.4441 None\n", | |
"Accountability 0.4397 Q5190563\n", | |
"Guy Parmelin 0.4387 Q121160\n", | |
"Political ideologies 0.4191 Q14934048\n", | |
"Heads of state 0.4166 Q48352\n", | |
"Political parties 0.4096 Q7278\n", | |
"Canton of Vaud 0.4088 Q12771\n", | |
"Political spectrum 0.4015 Q210918\n", | |
"Collective heads of government 0.3979 None\n", | |
"Swiss politicians 0.397 None\n", | |
"Political science 0.3906 Q36442\n", | |
"Economy of Switzerland 0.3865 Q685175\n", | |
"Swiss political people 0.3846 None\n", | |
"Members of the Swiss Federal Council 0.383 None\n", | |
"Agrarian parties 0.3774 Q19835256\n", | |
"Populism 0.3674 Q180490\n", | |
"Ideologies 0.3632 Q7257\n", | |
"Law 0.3622 Q7748\n", | |
"Nationalist parties 0.3596 None\n", | |
"Heads of government 0.3557 Q2285706\n", | |
"Political office-holders 0.3527 None\n", | |
"Social ideologies 0.3507 None\n", | |
"Political movements 0.3495 Q2738074\n", | |
"Social institutions 0.3398 Q178706\n", | |
"National legislatures 0.338 None\n", | |
"Members of lower houses 0.3284 Q375928\n", | |
"Political theories 0.3197 Q9357091\n", | |
"Political activism 0.3179 Q19890758\n", | |
"Social movements 0.3139 Q49773\n", | |
"National legislators 0.3092 None\n", | |
"Conservative Party (UK) 0.3057 Q9626\n", | |
"Immigration to Switzerland 0.3043 Q15832389\n", | |
"National lower houses 0.2974 Q375928\n", | |
"Government institutions 0.297 Q2659904\n", | |
"Lower houses 0.2874 Q375928\n", | |
"Political organizations 0.287 Q7210356\n", | |
"Demographics of Switzerland 0.283 Q688599\n", | |
"Farmer 0.279 Q131512\n", | |
"Foreign policy 0.2685 Q181648\n", | |
"Nationalism 0.2677 Q6235\n", | |
"Critics of the European Union 0.265 None\n", | |
"Member states of the Organisation internationale de la Francophonie 0.2598 Q6814234\n", | |
"Global politics 0.2574 Q5570874\n", | |
"Culture 0.2571 Q11042\n", | |
"Campaign for an Independent and Neutral Switzerland 0.2556 Q353975\n", | |
"People from the canton of Zürich 0.2432 None\n", | |
"Opposition to Islam 0.2227 None\n", | |
"Politics of the European Union 0.2201 Q959163\n", | |
"Identity politics 0.2115 Q2914650\n", | |
"Isolationism 0.2052 Q309310\n", | |
"Ethnic organizations 0.1955 None\n", | |
"Bicameral legislatures 0.1927 Q189445\n", | |
"Agrarian politics 0.1909 None\n", | |
"Forms of government 0.1836 Q1307214\n", | |
"Nationalist movements 0.1834 None\n", | |
"Movements 0.181 Q49773\n", | |
"Members of bicameral legislatures 0.1742 Q189445\n", | |
"Politics of France 0.1684 Q1121558\n", | |
"Nationalist organizations 0.1674 None\n", | |
"Anti-imperialism 0.1597 Q1144178\n", | |
"Conflicts 0.1594 Q180684\n", | |
"Public policy 0.1584 Q546113\n", | |
"Immigration 0.1573 Q131288\n", | |
"Immigration to Europe 0.1568 Q3394662\n", | |
"European agrarianists 0.1563 None\n", | |
"Corporatism 0.1496 Q192886\n", | |
"Presidencies 0.1479 None\n", | |
"Popularity 0.146 Q1357284\n", | |
"European integration 0.1445 Q1048268\n", | |
"Political neologisms 0.1427 Q3062289\n", | |
"Canton of Zürich 0.1422 Q11943\n", | |
"Canton of St. Gallen 0.1412 Q12746\n", | |
"Imperialism 0.1313 Q7260\n", | |
"People from Wetzikon 0.1297 None\n", | |
"Political people 0.1222 None\n", | |
"Cantons of Switzerland 0.1202 Q23058\n", | |
"Social conflict 0.1145 Q2672648\n", | |
"Anti-European sentiment 0.113 None\n", | |
"Wetzikon 0.1087 Q68305\n", | |
"Organizational structure of political parties 0.1059 None\n", | |
"Civic organizations 0.1055 Q16995546\n", | |
"Criticism of Islam 0.1045 Q1324153\n", | |
"Agrarianism 0.1032 Q492050\n", | |
"People from Hinwil District 0.1028 None\n", | |
"Change 0.09785 None\n", | |
"Swiss military officers 0.09743 None\n", | |
"Organisation internationale de la Francophonie 0.09616 Q134102\n", | |
"National institutions 0.09378 None\n", | |
"Justice 0.08508 Q5167661\n", | |
"Central Europe 0.08427 Q27509\n", | |
"France 0.08345 Q142\n", | |
"Leaders 0.07849 Q1251441\n", | |
"Presidents 0.0773 Q30461\n", | |
"Western Europe 0.07609 Q27496\n", | |
"Public law 0.07594 Q207892\n", | |
"Military of Switzerland 0.07478 Q332844\n", | |
"Controversies 0.07309 Q1255828\n", | |
"People from Schaffhausen 0.07219 None\n", | |
"Swiss military personnel 0.07179 None\n", | |
"Social issues 0.06844 Q1920219\n", | |
"War 0.06766 Q198\n", | |
"People from the canton of Schaffhausen 0.06671 None\n", | |
"Warfare 0.06517 Q12786121\n", | |
"Swiss Protestants 0.06514 None\n", | |
"Administrative territorial entities 0.0609 Q56061\n", | |
"Federal Department of Defence, Civil Protection and Sports 0.06087 Q667135\n", | |
"Wars 0.05879 Q198\n", | |
"Federal republics 0.05874 None\n", | |
"People from Meilen District 0.05709 None\n", | |
"Conflict (process) 0.05485 Q180684\n", | |
"National governments 0.05427 None\n", | |
"International security 0.05266 Q3312693\n", | |
"Violence 0.05026 Q124490\n" | |
] | |
} | |
], | |
"source": [ | |
"# Set up and configure a API client - see https://www.textrazor.com/docs/python\n", | |
"client = textrazor.TextRazor(\n", | |
" extractors=[\"entities\", \"topics\"],\n", | |
" #set_classifiers=[\"textrazor_newscodes\"],\n", | |
")\n", | |
"response = None\n", | |
"try:\n", | |
" response = client.analyze_url(TEST_URL)\n", | |
"except textrazor.TextRazorAnalysisException as ex:\n", | |
" print (\"Failed to analyze with error: \", ex)\n", | |
"\n", | |
"# Show what we can learn from this URL\n", | |
"print (\"Language -->\", response.language)\n", | |
"\n", | |
"print (\"Entities ---\")\n", | |
"\n", | |
"for entity in response.entities():\n", | |
" print (entity.id, entity.relevance_score, entity.confidence_score, entity.freebase_types)\n", | |
"\n", | |
"print (\"Topics ---\")\n", | |
"\n", | |
"for t in response.topics():\n", | |
" print (t.label, t.score, t.wikidata_id)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 3, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"# To persist for later analysis\n", | |
"#json_content = response.json\n", | |
"# ..write to file\n", | |
"# ..load the file\n", | |
"#response = textrazor.TextRazorResponse(json_content)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 4, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"Toni Brunner Q115614 0.4266\n", | |
"Union démocratique du centre Q385258 0.8321\n", | |
"Canton de Vaud Q12771 0.3365\n", | |
"Guy Parmelin Q121160 0.3612\n", | |
"Christoph Blocher Q123857 0.5627\n", | |
"Suisse Q39 0.4221\n", | |
"Ueli Maurer Q123979 0.7158\n" | |
] | |
} | |
], | |
"source": [ | |
"# The kinds of (Freebase Types) we include in our analysis\n", | |
"allowed_types = [\n", | |
" '/people/person',\n", | |
" '/organization/organization',\n", | |
" '/location/location',\n", | |
" '/location/statistical_region',\n", | |
" '/location/administrative_division',\n", | |
"]\n", | |
"\n", | |
"# Parse out the selection\n", | |
"filtered_entities = filter(\n", | |
" lambda e: e.relevance_score > 0.3 and set(e.freebase_types) & set(allowed_types), \n", | |
" response.entities()\n", | |
")\n", | |
"\n", | |
"# Keep only the unique elements\n", | |
"unique_entities = {}\n", | |
"for entity in filtered_entities:\n", | |
" wid = entity.wikidata_id\n", | |
" if not wid in unique_entities.keys():\n", | |
" unique_entities[wid] = entity\n", | |
" print (entity.id, entity.wikidata_id, entity.relevance_score)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 5, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"Number of results: 7\n" | |
] | |
} | |
], | |
"source": [ | |
"import requests\n", | |
"\n", | |
"# Ask wikidata\n", | |
"ids = \"|\".join([ unique_entities[u].wikidata_id for u in unique_entities.keys() ])\n", | |
"lang = \"en\"\n", | |
"wikidata = requests.get(\"https://www.wikidata.org/w/api.php?action=wbgetentities&ids=%s&languages=%s&format=json\" % (ids, lang)).json()\n", | |
"\n", | |
"print (\"Number of results:\", len(wikidata['entities']))" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 6, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"# To save results to disk for a look through\n", | |
"\n", | |
"#import json\n", | |
"#with open('../data/test.json', 'w') as f:\n", | |
"# json.dump(wikidata, f)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 7, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"Location not found for Q385258 Union démocratique du centre\n", | |
"6 of 7 locations found\n" | |
] | |
} | |
], | |
"source": [ | |
"def get_location_by_id(wikidata_id, data=None, lang='en'):\n", | |
" #print (\"Fetching location\", wikidata_id)\n", | |
" ids = wikidata_id\n", | |
" data = requests.get(\"https://www.wikidata.org/w/api.php?action=wbgetentities&ids=%s&languages=%s&format=json\" % (ids, lang)).json()\n", | |
" claims = data['entities'][ids]['claims']\n", | |
" # Check if we already have a coordinate_location \n", | |
" if 'P625' in claims:\n", | |
" return claims['P625'][0]['mainsnak']['datavalue']['value']\n", | |
" return None\n", | |
"\n", | |
"# Parse wikidata\n", | |
"geoloc_entities = {}\n", | |
"geoloc_total = 0\n", | |
"for e in unique_entities.keys():\n", | |
" edata = {}\n", | |
" edata['entity'] = unique_entities[e]\n", | |
" edata['data'] = wikidata['entities'][e]\n", | |
" \n", | |
" claims = edata['data']['claims']\n", | |
" \n", | |
" # Check if we already have a coordinate_location \n", | |
" if 'P625' in claims:\n", | |
" loc = claims['P625'][0]['mainsnak']['datavalue']['value']\n", | |
" edata['location'] = loc\n", | |
" geoloc_total = geoloc_total + 1\n", | |
" \n", | |
" # Otherwise fetch a birthplace\n", | |
" elif 'P19' in claims:\n", | |
" wid = claims['P19'][0]['mainsnak']['datavalue']['value']['id']\n", | |
" loc = get_location_by_id(wid)\n", | |
" if loc is not None:\n", | |
" edata['location'] = loc\n", | |
" geoloc_total = geoloc_total + 1\n", | |
" \n", | |
" else:\n", | |
" print(\"Location not found for\", e, edata['entity'].id)\n", | |
"\n", | |
" geoloc_entities[e] = edata\n", | |
" \n", | |
"print (geoloc_total, 'of', len(geoloc_entities), 'locations found')" | |
] | |
}, | |
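The cell above indexes straight into `claims['P625'][0]['mainsnak']['datavalue']`, which assumes that the first statement always carries a value. Wikidata statements can also be "unknown value" or "no value" snaks, which have no `datavalue` key. A more defensive lookup might look like the sketch below; `first_coordinate` is a hypothetical helper name, and the sample data is hand-written and abridged rather than a real API response.

```python
def first_coordinate(claims, prop="P625"):
    """Return the first usable coordinate value for `prop`, or None."""
    for statement in claims.get(prop, []):
        snak = statement.get("mainsnak", {})
        # 'somevalue' and 'novalue' snaks carry no 'datavalue' key
        if snak.get("snaktype") != "value":
            continue
        value = snak["datavalue"]["value"]
        return {"latitude": value["latitude"], "longitude": value["longitude"]}
    return None

# Hand-written, abridged sample in the shape returned by wbgetentities
sample_claims = {
    "P625": [
        {"mainsnak": {"snaktype": "somevalue", "property": "P625"}},
        {"mainsnak": {"snaktype": "value", "property": "P625",
                      "datavalue": {"type": "globecoordinate",
                                    "value": {"latitude": 46.616666666667,
                                              "longitude": 6.55}}}},
    ]
}
print(first_coordinate(sample_claims))
print(first_coordinate({}))
```

Returning `None` instead of raising keeps the "Location not found" reporting in the loop as the single place where missing data is handled.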
{ | |
"cell_type": "code", | |
"execution_count": 8, | |
"metadata": { | |
"scrolled": true | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"fre\n", | |
"Toni Brunner Q115614 47.299501 9.0856081 0.4266\n", | |
"Canton de Vaud Q12771 46.616666666667 6.55 0.3365\n", | |
"Guy Parmelin Q121160 46.45 6.2833333333333 0.3612\n", | |
"Christoph Blocher Q123857 47.69653 8.63386 0.5627\n", | |
"Suisse Q39 46.798562 8.231973 0.4221\n", | |
"Ueli Maurer Q123979 47.320833333333 8.7930555555556 0.7158\n" | |
] | |
} | |
], | |
"source": [ | |
"print(response.language) \n", | |
"for e in geoloc_entities.keys():\n", | |
" ue = geoloc_entities[e]\n", | |
" if not 'location' in ue: continue\n", | |
" loc = ue['location']\n", | |
" print(ue['entity'].id, ue['entity'].wikidata_id, loc['latitude'], loc['longitude'], ue['entity'].relevance_score)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 9, | |
"metadata": { | |
"scrolled": true | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"Rösti score: 0.5852999999999999\n" | |
] | |
} | |
], | |
"source": [ | |
"# It would be nice to use geoqueries and actual boundaries, but..this is just a hack\n", | |
"ROESTIGRABEN_LATITUDE = 46.798562\n", | |
"roesti_score = 0\n", | |
"for e in geoloc_entities.keys():\n", | |
" ue = geoloc_entities[e]\n", | |
" if not 'location' in ue: continue\n", | |
" \n", | |
" factor = 1 # assume 'deu' orientation by default\n", | |
" if response.language == 'fre': factor = -1\n", | |
" # inverse the factor depending on which side of the border\n", | |
" if ue['location']['latitude'] > ROESTIGRABEN_LATITUDE: factor = factor * -1\n", | |
" \n", | |
" roesti_score = roesti_score + ue['entity'].relevance_score * factor\n", | |
" \n", | |
"print (\"Numb. entities:\", geoloc_total)\n", | |
"print (\"Rösti score:\", roesti_score)" | |
] | |
}, | |
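To make the sign logic of the score easier to follow, the calculation can be restated as a small pure function. This is only a sketch: the function name `roesti_score` and the tuple layout are choices made here for illustration, while the latitudes and relevance scores are copied from the notebook output above. For a French-language article, entities north of the line add to the score, and entities at or south of it subtract from it.

```python
def roesti_score(entities, language, divide_latitude=46.798562):
    """Sum relevance scores, signed by article language and entity latitude."""
    score = 0.0
    for latitude, relevance in entities:
        factor = -1 if language == "fre" else 1
        # Flip the sign for entities north of the dividing latitude
        if latitude > divide_latitude:
            factor = -factor
        score += relevance * factor
    return score

# (latitude, relevance) pairs taken from the notebook output above
entities = [
    (47.299501, 0.4266),        # Toni Brunner
    (46.616666666667, 0.3365),  # Canton de Vaud
    (46.45, 0.3612),            # Guy Parmelin
    (47.69653, 0.5627),         # Christoph Blocher
    (46.798562, 0.4221),        # Suisse
    (47.320833333333, 0.7158),  # Ueli Maurer
]
score = roesti_score(entities, "fre")
print(round(score, 4))  # 0.5853, matching the Rösti score printed above
```

Running the same entities through an article of any other language flips every sign, so the score for `"deu"` is the exact negative of the French one.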
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"<a href=\"https://imgflip.com/i/2nbxlk\"><img src=\"https://i.imgflip.com/2nbxlk.jpg\" title=\"made at imgflip.com\"/></a>" | |
] | |
} | |
], | |
"metadata": { | |
"kernelspec": { | |
"display_name": "Python 3", | |
"language": "python", | |
"name": "python3" | |
}, | |
"language_info": { | |
"codemirror_mode": { | |
"name": "ipython", | |
"version": 3 | |
}, | |
"file_extension": ".py", | |
"mimetype": "text/x-python", | |
"name": "python", | |
"nbconvert_exporter": "python", | |
"pygments_lexer": "ipython3", | |
"version": "3.6.6" | |
} | |
}, | |
"nbformat": 4, | |
"nbformat_minor": 2 | |
} |