Skip to content

Instantly share code, notes, and snippets.

@jacKlinc
Last active February 13, 2024 15:06
Show Gist options
  • Save jacKlinc/6208d44ecda08c03e0291bfc2cd8c31b to your computer and use it in GitHub Desktop.
Save jacKlinc/6208d44ecda08c03e0291bfc2cd8c31b to your computer and use it in GitHub Desktop.
Jupyter notebook that lists, downloads and saves all Bellingcat's articles to a CSV file
Display the source blob
Display the rendered blob
Raw
{
"cells": [
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"import requests\n",
"from functools import reduce\n",
"\n",
"from bs4 import BeautifulSoup\n",
"from newspaper import Article\n",
"import pandas as pd"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Scrape Bellingcat's Articles\n",
"\n",
"This notebook aims to scrape all of Bellingcat's articles and output them to a file.\n",
"\n",
"The steps include:\n",
"1. Looping over each month of each year and collecting URLS using BeautifulSoup\n",
"2. Using the same URL list with `newspaper3k` to obtain the article data\n",
"3. Save article data to a CSV file"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## `newspaper3k` Example"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"url = 'http://fox13now.com/2013/12/30/new-year-new-laws-obamacare-pot-guns-and-drones/'\n",
"article = Article(url)"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"article.download()"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[]"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"article.parse()\n",
"article.authors"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"datetime.datetime(2013, 12, 30, 0, 0)"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"article.publish_date"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'By Leigh Ann Caldwell\\n\\nWASHINGTON (CNN) — Not everyone subscribes to a New Year’s resolution, but Americans will be required to follow new laws in 2014.\\n\\nSome 40,000 measures taking effect range from sweeping, national mandates under Obamacare to marijuana legalization in Colorado, drone prohibition in Illinois and transgender protections in California.\\n\\nAlthough many new laws are controversial, they made it through legislatures, public referendum or city councils and represent the shifting composition of American beliefs.\\n\\nFederal: Health care, of course, and vending machines\\n\\nThe biggest and most politically charged change comes at the federal level with the imposition of a new fee for those adults without health insurance.\\n\\nFor 2014, the penalty is either $95 per adult or 1% of family income, whichever results in a larger fine.\\n\\nThe Obamacare, of Affordable Care Act, mandate also requires that insurers cover immunizations and some preventive care.\\n\\nAdditionally, millions of poor Americans will receive Medicaid benefits starting January 1.\\n\\nThousands of companies will have to provide calorie counts for products sold in vending machines.\\n\\nLocal: Guns, family leave and shark fins\\n\\nConnecticut: While no national legislation was approved to tighten gun laws a year after the Newtown school shooting, Connecticut is implementing a final round of changes to its books: All assault weapons and large capacity magazines must be registered.\\n\\nOregon: Family leave in Oregon has been expanded to allow eligible employees two weeks of paid leave to handle the death of a family member.\\n\\nCalifornia: Homeless youth are eligible to receive food stamps. The previous law had a minimum wage requirement.\\n\\nDelaware: Delaware is the latest in a growing number of states where residents can no longer possess, sell or distribute shark fins, which is considered a delicacy in some East Asian cuisine.\\n\\nIllinois and drones\\n\\nIllinois: passed two laws limiting the use of drones. One prohibits them from interfering with hunters and fisherman. The measure passed after the group People for the Ethical Treatment of Animals said it would use drones to monitor hunters. PETA said it aims through its “air angels” effort to protect against “cruel” and “illegal” hunting.\\n\\nAlso in Illinois, another law prohibits the use of drones for law enforcement without a warrant.\\n\\nGender and voting identity\\n\\nCalifornia: Students can use bathrooms and join school athletic teams “consistent with their gender identity,” even if it’s different than their gender at birth.\\n\\nArkansas: The state becomes the latest state requiring voters show a picture ID at the voting booth.\\n\\nMinimum wage and former felon employment\\n\\nWorkers in 13 states and four cities will see increases to the minimum wage.\\n\\nWhile most amount to less than 15 cents per hour, workers in places like New Jersey, and Connecticut.\\n\\nNew Jersey residents voted to raise the state’s minimum wage by $1 to $8.25 per hour. And in Connecticut, lawmakers voted to raise it between 25 and 75 cents to $8.70. The wage would go up to $8 in Rhode Island and New York.\\n\\nCalifornia is also raising its minimum wage to $9 per hour, but workers must wait until July to see the addition.\\n\\nRhode Island: It is the latest state to prohibit employers from requiring job applicants to signify if they have a criminal record on a job application.\\n\\nSocial media and pot\\n\\nOregon: Employers and schools can’t require a job or student applicant to provide passwords to social media accounts.\\n\\nColorado: Marijuana becomes legal in the state for buyers over 21 at a licensed retail dispensary.\\n\\n(Sourcing: much of this list was obtained from the National Conference of State Legislatures).'"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"article.text"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"''"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"article.summary"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{}"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"article.additional_data"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'New Year, new laws: Obamacare, pot, guns and drones'"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"article.title"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Get Month's Articles' URLs"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [],
"source": [
"BASE_URL = \"https://www.bellingcat.com\"\n",
"BELLINGCAT_START_YEAR = 2014 # earliest article on site"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[{'year': 2023,\n",
" 'month': 3,\n",
" 'url': 'https://www.bellingcat.com/news/2023/03/29/how-online-investigators-proved-video-of-ukrainian-soldiers-harassing-woman-was-staged/'},\n",
" {'year': 2023,\n",
" 'month': 3,\n",
" 'url': 'https://www.bellingcat.com/news/2023/03/21/tiger-sheikhs-uae-royals-wildlife-shoots/'},\n",
" {'year': 2023,\n",
" 'month': 3,\n",
" 'url': 'https://www.bellingcat.com/news/2023/03/03/ryodan-anime-teens-kremlin-russia-ukraine/'}]"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"def list_months_article(year: int, month: int):\n",
" url = f\"{BASE_URL}/news/{year}/0{month}/\"\n",
" res = requests.get(url)\n",
" articles = BeautifulSoup(res.content, \"html.parser\")\n",
" news_item_tags = articles.find_all(\"div\", {\"class\": \"news_item__image\"})\n",
"\n",
" create_object = lambda tag: {\n",
" \"year\": year,\n",
" \"month\": month,\n",
" \"url\": tag.findChild(\"a\")[\"href\"],\n",
" }\n",
"\n",
" return [create_object(t) for t in news_item_tags]\n",
"\n",
"\n",
"list_months_article(2023, 3)"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[{'year': 2023,\n",
" 'month': 1,\n",
" 'url': 'https://www.bellingcat.com/news/2023/01/27/anatomy-of-a-shelling-how-russian-rocket-artillery-struck-mykolaiv/'},\n",
" {'year': 2023,\n",
" 'month': 2,\n",
" 'url': 'https://www.bellingcat.com/news/2023/02/24/russias-assault-on-daily-life-in-ukraine/'},\n",
" {'year': 2023,\n",
" 'month': 2,\n",
" 'url': 'https://www.bellingcat.com/news/2023/02/21/borderless-vigilantism-the-nativist-us-militias-entering-mexico/'},\n",
" {'year': 2023,\n",
" 'month': 2,\n",
" 'url': 'https://www.bellingcat.com/news/uk-and-europe/2023/02/20/ukraine-war-anniversary-one-year/'},\n",
" {'year': 2023,\n",
" 'month': 2,\n",
" 'url': 'https://www.bellingcat.com/news/2023/02/13/how-wagner-gave-three-90s-russian-crime-bosses-a-new-lease-of-death/'},\n",
" {'year': 2023,\n",
" 'month': 2,\n",
" 'url': 'https://www.bellingcat.com/news/2023/02/03/wanted-by-interpol-relaxing-in-dubai-geolocating-isabel-dos-santos-life-of-luxury/'},\n",
" {'year': 2023,\n",
" 'month': 3,\n",
" 'url': 'https://www.bellingcat.com/news/2023/03/29/how-online-investigators-proved-video-of-ukrainian-soldiers-harassing-woman-was-staged/'},\n",
" {'year': 2023,\n",
" 'month': 3,\n",
" 'url': 'https://www.bellingcat.com/news/2023/03/21/tiger-sheikhs-uae-royals-wildlife-shoots/'},\n",
" {'year': 2023,\n",
" 'month': 3,\n",
" 'url': 'https://www.bellingcat.com/news/2023/03/03/ryodan-anime-teens-kremlin-russia-ukraine/'},\n",
" {'year': 2023,\n",
" 'month': 4,\n",
" 'url': 'https://www.bellingcat.com/news/2023/04/28/anatomy-of-three-leaked-us-intelligence-documents/'},\n",
" {'year': 2023,\n",
" 'month': 4,\n",
" 'url': 'https://www.bellingcat.com/news/2023/04/09/from-discord-to-4chan-the-improbable-journey-of-a-us-defence-leak/'},\n",
" {'year': 2023,\n",
" 'month': 5,\n",
" 'url': 'https://www.bellingcat.com/news/2023/05/25/mapping-the-aftermath-of-the-kyrgyzstan-tajikistan-border-clashes/'},\n",
" {'year': 2023,\n",
" 'month': 5,\n",
" 'url': 'https://www.bellingcat.com/news/2023/05/11/grain-trail-tracking-russias-ghost-ships-with-satellite-imagery/'},\n",
" {'year': 2023,\n",
" 'month': 5,\n",
" 'url': 'https://www.bellingcat.com/news/2023/05/09/tracing-the-odnoklassniki-profile-of-the-texas-mall-shooter/'},\n",
" {'year': 2023,\n",
" 'month': 5,\n",
" 'url': 'https://www.bellingcat.com/news/2023/05/04/lucas-villa-cell-phone-data-gives-new-insight-into-investigation-of-colombian-protestors-killing/'},\n",
" {'year': 2023,\n",
" 'month': 5,\n",
" 'url': 'https://www.bellingcat.com/news/2023/05/03/international-far-right-fight-night-comes-to-budapest/'},\n",
" {'year': 2023,\n",
" 'month': 6,\n",
" 'url': 'https://www.bellingcat.com/news/2023/06/30/the-mainstream-publishers-distributors-and-bookshops-selling-satanist-neo-nazi-books/'},\n",
" {'year': 2023,\n",
" 'month': 6,\n",
" 'url': 'https://www.bellingcat.com/news/2023/06/29/analysing-the-cost-of-wagner-revolt/'},\n",
" {'year': 2023,\n",
" 'month': 6,\n",
" 'url': 'https://www.bellingcat.com/news/2023/06/29/satellite-imagery-reveals-russia-caused-flooding-in-occupied-ukrainian-town-before-counter-offensive/'},\n",
" {'year': 2023,\n",
" 'month': 6,\n",
" 'url': 'https://www.bellingcat.com/news/2023/06/23/site-of-alleged-wagner-camp-attack-recently-visited-by-war-blogger/'},\n",
" {'year': 2023,\n",
" 'month': 6,\n",
" 'url': 'https://www.bellingcat.com/news/2023/06/23/russias-foreign-fighters-geolocating-the-nepalis-training-in-the-russian-army/'},\n",
" {'year': 2023,\n",
" 'month': 7,\n",
" 'url': 'https://www.bellingcat.com/news/2023/07/24/over-500-days-of-the-russia-ukraine-monitor-map/'},\n",
" {'year': 2023,\n",
" 'month': 8,\n",
" 'url': 'https://www.bellingcat.com/news/2023/08/31/saunas-and-swastikas-finlands-summertime-neo-nazi-meet-up/'},\n",
" {'year': 2023,\n",
" 'month': 8,\n",
" 'url': 'https://www.bellingcat.com/news/2023/08/21/russias-ghost-ships-and-the-evolution-of-a-grain-smuggling-operation/'},\n",
" {'year': 2023,\n",
" 'month': 8,\n",
" 'url': 'https://www.bellingcat.com/news/2023/08/16/untangling-the-mystery-of-the-worlds-first-rooftop-solar-panel/'},\n",
" {'year': 2023,\n",
" 'month': 8,\n",
" 'url': 'https://www.bellingcat.com/news/2023/08/10/revealing-andrew-tates-secretive-war-room-brothers/'},\n",
" {'year': 2023,\n",
" 'month': 8,\n",
" 'url': 'https://www.bellingcat.com/news/2023/08/04/solving-world-war-ii-photo-mysteries-with-open-source-techniques/'},\n",
" {'year': 2023,\n",
" 'month': 8,\n",
" 'url': 'https://www.bellingcat.com/news/2023/08/02/jenin-open-source-insights-on-israels-july-raids/'},\n",
" {'year': 2023,\n",
" 'month': 9,\n",
" 'url': 'https://www.bellingcat.com/news/2023/09/28/azerbaijan-consolidates-control-armenians-flee-nagorno-karabakh/'},\n",
" {'year': 2023,\n",
" 'month': 9,\n",
" 'url': 'https://www.bellingcat.com/news/2023/09/21/chaos-and-crisis-as-azerbaijan-attacks-nagorno-karabakh/'},\n",
" {'year': 2023,\n",
" 'month': 9,\n",
" 'url': 'https://www.bellingcat.com/news/2023/09/13/us-neo-nazi-says-he-fought-in-ukraine-records-place-him-in-florida/'},\n",
" {'year': 2023,\n",
" 'month': 9,\n",
" 'url': 'https://www.bellingcat.com/news/2023/09/06/geolocating-russias-disgraced-general-surovikin/'}]"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"def flatten_list(x, y):\n",
" return x + y\n",
"\n",
"\n",
"def list_all_articles():\n",
" nested_links = [list_years_articles(y) for y in range(BELLINGCAT_START_YEAR, 2024)]\n",
" return reduce(flatten_list, nested_links)\n",
"\n",
"\n",
"# TODO refactor\n",
"def list_years_articles(year: int):\n",
" nested_links = [list_months_article(year, i) for i in range(1, 13)]\n",
" return reduce(flatten_list, nested_links)\n",
"\n",
"\n",
"list_years_articles(2023)"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [],
"source": [
"all_articles = list_all_articles() # 29s to run"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>year</th>\n",
" <th>month</th>\n",
" <th>url</th>\n",
" <th>path</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>2014</td>\n",
" <td>7</td>\n",
" <td>https://www.bellingcat.com/news/uk-and-europe/...</td>\n",
" <td>/news/uk-and-europe/2014/07/31/did-coulsons-ne...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2014</td>\n",
" <td>7</td>\n",
" <td>https://www.bellingcat.com/news/uk-and-europe/...</td>\n",
" <td>/news/uk-and-europe/2014/07/30/the-context-of-...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>2014</td>\n",
" <td>7</td>\n",
" <td>https://www.bellingcat.com/news/uk-and-europe/...</td>\n",
" <td>/news/uk-and-europe/2014/07/30/other-dark-arts...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>2014</td>\n",
" <td>7</td>\n",
" <td>https://www.bellingcat.com/news/uk-and-europe/...</td>\n",
" <td>/news/uk-and-europe/2014/07/28/how-rebekah-bro...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>2014</td>\n",
" <td>7</td>\n",
" <td>https://www.bellingcat.com/news/uk-and-europe/...</td>\n",
" <td>/news/uk-and-europe/2014/07/28/the-buk-that-co...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>586</th>\n",
" <td>2023</td>\n",
" <td>8</td>\n",
" <td>https://www.bellingcat.com/news/2023/08/02/jen...</td>\n",
" <td>/news/2023/08/02/jenin-open-source-insights-on...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>587</th>\n",
" <td>2023</td>\n",
" <td>9</td>\n",
" <td>https://www.bellingcat.com/news/2023/09/28/aze...</td>\n",
" <td>/news/2023/09/28/azerbaijan-consolidates-contr...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>588</th>\n",
" <td>2023</td>\n",
" <td>9</td>\n",
" <td>https://www.bellingcat.com/news/2023/09/21/cha...</td>\n",
" <td>/news/2023/09/21/chaos-and-crisis-as-azerbaija...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>589</th>\n",
" <td>2023</td>\n",
" <td>9</td>\n",
" <td>https://www.bellingcat.com/news/2023/09/13/us-...</td>\n",
" <td>/news/2023/09/13/us-neo-nazi-says-he-fought-in...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>590</th>\n",
" <td>2023</td>\n",
" <td>9</td>\n",
" <td>https://www.bellingcat.com/news/2023/09/06/geo...</td>\n",
" <td>/news/2023/09/06/geolocating-russias-disgraced...</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>591 rows × 4 columns</p>\n",
"</div>"
],
"text/plain": [
" year month url \\\n",
"0 2014 7 https://www.bellingcat.com/news/uk-and-europe/... \n",
"1 2014 7 https://www.bellingcat.com/news/uk-and-europe/... \n",
"2 2014 7 https://www.bellingcat.com/news/uk-and-europe/... \n",
"3 2014 7 https://www.bellingcat.com/news/uk-and-europe/... \n",
"4 2014 7 https://www.bellingcat.com/news/uk-and-europe/... \n",
".. ... ... ... \n",
"586 2023 8 https://www.bellingcat.com/news/2023/08/02/jen... \n",
"587 2023 9 https://www.bellingcat.com/news/2023/09/28/aze... \n",
"588 2023 9 https://www.bellingcat.com/news/2023/09/21/cha... \n",
"589 2023 9 https://www.bellingcat.com/news/2023/09/13/us-... \n",
"590 2023 9 https://www.bellingcat.com/news/2023/09/06/geo... \n",
"\n",
" path \n",
"0 /news/uk-and-europe/2014/07/31/did-coulsons-ne... \n",
"1 /news/uk-and-europe/2014/07/30/the-context-of-... \n",
"2 /news/uk-and-europe/2014/07/30/other-dark-arts... \n",
"3 /news/uk-and-europe/2014/07/28/how-rebekah-bro... \n",
"4 /news/uk-and-europe/2014/07/28/the-buk-that-co... \n",
".. ... \n",
"586 /news/2023/08/02/jenin-open-source-insights-on... \n",
"587 /news/2023/09/28/azerbaijan-consolidates-contr... \n",
"588 /news/2023/09/21/chaos-and-crisis-as-azerbaija... \n",
"589 /news/2023/09/13/us-neo-nazi-says-he-fought-in... \n",
"590 /news/2023/09/06/geolocating-russias-disgraced... \n",
"\n",
"[591 rows x 4 columns]"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df = pd.DataFrame(all_articles)\n",
"df['path'] = df.url.apply(lambda x: x.split(BASE_URL, 1)[1])\n",
"df"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0 https://www.bellingcat.com/news/uk-and-europe/...\n",
"1 https://www.bellingcat.com/news/uk-and-europe/...\n",
"2 https://www.bellingcat.com/news/uk-and-europe/...\n",
"3 https://www.bellingcat.com/news/uk-and-europe/...\n",
"4 https://www.bellingcat.com/news/uk-and-europe/...\n",
"Name: url, dtype: object"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"first_5 = df.url[:5]\n",
"first_5"
]
},
{
"cell_type": "code",
"execution_count": 80,
"metadata": {},
"outputs": [],
"source": [
"# df.to_csv(\"all-bellingcat-articles.csv\", index=False)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Parse Article"
]
},
{
"cell_type": "code",
"execution_count": 52,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'text': '\\n\\nMore on the Fake Sheikh, the Police, and News of the World by occasional blogger @jpublik.\\n\\nAndy Coulson‘s News of the World sent a man to jail after luring him to sell them drugs he was terrified of carrying by promising him a job. He was sentenced to four years in prison before his conviction was quashed – after he’d already served his time.\\n\\nIn a case which has hardly received any publicity, according to high court documents, Albanian Besnik Qema was asked to supply News of the World cocaine and a passport on a promise of job as security for a wealthy Arab family.\\n\\nThe High Court documents detail how in January 2005, Mazher Mahmood had asked Florim Gashi, a contact of his who he had used in previous “set-up” stings to find someone who could be implicated in a story he or the News of the World wanted to run about false passports, drugs and guns. Gashi then adopted the identity of a female called Aurora and through an internet chat room used by expatriate Albanians established contact with Qema.\\n\\nOver the course of lengthy 4 or 5 telephone calls, Gashi then “honey trapped” him. Using his false female identity, he held out the prospect of romance/sex between them and also the prospect Qema might be able to get employment as a security guard with a rich Arab family at the rate of £8,000 a month.\\n\\nGashi told Qema ‘she’ had facilitated a meeting between him and a member of this family called Mohammed, and Qema’s chances of employment chances of employment would be enhanced if he could supply the family with cocaine. And a false British passport. Mohammed was in fact Mazher Mahmood.\\n\\nQema obtained 3 grams of cocaine from a man called Mehmet who was an associate of his. On February 4th, at a pre-arranged meeting meeting at a McDonalds in Liverpool Street supplied the three individual wraps to Mohammed (Mahmood) in return for £210 and a further supply was discussed. Mahmood was accompanied by Kishan Athulathmudali, an employee of News of the World who was posing as a member of Mohammed’s family.\\n\\nGashi, again under the guise of Aurora, then asked Qema to obtain a false British passport for Mohammed, so it was said, Mohammed wanted one for his cousin who was in the country illegally. At a further meeting with Kishan, Qema was given a passport photograph to use for the fake passport and £200 deposit for it which he then handed to Mehmet.\\n\\nOn 11th February, arrangements are made between Qema and Aurora for more cocaine and the passport to be handed over to Mahmood the next day at the Hilton Hotel in Park Lane, London.\\n\\nMahmood then tipped off the police that an undercover operation had led the newspaper to discover via a confidential source [Gashi] that Qema was actively involved in crime and dealing with drugs and false passports and he had access to firearms. Mahmood did not tell the police the source was Gashi. The reason for this is because a case involving Gashi had already collapsed 18 months earlier when the Crown Prosecution Service had found Mahmood had paid Gashi £10,000 who then become main prosecution witness. It’s clear from then on, Mahmood had to hide Gashi from prosecutors.\\n\\nMahmood supplied the police with with a photograph of Qema and told them analysis had showed the powder already supplied by Qema on 4th February was cocaine. He also told them Qema would be in possession of more cocaine and a passport at a meeting arranged at the Hilton Hotel on 12th February.\\n\\nMahmood didn’t tell police of the circumstances in which Qema came to be in possession of cocaine and a fake passport, in particular, the inducement offered of a job as a security guard or that the drugs had been a suggested sweetener to enhance Qema’s chances of getting the job.\\n\\nOn 12th February, Qema met Mahmood as agreed at the Hilton Hotel. He was so terrified of carrying the drugs in his pockets, Mehmet delivered it to the hotel and gave it to Qema outside the hotel. Qema went back into the Hikton and was arrested by police in the coffee shop minutes later, in possession of both the cocaine and false passport.\\n\\nThe next day on 13th February News of the World ran the story under a Mazher Mahood byline story;\\n\\n‘Cops swoop after we expose a scandal; he [Qema] met us in McDonald’s at London’s Liverpool Street Station. After claiming he supplied drugs to celebs, Qema gave us three wraps of cocaine “it’s good stuff, around £70 a gram.”‘\\n\\n‘In another meeting, Qema turned into an immigrant smuggler. He said; “it’s £1,200 for a travel document and I can get you a passport for £1,700.”‘\\n\\n‘But evil Qema has another sideline. He told our reporter; “if you like, I can get you a gun. They start from £300″‘\\n\\nThere is also a quote given from ‘Aurora’ claiming Qema is a pimp.\\n\\nThe next day, Febuary the 14th Qema pleads guilty at Bow Street Magistrates Court to 3 charges: one supply of a class A drug (the 3 grams of cocaine supplied in McDonald’s), one of possession with intent to supply a class A drug, and one of possession of a false instrument (respectively the drugs and false French passport he had in his possession when arrested at the Hilton). He was remanded in custody and committed to Southwark Crown Court for sentencing.\\n\\nOn 14th March Qema appeared at the the Southwark Crown Court for sentence before HH Judge Dodgson. In his witness statement Mahmood did not reveal the name of Gashi, but said he was told of Qema activities by a confidential source and he exhibited one of the telephone calls between Qema and his confidential source [Gashi].\\n\\nDuring Qema’s plea in mitigation, an account was given to the Judge of his entrapment by Mahmood and his colleagues from the News of the World involving a female called Aurora. Counsel accepted on Qema’s behalf that; “It is one of those cases…where entrapment can be used as full mitigation, not a defence” and said “it is accepted that this defendant was a willing participant in the matter. It was also said Qema had been “momentarily blinded by an offer, as fake as it was, of a glamorous well paid job…”, that “he fully accepts by his plea of guilty that (the supply of drugs) was an entirely a stupid thing to do” and that entrapment “does not afford a full defence. He went in with his eyes open as it were.”\\n\\nThe Judge sentenced Qema to 4 ½ years imprisonment: 3 years for the supply of cocaine; 12 months concurrent for possession of cocaine with intent to supply and 18 months consecutive for possession of a false instrument.\\n\\nOn June 24 2005 Qema sentenced was reduced by 9 months on appeal.\\n\\nOn September 6th, three Scotland Yard detectives flew out to Dubrovnik, Croatia to interview Florim Gashi. Gashi claimed Mahmood had told him he needed a story about someone who could get a British passport, a gun and drugs, so Gashi had gone on the internet posing as a female under the name of Aurora and found Qema. He told police Qema was a nice man and that he had induced him to obtain drugs and a false passport. He said; “Qema said I can’t do this but for your sake I’ll do it but i won’t carry drugs in my pocket. I said please do it. Qema says he’ll ask his friend to bring the drugs up to the hotel.” Gashi tells detectives Mehmet brought the drugs to the meeting – not Qema.\\n\\nTwo more Scotland Yard detectives flew out to Vienna this time to Gashi again where he told the; “I feel particularly guilty about 2 cases. One has resulted in a totally innocent man being sentenced to 4 years [Qema]. Another is the girl whose child was taken into care.”\\n\\nThe very next month, in October, Guardian columnist Roy Greenslade revealed Scotland Yard were investigating Mazher Mahmood and News of the World – whilst collaborating at the same time – because of Gashi’s explosive new claims. However, Qema remains in prison.\\n\\nHe would remain there for almost another year.\\n\\nQema was released on license on 17th August 2006 but tagged, and made the subject of a curfew. The next day Qema sought leave to appeal against his conviction out of tint on the basis that material had come to light as a result of other trials which had taken place following “sting” operations by Mazher Mahmood and his associates, but the Court of Appeal declined jurisdiction: given his plea of guilty.\\n\\nThe previous month, Gashi told the jury at the red mercury trial he told that he set up the Victoria Beckham plot with Mahmood. He said; “Maz said I would get £10,000 and another £5,000 if they got prosecuted,” he added “I would get it if I could convince them to talk about the kidnap of Victoria Beckham and her children.” The case is thrown out – three men walk free.\\n\\nOn September 9th 2010 at Southwark Crown Court with the consent if the CPS, Qema was permitted to vacate his guilty plea. His conviction was the quashed.\\n\\nMr Bowen QC for Qema told the hearing Mr Mahmood knew there was little or no prospect Mr Qema being brough to or convicted after a “fair and impartial trial ” because he was aware that the crucial evidence on which the prosecution would be based, was that of Gashi, who’s evidence could not be relied on by the prosecution; and because he know Mr Qema had been entrapped and that if the circumstances of the entrapment had been known, it was unlikely that prosecution would be brought.\\n\\nThis case has never reported by the media – apart from Roy Greenslade four months later. New allegations of phone hacking is reported by New York Times the first week of September so that remains the agenda before it explodes in the Summer of 2011.\\n\\nRelated articles',\n",
" 'publish_date': datetime.datetime(2014, 7, 31, 0, 0),\n",
" 'title': 'Did Coulson’s News of the World Incite Others to Commit Crimes and Cause Unsafe Convictions?'}"
]
},
"execution_count": 52,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"def get_article_text(url: str):\n",
" article = Article(url)\n",
" article.download()\n",
" article.parse()\n",
"\n",
" return {\n",
" \"text\": (\n",
" article.text.split(article.title)[1]\n",
" if article.title in article.text\n",
" else article.text\n",
" ), # removes title from text if there\n",
" \"publish_date\": article.publish_date,\n",
" \"title\": article.title,\n",
" }\n",
"\n",
"\n",
"t = get_article_text(\n",
" \"https://www.bellingcat.com/news/uk-and-europe/2014/07/31/did-coulsons-news-of-the-world-incite-others-to-commit-crimes-and-cause-unsafe-convictions/\"\n",
")\n",
"t"
]
},
{
"cell_type": "code",
"execution_count": 53,
"metadata": {},
"outputs": [],
"source": [
"articles_text = df.url.map(get_article_text)"
]
},
{
"cell_type": "code",
"execution_count": 60,
"metadata": {},
"outputs": [],
"source": [
"extract_dict_key = lambda s, key: s.apply(lambda x: x[key])\n",
"\n",
"df[\"articles_text\"] = extract_dict_key(articles_text, \"text\")\n",
"df[\"publish_date\"] = extract_dict_key(articles_text, \"publish_date\")\n",
"df[\"title\"] = extract_dict_key(articles_text, \"title\")\n",
"\n",
"df.to_csv(\"all-bellingcat-articles.csv\", index=False)\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "env",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.7"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment