bici-sancta / lemondeScraper.py
Created December 20, 2022 18:30 — forked from xiaoouwang/lemondeScraper.py
Complete tutorial on scraping French news from le monde ❤️
# Author: Xiaoou Wang, Master’s student in NLP (currently in Paris), looking for a PhD position or a CIFRE contract. [linkedin](https://www.linkedin.com/in/xiaoou-wang)/[email](mailto:xiaoouwangfrance@gmail.com)
# https://xiaoouwang.medium.com/complete-tutorial-on-scraping-french-news-from-le-monde-%EF%B8%8F-4fa92bc0a07b
# Have a look at https://soshace.com/responsible-web-scraping-gathering-data-ethically-and-legally/ before using the code.
import os # helper functions, e.g. checking whether a file exists
import datetime # used to generate file names automatically
import requests # this and the following imports are the usual web scraping bundle
from urllib.request import urlopen # Python standard library module
from bs4 import BeautifulSoup
from urllib.error import HTTPError
from collections import defaultdict
# Special thanks for insights from flowingdata.com regarding this.
# Forked from danielecook/plot_runkeeper.R on 10-Apr-2015
library(plotKML)
library(plyr)
library(dplyr)
library(fpc)
num_locations <- 5