lobstr lobstrio
@lobstrio
lobstrio / google_maps_scraping_selenium.py
Created Aug 3, 2021
Collect all data from a Search URL on Google Maps 👋
View google_maps_scraping_selenium.py
# -*- coding: utf-8 -*-
# Copyright(C) 2021 lobstr
from selenium import webdriver
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.common.exceptions import NoSuchElementException
import time
import csv
@lobstrio
lobstrio / lacentrale_scraper.py
Created Apr 15, 2021
Collect BMW vehicle data on lacentrale.fr
View lacentrale_scraper.py
# -*- coding: utf-8 -*-
# Copyright(C) 2021 Sasha Bouloudnine
import requests
from lxml import html
import csv
class CrawlerLaCentrale():
@lobstrio
lobstrio / amazon_xmas.py
Created Dec 20, 2018
Web Scraping Python Script for the Xmas Deals on Amazon using Requests
View amazon_xmas.py
# -*- coding: utf-8 -*-
# Copyright(C) 2018 Sasha Bouloudnine
import requests
import sys
import re
import ast
import json
import time
@lobstrio
lobstrio / lemonde_headlines.py
Created Dec 14, 2018
Extract headlines from the French media website lemonde.fr with Python 3, Requests, and lxml
View lemonde_headlines.py
#!/usr/bin/python3
# coding: utf-8
import requests
from lxml import html
import re
import csv
from collections import Counter
class LeMondeScraper:
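The preview above imports `Counter`, which suggests some form of word-frequency count over the collected headlines. The full script is not shown here, so the following is only a minimal sketch under that assumption: `top_words`, its parameters, and the sample headlines are hypothetical and not taken from the gist.

```python
import re
from collections import Counter


def top_words(headlines, n=3, min_length=4):
    """Return the n most frequent words across headlines (hypothetical helper)."""
    words = []
    for headline in headlines:
        # In Python 3, \w matches accented letters in str patterns,
        # so French words such as "Sénat" are kept whole.
        words.extend(w.lower() for w in re.findall(r"\w+", headline))
    # Ignore short words (articles, prepositions) with a crude length filter.
    counts = Counter(w for w in words if len(w) >= min_length)
    return counts.most_common(n)


# Invented sample data, standing in for headlines scraped from the page.
sample = [
    "Climat : un accord trouvé",
    "Climat : les négociations reprennent",
    "Budget : un accord au Sénat",
]
print(top_words(sample, n=2))
```

The length filter is a stand-in for a real stop-word list; a longer list of French stop words would give cleaner results.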
@lobstrio
lobstrio / twitter_dtrump.py
Last active Jan 8, 2021
Really simple web scraping Python script for the first tweets of Donald Trump, using Requests and lxml
View twitter_dtrump.py
#!/usr/bin/python3
# coding: utf-8
import requests
from lxml import html
def extract():
    """
    Export all Tweets from @realDonaldTrump
@lobstrio
lobstrio / pagesjaunes_extract.py
Created Nov 21, 2018
Extract names and phone numbers from PagesJaunes.fr using Python 3, Requests, and lxml
View pagesjaunes_extract.py
#!/usr/bin/python3
# coding: utf-8
import requests
import csv
from lxml import html
import datetime
import argparse
@lobstrio
lobstrio / tripadvisor_mail.py
Last active Jun 2, 2021
Dynamically extract email addresses from Tripadvisor.com, using Python 3, Requests, and lxml
View tripadvisor_mail.py
#!/usr/bin/python3
# coding: utf-8
import requests
from lxml import html
import datetime
import re
import argparse
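The preview above imports `re`, so the email extraction plausibly relies on a regular expression applied to the page source. The actual pattern is not visible, so this is a sketch under that assumption: `EMAIL_RE`, `extract_emails`, and the sample HTML are hypothetical, and the pattern is deliberately simple rather than RFC-complete.

```python
import re

# Simplified email pattern (an assumption, not the gist's own regex).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(page_source):
    """Return unique, lowercased email addresses in order of first appearance."""
    seen = []
    for match in EMAIL_RE.findall(page_source):
        email = match.lower()
        if email not in seen:
            seen.append(email)
    return seen


# Invented HTML fragment standing in for a downloaded page.
sample_html = (
    '<a href="mailto:Contact@example.com">Contact@example.com</a> '
    "<span>booking@example.org</span>"
)
print(extract_emails(sample_html))
```

Deduplicating matters here because an address usually appears twice in the markup, once in the `mailto:` link and once as visible text.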
@lobstrio
lobstrio / pdf_parser.py
Created Aug 16, 2018
Python 3 script to convert a .pdf file into .txt output using PDFMiner
View pdf_parser.py
#!/usr/bin/python3
# coding: utf-8
from pdfminer.pdfinterp import PDFResourceManager, PDFPageInterpreter
from pdfminer.converter import TextConverter
from pdfminer.pdfpage import PDFPage
from io import BytesIO
import argparse
@lobstrio
lobstrio / captcha-solver.py
Last active Oct 9, 2021
Solving (simple) Captcha, using PyTesseract, PIL, and Python 3
View captcha-solver.py
#!/usr/bin/python3
# coding: utf-8
import pytesseract
import os
import argparse
try:
    import Image, ImageOps, ImageEnhance, imread
except ImportError:
    from PIL import Image, ImageOps, ImageEnhance
@lobstrio
lobstrio / leboncoin_avgprice.py
Created Aug 2, 2018
Dynamically compute the average price of an item on Leboncoin.fr based on the first 100 items, using Python 3 and Requests
View leboncoin_avgprice.py
#!/usr/bin/python3
# coding: utf-8
import requests
from bs4 import BeautifulSoup
from scrapy import Selector
import datetime
import argparse
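The averaging step described for this script can be illustrated without any network access. This is a rough sketch, assuming prices arrive as display strings in whole euros such as "1 200 €"; `parse_price`, `average_price`, and the sample values are hypothetical and not taken from the gist.

```python
import re
from statistics import mean


def parse_price(text):
    """Turn a display string like '1 200 €' into an int, or None if no digits."""
    # Strip everything that is not a digit (spaces, non-breaking spaces, '€').
    digits = re.sub(r"\D", "", text)
    return int(digits) if digits else None


def average_price(raw_prices, limit=100):
    """Average the parsable prices among the first `limit` listings."""
    parsed = (parse_price(t) for t in raw_prices[:limit])
    prices = [p for p in parsed if p is not None]
    return mean(prices) if prices else None


# Invented listing prices; one entry has no price at all.
sample_prices = ["1 200 €", "950 €", "Prix non indiqué", "1 450 €"]
print(average_price(sample_prices))
```

Listings without a price are skipped rather than counted as zero, so they do not drag the average down. Prices with cents would need a different parser, since this one drops the decimal separator.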