Created August 11, 2023 17:43
🧙 Scrape all topics from the famous French forum — 'scraping' category only!
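Scraping "all topics" of a category comes down to two steps: request successive pages of its JSON topic list, and stop at the first page that comes back empty. The `Discourse-*` request headers in the script point to a Discourse forum, and the script reads each page as `{"topic_list": {"topics": [...]}}`.

Below is a minimal offline sketch of that loop, under stated assumptions:
- `fetch_all_topics`, `fake_fetch` and `FAKE_PAGES` are hypothetical names.
- The stub stands in for the real HTTP request.
- Zero-based page numbers are assumed.

```python
import random
import time
from typing import Callable, Dict, List


def fetch_all_topics(fetch_page: Callable[[int], dict],
                     fields: List[str],
                     max_delay: float = 0.0) -> List[Dict]:
    """Collect topics page by page until a page with no topics comes back."""
    rows: List[Dict] = []
    page = 0  # assumption: zero-based page numbers, as in Discourse topic lists
    while True:
        topics = fetch_page(page)['topic_list']['topics']
        if not topics:  # an empty page means we went past the last one
            break
        for topic in topics:
            # keep only the wanted keys; absent keys become None
            rows.append({k: topic.get(k) for k in fields})
        page += 1
        if max_delay > 0:
            time.sleep(random.uniform(0, max_delay))  # optional polite pause
    return rows


# Hypothetical in-memory forum standing in for the HTTP endpoint.
FAKE_PAGES = {
    0: [{'id': 1, 'title': 'First topic', 'views': 10},
        {'id': 2, 'title': 'Second topic'}],
    1: [{'id': 3, 'title': 'Third topic', 'views': 7}],
}


def fake_fetch(page: int) -> dict:
    """Mimic the JSON shape the script reads: {'topic_list': {'topics': [...]}}."""
    return {'topic_list': {'topics': FAKE_PAGES.get(page, [])}}


rows = fetch_all_topics(fake_fetch, ['id', 'title', 'views'])
print(rows)
```

Passing `max_delay` greater than zero adds a random pause between pages. This is a courteous default once the stub is swapped for real requests.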
""" Forum Scraper
This script scrapes data from the forum, specifically from the "Scraping" category.
It retrieves information about the forum's topics and saves it to a CSV file.

1. Install the required library with the following command:
$ pip install requests
2. Run this script with the following command:
$ python
Note: Make sure you have Python installed on your system.

Author: Sasha Bouloudnine
Date: 11/08/2023

Required Library:
- requests: Used for making HTTP requests to the forum API.
"""
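import csv
import random
import time

import requests

# Topic keys copied into the CSV. Hypothetical selection (assumption): these are
# all standard keys of an entry in a Discourse "topic_list".
FIELDNAMES = [
    'id', 'title', 'fancy_title', 'slug', 'posts_count', 'reply_count',
    'views', 'like_count', 'created_at', 'last_posted_at',
]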
def scrap_growthhackingforum():
    """Return every topic of the category as a list of dicts limited to FIELDNAMES."""
    # Reference only (the variable is not used below): the request in cURL form,
    # followed by an excerpt of the JSON response. The sample title means
    # "Scraper for Paris lawyers' email addresses"; the "description" values are
    # the French labels for "Original Poster", "Frequent Poster" and
    # "Most Recent Poster".
    CURL = r"""curl '' \
  -H 'sec-ch-ua: "Not.A/Brand";v="8", "Chromium";v="114", "Google Chrome";v="114"' \
  -H 'Discourse-Present: true' \
  -H 'X-CSRF-Token: GE3UrIV9vAoQodpEWcjAnl-zDWKL7XfLD4NrTqvZBiU4XqqFAf2s9-a3e0HFTh9c4Vsu_G9B5uHTAJbQZ4ymTw' \
  -H 'sec-ch-ua-mobile: ?0' \
  -H 'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/ Safari/537.36' \
  -H 'Discourse-Logged-In: true' \
  -H 'Accept: application/json, text/javascript, */*; q=0.01' \
  -H 'Referer:' \
  -H 'X-Requested-With: XMLHttpRequest' \
  -H 'sec-ch-ua-platform: "macOS"'

  "title":"Scrapper adresse mail avocats Paris ",
  "fancy_title":"Scrapper adresse mail avocats Paris ",
  "description":"Créateur du sujet",
  "description":"Auteur fréquent",
  "description":"Auteur le plus récent",
"""
    s = requests.Session()
    DATA = []
    # Same headers as the cURL reference; they do not change between pages.
    headers = {
        'sec-ch-ua': '"Not.A/Brand";v="8", "Chromium";v="114", "Google Chrome";v="114"',
        'Discourse-Present': 'true',
        'X-CSRF-Token': 'GE3UrIV9vAoQodpEWcjAnl-zDWKL7XfLD4NrTqvZBiU4XqqFAf2s9-a3e0HFTh9c4Vsu_G9B5uHTAJbQZ4ymTw',
        'sec-ch-ua-mobile': '?0',
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/ Safari/537.36',
        'Discourse-Logged-In': 'true',
        'Accept': 'application/json, text/javascript, */*; q=0.01',
        'Referer': '',
        'X-Requested-With': 'XMLHttpRequest',
        'sec-ch-ua-platform': '"macOS"',
    }
    # Discourse page numbers are zero-based: page=0 is the first page.
    page = 0
    while True:
        params = {
            'ascending': 'false',
            'page': page,
        }
        print('> accessing page %s' % page)
        # Reuse the session's connection for every page.
        response = s.get('', params=params, headers=headers)
        assert response.status_code == 200
        j = response.json()
        topics = j['topic_list']['topics']
        # An empty page means we have gone past the last one.
        if not topics:
            break
        for t in topics:
            # Keep only the CSV columns; a missing key becomes None.
            d = {k: t.get(k) for k in FIELDNAMES}
            DATA.append(d)
        page += 1
        # Short random pause between pages to go easy on the server.
        time.sleep(random.uniform(1, 3))
    return DATA
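def write_data(DATA):
    print('> writing data')
    TIMESTAMP = str(time.time()).replace('.', '')
    # newline='' is what the csv module expects; utf-8 keeps French accents intact.
    with open('results_growthhackingscraping_%s.csv' % TIMESTAMP, 'w',
              newline='', encoding='utf-8') as f:
        writer = csv.DictWriter(f, fieldnames=FIELDNAMES)
        writer.writeheader()
        for d in DATA:
            writer.writerow(d)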
if __name__ == '__main__':
    DATA = scrap_growthhackingforum()
    assert DATA
    write_data(DATA)
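`write_data` hands each row to `csv.DictWriter`, which behaves as follows:
- The header row comes from `fieldnames`.
- By default (`extrasaction='raise'`), any key not listed there raises `ValueError`. That is one reason to trim each topic to `FIELDNAMES` before writing.

The short sketch below writes two made-up rows to an in-memory buffer instead of a file, so the result is visible, including the quoting of a value that contains a comma. The rows and the `FIELDS` list are illustrative only.

```python
import csv
import io

# Made-up rows shaped like the script's output, limited to two columns.
FIELDS = ['id', 'title']
rows = [
    {'id': 1, 'title': 'Scraping lawyer emails'},
    {'id': 2, 'title': 'Proxies, captchas and rate limits'},
]

buf = io.StringIO()  # in-memory stand-in for the CSV file
writer = csv.DictWriter(buf, fieldnames=FIELDS)
writer.writeheader()    # header row built from fieldnames
writer.writerows(rows)  # one row per dict; the comma in row 2 triggers quoting
csv_text = buf.getvalue()
print(csv_text)

# With the default extrasaction='raise', a key outside fieldnames is rejected.
rejected = False
try:
    writer.writerow({'id': 3, 'title': 'Extra column', 'views': 12})
except ValueError:
    rejected = True
print('row with an extra key rejected:', rejected)
```

Passing `extrasaction='ignore'` to `DictWriter` would silently drop unknown keys instead. The explicit filtering in the script makes the chosen columns visible in one place.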