Last active Mar 23, 2022
Python: get FOSS4G 2022 abstracts into a JSON file

This simple script scrapes the FOSS4G Community Review pages and converts the abstracts into a JSON file that contains, for each abstract:

  • Page number
  • Title
  • Abstract in HTML
  • Your score if it exists
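As a rough illustration of the list above, one record in the output might look like the sketch below. The four keys (`page`, `title`, `html`, `score`) are the ones the script sets; the values are made up for the example and do not come from any real abstract.

```python
import json

# Made-up sample values; only the four keys come from the script.
example_record = {
    "page": 1,
    "title": "An example talk title",
    "html": "<p>Example abstract text.</p>",
    "score": None,  # becomes JSON null when no score has been given
}

# Same json.dumps options the script uses for its output.
print(json.dumps(example_record, sort_keys=True, indent=4))
```

With `sort_keys=True` the keys are written in alphabetical order, so `html` comes first and `title` last.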

Requirements: Python 3, BeautifulSoup, and Requests

It expects a FOSS4G_ID environment variable holding the variable part of the URL that you get when you sign in for the community review: {FOSS4G_ID}?.

# -*- coding: utf-8 -*-
import json
import os
import sys

import requests
from bs4 import BeautifulSoup


def is_checked(tag):
    # True for any element that carries a `checked` attribute (the selected score).
    return tag.has_attr('checked')


def processPage(page):
    # Fetch one review page and turn each submission card into a dict.
    r = requests.get(url, params={'page': page})
    soup = BeautifulSoup(r.text, 'html.parser')
    cards = soup.find_all('div', class_="submission-card")
    abstracts = []
    for card in cards:
        abstract = {'page': page}
        abstract['title'] = card.find('h3').text
        # Keep the abstract's child nodes as HTML, skipping bare newlines.
        html_tags = filter(lambda x: x != '\n', card.find('div', class_='card-text').children)
        abstract['html'] = ''.join(map(str, html_tags))
        checked = card.find_all(is_checked)
        abstract['score'] = int(checked[0]['value']) if len(checked) == 1 else None
        abstracts.append(abstract)
    return abstracts


if 'FOSS4G_ID' not in os.environ:
    print('FOSS4G_ID environment variable not found', file=sys.stderr)
    sys.exit(1)

FOSS4G_ID = os.environ['FOSS4G_ID']
# NOTE: the review URL is missing from this copy of the script. Replace the
# placeholder with the community review address that ends in your FOSS4G_ID.
url = 'REVIEW_URL_GOES_HERE' + FOSS4G_ID

try:
    # One list of abstracts per review page (pages 1 to 20).
    abstracts = [processPage(page) for page in range(1, 21)]
    print(json.dumps(abstracts, sort_keys=True, indent=4))
except Exception:
    print("Unexpected error:", sys.exc_info()[0], file=sys.stderr)
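
Once the output has been saved, it can be post-processed with the standard library alone. The sketch below lists the abstracts that still have no score. It assumes the output keeps the shape the script prints, a list of pages where each page is a list of abstract records; `unscored_titles` is a hypothetical helper name and the sample data is made up.

```python
import json


def unscored_titles(pages):
    """Return (page, title) pairs for abstracts whose score is null."""
    return [
        (abstract["page"], abstract["title"])
        for page in pages
        for abstract in page
        if abstract["score"] is None
    ]


# Made-up sample in the assumed shape: a list of pages, each a list of records.
sample_output = json.loads("""
[
    [
        {"html": "<p>First.</p>", "page": 1, "score": 2, "title": "First talk"},
        {"html": "<p>Second.</p>", "page": 1, "score": null, "title": "Second talk"}
    ],
    [
        {"html": "<p>Third.</p>", "page": 2, "score": null, "title": "Third talk"}
    ]
]
""")

print(unscored_titles(sample_output))
```

To use it on real output, replace the inline sample with `json.load` on whatever file the script's standard output was redirected to.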