@realeroberto
Created February 10, 2021 23:19
Generates a wordcloud from the most recent version of the Italian National Recovery and Resilience Plan (PNRR)
#!/bin/sh
# pnrr2wordcloud
#
# Generates a wordcloud from the most recent version of the Italian National
# Recovery and Resilience Plan (Piano Nazionale di Ripresa e Resilienza - PNRR)
#
# Author: Roberto Reale <roberto@reale.me>
# Last updated: 2021-02-11
PNRR=http://www.governo.it/sites/new.governo.it/files/PNRR_2021.pdf
STOPWORDS=https://raw.githubusercontent.com/stopwords-iso/stopwords-it/master/stopwords-it.txt
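# Download stopwords for Italian (-f fails on HTTP errors instead of saving
# an error page as the stopword list; -L follows redirects)
curl -fsSL -o stopwords "$STOPWORDS" || exit 1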
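# 1) Download the PNRR (-L follows redirects, e.g. from http to https)
# 2) PDF -> text (pdftotext, from poppler-utils, reads stdin and writes stdout)
# 3) Generate the wordcloud (wordcloud_cli, from the Python wordcloud package)
curl -fsSL "$PNRR" \
    | pdftotext - - \
    | wordcloud_cli --stopwords stopwords \
        --imagefile pnrr-wordcloud.jpg \
        --background black \
        --width 1080 --height 1350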
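
Before rendering the image, it can help to check which words will dominate it. The following is a minimal sketch, not part of the original script: `top_words` is a hypothetical helper that approximates the counting step with standard POSIX tools, assuming a stopword file with one word per line (as in the stopwords-iso list). It does not reproduce the tokenization that wordcloud_cli applies internally, so the counts are only indicative.

```shell
#!/bin/sh
# top_words STOPFILE [N]  (hypothetical helper, for illustration only)
# Reads text on stdin and prints the N most frequent words (default 20)
# as "count word", skipping every word listed in STOPFILE.
# Limitation: only ASCII letters are lowercased and only ASCII punctuation
# splits words, so typographic quotes stay attached to their neighbours.
top_words() {
    stopfile=$1
    count=${2:-20}
    tr '[:upper:]' '[:lower:]' \
        | tr -s '[:space:][:punct:][:digit:]' '[\n*]' \
        | sed '/^$/d' \
        | grep -vxFf "$stopfile" \
        | sort \
        | uniq -c \
        | awk '{ print $1, $2 }' \
        | sort -k1,1nr -k2,2 \
        | head -n "$count"
}
```

With the stopword list already downloaded, the helper could replace the last stage of the pipeline, for example `curl -fsSL "$PNRR" | pdftotext - - | top_words stopwords 20`.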