dannguyen/fetch-and-extract-facebook-adpdfs.md

## fetch-and-extract-facebook-adpdfs.md

      
### Context

Facebook has been under scrutiny because of how its ad platform may have been used by foreign actors during the 2016 election. In May 2018, Facebook released ad data to a House committee, which subsequently published the data online.

> As part of that continuing effort to educate the public and seek additional analysis, the Committee Minority is making available all IRA advertisements identified by Facebook. This is an effort to be fully transparent with the public, allow outside experts to analyze the data, and provide the American people a fuller accounting of Russian efforts to sow discord and interfere in our democracy.

You can read more about the events here:
https://democrats-intelligence.house.gov/facebook-ads/
The landing page with a list of zip files is here:
https://democrats-intelligence.house.gov/facebook-ads/social-media-advertisements.htm
Some folks have released their versions of the parsed data, such as data.world: https://data.world/scottcame/us-house-psci-social-media-ads
### Fetching code

The following snippets run in Bash:
#### Downloading the zip files

There are only about a dozen files right now, so pointing and clicking to download them may feel easy enough. It's still easier to just use wget.
The following script downloads the zip files into the current working directory:
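```sh
# Follow links one level deep from the landing page and keep only .zip files.
# The pattern is quoted so the shell does not expand it against local files.
wget --recursive \
     --level 1 \
     --no-directories \
     --accept '*.zip' \
     https://democrats-intelligence.house.gov/facebook-ads/social-media-advertisements.htm
```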
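
To preview which files the recursive fetch is likely to pick up, the link selection can be approximated by hand. This is only a rough sketch and not how wget works internally: `list_zip_links` is a hypothetical helper, and it assumes the landing page writes its links as double-quoted `href` attributes.

```sh
# Hypothetical helper: read HTML on stdin and print each unique link target
# that ends in .zip (case-insensitive). Assumes double-quoted href attributes;
# relative links are printed as written, not resolved against the page URL.
list_zip_links() {
  grep -oiE 'href="[^"]+\.zip"' \
    | sed -E 's/^[Hh][Rr][Ee][Ff]="//; s/"$//' \
    | sort -u
}
```

Piping the landing page into it, for example with `curl -s <landing page URL> | list_zip_links`, should print one zip link per line, which makes it easy to compare against what wget actually saved.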
#### Unzipping the zip files

The following snippet creates a pdfs subdirectory relative to the current working directory and unzips the contents of the zip files into ./pdfs:
```sh
find ./*.zip -type f -exec unzip {} -d pdfs \;
```
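
After extraction, a quick count of the PDFs is a cheap sanity check. The following is a minimal sketch: `count_pdfs` is a hypothetical helper, and it assumes the archives may unpack into nested folders, so it searches recursively and ignores case in the file extension.

```sh
# Hypothetical helper: count PDF files under a directory (default: ./pdfs).
# Searches recursively and matches .pdf case-insensitively.
# Note: a filename containing a newline would be counted more than once.
count_pdfs() {
  find "${1:-pdfs}" -type f -iname '*.pdf' | wc -l | tr -d ' '
}
```

Running `count_pdfs pdfs` before and after adding a newly published zip file shows how many ads that file contributed.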