@dannguyen
Created May 11, 2018 21:48
Some quick shell snippets for fetching and unzipping the Facebook-Congress-Russia data files

Context

Facebook has been under scrutiny because of how its ad platform may have been used by foreign actors during the 2016 election. In May 2018, Facebook released ad data to a House committee, which subsequently published the data online.

"As part of that continuing effort to educate the public and seek additional analysis, the Committee Minority is making available all IRA advertisements identified by Facebook. This is an effort to be fully transparent with the public, allow outside experts to analyze the data, and provide the American people a fuller accounting of Russian efforts to sow discord and interfere in our democracy."

You can read more about the events here:

https://democrats-intelligence.house.gov/facebook-ads/

The landing page with a list of zip files is here:

https://democrats-intelligence.house.gov/facebook-ads/social-media-advertisements.htm

Some folks have released their versions of the parsed data, such as data.world: https://data.world/scottcame/us-house-psci-social-media-ads

Fetching code

The following snippets run in Bash:

Downloading the zip files

There are only about a dozen files right now, so pointing and clicking to download them probably feels easy enough. But it's still easier to just use wget.

The following script downloads the zip files into the current working directory:

# Quote the pattern so the shell doesn't expand it against zip files already in this directory
wget --recursive \
     --level 1 \
     --no-directories \
     --accept '*.zip' \
     https://democrats-intelligence.house.gov/facebook-ads/social-media-advertisements.htm
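
Before unzipping, it can be worth confirming that every download arrived intact. The following is a minimal sketch, assuming unzip is installed; check_zips is a made-up helper name (not part of wget or unzip) that runs unzip's built-in integrity test on each zip file in a directory and reports the ones that fail:

```shell
# Hypothetical helper: test every .zip in a directory (default: the current
# one) with `unzip -t` and report which archives pass or fail.
check_zips() {
  dir="${1:-.}"
  bad=0
  for f in "$dir"/*.zip; do
    # When nothing matches, the glob stays literal; skip that case.
    [ -e "$f" ] || continue
    if unzip -tq "$f" >/dev/null 2>&1; then
      echo "ok: $f"
    else
      echo "CORRUPT: $f"
      bad=$((bad + 1))
    fi
  done
  # Succeed only if no archive failed the test.
  [ "$bad" -eq 0 ]
}

# Usage: check the zip files in the current working directory.
check_zips .
```

Any file reported as CORRUPT is probably a truncated or failed download and can be fetched again.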

Unzipping the zip files

The following snippet creates a pdfs subdirectory relative to the current working directory and unzips the contents of the zip files into ./pdfs:

find ./*.zip -type f -exec unzip {} -d pdfs \;
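
One thing to watch for: the command above unzips every archive into the same ./pdfs directory, so if two archives happen to contain a file with the same name, unzip will stop and ask whether to overwrite it. Whether that actually occurs with these archives is not something this sketch assumes either way. As a hedged alternative, the following extracts each archive into its own subdirectory named after the zip file; unzip_each is a hypothetical helper name introduced only for illustration:

```shell
# Hypothetical helper: extract each ./*.zip into <dest>/<archive-name>/
# (dest defaults to "pdfs") so identically named files cannot collide.
unzip_each() {
  dest="${1:-pdfs}"
  for z in ./*.zip; do
    # When nothing matches, the glob stays literal; skip that case.
    [ -e "$z" ] || continue
    name=$(basename "$z" .zip)
    # Create the nested target directory up front rather than relying on unzip.
    mkdir -p "$dest/$name" || return 1
    unzip -q "$z" -d "$dest/$name" || return 1
  done
}

# Usage: extract everything under ./pdfs, one subdirectory per archive.
unzip_each pdfs
```

The trade-off is a deeper directory tree: the files end up in ./pdfs/<archive-name>/ instead of all sitting directly in ./pdfs.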