@hancush
Last active December 21, 2018 17:23
🤖 Make some data

Your mission

Write a Makefile that gets data from Data.gov and creates a CSV of the 50 most recent datasets uploaded. For each dataset, the output should include:

  • The name of the dataset
  • The date it was uploaded
  • A download link

Then, also using Data.gov, create a summary CSV showing how many datasets are in each topic category (Environment, Education, Transportation, etc.).

The rules

  • You should be able to run a single command to produce both CSV files.
  • You should be able to accomplish this task using only Make and command-line utilities, namely csvkit. (See the csvkit tutorial.)

Some hints

  1. Data.gov has an API from which you can fetch information about its datasets.
  2. in2csv can convert more than Excel files to CSV.
  3. It is very helpful to write down the steps before you start writing recipes, e.g., "Get the data from the API", ..., "Write the CSV." Each recipe should accomplish one step.
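
To make the "one step per recipe" idea concrete, here is a rough sketch of how such a Makefile could be laid out. It is one possible shape under several assumptions, not a reference solution. The file names are made up. The endpoint assumes Data.gov's catalog still exposes a CKAN-style `package_search` action at the URL shown. The JSON paths assume the standard CKAN response layout, with topic categories exposed as CKAN "groups". `jq` is brought in as an extra command-line utility to pull the nested list out of the response before `in2csv` converts it. Check each assumption against the live API before relying on it.

```make
# Sketch only: the endpoint, file names, and JSON paths below are assumptions.
API = https://catalog.data.gov/api/3/action/package_search

# Remove a half-written target if its recipe fails.
.DELETE_ON_ERROR:

.PHONY: all clean

# Running "make" with no arguments builds both CSV files.
all: recent_datasets.csv datasets_by_topic.csv

# Step 1: get the 50 newest dataset records from the API.
recent.json:
	curl -sfG "$(API)" --data-urlencode "rows=50" --data-urlencode "sort=metadata_created desc" -o $@

# Step 2: keep the name, upload date, and first resource link, then write the CSV.
recent_datasets.csv: recent.json
	jq '[.result.results[] | {name: .title, uploaded: .metadata_created, download: (.resources[0].url // "")}]' $< | in2csv -f json > $@

# Step 3: ask the API for dataset counts per group, returning no dataset records.
topics.json:
	curl -sfG "$(API)" --data-urlencode "rows=0" --data-urlencode 'facet.field=["groups"]' -o $@

# Step 4: turn the facet counts into a two-column summary CSV.
datasets_by_topic.csv: topics.json
	jq '[.result.search_facets.groups.items[] | {topic: .display_name, datasets: .count}]' $< | in2csv -f json > $@

clean:
	rm -f recent.json topics.json recent_datasets.csv datasets_by_topic.csv
```

Each recipe line must start with a tab character, not spaces. Because every output file is its own target, Make re-runs only the steps whose inputs are missing or out of date, and `make clean` followed by `make` rebuilds everything from a fresh API call.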

Reading