Last active December 15, 2015 14:59
Accompanying outline for Eric Mill's talk at the Arizona Association of Law Libraries' Congressional Information Symposium.

Scout is a search engine and alert system for government information.

Scout takes a very broad approach, allowing simultaneous search over documents spanning different branches and levels of government.

  • Covers the bills and speeches of Congress, bills in all 50 US states, federal regulations, and other government documents.
  • Provides email alerts for saved searches, and for any activity on specific bills you follow.
  • URL:

It has some special abilities with legal citations.

Citations are the secret codes of legal documents. If you follow an issue, your research is incomplete without them.

  • A search for "5 USC 601" will turn up results that match "section 601 of title 5", and vice versa.
  • Allows for missing periods, section symbol (§), subsections, etc.
  • Crucial for tracking activity related to a particular area of the law.
  • Requires specialized pre-processing of every document.
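The matching described above can be sketched in a few lines. This is a minimal illustration, not Scout's actual implementation: the regular expressions, the `usc/<title>/<section>` key format, and the function names `find_citations` and `matches` are all assumptions made up for this example. It also assumes that "section N of title M" always refers to the US Code, which a real pre-processor could not take for granted.

```python
import re

# Compact form, e.g. "5 USC 601", "5 U.S.C. 601", "5 U.S.C. § 601(a)".
COMPACT = re.compile(
    r"\b(?P<title>\d+)\s+U\.?\s?S\.?\s?C\.?\s*(?:§+\s*)?"
    r"(?P<section>\d+[A-Za-z0-9-]*)(?P<subs>(?:\([A-Za-z0-9]+\))*)",
    re.IGNORECASE,
)

# Prose form, e.g. "section 601 of title 5" or "section 601(a) of title 5".
PROSE = re.compile(
    r"\bsection\s+(?P<section>\d+[A-Za-z0-9-]*)(?P<subs>(?:\([A-Za-z0-9]+\))*)"
    r"\s+of\s+title\s+(?P<title>\d+)\b",
    re.IGNORECASE,
)


def find_citations(text):
    """Return the set of normalized keys (e.g. 'usc/5/601') found in text.

    Subsections are parsed but deliberately left out of the key, so a
    search for a section also finds mentions of its subsections.
    """
    keys = set()
    for pattern in (COMPACT, PROSE):
        for m in pattern.finditer(text):
            keys.add(f"usc/{m.group('title')}/{m.group('section').lower()}")
    return keys


def matches(query, document):
    """True if the query and the document share at least one citation."""
    return bool(find_citations(query) & find_citations(document))
```

Running `find_citations` over every document at indexing time, and over each query at search time, is one way to make `matches("5 USC 601", "as defined in section 601 of title 5")` succeed even though the two strings share almost no text.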

Scout is possible because the documents it searches are publicly available, for free.

Scout isn't an original source for anything: it sits atop the work that people in and out of government have done to provide reliable and free public data.

  • The text of Congress' bills is published freely as PDF, XML, and plain text by the Government Printing Office, as part of its FDSys program.
  • Congress' speeches are also published freely in GPO's FDSys as plain text, and republished as data by the Sunlight Foundation's Capitol Words project.
  • State legislation is retrieved from official online sources in all 50 states, as part of the Sunlight Foundation's Open States project, standardized, and republished freely as data.
  • Federal regulations and notices for the entire executive branch are made freely available by in a wide variety of forms.

That "for free" part is important.

If the US Congress or the Federal Register charged by the page, or banned automatic downloading of the information - like PACER does - then the only third-party services for searching through or analyzing government information would, by necessity, be quite expensive.

  • Systems like Scout, GovTrack, and CourtListener require bulk access to every document in a collection.
  • There's no such thing as a small fee for documents when answering questions requires thousands or even millions of them.

And it's not always obvious to everyone.

There is a common misconception that the only people who need legal data like this have a budget to pay for it. That misconception leads us to make hobbling decisions that favor those who do.

  • The National Academy of Public Administration recommended in a report to Congress that GPO seriously consider charging user fees for downloading its documents. This includes the text of all bills, laws, regulations, and a growing number of court opinions.
  • Until recently, the District of Columbia's own laws were owned and provided by a private contractor. DC had no means to guarantee their availability, and you had no right to copy their contents, due to the contractor's license agreement.
  • Some of the US' most important laws are behind a paywall - many public safety standards are only incorporated into public documents "by reference". To read them means paying hundreds or thousands of dollars for a copy.

Civic empowerment begins with a shared commons of data.

Information must be made available for complete reuse, without gateway, paywall, or permission.

  • Legal information belongs to the public, not to a government agency.
  • Government services can't provide every feature or analysis that people can conceive of.
  • Josh Tauberer, Derek Willis, myself, and others work on core tools and data for Congress and government at
  • Carl Malamud publishes reams of information at

The future of Congressional information is non-Congressional information.

Real issues rarely play out in Congress alone, and public input and attention are valuable across the government. Yet few people, even experienced public interest professionals, understand how the government or its information is organized well enough to look everywhere they need to.

  • Government in the US is a federated collection of independent beasts, and they (naturally) see their information in terms of the silos that it is collected in.
  • Civil society and the public tend to (naturally) focus heavily on Congress, the most directly responsive and melodramatic branch of government.
  • Any citizen or advocacy group that followed a bill closely in Congress should be on that bill as it becomes law, is put into practice, and gets molded by the courts.
  • The private sector does a great job of following its interests wherever and whenever they're affected. The public sector could be doing better.

The open government community has a responsibility to defend and expand our access to free public information, and to make it possible for anyone to take advantage of the power that access confers. Scout is one of the most dirt simple ways of bringing disparate information together. Let's make some others.

Other free search and alert services

  1. GovTrack - Bills in Congress, and the 50 states.
  2. CourtListener - Federal and state court opinions.
  3. Open States - State legislative activity in every state.
  4. Federal Register - The official source for US executive branch activity.
  5. They Work For You - The activity of the UK parliament.

Further reading

  1. Keeping GPO's Data Free
  2. The DC Code's transition from bad situation to new beginning
  3. Breaking the Law, by Reading It
  4. Recommendations to [Congress'] Bulk Data Task Force
  5. Government: Do You Really Need an API?

You can also find all the links that were shown at the talk.
