Final report for Google Season of Docs 2020

Improving Data Commons getting started documentation: final project report

Data Commons aims to simplify data science by linking data from a variety of sources into a single knowledge graph, reducing the data cleaning needed before modeling. The purpose of this project was to make the knowledge graph more accessible to end users and to make it easier for developers to contribute.

Planning: Initial objectives & changes to those objectives

In my proposal, I initially noted the following pain points in the Data Commons documentation:

  • The directions for adding data sets in the 'Get Involved' section were short and unclear.
  • The tutorials section only offered Python notebooks, with no reference to other Data Commons API wrappers.
  • At the time of application, Data Commons did not offer any tools or visualizations built using its knowledge graph.

The first two points were retained as goals, with some changes. We rewrote the dataset contribution guidelines, adding information on how to contribute datasets to the knowledge graph. We also added tutorial material for the API's Sheets wrapper. This was a change from the initial plan to focus on the knowledge graph's R wrapper, made as the team re-focused engineering resources.

With the release of Data Commons' Place Explorer tool, the initial third goal, building a sample application with the API, was rendered moot. We therefore pivoted the project to restructuring the endpoint documentation as a whole, providing examples for every endpoint across all the API wrappers.

Phase I: Rewriting documentation for aspiring Data Commons contributors

Our first documentation update was to the dataset contribution pages. We created the following pull requests to address this need:

NOTE: These PRs no longer correspond to what is published on the main Data Commons documentation site. After Google incorporated the API into search results for demographic information, the product team wanted to temporarily scale back community dataset contributions.

Phase II: Updating and restructuring documentation for the Data Commons API and its wrappers

Next, we updated and restructured the examples and information for all endpoints, methods, and formulae provided by Data Commons through its REST API and through its wrappers for Python and Sheets. We made these pull requests:

REST API PRs

Python wrapper PRs

Pandas wrapper PRs

Sheets wrapper PRs

I also created the general cleanup PR https://github.com/datacommonsorg/docsite/pull/154.

The results of these PRs can be seen on the Data Commons main documentation site in the API section.

In this phase, I tried a couple of different approaches to re-formatting the API docs using Swagger and Redoc.ly. We eventually decided to move away from these approaches, since they didn't translate to a consistent design across the REST, Python, and Google Sheets docs. However, we were able to incorporate a new plugin for tabbing between different examples of REST endpoint usage, including sample JavaScript code presented using JSFiddle.

Phase III: Creating educational examples using the Sheets wrapper for the Data Commons API

In the final months of the season, we worked on creating tutorial material for the Google Sheets wrapper for Data Commons. Here are the PRs created in connection with that:

And here are the links to the final tutorials:

Other PRs along the way

In addition, I contributed entirely new content to the Data Commons documentation by writing the glossary (https://docs.datacommons.org/glossary.html) and style guide (https://github.com/datacommonsorg/docsite/blob/master/STYLE_GUIDE.md). The style guide in particular brought industry standards to the project documentation, creating a foundation for Data Commons' future approach to documentation.

PRs associated with this (as well as other fixes):

Conclusion

Google Season of Docs was an incredible opportunity to grow my technical writing abilities. I saw noticeable improvement in my ability to create documentation that uses both formal and informal styles to communicate challenging technical concepts effectively. I also learned about the costs and benefits of deviating from established formats when I tried to move away from our existing patterns for API documentation and towards new approaches using Redoc.ly and Swagger. That attempt gave me technical experience and perspective on the power of Data Commons' relatively simple information design. Finally, my soft skills improved substantially over the course of the project, particularly in connecting with people who hold diverse perspectives and creating content meaningful to readers at all of their levels.
