Improving Data Commons getting started documentation: final project report
Data Commons aims to simplify data science by linking data from a variety of sources into a single knowledge graph, reducing the data cleaning needed before modeling. The purpose of this project was to make the knowledge graph more accessible to end users and easier for developers to contribute to.
Planning: Initial objectives & changes to those objectives
In my proposal, I initially noted the following pain points in the Data Commons documentation:
- The directions for adding data sets in the 'Get Involved' section were short and unclear.
- The tutorials section only offered Python notebooks, with no reference to other Data Commons API wrappers.
- At the time of application, Data Commons did not offer any tools or visualizations built using its knowledge graph.
The goals tied to the first two pain points were retained with some changes. We rewrote the dataset contribution guidelines, providing additional information on how to contribute datasets to the knowledge graph. We also added tutorial material for the API's Sheets wrapper. This was a change from the initial plan to focus on the knowledge graph's R wrapper, made as the team re-focused its engineering resources.
With the release of Data Commons' Place Explorer tool, the original third goal of building a sample application with the API was rendered moot. We therefore pivoted the project to restructure the endpoint documentation as a whole, providing examples for every endpoint across all the API wrappers.
Phase I: Rewriting documentation for aspiring Data Commons contributors
Our first documentation update was to the dataset contribution pages. We created the following pull requests to address this need:
NOTE: These PRs no longer correspond to what is published on the main Data Commons documentation site. After Google incorporated the API into search results for demographic information, the product team wanted to temporarily scale back community dataset contributions.
Phase II: Updating and restructuring documentation for the Data Commons API and its wrappers
Next, we updated and restructured the examples and information for all endpoints, methods, and formulae that Data Commons provides through its REST API and through its wrappers available for Python and Sheets. We made these pull requests:
REST API PRs
Python wrapper PRs
Pandas wrapper PRs
Sheets wrapper PRs
I also created the general cleanup PR https://github.com/datacommonsorg/docsite/pull/154.
The results of these PRs can be seen on the Data Commons main documentation site in the API section.
Phase III: Creating educational examples using the Sheets wrapper for the Data Commons API
In the final months of the season, we worked on creating tutorial material for the Google Sheets wrapper for Data Commons. Here are the PRs created in connection with that:
And here are the links to the final tutorials:
Other PRs along the way
In addition, I brought entirely new content to the Data Commons documentation by writing the glossary (https://docs.datacommons.org/glossary.html) and style guide (https://github.com/datacommonsorg/docsite/blob/master/STYLE_GUIDE.md). The style guide in particular brought industry standards to the project documentation, creating a foundation for Data Commons' future approach to project documentation.
PRs associated with this (as well as other fixes):
Google Season of Docs was an incredible opportunity to grow my technical writing abilities. I saw noticeable improvement in my ability to create documentation that used both formal and informal style protocols to communicate challenging technical concepts effectively. I also learned about the costs and benefits of deviating from established formats when I tried to move away from our existing patterns for API documentation and towards new approaches using Redoc.ly and Swagger. That attempt gave me technical experience and a new perspective on the power of Data Commons' relatively simple information design. Finally, my soft skills improved substantially over the course of the project, particularly my ability to connect with people holding diverse perspectives and to create content meaningful at each of their levels.