

@jvfe
Last active August 22, 2021 20:57
GSoC 2021 Project Summary - R Community Explorer - Exploration of the R community on Twitter

R Community Explorer - Exploration of the R community on Twitter

Mentors: Ben Ubah, Rick Pack and Gergely Daroczi

Mentee: João Vitor F. Cavalcante

Main Project Repository: https://github.com/r-community/central

Commit log in main repository: https://github.com/r-community/central/commits?author=jvfe

GSoC project page: https://summerofcode.withgoogle.com/projects/#5764835679141888

Hi! In this gist I’ll be describing my journey in Google’s Summer of Code 2021 as a student developer for the R Project for Statistical Computing, during which I helped develop key aspects of the R Community Explorer (Source code) project. For ease of understanding, the journey is divided into two phases, even though some of the work in the two phases happened simultaneously.

Here is a summary of my contributions to the R Community Explorer project during this year’s GSoC:

| Subject | Pull Requests to r-community/central & Repositories |
| --- | --- |
| Adapting, updating and building dashboards | #1, #23, twitterdata repository, user-tweets repository |
| Bug fixes and minor improvements | #2, #3, #5, #6, #8, #14, #15, #19, #22 |

All pull requests sent to r-community/central can be browsed at the following URL: https://github.com/r-community/central/pulls?q=is%3Apr+is%3Aclosed+author%3Ajvfe.

1. Adapting and updating existing dashboards

During the first phase of my GSoC work, I helped adapt dashboards from the previous R Community Explorer into a new format. Instead of pure HTML and JavaScript, the new format uses an RMarkdown-Flexdashboard template to build the static websites; this template relies on centralized HTML fragments and stylesheets that make the dashboards cohesive and easier to maintain. Along with this, the datasets that power these dashboards were updated to include the most recent information. This work, added to the central repository by Pull Request #1, primarily covers the R User Groups dashboard (Source code), which showcases information retrieved from Meetup.com, and the R-GSoC dashboard (Source code), which displays information about previous Google Summer of Code projects from the R Project for Statistical Computing. The data updates and template development were done with significant help from Meet Bhatnagar, the other GSoC mentee for R Community Explorer (who also adapted and built other dashboards in the central repository), and from mentor Ben Ubah.

This posed some challenges, since I had to learn much more about how RMarkdown works under the hood and how to extend it with external HTML fragments. Still, I hope that this new RMarkdown-Flexdashboard based structure will make the R Community Explorer code more accessible to R users, and therefore make it easier for the project to attract new contributors and sponsors.

2. Building new dashboards

With the new project structure in place, we could begin building dashboards from data sources the previous project did not cover, such as the RStudio Community website (Source code) and Twitter (Source code), the latter being the main focus of my GSoC work.

Work on the Twitter dashboard started from a dataset, compiled by the mentors, covering an entire year of tweets that contain the “#rstats” hashtag. I used this data to build a dashboard with components such as timelines, counters, tables, embedded tweets and bar charts. Although much more information could be extracted from a dataset this extensive, I believe the dashboard highlights its key points, such as the most active users and the most commonly used hashtags.
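The two key points named here, the most active users and the most common hashtags, boil down to frequency counts over the tweet records. The following is a minimal sketch in Python (the dashboard itself is built with R), assuming each tweet is a dict with hypothetical `screen_name` and `text` keys; it illustrates the idea and is not the dashboard’s actual code.

```python
# Sketch only: find the most active accounts and most common hashtags.
# Assumption (hypothetical): each tweet is a dict with "screen_name" and "text".
import re
from collections import Counter

HASHTAG_RE = re.compile(r"#(\w+)")


def top_users(tweets, n=10):
    """Return the n accounts with the most tweets as (screen_name, count) pairs."""
    return Counter(t["screen_name"] for t in tweets).most_common(n)


def top_hashtags(tweets, n=10, exclude=("rstats",)):
    """Return the n most frequent hashtags (lower-cased), skipping `exclude`."""
    skip = {h.lower() for h in exclude}
    counts = Counter()
    for t in tweets:
        for tag in HASHTAG_RE.findall(t["text"]):
            tag = tag.lower()
            if tag not in skip:
                counts[tag] += 1
    return counts.most_common(n)


if __name__ == "__main__":
    sample = [
        {"screen_name": "alice", "text": "New package release! #rstats #DataViz"},
        {"screen_name": "bob", "text": "My entry this week #rstats #TidyTuesday"},
        {"screen_name": "alice", "text": "Plotting tips #rstats #dataviz"},
    ]
    print(top_users(sample))     # [('alice', 2), ('bob', 1)]
    print(top_hashtags(sample))  # [('dataviz', 2), ('tidytuesday', 1)]
```

Excluding #rstats by default is a design choice for this sketch: every tweet in the dataset carries that hashtag, so counting it would only place it at the top of every ranking.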

Building this dashboard posed several challenges, mostly due to the size of the compiled dataset: reading and processing it, a step required to compile the static website, was very inefficient. With the help of my mentors, I overcame these challenges by moving the data to a new repository, twitterdata, which uses packages that take advantage of lazy loading and data compression to read and update the dataset efficiently. There, the dataset is updated automatically every day by a custom GitHub Action I developed, and the Twitter dashboard reads the same dataset remotely. Following this improvement, I developed a workflow for the central repository that also updates the Twitter dashboard daily, so the project stays up to date and is easy to maintain. The Twitter dashboard was initially added by Pull Request #1 in r-community/central, and the update workflow was added by Pull Request #23.
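The daily update step can be pictured as merging freshly collected tweets into a compressed archive while skipping tweets that are already stored. The sketch below is in Python rather than the R used by twitterdata, and everything in it is an assumption for illustration: the archive is a gzip-compressed CSV file, each row has a unique `status_id` column, and the helper names are hypothetical. Streaming the archive row by row stands in for the lazy loading that the real packages provide.

```python
# Sketch only: merge new tweets into a gzip-compressed CSV archive.
# Assumptions (hypothetical, not taken from the twitterdata repository):
# each row has a unique "status_id" column, and new tweets arrive as
# dicts whose keys are a subset of `fieldnames`.
import csv
import gzip
import os
import tempfile


def iter_archive(path):
    """Lazily yield rows from the compressed archive; yield nothing if it is missing."""
    if not os.path.exists(path):
        return
    with gzip.open(path, "rt", newline="", encoding="utf-8") as fh:
        yield from csv.DictReader(fh)


def update_archive(path, new_rows, fieldnames):
    """Rewrite the archive with any unseen new rows appended; return the number added."""
    seen = set()
    added = 0
    # Write to a temporary file in the same directory, then swap it in, so a
    # failed run leaves the existing archive untouched.
    fd, tmp_path = tempfile.mkstemp(suffix=".csv.gz",
                                    dir=os.path.dirname(os.path.abspath(path)))
    os.close(fd)
    try:
        with gzip.open(tmp_path, "wt", newline="", encoding="utf-8") as out:
            writer = csv.DictWriter(out, fieldnames=fieldnames)
            writer.writeheader()
            for row in iter_archive(path):  # stream existing rows one at a time
                seen.add(row["status_id"])
                writer.writerow(row)
            for row in new_rows:  # keep only tweets not stored yet
                if row["status_id"] not in seen:
                    seen.add(row["status_id"])
                    writer.writerow(row)
                    added += 1
        os.replace(tmp_path, path)
    except BaseException:
        os.remove(tmp_path)
        raise
    return added


if __name__ == "__main__":
    fields = ["status_id", "screen_name", "text"]
    with tempfile.TemporaryDirectory() as workdir:
        archive = os.path.join(workdir, "rstats_tweets.csv.gz")
        day1 = [{"status_id": "1", "screen_name": "alice", "text": "#rstats tip"}]
        print(update_archive(archive, day1, fields))  # 1
        # The next run re-collects tweet 1 plus a new tweet; only tweet 2 is stored.
        day2 = day1 + [{"status_id": "2", "screen_name": "bob", "text": "#rstats news"}]
        print(update_archive(archive, day2, fields))  # 1
```

Writing to a temporary file and swapping it in with `os.replace` means a run that fails halfway leaves the previous archive intact, which matters when a scheduled job runs unattended every day.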

To further explore the R community on Twitter, I built a dashboard that followed the useR! 2021 conference as it took place. Kept up to date by custom workflows, it collected and showcased data on popular tweets and users; it proved a great success and might return for future useR! conferences. Finally, I built a simple template repository to help build these automated Twitter dashboards; its source code is open, and other R users can use it as well.
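Showcasing popular tweets for a conference amounts to filtering by the event hashtag and sorting by engagement. This Python sketch assumes tweets are dicts with hypothetical `status_id`, `text`, `favorite_count` and `retweet_count` keys, uses `#useR2021` as the conference hashtag, and scores popularity as likes plus retweets; the actual useR! 2021 dashboard may select and rank tweets differently.

```python
# Sketch only: rank conference tweets by a simple engagement score.
# Assumptions (hypothetical): the tweet dict keys below, "#useR2021" as the
# conference hashtag, and popularity measured as likes plus retweets.
import re


def mentions_hashtag(text, hashtag):
    """Return True if `text` contains `hashtag` as a complete tag, ignoring case."""
    tag = re.escape(hashtag.lstrip("#"))
    # \b stops "#useR2021" from matching inside a longer tag such as "#useR2021s".
    return re.search(r"#" + tag + r"\b", text, flags=re.IGNORECASE) is not None


def popularity(tweet):
    """Hypothetical engagement score: likes plus retweets."""
    return tweet["favorite_count"] + tweet["retweet_count"]


def top_tweets(tweets, hashtag="#useR2021", n=5):
    """Return up to n tweets carrying `hashtag`, most popular first."""
    tagged = [t for t in tweets if mentions_hashtag(t["text"], hashtag)]
    return sorted(tagged, key=popularity, reverse=True)[:n]


if __name__ == "__main__":
    tweets = [
        {"status_id": "1", "text": "Great keynote! #useR2021",
         "favorite_count": 10, "retweet_count": 2},
        {"status_id": "2", "text": "Unrelated #rstats post",
         "favorite_count": 99, "retweet_count": 9},
        {"status_id": "3", "text": "Slides are up #user2021 #rstats",
         "favorite_count": 30, "retweet_count": 5},
    ]
    print([t["status_id"] for t in top_tweets(tweets)])  # ['3', '1']
```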

Future goals

  • Expand the Twitter dashboard to cover more aspects of the dataset with other components, such as interactive ranking tables.
  • Establish automatic updates for the other websites in r-community/central.

Acknowledgements

I’d like to give many thanks to Google and the R Project for Statistical Computing for accepting me and providing such an amazing opportunity to learn new things and build maintainable open-source software. By the end, I feel that my skill set has expanded significantly and that I better understand how to solve problems in my software work. And, of course, many thanks to my mentors, in particular Ben Ubah, who accompanied me throughout the project and provided invaluable tips and advice.
