@aksh555
Last active October 9, 2023 13:26

P Akshara | @aksh555 | CHAOSS | GSoC Work Product

Project Abstract

Augur is a Flask-based prototyping web stack for CHAOSS metrics. It provides structured data mined from sources such as git repositories, mailing lists, and issue trackers, using a plugin architecture that incorporates other open-source metrics projects like Facade and FOSSology. Augur lets users track activity across the repositories they care about and compare their performance. The main goals of this project are to detect anomalies in various metrics, with a focus on text messages and discussion comments (on the basis of sentiment and novelty) and on developer activity in the open-source community; to notify community managers as early as possible; and to provide API endpoints for the required metrics.

Mentors: @sgoggins, @gabe-heim

UPDATE: Jupyter notebook links may be broken because the notebooks were removed once their code was integrated into the workers after review. Please refer to the respective workers for the latest code: Message Insights Worker & Pull Requests Analysis Worker


Community Bonding

Fine-tuned the proposal and decided the focus areas of the project.

Main Tasks

  • Analyze messages and comments of a repository to gauge the sentiment and novelty in an open-source organization

  • Identify anomalous pull requests — flag a pull request as anomalous based on its probability of getting accepted and merged

  • Build Gunicorn workers to perform these tasks within the Augur Flask application to get real-time insights

  • Send the insights to the Amazon Lex server, which forwards them as push notifications to Auggie, a Slack bot

Blog 1: Community Bonding


Coding Phase I

  • Text cleaning & preprocessing for analysis

  • Autoencoder-based approach for novelty detection using the cosine similarity metric (improved later with Otsu thresholding)

  • VADER & TextBlob for sentiment analysis (improved later with ideas from SentiCR & SentEmoji)

  • Modified SentiCR by including better preprocessing

  • Detailed Blogs: Blog 2, Blog 3

  • Jupyter notebooks: Sentiment analysis, Custom dataset
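
The novelty detection step listed above can be illustrated with a small sketch. It assumes each message has already been turned into a vector and passed through an autoencoder, so that an original vector and its reconstruction are available. A low cosine similarity between the two suggests the message is unlike what the model has seen, and Otsu's method picks the cutoff between "typical" and "novel" scores automatically. The function names (`cosine_similarity`, `otsu_threshold`, `flag_novel`) and the bin count are illustrative assumptions, not the code used in the Message Insights Worker.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def otsu_threshold(scores: Sequence[float], bins: int = 64) -> float:
    """Pick the cut that maximizes between-class variance of a score histogram."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return lo  # all scores identical: nothing to separate
    width = (hi - lo) / bins
    hist = [0] * bins
    for s in scores:
        hist[min(int((s - lo) / width), bins - 1)] += 1
    centers = [lo + (i + 0.5) * width for i in range(bins)]
    total = len(scores)
    sum_all = sum(c * h for c, h in zip(centers, hist))

    best_var, best_cut = -1.0, lo
    w0, sum0 = 0, 0.0
    for i in range(bins - 1):
        w0 += hist[i]
        sum0 += centers[i] * hist[i]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        mu0, mu1 = sum0 / w0, (sum_all - sum0) / w1
        between = w0 * w1 * (mu0 - mu1) ** 2
        if between > best_var:
            best_var, best_cut = between, lo + (i + 1) * width
    return best_cut


def flag_novel(originals: Sequence[Sequence[float]],
               reconstructions: Sequence[Sequence[float]]) -> List[bool]:
    """Mark a message as novel when its reconstruction similarity falls below the Otsu cut."""
    sims = [cosine_similarity(o, r) for o, r in zip(originals, reconstructions)]
    cut = otsu_threshold(sims)
    return [s < cut for s in sims]
```

A data-driven cutoff of this kind avoids hand-tuning a fixed similarity threshold per repository, which is one plausible reason for moving to Otsu thresholding.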


Coding Phase II


Coding Phase III

  • Collected dataset with pull requests, repository stats, discussions and creator features

  • Prediction of acceptance of an open GitHub pull request, fine-tuning

  • Created Pull Requests Analysis Worker to identify anomalous PRs

  • Set up insight communication with Auggie and a monthly cron job to handle different types of insights

  • Unit testing & documentation for both workers

  • Detailed Blogs: Blog 6, Blog 7

  • Jupyter Notebooks: PR analysis

  • Relevant PRs: PR analysis worker
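
As a rough illustration of flagging a pull request from its predicted acceptance probability, the sketch below scores a PR with a logistic function over a few features and flags those that fall below a cutoff. Everything specific here is an assumption made for the example: the `PullRequest` fields, the weights, the bias, the cutoff, and the choice to treat a low probability as anomalous. The actual worker uses a model trained on the collected dataset, and its features and decision rule may differ.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class PullRequest:
    # Hypothetical feature set, chosen only for illustration.
    number: int
    comment_count: int
    files_changed: int
    creator_merged_prs: int


# Hypothetical coefficients; a real model would learn these from data.
BIAS = 0.5
WEIGHTS = {
    "comment_count": -0.05,
    "files_changed": -0.02,
    "creator_merged_prs": 0.1,
}


def acceptance_probability(pr: PullRequest) -> float:
    """Logistic score in (0, 1) estimating how likely the PR is to be merged."""
    z = BIAS + sum(weight * getattr(pr, name) for name, weight in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def flag_anomalous(prs: List[PullRequest], cutoff: float = 0.3) -> List[int]:
    """Return the numbers of PRs whose predicted acceptance probability is below the cutoff."""
    return [pr.number for pr in prs if acceptance_probability(pr) < cutoff]
```

For example, `flag_anomalous([PullRequest(1, 2, 3, 20), PullRequest(2, 40, 120, 0)])` flags only the second PR under these made-up weights, since a large, heavily discussed change from a first-time contributor scores low.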
