
P Akshara | @aksh555 | CHAOSS | GSoC Work Product

Project Abstract

Augur is a Flask-based prototyping web stack for CHAOSS metrics. Using a plugin architecture that incorporates other open-source metrics projects such as Facade and FOSSology, it provides structured data mined from sources like git repositories, mailing lists and issue trackers. Augur lets users track activity across the repositories they care about and compare their performance. The main goals of this project are to detect anomalies in various open-source community metrics, notify community managers as early as possible, and provide API endpoints for the required metrics.

Mentors: @sgoggins, @gabe-heim

Community Bonding

Fine-tuned the proposal and decided on the project's focus areas.

Main Tasks

  • Analyze messages and comments of a repository to gauge the sentiment and novelty in an open-source organization

  • Identify anomalous pull requests: flag a pull request as anomalous based on its probability of being accepted and merged

  • Build Gunicorn workers to perform these tasks within the Augur Flask application to get real-time insights

  • Send the insights to an Amazon Lex server, to be forwarded as push notifications to Auggie, a Slack bot

Blog 1: Community Bonding

Coding Phase I

  • Text cleaning & preprocessing for analysis

  • Autoencoder-based approach for novelty detection using a cosine similarity metric (improved later with Otsu thresholding)

  • VADER & TextBlob for sentiment analysis (improved later with ideas from SentiCR & SentEmoji)

  • Modified SentiCR to include better preprocessing

  • Detailed Blogs: Blog 2, Blog 3

  • Jupyter notebooks: Sentiment analysis, Custom dataset
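The text cleaning step can be pictured with a small sketch. This is not the project's or SentiCR's actual preprocessing; it only assumes the kinds of steps commonly applied to code-review text (dropping code spans, URLs and @mentions, expanding contractions so negations survive). The contraction map and the function name `preprocess` are hypothetical.

```python
import re

# Hypothetical, abbreviated contraction map. Order matters: the specific
# forms ("can't", "won't") must be expanded before the generic "n't" suffix.
CONTRACTIONS = {
    "can't": "can not",
    "won't": "will not",
    "n't": " not",
    "'re": " are",
    "'ll": " will",
    "'ve": " have",
}

CODE_BLOCK = re.compile(r"`{3}.*?`{3}", re.DOTALL)  # fenced code in comments
INLINE_CODE = re.compile(r"`[^`]*`")                # inline code spans
URL = re.compile(r"https?://\S+")                   # links
MENTION = re.compile(r"@[\w-]+")                    # @username mentions


def preprocess(text: str) -> str:
    """Normalize a message or review comment before sentiment/novelty analysis."""
    # Strip code and links first: they carry little sentiment and confuse lexicons.
    for pattern in (CODE_BLOCK, INLINE_CODE, URL, MENTION):
        text = pattern.sub(" ", text)
    text = text.lower()
    # Expand contractions so negations such as "not" survive tokenization.
    for short, full in CONTRACTIONS.items():
        text = text.replace(short, full)
    # Drop remaining punctuation and collapse whitespace.
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    return " ".join(text.split())


print(preprocess("@aksh555 I can't run this: `make test` fails, see https://example.com/log"))
# → i can not run this fails see
```

Expanding contractions before removing punctuation matters for lexicon-based scorers: stripping the apostrophe first would leave "can t", losing the negation.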
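One plausible reading of the autoencoder novelty detector: a message the autoencoder reconstructs poorly (low cosine similarity between its input vector and the reconstruction) is unlike what the model has seen, so it is flagged as novel. Otsu thresholding then picks the cut-off automatically by splitting the similarity scores into two groups with maximal between-class variance, instead of using a hand-tuned constant. The sketch below assumes the message vectors and their autoencoder reconstructions are already available (the autoencoder itself is not shown), and the helper names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def otsu_threshold(scores: Sequence[float], bins: int = 64) -> float:
    """Otsu's method on 1-D scores: choose the cut that maximizes the
    between-class variance of the two resulting groups."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return lo
    width = (hi - lo) / bins
    hist = [0] * bins
    for s in scores:
        hist[min(int((s - lo) / width), bins - 1)] += 1
    centers = [lo + (i + 0.5) * width for i in range(bins)]
    total = len(scores)
    sum_all = sum(c * h for c, h in zip(centers, hist))
    best_t, best_var = lo, -1.0
    w0, sum0 = 0, 0.0
    for i in range(bins - 1):
        w0 += hist[i]
        sum0 += centers[i] * hist[i]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        mu0, mu1 = sum0 / w0, (sum_all - sum0) / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            # Cut at the upper edge of bin i: bins 0..i form the low group.
            best_var, best_t = var_between, lo + (i + 1) * width
    return best_t


def flag_novel(inputs, reconstructions) -> Tuple[List[bool], float]:
    """Flag messages whose input/reconstruction similarity falls below the Otsu cut."""
    sims = [cosine_similarity(x, r) for x, r in zip(inputs, reconstructions)]
    t = otsu_threshold(sims)
    return [s < t for s in sims], t


inputs = [[1, 0], [1, 0], [1, 0], [1, 0]]
recons = [[1, 0], [0.9, 0.1], [0, 1], [0.1, 1]]
flags, t = flag_novel(inputs, recons)
print(flags)  # → [False, False, True, True]
```

Because Otsu adapts to each repository's score distribution, the same code can be run across repositories whose "normal" similarity levels differ.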
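For reference, VADER reports a `compound` score (available in the vaderSentiment package as `SentimentIntensityAnalyzer().polarity_scores(text)["compound"]`) obtained by squashing the summed word valences with x / sqrt(x² + α), α = 15, and clamping to [-1, 1]. The stdlib sketch below reproduces only that normalization and the ±0.05 labelling cut-offs suggested in the VADER README; it assumes the raw valence sum has already been computed, and the document does not state which cut-offs the project actually used.

```python
import math

ALPHA = 15  # VADER's default normalization constant


def normalize(raw_sum: float, alpha: float = ALPHA) -> float:
    """Map an unbounded sum of word valences into [-1, 1] (VADER's 'compound')."""
    score = raw_sum / math.sqrt(raw_sum * raw_sum + alpha)
    return max(-1.0, min(1.0, score))


def label(raw_sum: float, cutoff: float = 0.05) -> str:
    """Three-way label using the +/-0.05 cut-offs suggested in the VADER README."""
    compound = normalize(raw_sum)
    if compound >= cutoff:
        return "positive"
    if compound <= -cutoff:
        return "negative"
    return "neutral"


for raw in (2.0, 0.1, -2.0):
    print(raw, round(normalize(raw), 4), label(raw))
```

The squashing keeps long, effusive comments from dominating: doubling the raw sum from 2 to 4 raises the compound score far less than doubling it from 0.5 to 1.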

Coding Phase II

Coding Phase III

  • Collected a dataset of pull requests, repository statistics, discussions and creator features

  • Predicted the acceptance of open GitHub pull requests and fine-tuned the prediction model

  • Created the Pull Requests Analysis Worker to identify anomalous PRs

  • Set up insight communication with Auggie, plus a monthly cron job to handle different types of insights

  • Unit testing & documentation for both workers

  • Detailed Blogs: Blog 6, Blog 7

  • Jupyter notebooks: PR analysis

  • Relevant PRs: PR analysis worker
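The anomalous-PR idea (flag an open pull request whose predicted chance of being merged is low) can be sketched with a logistic model. Everything concrete here is an assumption for illustration: the four features (one per category collected above), the hand-set coefficients, the 0.3 threshold and the names `PullRequestFeatures`, `acceptance_probability` and `is_anomalous` are hypothetical, and the worker's real model would be trained on the collected dataset rather than hard-coded.

```python
import math
from dataclasses import dataclass


@dataclass
class PullRequestFeatures:
    # Hypothetical features, one per category named above (creator,
    # discussion, pull request, repository); not the worker's actual schema.
    creator_merge_rate: float  # fraction of the creator's earlier PRs that were merged
    comment_count: int         # discussion activity on this PR
    lines_changed: int         # size of the change
    repo_open_prs: int         # how busy the repository currently is


# Hypothetical coefficients; a trained model would learn these from data.
BIAS = -1.0
W_MERGE_RATE = 3.0
W_COMMENTS = 0.2
W_LINES = -0.4
W_OPEN_PRS = -0.1


def acceptance_probability(f: PullRequestFeatures) -> float:
    """Logistic model: P(merged) = sigmoid(bias + w . x), with log-scaled counts."""
    z = (
        BIAS
        + W_MERGE_RATE * f.creator_merge_rate
        + W_COMMENTS * math.log1p(f.comment_count)
        + W_LINES * math.log1p(f.lines_changed)
        + W_OPEN_PRS * math.log1p(f.repo_open_prs)
    )
    return 1.0 / (1.0 + math.exp(-z))


def is_anomalous(f: PullRequestFeatures, threshold: float = 0.3) -> bool:
    """Flag an open PR whose predicted chance of being merged is unusually low."""
    return acceptance_probability(f) < threshold


routine = PullRequestFeatures(creator_merge_rate=0.9, comment_count=3, lines_changed=20, repo_open_prs=5)
risky = PullRequestFeatures(creator_merge_rate=0.0, comment_count=0, lines_changed=5000, repo_open_prs=200)
print(is_anomalous(routine), is_anomalous(risky))  # → False True
```

Log-scaling the count features (`math.log1p`) keeps a single very large diff or a very busy repository from swamping the other signals.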
