Augur is a Flask-based prototyping web stack for CHAOSS metrics. It provides structured data mined from sources such as git repositories, mailing lists, and issue trackers, using a plugin architecture that incorporates other open-source metrics projects such as Facade and FOSSology. Augur lets users track activity across the repositories they care about and compare their performance. The main goals of this project are to detect anomalies in various metrics, with a focus on text messages and discussion comments (on the basis of sentiment and novelty) and on developer activity in the open-source community; to notify community managers as early as possible; and to provide API endpoints for the required metrics.
Mentors: @sgoggins, @gabe-heim
UPDATE: Jupyter notebook links may be broken because the notebooks were removed after review, once their code was integrated into the workers. Please refer to the respective workers for the latest code: Message Insights Worker & Pull Requests Analysis Worker
Fine-tuned the proposal and decided on the focus areas of the project.
Main Tasks
- Analyze the messages and comments of a repository to gauge sentiment and novelty in an open-source organization
- Identify anomalous pull requests: flag a pull request as anomalous based on its probability of being accepted and merged
- Build Gunicorn workers to perform these tasks within the Augur Flask application and provide real-time insights
- Send the insights to an Amazon Lex server, to be relayed as push notifications to the Slack bot Auggie
- Text cleaning & preprocessing for analysis
- Autoencoder-based approach to novelty detection using the cosine similarity metric (later improved with Otsu thresholding)
- VADER & TextBlob for sentiment analysis (later improved with ideas from SentiCR & SentEmoji)
- Modified SentiCR to include better preprocessing
- Jupyter notebooks: Sentiment analysis, Custom dataset
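
To illustrate the cosine-similarity step of the novelty detector, here is a minimal sketch in plain Python. It assumes each message has already been turned into a numeric vector and passed through an autoencoder; the autoencoder itself is omitted. The function names, the `1 - similarity` scoring, and the `threshold=0.2` default are illustrative assumptions, not taken from the worker's code.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Assumption: an all-zero vector is treated as maximally dissimilar.
        return 0.0
    return dot / (norm_a * norm_b)


def novelty_score(original: Sequence[float], reconstructed: Sequence[float]) -> float:
    """Hypothetical score: near 0 when the reconstruction matches, larger otherwise."""
    return 1.0 - cosine_similarity(original, reconstructed)


def is_novel(original: Sequence[float], reconstructed: Sequence[float],
             threshold: float = 0.2) -> bool:
    """Flag a message whose reconstruction drifts past an assumed fixed threshold."""
    return novelty_score(original, reconstructed) > threshold
```

The idea is that an autoencoder trained on typical messages reconstructs familiar text well, so a poorly reconstructed vector suggests unfamiliar content. A hand-picked threshold like the one above is the weak point of this sketch.
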
- Prepared a custom dataset using Jira, StackOverflow, Oracle DB for validation & testing
- Deep autoencoders with Otsu thresholding
- Added emoji support to SentiCR
- Created Message Insights Worker by incorporating both the models
- Jupyter notebooks: SentiCR, Novelty detection
- Relevant PRs: Message Insights Worker, Worker doc & tests
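
Otsu thresholding picks a cut-off automatically by maximizing the between-class variance of the two groups it separates. The sketch below applies that criterion to a one-dimensional list of scores, such as per-message reconstruction errors. It works on the sorted sample values directly rather than on histogram bins, and the names `otsu_threshold` and `flag_outliers` are hypothetical; the worker's actual implementation may differ.

```python
from typing import List, Sequence


def otsu_threshold(values: Sequence[float]) -> float:
    """Return the cut that maximizes between-class variance of the two groups."""
    data = sorted(values)
    n = len(data)
    if n < 2:
        raise ValueError("need at least two values")
    total = sum(data)
    best_cut, best_var = data[0], -1.0
    running = 0.0  # sum of the lower group as the split point moves right
    for i in range(1, n):
        running += data[i - 1]
        if data[i] == data[i - 1]:
            continue  # no valid cut between two equal values
        w0 = i / n                 # weight of the lower group
        w1 = 1.0 - w0              # weight of the upper group
        mu0 = running / i
        mu1 = (total - running) / (n - i)
        between_var = w0 * w1 * (mu0 - mu1) ** 2
        if between_var > best_var:
            best_var = between_var
            best_cut = (data[i - 1] + data[i]) / 2.0
    return best_cut


def flag_outliers(scores: Sequence[float]) -> List[float]:
    """Scores above the Otsu cut are treated as novel (an assumed convention)."""
    cut = otsu_threshold(scores)
    return [s for s in scores if s > cut]
```

For example, `flag_outliers([0.01, 0.02, 0.03, 0.02, 0.9, 0.95])` separates the two large scores from the rest without anyone having to choose a threshold by hand.
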
- Collected a dataset with pull requests, repository stats, discussions and creator features
- Prediction of whether an open GitHub pull request will be accepted, with fine-tuning
- Created Pull Requests Analysis Worker to identify anomalous PRs
- Set up insight communication with Auggie and a monthly cron job to handle different types of insights
- Unit testing & documentation for both workers
- Jupyter notebooks: PR analysis
- Relevant PRs: PR analysis worker
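
The report says a pull request is flagged as anomalous based on its predicted probability of being accepted and merged, but it does not state the exact rule. The sketch below assumes the simplest reading: an open PR is flagged when the model's predicted merge probability falls below a cut-off. The `PullRequestPrediction` class, the `flag_anomalous` function, and the `threshold=0.3` default are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class PullRequestPrediction:
    """Hypothetical container for one model prediction on an open PR."""
    pr_id: int
    merge_probability: float  # model output, expected in [0, 1]


def flag_anomalous(predictions: Iterable[PullRequestPrediction],
                   threshold: float = 0.3) -> List[int]:
    """Return IDs of PRs whose predicted merge probability is below the cut-off."""
    flagged = []
    for p in predictions:
        if not 0.0 <= p.merge_probability <= 1.0:
            raise ValueError(f"probability out of range for PR {p.pr_id}")
        if p.merge_probability < threshold:  # assumed rule: low probability is anomalous
            flagged.append(p.pr_id)
    return flagged
```

A list of flagged IDs like this is the kind of insight that could then be forwarded to Auggie.
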