@g31pranjal
Last active August 29, 2015 14:17

Tracking of jQuery Issues on Github

###Abstract

At present, there is no tool for monitoring the various jQuery repositories and the activity that takes place on them. This project aims to create an interface that watches over these repositories. It provides a platform for monitoring repository activity by tracking issues opened, pull requests, merging of branches, closing of pull requests, etc. The project further aims to derive useful statistics from the repositories, visualize the data, and present it efficiently and precisely.

###PERSONAL INFORMATION

  • Name : Pranjal Gupta

  • University : Birla Institute of Technology and Science, Pilani, INDIA

  • Field of Study : Mathematics and Computer Science

  • Begin of Studies : August 2013

  • Expected Graduation : 2018

  • Degree : B.E. (Hons) Computer Science & M.Sc. (Hons) Mathematics

  • Email : g31.pranjal@gmail.com

  • Telephone : +91 80940 76999

  • LinkedIn Profile : in.linkedin.com/in/g31pranjal/en

  • GitHub Account : https://github.com/g31pranjal

  • Other Interests : Reading, Designing, Badminton

###PROGRAMMING SKILLS & EXPERIENCE

I started web development four years ago out of a personal interest in using the web to communicate information. My first full-stack project was a PHP-based web application to store, retrieve, analyze and print billing invoices, which I developed for the firm my father works at. I have maintained the project to date and updated it as and when required.

Since then, I have worked with the Python web frameworks Django and web2py, which I have used extensively to develop the backends of my websites. Lately, I have switched to JavaScript and have enjoyed working in the Node.js environment. I have also done frontend web development, for which I have used a range of JS libraries, from jQuery to Angular.js.

I have taken an active interest in web development, so many of my projects are related to the field. I have also done a number of scripting projects. I am the lead frontend web developer at the Department of Visual Media (a club at my university responsible for the development of festival websites, apps and overall technical support).

Most of my work is accessible on the web and available on GitHub, developed in collaboration with other members:

####Main Website for APOGEE 2015

This was done for APOGEE, the annual technical festival of our college. The website was inaugurated by Mr. Kailash Satyarthi, winner of the 2014 Nobel Peace Prize. The backend was built with Django and implements a RESTful API for information exchange. The site serves as the information page of the fest as well as a registration portal where participants can register, create teams, join teams, etc.

#####Live Link : http://bits-apogee.org/

#####GitHub Link : https://github.com/g31pranjal/apogee2015/

####Lacuna 2015 : An online puzzle-cum-quiz game

Developed a click-based puzzle game in JavaScript in which the player has to find clues and solve them to proceed further in the game and get more clues. The backend is written in JavaScript and deployed as an Express.js app.

#####Live Link : http://bits-apogee/lacuna/

#####GitHub Link : https://github.com/g31pranjal/lacuna2015

####Main Website for OASIS 2014

This was done for OASIS, the annual cultural festival of our college. The website was made in the form of an online game whose prizes were sponsored by eBay. The aim was to create a unique experience that would make the website stand out from other fest websites, and we delivered it!

#####Live Link : http://bits-oasis.org

#####GitHub Link : https://github.com/g31pranjal/Oasis-2014

####Discoverify

Discoverify was made as an entry for a 24-hour hackathon at Jaipur, where it won a consolation prize. The backend was built with Django. The main idea behind the web app was to let users create learning paths for acquiring skills, which other users can then follow.

#####Github Link : https://github.com/g31pranjal/discoverify

####AutoINV : Automated Invoice Generator

A PHP-based web application used to input, print, store and analyze the billing invoices of a manufacturing firm. The project has been deployed for three years without any major glitches.

#####Github Link : https://github.com/g31pranjal/autoINV

###PROJECT PROPOSAL

####Brief Description

Issues and Pull Requests (PRs) play a central part in the development and maintenance of code on GitHub. Issues provide a way to discuss ideas as well as problems, and hence allow collaborators to arrive at solutions. The threaded structure of issues encourages active participation. An issue can be closed once an appropriate conclusion on the topic is reached. Similarly, Pull Requests are the means by which changes to the code are merged into the main codebase after being reviewed and discussed.

The task of tracking issues/pull requests on repositories can be broken into 3 major segments, each functioning relatively independently of the others:

#####Data Retrieval : Collecting and maintaining raw data in Database using GitHub API

The GitHub API provides HTTPS access through api.github.com, which can be used to create and maintain a core list of all the jQuery repositories currently hosted on GitHub. This list will be updated periodically by a task scheduled with Cron, which will execute a JS file to fetch repository information (name, id, full_name) from GitHub and write it to the database. The interval of this task can be varied as needed.
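A minimal sketch of this step might look like the following. It assumes the repositories live under a single GitHub organisation and uses the global `fetch` of Node.js 18+; the names `ORG`, `toRepoDoc` and `fetchRepoList` are hypothetical and not taken from the prototype.

```javascript
// Sketch only: ORG, toRepoDoc and fetchRepoList are illustrative names.
const ORG = "jquery"; // assumed organisation whose repositories are tracked

// Keep only the fields the proposal stores for each repository.
function toRepoDoc(apiRepo) {
  return { id: apiRepo.id, name: apiRepo.name, full_name: apiRepo.full_name };
}

// Fetch one page of the organisation's repositories from the GitHub API.
// Relies on the global fetch available in Node.js 18 and later.
async function fetchRepoList(org = ORG, page = 1) {
  const url = `https://api.github.com/orgs/${org}/repos?per_page=100&page=${page}`;
  const res = await fetch(url, {
    headers: { "User-Agent": "issue-tracker-sketch", Accept: "application/vnd.github+json" },
  });
  if (!res.ok) throw new Error(`GitHub API responded with ${res.status}`);
  return (await res.json()).map(toRepoDoc);
}

// Example with a hand-written response fragment (no network needed).
const sample = [{ id: 1, name: "demo", full_name: "jquery/demo", forks: 3 }];
console.log(sample.map(toRepoDoc));
```

A Cron entry would then run a script that calls `fetchRepoList` page by page and writes the resulting documents to the database.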

Another script scheduled with Cron will look after the issues and pull requests being opened or closed on each repository in the database, taking the repositories in turn. This script will compare the latest issue/PR number in the database with the latest issue/PR number available through the API and, based on the difference, insert the new issues/PRs into the database, capturing their (id, number, title, user, labels, state, created_at) properties. It will also capture the (closed_at, closed_by) properties if the issue is closed, the (comments, commits) properties if the issue is a pull request, and the (merged, merged_by) properties if the issue is a closed pull request. In addition, open issues/PRs will be checked separately for further activity (and closure), and any new properties will be fetched. The proposed scheme thus aims to keep the local database up to date with the data available on the web. The USP of the service is that the fetching of different repositories can be scheduled at different times using a shell (.sh) script, so as to spread the work evenly and avoid unnecessary CPU load.
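The comparison and the property capture described above could be sketched as follows. The function names and the stored document shape are assumptions made for illustration; the input objects are assumed to be shaped like GitHub API issue and pull request responses.

```javascript
// Sketch only: function names and the document shape are illustrative.

// Issue/PR numbers present upstream but missing locally, assuming numbers
// increase by one and the database holds everything up to latestInDb.
function missingNumbers(latestInDb, latestFromApi) {
  const numbers = [];
  for (let n = latestInDb + 1; n <= latestFromApi; n += 1) numbers.push(n);
  return numbers;
}

// Build the stored document from an issue object and, for pull requests,
// the matching pull request object.
function toIssueDoc(issue, pull) {
  const doc = {
    id: issue.id,
    number: issue.number,
    title: issue.title,
    user: issue.user.login,
    labels: issue.labels.map((label) => label.name),
    state: issue.state,
    created_at: issue.created_at,
  };
  if (issue.state === "closed") {
    doc.closed_at = issue.closed_at;
    doc.closed_by = issue.closed_by ? issue.closed_by.login : null;
  }
  if (pull) {
    doc.comments = pull.comments;
    doc.commits = pull.commits;
    if (issue.state === "closed") {
      doc.merged = pull.merged;
      doc.merged_by = pull.merged_by ? pull.merged_by.login : null;
    }
  }
  return doc;
}

console.log(missingNumbers(120, 123));
```

Only the numbers returned by `missingNumbers` need to be requested, which keeps the number of API calls proportional to the new activity rather than to the size of the repository.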

#####Analysis : Computing and Consolidating the raw data

The arrival of new issue/PR data in the database will trigger a separate class of functions that analyze the raw data and derive conclusions from it. Some of the vital statistics to be derived from this data will be: age range of issues/PRs, average age of issues/PRs, total number of issues (opened + closed), total number of pull requests (opened + closed), number of merged PRs, number of rejected PRs, references of PRs, labels, % of PRs merged out of the total, average number of issues/PRs per month, etc.

These vital stats will be computed independently of the data retrieval process, being triggered on arrival of fresh data for a particular repository. The information will then be updated in the core repository list.
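A few of these statistics could be computed along the following lines. This is a sketch under stated assumptions: "age" is taken to mean the time from creation to closure (or to the present for open items), a closed but unmerged PR is counted as rejected, and the `is_pull_request` flag and the function name are hypothetical.

```javascript
// Sketch only: computeStats and the is_pull_request flag are illustrative.
const DAY_MS = 24 * 60 * 60 * 1000;

function computeStats(docs, now = new Date()) {
  // Assumed definition of age: creation to closure, or to `now` if still open.
  const ageInDays = (doc) => {
    const end = doc.closed_at ? new Date(doc.closed_at) : now;
    return (end - new Date(doc.created_at)) / DAY_MS;
  };
  const pulls = docs.filter((doc) => doc.is_pull_request);
  const issues = docs.filter((doc) => !doc.is_pull_request);
  const merged = pulls.filter((doc) => doc.merged).length;
  // Assumption: a pull request closed without being merged counts as rejected.
  const rejected = pulls.filter((doc) => doc.state === "closed" && !doc.merged).length;
  const ages = docs.map(ageInDays);
  return {
    total_issues: issues.length,
    total_pulls: pulls.length,
    merged_pulls: merged,
    rejected_pulls: rejected,
    percent_pulls_merged: pulls.length ? (100 * merged) / pulls.length : 0,
    average_age_days: ages.length ? ages.reduce((a, b) => a + b, 0) / ages.length : 0,
    age_range_days: ages.length ? [Math.min(...ages), Math.max(...ages)] : null,
  };
}
```

Keeping each statistic as a separate expression inside one pure function makes it straightforward to add or remove statistics later without touching the retrieval code.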

#####Representation : Displaying the analyzed data effectively

Representation is the part of the project that actually matters to the outside world. It deals with how, and how much, information is displayed to the user over the web so that they can make the most of it. This will be achieved with an Express.js app that renders the different views and routes HTTP requests. The objective of this segment is to display information about the various repositories in a seamless and effective manner, using graphical and interactive displays to provide deeper insight into the data. This can be done with data visualization JS libraries such as d3.js.
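As a rough illustration of the routing side, the following framework-independent sketch shows the kind of request handling an Express.js route could delegate to. The paths, the in-memory `store` shape and the function name are all assumptions; a real app would read from the database instead.

```javascript
// Sketch only: the routes, the store shape and handleRequest are illustrative.
function handleRequest(method, path, store) {
  if (method !== "GET") return { status: 405, body: { error: "method not allowed" } };
  // List the names of all tracked repositories.
  if (path === "/api/repos") return { status: 200, body: Object.keys(store) };
  // Return the computed statistics for a single repository.
  const match = /^\/api\/repos\/([^/]+)\/stats$/.exec(path);
  if (match && Object.prototype.hasOwnProperty.call(store, match[1])) {
    return { status: 200, body: store[match[1]] };
  }
  return { status: 404, body: { error: "not found" } };
}

// Hand-written example data standing in for the statistics database.
const store = { demo: { total_issues: 4, total_pulls: 2 } };
console.log(handleRequest("GET", "/api/repos/demo/stats", store));
```

Keeping this logic in a plain function means it can be tested without starting a server, and the Express.js route handler only has to pass the result on as a JSON response.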

####Technologies Used

  • The platform will be deployed entirely on Node.js, a runtime for server-side JavaScript.
  • For data storage, CouchDB can be used. I am currently working with it; however, I am fully comfortable shifting to another database if required.
  • Express.js framework for displaying information on the web and for routing.
  • nano, a Node.js module which serves as a client for CouchDB
  • Jade, a web page templating engine to be used with Express.js
  • d3.js, for graphical and interactive display of analyzed data of different repositories.
  • Bootstrap, for enhancing content representation on web.

####Proposed Features

  • Scheduled fetching of repository data, with an interval that can be varied manually through environment variables, based on requirements and the availability of computing resources.
  • Scheduled fetching of issue/PR data for the repositories retrieved, at a frequency that can also be varied.
  • A database of all the users who have ever performed any activity on issues/PRs. The documents of this database will serve as foreign keys for the “created by”, “merged by” and “closed by” fields of an issue/PR.
  • Carefully constructed functions that avoid repetitive calls to the GitHub API. That is, the service will retrieve information only for new issues/PRs on a repository and monitor activity only on those issues/PRs that are still open.
  • A modular approach to writing the modules that calculate vital stats from raw data, so that functions can be changed, added or deleted over time.
  • Bootstrapped Web GUI to be compatible across all devices.
  • Graphical methods to represent information about issues/PRs for each repository on the web, such as pie charts to represent quantities in % and line graphs to represent the variation of a quantity over time.
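The charting features above need the stored data reshaped into series before a library such as d3.js can draw it. A possible sketch of that reshaping follows, assuming ISO 8601 `created_at` timestamps as returned by the GitHub API; the function names and output shapes are hypothetical.

```javascript
// Sketch only: function names and output shapes are illustrative.

// Count issues/PRs per calendar month, sorted by month, for a line graph.
function issuesPerMonth(docs) {
  const counts = new Map();
  for (const doc of docs) {
    const month = doc.created_at.slice(0, 7); // "YYYY-MM" from an ISO 8601 timestamp
    counts.set(month, (counts.get(month) || 0) + 1);
  }
  return [...counts.entries()]
    .sort(([a], [b]) => a.localeCompare(b))
    .map(([month, count]) => ({ month, count }));
}

// Convert absolute quantities into percentage shares for a pie chart.
function toPercentages(parts) {
  const total = Object.values(parts).reduce((a, b) => a + b, 0);
  return Object.entries(parts).map(([label, value]) => ({
    label,
    percent: total ? (100 * value) / total : 0,
  }));
}

console.log(toPercentages({ merged: 3, rejected: 1 }));
```

Both functions return plain arrays of objects, which is the form most charting libraries accept as input data.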

####Optional Features

  • Store the value of a particular stat over time so as to study the variation in quantity over time.
  • Angular.js frontend app for better data management in the frontend web development.
  • Custom views for each Issue/PR to display recent activities, followers, commits, comments etc. The information can be further extended based on explicit requirements.

####Brief Implementation Scheme (in order of flow)

  • Implementing a script to fetch the repository list from api.github.com and store in the database

  • Creating an executable shell script containing a variable that sets the execution interval of the script that fetches repositories.

  • Scheduling the shell script to execute using Cron.

  • Implementing a script that will choose one repository at a time and fetch its issues/PRs. A counter mechanism will decide which repository's data should be fetched next.

  • Implementing the mechanism that determines which repository's turn it is to be fetched.

  • Creating an executable shell script containing a variable that sets the execution interval of the script that fetches a repository's issues/PRs.

  • Scheduling the shell script to execute using Cron.

  • Implementing the basic mechanisms of the script that will fetch the repository data.

  • Implementing an incremental update, similar in spirit to Git's “pull”. The function will compare the latest issue/PR number in the database with the latest issue/PR number in the data from api.github.com. Based on the difference, only the missing data will be fetched and stored in the database.

  • Another function will look into all open issues/PRs in the repository and update them based on new activity such as comments, commits, labels, merges and closure.

  • Linking the Script that will fetch the repository data to the class of functions that will calculate and consolidate the raw data to get vital stats. This class of functions will be triggered only if there is fresh data in the database.

  • Implementing the class of functions that will calculate vital stats : Age Range of issues/PR, Average Age of issues/PR, Total number of Issues (Opened + Closed), Total number of Pull Request (Opened + Closed), number of merged PR, number of rejected PR, references of PR, labels, % PR merged of total, Average number of Issues/PR per month etc.

  • Setting up an Express.js app to create Data Dashboard.

  • Deciding upon the views and content to be displayed on each view.

  • Creating a robust RESTful API for proper communication of web pages and JSON objects over HTTP.

  • Implementing ways to enhance data model and representation on web pages. Using charting and data visualization JS libraries to present data effectively and concisely on the dashboard.
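The counter mechanism mentioned in the steps above, which decides whose turn it is to be fetched, could be as simple as a round-robin index. The sketch below assumes the counter is persisted between Cron runs (for example in the database); the function name and return shape are hypothetical.

```javascript
// Sketch only: nextRepository and its return shape are illustrative.

// Pick the repository whose turn it is and compute the counter for the next run.
function nextRepository(repos, counter) {
  if (repos.length === 0) return null;
  const index = counter % repos.length;
  return { repo: repos[index], nextCounter: (index + 1) % repos.length };
}

// Simulate four consecutive Cron runs over three repositories.
let counter = 0;
for (let run = 0; run < 4; run += 1) {
  const turn = nextRepository(["repo-a", "repo-b", "repo-c"], counter);
  console.log(turn.repo);
  counter = turn.nextCounter;
}
```

Because each run handles exactly one repository, the load per run stays roughly constant regardless of how many repositories are tracked.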

####Deliverables

  • A full-fledged platform to track and store information about the activities (issues/pull requests) on all jQuery repositories hosted on GitHub.
  • A full-stack web portal to view vital statistics and insights on the repository data captured over time, by means of effective data visualization.
  • Documentation of the project (especially the data retrieval and analysis scripts) deployed at the backend.

####Proposed Timeline

#####April 28 to May 25 : Community bonding Period

Discuss the proposed outline and implementation with the community. Explore and study new libraries that might be helpful in the implementation. Work on the deployment strategy and set up the development environment.

#####May 26 to June 1 : Week 1

Implementing the script to fetch the list of repositories from GitHub. Improving the function to optimize the frequency of calls that write to the database. Writing the shell script and scheduling the task using Cron.

#####June 2 to June 15 : Week 2 - Week 3

Implementing the script to fetch repository data from GitHub. Writing functions to fetch the information of a particular issue/PR, update the information for open issues/PRs, write to the database, etc. Implementing the Git-like “pull” feature in the fetching script so as to optimize fetching.

#####June 16 to June 22 : Week 4

Implementing the algorithm that decides the order in which repositories' data will be fetched. Writing the shell script and scheduling the task using Cron.

#####June 23 to June 29 : Week 5

Checking the consistency of the fetched data. Fixing bugs in the code and improving efficiency.

#####MidTerm Evaluation

Finalizing the Data Retrieval segment of the project.

#####June 30 to July 13 : Week 6 - Week 7

Writing the functions to calculate the vital stats of a repository. Wrapping these functions into a module whose purpose is to calculate statistics and store them in the repository database. Implementing event-based triggering of this module so that its functions are executed only when the repository data changes.

#####July 14 to July 20 : Week 8

Setting up the Express.js app and configuring it to use the web page templating engine. Setting up CouchDB with the app. Deciding the flow and linking of web pages.

#####July 21 to July 27 : Week 9

Writing a proper RESTful API for communicating web pages and JSON objects. Developing views to confirm the content displayed on each page.

#####July 28 to August 3 : Week 10

Bootstrapping the views. Implementing JS libraries for data visualization and representation on web pages. Finalizing the overall appearance of the web dashboard.

#####August 4 to August 10 : Week 11

Documenting the data retrieval and analysis scripts. Fixing bugs and committing changes. Implementing optional features (if required).

#####August 11 to August 21 : Week 12 + 3 days

Complete documentation of the platform. Community feedback and testing.

###AVAILABILITY

#####Work Hours per Week

I would be able to devote 40 - 50 hours per week to the project, since I have no other big project planned for the summer.

#####Other commitments between May and August 2015

I don't have any commitments between May and August 2015. My fifth semester of college will start on August 1, 2015, but I can still manage to give 40 hours per week since there will be no evaluative components in academics.

#####Do you plan to apply for other internships during the summer?

No. I will be at home during the summer.

#####I feel comfortable discussing issues in the public developer forum (yes / no)

Yes. I am absolutely comfortable discussing issues and problems in the public developer forum, public mailing list and IRC.

#####I feel comfortable posting a weekly update to the public developer forum (yes/no)

Yes. I fully agree to post a weekly update on my progress on the public forum.

###ADDITIONAL INFORMATION

I am working on a basic prototype that retrieves the repository list and the data from each repository and stores them in the database. So far, I have succeeded in:

  • Scheduling the fetching and updating of the repository list. The implementation seems to work fine; however, I am thinking of tidying up the code and using built-in functionality for handling GitHub API requests.
  • Fetching the data of a particular repository and constructing an object with the required data about each issue/PR.

I have implemented this using Node.js, CouchDB and nano. You can check out my progress at https://github.com/g31pranjal/trackJquery.
