apsknight/proposal.md

## proposal.md

      
    Project Proposal
Google Summer of Code 2018

Large-scale computing backend for Jupyter notebooks - HTCondor batch job submission and monitoring using the Ganga toolkit

CERN-HSF
High Energy Physics Software Foundation

Mentors:

Ulrik Egede
Jakub Moscicki
Ben Jones
Enric Tejedor
Diogo Castro


Aman Pratap Singh

Email: aps10@iitbbs.ac.in

Phone: +91-8266928969
Index


Introduction
Synopsis
Project Goals
Timeline
Deliverables
References

Introduction

Personal Information


Full Name
Aman Pratap Singh


Institute
2nd Year B.Tech Student
Computer Science and Engineering
Indian Institute of Technology Bhubaneswar


Email
aps10@iitbbs.ac.in
amanprtpsingh@gmail.com


Phone
+91-8266928969


Blog
https://blog.amanpratapsingh.in


Github
https://github.com/apsknight


IRC Nick
apsknight (Freenode/OFTC)


Timezone
Indian Standard Time (GMT +0530)


Address
54, Transit Hostel-2, NISER Campus,
Indian Institute of Technology Bhubaneswar
Jatni, Odisha, India 752050


Reference Contact
Tatvam Dadheech
Email: td14@iitbbs.ac.in
Phone: +91-8107407676


About Me

I am Aman Pratap Singh, a second-year undergraduate student of Computer Science and Engineering at the Indian Institute of Technology Bhubaneswar. I have programming experience in several languages, including Python, C/C++, Java and JavaScript. I frequently use Jupyter Notebooks for lab assignments and similar work, because they provide a simple, user-friendly interface for programming while letting me document explanations and output alongside the code. I enjoy coding for fun and have worked on various small projects, which can be found on my GitHub profile.
Recently I have been involved in the JupyterLab community. JupyterLab is the next-generation user interface for Project Jupyter and provides all the features of the classic Jupyter Notebook. I fixed a few documentation and UI bugs in the JupyterLab repository and created a JupyterLab extension that scrapes comics from XKCD and displays them in JupyterLab.
I also enjoy competitive programming and actively participate in coding challenges on sites such as Codeforces and CodeChef. I also have a deep interest in physics, Indian history and cricket.
Why GSoC with CERN-HSF?

I have always been passionate about projects that link the basic sciences with programming, which is my main inspiration for working with CERN. I am eager to work on this project because it will help many scientists and researchers in their work. As a regular user of Jupyter Notebook, I strongly believe that interactive programming greatly reduces the effort required to perform complex experiments and to interpret their output. This project integrates powerful backends for large-scale computation with the interactive programming environment of notebooks, so I believe I am well suited to work on it.
How much time will I be able to contribute to this project?

I will be working 6-8 hours per day for the entire duration of the project.
Other commitments during summer

I have my end-semester exams from April 28 to May 5, during which I will be a little busy; other than that, I have no commitments for the summer.
Preferred medium for communication

I am perfectly fine with IRC, Email, Skype or any other similar medium of communication. My preferred language for communication is English.
Synopsis

Jupyter Notebook is an interactive computing environment for creating notebooks that contain computer code alongside rich text elements such as equations, figures, plots, widgets and explanatory text. These notebooks are easy to read and can be executed to perform interactive data analysis, scientific computing and code prototyping.
Experiments such as the LHC (Large Hadron Collider) generate very large amounts of data, on the order of petabytes. This data is processed by powerful computers at multiple computing sites: it is split into small chunks, each chunk is processed individually on a remote distributed computing network, and the results are finally collected. These sites are interconnected by a grid, which can be accessed with a toolkit called Ganga.
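The split, process and collect pattern described above can be illustrated with a small sketch. This is only a toy model that runs in local threads; the function names, the chunk size and the "sum of energies" workload are assumptions made for illustration and do not reflect how the grid or Ganga actually distributes data.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(events, chunk_size):
    """Split a list of events into consecutive chunks of at most chunk_size."""
    return [events[i:i + chunk_size] for i in range(0, len(events), chunk_size)]


def process_chunk(chunk):
    """Stand-in for a per-site analysis step: sum the values in one chunk."""
    return sum(chunk)


def scatter_gather(events, chunk_size, workers=4):
    """Process every chunk independently, then merge the partial results."""
    chunks = split_into_chunks(events, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_sums = list(pool.map(process_chunk, chunks))
    return sum(partial_sums)


events = list(range(1, 101))  # toy dataset: the values 1..100
total = scatter_gather(events, chunk_size=10)
print(total)  # 5050
```

Each chunk is independent of the others, which is what allows the work to be spread across many machines and merged at the end.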
Ganga is an open-source, IPython-based interface to the computing grid. It gives scientists an interface, backed by the distributed computing grid, through which they can submit their computation-intensive programs as batch jobs. After a job is submitted, Ganga has the program run somewhere on the grid, keeps track of the job's status and returns the output to the user once the job completes. It can also report job statistics and any job errors.
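The lifecycle described above (submit, track status, collect output or errors) can be modelled as a small state machine. The sketch below is a toy model and deliberately does not use the Ganga API; the class `ToyJob`, its methods and the state names are hypothetical and chosen only to illustrate the kind of status transitions a monitoring plugin would have to display.

```python
class ToyJob:
    """Toy model of a batch job's lifecycle (hypothetical, not the Ganga API)."""

    # Allowed transitions between states; the state names are assumptions.
    TRANSITIONS = {
        "new": {"submitted"},
        "submitted": {"running", "failed", "killed"},
        "running": {"completed", "failed", "killed"},
        "completed": set(),
        "failed": set(),
        "killed": set(),
    }

    def __init__(self, command):
        self.command = command
        self.status = "new"
        self.history = ["new"]

    def _move_to(self, new_status):
        """Change state, rejecting transitions the model does not allow."""
        if new_status not in self.TRANSITIONS[self.status]:
            raise ValueError(f"cannot go from {self.status!r} to {new_status!r}")
        self.status = new_status
        self.history.append(new_status)

    def submit(self):
        self._move_to("submitted")

    def start(self):
        self._move_to("running")

    def finish(self, ok=True):
        self._move_to("completed" if ok else "failed")

    def kill(self):
        self._move_to("killed")


job = ToyJob("echo hello")
job.submit()
job.start()
job.finish()
print(job.history)  # ['new', 'submitted', 'running', 'completed']
```

Making the terminal states explicit means a request such as killing an already completed job is rejected rather than silently ignored, which is the behaviour a user interface would need to report.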
HTCondor is a workload management system created at the University of Wisconsin-Madison. It is built around High-Throughput Computing: it makes effective use of idle computers on a network or computing grid by offloading compute-intensive tasks to them. It provides features such as job queueing, job prioritization, and resource monitoring and management. HTCondor manages resources by matchmaking: it pairs the resources available on different machines with the resources that each program requires.
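The matchmaking idea can be sketched in a few lines. This is a simplified illustration, not HTCondor's actual algorithm or its ClassAd language: the dictionary fields (`cpus`, `memory_mb`, `priority`, `idle`), the greedy strategy and the one-job-per-machine rule are all assumptions made for the example.

```python
def matches(job, machine):
    """True if the machine is idle and offers at least what the job requests."""
    return (
        machine["idle"]
        and machine["cpus"] >= job["cpus"]
        and machine["memory_mb"] >= job["memory_mb"]
    )


def match_jobs(jobs, machines):
    """Greedily assign each job, highest priority first, to the first fitting machine."""
    assignments = {}
    free = [dict(m) for m in machines]  # copy so the caller's data is untouched
    for job in sorted(jobs, key=lambda j: j["priority"], reverse=True):
        for machine in free:
            if matches(job, machine):
                assignments[job["name"]] = machine["name"]
                machine["idle"] = False  # one job per machine in this toy model
                break
    return assignments


jobs = [
    {"name": "fit", "cpus": 4, "memory_mb": 8000, "priority": 5},
    {"name": "plot", "cpus": 1, "memory_mb": 500, "priority": 1},
]
machines = [
    {"name": "node-a", "cpus": 2, "memory_mb": 4000, "idle": True},
    {"name": "node-b", "cpus": 8, "memory_mb": 16000, "idle": True},
    {"name": "node-c", "cpus": 8, "memory_mb": 16000, "idle": False},
]
print(match_jobs(jobs, machines))  # {'fit': 'node-b', 'plot': 'node-a'}
```

The larger job cannot fit on `node-a` and `node-c` is busy, so it lands on `node-b`; the small job then takes the remaining idle machine. Jobs with no fitting machine are simply left unassigned, which corresponds to waiting in the queue.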
This project aims to create a plugin for Jupyter Notebook and to integrate it with the SWAN notebook service, a cloud data analysis service developed and run by CERN. The plugin will submit batch computation jobs to HTCondor using the Ganga toolkit and monitor them. It will display the status of ongoing jobs, a progress bar, job statistics and errors in the notebook itself, and will also allow ongoing jobs to be terminated. The plugin shall provide a user-friendly notebook interface for compute-intensive tasks by building on the notebook's cell-based structure: a job is submitted from a cell, and its progress and statistics can be inspected from that same cell.
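As a rough idea of the monitoring output, the sketch below turns a list of subjob states into a text progress bar and a count per state. It is an assumption-laden illustration: the function names and state strings are hypothetical, and a real notebook frontend would more likely render a widget than plain text.

```python
from collections import Counter


def render_progress(done, total, width=20):
    """Return a text progress bar with a done/total count and a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    done = max(0, min(done, total))  # clamp to the valid range
    filled = width * done // total
    bar = "#" * filled + "-" * (width - filled)
    return f"[{bar}] {done}/{total} ({100 * done // total}%)"


def summarize_subjobs(statuses):
    """Count subjob states and build a progress line from the completed ones."""
    counts = Counter(statuses)
    return render_progress(counts["completed"], len(statuses)), dict(counts)


statuses = ["completed", "completed", "running", "submitted"]
line, counts = summarize_subjobs(statuses)
print(line)    # [##########----------] 2/4 (50%)
print(counts)  # {'completed': 2, 'running': 1, 'submitted': 1}
```

Recomputing the summary each time the job states are polled would be enough to refresh the display under the cell that submitted the job.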
Benefits to Community

This project streamlines large-scale computation by integrating a powerful backend with Jupyter Notebook, an interactive web application that is easily deployed in the cloud and accessed remotely. It will give scientists and researchers a single application in which to write compute-intensive programs interactively, execute them, monitor their progress and run-time statistics, and retrieve the output on successful execution. The project will improve large-scale batch computing at CERN and similar organizations.
Project Goals

Objectives


Create a plugin for Jupyter Notebook that can offload batch jobs from the notebook.
Using HTCondor, apply the plugin to real batch jobs at CERN.
Test the plugin on CERN’s batch infrastructure.
Integrate the plugin with CERN’s notebook service, SWAN.

Tasks


Create a plugin to submit and monitor batch computation jobs from the notebook

Design a prototype of the plugin for submitting and monitoring jobs from Jupyter Notebook.

Design the user interface and a prototype of the kernel-side module.


Determine an architecture for the plugin.

Explore the widgets and features of Jupyter Notebook that can be applied to the plugin.
Determine how the plugin will interact with the Ganga toolkit.
Design an interface to display the progress bar, job statistics and output of the job.


Implement the kernel side of the plugin.

Integrate the designed user interface with the kernel-side module.


Test the plugin on a local backend server

Test the plugin by running small jobs on a local backend server.
Perform tests for the corner cases that can arise.


Implement the error handling mechanism of the plugin

Intentionally create errors to test the various event listeners.
Implement how the plugin should respond to unexpected requests or errors.


Write comprehensive documentation of the code written for Task 1.


Apply the plugin to real batch jobs at CERN using HTCondor

Apply the plugin to small, real batch jobs at CERN on the local backend.

Test the plugin on real batch jobs at CERN that are complex but computationally light.


Use HTCondor instead of the local backend.

Change the backend from the local one to the one provided by HTCondor.
Test the plugin on complex and relatively compute-heavy batch jobs at CERN.
Implement some sample notebooks illustrating the process.


Ask for feedback from users and implement the suggestions.
Write comprehensive documentation of the code written for Task 2.


Deploy and test the plugin on CERN IT infrastructure.

Test the plugin on CERN IT infrastructure.
Integrate the plugin with the SWAN notebook service.
Ask for feedback from users and implement the suggestions.
Write comprehensive documentation of the code written for Task 3.
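Task 2 switches from a local backend to HTCondor. In this project Ganga is expected to handle the submission itself, so the following is only an orientation sketch of what a basic HTCondor submit description looks like. The helper `build_submit_description`, its parameters and the file naming scheme are hypothetical; the lines it emits (`executable`, `arguments`, `output`, `error`, `log`, `request_cpus`, `request_memory`, `queue`) are standard submit commands.

```python
def build_submit_description(executable, arguments="", cpus=1, memory_mb=1024,
                             job_name="job"):
    """Build the text of a minimal HTCondor submit description (illustrative helper)."""
    lines = [
        f"executable = {executable}",
        f"arguments = {arguments}",
        f"output = {job_name}.out",   # stdout of the job
        f"error = {job_name}.err",    # stderr of the job
        f"log = {job_name}.log",      # HTCondor's own event log for the job
        f"request_cpus = {cpus}",
        f"request_memory = {memory_mb}",  # interpreted as MiB when no unit is given
        "queue",                      # queue one instance of the job
    ]
    return "\n".join(lines) + "\n"


text = build_submit_description("/bin/echo", "hello", job_name="demo")
print(text)
```

The resource requests in such a description are the job-side input to the matchmaking step, while the log file is one source from which job status can be read back.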


Timeline


Duration
Task


March 27
Deadline for submitting Project Proposal


March 27 - April 23
Learn more about Ganga Toolkit.
Read Documentation and learn more about HTCondor
Learn more about Jupyter Notebook.


April 23 - May 14
Official Community Bonding Period
Get involved with the CERN, HTCondor and Jupyter communities.
Get to know the mentors, including their timezones and preferred media of communication.
Learn about other projects and ongoing experiments at CERN.
Get acquainted with the various tools used at CERN.
Begin Task 1: Design and Implement Plugin
Work out a prototype and plan how the plugin will work.
Set up the development environment.


May 14 - June 6
Official Coding Period Start
Finish the implementation of the plugin.
Test the plugin on some sample jobs on a local backend server.
Perform UI tests and fix bugs.


June 6 - June 11
Time period for any unexpected delay.
Finish Task 1


June 11 - June 15
Phase 1 evaluation
Submit git repository of code with documentation for Task 1


June 15 - July 4
Begin Task 2: Integrate plugin with HTCondor
Implement functionality for integrating HTCondor as the backend.
Test the plugin on real batch jobs at CERN.
Ask for feedback from users and implement their suggestions.


July 4 - July 9
Time period for any unexpected delay.
Finish Task 2


July 9 - July 13
Phase 2 evaluation
Submit git repository of code with documentation for Task 2


July 13 - August 10
Begin Task 3: Deploy plugin to CERN IT Infrastructure
Test the plugin on CERN’s batch infrastructure.
Integrate the plugin with the SWAN notebook service.
Ask for feedback from users and implement their suggestions.


August 10 - August 14
Finish Task 3
Final Submission
Submit git repository of final code with complete documentation.


Deliverables


A working Jupyter Notebook plugin with the following features:

Submitting batch jobs from Jupyter Notebook
Displaying progress bar and job statistics of the ongoing jobs.
Cancellation of ongoing jobs.


SWAN Notebook service integration of the plugin.
Detailed documentation for the plugin.
Sample Notebooks demonstrating application of plugin using Ganga toolkit and HTCondor.

Future Goals


Explore ways of improving the plugin, and implement a similar plugin for JupyterLab, the next-generation user interface of Project Jupyter.

References


Ganga Toolkit
HTCondor
SWAN
LHC
High-Throughput Computing
iPython
Jupyter Notebook
Jupyter Extensions
Jupyter Lab
