@ritiek
Last active March 25, 2019 13:19
My OpenAstronomy GSoC 2018 Application

Organisation

PlasmaPy under OpenAstronomy.

Personal Information

Current Education

Open Source Experience

Here is a list of my notable contributions to open source:

  • mps-youtube/mps-youtube - Added features and made several bug fixes to this Python-based terminal YouTube player.
  • plamere/spotipy - I haven't had the chance to submit a PR to this Python library for the Spotify Web API, as the repository's original owner has been inactive for a while and its documentation is lagging behind. I do sometimes help users in the issues section with general problems and outdated documentation.
  • rust-lang/rust - Made minor fixes to the Rust compiler and its test suite.
  • mps-youtube/pafy - Made a minor improvement to the download interface of this Python library for fetching YouTube content.
  • theSage21/handwritten - A tiny Python-based command-line tool built on Alex Graves' paper on generating handwriting from text. I implemented a friendlier command-line interface and added some additional features.
  • mitmproxy/mitmproxy - Wrote instructions for using mitmproxy on different platforms and displayed them dynamically on their magic domain using JavaScript.

I have also worked on many of my own open-source projects which can be found on my GitHub profile.

Contributions to OpenAstronomy

  • astropy/astropy - Proposed a fix for a bug where adding two HDUList instances returned a plain list (the fix was rejected in favor of relying less on list inheritance). Also implemented shallow copy and deep copy for HDUList instances and wrote appropriate tests.
  • poliastro/poliastro - A small patch to return time values when sampling an orbit in Orbit.sample(). Also reported a bug that caused the examples on the master branch to fail.
  • PlasmaPy/PlasmaPy - Created a module-level reduced_mass() function from the Particle.reduced_mass() method, so that it can be called outside of the Particle class. Also added support for accepting Roman numerals as charge states (using the roman package).

I use Linux Mint and Tmux + Neovim as my primary development environment. I am also proficient in development using Python and comfortable with Git and GitHub.

Interest in OpenAstronomy

Scientists and researchers often have to forge their own tools, which are usually neither shared with other researchers nor well documented. Their reliance on low-level languages like FORTRAN and IDL adds to the trouble and makes the learning curve unnecessarily steep. Ultimately, it gets harder to produce insightful results from the research, and many iterations of the same software end up being developed. Reinventing the wheel every time results in several imitations of libraries that add little value for other scientists, and the cycle repeats itself.

I love how OpenAstronomy is working to fix these problems by using high-level technologies and creating well-documented software, thereby reducing the learning curve considerably. OpenAstronomy comprises several sub-organisations in which every individual shares a common goal: to provide a well-maintained software platform for scientists, so that they can work on actual scientific problems instead of writing their own tools and dealing with the unnecessary burden of maintenance. OpenAstronomy for the most part develops Python libraries for various astronomical purposes, and scientists (or anyone with basic knowledge of Python) all over the globe can use these open-source tools in their experiments with ease, without the hassle of compiling and setting up their software before use.

I strongly favor open-source development and believe that such an initiative will save scientists a lot of time, which they can spend on the problems they actually care about and want to have an impact on. The organization is also very active and provides a very welcoming environment to newcomers. It has been a great experience contributing to OpenAstronomy projects so far. I'm glad to be a part of OpenAstronomy, and the feeling that this code will actually have an impact on the lives of scientists all across the world is overwhelming and makes the effort even more rewarding. Google Summer of Code has given me this brilliant opportunity to be a part of something this big.


Application

Project: Implement a new Plasma metaclass in PlasmaPy

Mentors: Drew Leonard, Nick Murphy, Dominik Stańczak

Abstract

PlasmaPy is an open-source Python package that aims to provide a set of common functionality used in plasma physics. Currently it implements a Plasma class which does all the fundamental work of calculating plasma parameters like Alfvén speed, density, electric field, magnetic field, pressure, etc. The current Plasma class does a good job at the moment, but working with different kinds of plasmas in the future will get very messy if PlasmaPy relies solely on the current class-based implementation. Since dealing with different kinds of plasmas is an important part of plasma physics, having a more compact way of working with them will be a milestone of substantial importance and priority for PlasmaPy.

Flexibility

This Plasma class works well in its current iteration, but it lacks the flexibility that will be needed when PlasmaPy begins to work with different kinds of plasmas. There are several ways to deal with this, but most of them would require repeating the same set of steps every time a class gets instantiated. The best way to overcome this restriction appears to be a new Plasma metaclass from which the different plasma subclasses emerge. With a metaclass, the appropriate behavior needs to be provided only once, where the class is defined, unlike other possible solutions (such as relying only on decorators or inheritance). Being able to dynamically allocate types is going to be very useful in providing a single, agnostic user interface.
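The point about providing behavior once, at class definition, can be illustrated with a minimal sketch. It is not PlasmaPy's actual design: the names PlasmaMeta, BasePlasma, Plasma3D and PlasmaBlob are hypothetical and chosen only for illustration. The metaclass records each subclass as it is defined, so no subclass has to register itself.

```python
class PlasmaMeta(type):
    """Hypothetical metaclass: records every subclass it creates."""

    def __init__(cls, name, bases, namespace):
        super().__init__(name, bases, namespace)
        if not hasattr(cls, "registry"):
            # First class using this metaclass (the base): create the registry.
            cls.registry = {}
        else:
            # Every later class definition is registered automatically.
            cls.registry[name] = cls


class BasePlasma(metaclass=PlasmaMeta):
    """Hypothetical common base for all plasma containers."""


class Plasma3D(BasePlasma):
    """Hypothetical subclass for gridded 3D data."""


class PlasmaBlob(BasePlasma):
    """Hypothetical subclass for a single uniform blob."""


print(sorted(BasePlasma.registry))  # ['Plasma3D', 'PlasmaBlob']
```

The registration logic lives in one place, and a factory can later pick the right entry from the registry based on its input.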

[Figure: plasma_metaclass]

Similar Implementation

The idea is quite similar to the way SunPy uses the factory class sunpy.map.Map, which accepts a wide variety of inputs. Several instrument-specific Map classes and some more complex Map classes are implemented by subclassing a common base object. This is how SunPy constructs Map objects using the Map factory in practice:

>>> import sunpy.map
>>> import sunpy.data.sample

# Map an AIA based image
>>> mapped_aia = sunpy.map.Map(sunpy.data.sample.AIA_171_IMAGE)
# AIAMap deals with AIA images
>>> type(mapped_aia)
sunpy.map.sources.sdo.AIAMap

# Map an EIT based image
>>> mapped_eit = sunpy.map.Map(sunpy.data.sample.EIT_195_IMAGE)
# EITMap deals with EIT images
>>> type(mapped_eit)
sunpy.map.sources.soho.EITMap

SunPy makes it easier to work with image data by providing a number of methods on the Map object for commonly used operations. 2D map objects are subclasses of MapBase, and all Map objects are created using the Map factory.

This project will implement a similar mechanism for the Plasma metaclass in PlasmaPy. When instantiated, the Plasma metaclass will automatically select and instantiate an appropriate plasma subclass based on the input it receives. We could have a different class to handle each specific kind of data, say a yt data object for working with 3D simulations. It will make use of an abstract base class (similar to sunpy.map.mapbase.GenericMapMetaclass) to define the interface for creating an object, while letting subclasses decide which class to instantiate. The factory method pattern lets a class defer instantiation to subclasses.
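As a rough sketch of this mechanism, and not of the eventual PlasmaPy API, the following uses an abstract base class to define the shared interface and a factory function that instantiates whichever already-defined subclass accepts the input. All names here (GenericPlasma, handles, DictPlasma, PathPlasma, plasma_factory) are assumptions made for the example.

```python
from abc import ABC, abstractmethod


class GenericPlasma(ABC):
    """Hypothetical abstract base: the interface every plasma class shares."""

    def __init__(self, data):
        self.data = data

    @classmethod
    @abstractmethod
    def handles(cls, data):
        """Return True if this class knows how to work with ``data``."""


class DictPlasma(GenericPlasma):
    """Hypothetical subclass for parameters given as a dictionary."""

    @classmethod
    def handles(cls, data):
        return isinstance(data, dict)


class PathPlasma(GenericPlasma):
    """Hypothetical subclass for data referenced by a file path."""

    @classmethod
    def handles(cls, data):
        return isinstance(data, str)


def plasma_factory(data):
    """Instantiate the first defined subclass that accepts ``data``."""
    for candidate in GenericPlasma.__subclasses__():
        if candidate.handles(data):
            return candidate(data)
    raise TypeError(f"no plasma class accepts input of type {type(data).__name__}")


print(type(plasma_factory({"density": 1.0})).__name__)  # DictPlasma
print(type(plasma_factory("simulation.h5")).__name__)   # PathPlasma
```

The caller only ever touches plasma_factory, so supporting a new data format means defining one more subclass and leaving the call sites untouched.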

Documentation

The new Plasma metaclass is going to need a good deal of effort on its documentation, since the average Python developer is not particularly familiar with how metaclasses work. The documentation should also explain how using them in situations like this one, providing an API for dealing with different kinds of plasmas, can make our source code shorter and more compact than relying on simple classes and functions alone.

How I Intend To Complete The Project

Learning more about the Plasma metaclass

  • This project will first involve learning more about the Plasma metaclass, the way it should be implemented in code, how it will lower the barrier of dealing with different kinds of plasmas and lastly, how it will instantiate the appropriate subclass when invoked.

Writing appropriate tests and documentation

  • Once it is clear how the basic structure of the Plasma metaclass should be implemented, we can begin writing appropriate tests and basic documentation that would shape our future Plasma metaclass. Doing so before the actual implementation of Plasma metaclass will help us abide by the core idea of Test Driven Development.
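Tests written ahead of the implementation could look roughly like the sketch below. The dummy classes and the make_plasma factory are hypothetical placeholders. A stand-in implementation is included only so that the example runs on its own, and the real tests would import the actual factory instead.

```python
class DummyBase:
    """Stand-in base class; exists only so the tests below can run."""

    def __init__(self, data):
        self.data = data

    @classmethod
    def handles(cls, data):
        return False


class DummyList(DummyBase):
    """Dummy test class that claims list input."""

    @classmethod
    def handles(cls, data):
        return isinstance(data, list)


def make_plasma(data):
    """Stand-in factory (hypothetical name), to be replaced by the real one."""
    for candidate in DummyBase.__subclasses__():
        if candidate.handles(data):
            return candidate(data)
    raise TypeError("unsupported input")


def test_matching_input_selects_dummy_subclass():
    assert isinstance(make_plasma([1, 2, 3]), DummyList)


def test_instance_keeps_its_input():
    assert make_plasma([1, 2, 3]).data == [1, 2, 3]


def test_unsupported_input_raises():
    try:
        make_plasma(42)
    except TypeError:
        pass
    else:
        raise AssertionError("expected TypeError for unsupported input")
```

Functions named test_* are picked up by pytest without extra configuration. In a red/green workflow they would fail until the real factory exists.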

Implement the Plasma metaclass

  • We will then implement the Plasma metaclass and the factory method in code, which will involve creating various special methods to control how the subclass objects are created. The implementation will follow the tests and documentation written in the earlier phase, and we will make appropriate changes to them if needed.
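For readers less familiar with metaclasses, here is a small sketch of the special methods a metaclass can define and when each one runs. TracingMeta and Example are hypothetical names. The sketch only records call order and is not a proposal for the final implementation.

```python
class TracingMeta(type):
    """Hypothetical metaclass that records when each special method runs."""

    log = []

    def __new__(mcls, name, bases, namespace):
        # Runs once per class definition, before the class object exists.
        mcls.log.append(f"__new__ {name}")
        return super().__new__(mcls, name, bases, namespace)

    def __init__(cls, name, bases, namespace):
        # Runs once per class definition, after the class object exists.
        type(cls).log.append(f"__init__ {name}")
        super().__init__(name, bases, namespace)

    def __call__(cls, *args, **kwargs):
        # Runs every time the class is called to create an instance.
        type(cls).log.append(f"__call__ {cls.__name__}")
        return super().__call__(*args, **kwargs)


class Example(metaclass=TracingMeta):
    pass


Example()
print(TracingMeta.log)
# ['__new__ Example', '__init__ Example', '__call__ Example']
```

Of the three, __call__ is the natural hook for a factory, since it can decide which class to instantiate before any instance is created.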

Refactor the Plasma metaclass, and finalize tests and update documentation as necessary

  • Once the basic implementation of the metaclass passes all our tests, we can begin to refactor our code and improve its quality. Since we will already have test cases in place, refactoring carries little risk of silently breaking our metaclass implementation.
  • We can then update the tests and write better documentation for our Plasma metaclass, keeping in mind that scientists with basic exposure to software development should not have trouble understanding and following this metaclass-based implementation. By then, we should be ready to merge this into the main PlasmaPy repository.
  • If all this is done ahead of schedule, I will go on to implement subclasses that our Plasma metaclass will instantiate based on the input received.
  • With all this done, I also plan to write clean examples, with the necessary comments and documentation, showing how this metaclass works in practice, so that our users understand what is actually happening below the surface.

Benefits to the Community

This implementation of the Plasma metaclass will lay the groundwork for supporting different kinds of plasma structures and make it easier for researchers to work with them. Without it, that work would be very messy and would demand a lot of effort on the users' end for something a metaclass can do far more compactly. With this in place, PlasmaPy will be able to implement appropriate subclasses for handling different data structures, which would in turn allow us to support many kinds of plasmas, including classical plasmas, magnetized plasmas, relativistic plasmas, warm dense matter, photoionized plasmas, strongly coupled plasmas, high energy density plasmas, etc. These span from industrial production of televisions and thin film coating, to fusion energy experiments, to stars, lightning, the aurora and accretion discs around black holes. PlasmaPy is currently the only open-source Python package for plasma physics that is being actively developed with the aim of providing an easy API for working with anything and everything related to plasma. Therefore, it is very important to provide an easy way to work with different plasma structures. A metaclass-based implementation makes this viable to a great extent for the end user.


Timeline

This is the approximate timeline I propose and would abide by.

Time Period / Task Description

Community Bonding Period

This is the phase where I'll get more familiar with PlasmaPy's codebase. This period will also involve more discussions around the implementation of the Plasma metaclass.

• What possible plasma data structures could PlasmaPy support in the future?
• How should we go about implementing the Plasma metaclass? A rough idea of all special methods that will be needed on this metaclass and its factory function.
• Even though we haven't had any major releases as of now, what if some part of the implementation unavoidably breaks the existing API? Should we take careful steps to keep backward compatibility?
• What are plasma physicists expecting from this implementation and how can we make this implementation more intuitive for scientists with only basic exposure to software development?
Phase 1 (May 14 - June 11)

First Week - By this time, we will have a clear understanding of how this implementation of the Plasma metaclass will work and what purpose it will serve. Since PlasmaPy takes code coverage very seriously (which is a great thing), I'll start by writing appropriate basic red/green tests (which is the core of Test Driven Development) and explaining what each of these tests does. This will help us keep up with current code coverage and also help us write effective modular code for our Plasma metaclass. Our tests will cover the behavior of this metaclass on instantiation and that of its dummy test classes.

Second Week - After we have finished writing tests, I'll write suitable documentation for our Plasma metaclass and all of its necessary methods, describing the purpose each of them serves.

Third Week - I'll talk with my mentors to find out whether any further tests could be added and how the documentation could be improved. If all this is completed ahead of time, I'll start working on the second phase.
Phase 1 Evaluation (June 11 - June 15)

During this evaluation period, I'll have more in-depth discussion regarding the actual implementation of our Plasma metaclass during the next phase.

Deliverables - Appropriate tests and documentation for this metaclass based implementation.
Phase 2 (June 15 - July 9)

First Week - We'll first implement basic methods for our metaclass. The metaclass will declare methods (__new__ or __init__) to control the creation and initialization for every class that declares it as a metaclass. I'll also work on any other special methods that are going to be necessary for our metaclass instantiation during this week.

Second Week - I'll continue working on the metaclass implementation and provide a factory method. During this period, I'll also make sure the tests written in the first phase are linked properly and are passing.

Third Week - Once we have the basic implementation of the metaclass and its factory method, we can work on refactoring and improving our codebase. It will involve making possible improvements to the code quality. We can be certain that this will not result in any breakage in the working of the metaclass implementation because if anything goes wrong, it will be caught immediately after running the test suite (implemented in first phase).
Phase 2 Evaluation (July 9 - July 13)

During this evaluation period, I'll continue working on ways to refactor our metaclass based implementation.

Deliverables - Actual implementation of the Plasma metaclass and its factory method with all tests passing (written during the first phase).
Phase 3 (July 13 - August 6)

First Week - I'll improve the tests to make them more explicit and cover any edge cases. This period will also involve improving the documentation of our Plasma metaclass. It is important to have good documentation so that scientific programmers are able to understand the use of metaclass without any major hassles as dealing with metaclasses can at times be very confusing and not many python programmers are familiar with them. I'll also devote this period to look for any potential bugs that may arise and fix them.

Second Week - If our Plasma metaclass and factory method have been successfully implemented by this time with proper documentation and appropriate tests, I'll further discuss and start working to implement various subclasses to deal with specific data structures that will emerge from this Plasma metaclass.

Third Week - This will be a spare week in case of any emergencies or if any of the above tasks take longer than expected. Otherwise, I'll continue working on the implementation of subclasses.
Phase 3 Evaluation (August 6 - August 14)

I'll write a blog post covering everything I've done on the project, what could be done further to improve our metaclass implementation, anything that's left to do, and what code got merged and what didn't.

Deliverables - Final working implementation of the Plasma metaclass including proper documentation and tests covering any edge cases. As a secondary milestone, implemented one or more subclasses to handle specific data structures.

I also intend to publish a detailed blog post after every phase, and a relatively shorter one every week during the GSoC period describing my current status and my thoughts regarding the next milestone.

Schedule Availability

  • I have my semester finals in the last week of April and my summer vacation begins immediately after that. So, I'll be able to commit 42 hours per week (6 hours per day) to the project for the summer.
  • My timezone is UTC +05:30 (India), whereas my project mentors are approximately six hours behind. This should not be much of a problem since we were able to communicate effectively during the proposal period. However, I am absolutely willing to make slight adjustments to my work schedule if needed.
  • I also currently don't have any plans to go on vacation or have any other work or internships this summer.
  • My summer vacation will end around mid-July. I'll still be able to devote around 35 hours per week after that, since the academic workload is low during the first few weeks of the semester.

Future Plans

I'd love to contribute to PlasmaPy even after GSoC ends. Because of time constraints, GSoC alone will probably not be enough to implement all the necessary subclasses along with their documentation and tests, so I am looking forward to working more on this after GSoC ends. In fact, there is also another project I'd love to work on (#52) that aims to provide users with access to important atomic data, similar to astropy (using astroquery). It also involves finding a way to integrate mendeleev into PlasmaPy, since the two overlap a bit and PlasmaPy is missing some of the functionality mendeleev provides.

GSoC

Are you also applying to any other projects?

  • PlasmaPy is the only organization and this project is the only project that I have applied for in GSoC.

Have you participated previously in GSoC?

  • No, I haven't participated in GSoC before. This would be my first time.

Eligibility

Yes, I am eligible to take part in Google Summer of Code 2018 and receive payments from Google.

@SolarDrew

Hi @ritiek, this looks very good overall. I have a few, mostly general, comments:

  • In the "Flexibility" section you suggest that subclasses will be dynamically generated. This is not entirely true and I think it would be more accurate to say they're dynamically allocated - on instantiation the correct subclass will be selected from a list of subclasses which are already defined. (I think from your descriptions further down that you do understand this point, I just wanted to be completely clear.)
  • I would suggest scheduling blog posts more often than every three weeks - in my previous experience posts have been expected at least every two weeks (though I'm not sure exactly what the actual rules are on this).
  • If possible, it would be good to allow perhaps an extra week at the end of the timeline for the development of subclasses. Although this is an additional goal rather than the primary deliverable, I think it's worth scheduling for it and then if there isn't time for it after all that's not a problem.
  • The list of plasma types in the "Benefits to the Community" section is good, but note that the more important distinction between plasma subclasses (at least initially) will be between different data formats, and the aim will be to handle these consistently.

@ritiek

ritiek commented Mar 26, 2018

@SolarDrew thanks a lot for the review!

In the "Flexibility" section you suggest that subclasses will be dynamically generated.

Yep, that was a mistake on my part. It would indeed be better to say that they're dynamically allocated.

I would suggest scheduling blog posts more often than every three weeks

I thought students are expected to write full-fledged blog posts for each phase. It would certainly make more sense to have blog posts more frequently. One post every week should be good enough I guess.

If possible, it would be good to allow perhaps an extra week at the end of the timeline for the development of subclasses.

Good idea. I've re-adjusted my timeline to make more time to work on subclasses in the final phase.

The list of plasma types in the "Benefits to the Community" section is good, but note that the more important distinction between plasma subclasses (at least initially) will be between different data formats, and the aim will be to handle these consistently.

I do understand that subclasses will be differentiated based on the data structure they handle. I've modified it to make this a bit more clear. The plasma examples seem appealing so I wouldn't want to remove them.

@SolarDrew

No problem @ritiek :)

I thought students are expected to write full-fledged blog posts for each phase.

Again, I'll have to check both the GSoC and OpenAstronomy rules on that, but I would certainly recommend weekly, personally. That said, I'd be fine with the weekly ones being shorter updates and then having a more detailed report after each phase. As an aside, and just so you know, I would also expect regular (weekly or every two weeks) calls with the mentors to discuss progress, and attendance at the group video conferences.

The plasma examples seem appealing so I wouldn't want to remove them.

Of course :)

@ritiek

ritiek commented Mar 27, 2018

I would also expect regular (weekly or every two weeks) calls with the mentors to discuss progress, and attendance at the group video conferences.

Sure! That would be very interesting.
