This report summarizes the work done in my GSoC 2019 project, Enhancement of Statistics Module with SymPy. My mentors were Francesco Bonazzi and Sidhant Nagpal. A step-by-step account of the project's development is available at czgdp1807.github.io.
I am Gagandeep Singh, a third-year Bachelor of Technology student in the Department of Computer Science and Engineering at the Indian Institute of Technology, Jodhpur.
The project plan focused on the following areas of statistics that were to be added to `sympy.stats`.
- Community Bonding - I was supposed to add the Dirichlet, Multivariate Ewens, Multinomial, Negative Multinomial, and Generalized Multivariate Log-Gamma distributions to `sympy.stats.joint_rv_types`.
- Phase 1 - I was supposed to work on stochastic processes, primarily Markov chains, including their API design, algorithms, and implementation.
- Phase 2 - I was expected to work on random matrices, including Gaussian ensembles and matrices with random expressions as their elements.
- Phase 3 - I planned to work on assumptions of dependence, improving result generation by `sympy.stats`, and improving other modules so that `sympy.stats` can function properly.
This section describes the actual work done during the coding period in terms of merged PRs.
- #16576: This PR added the `Dirichlet` and `MultivariateEwens` distributions.
- #16808: This PR added the `Multinomial` and `NegativeMultinomial` distributions.
- #16810: This PR improved the API of `Sum` by allowing a `Range` as the limits.
- #16825: Continuing the work above, this PR added the `GeneralizedMultivariateLogGamma` distribution. It was an interesting one because of the complexity of its PDF.
- #16834: This PR enhanced the `Multinomial` and `NegativeMultinomial` distributions by allowing symbolic dimensions for them.
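
To illustrate how the joint distributions listed above can be used, here is a minimal sketch that evaluates a multinomial density at a single point. The import paths follow the SymPy API as I understand it today and may differ from the code as originally merged. The parameters (three trials, three equally likely categories) are purely illustrative assumptions.

```python
from sympy import Rational
from sympy.stats import density
from sympy.stats.joint_rv_types import Multinomial

# Three trials over three equally likely categories (illustrative values).
p = Rational(1, 3)
M = Multinomial('M', 3, p, p, p)

# Probability of observing exactly one outcome in each category,
# i.e. 3!/(1!*1!*1!) * (1/3)**3.
prob = density(M)(1, 1, 1)
print(prob)
```

The same call pattern should apply to the other joint distributions, with symbols in place of the concrete numbers when a symbolic density is wanted.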
- #16897: This PR was related to `sympy.core` and helped remove a disparity in the results of the special function `gamma`.
- #16908: This PR improved `sympy.stats.frv` by allowing conditions with foreign symbols.
- #16913: This PR removed unreachable code from `sympy.stats.frv`.
- #16914: This PR allowed symbolic dimensions in the `MultivariateEwens` distribution.
- #16929: This one was for the `sympy.tensor` module. It optimized `ArrayComprehension` and covered some corner cases.
- #16981: This PR added the architecture for stochastic processes. It also added the discrete Markov chain to `sympy.stats`.
- #17030: Some features, such as `joint_distribution`, were added to stochastic processes in this PR.
- #17046: Some common properties of discrete Markov chains, such as the fundamental matrix and the fixed row vector, were added.
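
The discrete Markov chain work above can be sketched as follows. This is a minimal example assuming the `DiscreteMarkovChain` interface currently exposed by `sympy.stats`. The two-state transition matrix is hypothetical and chosen only so that the results are easy to check by hand.

```python
from sympy import Eq, Matrix, Rational
from sympy.stats import DiscreteMarkovChain, P

# Hypothetical two-state chain; each row of T sums to 1.
T = Matrix([[Rational(1, 2), Rational(1, 2)],
            [Rational(1, 4), Rational(3, 4)]])
Y = DiscreteMarkovChain('Y', [0, 1], T)

# One-step and two-step transition probabilities from state 0 to state 1.
one_step = P(Eq(Y[2], 1), Eq(Y[1], 0))
two_step = P(Eq(Y[3], 1), Eq(Y[1], 0))

# Row vector w with w*T == w and entries summing to 1.
w = Y.fixed_row_vector()
print(one_step, two_step, list(w))
```

The two-step probability is the corresponding entry of `T**2`, which is how a query conditioned on an earlier state reduces to a matrix power.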
- #16934: The bug fixes for `sympy.stats.joint_rv_types` were completed, and further work has been handed over to my co-student, Ritesh.
- #16962: This was a continuation of the work done in phase 1 on allowing symbolic dimensions in finite random variables. As I had planned, this PR was merged in phase 2 after some changes.
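
A small sketch of what symbolic dimensions in finite random variables look like in practice, assuming the `Die` interface in current SymPy. The symbol `n` and the substituted value 4 are illustrative choices, not taken from the PR.

```python
from sympy import Rational, Symbol
from sympy.stats import Die, density

n = Symbol('n', positive=True, integer=True)
D = Die('D', n)  # die with a symbolic number of sides

# While n is symbolic the density stays unevaluated.
symbolic = density(D).dict

# Substituting a concrete number of sides yields the usual finite density.
concrete = symbolic.subs(n, 4).doit()
print(symbolic)
print(concrete)
```

Keeping the density unevaluated until the dimension is known avoids enumerating a sample space whose size is not yet fixed.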
- #17083: The work done in this PR set the stage and the motivation for the next one. The algorithm that got merged was difficult to extend and maintain. Thanks to Francesco for his comment, which motivated me to rethink the whole framework.
- #17163: This was one of the most challenging PRs of the project because it involved redesigning the algorithm, refactoring the code, and a lot of thinking. The details can be found at this comment.
- #17174: In this PR, Gaussian ensembles were added to `sympy.stats`.
- #17304: While working on the above PR, I got the idea to open this one to add circular ensembles to `sympy.stats`. I learned a lot about the Haar measure while working on it.
- #17306: This PR added matrices with random expressions. The challenging part was generating canonical results so that the tests pass.
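
For the Gaussian ensembles mentioned above, a minimal sketch of the user-facing calls might look like this. It assumes the names `GaussianUnitaryEnsemble` and `joint_eigen_distribution` as exported by `sympy.stats` today. The 2x2 size and the symbol names are illustrative.

```python
from sympy import MatrixSymbol
from sympy.stats import (GaussianUnitaryEnsemble, density,
                         joint_eigen_distribution)

# A 2x2 Gaussian unitary ensemble (size chosen only for illustration).
G = GaussianUnitaryEnsemble('U', 2)

# Density of the ensemble evaluated at a symbolic 2x2 matrix.
X = MatrixSymbol('X', 2, 2)
pdf = density(G)(X)

# Joint distribution of the eigenvalues of the random matrix.
eig = joint_eigen_distribution(G)
print(pdf)
print(eig)
```

Both results are symbolic expressions, so they can be inspected or substituted into like any other SymPy object.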
- #17336: This PR was related to a bug fix in `Q.ask` and `Matrix`. Take a look at an example here.
This section contains some of my PRs related to miscellaneous issues, such as workflow improvements.
- #16899: This was a workflow-related PR to ignore the `.vscode` folder.
- #17003: This PR ignored the `__pycache__` folder by adding it to the `.gitignore` file.
The following PRs are open and are in their last stages for merging. Any interested student can take a look at them to extend my work in his/her GSoC project.
- #17387: This PR aims to add support for assumptions of dependence among random variables, such as `Covariance`.
- #17146: This PR is in its last stages. It fixes and upgrades the `Range` set, and we are finalizing a few things, such as changes in the output of `Range`. As planned, I succeeded in writing exhaustive and systematic tests.
Apart from the above, work on the densities of circular ensembles remains to be done. One can read Theorem 3 on page 8 of this paper.