kunalsharma05/gsoc2018.md

## gsoc2018.md

      
Google Summer of Code, 2018

Organization: Open Chemistry (cclib)

Name: Kunal Sharma

Project Title: Implementing new parsers (Molcas & Turbomole)

Mentors: Adam Tenderholt, Karol M. Langner & Eric Berquist

Description

This summer I worked with Open Chemistry, specifically on cclib.
cclib is an open-source library, written in Python, for parsing and interpreting the results of computational chemistry packages. You can find out more about cclib at https://cclib.github.io.
The main focus of this GSoC project was to implement two more parsers, one for Molcas (and OpenMolcas) output files and one for Turbomole output files.
Development of the parsers went roughly like this:

Generating the output files: The cclib test suite has tests for different types of calculations on three molecules (divinylbenzene, water and molybdenum(VI) tetrachloride oxide), so the corresponding output files had to be generated with each program first.
Adding skipFor* decorators on the unit tests: cclib uses skipFor* decorators to skip a given test for a particular parser, so tests for attributes that were not yet implemented could be skipped for the new parsers.
Parsing an attribute and removing the skipFor* decorator from the respective test: I wrote the code to parse one attribute and submitted it as a small PR so that it could be reviewed properly. The code was tested on Travis and changed where needed; once the review passed and the tests succeeded, it was merged. This was possible because the parsed attributes are uniform across software packages, so the existing unit tests could be reused for the new parsers. The approach not only tests the code for each attribute but also keeps track of which attributes have been implemented. It also meant that the code for each attribute could be merged into the master branch in small, properly assessed PRs without breaking anything for other developers and users.
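
The skip-then-enable workflow above can be sketched with a small decorator built on `unittest`. This is only an illustration: `skip_for_parser`, the `parser_name` class attribute and the test class are hypothetical names chosen for this sketch, not the cclib test suite's own code.

```python
import functools
import unittest


def skip_for_parser(parser_name, reason):
    """Skip the decorated test when the test case targets `parser_name`.

    Hypothetical helper written for this sketch.
    """
    def decorator(test_method):
        @functools.wraps(test_method)
        def wrapper(self, *args, **kwargs):
            # `parser_name` on the test case is an assumption of this sketch.
            if getattr(self, "parser_name", None) == parser_name:
                self.skipTest(reason)
            return test_method(self, *args, **kwargs)
        return wrapper
    return decorator


class GenericSinglePointTest(unittest.TestCase):
    """One generic test class shared by every parser."""

    parser_name = "Molcas"
    data = {"natom": 20}  # stand-in for the parsed attributes

    @skip_for_parser("Molcas", "atomcharges not implemented yet")
    def test_atomcharges(self):
        self.assertIn("atomcharges", self.data)

    def test_natom(self):
        self.assertEqual(self.data["natom"], 20)


def run_sketch():
    """Run the test case and return the unittest result object."""
    loader = unittest.defaultTestLoader
    suite = loader.loadTestsFromTestCase(GenericSinglePointTest)
    result = unittest.TestResult()
    suite.run(result)
    return result
```

Once the attribute is parsed, deleting the decorator line is enough to make the shared test run for the new parser, which is what makes the remaining decorators a rough to-do list.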

Molcas

Most of the attributes for Molcas were implemented. The parser now extracts relevant data from logfiles of various types of calculations, such as single point (both restricted and unrestricted), geometry optimization, vibrational frequency, Møller-Plesset and coupled cluster (post-HF), and thermochemistry calculations.
The parsed attributes include molecular information (coordinates, atomic numbers, core electrons etc.), information about the job performed (such as the basis set used) and results of the job (e.g. energies, optimized structures, molecular orbitals and frequencies).
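
As a rough illustration of what parsing one attribute involves, the sketch below scans logfile lines for energies with a regular expression. The log excerpt and its layout are invented for this example and are not real Molcas output; `extract_scf_energies` is a hypothetical helper, and the values are left in the units printed rather than converted.

```python
import re

# Invented line format for this sketch; real program output differs.
ENERGY_RE = re.compile(r"Total SCF energy\s+(-?\d+\.\d+)")

SAMPLE_LOG = """\
 Iteration 1
 Total SCF energy      -75.9812
 Iteration 2
 Total SCF energy      -76.0267
"""


def extract_scf_energies(lines):
    """Collect every SCF energy found in `lines`, in order of appearance."""
    energies = []
    for line in lines:
        match = ENERGY_RE.search(line)
        if match:
            energies.append(float(match.group(1)))
    return energies


energies = extract_scf_energies(SAMPLE_LOG.splitlines())
```

A real parser follows the same pattern for each attribute: recognise the line that introduces a value, read it, and store it under a name that is the same for every program.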
Turbomole

While most of the features for Molcas were completed, Turbomole posed a few challenges. Since Turbomole is proprietary software, getting full access was difficult and did not happen until the last week of July. There is a demo version, but it is restricted to calculations with molecules containing 1-3, 6-12 atoms. Turbomole is also complicated to use and, unlike other programs, produces multiple output files for a single job.
Currently, the Turbomole parser supports single point calculation attributes, frequency analysis attributes, Møller-Plesset and coupled cluster corrected energies, molecular orbitals and energies etc.
A method to sort the input files passed to the Turbomole parser was also created so that interdependent data is parsed correctly.
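
One way to picture such a sorting step is a fixed priority list, so that files other data depends on are read first. The sketch below assumes a particular priority order and uses a made-up function name; neither is taken from the cclib Turbomole parser.

```python
import os

# Assumed reading order for this sketch: files listed first are parsed first.
PRIORITY = ["control", "coord", "basis", "mos", "alpha", "beta",
            "gradient", "aoforce"]


def sort_turbomole_inputs(paths):
    """Return `paths` ordered by PRIORITY; unknown files go last, by name."""
    def sort_key(path):
        name = os.path.basename(path)
        stem = name.split(".")[0]  # "aoforce.out" -> "aoforce"
        if stem in PRIORITY:
            rank = PRIORITY.index(stem)
        else:
            rank = len(PRIORITY)
        return (rank, name)
    return sorted(paths, key=sort_key)


ordered = sort_turbomole_inputs(
    ["job/aoforce.out", "job/mos", "job/control", "job/extra.log"]
)
```

With this ordering, `job/control` is read before `job/mos` and `job/aoforce.out`, and unrecognised files such as `job/extra.log` are handled last.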
The Turbomole parser is not yet complete and is included as a stretch goal.
GitHub Repositories


cclib/cclib
cclib/cclib-data

My Contributions


Commits in cclib/cclib
Project page to keep track of things
List of merged PRs