@shnitish
Last active July 10, 2021 06:57
Project report Google Summer of Code 2019

GSOC'19 Work Product: SPDX specifications in PDF and HTML

Nitish Sharma | Github | Email | LinkedIn

Relevant Project Links

Project Goals:

  • PDF version of the specification
    • Status: Done
  • HTML version of the specification
    • Status: Done
  • Multilanguage Support
    • Status: Pending

Contributions

  • [#1] Added and updated the packages in the LaTeX template to generate the PDF version of the specification.
  • [#2] Updated the Markdown files to resolve the internal links issue.
  • [#3] Added the functionality to generate the HTML version of the specs.
  • [#4] Applied the final changes and added the final versions of the PDF and HTML after review by the SPDX members.

Project Description

The primary objective of the project is to generate both HTML and PDF versions of the SPDX Specification from Markdown. The HTML and PDF versions will be generated for each draft and release version of the specification. The result is a tool that makes the SPDX specification easier to circulate by producing HTML and PDF versions of it, with language conversion as a planned capability. This is achieved by using pandoc for format conversion: combined with a LaTeX template it produces a well-formatted PDF, and combined with an HTML template it produces the HTML version of the specification. Conversion matters because it makes the specification easier to share.

Task 1: PDF version of the specifications

This is achieved by integrating a LaTeX template with pandoc. Pandoc is a tool that converts files from one markup format to another, and it supports many formats. The template, written in LaTeX, provides the skeleton of the PDF. Passing the input files to pandoc with the appropriate flags produces the PDF. The overall look and appearance depend on the LaTeX template, which loads several packages that style the content during conversion.
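
As a rough illustration of the conversion described above, the sketch below builds a pandoc command line for a Markdown to PDF run. It is not the project's actual build script: the file names and the choice of xelatex as the PDF engine are assumptions made for the example, and only standard pandoc options (--template, --pdf-engine, --toc, --output) are used.

```python
def build_pdf_command(sources, template, output, engine="xelatex"):
    """Return the pandoc argument list for a Markdown -> PDF conversion.

    All paths are hypothetical placeholders; `engine` is an assumption,
    any LaTeX engine installed on the machine could be used instead.
    """
    return [
        "pandoc",
        *[str(source) for source in sources],  # chapters, in reading order
        "--from", "markdown",
        "--template", str(template),           # LaTeX template: the PDF skeleton
        "--pdf-engine", engine,
        "--toc",                               # generate a table of contents
        "--output", str(output),
    ]
```

The returned list can be handed to subprocess.run(cmd, check=True) on a machine where pandoc and a LaTeX distribution are installed. Building the arguments as a list avoids shell quoting problems with file names.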

Task 2: HTML version of the specifications

This is achieved by using an HTML template with pandoc; the template provides the overall structure of the HTML file. Pandoc converts the Markdown version into HTML. The HTML produced is very basic and needs styling, which is done by providing a CSS file. The template fixes the position of the contents during conversion and applies some necessary modifications too. The HTML version is hosted temporarily on my GitHub account here.
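
A comparable sketch for the HTML side is shown below. Again, this is only an illustration under assumptions: the template, stylesheet, and output names are hypothetical, and the options used (--standalone, --template, --css, --toc) are standard pandoc options rather than a record of the exact flags used in the project.

```python
def build_html_command(sources, template, stylesheet, output):
    """Return the pandoc argument list for a Markdown -> HTML conversion.

    All file names are hypothetical placeholders for this example.
    """
    return [
        "pandoc",
        *[str(source) for source in sources],
        "--from", "markdown",
        "--to", "html5",
        "--standalone",                 # emit a full document, not a fragment
        "--template", str(template),    # HTML template: positions the content
        "--css", str(stylesheet),       # stylesheet linked from the output page
        "--toc",
        "--output", str(output),
    ]
```

As with the PDF case, the list can be passed to subprocess.run(cmd, check=True) where pandoc is installed.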

GSOC WEEKLY REPORT

Week 1:

  • First, I forked the spdx-spec repository and checked out a new branch for my contributions.
  • My mentors provided details of the task for the first week.
  • It included a pandoc-latex-template that I had to use as a reference. It is a template that defines the skeleton of the PDF we were about to generate.
  • The template, written in LaTeX, needed specific customizations to suit our needs.
  • I went through the documentation of each included package to understand how it works. The first task was to include a header, footer, title page, and table of contents in the resulting PDF. That was it for week 1: I pushed the work to the branch I had created and sent the link to my mentor for review.

Week 2:

  • My mentor set up a video call to discuss the next step in development. The next task was to customize the header, the footer, the code snippets included in the documents, and the internal link references. The generated PDF had no styling for the code snippets; I added it after going through the documentation of the relevant LaTeX packages. I pushed the changes to the branch, updated the README, and sent the link to my mentor for review.

Week 3:

  • The internal links work in the Markdown but not in the PDF after conversion. Instead of pointing to the corresponding place in the PDF, they open the external Markdown files in a separate browser window.
  • I made some changes according to the instructions provided by the mentors.
  • I then emailed everyone working on the project about a fix.
  • The second approach to the internal references was to compile the Markdown into a .tex file, fix the internal links with LaTeX techniques, and then compile the LaTeX to PDF.
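
One way to picture the internal links problem is as a rewrite of the Markdown before conversion. The sketch below is not the fix that was applied in the project; it is a minimal illustration that assumes all chapters are merged into a single document, so links to other .md files can be turned into in-document anchors. The function name and the rule of falling back to the file stem when a link has no anchor are assumptions made for the example.

```python
import re
from pathlib import PurePosixPath

# Matches "](some-file.md)" or "](some-file.md#anchor)", skipping http(s) URLs.
_MD_LINK = re.compile(r"\]\((?!https?://)([^)\s#]+\.md)(#[^)\s]+)?\)")


def localize_links(markdown):
    """Rewrite links to other Markdown files as in-document anchors."""

    def replace(match):
        target, anchor = match.group(1), match.group(2)
        if anchor:
            # Keep the anchor, drop the file name.
            return f"]({anchor})"
        # Assumption: a section named after the file stem exists in the merged document.
        return f"](#{PurePosixPath(target).stem})"

    return _MD_LINK.sub(replace, markdown)
```

Whether the resulting anchors resolve depends on the identifiers pandoc assigns to the headings, so the output would still need to be checked against the generated document.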

Week 4:

  • I continued working on the internal links issue. I figured out the cause, but the main problem arose when compiling the Markdown files into the PDF output using pandoc: some of the contents of the Markdown files did not render properly in the PDF.
  • My mentor evaluated the work done up to the first evaluation over a video call. I demonstrated the full working and output of the project up to week 4, and we discussed the problems faced and their possible solutions.
  • He assigned the next task for the coming weeks. That was it up to the first evaluation.

Week 5:

  • My next task was to generate the HTML version of the specs, containing the same header, footer, introduction, and title page.
  • I used pandoc for file conversion, CSS for styling, and an HTML template for the positioning of elements.

Week 6:

  • I designed the CSS for the HTML and integrated it with pandoc.
  • Pandoc then generated the HTML along with the CSS, and I continued working on the template.

Week 7:

  • I combined the HTML with the CSS and the HTML template. The HTML template works the same way as the LaTeX template, providing the skeleton for the HTML. I then sent the link to the HTML to my mentor for review.
  • Pandoc ships with default templates for HTML and LaTeX, which can be printed using pandoc --print-default-template=FORMAT
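
A default template can be saved to a file and used as a starting point for a custom one. The sketch below wraps the pandoc --print-default-template=FORMAT command mentioned above; the helper names and the destination path are hypothetical, and the function simply returns False when pandoc is not installed.

```python
import shutil
import subprocess
from pathlib import Path


def default_template_command(fmt):
    """Return the pandoc argument list that prints the default template for `fmt`."""
    return ["pandoc", f"--print-default-template={fmt}"]


def dump_default_template(fmt, destination):
    """Write pandoc's default template for `fmt` to `destination`.

    Returns False without doing anything if pandoc is not on the PATH.
    """
    if shutil.which("pandoc") is None:
        return False
    result = subprocess.run(
        default_template_command(fmt),
        check=True,
        capture_output=True,
        text=True,
    )
    Path(destination).write_text(result.stdout, encoding="utf-8")
    return True
```

For example, dump_default_template("latex", "default.latex") would save the LaTeX template for editing, where "default.latex" is just a placeholder name.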

Week 8:

  • After the HTML and PDF versions were completed, I started working on the language translation feature, which would translate the specs into different languages, initially French and Spanish.
  • I first thought of using Googletrans as a library and writing Python code to perform the translations, and I wrote a Python script that does the task. Unfortunately, the library I was using is not stable and has usage limitations: it converts only the first 15,000 characters, which was not suitable. I then considered other open source Python libraries, but they have the same problem and are not suitable for long-term use.
  • The other approach is to use the Google Translate API, which would allow us to convert 500k characters for free and then charges for continued usage.
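
A character limit like the one described above is usually worked around by splitting the text into chunks and translating them one at a time. The sketch below shows that idea only; it is not the script written during the project and it does not call any translation library. The translate argument is a hypothetical callable supplied by the caller, and the default limit of 15,000 characters is taken from the limitation noted above.

```python
def chunk_paragraphs(text, limit=15000):
    """Split `text` into chunks of at most `limit` characters, breaking at blank lines."""
    chunks, current = [], ""
    for paragraph in text.split("\n\n"):
        candidate = paragraph if not current else current + "\n\n" + paragraph
        if len(candidate) <= limit:
            current = candidate
            continue
        if current:
            chunks.append(current)
        # A single paragraph longer than the limit is cut at the limit itself.
        while len(paragraph) > limit:
            chunks.append(paragraph[:limit])
            paragraph = paragraph[limit:]
        current = paragraph
    if current:
        chunks.append(current)
    return chunks


def translate_document(text, translate, limit=15000):
    """Translate `text` chunk by chunk using the caller-supplied `translate` callable."""
    return "\n\n".join(translate(chunk) for chunk in chunk_paragraphs(text, limit))
```

Splitting at blank lines keeps paragraphs intact, which matters because a translator that receives half a sentence tends to produce worse output. Paragraphs that exceed the limit on their own are cut mid-text and would need manual review.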

Week 9:

  • I reached out to Jack (mentor), and he suggested dropping the language translation idea, working on other things, and getting the HTML and PDF reviewed by the tech-team members through the mailing lists. After the second phase evaluations I caught a viral fever and was not able to work for 3-4 days.

Week 10:

  • I sent out the PDF and the HTML for review through the mailing list, and the SPDX members went through them in detail.
  • My college term had started and I got very little time to work, so I mainly completed my tasks during weekends.

Week 11:

  • The PDF and the HTML needed some minor tweaks that could be fixed easily. The header and footer in the HTML were a little distracting while reading the specs, so I decided to remove them, and the tables did not have fixed column widths, which could be fixed with minor changes in the CSS.
  • The PDF had uneven text in the code snippets, certain keywords were bolded unevenly (mainly after hyphens), and there were font size issues. After the changes, both the PDF and the HTML looked fine, and my mentors also appreciated the work.
  • For easier readability, I created a separate repository on GitHub to host the HTML version.

Week 12:

  • The main task was completed. I made sure the project repository was properly documented and that the review by the SPDX members and my mentors was done.
  • During the whole GSoC period I maintained a separate Google Doc of all the tasks, recording what was done and what was not.
  • I also wrote a blog post about getting selected for GSoC, between the first and second evaluations; it is linked in this report.

Plans after GSoC

The project is complete (in terms of the PDF and HTML, which were the main task). I plan to continue contributing to SPDX after the GSoC period ends and would really like to continue with the current project if possible.

Link to my blog

https://shnitish.hashnode.dev/

Words of thanks

I would really like to thank my mentors Krys and Jack for their continuous support throughout the summer. Without their guidance it would not have been possible for me to achieve this. Working under you was an amazing experience, and thanks to the SPDX community for providing such a great opportunity.
