@kapilkd13
Last active August 28, 2017 19:37

GSOC 2017: Final Report

Final report of my GSOC'17 project with the Common Workflow Language organization, by Kapil Kumar

  Hi all, I am Kapil Kumar. This summer I worked with the Common Workflow Language organization as part of Google Summer of Code 2017. The official coding period for GSOC'17 has just ended, and it's time to sum up my work and present a final report on it.

The aim of our project was to make cwltool and schema-salad Windows compatible and to work on other bugs and features. Common Workflow Language (CWL) is a specification for describing command line tools and workflows, providing benefits like flexibility, scalability and portability. Cwltool is a reference implementation of the Common Workflow Language. It is intended to be feature complete and to provide comprehensive validation of CWL files, as well as other tools related to working with CWL. Salad is an Apache Avro based schema language for describing JSON or YAML structured linked data documents. Cwltool depends on schema-salad for object creation, reference resolution and validation of CWL files. Since workflows and tool definitions can use Unix tools, we decided to run workflows inside a Docker container when working on the Windows operating system.

We started by making schema-salad compatible with the Windows operating system. Here is the merged PR. After that we started working on a Windows compatible cwltool. Some of the issues we came across were an unsupported scheme, symlinks on Windows, and Windows path separator issues. We chose AppVeyor CI for testing the cwltool implementation on Windows. Once all unit tests passed on Windows, we worked on passing the conformance tests on Windows and on ensuring Docker support for cwltool on Windows. After resolving some time-consuming issues, such as non-blocking I/O operations and the default Docker container on Windows, we achieved Windows compatibility for cwltool. Here is the merged PR.
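
To illustrate the kind of scheme and path separator problem mentioned above, here is a minimal sketch. It is not the code from the PR. It assumes one plausible cause, which this report does not spell out: a Windows drive path such as `C:\Users\me\tool.cwl` being parsed as if the drive letter were a URI scheme. The helper name `to_uri` and the example path are hypothetical.

```python
import re
from pathlib import PureWindowsPath
from urllib.parse import urlsplit

# Matches a Windows drive prefix such as "C:\" or "d:/".
_DRIVE_PREFIX = re.compile(r"^[A-Za-z]:[\\/]")


def to_uri(location):
    """Hypothetical helper: turn a Windows drive path into a file URI.

    Anything that does not start with a drive prefix is returned unchanged.
    """
    if _DRIVE_PREFIX.match(location):
        # PureWindowsPath understands backslashes regardless of the host OS.
        return PureWindowsPath(location).as_uri()
    return location


windows_path = r"C:\Users\me\tool.cwl"          # example path, an assumption
naive_scheme = urlsplit(windows_path).scheme    # the drive letter, not a real scheme
fixed = to_uri(windows_path)
fixed_scheme = urlsplit(fixed).scheme           # a proper "file" scheme
```

Parsing the raw path yields the drive letter as the scheme, which a URI-based loader would not recognise. Converting the path to a `file:` URI first avoids that and also normalises the backslashes to forward slashes.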

Once cwltool was Windows compatible, I worked on the following bugs and features:

  • Python 3 support on Windows: Once cwltool was Windows compatible, we found that it had some issues with Python 3 on Windows. We fixed those errors, and here is the final PR.
  • Adding a documentation file for Windows compatibility: Documentation for Windows users of cwltool.
  • Allowing HTTP/HTTPS files as input: Earlier we could load workflows over HTTP, but input files still had to be present locally. In this PR we added a feature to load inputs over HTTP. File caching is used to avoid downloading the same files again. Here is the PR.
  • Adding a test suite to cwltest: The cwltest repository lacks a test suite to make sure that new PRs do not break the codebase. This PR aims at adding a test suite to the cwltest repo and is currently a work in progress.
  • Adding a --docker-pull flag to force pulling the latest docker image: We added a feature to force pull the latest docker image named in the dockerPull field, even if an image is already present locally. This is done with the --docker-pull command line argument.
  • Using the stdout field in cachekey calculation: Taking the stdout field into account when calculating the cachekey, for better cache results.
  • Removing an unnecessary warning caused by the generation field: Due to a regression, an unnecessary warning was being generated, which we fixed in this PR.
  • [Future module] Hardcoded tmp folder prevents Windows compatibility of the past module: We use the past module (part of future) to run avro-cwl on Python 3. This module has some hardcoded paths and is not compatible with Windows. We made a PR to fix that.
  • Avoiding use of exception.message in Python 3: Giving an exception a message attribute is advisable but not required. Some libraries' exception objects do not have a message attribute, so we removed references to it.
  • Adding build badges to cwltool and schema-salad: Here we added build status badges to the cwltool and schema-salad repositories.
  • Adding a warning when the default docker container is used: On Windows we use a default Alpine docker image if no external image is provided. When this default container is used, we issue a warning to the user.
  • Fixing the "NoneType not iterable" error: A regression appeared with updates to the cwltest module, and we fixed it here.
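
The caching idea behind the HTTP/HTTPS input feature in the list above can be sketched as follows. This is a minimal sketch, not cwltool's implementation: the function names, the cache layout (one file per URL, named by the SHA-256 of the URL), and the injectable `fetch` parameter are all assumptions made for illustration.

```python
import hashlib
import os
import urllib.request


def fetch_over_http(url):
    """Download url and return its content as bytes (needs network access)."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def cached_input(url, cache_dir, fetch=fetch_over_http):
    """Return a local path for url, downloading only on a cache miss.

    Hypothetical helper: the cache file is named after the SHA-256 of the URL.
    """
    os.makedirs(cache_dir, exist_ok=True)
    name = hashlib.sha256(url.encode("utf-8")).hexdigest()
    path = os.path.join(cache_dir, name)
    if not os.path.exists(path):
        # Cache miss: fetch once and store the bytes for later runs.
        data = fetch(url)
        with open(path, "wb") as handle:
            handle.write(data)
    return path
```

Calling `cached_input` twice with the same URL and cache directory returns the same local path and invokes `fetch` only the first time. A real implementation would also need to decide when a cached copy is stale, which this sketch does not attempt.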

I would like to thank my mentors Anton Khodak, Janneke van der Zwaan and Michael R. Crusoe for being cool and helping me at almost every step of this project. Also, a big thanks to Peter Amstutz and my GSOC buddy Manvendra for their constant help. It was a great experience working with all of you.

Overall, we met all the requirements, and I consider it to be a successful GSOC. I hope my small contribution will help the open source community :)

As part of my GSOC project requirements, I recorded my progress in weekly blog posts. All blog posts related to my GSOC journey can be found here.
