@rak108
Created August 20, 2021 10:42
GSoC 2021: Work Product


Google Summer of Code 2021: Work Product Submission

Rakshita Varadarajan (@rak108)

Project Abstract

Rucio is an open-source software framework that lets scientific collaborations organize, manage, monitor, and access their distributed data and dataflows across heterogeneous infrastructures. Rucio was originally developed to meet the requirements of the high-energy physics experiment ATLAS and is continuously enhanced to support diverse scientific communities. This project sought to enhance the Rucio clients, primarily by making different transfer tools available so that the clients are easier to use in heterogeneous environments. The project also aimed to implement protocol support for SSH, rsync, and rclone, along with the ability to choose the optimal transfer protocol based on the local and remote configuration, unless the user explicitly requests a particular one.

Mentors: Mario Lassnig (@mlassnig), Martin Barisits (@bari112)

Need for the project

Prior to this project, file transfer operations between the client and a Rucio Storage Element (RSE) used the transfer protocols statically associated with that RSE. That is, they were forced to use the default 'impl' value specified in the RSE, and every subsequent file download or upload between the client and the RSE used the methods of the associated protocol implementation.

Previous implementation of 'impl' parameter

This meant two things:

  • The client had no say in which protocol implementation was used for the operation, and
  • If the client had no support for this default 'impl' but had support for other protocol implementations present in the RSE, the operation would fail regardless.

Previous execution of modules

Thus, the project proposed not only to give clients the option to choose their preferred protocol implementation for every upload/download operation (the "--impl" parameter), but also to devise an algorithm capable of finding a protocol implementation supported by both the client and the RSE, so that file operations do not fail needlessly.
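To make "supported by both the client and the RSE" concrete, here is a minimal sketch of how a client might work out which of an RSE's implementations it can actually use. It is not Rucio's code: the short impl labels, the `PROBES` table, and the function names are all hypothetical, and the assumption that support can be detected by looking for a command-line tool on the PATH or an importable Python module is mine.

```python
import importlib.util
import shutil
from typing import Callable, Dict, List


def tool_on_path(name: str) -> bool:
    """True if an executable called `name` is found on the PATH."""
    return shutil.which(name) is not None


def module_available(name: str) -> bool:
    """True if a top-level Python module called `name` can be imported."""
    return importlib.util.find_spec(name) is not None


# Hypothetical labels and probes, for illustration only (assumption).
PROBES: Dict[str, Callable[[], bool]] = {
    "gfal": lambda: module_available("gfal2"),
    "ssh": lambda: tool_on_path("scp"),
    "rsync": lambda: tool_on_path("rsync"),
    "rclone": lambda: tool_on_path("rclone"),
}


def usable_impls(rse_impls: List[str],
                 probes: Dict[str, Callable[[], bool]] = PROBES) -> List[str]:
    """Keep the RSE's implementations that the client can run.

    The RSE's ordering is preserved; labels without a probe are
    treated as unsupported.
    """
    return [impl for impl in rse_impls if probes.get(impl, lambda: False)()]


if __name__ == "__main__":
    # The result depends on what is installed on the machine running this.
    print(usable_impls(["gfal", "ssh", "rsync", "rclone"]))
```

Passing the probes in as a parameter keeps the check testable without needing any of the tools installed.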

Further, Rucio only provided support for the GFAL libraries used within the High-Energy Physics (HEP) community. It lacked support for the more widely deployed non-grid tools used by communities outside HEP. Thus, the project proposed to provide support for SSH, rsync, and rclone so that a larger number of communities could adopt Rucio as their Distributed Data Management System.

Work Done

Presented the daily work to the mentors in a pull request on a forked repository. Link: rak108/rucio

Project Pull Request:

List of tasks completed in relation to the project:

  • Refactor the 'impl' parameter so that clients can choose the protocol implementation (along with tests for the same).
  • Create a docker container (in the Containers Repository) having support for ssh for testing purposes.
  • Provide protocol support for SSH (along with tests for the same).
  • Provide protocol support for rsync (along with tests for the same).
  • Provide protocol support for rclone (along with tests for the same).
  • Devise and implement a working algorithm to select the optimal protocol implementation based upon client and RSE support (along with tests for the same).
  • Implement CLI command: 'list-impls' to list Rucio-supported protocol implementations for convenience of clients.

Discussed the details of the project more in depth with the mentors and began coding during the second week of community bonding.

Weekly Breakdown

Community Bonding Period

Week 1 (24th-30th May):
Discussed in detail various aspects of the project. Began working on 'impl' refactoring. Got accustomed to test cases and understood the different test cases written (Unit and Integration tests). Wrote one dummy test case for single file download.

Week 2 (31st May-6th June):
Finished writing both Unit and Integration tests for 'impl' refactoring.

Phase 1

Week 3 (7th-13th June):
Completed refactoring the 'impl' parameter and ensured that the test cases written in regards to this passed.

Week 4 (14th-20th June):
Worked on the containers repository to create the docker container required for testing the new protocols being set up (SSH, rsync, rclone).

Week 5 (21st-27th June):
Wrote tests for SSH and set up SSH protocol support. Ensured SSH passed all test cases written.

Week 6 (28th June-4th July):
Wrote tests for rsync and set up rsync implementation support. Ensured it passed all test cases written.

Week 7 (5th July-11th July):
Made the rsync class a subclass of the SSH class. Wrote tests for rclone and implemented rclone support, primarily for ssh/sftp.

Week 8 (12th July-18th July):
Extended rclone implementation to include support for ssh/sftp/rsync, s3, webdav, local and https implementations. Modified tests to adapt to all scenarios and ensured the code passed all tests.

Tested all code written so far to ensure no bugs were cropping up.

Phase 2

Week 9 (19th July-25th July):
Discussed with the mentors and decided on the most appropriate way of implementing the required algorithm to choose 'impl'. Wrote tests for the same.

Week 10 (26th July-1st August):
Implemented the algorithm and ensured it passed all tests written.

Week 11 (9th August-15th August):
Added CLI support for command 'list-impls'. Tested all code written over GSoC period and fixed small bugs that cropped up.

Results and Conclusion

'impl' Refactoring:

Changed implementation of 'impl' parameter

Clients can now select any Rucio-supported protocol implementation for each file transfer operation through the 'impl' parameter, which makes the Rucio clients more customisable.

Changed execution of modules

Protocol Support for SSH, rsync and rclone:

Support for these popular protocols allows Rucio to be used by a larger number of communities (especially those outside the HEP community that don't have support for the GFAL libraries), increasing the compatibility and accessibility of Rucio for a wider audience.
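The rsync implementation builds on the SSH one, since both reach the storage over the same SSH connection. The sketch below illustrates that relationship only. The class names, fields, and methods are hypothetical and are not Rucio's protocol API. The `scp -P`, `rsync -a -z`, and `rsync -e` options are standard options of those tools.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SSHTransfer:
    """Hypothetical helper that builds an scp command for one upload."""
    host: str
    user: str
    port: int = 22

    def remote(self, path: str) -> str:
        """Format a remote path as user@host:path."""
        return f"{self.user}@{self.host}:{path}"

    def upload_command(self, source: str, target: str) -> List[str]:
        # scp takes the port with a capital -P
        return ["scp", "-P", str(self.port), source, self.remote(target)]


class RsyncTransfer(SSHTransfer):
    """Reuses the SSH connection settings; only the transfer tool changes."""

    def upload_command(self, source: str, target: str) -> List[str]:
        # -a preserves file attributes, -z compresses in transit,
        # -e tells rsync which remote shell (and port) to use
        return ["rsync", "-az", "-e", f"ssh -p {self.port}",
                source, self.remote(target)]


if __name__ == "__main__":
    transfer = RsyncTransfer(host="rse.example.org", user="rucio", port=2222)
    print(transfer.upload_command("file.dat", "/data/file.dat"))
```

The commands are returned as argument lists, so they could be passed to `subprocess.run` without going through a shell.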

Algorithm to select optimal 'impl':

The algorithm chooses a suitable protocol implementation ('impl' value) based on both the local (client) and remote (RSE) configuration. A file transfer operation no longer fails merely because the client does not support the default 'impl' in the remote configuration, an issue that previously made the RSE inaccessible to such clients. The algorithm also lets users list preferred implementations for upload and download in the rucio.cfg file, and these values are given priority when selecting the optimal 'impl' value. With this algorithm, the Rucio system is more robust and adaptable.
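The selection logic described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code merged into Rucio. The option name `preferred_impl`, the `[upload]`/`[download]` section names, the short impl labels, and both function names are assumptions. So is the choice to raise an error, rather than fall back, when an explicitly requested 'impl' is unavailable.

```python
import configparser
from typing import Iterable, List, Optional


def preferred_impls(cfg_text: str, section: str) -> List[str]:
    """Read a comma-separated preference list from rucio.cfg-style text.

    Returns an empty list if the section or option is missing.
    """
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    raw = parser.get(section, "preferred_impl", fallback="")
    return [item.strip() for item in raw.split(",") if item.strip()]


def select_impl(rse_impls: List[str],
                client_impls: Iterable[str],
                preferred: List[str],
                requested: Optional[str] = None) -> str:
    """Pick an impl supported by both sides.

    Priority: an explicit request, then the user's preference list,
    then the RSE's own ordering (its default first).
    """
    client = set(client_impls)
    candidates = [impl for impl in rse_impls if impl in client]
    if requested is not None:
        if requested in candidates:
            return requested
        raise ValueError(f"requested impl {requested!r} is not usable here")
    for impl in preferred:
        if impl in candidates:
            return impl
    if candidates:
        return candidates[0]
    raise ValueError("no impl is supported by both the client and the RSE")


if __name__ == "__main__":
    cfg = "[upload]\npreferred_impl = rclone, rsync\n"
    prefs = preferred_impls(cfg, "upload")
    # The RSE default "gfal" is skipped because the client lacks it.
    print(select_impl(["gfal", "ssh", "rsync"], {"ssh", "rsync"}, prefs))
```

With an empty preference list the function falls back to the first RSE implementation that the client supports, so a client without the RSE's default can still reach the storage.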

Acknowledgement

I would like to extend my sincere gratitude to my mentors for giving me this incredible opportunity. My mentors, along with the organization members, have been very approachable and have provided me with constant support and guidance throughout the project. I look forward to contributing and learning more from the organization.
