
To enable data management for the ScienceMesh Cloud

Google Summer of Code 2021

This project aims to enable data transfer between sites in the Worldwide LHC Computing Grid (WLCG), a production-grade infrastructure used by the High-Energy Physics community for over a decade, and sites in the ScienceMesh, an emerging infrastructure federating several Enterprise File Sync&Share platforms across the globe. CERN, the European Organization for Nuclear Research, has historically led the WLCG and is currently leading the effort to deploy the ScienceMesh as part of the EU-funded project CS3MESH4EOSC. CERN also hosts CERNBox, one of the ScienceMesh sites.
This will allow various applications in the ScienceMesh to use grid data directly. The integration will enable high-speed transfer of data from remote locations to local sites across different countries, specifically supporting use cases that cannot extend their processing capabilities to remote sites.
One project that will benefit from this integration is LOFAR, the LOw Frequency ARray telescope, the largest radio telescope operating at the lowest frequencies observable from Earth.
The project will make data stored in one of the largest astronomy archives on the planet, the LOFAR Long Term Archive, located at SURF (Amsterdam), PSNC (Poznań) and FZJ (Jülich) and approaching 50 PB of data, available locally. This gives researchers the freedom to choose the tools they are familiar with and that work best for their use case, without worrying about the interoperability and availability of those tools on the remote sites.

Cloud Services for Synchronisation and Sharing Mesh for European Open Science Cloud :) The research and education space makes substantial use of cloud services for synchronisation and sharing. However, these hubs of scientific collaboration are confined within themselves. The ScienceMesh aims to solve this problem by federating these services and enabling frictionless data transfer between them. To this end, a set of interfaces called the CS3APIs was developed, and the Reva project was born, which leverages the CS3APIs to connect these services in a scalable and portable way.

Data management is an increasingly complex and complicated challenge. This is especially true for CERN and its various experiments. The ATLAS community has relied on Rucio for this formidable task.
Rucio is an open-source software framework that provides scientific collaborations with the functionality to organize, manage, and access their data at scale. The data can be distributed across heterogeneous data centers at widely distributed locations. We want to use Rucio to orchestrate on-demand data transfers both within the ScienceMesh and between a ScienceMesh site and a grid site.
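To make this concrete, the sketch below shows how such an on-demand transfer could be requested through Rucio's Python client by creating a replication rule; the scope, dataset name and RSE name are hypothetical placeholders, not the ones used in the project.

# A minimal sketch, assuming a configured Rucio client; the scope,
# dataset and RSE names below are hypothetical.
from rucio.client import Client

client = Client()

# Ask Rucio to keep one replica of the dataset at a ScienceMesh site;
# Rucio then schedules whatever transfers are needed to satisfy the rule.
client.add_replication_rule(
    dids=[{"scope": "user.alice", "name": "lofar_dataset_001"}],
    copies=1,
    rse_expression="SCIENCEMESH_CERNBOX",
)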

We add a plugin, or extend existing plugins, in the GFAL library and allow the multiplexer to use the custom flow for ScienceMesh sites based on the URL prefix. Further, we devise an authentication flow to retrieve tokens from Reva.
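As a rough illustration of handing a retrieved token to GFAL, the snippet below uses the credential map of the gfal2 Python bindings; the endpoint and token value are placeholders, and the actual flow implemented for Reva may differ.

# A minimal sketch, assuming the gfal2 Python bindings; the token value
# and endpoint are hypothetical.
import gfal2

ctx = gfal2.creat_context()

# Attach a bearer token to every URL under this prefix; the GFAL
# multiplexer routes cs3:// URLs to the plugin handling that scheme.
cred = gfal2.cred_new("BEARER", "reva-issued-token")
ctx.cred_set("cs3://reva:19001", cred)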

Communication flow

The HTTP TPC mechanism relies on the existing WebDAV COPY verb (RFC 4918) and interoperable implementations / interpretations of this part of the specification. HTTP-TPC is an extension to HTTP [RFC 7231] that allows a client to request that an HTTP entity be copied from one server to another without the data describing that entity passing through the controlling client. Any such transfer of an entity is called a third-party transfer.
In particular, in many WebDAV implementations (including Reva), COPY is limited to resources inside the same service; for third-party copy, we allow it to trigger transfers from remote services.

For any given transfer, the two servers involved are labelled in two distinct ways.
The SOURCE is the server that initially holds the resource the client wishes to transfer, while the DESTINATION is the server that should receive the resource.
Independently, the ACTIVE PARTY is the server to which the client makes the third-party copy request, while the remaining server is the PASSIVE PARTY.

When the source server is the active party, the transfer is referred to as a PUSH REQUEST:

      +----------------+                     +-----------------+
      |  Src server    |    Data Transfer    |  Dest server    |
      |  Active party  |  ================>  |  Passive party  |
      +----------------+       Request       +-----------------+
              ^
              | Copy
              | Request
              |
         +----------+
         |  Client  |
         +----------+

while the destination server is the active party for a PULL REQUEST:

      +-----------------+                     +----------------+
      |  Src server     |    Data Transfer    |  Dest server   |
      |  Passive party  |  <================  |  Active party  |
      +-----------------+       Request       +----------------+
                                                      ^
                                                      | Copy
                                                      | Request
                                                      |
                                                 +----------+
                                                 |  Client  |
                                                 +----------+
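To make these two modes concrete, the sketch below shows what the client's COPY request to the active party could look like, using Python's requests library; the hosts, paths and token are invented for illustration, and real deployments add further negotiation headers.

# Illustrative only: the client's COPY request in each mode, with
# hypothetical hosts, paths and token.
import requests

headers = {"Authorization": "Bearer <token>"}

# Push: ask the SOURCE (active party) to write the file to the destination.
requests.request(
    "COPY",
    "https://src.example.org/webdav/home/srcFile",
    headers={**headers, "Destination": "https://dest.example.org/webdav/home/dstFile"},
)

# Pull: ask the DESTINATION (active party) to fetch the file from the source.
requests.request(
    "COPY",
    "https://dest.example.org/webdav/home/dstFile",
    headers={**headers, "Source": "https://src.example.org/webdav/home/srcFile"},
)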

In order to achieve these objectives, we came up with the following flow of data and control:

(Control and data transfer flow diagram)


Realisation of the above flow required:

  1. Extending davix to communicate with a Reva server via the HTTP protocol: cern-fts/gfal2/pull/7
  2. Configuring the GFAL library to utilise the davix extension when requests are made to a Reva server.
  3. Implementing logic inside Reva to respond to HTTP TPC requests sent to a Reva server.

Below we perform a COPY in pull mode from one site in the ScienceMesh to another.
Note: For the time being, the authentication tokens for the transfer are made available to the GFAL process as environment variables; a proper flow will be taken up as future work.

gfal-copy -vf \
          --copy-mode=pull \
          cs3://reva:19001/remote.php/webdav/home/srcFile \
          cs3://reva2:17001/remote.php/webdav/home/dstFile

(Result: animated GIF of the gfal-copy run)
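The same pull-mode copy can also be scripted through the gfal2 Python bindings; a minimal sketch, assuming the bindings are installed and the tokens are available to the process as above. The option key mirrors the gfal2 HTTP plugin's copy-mode setting and is assumed to apply to cs3:// URLs as well.

# A minimal sketch of the same copy via the gfal2 Python bindings.
import gfal2

ctx = gfal2.creat_context()

# Mirror --copy-mode=pull; this key follows the gfal2 HTTP plugin
# convention and is assumed to be honoured for cs3:// URLs.
ctx.set_opt_string("HTTP PLUGIN", "DEFAULT_COPY_MODE", "3rd pull")

params = ctx.transfer_parameters()
params.overwrite = True  # equivalent to the -f flag of gfal-copy

ctx.filecopy(params,
             "cs3://reva:19001/remote.php/webdav/home/srcFile",
             "cs3://reva2:17001/remote.php/webdav/home/dstFile")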

Future work

  1. Devising an authentication flow that allows full interoperability between WLCG sites and ScienceMesh sites.
  2. Retrieving storage-issued tokens from ScienceMesh sites.

Acknowledgements

I would like to earnestly acknowledge the sincere efforts and valuable time given by my mentors, and I am humbled by how empathetically they answered my trivial and at times repetitive questions.
There have been weeks when we had video meetings daily, and the debugging sessions with Giuseppe proved to be of utmost value.
I am grateful to Mihai Patrascoiu, Steve Murray and Joao Lopes from the FTS team, Ishank Arora from cs3org, and Mayank Sharma, Benedikt Ziemons and Thomas Beerman from Rucio for taking time out of their busy schedules to help me. I have disturbed them a lot :) sometimes even when they were on vacation.

Thanks to all of you for making this an invaluable experience. Along with the interesting project, it was you guys who kept me going and because of whom I feel motivated to continue contributing.
