Jeff Kinnison
Apache Airavata is a Science Gateway designed for easy interaction with remote computing resources and easy customization to address the concerns of heterogeneous scientific applications. All parts of my GSoC project have been completed to the satisfaction of my mentors and merged into the development version of the Airavata project. This project involved 1) data streaming and 2) data sharing for scientific applications. Data streaming allows users to monitor remote applications and meaningfully visualize their state with regard to the science being carried out. Data sharing allows users to easily share and replicate computational science experiments while respecting security concerns.
Despite progress in making scientific computing accessible, science gateways still face the challenges of providing feedback to users and sharing research among them. To address these challenges, this summer I added two capabilities to Apache Airavata: streaming data from remote computing nodes, and sharing projects and experiments among users.
Data streaming allows for application-level remote monitoring using secure communications protocols. Data to be streamed is defined at the application level and may be incorporated into gateways using a WebSockets server deployed next to Airavata and JavaScript client-side code. Project and experiment sharing allows multiple users to access experiment inputs and outputs, and allows users to clone shared projects. User permissions are set coarsely at the project level and can be fine-tuned on a per-experiment basis to allow easy, secure collaboration.
SimStream is a data collection and message sending application that is configured to periodically run user-defined functions and send the results to a remote message broker. In Airavata, it is used to parse remote application output files and pipe data to the PGA. The current repository has example code that parses outputs from the OpenMM Molecular Dynamics simulation suite and sends log entries and Root Mean Squared Deviation calculations as JSON data.
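As an illustration of the kind of payload described above, the following is a minimal sketch of a Root Mean Squared Deviation calculation packaged as JSON. It is not the example code from the repository: it assumes coordinates are already available as lists of (x, y, z) tuples, performs no structural alignment, and the function names and JSON field names (`step`, `rmsd`) are hypothetical.

```python
import json
import math


def rmsd(reference, frame):
    """Root mean squared deviation between two equal-length coordinate lists.

    Each list holds (x, y, z) tuples; no alignment or superposition is done.
    """
    if len(reference) != len(frame) or not reference:
        raise ValueError("coordinate sets must be non-empty and the same length")
    total = 0.0
    for (x0, y0, z0), (x1, y1, z1) in zip(reference, frame):
        # Squared Euclidean distance between corresponding atoms.
        total += (x1 - x0) ** 2 + (y1 - y0) ** 2 + (z1 - z0) ** 2
    return math.sqrt(total / len(reference))


def make_message(step, reference, frame):
    """Serialize one RMSD sample as a JSON string (field names are hypothetical)."""
    return json.dumps({"step": step, "rmsd": rmsd(reference, frame)})
```

A data collection function could call `make_message` once per simulation frame and hand the resulting string to whatever sends messages to the broker.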
To accommodate heterogeneous scientific applications, SimStream is designed to run user-defined parsing functions. This means that, in effect, it is a long-running polling application that facilitates customized analyses. All data collection is concurrent (not parallel) using Python threads. Messages are sent to a RabbitMQ server using the pika library.
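The polling design can be sketched with the standard library alone. The class below is an assumption-laden illustration, not SimStream's actual interface: `Collector`, its parameters, and the `publish` callback are hypothetical names. Each collector is a thread that calls a user-defined function at a fixed interval and passes any result to `publish`.

```python
import threading


class Collector(threading.Thread):
    """Run a user-defined function every `interval` seconds and publish results.

    Hypothetical sketch: `func` returns a message (or None to skip), and
    `publish` is any callable that accepts the message. In a deployment,
    `publish` could wrap pika's channel.basic_publish(exchange=...,
    routing_key=..., body=...).
    """

    def __init__(self, func, publish, interval=1.0, limit=None):
        super().__init__(daemon=True)
        self.func = func
        self.publish = publish
        self.interval = interval
        self.limit = limit  # optional cap on messages, handy for demos and tests
        self._stop_event = threading.Event()

    def run(self):
        sent = 0
        while not self._stop_event.is_set():
            result = self.func()
            if result is not None:
                self.publish(result)
                sent += 1
                if self.limit is not None and sent >= self.limit:
                    break
            # Sleep until the next poll, waking early if stop() is called.
            self._stop_event.wait(self.interval)

    def stop(self):
        self._stop_event.set()
```

Because the collectors are threads, several can poll different output files concurrently within one process, which matches the concurrent (not parallel) behavior described above.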
All work described in the proposal has been completed and merged into the Airavata project with the exception of event monitoring and handling. Event monitoring/handling was determined to be low-priority, and while it is an interesting idea, little interest was generated. It may be included in future versions of SimStream.
- Writing up a how-to guide for designing data collection functions
- Dynamically configuring SimStream on an application-by-application basis
- Continuing testing, profiling, and maintenance
AMQPWSTunnel is the solution designed to distribute SimStream data to the PGA. It uses the Tornado WebSocket framework and the pika library to consume data from RabbitMQ and stream it to clients over the wss protocol. In the PGA, a JavaScript WebSocket client can read the data and render it in a meaningful format.
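The core of such a tunnel is a fan-out step: each message consumed from the broker is forwarded to the WebSocket clients subscribed to its routing key. The sketch below shows only that step, using the standard library. `StreamRouter` and `FakeClient` are hypothetical names and this is not AMQPWSTunnel's code. The only assumption about clients is that they expose a `write_message` method, as Tornado's `WebSocketHandler` does.

```python
class StreamRouter:
    """Map routing keys to subscribed clients and forward messages to them."""

    def __init__(self):
        self._subscribers = {}  # routing key -> set of client objects

    def subscribe(self, routing_key, client):
        self._subscribers.setdefault(routing_key, set()).add(client)

    def unsubscribe(self, routing_key, client):
        self._subscribers.get(routing_key, set()).discard(client)

    def dispatch(self, routing_key, body):
        """Send `body` to every subscriber of `routing_key`; return the count."""
        delivered = 0
        # Copy the set so a client may unsubscribe while messages are sent.
        for client in list(self._subscribers.get(routing_key, ())):
            client.write_message(body)
            delivered += 1
        return delivered


class FakeClient:
    """Stand-in for a WebSocket connection; records what it is sent."""

    def __init__(self):
        self.received = []

    def write_message(self, message):
        self.received.append(message)
```

In a real tunnel, `dispatch` would be called from the broker consumer's message callback, and `subscribe` and `unsubscribe` from the WebSocket handler's open and close hooks. Restricting who may call `subscribe` for a given key is where the security policies mentioned in this report would apply.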
This module was necessary because the PGA was not designed to incorporate both standard HTTP and WebSockets endpoints. A preferred solution would be to incorporate WebSockets endpoints directly into the PGA to unify data access and security.
AMQPWSTunnel has been merged into Airavata as a submodule. Security policies and deployment methods are still being defined.
- Integrating into the Airavata startup process (start AMQPWSTunnel with Airavata)
- Defining security policies to ensure streams are only available to the correct users
- Writing a how-to guide for creating client-side data visualizations of SimStream data
Grouper is a database interface for creating and managing group membership. Though not included in the original proposal, enabling project and experiment sharing was a major part of my work this summer. Using Grouper, I enabled coarse- and fine-grained data sharing controls in the PGA to enable collaboration and replication of research.
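One way to picture coarse- and fine-grained controls is a project-level grant that applies to every experiment in the project unless an experiment-level grant overrides it. The sketch below illustrates that lookup order only. It is not Grouper's API or Airavata's implementation, and every name in it (`Access`, `SharingPolicy`, the method names) is hypothetical.

```python
from enum import IntEnum


class Access(IntEnum):
    NONE = 0
    READ = 1
    WRITE = 2


class SharingPolicy:
    """Project-level grants with optional per-experiment overrides."""

    def __init__(self):
        self._project = {}     # (project_id, user) -> Access
        self._experiment = {}  # (experiment_id, user) -> Access

    def share_project(self, project_id, user, access):
        self._project[(project_id, user)] = access

    def share_experiment(self, experiment_id, user, access):
        self._experiment[(experiment_id, user)] = access

    def access_for(self, project_id, experiment_id, user):
        """Return the experiment-level grant if one exists, else the project's."""
        override = self._experiment.get((experiment_id, user))
        if override is not None:  # Access.NONE is a valid override
            return override
        return self._project.get((project_id, user), Access.NONE)
```

For example, sharing a project with read access and then granting write access on a single experiment leaves the user able to modify only that experiment, while an override of `Access.NONE` hides one experiment inside an otherwise shared project.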
Sharing policies are still being defined; however, basic sharing features have been enabled in the PGA and Airavata.
- Refining sharing policies
- Modifying the API to work well with use cases
The results of this project were presented at the XSEDE Gateways & Workflows Symposium Series on August 19, 2016. This presentation involved a demonstration of each aspect of the project from the client perspective.