@argahsuknesib
Last active October 19, 2022 07:40
Aggregators to realise improved data summarisation over data streams in a network of Solid pods.

Pitch

Healthcare data is sensitive, and the Solid specification can be used to store such data. In a patient monitoring system, the data streams produced by personal vital-sign sensors and activity trackers are semantically annotated and stored in the patients' data pods. Monitoring by a healthcare expert requires computations over these healthcare data streams, and monitoring patients simultaneously becomes harder as the number of patients increases. Aggregators are therefore required to provide improved data summarisation over the Solid pods. An aggregation can span multiple patients across several pods, or cover a single patient's pod. The DAHCC dataset will be used as the data stored in each patient's pod to realise the aggregators.

Desired Solution

A proof-of-concept aggregator that runs as a service over the data stores and with which a client application interacts. The client application specifies the kind of aggregation to be performed. The solution is required to:

  • Execute the queries requested by the client application over a window specified by the client application.
  • Perform aggregation over a single pod as well as over multiple pods.
  • Perform time-based tumbling-window queries over the data streams.
  • Store the resulting bindings in a Solid pod, so that the client can retrieve the aggregated summary with a GET request.
  • Apply optimisation strategies to newly arriving stream data (deciding whether a new event is processed incrementally in the current window or deferred to the next window).
  • Perform schema alignment when aggregating over multiple pods.
  • Reason over the resulting bindings.
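The windowing and incremental-update requirements above can be illustrated with a minimal Python sketch. It is not the intended implementation: it assumes each stream event is a `(timestamp_seconds, patient_id, value)` tuple, that tumbling windows are aligned to multiples of a client-chosen `width`, and it uses one simple policy (an assumption): an event whose timestamp falls in the open window is folded in incrementally, an event belonging to a later window closes the open window and starts the next, and an event for an already-emitted window is counted as late and dropped. `TumblingWindowAggregator` and `WindowResult` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical event shape: (timestamp in seconds, patient id, sensor value).
Event = Tuple[float, str, float]


@dataclass
class WindowResult:
    start: float
    end: float
    # Per-patient aggregates for the window: (count, sum, min, max).
    stats: Dict[str, Tuple[int, float, float, float]]

    def mean(self, patient: str) -> float:
        count, total, _, _ = self.stats[patient]
        return total / count


@dataclass
class TumblingWindowAggregator:
    width: float  # window width in seconds, chosen by the client (assumed parameter)
    current_start: Optional[float] = None
    stats: Dict[str, list] = field(default_factory=dict)
    late_events: int = 0

    def _window_start(self, ts: float) -> float:
        # Align windows to multiples of `width` so consecutive windows never overlap.
        return (ts // self.width) * self.width

    def _close(self) -> WindowResult:
        # Snapshot the running aggregates and reset them for the next window.
        result = WindowResult(
            start=self.current_start,
            end=self.current_start + self.width,
            stats={p: tuple(s) for p, s in self.stats.items()},
        )
        self.stats = {}
        return result

    def push(self, event: Event) -> List[WindowResult]:
        """Add one event and return the window it closed, if any."""
        ts, patient, value = event
        start = self._window_start(ts)
        closed: List[WindowResult] = []
        if self.current_start is None:
            self.current_start = start
        elif start < self.current_start:
            # The event belongs to a window that was already emitted: count and drop it.
            self.late_events += 1
            return closed
        elif start > self.current_start:
            # The event belongs to a later window: emit the current one, then switch.
            closed.append(self._close())
            self.current_start = start
        # Incremental update: fold the value into the running (count, sum, min, max).
        s = self.stats.setdefault(patient, [0, 0.0, value, value])
        s[0] += 1
        s[1] += value
        s[2] = min(s[2], value)
        s[3] = max(s[3], value)
        return closed

    def flush(self) -> List[WindowResult]:
        """Emit the still-open window (e.g. at the end of a replayed stream)."""
        if self.current_start is None or not self.stats:
            return []
        return [self._close()]
```

For example, with `width=60.0`, events at t=0 s and t=30 s for the same patient are folded into the window [0, 60), and an event at t=61 s closes that window and opens [60, 120). Windows that receive no events are simply not emitted in this sketch; the emitted `WindowResult` objects are what would then be serialised and written to a Solid pod for the client to GET.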

Use Case

The dataset contains sensor values from multiple patients. To monitor a patient's location, we use the sensors that detect the presence of a person in the house; in the DAHCC dataset, these presence sensors are placed in the three halls, the kitchen and the bedroom. We will aggregate each patient's location within a given window, as well as the locations of all patients. We can further reason over the aggregated data, for example to raise an alert if a patient is not in the bedroom after a certain period.
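The location use case can be sketched as a per-window count of presence detections. This minimal Python example assumes each detection is a `(timestamp_seconds, patient_id, room_label)` tuple; the room labels (`"bedroom"`, `"kitchen"`, `"hall1"`) are hypothetical and not the actual DAHCC identifiers. It treats the room with the most detections in a window as the patient's location for that window, summarises all patients per window, and applies one possible reading of the alert rule (an assumption): alert when a patient's dominant location was not the bedroom in each of their last `last_n` windows.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical detection shape: (timestamp in seconds, patient id, room label).
Detection = Tuple[float, str, str]
WindowCounts = Dict[str, Dict[float, Counter]]


def locations_per_window(detections: Iterable[Detection], width: float) -> WindowCounts:
    """Count presence detections per patient, per tumbling window, per room."""
    counts: WindowCounts = defaultdict(lambda: defaultdict(Counter))
    for ts, patient, room in detections:
        window_start = (ts // width) * width
        counts[patient][window_start][room] += 1
    return counts


def dominant_location(room_counts: Counter) -> str:
    # Room with the most detections; ties go to the room seen first in the window.
    return room_counts.most_common(1)[0][0]


def all_patients_summary(counts: WindowCounts) -> Dict[float, Counter]:
    """Multi-patient aggregate: how many patients were mainly in each room, per window."""
    summary: Dict[float, Counter] = defaultdict(Counter)
    for per_window in counts.values():
        for window_start, room_counts in per_window.items():
            summary[window_start][dominant_location(room_counts)] += 1
    return summary


def bedroom_alerts(counts: WindowCounts, last_n: int, room: str = "bedroom") -> List[str]:
    """Patients whose dominant location was not `room` in each of their last `last_n` windows.

    Only windows with at least one detection are considered.
    """
    alerts = []
    for patient, per_window in counts.items():
        recent = sorted(per_window)[-last_n:]
        if len(recent) == last_n and all(
            dominant_location(per_window[w]) != room for w in recent
        ):
            alerts.append(patient)
    return sorted(alerts)
```

Keeping raw per-room counts per window (rather than only the dominant room) leaves the door open to other summaries, such as time spent per room, without re-reading the pods.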

Acceptance Criteria

A demo of the solution should be able to:

  • Accept queries over a single pod or multiple pods and return the bindings to the client.
  • Be evaluated on latency, throughput, correctness and scalability under bursts of stream data.
  • Be evaluated for scalability as more data pods are added.
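The latency, throughput and scalability criteria could be measured by replaying a burst of events through the aggregator's ingest function and timing each call. The Python sketch below is one possible harness under stated assumptions: `measure` and `synthetic_burst` are hypothetical helpers, the events are synthetic rather than DAHCC data, and only in-process handling is timed, so network access to real Solid pods (and hence end-to-end latency) is not included. Correctness would still need to be checked separately against expected window results.

```python
import statistics
import time
from typing import Any, Callable, Dict, Iterable, Iterator, Tuple


def measure(push: Callable[[Any], Any], events: Iterable[Any]) -> Dict[str, float]:
    """Replay events through `push`; report throughput and per-event latency."""
    latencies = []
    start = time.perf_counter()
    for event in events:
        t0 = time.perf_counter()
        push(event)  # e.g. the ingest method of an aggregator
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start
    if not latencies:
        raise ValueError("no events to measure")
    latencies.sort()
    return {
        "events": len(latencies),
        "throughput_eps": len(latencies) / elapsed if elapsed > 0 else float("inf"),
        "latency_median_s": statistics.median(latencies),
        # Approximate 99th percentile taken from the sorted latencies.
        "latency_p99_s": latencies[min(len(latencies) - 1, int(0.99 * len(latencies)))],
    }


def synthetic_burst(
    n_pods: int, events_per_pod: int, start_ts: float = 0.0
) -> Iterator[Tuple[float, str, float]]:
    """Synthetic (timestamp, pod id, value) events, interleaved across pods in one burst."""
    for i in range(events_per_pod):
        for pod in range(n_pods):
            yield (start_ts + i * 0.01, f"pod{pod}", float(i))
```

To probe scalability with the number of pods, the same harness can be run for increasing `n_pods` (for example 1, 2, 4 and 8) and the reported throughput and latency compared across runs.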

Assumptions

Compared to SolidLabResearch/Challenges#24, we focus on the streaming and windowing aspects of data aggregation.
