argahsuknesib/problem-statement-and-goals.md

## problem-statement-and-goals.md

      
Problem Statement

"There is a recent trend towards privacy and data ownership on the web. Solid is a specification that extends the current web to support interoperable, privately owned data. Much of the data published on the web takes the form of streams. To process these streams in a privacy-focused Solid context, we will need to develop scalable technologies for querying and processing them efficiently."

Research Questions


How can relevant streams be discovered for aggregation?
How should the cooperation strategy between aggregators be chosen so as to optimise them?
How should we handle the access constraints of the pod?

Discovering relevant streams for aggregation.

Using an approximation graph:

Approximation graphs can be seen as graph summarizations used for source discovery and selection.
The static graphs are aggregated into a new graph in which recent edges carry more weight than older ones.
The graph can be optimised by pruning edges that are old or carry no relevant information.
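
The weighting and pruning steps above can be sketched as follows. This is a minimal illustration, not a definitive design: the exponential half-life decay, the `half_life` and `prune_below` parameters, and the `(subject_type, predicate, object_type)` edge shape are all assumptions made for the example.

```python
from collections import defaultdict


def decayed_summary(snapshots, now, half_life=60.0, prune_below=0.05):
    """Merge timestamped static graphs into one weighted summary graph.

    snapshots: iterable of (timestamp, edges) pairs, where edges is an
    iterable of hashable edge descriptions, e.g. (subject_type, predicate,
    object_type) tuples. The edge shape is an assumption of this sketch.
    """
    weights = defaultdict(float)
    for timestamp, edges in snapshots:
        age = max(0.0, now - timestamp)
        # Assumed decay model: an edge's contribution halves every half_life.
        decay = 0.5 ** (age / half_life)
        for edge in edges:
            weights[edge] += decay
    # Prune edges whose accumulated weight has decayed below the threshold.
    return {edge: w for edge, w in weights.items() if w >= prune_below}


snapshots = [
    (0, [("Sensor", "observes", "Temperature")]),
    (100, [("Sensor", "observes", "HeartRate")]),
    (120, [("Sensor", "observes", "HeartRate")]),
]
summary = decayed_summary(snapshots, now=120, half_life=20)
```

With these inputs the old `Temperature` edge falls below the threshold and is pruned, while the two recent `HeartRate` observations accumulate into a single weighted edge.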


Challenge: Even with the approximation graph, we still have to run a SPARQL ASK query to check whether a relevant triple pattern fragment exists in the stream, which is somewhat expensive.

Strategies for cooperation of the aggregators.

An approximation graph service summarizes the stream from a pod, giving an evolving overview of it.
Using this summarization together with the query to evaluate, we can determine whether a particular stream contains relevant data.
An orchestrator service can then use the summarization to execute the local aggregation on the pod(s).
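
A rough sketch of how an orchestrator might use the summaries for source selection is given below. It assumes, purely for illustration, that each summary is reduced to the set of predicates seen in the pod's stream and that a pod is relevant only if it covers every predicate the query needs; `select_pods`, `orchestrate`, the pod URLs and the `ex:` predicates are hypothetical names.

```python
def select_pods(summaries, query_predicates):
    """Return the pods whose summary mentions every predicate in the query.

    summaries: dict mapping a pod URL to the set of predicates in its summary.
    """
    needed = set(query_predicates)
    return sorted(
        pod for pod, predicates in summaries.items() if needed <= set(predicates)
    )


def orchestrate(summaries, query_predicates, local_aggregate):
    """Run local_aggregate only on the pods selected through their summaries."""
    return {
        pod: local_aggregate(pod)
        for pod in select_pods(summaries, query_predicates)
    }


summaries = {
    "https://alice.example/": {"ex:heartRate", "ex:steps"},
    "https://bob.example/": {"ex:steps"},
}
selected = select_pods(summaries, ["ex:heartRate"])
results = orchestrate(summaries, ["ex:steps"], local_aggregate=len)
```

Because a summary is only an approximation, a pod selected this way may still hold no matching data, so a confirming query against the pod can remain necessary.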

How to handle the access constraints of the pod?


Access constraints can be handled with aggregator policies for data use.
The aggregation service is described with an ontology, which the orchestrator can use to check whether it needs the data from the service.
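
One possible shape for such a policy check is sketched below. Everything here is hypothetical: the notes do not define a policy model, so the `AggregatorPolicy` fields (allowed aggregators and allowed aggregation functions) and the helper names are assumptions chosen only to make the idea concrete.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AggregatorPolicy:
    """Hypothetical data-use policy attached to a pod's stream."""

    allowed_aggregators: frozenset  # aggregators that may read the stream
    allowed_functions: frozenset    # aggregation functions the owner permits


def is_permitted(policy, aggregator, function):
    """Return True only if both the aggregator and the function are allowed."""
    return (
        aggregator in policy.allowed_aggregators
        and function in policy.allowed_functions
    )


def permitted_pods(policies, aggregator, function):
    """Keep only the pods whose policy admits the requested aggregation."""
    return sorted(
        pod
        for pod, policy in policies.items()
        if is_permitted(policy, aggregator, function)
    )


policies = {
    "https://alice.example/": AggregatorPolicy(
        frozenset({"agg-1"}), frozenset({"avg", "count"})
    ),
    "https://bob.example/": AggregatorPolicy(
        frozenset({"agg-2"}), frozenset({"avg"})
    ),
}
allowed = permitted_pods(policies, "agg-1", "avg")
```

An orchestrator could apply a filter like this before source selection, so that pods whose policy rules out the aggregation are never queried.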

Next Goals


Develop a simple stream-based aggregator.
Improve it to use the approximation graph and the orchestrator, and compare the results (throughput, latency, correctness and possibly scalability under a burst of pods).
Include the access policies for the aggregator, then compare the three versions.
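
As a starting point for the first goal, a baseline aggregator could look like the sketch below. It assumes events arrive as `(timestamp, value)` pairs and uses a tumbling-window average, which is just one possible aggregation; `tumbling_average` and `measure` are illustrative names, and the throughput figure is a crude wall-clock estimate rather than a proper benchmark.

```python
import time
from collections import defaultdict


def tumbling_average(events, window_size):
    """Average event values per tumbling window.

    events: iterable of (timestamp, value) pairs. Window k covers
    [k * window_size, (k + 1) * window_size) and is keyed by its start time.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for timestamp, value in events:
        window = int(timestamp // window_size)
        sums[window] += value
        counts[window] += 1
    return {
        window * window_size: sums[window] / counts[window]
        for window in sorted(sums)
    }


def measure(aggregate, events, window_size):
    """Run an aggregator and return (result, events processed per second)."""
    events = list(events)
    start = time.perf_counter()
    result = aggregate(events, window_size)
    elapsed = time.perf_counter() - start
    throughput = len(events) / elapsed if elapsed > 0 else float("inf")
    return result, throughput


events = [(0, 10.0), (4, 20.0), (10, 30.0)]
averages, events_per_second = measure(tumbling_average, events, window_size=10)
```

The same `measure` wrapper could be reused for the later variants so that all three are compared on identical input.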