Skip to content

Instantly share code, notes, and snippets.

@kylebrandt
Last active November 17, 2021 13:32
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save kylebrandt/eed826897043eba085f04e800d9eb5b0 to your computer and use it in GitHub Desktop.
Save kylebrandt/eed826897043eba085f04e800d9eb5b0 to your computer and use it in GitHub Desktop.
Data source Info / Ref Notes

Some Notes on Data source Info / Data source Ref:

  • Terms I'm using for these notes:

    • DatasourceRef: is model that includes (uid+type), is saved by dashboards and alerting (note: alerting is current UID only)
    • DatasourceInfo: is full information that a datasource neededs to execute a request, and can be looked up by the Ref
  • Contexts:

    • Alerting: The Ref is saved. Alerts run as a service, so in this context there is no user (although perhaps there should be a service user or role). Therefore for saved alerts run by the service, authorization is done at creation time, not run time - (as users would not want alerts to stop working if a user is removed).
    • Query API: Runtime, in this context there is a user
  • Main Question:

    • What Services are responsbile for what for data source execution?
    • Where trailing "What" is:
      • Authorizing a User (or Service) to a data source (or data source ref?), and who caches this?
      • Taking a data source ref and getting data source info
      • Decryption of secrets within data source info?
  • Considerations:

    • Security (e.g. Defense in depth, services not having secrets they don't need)
    • Performance (e.g. not looking up things twice)
    • Making things services - and the stateless aspect (TBH "stateless" in this context isn't entirely clear to me)
  • Misc Notes:

    • SSE doesn't need data source info, only the ref, I think only data source plugins and editing of data sources needs the full info
  • A Possible Architecture

    • Service to Authorize Datasource Ref to Users (To be called at borders (e.g. APIs or exported services))
    • Another service to Translate Datasource Ref -> Datasource Info (to be called by internal services)
    • With this, things like recorded queries would use the Data Source Ref Authorization service, SSE as well only sees the Datasoure Ref but isn't responsbile for authorization. Data sources get info (I imagine an execute queries call could use the ref to fetch data source info before sending the request to a data source).
@marefr
Copy link

marefr commented Nov 15, 2021

Querying /api/ds/query with no expressions:
Note: datasource.CacheService.GetDatasource handles both caching of looking up data sources and for Grafana Enterprise data source permissions checks.

Sequence diagram - Sequence diagram

@marefr
Copy link

marefr commented Nov 15, 2021

Querying /api/ds/query with expressions:
Note: datasource.CacheService.GetDatasource handles both caching of looking up data sources and for Grafana Enterprise data source permissions checks.
Note 2: Recorded queries are doing something similar: https://github.com/grafana/grafana-enterprise/blob/dc2ced652fce46bed8c8660bb87abd6335227c9b/src/pkg/extensions/recordedqueries/metric/datasourcegetter.go#L103

Sequence diagram - Sequence diagram (1)

@marefr
Copy link

marefr commented Nov 15, 2021

@kylebrandt
Copy link
Author

The main with the flow diagrams of the current architecture is the difference if a single request can be multiple data source or not (SSE is multiple DSs per request).

My bias towards SSE makes me think it is better if our query structure is should always be multiple data source or not.

  • Pro Multids: Same structure for multiple data sources as it is for one
  • Con Multids: If considered an edge case, than information is being repeated in each query that doesn't need to be repeated (although not as bed if it is just the data source ref)

@marefr
Copy link

marefr commented Nov 15, 2021

New query service introduced by grafana/grafana#41653 basically adds a simplified interface for consumers so that each consumer don't need to lookup each data source:
seq drawio (1)

@marefr
Copy link

marefr commented Nov 15, 2021

The main with the flow diagrams of the current architecture is the difference if a single request can be multiple data source or not (SSE is multiple DSs per request).

MetricRequest DTO is actually suitable for multiple data sources and/or other types of queries, such as SSE. Except the time range, something we want to have per each query.

One optimization we do now when there's no SSE query is to bundle multiple queries for the same data source, but that's can perhaps seen as an implementation detail? Are we doing this for SSE as well?

Thinking about the current API payload when calling /api/ds/query. To me it would make sense to let SSE handle all data source queries, e.g. "new query service referred to in earlier comment" could be SSE. Can it already today handle queries not containing any expressions?

This approach would require SSE to check ds permissions which might or might not be optimal. One of the nice things though with this is that it doesn't need any models.DataSource for expression queries or grafanads queries, only for real data source queries.

  "queries": [
    {
      "datasource" : { "type": "expr" }
    },
    {
      "datasource" : { "type": "prometheus", "uid": "prom1" }
    },
    {
      "datasource" : { "type": "grafanads" }
    }
  ]

@kylebrandt
Copy link
Author

Thinking about the current API payload when calling /api/ds/query. To me it would make sense to let SSE handle all data source queries, e.g. "new query service referred to in earlier comment" could be SSE. Can it already today handle queries not containing any expressions?

It mutates the shape of incoming data frames, so not ideal to always call it at least as it currently is.

This approach would require SSE to check ds permissions which might or might not be optimal.

This would require the user to be passed through to SSE, which it doesn't need otherwise. In the case of alerting, there is no user, so would need behavior for that as well. So that results in two entries paths into SSE that would be different in terms of how permissions need to be checked. Also seems a bit more intuitive to me to have this checked at the borders (HTTP API for querying or alerting) since those are the places where permissions need to be handled different (run time permissions vs creation time permissions).

@marefr
Copy link

marefr commented Nov 17, 2021

It mutates the shape of incoming data frames, so not ideal to always call it at least as it currently is.

What do you mean by this, can you provide some example/code? Or is it just how expressions work, result from a query is passed as data frames to expression evaluation?

This would require the user to be passed through to SSE, which it doesn't need otherwise.

Yes I know and I tend to agree :) But it doesn't feel optimal that each consumer have to parse and understand each and every query when the requests comes in just for the sake of checking permissions when the actual parsing and evaluation is happening later.

@kylebrandt
Copy link
Author

Or is it just how expressions work.

It is just how it works. This gets into my type stuff, but basically they are all converted to one format (Many - one frame per thing (time series or number)) so other operations can work with things in a consistent way. It doesn't have to be this way per say, in the past the code kept things as is, and used specifier data and methods to get things, but this ended up being a bit unwieldy. After types work it is perhaps more feasible.

Yes I know and I tend to agree :) But it doesn't feel optimal that each consumer have to parse and understand each and every query when the requests comes in just for the sake of checking permissions when the actual parsing and evaluation is happening later.

This is one of the reasons I was thinking perhaps the DatasourceRef distinction is important. It is looping over every query, but only to get that DatasourceRef property which is a Grafana reserved property / known type. I guess this also has the underlying assumption that the border (API / user interaction) layers should check permissions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment