The build for commit 515a7d95 is based on the branch lutter/stuff. The branch backports these changes to v0.34.1:
- parallel query execution
- improved query traces
- improve determining if a subgraph has an error
- resolve block constraints in bulk
- limit the maximum size of a cache entry

Parallel query execution is only enabled when the environment variable
GRAPH_PARALLEL_BLOCK_CONSTRAINTS is set to true; it defaults to false.
When enabling this, keep a close eye on your database connection wait
times and general latency. With it turned on, a query can use as many
database connections as there are distinct block constraints on the
GraphQL query, which could cause other queries to wait.
The build for commit c266a0ca is based on the same branch but only contains the first three changes from the list.
Query tracing is enabled by setting the environment variable
GRAPH_GRAPHQL_TRACE_TOKEN to some string value. A GraphQL query that
carries the HTTP header X-GraphTraceQuery with that string value will
receive additional data in its response: besides data, the response will
contain two new entries, trace and http.
All times in the query trace are wall-clock times, i.e., they include time a task spent waiting while other tasks were being scheduled.
Queries are processed by splitting the toplevel GraphQL query fields by
block constraints. Each of these groups is run as one unit. When parallel
query execution is turned on, these groups are run in parallel; when not,
they are run one after the other. For each group, we first check whether
the cache already holds the result. If not, the group needs to acquire a query
semaphore before starting to run database queries. By default, the
semaphore has as many permits as the database connection pool has
connections. With that, each group has a reasonable chance of getting the
connections it needs even if there are a lot of other queries happening at
the same time. The environment variable GRAPH_EXTRA_QUERY_PERMITS can be
set to increase the number of permits. Setting this to something very
large, say 100,000, will effectively disable the semaphore.
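The grouping and semaphore behavior described above can be sketched as follows. This is an illustrative Python model, not graph-node's actual (Rust) implementation; the field names, pool size, and flag values are made up, with PARALLEL standing in for GRAPH_PARALLEL_BLOCK_CONSTRAINTS and EXTRA_PERMITS for GRAPH_EXTRA_QUERY_PERMITS.

```python
import asyncio
from collections import defaultdict

POOL_SIZE = 10       # stands in for the database connection pool size
EXTRA_PERMITS = 0    # stands in for GRAPH_EXTRA_QUERY_PERMITS
PARALLEL = True      # stands in for GRAPH_PARALLEL_BLOCK_CONSTRAINTS

async def run_group(block, fields, semaphore):
    # Each group holds a permit while it runs its database queries.
    async with semaphore:
        await asyncio.sleep(0)  # placeholder for the SQL round trips
        return (block, fields)

async def execute(groups):
    # By default, one permit per pool connection; extra permits relax this.
    semaphore = asyncio.Semaphore(POOL_SIZE + EXTRA_PERMITS)
    coros = [run_group(b, fs, semaphore) for b, fs in groups.items()]
    if PARALLEL:
        return await asyncio.gather(*coros)
    # Sequential fallback: run groups one after the other.
    return [await c for c in coros]

# Split toplevel fields by their block constraint (hypothetical fields).
constraints = {"pairs": 100, "tokens": 100, "swaps": 200}
groups = defaultdict(list)
for field, block in constraints.items():
    groups[block].append(field)

results = asyncio.run(execute(groups))
```

Two of the three fields share the block constraint 100, so they form one group and would share a single semaphore permit and connection.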
The additional response fields contain the following:
{
"trace": {
"query": "<graphql query>",
"query_id": "<query id>",
# Overall time for executing the query; excludes parsing and
# serializing the result
"elapsed_ms": "..",
# Time for getting the GraphQL schema for the subgraph, checking where
# its head is and some other bookkeeping
"setup_ms": 63,
# Time for parsing the HTTP request into the AST we use to represent
# the GraphQL query
"query_parsing_ms": 12,
# Rollup of times for all SQL queries that were run
"db": {
# Total time spent sending a SQL query to the database and waiting
# for the result
"elapsed_ms": 82,
# Total time spent waiting to get a database connection
"conn_wait_ms": 19,
# Total time waiting for the query semaphore
"permit_wait_ms": 0,
# Total number of entities returned from the database
"entity_count": 200,
# Number of by-block-constraint groups that actually queried the database
"query_count": 200,
# Number of by-block-constraint groups that were found in the cache
"cached_count": 0
},
# One entry for each by-block-constraint group
"blocks": [
{
"trace": {
# The block at which queries in this group ran
"block": 10600242,
# Total time spent executing all queries for this block. Includes
# all the times listed for each query below
"elapsed_ms": 16,
# One entry for each toplevel query for this block, labeled with
# the GraphQL response key
"<response key>": {
"query": "<SQL query with bind variables>",
# Time for sending query to database and receiving result
"elapsed_ms": 1,
# Time for getting a database connection prior to running a query
"conn_wait_ms": 0,
# Time for acquiring the query semaphore
"permit_wait_ms": 0,
# Number of entities for this query
"entity_count": 1
},
"permit_wait_ms": 0
},
# Whether the query was run against the database or served from
# cache. Possible values:
# miss: run against database, result not added to cache
# insert: run against database, result added to cache
# hit: served from cache; query timings will be from when the
# query was executed against the database
# shared: an identical query was running simultaneously; we waited
# for it to finish and used its result. Query timings are
# from the execution against the database
"cache": "insert"
},
...
]
},
"http": {
# Time it took to serialize the response from the internal
# representation to JSON
"to_json": "351.743µs",
# Size in bytes of the internal representation
"cache_weight": 11272
}
}
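When looking at one of these traces, a useful first step is to compare the database time and wait times against the overall elapsed time. The helper below is a hypothetical convenience, not part of graph-node; the sample numbers are illustrative.

```python
def summarize(trace):
    """Condense the "trace" object of a traced response into a few ratios."""
    db = trace["db"]
    return {
        "total_ms": trace["elapsed_ms"],
        "db_ms": db["elapsed_ms"],
        # Waiting for a connection plus waiting for a semaphore permit.
        "wait_ms": db["conn_wait_ms"] + db["permit_wait_ms"],
        "cache_hit_ratio": db["cached_count"]
            / max(db["query_count"] + db["cached_count"], 1),
    }

# Illustrative trace with made-up timings in the shape shown above.
sample = {
    "elapsed_ms": 120,
    "db": {
        "elapsed_ms": 82,
        "conn_wait_ms": 19,
        "permit_wait_ms": 0,
        "entity_count": 200,
        "query_count": 200,
        "cached_count": 0,
    },
}
summary = summarize(sample)
```

A high wait_ms relative to db_ms is the signal to watch for when experimenting with parallel query execution, since it indicates contention for connections or permits.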