
@cyriltovena
Created November 27, 2019 00:42
[{"body":" General Notes Cortex has evolved over several years, and the command-line options sometimes reflect this heritage. In some cases the default value for options is not the recommended value, and in some cases names do not reflect the true meaning. We do intend to clean this up, but it requires a lot of care to avoid breaking existing installations. In the meantime we regret the inconvenience.\nDuration arguments should be specified with a unit like 5s or 3h. Valid time units are \u0026ldquo;ms\u0026rdquo;, \u0026ldquo;s\u0026rdquo;, \u0026ldquo;m\u0026rdquo;, \u0026ldquo;h\u0026rdquo;.\nQuerier -querier.max-concurrent The maximum number of top-level PromQL queries that will execute at the same time, per querier process. If using the query frontend, this should be set to at least (querier.worker-parallelism * number of query frontend replicas). Otherwise queries may queue in the queriers and not the frontend, which will affect QoS.\n -querier.query-parallelism This refers to database queries against the store (e.g. Bigtable or DynamoDB). This is the max subqueries run in parallel per higher-level query.\n -querier.timeout The timeout for a top-level PromQL query.\n -querier.max-samples Maximum number of samples a single query can load into memory, to avoid blowing up on enormous queries.\nThe next three options only apply when the querier is used together with the Query Frontend:\n -querier.frontend-address Address of query frontend service, used by workers to find the frontend which will give them queries to execute.\n -querier.dns-lookup-period How often the workers will query DNS to re-check where the frontend is.\n -querier.worker-parallelism Number of simultaneous queries to process, per worker process. See note on -querier.max-concurrent\nQuerier and Ruler The ingester query API was improved over time, but defaults to the old behaviour for backwards-compatibility. For best results both of these next two flags should be set to true:\n -querier.batch-iterators This uses iterators to execute query, as opposed to fully materialising the series in memory, and fetches multiple results per loop.\n -querier.ingester-streaming Use streaming RPCs to query ingester, to reduce memory pressure in the ingester.\n -querier.iterators This is similar to -querier.batch-iterators but less efficient. If both iterators and batch-iterators are true, batch-iterators will take precedence.\n -promql.lookback-delta Time since the last sample after which a time series is considered stale and ignored by expression evaluations.\nQuery Frontend -querier.align-querier-with-step If set to true, will cause the query frontend to mutate incoming queries and align their start and end parameters to the step parameter of the query. This improves the cacheability of the query results.\n -querier.split-queries-by-day If set to true, will case the query frontend to split multi-day queries into multiple single-day queries and execute them in parallel.\n -querier.cache-results If set to true, will cause the querier to cache query results. The cache will be used to answer future, overlapping queries. The query frontend calculates extra queries required to fill gaps in the cache.\n -frontend.max-cache-freshness When caching query results, it is desirable to prevent the caching of very recent results that might still be in flux. 
Use this parameter to configure the age of results that should be excluded.\n -memcached.{hostname, service, timeout} Use these flags to specify the location and timeout of the memcached cluster used to cache query results.\n -redis.{endpoint, timeout} Use these flags to specify the location and timeout of the Redis service used to cache query results.\nDistributor -distributor.shard-by-all-labels In the original Cortex design, samples were sharded amongst distributors by the combination of (userid, metric name). Sharding by metric name was designed to reduce the number of ingesters you need to hit on the read path; the downside was that you could hotspot the write path.\nIn hindsight, this seems like the wrong choice: we do many orders of magnitude more writes than reads, and ingester reads are in-memory and cheap. It seems the right thing to do is to use all the labels to shard, improving load balancing and support for very high cardinality metrics.\nSet this flag to true for the new behaviour.\nUpgrade notes: As this flag also makes all queries always read from all ingesters, the upgrade path is pretty trivial; just enable the flag. When you do enable it, you\u0026rsquo;ll see a spike in the number of active series as the writes are \u0026ldquo;reshuffled\u0026rdquo; amongst the ingesters, but over the next stale period all the old series will be flushed, and you should end up with much better load balancing. With this flag enabled in the queriers, reads will always catch all the data from all ingesters.\n -distributor.extra-query-delay This is used by a component with an embedded distributor (Querier and Ruler) to control how long to wait until sending more than the minimum amount of queries needed for a successful response.\n distributor.ha-tracker.enable-for-all-users Flag to enable, for all users, handling of samples with external labels identifying replicas in an HA Prometheus setup. This defaults to false, and is technically defined in the Distributor limits.\n distributor.ha-tracker.enable Enable the distributors HA tracker so that it can accept samples from Prometheus HA replicas gracefully (requires labels). Global (for distributors), this ensures that the necessary internal data structures for the HA handling are created. The option enable-for-all-users is still needed to enable ingestion of HA samples for all users.\n Ring/HA Tracker Store The KVStore client is used by both the Ring and HA Tracker. - {ring,distributor.ha-tracker}.prefix The prefix for the keys in the store. Should end with a /. For example with a prefix of foo/, the key bar would be stored under foo/bar. - {ring,distributor.ha-tracker}.store Backend storage to use for the ring (consul, etcd, inmemory).\nConsul By default these flags are used to configure Consul used for the ring. To configure Consul for the HA tracker, prefix these flags with distributor.ha-tracker.\n consul.hostname Hostname and port of Consul. consul.acltoken ACL token used to interact with Consul. consul.client-timeout HTTP timeout when talking to Consul. consul.consistent-reads Enable consistent reads to Consul. etcd By default these flags are used to configure etcd used for the ring. To configure etcd for the HA tracker, prefix these flags with distributor.ha-tracker.\n etcd.endpoints The etcd endpoints to connect to. etcd.dial-timeout The timeout for the etcd connection. etcd.max-retries The maximum number of retries to do for failed ops. memberlist (EXPERIMENTAL) Flags for configuring KV store based on memberlist library. 
This feature is experimental, please don\u0026rsquo;t use it yet.\n memberlist.nodename Name of the node in memberlist cluster. Defaults to hostname. memberlist.retransmit-factor Multiplication factor used when sending out messages (factor * log(N+1)). If not set, default value is used. memberlist.join Other cluster members to join. Can be specified multiple times. memberlist.abort-if-join-fails If this node fails to join memberlist cluster, abort. memberlist.left-ingesters-timeout How long to keep LEFT ingesters in the ring. Note: this is only used for gossiping, LEFT ingesters are otherwise invisible. memberlist.leave-timeout Timeout for leaving memberlist cluster. memberlist.gossip-interval How often to gossip with other cluster members. Uses memberlist LAN defaults if 0. memberlist.gossip-nodes How many nodes to gossip with in each gossip interval. Uses memberlist LAN defaults if 0. memberlist.pullpush-interval How often to use pull/push sync. Uses memberlist LAN defaults if 0. memberlist.bind-addr IP address to listen on for gossip messages. Multiple addresses may be specified. Defaults to 0.0.0.0. memberlist.bind-port Port to listen on for gossip messages. Defaults to 7946. memberlist.packet-dial-timeout Timeout used when connecting to other nodes to send packet. memberlist.packet-write-timeout Timeout for writing \u0026lsquo;packet\u0026rsquo; data. memberlist.transport-debug Log debug transport messages. Note: global log.level must be at debug level as well. HA Tracker HA tracking has two of it\u0026rsquo;s own flags: - distributor.ha-tracker.cluster Prometheus label to look for in samples to identify a Prometheus HA cluster. (default \u0026ldquo;cluster\u0026rdquo;) - distributor.ha-tracker.replica Prometheus label to look for in samples to identify a Prometheus HA replica. (default \u0026ldquo;__replica__\u0026rdquo;)\nIt\u0026rsquo;s reasonable to assume people probably already have a cluster label, or something similar. If not, they should add one along with __replica__ via external labels in their Prometheus config. If you stick to these default values your Prometheus config could look like this (POD_NAME is an environment variable which must be set by you):\nglobal:external_labels:cluster:clustername__replica__:$POD_NAME HA Tracking looks for the two labels (which can be overwritten per user)\nIt also talks to a KVStore and has it\u0026rsquo;s own copies of the same flags used by the Distributor to connect to for the ring. - distributor.ha-tracker.failover-timeout If we don\u0026rsquo;t receive any samples from the accepted replica for a cluster in this amount of time we will failover to the next replica we receive a sample from. This value must be greater than the update timeout (default 30s) - distributor.ha-tracker.store Backend storage to use for the ring (consul, etcd, inmemory). (default \u0026ldquo;consul\u0026rdquo;) - distributor.ha-tracker.update-timeout Update the timestamp in the KV store for a given cluster/replica only after this amount of time has passed since the current stored timestamp. (default 15s)\nIngester -ingester.max-chunk-age The maximum duration of a timeseries chunk in memory. If a timeseries runs for longer than this the current chunk will be flushed to the store and a new chunk created. 
(default 12h)\n -ingester.max-chunk-idle If a series doesn\u0026rsquo;t receive a sample for this duration, it is flushed and removed from memory.\n -ingester.max-stale-chunk-idle If a series receives a staleness marker, then we wait for this duration to get another sample before we close and flush this series, removing it from memory. You want it to be at least 2x the scrape interval as you don\u0026rsquo;t want a single failed scrape to cause a chunk flush.\n -ingester.chunk-age-jitter To reduce load on the database exactly 12 hours after starting, the age limit is reduced by a varying amount up to this. (default 20m)\n -ingester.spread-flushes Makes the ingester flush each timeseries at a specific point in the max-chunk-age cycle. This means multiple replicas of a chunk are very likely to contain the same contents which cuts chunk storage space by up to 66%. Set -ingester.chunk-age-jitter to 0 when using this option. If a chunk cache is configured (via -memcached.hostname) then duplicate chunk writes are skipped which cuts write IOPs.\n -ingester.join-after How long to wait in PENDING state during the hand-over process. (default 0s)\n -ingester.max-transfer-retries How many times a LEAVING ingester tries to find a PENDING ingester during the hand-over process. Each attempt takes a second or so. Negative value or zero disables hand-over process completely. (default 10)\n -ingester.normalise-tokens Write out \u0026ldquo;normalised\u0026rdquo; tokens to the ring. Normalised tokens consume less memory to encode and decode; as the ring is unmarshalled regularly, this significantly reduces memory usage of anything that watches the ring.\nBefore enabling, rollout a version of Cortex that supports normalised token for all jobs that interact with the ring, then rollout with this flag set to true on the ingesters. The new ring code can still read and write the old ring format, so is backwards compatible.\n -ingester.chunk-encoding Pick one of the encoding formats for timeseries data, which have different performance characteristics. Bigchunk uses the Prometheus V2 code, and expands in memory to arbitrary length. Varbit, Delta and DoubleDelta use Prometheus V1 code, and are fixed at 1K per chunk. Defaults to DoubleDelta, but we recommend Bigchunk.\n -store.bigchunk-size-cap-bytes When using bigchunks, start a new bigchunk and flush the old one if the old one reaches this size. Use this setting to limit memory growth of ingesters with a lot of timeseries that last for days.\n -ingester-client.expected-timeseries When push requests arrive, pre-allocate this many slots to decode them. Tune this setting to reduce memory allocations and garbage. This should match the max_samples_per_send in your queue_config for Prometheus.\n -ingester-client.expected-samples-per-series When push requests arrive, pre-allocate this many slots to decode them. Tune this setting to reduce memory allocations and garbage. Under normal conditions, Prometheus scrapes should arrive with one sample per series.\n -ingester-client.expected-labels When push requests arrive, pre-allocate this many slots to decode them. Tune this setting to reduce memory allocations and garbage. The optimum value will depend on how many labels are sent with your timeseries samples.\n -store.chunk-cache-stubs Where you don\u0026rsquo;t want to cache every chunk written by ingesters, but you do want to take advantage of chunk write deduplication, this option will make ingesters write a placeholder to the cache for each chunk. 
Make sure you configure ingesters with a different cache to queriers, which need the whole value.\nIngester, Distributor \u0026amp; Querier limits. Cortex implements various limits on the requests it can process, in order to prevent a single tenant overwhelming the cluster. There are various default global limits which apply to all tenants which can be set on the command line. These limits can also be overridden on a per-tenant basis, using a configuration file. Specify the filename for the override configuration file using the -limits.per-user-override-config=\u0026lt;filename\u0026gt; flag. The override file will be re-read every 10 seconds by default - this can also be controlled using the -limits.per-user-override-period=10s flag.\nThe override file should be in YAML format and contain a single overrides field, which itself is a map of tenant ID (same values as passed in the X-Scope-OrgID header) to the various limits. An example overrides.yml could look like:\noverrides:tenant1:ingestion_rate:10000max_series_per_metric:100000max_series_per_query:100000tenant2:max_samples_per_query:1000000max_series_per_metric:100000max_series_per_query:100000 When running Cortex on Kubernetes, store this file in a config map and mount it in each services\u0026rsquo; containers. When changing the values there is no need to restart the services, unless otherwise specified.\nValid fields are (with their corresponding flags for default values):\n ingestion_rate / -distributor.ingestion-rate-limit ingestion_burst_size / -distributor.ingestion-burst-size The per-tenant rate limit (and burst size), in samples per second. Enforced on a per distributor basis, actual effective rate limit will be N times higher, where N is the number of distributor replicas.\nNB Limits are reset every -distributor.limiter-reload-period, as such if you set a very high burst limit it will never be hit.\n max_label_name_length / -validation.max-length-label-name max_label_value_length / -validation.max-length-label-value max_label_names_per_series / -validation.max-label-names-per-series Also enforced by the distributor, limits on the on length of labels and their values, and the total number of labels allowed per series.\n reject_old_samples / -validation.reject-old-samples reject_old_samples_max_age / -validation.reject-old-samples.max-age creation_grace_period / -validation.create-grace-period Also enforce by the distributor, limits on how far in the past (and future) timestamps that we accept can be.\n max_series_per_user / -ingester.max-series-per-user max_series_per_metric / -ingester.max-series-per-metric Enforced by the ingesters; limits the number of active series a user (or a given metric) can have. When running with -distributor.shard-by-all-labels=false (the default), this limit will enforce the maximum number of series a metric can have \u0026lsquo;globally\u0026rsquo;, as all series for a single metric will be sent to the same replication set of ingesters. 
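For readability, the overrides.yml example given earlier in this section is reflowed here as YAML (same illustrative tenants and values):

```yaml
overrides:
  tenant1:
    ingestion_rate: 10000
    max_series_per_metric: 100000
    max_series_per_query: 100000
  tenant2:
    max_samples_per_query: 1000000
    max_series_per_metric: 100000
    max_series_per_query: 100000
```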
This is not the case when running with -distributor.shard-by-all-labels=true, so the actual limit will be N/RF times higher, where N is number of ingester replicas and RF is configured replication factor.\nAn active series is a series to which a sample has been written in the last -ingester.max-chunk-idle duration, which defaults to 5 minutes.\n max_global_series_per_user / -ingester.max-global-series-per-user max_global_series_per_metric / -ingester.max-global-series-per-metric Like max_series_per_user and max_series_per_metric, but the limit is enforced across the cluster. Each ingester is configured with a local limit based on the replication factor, the -distributor.shard-by-all-labels setting and the current number of healthy ingesters, and is kept updated whenever the number of ingesters change.\nRequires -distributor.replication-factor and -distributor.shard-by-all-labels set for the ingesters too.\n max_series_per_query / -ingester.max-series-per-query max_samples_per_query / -ingester.max-samples-per-query Limits on the number of timeseries and samples returns by a single ingester during a query.\nStorage s3.force-path-style Set this to true to force the request to use path-style addressing (http://s3.amazonaws.com/BUCKET/KEY). By default, the S3 client will use virtual hosted bucket addressing when possible (http://BUCKET.s3.amazonaws.com/KEY).\n","excerpt":"General Notes Cortex has evolved over several years, and the command-line options sometimes reflect …","ref":"/docs/configuration/arguments.md/","title":"Cortex Arguments"},{"body":" \nCortex provides horizontally scalable, highly available, multi-tenant, long term storage for Prometheus.\n Horizontally scalable: Cortex can run across multiple machines in a cluster, exceeding the throughput and storage of a single machine. This enables you to send the metrics from multiple Prometheus servers to a single Cortex cluster and run \u0026ldquo;globally aggregated\u0026rdquo; queries across all data in a single place. Highly available: When run in a cluster, Cortex can replicate data between machines. This allows you to survive machine failure without gaps in your graphs. Multi-tenant: Cortex can isolate data and queries from multiple different independent Prometheus sources in a single cluster, allowing untrusted parties to share the same cluster. Long term storage: Cortex supports Amazon DynamoDB, Google Bigtable, Cassandra, S3 and GCS for long term storage of metric data. This allows you to durably store data for longer than the lifetime of any single machine, and use this data for long term capacity planning. Cortex is a CNCF sandbox project used in several production systems including Weave Cloud and Grafana Cloud. Cortex is a primarily used as a remote write destination for Prometheus, with a Prometheus-compatible query API.\nDocumentation Read the getting started guide if you\u0026rsquo;re new to the project. 
Before deploying Cortex with a permanent storage backend you should read:\n An overview of Cortex\u0026rsquo;s architecture A general guide to running Cortex Information regarding configuring Cortex For a guide to contributing to Cortex, see the contributor guidelines.\nFurther reading To learn more about Cortex, consult the following documents \u0026amp; talks:\n May 2019 KubeCon talks; \u0026ldquo;Cortex: Intro\u0026rdquo; (video, slides, blog post) and \u0026ldquo;Cortex: Deep Dive\u0026rdquo; (video, slides) Feb 2019 blog post \u0026amp; podcast; \u0026ldquo;Prometheus Scalability with Bryan Boreham\u0026rdquo; (podcast) Feb 2019 blog post; \u0026ldquo;How Aspen Mesh Runs Cortex in Production\u0026ldquo; Dec 2018 KubeCon talk; \u0026ldquo;Cortex: Infinitely Scalable Prometheus\u0026rdquo; (video, slides) Dec 2018 CNCF blog post; \u0026ldquo;Cortex: a multi-tenant, horizontally scalable Prometheus-as-a-Service\u0026ldquo; Nov 2018 CloudNative London meetup talk; \u0026ldquo;Cortex: Horizontally Scalable, Highly Available Prometheus\u0026rdquo; (slides) Nov 2018 CNCF TOC Presentation; \u0026ldquo;Horizontally Scalable, Multi-tenant Prometheus\u0026rdquo; (slides) Sept 2018 blog post; \u0026ldquo;What is Cortex?\u0026ldquo; Aug 2018 PromCon panel; \u0026ldquo;Prometheus Long-Term Storage Approaches\u0026rdquo; (video) Jul 2018 design doc; \u0026ldquo;Cortex Query Optimisations\u0026ldquo; Aug 2017 PromCon talk; \u0026ldquo;Cortex: Prometheus as a Service, One Year On\u0026rdquo; (videos, slides, write up part 1, part 2, part 3) Jun 2017 Prometheus London meetup talk; \u0026ldquo;Cortex: open-source, horizontally-scalable, distributed Prometheus\u0026rdquo; (video) Dec 2016 KubeCon talk; \u0026ldquo;Weave Cortex: Multi-tenant, horizontally scalable Prometheus as a Service\u0026rdquo; (video, slides) Aug 2016 PromCon talk; \u0026ldquo;Project Frankenstein: Multitenant, Scale-Out Prometheus\u0026rdquo;: (video, slides) Jun 2016 design document; \u0026ldquo;Project Frankenstein: A Multi Tenant, Scale Out Prometheus\u0026ldquo; Getting Help If you have any questions about Cortex:\n Ask a question on the Cortex Slack channel. To invite yourself to the CNCF Slack, visit http://slack.cncf.io/. File an issue. Send an email to cortex-users@lists.cncf.io Your feedback is always welcome.\nHosted Cortex (Prometheus as a service) There are several commercial services where you can use Cortex on-demand:\nWeave Cloud Weave Cloud from Weaveworks lets you deploy, manage, and monitor container-based applications. Sign up at https://cloud.weave.works and follow the instructions there. Additional help can also be found in the Weave Cloud documentation.\nInstrumenting Your App: Best Practices\nGrafana Cloud To use Cortex as part of Grafana Cloud, sign up for Grafana Cloud by clicking \u0026ldquo;Log In\u0026rdquo; in the top right and then \u0026ldquo;Sign Up Now\u0026rdquo;. Cortex is included as part of the Starter and Basic Hosted Grafana plans.\n","excerpt":"Cortex provides horizontally scalable, highly available, multi-tenant, long term storage for …","ref":"/docs/","title":"Documentation"},{"body":" This document assumes you have read the architecture document.\nIn addition to the general advice in this document, please see these platform-specific notes:\n AWS Planning Tenants If you will run Cortex as a multi-tenant system, you need to give each tenant a unique ID - this can be any string. Managing tenants and allocating IDs must be done outside of Cortex. 
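A common pattern is to have each Prometheus identify its tenant through the credentials it sends with remote_write; a sketch, assuming a reverse proxy of your own that maps those credentials to the X-Scope-OrgID header (the hostname and password below are placeholders):

```yaml
remote_write:
  - url: https://cortex.example.com/api/prom/push
    basic_auth:
      username: tenant-1      # tenant ID; your proxy turns this into X-Scope-OrgID
      password: REPLACE_ME    # credential checked by the proxy, not by Cortex
```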
You must also configure Authentication and Authorisation.\nStorage Cortex requires a scalable storage back-end. Commercial cloud options are DynamoDB and Bigtable: the advantage is you don\u0026rsquo;t have to know how to manage them, but the downside is they have specific costs. Alternatively you can choose Cassandra, which you will have to install and manage.\nComponents Every Cortex installation will need Distributor, Ingester and Querier. Alertmanager, Ruler and Query-frontend are optional.\nOther dependencies Cortex needs a KV store to track sharding of data between processes. This can be either Etcd or Consul.\nIf you want to configure recording and alerting rules (i.e. if you will run the Ruler and Alertmanager components) then a Postgres database is required to store configs.\nMemcached is not essential but highly recommended.\nIngester replication factor The standard replication factor is three, so that we can drop one replica and be unconcerned, as we still have two copies of the data left for redundancy. This is configurable: you can run with more redundancy or less, depending on your risk appetite.\nSchema Schema periodic table The periodic table from argument (-dynamodb.periodic-table.from=\u0026lt;date\u0026gt; if using command line flags, the from field for the first schema entry if using YAML) should be set to the date the oldest metrics you will be sending to Cortex. Generally that means set it to the date you are first deploying this instance. If you use an example date from years ago table-manager will create hundreds of tables. You can also avoid creating too many tables by setting a reasonable retention in the table-manager (-table-manager.retention-period=\u0026lt;duration\u0026gt;).\nSchema version Choose schema version 9 in most cases; version 10 if you expect hundreds of thousands of timeseries under a single name. Anything older than v9 is much less efficient.\nChunk encoding Standard choice would be Bigchunk, which is the most flexible chunk encoding. You may get better compression from Varbit, if many of your timeseries do not change value from one day to the next.\nSizing You will want to estimate how many nodes are required, how many of each component to run, and how much storage space will be required. In practice, these will vary greatly depending on the metrics being sent to Cortex.\nSome key parameters are:\n The number of active series. If you have Prometheus already you can query prometheus_tsdb_head_series to see this number. Sampling rate, e.g. a new sample for each series every 15 seconds. Multiply this by the number of active series to get the total rate at which samples will arrive at Cortex. The rate at which series are added and removed. This can be very high if you monitor objects that come and go - for example if you run thousands of batch jobs lasting a minute or so and capture metrics with a unique ID for each one. Read how to analyse this on Prometheus How compressible the time-series data are. If a metric stays at the same value constantly, then Cortex can compress it very well, so 12 hours of data sampled every 15 seconds would be around 2KB. On the other hand if the value jumps around a lot it might take 10KB. There are not currently any tools available to analyse this. How long you want to retain data for, e.g. 1 month or 2 years. Other parameters which can become important if you have particularly high values:\n Number of different series under one metric name. Number of labels per series. Rate and complexity of queries. 
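If you already run Prometheus, a small recording-rule group can track the first two of these inputs (active series and overall sample rate) over time; this is a sketch, the rule names are illustrative, and prometheus_tsdb_head_samples_appended_total is assumed to be available in your Prometheus version:

```yaml
groups:
  - name: cortex-sizing
    rules:
      # Active series currently in the TSDB head block.
      - record: sizing:active_series
        expr: sum(prometheus_tsdb_head_series)
      # Samples appended per second, i.e. roughly the rate Cortex will receive.
      - record: sizing:samples_per_second
        expr: sum(rate(prometheus_tsdb_head_samples_appended_total[5m]))
```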
Now, some rules of thumb:\n Each million series in an ingester takes 15GB of RAM. Total number of series in ingesters is number of active series times the replication factor. This is with the default of 12-hour chunks - RAM required will reduce if you set -ingester.max-chunk-age lower (trading off more back-end database IO) Each million series (including churn) consumes 15GB of chunk storage and 4GB of index, per day (so multiply by the retention period). Each 100,000 samples/sec arriving takes 1 CPU in distributors. Distributors don\u0026rsquo;t need much RAM. If you turn on compression between distributors and ingesters (for example to save on inter-zone bandwidth charges at AWS) they will use significantly more CPU (approx 100% more for distributor and 50% more for ingester).\nCaching Cortex can retain data in-process or in Memcached to speed up various queries by caching:\n individual chunks index lookups for one label on one day the results of a whole query You should always include Memcached in your Cortex install so results from one process can be re-used by another. In-process caching can cut fetch times slightly and reduce the load on Memcached.\nIngesters can also be configured to use Memcached to avoid re-writing index and chunk data which has already been stored in the back-end database. Again, highly recommended.\nOrchestration Because Cortex is designed to run multiple instances of each component (ingester, querier, etc.), you probably want to automate the placement and shepherding of these instances. Most users choose Kubernetes to do this, but this is not mandatory.\nConfiguration Resource requests If using Kubernetes, each container should specify resource requests so that the scheduler can place them on a node with sufficient capacity.\nFor example an ingester might request:\n resources: requests: cpu: 4 memory: 10Gi The specific values here should be adjusted based on your own experiences running Cortex - they are very dependent on rate of data arriving and other factors such as series churn.\nTake extra care with ingesters Ingesters hold hours of timeseries data in memory; you can configure Cortex to replicate the data but you should take steps to avoid losing all replicas at once: - Don\u0026rsquo;t run multiple ingesters on the same node. - Don\u0026rsquo;t run ingesters on preemptible/spot nodes. 
- Spread out ingesters across racks / availability zones / whatever applies in your datacenters.\nYou can ask Kubernetes to avoid running on the same node like this:\n affinity: podAntiAffinity: preferredDuringSchedulingIgnoredDuringExecution: - weight: 100 podAffinityTerm: labelSelector: matchExpressions: - key: name operator: In values: - ingester topologyKey: \u0026quot;kubernetes.io/hostname\u0026quot; Give plenty of time for an ingester to hand over or flush data to store when shutting down; for Kubernetes this looks like:\n terminationGracePeriodSeconds: 2400 Ask Kubernetes to limit rolling updates to one ingester at a time, and signal the old one to stop before the new one is ready:\n strategy: rollingUpdate: maxSurge: 0 maxUnavailable: 1 Ingesters provide an http hook to signal readiness when all is well; this is valuable because it stops a rolling update at the first problem:\n readinessProbe: httpGet: path: /ready port: 80 We do not recommend configuring a liveness probe on ingesters - killing them is a last resort and should not be left to a machine.\nOptimising Optimising Storage These ingester options reduce the chance of storing multiple copies of the same data:\n -ingester.spread-flushes=true -ingester.chunk-age-jitter=0 Add a chunk cache via -memcached.hostname to allow writes to be de-duplicated.\nAs recommended under Chunk encoding, use Bigchunk:\n -ingester.chunk-encoding=3 # bigchunk ","excerpt":"This document assumes you have read the architecture document.\nIn addition to the general advice in …","ref":"/docs/guides/running.md/","title":"Running Cortex in Production"},{"body":"All Cortex components take the tenant ID from a header X-Scope-OrgID on each request. They trust this value completely: if you need to protect your Cortex installation from accidental or malicious calls then you must add an additional layer of protection.\nTypically this means you run Cortex behind a reverse proxy, and ensure that all callers, both machines sending data over the remote_write interface and humans sending queries from GUIs, supply credentials which identify them and confirm they are authorised.\nWhen configuring the remote_write API in Prometheus there is no way to add extra headers. The user and password fields of http Basic auth, or Bearer token, can be used to convey tenant ID and/or credentials.\n","excerpt":"All Cortex components take the tenant ID from a header X-Scope-OrgID on each request. They trust …","ref":"/docs/guides/auth.md/","title":"Authentication and Authorisation"},{"body":"You can use the Cortex query frontend with any Prometheus-API compatible service, including Prometheus and Thanos. Use this config file to get the benefits of query parallelisation and caching.\n# Disable the requirement that every request to Cortex has a# X-Scope-OrgID header. `fake` will be substituted in instead.auth_enabled:false# We only want to run the query-frontend module.target:query-frontend# We don\u0026#39;t want the usual /api/prom prefix.http_prefix:server:http_listen_port:9091frontend:log_queries_longer_than:1ssplit_queries_by_day:truealign_queries_with_step:truecache_results:truecompress_responses:trueresults_cache:max_freshness:1mcache:# We\u0026#39;re going to use the in-process \u0026#34;FIFO\u0026#34; cache, but you can enable# memcached below.enable_fifocache:truefifocache:size:1024validity:24h# If you want to use a memcached cluster, configure a headless service# in Kubernetes and Cortex will discover the individual instances using# a SRV DNS query. 
Cortex will then do client-side hashing to spread# the load evenly.# memcached:# memcached_client:# host: memcached.default.svc.cluster.local# service: memcached# consistent_hash: true","excerpt":"You can use the Cortex query frontend with any Prometheus-API compatible service, including …","ref":"/docs/configuration/prometheus-frontend.md/","title":"Prometheus Frontend"},{"body":" Cortex can be run as a single binary or as multiple independent microservices. The single-binary mode is easier to deploy and is aimed mainly at users wanting to try out Cortex or develop on it. The microservices mode is intended for production usage, as it allows you to independently scale different services and isolate failures. This document will focus on single-process Cortex. See the architecture doc for more information about the microservices.\nSeparately from the single-process vs microservices decision, Cortex can be configured to use local storage or cloud storage (DynamoDB, Bigtable, Cassandra, S3, GCS etc). This document will focus on using local storage. Local storage is explicitly not production ready at this time. Cortex can also make use of external memcacheds for caching and although these are not mandatory, they should be used in production.\nSingle instance, single process For simplicity \u0026amp; to get started, we\u0026rsquo;ll run it as a single process with no dependencies:\n$ go build ./cmd/cortex $ ./cortex -config.file=./docs/single-process-config.yaml This starts a single Cortex node storing chunks and index to your local filesystem in /tmp/cortex. It is not intended for production use.\nAdd the following to your Prometheus config (documentation/examples/prometheus.yml in Prometheus repo):\nremote_write:-url:http://localhost:9009/api/prom/push And start Prometheus with that config file:\n$ git clone https://github.com/prometheus/prometheus $ cd prometheus $ go build ./cmd/prometheus $ ./prometheus --config.file=./documentation/examples/prometheus.yml Your Prometheus instance will now start pushing data to Cortex. To query that data, start a Grafana instance:\n$ docker run -d --name=grafana -p 3000:3000 grafana/grafana In the Grafana UI (username/password admin/admin), add a Prometheus datasource for Cortex (http://host.docker.internal:9009/api/prom).\nTo clean up: press CTRL-C in both terminals (for Cortex and Prometheus) and run docker rm -f grafana.\nHorizontally scale out Next we\u0026rsquo;re going to show how you can run a scale out Cortex cluster using Docker. We\u0026rsquo;ll need: - A built Cortex image. - A Docker network to put these containers on so they can resolve each other by name. - A single node Consul instance to coordinate the Cortex cluster.\n$ make ./cmd/cortex/.uptodate $ docker network create cortex $ docker run -d --name=consul --network=cortex -e CONSUL_BIND_INTERFACE=eth0 consul Next we\u0026rsquo;ll run a couple of Cortex instances pointed at that Consul. You\u0026rsquo;ll note that Cortex configuration can be specified in either a config file or overridden on the command line.
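The -ring.store and -consul.hostname flags passed on the command line below also have config-file equivalents; as a sketch, based on the single-process configuration shown later in this dump (the exact key names, in particular consul.host, are assumptions and may vary between Cortex versions):

```yaml
ingester:
  lifecycler:
    ring:
      kvstore:
        store: consul        # equivalent of -ring.store=consul
        consul:
          host: consul:8500  # equivalent of -consul.hostname=consul:8500
      replication_factor: 1
```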
See the arguments documentation for more information about Cortex configuration options.\n$ docker run -d --name=cortex1 --network=cortex \\ -v $(pwd)/docs/single-process-config.yaml:/etc/single-process-config.yaml \\ -p 9001:9009 \\ quay.io/cortexproject/cortex \\ -config.file=/etc/single-process-config.yaml \\ -ring.store=consul \\ -consul.hostname=consul:8500 $ docker run -d --name=cortex2 --network=cortex \\ -v $(pwd)/docs/single-process-config.yaml:/etc/single-process-config.yaml \\ -p 9002:9009 \\ quay.io/cortexproject/cortex \\ -config.file=/etc/single-process-config.yaml \\ -ring.store=consul \\ -consul.hostname=consul:8500 If you go to http://localhost:9001/ring (or http://localhost:9002/ring) you should see both Cortex nodes join the ring.\nTo demonstrate the correct operation of Cortex clustering, we\u0026rsquo;ll send samples to one of the instances and queries to another. In production, you\u0026rsquo;d want to load balance both pushes and queries evenly among all the nodes.\nPoint Prometheus at the first:\nremote_write:-url:http://localhost:9001/api/prom/push$ ./prometheus --config.file=./documentation/examples/prometheus.yml And Grafana at the second:\n$ docker run -d --name=grafana --network=cortex -p 3000:3000 grafana/grafana In the Grafana UI (username/password admin/admin), add a Prometheus datasource for Cortex (http://cortex2:9009/api/prom).\nTo clean up: CTRL-C the Prometheus process and run:\n$ docker rm -f cortex1 cortex2 consul grafana $ docker network remove cortex High availability with replication In this last demo we\u0026rsquo;ll show how Cortex can replicate data among three nodes, and demonstrate Cortex can tolerate a node failure without affecting reads and writes.\nFirst, create a network and run a new Consul and Grafana:\n$ docker network create cortex $ docker run -d --name=consul --network=cortex -e CONSUL_BIND_INTERFACE=eth0 consul $ docker run -d --name=grafana --network=cortex -p 3000:3000 grafana/grafana Finally, launch 3 Cortex nodes with replication factor 3:\n$ docker run -d --name=cortex1 --network=cortex \\ -v $(pwd)/docs/single-process-config.yaml:/etc/single-process-config.yaml \\ -p 9001:9009 \\ quay.io/cortexproject/cortex \\ -config.file=/etc/single-process-config.yaml \\ -ring.store=consul \\ -consul.hostname=consul:8500 \\ -distributor.replication-factor=3 $ docker run -d --name=cortex2 --network=cortex \\ -v $(pwd)/docs/single-process-config.yaml:/etc/single-process-config.yaml \\ -p 9002:9009 \\ quay.io/cortexproject/cortex \\ -config.file=/etc/single-process-config.yaml \\ -ring.store=consul \\ -consul.hostname=consul:8500 \\ -distributor.replication-factor=3 $ docker run -d --name=cortex3 --network=cortex \\ -v $(pwd)/docs/single-process-config.yaml:/etc/single-process-config.yaml \\ -p 9003:9009 \\ quay.io/cortexproject/cortex \\ -config.file=/etc/single-process-config.yaml \\ -ring.store=consul \\ -consul.hostname=consul:8500 \\ -distributor.replication-factor=3 Configure Prometheus to send data to the first replica:\nremote_write:-url:http://localhost:9001/api/prom/push$ ./prometheus --config.file=./documentation/examples/prometheus.yml In Grafana, add a datasource for the 3rd Cortex replica (http://cortex3:9009/api/prom) and verify the same data appears in both Prometheus and Cortex.\nTo show that Cortex can tolerate a node failure, hard kill one of the Cortex replicas:\n$ docker rm -f cortex2 You should see writes and queries continue to work without error.\nTo clean up: CTRL-C the Prometheus process and run:\n$ docker rm -f 
cortex1 cortex2 cortex3 consul grafana $ docker network remove cortex ","excerpt":"Cortex can be runs as a single binary or as multiple independent microservices. The single-binary …","ref":"/docs/getting_started/","title":"Getting Started"},{"body":" [this is a work in progress]\nSee also the Running in Production document.\nCredentials You can supply credentials to Cortex by setting environment variables AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY (and AWS_SESSION_TOKEN if you use MFA), or use a short-term token solution such as kiam.\nShould I use S3 or DynamoDB ? Note that the choices are: \u0026ldquo;chunks\u0026rdquo; of timeseries data in S3 and index in DynamoDB, or everything in DynamoDB. Using just S3 is not an option.\nBroadly S3 is much more expensive to read and write, while DynamoDB is much more expensive to store over months. S3 charges differently, so the cross-over will depend on the size of your chunks, and how long you keep them. Very roughly: for 3KB chunks if you keep them longer than 8 months then S3 is cheaper.\nDynamoDB capacity provisioning By default, the Cortex Tablemanager will provision tables with 1,000 units of write capacity and 300 read - these numbers are chosen to be high enough that most trial installations won\u0026rsquo;t see a bottleneck on storage, but do note that that AWS will charge you approximately $60 per day for this capacity.\nTo match your costs to requirements, observe the actual capacity utilisation via CloudWatch or Prometheus metrics, then adjust the Tablemanager provision via command-line options -dynamodb.chunk-table.write-throughput, read-throughput and similar with .periodic-table which controls the index table.\nTablemanager can even adjust the capacity dynamically, by watching metrics for DynamoDB throttling and ingester queue length. Here is an example set of command-line parameters from a fairly modest install:\n -target=table-manager -metrics.url=http://prometheus.monitoring.svc.cluster.local./api/prom/ -metrics.target-queue-length=100000 -dynamodb.url=dynamodb://us-east-1/ -dynamodb.use-periodic-tables=true -dynamodb.periodic-table.prefix=cortex_index_ -dynamodb.periodic-table.from=2019-05-02 -dynamodb.periodic-table.write-throughput=1000 -dynamodb.periodic-table.write-throughput.scale.enabled=true -dynamodb.periodic-table.write-throughput.scale.min-capacity=200 -dynamodb.periodic-table.write-throughput.scale.max-capacity=2000 -dynamodb.periodic-table.write-throughput.scale.out-cooldown=300 # 5 minutes between scale ups -dynamodb.periodic-table.inactive-enable-ondemand-throughput-mode=true -dynamodb.periodic-table.read-throughput=300 -dynamodb.periodic-table.tag=product_area=cortex -dynamodb.chunk-table.from=2019-05-02 -dynamodb.chunk-table.prefix=cortex_data_ -dynamodb.chunk-table.write-throughput=800 -dynamodb.chunk-table.write-throughput.scale.enabled=true -dynamodb.chunk-table.write-throughput.scale.min-capacity=200 -dynamodb.chunk-table.write-throughput.scale.max-capacity=1000 -dynamodb.chunk-table.write-throughput.scale.out-cooldown=300 # 5 minutes between scale ups -dynamodb.chunk-table.inactive-enable-ondemand-throughput-mode=true -dynamodb.chunk-table.read-throughput=300 -dynamodb.chunk-table.tag=product_area=cortex Several things to note here:\n -metrics.url points at a Prometheus server running within the cluster, scraping Cortex. Currently it is not possible to use Cortex itself as the target here. -metrics.target-queue-length: when the ingester queue is below this level, Tablemanager will not scale up. 
When the queue is growing above this level, Tablemanager will scale up whatever table is being throttled. The plain throughput values are used when the tables are first created. Scale-up to any level up to this value will be very quick, but if you go higher than this initial value, AWS may take tens of minutes to finish scaling. In the config above they are set. ondemand-throughput-mode tells AWS to charge for what you use, as opposed to continuous provisioning. This mode is cost-effective for older data, which is never written and only read sporadically. ","excerpt":"[this is a work in progress]\nSee also the Running in Production document.\nCredentials You can supply …","ref":"/docs/guides/aws-specific.md/","title":"Running Cortex at AWS"},{"body":"Configuration for running Cortex in single-process mode. This should not be used in production. It is only for getting started and development.\n# Disable the requirement that every request to Cortex has a# X-Scope-OrgID header. `fake` will be substituted in instead.auth_enabled:falseserver:http_listen_port:9009# Configure the server to allow messages up to 100MB.grpc_server_max_recv_msg_size:104857600grpc_server_max_send_msg_size:104857600grpc_server_max_concurrent_streams:1000distributor:shard_by_all_labels:truepool:health_check_ingesters:trueingester_client:grpc_client_config:# Configure the client to allow messages up to 100MB.max_recv_msg_size:104857600max_send_msg_size:104857600use_gzip_compression:trueingester:#chunk_idle_period: 15mlifecycler:# The address to advertise for this ingester. Will be autodiscovered by# looking up address on eth0 or en0; can be specified if this fails.# address: 127.0.0.1# We want to start immediately and flush on shutdown.join_after:0claim_on_rollout:falsefinal_sleep:0snum_tokens:512# Use an in memory ring store, so we don\u0026#39;t need to launch a Consul.ring:kvstore:store:inmemoryreplication_factor:1# Use local storage - BoltDB for the index, and the filesystem# for the chunks.schema:configs:-from:2019-07-29store:boltdbobject_store:filesystemschema:v10index:prefix:index_period:168hstorage:boltdb:directory:/tmp/cortex/indexfilesystem:directory:/tmp/cortex/chunks","excerpt":"Configuration for running Cortex in single-process mode. This should not be used in production. It …","ref":"/docs/configuration/single-process-config.md/","title":"Single-process"},{"body":" Context You can have more than a single Prometheus monitoring and ingesting the same metrics for redundancy. Cortex already does replication for redundancy and it doesn\u0026rsquo;t make sense to ingest the same data twice. So in Cortex, we made sure we can dedupe the data we receive from HA Pairs of Prometheus. We do this via the following:\nAssume that there are two teams, each running their own Prometheus, monitoring different services. Let\u0026rsquo;s call the Prometheis T1 and T2. Now, if the teams are running HA pairs, let\u0026rsquo;s call the individual Prometheis, T1.a, T1.b and T2.a and T2.b.\nIn Cortex we make sure we only ingest from one of T1.a and T1.b, and only from one of T2.a and T2.b. We do this by electing a leader replica for each cluster of Prometheus. For example, in the case of T1, let it be T1.a. As long as T1.a is the leader, we drop the samples sent by T1.b. And if Cortex sees no new samples from T1.a for a short period (30s by default), it\u0026rsquo;ll switch the leader to be T1.b.\nThis means if T1.a goes down for a few minutes Cortex\u0026rsquo;s HA sample handling will have switched and elected T1.b as the leader. 
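The Prometheus side of this is just a pair of external labels; reflowed from the inline examples in these docs (POD_NAME is an environment variable you must set yourself, and both label names can be changed per tenant):

```yaml
global:
  external_labels:
    cluster: prom-team1       # identifies the HA cluster (T1, T2, ...)
    __replica__: $POD_NAME    # unique per replica; dropped by Cortex at ingest
```

With these labels in place, the leader election and failover just described decide which replica's samples are kept.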
This failover timeout is what enables us to only accept samples from a single replica at a time, but ensure we don\u0026rsquo;t drop too much data in case of issues. Note that with the default scrape period of 15s, and the default timeouts in Cortex, in most cases you\u0026rsquo;ll only lose a single scrape of data in the case of a leader election failover. For any rate queries the rate window should be at least 4x the scrape period to account for any of these failover scenarios, for example with the default scrape period of 15s then you should calculate rates over at least 1m periods.\nNow we do the same leader election process T2.\nConfig Client Side So for Cortex to achieve this, we need 2 identifiers for each process, one identifier for the cluster (T1 or T2, etc) and one identifier to identify the replica in the cluster (a or b). The easiest way to do with is by setting external labels, ideally cluster and replica (note the default is __replica__). For example:\ncluster: prom-team1 replica: replica1 (or pod-name) and\ncluster: prom-team1 replica: replica2 Note: These are external labels and have nothing to do with remote_write config.\nThese two label names are configurable per-tenant within Cortex, and should be set to something sensible. For example, cluster label is already used by some workloads, and you should set the label to be something else but uniquely identifies the cluster. Good examples for this label-name would be team, cluster, prometheus, etc.\nThe replica label should be set so that the value for each prometheus is unique in that cluster. Note: Cortex drops this label when ingesting data, but preserves the cluster label. This way, your timeseries won\u0026rsquo;t change when replicas change.\nServer Side To enable handling of samples, see the distributor flags having ha-tracker in them.\n","excerpt":"Context You can have more than a single Prometheus monitoring and ingesting the same metrics for …","ref":"/docs/guides/ha-pair-handling.md/","title":"Config for sending HA Pairs data to Cortex"},{"body":" Cortex consists of multiple horizontally scalable microservices. Each microservice uses the most appropriate technique for horizontal scaling; most are stateless and can handle requests for any users while some (namely the ingesters) are semi-stateful and depend on consistent hashing. This document provides a basic overview of Cortex\u0026rsquo;s architecture.\nThe role of Prometheus Prometheus instances scrape samples from various targets and then push them to Cortex (using Prometheus\u0026rsquo; remote write API). That remote write API emits batched Snappy-compressed Protocol Buffer messages inside the body of an HTTP PUT request.\nCortex requires that each HTTP request bear a header specifying a tenant ID for the request. Request authentication and authorization are handled by an external reverse proxy.\nIncoming samples (writes from Prometheus) are handled by the distributor while incoming reads (PromQL queries) are handled by the query frontend.\nServices Cortex has a service-based architecture, in which the overall system is split up into a variety of components that perform specific tasks and run separately (and potentially in parallel).\nCortex is, for the most part, a shared-nothing system. Each layer of the system can run multiple instances of each component and they don\u0026rsquo;t coordinate or communicate with each other within that layer.\nDistributor The distributor service is responsible for handling samples written by Prometheus. 
It\u0026rsquo;s essentially the \u0026ldquo;first stop\u0026rdquo; in the write path for Prometheus samples. Once the distributor receives samples from Prometheus, it splits them into batches and then sends them to multiple ingesters in parallel.\nDistributors communicate with ingesters via gRPC. They are stateless and can be scaled up and down as needed.\nIf the HA Tracker is enabled, the Distributor will deduplicate incoming samples that contain both a cluster and replica label. It talks to a KVStore to store state about which replica per cluster it\u0026rsquo;s accepting samples from for a given user ID. Samples with one or neither of these labels will be accepted by default.\nHashing Distributors use consistent hashing, in conjunction with the (configurable) replication factor, to determine which instances of the ingester service receive each sample.\nThe hash itself is based on one of two schemes:\n The metric name and tenant ID All the series labels and tenant ID The trade-off associated with the latter is that writes are more balanced but they must involve every ingester in each query.\n This hashing scheme was originally chosen to reduce the number of required ingesters on the query path. The trade-off, however, is that the write load on the ingesters is less even.\n The hash ring A consistent hash ring is stored in Consul as a single key-value pair, with the ring data structure also encoded as a Protobuf message. The consistent hash ring consists of a list of tokens and ingesters. Hashed values are looked up in the ring; the replication set is built for the closest unique ingesters by token. One of the benefits of this system is that adding and remove ingesters results in only 1/N of the series being moved (where N is the number of ingesters).\nQuorum consistency All distributors share access to the same hash ring, which means that write requests can be sent to any distributor.\nTo ensure consistent query results, Cortex uses Dynamo-style quorum consistency on reads and writes. This means that the distributor will wait for a positive response of at least one half plus one of the ingesters to send the sample to before responding to the user.\nLoad balancing across distributors We recommend randomly load balancing write requests across distributor instances, ideally by running the distributors as a Kubernetes Service.\nIngester The ingester service is responsible for writing sample data to long-term storage backends (DynamoDB, S3, Cassandra, etc.).\nSamples from each timeseries are built up in \u0026ldquo;chunks\u0026rdquo; in memory inside each ingester, then flushed to the chunk store. By default each chunk is up to 12 hours long.\nIf an ingester process crashes or exits abruptly, all the data that has not yet been flushed will be lost. Cortex is usually configured to hold multiple (typically 3) replicas of each timeseries to mitigate this risk.\nA hand-over process manages the state when ingesters are added, removed or replaced.\nWrite de-amplification Ingesters store the last 12 hours worth of samples in order to perform write de-amplification, i.e. batching and compressing samples for the same series and flushing them out to the chunk store. Under normal operations, there should be many orders of magnitude fewer operations per second (OPS) worth of writes to the chunk store than to the ingesters.\nWrite de-amplification is the main source of Cortex\u0026rsquo;s low total cost of ownership (TCO).\nRuler Ruler executes PromQL queries for Recording Rules and Alerts. 
Ruler is configured from a database, so that different rules can be set for each tenant.\nAll the rules for one instance are executed as a group, then rescheduled to be executed again 15 seconds later. Execution is done by a \u0026lsquo;worker\u0026rsquo; running on a goroutine - if you don\u0026rsquo;t have enough workers then the ruler will lag behind.\nRuler can be scaled horizontally.\nAlertManager AlertManager is responsible for accepting alert notifications from Ruler, grouping them, and passing on to a notification channel such as email, PagerDuty, etc.\nLike the Ruler, AlertManager is configured per-tenant in a database.\nUpstream Docs.\nQuery frontend The query frontend is an optional service that accepts HTTP requests, queues them by tenant ID, and retries in case of errors.\n The query frontend is completely optional; you can use queriers directly. To use the query frontend, direct incoming authenticated reads at them and set the -querier.frontend-address flag on the queriers.\n Queueing Queuing performs a number of functions for the query frontend:\n It ensures that large queries that cause an out-of-memory (OOM) error in the querier will be retried. This allows administrators to under-provision memory for queries, or optimistically run more small queries in parallel, which helps to reduce TCO. It prevents multiple large requests from being convoyed on a single querier by distributing them first-in/first-out (FIFO) across all queriers. It prevents a single tenant from denial-of-service-ing (DoSing) other tenants by fairly scheduling queries between tenants. Splitting The query frontend splits multi-day queries into multiple single-day queries, executing these queries in parallel on downstream queriers and stitching the results back together again. This prevents large, multi-day queries from OOMing a single querier and helps them execute faster.\nCaching The query frontend caches query results and reuses them on subsequent queries. If the cached results are incomplete, the query frontend calculates the required subqueries and executes them in parallel on downstream queriers. The query frontend can optionally align queries with their step parameter to improve the cacheability of the query results.\nParallelism The query frontend job accepts gRPC streaming requests from the queriers, which then \u0026ldquo;pull\u0026rdquo; requests from the frontend. For high availability it\u0026rsquo;s recommended that you run multiple frontends; the queriers will connect to—and pull requests from—all of them. To reap the benefit of fair scheduling, it is recommended that you run fewer frontends than queriers. Two should suffice in most cases.\nQuerier The querier service handles the actual PromQL evaluation of samples stored in long-term storage.\nIt embeds the chunk store client code for fetching data from long-term storage and communicates with ingesters for more recent data.\nChunk store The chunk store is Cortex\u0026rsquo;s long-term data store, designed to support interactive querying and sustained writing without the need for background maintenance tasks. It consists of:\n An index for the chunks. This index can be backed by DynamoDB from Amazon Web Services, Bigtable from Google Cloud Platform, Apache Cassandra. 
Chunk store The chunk store is Cortex\u0026rsquo;s long-term data store, designed to support interactive querying and sustained writing without the need for background maintenance tasks. It consists of:\n An index for the chunks. This index can be backed by DynamoDB from Amazon Web Services, Bigtable from Google Cloud Platform, or Apache Cassandra. A key-value (KV) store for the chunk data itself, which can be DynamoDB, Bigtable, Cassandra again, or an object store such as Amazon S3. Unlike the other core components of Cortex, the chunk store is not a separate service, job, or process, but rather a library embedded in the three services that need to access Cortex data: the ingester, querier, and ruler.\n The chunk store relies on a unified interface to the \u0026ldquo;NoSQL\u0026rdquo; stores (DynamoDB, Bigtable, and Cassandra) that can be used to back the chunk store index. This interface assumes that the index is a collection of entries keyed by:\n A hash key. This is required for all reads and writes. A range key. This is required for writes and can be omitted for reads, which can be queried by prefix or range. The interface works somewhat differently across the supported databases:\n DynamoDB supports range and hash keys natively. Index entries are thus modelled directly as DynamoDB entries, with the hash key as the distribution key and the range key as the range key. For Bigtable and Cassandra, index entries are modelled as individual column values. The hash key becomes the row key and the range key becomes the column key. A set of schemas is used to map the matchers and label sets used on reads and writes to the chunk store into appropriate operations on the index. Schemas have been added as Cortex has evolved, mainly in an attempt to better load balance writes and improve query performance.\n The current schema recommendation is the v10 schema; the v11 schema is experimental.
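The hash key / range key contract described above can be pictured with a small in-memory sketch. The types, method names (IndexEntry, memIndex, QueryPrefix) and key layout are invented for illustration and are not the actual Cortex interfaces or schema.

package main

import (
	"fmt"
	"sort"
	"strings"
)

// IndexEntry mirrors the contract above: every write carries a hash key and a
// range key; reads supply a hash key and may narrow results by range-key prefix.
type IndexEntry struct {
	HashKey  string // e.g. tenant + day + metric name (schema dependent)
	RangeKey string // e.g. label name/value + chunk ID (schema dependent)
	Value    []byte
}

// memIndex is an in-memory stand-in for DynamoDB, Bigtable or Cassandra.
type memIndex struct {
	rows map[string][]IndexEntry // hash key -> entries kept sorted by range key
}

func (m *memIndex) Write(e IndexEntry) {
	m.rows[e.HashKey] = append(m.rows[e.HashKey], e)
	sort.Slice(m.rows[e.HashKey], func(i, j int) bool {
		return m.rows[e.HashKey][i].RangeKey < m.rows[e.HashKey][j].RangeKey
	})
}

// QueryPrefix returns all entries under a hash key whose range key starts with
// the given prefix; an empty prefix returns the whole row.
func (m *memIndex) QueryPrefix(hashKey, prefix string) []IndexEntry {
	var out []IndexEntry
	for _, e := range m.rows[hashKey] {
		if strings.HasPrefix(e.RangeKey, prefix) {
			out = append(out, e)
		}
	}
	return out
}

func main() {
	idx := &memIndex{rows: map[string][]IndexEntry{}}
	idx.Write(IndexEntry{HashKey: "tenant1:d18227:up", RangeKey: "job:node:chunk42", Value: []byte("chunk ref")})
	idx.Write(IndexEntry{HashKey: "tenant1:d18227:up", RangeKey: "job:api:chunk43", Value: []byte("chunk ref")})
	for _, e := range idx.QueryPrefix("tenant1:d18227:up", "job:node") {
		fmt.Println(e.RangeKey) // only the job:node entry matches the prefix
	}
}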
\n ","excerpt":"Cortex consists of multiple horizontally scalable microservices. Each microservice uses the most …","ref":"/docs/architecture.md/","title":"Cortex Architecture"},{"body":" [this is a work in progress]\nRemote API Cortex supports Prometheus\u0026rsquo; remote_read and remote_write APIs. The encoding is Protobuf over HTTP.\nRead is on /api/prom/read and write is on /api/prom/push.\nConfigs API The configs service provides an API-driven multi-tenant approach to handling various configuration files for Prometheus. The service hosts an API where users can read and write Prometheus rule files, Alertmanager configuration files, and Alertmanager templates to a database.\nEach tenant will have its own set of rule files, Alertmanager config, and templates. A POST operation will effectively replace the existing copy with the configs provided in the request body.\nConfigs Format At the time of writing, the API is part-way through a migration from a single Configs service that handled all three sets of data to a split API (Tracking issue). All APIs take and return all sets of data.\nThe following schema is used both when retrieving the current configs from the API and when setting new configs via the API.\nSchema: { \u0026#34;id\u0026#34;: 99, \u0026#34;rule_format_version\u0026#34;: \u0026#34;2\u0026#34;, \u0026#34;config\u0026#34;: { \u0026#34;alertmanager_config\u0026#34;: \u0026#34;\u0026lt;standard alertmanager.yaml config\u0026gt;\u0026#34;, \u0026#34;rules_files\u0026#34;: { \u0026#34;rules.yaml\u0026#34;: \u0026#34;\u0026lt;standard rules.yaml config\u0026gt;\u0026#34;, \u0026#34;rules2.yaml\u0026#34;: \u0026#34;\u0026lt;standard rules.yaml config\u0026gt;\u0026#34; }, \u0026#34;template_files\u0026#34;: { \u0026#34;templates.tmpl\u0026#34;: \u0026#34;\u0026lt;standard template file\u0026gt;\u0026#34;, \u0026#34;templates2.tmpl\u0026#34;: \u0026#34;\u0026lt;standard template file\u0026gt;\u0026#34; } } } Formatting id - should be incremented every time data is updated; Cortex will use the config with the highest number.\nrule_format_version - allows compatibility for tenants with config in Prometheus V1 format. Pass \u0026ldquo;1\u0026rdquo; or \u0026ldquo;2\u0026rdquo; according to which Prometheus version you want to match.\nconfig.alertmanager_config - The contents of the alertmanager config file should be as described here, encoded as a single string to fit within the overall JSON payload.\nconfig.rules_files - The contents of a rules file should be as described here, encoded as a single string to fit within the overall JSON payload.\nconfig.template_files - The contents of a template file should be as described here, encoded as a single string to fit within the overall JSON payload.\nEndpoints Manage Alertmanager GET /api/prom/configs/alertmanager - Get current Alertmanager config\n Normal Response Codes: OK(200) Error Response Codes: Unauthorized(401), NotFound(404) POST /api/prom/configs/alertmanager - Replace current Alertmanager config\n Normal Response Codes: NoContent(204) Error Response Codes: Unauthorized(401), BadRequest(400) POST /api/prom/configs/alertmanager/validate - Validate Alertmanager config\nNormal Response: OK(200)\n{ \u0026#34;status\u0026#34;: \u0026#34;success\u0026#34; } Error Response: BadRequest(400)\n{ \u0026#34;status\u0026#34;: \u0026#34;error\u0026#34;, \u0026#34;error\u0026#34;: \u0026#34;error message\u0026#34; } Manage Rules GET /api/prom/configs/rules - Get current rule files\n Normal Response Codes: OK(200) Error Response Codes: Unauthorized(401), NotFound(404) POST /api/prom/configs/rules - Replace current rule files\n Normal Response Codes: NoContent(204) Error Response Codes: Unauthorized(401), BadRequest(400) Manage Templates GET /api/prom/configs/templates - Get current templates\n Normal Response Codes: OK(200) Error Response Codes: Unauthorized(401), NotFound(404) POST /api/prom/configs/templates - Replace current templates\n Normal Response Codes: NoContent(204) Error Response Codes: Unauthorized(401), BadRequest(400) Deactivate/Restore Configs DELETE /api/prom/configs/deactivate - Disable configs for a tenant\n Normal Response Codes: OK(200) Error Response Codes: Unauthorized(401), NotFound(404) POST /api/prom/configs/restore - Re-enable configs for a tenant\n Normal Response Codes: OK(200) Error Response Codes: Unauthorized(401), NotFound(404) These API endpoints will disable/enable the current Rule and Alertmanager configuration for a tenant.\nNote that setting a new config will effectively \u0026ldquo;re-enable\u0026rdquo; the Rules and Alertmanager configuration for a tenant.
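Putting the schema and endpoints above together, here is a hedged sketch of replacing a tenant's Alertmanager config. The host name is a placeholder, the struct is hand-rolled for this example, and it assumes the deployment identifies tenants with the usual Cortex X-Scope-OrgID header; adjust to however authentication is handled in front of your configs service.

package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"log"
	"net/http"
)

// ConfigPayload follows the schema shown above; field names match the JSON keys.
// The id field is omitted here, per the note above about ids being incremented on update.
type ConfigPayload struct {
	RuleFormatVersion string `json:"rule_format_version"`
	Config            struct {
		AlertmanagerConfig string            `json:"alertmanager_config"`
		RulesFiles         map[string]string `json:"rules_files"`
		TemplateFiles      map[string]string `json:"template_files"`
	} `json:"config"`
}

func main() {
	var payload ConfigPayload
	payload.RuleFormatVersion = "2"
	payload.Config.AlertmanagerConfig = "route:\n  receiver: default\nreceivers:\n- name: default\n"
	payload.Config.RulesFiles = map[string]string{"rules.yaml": "groups: []\n"}

	body, err := json.Marshal(payload)
	if err != nil {
		log.Fatal(err)
	}
	// Placeholder host; point this at your configs service.
	req, err := http.NewRequest("POST", "http://configs.example:8080/api/prom/configs/alertmanager", bytes.NewReader(body))
	if err != nil {
		log.Fatal(err)
	}
	req.Header.Set("Content-Type", "application/json")
	// Assumption: tenants are authenticated with the X-Scope-OrgID header.
	req.Header.Set("X-Scope-OrgID", "tenant1")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	fmt.Println("expected NoContent(204), got:", resp.StatusCode)
}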
Ingester Shutdown POST /shutdown - Shutdown all operations of an ingester. Shutdown operations performed are similar to when an ingester is gracefully shutting down, including flushing of chunks if no other ingester is in PENDING state. The ingester does not terminate after calling this endpoint.\n Normal Response Codes: NoContent(204) Error Response Codes: Unauthorized(401) ","excerpt":"[this is a work in progress]\nRemote API Cortex supports Prometheus\u0026rsquo; remote_read and …","ref":"/docs/apis.md/","title":"Cortex APIs"},{"body":"The ingester holds several hours of sample data in memory. When we want to shut down an ingester, either for a software version update or to drain a node for maintenance, this data must not be discarded.\nEach ingester goes through different states in its lifecycle. When working normally, the state is ACTIVE.\nOn start-up, an ingester first goes into state PENDING. After a short time, if nothing happens, it adds itself to the ring and goes into state ACTIVE.\nA running ingester is notified to shut down by the Unix signal SIGINT. On receipt of this signal it goes into state LEAVING and looks for an ingester in state PENDING. If it finds one, that ingester goes into state JOINING and the leaver transfers all its in-memory data over to the joiner. On successful transfer the leaver removes itself from the ring and exits, and the joiner changes to ACTIVE, taking over ownership of the leaver\u0026rsquo;s ring tokens.\nIf a leaving ingester does not find a pending ingester after several attempts, it will flush all of its chunks to the backing database, then remove itself from the ring and exit. This may take tens of minutes to complete.\nDuring hand-over, neither the leaving nor the joining ingester will accept new samples. Distributors are aware of this, and \u0026ldquo;spill\u0026rdquo; the samples to the next ingester in the ring. This creates a set of extra \u0026ldquo;spilled\u0026rdquo; chunks which will idle out and flush after hand-over is complete. The sudden increase in the flush queue can be alarming!\nThe following metrics can be used to observe this process:\n cortex_member_ring_tokens_owned - how many tokens each ingester thinks it owns cortex_ring_tokens_owned - how many tokens each ingester is seen to own by other components cortex_ring_member_ownership_percent - same as cortex_ring_tokens_owned but expressed as a percentage cortex_ring_members - how many ingesters can be seen in each state, by other components cortex_ingester_sent_chunks - number of chunks sent by the leaving ingester cortex_ingester_received_chunks - number of chunks received by the joining ingester You can see the current state of the ring via an HTTP request to /ring on a distributor.
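The lifecycle described above can be summarised as a small state machine. This is an illustrative sketch only: the state names match the prose, but the types and the transition table are invented for this example and are not the actual Cortex ring code.

package main

import "fmt"

// State mirrors the ingester lifecycle states described above.
type State string

const (
	Pending State = "PENDING"
	Joining State = "JOINING"
	Active  State = "ACTIVE"
	Leaving State = "LEAVING"
)

// transitions lists the moves described in the hand-over flow:
// PENDING -> ACTIVE (nothing to join), PENDING -> JOINING (chosen as hand-over target),
// JOINING -> ACTIVE (transfer complete), ACTIVE -> LEAVING (SIGINT received),
// LEAVING -> exit (after transfer or flush).
var transitions = map[State][]State{
	Pending: {Active, Joining},
	Joining: {Active},
	Active:  {Leaving},
	Leaving: {},
}

func canMove(from, to State) bool {
	for _, s := range transitions[from] {
		if s == to {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println("PENDING -> JOINING allowed:", canMove(Pending, Joining)) // true
	fmt.Println("LEAVING -> ACTIVE allowed:", canMove(Leaving, Active))   // false
}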
\n","excerpt":"The ingester holds several hours of sample data in memory. When we want to shut down an ingester, …","ref":"/docs/guides/ingester-handover.md/","title":"Ingester Hand-over"},{"body":"","excerpt":"","ref":"/docs/guides/","title":"Guides"},{"body":" Cortex uses Jaeger to implement distributed tracing. We have found Jaeger invaluable for troubleshooting the behavior of Cortex in production.\nDependencies In order to send traces you will need to set up a Jaeger deployment. A deployment includes either the jaeger all-in-one binary, or else a distributed system of agents, collectors, and queriers. If running on Kubernetes, Jaeger Kubernetes is an excellent resource.\nConfiguration In order to configure Cortex to send traces you must do two things: 1. Set the JAEGER_AGENT_HOST environment variable in all components to point to your Jaeger agent. This defaults to localhost. 2. Enable sampling in the appropriate components: * The Ingester and Ruler self-initiate traces and should have sampling explicitly enabled. * Sampling for the Distributor and Query Frontend can be enabled in Cortex or in an upstream service such as your frontdoor.\nTo enable sampling in Cortex components you can specify either JAEGER_SAMPLER_MANAGER_HOST_PORT for remote sampling, or JAEGER_SAMPLER_TYPE and JAEGER_SAMPLER_PARAM to manually set the sampling configuration. See the Jaeger Client Go documentation for the full list of environment variables you can configure.\nNote that you must specify one of JAEGER_AGENT_HOST or JAEGER_SAMPLER_MANAGER_HOST_PORT in each component for Jaeger to be enabled, even if you plan to use the default values.\n","excerpt":"Cortex uses Jaeger to implement distributed tracing. We have found Jaeger invaluable for …","ref":"/docs/guides/tracing.md/","title":"Tracing"},{"body":"","excerpt":"","ref":"/docs/configuration/","title":"Configuration"},{"body":"","excerpt":"","ref":"/index.json","title":""},{"body":" master / unreleased [CHANGE] The frontend component has been refactored to be easier to re-use. When upgrading the frontend, cache entries will be discarded and re-created with the new protobuf schema. #1734 [CHANGE] Remove direct DB/API access from the ruler [CHANGE] Removed Delta encoding. Any old chunks with Delta encoding cannot be read anymore. If ingester.chunk-encoding is set to Delta the ingester will fail to start. #1706 [CHANGE] Setting -ingester.max-transfer-retries to 0 now disables hand-over when an ingester is shutting down. Previously, zero meant an infinite number of attempts. #1771 [CHANGE] dynamo has been removed as a valid storage name to make it consistent for all components. aws and aws-dynamo remain as valid storage names. [FEATURE] Global limit on the max series per user and metric #1760 -ingester.max-global-series-per-user -ingester.max-global-series-per-metric Requires -distributor.replication-factor and -distributor.shard-by-all-labels set for the ingesters too [FEATURE] Flush chunks with stale markers early with ingester.max-stale-chunk-idle. #1759 [FEATURE] EXPERIMENTAL: Added new KV Store backend based on the memberlist library. Components can gossip about tokens and ingester states, instead of using Consul or Etcd. #1721 [FEATURE] Allow Query Frontend to log slow queries with frontend.log-queries-longer-than. #1744 [FEATURE] The frontend split and cache intervals can now be configured using the respective flags --querier.split-queries-by-interval and --frontend.cache-split-interval. If --querier.split-queries-by-interval is not provided, request splitting is disabled by default. --querier.split-queries-by-day is still accepted for backward compatibility but has been deprecated. You should now use --querier.split-queries-by-interval. We recommend using a multiple of 24 hours. [ENHANCEMENT] Allocation improvements in adding samples to Chunk. #1706 [ENHANCEMENT] Consul client now follows recommended practices for blocking queries wrt returned Index value. #1708 [ENHANCEMENT] Consul client can optionally rate-limit itself during Watch (used e.g. by ring watchers) and WatchPrefix (used by HA feature) operations. Rate limiting is disabled by default. New flags added: --consul.watch-rate-limit, and --consul.watch-burst-size.
#1708 [ENHANCEMENT] Added jitter to HA deduping heartbeats, configure using distributor.ha-tracker.update-timeout-jitter-max #1534 0.3.0 / 2019-10-11 This release adds support for Redis as an alternative to Memcached, and also includes many optimisations which reduce CPU and memory usage.\n [CHANGE] Gauge metrics were renamed to drop the _total suffix. #1685 In Alertmanager, alertmanager_configs_total is now alertmanager_configs In Ruler, scheduler_configs_total is now scheduler_configs scheduler_groups_total is now scheduler_groups. [CHANGE] --alertmanager.configs.auto-slack-root flag was dropped as auto Slack root is not supported anymore. #1597 [CHANGE] In table-manager, default DynamoDB capacity was reduced from 3,000 units to 1,000 units. We recommend you do not run with the defaults: find out what figures are needed for your environment and set that via -dynamodb.periodic-table.write-throughput and -dynamodb.chunk-table.write-throughput. [FEATURE] Add Redis support for caching #1612 [FEATURE] Allow spreading chunk writes across multiple S3 buckets #1625 [FEATURE] Added /shutdown endpoint for ingester to shutdown all operations of the ingester. #1746 [ENHANCEMENT] Upgraded Prometheus to 2.12.0 and Alertmanager to 0.19.0. #1597 [ENHANCEMENT] Cortex is now built with Go 1.13 #1675, #1676, #1679 [ENHANCEMENT] Many optimisations, mostly impacting ingester and querier: #1574, #1624, #1638, #1644, #1649, #1654, #1702 Full list of changes: https://github.com/cortexproject/cortex/compare/v0.2.0...v0.3.0\n0.2.0 / 2019-09-05 This release has several exciting features, the most notable of them being setting -ingester.spread-flushes to potentially reduce your storage space by up to 50%.\n [CHANGE] Flags changed due to changes upstream in Prometheus Alertmanager #929: alertmanager.mesh.listen-address is now cluster.listen-address alertmanager.mesh.peer.host and alertmanager.mesh.peer.service can be replaced by cluster.peer alertmanager.mesh.hardware-address, alertmanager.mesh.nickname, alertmanager.mesh.password, and alertmanager.mesh.peer.refresh-interval all disappear. [CHANGE] --claim-on-rollout flag deprecated; feature is now always on #1566 [CHANGE] Retention period must now be a multiple of periodic table duration #1564 [CHANGE] The value for the name label for the chunks memcache in all cortex_cache_ metrics is now chunksmemcache (before it was memcache) #1569 [FEATURE] Makes the ingester flush each timeseries at a specific point in the max-chunk-age cycle with -ingester.spread-flushes. This means multiple replicas of a chunk are very likely to contain the same contents which cuts chunk storage space by up to 66%.
#1578 [FEATURE] Make minimum number of chunk samples configurable per user #1620 [FEATURE] Honor HTTPS for custom S3 URLs #1603 [FEATURE] You can now point the query-frontend at a normal Prometheus for parallelisation and caching #1441 [FEATURE] You can now specify http_config on alert receivers #929 [FEATURE] Add option to use jump hashing to load balance requests to memcached #1554 [FEATURE] Add status page for HA tracker to distributors #1546 [FEATURE] The distributor ring page is now easier to read with alternate rows grayed out #1621 0.1.0 / 2019-08-07 [CHANGE] HA Tracker flags were renamed to provide more clarity #1465 distributor.accept-ha-labels is now distributor.ha-tracker.enable distributor.accept-ha-samples is now distributor.ha-tracker.enable-for-all-users ha-tracker.replica is now distributor.ha-tracker.replica ha-tracker.cluster is now distributor.ha-tracker.cluster [FEATURE] You can specify \u0026ldquo;heap ballast\u0026rdquo; to reduce Go GC Churn #1489 [BUGFIX] HA Tracker no longer always makes a request to Consul/Etcd when a request is not from the active replica #1516 [BUGFIX] Queries are now correctly cancelled by the query-frontend #1508 ","excerpt":"master / unreleased [CHANGE] The frontend component has been refactored to be easier to re-use. …","ref":"/docs/changelog/","title":"Changelog"},{"body":"Cortex follows the CNCF Code of Conduct.\n","excerpt":"Cortex follows the CNCF Code of Conduct.","ref":"/docs/code-of-conduct/","title":"Code of Conduct"},{"body":" Welcome! We\u0026rsquo;re excited that you\u0026rsquo;re interested in contributing. Below are some basic guidelines.\nWorkflow Cortex follows a standard GitHub pull request workflow. If you\u0026rsquo;re unfamiliar with this workflow, read the very helpful Understanding the GitHub flow guide from GitHub.\nYou are welcome to create draft PRs at any stage of readiness - this can be helpful to ask for assistance or to develop an idea. But before a piece of work is finished it should:\n Be organised into one or more commits, each of which has a commit message that describes all changes made in that commit (\u0026lsquo;why\u0026rsquo; more than \u0026lsquo;what\u0026rsquo; - we can read the diffs to see the code that changed). Each commit should build towards the whole - don\u0026rsquo;t leave in back-tracks and mistakes that you later corrected. Have tests for new functionality or tests that would have caught the bug being fixed. Include a CHANGELOG message if users of Cortex need to hear about what you did. Developer Certificates of Origin (DCOs) Before submitting your work in a pull request, make sure that all commits are signed off with a Developer Certificate of Origin (DCO). Here\u0026rsquo;s an example:\ngit commit -s -m \u0026#34;Here is my signed commit\u0026#34; You can find further instructions here.\nBuilding Cortex To build:\nmake (By default, the build runs in a Docker container, using an image built with all the tools required. The source code is mounted from where you run make into the build container as a Docker volume.)\nTo run the test suite:\nmake test Playing in minikube First, start minikube.\nYou may need to load the Docker images into your minikube environment. There is a convenient rule in the Makefile to do this:\nmake prime-minikube Then run Cortex in minikube:\nkubectl apply -f ./k8s (these manifests use latest tags, i.e. 
this will work if you have just built the images and they are available on the node(s) in your Kubernetes cluster)\nCortex will sit behind an nginx instance exposed on port 30080. A job is deployed to scrape itself. Try it:\nhttp://192.168.99.100:30080/api/prom/api/v1/query?query=up\nIf that doesn\u0026rsquo;t work, your Minikube might be using a different IP address. Check with minikube status.\nDependency management We use Go modules to manage dependencies on external packages. This requires a working Go environment with version 1.11 or greater, plus git and bzr installed.\nTo add or update a dependency, use the go get command:\n# Pick the latest tagged release. go get example.com/some/module/pkg # Pick a specific version. go get example.com/some/module/pkg@vX.Y.Z Tidy up the go.mod and go.sum files:\ngo mod tidy go mod vendor git add go.mod go.sum vendor git commit You have to commit the changes to go.mod and go.sum before submitting the pull request.\n","excerpt":"Welcome! We\u0026rsquo;re excited that you\u0026rsquo;re interested in contributing. Below are some basic …","ref":"/docs/contributing/","title":"Contributing"},{"body":" Horizontally scalable, highly available, multi-tenant, long term Prometheus. Learn More Releases Companies using Cortex\n Long term storage Durably store data for longer than the lifetime of any single machine, and use this data for long term capacity planning. Blazin\u0026rsquo; fast PromQL Cortex makes your PromQL queries blazin' fast through aggressive parallelization and caching. A global view of data Cortex gives you a global view of Prometheus time series data that includes data in long-term storage, greatly expanding the usefulness of PromQL for analytical purposes. Horizontally scalable Cortex runs across multiple machines in a cluster, exceeding the throughput and storage of a single machine. This enables you to send the metrics from multiple Prometheus servers to a single Cortex cluster. We are a Cloud Native Computing Foundation Sandbox project.\n Join the community! Join users and companies that are using Cortex in production.\n Slack Issues Twitter ","excerpt":"Horizontally scalable, highly available, multi-tenant, long term Prometheus. Learn More Releases …","ref":"/","title":"Cortex"},{"body":"","excerpt":"","ref":"/search/","title":"Search Results"}]