Notes from Elasticsearch Engineer II training
  • Elasticsearch Internals
    • Lucene Indexing
      • Java Library
      • Shard
        • A shard is an instance of Lucene
        • max # of documents in a shard is Integer.MAX_VALUE - 128
        • clients do not refer to shards directly (they use the index instead)
        • What's in a shard?
          • indexed document gets analyzed, put in buffer
            • buffer defaults to 10% of node heap (set with indices.memory.index_buffer_size)
            • buffer is shared by all shards allocated to that node
          • after buffer is full, or some time passes (refresh_interval), a "lucene flush" occurs
          • documents are not searchable until flushed to a "segment"
        • refresh parameter on index, update, delete, and bulk requests
          • default: false
          • true: force a lucene flush
          • wait_for: wait for next flush
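          • a minimal sketch of indexing with the refresh parameter (the index name my_index and the document body are assumptions):

              PUT my_index/_doc/1?refresh=wait_for
              {
                "title": "visible to search before the request returns"
              }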
    • Understanding Segments
      • shard = Lucene index = collection of segments
        • think of segment as an immutable "mini index"
      • Each segment is a fully independent index
        • Each segment is composed of a number of files
        • Segment files are written to disk
        • Segment files are never updated (read only)
      • a search request is distributed to the shards and on each shard it is performed sequentially over the segments
    • Segment Merges
      • segments are periodically merged into larger segments
      • keeps index size and number of segments manageable
      • deleted documents get expunged during a merge
      • force a merge with POST index_name/_forcemerge
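      • for example (the index name my_index and the target segment count are assumptions):

          POST my_index/_forcemerge?max_num_segments=1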
    • Elasticsearch Indexing
      • index = 1 or more shards
      • a shard consists of segments
      • segments are fsynced to disk during a Lucene commit
        • a relatively heavy operation
        • should not be performed after every index or delete operation
      • until a commit fsyncs segment data to disk, it is susceptible to loss
      • to prevent this data loss, ES implements a transaction log for each shard
      • transaction log is written at the same time as buffer
      • in the event of a failure, translog will be replayed
      • force es flush with POST index_name/_flush
      • synced flush
        • performs a normal flush, then adds a generated unique marker (sync_id) to all shards
        • sync_id provides a quick way to check if two shards are identical.
        • happens every minute or so
        • consider doing it before a cluster restart
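        • for example, synced-flushing a single index or the whole cluster (the index name my_index is an assumption):

            POST my_index/_flush/synced
            POST _flush/synced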
    • Doc Values
      • a data structure that store the values of a document on-disk in a column-oriented fashion, at index time
        • which makes sorting and aggregations faster
    • Caching
      • one cache per node that is shared by all shards
        • LRU eviction policy
        • it only caches queries inside a filter context
      • one static node-level setting
        • indices.queries.cache.size: "5%"
      • the segment-level cache uses bit sets
        • array of bits, where each position represents a document
          • super efficient
        • Only built for segments that are "big enough"
      • Shared request cache
        • one cache per node that is shared by all shards
        • LRU eviction policy
        • Data modification invalidates the cache
        • Good fit for indices that you don't write to anymore
        • search requests return results almost instantly
        • static setting that must be configured on every node:
          • indices.requests.cache.size: "5%"
        • shard level request cache
        • enabled on every index by default
  • Field Modeling
    • use cases for granular fields
      • suppose a version number
      • allow queries like 5.4 or 6.x
      • instead of simple version number, index major, minor, and bug fix number + "display name"
      • now we can search at any desired level
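      • a minimal mapping sketch for this idea (the index name software, the _doc type, and the field names are assumptions):

          PUT software
          {
            "mappings": {
              "_doc": {
                "properties": {
                  "major":        { "type": "integer" },
                  "minor":        { "type": "integer" },
                  "bug_fix":      { "type": "integer" },
                  "display_name": { "type": "keyword" }
                }
              }
            }
          }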
    • Range data types introduced in 5.2
      • integer_range
      • float_range
      • long_range
      • double_range
      • date_range
      • ip_range
    • use range query
      • intersects
      • contains
      • within
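      • a sketch combining a range field with a range query (the index name events and the field name availability are assumptions):

          PUT events
          {
            "mappings": {
              "_doc": {
                "properties": {
                  "availability": { "type": "date_range", "format": "yyyy-MM-dd" }
                }
              }
            }
          }

          GET events/_search
          {
            "query": {
              "range": {
                "availability": {
                  "gte": "2018-01-01",
                  "lte": "2018-12-31",
                  "relation": "within"
                }
              }
            }
          }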
    • Mapping parameters
      • https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-params.html
      • index: false (field will still be in _source)
        • doc_values: false (field will still be in _source, field can still be used in queries, but not in aggregations or sorting)
      • enabled: false (can not query or aggregate, will still be in _source)
      • disabling _all
        • _all will be removed from future versions of ES
        • can not use it in new indexes, as of ES6
        • disable with "_all":{"enabled": false} in mappings
      • copy_to
        • use case example: multiple fields that represent location (city, region, country), want to search them in a single query
          • could just run a bool query across the fields
          • or copy all the values to a single array field during indexing
      • null: treated as if the field has no value
        • the average of 5.0 and null is 5.0
        • use null_value to set a default
      • coercing data
        • disable with coerce:false
      • _meta field
        • custom metadata in your mapping
    • Dynamic templates
      • dealing with a large number of fields
      • dynamic field names not known at the time of mapping definition
      • define a field's mapping based on
        • datatype
        • name of the field
          • for example, map f_* as float (see the sketch after this list)
        • path to the field
      • controlling dynamic fields
        • strict mappings
          • by default, if a document is indexed with an unexpected field, the mapping is dynamically modified
          • in prod, you will likely define mappings prior to indexing
            • you probably do not want your mappings to change
            • dynamic: strict
              • do not index a doc that contains fields not already in mapping
              • you can still modify the mapping directly
              • other values:
                • true, default behavior
                • false, new fields will be ignored
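      • a minimal dynamic template sketch for the f_* example above (the index name my_index and the template name floats_by_name are assumptions):

          PUT my_index
          {
            "mappings": {
              "_doc": {
                "dynamic_templates": [
                  {
                    "floats_by_name": {
                      "match": "f_*",
                      "mapping": { "type": "float" }
                    }
                  }
                ]
              }
            }
          }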
  • Fixing Data
    • Tools for fixing data
      • Painless is a scripting language designed specifically for use with ES
        • fast, secure, groovy-like syntax
        • supports all of Java's data types and a subset of the Java API
        • use cases
          • accessing document fields
            • update/delete a field in a document
            • perform computations on fields before returning them
            • customize the score of a doc
            • working with aggregations
          • Ingest node
            • execute a script within an ingest pipeline
          • Reindex API
            • manipulate data as it is getting reindexed
          • Basic example: updating a field
            • increment the number of views on a blog post (sketch after this list)
        • Can be defined inline, or stored (POST /_scripts/script_name)
        • scripts are cached (inline or stored)
          • new script can evict a cached script
          • default cache is 100 scripts
            • configure using script.cache.max_size and script.cache.expire
          • compilation can be expensive
            • default: 75 compilations per 5-minute window (75/5m)
            • configured by script.max_compilations_rate
            • avoid by using parameters
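        • a minimal sketch of the "increment views" example above, using params to avoid recompilation (the index name blogs and the field name views are assumptions):

            POST blogs/_doc/1/_update
            {
              "script": {
                "lang": "painless",
                "source": "ctx._source.views += params.count",
                "params": { "count": 1 }
              }
            }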
      • Languages like Groovy, Javascript, and Python are no longer available as of ES6
      • Reindex API
        • take data from one index, optionally manipulate, and write to a new one.
      • Update by query
        • same idea as the Reindex API, but updates the index in place
      • delete by query
        • like SQL DELETE ... WHERE
    • fixing mappings
      • you can not change the data type in a mapping
      • must define a new index with the desired data types
      • post _reindex, define source and dest indexes
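      • for example (the index names blogs and blogs_fixed are assumptions):

          POST _reindex
          {
            "conflicts": "proceed",
            "source": { "index": "blogs" },
            "dest":   { "index": "blogs_fixed" }
          }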
    • reindexing tips
      • conflicts parameter
    • picking up mapping changes
      • you can always add fields, but updating the mapping won't update the index
      • POST blogs_fixed/_update_by_query
      • reindex batch pattern: use a field with a batch ID number; to continue after a failure, use update_by_query to finish where you left off
    • fixing fields
      • splitting a field
      • Ingest Node
        • provide the ability to preprocess a doc right before it gets indexed
        • intercepts an index or bulk API request
        • applies transformations
        • passes the document back to the index or bulk API
        • when indexing a doc, you specify a pipeline
        • centralized, pipelines stored in cluster state
        • by default, all nodes can ingest. node.ingest=true
        • pipeline
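        • a minimal pipeline sketch for the "splitting a field" case above (the pipeline name, index name, field name, and separator are assumptions):

            PUT _ingest/pipeline/split_locations
            {
              "description": "split a comma-separated locations field into an array",
              "processors": [
                { "split": { "field": "locations", "separator": "," } }
              ]
            }

            PUT blogs/_doc/1?pipeline=split_locations
            { "locations": "Annapolis,Maryland,US" }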
  • Advanced Search & Aggregations
    • Searching for patterns
      • wildcard query is a term-based search that contains a wildcard
        • * = anything
        • ? = any single character
        • can be expensive, especially if you have leading wildcards
      • regex query
      • dealing with null values
        • documents will likely have missing or null fields
        • exists query
          • wrap exists in a must_not to find docs that are missing a field
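          • for example (the index name blogs and the field name author are assumptions):

              GET blogs/_search
              {
                "query": {
                  "bool": {
                    "must_not": { "exists": { "field": "author" } }
                  }
                }
              }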
      • scripted queries
        • script queries must compute a boolean value
        • test a query with the _execute API
        • contexts: filter, or scoring
      • scripted fields
        • add fields to a query response that are generated in a script
        • script_fields in query body
        • expensive, consider adding as a field, calculating on ingest
      • Search templates
        • like stored procedures
        • mustache templates, script.lang=mustache
        • does not have if/else logic
          • but you can define a section that gets skipped if the parameter is false or not defined
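        • a minimal sketch of storing and using a template (the template id blog_search, index, field, and parameter names are assumptions):

            POST _scripts/blog_search
            {
              "script": {
                "lang": "mustache",
                "source": {
                  "query": { "match": { "{{my_field}}": "{{my_value}}" } }
                }
              }
            }

            GET blogs/_search/template
            {
              "id": "blog_search",
              "params": { "my_field": "title", "my_value": "elasticsearch" }
            }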
      • percentiles aggregation
        • default percentiles, but they can be overridden
      • percentiles_rank (reverse percentiles)
      • top_hits
      • missing
      • scripted aggregations
        • combine fields for a terms agg
      • significant terms aggregation
        • terms aggregations + noise filter
          • discards the common terms that a plain terms agg would return
        • low frequency terms in the background data pop out as high frequency terms in the foreground data
      • Pipeline aggregations
  • Cluster Management
    • Architecture Recap
      • largest unit is a cluster
      • cluster made up of nodes
      • each node runs an ES process, and typically there is a 1:1 relationship between a server and a node
      • indexes and shards
        • cluster contains one or more indices
        • each index is sharded and its shards are distributed over nodes
        • ES automatically distributes the shards evenly across the nodes
      • maybe you want more control
        • your servers' specifications
        • what nodes shards are allocated to
          • data management strategy? some data on less expensive hardware?
    • Dedicated Nodes
      • several roles a node can have
        • master eligible
        • data
        • ingest
        • machine learning
        • coordinating
      • nodes can be dedicated only to a single role
      • by default, a node is master-eligible, data, and ingest
        • node.master=true
        • node.data=true
        • node.ingest=true
      • why use?
        • dedicated master eligible nodes
          • focus on cluster state
          • machines with low CPU, RAM, and disk resources
        • dedicated data nodes
          • fast disks, lots of ram
          • can focus on data storage and processing client requests
        • dedicated ingest nodes
          • can focus on data processing
          • machines with low disk, medium RAM, and high CPU
        • coordinating-only node
          • machines with low disk, medium/high RAM, and medium/high CPU resources
          • behave like smart load balancers
          • perform the gather/reduce phase of search requests
          • lightens the load on data nodes
        • coordinating + ingest not uncommon
    • Hot/Warm architecture
      • you can configure the dedicated data nodes in your cluster to use a hot/warm arch
        • useful for scenarios where you want to control which nodes perform indexing vs query handling
      • fine-grained control over data allocation
      • dedicated data nodes can be used as
        • hot nodes for indexing
        • warm nodes for older, read-only indices
          • larger amounts of data may require additional nodes to meet performance requirements
    • Shard Filtering
      • the ability to control which nodes the shards for an index are allocated to
        • use node.attr to tag your nodes
        • use index.routing.allocation to assign indexes to nodes
        • three types of rules for assigning indexes to nodes
          • index.routing.allocation.include
          • index.routing.allocation.exclude
          • index.routing.allocation.require
        • PUT _settings on index to configure
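        • a sketch, assuming nodes are tagged in elasticsearch.yml with a custom attribute named my_temp and an index named logs-2023-01:

            # elasticsearch.yml on the tagged nodes
            node.attr.my_temp: hot

            # pin the index to those nodes
            PUT logs-2023-01/_settings
            {
              "index.routing.allocation.require.my_temp": "hot"
            }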
    • Shard filtering for Hardware
    • Shard allocation awareness
      • make nodes aware of where they are (racks, availability zone)
      • you can make ES aware of the physical configuration of your hardware
      • cluster.routing.allocation.awareness
      • useful when more than one node shares the same resource (disk, host machine, network switch, rack, etc)
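      • a sketch, assuming a custom node attribute named rack_id set in elasticsearch.yml:

          # elasticsearch.yml
          node.attr.rack_id: rack_one
          cluster.routing.allocation.awareness.attributes: rack_id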
    • Forced Awareness
      • suppose you have a rack or zone that fails
      • the remaining rack will try to reassign all the missing replicas
      • the single rack might not be able to handle that type of volume
      • you might not want ES to panic when you restart a rack
      • stay yellow during a temporary event
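      • a sketch, assuming the rack_id attribute from the previous section and two racks:

          # elasticsearch.yml
          cluster.routing.allocation.awareness.force.rack_id.values: rack_one,rack_two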
  • Capacity Planning
    • designing for scale
      • default settings can take you a long way
      • proper design can make scaling easier
    • one shard does not scale well
    • two shards can scale if we add a node
    • shard overallocation
      • if you are expecting your cluster to grow, it's good to plan for that by overallocating shards
        • num shards > num nodes
      • be careful
        • unit of scaling is the shard
        • but you should not have too many
      • especially under-utilized shards
      • each shard uses resources from the machine
        • if you have too many, it will seriously slow things down
      • too many shards, ES will be slow
      • "too much"
      • do not overshard
    • so how many shards should I configure?
      • no simple formula
      • "it depends!"
    • too many factors involved:
      • use case: metrics, logging, search, api etc
      • hardware
      • num of docs
      • size and complexity of docs
    • before trying to determine your capacity, determine your SLAs
      • docs/second
      • queries/second
      • maximum response time for queries
    • get some production data
      • actual documents
      • actual queries
      • actual mappings
    • create a one-node (one shard?) cluster
    • push it until it breaks (breaks SLAs)
    • Rally automates this: https://esrally.readthedocs.io/en/stable/quickstart.html
      • run it in docker
    • number of primary shards
      • not an exact science, but you can
        • estimate the total amount of data for your index
        • leave room for growth
        • divide by the maximum capacity of a single shard
      • the number of primary shards is usually determined by indexing speed
      • for searching, what matters is the total number of shards (regardless of index)
        • note we are not talking about replicas here, just number of primary shards
    • measure indexing capacity of a node
      • index documents in a similar way as your app will
      • index in parallel until error 429
      • use this to calculate the number of docs/second
      • you can calculate how to scale your nodes for ingestion based on your SLA
    • scaling with replicas
      • adding replicas provides an additional level of scaling
        • we know replicas provide high availability
        • they also provide scaling for reads and search (IF YOU ADD ADDITIONAL HARDWARE)
        • cost of replicas:
          • slower indexing speed (although replicas are indexed in parallel)
          • more storage on disk
          • larger associated heap memory footprint of the extra shard(s)
    • scaling with indices
      • using multiple indices also provides scaling
        • if you need to add capacity, consider just creating a new index
      • then search across both indices to search new and old data
        • you could even define a single alias for the multiple indices
      • searching 1 index with 50 shards is equivalent to searching 50 indices with 1 shard each
    • Capacity planning use cases
      • understand
        • what your data looks like
        • how that data is going to be searched
      • two common use cases are:
        • searching fixed-size data: a large dataset that may grow slowly
          • care about relevancy, not so much timeliness
          • planning
            • size your indices based on the maximum shard capacity vs the amount of data you anticipate
              • if you need to increase capacity, either reindex or add multiple indices (preferably not often)
              • easy to increase throughput by adding nodes and replicas
        • time based data: grows rapidly, like logs, social media streams, time-based events
          • all docs have a timestamp, and likely do not change
          • planning
            • searching usually involves a timestamp
            • older docs become less important
            • data ingestion is a key factor
              • typically a lot of data coming in
              • you do not want indexing to become a bottleneck (it has to keep up)
            • common pattern: time-based indices
              • new index per day, week, month, or year
              • add the date to the name of the index
              • search using index wildcards or aliases
            • data ingestion
              • hardcoding the index name in the app is not scalable
              • you can use date math in your index name, like <logstash-{now/d}>
            • update aliases with curator: https://www.elastic.co/guide/en/elasticsearch/client/curator/5.6/index.html
            • aliases
              • simple to define
              • need a tool to update the alias
              • no need to redeploy your application
            • managing time-based indices
              • for optimal ingest rates
                • spread shards of your active index over as many nodes as possible
              • optimal search
                • shrink older indices down to the optimal number of shards
                • close indices that are no longer being searched
              • use hot/warm architecture
                • hot for indexing, warm for querying
  • Document Modeling
    • the need for doc modeling
      • in ES, you typically denormalize your data
      • flat world has advantages
        • indexing and searching is fast
        • no need to join tables or lock rows
      • sometimes relationships matter
    • we need to bridge the gap between normal and flat
    • four common techniques for managing relational data in ES
      • denormalization
        • store redundant copies of data in each document
        • example, tweet + user
        • optimizing for reads
        • no mechanism for keeping denormalized data up to date; avoid denormalizing data that is likely to change frequently
        • when denormalized data does change, ensure that there is one and only one authoritative source for the data
        • if you do denormalize data that might change, it can help to also denormalize a field that does not change, like an _id.
        • try not to maintain state in ES, instead try to record events
      • app-side joins (run multiple queries)
        • not recommended
      • nested objects (for working with arrays of objects)
      • parent/child relationships
    • denormalization
    • the need for nested types
      • ES flattens objects (tags.key, tags.value)
      • allows arrays of objects to be indexed and queried independently of each other
      • use nested if you need to maintain the relationship of each object in the array
      • nested objects are stored separately
    • nested types
    • querying a nested type
      • {query: {nested: {path: tags, query:{}}}}
      • inner_hits to see why a document matched
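      • a sketch, assuming an index my_products with a nested tags field holding key/value objects:

          GET my_products/_search
          {
            "query": {
              "nested": {
                "path": "tags",
                "query": {
                  "bool": {
                    "must": [
                      { "match": { "tags.key":   "color" } },
                      { "match": { "tags.value": "red" } }
                    ]
                  }
                },
                "inner_hits": {}
              }
            }
          }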
    • the nested aggregation
      • puts nested objects in a bucket
        • useful for performing sub-aggregations
    • parent/child relationship
      • 1-N relationships
      • updating a nested object requires reindexing the root object and a complete reindexing of all its nested objects. nested objects are a READ optimization.
      • using a join datatype, you can completely separate two objects while maintaining their relationship
        • parent and child are completely separate documents
        • the parent can be updated without reindexing the children
        • children can be added/changed/deleted without affecting the parent or other children.
      • it's a write optimization.
      • configured in your mappings using the join datatype
      • parent and all children must live in the same shard
      • overrides the default shard routing algorithm, uses the parent's _id
      • therefore, every time you refer to a child document, you must specify its parent's id
      • defining the relationship
        • define the mapping, type: join.
        • index some parent documents
        • index some child documents, ?routing=parent_id
        • index/_search will return parents and children
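        • a minimal sketch (the index name my_index, relation names question/answer, and the _doc type are assumptions):

            PUT my_index
            {
              "mappings": {
                "_doc": {
                  "properties": {
                    "my_join_field": {
                      "type": "join",
                      "relations": { "question": "answer" }
                    }
                  }
                }
              }
            }

            PUT my_index/_doc/1
            { "text": "parent document", "my_join_field": "question" }

            PUT my_index/_doc/2?routing=1
            { "text": "child document", "my_join_field": { "name": "answer", "parent": "1" } }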
    • the has_child query
      • has_child type: relation_name
      • filter that accepts a query and the child type to run against
      • results in parent document
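      • a sketch using the question/answer relation assumed above:

          GET my_index/_search
          {
            "query": {
              "has_child": {
                "type": "answer",
                "query": { "match": { "text": "child" } }
              }
            }
          }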
    • the has_parent query
      • matches children that have a parent matching the query
    • always need id of parent when you query children
    • Kibana has very limited support for nested types and parent/child relationships
  • Monitoring and Alerting
    • Stats API
    • Task management API
      • show how busy the server is
    • _cat API
      • wrapper around many JSON APIs
      • command-line tool friendly
      • helpful for simple monitoring
      • v parameter to show headers
      • h parameter to specify column, * for all
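      • for example:

          GET _cat/indices?v
          GET _cat/nodes?v&h=name,heap.percent,cpu,load_1m
          GET _cat/thread_pool?v&h=node_name,name,active,queue,rejected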
    • Diagnosing performance issues
      • thread pool queue
        • GET _nodes/thread_pool
        • GET _nodes/stats/thread_pool
        • when a queue is full, error 429
        • _cat/thread_pool?v
        • a full queue can be good or bad
          • OK if bulk indexing requests arrive faster than ES can handle
          • bad if search queue is full
      • the hot_threads API
        • allows you to view current hot threads on each node
        • GET _nodes/hot_threads
        • GET _nodes/node/hot_threads
      • The indexing slow log
        • captures long-running index operations to a log file
        • logs indexing events that take longer than configured thresholds
        • defaults set in log4j2.properties
        • can be overridden in index settings, index.indexing.slowlog
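        • a sketch of overriding the thresholds on one index (the index name and threshold values are assumptions):

            PUT my_index/_settings
            {
              "index.indexing.slowlog.threshold.index.warn": "10s",
              "index.indexing.slowlog.threshold.index.info": "5s"
            }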
      • search slow logs
        • disabled by default
          • usefulness can be limited since it logs per shard
          • Packetbeat may be a better solution
      • Profile API
        • just set profile to true in your query
        • profiler response: hard to read, info on each shard that processed the search
        • nice support in Kibana
    • Elastic monitoring component
      • use elasticsearch to monitor elasticsearch
      • xpack.monitoring.collection.enabled defaults to false
      • the stats of all nodes are indexed into Elasticsearch
      • common settings
      • dedicated monitoring cluster
        • reduce the load and storage on your other clusters
        • access to monitoring even when other clusters are unhealthy
        • separate security levels for monitoring and production clusters
        • if elastic security is enabled, then provide credentials
        • you can also use http exporters, xpack.monitoring.exporters
      • monitoring multiple clusters
        • many clusters can be monitored with a single monitoring cluster (gold/platinum)
      • monitoring UI
        • you can see heap pressure
      • Clusters dashboard
    • Configure alerts
      • for unexpected or extreme situations
      • elastic alerting
        • watch for changes or anomalies in your data
        • perform the necessary actions in response
        • for example:
          • open a helpdesk ticket when servers are running out of free space
          • track network activity to detect malicious activity
          • send immediate notification if nodes leave the cluster
        • watch
          • trigger: when the watch is executed
        • input
          • loads data into the watch payload
        • condition
          • whether the watch actions are executed
        • actions
        • PUT _xpack/watcher/watch/match_name
        • watch executions are stored in ES too (.watcher-history/_search)
        • users can have watcher-specific role
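        • a minimal 6.x watch sketch (the watch name, interval, and logging action are assumptions):

            PUT _xpack/watcher/watch/cluster_health_watch
            {
              "trigger":   { "schedule": { "interval": "1m" } },
              "input":     { "http": { "request": { "host": "localhost", "port": 9200, "path": "/_cluster/health" } } },
              "condition": { "compare": { "ctx.payload.status": { "eq": "red" } } },
              "actions":   { "log_it": { "logging": { "text": "Cluster health is RED" } } }
            }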
  • From Dev to Production
    • disabling dynamic indexes
      • put a doc into an index that doesn't exist, and it'll be created
      • disable in a prod environment
      • PUT _cluster/settings
        • action.auto_create_index: false
      • You can whitelist certain patterns, usually time-series indexes
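      • for example, disabling auto-creation while whitelisting some time-series patterns (the patterns are assumptions):

          PUT _cluster/settings
          {
            "persistent": {
              "action.auto_create_index": "logstash-*,metricbeat-*,-*"
            }
          }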
    • dev vs prod mode
      • HTTP vs. Transport
        • HTTP- how APIs are exposed
          • ports in 9200-9299 range
        • transport: used for internal communication between nodes in cluster
          • ports in 9300-9399 range
      • binds to localhost by default
      • dev mode: if it does not bind to an external interface
      • prod mode: if it does bind transport to an external interface
      • bootstrap tests
        • dev mode: any bootstrap checks that fail will generate warnings
        • prod mode: will refuse to start
    • best practices
      • avoid running over WAN links between datacenters
        • not officially supported by elastic
      • try to have 0 or few hops between nodes
      • If you have multiple network cards, seperate transport and http traffic
        • bind to different interfaces
        • use separate firewall rules for each kind of traffic
      • use long-lived HTTP connections
        • client libraries support this
        • use a proxy/load-balancer
      • storage best practices
        • segments are immutable, so the write amplification factor approaches one and is a non-issue
        • local disk is king!
          • avoid network storage
        • Elasticsearch does not need redundant storage
          • replicas/software provide HA
          • local disks are better than SAN
          • RAID 1/5/10 is not necessary
        • if you have multiple disks in the same server, use RAID 0 or path.data
        • RAID 0
          • splits data evenly across two or more disks
          • perfect distribution of the data across the disks
          • if you lose one disk, you lose them all
        • path.data
          • distribute index across multiple SSD's
          • potential for an unbalanced distribution
          • if you lose one disk, the data on the other disks are preserved
        • SSD: use the noop or deadline scheduler in the OS https://www.elastic.co/guide/en/elasticsearch/guide/current/hardware.html#_disks
        • spinning disks are OK for warm nodes, but discourage concurrent merges
          • index.merge.scheduler.max_thread_count: 1
        • Trim your SSD's https://www.elastic.co/blog/is-your-elasticsearch-trimmed
      • Hardware selection
        • in general, choose medium machines over large machines
        • loss of a large node has greater impact
          • prefer six machines with 4 CPUs, 64 GB RAM, and 4 x 1 TB drives
          • avoid two machines with 12 CPUs, 256 GB RAM, and 12 x 1 TB drives
        • avoid running multiple nodes on one server
          • one ES instance can fully consume a machine
        • larger machines can be helpful as warm nodes
          • configure shard allocation filtering as previously discussed.
      • Cloud strategies
        • use discovery plugin
          • dynamically configure the unicast hosts
        • span the cluster across more than one AZ
        • prefer ephemeral storage over network storage
          • EBS: maybe provision IOPS?
        • snapshot to cloud storage with repository plugins
        • use shard awareness and forced awareness
        • avoid instances marked with low networking performance.
      • Throttles
        • relocation and recovery throttles ensure these tasks do not have a negative impact
        • Recovery
          • for faster recovery, temporarily increase the number of concurrent recoveries
          • _cluster/settings
            • cluster.routing.allocation.node_concurrent_recoveries
        • Relocation
          • for faster rebalancing of shards, increase cluster.routing.allocation.cluster_concurrent_rebalance
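        • a sketch of temporarily raising both settings (the values are assumptions):

            PUT _cluster/settings
            {
              "transient": {
                "cluster.routing.allocation.node_concurrent_recoveries": 5,
                "cluster.routing.allocation.cluster_concurrent_rebalance": 4
              }
            }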
      • JVM configuration
        • only 64-bit JVMs are supported
        • in general, give half of the machine's memory to the JVM heap
        • configure in config/jvm.options
        • what goes in the heap?
          • indexing buffer
          • completion suggester
          • cluster state
          • caches
            • node query cache
            • shard query cache
            • fielddata
          • and more
        • HEAP size
          • default is 1GB, not high enough for production
          • configure with Xms (min heap) and Xmx(max heap)
            • like -Xms8g -Xmx8g
            • set Xmx to no more than 50% of your physical RAM
          • rule of thumb for setting the JVM heap: stay below ~32 GB so the JVM can use compressed object pointers
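          • for example, in config/jvm.options (8g is an assumption, suitable for a 16 GB machine):

              -Xms8g
              -Xmx8g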
      • common causes of poor query performance
        • use filters to benefit from caching
        • limit the scope of aggregations by adding a query
        • use the sampler bucket aggregation
        • don't try to use ES like a relational database
          • consider denormalization, or polyglot architecture
          • avoid parent/child relationships
        • too many shards
          • more shards == slower query performance
          • the default of 5 shards can be too high for some scenarios
          • having thousands of 100MB indices with 5 shards each is not good
          • solution
            • shards can be fairly large (40GB is good)
            • create as few of them as needed
          • unnecessary scripting
            • try to run scripts at index time instead of query time
          • regex are slow
            • use them less
            • use clever indexing; if you need to search the end of a string, try the reverse token filter
    • Cross-cluster search
      • allow any node to act as a client for executing queries across multiple clusters
      • introduced in ES 5.3, replaces "tribe node"
      • remote clusters are configured in the cluster settings
        • using the cluster.remote property
        • seeds is a list of nodes in the remote cluster used to retrieve the cluster state when registering the remote cluster
      • search with GET cluster_name:index-name/_search
      • search multiple clusters with GET index_name,cluster_name:blog/_search
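      • a sketch of registering a remote cluster and querying it (the cluster alias cluster_one, the seed address, and the index name blogs are assumptions):

          PUT _cluster/settings
          {
            "persistent": {
              "cluster.remote.cluster_one.seeds": ["10.0.1.1:9300"]
            }
          }

          GET cluster_one:blogs/_search
          {
            "query": { "match_all": {} }
          }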
      • how it works
        • from a search execution perspective, there is no difference between local indices and remote indices
        • as long as local nodes can reach remote nodes
        • coordinating node gets size hits from each shard (local and remote) and performs reduction
        • top hits are fetched from the relevant shards and returned to client
      • you can use wildcards in cluster names
    • Overview of Upgrades
      • Versions are major.minor.patch/maintenance
      • ES can use indices created in the previous major version, but older indices must be reindexed or deleted
        • 6.x can use indices created in 5.x
        • 6.x cannot use indices created in 2.x or before
        • 5.x can use indices created in 2.x
        • 5.x cannot use indices created in 1.x or before
      • ES will fail to start if an incompatible index is found
      • modern versions, you can do rolling upgrades
      • https://www.elastic.co/products/upgrade_guide
      • Cluster restart strategies
        • rolling restart
          • zero downtime
          • reads and writes continue to operate normally
        • Full cluster
          • can be faster
        • steps for a rolling restart
          • stop non-essential indexing (if possible)
          • disable shard allocation
            • transient: cluster.routing.allocation.enable: none
            • POST _flush/synced
          • stop and update one node
          • restart the node
            • check node health
          • re-enable shard allocation and wait
            • check cluster health
          • go back to "disable shard allocation"
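          • a sketch of the disable/re-enable settings and synced flush used above:

              PUT _cluster/settings
              { "transient": { "cluster.routing.allocation.enable": "none" } }

              POST _flush/synced

              # after the node is back and healthy
              PUT _cluster/settings
              { "transient": { "cluster.routing.allocation.enable": "all" } }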
        • full cluster restart
          • stop indexing
          • disable shard allocation
            • make sure to use "persistent"
          • synced flush
          • shutdown and update all nodes
          • start all dedicated master nodes
            • wait for master election
          • start the other nodes
          • wait for yellow
          • re-enable shard allocation
  • Resources