@pinkeen
Last active August 21, 2019 11:57
DevOps/SysOps Fuckup Anecdotes

Elasticsuite thesaurus fuckup / huge es logfiles

Elasticsuite's synonymization feature (thesaurus) works like this:

  • For each synonymized keyword in the user's search query, and for each synonym configured for that keyword, add a match condition to the elasticsearch query in which the keyword string is replaced with the synonym.
  • For multiple synonymized keywords appearing in user's search query this will produce a separate match condition for each combination of each synonym of each keyword.
  • This means the resulting elasticsearch query grows exponentially with the number of synonymized keywords present in the user's query, as the number of conditions is roughly (average number of synonyms)^(keyword count).
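The blow-up described above can be sketched in a few lines of Python. This is not Elasticsuite's actual code, just a minimal illustration of the combinatorics; `expand_query` and the tiny thesaurus are hypothetical. The sketch also keeps the original word as one of the alternatives, so with s synonyms per keyword and k keywords it yields (s+1)^k variants.

```python
from itertools import product

def expand_query(query, thesaurus):
    """Return every rewrite of `query` with synonymized keywords substituted."""
    # For each word, the alternatives are the word itself plus its synonyms.
    alternatives = [[word] + thesaurus.get(word, []) for word in query.split()]
    # One rewritten query per combination of alternatives.
    return [" ".join(combo) for combo in product(*alternatives)]

# Hypothetical thesaurus, for illustration only.
thesaurus = {
    "red": ["crimson", "scarlet"],
    "shoe": ["sneaker", "trainer"],
}

variants = expand_query("red shoe", thesaurus)
print(len(variants))  # 9, i.e. (2 + 1) ** 2
```

Two keywords with two synonyms each already give 9 variants; every extra synonymized keyword multiplies the count again.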

The client had configured a bunch of keywords with 40-80 synonyms each. When somebody entered a search query containing 3 of them, elasticsuite would produce an elasticsearch query (the raw JSON) that weighed ~1GB (sic!). Of course elasticsearch would fail to parse such a query, so the request would fail, but... it would also log the full query contents. This means that with each request like this the elasticsearch logfile would grow by ~1GB, and everything would quickly blow up due to lack of disk space 😎
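A back-of-the-envelope check of those numbers, as a rough sketch: it assumes "~1GB" means 10^9 bytes and that the size is spread evenly over the conditions, neither of which was measured. `condition_count` is a hypothetical helper.

```python
def condition_count(synonyms_per_keyword, keyword_count):
    """Rough number of match conditions: synonyms ** keywords."""
    return synonyms_per_keyword ** keyword_count

QUERY_BYTES = 10**9  # "~1GB" from the anecdote, taken as 10^9 bytes (assumption)

for synonyms in (40, 80):
    count = condition_count(synonyms, 3)
    # Columns: synonyms per keyword, conditions, approx. bytes per condition
    print(synonyms, count, QUERY_BYTES // count)
# 40 64000 15625
# 80 512000 1953
```

So 3 such keywords mean somewhere between 64,000 and 512,000 conditions, and a few KB of JSON per condition is enough to reach a gigabyte.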


It took me a while to dig through this huge multi-gigabyte logfile and deduce where the query was actually coming from 😅. The lesson from this: unless they've fixed it, it's better to disable this elasticsuite feature completely.
