Elasticsuite thesaurus fuckup / huge es logfiles
Elasticsuite's synonymization feature (thesaurus) works like this:
- For each synonymized keyword in the user's search query and each synonym configured for that keyword, it adds a match condition to the elasticsearch query in which the keyword string is replaced with the synonym.
- When multiple synonymized keywords appear in the user's search query, this produces a separate match condition for every combination of synonyms across those keywords.
- This means that the resulting elasticsearch query grows exponentially with the number of synonymized keywords present in the user's query, as the number of conditions is roughly
(average number of synonyms)^(keyword count).
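The combination step described above can be sketched in a few lines of Python. This is only an illustration of the combinatorics, not Elasticsuite's actual code: the function name `expand_query`, the dict-based thesaurus and the sample words are all made up for the example, and it assumes the original keyword counts as one of the alternatives.

```python
from itertools import product


def expand_query(query, thesaurus):
    """Return every rewritten variant of `query`, the original included.

    Hypothetical helper: `thesaurus` maps a keyword to its list of synonyms.
    """
    # For each word the alternatives are the word itself plus its synonyms;
    # words without thesaurus entries have exactly one alternative.
    options = [[word] + thesaurus.get(word, []) for word in query.split()]
    # One variant per element of the cartesian product of the alternatives.
    return [" ".join(combo) for combo in product(*options)]


# Made-up sample data: two synonymized keywords with two synonyms each.
thesaurus = {
    "sofa": ["couch", "settee"],
    "red": ["crimson", "scarlet"],
}

variants = expand_query("red leather sofa", thesaurus)
print(len(variants))  # 3 alternatives * 1 * 3 alternatives = 9
```

Each extra synonymized keyword multiplies the number of variants by its number of alternatives, which is where the exponential growth comes from.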
What the client did was configure a bunch of keywords with 40-80 synonyms each. When somebody entered a search query that had 3 of them, elasticsuite would produce an elasticsearch query (the raw JSON) that weighed ~1GB (sic!). Of course elasticsearch would fail to parse such a query.
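
As a rough sanity check of those numbers, here is the arithmetic for 3 keywords with 40 to 80 synonyms each. It is a back-of-the-envelope sketch: `condition_count` is a made-up helper, the original keywords are ignored, and 1GB is assumed to mean 10^9 bytes.

```python
def condition_count(synonyms_per_keyword, keyword_count):
    # Rough estimate: one condition per combination of synonyms.
    return synonyms_per_keyword ** keyword_count


low = condition_count(40, 3)   # 40^3 = 64,000 conditions
high = condition_count(80, 3)  # 80^3 = 512,000 conditions

# Assumption: ~1GB of raw JSON means 10**9 bytes.
bytes_per_condition_low = 10**9 / high   # about 1,953 bytes each
bytes_per_condition_high = 10**9 / low   # 15,625 bytes each

print(low, high)
```

So a ~1GB query corresponds to somewhere between about 2 KB and 16 KB of JSON per generated condition, depending on where in the 40-80 range the synonym counts actually fell.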