@jayswan
Created June 7, 2018 15:33
Splunk/ELK Comparison

Splunk vs ELK is a complicated comparison, and the answer depends on what you want to optimize for. Probably the biggest differentiator is the ecosystem around post-search data manipulation.

Places where ES shines

ES is amazing at searching for tokens and returning documents. The aggregations are also superb -- actually much faster than Splunk under most conditions. Plugins can extend that functionality. Stuff like fuzzy search, regex queries, indexed terms lookups, significant terms aggregations, and nested aggregations can be extremely powerful if you know how to use them well.
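For a concrete taste, here is a hedged sketch of one such query -- a regex-filtered search with a nested terms/significant_terms aggregation -- using the elasticsearch-py client against a 6.x-era cluster. The index and field names (`logs`, `useragent`, `status`, `path`) are hypothetical:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch(["localhost:9200"])

body = {
    "size": 0,  # aggregation results only, no hits
    "query": {"regexp": {"useragent.keyword": "curl.*"}},
    "aggs": {
        "by_status": {
            "terms": {"field": "status"},
            "aggs": {
                # significant_terms surfaces values that are unusually
                # frequent in a bucket relative to the index as a whole
                "unusual_paths": {
                    "significant_terms": {"field": "path.keyword"}
                },
            },
        }
    },
}

resp = es.search(index="logs", body=body)
for bucket in resp["aggregations"]["by_status"]["buckets"]:
    print(bucket["key"],
          [b["key"] for b in bucket["unusual_paths"]["buckets"]])
```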

Trouble areas

ES has a reputation for stability problems. These are mostly solvable by running an appropriately sized cluster on a recent version with proper circuit breaker settings. Much of the FUD I've seen about this is incorrect, but one real problem remains: you can't kill a misbehaving query or constrain its resource use after it has started, so if your circuit breakers aren't working correctly you're out of luck.
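For what it's worth, the breaker limits are dynamic cluster settings, so you can tighten them at runtime via the cluster settings API. A minimal sketch with elasticsearch-py; the 40% figure is purely illustrative, not a recommendation:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch(["localhost:9200"])

# Tighten the per-request breaker so a single huge aggregation trips
# before it can exhaust the heap (dynamic setting; no restart needed).
es.cluster.put_settings(body={
    "persistent": {"indices.breaker.request.limit": "40%"}
})
```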

Chaining data processing

Unless what you want to do fits precisely into an ES query or aggregation, you have to do the post-search or post-aggregation data manipulation with external code. This is not necessarily a problem if you know in advance what you need to do and can write tooling for it, but Splunk has a really rich toolset built around doing this ad hoc.

Example: search time field extraction

Say your index contains a path=/some/url/path/here KV pair, and you want to extract a single path segment from between the / characters and do some aggregation on it. That's really easy in SPL. In ES you'd need to either (a) run a dynamically scripted field aggregation (which is complex and can blow up in unexpected ways), or (b) run a document search or scan query and post-process the results.
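In SPL this is roughly a one-liner, something like `... | rex field=path "^/(?<seg>[^/]+)" | stats count by seg`. For comparison, here is a hedged sketch of option (a) in ES: a scripted terms aggregation via elasticsearch-py, assuming a hypothetical logs index where the field is mapped as `path.keyword` and present on every matching document:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch(["localhost:9200"])

# Painless script: return the first path segment, e.g. "some" from
# "/some/url/path/here".
script = (
    "String p = doc['path.keyword'].value; "
    "int s = p.indexOf('/') + 1; "
    "int e = p.indexOf('/', s); "
    "return e > 0 ? p.substring(s, e) : p.substring(s);"
)

body = {
    "size": 0,
    "aggs": {
        "first_segment": {
            "terms": {"script": {"lang": "painless", "source": script}}
        }
    },
}

resp = es.search(index="logs", body=body)
for bucket in resp["aggregations"]["first_segment"]["buckets"]:
    print(bucket["key"], bucket["doc_count"])
```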

Example: unique value extraction

Say you want to find all the unique values of a particular field. In ES you do this by running a terms aggregation, but your bucket count has to equal the field's cardinality, and running terms aggregations with very large bucket sizes is expensive and can cause cluster stability problems. It also only works reasonably on keyword tokens (as opposed to text tokens). The other alternative is to run a scan query for every matching document and post-process the results. In Splunk, there are a couple of simple ways to accomplish this.
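In SPL, something like `... | stats count by user` or `... | dedup user` does the job. The scan-and-post-process alternative in ES looks roughly like this (hypothetical index and field names; uses the elasticsearch-py helpers module):

```python
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch(["localhost:9200"])

# Scroll through every matching document, fetching only the field we
# need, and dedupe client-side.
unique_users = set()
for hit in helpers.scan(
    es,
    index="logs",
    query={"query": {"match_all": {}}},
    _source=["user"],
):
    user = hit["_source"].get("user")
    if user is not None:
        unique_users.add(user)

print(len(unique_users), "unique values")
```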

Visualizations

Some people have strong opinions about Kibana vs Splunk visualizations. There are things I like about both, so I don't have a strong opinion here.

Other thoughts

Once you get into more complex queries like time window aggregations, Splunk has a ton of nice built-in features. Some of that you can replicate with complex aggregation chains in ES or with Timelion (depending on the data types), but there are a lot of limitations.
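As a point of comparison, SPL's `timechart span=1h avg(bytes)` is a single clause, while the rough ES analogue is a date_histogram with a sub-aggregation. A sketch assuming a 6.x-era cluster (where the histogram parameter is still called `interval`) and hypothetical field names:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch(["localhost:9200"])

body = {
    "size": 0,
    "aggs": {
        "per_hour": {
            # 6.x-era syntax; newer versions split "interval" into
            # calendar_interval / fixed_interval
            "date_histogram": {"field": "@timestamp", "interval": "1h"},
            "aggs": {"avg_bytes": {"avg": {"field": "bytes"}}},
        }
    },
}

resp = es.search(index="logs", body=body)
```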

I'm increasingly convinced that the optimal way to solve the most complex problems is to have a set of tools that can run over the same data for different classes of problem: a SQL tool (Presto/Hive/etc.), a free-text search tool (ES), a streaming tool (Flink, Spark, etc.), and maybe a graph database. But that requires a lot of specialized knowledge and data engineering; Splunk does a lot of that reasonably well in one (very) expensive package.
