Analyze Elasticsearch on command line using HTTPie and jq


Especially when developing new query logic, it's helpful to query Elasticsearch from the command line. If your Elasticsearch cluster uses SAML authentication or some other SSO, it's not simple, and sometimes not even possible, to query with curl directly. I wrote an auth plugin for HTTPie that should greatly simplify this process if you have rights to create API keys via the Kibana dev console (talk to your administrator and see the link below).

This process is also super handy for shell scripting because you can place fine-grained limits on what your API key can do, making keys much safer and easier to manage than embedded native-realm usernames and passwords.

Setup

First, install HTTPie and my auth plugin.

pip install httpie httpie-apikey-auth

Create API key

Using Dev Tools in Kibana, create an API key. See https://www.elastic.co/guide/en/elasticsearch/reference/current/security-api-create-api-key.html for details and options.

The most basic syntax, which grants the key your full rights:

POST /_security/api_key
{
  "name": "cli analysis key (dcode)"
}
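
If you want the fine-grained limits mentioned above, you can attach role descriptors to the key. As a sketch (the role name and index pattern here are just examples; adjust the privileges to your environment), a read-only key scoped to the beats indices might look like:

POST /_security/api_key
{
  "name": "cli analysis key (read-only beats)",
  "role_descriptors": {
    "beats_read": {
      "indices": [
        {
          "names": [ "*beat*" ],
          "privileges": [ "read" ]
        }
      ]
    }
  }
}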

In your shell, export the following variables. Side note: I like to split out my workspaces into distinct project directories. You can use a tool called direnv to store these environment variables and load them automatically when you cd into that directory (see the .envrc sketch below). This is particularly helpful here, because if you lose your API key credentials you have to delete the key in Kibana and regenerate it.

export SESSION_NAME=my_session
export MY_ID='<your key ID from above>'
export MY_APIKEY='<your apikey from above>'
export ES_HOST='https://es.cloud.sample.com'
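
If you go the direnv route, a minimal .envrc in the project directory can hold the same exports (hypothetical values; run direnv allow once to approve it):

# .envrc -- direnv loads this automatically when you cd into the directory
export SESSION_NAME=my_session
export MY_ID='<your key ID from above>'
export MY_APIKEY='<your apikey from above>'
export ES_HOST='https://es.cloud.sample.com'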

Now, let's create an HTTPie session with the credentials. After this, you don't need to specify auth information anymore; HTTPie will automatically track it in the local session storage.

http --auth-type=apikey --auth="${MY_ID}:${MY_APIKEY}" --session ${SESSION_NAME} ${ES_HOST}
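
To sanity-check that the key and session work, you can hit the standard Elasticsearch authenticate endpoint and confirm it reports your API key identity:

http --session ${SESSION_NAME} ${ES_HOST}/_security/_authenticate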

Now you can move on to your CLI querying.

Query examples

From now on, you can just do the following to use your existing session to hit the entire Elasticsearch API.

http --session ${SESSION_NAME} ${ES_HOST}
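
For example, listing indices via the _cat API (v==true asks for column headers):

http --session ${SESSION_NAME} ${ES_HOST}/_cat/indices v==true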

Elasticsearch URI search

URI Search documentation

Query where agent.hostname is set to rock01 and limit it to 10 results.

http --session ${SESSION_NAME} ${ES_HOST}/*beat*/_search q=='agent.hostname: rock01' size==10
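
Since the point of this workflow is to land in jq, you can pipe the same search straight through it to pull out just the source documents (a sketch; which fields matter depends on your data):

http --session ${SESSION_NAME} ${ES_HOST}/*beat*/_search q=='agent.hostname: rock01' size==10 \
    | jq -c '.hits.hits[]._source'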

Elasticsearch Query DSL

We can perform the same search as above with a request body written in the Query DSL, passing it to HTTPie on stdin via a pipe.

cat <<EOF | http --session ${SESSION_NAME} ${ES_HOST}/*beat*/_search
{
    "size": 10,
    "query" : {
        "term" : { "agent.hostname" : "rock01" }
    }
}
EOF

PRO-TIP: You can build your query in the Kibana Discover app and click Inspect to see the request that Kibana sends. Kibana throws in a bunch of extra aggregations and other items that you probably don't need, but you can at least see how it maps your query from KQL to Elasticsearch Query DSL.

# Query from STDIN. This searches all beats indexes
cat <<EOF | http --session ${SESSION_NAME} ${ES_HOST}/*beat*/_search | jq -c '.hits.hits[]' | tee output.json
{
    "version": true,
    "size": 500,
    "sort": [
        {
            "@timestamp": {
                "order": "desc",
                "unmapped_type": "boolean"
            }
        }
    ],
    "aggs": {
        "2": {
            "date_histogram": {
                "field": "@timestamp",
                "fixed_interval": "30s",
                "time_zone": "America/Chicago",
                "min_doc_count": 1
            }
        }
    },
    "query": {
        "bool": {
            "must": [],
            "filter": [
                {
                    "match_all": {}
                },
                {
                    "range": {
                        "@timestamp": {
                            "gte": "2020-03-31T16:06:11.900Z",
                            "lte": "2020-03-31T16:21:11.900Z",
                            "format": "strict_date_optional_time"
                        }
                    }
                }
            ],
            "should": [],
            "must_not": []
        }
    }
}
EOF
# Split results into files by agent.type
while read -r line; do
    echo "${line}" >> "$(echo "${line}" | jq -r '.agent.type').json"
done < <(cat output.json | jq -c -M '._source')
# Split results into files by index type (should be same as above, but statically defined)
cat output.json | \
    tee \
        >(jq -c 'select(._index | startswith("auditbeat")) | ._source' > auditbeat.ndjson) \
        >(jq -c 'select(._index | startswith("filebeat")) | ._source' > filebeat.ndjson) \
        >(jq -c 'select(._index | startswith("metricbeat")) | ._source' > metricbeat.ndjson) \
        >(jq -c 'select(._index | startswith("journalbeat")) | ._source' > journalbeat.ndjson) \
        >(jq -c 'select(._index | startswith("winlogbeat")) | ._source' > winlogbeat.ndjson) \
        >(jq -c 'select(._index | startswith("endgame")) | ._source' > endgame.ndjson) \
        >/dev/null