Christian Dahlqvist cdahlqvist

## bulk_rejections.md

      
Gist cdahlqvist / bulk_rejections.md (3 files), last active April 5, 2023 06:27. Description: rally-bulk-rejections-track

Bulk Rejections Test

This Rally track is used to test the relationship between bulk indexing rejections and the following parameters:

- Number of concurrent clients indexing into Elasticsearch
- Number of shards actively being indexed into
- Number of data nodes in the cluster
- Size of bulk requests

The track contains a number of challenges. Each one indexes into an index with a set number of shards, using an increasing number of concurrent client connections and two different bulk sizes.
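Rejections show up in a bulk API response as per-item errors with HTTP status 429. The following is a minimal Python sketch of how they could be tallied, assuming the response has already been parsed into a dict. The function name `count_rejections` and the sample response are illustrative and are not part of the track.

```python
def count_rejections(bulk_response):
    """Count items in a parsed _bulk response that were rejected (status 429)."""
    if not bulk_response.get("errors"):
        return 0  # fast path: no item-level errors were reported
    rejected = 0
    for item in bulk_response.get("items", []):
        # Each item is keyed by its action type: index, create, update or delete.
        for result in item.values():
            if result.get("status") == 429:
                rejected += 1
    return rejected


# Hand-written sample response, for illustration only.
sample = {
    "errors": True,
    "items": [
        {"index": {"_id": "1", "status": 201}},
        {"index": {"_id": "2", "status": 429,
                   "error": {"type": "es_rejected_execution_exception"}}},
    ],
}
print(count_rejections(sample))
```

Dividing this count by the number of items sent gives a rejection rate that can be compared across client counts, shard counts and bulk sizes.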

  
## ingest_pipeline_delay
# Ingest pipeline that records the timestamp the event was processed (`@received`)
# by the ingest pipeline and calculates the difference in seconds compared to
# the event timestamp (`@timestamp`).

POST _scripts/calculate_ingest_delay
{
  "script": {
    "lang": "painless",
    "source": "SimpleDateFormat sdf = new SimpleDateFormat(\"yyyy-MM-dd'T'HH:mm:ss.SSS'Z'\"); ctx.ingest_delay = (sdf.parse(ctx['received']).getTime() - sdf.parse(ctx['@timestamp']).getTime()) / 1000.0"
  }
}
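The arithmetic in the stored script can be checked outside Elasticsearch. Below is a small Python sketch of the same calculation, assuming both timestamps use the `yyyy-MM-dd'T'HH:mm:ss.SSS'Z'` layout shown above. The function name and the sample timestamps are made up for illustration.

```python
from datetime import datetime, timezone

# Python equivalent of the yyyy-MM-dd'T'HH:mm:ss.SSS'Z' layout.
FMT = "%Y-%m-%dT%H:%M:%S.%fZ"


def ingest_delay_seconds(received, event_timestamp):
    """Return how long after the event timestamp the event was received, in seconds."""
    received_at = datetime.strptime(received, FMT).replace(tzinfo=timezone.utc)
    event_at = datetime.strptime(event_timestamp, FMT).replace(tzinfo=timezone.utc)
    return (received_at - event_at).total_seconds()


# Hypothetical values: the event is received 1.5 seconds after it was created.
print(ingest_delay_seconds("2011-04-19T03:44:02.603Z", "2011-04-19T03:44:01.103Z"))  # 1.5
```

As in the script, a millisecond difference divided by 1000 yields seconds.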

## Riak CRDT HTTP Examples
Start a Riak 2.0 cluster. This has been tested against Riak 2.0.0pre11.

First, set up bucket types. You can name these as you like for your domain and add other properties.

    $ rel/riak/bin/riak-admin bucket-type create maps '{"props":{"datatype":"map"}}'
    maps created
    $ rel/riak/bin/riak-admin bucket-type create sets '{"props":{"datatype":"set"}}'
    sets created
    $ rel/riak/bin/riak-admin bucket-type create counters '{"props":{"datatype":"counter"}}'
    counters created
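The three commands differ only in the type name and the datatype, so they can be generated. The Python sketch below does that and also emits a `bucket-type activate` command for each type, since Riak 2.0 bucket types must be activated before use. The helper name `bucket_type_commands` is hypothetical, and the `riak-admin` path is copied from the commands above.

```python
import json


def bucket_type_commands(name, datatype, admin="rel/riak/bin/riak-admin"):
    """Build the create and activate commands for one CRDT bucket type."""
    # Compact separators reproduce the JSON exactly as written in the commands above.
    props = json.dumps({"props": {"datatype": datatype}}, separators=(",", ":"))
    return [
        f"{admin} bucket-type create {name} '{props}'",
        f"{admin} bucket-type activate {name}",
    ]


for name, datatype in [("maps", "map"), ("sets", "set"), ("counters", "counter")]:
    for command in bucket_type_commands(name, datatype):
        print(command)
```

The script only prints the commands. Running them is left to the operator.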

## epoch_prefixed_md5_identifier.conf
input {
  generator {
    lines => ['2011-04-19T03:44:01.103Z testlog1',
              '2011-04-19T03:44:02.035Z testlog2',
              '2011-04-19T03:44:03.654Z testlog3',
              '2011-04-19T03:44:03.654Z testlog3']
    count => 1
  }
}
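The preview stops before any filter section, so the identifier format itself is not shown. Going only by the filename, the Python sketch below shows one way such an identifier could be built: the event's epoch seconds in hex, followed by the MD5 digest of the whole line. The exact layout is an assumption and not taken from the config.

```python
import hashlib
from datetime import datetime, timezone


def epoch_prefixed_md5(line):
    """Build an identifier from the line's leading timestamp and its MD5 digest."""
    timestamp, _, _ = line.partition(" ")
    parsed = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%S.%fZ")
    # Assumed layout: hex-encoded epoch seconds, then the 32-character MD5 hex digest.
    epoch_hex = format(int(parsed.replace(tzinfo=timezone.utc).timestamp()), "x")
    digest = hashlib.md5(line.encode("utf-8")).hexdigest()
    return epoch_hex + digest


lines = ['2011-04-19T03:44:01.103Z testlog1',
         '2011-04-19T03:44:02.035Z testlog2',
         '2011-04-19T03:44:03.654Z testlog3',
         '2011-04-19T03:44:03.654Z testlog3']
ids = [epoch_prefixed_md5(line) for line in lines]
print(len(set(ids)), "unique identifiers for", len(ids), "lines")
```

The last two generator lines are identical, so they map to the same identifier. Using that identifier as the document ID would make the duplicate overwrite the first copy instead of creating a second document.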

## gdpr_access_controls.txt
# Tested with version 6.2.x of the Elastic Stack

# Add index templates

PUT _template/identity_store
{
  "index_patterns": ["identity_store"],
  "settings": {
    "number_of_shards": 1
  },

## restore_snapshot.sh
#!/bin/bash

TIMESTAMP=$(date +%s)
ES_HOST=$1
REPOSITORY=$2
INDEX_NAME=$3
SNAPSHOT_ID=$4
NEW_INDEX_NAME=$5
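The rest of the script is not shown in this preview. As a rough illustration of how these five arguments fit the Elasticsearch snapshot restore API, the Python sketch below builds a restore request that renames the index on restore. The function name is hypothetical, and it assumes `ES_HOST` is a `host:port` value without a scheme.

```python
import json


def build_restore_request(es_host, repository, index_name, snapshot_id, new_index_name):
    """Return the URL and JSON body for restoring one index under a new name."""
    url = f"http://{es_host}/_snapshot/{repository}/{snapshot_id}/_restore"
    body = {
        "indices": index_name,
        # rename_pattern is a regular expression matched against restored index names.
        "rename_pattern": index_name,
        "rename_replacement": new_index_name,
    }
    return url, json.dumps(body)


# Placeholder argument values, for illustration only.
url, body = build_restore_request(
    "localhost:9200", "my_repo", "logs-1", "snap-1", "logs-1-restored")
print("POST", url)
print(body)
```

The request would be sent as a POST with a `Content-Type: application/json` header.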


## README.md

      
Gist cdahlqvist / README.md (8 files), created April 23, 2017 10:38. Description: Access log index size test

Access log size test

This gist contains supporting files for evaluating Elasticsearch index sizes for web access logs.

### Prerequisites

- A machine running Linux or Mac OS X.
- A local Elasticsearch 5.3.x instance accessible via 127.0.0.1:9200.
- The geoip and useragent ingest plugins installed on that Elasticsearch instance.
- A local installation of Filebeat 5.3.x, with the environment variable FILEBEAT_HOME pointing to the directory containing the filebeat binary.
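The plugin prerequisite can be checked from the text returned by `GET _cat/plugins`, which lists one `node component version` row per installed plugin. The Python sketch below parses such output and reports what is missing. It assumes the plugins appear under their Elasticsearch 5.x names `ingest-geoip` and `ingest-user-agent`, and that node names contain no spaces. The sample output is made up.

```python
REQUIRED_PLUGINS = {"ingest-geoip", "ingest-user-agent"}


def missing_plugins(cat_plugins_output):
    """Return the required plugins that do not appear in _cat/plugins output."""
    installed = set()
    for line in cat_plugins_output.splitlines():
        columns = line.split()
        if len(columns) >= 2:
            installed.add(columns[1])  # second column is the plugin component name
    return REQUIRED_PLUGINS - installed


# Hypothetical output from a node that only has the geoip plugin installed.
sample_output = "node-1 ingest-geoip 5.3.0\n"
print(sorted(missing_plugins(sample_output)))
```

An empty result means both plugins are present on at least one node in the listing.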


## create_repositories.sh
#!/bin/bash

echo $(date) "Create snapshot repositories"

curl -X PUT "localhost:9200/_snapshot/elasticlogs-nofm" -H 'Content-Type: application/json' -d'
{
  "type": "fs",
  "settings": {
    "location": "/data/snapshots/elasticlogs-nofm"
  }
}'

## ccr_watch
{
  "trigger": {
    "schedule": {
      "interval": "10s"
    }
  },
  "input": {
    "http" : {
      "request" : {
        "host" : "127.0.0.1:9200",

## filter_logs.conf
input {
  stdin {}
}

filter {
  grok {
    match => { "message" => [ '%{IP:ip}" %{GREEDYDATA:a}',
                              '%{IP:ip1}, %{IP:ip}" %{GREEDYDATA:a}' ] }
  }
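For readers less familiar with grok, the Python sketch below approximates the two patterns with regular expressions and tries them in order, stopping at the first match. It is a simplification: `%{IP}` is reduced to a loose IPv4 expression, and the function name `grok_like_match` is hypothetical.

```python
import re

# Loose IPv4 matcher; the real grok IP pattern is stricter and also covers IPv6.
IPV4 = r"\d{1,3}(?:\.\d{1,3}){3}"

PATTERNS = [
    re.compile(rf'(?P<ip>{IPV4})" (?P<a>.*)'),
    re.compile(rf'(?P<ip1>{IPV4}), (?P<ip>{IPV4})" (?P<a>.*)'),
]


def grok_like_match(message):
    """Return the named captures of the first pattern that matches, or None."""
    for pattern in PATTERNS:
        match = pattern.search(message)
        if match:
            return match.groupdict()
    return None


# Made-up input line using a documentation address.
print(grok_like_match('203.0.113.7" GET /index.html'))
```

The captured `ip` and `a` values correspond to the fields the grok filter would add to the event.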