@drumadrian
Last active March 15, 2024 17:22
Sample logstash.conf file for S3 Input plugin
# References:
# https://www.elastic.co/guide/en/logstash/current/plugins-inputs-s3.html
# https://www.elastic.co/blog/logstash-lines-inproved-resilience-in-S3-input
# https://www.elastic.co/guide/en/logstash/6.3/installing-logstash.html
# https://www.elastic.co/guide/en/logstash/current/working-with-plugins.html
# https://www.garron.me/en/bits/curl-delete-request.html
sudo yum update -y
sudo yum install -y java-1.8.0-openjdk
java -version
# Logstash requires Java 8
sudo rpm --import https://artifacts.elastic.co/GPG-KEY-elasticsearch
sudo vi /etc/yum.repos.d/logstash.repo
# Insert this below as the contents (omitting the leading "#"):
# [logstash-6.x]
# name=Elastic repository for 6.x packages
# baseurl=https://artifacts.elastic.co/packages/6.x/yum
# gpgcheck=1
# gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
# enabled=1
# autorefresh=1
# type=rpm-md
# Now install Logstash
sudo yum install -y logstash
# Start Logstash once to verify the install, then stop it before configuring
sudo systemctl start logstash
sudo systemctl stop logstash
# Ensure that Logstash starts on boot
sudo systemctl enable logstash
# The S3 input plugin should be present by default; otherwise you will need to install it
sudo yum install -y mlocate
sudo updatedb
cd /usr/share/logstash
bin/logstash-plugin list
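If `logstash-plugin list` does not show `logstash-input-s3`, the plugin can be installed by hand (a sketch; assumes the RPM install layout above and network access from the host):

```shell
cd /usr/share/logstash
sudo -E bin/logstash-plugin install logstash-input-s3
```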
# Config files are stored here:
# /etc/logstash/conf.d/*.conf
cd /etc/logstash/conf.d/
sudo vi s3_input.conf
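Before starting the service, the new pipeline file can be syntax-checked with Logstash's built-in config test (paths assume the RPM layout used above):

```shell
sudo /usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/s3_input.conf --config.test_and_exit
```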
sudo systemctl start logstash
# Now follow the Logstash log: tail -f /var/log/logstash/logstash-plain.log
# Sample Logstash configuration for creating a simple
# AWS S3 -> Logstash -> Elasticsearch pipeline.
# References:
# https://www.elastic.co/guide/en/logstash/current/plugins-inputs-s3.html
# https://www.elastic.co/blog/logstash-lines-inproved-resilience-in-S3-input
# https://www.elastic.co/guide/en/logstash/current/working-with-plugins.html
input {
  s3 {
    #"access_key_id" => "your_access_key_id"
    #"secret_access_key" => "your_secret_access_key"
    "region" => "us-west-2"
    "bucket" => "testlogstashbucket1"
    "prefix" => "Logs"
    "interval" => "10"
    "additional_settings" => {
      "force_path_style" => true
      "follow_redirects" => false
    }
  }
}
output {
  elasticsearch {
    hosts => ["http://vpc-test-3ozy7xpvkyg2tun5noua5v2cge.us-west-2.es.amazonaws.com:80"]
    index => "logs-%{+YYYY.MM.dd}"
    #user => "elastic"
    #password => "changeme"
  }
}
@drumadrian

drumadrian commented Jul 16, 2022 via email

@stoufa

stoufa commented Aug 8, 2022

And that's exactly what we did!
Thanks again for the great advice.

@drumadrian

drumadrian commented Aug 8, 2022 via email

@ganeshk-nd

> Hi again Adrian (@drumadrian),
>
> Thank you for the swift reply; we found a solution for the dynamic prefix value. Now, I'm curious to know what other tools you recommend we use besides Logstash.
>
> Email sent. Thank you so much for your attention and participation.

What was the solution you figured out for the dynamic prefix?

@stoufa

stoufa commented Apr 16, 2023

Hi @ganeshk-nd,
Before switching to AWS Kinesis Firehose, we used to generate the date in the required format and inject it into a template config file.
Here are some snippets from both the template file and the Python script.

input {
  s3 {
    "region" => "REGION_PLACEHOLDER"
    "bucket" => "BUCKET_PLACEHOLDER"
    "prefix" => "PREFIX_PLACEHOLDER"
    "interval" => "10"
    "additional_settings" => {
      "force_path_style" => true
      "follow_redirects" => false
    }
  }
}

...
# generate the prefix of the folder holding today's files;
# format: yyyy/mm/dd/ e.g. 2022/07/05/
today = date.today()  # requires: from datetime import date
PREFIX = f'{today.year:04}/{today.month:02}/{today.day:02}/'

...

data = {
    'REGION_PLACEHOLDER': args.region,
    'BUCKET_PLACEHOLDER': BUCKET_NAME,
    'PREFIX_PLACEHOLDER': PREFIX,
    # if no environment is set, use dev by default
    'ENVIRONMENT_PLACEHOLDER': 'prod' if args.prod else 'dev'
}

with open(f'templates/pipeline_{context}.conf') as f:
    template = f.read()
    result = template
    for placeholder, value in data.items():
        result = result.replace(placeholder, value)

# saving results to a file
output_file_path = '/path/to/logstash-x.y.z/config/pipeline.conf'

with open(output_file_path, 'w') as f:
    f.write(result)

I hope this helps.
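As an aside, the same placeholder substitution can also be sketched in plain shell with `sed`, no Python needed. The template contents and values below are illustrative, not the exact production files:

```shell
# Stand-in for the template file (templates/pipeline.conf in the setup above)
cat > /tmp/pipeline.template.conf <<'EOF'
input {
  s3 {
    "region" => "REGION_PLACEHOLDER"
    "bucket" => "BUCKET_PLACEHOLDER"
    "prefix" => "PREFIX_PLACEHOLDER"
  }
}
EOF

# Build today's prefix in yyyy/mm/dd/ form and substitute every placeholder;
# '|' is used as the sed delimiter because the prefix contains slashes
PREFIX="$(date +%Y/%m/%d/)"
sed -e "s|REGION_PLACEHOLDER|us-west-2|" \
    -e "s|BUCKET_PLACEHOLDER|testlogstashbucket1|" \
    -e "s|PREFIX_PLACEHOLDER|${PREFIX}|" \
    /tmp/pipeline.template.conf > /tmp/pipeline.conf
```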
