S3 supports SNS notifications for new objects, and these SNS notifications can fan out to SQS queues.
Logstash can read from inputs including S3 and SQS, but neither of these is sufficient:
- The S3 input performs very poorly when reading from a bucket with a high number of writes: the reader ends up spending most of its time listing the bucket contents rather than reading objects.
- The SQS input works well, whether with a single logstash process or a cluster of multiple processes. However, the SQS input doesn't understand the format of the SNS notification object for S3 changes.
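To make the parsing problem concrete: when S3 publishes to SNS and SNS fans out to SQS, the SQS message body is an SNS envelope whose Message field is itself a JSON string containing the S3 event, so two levels of JSON decoding are needed. A minimal sketch (Python, with a hypothetical bucket and key, and the payload trimmed to the relevant fields):

```python
import json

# An SQS message body as delivered via SNS fan-out: the S3 event is a
# JSON *string* nested inside the SNS envelope's "Message" field.
sqs_body = json.dumps({
    "Type": "Notification",
    "Message": json.dumps({
        "Records": [{
            "s3": {
                "bucket": {"arn": "arn:aws:s3:::my-log-bucket"},
                "object": {"key": "logs/2015/app.log"},
            }
        }]
    }),
})

def s3_paths(body):
    """Unwrap the SNS envelope and yield (bucket_arn, key) pairs."""
    event = json.loads(json.loads(body)["Message"])
    for record in event["Records"]:
        s3 = record["s3"]
        yield s3["bucket"]["arn"], s3["object"]["key"]

print(list(s3_paths(sqs_body)))
# → [('arn:aws:s3:::my-log-bucket', 'logs/2015/app.log')]
```

The double json.loads is exactly what the stock SQS input doesn't do, which is why it can't extract object paths from these messages.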
To successfully slurp from S3 into ES via logstash we should create an S3Notification input type:
- Modify the SQS input to understand SNS notifications and parse S3 object paths from
.Records[] | (.s3.bucket.arn + .s3.object.key)
- Download individual objects from S3 and then treat each line in the object as an event.
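The two steps above could be sketched roughly as follows (Python rather than a real logstash plugin; assumes boto3 and a hypothetical queue URL, neither of which comes from logstash itself):

```python
import json

def notification_to_objects(sqs_body):
    """Parse an SNS-wrapped S3 event from an SQS message body into
    (bucket, key) pairs -- the Records[] extraction described above.
    get_object needs the bucket *name*; the record also carries the arn."""
    event = json.loads(json.loads(sqs_body)["Message"])
    return [(r["s3"]["bucket"]["name"], r["s3"]["object"]["key"])
            for r in event.get("Records", [])]

def run_input(queue_url):
    """Poll SQS, download each referenced object, emit one event per line.
    boto3 is imported lazily so the parsing helper above stays testable
    without AWS credentials."""
    import boto3
    sqs = boto3.client("sqs")
    s3 = boto3.client("s3")
    while True:
        resp = sqs.receive_message(QueueUrl=queue_url,
                                   MaxNumberOfMessages=10,
                                   WaitTimeSeconds=20)
        for msg in resp.get("Messages", []):
            for bucket, key in notification_to_objects(msg["Body"]):
                body = s3.get_object(Bucket=bucket, Key=key)["Body"]
                for line in body.iter_lines():
                    yield line.decode("utf-8")  # one event per line
            # Only delete after the objects have been read, so a crash
            # mid-download leaves the notification visible for retry.
            sqs.delete_message(QueueUrl=queue_url,
                               ReceiptHandle=msg["ReceiptHandle"])
```

Because SQS hides in-flight messages rather than deleting them, multiple logstash processes can poll the same queue and split the work, which is what makes this approach scale where the listing-based S3 input does not.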
Pointers:
I'm also looking at implementing something similar.