you have a list of s3 objects in a bucket that need to be manually imported into BigQuery (for a backfill, say), in a file named missing.txt
that contains keys like
exports/project-activity/firehose/2020/09/28/12/project-activity-batch-s3-1-2020-09-28-12-58-57-56b1c279.gz
exports/project-activity/firehose/2020/09/28/13/project-activity-batch-s3-1-2020-09-28-13-14-00-9e633629.gz
exports/project-activity/firehose/2020/09/28/13/project-activity-batch-s3-1-2020-09-28-13-29-01-f3174b15.gz
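(if you still need to build that list: assuming the AWS CLI is configured, something like the following dumps every key under the export prefix into a file — all-keys.txt is just an illustrative name — and narrowing that down to the missing ones depends on what has already landed in BQ)
aws s3 ls s3://mybucket/exports/project-activity/firehose/ --recursive | awk '{print $4}' > all-keys.txt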
take a template file named, say, event-json.tpl, like this:
{
"Records": [
{
"s3": {
"bucket": {
"name":"mybucket"
},
"object": {
"key":"__KEY__"
}
}
}
]
}
then you can run the following to generate an event JSON file for each line in missing.txt:
mkdir -p events; y=0; while IFS= read -r key; do y=$((y+1)); echo "$y"; sed "s|__KEY__|$key|" event-json.tpl > "events/$(printf '%05d.json' "$y")"; done < missing.txt
which will yield numbered files like events/00010.json, each of which looks like:
{
"Records": [
{
"s3": {
"bucket": {
"name":"mybucket"
},
"object": {
"key":"exports/project-activity/firehose/2020/09/28/15/project-activity-batch-s3-1-2020-09-28-15-14-19-5ffed6d9-d21d.gz"
}
}
}
]
}
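(as an aside: if your keys might contain characters that clash with the sed delimiter used above (|) or with sed's replacement syntax (& or \), a safer variant, assuming jq is installed and sketching the same event shape, skips the template entirely:)
mkdir -p events; y=0; while IFS= read -r key; do y=$((y+1)); jq -n --arg key "$key" '{Records: [{s3: {bucket: {name: "mybucket"}, object: {key: $key}}}]}' > "events/$(printf '%05d.json' "$y")"; done < missing.txt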
you can then run your s3-triggered lambda function against a single event via:
cat events/00001.json | docker run --rm -i --env-file=.env -v "$PWD":/var/task lambci/lambda:nodejs10.x index.handler
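(this assumes your function code sits in the current directory with an index.js exporting handler, and that .env holds whatever credentials/config the function needs, e.g. for writing to BQ)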
or process all the events via:
for f in events/*.json; do echo "$f"; docker run --rm -i --env-file=.env -v "$PWD":/var/task lambci/lambda:nodejs10.x index.handler < "$f"; sleep 2; done
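if you're processing hundreds of events it can help to capture a log per event and stop at the first failure so you can fix things and resume; a rough sketch, assuming the lambci container exits non-zero when the handler errors:
mkdir -p logs
for f in events/*.json; do
  echo "$f"
  docker run --rm -i --env-file=.env -v "$PWD":/var/task lambci/lambda:nodejs10.x index.handler < "$f" > "logs/$(basename "$f" .json).log" 2>&1 || { echo "failed on $f" >&2; break; }
  sleep 2
done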