@enobufs
Last active February 4, 2022 23:25
"""List S3 objects into a CSV file.

Requires the AWS CLI (`aws`) on PATH and pandas.
"""
import argparse
import json
import subprocess

import pandas as pd

parser = argparse.ArgumentParser(description='List S3 objects into a CSV file')
parser.add_argument('bucket', type=str, help='bucket name')
parser.add_argument('--prefix', type=str, help='prefix')
parser.add_argument('--profile', type=str, help='aws-cli credential profile')
parser.add_argument('-o', '--output', type=str, default="out.csv", help='output CSV file name')
args = parser.parse_args()

# Build the command as an argument list so that no shell is involved and a
# bucket name, prefix, or profile containing spaces or quotes is passed verbatim.
cmd_args = [
    "aws", "s3api", "list-objects-v2",
    "--bucket", args.bucket,
    "--query", "Contents[].{Key: Key, Size: Size, LastModified: LastModified}",
    "--output", "json",
]
if args.prefix is not None:
    cmd_args += ["--prefix", args.prefix]
if args.profile is not None:
    cmd_args += ["--profile", args.profile]
# print(' '.join(cmd_args))

# check=True raises CalledProcessError if the aws CLI exits with a non-zero status.
result = subprocess.run(cmd_args, stdout=subprocess.PIPE, check=True)

# The query may yield null when no object matches; treat that as an empty listing.
obj = json.loads(result.stdout) or []
pd.DataFrame(obj, columns=["Key", "Size", "LastModified"]).to_csv(args.output, index=False)

enobufs commented Jan 27, 2022

Usage

usage: lsobj2csv.py [-h] [--prefix PREFIX] [--profile PROFILE] [-o OUTPUT] bucket

List S3 objects into a CSV file

positional arguments:
  bucket                bucket name

optional arguments:
  -h, --help            show this help message and exit
  --prefix PREFIX       prefix
  --profile PROFILE     aws-cli credential profile
  -o OUTPUT, --output OUTPUT
                        output CSV file name
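
Once the CSV has been written, it can be post-processed without pandas. The following is a minimal sketch, assuming the file has the three columns the script writes (Key, Size, LastModified) and that LastModified values are ISO 8601 strings in one time zone. The function name summarize_listing is hypothetical and not part of the gist.

```python
import csv


def summarize_listing(csv_file):
    """Return (object_count, total_bytes, newest_last_modified) for a listing CSV.

    `csv_file` is any open text file or file-like object (hypothetical helper).
    """
    count, total_bytes, newest = 0, 0, None
    for row in csv.DictReader(csv_file):
        count += 1
        total_bytes += int(row["Size"])
        # Assumption: timestamps share one ISO 8601 format and time zone,
        # so comparing them as plain strings orders them chronologically.
        if newest is None or row["LastModified"] > newest:
            newest = row["LastModified"]
    return count, total_bytes, newest
```

To use it on the default output, open the file with open("out.csv", newline="") and pass the file object to summarize_listing. An empty listing gives (0, 0, None).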
