@camilonova
Created January 17, 2024 18:21
Get an anonymized database
#!/usr/bin/env bash
# Run as: ./anonymize_database.sh my-database-production (the name of the source database)
set -e
# check that pganonymize is installed
if ! command -v pganonymize &> /dev/null; then
    echo "pganonymize could not be found" >&2
    exit 1
fi

# check that the pganonymize.yml file exists in the current directory
if [ ! -f pganonymize.yml ]; then
    echo "pganonymize.yml does not exist" >&2
    exit 1
fi

# get the database name from the first command-line argument
DB_NAME="$1"
if [ -z "$DB_NAME" ]; then
    echo "No database name provided" >&2
    exit 1
fi

# dump the original database (custom format)
pg_dump -Fc "$DB_NAME" > "/tmp/$DB_NAME.dump"
# create a new database for the anonymized data
createdb "$DB_NAME-anonimized"
# restore the original dump into the new database
pg_restore -d "$DB_NAME-anonimized" "/tmp/$DB_NAME.dump"
# delete the original (non-anonymized) dump
rm "/tmp/$DB_NAME.dump"
# anonymize the copied database in place
pganonymize --schema=pganonymize.yml --dbname="$DB_NAME-anonimized"
# dump the anonymized database
pg_dump -Fc "$DB_NAME-anonimized" > "/tmp/$DB_NAME-anonimized.dump"
# drop the temporary anonymized database
dropdb "$DB_NAME-anonimized"
# /tmp/$DB_NAME-anonimized.dump holds the result and must be deleted once it has been downloaded
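The Python view further down notes that this script should be triggered from Python. A minimal sketch of one way to do that, assuming you would rather call the same tools directly than shell out to `anonymize_database.sh`: each step becomes an argument list run with `subprocess.run(..., check=True)`, which stops at the first failing command much like `set -e`. The helper names `anonymize_plan` and `run_plan` are hypothetical; the commands and flags are the ones the script uses, except that `pg_dump -f FILE` replaces the shell redirection.

```python
import os
import subprocess
from typing import Callable, List, Sequence


def anonymize_plan(db_name: str, schema: str = "pganonymize.yml",
                   tmp_dir: str = "/tmp") -> List[List[str]]:
    """Return the commands anonymize_database.sh runs, as argv lists (no shell involved)."""
    if not db_name:
        raise ValueError("No database name provided")
    work_db = f"{db_name}-anonimized"          # same suffix the script uses
    raw_dump = os.path.join(tmp_dir, f"{db_name}.dump")
    out_dump = os.path.join(tmp_dir, f"{work_db}.dump")
    return [
        ["pg_dump", "-Fc", "-f", raw_dump, db_name],        # dump the original database
        ["createdb", work_db],                               # scratch database for the copy
        ["pg_restore", "-d", work_db, raw_dump],             # load the copy
        ["rm", raw_dump],                                    # drop the non-anonymized dump
        ["pganonymize", f"--schema={schema}", f"--dbname={work_db}"],
        ["pg_dump", "-Fc", "-f", out_dump, work_db],         # dump the anonymized copy
        ["dropdb", work_db],                                 # remove the scratch database
    ]


def run_plan(plan: Sequence[Sequence[str]], runner: Callable = subprocess.run) -> None:
    """Run each step in order; check=True raises on the first non-zero exit, like `set -e`."""
    for argv in plan:
        runner(list(argv), check=True)
```

Passing argument lists instead of a shell string means database names containing spaces or shell metacharacters are not reinterpreted by a shell. As in the script, a failure part-way through leaves the scratch database (and possibly the raw dump) behind; wrapping the steps in `try`/`finally` with a `dropdb` would be one way to clean up.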
tables:
  - user_user:
      primary_key: id
      chunk_size: 5000
      fields:
        - first_name:
            provider:
              name: fake.first_name
        - last_name:
            provider:
              name: set
              value: "Bar"
        - email:
            provider:
              name: md5
            append: "@localhost.com"
truncate:
  - django_session
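To make the schema concrete, here is a rough Python model of what it asks for on each `user_user` row. It is an approximation, not pganonymize's code: a random choice from `SAMPLE_FIRST_NAMES` (a hypothetical placeholder list) stands in for the Faker-backed `fake.first_name` provider, `set` writes the fixed value `"Bar"`, and `md5` is modelled as the hex MD5 digest of the original value with the `append` suffix added. `chunks` illustrates the batching implied by `chunk_size: 5000`.

```python
import hashlib
import random
from typing import Dict, Iterable, Iterator, List

# Hypothetical stand-in for Faker's first_name(); not the real provider.
SAMPLE_FIRST_NAMES = ["Alice", "Bruno", "Chen", "Dana"]


def md5_provider(value: str, append: str = "") -> str:
    # Hex MD5 digest of the original value, followed by the configured suffix.
    return hashlib.md5(value.encode("utf-8")).hexdigest() + append


def anonymize_user(row: Dict[str, str], rng: random.Random) -> Dict[str, str]:
    """Return a copy of a user_user row with the three configured fields replaced."""
    out = dict(row)                                               # leave the input untouched
    out["first_name"] = rng.choice(SAMPLE_FIRST_NAMES)            # ~ fake.first_name
    out["last_name"] = "Bar"                                      # ~ set, value "Bar"
    out["email"] = md5_provider(row["email"], "@localhost.com")   # ~ md5 + append
    return out


def chunks(rows: Iterable[Dict[str, str]],
           chunk_size: int = 5000) -> Iterator[List[Dict[str, str]]]:
    """Yield rows in batches of at most chunk_size."""
    batch: List[Dict[str, str]] = []
    for row in rows:
        batch.append(row)
        if len(batch) == chunk_size:
            yield batch
            batch = []
    if batch:
        yield batch
```

Because the digest is unsalted and deterministic, equal addresses always map to equal outputs (handy if the column is unique), but anyone holding a candidate address can hash it and look for a match, so the email column is pseudonymized rather than fully anonymous. The `truncate` entry empties `django_session`, which would otherwise carry session data into the copy.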
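import os

from django.http import HttpResponse


def plug_cleaning_into_stream(stream, filename):
    # Wrap stream.close() so the file is removed from disk as soon as the stream is closed
    original_close = stream.close

    def new_closer():
        original_close()
        # remove the file once the stream has been closed
        os.remove(filename)

    stream.close = new_closer


# path_to_file is /tmp/$DB_NAME-anonimized.dump
def send_file(request, path_to_file):
    # Before this runs, call anonymize_database.sh from Python with the DB_NAME from the settings as its argument
    with open(path_to_file, 'rb') as ready_file:
        plug_cleaning_into_stream(ready_file, path_to_file)
        # the whole dump is read into memory here
        response = HttpResponse(ready_file.read(), content_type='application/octet-stream')
    # leaving the with block calls the patched close(), which deletes the dump
    response['Content-Disposition'] = f'attachment; filename="{os.path.basename(path_to_file)}"'
    return response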
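`send_file` reads the whole dump into memory before responding. For large dumps, one alternative (a sketch, not the gist's approach) is a generator that yields the file in chunks and deletes it in a `finally` block. The name `iter_file_then_delete` and the 64 KiB default chunk size are arbitrary, hypothetical choices.

```python
import os
from typing import Iterator


def iter_file_then_delete(path: str, chunk_size: int = 64 * 1024) -> Iterator[bytes]:
    """Yield the file in chunks, then delete it when iteration ends or the generator is closed."""
    try:
        with open(path, "rb") as fh:
            while True:
                chunk = fh.read(chunk_size)
                if not chunk:
                    break
                yield chunk
    finally:
        # Runs on normal exhaustion and on generator.close() (GeneratorExit).
        if os.path.exists(path):
            os.remove(path)
```

In a Django view this could be handed to `StreamingHttpResponse(iter_file_then_delete(path_to_file), content_type='application/octet-stream')`. Django registers the iterator's `close()` and calls it when the response is closed, which should trigger the `finally` block even if the client disconnects early; this is worth verifying for your Django version and server. One caveat: if the generator is never started, closing it does not run the `finally` block, so the file stays on disk.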