Skip to content

Instantly share code, notes, and snippets.

@ruanbekker
Last active May 28, 2022 14:00
Show Gist options
  • Star 1 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save ruanbekker/9131f1d00602e64cdba02bcfda884885 to your computer and use it in GitHub Desktop.
Save ruanbekker/9131f1d00602e64cdba02bcfda884885 to your computer and use it in GitHub Desktop.
Very basic bash script to sanitize a mysql table column from a database dump

This is a basic example with a dirty bash script on how to create a copy of a production database and update the column values of the new (test / staged) database, which might be sensitive or which is not suitable for test environments.

This test is intentional for small datasets.

The flow:

  • mysqldump of the production database to a sql file
  • create the staging database
  • import the production data from the sql file into the staging database
  • run the bash script which loops through each record and updates the selected column value (creditcard_num)
  • dumps the sanitized database to a sql file for further usage

MySQL Server

If you are not running mysql locally, you can use this docker-compose.yml or run:

Then:

mkdir /tmp/mysql
cd /tmp/mysql
wget https://raw.githubusercontent.com/ruanbekker/awesome-docker-compose/main/mysql-server/docker-compose.yml
docker-compose up -d

Prepare or Mimic the Prod Database

Exec into the container:

docker exec -it mysql-db bash

Connect to the db (either from within or outside the container):

mysql -h 127.0.0.1 -u root -prootpassword

Mimic the production database:

create database prod_db;

Create the production table:

create table prod_db.customers ( 
  customer_id MEDIUMINT NOT NULL AUTO_INCREMENT, 
  username VARCHAR(50) NOT NULL, 
  creditcard_number VARCHAR(19) NOT NULL, 
  country VARCHAR(50) NOT NULL, 
  PRIMARY KEY (customer_id)
);

Insert production data:

INSERT INTO prod_db.customers (customer_id, username, creditcard_number, country) VALUES (NULL, 'ruanb', '1111-1234-5678-9100', 'south africa');
INSERT INTO prod_db.customers (customer_id, username, creditcard_number, country) VALUES (NULL, 'adrianw', '2222-1234-5678-9100', 'south africa');
INSERT INTO prod_db.customers (customer_id, username, creditcard_number, country) VALUES (NULL, 'jameso', '3333-1234-5678-9100', 'kenya');
INSERT INTO prod_db.customers (customer_id, username, creditcard_number, country) VALUES (NULL, 'sarahw', '4444-1234-5678-9100', 'mexico');
INSERT INTO prod_db.customers (customer_id, username, creditcard_number, country) VALUES (NULL, 'thomask', '5555-1234-5678-9100', 'australia');

View the data:

MariaDB [(none)]> select * from prod_db.customers;
+-------------+----------+---------------------+--------------+
| customer_id | username | creditcard_number   | country      |
+-------------+----------+---------------------+--------------+
|           1 | ruanb    | 1111-1234-5678-9100 | south africa |
|           2 | adrianw  | 2222-1234-5678-9100 | south africa |
|           3 | jameso   | 3333-1234-5678-9100 | kenya        |
|           4 | sarahw   | 4444-1234-5678-9100 | mexico       |
|           5 | thomask  | 5555-1234-5678-9100 | australia    |
+-------------+----------+---------------------+--------------+
5 rows in set (0.001 sec)

Anonamize Database Dump

We will dump the production database, import it into a staging database, then run a for loop to update each column value with a bash function to make it random.

In our bash script: sanitize_creditcard_numbers.sh

#!/usr/bin/env bash
set -e

DB_HOST=127.0.0.1
DB_PORT=3306
DB_USER=root
DB_PASS=rootpassword

MAIN_DB=prod_db
MASKED_DB=staging_db
TABLE_NAME=customers

# functions

function generate_4_char_number() {
    echo $(seq 1000 9999 | sort -R | head -n 1)
}

function generate_fake_creditcard_num() {
    one="0000"
    two=$(generate_4_char_number)
    three=$(generate_4_char_number)
    four=$(generate_4_char_number)
    echo "${one}-${two}-${three}-${four}"
}

echo -e "\n:: dumping maindb ${MAIN_DB} to ${MAIN_DB}.sql\n"
sleep 2
mysqldump -h ${DB_HOST} -P ${DB_PORT} -u ${DB_USER} -p${DB_PASS} ${MAIN_DB} > ${MAIN_DB}.sql

echo -e ":: creating stagingdb ${MASKED_DB} if not already exists\n"
sleep 2
mysql -h ${DB_HOST} -P ${DB_PORT} -u ${DB_USER} -p${DB_PASS} -e "CREATE DATABASE IF NOT EXISTS ${MASKED_DB};"

echo -e ":: importing dumped ${MAIN_DB}.sql into ${MASKED_DB}\n"
sleep 2
mysql -h ${DB_HOST} -P ${DB_PORT} -u ${DB_USER} -p${DB_PASS} ${MASKED_DB} < ${MAIN_DB}.sql

echo -e ":: looping through the table ${TABLE_NAME} on ${MASKED_DB} and updating the column values of creditcard_number to dummy data\n"
sleep 3
mysql -h ${DB_HOST} -P ${DB_PORT} -u ${DB_USER} -p${DB_PASS} -e "SELECT customer_id, creditcard_number FROM ${MASKED_DB}.${TABLE_NAME};" | while read customer_id creditcard_number
do
  fake_creditcard_num=$(generate_fake_creditcard_num)
  mysql -h ${DB_HOST} -P ${DB_PORT} -u ${DB_USER} -p${DB_PASS} ${MASKED_DB} -e "UPDATE ${MASKED_DB}.${TABLE_NAME} SET creditcard_number=\"$fake_creditcard_num\" WHERE customer_id=$customer_id ;"
done

echo -e ":: reading table data from ${MASKED_DB}.${TABLE_NAME}\n"
mysql -h ${DB_HOST} -P ${DB_PORT} -u ${DB_USER} -p${DB_PASS} -e "SELECT * from ${MASKED_DB}.${TABLE_NAME}"
sleep 1

echo -e "\n:: dumping maindb ${MASKED_DB} to ${MASKED_DB}.sql\n"
sleep 2
mysqldump -h ${DB_HOST} -P ${DB_PORT} -u ${DB_USER} -p${DB_PASS} ${MASKED_DB} > ${MASKED_DB}.sql

Run the script:

bash sanitize_creditcard_numbers.sh

:: dumping maindb prod_db to prod_db.sql

:: creating stagingdb staging_db if not already exists

:: importing dumped prod_db.sql into staging_db

:: looping through the table customers on staging_db and updating the column values of creditcard_number to dummy data

:: reading table data from staging_db.customers

+-------------+----------+---------------------+--------------+
| customer_id | username | creditcard_number   | country      |
+-------------+----------+---------------------+--------------+
|           1 | ruanb    | 0000-4269-4475-5452 | south africa |
|           2 | adrianw  | 0000-6070-7594-6583 | south africa |
|           3 | jameso   | 0000-8418-9873-9130 | kenya        |
|           4 | sarahw   | 0000-7374-1281-6272 | mexico       |
|           5 | thomask  | 0000-5994-1160-1424 | australia    |
+-------------+----------+---------------------+--------------+

:: dumping maindb staging_db to staging_db.sql
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment