This is a basic example with a dirty bash script on how to create a copy of a production database and update the column values of the new (test / staged) database, which might be sensitive or which is not suitable for test environments.
This test is intentional for small datasets.
The flow:
- mysqldump of the production database to a sql file
- create the staging database
- import the production data from the sql file into the staging database
- run the bash script which loops through each record and updates the selected column value (creditcard_num)
- dumps the sanitized database to a sql file for further usage
MySQL Server
If you are not running mysql locally, you can use this docker-compose.yml or run:
Then:
mkdir /tmp/mysql
cd /tmp/mysql
wget https://raw.githubusercontent.com/ruanbekker/awesome-docker-compose/main/mysql-server/docker-compose.yml
docker-compose up -d
Prepare or Mimic the Prod Database
Exec into the container:
docker exec -it mysql-db bash
Connect to the db (either from within or outside the container):
mysql -h 127.0.0.1 -u root -prootpassword
Mimic the production database:
create database prod_db;
Create the production table:
create table prod_db.customers (
customer_id MEDIUMINT NOT NULL AUTO_INCREMENT,
username VARCHAR(50) NOT NULL,
creditcard_number VARCHAR(19) NOT NULL,
country VARCHAR(50) NOT NULL,
PRIMARY KEY (customer_id)
);
Insert production data:
INSERT INTO prod_db.customers (customer_id, username, creditcard_number, country) VALUES (NULL, 'ruanb', '1111-1234-5678-9100', 'south africa');
INSERT INTO prod_db.customers (customer_id, username, creditcard_number, country) VALUES (NULL, 'adrianw', '2222-1234-5678-9100', 'south africa');
INSERT INTO prod_db.customers (customer_id, username, creditcard_number, country) VALUES (NULL, 'jameso', '3333-1234-5678-9100', 'kenya');
INSERT INTO prod_db.customers (customer_id, username, creditcard_number, country) VALUES (NULL, 'sarahw', '4444-1234-5678-9100', 'mexico');
INSERT INTO prod_db.customers (customer_id, username, creditcard_number, country) VALUES (NULL, 'thomask', '5555-1234-5678-9100', 'australia');
View the data:
MariaDB [(none)]> select * from prod_db.customers;
+-------------+----------+---------------------+--------------+
| customer_id | username | creditcard_number | country |
+-------------+----------+---------------------+--------------+
| 1 | ruanb | 1111-1234-5678-9100 | south africa |
| 2 | adrianw | 2222-1234-5678-9100 | south africa |
| 3 | jameso | 3333-1234-5678-9100 | kenya |
| 4 | sarahw | 4444-1234-5678-9100 | mexico |
| 5 | thomask | 5555-1234-5678-9100 | australia |
+-------------+----------+---------------------+--------------+
5 rows in set (0.001 sec)
Anonamize Database Dump
We will dump the production database, import it into a staging database, then run a for loop to update each column value with a bash function to make it random.
In our bash script: sanitize_creditcard_numbers.sh
#!/usr/bin/env bash
set -e
DB_HOST=127.0.0.1
DB_PORT=3306
DB_USER=root
DB_PASS=rootpassword
MAIN_DB=prod_db
MASKED_DB=staging_db
TABLE_NAME=customers
# functions
function generate_4_char_number() {
echo $(seq 1000 9999 | sort -R | head -n 1)
}
function generate_fake_creditcard_num() {
one="0000"
two=$(generate_4_char_number)
three=$(generate_4_char_number)
four=$(generate_4_char_number)
echo "${one}-${two}-${three}-${four}"
}
echo -e "\n:: dumping maindb ${MAIN_DB} to ${MAIN_DB}.sql\n"
sleep 2
mysqldump -h ${DB_HOST} -P ${DB_PORT} -u ${DB_USER} -p${DB_PASS} ${MAIN_DB} > ${MAIN_DB}.sql
echo -e ":: creating stagingdb ${MASKED_DB} if not already exists\n"
sleep 2
mysql -h ${DB_HOST} -P ${DB_PORT} -u ${DB_USER} -p${DB_PASS} -e "CREATE DATABASE IF NOT EXISTS ${MASKED_DB};"
echo -e ":: importing dumped ${MAIN_DB}.sql into ${MASKED_DB}\n"
sleep 2
mysql -h ${DB_HOST} -P ${DB_PORT} -u ${DB_USER} -p${DB_PASS} ${MASKED_DB} < ${MAIN_DB}.sql
echo -e ":: looping through the table ${TABLE_NAME} on ${MASKED_DB} and updating the column values of creditcard_number to dummy data\n"
sleep 3
mysql -h ${DB_HOST} -P ${DB_PORT} -u ${DB_USER} -p${DB_PASS} -e "SELECT customer_id, creditcard_number FROM ${MASKED_DB}.${TABLE_NAME};" | while read customer_id creditcard_number
do
fake_creditcard_num=$(generate_fake_creditcard_num)
mysql -h ${DB_HOST} -P ${DB_PORT} -u ${DB_USER} -p${DB_PASS} ${MASKED_DB} -e "UPDATE ${MASKED_DB}.${TABLE_NAME} SET creditcard_number=\"$fake_creditcard_num\" WHERE customer_id=$customer_id ;"
done
echo -e ":: reading table data from ${MASKED_DB}.${TABLE_NAME}\n"
mysql -h ${DB_HOST} -P ${DB_PORT} -u ${DB_USER} -p${DB_PASS} -e "SELECT * from ${MASKED_DB}.${TABLE_NAME}"
sleep 1
echo -e "\n:: dumping maindb ${MASKED_DB} to ${MASKED_DB}.sql\n"
sleep 2
mysqldump -h ${DB_HOST} -P ${DB_PORT} -u ${DB_USER} -p${DB_PASS} ${MASKED_DB} > ${MASKED_DB}.sql
Run the script:
bash sanitize_creditcard_numbers.sh
:: dumping maindb prod_db to prod_db.sql
:: creating stagingdb staging_db if not already exists
:: importing dumped prod_db.sql into staging_db
:: looping through the table customers on staging_db and updating the column values of creditcard_number to dummy data
:: reading table data from staging_db.customers
+-------------+----------+---------------------+--------------+
| customer_id | username | creditcard_number | country |
+-------------+----------+---------------------+--------------+
| 1 | ruanb | 0000-4269-4475-5452 | south africa |
| 2 | adrianw | 0000-6070-7594-6583 | south africa |
| 3 | jameso | 0000-8418-9873-9130 | kenya |
| 4 | sarahw | 0000-7374-1281-6272 | mexico |
| 5 | thomask | 0000-5994-1160-1424 | australia |
+-------------+----------+---------------------+--------------+
:: dumping maindb staging_db to staging_db.sql