Skip to content

Instantly share code, notes, and snippets.

@numberwhun
Created November 28, 2011 05:23
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save numberwhun/1399212 to your computer and use it in GitHub Desktop.
Save numberwhun/1399212 to your computer and use it in GitHub Desktop.
Code used to audit a database and ensure all fields are defined as per the schema.
# I wrote this code at one of my jobs because the databases that we had for our customers had issues.
# It ended up that the csv files that were being produced did not have correct values in the
# corresponding fields. This generally happened when people added fields in the middle of the schema
# file, instead of at the end like they should have. This code was run on databases that, at the
# time, were as large as 1.5 terabytes. The function would spit out the differences between the schema and the actual defined table definitions and it was then my job to make the modifications on the fly to the live tables to correct the issues. Sometimes I simply modified the code that was used
# to populate the csv files as it seemed a bit easier, but most of the mods had to be to the database
# table definitions themselves. Hair raising for a 1.5 terabyte database, let me just say.
# I wrote this function, which was added to a larger file containing a bunch of handy functions, and
# when run, compared the database it was pointed as with the schema that was used to define it.
# Granted not all of the stuff that is defined is defined here, it was defined elsewhere, but this is
# the code I wrote. Its more as a reference for me.
sub do_auditdb {
### Variables Definitions
my $dbh;
my $sth;
my $q_str;
my $tbh;
my @tables;
my @dblist;
my $tblnm;
my $rowcount;
my $schemafile;
my @line;
my $lctype;
my @schema_lines;
my $schema_line_count;
my $c;
my $i;
my $error_count = 0;
$dbh = DBI->connect("dbi:mysql:host=$Db_Host:database=$Db_Name:port=$Db_Port", $Db_User , $Db_Password) or die;
print("Database: $Db_Name\n");
# Get list of tables in current DB
$q_str = "show tables;";
$tbh = $dbh->prepare("$q_str");
if($tbh->execute()) {
while(my $tblname = $tbh->fetchrow_array){
if($tblname =~ /pstk$/){
push(@tables, "$tblname");
}
}
}
TABLELOOP: foreach $tblnm (@tables) {
### Print the table name to output
print("\tTable: $tblnm\n");
$c = 0;
$q_str = "describe $tblnm;";
$sth = $dbh->prepare("$q_str");
$sth->execute();
while (my $ref = $sth->fetchrow_hashref()) {
$Cur_Schm[$c]{'Field'} = $ref->{'Field'};
$Cur_Schm[$c]{'Type'} = $ref->{'Type'};
$Cur_Schm[$c]{'Key'} = $ref->{'Key'};
$c++;
}
$sth->finish();
$rowcount = $c;
### Get schema file name and open it
$schemafile = $tblnm;
$schemafile =~ y/_/./;
$schemafile =~ s/pstk/def/;
### Start a loop that will allow us to cycle through the %Cur_Schm hash
### While going through the hash, a each field should be compared against the schema file
### entries to ensure that the table is created correctly.
COUNTERLOOP: for($i=0; $i < $rowcount; $i++){
### Test to see if the schema file exists. If not, throw an error to the screen.
### There are tables in some databases that do not have schema files associated with
### them. This will avoid the script exiting because of the missing file error.
if(-e "${Schema_Dir}/$schemafile"){
open(SCHEMAFILE, "<${Schema_Dir}/$schemafile") or die "Cannot open $schemafile: $!";
}
else{
print ("\t\tTable $tblnm does not have a schema file associated with it!\n");
$error_count++;
next TABLELOOP;
}
### First, we need to compare the schema of the table with the schema file and ensure
### that they have the same number of entries.
@schema_lines = <SCHEMAFILE>;
@line = split(/\s+/, $schema_lines[$i]);
$lctype = lc($line[1]);
if($lctype eq "integer"){
$lctype = "int(11)";
}
if($Cur_Schm[$i]{'Field'} =~ /$line[0]/){
if($Cur_Schm[$i]{'Type'} ne "$lctype"){
if($Cur_Schm[$i]{'Type'} !~ /$lctype/){
### If type is wrong, then this will print the error. The expected value in the error
### is what is in the schema file.
print("\t\tField: $Cur_Schm[$i]{'Field'} - Type Mismatch. (Found $Cur_Schm[$i]{'Type'}, Expected $lctype)\n");
$error_count++;
next COUNTERLOOP;
}
}
next COUNTERLOOP;
}
else {
### If the Fields don't match, this will print an error. The expected value in the error
### is what is in the schema file.
print("\t\tField: $Cur_Schm[$i]{'Field'} - Name Mismatch. (Expected: $line[0])\n");
$error_count++;
next COUNTERLOOP;
}
} # for
close SCHEMAFILE;
next TABLELOOP;
} # foreach $tblnm
if($error_count > 0) {
print("\n\nErrors found. Please correct the errors and re-run the audit.\n\n");
}
} # sub do_auditdb
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment