Skip to content

Instantly share code, notes, and snippets.

@lavoiesl
Last active January 16, 2024 10:35
Show Gist options
  • Save lavoiesl/9a08e399fc9832d12794 to your computer and use it in GitHub Desktop.
Save lavoiesl/9a08e399fc9832d12794 to your computer and use it in GitHub Desktop.
Add newlines before parenthesis for a SQL mysqldump
// gcc -O2 -Wall -pedantic process-mysqldump.c -o process-mysqldump
// Usage: cat dump.sql | process-mysqldump
// Or : process-mysqldump dump.sql
#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>
#include <string.h>
#define BUFFER 100000
bool is_escaped(char* string, int offset) {
if (offset == 0) {
return false;
} else if (string[offset - 1] == '\\') {
return !is_escaped(string, offset - 1);
} else {
return false;
}
}
bool is_commented(char* string) {
char buffer[4];
sprintf(buffer, "%.3s", string);
return strcmp(buffer, "-- ") == 0;
}
int main(int argc, char *argv[])
{
FILE* file = argc > 1 ? fopen(argv[1], "r") : stdin;
char buffer[BUFFER];
char* line;
int pos;
int parenthesis = 0;
bool quote = false;
bool escape = false;
bool comment = false;
while (fgets(buffer, BUFFER, file) != NULL) {
line = buffer;
// skip commented
if (comment || is_commented(line)) {
comment = line[strlen(line) - 1] != '\n';
fputs(line, stdout);
} else {
pos = 0;
nullchar:
while (line[pos] != '\0') {
// if we are still in escape state, we need to check first char.
if (!escape) {
// find any character in ()'
pos = strcspn(line, "()'\\");
}
if (pos > 0) {
// print before match
printf("%.*s", pos, line);
}
switch (line[pos]) {
case '(':
if (!quote) {
if (parenthesis == 0) {
putchar('\n');
}
parenthesis++;
}
if (escape) {
escape = false;
}
break;
case ')':
if (!quote) {
if (parenthesis > 0) {
parenthesis--;
} else {
// whoops
puts("\n");
fputs(line, stdout);
fputs("Found closing parenthesis without opening one.\n", stderr);
exit(1);
}
}
if (escape) {
escape = false;
}
break;
case '\\':
escape = !escape;
break;
case '\'':
if (escape) {
escape = false;
} else {
quote = !quote;
}
break;
case '\0':
goto nullchar;
default:
if (escape) {
escape = false;
}
break;
}
// print char then skip it (to make sure we don’t double match)
putchar(line[pos]);
line = line + pos + 1;
pos = 0;
}
}
}
return 0;
}
<?php
// Usage: cat dump.sql | php process-mysqldump.php
$input = fopen('php://stdin', 'r');
while(!feof($input)) {
$line = fgets($input);
if (substr($line, 0, 6) == 'INSERT') {
process_line($line);
} else {
echo $line;
}
}
function process_line($line) {
$length = strlen($line);
$pos = strpos($line, ' VALUES ') + 8;
echo substr($line, 0, $pos);
$parenthesis = false;
$quote = false;
$escape = false;
for ($i = $pos; $i < $length; $i++) {
switch($line[$i]) {
case '(':
if (!$quote) {
if ($parenthesis) {
throw new Exception('double open parenthesis');
} else {
echo PHP_EOL;
$parenthesis = true;
}
}
$escape = false;
break;
case ')':
if (!$quote) {
if ($parenthesis) {
$parenthesis = false;
} else {
throw new Exception('closing parenthesis without open');
}
}
$escape = false;
break;
case '\\':
$escape = !$escape;
break;
case "'":
if ($escape) {
$escape = false;
} else {
$quote = !$quote;
}
break;
default:
$escape = false;
break;
}
echo $line[$i];
}
}
fclose($input);
@divinity76
Copy link

divinity76 commented Dec 28, 2017

did some further optimizations to the PHP version, now it's about 8.4 times faster than the original PHP version,
and about 4.8 times slower than the C version.

1.323755
1.353491
1.351569
1.332137
1.386095
  • could speed it up a lot by doing
        $ret .= $line[$i];
        $to = strcspn($line, '()\\\'', $i+1);
        if ($to !== 0) {
            $ret .= substr($line, $i+1, $to);
            $i += ($to);
        }

@robehickman
Copy link

What license is this code?

@rodrigueLeopold
Copy link

rodrigueLeopold commented Jan 16, 2024

Hello.
Thanks for your great work, but there is a bug. If someone writes in a varchar "-- " at the 100000th character, your code will think it's a comment. In the C version.
You should replace the "if(comment ..." with "if (!quote && (comment || is_commented(line))) {"

It took me a long time to figure that bug out... Thanks a lot

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment