Note: This is posted as an answer to a StackOverflow question. Keeping it here for my own convenience.
I wanted a safer, more robust solution that didn't involve fully tokenizing query. Based on my experience writing SQL parsers (here, and here), I can say this solution is pretty bulletproof without having to use a full-featured query parser.
- It does not require a new SQL user with limited permissions
- It does not require a new DB connection with limited permissions
- It does not break if the query contains a string or a comment with the word "delete"
- It allows complex queries with nested queries
- It allows the user to input arbitrary SQL
- It's safe
All the other answers as of the time of writing this have one or more of these limitations.
- Remove all inline and multi-line comments
- Remove all single and double quoted strings
- Remove all symbols and numbers
- Create a unique array of the remaining (key)words
- If any of the remaining keywords are
INSERT
,UPDATE
,DELETE
,RENAME
,DROP
,CREATE
,TRUNCATE
,ALTER
,COMMIT
,ROLLBACK
,MERGE
,CALL
,EXPLAIN
,LOCK
,GRANT
,REVOKE
,SAVEPOINT
,TRANSACTION
, orSET
then it is a "dangerous" query and should not be run.
This function does not validate the SQL, it only ensures that the SQL will not alter your database in any way. You will still need to run the query in a try/catch to make sure the query is valid.
/**
* Determine if an SQL statement could potentially alter the database in any way.
* @param string $sql - An SQL statement
* @return boolean - True if query could alter the database, else false
*/
function isDangerousQuery($sql){
$sql = trim($sql);
// Irrelevant tokens to be parsed out of the query
// A comment or string may contain a word like "drop"
// so comments and strings need to be removed from the query
$token_types = [
[ 'name' => 'Single-Line Comment',
'start' => "--",
'end' => "\n" ],
[ 'name' => 'Multi-Line Comment',
'start' => "/*",
'end' => "*/" ],
[ 'name' => 'Double-quoted String',
'start' => "\"",
'end' => "\"" ],
[ 'name' => 'Single-quoted String',
'start' => "'",
'end' => "'" ]
];
// This array will contain every character that is not part
// of one of the above described irrelevant tokens
$keywords_buffer = [];
// If we are currently parsing one of the above token types
// it's index is held here, else this will be false
$current_token_type_index = false;
// Loop through each character and reconstruct the query without the
// irrelevant token types. We need to loop rather than use a regex
// because there could be quotes nested in comments and things like that
// that would "trick" our regex
$length = strlen($sql);
for ($index = 0; $index < $length; $index++) {
$chunk = substr($sql, $index);
// If the current char is an escape char, skip the next char
if($sql[$index] === '\\'){
$index++;
continue;
}
// Looking for all starting tokens
if(false === $current_token_type_index){
foreach($token_types as $token_type_index => $token_type){
if(0 === strpos($chunk, $token_type['start'])){
$current_token_type_index = $token_type_index;
}
}
if(false === $current_token_type_index){
$keywords_buffer[] = $sql[$index];
}
// Looking for ending token
}else if(0 === strpos($chunk, $token_types[$current_token_type_index]['end'])){
$index += strlen($token_types[$current_token_type_index]['end']);
if(strpos($token_types[$current_token_type_index]['end'], "\n") !== false) $keywords_buffer[] = "\n";
$current_token_type_index = false;
}
}
// Reconstruct the sql without the irrelevant tokens
$sql_cleaned = implode('', $keywords_buffer);
// Remove all symbols from the sql leaving only keywords and numbers
$sql_keywords_only = preg_replace("/[^a-zA-Z_0-9\s]/", ' ', $sql_cleaned);
// Create an array of unique keywords in upper-case
$sql_keywords = array_unique(preg_split("/\s+/", strtoupper($sql_keywords_only)));
// Filter out numbers and empty strings to get actual keywords
$sql_keywords_filtered = [];
foreach($sql_keywords as $keyword){
if(!empty($keyword) && !is_numeric($keyword)){
$sql_keywords_filtered[] = $keyword;
}
}
// list of forbidden/dangerous keywords
$dangerous_keywords = [
'INSERT',
'UPDATE',
'DELETE',
'RENAME',
'DROP',
'CREATE',
'TRUNCATE',
'ALTER',
'COMMIT',
'ROLLBACK',
'MERGE',
'CALL',
'EXPLAIN',
'LOCK',
'GRANT',
'REVOKE',
'SAVEPOINT',
'TRANSACTION',
'SET'
];
// Contains an array of dangerous keywords found
// If this array is empty, query is safe
$found_dangerous_keywords = array_intersect($dangerous_keywords, $sql_keywords_filtered);
return count($found_dangerous_keywords) > 0;
}