Skip to content

Instantly share code, notes, and snippets.

@Pamblam
Created June 4, 2021 16:26
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save Pamblam/f23d46f024cb30362e44319f75f26ba5 to your computer and use it in GitHub Desktop.
Save Pamblam/f23d46f024cb30362e44319f75f26ba5 to your computer and use it in GitHub Desktop.
Check if an arbitrary SQL query is safe to run on the database.

Note: This is posted as an answer to a StackOverflow question. Keeping it here for my own convenience.

I wanted a safer, more robust solution that didn't involve fully tokenizing query. Based on my experience writing SQL parsers (here, and here), I can say this solution is pretty bulletproof without having to use a full-featured query parser.

This is the best answer because:

  • It does not require a new SQL user with limited permissions
  • It does not require a new DB connection with limited permissions
  • It does not break if the query contains a string or a comment with the word "delete"
  • It allows complex queries with nested queries
  • It allows the user to input arbitrary SQL
  • It's safe

All the other answers as of the time of writing this have one or more of these limitations.

Here's how it works

  • Remove all inline and multi-line comments
  • Remove all single and double quoted strings
  • Remove all symbols and numbers
  • Create a unique array of the remaining (key)words
  • If any of the remaining keywords are INSERT, UPDATE, DELETE, RENAME, DROP, CREATE, TRUNCATE, ALTER, COMMIT, ROLLBACK, MERGE, CALL, EXPLAIN, LOCK, GRANT, REVOKE, SAVEPOINT, TRANSACTION, or SET then it is a "dangerous" query and should not be run.

Note:

This function does not validate the SQL, it only ensures that the SQL will not alter your database in any way. You will still need to run the query in a try/catch to make sure the query is valid.

/**
 * Determine if an SQL statement could potentially alter the database in any way.
 * @param string $sql - An SQL statement
 * @return boolean - True if query could alter the database, else false
 */
function isDangerousQuery($sql){
	$sql = trim($sql);

	// Irrelevant tokens to be parsed out of the query
	// A comment or string may contain a word like "drop"
	// so comments and strings need to be removed from the query
	$token_types = [
		[	'name' => 'Single-Line Comment',
			'start' => "--",
			'end' => "\n"	],
		[	'name' => 'Multi-Line Comment',
			'start' => "/*",
			'end' => "*/"	],
		[	'name' => 'Double-quoted String',
			'start' => "\"",
			'end' => "\""	],
		[	'name' => 'Single-quoted String',
			'start' => "'",
			'end' => "'"	]
	];
	
	// This array will contain every character that is not part
	// of one of the above described irrelevant tokens
	$keywords_buffer = [];
	
	// If we are currently parsing one of the above token types
	// it's index is held here, else this will be false
	$current_token_type_index = false;
	
	// Loop through each character and reconstruct the query without the 
	// irrelevant token types. We need to loop rather than use a regex
	// because there could be quotes nested in comments and things like that
	// that would "trick" our regex
	$length = strlen($sql);
	for ($index = 0; $index < $length; $index++) {
		$chunk = substr($sql, $index);

		// If the current char is an escape char, skip the next char
		if($sql[$index] === '\\'){
			$index++;
			continue;
		}

		// Looking for all starting tokens
		if(false === $current_token_type_index){

			foreach($token_types as $token_type_index => $token_type){
				if(0 === strpos($chunk, $token_type['start'])){
					$current_token_type_index = $token_type_index;
				}
			}

			if(false === $current_token_type_index){
				$keywords_buffer[] = $sql[$index];
			}

		// Looking for ending token
		}else if(0 === strpos($chunk, $token_types[$current_token_type_index]['end'])){

			$index += strlen($token_types[$current_token_type_index]['end']);
			if(strpos($token_types[$current_token_type_index]['end'], "\n") !== false) $keywords_buffer[] = "\n";
			$current_token_type_index = false;

		}

	}
	
	// Reconstruct the sql without the irrelevant tokens
	$sql_cleaned = implode('', $keywords_buffer);
	
	// Remove all symbols from the sql leaving only keywords and numbers
	$sql_keywords_only = preg_replace("/[^a-zA-Z_0-9\s]/", ' ', $sql_cleaned);
	
	// Create an array of unique keywords in upper-case
	$sql_keywords = array_unique(preg_split("/\s+/", strtoupper($sql_keywords_only)));
	
	// Filter out numbers and empty strings to get actual keywords
	$sql_keywords_filtered = [];
	foreach($sql_keywords as $keyword){
		if(!empty($keyword) && !is_numeric($keyword)){
			$sql_keywords_filtered[] = $keyword;
		}
	}
	
	// list of forbidden/dangerous keywords
	$dangerous_keywords = [
		'INSERT',
        'UPDATE',
        'DELETE',
        'RENAME',
        'DROP',
        'CREATE',
        'TRUNCATE',
        'ALTER',
        'COMMIT',
        'ROLLBACK',
        'MERGE',
        'CALL',
        'EXPLAIN',
        'LOCK',
        'GRANT',
        'REVOKE',
        'SAVEPOINT',
        'TRANSACTION',
        'SET'
	];
	
	// Contains an array of dangerous keywords found
	// If this array is empty, query is safe
	$found_dangerous_keywords = array_intersect($dangerous_keywords, $sql_keywords_filtered);
	
	return count($found_dangerous_keywords) > 0;
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment