PeeHaa/SQL and PHP tutorial.md

## SQL and PHP tutorial.md

      
    Raw
  

              SQL and PHP tutorial.md
            
          
    -- WORK IN PROGRESS --
SQL and PHP tutorial

Introduction

Why this tutorial?
I'm writing this tutorial because there are so many bad tutorials and examples on the web about using SQL with PHP. The sad thing about this fact is that people who use these tutorials don't know any better which results in many vulnerable applications. I cannot really blame (all) the people with vulnerabilities in their applications when there are so many bad resources on the web. However I do think it is important to let all the bad code die a horrid death and show how to properly use SQL in PHP. Not in the last place because all that bad code is introduced in production code at some point and considering the fact that databases are used to store information (and potentially personal information) it would suck for both you to tell your users that their personal information is 'stolen' as well as for your users when they hear you 'gave' away their info. It's not only that the tutorials introduce vulnerabilities into applications, but also most of them simply lack best practice. Some tutorials have added a disclaimer that it is meant for beginners or that it shouldn't be used in production code, but we all know how it goes: it will end up in production and you know it. And why not change code which is just wrong instead of adding some lame disclaimer? I could go on and on about this topic, however to prevent me going into (further) rant mode lets just start with the tutorial shall we?
How is this tutorial setup?
This tutorial is for everyone who wants to learn how to properly use SQL in PHP. This is not meant as a gimmetehcodez type of tutorial (don't where there is code in it). It focuses on learning something rather than: "Hey, look what I just copy/pasted".
When you have finished the tutorial you will:

Know how to properly use SQL in PHP
Have learned some best (coding) practices
Know what is wrong with most tutorials found regarding this on the web and why

During this tutorial I will first tell you how we used to do SQL in PHP and how we should do it. We will dive in the theory of using SQL in PHP. And at the end we will use everything we learned to create a sample application.
mysql_* vs mysqli_* vs insertdbfunction here vs PDO

There are several ways of accessing a database in PHP. The most infamous one is mysql_*. Actually this is one of the reasons that this tutorial is needed. The mysql_* functions shouldn't be used any more. Not only do they lack new functions (introduced in 'newer' versions of MySQL), but they will also be deprecated in future versions of PHP. Another drawback of using the mysql_* functions is that we cannot do any OOP with them. By now there is a new and improved version of the mysql_* functions called mysqli_*. Where the i stands for improved. The two biggest improvements are the support for new MySQL functions and OOP support (although mysqli_* still supports procedural coding). So are we going to use the 'new' mysqli_* functions in this tutorial? No. Besides the fact that I think that PostgreSQL is superior (I won't go into this I promise) so I wouldn't use MySQL anyway. We aren't going to use any database engine specific functions. For this tutorial we are going to use PDO. PDO is a data access abstraction layer. This is just a fancy way of saying that when using PDO we can simply switch the database to any other supported database we like (almost) without the need to change any code / query. Another advantage of using PDO is that we can use OOP. And (this is my personal opinion) the PDO API is easier to use.
Requirements

PDO is available from PHP5. Before PHP 5.1 PDO is available as a PECL extension.

On Linux PDO and the PDO_SQLITE driver is enabled by default as of PHP 5.1.

On Windows PDO is enabled by default as of PHP 5.3.
Besides PDO we also need to add a database specific driver (e.g. PDO_MYSQL or PDO_PGSQL).
To verify whether the required extensions are loaded we can simply do: phpinfo();
Making a connection

Before we are able to do anything with our database we first have to make a connection. To make a connection using PDO we simply have to pass the connection parameters, e.g.:
$options = array(PDO::ATTR_EMULATE_PREPARES => false, // disable emulation of prepared statements
                 PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION); // set the way we are going to handle errors
$dbConnection = new PDO('mysql:dbname=test;host=127.0.0.1', 'user', 'pass', $options);

Now $dbConnection contains a connection to our database. So what parameters have we just used there? Let's have a look at the constructor of the PDO class:
PDO::__construct() ( string $dsn [, string $username [, string $password [, array $driver_options ]]] )

The only required parameter in the constructor is the $dsn string. The dsn (which stands for Data Source Name) contains the information required to connect to the database. It tells PDO what database driver and what database we want to use. Besides these two things there are also some driver specific directives we can add in the dsn. So when we use the MySQL connection (mysql:dbname=test;host=127.0.0.1) we just made as an example we see that we are going to connect to the MySQL database test which is running on localhost. For MySQL the username and password parameters are also required to make a connection. And the last thing we have passed into  the constructor of the PDO class is to disable emulating of prepared statements (more information about this topic in the chapter "Prepared statements").
So now that we can make a connection to a database we run into something where lots of tutorials / people are doing wrong. We only want to open one connection per request. The pattern often used for this kind of behavior is the Singleton pattern. Although the Singleton (anti-)pattern indeed does the job of ensuring we only make one connection to the database it is basically just a fancy global. And we all know that globals are bad right? Right? Right! So why would we make use of a global to make sure we only have one connection to the database? Well... you shouldn't!
To ensure we are only having one connection to the database we are going to adept Dependency injection. This means the we are going to pass (inject) our connection into the function or class that needs it rather than letting the function / class get it self.
I'll show the difference between the two patterns by two examples to make it clear:
Singleton pattern - The "you're doing it wrong way"
$user = new User();

class Db
{
    protected static $instance = NULL;

    public static function getInstance() {
        if (!self::$instance) {
            self::$instance = new PDO('mysql:dbname=test;host=127.0.0.1', 'username', 'password');
        }

        return self::$instance;
    }
}

class User
{
    protected $connection = null;

    function __construct()
    {
        $this->connection = Db::getInstance();
    }
}

As you can see the Singleton pattern heavily utilizes static which just made it a global, because we can now call Db::getInstance(); from everywhere in our application. Another problem of the above example is the fact that we have tightly coupled the Db class to our User class. What would happen if at some point we decide to use a different db class to talk to our database (e.g. if we want to test our class and add a mock database class instead of the real database class)? We cannot change it anymore (unless we go through all the code where it is used). This is the reason we are NOT going to make a connection this way.
Dependency injection
$connection = new PDO('mysql:dbname=test;host=127.0.0.1', 'username', 'password');
$user = new User($connection);

class User
{
    protected $connection = null;

    function __construct($connection)
    {
        $this->connection = $connection;
    }
}

If some other class also need to do something with the database we can simply also inject it in that class in the same way with the same $connection. As you can see this ensures we only have one connection without the need of Singletons. If at some point we want to change the connection (e.g. to a mocked one for testing purposes) we only have to change the initial connection instance.
Closing connections

Most (web)applications make heavy use of a database connection. Because of this in most applications you don't need to close the connection when you are finished accessing / mutating it. Since we make use of dependency injection and inject the same connection in our classes and functions we only have one open connection per request per user. So in most cases it is not needed to close the database connection while we are still processing the request. Once the request is finished the connection will automatically be closed.
If for some reason you do want to close the connection premature we can simply set the $connection instance to NULL like:
// new connection with our text database
$connection = new PDO('mysql:dbname=test;host=127.0.0.1', 'username', 'password');

// do stuff with the connection. E.g. create user
$user = new User($connection);

// close connection
$connection = NULL;

However as stated in most application you would simply keep the connection open for the entire request.
Prepared statements

Now that we can make a connection to the database the next step would be to query the database the retrieve or store data. If you have used / still are using the old mysql_* functions for this you now that we can retrieve data by running a query, e.g.:
$recordset = mysql_query("SELECT fullname FROM users WHERE username = 'myusername'");

The $recordset variable would now contain 1 row of data from the database with the full name of the user in it. However when using PDO (or mysqli_* for that matter) we can make use of prepared statements. Prepared statements have two huge benefits over out 'normal' statements.

Better performance (because the statement is parsed by the SQL server)
Better security (SQL injection prevention)

Performance
As we will see in the examples below: when using prepared statements we are going to split our statements into two separate parts. The first part is where we are preparing the statement. The second part is where we are going to add values into our statement. Both parts are sent to the server separately and we are able to 'reuse' the first step (preparing the statement). So in specific cases we only have to send the new values for the statement to the server.
Security
Because we are using prepared statements our application will not be vulnerable to so called SQL Injection attacks. It's sad, but this type of attack can still be found in many application. And the many tutorials about using SQL in PHP don't really help here at all. Because many of them have the same flaw(s). An SQL Injection attack simple means that an attacker can inject (malicious) SQL into your queries. To better explain this  I will elaborate using an example:
Consider you have a webshop where people can navigate to a product by the following URL:
http://example.com/product?id=1

When people navigate to that URL you would retrieve all the information about the product by the unique id:
$recordset = mysql_query("SELECT name, description, price FROM products WHERE id = '" . $_GET['id'] . "'");

No problem here. The query (SELECT name, description, price FROM products WHERE id = '1') gets executed and the $recordset variable would now contain the product which has the id 1. However since the id (1) is coming from the clientside it would be easy to add anything which isn't a valid id. So consider the following example with the same URL (only with added malicious SQL code) and the same background code:
http://example.com/product?id=1' OR 1=1--

Now what happens when we use the same background code to do our query is that instead of searching for a product with an id of 1 we are searching for a product with an id of 1, but at the same time we are injecting  OR 1=1--.
So instead of our previous harmless query we now have the following query:
SELECT name, description, price FROM products WHERE id = '1' OR 1=1--

Because we have added an OR clause it now retrieves the record with an id of 1 OR where 1 = 1. And we all know (I hope) that 1=1 is always true. So $recordset now contains all products found in the table and not only the product with an id of 1.
The above example is still not very harmful, however it is also possible to run multiple queries inside one statement. And now it really gets scary. This means we could also do something like:
SELECT name, description, price FROM products WHERE id = '1'; DROP TABLE products;--

or
SELECT name, description, price FROM products WHERE id = '1'; SELECT * FROM users WHERE name = 'admin';

The first example drops (deletes) the products table. The second example retrieves the user called admin. Doesn't look too good now does it. The general rule of thumb when developing something is never ever trust anything that is coming from the client side.
Before we had the option of using prepared statements we had the option to escape 'dangerous' characters like mysql_real_escape_string(). If we have a look at our first example again http://example.com/product?id=1 the PHP code would look something like the following:
$recordset = mysql_query("SELECT name, description, price FROM products WHERE id = '" . mysql_real_escape_string($_GET['id']) . "'");

The above would escape all 'dangerous' characters so we effectively prevent a user to add malicious SQL code. However there are two drawbacks of this:


Consider a query where we are going to add a lot of variables. Now we have to add mysql_real_escape_string to all those variables.
 $recordset = mysql_query("SELECT name, description, price FROM products WHERE id = '" . mysql_real_escape_string($_GET['id']) . "' AND description = '" . mysql_real_escape_string($_GET['name']) . "' AND price > '" . mysql_real_escape_string($_GET['min_price']) . "' AND PRICE < '" . mysql_real_escape_string($_GET['max_price']) . "'");

As you can see this tends to get really ugly and unreadable.


We always have to add the mysql_real_escape_string function when we are using data coming from the client side in our query. This can at some point be forgotten. It might not happen very often, but if you forget to do this in only one place your entire application will be vulnerable.


By now I hear you saying: Alright enough with the theory show me how to use prepared statements to do stuff. I'll get to that right after this: remember when we created a new connection to our database we added the following option: PDO::ATTR_EMULATE_PREPARES => false. This line is crucial to be 100% safe against SQL injection attacks, because the query as well as the values which are used in the query will be sent separately to the server a possible attacker never has any chance of inserting malicious SQL code in our query. So what does a prepared statement look like (e.g. by using our first select example again)?
$stmt = $db->prepare("SELECT name, description, price FROM products WHERE id = ?");
$stmt->execute(array($_GET['id']));
$recordset = $stmt->fetchAll(PDO::FETCH_ASSOC);

As you can see instead of adding our value directly into our query we are first preparing the query and are using a placeholder (?) where we want the value to be inserted. After that we are setting the value for the placeholder. And finally we are getting the result from our query. This is just a simple example of course. Let's have a look at another query:
$stmt = $db->prepare("SELECT name, description, price FROM products WHERE description LIKE ? AND price > ? AND price < ?");
$stmt->execute(array('%' . $_GET['something'] . '%'),
                     $_GET['min_price'],
                     $_GET['max_price'],
                     );
$recordset = $stmt->fetchAll(PDO::FETCH_ASSOC);

Now we have added three value into our query. Also note that when using prepared statements the placeholder always contains the complete value (in the case of LIKE this includes the % signs). So basically we have just prepared a statement with three placeholders and are replacing these by the three values in the array(). This could easily become unreadable (even more so when we are using more placeholders in the statement). Luckily PDO provides us with a way to name out placeholders:
$stmt = $db->prepare("SELECT name, description, price FROM products WHERE description LIKE :description AND price > :minprice AND price < :maxprice");
$stmt->execute(array(':description' => '%' . $_GET['something'] . '%'),
                     ':minprice' => $_GET['min_price'],
                     ':maxprice' => $_GET['max_price'],
                     );
$recordset = $stmt->fetchAll(PDO::FETCH_ASSOC);

The example above this the exact same thing as the previous example, only instead of using question marks for placeholders we now have named out placeholders for better readability.
Fetching data


Only show the most used ways of fetching data (incl. link to all ofc).
Iterating recordset

Transactions

Need to think about this is something I want in this tutorial
Error handling

OMGWTFBBQ! We can set up how we would like our errors served.
Creating a 'wrapper'

Now let's make out life even more easy
Unsupported stuff in PDO

PDO != native driver
Performance

Is not important ;-)
Bad examples

blablablabla some sucky examples:
http://chat.stackoverflow.com/transcript/message/3260619#3260619
http://www.androidhive.info/2012/01/android-login-and-registration-with-php-mysql-and-sqlite/
http://www.w3schools.com/sql/sql_syntax.asp
To do / To be include

http://chat.stackoverflow.com/transcript/message/3278865#3278865
http://chat.stackoverflow.com/transcript/11?m=3277912#3277912
http://stackoverflow.com/questions/5175357/extending-pdo-class
http://php.net/manual/en/book.pdo.php#98659
http://stackoverflow.com/questions/4361459/php-pdo-charset-set-names
http://ch2.php.net/manual/en/pdostatement.bindparam.php#101764
http://ch2.php.net/manual/en/pdostatement.bindparam.php#100494
self::$instance-> setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

http://stackoverflow.com/questions/10195783/using-variables-inside-mysql-prepared-statements/10195876
http://wiki.hashphp.org/PDO_Tutorial_for_MySQL_Developers