valarpirai/Database race condition.md

## Database race condition.md

      
    Raw
  

              Database race condition.md
            
          
    How to Solve Race Conditions in a Booking System

Database isolation refers to the level of isolation between concurrent transactions in a database. Isolation levels control the visibility and accessibility of data to concurrent transactions and can affect the occurrence of race conditions in a database. If your isolation level is not “serializable” - there is a possibility of race conditions.
The Serializable isolation level provides the strictest transaction isolation. This level emulates serial transaction execution for all committed transactions; as if transactions had been executed one after another, serially, rather than concurrently
Race conditions

Example of Race condition, Hotel room booking and movie ticket booking etc
CREATE TABLE Room (
  id INT PRIMARY KEY AUTO_INCREMENT,
  room_number INT,
  available BOOLEAN
);

CREATE TABLE Booking (
  id INT PRIMARY KEY AUTO_INCREMENT,
  room_id INT,
  start_date DATE,
  end_date DATE
);

insert into room(id, room_number, available) values(123, 123, true);

Two users, Alice and Bob, try to book the same room for overlapping dates at the same time.
Alice:
UPDATE Room
SET available = FALSE
WHERE id = 123;

INSERT INTO Booking (room_id, start_date, end_date)
VALUES (123, '2022-01-01', '2022-01-07');

Bob:
UPDATE Room
SET available = FALSE
WHERE id = 123;

INSERT INTO Booking (room_id, start_date, end_date)
VALUES (123, '2022-01-03', '2022-01-10');


The final result (the room being booked for both Alice and Bob) is different from what would be expected if the statements were executed sequentially (either Alice's booking or Bob's booking being rejected)

Pessimistic concurrency control

Pessimistic concurrency control is a technique used to prevent race conditions in a database by locking the data that is being accessed or updated. This ensures that only one user can access the data at a time, and other users have to wait until the lock is released before they can access it.
In SQL, pessimistic concurrency control can be implemented using the "SELECT ... FOR UPDATE" statement. This statement allows a user to lock the rows of a table that are being accessed and prevents other users from updating or locking those rows until the lock is released.
To implement pessimistic concurrency control for the "Booking" table, a user can execute the following SQL statement:
SELECT * FROM Room WHERE id = 123 FOR UPDATE;
This statement will lock the row with the ID 123 in the "Book" table, and prevent other users from accessing or updating that row until the lock is released.
To release the lock, the user can commit or roll back the transaction:
COMMIT;  -- releases the lock
ROLLBACK;  -- releases the lock and discards any changes made to the data

Optimistic concurrency control

Optimistic concurrency control, on the other hand, allows multiple users to access and update the data concurrently, but checks for conflicts before committing the changes. If a conflict is detected, the user is notified and the changes are not applied.
One way to implement optimistic concurrency control in a booking system is to use a "version" column in the "Room" table. This column can be used to store a "version number" for each booking, which is incremented each time the booking is updated.
ALTER TABLE Room
ADD version INT DEFAULT 1;

Then we need to update SQL statements for booking a room. Now Alice's statement will look like this:
UPDATE Room
SET available = FALSE, version = version + 1
WHERE room_number = 123 AND version = 1;

INSERT INTO Booking (room_id, start_date, end_date)
VALUES (123, '2022-01-01', '2022-01-07');

And Bob’s will look like this:
UPDATE Room
SET available = FALSE, version = version + 1
WHERE id = 123 AND version = 1;

INSERT INTO Booking (room_id, start_date, end_date)
VALUES (123, '2022-01-03', '2022-01-10');

If both of these statements are executed concurrently, the first UPDATE statement to be executed will increment the "version" of the room with ID 123 to 2, and the second UPDATE statement will fail, as the "version" in the WHERE clause is 1 (so zero rows will be updated with the second transaction).
This will prevent the race condition from occurring, as only one booking will be inserted into the "Booking" table, and the availability of the room will not be incorrectly updated.
Bonus


https://hackernoon.com/how-to-solve-race-conditions-in-a-booking-system
https://ketanbhatt.com/db-concurrency-defects/

Is MySQL's auto_increment really monotonic?

Why you shouldn't rely on auto_increment feature in some cases.
If a race condition between two MySQL transactions appears, the row with ID = N may appear in the database BEFORE another row with ID < N

We need three terminals here, let’s call them A, B and C.
Create Table in T-A

create table foo (
  id int not null auto_increment,
  value char(30) not null,
  created_at timestamp default current_timestamp,
  primary key (id)
  );

Query OK, 0 rows affected (0,08 sec)

select * from foo;

Empty set (0,00 sec)


T-B
Start transaction, insert row. but don't commit yet

start transaction;
insert into foo (value) values ('alfa');


T-A

select * from foo;

Empty set (0,00 sec)


T-C

insert into foo (value) values ('bravo');

Query OK, 1 row affected (0,01 sec)

select * from foo;
+----+-------+---------------------+
| id | value | created_at          |
+----+-------+---------------------+
|  2 | bravo | 2018-04-09 13:13:51 |
+----+-------+---------------------+
1 row in set (0,00 sec)

It’s there! As you can notice, it got assigned the ID = 2, because ID = 1 had been previously allocated for the alfa row. But hey, where is it?!
Uncommited! Hanging in the air!

T-B

commit;

Query OK, 0 rows affected (0,02 sec)


T-A

select * from foo;

+----+-------+---------------------+
| id | value | created_at          |
+----+-------+---------------------+
|  1 | alfa  | 2018-04-09 13:13:21 |
|  2 | bravo | 2018-04-09 13:13:51 |
+----+-------+---------------------+
2 rows in set (0,01 sec)

The Issue

This is a edge case.
Consider that we query the db during this short between-commit phase:

we retrieve a single row, ID = 2,
we process this batch of one row,
we store the ID = 2 state for the next batch, telling it you should add where ID > 2 to your query.
We’ve just ommited the alfa row!

Solution ideas


Mark the already pushed/updated rows with a flag, query with where completed = false (add additional column to the table)
Exploit the updated_at column to improve the situation (add additional column to the table)

Reference - https://www.boldare.com/tech-blog/is-mysql-autoincrement-really-monotonic/