What are concurrency conflicts? They occur when two or more processes or resource consumers compete for a shared resource, which can result in conflicts.
Example
ACID Properties
Isolation Implementation Paradigms
Optimistic
Pessimistic
Concurrency Conflicts
Imagine a shared tip jar in a restaurant with two servers, Ruth and Jim. Either of them may:
- place money in
- use it to break large notes
- place IOU notes in, in order to use the jar money to give customers exact change
As long as Ruth and Jim use the tip jar in turn, there are no concurrency issues. However, taking turns creates a potential bottleneck: with hundreds of servers using the tip jar for the options above, a lot of time would be spent negotiating who gets the jar at any given moment. The moment two people use the jar for different purposes at the same time, concurrency conflicts can cause other people's attempts to fail.
A database needs to be able to handle concurrency issues to avoid such conflicts.
Database Transaction
Definition:
"A transaction symbolizes a unit of work performed within a database management system against a database, and treated in a coherent and reliable way independent of other transactions." (Wikipedia)
ACID Properties Guarantees
RDBMS engines are required to follow a set of desired properties known as ACID.
Atomicity: an individual and irreducible series of database operations
Consistency: any transaction changes the state of the database from one valid state to another valid state
Isolation: determines when and how changes become visible to others
Durability: committed changes survive permanently
An example of atomicity is an ATM withdrawal, which requires three steps: checking the account balance, debiting the account, and issuing the cash. All three must be bound into a single transaction; if any one fails, the transaction is cancelled and rolled back in its entirety.
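The ATM flow above can be sketched as a single transaction; the `Accounts` table and its columns are hypothetical names, not from the original:

```sql
BEGIN TRANSACTION;

-- 1. Check the account balance (hypothetical Accounts table)
SELECT Balance FROM Accounts WHERE Account_Id = 42;

-- 2. Debit the account
UPDATE Accounts SET Balance = Balance - 100 WHERE Account_Id = 42;

-- 3. Issue the cash (on the application side); if any step fails,
--    the client issues ROLLBACK instead and the whole unit is undone
COMMIT TRANSACTION;
```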
A transaction has to be consistent: it must start from a consistent state and end in a consistent state. The database engine is responsible for preventing transitions of the database into an inconsistent state. Inconsistency may relate to primary and foreign keys, as well as all other constraints (such as unique, not-null, and check constraints).
Isolation must ensure that the shared aspects of concurrent transactions do not conflict. Imagine two transactions:
| Transaction #1 | Transaction #2 |
| --- | --- |
| Data Read A | Data Modification B |
| Data Modification B | Data Modification C |
| Data Read C | Data Read B |
Isolation defines aspects such as
Whether transaction #1 can run Data Modification B on data that has already been modified by transaction #2
Whether transaction #2 can see, in its Data Read B, the modification made by transaction #1's Data Modification B step
Whether transaction #1 can see, in its Data Read C, the modification made by transaction #2's Data Modification C step
Durability guarantees that once the DBE acknowledges a transaction as committed, the transaction will survive any system failure. This is not as obvious as it may seem, since data modifications happen in volatile system RAM; if the changes are not flushed to a non-volatile medium such as disk, they are lost when the system fails.
Isolation Paradigms
To handle concurrency conflicts, DBEs use one of the following two paradigms:
Pessimistic: assumes there will be conflicts, and takes measures to prevent them in the first place
Optimistic: assumes that conflicts will be rare, so instead of preventing them it checks a transaction's validity before allowing it to commit
e.g., in a pessimistic paradigm, the DBE would place a lock on the tip jar for the duration of the transaction. In an optimistic paradigm, we take a snapshot of the jar, check the consistency of the transaction, and proceed only if it will work according to the snapshot. The change is then applied to the tip jar, and rolled back (undoing the attempted work) if the jar's state no longer matches the snapshot after the same change had occurred on it.
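In SQL terms, the pessimistic approach might look like an explicit row lock; the `TipJar` table here is a hypothetical illustration:

```sql
BEGIN TRANSACTION;
-- Pessimistic: lock the jar's row up front; other sessions must wait
SELECT Amount FROM TipJar WHERE Jar_Id = 1 FOR UPDATE;
UPDATE TipJar SET Amount = Amount + 5 WHERE Jar_Id = 1;
COMMIT TRANSACTION; -- the row lock is released here
```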
Formal definitions of the concurrency phenomena are set by ANSI (American National Standards Institute) in the ANSI/ISO SQL-92 standard:
Lost Update*
Dirty Read
Non-repeatable Read
Phantom Rows
* not a part of the ANSI ISO SQL-92
1. Lost Update
Here is a simplified, theoretical lost-update scenario. This fundamental conflict between two writers cannot occur in PostgreSQL or other mainstream RDBMSs, so the lost update is not a practical concern there; but this is the idea:
| Transaction A | Transaction B |
| --- | --- |
| Begins | Begins |
| Modifies Resource X | [...] |
| [...] | Modifies Resource X |
| Ends | Ends |
The two transactions execute concurrently: transaction A starts, and then transaction B starts.
Transaction A modifies resource X.
Transaction B then also modifies resource X, overwriting transaction A's changes.
Transaction A commits.
Transaction B commits.
Transaction A's update has been lost.
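Sketched with two psql sessions and a hypothetical table `X(Id, Val)`, the interleaving would look as follows; PostgreSQL's row locking is what keeps the second write from silently clobbering the first:

```sql
-- Session A:
BEGIN;
UPDATE X SET Val = Val + 10 WHERE Id = 1;

-- Session B (concurrently):
BEGIN;
UPDATE X SET Val = Val + 20 WHERE Id = 1;
-- In PostgreSQL this statement simply blocks until session A ends,
-- which is exactly how the engine prevents the lost update.

-- Session A:
COMMIT;

-- Session B (now unblocked, re-reading the committed row):
COMMIT;
```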
2. Dirty Read
A dirty read occurs when a transaction reads a resource that has been modified by a concurrent transaction before that transaction commits, and the modifying transaction then rolls back its changes.
| Transaction A | Transaction B |
| --- | --- |
| Begins | Begins |
| Modifies Resource X | [...] |
| [...] | Reads Resource X |
| [...] | Commits |
| Rolls back (reverting Resource X) | [...] |
Transaction A, by rolling back at the end, reverts resource X to its previous state, but only after transaction B has already read it and potentially acted on it.
Since changes made by a transaction should not persist in the database until the transaction commits, the state of resource X that transaction B read (and potentially acted upon) never really existed in the database.
3. Non-repeatable Read
The non-repeatable read phenomenon occurs when a transaction is unable to read a consistent version of a resource multiple times, due to concurrent changes made by other transactions.
| Transaction A | Transaction B |
| --- | --- |
| Begins | Begins |
| [...] | Reads Resource X |
| Modifies Resource X | < does some more work ... > |
| Commits | Reads Resource X |
| [...] | Commits |
The state of resource X on transaction B's second read is different from what it was on the first read. Hence: non-repeatable.
4. Phantom Rows
Phantom rows are a subtler variant of the non-repeatable read phenomenon.
| Transaction A | Transaction B |
| --- | --- |
| Begins | Begins |
| [...] | Reads rows 1 to 10; rows 1 to 3 exist |
| Inserts row 4 | [...] |
| Commits | Reads rows 1 to 10; rows 1 to 4 exist |
| [...] | Commits |
Transactions A and B begin, and transaction B reads a range of rows using a filter on key values 1 to 10; however, at this point in time only rows 1 to 3 actually exist in the table, and they are returned.
Transaction A then inserts row 4 and commits. Now, if transaction B performs the same read as before, rows 1 through 4 are returned.
Note that this is not a non-repeatable read, because rows 1 through 3 were returned consistently both times; but row 4 seems to have appeared out of nowhere. Hence: phantom row.
How ANSI Isolation Levels Solve Concurrency Phenomena
Now that we know the concurrency phenomena that may occur (Lost Update, Dirty Read, Non-repeatable Read, Phantom Rows), we can see what measures the ANSI SQL standard provides for dealing with them (both in general and specifically by PostgreSQL).
READ UNCOMMITTED
READ COMMITTED
REPEATABLE READ
SERIALIZABLE
These are used in transactions with the following syntax:
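For example, to start a transaction at a specific level (any of the four levels above may be named):

```sql
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
-- Transaction body
COMMIT TRANSACTION;
```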
Note that the more restrictive the isolation level, the higher the penalty in terms of concurrency and performance. This is due to maintaining snapshots through entire transactions.
1. READ UNCOMMITTED
Most RDBMSs, except PostgreSQL, support the least restrictive isolation level, known as READ UNCOMMITTED. In this isolation level, all concurrency phenomena are permitted (including dirty reads, non-repeatable reads, and phantom rows).
Although PostgreSQL does accept the syntax for setting READ UNCOMMITTED, it is treated exactly the same as READ COMMITTED; for all practical purposes, READ UNCOMMITTED doesn't exist in PostgreSQL at all.
This implies that dirty reads can never occur in PostgreSQL.
2. READ COMMITTED
Read committed is the default isolation level in PostgreSQL. It prevents dirty reads, but non-repeatable reads and phantom rows can still occur.
PostgreSQL uses MVCC (Multi-Version Concurrency Control), an optimistic isolation implementation paradigm, and read committed is implemented using data snapshots.
| Transaction A | Transaction B |
| --- | --- |
| Begins | Begins |
| 📷 Reads Resource X | [...] |
| [...] | Modifies Resource X |
| 📷 Reads Resource X | Commits |
Transaction A takes a snapshot (📷) on each read of resource X in order to isolate the read from concurrent transactions. The snapshot lasts only for the duration of the reading statement.
Shortly after, or concurrently, transaction B modifies the same resource and commits.
Transaction A then reads the same resource again and takes a fresh snapshot of it. The snapshot now contains the modified state of resource X, and this constitutes a non-repeatable read, since the transaction reads a different state of resource X from one read to the next.
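The sequence above can be sketched with two psql sessions and a hypothetical table `X(Id, Val)`:

```sql
-- Session A:
BEGIN; -- READ COMMITTED is the default level
SELECT Val FROM X WHERE Id = 1;  -- snapshot 1: sees the original value

-- Session B:
UPDATE X SET Val = Val + 1 WHERE Id = 1;  -- runs and commits on its own

-- Session A:
SELECT Val FROM X WHERE Id = 1;  -- snapshot 2: sees B's committed change
COMMIT;                          -- a non-repeatable read has occurred
```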
3. REPEATABLE READ
Repeatable read prevents both dirty reads and non-repeatable reads; the standard still allows phantom rows at this level. PostgreSQL goes beyond the standard's minimum and prevents phantom rows under repeatable read as well.
| Transaction A | Transaction B |
| --- | --- |
| Begins | Begins |
| 📷 Reads Resource X | [...] |
| [...] | Modifies Resource X |
| Reads Resource X again | Commits |
In this case, on the second read of X, transaction A uses the same snapshot it took when it first read the data, so the read repeats.
4. SERIALIZABLE
The most restrictive isolation level, effectively emulating serial transaction execution and eliminating any concurrency in favour of maximum isolation.
Serializable eliminates dirty reads, non-repeatable reads, and phantom rows.
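Rather than blocking, PostgreSQL aborts a serializable transaction when it detects a conflict, and the client is expected to retry. A sketch of what the application sees (the exact message wording may vary by version):

```sql
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
-- ... reads and writes that conflict with a concurrent serializable transaction ...
COMMIT;
-- ERROR: could not serialize access due to read/write dependencies among transactions
-- The client should catch serialization failures (SQLSTATE 40001) and retry the transaction.
```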
1. BEGIN TRANSACTION
The [COMMIT | ROLLBACK] TRANSACTION form is the most explicit, and the one common to other RDBMSs.
BEGIN TRANSACTION;
-- Transaction body
COMMIT TRANSACTION;
You cannot nest transactions in PostgreSQL. If you try to start a transaction inside the scope of another, the inner BEGIN is ignored and elicits a warning message.
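For example:

```sql
BEGIN TRANSACTION;
BEGIN TRANSACTION; -- WARNING: there is already a transaction in progress
COMMIT TRANSACTION;
```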
2. SET TRANSACTION
You can set the isolation level for a transaction as:
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
-- Transaction body
COMMIT TRANSACTION;

-- which is equivalent to:

BEGIN TRANSACTION;
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
-- Transaction body
COMMIT TRANSACTION;
You can also make all transactions in a session follow a certain isolation level by moving the SET statement outside the transactions and specifying SESSION:
SET SESSION CHARACTERISTICS AS TRANSACTION ISOLATION LEVEL REPEATABLE READ;
BEGIN TRANSACTION;
-- Transaction body
COMMIT TRANSACTION;
The server can also be configured with a default isolation level, without the knowledge of the people crafting the transactions, so be aware of that.
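You can always check which level is actually in effect with SHOW:

```sql
SHOW default_transaction_isolation; -- the session/server default
SHOW transaction_isolation;         -- the level of the current transaction
```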
3. Transaction Save Points
You can divide a transaction into smaller chunks providing partial rollback points using the SAVEPOINT statement, followed by a unique savepoint name.
BEGIN TRANSACTION;
-- <Do Some Work>
SAVEPOINT MySavePoint1;
-- <Do Some More Work>
-- if <some condition>, the client issues:
ROLLBACK TO SAVEPOINT MySavePoint1;
COMMIT TRANSACTION;
Notice that a condition decides whether to roll back to the SAVEPOINT. You may be better off making smaller transactions rather than using savepoints: conditional rollback adds complexity, and it is usually better avoided.
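A savepoint can also be discarded once the risky work succeeds, using RELEASE SAVEPOINT; `T1` here is a hypothetical table:

```sql
BEGIN TRANSACTION;
INSERT INTO T1 VALUES (1);
SAVEPOINT MySavePoint1;
INSERT INTO T1 VALUES (2);       -- the risky part
RELEASE SAVEPOINT MySavePoint1;  -- keep the work, drop the savepoint
COMMIT TRANSACTION;
```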
4. DDL Statements
DDL statements are fully transactional in PostgreSQL. A DDL statement such as CREATE TABLE occurring within a transaction will therefore be committed or rolled back just like an INSERT or DELETE statement.
We can think of DDL statements as an insert only on system tables. For example:
BEGIN TRANSACTION;
CREATE TABLE T1 (Col1 INT PRIMARY KEY);
INSERT INTO T1 VALUES (1);
ROLLBACK TRANSACTION;
SELECT * FROM T1;
If the transaction is rolled back for whatever reason, the rollback negates the effect of both the CREATE TABLE and the INSERT; the final SELECT then fails because T1 no longer exists.
-- It's advisable to include the column names in INSERT statements for explicitness and readability, and to avoid potential future bugs if the table structure ever changes
INSERT INTO Table [ AS Alias ] [ ( Column_Name [, ...] ) ]
[ OVERRIDING { SYSTEM | USER } VALUE ] -- explicitly handle system-generated values such as identity columns and sequences
{ DEFAULT VALUES | VALUES ( { Expression | DEFAULT } [, ...] ) [, ...] | Query } -- default values, explicit values, or a query providing the data to be inserted
[ ON CONFLICT [ Conflict_Target ] Conflict_Action ] -- the action to take in case the data conflicts with existing table constraints
[ RETURNING * | Output_Expression [ [ AS ] Output_Alias ] [, ...] ] -- makes the INSERT not only insert the data, but also return data to the client application or a nesting statement, similar to a SELECT
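As an illustration of the last two clauses, ON CONFLICT and RETURNING might be combined like this; the `Users` table and its columns are hypothetical:

```sql
INSERT INTO Users (Email, Name)
VALUES ('ada@example.com', 'Ada')
ON CONFLICT (Email) DO NOTHING -- skip the row if this Email already exists
RETURNING User_Id;             -- returns the new id (no row if the insert was skipped)
```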
2. Inserting Literal Values
The VALUES row constructor can be used to generate datasets for inserts. The VALUES keyword is followed by parentheses containing comma-separated value expressions for the different columns.
Additional rows can be created by adding further parenthesized sets, separated by commas.
All rows must have the same number of expressions.
This is the same as using a SELECT statement with literal expressions and without a FROM clause; both are valid forms for inserting literal values.
INSERT INTO Table [ AS Alias ] [ ( Column_Name [, ...] )]
VALUES ( Expression [, ...] ) [, ...]
INSERT INTO Table [ AS Alias ] [ ( Column_Name [, ...] )]
SELECT Expression [, ...]
Examples of Inserting Literal Values
Example of inserts using the VALUES keyword:
DROP TABLE IF EXISTS T;

CREATE TABLE T (
    Identity_Column_ALWS INT NOT NULL GENERATED ALWAYS AS IDENTITY,
    Identity_Column_DFLT INT NOT NULL GENERATED BY DEFAULT AS IDENTITY,
    Default_Column INT NOT NULL DEFAULT(0),
    Unique_Column INT NULL UNIQUE,
    Constraint_Column INT NULL CHECK (Constraint_Column > 0)
);

-- Inserting literals
INSERT INTO T (Default_Column, Unique_Column, Constraint_Column)
VALUES (1, 1, 1),
       (2, 2, 2); -- INSERT 0 2

SELECT * FROM T;
Inserting default values means that we do not explicitly state the actual value as part of the INSERT statement. Instead, we instruct the database engine to populate the value for us.
In PostgreSQL we have two kinds of default values:
Explicitly assigned defaults, declared with a DEFAULT constraint on a column; and
Identity or sequence columns, which can be used to generate sequential identifiers for rows (often abused as surrogate keys).
Either kind can be triggered by excluding the column completely from the INSERT column list.
INSERT INTO Table [ AS Alias ] [ ( Column_Name [, ...] ) ]
For identity columns, the INSERT syntax allows us to state the desired behaviour when there is a conflict with an explicit value being provided for a column that has an identity property.
Stating OVERRIDING SYSTEM VALUE will give preference to the value in the statement; and,
stating OVERRIDING USER VALUE will simply ignore it and keep using the system generated value.
INSERT INTO Table [ AS Alias ] [ ( Column_Name [, ...] ) ]
[ OVERRIDING { SYSTEM | USER } VALUE ]
Another way to trigger default constraints is the DEFAULT VALUES clause, for cases where you want the whole row to consist of defaults; alternatively, use the keyword DEFAULT for individual columns in the VALUES clause.
INSERT INTO Table [ AS Alias ] [ ( Column_Name [, ...] ) ]
[ OVERRIDING { SYSTEM | USER } VALUE ]
DEFAULT VALUES | VALUES ( DEFAULT [, ... ] )
Examples of inserting default values
To trigger the default constraint for the Default_Column we can either exclude it from the insert column list or explicitly use the DEFAULT keyword.
-- Inserting defaults
INSERT INTO T (Unique_Column, Constraint_Column)
VALUES (4, 4);

INSERT INTO T (Default_Column, Unique_Column, Constraint_Column)
VALUES (DEFAULT, 5, 5);

SELECT * FROM T;
Trying to do the same with Identity_Column_ALWS, whose identities are defined with GENERATED ALWAYS AS IDENTITY, fails with an error:
INSERT INTO T (Identity_Column_ALWS, Unique_Column, Constraint_Column)
VALUES (12, 7, 7);
-- ERROR: cannot insert a non-DEFAULT value into column "identity_column_alws"
-- DETAIL: Column "identity_column_alws" is an identity column defined as GENERATED ALWAYS.
-- HINT: Use OVERRIDING SYSTEM VALUE to override.
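Following the hint in the error, the explicit value can be forced through with OVERRIDING SYSTEM VALUE:

```sql
INSERT INTO T (Identity_Column_ALWS, Unique_Column, Constraint_Column)
OVERRIDING SYSTEM VALUE
VALUES (12, 7, 7); -- the explicit 12 is stored instead of a generated identity
```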