danidiaz/_Concurrent n Distributed.md

## _Concurrent n Distributed.md

      
    Raw
  

              _Concurrent n Distributed.md
            
          
    Papers

Time clock and the ordering of events in a distributed system
Consensus on Transaction Commit
An empirical study on the correctness of formally verified distributed systems
Byzantizing Paxos by Refinement
Fast Byzantine Paxos
Impossibility of Distributed Consensus with One Faulty Process

an asynchronous consensus algorithm cannot be guaranteed to be both safe and live. This is called the FLP Impossibility Result

Blockchains from a Distributed Computing Perspective ht
Seeing is Believing: A Client-Centric Specification of Database Isolation
CASPaxos: Replicated State Machines without logs
On Designing and Deploying Internet-Scale Services
link collection
Composition: A Way to Make Proofs Harder mentioned here
Paxos Made Live - An Engineering Perspective. Red Hat contributes etcd, the cornerstone of Kubernetes, to the Cloud Native Computing Foundation. HN.
Building Consistent Transactions with Inconsistent Replication.
A generalised solution to distributed consensus. HN
Atul Adya’s PhD thesis. Natacha Crooks.

Atul Adya’s PhD thesis gives a precise definition of the SQL standard isolation levels based on how reads and writes from different transactions may be interleaved. However these definitions are given from the point of view of the system. The recent work by Natacha Crooks et. al gives elegant and precise definitions from the point of view of the user.

Interactive checks for coordination avoidance
scaling replicated state machines
Posts

Strong consistency models

Because serializability allows arbitrary reordering of operations (so long as the order appears atomic), it is not particularly useful in real applications. Most databases which claim to provide serializability actually provide strong serializability, which has the same time bounds as linearizability. To complicate matters further, what most SQL databases term the SERIALIZABLE consistency level actually means something weaker, like repeatable read, cursor stability, or snapshot isolation.


Consistency means linearizability, and in particular, a linearizable register.

Serializability, linearizability, and locality
Difference between Linearizability and Serializability
Linearizability and Serializability in context of Software Transactional Memory
Linearizability versus Serializability

Serializability is the traditional “I,” or isolation, in ACID. If users’ transactions each preserve application correctness (“C,” or consistency, in ACID), a serializable execution also preserves correctness. Therefore, serializability is a mechanism for guaranteeing database correctness.1


Granted, serializability is (more or less) the most general means of maintaining database correctness. In what’s becoming one of my favorite “underground” (i.e., relatively poorly-cited) references, H.T. Kung and Christos Papadimitriou dropped a paper in SIGMOD 1979 on “An Optimality Theory of Concurrency Control for Databases.” In it, they essentially show that, if all you have are transactions’ syntactic modifications to database state (e.g., read and write) and no information about application logic, serializability is, in some sense, “optimal”: in effect, a schedule that is not serializable might modify the database state in a way that produces inconsistency for some (arbitrary) notion of correctness that is not known to the database.


Combining serializability and linearizability yields strict serializability: transaction behavior is equivalent to some serial execution, and the serial order corresponds to real time. For example, say I begin and commit transaction T1, which writes to item x, and you later begin and commit transaction T2, which reads from x. A database providing strict serializability for these transactions will place T1 before T2 in the serial ordering, and T2 will read T1’s write. A database providing serializability (but not strict serializability) could order T2 before T1.2


As Herlihy and Wing note, “linearizability can be viewed as a special case of strict serializability where transactions are restricted to consist of a single operation applied to a single object.”

Linearizability, serializability, transaction isolation and consistency models
Linearizability
Serializability
Please stop calling databases CP or AP

Moreover, databases with snapshot isolation/MVCC are intentionally non-linearizable, because enforcing linearizability would reduce the level of concurrency that the database can offer. For example, PostgreSQL’s SSI provides serializability but not linearizability, and Oracle provides neither. Just because a database is branded “ACID” doesn’t mean it meets the CAP theorem’s definition of consistency. Is this quote really correct?

Seriablizable but not linearizable
Distributed transactions
How does two phase commit recover from a participant's failure?
How ACID is the two-phase commit protocol?
Jepsen
Notes on distributed systems for youngbloods some additional commentary
Testing Distributed Systems for Linearizability
Scalability Cheatsheet: The Road to Paxos
The Limits of the CAP Theorem HN

The only time that a CAP-Available system would be available when a CAP-Consistent one would not is when one of the datacenters can’t talk to the other replicas, but can talk to clients, and the load balancer keeps sending it traffic.


the 'A' in CAP is boring. It does not mean what you think it means. Lynch et al. probably chose the definition because it's one for which the 'theorem' is both true and easy to prove. This is not the impossibility result with which designers of distributed systems should be most concerned.

The CAP Theorem - Foundation DB
Clarifications On The CAP Theorem And Data-Related Errors
Heisenbugs and Bohrbugs
paxos vs. 2PC

Paxos actually solves a more genreal problem than 3PC.

Consensus on Transaction Commit
Delivering Billions of Messages Exactly Once HN

I don't want to ever see the phrase "Exactly Once" without several asterisks behind it. It might be exactly once from an "overall" point of view, but the client effectively needs infinitely durable infinite memory to perform the "distributed transaction" of acting on the message and responding to the server.

how do you cut a monolith in half? HN
Exactly-once Semantics are Possible: Here’s How Kafka Does it reddit nope

Providing API for building applications that have transactions and help with idempotent producing is really exciting. This will help lower a lot of pains associated with building stream processing systems. Doing it with very little performance overhead is amazing.


Indeed. Idempotent operations is the mainstay of managing "at least once" message delivery. If you had a true "exactly once" system, idempotency would be unnecessary.

Manage Kubernetes Clusters on AWS Using Kops
Docker & AWS

The problem with renting boxes is the hidden costs if you want to do it right. First of all, if you have anything mission critical, you need to run it in a high availability config, this is easy for stateless microservices, but when it comes to running your DB, you start renting three boxes instead of one or two and configuring them accordingly. And then you setup your Backup Infrastructure for disaster recovery, Glacier needs a replacement after all. No problem, just more disks(?) on a few more boxes(?) and bacula(?), better in a different Datacenter just to be on the safe side, it would be nasty if you whole rack gets fried and your data with it. Don't forget to backup your configuration, all of it. Loadbalancers, Server Environment Variables, Network (do you have an internal DNS?), Crontabs, some businesses need their audit logs stored etc...  On the infrastructure level there is lots and lots of stuff you can do and you won't ever really need AWS, you'll just spend significantly more time finding and administering the right solutions than just using the AWS Solutions where you'll find a treasure trove of great tutorials and can relatively cheaply pay for support.  If you then pay someone on top for 24/7 management/monitoring of your dedicated stack so that your team doesn't have to get up at 3 am because one of your VMs disk fills because some stray logfile is filling the disc, many of the savings you had by setting it up on a dedicated server go out of the window because the management partner needs to train their people to look into your infrastructure. AWS only Management Partners are just light-years cheaper because they can streamline their processes much better.  You could also hire your own team of admins...  Sure AWS is a beast with its own surprises, but overall the cost/benefit ratio is still very fair even if you factor in all the "surprises"(many of which your management partner will probably know about). Having layered support is really something beneficial aswell.  If something is wonky with RDS, you get to call your management partner if he didn't detect it before you, who if he can't tackle it himself can call AWS technicians. This gets you much much further than you would get elsewhere. The outside the world is paying for (for example) perconas consultants or someone similar if the problems grow over their team's head.  Sure, at some point in a companies growth, depending on how technical the operation is, there might be a time where an admin team and colocation/dedicated boxes make sense, where AWS technicians will scratch their heads etc., especially if you have some very very specific tasks you need to do.  But for most people this is far off if ever.

Serializability, linearizability, and locality

Recall that strict serializability is essentially serializability plus linearizability’s real-time constraint: transactions cannot be arbitrarily re-ordered, but must appear to take place atomically at some time between their invocation and completion. When we add real-time constraints to sequential consistency, we get linearizability: a local property. Why can’t we add real-time constraints to serializability and obtain locality? Why don’t real-time multi-object transactions compose?


We can view strict serializability as linearizability plus the multi-object transactions of serializability. But in another sense, linearizability is strict serializability with the constraint that transactions are constrained to act on a single object, because that restriction provides locality.

Visualizing Linearizability
Linearizability: A Correctness Condition for Concurrent Objects (1990)

Sequential consistency is equivalent to linearizability without condition L2. Serializability is analogous to sequential consistency with transactions, and strict serializability is analogous to linearizability with transactions. Sequential consistency, serializability, and strict serializability do not have the same locality and non-blocking properties as linearizability. Moreover, serializability and linearizability are for different domains. Serializability works well for databases because application developers should be able to easily express complex transactions. Linearizability is better for infrastructure in which the developer is willing to spend considerable effort to maximize concurrency.

Consensus Protocols: Two-Phase Commit
Consensus Protocols: Three-Phase Commit
Consensus Protocols: Paxos

3PC works very well when nodes may crash and come to a halt – leaving the protocol permanently when they encounter a fault. This is called the fail-stop fault model, and certainly describes a number of failures that we see every day. However, especially in networked systems, this isn’t the only way in which nodes crash. They may instead, upon encountering a fault, crash and then recover from the fault, to begin executing happily from the point that they left off (remember that, with stable storage that persists between crashes, there’s no reason that a restarted node couldn’t simply pick up the protocol from where it crashed). This is the fail-recover fault model, which is more general than fail-stop, and therefore a bit more tricky to deal with.


Similarly, heavy network load can delay the delivery of a message for an arbitrary period. In a synchronous network model, there is a bound on the amount of time a remote host can take to process and respond to a message. In an asynchronous model, no such bound exists. The key problem that asynchronicity causes is that time outs can no longer be reliably used as a proxy for failure detection


There are other ways that Paxos can go wrong. Acceptors need to keep a record of the highest proposal they have agreed to, and the value of any proposals they have accepted, in stable storage. If that storage should crash then the acceptor cannot take part in the protocol for fear of corrupting the execution (by effectively lying about the proposals it has already seen). This is a kind of Byzantine failure, where an acceptor deviates from the protocol.


Rather than sacrifice correctness, which some variants (including the one I described) of 3PC do, Paxos sacrifices liveness, i.e. guaranteed termination, when the network is behaving asynchronously and terminates only when synchronicity returns.

In distributed systems, what is a simple explanation of the Paxos algorithm?

3PC is fail-stop resilient, but not fail-recover resilient. Unfortunately real life requires fail-recover and hence we need a more general solution. This is where Paxos comes in.


What happens if we mandate only one Leader at a time in Paxos, and also mandate that instead of majority, all nodes must vote? You are right – we get 2PC. 2PC is a specific case of Paxos.

Byzantine faults, Paxos, and consensus

Paxos is in a sense sort of like RSA. RSA is expensive, so you use it as a wrapper to a cheaper protocol.
With Paxos you elect a leader to be in charge of a large-scale event. The coordinator is a single point of
failure, so you can use paxos to come up with a new coordinator. Most of the time you don’t run paxos, you
only run it when something bad happens to prevent any one thing to be a single point of failure.

Version vector
Why is memory reclamation so important? HN
You Can’t Sacrifice Partition Tolerance
Fear and Loathing in lock-free programming
Google's Cloud Spanner: how does it stack up?

Spanner was originally built by Google to handle workloads like AdWords and Google Play, that were, according to Google, previously running on massive, manually sharded MySQL implementations

3PC. 2PC, Paxos SO question

So Paxos is a better, more general version of 2PC, and unlike 3PC, is has been proved correct?

A Comparison of Advanced, Modern Cloud Databases

Concurrent ACID: Whether the database supports ACID (atomicity, consistency, isolation, and durability) guarantees across multiple operations. ACID is a powerful tool for system correctness, and until recently has been a long sought but illusive chimera for distributed databases. I use the term “concurrent ACID” because technically Cosmos guarantees ACID, but only within the confines of a single operation.


modern techniques can achieve CP while still keeping availability that’s incredibly good. Like five or more 9s of good. This result is so optimal that modern databases seem to be converging on it. Every database on the list above is CP with varying levels of A


Time-based consistency. Sophisticated distributed systems like Spanner and CockroachDB tend to need a little more time to coordinate and verify the accuracy of the results that will be returned from any given node, and this makes them less suitable for low latency operations.

Spanner, TrueTime and the CAP Theorem HN
Why you should pick strong consistency, whenever possible
Paxos derived
distributed systems reading list
A History of Transaction Histories
Data Laced with History: Causal Trees & Operational CRDTs
another collection of links
Standing on Distributed Shoulders of Giants
Tweets about distributed transactions
The Four Most Expensive Words in the English Language Proofs of Space
tweet about consensus
The Horrors of Upgrading Etcd Beneath Kubernetes HN
Complex Event Flows in Distributed Systems
An Illustrated Proof of the CAP Theorem hn
Two Generals and Time Machines
Augmenting Agile with Formal Methods
1 like = 1 distributed systems paper
The many faces of consistency
Diagnosing A Weak Memory Ordering Bug
Mind Your State for Your State of Mind at another blog
Distributed Agreement on Random Order
Exploring Stretch Clusters for Red Hat OpenShift Dedicated

New Multi-AZ clusters can be deployed to AWS Regions that have at least three AZs. This allows for the control plane to be distributed with one node in each AZ (one master and one infra node in each AZ). In the event of an AZ outage, etcd quorum is not lost and the cluster can continue to operate normally.

Partitioned consensus and its impact on Spanner’s latency. HN.
An Evaluation of the Advantages and Disadvantages of Deterministic Database Systems.
TLA is hard to learn
Introduction to TLA+ Model Checking in the Command Line. HN.
Time to Move on from Two Phase Commit. HN.
Reliable Microservices Data Exchange With the Outbox Pattern.
session types hn
A Critical Look at Event-Driven Systems
history of protocols distributed systems
Toward Domain-Specific Solvers for Distributed Consistency
Demystifying Database Systems: An Introduction to Transaction Isolation Levels. more on isolation levels. Read committed Snapshot VS Snapshot Isolation Level. follow up.
What Write Skew Looks Like
Real Transactions are Serializable
Lamport about Practical TLA+ hn
Building Robust Systems With ACID and Constraints.
A way to do atomic writes
Using Randomized Communication for Robust, Scalable Systems
Panel: The Promises and Perils of Eschewing Distributed Coordination
An explanation of the difference between Isolation levels vs. Consistency levels HN
Distributed consensus reading list great resource!
partitions and stuff
Gray Failure: The Achilles Heel of Cloud-Scale Systems
Pulsar vs. Kafka. blog. blog
Gandalf: An Intelligent, End-To-End Analytics Service for Safe Deployment in Cloud-Scale Infrastructure
Atomic Replication Changes in etcd/Raft
what papers to read
Foundational distributed systems papers
a primer on memory consistency and cache coherence
PACELC theorem
Slides

Distributed systems course
Distributed Systems, Failures, and Consensus
another one
CIS 505: Software Systems Lecture Note on Consensus
map of consistency models tweet
How to scale a distributed system

Symmetry breaking (with leader election as an example)


Small control plane, big data plane


Health Checks and Graceful Degradation in Distributed Systems Envoy Service Mesh Case Study: Mitigating Cascading Failure at Lyft


Rate limiting and circuit breaking based on static thresholds and limits can prove to be error-prone and brittle from both correctness and scalability standpoints.

NewSQL databases fail to guarantee consistency and I blame Spanner. HN.
Consensus Systems with Ethan Buchman
Help! I Accidentally Distributed My System!.
Why Logical Clocks are Easy.
Using TLA+ to Understand Xen Vchan
Modern dining philosophers
mapConcurrently-alike with concurrency limit, and ordered job start.
go concurrency bugs
TLA+ material
transaction systems slides
Courses

Principles of Distributed Computing (lecture collection)
Another course
Distributed Systems W4995-1 Fall 2014 Roxana Geambasu lecture 17

3PC trades safety for liveness, 2PC trades liveness for safety. Paxos is "largely live" and blocks only on exceptional circunstances

Notes on Theory of Distributed Systems CPSC 465/565: Fall 2017
Languages and Abstractions for Distributed Programming CMPS290S, Fall 2018.
distsys class
Correctness Anomalies Under Serializable Isolation
ceph evolution
formal foundations of serverless computing
the huge costs of coordination in our systems
CSE138, Spring 2021
Books

Replication theory and practice


Consistency Models for Replicated Data


Replication techniques for availability


Modular Approach to Replication for Availability


Stumbling over Consensus Research: Misunderstandings and Issues


Replicating for Performance: Case Studies


A History of the Virtual Synchrony Replication Model


From Viewstamped Replication to Byzantine Fault Tolerance


Implementing Trustworthy Services Using Replicated State Machines


State Machine Replication with Byzantine Faults


Selected Results from the Latest Decade of Quorum Systems Research


From Object Replication to Database Replication


Database Replication: A Tutoria


Practical Database Replication


An introduction to distributed systems


when I started designing languages


Emily Short - Five Strategies For Collaborating With A Machine [PROCJAM 2016]


Distributed systems theory for the distributed systems engineer


An Overview of Distributed Systems and the Consensus Problem


queues are bad


learning to build distributed systems


composition in distributed systems


Videos

Our concurrent past, our distributed future
Distributed Systems vs Compositionality—Roland Kuhn
Consistency without consensus in production systems https://www.youtube.com/watch?v=lsKaNDj4TrE Martin Kleppmann - Conflict Resolution for Eventual Consistency GOTO 2015 • Coordination-Free Computations GOTO 2014 • How the Bitcoin Protocol Actually Works
Distributed Systems Theory for Practical Engineers
Applied Distributed Research in Apache Cassandra
Four Distributed Systems Architectural Patterns by Tim Berglund
What Came First: The Ordering of Events in Systems
Four Distributed Systems Architectural Patterns by Tim Berglund

[17:00] If you are a service, like Slack, which is composed of a series of organizations which are not as big...

Distributed Systems in One Lesson by Tim Berglund
Think Twice before Dropping ACID
The Practice and Theory of TLA+ more more
Consensus: Why Can't We All Just Agree?
The Future of Distributed Databases Is Relational
Streaming SQL to Unify Batch & Stream Processing w/ Apache Flink
Cluster Consensus: When Aeron Met Raft
CRDTs and Distributed Consensus
How to Build Observable Distributed Systems
Jepsen 7: Anna Concurrenina by Kyle Kingsbury
JavaZone talk: Transactions and Concurrency Control Patterns
cool linearizability diagram
Help! I Accidentally Distributed My System!.
A Raft implementation in Haskell.
Everything about distributed systems is terrible | Code Mesh LDN 18.
Apache Zookeeper As A Building Block For Distributed Systems.
Think local.
VIDEO: Consensus algorithms, Paxos and Raft.
Apache Zookeeper As A Building Block For Distributed Systems with Patrick Hunt - Episode 59.
High reliability infrastructure migrations
Time, Clocks and Ordering of Events in a Dist. System by Dan Rubenstein [PWL NYC]
Decoding Distributed Systems
Building A "Simple" Distributed System - Formal Verification.
What happens if the server dies after db commit but before saving event to kafka?.
Paxos tweet.
Serializability vs “Strict” Serializability: The Dirty Secret of Database Isolation Levels.  HN.
Impossibility of Distributed Consensus with One Faulty Process.
towards language support for distributed systems.
Designing Distributed Systems with TLA+
Lecture 14: Optimistic Concurrency Control and Snapshot Isolation
Patterns for Decoupling in Distributed Systems: Summary Event
Distributed Systems Engineering with Apache Kafka
Correctness Anomalies Under Serializable Isolation
data condition vs. data race
Transactional Outbox pattern - A piece of the eventual consistency puzzle.
Those Who Forget the Past Are Doomed to Repeat It
Strange Loop 2019:  "Correctness proofs of distributed systems with Isabelle" by Martin Kleppmann
Distributed consensus revised

Distributed consensus, the ability to reach agreement in the face of failures and asynchrony, is a fundamental and powerful primitive for constructing reliable distributed systems from unreliable components. For over two decades, the Paxos algorithm has been synonymous with distributed consensus. Paxos is widely deployed in production systems, yet it is poorly understood and it proves to be heavyweight, unscalable and unreliable in practice.

simulation
flexible paxos tweet
mergeable replicated datatypes
Designing Distributed Cache - Part I
Correctness proofs of distributed systems with Isabelle
Consistency in Non-Transactional Distributed Storage Systems
CS 144: Introduction to Computer Networking, Fall 2019
My Distributed Systems Seminar's reading list for Spring 2020
failure detectors
Best of 2019 in Tech Talks
Jepsen: Etcd 3.4.3
A walkthrough tutorial of TLA+ and its tools: analyzing a blocking queue
PigPaxos: Devouring the communication bottlenecks in distributed consensus
AVOID REUSABILITY ACROSS PROCESS BOUNDARIES
distributed id generation
Friday afternoon distributed systems thread: quorums and latency
Paxos vs. Raft
debugging distributed systems
Retries in distributed systems: good and bad parts
"A fault-tolerance shim for serverless computing"
CRDTs: The Hard Parts [video]
eventual consistency isn't for streaming
How to Build a Highly Available System Using Consensus

One of many timeless works from Butler Lampson. The patterns are still widely in use to this day. The trade offs between lease times and availability are so clearly put, reading this could save a lot of hard earned lessons.

a review of consensus protocols
How you could have come up with Paxos yourself
Helios

Helios is Microsoft's high speed stream ingestion and indexing system, but it's more than just that - it's also a reference architecture for their next-generation big data systems

Notes on Paxos
Advanced Join Patterns for the Actor Model Based on CEP Techniques
Structured Concurrency

That's the easy part and typically not entirely what you want.

https://vorpus.org/blog/notes-on-structured-concurrency-or-go-statement-considered-harmful/
https://news.ycombinator.com/item?id=25061901
Fairness in multi-tenant systems
Fast Flexible Paxos
S3 strong consistency

I had to work around the issue by storing a high-water mark in the config ID, and then poll an external system to find the expected high water mark before I knew I could safely read and update.

distributed systems reading list
a list of distributed transaction protocols
scalability comes from avoiding coordination
1. No shared memory. 2. No shared clock.
Anna: A KVS for Any Scale
Data laced with history
too much concurrency & lock trashing
Paxos & Raft
We found and fixed a rare race condition in our session handling
how to test a databse - simulation testing. testing distributed systems
	Internal Consistency in Streaming Systems 
Tail Latency Might Matter More Than You Think
Metastable Failures in Distributed Systems
Pipelining in TypedProtocols
Lamport Clock
Getting To Know Logical Clocks By Implementing Them
Scalable but Wasteful or Why Fast Replication Protocols are Actually Slow
The Fundamental Mechanism of Scaling

  
## _Databases.md

      
    Raw
  

              _Databases.md
            
          
    Videos

Modern SQL algorithms
Transactions and Concurrency Control Patterns by Vlad Mihalcea

[in MVCC] readers block readers to prevent dirty writes


Hibernate provides many application-level concurrency control strategies.

High-performance Hibernate
The Bits of Postgres You Haven't Found Yet
Transactions: myths, surprises and opportunities
distributed transactions lecture
CMU Advanced Database Systems - 17 Cost Models (Spring 2018) 2016 2017 2018
modern sql

I must admit to personally seeing the primary benefit of ORMs being for CRUD operations, namely when wiring entities and their relationships up to create / edit forms. And then lesser so on more simplistic (application-level) views using those entities where the query is real straightforward. When needing more complicated queries to get at a set of entities, or fast access to a subset of attributes, I prefer handrolled SQL as opposed to leaky abstraction DSLs atop SQL and / or in-language constructs like SQLAlchemy's expression syntax for static queries.


That said, when it comes to non-uniform (say, form-driven queries), SQLAlchemy's expression syntax is much easier to work with and much more flexible than building query strings up by hand.

The Mother of all Query Languages: SQL in Modern Times.
6 Technical Challenges Developing a Distributed SQL Database. .
Presto summit recap
Slides

Row Pattern Matching in SQL:2016
Pagination Done the Right Way
SQL Transactions - What they are good for and how they work
Linearizability
Implementing Dragon, Uber's data-integration tool for property graphs. Algebraic Property Graphs. data warehousing with cql. What is Enterprise Category Theory. Data Exchange Podcast (Episode 67): Ryan Wisnesky – Co-Founder of Conexus. Fast Left-Kan Extensions Using The Chase. Informal Data Transformation Considered Harmful. Informal Data Transformation Considered Harmful. Compositional Models for Power Systems. Algebraic Property Graphs. Categorical Data Integration for Computational Science. Dragon: Schema Integration at Uber Scale. meetup link. Categories for (Big) Data models and optimization. another meetup with more links
How Database Cursors Help in Fetching Large Result sets from your SQL. Best Practices Working with Billion-row Tables in Databases
. Redis In-Memory Database Crash Course.
Papers

Serializable Snapshot Isolation in PostgreSQL
An Optimality Theory of Concurrency Control for Databases

Granted, serializability is (more or less) the most general means of maintaining database correctness. In what’s becoming one of my favorite “underground” (i.e., relatively poorly-cited) references, H.T. Kung and Christos Papadimitriou dropped a paper in SIGMOD 1979 on “An Optimality Theory of Concurrency Control for Databases.” In it, they essentially show that, if all you have are transactions’ syntactic modifications to database state (e.g., read and write) and no information about application logic, serializability is, in some sense, “optimal”: in effect, a schedule that is not serializable might modify the database state in a way that produces inconsistency for some (arbitrary) notion of correctness that is not known to the database.

Serializability of concurrent database updates
Serializable Isolation for Snapshot Databases
Highly Available Transactions: Virtues and Limitations
A Critique of ANSI SQL Isolation Levels
Why do computer systems stop?
Spanner: Becoming a SQL System hackernews
Amazon Aurora: Design Considerations for High Throughput Cloud-Native Relational Databases
saner event sourcing HN

I have worked on, or cleaned up, 4 different CQRS/ES projects. They have all failed. Each time the people leading the project and championing the architecture were smart, capable, technically adept folks, but they couldn't make it work.
There's more than one flavor of this particular arch, but Event Sourcing in general is simply not very useful for most projects. I'm sure there are use cases where it shines, but I have a hard time thinking of any. Versioning events, projection, reporting, maintenance, administration, dealing with failures, debugging, etc etc are all more challenging than with a traditional approach.

D. Richard Hipp - SQLite - CMU Fall 2015 earlier
Modern SQL
Google Spanner podcast
Posts

Oracle serializable is not serializable
Quantifying isolation anomalies
Distributed Consistency and Session Anomalies

In the database systems community, the gold standard is serializability. [...] In the distributed systems community, the gold standard is linearizability.


Both serializability and linearizability require (potentially expensive) coordination. Therefore just as we consider weaker isolation levels than serializable, we can also consider weaker consistency levels than linearizable. At weaker consistency levels non-linearizable phenomena (anomalies) can be observed.


Serialisability is not stronger than SI or causal consistency. They are mutually incomparable. Serialisability allows the folowing anomaly w.r.t. Causality: user performs transaction T1 followed by T2; database executes them in the order T2 followed by T1. SI allows the write-skew anomaly, which SER disallows. The Strict Serialisability model is the only one stronger than both SI and CC (and Serialisability for that matter).

The many faces of consistency
A Beginner’s Guide to the True Order of SQL Operations
A beginner’s guide to database locking and the lost update phenomena
Quantified Comparison Predicates – Some of SQL’s Rarest Species
NOT IN vs. NOT EXISTS vs. LEFT JOIN / IS NULL: MySQL
A True SQL Gem You Didn’t Know Yet: The EVERY() Aggregate Function
serializable (for Oracle)
A beginner’s guide to read and write skew phenomena
A beginner’s guide to the Phantom Read anomaly, and how it differs between 2PL and MVCC
group commit
info on correlated subqueries correlated correlated vs non-correlated in from
semijoins Semi Join and Anti Join Should Have Their Own Syntax in SQL
Non-semi-join Subquery Optimizations in MariaDB mysql Optimizing Subqueries with the EXISTS Strategy
What is semi-join in database?
lateral join
How to Calculate Multiple Aggregate Functions in a Single Query
The Sakila Database
"The key, all the key, and nothing but the key"
UPSERT MERGE
SQL Select only rows with Max Value on a Column greatest n per group
Mode Analytics on SQL Window Functions
transaction-log at DBA stack exchange
PostgreSQL anti-patterns: read-modify-write cycles HN discussion more on race conditions more more more more more on DBA stack exchange Do database transactions prevent race conditions?

Optimistic concurrency control (or “optimistic locking”) is usually implemented as an application-side method for handling concurrency, often by object relational mapping tools like Hibernate.


In this scheme, all tables have a version column or last-updated timestamp, and all updates have an extra WHERE clause entry that checks to make sure the version column hasn’t changed since the row was read. The application checks to see if any rows were affected by the UPDATE and if none were affected, treats it as an error and aborts the transaction.


Unlike SERIALIZABLE isolation, it works even in autocommit mode or if the statements are in separate transactions. For this reason it’s often a good choice for web applications that might have very long user “think time” pauses or where clients might just vanish mid-session, as it doesn’t need long-running transactions that can cause performance problems.


A key feature of this approach is that it lets you span work across multiple transactions, which is a big plus with systems that have high user counts and may have long delays between interactions with any given user.


Rather than using locking and/or high isolation levels, it's common for ORMs like Hibernate, EclipseLink, etc to use optimistic concurrency control (often called "optimistic locking") to overcome the limitations of weaker isolation levels while preserving performance.


Even with strong isolation some systems (like PostgreSQL) will abort conflicting transactions, rather than making them wait and running them serially. Your app must remember what it was doing and re-try the transaction. So while the transaction has prevented concurrency-related anomalies from being stored to the DB, it's done so in a manner that is not transparent to the application.


Some RDBMSs have stronger abilities than others. For example, PostgreSQL 9.2 and newer have quite good SERIALIZABLE isolation that detects most (but not all) possible interactions between transactions and aborts all but one of the conflicting transactions. So it can run transactions in parallel quite safely.

select .. for update more more
Clustered and Nonclustered Indexes Described
PostgreSQL doesn't have the concept of clustered indexes at all. Instead, all tables are heap tables and all indexes are non-clustered indexes.
Slow Postgres updates if OTHER columns are indexed? Efficient Use of PostgreSQL Indexes

When this happens, a sequential scan is actually most likely much faster than an index scan, so the query planner has in fact correctly judged that the cost of performing the query that way is lower.


Postgres will decide to perform a sequential scan on any query that will hit a significant portion of a table. If you do have an index on that column, it will be a dead index that’s never used - and indexes are not free: they come at a cost in terms of storage and maintenance.

Batch Writing, and Dynamic vs Parametrized SQL, how well does your database perform?
Why Use Postgres (Updated for Last 5 Years)

Upsert was a work in progress for several years. It was one of those features that most people hacked around with CTEs, but that could create race conditions.

you can create constraints to ensure only unique ranges in a table
Use row constructors to compare several values at once
Do you need a database transaction for reading data?
rank in mysql
Understanding Weak Isolation Is a Serious Problem
SQL/MED Hacker News
UUID or GUID as Primary Keys? Be Careful! hacker news
Testing PostgreSQL for Fun
Postgres Transactions Aren’t Fully Isolated (clickbaity)
Is my implementation of type/subtype design pattern (for mutually exclusive subclasses) correct?
Inventory management in MongoDB: A design philosophy I find baffling HN
How do you update your production codebase/database schema without causing downtime?
How to handle unexpected schema changes to production database
Zero Downtime Deployment with a Database
Customizing My Postgres Shell
Practical Guide to SQL Transaction Isolation
Falsehoods programmers believe about geography
Using ENUM vs other types
JOIN Elimination: An Essential Optimiser Feature for Advanced SQL Usage
The best way to soft delete with Hibernate
Deferrable Constraints in SQL Server
How to update only a subset of entity attributes using JPA and Hibernate
Hibernate: save, persist, update, merge, saveOrUpdate
Database Decay and What To Do About It
10 Cool SQL Optimisations That do not Depend on the Cost Model
Storing JSON in database vs. having a new column for each key
sql optimization & performance on SO.
Retrieving n rows per group
apply so

Note: Informix 12.10 xC2+ has Lateral Derived Tables and Postgresql (9.3+) has Lateral Subqueries which can be used to a similar effect.

lateral clauses in SQL

SQL;2003 allows a subquery in the from clause that is prefixed by lateral key word to access the attributes of preceding tables or sub queries in the from clause. Example SQL is given in the question.  Without the lateral clause, the sub query cannot access the correlation variable I1 from the outer query. Currently only few SQL implementations such as IBM DB2 support the lateral clause.

Optimize GROUP BY query to retrieve latest record per user

There are things that a LATERAL join can do, but a (correlated) subquery cannot (easily). A correlated subquery can only return a single value, not multiple columns and not multiple rows

Correlated subquery

The effect of correlated subqueries can in some cases be obtained using joins.


Correlated subqueries may appear elsewhere besides the WHERE clause

Taking a look at CROSS APPLY

I think the easiest way to think of CROSS APPLY is that it is like doing a CROSS JOIN with a correlated sub-query instead of a derived table.  Let's see if I can explain that .... A derived table is "self-contained", in that all tables and columns in the parent SELECT are not accessible (though variables and parameters can be referenced).

Subqueries

A subquery can represent a fixed value, a correlated value, or a list of values. You can use any number of subqueries. You can also nest subqueries inside other subqueries.

lateral lateral as alternative to group by more lateral even more lateral yet even more lateral and more on JOOQ
10 Common Mistakes Java Developers Make when Writing SQL

If you’re UPSERTING by chaining INSERT and UPDATE or by chaining SELECT .. FOR UPDATE and then INSERT or UPDATE, think again. Apart from risking race conditions, you might be able to express a simpler MERGE statement.

Managing concurrency when using SELECT-UPDATE pattern SQL Server 2012 potential race condition on select or insert PostgreSQL anti-patterns: read-modify-write cycles Handling race conditions in PostgreSQL


While SERIALIZABLE transaction isolation is a clean solution, it is also rather expensive.

A (Probably Incomplete) Comprehensive Guide to the Many Different Ways to JOIN Tables in SQL

This of course has nothing to do with relational algebra anymore, because it imposes a JOIN order (from left to right). But sometimes, that’s OK and sometimes, your table-valued function (or subquery) is so complex, that’s the only way you can actually use it.

Postgres LATERAL JOIN follow-up

other options: Use GROUP BY in a subquery to get the latest inspection date, then use that date to get the rest of the inspection info. / Use a window function. / Use DISTINCT ON.


Results for me take about 25ms. I’ve gotta say I love window functions, but it looks like in this case at least lateral join is a better choice.

What is the difference between LATERAL and a subquery in PostgreSQL? [https://stackoverflow.com/questions/1139160/when-should-i-use-cross-apply-over-inner-join]

For returning more than one column, a LATERAL join is typically simpler, cleaner and faster.


CROSS APPLY works better on things that have no simple JOIN condition.


When there is a LIMIT in the subquery.


You could probably do something like that using CTE's and window function [...]  but this is less readable and probably less efficient.


See the end of Ariel's link. A row_number() query is just as nice and doesn't even require a join. So I don't think I should use cross apply for this situation (select top 3, partition by t1.id).

What's the Hi/Lo algorithm?
Managing database schemas without downtime HN

We're using FlywayDB [0] and many of the ideas in Martin Fowler's Evolutionary DB article [1]. When database changes are breaking, we use a pre-deploy and post-deploy "pair" to create an intermediate DB state that both application versions will run against.

Evolutionary Database Design
How do UPSERT and MERGE work in Oracle, SQL Server, PostgreSQL and MySQL
A beginner's guide to database locking and the lost update phenomena
Volcano Iterator Model
Encapsulation of parallelism in the Volcano query processing system
Presto query optimizer at DataWorks Summit 2018
Big News In Databases — Summer 2018
A Simple Approach to Comparing Database Structures
The most sucky thing about SQL's 3VL is how it breaks != predicates.. NOT IN clause and NULL values.

it's tricky to remember that NULLs are always excluded from such results, except if checked explicitly with the IS NULL predicate.


This gets much worse when working with NOT IN, which is just != in disguise.

How does the GROUP BY clause manage the NULL values?
10 SQL tricks that you didn’t think were possible
Modern SQL: Evolution of a dinosaur
How to Use SQL UPDATE .. RETURNING to Run DML More Efficiently.
You can insert rows into many tables in one statement. video.
SQL Tricks And Tuning. Great resource!
How to Calculate Multiple Aggregate Functions in a Single Query. GROUPING SETS works in DB2.

Now, in the outer query, we’re using once COUNT(*), which simply counts all the rows regardless of any predicates in the CASE expressions. The other COUNT(expr) aggregate functions do something that surprisingly few people are aware of (yet a lot of people use this form “by accident”). They count only the number of non-NULL rows.

values — Create Rows out of Nothing.
EXPLAIN EXTENDED.

It builds a hash table on the outer query (the one that is probed). I repeat: it is the outer query that is hashed, not the inner one. Everything that needs to be returned in SELECT clause goes to the hash table

Questions About CUBE, ROLLUP and GROUPING SETs That You Were Too Shy to Ask.
Summarizing Data Using the GROUPING SETS Operator.
"Just because you are using @Hibernate, it does not mean you should not write SQL queries.".

Window Functions are the answer to many Query-related problems.


Hibernate is an alternative to JDBC, not to SQL. Otherwise, why do you think there’s a createNativeQuery? The example is fairly simple in terms of SQL, and you don’t need to make everything generic, especially since this kind of query is business use case-oriented anyway.

How to fetch entities multiple levels deep with Hibernate.
How to map table rows to columns using SQL PIVOT or CASE expressions.
Ah, the multiple-join trick!!!!
SQL Server UPSERT Patterns and Antipatterns.
What are the most common SQL anti-patterns?.
Query pagination with JPA and Hibernate.

The JPA query pagination is not limited to entity queries that return entities only. You can use it for DTO projections as well.


Markus Winand, who wrote the SQL Performance Explained book, advocates for Keyset pagination instead of Offset.

Entities or DTOs – When should you use which projection?. SO. The best way to map a projection query to a DTO (Data Transfer Object) with JPA and Hibernate.
How to Extract a Date Part in SQL.
How to Get NULLs Horribly Wrong in SQL Server.

As great as the NULLIF function is for handling certain situations, such as avoiding divide-by-zero errors

Consider Using [NOT] EXISTS Instead of [NOT] IN.
Is your query too complex for JPA and Hibernate?.
What the SQL?!? Lateral Joins. BULK NEAREST NEIGHBOR USING LATERAL JOINS. Reuse Calculations in the Same Query with Lateral Joins. lateral join pgcast. Optimise a LATERAL JOIN query on a big table.

In SQL, a lateral join can basically be seen as some sort of loop.


Reusing parts of computations has long been a wart in SQL. Let’s say you want to compute confidence intervals for your signup rate. You’ll have to use the standard error calculation twice: once for the upper bound, and once for the lower bound.


This is where lateral joins shine. Postgres’s latest join type solves this problem by letting you make the calculation in one join, and then reference the results of that calculation in subsequent joins.


Careful readers will notice we’re using a cartesian join. This is a join of the form from a, b, c, which is shorthand for from a join b on true join c on true. It creates a row for every possible combination of rows in the joined table. In our case, where each joined table computes a single value, this just has the effect of appending calculations and then reusing them for the next calculation.


The final version is almost as easy to read as a mathematical formula! We simply define a variable, use it in the next calculation, rinse and repeat.

GROUP BY + CASE statement.

An output column's name can be used to refer to the column's value in  ORDER BY and GROUP BY clauses, but not in the WHERE or HAVING clauses; there you must write out the expression instead.

Select first row in each GROUP BY group?. https://stackoverflow.com/a/25536748/1364288. Optimize groupwise maximum query. Query last N related rows per row.
FIRST_VALUE and LAST_VALUE Window Function Examples.
Naming unnamed columns.
Query drafting without table
https://use-the-index-luke.com/sql/myth-directory/dynamic-sql-is-slow
https://modern-sql.com/use-case/reduce-network-latency-for-insert
https://modern-sql.com/use-case/pivot
https://sqlperformance.com/2014/06/t-sql-queries/dirty-secrets-of-the-case-expression
https://jaxenter.com/10-sql-tricks-that-you-didnt-think-were-possible-125934.html
https://www.sqlteam.com/articles/how-to-use-group-by-with-distinct-aggregates-and-derived-tables
https://dba.stackexchange.com/questions/77130/how-do-i-select-data-with-a-case-statement-and-group-by
https://modern-sql.com/concept/null#aggregates
https://docs.microsoft.com/es-es/sql/connect/jdbc/using-sql-escape-sequences
http://media.datadirect.com/download/docs/slnk/devref/scalarfn.html
High-Performance Hibernate Tutorial.
How does a Recursive CTE run, line by line?. great answer. SQL Server: are the recursive CTE’s really set-based?. Average sum with start end end date. 7 Unfolding | Set Based Approach | Create Attendance Report. Recursive CTE SQL Get all Levels. Connection between codata and greatest fixed points. Recursive CTEs Explained. Re-Inventing the Recursive CTE.
[Oracle Database Express Edition (XE) Release 18.4.0.0.0 (18c) was released on October 19, 2018].(https://www.oracle.com/technetwork/database/database-technologies/express-edition/downloads/index.html)
The Data Geekery SQL masterclass. Day 1. Day 2.
More stuff on lateral joins, including comparisons with other approaches. When are they a valid alternative to window functions?

Note that thanks to the aggregation step “happening before” the windowing step, we can use aggregation functions inside of window functions.


In the old days when databases like Oracle didn’t really support LIMIT (or when using DB2, SQL Server, and Sybase, which support “LIMIT” but not “OFFSET”), people used to resort to using window functions.

basic CTEs.
The LATERAL join

This is a non-obvious use of LATERAL but one I use often. Since you can reference columns from other records in the query, you can use LATERAL to calculate values and then reuse them in the main SELECT statement. Otherwise, you would have to recalculate values for each usage of them in the SELECT statement.

How to Reduce Syntactic Overhead Using the SQL WINDOW Clause.
Been helping a customer tune their SQL.

many devs, even seasoned ones, don't think of pivoting in SQL.


we're still investigating if there could be more composite indexes rather than individual column indexes


using a "projection" (or just use SQL, e.g. through jOOQ) is almost always worth it if you don't really need the entity. Why? - data transfer size - memory consumption on client and server - join elimination

“Hidden” Efficiencies of Non-Partitioned Indexes on Partitioned Tables Part III.
How to merge entity collections with JPA and Hibernate.
SQL does not cease to amaze me.

6-digit daily customer sessions with roughly 1.5B SQL queries per day. Its Oracle database uses 4 cores only. The median number of operations in v$sql_plan is 50. The max is 2670.


The impressive thing here is how many execution plans are more complex than 50 operations. Some are monster SQL queries with tons of complex business logic encoded in them. Still no problemo for Oracle.

Random slowdown.
https://use-the-index-luke.com/blog/2011-07-16/planning-for-reuse
https://use-the-index-luke.com/sql/where-clause/obfuscation/smart-logic
https://use-the-index-luke.com/sql/where-clause/bind-parameters
https://use-the-index-luke.com/sql/myth-directory/dynamic-sql-is-slow
https://use-the-index-luke.com/sql/where-clause/bind-parameters
https://use-the-index-luke.com/sql/myth-directory
https://use-the-index-luke.com/sql/myth-directory/most-selective-first
https://use-the-index-luke.com/sql/myth-directory/dynamic-sql-is-slow
column histograms
https://docs.oracle.com/database/121/TGSQL/tgsql_histo.htm#TGSQL366
http://www.dba-oracle.com/art_otn_cbo_p4.htm
http://www.dba-oracle.com/art_builder_histo.htm

Important note:  If your database exclusively uses bind variables, Oracle recommends deleting any existing Oracle histograms and disabling Oracle histogram generation (method opt) for any future dbms_stats analysis.  This approach will use the number if distinct values to determine the selectivity of a column.


The central problem with cardinality estimation is the in cases of complex WHERE clauses the optimizer does not have enough information about inter-join result set sizes to determine the optimal table join order.


Column histograms should be created only when you have highly-skewed values in a column. This rarely occurs, and a common mistake that a DBA can make is the unnecessary collection of histograms in the statistics. Histograms tell the CBO when a column's values aren't distributed evenly, and the CBO will then use the literal value in the query's WHERE clause and compare it to the histogram statistics buckets.

Query Tuning Fundamentals: Density, Predicates, Selectivity, and Cardinality
Cost Control: Inside the Oracle Optimizer

A column histogram should only be created when we have data skew exists or is suspected. In the real world,   that happens rarely, and one of the most common mistakes with the optimizer is the unnecessary introduction of histograms into optimizer statistics

an overview of query optimization
Optimize SQL Server Queries with Index Statistics.

What happened? SQL Server will not auto-parameterize a query plan unless the optimal plan is the same for all possible values of the parameter. Now that there is an index, there are some regions, like Delaware, for which the optimal plan uses the index, and there are other regions, like California, for which using the index would be a poor choice.


For this index, the cutoff between using an index and not using an index was somewhere between 1.6 and 1.9% of the number of rows in the table. These numbers will vary for other indexes and other tables since they depend on more than just number of rows. The depth of the clustered index depends on things like the width of the index keys and does not scale linearly as the table grows in size. Nevertheless, the lesson is clear; once you are requesting more than a relatively small percentage of rows in a table, an index will probably not be useful.

13 Things You Should Know About Statistics and the Query Optimizer.
Getting the Most from Oracle's Cost-Based Optimizer.
bind variable peeking. more. Understanding SQL Query Parsing: Part 4 – Understanding Bind Variable Peeking. hard parse vs. soft parse.

We have seen that the optimizer does nonsense cardinality calculations in various cases by using bind variables

Planning For Reuse

Oracle uses bind peeking and introduced adaptive cursor sharing with 11g. SQL Server uses parameter sniffing and query hints. DB2 has the re-optimizing hint REOPT.


The hint based approaches will essentially disable execution plan re-use for the respective statement. Oracle's adaptive cursor sharing supports multiple execution plans per SQL string and learns from slow executions.

How to configure DB2 REOPT Property within Websphere Console?

REOPT can only be changed for static SQL but not dynamic SQL. Websphere uses the JDBC driver for connectivity to DB2. There are no JDBC driver properties which can be set to change the REOPT level.

Thomas Müller's (author of H2) answers on Stack Overflow.
Does one get ever used to writing JPA criteria query?. JPA Criteria vs JOOQ. jooq javadocs. jooq aliases.
Why Did We Shift Away From Database-Generated Ids?.
database migration example.
what if you delete without where
A Primer on Database Replication.
Indexing Group By
SQL GROUP BY and Functional Dependencies: A Very Useful Feature.
How to Find the Longest Consecutive Series of Events in SQL
Row pattern matching
Indexes: the neglected performance all-rounder
Support for LATERAL derived tables added to MySQL 8.0.14.
Rolling Retention Done Right in SQL
SQL in Haskell
How to log the database transaction id using MDC (Mapped Diagnostic Context).
sum types in SQL. so. more.
How to Calculate Multiple Aggregate Functions in a Single Query
The Cost of Useless Surrogate Keys in Relationship Tables
unreasonable effectiveness of SQL
let's build a simple database
Alternative to EAV for dynamic fields in a star schema data warehouse
Writing a simple bank schema: How should I keep my balances in sync with their transaction history?
CDC with Kafka
How does MVCC (Multi-Version Concurrency Control) work

The only use case that can still generate contention is when two concurrent transactions try to modify the same record since, once modified, a row is always locked until the transaction that modified this record either commits or rolls back.

How does database pessimistic locking interact with INSERT, UPDATE, and DELETE SQL statements

Once the database records are locked, no UPDATE statement can modify them, even on a MVCC database engine.

A beginner’s guide to Java Persistence locking

As previously explained, there are two types of explicit locking mechanisms: pessimistic (physical) and optimistic (logical). In this post, I’m going to explain how explicit pessimistic locking interacts with non-query DML statements (e.g. insert, update, and delete).


Once the database records are locked, no UPDATE statement can modify them, even on a MVCC database engine.

How to deadlock yourself in three simple steps very good!

(about UR) This is obviously not suitable for transactional applications, but can have perfect uses in the Business Intelligence or Analytics type applications.


Cursor Stability (CS) in which DB2, like RR and RS, ensures that an application with isolation level CS doesn’t read a row that another process has changed until that other process has released the row. Unlike RR, however, CS allows other applications to change rows that the application with isolation level CS has read. This provides for a high level of stability with a high level of concurrency and is the default isolation level.

FOR UPDATE WITH RS USE AND KEEP UPDATE LOCKS
Ways of Avoiding the Read-Modify-Write Anti-Pattern When Using Spring Data?
Optimistic locking automatic retry

You generally want to avoid situations when one user overrides changes made by another user without even looking at them.

How to retry JPA transactions after an OptimisticLockException
Optimistic locking in ws

Optimistic locking allows you to lower the isolation level that you use in an application so that fewer locks are placed on the database assets. It allows more applications to run concurrently against the database, and potentially increase the throughput of your applications.


To protect against potential database integrity problems brought on by lowering the isolation level, an optimistic locking implementation has to ensure that no integrity problems occur. The WebSphere Commerce implementation utilizes an optimistic predicate column for each table in the WebSphere Commerce database - OPTCOUNTER.


In this way, the implementation can guarantee that a row in the database has not changed between the time it was read and the time it was written, to protect the integrity of the database.

When to use select for update?
Locking in JBoss

When your application uses long transactions or conversations that span several database transactions, you can store versioning data, so that if the same entity is updated by two conversations, the last to commit changes is informed of the conflict, and does not override the other conversation's work. This approach guarantees some isolation, but scales well and works particularly well in Read-Often Write-Sometimes situations.


Hibernate provides two different mechanisms for storing versioning information, a dedicated version number or a timestamp.


Typically, you only need to specify an isolation level for the JDBC connections and let the database handle locking issues. If you do need to obtain exclusive pessimistic locks or re-obtain locks at the start of a new transaction, Hibernate gives you the tools you need.


WebSphere Commerce uses an optimistic locking scheme that allows the database transaction isolation level to be lowered from repeatable read to read committed. Using that scheme, database rows not normally accessed concurrently are not locked with an intent to update when they are read. Instead, when the update is eventually made, the row is checked to make sure it has not been updated concurrently since it was read. If it has been updated concurrently, then the transaction is rolled back and the command may be restarted from the beginning in a new transaction, if appropriate.

Isolation level: pessimistic vs. optimistic

Let me add to the confusion.  From an application perspective, optimistic concurrency can be implemented in the application layer even though the database engine uses pessimistic locking.  This is often done by checking to see if the row was changed since initially retrieved by comparing the before/current values (or just rowversion column) when performing updates and deletes. This technique allows a row to be retrived without holding long-term locks.

A beginner’s guide to database locking and the lost update phenomena. How to prevent lost updates in long conversations.
Data masking. wiki.
Well-known Databases Use Different Approaches for MVCC
MVCC Part 2: Pretty Pictures and Some Examples
skip locked
Entity-Attribute-Value (EAV) The Antipattern Too Great to Give-up. eav horror. EAV - is it really bad in all scenarios?. The Anti-Pattern – EAV(il) Database Design ?. Avoiding the EAV of Destruction. To EAV, or not to EAV? Choosing your data model. EAV is an anti-pattern?. Should I use EAV database design model or a lot of tables.

If you want to allow user-defined fields in a relational database, your realistic are either EAV or stuff json into a text column. EAV, if done extremely carefully, can be a good solution, but 99% of the time, it's going to be a huge pain.


Users could add attributes (columns) at any time without you being involved?


There are other approaches however. You could consider an XML field with extended data or, if you are on PostgreSQL 9.2, a JSON field (XML is easier to search though). This would give you a significantly larger range of possible searches without the headaches of EAV. The tradeoff would be that schema enforcement would be harder.

The JPA EntityManager createNativeQuery is a Magic Wand
table functions
Managing database schema changes without downtime. DPC2018: Zero Downtime Database Migrations and Deployments - Ondrej Mirtes. Managing db schema changes without downtime. Expand-contract.
More videos on database migrations https://www.youtube.com/watch?v=RUIUUZehGgI https://www.youtube.com/watch?v=hMO63IC6R7c https://www.youtube.com/watch?v=SAkNBiZzEX8 https://www.youtube.com/watch?v=CsGb3X7W9P0
What is faster, one big query or many small queries?.
time series database

Apache Druid. The killer for us is that we allow essentially ad-hoc querying over long time intervals, but require the results to be returned quickly. And the dataset, while not Google proportions, isn't small.

What is Druid?. How To Use Kafka and Druid to Tame Your Router Data
. Aggregations.

Approximate algorithms. Druid includes algorithms for approximate count-distinct, approximate ranking, and computation of approximate histograms and quantiles. These algorithms offer bounded memory usage and are often substantially faster than exact computations. For situations where accuracy is more important than speed, Druid also offers exact count-distinct and exact ranking.


Automatic summarization at ingest time. Druid optionally supports data summarization at ingestion time. This summarization partially pre-aggregates your data, and can lead to big costs savings and performance boosts.


In addition to stream processing, data needs to be stored in a redundant, operationally focused database to provide fast, reliable answers to critical questions. Together, Kafka and Druid work together to create such a pipeline.


Aggregations can be provided at ingestion time as part of the ingestion spec as a way of summarizing data before it enters Apache Druid (incubating). Aggregations can also be specified as part of many queries at query time.

Snapshot Isolation in SQL Server.

SQL Server introduced extensions to the SQL-92 isolation levels with the introduction of the SNAPSHOT isolation level and an additional implementation of READ COMMITTED.


A snapshot transaction always uses optimistic concurrency control, withholding any locks that would prevent other transactions from updating rows. If a snapshot transaction attempts to commit an update to a row that was changed after the transaction began, the transaction is rolled back, and an error is raised.

If Eventual Consistency Seems Hard, Wait Till You Try MVCC.

One of the logical necessities, for example, is that you can only modify the latest version of a row (eventually, at least). If you try to update an old version (the version contained in your consistent snapshot), you’re going to get into trouble.

Am I designing this database right?
To ORM or not to ORM?
modern SQL slides
How to expose Hibernate Statistics via JMX
How to store schema-less EAV (Entity-Attribute-Value) data using JSON and Hibernate
a tale of query optimization
Persisting Enums in JPA
LSM-based Storage Techniques: A Survey
Experiences with approximating queries in Microsoft’s production big-data clusters
FoundationDB Record Layer
Procella: unifying serving and analytical data at YouTube
Even more amazing papers at VLDB 2019
Notes on data modeling from Handbook of Relational Database Design
Just learn SQL
cool SQL queries you have written
Choosing a cloud DBMS: architectures and tradeoffs
Modern Data Practice and the SQL Tradition. HN
we do not use foreign keys
The next 50 years of databases. HN.
Why do databases cry at night?
The joy of database configuration
Throttling writes: LSM vs B-Tree
sqlite is serverless
tombstone removal in an LSM
Debezium at Google
What are some examples of good database schema designs?
Benchmarks vs real workloads
Tuning space and write amplification to minimize cost
How do you update multiple disparate databases?
https://use-the-index-luke.com/sql/myth-directory/most-selective-first
Nested Intervals Tree Encoding in SQL
write skews, external consistency, clock skews, database-generated IDs, nested transaction issues, caches & more. oops databases.
Things I Wished More Developers Knew About Databases hn

It is worth to also note that applications can handle a bit of inconsistency or programmers might have enough insights about the problem to add additional logic in the application to handle it without heavily relying on their database.

How to prevent lost updates in long conversations
Migrating a 40TB SQL Server Database
If All You Have Is a Database, Everything Looks Like a Nail. HN.
Database as a Queue. 9 yrs ago. 2013.
Many small queries are efficient in SQLite
Some opinionated thoughts on SQL databases
Is the JDBC ResultSet an application-level query cursor.
Data Cleaning IS Analysis, Not Grunt Work. Learning SQL 201: Optimizing Queries, Regardless of Platform.
Bitemporality: More Than a Design Pattern. Bitemporal History. temporal patterns
read from databases, write to APIs
the importance of idempotence for DataOps 2 3
sql & normalization
Common data model mistakes made by startups
database horror stories
One Database Transaction Too Many
Against SQL. hn.
Duoquest: A Dual-Specification System for Expressive SQL Queries

You can use as to name scalar values anywhere they appear. Except in a group by.

Cool Stuff in Snowflake – Part 4: Aliasing all the things

In Snowflake, you can re-use your aliases in other expressions and in the WHERE, GROUP BY and HAVING clause.

The Database Ruins All Good Ideas
Podcasts

Database Choices and Uber with Markus Winand
Courses

Introduction to databases
CMU 15-721: In-Memory Databases / Advanced Database Systems [video]
Books

Foundations of Databases A book published by Addison Wesley

  
## _DB2.md

      
    Raw
  

              _DB2.md
            
          
Manual / Version 11.1.0


IBM DB2 doesn't have a separate JSON datatype. It stores BSON BLOBs and provides conversion & manipulation functions. docs.


Schemas


A schema is a collection of named objects; it provides a way to group those objects logically. A schema is also a name qualifier; it provides a way to use the same natural name for several objects, and to prevent ambiguous references to those objects.


Data types


User-defined types


Storing LOBs inline in table rows


Time Travel Query using temporal tables


Creating tables


Shadow tables


Best practices


Questions at DBA Stack Exchange


Questions at Stack Overflow


Oracle and DB2 compared (by Oracle)


IBM DB2 supports the ON DELETE and ON UPDATE clauses in referential integrity constraints. Oracle only supports the ON DELETE clause in referential integrity constraints.


In IBM DB2 identity columns provide a way to automatically generate unique, sequential and recoverable values for each row in a table. [...] When you define it with the GENERATED ALWAYS command, the database manager always generates the values used.


DB2 vs Oracle on dates


Newbie differences between DB2 and Oracle


If you do not specify ORDER BY, the rows of the result table have an arbitrary order.


How to do SQL select top N … in AS400


Using Lateral Correlation To Define Expressions In DB2


Selecting values while inserting data


FROM FINAL TABLE


OLAP specification


The aggregation group of a given row is a set of rows that is defined in relation to the given row (in the ordering of the rows in the partition of the given row). window-aggregation-group-clause specifies the aggregation group. If this clause is not specified and a window-order-clause is also not specified, the aggregation group consists of all rows of the window partition. The aggregation group of all rows of the window partition can be explicitly specified using the RANGE or ROWS clauses.


If window-order-clause is specified, but window-aggregation-group-clause is not specified, the window aggregation group consists of all rows that precede a given row of the partition of the given row or all rows that are peers of the given row in the window ordering of the window partition that is defined by the window-order-clause


db2 lateral


CROSS APPLY (similar to a row-based INNER JOIN) and OUTER APPLY (similar to a row-based LEFT JOIN) are not supported in DB2 for i, although sometimes the same thing can be accomplished. For example, a table function that is participating in a join in DB2 can have the input parameters vary (say, from a column in another table) from row to row, whereas SQL Server table functions can't be used in this manner without using one of the APPLY operators. So in the case of table functions, the APPLY operators aren't even required in DB2.


Additionally, LATERAL correlation can be used to mimic the behavior of CROSS APPLY. However, I'm not aware of any way to use LATERAL to mimic OUTER APPLY.

db2 pdf docs for Windows and Linux SQL reference
How to Limit Query Results for DB2 Databases

Opensource databases such as MySQL and PostgreSQL supports non standard SELECT extension for limiting number of returned rows from query. You can see description in PostgreSQL manual. Look for keywords LIMIT and OFFSET. LIMIT specifies the maximum number of rows to return, while OFFSET specifies the number of rows to skip before starting to return rows. When both are specified, OFFSET rows are skipped before starting to count the LIMIT rows to be returned.


Returning TOP N rows (emulating LIMIT) from query is easy. This functionality is in every database but syntax is database specific. DB2 follows SQL2008 standard and syntax for TOP N rows in DB2 is SELECT * FROM T FETCH FIRST 10 ROWS ONLY.


Starting with DB2 9.7.2, LIMIT and OFFSET are supported in DB2

Seven Surprising Findings About DB2

Imagine you've already fetched a bunch of rows and need to get the next few ones. For that you'd use the time_stamp value of the last entry you've got for the bind value (?). The query then just return the rows from there on. But what if there are two rows with the very same time_stamp value? Then you need a tiebreaker: a second column—preferably a unique column—in the order by and where clauses that unambiguously marks the place till where you have the result.

perf tips 1. [perf tips 2])http://davebeulke.com/another-5-db2-sql-performance-tips/)

Since DB2 is the big data operational engine these days, creating prototype indexes on multi-billion row tables is impractical.  To quickly and easily prototype indexes on these large tables create Virtual Indexes that leverage the DB2 V10 INCLUDE Column feature, Index on Expression and any different index column combinations.

db2 big data tuning

The first and most important SQL tip in all the SQL tip blogs I have written, continues to be the most common Let the DB2 database engine do all or as much of the work as possible! Doing the work inside the DB2 database engine continues to be the best performance advantage because DB2 does it faster than your application code. By supplying as much WHERE clause criteria statements as possible and using the correct WHERE clause criteria, you achieve faster performance than executing a SQL statement and then applying application code for filtering, comparing and quantifying the SQL result set data.


When doing data qualification through the proper SQL WHERE clauses, the I/O of data from the database engine to the application program is minimized for the most efficient performance. Also, through DB2 and the processor chip technology, the SIMD operations can more efficiently skip over unneeded data. With database tables in the tens of billions of rows and SQL statements which need to be executed tens of millions of times to process daily transactions, every extra byte of data transferred can make a huge performance impact on overall I/Os, CPU and SQL performance.

How to do bitwise operations in DB2. SQL Tips for DB2 LUW. BITAND, BITANDNOT, BITOR, BITXOR, and BITNOT.

Use SMALLINT, INTEGER, BIGINT types?

(Aggregate Bitwise OR on a column)[https://www.sqlservercentral.com/Forums/Topic1627609-3077-1.aspx]

You can do it by breaking each permission down into a sum of powers of two.

IN predicate. values — Create Rows out of Nothing.

When a row-value-expression is specified, the IN predicate compares values with a collection of values.


expression1 IN (expressiona, expressionb, ...)

Examples of grouping sets, rollup, and cube queries.
group-by-clause.
GROUPING.

When used in conjunction with grouping-sets and super-groups, the GROUPING function returns a value that indicates if a row returned in a GROUP BY result is a row generated by a grouping set that excludes the column represented by expression.

Pivoting tables in DB2 (rows to columns)
https://www.ibm.com/developerworks/community/blogs/SQLTips4DB2LUW/entry/pivoting_tables56?lang=en
https://stackoverflow.com/questions/13579143/how-can-i-pivot-a-table-in-db2
https://stackoverflow.com/questions/47561974/pivoting-a-table-in-iseries-db2-dynamically
https://stackoverflow.com/questions/42492501/db2-pivot-rows-to-columns
https://stackoverflow.com/questions/26603519/db2-convert-rows-to-column
https://stackoverflow.com/questions/15529107/pivoting-in-db2

Creation of materialized query tables. paper. Creating and Editing Materialized Query Tables. Immediate refresh materialized views (MQT) with OUTER JOIN under DB2. An introduction to materialized query tables.


A materialized query table (MQT) is a table whose definition is based upon the result of a query. The data that is contained in an MQT is derived from one or more tables on which the materialized query table definition is based. Summary tables (or automatic summary tables, ASTs), which are familiar to IBM® DB2® Universal Database™ (UDB) for Linux, UNIX®, and Windows® (DB2 UDB) users, are considered to be a specialized type of MQT. The fullselect that is part of the definition of a summary table contains a GROUP BY clause summarizing data from the tables that are referenced in the fullselect.

Summary tables.

DB2® Warehouse uses DB2 summary tables to improve the performance of queries issued to cube models and cubes. A summary table is a special type of a materialized query table (MQT) that specifically includes summary data.
Because the Optimization Advisor always recommends MQTs with summarized data, the term summary table is used in the DB2 Warehouse documentation to describe the recommended MQTs.


You can complete expensive calculations and joins for your queries ahead of time and store that data in a summary table. When you run queries that can use the precomputed data, DB2 will reroute the queries to the summary table. A query does not need to match the precomputed calculations exactly. If you use simple analytics like SUM and COUNT, DB2 can dynamically aggregate the results from the precomputed data. Many different queries can be satisfied by one summary table. Using summary tables can dramatically improve query performance for queries that access commonly used data or that involve aggregated data over one or more dimensions or tables.

OLAP specification.
DB2 advanced OLAP.
LAG, LEAD, FIRST_VALUE, LAST_VALUE. more.
Distinct types.
Date arithmetic.
Using recursive queries. generating a range of dates.
Examples of recursive common table expressions. Recursive query in DB2 to get all items in the chain.
WEEK_ISO. WEEK. Fun with dates and times
. DAYOFWEEK_ISO.
Writing and tuning queries for optimal performance
. Writing SQL statements. Using constraints to improve query optimization. Improving insert performance. Understanding DB2 Query Access Plans.
Check query plan. DB2 LUW Execution Plan Operations. Paging Through Results.
Fetching a limited number of rows. The (unknown) benefits of FETCH FIRST in DB2 for z/OS SQL. Equivalent of LIMIT for DB2
. DB2 and PHP Best practices on IBM i. optimize-clause. Optimizing retrieval for a small set of rows.

OPTIMIZE FOR 100 ROWS tells the optimizer to “optimize” for that number. But if you specify OPTIMIZE FOR 100 ROWS, DB2 does not limit how many rows you can fetch


The optimize-clause tells Db2® to assume that the program does not intend to retrieve more than integer rows from the result table. Without this clause, Db2 assumes that all rows of the result table will be retrieved, unless the FETCH FIRST clause is specified. Optimizing for integer rows can improve performance. If this clause is omitted and the FETCH FIRST is specified, OPTIMIZE FOR integer ROWS is assumed, where integer is the value that is specified in the FETCH FIRST clause. Db2 will optimize the query based on the specified number of rows.

Spring Batch using Db2.
queryno-clause.

For correlating SQL statement text with EXPLAIN output in the plan table

https://www.ibm.com/blogs/cloud-computing/2018/05/08/ibm-red-hat-expand-partnership-cloud/
https://developer.ibm.com/recipes/tutorials/ibm-db2-on-ibm-cloud-private-with-redhat-openshift/
https://developer.ibm.com/articles/dm-1602-db2-docker-trs/
https://access.redhat.com/documentation/en-us/red_hat_jboss_bpm_suite/6.0/html/installation_guide/special_setup_for_ibm_db2_database
https://www.ibm.com/cloud/partners/ibm-redhat
https://access.redhat.com/solutions/3530941
https://blog.openshift.com/openshift-connecting-database-using-port-forwarding/
One of the benefits of OpenShift over a traditional Platform-as-a-Service (PaaS) is that you can have access to persistent volumes. This means you can attach storage to your web applications or run applications such as databases.
https://docs.openshift.com/container-platform/3.3/dev_guide/integrating_external_services.html
http://www.middlecon.se/wp-content/uploads/Data_Server_Day-Db2_Private_Cloud-2018-May-09.pdf
https://use-the-index-luke.com/sql/explain-plan/db2/filter-predicates
https://use-the-index-luke.com/sql/explain-plan/db2/operations
REOPT

The REOPT option specifies whether DB2® determines an access path at run time by using the values of host variables, parameter markers, and special registers.

Differences between static and dynamic SQL.

For static SQL statements that have input host variables, the time at which DB2 determines the access path depends on the REOPT bind option that you specify: REOPT(NONE) or REOPT(ALWAYS). REOPT(NONE) is the default. Do not specify REOPT(AUTO) or REOPT(ONCE); these options are applicable only to dynamic statements. DB2 ignores REOPT(ONCE) and REOPT(AUTO) for static SQL statements, because DB2 caches only dynamic SQL statements.

Reoptimizing SQL statements at run time.
Specifying optimization parameters at the statement level. DSN_USERQUERY_TABLE.

The unique identifier of the query, used to correlate with PLAN_TABLE rows for statement-level access paths.

DB2 How To Measure Transaction Log Write Time ?
dependent tables in DB2. post.
recursive sql in DB2
predicate manipulation
db2 isolation levels
know your isolation levels

If the program used an RR isolation level rather than CS, an UPDATE that occurs after the production of the first report but before the second would not have been allowed. The program would have maintained the locks it held from the generation of the first report and the updater would be locked out until the locks were released.

Currently committed semantics

The potential for deadlocks also varies with the isolation level.

A Close Look at the Index Include Clause.
The #IBM #Db2 Samples are now available on GitHub to share, improve and extend!

  
## _MySQL.md

      
    Raw
  

              _MySQL.md
            
          
See here to install mysql on ubuntu.


mysqladmin to create databases.


mysql -p -u username userdb to connect.


mysql -p -u username userdb < foo.sql to load a file


MySQL lacks the RANK() analytic function.


string functions


Splittin comma-separated values in MySQL


cursors in mysql cursor trouble continue handlers


substring


For all forms of SUBSTRING(), the position of the first character in the string from which the substring is to be extracted is reckoned as 1.


MySQL and CTEs


range of dates another solution with recursive cte, not posible in MySQL


MySQL infrastructure testing automation at GitHub


How to Deal with XA Transactions Recovery


MySQL vs. MariaDB: Reality Check


memory saturated mysql


example custom function

DELIMITER ;;

CREATE FUNCTION hello (s CHAR(20))
RETURNS CHAR(50) DETERMINISTIC
BEGIN
    RETURN CONCAT('Hello, ',s,'!');
END;;

DELIMITER ;


MySQL High Availability at GitHub hn


a kind of interesting tradeoff in distributed databases: using consensus to store your data, or using consensus to point at the master for the data.


Replicating data in consensus groups gives some nice guarantees, in particular it can guarantee monotonic read consistency even if you're not reading from the leader, at the cost of a network round-trip. Switching between master-slave replicas might cause you to go back and forth in time, even if replication is synchronous.


There are also some potential availability benefits to storing data in consensus groups, though most implementations use leader leases to improve performance, which reduces availability in case of leader failure. A master-slave system that does synchronous replication could do failovers that are just as fast as leader changes in a consensus protocol.


The pain of storing data in consensus groups is in performance, especially concurrency. Every write to the log requires some synchronization with a quorum to ensure consensus on the order. That introduces extra round-trips and makes it hard to handle high concurrency.

REPEATABLE-READ and READ-COMMITTED Transaction Isolation Levels
Deadlocks 101 – MySQL
choose something else

  
## _Oracle.md

      
    Raw
  

              _Oracle.md
            
          
    Data Concurrency and Consistency

  
## _PostgreSQL.md

      
    Raw
  

              _PostgreSQL.md
            
          
Customizing My Postgres Shell


Why favor PostgreSQL over MariaDB/MySQL?


Better Database Migrations in Postgres


Six ways to paginate in Postgres


Mastering PostgreSQL in Application Developement


What it means to be a postgresql extension


PostgreSQL anti-patterns: read-modify-write cycles


Isolation levels


Applications using this level must be prepared to retry transactions due to serialization failures.


The Serializable isolation level provides the strictest transaction isolation. This level emulates serial transaction execution, as if transactions had been executed one after another, serially, rather than concurrently. However, like the Repeatable Read level, applications using this level must be prepared to retry transactions due to serialization failures. In fact, this isolation level works exactly the same as Repeatable Read except that it monitors for conditions which could make execution of a concurrent set of serializable transactions behave in a manner inconsistent with all possible serial (one at a time) executions of those transactions. This monitoring does not introduce any blocking beyond that present in repeatable read, but there is some overhead to the monitoring, and detection of the conditions which could cause a serialization anomaly will trigger a serialization failure.


Declare transactions as READ ONLY when possible.


A-ha! It seems that SERIALIZABLE has race conditions into account.

PostgreSQL anti-patterns: read-modify-write cycles
PostgreSQL anti-patterns: Unnecessary json/hstore dynamic columns
Solving Schema Bloat with Postgresql's JSONB counterpoints
Postgres Indexes Under the Hood
The internals of Postgres
Parallel Index Scans In PostgreSQL
transaction isolation concurrency in Postgres slides Why is CTE open to lost updates? read-modify-update cycles

Predicate locks in PostgreSQL, like in most other database systems, are based on data actually accessed by a transaction. These will show up in the pg_locks system view with a mode of SIReadLock. The particular locks acquired during execution of a query will depend on the plan used by the query, and multiple finer-grained locks (e.g., tuple locks) may be combined into fewer coarser-grained locks (e.g., page locks) during the course of the transaction to prevent exhaustion of the memory used to track the locks.


The Repeatable Read mode provides a rigorous guarantee that each transaction sees a completely stable view of the database. However, this view will not necessarily always be consistent with some serial (one at a time) execution of concurrent transactions of the same level. For example, even a read only transaction at this level may see a control record updated to show that a batch has been completed but not see one of the detail records which is logically part of the batch because it read an earlier revision of the control record. Attempts to enforce business rules by transactions running at this isolation level are not likely to work correctly without careful use of explicit locks to block conflicting transactions.

Scalable incremental data aggregation on Postgres and Citus hn
postgres features
PostgreSQL 11’s Support for SQL Standard GROUPS and EXCLUDE Window Function Clauses
Have you ever chosen Postgres over Mongo and regretted it?
Batch Updates and Concurrency
Fast Full-Text Search in PostgreSQL
psql Tips and Tricks
What is the difference between LATERAL and a subquery in PostgreSQL?.
PostgreSQL 11’s Support for SQL Standard GROUPS and EXCLUDE Window Function Clauses.

ROWS counts the exact number of rows in the frame. RANGE performs logical windowing where we don’t count the number of rows, but look for a value offset. GROUPS counts all groups of tied rows within the window.

Standard SQL Gap Analysis -- PGCon 2018.
Bye bye Mongo, Hello Postgres.
The Internals of PostgreSQL for database administrators and system developers
vertically scaling pg
PostgreSQL Concurrency: Isolation and Locking

It took about 20 years for the research community to come up with a satisfying mathematical model for implementing serializable snapshot isolation in an efficient way, and then a single year for that major progress to be included in PostgreSQL!

Serializable

Serializable transaction isolation is attractive for shops with active development by many programmers against a complex schema because it guarantees data integrity with very little staff time -- if a transaction can be shown to always do the right thing when it is run alone (before or after any other transaction), it will always do the right thing in any mix of concurrent serializable transactions. Where conflicts with other transactions would result in an inconsistent state within the database or an inconsistent view of the data, a serializable transaction will block or roll back to prevent the anomaly. The SQL standard provides a specific SQLSTATE for errors generated when a transaction rolls back for this reason, so that transactions can be retried automatically.


Before version 9.1, PostgreSQL did not support a full serializable isolation level. A request for serializable transaction isolation actually provided snapshot isolation. This has well known anomalies which can allow data corruption or inconsistent views of the data during concurrent transactions; although these anomalies only occur when certain patterns of read-write dependencies exist within a set of concurrent transactions. Where these patterns exist, the anomalies can be prevented by introducing conflicts through explicitly programmed locks or otherwise unnecessary writes to the database. Snapshot isolation is popular because performance is better than serializable isolation and the integrity guarantees which it does provide allow anomalies to be avoided or managed with reasonable effort in many environments.


A new technique for implementing full serializable isolation in an MVCC database appears in the literature beginning in 2008[1][2]. This technique, known as Serializable Snapshot Isolation (SSI) has many of the advantages of snapshot isolation. In particular, reads don't block anything and writes don't block reads. Essentially, it runs snapshot isolation but monitors the read-write conflicts between transactions to identify dangerous structures in the transaction graph which indicate that a set of concurrent transactions might produce an anomaly, and rolls back transactions to ensure that no anomalies occur. It will produce some false positives (where a transaction is rolled back even though there would not have been an anomaly), but will never let an anomaly occur. In the two known prototype implementations, performance for many workloads (even with the need to restart transactions which are rolled back) is very close to snapshot isolation and generally far better than an S2PL implementation.

	A short list of common mistakes in PostgreSQL. hn.
Postgres 11

I use transactional DDL in my tests. All the tables, triggers, etc. are set up inside a transaction, and then the actual tests run inside nested transactions. At the end of the test run, the outer transaction gets rolled back, and everything disappears.
I don't know if it accomplishes anything truly new (other than ideas that aren't very useful in practice like being able to have multiple test runs going in parallel), but it's a pretty neat way to be able to do it and works well.


Transactional tests have some downsides, unfortunately. If your tests test transactional code, that code itself cannot create transactions; they have to use savepoints, which aren't quite the same. Transactional tests also don't work with testing anything concurrent, unless you share the session across threads/goroutines/whatever.
Lastly, if a test fails you'd typically like to leave the data behind so that you can inspect it. A transactional test that rolls back on failure won't allow that.


Save points, with proper management of them, seem to match a conceptual nested transaction as far as I've seen. We've got a test bed connection manager that mocks savepoint creation, rollback and committal into transaction management functions so doing something like beginTransaction && beginTransaction works fine.

atomic transactions
Generated columns
PostgreSQL example of self-contained stored procedures

For data integrity, I've always found databases not expressive enough. Sanity checks, okay, but you can't capture a non-trivial domain in some custom data types, check constraints and foreign keys. Even if you introduce stored procedures that will be responsible for keeping everything proper, you need to go crazy with permissions to block circumventing those. Might as well build your own API then. (I do find it difficult where to draw the line when it comes to what to enforce in the database still.)


"But you don't need another API! Just use the database as one!" Then I ask how they do testing, and the answer basically is: "We don't make mistakes or we find out about them (in production) soon enough." That pretty much ends the discussion for me. Surely there must be ways to devise a basic testing framework for stored procedures, but why bother? I don't want to spin up a database just to test some logic. Never mind testing, what about refactoring?

System design hack: Postgres is a great pub/sub and job server 
analysis of a performance regression
Let's dig into some #postgres and #linux internals
Postgresqlco.nf: a PostgreSQL configuration guide
The State of (Full) Text Search in PostgreSQL 12
opinions on postgres
How do PostgreSQL triggers work depending on the current transaction isolation level
things I hate about postgres
How do PostgreSQL advisory locks work
Our love story with Deadlocks (PostgreSQL Edition). lobsters
PostgreSQL repeatable read isn't repeatable read: it's snapshot isolation, which allows a specific class of G2-item anomalies (those with nonadjacent rw edges prohibited under formalizations of repeatable read.). hn
Mastering PostgreSQL Administration
Measuring the Memory Overhead of a Postgres Connection

I have followed various snippets of community folklore advice on this topic over the years (many of them suggesting that the proper number of connections should relate somehow to the number of CPU cores). It was, therefore, refreshing to see one of the core developers, Bruce Momjian, publish modern advice[0] based on empirical observations he made while working on real client problems through Enterprise DB. Spoiler: on modern hardware, without an over-agressive shared_buffer config setting[1], up to 250 direct connections to the DB may be made without detrimental impact.

partial indexes

Partial indexes are amazing but you have to keep in mind some pecularities.


If your query doesn't contain a proper match with the WHERE clause of the index - the index will not be used. It is easy to forget about it or to get it wrong in subtle ways.

Do You Really Need Redis? How to Get Away with Just PostgreSQL
Prevent Overlapping Ranges with Database-Level Exclusion Constraints
operating PostgreSQL at scale over the long run
Debugging random slow writes in PostgreSQL
	Hierarchical Structures in PostgreSQL . Models for hierarchical data

READ COMMITTED anomalies in PostgreSQL. tweet.
Postgres Full-Text Search: A search engine in a database .
How PostgreSQL aggregation works and how it inspired our hyperfunctions’ design
	In Praise of PostgreSQL 
	Working with Postgres Types 

  
## _sqlite.md

      
    Raw
  

              _sqlite.md
            
          
    Inserting One Billion Rows in SQLite Under a Minute 
Atomic Commit In SQLite