Skip to content

Instantly share code, notes, and snippets.

Embed
What would you like to do?
To help you with the process of converting a MySQL app to PostgreSQL, I collected a list of differences between MySQL and PostgreSQL (PG).
Important changes:
* Strings are quoted with '...' or with $token$...$token$. Single-quotes are escaped with single-quotes.
* Identifiers are folded to lowercase, unless they are quoted with "..." which makes them case-sensitive
* The max identifier length is 63
* There are some system columns that cannot be used as an identifier (probably not a problem)
* Expressions are evaluated in arbitrary order, so WHERE x > 0 AND y/x > 1.5 can lead to a division by zero and has to be replaced with WHERE CASE WHEN y > 0 THEN y/x > 1.5 ELSE FALSE END
* SELECT CASE WHEN x > 0 THEN x ELSE 1/0 END would still lead to a division by zero because the constant 1/0 is evaluated even before the query starts
* Permissions can only be managed on the database level. (And row level with Row Security Policies.)
Types:
* SERIAL instead of AUTO_INCREMENT
* VARCHAR (for text) doesn't need a length limit, is called text and does not allow 0-bytes and invalid UTF-8 characters
* VARCHAR (for binary data) doesn't need a length limit and is called bytea
* DATETIME is called timestamp
* Casts are done by simply prepending the type (or appending ::type)
* There's a proper boolean type
* ENUM types have to be created explicitly or use text (see below)
* There is a jsonb type (and json which preserves the original text)
* There is a GUID type
* There are IP address types
* There are geometric types
* There are range types (time ranges, price ranges, etc.) that can be unioned, etc.
Features we should use to replace equivalent MySQL constructs:
* timestamp with time zone
* FULL OUTER JOIN
* Foreign keys (`something_id integer references somethings` points to the primary key of `somethings`)
* FILTER for aggregate functions
* Additional constraints with CHECK
* VALUES() subselects instead of SELECT 'a' UNION SELECT 'b' ...
* Condition pushdown and subquery caching
* Functionally dependent columns are properly detected, so we don't have to give all columns in GROUP BY if we give a unique key
* Grouping sets are WITH ROLLUP on steroids
* We can define distinctness with SELECT DISTINCT ON ...
* Besides UNION we also have intersection and non-symmetric diff
* Expression indexes
* Partial indexes
* RECURSIVE queries
* DELETE/UPDATE + RETURNING
* DELETE/UPDATE + RETURNING + WITH (note that they are running at the same time)
* Arbitrary precision (but slow) math with NUMERIC (= DECIMAL)
* NUMERIC does not need to be limited in scale and precision
* BETWEEN SYMMETRIC (like BETWEEN but works backwards)
* IS DISTINCT FROM combines = with IS NULL
* There are more string operators and fuctions
* Hahaha, there is a square root operator and it's |/
* JSON instead of GROUP_CONCAT
* Index sort order is honored
* OR is not ridiculously expensive because indexes can be combined to some extent
Additional features:
* Data migrations with USING
* UNIQUE indexes can contain more than one NULL value
* Default values can be expressions
* RANK() and other aggregate functions
* VIEWs
* Indexes with operator classes
* ARRAYs and unnest()
* Multiple unnest()s has two modes: same number of elements and different number of elements
* JOINs to set returning functions like unnest() or generate_series()
* Foreign keys can have some NULL values (see `MATCH`)
* Bit string, text search and XML types
* There are array types
* There are custom composite types (structs)
* We can run EXPLAIN on UPDATE queries if we use ROLLBACK
Finicky features:
* Window functions (GROUPs over other rows that share some property)
* Table inheritance
* Row security policies
* Table subselects can be named with WITH
Useless features:
* money type
Other features:
* Owners
* If permission has been given with WITH GRANT OPTION, it is revoked when the GRANT OPTION is taken away
ENUMs:
There are three typical ways to represent ENUMs in PostgreSQL:
* User-defined ENUM type
* text with CHECK constraint
* ID and lookup table
The text solution isn't as bad as one might think:
ENUM text
Key ---------------------------------- -----------------------------------
Length Rows Storage* Index Toast Table Storage* Index Toast Table Loss of text
-------- --------- ---------- ------- ------- ------- ---------- ------- ------- ------- --------------
1 1000000 4 21 0 42 2 21 0 42 0 MB
4 1000000 4 21 0 42 5 21 0 42 0 MB
7 1000000 4 21 0 42 8 21 0 42 0 MB
10 1000000 4 21 0 42 11 21 0 50 8 MB
7 5000000 4 107 0 211 8 107 0 211 0 MB
* Storage requirement according to docs
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment