AndreKR/mysql_to_postgresql.txt

## mysql_to_postgresql.txt
To help you with the process of converting a MySQL app to PostgreSQL, I collected a list of differences between MySQL and PostgreSQL (PG).

Important changes:

* Strings are quoted with '...' or with $token$...$token$. Single-quotes are escaped with single-quotes.
* Identifiers are folded to lowercase, unless they are quoted with "..." which makes them case-sensitive
* The max identifier length is 63
* There are some system columns that cannot be used as an identifier (probably not a problem)
* Expressions are evaluated in arbitrary order, so WHERE x > 0 AND y/x > 1.5 can lead to a division by zero and has to be replaced with WHERE CASE WHEN y > 0 THEN y/x > 1.5 ELSE FALSE END
* SELECT CASE WHEN x > 0 THEN x ELSE 1/0 END would still lead to a division by zero because the constant 1/0 is evaluated even before the query starts
* Permissions can only be managed on the database level. (And row level with Row Security Policies.)

Types:

* SERIAL instead of AUTO_INCREMENT
* VARCHAR (for text) doesn't need a length limit, is called text and does not allow 0-bytes and invalid UTF-8 characters
* VARCHAR (for binary data) doesn't need a length limit and is called bytea
* DATETIME is called timestamp
* Casts are done by simply prepending the type (or appending ::type)
* There's a proper boolean type
* ENUM types have to be created explicitly or use text (see below)
* There is a jsonb type (and json which preserves the original text)
* There is a GUID type
* There are IP address types
* There are geometric types
* There are range types (time ranges, price ranges, etc.) that can be unioned, etc.

Features we should use to replace equivalent MySQL constructs:

* timestamp with time zone
* FULL OUTER JOIN
* Foreign keys (`something_id integer references somethings` points to the primary key of `somethings`)
* FILTER for aggregate functions
* Additional constraints with CHECK
* VALUES() subselects instead of SELECT 'a' UNION SELECT 'b' ...
* Condition pushdown and subquery caching
* Functionally dependent columns are properly detected, so we don't have to give all columns in GROUP BY if we give a unique key
* Grouping sets are WITH ROLLUP on steroids
* We can define distinctness with SELECT DISTINCT ON ...
* Besides UNION we also have intersection and non-symmetric diff
* Expression indexes
* Partial indexes
* RECURSIVE queries
* DELETE/UPDATE + RETURNING
* DELETE/UPDATE + RETURNING + WITH (note that they are running at the same time)
* Arbitrary precision (but slow) math with NUMERIC (= DECIMAL)
* NUMERIC does not need to be limited in scale and precision
* BETWEEN SYMMETRIC (like BETWEEN but works backwards)
* IS DISTINCT FROM combines = with IS NULL
* There are more string operators and fuctions
* Hahaha, there is a square root operator and it's |/
* JSON instead of GROUP_CONCAT
* Index sort order is honored
* OR is not ridiculously expensive because indexes can be combined to some extent

Additional features:

* Data migrations with USING
* UNIQUE indexes can contain more than one NULL value
* Default values can be expressions
* RANK() and other aggregate functions
* VIEWs
* Indexes with operator classes
* ARRAYs and unnest()
* Multiple unnest()s has two modes: same number of elements and different number of elements
* JOINs to set returning functions like unnest() or generate_series()
* Foreign keys can have some NULL values (see `MATCH`)
* Bit string, text search and XML types
* There are array types
* There are custom composite types (structs)
* We can run EXPLAIN on UPDATE queries if we use ROLLBACK

Finicky features:

* Window functions (GROUPs over other rows that share some property)
* Table inheritance
* Row security policies
* Table subselects can be named with WITH

Useless features:
* money type

Other features:

* Owners
* If permission has been given with WITH GRANT OPTION, it is revoked when the GRANT OPTION is taken away


ENUMs:

There are three typical ways to represent ENUMs in PostgreSQL:

* User-defined ENUM type
* text with CHECK constraint
* ID and lookup table

The text solution isn't as bad as one might think:

                                 ENUM                               text
 Key               ---------------------------------- -----------------------------------
 Length   Rows      Storage*   Index   Toast   Table   Storage*   Index   Toast   Table   Loss of text
-------- --------- ---------- ------- ------- ------- ---------- ------- ------- ------- --------------
      1   1000000          4      21       0      42          2      21       0      42           0 MB
      4   1000000          4      21       0      42          5      21       0      42           0 MB
      7   1000000          4      21       0      42          8      21       0      42           0 MB
     10   1000000          4      21       0      42         11      21       0      50           8 MB
      7   5000000          4     107       0     211          8     107       0     211           0 MB

* Storage requirement according to docs
	To help you with the process of converting a MySQL app to PostgreSQL, I collected a list of differences between MySQL and PostgreSQL (PG).

	Important changes:

	* Strings are quoted with '...' or with $token$...$token$. Single-quotes are escaped with single-quotes.
	* Identifiers are folded to lowercase, unless they are quoted with "..." which makes them case-sensitive
	* The max identifier length is 63
	* There are some system columns that cannot be used as an identifier (probably not a problem)
	* Expressions are evaluated in arbitrary order, so WHERE x > 0 AND y/x > 1.5 can lead to a division by zero and has to be replaced with WHERE CASE WHEN y > 0 THEN y/x > 1.5 ELSE FALSE END
	* SELECT CASE WHEN x > 0 THEN x ELSE 1/0 END would still lead to a division by zero because the constant 1/0 is evaluated even before the query starts
	* Permissions can only be managed on the database level. (And row level with Row Security Policies.)

	Types:

	* SERIAL instead of AUTO_INCREMENT
	* VARCHAR (for text) doesn't need a length limit, is called text and does not allow 0-bytes and invalid UTF-8 characters
	* VARCHAR (for binary data) doesn't need a length limit and is called bytea
	* DATETIME is called timestamp
	* Casts are done by simply prepending the type (or appending ::type)
	* There's a proper boolean type
	* ENUM types have to be created explicitly or use text (see below)
	* There is a jsonb type (and json which preserves the original text)
	* There is a GUID type
	* There are IP address types
	* There are geometric types
	* There are range types (time ranges, price ranges, etc.) that can be unioned, etc.

	Features we should use to replace equivalent MySQL constructs:

	* timestamp with time zone
	* FULL OUTER JOIN
	* Foreign keys (`something_id integer references somethings` points to the primary key of `somethings`)
	* FILTER for aggregate functions
	* Additional constraints with CHECK
	* VALUES() subselects instead of SELECT 'a' UNION SELECT 'b' ...
	* Condition pushdown and subquery caching
	* Functionally dependent columns are properly detected, so we don't have to give all columns in GROUP BY if we give a unique key
	* Grouping sets are WITH ROLLUP on steroids
	* We can define distinctness with SELECT DISTINCT ON ...
	* Besides UNION we also have intersection and non-symmetric diff
	* Expression indexes
	* Partial indexes
	* RECURSIVE queries
	* DELETE/UPDATE + RETURNING
	* DELETE/UPDATE + RETURNING + WITH (note that they are running at the same time)
	* Arbitrary precision (but slow) math with NUMERIC (= DECIMAL)
	* NUMERIC does not need to be limited in scale and precision
	* BETWEEN SYMMETRIC (like BETWEEN but works backwards)
	* IS DISTINCT FROM combines = with IS NULL
	* There are more string operators and fuctions
	* Hahaha, there is a square root operator and it's \|/
	* JSON instead of GROUP_CONCAT
	* Index sort order is honored
	* OR is not ridiculously expensive because indexes can be combined to some extent

	Additional features:

	* Data migrations with USING
	* UNIQUE indexes can contain more than one NULL value
	* Default values can be expressions
	* RANK() and other aggregate functions
	* VIEWs
	* Indexes with operator classes
	* ARRAYs and unnest()
	* Multiple unnest()s has two modes: same number of elements and different number of elements
	* JOINs to set returning functions like unnest() or generate_series()
	* Foreign keys can have some NULL values (see `MATCH`)
	* Bit string, text search and XML types
	* There are array types
	* There are custom composite types (structs)
	* We can run EXPLAIN on UPDATE queries if we use ROLLBACK

	Finicky features:

	* Window functions (GROUPs over other rows that share some property)
	* Table inheritance
	* Row security policies
	* Table subselects can be named with WITH

	Useless features:
	* money type

	Other features:

	* Owners
	* If permission has been given with WITH GRANT OPTION, it is revoked when the GRANT OPTION is taken away


	ENUMs:

	There are three typical ways to represent ENUMs in PostgreSQL:

	* User-defined ENUM type
	* text with CHECK constraint
	* ID and lookup table

	The text solution isn't as bad as one might think:

	ENUM text
	Key ---------------------------------- -----------------------------------
	Length Rows Storage* Index Toast Table Storage* Index Toast Table Loss of text
	-------- --------- ---------- ------- ------- ------- ---------- ------- ------- ------- --------------
	1 1000000 4 21 0 42 2 21 0 42 0 MB
	4 1000000 4 21 0 42 5 21 0 42 0 MB
	7 1000000 4 21 0 42 8 21 0 42 0 MB
	10 1000000 4 21 0 42 11 21 0 50 8 MB
	7 5000000 4 107 0 211 8 107 0 211 0 MB

	* Storage requirement according to docs