Skip to content

Instantly share code, notes, and snippets.

@troykelly
Last active February 8, 2024 10:41
Show Gist options
  • Star 1 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save troykelly/616df024050dd50744dde4a9579e152e to your computer and use it in GitHub Desktop.
Save troykelly/616df024050dd50744dde4a9579e152e to your computer and use it in GitHub Desktop.
Rebuild and Refresh Collation Version for PostgreSQL

PostgreSQL Collation and Reindexing Utility

Overview

The reindex_and_refresh_collation.sh script is created to automate the process of reindexing tables and refreshing the collation version for all user databases in a PostgreSQL server. This utility is particularly useful when you've updated the locale or collation libraries on the system where PostgreSQL is installed, as these updates could cause mismatches between the databases' expected collation version and the operating system's provided version.

What It Does

  • Retrieves a list of all user databases (excluding system templates) on the PostgreSQL server.
  • Reindexes tables for each user database to ensure database performance is optimized.
  • Refreshes the collation version for each user database to stay in sync with the underlying system's locale settings.
  • Performs a follow-up check for any collation mismatches within the tables' column metadata.

Why It Was Created

This script was written to provide PostgreSQL administrators with a straightforward solution to upkeep databases following system collation updates, avoiding manual repetition and reducing human error. It is especially helpful after updates to the operating system's collation libraries to ensure that databases continue to function correctly and to prevent any potential issues with string sorting and comparisons.

Risks and Precautions

Running this script can be resource-intensive and may affect your database performance while the reindexing process is underway. This can be critical for large databases or production systems. Here are some precautions and risks to consider:

  • Downtime: The REINDEX operation may lock tables, preventing writes (and possibly reads, depending on the PostgreSQL version and reindex method). Plan to run this during a maintenance window or at off-peak times.
  • Disk Space: Reindexing can require a significant amount of disk space, temporarily duplicating the size of the indexes.
  • Backups: Ensure that your databases are fully backed up before running this script.

Usage Example in postgis Docker Container

To use this script within a postgis Docker container, follow these steps:

  1. Enter the running postgis container using docker exec:
docker exec -it <container_id_or_name> /bin/bash
  1. Inside the container, install curl (and optionally other utilities you might need):
apt-get update
apt-get install -y curl
  1. Download the script using curl, set execute permissions, and run it:
curl -Lo reindex_and_refresh_collation.sh https://gist.githubusercontent.com/troykelly/616df024050dd50744dde4a9579e152e/raw/reindex_and_refresh_collation.sh
chmod +x reindex_and_refresh_collation.sh
./reindex_and_refresh_collation.sh

Make sure to replace <container_id_or_name> with your actual container ID or name.

  1. (Optional) Cleanup the apt-get list and remove the installed packages once the operation is complete to keep the container size small:
apt-get purge -y curl
apt-get autoremove -y
apt-get clean
rm -rf /var/lib/apt/lists/*

Encouraging Contributions

Feel free to fork this gist, submit improvements, or raise issues if you encounter problems. We welcome any contributions that enhance the functionality or reliability of this tool.

#!/bin/bash
# This script reindexes tables and refreshes collation versions for all user databases
# and specifically for template1 on a PostgreSQL server. It checks for collation mismatches
# after the operation is completed, including within template1.
# Ensure required environment variables are set for PostgreSQL.
: "${POSTGRES_HOST?Environment variable POSTGRES_HOST needs to be set}"
: "${POSTGRES_PORT?Environment variable POSTGRES_PORT needs to be set}"
: "${POSTGRES_USER?Environment variable POSTGRES_USER needs to be set}"
: "${POSTGRES_PASSWORD?Environment variable POSTGRES_PASSWORD needs to be set}"
# Export POSTGRES_PASSWORD to be used by psql for non-interactive authentication.
export PGPASSWORD=$POSTGRES_PASSWORD
# Function to list all user databases and also include 'template1'.
# Outputs only the database names to stdout.
list_databases() {
# Print an informational message to stderr to avoid contaminating the stdout output.
>&2 echo "Retrieving list of databases from the PostgreSQL server..."
# Connect to the 'postgres' system database and retrieve the list of database names.
# Include 'template1' but exclude 'template0' explicitly.
psql --host="$POSTGRES_HOST" --port="$POSTGRES_PORT" --username="$POSTGRES_USER" --dbname="postgres" -Atc \
"SELECT datname FROM pg_database WHERE datistemplate = false OR datname = 'template1';"
}
# Reindex tables within the provided database.
# Apply reindexing selectively to 'template1' and normal user databases.
reindex_tables() {
local db=$1
if [ "$db" != "template0" ]; then # Skip template0
# Print an informational message to stderr.
>&2 echo "Reindexing tables in database: $db..."
# Issue the REINDEX command to reindex user tables in the specified database.
psql --host="$POSTGRES_HOST" --port="$POSTGRES_PORT" --username="$POSTGRES_USER" --dbname="$db" -c \
"REINDEX SCHEMA CONCURRENTLY public;"
fi
}
# Refresh the database collation version.
# Apply refresh selectively to 'template1' and normal user databases.
refresh_collation_version() {
local db=$1
if [ "$db" != "template0" ]; then # Skip template0
# Print an informational message to stderr.
>&2 echo "Refreshing collation version for database: $db..."
# Alter the database to refresh the collation version.
psql --host="$POSTGRES_HOST" --port="$POSTGRES_PORT" --username="$POSTGRES_USER" --dbname="$db" -c \
"ALTER DATABASE \"$db\" REFRESH COLLATION VERSION;"
fi
}
# Check for collation mismatches in all databases after the operations.
check_collation_mismatches() {
# Print an informational message to stderr.
>&2 echo "Checking for collation mismatches in all databases..."
# Loop through each database and check for mismatching collations in table columns.
while IFS= read -r db; do
if [ -n "$db" ]; then
# Print an informational message to stderr.
>&2 echo "Checking database: $db for collation mismatches..."
local mismatches=$(psql --host="$POSTGRES_HOST" --port="$POSTGRES_PORT" --username="$POSTGRES_USER" --dbname="$db" -Atc \
"SELECT 'Mismatch in table ' || table_name || ' column ' || column_name || ' with collation ' || collation_name
FROM information_schema.columns
WHERE collation_name IS NOT NULL AND collation_name <> 'default' AND table_schema = 'public'
EXCEPT
SELECT 'No mismatch - default collation of ' || datcollate || ' used.'
FROM pg_database WHERE datname = '$db';"
)
if [ -z "$mismatches" ]; then
# Print an informational message to stderr.
>&2 echo "No collation mismatches found in database: $db"
else
# Print an informational message to stderr.
>&2 echo "Collation mismatches found in database: $db:"
>&2 echo "$mismatches"
fi
fi
done
}
# Main script execution starts here.
# Print start message to stderr.
>&2 echo "Starting the reindexing and collation refresh process for all databases, including template1..."
# Retrieve the list of databases and store the result in a variable.
databases=$(list_databases)
# Check for an empty list of databases.
if [ -z "$databases" ]; then
>&2 echo "No databases found for reindexing or collation refresh. Please check connection details to PostgreSQL server."
exit 1
fi
# Process each database to reindex and refresh collation version.
# 'template1' is included for reindex and refresh, while 'template0' is skipped.
for db in $databases; do
reindex_tables "$db"
refresh_collation_version "$db"
done
# Checking for collation mismatches after reindexing and collation refresh.
# Pass the list of databases to the check_collation_mismatches function through stdin.
echo "$databases" | check_collation_mismatches
# Print completion message to stderr.
>&2 echo "Reindexing and collation refresh process completed."
# Clean up by unsetting PGPASSWORD.
unset PGPASSWORD
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment