Skip to content

Instantly share code, notes, and snippets.

@frodsan
Last active December 6, 2018 13:13
Show Gist options
  • Save frodsan/b328c39b9ed42ca74948a07c07273a1b to your computer and use it in GitHub Desktop.
Save frodsan/b328c39b9ed42ca74948a07c07273a1b to your computer and use it in GitHub Desktop.
SQL Indices

Sequential scans:

When the optimizer can't find an index for a query, this results in full table scan (or sequential scan). A Sequential scan will read each row of the entire table to check if the given conditions meet.

Table = [
  # List of rows
]

Row = Struct.new(:id, :first_name, :last_name)

def seq_scan_by(params)
  Table.select { |row| where(row, params) }
end

private

def where(row, conditions)
  conditions.all? { |column, value| row.send(column) == value }
end

This is the slowest scan method if you want to find a set of rows. Larger table, slower scan.

benchmark("10_000 rows")    { seq_scan_by(last_name: 'Doe') }
benchmark("100_000 rows")   { seq_scan_by(last_name: 'Doe') }
benchmark("1_000_000 rows") { seq_scan_by(last_name: 'Doe') }

10_000 rows: 10 ms
100_000 rows: 100 ms
1_000_000 rows: 1000 ms

This is not bad if the number of rows in the table are low, or if you actually are looking for most of the rows in the table.

This is how a sequential scan looks on the execution plan:

explain (costs false)
select * from users where last_name = 'Doe'

                 QUERY PLAN
---------------------------------------------
 Seq Scan on users
   Filter: ((last_name)::text = 'Doe'::text)

Index scans:

"An index makes a query faster"

def create_index(column)
  hash = Hash.new { |h, k| h[k] = []}

  Table.each do |row|
    value = row.send(column)

    if !value.nil?
      hash[value] << row
    end
  end

  hash.sort.to_h
end

def index_scan_by(index, condition)
  index[condition]
end

last_name_index = create_index(:last_name)

benchmark("10_000 rows")    { index_scan_by(last_name_index, 'Doe') }
benchmark("100_000 rows")   { index_scan_by(last_name_index, 'Doe') }
benchmark("1_000_000 rows") { index_scan_by(last_name_index, 'Doe') }

10_000 rows: 5 ms
100_000 rows: 5 ms
1_000_000 rows: 6 ms

Characteristics

  1. Physical: Holds copy of the indexed data:

  2. Referential: It referes to a information in another place (table).

  3. Mutable: It undergoes constant change.

They also differ slightly from each DB engine (e.g. https://eng.uber.com/mysql-migration/).

Execution plan

This is how an index scan looks in the query plan:

explain (costs false) select * from users where last_name = 'Doe';

                   QUERY PLAN
-------------------------------------------------
 Index Scan using idx_last_name on users
   Index Cond: ((last_name)::text = 'Doe'::text)

How to create an index?

CREATE INDEX <index_name> ON <table> (<columns ...>)

When to create an index?

It depends ¯_(ツ)_/¯

B-Tree indexes

Default index type.

More: https://use-the-index-luke.com/sql/anatomy/the-tree

Multi-column indexes:

Order of multi column indexes can make an impact:

Having this multi-column index:

CREATE INDEX idx_name on users(first_name, last_name)

I want to find users by last name but there is no index:

explain (costs false)
select * from users where last_name = 'Doe'

                 QUERY PLAN
---------------------------------------------
 Seq Scan on users
   Filter: ((last_name)::text = 'Doe'::text)

Instead of making an index on last_name, you can re-arrange the ordering of the index like:

CREATE INDEX idx_name on users(last_name, first_name)

and then the index will be used in the query:

explain (costs false)
select * from users where last_name = 'Doe'

QUERY PLAN
-------------------------------------------------
 Index Scan using idx_name on users
   Index Cond: ((last_name)::text = 'Doe'::text)

Functions

I want to make a case insensitive search by last name:

explain (costs false)
select * from users where LOWER(last_name) = LOWER('Doe')

                 QUERY PLAN
---------------------------------------------
 Seq Scan on users
   Filter: (lower((last_name)::text) = 'doe'::text)

You can create index on expressions:

CREATE INDEX idx_lower_name on users(LOWER(last_name))

and then:

explain (costs false)
select * from users where LOWER(last_name) = LOWER('Doe')

                       QUERY PLAN
--------------------------------------------------------
 Index Scan using idx_lower_name on users
   Index Cond: (lower((last_name)::text) = 'doe'::text)

If you want case insensitive text, you should look at the Citext type (https://www.postgresql.org/docs/current/citext.html).

Booleans

Don't do it ... You probably want a partial index:

CREATE INDEX idx_active_names on users(last_name) WHERE active = true

Pattern matching:

Indexes support prefix matching:

explain (costs false)
select * from users where last_name LIKE 'Rodr%'

                   QUERY PLAN
-------------------------------------------------
 Index Scan using idx_last_name on users
   Index Cond: ((last_name)::text ~~ 'Rodr%'::text)

Unfortunately, not suffix matching:

explain (costs false)
select * from users where last_name LIKE '%guez'

                   QUERY PLAN
-------------------------------------------------
Seq Scan on users
   Filter: ((last_name)::text ~~ '%guez'::text)

But there is a trick:

CREATE INDEX backsearch ON users (reverse(last_name))

This makes it look like a prefix match:

explain (costs false)
select * from users where reverse(last_name) LIKE reverse('%guez')

                   QUERY PLAN
-------------------------------------------------
 Index Scan using idx_last_name on users
   Index Cond: (reverse((last_name)::text) ~~ 'zeug%'::text)

Kappa.

More?

Book: https://use-the-index-luke.com Reference: https://www.postgresql.org/docs/current/indexes.html Postgres MVCC: https://malisper.me/postgres-mvcc/

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment