Ordering a query result set by an arbitrary list in PostgreSQL

I'm hunting for the best way to keep large sets of DB records "sorted" in a performant manner.

Problem Description

Most of us have worked on projects at some point where we needed ordered lists of objects, whether a to-do list sorted by priority or a list of documents that a user can sort in whatever order they want.

A traditional approach for this on a Rails project is to use something like the acts_as_list gem. These systems typically add a "position" or "sort order" column to each record, which is then used when querying out the records in a traditional ORDER BY position SQL query.

This approach works fine for smaller datasets, but it can be hard to manage on large datasets with hundreds (or thousands) of records that need to be sorted. Changing the sort position of even a single object requires updating every record in the database that is in the same sort group, which can mean thousands of writes for every sort order update.
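A toy Ruby model of why dense integer positions are expensive to maintain (illustrative only; no database involved):

```ruby
# Dense positions 1..N, acts_as_list style. Moving one item shifts the
# index of every item between its old and new slots, and each shifted
# item would need its own UPDATE in the database.
list = %w[a b c d e]                 # stored positions 1..5
moved = list.delete("e")
list.insert(1, moved)                # user drags "e" to position 2

new_positions = list.each_with_index.map { |item, i| [item, i + 1] }.to_h
old_positions = { "a" => 1, "b" => 2, "c" => 3, "d" => 4, "e" => 5 }

# Every record whose position changed needs a write:
changed = new_positions.select { |item, pos| old_positions[item] != pos }
# Moving a single item here dirtied four of the five records.
```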

Possible Solutions

One typical alternative approach is to store a single serialized column that contains the full sort order (for instance, a serialized array column containing the ordered list of record ids). With this approach, we are then required to query the records out of the database using this specified order.

Ideally, it would be nice if the database just used the order of the in clause so a query of select * from foo where id in (67,23,1362,24) returned the records in that order, but I'm not aware of a database that does this.

MySQL has a built-in FIELD() function, so you can do something like ORDER BY FIELD(id, 67, 23, 1362, 24).

PostgreSQL doesn't seem to have any similar functions. There are two somewhat widely discussed solutions that I have come across.

  1. Using the PostgreSQL CASE clause like the following:

      ORDER BY CASE
        WHEN id=67 THEN 1
        WHEN id=23 THEN 2
        WHEN id=1362 THEN 3
        WHEN id=24 THEN 4
      END

  2. Using a custom JOIN table like the following:

      SELECT * FROM foo
      JOIN (VALUES (67,1), (23,2), (1362,3), (24,4)) AS x(id, ordering) ON foo.id = x.id
      ORDER BY x.ordering

I'm currently using solution #2 as it was easier to implement as a module that I can include in any AR::Base class and have it work well with other scopes.
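For illustration, a module like the one described might generate the join fragment for solution #2 from an id list. This is a hypothetical sketch; the method name and the table/column defaults are mine, not the gist author's:

```ruby
# Build the VALUES-join fragment for an arbitrary id order.
# Integer() guards against SQL injection from non-numeric input.
def values_join_clause(ids, table: "foo", column: "id")
  pairs = ids.each_with_index.map { |id, i| "(#{Integer(id)}, #{i + 1})" }
  "JOIN (VALUES #{pairs.join(', ')}) AS x(#{column}, ordering) " \
    "ON #{table}.#{column} = x.#{column}"
end

# values_join_clause([67, 23, 1362, 24]) produces the JOIN clause,
# to be followed by ORDER BY x.ordering in the final query.
```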

I'm wondering if there is an obvious solution I'm just missing, or any alternatives or alternate implementations of the db schema to begin with that might make this a non-issue.

What are the best ways to keep a large set of data sorted according to user specified order with PostgreSQL?


For this particular use case that I'm dealing with, we allow sorting more than one object at a time. The objects (images) are laid out in a grid on the UI and the user can select a single image to reorder via drag and drop, or they can select multiple images to be reordered via drag and drop at the same time. This makes using something like the acts_as_list#insert_at method less useful. I suppose we could call #insert_at or manually #update_attribute(:position, N) one at a time for each item that was moved in position, but I'd like to do it all in one fell swoop if possible.
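For the multi-select case, all moved rows can be repositioned in one statement. A hedged sketch (table and column names are assumptions for illustration) that builds an UPDATE ... FROM (VALUES ...) statement, touching only the moved rows:

```ruby
# One UPDATE for all moved records: join the target positions in as a
# VALUES list, so only rows present in the list are written.
def bulk_position_sql(table, positions_by_id)
  pairs = positions_by_id.map { |id, pos| "(#{Integer(id)}, #{Integer(pos)})" }
  "UPDATE #{table} SET position = v.position " \
    "FROM (VALUES #{pairs.join(', ')}) AS v(id, position) " \
    "WHERE #{table}.id = v.id"
end
```

This only works "in one fell swoop" if the dragged items land in positions that don't displace others, which is why the gapped/fractional key schemes discussed below pair well with it.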


shoe commented Sep 2, 2012

I think it depends a lot on the size of your list. For example, if N < 100, you'll get lots of reasonable behavior out of a serialized sort array. You can even do the sorting in ruby without much problem. You do end up reading and writing the list a lot, but it's small, so who cares?

OTOH, if N = 100000, you have some not-so-nice behaviors from a serialized array. Now the single reads and writes are larger, parse times are slower, etc. If you want to do things like get items 30000-30099 to show on a page, it's gonna get painful.

However, if you have large N, you can still get great performance out of an indexed column, while still keeping the number of records updated equal to the number of records being moved. You just have to use a sort column with a datatype with a continuous sort function, like float or string (not integer). Then, there are various convenient ways to update the sort column to reorder, depending on how much integrity you want to enforce.

Note: if you consider the pathological case, with very large datasets and extremely bad reordering operations (e.g. swapping two adjacent items 2^64 times), you can run into problems, either with collisions in the sort key (float) or a ballooning space requirement (string). In practice this isn't really a concern, and the sort column can be renormalized periodically anyway to avoid such issues.
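A plain-Ruby sketch of the continuous sort-key idea (illustrative only): a float "rank" column lets a move write just the moved record, by assigning it the midpoint of its new neighbours' keys.

```ruby
# Fractional ranking: to move a record, give it a key strictly between
# the keys of the records it should sit between. No other row changes.
def midpoint(before, after)
  (before + after) / 2.0
end

ranks = { a: 1.0, b: 2.0, c: 3.0 }
ranks[:c] = midpoint(ranks[:a], ranks[:b])   # move :c between :a and :b
order = ranks.sort_by { |_, rank| rank }.map(&:first)
```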


david commented Sep 2, 2012

Not sure if this is what @shoe means, but I think you can also use an integer as long as indexes aren't consecutive (i.e. treated more like weights)? With sufficient space between weights, you can move objects without updating the other records.

E.g., say you have (Record 1, 100), (Record 2, 200), (Record 3, 300). If you want to move Record 3 up, you'd update Record 3's weight to 250, leaving room for more resorting. I guess at some point the sort column's weights would have to be reset. As usual, that depends on context.


david commented Sep 2, 2012

Err, that would be 150, not 250.
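The gapped-integer variant from the example above, sketched in plain Ruby with the corrected value:

```ruby
# Weights 100, 200, 300 leave room between records, so moving
# Record 3 between Records 1 and 2 is a single write.
weights = { 1 => 100, 2 => 200, 3 => 300 }
weights[3] = 150                       # midpoint of 100 and 200
order = weights.sort_by { |_, w| w }.map(&:first)
```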


grin commented Sep 2, 2012


shoe commented Sep 2, 2012

yep, that's the technique I was describing. I see that they even rebalance synchronously, if needed. I can see the advantage of using a spaced-out integer rather than string (storage), but I wonder if there is any advantage of using integer over float or float over integer. Behavior of integers is much easier to model. Otherwise, I suppose they are pretty similar. You have to choose your spacing a priori in either case.


shoe commented Sep 2, 2012

@grin nice find!


cpjolicoeur commented Sep 2, 2012

So, it seems the ideal solution is to use the original methodology of keeping the "sort position" on each individual record and not using a serialized field containing the sort order.


shoe commented Sep 3, 2012

it depends on N, and on access patterns, but in general, the sort column on the record is the more robust solution.


diogenes commented Sep 3, 2012

@cpjolicoeur What about creating a custom function in PostgreSQL? Maybe something like this:

Then you can do:

SELECT * FROM foo WHERE id IN (3, 20, 16, 8, 14)
ORDER BY array_idx(ARRAY[3, 20, 16, 8, 14], id)

Using this approach, it would be easy to implement a generic scope for this kind of sorting. Something like this:

scope :ordered_by_rank, lambda { |rank, column = :id|
  order("array_idx(ARRAY[#{Array(rank).join(', ')}], #{column})")
}


agrberg commented Sep 4, 2012

@diogenes Your approach is interesting but I'm not familiar enough with custom functions in PostgreSQL to know how it would be different from the case statement example Craig listed as a potential solution.


cpjolicoeur commented Sep 4, 2012

@diogenes, I had thought of duplicating the ORDER BY FIELD() feature of MySQL as a Postgres function, and there are several examples online of such functions, but this app is currently hosted on Heroku and I don't have permissions to create functions.


leckylao commented Jul 16, 2014

Hi @cpjolicoeur, love the discussion, and I'm using the custom join approach. It works like a charm. Thank you very much.

Just want to share my use case here: I have items(id, owner_id) and a list of owners (1, 3, 15, 4), and I order the items in the order of the owner list.

owners = [1, 3, 15, 4]
values = ''
owners.each_with_index do |owner, index|
  values += ", " if index > 0
  values += %((#{owner}, #{index}))
end
@items = Item.joins("join (VALUES #{values}) as x(id, ordering) on items.owner_id = x.id").order("x.ordering").page params[:page]


g-ilham commented Nov 5, 2014

sort in mysql:

    > ids = [11,31,29]
    => [11, 31, 29]
    > User.where(id: ids).order("field(id, #{ids.join(',')})")

in postgres:

    def self.order_by_ids(ids)
      order_by = ["case"]
      ids.each_with_index do |id, index|
        order_by << "WHEN id='#{id}' THEN #{index}"
      end
      order_by << "end"
      order(order_by.join(" "))
    end

    User.where(:id => [3,2,1]).order_by_ids([3,2,1]).map(&:id)
    #=> [3,2,1]


alexbel commented Nov 6, 2014

@g-ilham Perfect solution!


robuye commented Nov 13, 2014

given you have the intarray extension, it is possible to do:

def order_by_ids(ids)
  order("idx(ARRAY#{ids}, id)")
end

irb(main):062:0* ids
=> [1, 2, 3, 4, 5, 9, 8, 7, 6]
irb(main):063:0> User.where("id < 10").order("idx(ARRAY#{ids}, id)").map(&:id)
  User Load (1.7ms)  SELECT "users".* FROM "users"  WHERE (id < 10)  ORDER BY idx(ARRAY[1, 2, 3, 4, 5, 9, 8, 7, 6], id)
=> [1, 2, 3, 4, 5, 9, 8, 7]


JacobEvelyn commented Feb 27, 2015

@leckylao thanks for your example; I have a very similar use case and it works perfectly!

Does anyone know if any of these approaches are typically more performant than others (obviously individual data sets and database setups matter a lot; I'm more curious about obvious order-of-magnitude algorithmic differences)? My gut tells me that the idx and array_idx approaches will be slower because a database would naively be doing an O(n^2) search to find the indices, but I don't know how that might compare to the overhead of, say, another database join, or a ton of CASE statements, etc.


murbanski commented Mar 9, 2015


Works perfectly.


JacobEvelyn commented Mar 13, 2015

Thanks so much for the tip @murbanski! That's awesome, and works in both Postgres and SQLite (I didn't get a chance to test MySQL). We just bundled this up into a gem for easier use:

Edit: Just tested MySQL and it works as well.


deryagin commented May 21, 2015

Works in Postgres:

ORDER BY (ID=10, ID=2, ID=56, ID=40) DESC


DanielVartanov commented Oct 28, 2015

Works in Postgres:

ORDER BY position(id::text in '10, 20, 56, 40')


vasilakisfil commented Nov 10, 2015

Which one is faster? Any db-specialist who would like to give us a hint? :)


gingerlime commented Nov 11, 2015

thanks @DanielVartanov ! 👍 seems really neat. It's easy to use in active record scopes like this:

class Something < ActiveRecord::Base

  scope :for_ids_with_order, ->(ids) {
    order = sanitize_sql_array(
      ["position(id::text in ?)", ids.join(',')]
    )
    where(:id => ids).order(order)
  }
end

# usage:
Something.for_ids_with_order([1, 3, 2])

also posted on this SO answer


semanticart commented Jun 23, 2016

The answer put forward by @gingerlime uses text comparison, so it gets confused if you have ids where one is a substring of another.

Consider ids [1234, 55, 3, 1, 2]. The sorted results will have 55 last because 3, 1, and 2 match a substring of 1234.

A join with ORDINALITY seems to work fine, though.
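The substring pitfall can be reproduced in plain Ruby, using String#index as a stand-in for Postgres's position():

```ruby
# position(id::text in '...') finds the FIRST occurrence of the id's
# digits anywhere in the list string, not the id's slot in the list.
ids = [1234, 55, 3, 1, 2]
list = ids.join(', ')                  # "1234, 55, 3, 1, 2"
first_match = ids.map { |id| [id, list.index(id.to_s)] }.to_h
# "3", "1" and "2" all match inside "1234", so 55 ends up sorting last:
last = first_match.max_by { |_, pos| pos }.first
```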


nhontpnv commented Jul 26, 2016

@murbanski thank you so much, I have to sign up just for thanking you.


igtulm commented Sep 14, 2017

@cpjolicoeur I found the solution below very nice for me. E.g. if you use PL/pgSQL, you can pass both the IN values and the array for array_position from a single parameter. What do you think about it?

SELECT * FROM foo WHERE id IN (3, 1, 2) ORDER BY array_position(ARRAY[3, 1, 2], id);

It works since PostgreSQL 9.5.


kevinzwhuang commented Oct 6, 2017

@cpjolicoeur @igtulm I also found array_position to be a great solution. In my research (using EXPLAIN), I found that using array_position has been the least expensive method.

ORDER BY array_position(ARRAY[1, 2, 3]::int[], id::int)

Note: I'm casting id to int because primary keys starting with Rails 5 are bigint
Note 2: Also casting the array into int[] to ensure it works even if I'm providing a list of strings in Rails

To build off the AR scope integration that @gingerlime proposed, a scope could be written like so:

class User < ApplicationRecord

  scope :with_id_order, ->(ids) {
    order(sanitize_sql_array(["array_position(ARRAY[?]::int[], id::int)", ids]))
  }
end

# Usage
User.with_id_order([5, 10, 20])
# Users 5, 10, 20, 1, 2, 3, 4...etc

# Restrict by the ids given
user_ids = [5, 10, 20]
User.where(id: user_ids).with_id_order(user_ids)
# Users 5, 10, 20

This scope gives a bit more flexibility in that it allows you to order with a given array of prioritized ids while querying for other users too.


msacket commented Jan 13, 2018

I'm a bit late to this ballgame, but here is another solution using PostgreSQL that allows for user-definable ordering of lists. I've written about it here. There is also a php implementation.

CREATE TABLE fruit ( "id" serial NOT NULL, "name" text NOT NULL, PRIMARY KEY ("id") );
INSERT INTO fruit (name) VALUES ('Apple');
INSERT INTO fruit (name) VALUES ('Orange');
INSERT INTO fruit (name) VALUES ('Pear');
CREATE TABLE user_preferences (fk_user int4, fruits text[]);
INSERT INTO user_preferences (fk_user, fruits) VALUES (1, ARRAY['Pear','Orange','Apple']);

SELECT fruit.id, x.name
FROM (
	SELECT s.position,
		p.fruits[s.position] AS name
	FROM user_preferences p,
	generate_series(1, array_length(p.fruits, 1)) AS s(position)
	WHERE p.fk_user = 1
) x
JOIN fruit ON (
	fruit.name = x.name
)
ORDER BY x.position asc;

| id | name   |
| 3  | Pear   |
| 2  | Orange |
| 1  | Apple  |


sgelbart commented Jun 1, 2020

@cpjolicoeur I found the solution below very nice for me. E.g. if you use PL/pgSQL, you can pass both the IN values and the array for array_position from a single parameter. What do you think about it?

SELECT * FROM foo WHERE id IN (3, 1, 2) ORDER BY array_position(ARRAY[3, 1, 2], id);
It works since PostgreSQL 9.5.

array_position FTW 🎉


anjanaraveendra commented Nov 12, 2020

Awesome. After 3 hours of research I found this; very helpful, and it worked too.


alessandromacagno commented Aug 13, 2021

Another idea for the original problem is to use recursive queries to walk the "list". I don't know yet about performance, but this looks like a very elegant solution, apart from the problem of polluting models with plain SQL.
