jimfulton/A GIN DDL and Schema

## A GIN DDL and Schema
=> create index pgtextindex_text_vector_idx on pgtextindex using gin (text_vector);
CREATE INDEX
=> analyze pgtextindex;
ANALYZE
=> \d pgtextindex
                       Table "public.pgtextindex"
      Column       |            Type             |      Modifiers
-------------------+-----------------------------+----------------------
 docid             | integer                     | not null
 community_docid   | character varying(100)      |
 content_type      | character varying(30)       |
 creation_date     | timestamp without time zone |
 modification_date | timestamp without time zone |
 coefficient       | real                        | not null default 1.0
 marker            | character varying[]         |
 text_vector       | tsvector                    |
Indexes:
    "pgtextindex_pkey" PRIMARY KEY, btree (docid)
    "pgtextindex_community_docid_index" btree (community_docid, content_type, creation_date)
    "pgtextindex_text_vector_idx" gin (text_vector)

## B gin_test.sql
SELECT docid,
       coefficient * (
        ts_rank_cd('{.1, 0.2, 0.4, 1.0}',
                   text_vector,
                   to_tsquery('english', :q))
        ) AS rank,
        count(*) over()
FROM pgtextindex
WHERE text_vector @@ to_tsquery('english', :q)
ORDER BY rank DESC
limit 9;

## C GIN explain analyze output for q='euro:*'
 Limit  (cost=14260.66..14260.68 rows=9 width=16) (actual time=420.212..420.215 rows=9 loops=1)
   ->  Sort  (cost=14260.66..14339.40 rows=31496 width=16) (actual time=420.210..420.210 rows=9 loops=1)
         Sort Key: ((coefficient * ts_rank_cd('{0.1,0.2,0.4,1}'::real[], text_vector, '''euro'''::tsquery))) DESC
         Sort Method: top-N heapsort  Memory: 25kB
         ->  WindowAgg  (cost=324.10..13603.98 rows=31496 width=16) (actual time=59.031..414.574 rows=33988 loops=1)
               ->  Bitmap Heap Scan on pgtextindex  (cost=324.10..13052.80 rows=31496 width=252) (actual time=13.476..40.841 rows=33988 loops=1)
                     Recheck Cond: (text_vector @@ '''euro'':*'::tsquery)
                     Heap Blocks: exact=9679
                     ->  Bitmap Index Scan on pgtextindex_text_vector_idx  (cost=0.00..316.22 rows=31496 width=0) (actual time=11.823..11.823 rows=33988 loops=1)
                           Index Cond: (text_vector @@ '''euro'':*'::tsquery)
 Planning time: 0.424 ms
 Execution time: 420.661 ms

## D RUM DDL and Schema
=> create index pgtextindex_rum_text_vector_idx on pgtextindex using rum (text_vector rum_tsvector_ops);
CREATE INDEX
=> analyze pgtextindex;
ANALYZE
=> \d pgtextindex;
                       Table "public.pgtextindex"
      Column       |            Type             |      Modifiers
-------------------+-----------------------------+----------------------
 docid             | integer                     | not null
 community_docid   | character varying(100)      |
 content_type      | character varying(30)       |
 creation_date     | timestamp without time zone |
 modification_date | timestamp without time zone |
 coefficient       | real                        | not null default 1.0
 marker            | character varying[]         |
 text_vector       | tsvector                    |
Indexes:
    "pgtextindex_pkey" PRIMARY KEY, btree (docid)
    "pgtextindex_community_docid_index" btree (community_docid, content_type, creation_date)
    "pgtextindex_rum_text_vector_idx" rum (text_vector)

## E rum_test.sql
SELECT docid, (text_vector <=> to_tsquery('english', :q)) / coefficient as rank, count(*) over()
FROM pgtextindex
WHERE text_vector @@ to_tsquery('english', :q)
ORDER BY rank
limit 9;

## F RUM explain analyze output for q='euro:*'
 Limit  (cost=14897.28..14897.31 rows=9 width=16) (actual time=470.662..470.665 rows=9 loops=1)
   ->  Sort  (cost=14897.28..15002.50 rows=42087 width=16) (actual time=470.652..470.654 rows=9 loops=1)
         Sort Key: (((text_vector <=> '''euro'':*'::tsquery) / coefficient))
         Sort Method: top-N heapsort  Memory: 25kB
         ->  WindowAgg  (cost=422.17..14019.78 rows=42087 width=16) (actual time=92.669..464.930 rows=33988 loops=1)
               ->  Bitmap Heap Scan on pgtextindex  (cost=422.17..13283.26 rows=42087 width=260) (actual time=37.685..72.324 rows=33988 loops=1)
                     Recheck Cond: (text_vector @@ '''euro'':*'::tsquery)
                     Heap Blocks: exact=9679
                     ->  Bitmap Index Scan on pgtextindex_rum_text_vector_idx  (cost=0.00..411.65 rows=42087 width=0) (actual time=35.995..35.995 rows=33988 loops=1)
                           Index Cond: (text_vector @@ '''euro'':*'::tsquery)
 Planning time: 6.881 ms
 Execution time: 471.571 ms

## G Results
Query     count   GIN    RUM
euro:*    33988  420ms  430ms
afric:*   40169  410ms  430ms
open:*    56497  580ms  550ms
europe    20452  280ms  270ms
africa    33086  360ms  340ms
open      56120  560ms  530ms


Times are best of many (>30). Basically, for each test, I ran until I didn't see improvements.

I also ran explain analyze for each query before doing timing runs.
	=> create index pgtextindex_text_vector_idx on pgtextindex using gin (text_vector);
	CREATE INDEX
	=> analyze pgtextindex;
	ANALYZE
	=> \d pgtextindex
	Table "public.pgtextindex"
	Column \| Type \| Modifiers
	-------------------+-----------------------------+----------------------
	docid \| integer \| not null
	community_docid \| character varying(100) \|
	content_type \| character varying(30) \|
	creation_date \| timestamp without time zone \|
	modification_date \| timestamp without time zone \|
	coefficient \| real \| not null default 1.0
	marker \| character varying[] \|
	text_vector \| tsvector \|
	Indexes:
	"pgtextindex_pkey" PRIMARY KEY, btree (docid)
	"pgtextindex_community_docid_index" btree (community_docid, content_type, creation_date)
	"pgtextindex_text_vector_idx" gin (text_vector)
	SELECT docid,
	coefficient * (
	ts_rank_cd('{.1, 0.2, 0.4, 1.0}',
	text_vector,
	to_tsquery('english', :q))
	) AS rank,
	count(*) over()
	FROM pgtextindex
	WHERE text_vector @@ to_tsquery('english', :q)
	ORDER BY rank DESC
	limit 9;
	Limit (cost=14260.66..14260.68 rows=9 width=16) (actual time=420.212..420.215 rows=9 loops=1)
	-> Sort (cost=14260.66..14339.40 rows=31496 width=16) (actual time=420.210..420.210 rows=9 loops=1)
	Sort Key: ((coefficient * ts_rank_cd('{0.1,0.2,0.4,1}'::real[], text_vector, '''euro'''::tsquery))) DESC
	Sort Method: top-N heapsort Memory: 25kB
	-> WindowAgg (cost=324.10..13603.98 rows=31496 width=16) (actual time=59.031..414.574 rows=33988 loops=1)
	-> Bitmap Heap Scan on pgtextindex (cost=324.10..13052.80 rows=31496 width=252) (actual time=13.476..40.841 rows=33988 loops=1)
	Recheck Cond: (text_vector @@ '''euro'':*'::tsquery)
	Heap Blocks: exact=9679
	-> Bitmap Index Scan on pgtextindex_text_vector_idx (cost=0.00..316.22 rows=31496 width=0) (actual time=11.823..11.823 rows=33988 loops=1)
	Index Cond: (text_vector @@ '''euro'':*'::tsquery)
	Planning time: 0.424 ms
	Execution time: 420.661 ms
	=> create index pgtextindex_rum_text_vector_idx on pgtextindex using rum (text_vector rum_tsvector_ops);
	CREATE INDEX
	=> analyze pgtextindex;
	ANALYZE
	=> \d pgtextindex;
	Table "public.pgtextindex"
	Column \| Type \| Modifiers
	-------------------+-----------------------------+----------------------
	docid \| integer \| not null
	community_docid \| character varying(100) \|
	content_type \| character varying(30) \|
	creation_date \| timestamp without time zone \|
	modification_date \| timestamp without time zone \|
	coefficient \| real \| not null default 1.0
	marker \| character varying[] \|
	text_vector \| tsvector \|
	Indexes:
	"pgtextindex_pkey" PRIMARY KEY, btree (docid)
	"pgtextindex_community_docid_index" btree (community_docid, content_type, creation_date)
	"pgtextindex_rum_text_vector_idx" rum (text_vector)
	SELECT docid, (text_vector <=> to_tsquery('english', :q)) / coefficient as rank, count(*) over()
	FROM pgtextindex
	WHERE text_vector @@ to_tsquery('english', :q)
	ORDER BY rank
	limit 9;
	Limit (cost=14897.28..14897.31 rows=9 width=16) (actual time=470.662..470.665 rows=9 loops=1)
	-> Sort (cost=14897.28..15002.50 rows=42087 width=16) (actual time=470.652..470.654 rows=9 loops=1)
	Sort Key: (((text_vector <=> '''euro'':*'::tsquery) / coefficient))
	Sort Method: top-N heapsort Memory: 25kB
	-> WindowAgg (cost=422.17..14019.78 rows=42087 width=16) (actual time=92.669..464.930 rows=33988 loops=1)
	-> Bitmap Heap Scan on pgtextindex (cost=422.17..13283.26 rows=42087 width=260) (actual time=37.685..72.324 rows=33988 loops=1)
	Recheck Cond: (text_vector @@ '''euro'':*'::tsquery)
	Heap Blocks: exact=9679
	-> Bitmap Index Scan on pgtextindex_rum_text_vector_idx (cost=0.00..411.65 rows=42087 width=0) (actual time=35.995..35.995 rows=33988 loops=1)
	Index Cond: (text_vector @@ '''euro'':*'::tsquery)
	Planning time: 6.881 ms
	Execution time: 471.571 ms
	Query count GIN RUM
	euro:* 33988 420ms 430ms
	afric:* 40169 410ms 430ms
	open:* 56497 580ms 550ms
	europe 20452 280ms 270ms
	africa 33086 360ms 340ms
	open 56120 560ms 530ms


	Times are best of many (>30). Basically, for each test, I ran until I didn't see improvements.

	I also ran explain analyze for each query before doing timing runs.