Skip to content

Instantly share code, notes, and snippets.

@mariomui
Last active January 16, 2019 19:11
Show Gist options
  • Save mariomui/b7bb8c1177cbcdb92863459af34de510 to your computer and use it in GitHub Desktop.
Save mariomui/b7bb8c1177cbcdb92863459af34de510 to your computer and use it in GitHub Desktop.
Database Query

#QUERIES (NO PRIMARY KEYS) CASSANDRA (DAY ONE)

READING (SELECT * FROM SIMILARS WHERE ID = 999999);

category_id | clustering_id | bed | cost | description | favorite | picture| premium | rcount | stars | title 1 | 999999 | 6 | 102 | Ut quis mollit est labore reprehenderit. Consequat culpa fugiat aliqua dolore enim exercitation tempor incididunt laborum et laboris amet ad elit. Elit mollit anim culpa sit aute cillum enim magna ut ea consequat id irure. | False | http://d1bah53dmo2q93.cloudfront.net/999.jpg | True | 1396 | 5 | Ullamco irure quis minim enim enim amet fugiat eiusmod anim proident.

(1 rows)

  • Tracing session: 9a986dc0-16c5-11e9-8208-b3a911c7ed43
  • activity | timestamp | source | source_elapsed | client ----------------------------------------------------------------------------------------------------------------------------------------+----------------------------+-----------+----------------+----------- Execute CQL3 query | 2019-01-12 15:55:56.956000 | 127.0.0.1 | 0 | 127.0.0.1 Parsing select * from similars where clustering_id = 999999; [Native-Transport-Requests-1] | 2019-01-12 15:55:56.956000 | 127.0.0.1 | 125 | 127.0.0.1 Request complete | 2019-01-12 15:55:57.014267 | 127.0.0.1 | 58267 | 127.0.0.

0.058267 ms

DELETING

cqlsh:keyspacemarlio> DELETE FROM similars where clustering_id = 999997 and category_id=1 if exists; Tracing session: f7d8ff80-16c6-11e9-8208-b3a911c7ed43 Execute CQL3 query | 2019-01-12 16:05:42.904000 | 127.0.0.1 | 0 | 127.0.0.1 Request complete | 2019-01-12 16:05:42.914987 | 127.0.0.1 | 10987 | 127.0.0.1
904000 - 914987 10000 ms

INSERTING

INSERT INTO similars (category_id, clustering_id, title) VALUES (1, 999997, 'Mario') Execute CQL3 query | 2019-01-12 16:11:05.357000 | 127.0.0.1 | 0 | 127.0.0.1 Request complete | 2019-01-12 16:11:05.400606 | 127.0.0.1 | 43606 | 127.0.0.1
357000 - 400606 43000 ms

UPDATING

UPDATE similars SET title='Rides hard, gets along with others, a real winner' WHERE id = 999997 IF EXISTS;

Execute CQL3 query | 2019-01-12 16:22:35.959000 | 127.0.0.1 | Request complete | 2019-01-12 16:22:35.997824 | 127.0.0.1 | 959000 - 997824 38000 ms

With indexed keys CREATE INDEX similars_clustering_id_index ON keyspacemarlio.similars (clustering_id); Cluster order is also on where the latest update is on top of the database.

READ W/ NODE-TEST:

Using cassandra driver script I got 300ms per 100 files. That's like 3 ms per file, which is slower than the optimized ver 300, 202, 306, 278 ms, 120, 133. (The more you run it against the last 10% of the database, the faster it gets).

READ W/ T-ON;

25.488000 - 25.510370 .02 00.768000 - 00.795347 .03 30.938000 - 30.950808 .012 51.923000 - 51.944903 .022

UPDATE w/T-on;

37.786000 - 37.851600 .06 01.064000 - 01.077889 .013 28.498000 - 28.500097 .002 57.086000 - 57.086795 .0007

INSERT

12.300000 - 12.363769 .063 23.328000 - 23.329849 .0019 34.390000 - 34.391637 .0016 08.125097 - 08.112000 .013 38.038000 - 38.039932 .0012

DELETE

09.086000 - 09.105445 .019 30.293000 - 30.296558 .0035 51.198000 - 51.199626 .0016 12.591000 - 12.592575 .0015

POSTGRE=================

OPTIMIZED

                                               SELECT QUERY PLAN      

dbmarlio=# explain analyze select * from similars where room_id = (SELECT floor(random() * (10000000 - 9000000) + 9000000)::int);

Index Scan using id_idx on similars (cost=0.46..8.48 rows=1 width=481) (actual time=5.358..5.361 rows=1 loops=1) Index Cond: (room_id = $0) InitPlan 1 (returns $0) -> Result (cost=0.00..0.02 rows=1 width=4) (actual time=0.002..0.002 rows=1 loops=1) Planning Time: 0.202 ms Execution Time: 5.428 ms, 4.9ms, 6.122ms, 5.3 ms (6 rows)

(USING MY OWN PG-NODE SCRIPT) (USING DATE.NOW with a random seed); { room_id: 9991540, room_name: 'room9991540',

default: 38.057ms, 83 ms, 21.297ms, 28.978ms, 30.644ms, 28.744ms, 25ms

with not null in the rest of the fields: .124, .083, .121, .086, .097, .101 (ms)

end

                                        INSERT QUERY PLAN     

dbmarlio=# INSERT INTO similars (title,description) values('hey','hey');

Insert on similars (cost=0.00..0.01 rows=1 width=150) (actual time=0.103..0.103 rows=0 loops=1) -> Result (cost=0.00..0.01 rows=1 width=150) (actual time=0.032..0.032 rows=1 loops=1) Planning Time: 0.025 ms Execution Time: 0.130 ms, Execution Time: 0.105 ms Execution Time: 0.907 ms Execution Time: 0.496 ms

   			                 UPDATE  QUERY PLAN                                 

dbmarlio=# explain analyze UPDATE similars SET description = 'Drama' WHERE room_id = 9879799;

Update on similars (cost=0.43..8.45 rows=1 width=182) (actual time=23.063..23.063 rows=0 loops=1) -> Index Scan using id_idx on similars (cost=0.43..8.45 rows=1 width=182) (actual time=22.282..22.285 rows=1 loops=1) Index Cond: (room_id = 8879999) Planning Time: 0.090 ms

Execution Time: 25.159 ms, 1.559 ms, 20.895 ms, 2.322 ms, 33.466 ms,

  • POSTGRES: (updating the same point is much faster);

                                                   DELETE QUERY PLAN    
    

dbmarlio=# explain analyze delete from similars where room_id = 8273799;

Delete on similars (cost=0.43..8.45 rows=1 width=6) (actual time=7.755..7.755 rows=0 loops=1) -> Index Scan using id_idx on similars (cost=0.43..8.45 rows=1 width=6) (actual time=6.459..6.462 rows=1 loops=1) Index Cond: (room_id = 8373799) Planning Time: 0.154 ms

Execution Time: 7.792 ms, 7.870 ms, 9.569 ms

Un-optimized

  • USING OFFSET

  • explain analyze SELECT * FROM similars LIMIT 1 OFFSET (SELECT max(room_id) FROM similars)*.90;

    • (ms) 12807, 11805, 11758, 9311,
  • Using pg_analyze.js

    * 11762, 9689, 90942, 10921

    QUERY PLAN


Delete on similars (cost=0.43..8.45 rows=1 width=6) (actual time=2.024..2.025 rows=0 loops=1) -> Index Scan using similars_pkey on similars (cost=0.43..8.45 rows=1 width=6) (actual time=1.983..1.986 rows=1 loops=1) Index Cond: (room_id = 8313799) Planning Time: 0.085 ms Execution Time: 2.049 ms Execution Time: 8.290 ms Execution Time: 1.986 ms Execution Time: 12.333 ms

USING my own cassandra-driver script

READING

cqlsh:keyspacemarlio> select * from similars where category_id = 1 and room_name = 'room9875001' and clustering_id=9875001;

UNOPTIMIZED BUT WITH PRIMARY KEYS getting 100 random queries in the last 10%

  • myscript: 381 ms/100 queries .381 ms per file.

  • TRACING ON: 23.482466 - 23.447000 0.035466 ms, 33.290000 - 33.291408 .0014 ms (outlier due to running the same query consecutively); 06.816000 - 06.831503 .015 ms 59.571000 - 59.580134 .010 MS 56.083950 - 56.068000 0.01595 ms

AVERAGE .015 ms READS

UPDATING

55.856000 - 55.865870 .010 47.537000 - 47.542951 .006 36.348000 - 36.350129 .003

DELETING

32.974000 - 33.029662 .05 ms 24.316000 - 24.328329 .012 ms 56.814000 - 56.816526 .002 ms 04.913000 - 04.913793 .00007 ms 12.212000 - 12.212649 .00006 ms

INSERTING

21.982000 - 22.053376 .07ms 11.614000 - 11.620882 .006 ms 49.908000 - 49.911072 .003 ms 36.141000 - 36.144175 .003 ms

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment