@tjake
Last active October 9, 2023 09:04
### DML ###
# Keyspace Name
keyspace: stresscql
# The CQL for creating a keyspace (optional if it already exists)
keyspace_definition: |
  CREATE KEYSPACE stresscql WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
# Table name
table: blogposts
# The CQL for creating a table you wish to stress (optional if it already exists)
table_definition: |
  CREATE TABLE blogposts (
    domain text,
    published_date timeuuid,
    url text,
    author text,
    title text,
    body text,
    PRIMARY KEY(domain, published_date)
  ) WITH CLUSTERING ORDER BY (published_date DESC)
    AND compaction = { 'class':'LeveledCompactionStrategy' }
    AND comment='A table to hold blog posts'
### Column Distribution Specifications ###
columnspec:
  - name: domain
    size: gaussian(5..100)       # domain names are relatively short
    population: uniform(1..10M)  # 10M possible domains to pick from
  - name: published_date
    cluster: fixed(1000)         # under each domain we will have max 1000 posts
  - name: url
    size: uniform(30..300)
  - name: title                  # titles shouldn't go beyond 200 chars
    size: gaussian(10..200)
  - name: author
    size: uniform(5..20)         # author names should be short
  - name: body
    size: gaussian(100..5000)    # the body of the blog post can be long
### Batch Ratio Distribution Specifications ###
insert:
  partitions: fixed(1)           # Our partition key is the domain so only insert one per batch
  select: fixed(1)/1000          # We have 1000 posts per domain so 1/1000 will allow 1 post per batch
  batchtype: UNLOGGED            # Unlogged batches
#
# A list of queries you wish to run against the schema
#
queries:
  singlepost:
    cql: select * from blogposts where domain = ? LIMIT 1
    fields: samerow
  timeline:
    cql: select url, title, published_date from blogposts where domain = ? LIMIT 10
    fields: samerow
./bin/cassandra-stress user profile=./blogpost.yaml ops\(insert=1\)
Results:
op rate : 8625
partition rate : 8625
row rate : 8612
latency mean : 46.8
latency median : 34.5
latency 95th percentile : 121.9
latency 99th percentile : 203.4
latency 99.9th percentile : 600.4
latency max : 877.0
Total operation time : 00:00:42
Improvement over 271 threadCount: 1%
Sleeping for 15s
id, partitions, op/s, pk/s, row/s, mean, med, .95, .99, .999, max, time, stderr
4 threadCount, 224850 , 4459, 4459, 4463, 0.9, 0.6, 2.0, 3.5, 18.5, 399.9, 50.4, 0.02339
8 threadCount, 162950 , 5177, 5177, 5186, 1.5, 1.1, 3.9, 6.7, 33.4, 524.5, 31.5, 0.02161
16 threadCount, 244850 , 6439, 6439, 6425, 2.5, 1.6, 6.4, 11.7, 47.7, 672.4, 38.0, 0.01971
24 threadCount, 214200 , 6933, 6933, 6928, 3.4, 2.2, 9.4, 16.8, 57.7, 551.1, 30.9, 0.01743
36 threadCount, 231700 , 7345, 7345, 7334, 4.9, 3.0, 12.5, 22.0, 66.5, 732.5, 31.5, 0.01348
54 threadCount, 250700 , 7976, 7976, 7982, 6.8, 4.5, 18.4, 34.8, 73.6, 399.1, 31.4, 0.01045
81 threadCount, 263600 , 8238, 8238, 8207, 9.8, 6.8, 25.4, 45.8, 83.0, 472.1, 32.0, 0.00976
121 threadCount, 270400 , 8267, 8267, 8220, 14.6, 10.5, 37.3, 66.6, 133.5, 394.9, 32.7, 0.01334
181 threadCount, 282950 , 8409, 8409, 8398, 21.4, 15.8, 54.4, 85.0, 152.7, 227.8, 33.6, 0.01107
271 threadCount, 304350 , 8561, 8561, 8537, 31.4, 24.2, 81.3, 119.9, 224.3, 367.0, 35.6, 0.01268
406 threadCount, 365300 , 8625, 8625, 8612, 46.8, 34.5, 121.9, 203.4, 600.4, 877.0, 42.4, 0.01867
END
./bin/cassandra-stress user profile=./blogpost.yaml ops\(singlepost=2,timeline=1,insert=1\)
Results:
op rate : 5938
partition rate : 5583
row rate : 10555
latency mean : 67.6
latency median : 57.8
latency 95th percentile : 160.3
latency 99th percentile : 287.1
latency 99.9th percentile : 450.6
latency max : 719.7
Total operation time : 00:00:43
Improvement over 271 threadCount: -4%
Sleeping for 15s
id, partitions, op/s, pk/s, row/s, mean, med, .95, .99, .999, max, time, stderr
4 threadCount, 144779 , 3424, 3159, 5527, 1.2, 0.9, 2.1, 3.9, 23.2, 382.1, 45.8, 0.01988
8 threadCount, 302678 , 4240, 3921, 6907, 1.9, 1.5, 3.8, 7.3, 47.1, 718.7, 77.2, 0.01967
16 threadCount, 233321 , 5144, 4768, 8547, 3.1, 2.2, 7.3, 13.1, 77.0, 365.5, 48.9, 0.01945
24 threadCount, 152504 , 5251, 4875, 8808, 4.6, 3.2, 11.0, 19.8, 127.8, 330.0, 31.3, 0.02022
36 threadCount, 323510 , 5316, 4953, 9017, 6.8, 5.2, 18.6, 39.8, 157.5, 383.8, 65.3, 0.01950
54 threadCount, 192879 , 5533, 5162, 9368, 9.7, 7.2, 24.1, 50.8, 127.8, 373.5, 37.4, 0.01915
81 threadCount, 174440 , 5693, 5320, 9804, 14.1, 11.0, 32.8, 63.9, 127.2, 384.2, 32.8, 0.02233
121 threadCount, 192749 , 5989, 5608, 10436, 20.1, 16.2, 47.1, 83.4, 158.3, 362.0, 34.4, 0.01916
181 threadCount, 196909 , 6053, 5674, 10633, 29.8, 24.5, 67.8, 111.9, 195.2, 321.1, 34.7, 0.01669
271 threadCount, 214778 , 6186, 5808, 10962, 43.5, 35.5, 104.1, 177.8, 310.3, 526.3, 37.0, 0.01777
406 threadCount, 242622 , 5938, 5583, 10555, 67.6, 57.8, 160.3, 287.1, 450.6, 719.7, 43.5, 0.01863
END
./bin/cassandra-stress user profile=./blogpost.yaml ops\(singlepost=1\)
Results:
op rate : 7222
partition rate : 6456
row rate : 6456
latency mean : 83.0
latency median : 72.4
latency 95th percentile : 180.9
latency 99th percentile : 307.0
latency 99.9th percentile : 732.3
latency max : 1057.8
Total operation time : 00:00:40
Improvement over 406 threadCount: -1%
Sleeping for 15s
id, partitions, op/s, pk/s, row/s, mean, med, .95, .99, .999, max, time, stderr
4 threadCount, 86769 , 3212, 2871, 2871, 1.2, 1.1, 2.1, 3.3, 9.1, 39.9, 30.2, 0.00906
8 threadCount, 122557 , 4461, 3982, 3982, 1.8, 1.5, 3.2, 5.2, 18.2, 871.2, 30.8, 0.02569
16 threadCount, 157139 , 5769, 5156, 5156, 2.8, 2.4, 5.8, 9.0, 25.1, 47.0, 30.5, 0.00378
24 threadCount, 167097 , 6099, 5454, 5454, 3.9, 3.3, 8.3, 12.9, 38.0, 329.0, 30.6, 0.00963
36 threadCount, 162503 , 5894, 5262, 5262, 6.1, 5.0, 14.1, 23.4, 43.9, 196.7, 30.9, 0.01686
54 threadCount, 179920 , 6482, 5789, 5789, 8.3, 7.2, 16.5, 26.2, 53.5, 101.1, 31.1, 0.01165
81 threadCount, 195019 , 6967, 6229, 6229, 11.6, 10.2, 22.6, 34.5, 71.1, 372.4, 31.3, 0.00480
121 threadCount, 200841 , 7026, 6280, 6280, 17.1, 15.6, 31.9, 47.2, 103.7, 200.8, 32.0, 0.00737
181 threadCount, 209828 , 7267, 6490, 6490, 24.8, 23.1, 45.7, 62.2, 123.2, 156.5, 32.3, 0.00417
271 threadCount, 220879 , 7243, 6466, 6466, 37.7, 33.5, 74.9, 112.6, 688.4, 771.8, 34.2, 0.00883
406 threadCount, 238178 , 7299, 6514, 6514, 55.1, 48.7, 113.8, 179.6, 643.5, 916.4, 36.6, 0.00949
609 threadCount, 261881 , 7222, 6456, 6456, 83.0, 72.4, 180.9, 307.0, 732.3, 1057.8, 40.6, 0.01282
END
./bin/cassandra-stress user profile=./blogpost.yaml ops\(timeline=1\)
Results:
op rate : 7132
partition rate : 6366
row rate : 25337
latency mean : 37.7
latency median : 33.2
latency 95th percentile : 74.4
latency 99th percentile : 107.9
latency 99.9th percentile : 713.3
latency max : 929.6
Total operation time : 00:00:35
Improvement over 181 threadCount: -1%
Sleeping for 15s
id, partitions, op/s, pk/s, row/s, mean, med, .95, .99, .999, max, time, stderr
4 threadCount, 113793 , 4219, 3773, 15040, 0.9, 0.9, 1.5, 2.1, 6.3, 31.5, 30.2, 0.00846
8 threadCount, 137662 , 5075, 4538, 18106, 1.6, 1.4, 2.9, 5.1, 20.1, 36.0, 30.3, 0.00896
16 threadCount, 166180 , 6095, 5444, 21701, 2.6, 2.2, 5.5, 8.7, 21.4, 40.3, 30.5, 0.00619
24 threadCount, 171222 , 6258, 5586, 22224, 3.8, 3.1, 8.1, 13.4, 28.4, 703.5, 30.6, 0.01146
36 threadCount, 182360 , 6632, 5924, 23571, 5.4, 4.6, 11.5, 19.7, 34.7, 211.3, 30.8, 0.00579
54 threadCount, 190032 , 6834, 6109, 24323, 7.9, 6.9, 16.3, 26.0, 41.3, 68.8, 31.1, 0.00511
81 threadCount, 193598 , 6852, 6130, 24397, 11.8, 10.3, 23.9, 35.8, 51.9, 168.2, 31.6, 0.00700
121 threadCount, 199891 , 6899, 6170, 24551, 17.5, 15.3, 31.9, 45.6, 649.3, 864.2, 32.4, 0.00605
181 threadCount, 210030 , 7195, 6429, 25578, 25.1, 23.0, 46.1, 67.9, 110.5, 181.0, 32.7, 0.00411
271 threadCount, 223373 , 7132, 6366, 25337, 37.7, 33.2, 74.4, 107.9, 713.3, 929.6, 35.1, 0.01103
END
@arunsandu

Hi,
I am trying to pass request_trace.yaml as input to the stress tool, as below:
./cassandra-stress user profile=request_trace.yaml n=1000000 ops(likelyquery0=1,likelyquery1=2,insert=1) -node 10.32.100.16
The script works fine for the other tables, but I get the error below for the request_trace table. The request_trace.yaml file follows.
Can someone suggest a solution?

------------------------------------- request_trace.yaml ---------------------------------------------------------
### DML ### THIS IS UNDER CONSTRUCTION!!!
# Keyspace Name
keyspace: autogeneratedtest
# The CQL for creating a keyspace (optional if it already exists)
keyspace_definition: |
  CREATE KEYSPACE autogeneratedtest WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
# Table name
table: request_trace
# The CQL for creating a table you wish to stress (optional if it already exists)
table_definition:
  CREATE TABLE request_trace (
    service_context_id text,
    trace_statement text,
    PRIMARY KEY (service_context_id, trace_statement)
  )
### Column Distribution Specifications ###
columnspec:
  - name: service_context_id
    size: gaussian(10..20)
    population: gaussian(300..500)
  - name: trace_statement
    size: gaussian(5..15)
    population: gaussian(800..1000)
### Batch Ratio Distribution Specifications ###
insert:
  partitions: fixed(1)   # Our partition key is the domain so only insert one per batch
  select: fixed(1)/1000  # We have 1000 posts per domain so 1/1000 will allow 1 post per batch
  batchtype: UNLOGGED    # Unlogged batches
#
# A list of queries you wish to run against the schema
#
queries:
  likelyquery0:
    cql: SELECT * FROM request_trace WHERE service_context_id = ?
    fields: samerow
  likelyquery1:
    cql: SELECT * FROM request_trace WHERE service_context_id = ? AND trace_statement = ?
    fields: samerow
ERROR:
Warming up likelyquery0 with 50000 iterations...
Warming up likelyquery1 with 50000 iterations...
Warming up insert with 50000 iterations...
Generating batches with [1..1] partitions and [0..0] rows (of [1..1] total rows in the partitions)
Exception in thread "main" com.datastax.driver.core.exceptions.SyntaxError: line 1:28 no viable alternative at input 'WHERE' (UPDATE "request_trace" SET  [WHERE]...)
    at com.datastax.driver.core.exceptions.SyntaxError.copy(SyntaxError.java:35)
    at com.datastax.driver.core.DefaultResultSetFuture.extractCauseFromExecutionException(DefaultResultSetFuture.java:271)
    at com.datastax.driver.core.AbstractSession.prepare(AbstractSession.java:82)
    at org.apache.cassandra.stress.util.JavaDriverClient.prepare(JavaDriverClient.java:84)
    at org.apache.cassandra.stress.StressProfile.getInsert(StressProfile.java:396)
    at org.apache.cassandra.stress.settings.SettingsCommandUser$1.get(SettingsCommandUser.java:82)
    at org.apache.cassandra.stress.settings.SettingsCommandUser$1.get(SettingsCommandUser.java:78)
    at org.apache.cassandra.stress.operations.SampledOpDistributionFactory$1.get(SampledOpDistributionFactory.java:80)
    at org.apache.cassandra.stress.StressAction$Consumer.<init>(StressAction.java:269)
    at org.apache.cassandra.stress.StressAction.run(StressAction.java:204)
    at org.apache.cassandra.stress.StressAction.warmup(StressAction.java:105)
    at org.apache.cassandra.stress.StressAction.run(StressAction.java:61)
    at org.apache.cassandra.stress.Stress.main(Stress.java:114)
Caused by: com.datastax.driver.core.exceptions.SyntaxError: line 1:28 no viable alternative at input 'WHERE' (UPDATE "request_trace" SET  [WHERE]...)
    at com.datastax.driver.core.Responses$Error.asException(Responses.java:123)
    at com.datastax.driver.core.SessionManager$1.apply(SessionManager.java:167)
    at com.datastax.driver.core.SessionManager$1.apply(SessionManager.java:142)
    at com.google.common.util.concurrent.Futures$1.apply(Futures.java:713)
    at com.google.common.util.concurrent.Futures$ChainingListenableFuture.run(Futures.java:861)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
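
A note on the trace above: the statement that fails to prepare is `UPDATE "request_trace" SET  [WHERE]...`, that is, an UPDATE with an empty SET clause. That suggests this version of cassandra-stress builds its insert as an UPDATE over the non-key columns, and request_trace has none, since both of its columns are part of the primary key. A minimal sketch of one possible workaround, assuming it is acceptable to add a regular column to the table (the column name `trace_payload` is hypothetical and not part of the original schema):

```yaml
# Sketch only: "trace_payload" is a hypothetical non-key column added so that
# the generated UPDATE has at least one column to put in its SET clause.
table_definition: |
  CREATE TABLE request_trace (
    service_context_id text,
    trace_statement text,
    trace_payload text,
    PRIMARY KEY (service_context_id, trace_statement)
  )

columnspec:
  - name: service_context_id
    size: gaussian(10..20)
    population: gaussian(300..500)
  - name: trace_statement
    size: gaussian(5..15)
    population: gaussian(800..1000)
  - name: trace_payload          # hypothetical column, sized arbitrarily
    size: gaussian(5..15)
```

If the table already exists, it would have to be dropped or altered first, because the table definition in the profile is only used when the table is missing.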

@john19may

Hi all, I am trying to define multiple table definitions (DDL) and columnspec sections together in a single YAML file:

`table: user_info

table_definition: |
CREATE TABLE IF NOT EXISTS user_info (
user_id text,
password text,
name text,
pic blob,
dept text,
contacts list,
PRIMARY KEY ((user_id))
)

columnspec:
- name: user_id
  size: GAUSSIAN(17..35,27,3)

- name: password
  size: GAUSSIAN(8..200,12,1)

- name: name
  size: GAUSSIAN(5..40,12,3)

- name: pic
  size: UNIFORM(10000..200000)

- name: dept
  size: FIXED(100000)

- name: contacts
  size: UNIFORM(5..100)

table: category_info

table_definition: |
CREATE TABLE IF NOT EXISTS category_info (
user_id text,
category_id uuid,
category_name text,
category_color text,
unreads counter,
PRIMARY KEY ((user_id),category_id)
)

columnspec:
- name: user_id
  size: GAUSSIAN(17..35,27,3)

- name: category_id
  cluster: FIXED(100)

- name: category_name
  size: GAUSSIAN(5..40,14,3)

- name: category_color
  size: FIXED(7)

table: mails_by_category

table_definition: |
CREATE TABLE IF NOT EXISTS user_info (
week timestamp,
category_id uuid,
user_id text,
all_unread text,
time timestamp,
mail_id uuid,
from_id text,
header text,
content_id uuid,
family_id uuid,
is_thread boolean,
is_read boolean,
is_starred boolean,
categories list,
PRIMARY KEY ((week,category_id,user_id),all_unread,time,mail_id)
)
WITH CLUSTERING ORDER BY (all_unread ASC, time DESC, mail_id ASC)

columnspec:
- name: week
  population: FIXED(50000)

- name: category_id
  population: FIXED(100)

- name: user_id
  size: GAUSSIAN(17..35,27,3)

- name: all_unread
  size: FIXED(5)
  population: FIXED(2)

- name: from_id
  size: GAUSSIAN(17..35,27,3)

- name: header
  size: UNIFORM(0..10000)

- name: categories
  size: UNIFORM(8..100)

`

But I am getting the error "unconfigured columnfamily category_info" when I try to run the insert test.

Thanks in advance.
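
A possible explanation, offered as an assumption rather than a confirmed diagnosis: the profile at the top of this gist has exactly one `table`, one `table_definition`, and one `columnspec` key, and repeating those top-level keys in one YAML mapping makes them duplicates, so the sections cannot all take effect. Separately, Cassandra rejects a table that mixes a `counter` column with other non-key columns, so category_info as written (`unreads counter` next to two `text` columns) would not be created. Note also that the third `table_definition` creates `user_info` rather than `mails_by_category`. A sketch of one profile per table, with a hypothetical file name and keyspace name:

```yaml
# category_info.yaml (hypothetical file name): one table per profile.
keyspace: mykeyspace             # hypothetical; use your own keyspace name

table: category_info

# The counter column is left out here because counter and non-counter
# columns cannot share a table; it would need a table of its own.
table_definition: |
  CREATE TABLE IF NOT EXISTS category_info (
    user_id text,
    category_id uuid,
    category_name text,
    category_color text,
    PRIMARY KEY ((user_id), category_id)
  )

columnspec:
  - name: user_id
    size: GAUSSIAN(17..35,27,3)
  - name: category_id
    cluster: FIXED(100)
  - name: category_name
    size: GAUSSIAN(5..40,14,3)
  - name: category_color
    size: FIXED(7)
```

Each such file could then be run on its own in the same way as the commands above, for example `./bin/cassandra-stress user profile=./category_info.yaml ops\(insert=1\)`.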

@jagadeesh4u

May I know your CPU, RAM, and cluster size?

@dragon-laurance

Could you tell me what "cluster: uniform(20..40)" does? My English is not so good.
Thanks

@dragon-laurance

How should I understand this?

Cluster distribution - Defines the distribution for the number of clustering prefixes within a given partition (default of FIXED(1))
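
One way to read that definition, sketched against the blogposts profile at the top of this gist: `population` controls how many distinct partition keys are generated, while `cluster` controls how many distinct values of a clustering column are generated inside each partition. The fragment below is an illustration only; it swaps the gist's `fixed(1000)` for the `uniform(20..40)` from the question.

```yaml
# Illustration only, adapted from the blogposts profile above.
columnspec:
  - name: domain
    population: uniform(1..10M)   # partition key: up to 10M distinct domains
  - name: published_date
    cluster: uniform(20..40)      # clustering column: each domain gets between
                                  # 20 and 40 distinct published_date values,
                                  # drawn uniformly from that range
```

With a single clustering column, as in blogposts, this is the number of rows per partition, so `fixed(1000)` in the gist means up to 1000 posts per domain and the default `FIXED(1)` means one row per partition.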
