
@tjake
Last active September 8, 2024 04:11
### DML ###
# Keyspace Name
keyspace: stresscql
# The CQL for creating a keyspace (optional if it already exists)
keyspace_definition: |
  CREATE KEYSPACE stresscql WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
# Table name
table: blogposts
# The CQL for creating a table you wish to stress (optional if it already exists)
table_definition: |
  CREATE TABLE blogposts (
    domain text,
    published_date timeuuid,
    url text,
    author text,
    title text,
    body text,
    PRIMARY KEY(domain, published_date)
  ) WITH CLUSTERING ORDER BY (published_date DESC)
    AND compaction = { 'class':'LeveledCompactionStrategy' }
    AND comment='A table to hold blog posts'
### Column Distribution Specifications ###
columnspec:
  - name: domain
    size: gaussian(5..100)       # domain names are relatively short
    population: uniform(1..10M)  # 10M possible domains to pick from
  - name: published_date
    cluster: fixed(1000)         # under each domain we will have max 1000 posts
  - name: url
    size: uniform(30..300)
  - name: title                  # titles shouldn't go beyond 200 chars
    size: gaussian(10..200)
  - name: author
    size: uniform(5..20)         # author names should be short
  - name: body
    size: gaussian(100..5000)    # the body of the blog post can be long
### Batch Ratio Distribution Specifications ###
insert:
  partitions: fixed(1)     # Our partition key is the domain so only insert one per batch
  select: fixed(1)/1000    # We have 1000 posts per domain so 1/1000 will allow 1 post per batch
  batchtype: UNLOGGED      # Unlogged batches
#
# A list of queries you wish to run against the schema
#
queries:
  singlepost:
    cql: select * from blogposts where domain = ? LIMIT 1
    fields: samerow
  timeline:
    cql: select url, title, published_date from blogposts where domain = ? LIMIT 10
    fields: samerow
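The `insert` comments in the profile rest on simple arithmetic: one partition per batch, 1000 clustering rows per partition and a select ratio of 1/1000 should give about one row per batch. A minimal Python sketch of that arithmetic, assuming each `fixed(...)` distribution can be treated as a plain constant (the real tool samples these values); `rows_per_batch` is a hypothetical helper, not part of cassandra-stress:

```python
from fractions import Fraction


def rows_per_batch(partitions: int, rows_per_partition: int, select: Fraction) -> Fraction:
    """Expected rows written per batch, treating every fixed(...) value as a constant.

    Hypothetical helper for illustration only; cassandra-stress samples these
    distributions internally.
    """
    return partitions * rows_per_partition * select


# partitions: fixed(1), cluster: fixed(1000), select: fixed(1)/1000
print(rows_per_batch(1, 1000, Fraction(1, 1000)))  # 1

# Scaling cluster and select together still yields one row per batch:
# cluster: fixed(100), select: fixed(1)/100
print(rows_per_batch(1, 100, Fraction(1, 100)))  # 1

# Batches needed to fill one 1000-row partition at one row per batch
print(1000 / rows_per_batch(1, 1000, Fraction(1, 1000)))  # 1000
```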
./bin/cassandra-stress user profile=./blogpost.yaml ops\(insert=1\)
Results:
op rate : 8625
partition rate : 8625
row rate : 8612
latency mean : 46.8
latency median : 34.5
latency 95th percentile : 121.9
latency 99th percentile : 203.4
latency 99.9th percentile : 600.4
latency max : 877.0
Total operation time : 00:00:42
Improvement over 271 threadCount: 1%
Sleeping for 15s
id, partitions, op/s, pk/s, row/s, mean, med, .95, .99, .999, max, time, stderr
4 threadCount, 224850 , 4459, 4459, 4463, 0.9, 0.6, 2.0, 3.5, 18.5, 399.9, 50.4, 0.02339
8 threadCount, 162950 , 5177, 5177, 5186, 1.5, 1.1, 3.9, 6.7, 33.4, 524.5, 31.5, 0.02161
16 threadCount, 244850 , 6439, 6439, 6425, 2.5, 1.6, 6.4, 11.7, 47.7, 672.4, 38.0, 0.01971
24 threadCount, 214200 , 6933, 6933, 6928, 3.4, 2.2, 9.4, 16.8, 57.7, 551.1, 30.9, 0.01743
36 threadCount, 231700 , 7345, 7345, 7334, 4.9, 3.0, 12.5, 22.0, 66.5, 732.5, 31.5, 0.01348
54 threadCount, 250700 , 7976, 7976, 7982, 6.8, 4.5, 18.4, 34.8, 73.6, 399.1, 31.4, 0.01045
81 threadCount, 263600 , 8238, 8238, 8207, 9.8, 6.8, 25.4, 45.8, 83.0, 472.1, 32.0, 0.00976
121 threadCount, 270400 , 8267, 8267, 8220, 14.6, 10.5, 37.3, 66.6, 133.5, 394.9, 32.7, 0.01334
181 threadCount, 282950 , 8409, 8409, 8398, 21.4, 15.8, 54.4, 85.0, 152.7, 227.8, 33.6, 0.01107
271 threadCount, 304350 , 8561, 8561, 8537, 31.4, 24.2, 81.3, 119.9, 224.3, 367.0, 35.6, 0.01268
406 threadCount, 365300 , 8625, 8625, 8612, 46.8, 34.5, 121.9, 203.4, 600.4, 877.0, 42.4, 0.01867
END
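The "Improvement over" line appears to compare op/s at the final thread count with the previous one. A small Python sketch that parses two rows of the summary table and recomputes that figure, assuming it is the relative change in op/s rounded to a whole percent; this reproduces the numbers shown here but is not cassandra-stress's own code, and `parse_row` and `improvement_pct` are hypothetical helpers:

```python
def parse_row(line: str) -> tuple[int, float]:
    """Return (thread count, op/s) from one row of the per-thread-count summary."""
    fields = [field.strip() for field in line.split(",")]
    threads = int(fields[0].split()[0])  # "406 threadCount" -> 406
    op_rate = float(fields[2])           # third column is op/s
    return threads, op_rate


def improvement_pct(previous_ops: float, current_ops: float) -> int:
    """Relative change in op/s, rounded to a whole percent (assumed formula)."""
    return round((current_ops / previous_ops - 1) * 100)


rows = [
    "271 threadCount, 304350 , 8561, 8561, 8537, 31.4, 24.2, 81.3, 119.9, 224.3, 367.0, 35.6, 0.01268",
    "406 threadCount, 365300 , 8625, 8625, 8612, 46.8, 34.5, 121.9, 203.4, 600.4, 877.0, 42.4, 0.01867",
]
(prev_threads, prev_ops), (cur_threads, cur_ops) = (parse_row(r) for r in rows)
print(f"Improvement over {prev_threads} threadCount: {improvement_pct(prev_ops, cur_ops)}%")
# Improvement over 271 threadCount: 1%
```

The same formula gives -4% for the mixed run (6186 to 5938 op/s) and -1% for the singlepost (7299 to 7222) and timeline (7195 to 7132) runs.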
./bin/cassandra-stress user profile=./blogpost.yaml ops\(singlepost=2,timeline=1,insert=1\)
Results:
op rate : 5938
partition rate : 5583
row rate : 10555
latency mean : 67.6
latency median : 57.8
latency 95th percentile : 160.3
latency 99th percentile : 287.1
latency 99.9th percentile : 450.6
latency max : 719.7
Total operation time : 00:00:43
Improvement over 271 threadCount: -4%
Sleeping for 15s
id, partitions, op/s, pk/s, row/s, mean, med, .95, .99, .999, max, time, stderr
4 threadCount, 144779 , 3424, 3159, 5527, 1.2, 0.9, 2.1, 3.9, 23.2, 382.1, 45.8, 0.01988
8 threadCount, 302678 , 4240, 3921, 6907, 1.9, 1.5, 3.8, 7.3, 47.1, 718.7, 77.2, 0.01967
16 threadCount, 233321 , 5144, 4768, 8547, 3.1, 2.2, 7.3, 13.1, 77.0, 365.5, 48.9, 0.01945
24 threadCount, 152504 , 5251, 4875, 8808, 4.6, 3.2, 11.0, 19.8, 127.8, 330.0, 31.3, 0.02022
36 threadCount, 323510 , 5316, 4953, 9017, 6.8, 5.2, 18.6, 39.8, 157.5, 383.8, 65.3, 0.01950
54 threadCount, 192879 , 5533, 5162, 9368, 9.7, 7.2, 24.1, 50.8, 127.8, 373.5, 37.4, 0.01915
81 threadCount, 174440 , 5693, 5320, 9804, 14.1, 11.0, 32.8, 63.9, 127.2, 384.2, 32.8, 0.02233
121 threadCount, 192749 , 5989, 5608, 10436, 20.1, 16.2, 47.1, 83.4, 158.3, 362.0, 34.4, 0.01916
181 threadCount, 196909 , 6053, 5674, 10633, 29.8, 24.5, 67.8, 111.9, 195.2, 321.1, 34.7, 0.01669
271 threadCount, 214778 , 6186, 5808, 10962, 43.5, 35.5, 104.1, 177.8, 310.3, 526.3, 37.0, 0.01777
406 threadCount, 242622 , 5938, 5583, 10555, 67.6, 57.8, 160.3, 287.1, 450.6, 719.7, 43.5, 0.01863
END
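The values in `ops(singlepost=2,timeline=1,insert=1)` act as relative weights, so this run should issue roughly half singlepost reads and a quarter each of timeline reads and inserts. A Python sketch of weighted operation selection under that assumption; it only illustrates the ratio and is not how cassandra-stress schedules operations internally (`op_shares` and `sample_ops` are hypothetical):

```python
import random
from collections import Counter


def op_shares(weights: dict[str, int]) -> dict[str, float]:
    """Fraction of operations each name should receive, given relative weights."""
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


def sample_ops(weights: dict[str, int], n: int, rng: random.Random) -> list[str]:
    """Draw n operation names at random, in proportion to their weights."""
    names = list(weights)
    return rng.choices(names, weights=[weights[name] for name in names], k=n)


mix = {"singlepost": 2, "timeline": 1, "insert": 1}  # ops(singlepost=2,timeline=1,insert=1)
print(op_shares(mix))  # {'singlepost': 0.5, 'timeline': 0.25, 'insert': 0.25}
print(Counter(sample_ops(mix, 10_000, random.Random(7))))  # roughly 5000 / 2500 / 2500
```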
./bin/cassandra-stress user profile=./blogpost.yaml ops\(singlepost=1\)
Results:
op rate : 7222
partition rate : 6456
row rate : 6456
latency mean : 83.0
latency median : 72.4
latency 95th percentile : 180.9
latency 99th percentile : 307.0
latency 99.9th percentile : 732.3
latency max : 1057.8
Total operation time : 00:00:40
Improvement over 406 threadCount: -1%
Sleeping for 15s
id, partitions, op/s, pk/s, row/s, mean, med, .95, .99, .999, max, time, stderr
4 threadCount, 86769 , 3212, 2871, 2871, 1.2, 1.1, 2.1, 3.3, 9.1, 39.9, 30.2, 0.00906
8 threadCount, 122557 , 4461, 3982, 3982, 1.8, 1.5, 3.2, 5.2, 18.2, 871.2, 30.8, 0.02569
16 threadCount, 157139 , 5769, 5156, 5156, 2.8, 2.4, 5.8, 9.0, 25.1, 47.0, 30.5, 0.00378
24 threadCount, 167097 , 6099, 5454, 5454, 3.9, 3.3, 8.3, 12.9, 38.0, 329.0, 30.6, 0.00963
36 threadCount, 162503 , 5894, 5262, 5262, 6.1, 5.0, 14.1, 23.4, 43.9, 196.7, 30.9, 0.01686
54 threadCount, 179920 , 6482, 5789, 5789, 8.3, 7.2, 16.5, 26.2, 53.5, 101.1, 31.1, 0.01165
81 threadCount, 195019 , 6967, 6229, 6229, 11.6, 10.2, 22.6, 34.5, 71.1, 372.4, 31.3, 0.00480
121 threadCount, 200841 , 7026, 6280, 6280, 17.1, 15.6, 31.9, 47.2, 103.7, 200.8, 32.0, 0.00737
181 threadCount, 209828 , 7267, 6490, 6490, 24.8, 23.1, 45.7, 62.2, 123.2, 156.5, 32.3, 0.00417
271 threadCount, 220879 , 7243, 6466, 6466, 37.7, 33.5, 74.9, 112.6, 688.4, 771.8, 34.2, 0.00883
406 threadCount, 238178 , 7299, 6514, 6514, 55.1, 48.7, 113.8, 179.6, 643.5, 916.4, 36.6, 0.00949
609 threadCount, 261881 , 7222, 6456, 6456, 83.0, 72.4, 180.9, 307.0, 732.3, 1057.8, 40.6, 0.01282
END
./bin/cassandra-stress user profile=./blogpost.yaml ops\(timeline=1\)
Results:
op rate : 7132
partition rate : 6366
row rate : 25337
latency mean : 37.7
latency median : 33.2
latency 95th percentile : 74.4
latency 99th percentile : 107.9
latency 99.9th percentile : 713.3
latency max : 929.6
Total operation time : 00:00:35
Improvement over 181 threadCount: -1%
Sleeping for 15s
id, partitions, op/s, pk/s, row/s, mean, med, .95, .99, .999, max, time, stderr
4 threadCount, 113793 , 4219, 3773, 15040, 0.9, 0.9, 1.5, 2.1, 6.3, 31.5, 30.2, 0.00846
8 threadCount, 137662 , 5075, 4538, 18106, 1.6, 1.4, 2.9, 5.1, 20.1, 36.0, 30.3, 0.00896
16 threadCount, 166180 , 6095, 5444, 21701, 2.6, 2.2, 5.5, 8.7, 21.4, 40.3, 30.5, 0.00619
24 threadCount, 171222 , 6258, 5586, 22224, 3.8, 3.1, 8.1, 13.4, 28.4, 703.5, 30.6, 0.01146
36 threadCount, 182360 , 6632, 5924, 23571, 5.4, 4.6, 11.5, 19.7, 34.7, 211.3, 30.8, 0.00579
54 threadCount, 190032 , 6834, 6109, 24323, 7.9, 6.9, 16.3, 26.0, 41.3, 68.8, 31.1, 0.00511
81 threadCount, 193598 , 6852, 6130, 24397, 11.8, 10.3, 23.9, 35.8, 51.9, 168.2, 31.6, 0.00700
121 threadCount, 199891 , 6899, 6170, 24551, 17.5, 15.3, 31.9, 45.6, 649.3, 864.2, 32.4, 0.00605
181 threadCount, 210030 , 7195, 6429, 25578, 25.1, 23.0, 46.1, 67.9, 110.5, 181.0, 32.7, 0.00411
271 threadCount, 223373 , 7132, 6366, 25337, 37.7, 33.2, 74.4, 107.9, 713.3, 929.6, 35.1, 0.01103
END
@raju-nuovo

I am planning to use this tool. If I give a query like "SELECT * FROM user where id = ? and name = ?", how do I pass the values for id and name during the load test? Or does the tool automatically use the values in the table?

@naishe

naishe commented Oct 19, 2014

I think the semicolon should be at line #25 instead of #23.

@tjake
Author

tjake commented Dec 1, 2014

Updated to reflect profile changes in 2.1.1

@halgrim

halgrim commented Feb 11, 2015

The tool looks awesome, but my company uses the "map < text , text >" type in its schema.
How much effort would be needed to add support for this type?
Where would the changes need to be made?
Is it something a junior developer could handle?

@infomaven

I'm getting an error when I try to run cassandra-stress. I downloaded the source code and built it today using Ant 1.9.4.
Command> infomav:tools infomav:tools$ ./bin/cassandra-stress user profile=blogpost.yaml ops\(singlepost=1\)

Error> Error: Could not find or load main class org.apache.cassandra.stress.Stress

How can I fix this?

@infomaven

Found the issue. I was running the Ant build command instead of the release command, which skipped creating the JAR files and binary tar files.

Once I switched commands, I found the correct tar archive (bin.tar.gz) in the build folder and was able to unpack it and use it to run cassandra-stress.

@srikanthr341

When I make the following changes:

  - name: published_date
    cluster: fixed(100)

  select: fixed(1)/100

I see very different results (very high op counts, row counts, and pk counts, and better mean latencies).
It appears to me that the tool itself causes the delay when it generates rows for high cluster: fixed values.

@arsonak47

How can I set the write consistency level for a cassandra-stress run?

@mshuler

mshuler commented Sep 29, 2015

The cl= option sets the read/write consistency level, for example:
cassandra-stress write cl=QUORUM -schema 'replication(factor=3)'
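With `replication(factor=3)`, `cl=QUORUM` means a majority of the replicas must acknowledge each request. A minimal Python sketch of the majority arithmetic, assuming a single data center (with several data centers QUORUM counts replicas across all of them); `quorum` is a hypothetical helper, not a cassandra-stress option:

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must respond at consistency level QUORUM (single data center)."""
    return replication_factor // 2 + 1


# -schema 'replication(factor=3)' together with cl=QUORUM
print(quorum(3))  # 2

for rf in (1, 3, 5):
    print(f"RF={rf}: QUORUM needs {quorum(rf)} replica(s)")
```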

@arunsandu

Hi,
I am trying to pass request_trace.yaml as input to cassandra-stress as follows:
./cassandra-stress user profile=request_trace.yaml n=1000000 ops\(likelyquery0=1,likelyquery1=2,insert=1\) -node 10.32.100.16
The script works fine for other tables, but I get the error below for the request_trace table. The request_trace.yaml file is included below.
Can someone suggest a solution?

------------------------------------- request_trace.yaml---------------------------------------------------------
### DML ### THIS IS UNDER CONSTRUCTION!!!
# Keyspace Name
keyspace: autogeneratedtest
# The CQL for creating a keyspace (optional if it already exists)
keyspace_definition: |
  CREATE KEYSPACE autogeneratedtest WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
# Table name
table: request_trace
# The CQL for creating a table you wish to stress (optional if it already exists)
table_definition:
  CREATE TABLE request_trace (
    service_context_id text,
    trace_statement text,
    PRIMARY KEY (service_context_id, trace_statement)
  )
### Column Distribution Specifications ###
columnspec:
  - name: service_context_id
    size: gaussian(10..20)
    population: gaussian(300..500)

  - name: trace_statement
    size: gaussian(5..15)
    population: gaussian(800..1000)
### Batch Ratio Distribution Specifications ###
insert:
  partitions: fixed(1) # Our partition key is the domain so only insert one per batch

  select: fixed(1)/1000 # We have 1000 posts per domain so 1/1000 will allow 1 post per batch

  batchtype: UNLOGGED # Unlogged batches
#
# A list of queries you wish to run against the schema
#
queries:
  likelyquery0:
    cql: SELECT * FROM request_trace WHERE service_context_id = ?
    fields: samerow
  likelyquery1:
    cql: SELECT * FROM request_trace WHERE service_context_id = ? AND trace_statement = ?
    fields: samerow
ERROR:
Warming up likelyquery0 with 50000 iterations...
Warming up likelyquery1 with 50000 iterations...
Warming up insert with 50000 iterations...
Generating batches with [1..1] partitions and [0..0] rows (of [1..1] total rows in the partitions)
Exception in thread "main" com.datastax.driver.core.exceptions.SyntaxError: line 1:28 no viable alternative at input 'WHERE' (UPDATE "request_trace" SET  [WHERE]...)
    at com.datastax.driver.core.exceptions.SyntaxError.copy(SyntaxError.java:35)
    at com.datastax.driver.core.DefaultResultSetFuture.extractCauseFromExecutionException(DefaultResultSetFuture.java:271)
    at com.datastax.driver.core.AbstractSession.prepare(AbstractSession.java:82)
    at org.apache.cassandra.stress.util.JavaDriverClient.prepare(JavaDriverClient.java:84)
    at org.apache.cassandra.stress.StressProfile.getInsert(StressProfile.java:396)
    at org.apache.cassandra.stress.settings.SettingsCommandUser$1.get(SettingsCommandUser.java:82)
    at org.apache.cassandra.stress.settings.SettingsCommandUser$1.get(SettingsCommandUser.java:78)
    at org.apache.cassandra.stress.operations.SampledOpDistributionFactory$1.get(SampledOpDistributionFactory.java:80)
    at org.apache.cassandra.stress.StressAction$Consumer.<init>(StressAction.java:269)
    at org.apache.cassandra.stress.StressAction.run(StressAction.java:204)
    at org.apache.cassandra.stress.StressAction.warmup(StressAction.java:105)
    at org.apache.cassandra.stress.StressAction.run(StressAction.java:61)
    at org.apache.cassandra.stress.Stress.main(Stress.java:114)
Caused by: com.datastax.driver.core.exceptions.SyntaxError: line 1:28 no viable alternative at input 'WHERE' (UPDATE "request_trace" SET  [WHERE]...)
    at com.datastax.driver.core.Responses$Error.asException(Responses.java:123)
    at com.datastax.driver.core.SessionManager$1.apply(SessionManager.java:167)
    at com.datastax.driver.core.SessionManager$1.apply(SessionManager.java:142)
    at com.google.common.util.concurrent.Futures$1.apply(Futures.java:713)
    at com.google.common.util.concurrent.Futures$ChainingListenableFuture.run(Futures.java:861)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
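The error text shows the generated statement `UPDATE "request_trace" SET  WHERE`, with nothing between SET and WHERE. A Python sketch of why that can happen, assuming (as the message suggests) that this version of the tool builds its insert as an UPDATE that SETs every non-primary-key column and restricts on the key columns; `build_update` is a hypothetical reconstruction, not cassandra-stress's `StressProfile.getInsert`:

```python
def build_update(table: str, columns: list[str], primary_key: list[str]) -> str:
    """Hypothetical: SET every non-key column, restrict on every key column."""
    set_clause = ", ".join(f"{col} = ?" for col in columns if col not in primary_key)
    where_clause = " AND ".join(f"{col} = ?" for col in primary_key)
    return f'UPDATE "{table}" SET {set_clause} WHERE {where_clause}'


# blogposts has non-key columns, so the SET clause is populated.
print(build_update("blogposts",
                   ["domain", "published_date", "url", "author", "title", "body"],
                   ["domain", "published_date"]))

# request_trace: both columns are in the primary key, so nothing is left to SET.
stmt = build_update("request_trace",
                    ["service_context_id", "trace_statement"],
                    ["service_context_id", "trace_statement"])
print(stmt)
print(stmt.index("WHERE"))  # 28, the position reported in "line 1:28"
```

Under that assumption, a table whose columns all belong to the primary key produces an empty SET clause, which matches both the statement text and the reported position in the syntax error.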

@john19may

Hi all, I am trying to define multiple DDL statements and columnspec sections together in a single YAML file:

table: user_info

table_definition: |
  CREATE TABLE IF NOT EXISTS user_info (
    user_id text,
    password text,
    name text,
    pic blob,
    dept text,
    contacts list,
    PRIMARY KEY ((user_id))
  )

columnspec:
  - name: user_id
    size: GAUSSIAN(17..35,27,3)

  - name: password
    size: GAUSSIAN(8..200,12,1)

  - name: name
    size: GAUSSIAN(5..40,12,3)

  - name: pic
    size: UNIFORM(10000..200000)

  - name: dept
    size: FIXED(100000)

  - name: contacts
    size: UNIFORM(5..100)

table: category_info

table_definition: |
  CREATE TABLE IF NOT EXISTS category_info (
    user_id text,
    category_id uuid,
    category_name text,
    category_color text,
    unreads counter,
    PRIMARY KEY ((user_id),category_id)
  )

columnspec:
  - name: user_id
    size: GAUSSIAN(17..35,27,3)

  - name: category_id
    cluster: FIXED(100)

  - name: category_name
    size: GAUSSIAN(5..40,14,3)

  - name: category_color
    size: FIXED(7)

table: mails_by_category

table_definition: |
  CREATE TABLE IF NOT EXISTS user_info (
    week timestamp,
    category_id uuid,
    user_id text,
    all_unread text,
    time timestamp,
    mail_id uuid,
    from_id text,
    header text,
    content_id uuid,
    family_id uuid,
    is_thread boolean,
    is_read boolean,
    is_starred boolean,
    categories list,
    PRIMARY KEY ((week,category_id,user_id),all_unread,time,mail_id)
  )
  WITH CLUSTERING ORDER BY (all_unread ASC, time DESC, mail_id ASC)

columnspec:
  - name: week
    population: FIXED(50000)

  - name: category_id
    population: FIXED(100)

  - name: user_id
    size: GAUSSIAN(17..35,27,3)

  - name: all_unread
    size: FIXED(5)
    population: FIXED(2)

  - name: from_id
    size: GAUSSIAN(17..35,27,3)

  - name: header
    size: UNIFORM(0..10000)

  - name: categories
    size: UNIFORM(8..100)

but I am getting the error "unconfigured columnfamily category_info" when I try to run an insert test.

Thanks in advance.

@jagadeesh4u

May I know your CPU, RAM, and cluster size?

@dragon-laurance

Could you tell me what "cluster: uniform(20..40)" does? My English is not very good.
Thanks

@dragon-laurance

How should I understand this?

Cluster distribution - Defines the distribution for the number of clustering prefixes within a given partition (default of FIXED(1))
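For a table with a single clustering column (like `published_date` in blogposts), the number of clustering prefixes in a partition is simply the number of rows in it. Under that simplifying assumption, `cluster: uniform(20..40)` gives each partition somewhere between 20 and 40 rows, `fixed(1000)` gives every partition exactly 1000, and the default `FIXED(1)` gives one row per partition. A conceptual Python simulation; `sample_cluster_count` is hypothetical and far simpler than the tool's real generator:

```python
import random


def sample_cluster_count(kind: str, low: int, high: int, rng: random.Random) -> int:
    """Number of clustering rows one partition gets (simplified, hypothetical model)."""
    if kind == "fixed":
        return low  # fixed(N): every partition gets exactly N rows
    if kind == "uniform":
        return rng.randint(low, high)  # uniform(a..b): any count from a to b, inclusive
    raise ValueError(f"unsupported distribution: {kind}")


rng = random.Random(42)
# cluster: uniform(20..40) -> each of five partitions holds between 20 and 40 rows
sizes = [sample_cluster_count("uniform", 20, 40, rng) for _ in range(5)]
print(sizes)
# cluster: fixed(1000) -> every partition holds exactly 1000 rows
print(sample_cluster_count("fixed", 1000, 1000, rng))  # 1000
```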
