Skip to content

Instantly share code, notes, and snippets.

@vanjos
Last active August 29, 2015 14:02
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save vanjos/3e8d78e35aea6d690561 to your computer and use it in GitHub Desktop.
Save vanjos/3e8d78e35aea6d690561 to your computer and use it in GitHub Desktop.
Stargate Explained (Lucene indexing ON Cassandra) to replace Lucandra/Solandra and SOLR (from DSE)

STARGATE INSTALL

  • Prerequisites
  • ubuntu
  • java 1.7
  • likely want openjdk-7-jre-headless openjdk-7-jre-lib openjdk-7-jdk
  • cassandra 1.2.15 OR 2.0.6

FOR 1.2.15

  • install gradle
sudo add-apt-repository ppa:cwchien/gradle
sudo apt-get update
sudo apt-get install gradle
  • install stargate
git clone https://github.com/tuplejump/stargate-core
cd stargate-core/
git checkout for1.2.15
  • configure
cd stargate-core
gradle jar
cp build/libs/* /<cassandra lib folder>
  • add some options to C*
JVM_OPTS="$JVM_OPTS -javaagent:$CASSANDRA_HOME/lib/aspectjweaver-1.7.4.jar "
echo "***************** STARTING WITH STARGATE ******************"

TESTING IT OUT

Open cassandra/bin/cqlsh and optionally create a keyspace.

CREATE KEYSPACE my_keyspace WITH replication = {'class': 'SimpleStrategy', 'replication_factor' : 1 };

Change into you keyspace

cqlsh> USE MY_KEYSPACE;

Create some dummy table

CREATE TABLE TAG(key varchar primary key, tags varchar, state varchar);

with some data

	INSERT INTO TAG(key,tags,state) values ('1','hello1', 'CA');
	INSERT INTO TAG(key,tags,state) values ('2','hello1', 'LA');
	INSERT INTO TAG(key,tags,state) values ('3','hello1', 'NY');
	INSERT INTO TAG(key,tags,state) values ('4','hello1', 'TX');
	INSERT INTO TAG(key,tags,state) values ('5','hello2', 'CA');
	INSERT INTO TAG(key,tags,state) values ('6','hello2', 'NY');
	INSERT INTO TAG(key,tags,state) values ('7','hello2', 'CA');
	INSERT INTO TAG(key,tags,state) values ('8','hello2', 'TX');

Create a PER-COLUMN Index

-- index on tags column
CREATE CUSTOM INDEX tagsindex ON TAG(tags) USING 'com.tuplejump.stargate.cas.PerColIndex';
-- another index on state column
CREATE CUSTOM INDEX stateindex ON TAG(state) USING 'com.tuplejump.stargate.cas.PerColIndex';

Query on PER-COLUMN Index

-- term query
SELECT * FROM TAG where tags = 'hello1';
-- Range query
SELECT * FROM TAG where tags = '[hello1 TO hello2]';
-- Boolean operators
SELECT * FROM TAG where tags = 'hello1 OR hello2';
-- multiple predicates
SELECT * FROM TAG where tags = 'hello1' AND state='CA' ALLOW FILTERING;
SELECT * FROM TAG where tags = 'hello*';

-- wild card -- need to specify additional configuration. See

Search Options.

SELECT * FROM TAG where tags = 'l?1';

PER-ROW Index

A single column query using a Per Row Index works just like querying a per column index

Additional columns in the query may be specified in 2 ways

  • COMBINING MULTIPLE PREDICATES NORMALLY
SELECT * FROM TAG where tags = 'hello1' AND state='CA' ALLOW FILTERING;
SELECT * FROM TAG where tags = 'hello*' AND state='CA' ALLOW FILTERING;
-- need to specify additional configuration for allowing wild card on tags. See

Search Options.

SELECT * FROM TAG where tags = 'l?2' AND state='CA';

Native secondary indexes in Cassandra have some limitations. Cassandra uses only one of the native indexes even if you defined multiple in a query. With 'ALLOW FILTERING' suffixed to your query, it allows you to brute force filtering for additional columns.

A Stargate Per Row Index does not suffer from this limitation. Since all the columns are in the same index, it can filter using all configured columns which were specified in the filter clause even though you add 'ALLOW FILTERING'. However only the 'AND' operator is supported. Other operators can be specified using a lucene query.

  • LUCENE QUERY on PRIMARY COLUMN.
SELECT * FROM TAG where tags = 'tags:"hello1" OR state:"CA"';
-- need to specify additional configuration for allowing wild card on tags. See:

Search Options.

SELECT * FROM TAG where tags = 'tags:hello* AND state:"CA"';

The Primary column can be used to search for additional fields. It becomes the default field for the lucene query parser.

You can use Lucene syntax and Classic Lucene Query syntax to construct all types of queries on multiple columns. For this additional configuration needs to be provided. See Search Options

SECOND/THIRD EXAMPLE

USER -- with height and weight as double

create table USER (
fname varchar,
lname varchar,
sex varchar,
city varchar,
id bigint,
height double,
eyecolour varchar,
weight double,
primary key (id));
insert into USER(id,fname,lname,sex,city,eyecolour,height,weight) values (1,'Victor','Anjos','M','Toronto','Blue',6.083,185);
insert into USER(id,fname,lname,sex,city,eyecolour,height,weight) values (2,'Filipe','Anjos','M','Toronto','Blue',6.083,185);
insert into USER(id,fname,lname,sex,city,eyecolour,height,weight) values (3,'Victor','Da Silva','M','Toronto','Blue',6.083,185);
insert into USER(id,fname,lname,sex,city,eyecolour,height,weight) values (4,'Filipe','Da Silva','M','Toronto','Blue',6.083,185);
insert into USER(id,fname,lname,sex,city,eyecolour,height,weight) values (5,'Suzana','Anjos','F','Montreal','Blue',5.75,125);
insert into USER(id,fname,lname,sex,city,eyecolour,height,weight) values (6,'Maria','Anjos','F','Brampton','Brown',5.5,120);
insert into USER(id,fname,lname,sex,city,eyecolour,height,weight) values (7,'Kendra','Kam','F','Markham','Brown',5.083,110);
insert into USER(id,fname,lname,sex,city,eyecolour,height,weight) values (8,'Rhino','Kanjos','M','Toronto','Brown',1.083,36);
insert into USER(id,fname,lname,sex,city,eyecolour,height,weight) values (9,'Joseph','Bastarache','M','Montreal','Blue',3.25,45);
insert into USER(id,fname,lname,sex,city,eyecolour,height,weight) values (10,'Lucas','Bastarache','M','Montreal','Blue',3.25,42);
insert into USER(id,fname,lname,sex,city,eyecolour,height,weight) values (11,'Francois','Basarache','M','Montreal','Blue',5.833,152);

USER2 -- with height and weight as varchar

CREATE CUSTOM INDEX fnameindex ON USER(fname) USING 'com.tuplejump.stargate.cas.PerColIndex';
CREATE CUSTOM INDEX lnameindex ON USER(lname) USING 'com.tuplejump.stargate.cas.PerColIndex';
CREATE CUSTOM INDEX sexindex ON USER(sex) USING 'com.tuplejump.stargate.cas.PerColIndex';
CREATE CUSTOM INDEX cityindex ON USER(city) USING 'com.tuplejump.stargate.cas.PerColIndex';
CREATE CUSTOM INDEX eyecolourindex ON USER(eyecolour) USING 'com.tuplejump.stargate.cas.PerColIndex';
CREATE CUSTOM INDEX heightindex ON USER(height) USING 'com.tuplejump.stargate.cas.PerColIndex';
CREATE CUSTOM INDEX weightindex ON USER(weight) USING 'com.tuplejump.stargate.cas.PerColIndex';
create table USER2 (
fname varchar,
lname varchar,
sex varchar,
city varchar,
id bigint,
height varchar,
eyecolour varchar,
weight varchar,
primary key (id));
insert into USER2(id,fname,lname,sex,city,eyecolour,height,weight) values (1,'Victor','Anjos','M','Toronto','Blue','6.083','185');
insert into USER2(id,fname,lname,sex,city,eyecolour,height,weight) values (2,'Filipe','Anjos','M','Toronto','Blue','6.083','185');
insert into USER2(id,fname,lname,sex,city,eyecolour,height,weight) values (3,'Victor','Da Silva','M','Toronto','Blue','6.083','185');
insert into USER2(id,fname,lname,sex,city,eyecolour,height,weight) values (4,'Filipe','Da Silva','M','Toronto','Blue','6.083','185');
insert into USER2(id,fname,lname,sex,city,eyecolour,height,weight) values (5,'Suzana','Anjos','F','Montreal','Blue','5.75','125');
insert into USER2(id,fname,lname,sex,city,eyecolour,height,weight) values (6,'Maria','Anjos','F','Brampton','Brown','5.5','120');
insert into USER2(id,fname,lname,sex,city,eyecolour,height,weight) values (7,'Kendra','Kam','F','Markham','Brown','5.083','110');
insert into USER2(id,fname,lname,sex,city,eyecolour,height,weight) values (8,'Rhino','Kanjos','M','Toronto','Brown','1.083','36');
insert into USER2(id,fname,lname,sex,city,eyecolour,height,weight) values (9,'Joseph','Bastarache','M','Montreal','Blue','3.25','45');
insert into USER2(id,fname,lname,sex,city,eyecolour,height,weight) values (10,'Lucas','Bastarache','M','Montreal','Blue','3.25','42');
insert into USER2(id,fname,lname,sex,city,eyecolour,height,weight) values (11,'Francois','Basarache','M','Montreal','Blue','5.833','152');
CREATE CUSTOM INDEX fnameindex2 ON USER2(fname) USING 'com.tuplejump.stargate.cas.PerColIndex';
CREATE CUSTOM INDEX lnameindex2 ON USER2(lname) USING 'com.tuplejump.stargate.cas.PerColIndex';
CREATE CUSTOM INDEX sexindex2 ON USER2(sex) USING 'com.tuplejump.stargate.cas.PerColIndex';
CREATE CUSTOM INDEX cityindex2 ON USER2(city) USING 'com.tuplejump.stargate.cas.PerColIndex';
CREATE CUSTOM INDEX eyecolourindex2 ON USER2(eyecolour) USING 'com.tuplejump.stargate.cas.PerColIndex';
CREATE CUSTOM INDEX heightindex2 ON USER2(height) USING 'com.tuplejump.stargate.cas.PerColIndex';
CREATE CUSTOM INDEX weightindex2 ON USER2(weight) USING 'com.tuplejump.stargate.cas.PerColIndex';

QUERIES --- against both USER and USER2

--- these will be same for USER or USER2
select * from USER where lname = 'A*';
select * from USER where lname = 'Ba*e';
select * from USER where fname = 'Victor' AND lname = 'Anjos OR Da Silva' ALLOW FILTERING;
select * from USER where lname = 'Ba?tarache';
select * from USER where lname = 'Ba*arache';
select * from USER where eyecolour = 'Blu*' AND city = 'Montreal OR Toronto' ALLOW FILTERING;

--- these will only work against USER2 (with height/weight as varchar)
select * from USER2 where height = '[2 TO 7]';
select * from USER2 where height = '[5.5 TO 6.5]' AND weight = '[150 TO 200]' AND eyecolour = 'B*' ALLOW FILTERING;
  • SOME LEARNINGS
  • Cannot name indices the same in any part of the same keyspace (i think keyspace)
  • OUTSTANDING QUESTIONS
  • Should all fields now be VARCHAR?
  • How would we use PER-ROW Index instead of PER-COLUMN
  • How will this work against 2.0.6+
  • How well does this work across very large clusters
  • Does this remove the need for TRUTHINESS TABLES
  • Maybe test this against a dev version of API counts
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment