vmarquez/CassandraIOFeatureProposal.java

## CassandraIOFeatureProposal.java
public class Event {
  @PartitionKey(0) public UUID accountId;
  @PartitionKey(1) public String yearMonthDay;
  @ClusteringKey public UUID eventId;
  // other data...
}

public static void sampleUsage() {
  // We want to query ONLY the data from three years ago for a set of accounts, so we generate the matching tokens up front.
  // Note that a single token will likely yield many Events.
  Set<UUID> accounts = getRelevantAccounts();
  List<String> dateRange = generateDateRange("2016-01-01", "2017-01-01");
  // Note: Token is not serializable; we can represent it with a custom class wrapping byte arrays.
  PCollection<Token> tokensToQuery = p.apply(generateMyTokens(accounts, dateRange));

  PCollection<Event> events = tokensToQuery.apply(CassandraIO.<Event>readAll("Select * from Event where token(accountId, yearMonthDay) = ?"));
  // The query above, or even just the table, could be specified with the builder pattern; this is just an example.
}
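
The sample above notes that `Token` is not serializable and could be represented by a custom class wrapping byte arrays. A minimal sketch of such a wrapper follows. The class name `SerializableToken` and its methods are hypothetical, not part of CassandraIO or the Cassandra driver, and the sketch assumes 64-bit tokens such as those produced by `Murmur3Partitioner`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical wrapper: stores a token as raw bytes so it can be shipped between workers.
final class SerializableToken implements Serializable {
  private static final long serialVersionUID = 1L;

  private final byte[] bytes;

  private SerializableToken(byte[] bytes) {
    this.bytes = bytes.clone(); // defensive copy keeps the wrapper immutable
  }

  // Assumption: the token fits in a long (true for Murmur3Partitioner).
  static SerializableToken fromLong(long token) {
    return new SerializableToken(ByteBuffer.allocate(Long.BYTES).putLong(token).array());
  }

  long toLong() {
    return ByteBuffer.wrap(bytes).getLong();
  }

  // Serializes and deserializes the token with plain Java serialization.
  static SerializableToken roundTrip(SerializableToken token) {
    try {
      ByteArrayOutputStream buffer = new ByteArrayOutputStream();
      try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
        out.writeObject(token);
      }
      try (ObjectInputStream in =
          new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
        return (SerializableToken) in.readObject();
      }
    } catch (IOException | ClassNotFoundException e) {
      throw new IllegalStateException("Token serialization failed", e);
    }
  }

  @Override
  public boolean equals(Object other) {
    return other instanceof SerializableToken
        && Arrays.equals(bytes, ((SerializableToken) other).bytes);
  }

  @Override
  public int hashCode() {
    return Arrays.hashCode(bytes);
  }
}
```

Because the wrapper implements `Serializable`, a pipeline could carry a `PCollection<SerializableToken>` instead of a `PCollection<Token>` and convert back to a driver token just before binding the query.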
/*
   Currently CassandraIO queries over the entire token range and allows for filtering. If we want to exclude tens or
   hundreds of thousands of partition keys, that approach won't work well, so instead I'm proposing a way to supply a list
   of Tokens to query. CassandraIO currently bunches up token range queries as a List<List<Query>>; we can do the same
   under the hood here, ideally grouping by the node that owns the token.

   I believe it would also be possible to make the current implementation use something similar under the hood: it would
   take a PCollection<TokenRange> and a query, and in the case proposed above each token range would span only one actual token.
*/
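
The "grouping by the node that owns the token" idea can be sketched without any driver dependency. The example below assumes tokens are 64-bit longs and that the ring is given as a sorted map from each node's token to a node name. In Cassandra a node is the primary owner of the range that ends at its own token, and tokens past the highest node token wrap around to the first node. The class `TokenGrouping` and the method `groupByOwner` are hypothetical names for illustration; replication and virtual nodes are ignored.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;

// Hypothetical helper: buckets tokens by the node that primarily owns them.
final class TokenGrouping {
  private TokenGrouping() {}

  static Map<String, List<Long>> groupByOwner(
      NavigableMap<Long, String> ring, Collection<Long> tokens) {
    if (ring.isEmpty()) {
      throw new IllegalArgumentException("ring must contain at least one node");
    }
    Map<String, List<Long>> groups = new LinkedHashMap<>();
    for (Long token : tokens) {
      // The owner is the first node whose token is >= the queried token.
      Map.Entry<Long, String> owner = ring.ceilingEntry(token);
      if (owner == null) {
        owner = ring.firstEntry(); // wrap around the ring
      }
      groups.computeIfAbsent(owner.getValue(), node -> new ArrayList<>()).add(token);
    }
    return groups;
  }
}
```

Each resulting list could then become one batch of single-token queries, analogous to the inner lists of the existing `List<List<Query>>`.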