@daschl
Created August 22, 2016 07:07
Couchbase Spark Samples
// Start the Shell
./pyspark --packages com.couchbase.client:spark-connector_2.10:1.2.1 --conf "spark.couchbase.bucket.travel-sample="
// Create a DF
>>> df = sqlContext.read.format("com.couchbase.spark.sql.DefaultSource").option("schemaFilter", "type=\"airline\"").load()
// Print the Schema
>>> df.printSchema()
root
 |-- META_ID: string (nullable = true)
 |-- callsign: string (nullable = true)
 |-- country: string (nullable = true)
 |-- iata: string (nullable = true)
 |-- icao: string (nullable = true)
 |-- id: long (nullable = true)
 |-- name: string (nullable = true)
 |-- type: string (nullable = true)
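
Once loaded, the DataFrame behaves like any other Spark SQL DataFrame. A quick sketch of querying it from the same shell (column names come from the schema above; this assumes the shell was started as shown and the travel-sample bucket is reachable):

```python
>>> # Select a few columns and filter with ordinary DataFrame operations
>>> df.select("name", "callsign").where(df.country == "United States").show(3)

>>> # Or register it as a temp table and use plain SQL (Spark 1.x API)
>>> df.registerTempTable("airlines")
>>> sqlContext.sql("SELECT name, iata FROM airlines WHERE iata IS NOT NULL").show(3)
```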
==========
Available options:
- schemaFilter => the predicate used, as above, in the WHERE clause of each query that defines the schema/type (see http://developer.couchbase.com/documentation/server/4.5/connectors/spark-1.2/spark-sql.html)
- bucket => if more than one bucket is open, specifies the name of the bucket to use
- idField => renames the document ID field; by default it's META_ID, and that's how you'd access it from your Spark SQL query
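
For illustration, here is how those options might be combined on a single read (the bucket name and the renamed ID column "docId" are just example values):

```python
>>> df = sqlContext.read.format("com.couchbase.spark.sql.DefaultSource") \
...     .option("bucket", "travel-sample") \
...     .option("idField", "docId") \
...     .option("schemaFilter", "type=\"airline\"") \
...     .load()
>>> df.select("docId", "name").show(5)
```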
@markmikostv

Hi,

Thanks this has been really helpful.

At the moment I have three buckets: Game, Beer, and Airline. If I have linked to more than one bucket within my interpreter, how do I specify a specific bucket?

thanks,

Mark

@daschl
Author

daschl commented Oct 18, 2016

@markmikostv you can provide more "bucket" properties on startup, and then on each DataFrame you need to add an option specifying which bucket you want.
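
A rough sketch of that setup (the second bucket name, beer-sample, is just an example):

```python
// Open two buckets when starting the shell
./pyspark --packages com.couchbase.client:spark-connector_2.10:1.2.1 \
  --conf "spark.couchbase.bucket.travel-sample=" \
  --conf "spark.couchbase.bucket.beer-sample="

// Then pick the bucket per DataFrame via the "bucket" option
>>> airlines = sqlContext.read.format("com.couchbase.spark.sql.DefaultSource") \
...     .option("bucket", "travel-sample") \
...     .option("schemaFilter", "type=\"airline\"").load()
>>> beers = sqlContext.read.format("com.couchbase.spark.sql.DefaultSource") \
...     .option("bucket", "beer-sample") \
...     .option("schemaFilter", "type=\"beer\"").load()
```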

@markmikostv

@daschl

I am looking to infer the schema first from either an id or a JSON document, and then load my data based on a type="airline" filter.

What's the best way to infer the schema and then load the data?

df = sqlContext.read.format("com.couchbase.spark.sql.DefaultSource").option("schemaFilter", "type=\"airline\"").load()
