Created August 22, 2016 07:07
Couchbase Spark Samples
// Start the Shell
./pyspark --packages com.couchbase.client:spark-connector_2.10:1.2.1 --conf "spark.couchbase.bucket.travel-sample="
// Create a DF
>>> df = sqlContext.read.format("com.couchbase.spark.sql.DefaultSource").option("schemaFilter", "type=\"airline\"").load()
// Print the Schema
>>> df.printSchema()
root
 |-- META_ID: string (nullable = true)
 |-- callsign: string (nullable = true)
 |-- country: string (nullable = true)
 |-- iata: string (nullable = true)
 |-- icao: string (nullable = true)
 |-- id: long (nullable = true)
 |-- name: string (nullable = true)
 |-- type: string (nullable = true)
==========
Available options:
- schemaFilter => the predicate, used as above, that goes into the WHERE clause of each query and defines the schema/type (see http://developer.couchbase.com/documentation/server/4.5/connectors/spark-1.2/spark-sql.html)
- bucket => if more than one bucket is open, specifies the name of the bucket to use
- idField => renames the document ID field; by default it's META_ID, and that's how you'd access it from your Spark SQL query
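The three options above can be combined on a single reader. What follows is a minimal sketch, not part of the connector: the helper names `couchbase_options` and `load_couchbase_df` and the `doc_id` field name are hypothetical, and only the option keys (`schemaFilter`, `bucket`, `idField`) and the format string come from the sample above.

```python
def couchbase_options(schema_filter, bucket=None, id_field=None):
    """Collect the connector options listed above into a plain dict.

    Hypothetical helper; only the option keys come from the sample.
    """
    options = {"schemaFilter": schema_filter}
    if bucket is not None:
        options["bucket"] = bucket
    if id_field is not None:
        options["idField"] = id_field
    return options


def load_couchbase_df(sql_context, **kwargs):
    """Apply each option to a DataFrameReader, then load the DataFrame."""
    reader = sql_context.read.format("com.couchbase.spark.sql.DefaultSource")
    for key, value in couchbase_options(**kwargs).items():
        reader = reader.option(key, value)
    return reader.load()


# In the pyspark shell, where sqlContext is already defined:
# df = load_couchbase_df(sqlContext, schema_filter='type="airline"',
#                        bucket="travel-sample", id_field="doc_id")
```

Using single quotes around the Python string avoids having to escape the double quotes inside the predicate.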
@markmikostv you can provide more "bucket" properties on startup, and then on each DataFrame you need to add an option saying which bucket you want.
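As a rough sketch of the startup side, the snippet below builds a pyspark command line with one `spark.couchbase.bucket.<name>=<password>` property per bucket, following the pattern of the start command in the sample. The helper `pyspark_command` is hypothetical, and the bucket names in the usage comment are the ones mentioned in this thread, with empty passwords assumed.

```python
def pyspark_command(buckets,
                    package="com.couchbase.client:spark-connector_2.10:1.2.1"):
    """Build a pyspark start command that opens several buckets.

    `buckets` is a list of (name, password) pairs; an empty password
    yields a trailing "=" as in the single-bucket sample.
    """
    parts = ["./pyspark", "--packages", package]
    for name, password in buckets:
        prop = "spark.couchbase.bucket.{}={}".format(name, password)
        parts += ["--conf", '"{}"'.format(prop)]
    return " ".join(parts)


# Bucket names taken from the question in this thread (assumed, no passwords):
# print(pyspark_command([("Game", ""), ("Beer", ""), ("Airline", "")]))
```

Each DataFrame would then select its bucket with `.option("bucket", "<name>")`, as described in the options list.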
I am looking to infer the schema first, from either an ID or a JSON document, and then load my data based on type="airline".
What's the best way to infer the schema and then load the data?
df = sqlContext.read.format("com.couchbase.spark.sql.DefaultSource").option("schemaFilter", "type=\"airline\"").load()
Hi,
Thanks, this has been really helpful.
At the moment I have 3 buckets: Game, Beer and Airline. If I have linked more than one bucket within my interpreter, how do I select a specific bucket?
Thanks,
Mark