Couchbase Spark Samples
Gist daschl/20c9d64dcb254256cbc70ee63843a853, created August 22, 2016 07:07
# Start the shell (from Spark's bin/ directory); the --conf setting opens the
# travel-sample bucket with an empty password
./pyspark --packages com.couchbase.client:spark-connector_2.10:1.2.1 --conf "spark.couchbase.bucket.travel-sample="
# Create a DataFrame over the documents matching type="airline"
>>> df = sqlContext.read.format("com.couchbase.spark.sql.DefaultSource").option("schemaFilter", "type=\"airline\"").load()
# Print the inferred schema
>>> df.printSchema()
root
 |-- META_ID: string (nullable = true)
 |-- callsign: string (nullable = true)
 |-- country: string (nullable = true)
 |-- iata: string (nullable = true)
 |-- icao: string (nullable = true)
 |-- id: long (nullable = true)
 |-- name: string (nullable = true)
 |-- type: string (nullable = true)
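The schema printed above is inferred by the connector, not declared. As a minimal sketch, assuming the connector samples the documents that match schemaFilter and merges their fields the way Spark's JSON schema inference does (union of field names, long and double widened to double, other conflicts and null-only fields falling back to string, fields sorted by name), the pure-Python code below reproduces that shape offline. It is not the connector's implementation: json_type, widen, infer_schema, print_schema and SAMPLE_DOCS are hypothetical, the document values are invented, and nested objects or arrays are not handled. Sorting by name also explains why META_ID is listed first: uppercase letters sort before lowercase ones.

```python
import json


def json_type(value):
    """Return the Spark SQL type name for a scalar JSON value, or None for null."""
    if value is None:
        return None
    if isinstance(value, bool):  # test bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    raise ValueError("nested objects and arrays are not handled in this sketch")


def widen(a, b):
    """Merge two types observed for the same field across documents."""
    if a is None:
        return b
    if b is None or a == b:
        return a
    if {a, b} == {"long", "double"}:
        return "double"
    return "string"  # incompatible types fall back to string


def infer_schema(docs, id_field="META_ID"):
    """Infer a flat schema from {document ID: JSON text}; returns (name, type) pairs sorted by name."""
    fields = {}
    for doc_id, text in docs.items():
        row = json.loads(text)
        row[id_field] = doc_id  # expose the document ID as an extra string column
        for name, value in row.items():
            fields[name] = widen(fields.get(name), json_type(value))
    # Fields that were null in every sample carry no type information; use string.
    return sorted((name, t or "string") for name, t in fields.items())


def print_schema(schema):
    """Print the schema in the same shape as DataFrame.printSchema()."""
    print("root")
    for name, t in schema:
        print(f" |-- {name}: {t} (nullable = true)")


# Hypothetical airline documents: field names match the listing above,
# values are invented for illustration.
SAMPLE_DOCS = {
    "airline_1": '{"callsign": "EXAMPLE", "country": "United States", "iata": "XA",'
                 ' "icao": "EXA", "id": 1, "name": "Example Air", "type": "airline"}',
    "airline_2": '{"callsign": "SAMPLE", "country": "France", "iata": null,'
                 ' "icao": "SMP", "id": 2, "name": "Sample Wings", "type": "airline"}',
}

if __name__ == "__main__":
    print_schema(infer_schema(SAMPLE_DOCS))
```

Running it prints the same eight fields, in the same order and with the same types, as the printSchema() listing. The null iata value in the second document does not change the column type, because the first document already established it as a string.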
==========
Available options:
- schemaFilter => the predicate (as used above) that goes into the WHERE clause of each query, and so defines the schema/type (see http://developer.couchbase.com/documentation/server/4.5/connectors/spark-1.2/spark-sql.html)
- bucket => if more than one bucket is open, specifies the name of the bucket to use
- idField => renames the document ID field; by default it is META_ID, which is the name you use to access the ID from your Spark SQL query
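All three options above are plain string options on the DataFrameReader, set the same way as schemaFilter in the shell session. As a sketch, the hypothetical helper couchbase_source below (not part of the connector) sets them in one place. It relies only on DataFrameReader.format() and .option(), which return the reader so the calls chain. Quoting the value with json.dumps is an assumption that N1QL accepts JSON-style escaped string literals. For a plain value such as airline it simply yields type="airline", the same predicate used above.

```python
import json

# Data source name used in the shell session above.
COUCHBASE_SOURCE = "com.couchbase.spark.sql.DefaultSource"


def couchbase_source(reader, doc_type, bucket=None, id_field=None):
    """Configure a DataFrameReader (e.g. sqlContext.read) for one document type.

    Hypothetical helper: sets schemaFilter always, and bucket / idField only
    when given. Call .load() on the result to get the DataFrame.
    """
    # json.dumps quotes (and escapes) the value: "airline" -> type="airline"
    predicate = "type=" + json.dumps(doc_type)
    configured = reader.format(COUCHBASE_SOURCE).option("schemaFilter", predicate)
    if bucket is not None:
        # only needed when more than one bucket is open
        configured = configured.option("bucket", bucket)
    if id_field is not None:
        # renames the META_ID column
        configured = configured.option("idField", id_field)
    return configured


# Usage in the pyspark shell (needs the connector package and an open bucket):
# df = couchbase_source(sqlContext.read, "airline", id_field="doc_id").load()
# df.printSchema()  # the ID column is now doc_id instead of META_ID
```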
@daschl
I am looking to infer the schema first, from either a document ID or a JSON document, and then load my data based on type="airline".
What's the best way to infer the schema and then load the data?