Created March 17, 2019 06:38
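from pyspark.sql import SparkSession

# Assumption: no session exists yet, so create (or reuse) one; the app name is illustrative.
spark = SparkSession.builder.appName("json-dataset-example").getOrCreate()
sc = spark.sparkContext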

# A JSON dataset is pointed to by path.
# The path can be either a single text file or a directory storing text files
path = "examples/src/main/resources/people.json"
peopleDF = spark.read.json(path)

# The inferred schema can be visualized using the printSchema() method
peopleDF.printSchema()
# root
#  |-- age: long (nullable = true)
#  |-- name: string (nullable = true)

# Creates a temporary view using the DataFrame
peopleDF.createOrReplaceTempView("people")

# SQL statements can be run by using the sql methods provided by spark
teenagerNamesDF = spark.sql("SELECT name FROM people WHERE age BETWEEN 13 AND 19")
teenagerNamesDF.show()
# +------+
# |  name|
# +------+
# |Justin|
# +------+

# Alternatively, a DataFrame can be created for a JSON dataset represented by
# an RDD[String] storing one JSON object per string
jsonStrings = ['{"name":"Yin","address":{"city":"Columbus","state":"Ohio"}}']
otherPeopleRDD = sc.parallelize(jsonStrings)
otherPeople = spark.read.json(otherPeopleRDD)
otherPeople.show()
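
The last part of the snippet relies on each string holding one complete JSON object. As a rough illustration of that input shape only, the sketch below parses the same sample record with Python's standard `json` module. It is not how Spark reads the data, and `parse_records` is a hypothetical helper introduced here for the example.

```python
import json

# Same sample record as in the snippet above: one complete JSON object per string.
json_strings = ['{"name":"Yin","address":{"city":"Columbus","state":"Ohio"}}']


def parse_records(lines):
    """Parse each non-blank string independently as a single JSON object.

    Hypothetical helper for illustration; Spark does its own parsing.
    """
    return [json.loads(line) for line in lines if line.strip()]


records = parse_records(json_strings)

# Top-level keys of the first record, sorted for a stable order.
top_level_fields = sorted(records[0].keys())

# "address" is a nested object, so its members are reached one level down.
city = records[0]["address"]["city"]

print(top_level_fields)
print(city)
```

The nested `address` object is the part Spark would infer as a struct column, with `city` and `state` as its fields.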