from pyspark import SparkContext, SparkConf

def display_words(words):
    for word, count in words.items():
        print("{} : {}".format(word, count))

if __name__ == "__main__":
    conf = SparkConf().setAppName("word count").setMaster("local[2]")
    sc = SparkContext(conf=conf)
    # Illustrative input path; replace with a real text file.
    lines = sc.textFile("in/word_count.text")
    words = lines.flatMap(lambda line: line.split(" "))
    display_words(words.countByValue())
from pyspark import SparkContext, SparkConf

if __name__ == "__main__":
    conf = SparkConf().setAppName("take").setMaster("local[*]")
    sc = SparkContext(conf=conf)
    inputWords = ["spark", "hadoop", "spark", "hive", "pig", "cassandra", "hadoop"]
    wordRdd = sc.parallelize(inputWords)
    words = wordRdd.take(3)
    print(words)  # first 3 elements: ['spark', 'hadoop', 'spark']
from pyspark.sql import SparkSession

spark = SparkSession.builder\
    .appName("Python Spark SQL basic example")\
    .config("spark.some.config.option", "")\
    .getOrCreate()
import java.util.Arrays;
import java.util.Collections;
import java.io.Serializable;

import org.apache.spark.api.java.function.MapFunction;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.Encoder;
import org.apache.spark.sql.Encoders;
from pyspark.sql import Row

# Assumes `spark` is an existing SparkSession.
sc = spark.sparkContext

# Load a text file and convert each line to a Row.
lines = sc.textFile("examples/src/main/resources/people.txt")
parts = lines.map(lambda l: l.split(","))
people = parts.map(lambda p: Row(name=p[0], age=int(p[1])))

# Infer the schema, and register the DataFrame as a table.
schemaPeople = spark.createDataFrame(people)
schemaPeople.createOrReplaceTempView("people")
# Loading data from a JDBC source
jdbcDF = spark.read\
    .format("jdbc")\
    .option("url", "jdbc:postgresql:dbserver")\
    .option("dbtable", "schema.tablename")\
    .option("user", "username")\
    .option("password", "password")\
    .load()
# Assumes `spark` is an existing SparkSession.
sc = spark.sparkContext

# A JSON dataset is pointed to by path.
# The path can be either a single text file or a directory storing text files.
path = "examples/src/main/resources/people.json"
peopleDF = spark.read.json(path)

# The inferred schema can be visualized using the printSchema() method.
peopleDF.printSchema()
# root
#  |-- age: long (nullable = true)
#  |-- name: string (nullable = true)
# A lambda function can take any number of arguments, but can only have one expression.
x = lambda a, b, c: (a + b) * c
print(x(1, 2, 3))
# output = 9
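The Spark snippets above lean heavily on lambdas (e.g. in `map` and `flatMap`). A minimal plain-Python sketch of the same idea, passing lambdas to built-in higher-order functions:

```python
# Lambdas are anonymous single-expression functions, commonly passed
# inline to higher-order functions such as map() and filter().
doubled = list(map(lambda n: n * 2, [1, 2, 3]))
print(doubled)  # [2, 4, 6]

evens = list(filter(lambda n: n % 2 == 0, [1, 2, 3, 4]))
print(evens)  # [2, 4]
```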
# A RegEx, or Regular Expression, is a sequence of characters that forms a search pattern.
# RegEx can be used to check if a string contains the specified search pattern.
import re

text = "The above code is for dummies like you"

# Check if the string starts with "The" and ends with "you":
x = re.search("^The.*you$", text)
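Note that `re.search` returns a `Match` object on success and `None` otherwise, so the result should be checked before use. A short sketch of that pattern:

```python
import re

text = "The above code is for dummies like you"

# search() returns a Match object if the pattern is found, else None.
m = re.search("^The.*you$", text)
if m:
    print("Match found:", m.group())  # Match found: The above code is for dummies like you
else:
    print("No match")
```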
import datetime

dt = datetime.datetime.now()
print(dt)
print(dt.year)
print(dt.month)
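Beyond individual attributes like `year` and `month`, a `datetime` can be rendered as a string with `strftime`. A small sketch using a fixed date so the output is deterministic:

```python
import datetime

# strftime formats a datetime using format codes
# (%Y year, %m month, %d day, %H hour, %M minute, %B month name).
dt = datetime.datetime(2023, 5, 17, 9, 30)
print(dt.strftime("%Y-%m-%d %H:%M"))  # 2023-05-17 09:30
print(dt.strftime("%d %B %Y"))        # 17 May 2023
```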