# create a new dataframe
from pyspark.sql.types import StructType, StructField, IntegerType, StringType

schemaGender = StructType([
    StructField("id", IntegerType(), True),
    StructField("name", StringType(), True)])
dataGender = sparkSession.createDataFrame([(0, "Female"), (1, "Male")], schemaGender)
dataGender.show()
# join with the existing table
print(dataFrameFile.columns)
// create a new dataframe
StructType schema = DataTypes.createStructType(new StructField[]{
    DataTypes.createStructField("id", DataTypes.IntegerType, false),
    DataTypes.createStructField("name", DataTypes.StringType, false)
});
Dataset<Row> dataGender = sparkSession.createDataFrame(Arrays.asList(
    RowFactory.create(0, "Female"),
    RowFactory.create(1, "Male")
), schema);
# add a company column
from pyspark.sql.functions import udf

@udf()  # no return type given, so it defaults to StringType
def mapping_function_company(email):
    sub_by_at = email[email.find("@"):]      # e.g. "@acme.com"
    return sub_by_at[1:sub_by_at.find('.')]  # e.g. "acme"

dataFrameFile = dataFrameFile.withColumn("company", mapping_function_company("email"))
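The slicing inside the UDF can be checked without starting Spark at all. Here is a plain-Python mirror of the function body; note it assumes a well-formed `user@host.tld` address, since an email without `@` or `.` would make `find()` return -1 and produce nonsense slices:

```python
# Plain-Python mirror of the UDF body, to sanity-check the slicing
# logic without a Spark session. Assumes well-formed email addresses.
def company_of(email):
    sub_by_at = email[email.find("@"):]      # "@gojek.co.id"
    return sub_by_at[1:sub_by_at.find('.')]  # "gojek"

print(company_of("rya.meyvriska@gojek.co.id"))  # prints "gojek"
```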
// add a company column
UserDefinedFunction mappingFunctionCompany = udf(
    (String email) -> {
        String subByAt = email.substring(email.indexOf('@'));  // e.g. "@acme.com"
        return subByAt.substring(1, subByAt.indexOf('.'));     // e.g. "acme"
    },
    DataTypes.StringType
);
datasetFile = datasetFile.withColumn("company", mappingFunctionCompany.apply(col("email")));
// load the file into an RDD (roughly a distributed list)
JavaRDD<String> rddFile = sparkContext.textFile("/Users/rya.meyvriska/Downloads/mock_data.csv");
rddFile.take(3).forEach(element -> System.out.println(element));
// read the file into a dataset
Dataset<Row> datasetFile = sparkSession.read()
    .option("header", "true")
    .csv("/Users/rya.meyvriska/Downloads/mock_data.csv");
datasetFile.show();
# load the file into an RDD (roughly a distributed list)
rddFile = sparkContext.textFile("/Users/rya.meyvriska/Downloads/mock_data.csv")
for v in rddFile.take(3):
    print(v)
# read the file into a dataframe
dataFrameFile = sparkSession.read.option("header", "true").csv("/Users/rya.meyvriska/Downloads/mock_data.csv")
dataFrameFile.show()
dataFrameFile.describe().show()
# create an RDD from a string
# build the string that will be stored in the RDD
story = ("In the song, Maui told Moana about his amazing deeds. Why - he pulled up the islands from the sea, "
         "he lifted the sky, he even found fire and gave it to humans! As a demi-god, Maui was born with "
         "special powers. Demi-god means one parent is a god and the other is human. Maui’s father was the god "
         "and his mother was human.")
# store the string in an RDD
rddStory = sparkContext.parallelize([story])
// create an RDD from a string
// build the string that will be stored in the RDD
String story = "In the song, Maui told Moana about his amazing deeds. Why - he pulled up the islands from the sea, " +
    "he lifted the sky, he even found fire and gave it to humans! As a demi-god, Maui was born with " +
    "special powers. Demi-god means one parent is a god and the other is human. Maui’s father was the god " +
    "and his mother was human.";
// wrap the story in a Spark list; it has a single element, hence singletonList
JavaRDD<String> rddStory = sparkContext.parallelize(Collections.singletonList(story));
# create the spark configuration
from pyspark import SparkConf, SparkContext
from pyspark.sql import SparkSession

sparkConf = SparkConf().setMaster("local").setAppName("Python Spark Playground")
# create the spark context (the conf must be passed by keyword)
sparkContext = SparkContext(conf=sparkConf)
# create the spark session
sparkSession = SparkSession(sparkContext)
// configure the spark context (the spark application)
SparkConf sparkConf = new SparkConf();
sparkConf.setMaster("local");
sparkConf.setAppName("learn-spark");
// create the spark context
JavaSparkContext sparkContext = new JavaSparkContext(sparkConf);
sparkContext.setLogLevel("ERROR");
// specifically for the dataset API
SparkSession sparkSession = SparkSession.builder().sparkContext(sparkContext.sc()).getOrCreate();