sample spark2 application demonstrating dataset api
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
[root@rkk1 Spark2StarterApp]# /usr/hdp/current/spark2-client/bin/spark-shell | |
Setting default log level to "WARN". | |
To adjust logging level use sc.setLogLevel(newLevel). | |
16/11/30 18:01:48 WARN SparkContext: Use an existing SparkContext, some configuration may not take effect. | |
Spark context Web UI available at http://172.26.81.127:4040 | |
Spark context available as 'sc' (master = local[*], app id = local-1480528906336). | |
Spark session available as 'spark'. | |
Welcome to | |
____ __ | |
/ __/__ ___ _____/ /__ | |
_\ \/ _ \/ _ `/ __/ '_/ | |
/___/ .__/\_,_/_/ /_/\_\ version 2.0.0.2.5.0.0-1133 | |
/_/ | |
Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_77) | |
Type in expressions to have them evaluated. | |
Type :help for more information. | |
scala> import spark.implicits._ | |
import spark.implicits._ | |
scala> val dsArr = spark.read.text("/tmp/sample_07.txt").as[String] | |
dsArr: org.apache.spark.sql.Dataset[String] = [value: string] | |
scala> val words = dsArr.flatMap(value => value.split("\\s+")) | |
words: org.apache.spark.sql.Dataset[String] = [value: string] | |
scala> val groupedWords = words.groupByKey(_.toLowerCase) | |
groupedWords: org.apache.spark.sql.KeyValueGroupedDataset[String,String] = org.apache.spark.sql.KeyValueGroupedDataset@5d9627d3 | |
scala> val counts = groupedWords.count() | |
counts: org.apache.spark.sql.Dataset[(String, Long)] = [value: string, count(1): bigint] | |
scala> counts.show() | |
+----------+--------+ | |
| value|count(1)| | |
+----------+--------+ | |
| 13-2081| 1| | |
| 349140| 1| | |
| 45300| 1| | |
|electrical| 8| | |
| 49290| 1| | |
| 77930| 1| | |
| 9030| 2| | |
| art| 1| | |
| 52800| 1| | |
|paramedics| 1| | |
| 49630| 1| | |
| 39-4021| 1| | |
| travel| 3| | |
| 43-5111| 1| | |
| 47-3013| 1| | |
| 1090| 1| | |
| 49-2095| 1| | |
| 18130| 1| | |
| still| 1| | |
| surveying| 1| | |
+----------+--------+ | |
only showing top 20 rows | |
scala> |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment