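FROM python:3.8.1-buster AS python-base
ENV PYTHONUNBUFFERED=TRUE
ENV PYTHONDONTWRITEBYTECODE=TRUE
# The PyPI package is named scikit-learn; the "sklearn" alias is deprecated.
RUN pip install scikit-learn
COPY . /opt/code
# Assumes main.py sits at the root of the build context, so it lands in /opt/code.
WORKDIR /opt/code
ENTRYPOINT ["python", "main.py"]
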
FROM python:3.8.1-buster AS python-base
ENV PYTHONUNBUFFERED=TRUE
ENV PYTHONDONTWRITEBYTECODE=TRUE
# The PyPI package is named scikit-learn; the "sklearn" alias is deprecated.
RUN pip install scikit-learn
ENTRYPOINT ["python", "main.py"]

students.join(majors, Seq("student_id"), "full").show()
+----------+------------+----------------+
|student_id|student_name|           major|
+----------+------------+----------------+
|         1|        John|            null|
|         3|        Mary|         History|
|         4|        Jane|            null|
|         2|        Bill|Computer Science|
+----------+------------+----------------+

students.join(colleges, Seq("student_id"), "right").show()
+----------+------------+--------------------+
|student_id|student_name|        college_name|
+----------+------------+--------------------+
|         1|        John|             Harvard|
|         1|        John|            Stanford|
|         3|        Mary| University of Texas|
|         3|        Mary|            Columbia|
|         4|        Jane|University of Was...|

students.join(colleges, Seq("student_id"), "left").show()
+----------+------------+--------------------+
|student_id|student_name|        college_name|
+----------+------------+--------------------+
|         1|        John|            Stanford|
|         1|        John|             Harvard|
|         2|        Bill|                null|
|         3|        Mary|            Columbia|
|         3|        Mary| University of Texas|

import org.apache.spark.sql.functions._

// UDF that subtracts 10 from a Double column value.
val multiUDF = udf((value: Double) => {
  value - 10
})

val scoresDF = sc.parallelize(
  Array(("Fred", 82.0), ("Fred", 90.0), ("Fred", 12.0))
)
.toDF("key", "value")

val partition = sc.parallelize(Seq(
  ("1234", 1),
  ("1234", 1),
  ("1234", 1)
))
// Sum the values that share a key.
val result = partition.reduceByKey(_ + _)
// ("1234", 3)

val partition = sc.parallelize(Seq(
  ("1234", 1),
  ("1234", 1),
  ("1234", 1)
)).toDF("key", "value")
// DataFrame equivalent of reduceByKey(_ + _): group by key and sum the values.
partition.groupBy("key").agg(sum('value))
// ("1234", 3)

val scoresRDD = sc.parallelize(
  Array(("Fred", 82.0), ("Fred", 90.0), ("Fred", 12.0))
)
// Start a new per-key list from the first score seen for that key.
val createScoreCombiner = (score: Double) => List(score)
// Append a score to the list; List is immutable, so return a new list.
val scoreCombiner = (collector: List[Double], score: Double) => {
  collector :+ score
}

val scoresDF = sc.parallelize(
  Array(("Fred", 82.0), ("Fred", 90.0), ("Fred", 12.0))
)
.toDF("key", "value")
// Collect every value for a key into a single list column.
val scores = scoresDF.groupBy('key).agg(collect_list('value))
// ("Fred", List(82.0, 90.0, 12.0))