@chongwangcc
Last active October 17, 2021 09:54
[Pyspark] Basics of using Spark from Python #spark
import sys
print(sys.version)
from pyspark.sql import SparkSession
# Python 3.8 example: to run locally on the GPU machine, uncomment the lines below
# import os
# os.environ["PYSPARK_PYTHON"] = "/usr/local/bin/python3.8"
spark = (
    SparkSession.builder
    .appName("PythonWordCount1")
    .master("spark://192.168.100.13:7077")
    .config("spark.driver.memory", "500M")
    .config("spark.executor.memory", "500M")
    .getOrCreate()
)
# spark.conf.set("spark.executor.memory", "500M")
sc = spark.sparkContext
a = sc.parallelize([1, 2, 3])
b = a.flatMap(lambda x: (x, x**2))
print(a.collect())
print(b.collect())
# Stop the SparkContext so that other applications can use the cluster
sc.stop()
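
The flatMap call above applies the lambda to each element and then flattens the returned tuples into a single sequence, whereas map would keep one tuple per element. The following is a minimal pure-Python sketch of that difference that runs without a cluster; `flat_map` is a hypothetical helper written for illustration, not part of the PySpark API.

```python
from itertools import chain


def flat_map(func, items):
    """Apply func to each item and flatten the resulting iterables by one level."""
    return list(chain.from_iterable(func(x) for x in items))


data = [1, 2, 3]

# map keeps one result per input element, so each tuple stays intact
pairs = list(map(lambda x: (x, x**2), data))

# flat_map unpacks every returned tuple into one flat list
flattened = flat_map(lambda x: (x, x**2), data)

print(pairs)      # [(1, 1), (2, 4), (3, 9)]
print(flattened)  # [1, 1, 2, 4, 3, 9]
```

The flattened list is what `b.collect()` is expected to return for the input `[1, 2, 3]`.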