@sharathgrao
Created March 12, 2024 17:36
Convert an RDD to a DataFrame in Python using Spark
from pyspark.sql import SparkSession

# Create a SparkSession
spark = SparkSession.builder.appName("RDD to DataFrame").getOrCreate()

# Create an example RDD of (name, age) tuples
data = [("Alice", 25), ("Bob", 30), ("Charlie", 28)]
rdd = spark.sparkContext.parallelize(data)

# Define column names
column_names = ["name", "age"]

# Convert RDD to DataFrame with column names (column types are inferred)
df = rdd.toDF(column_names)

# Display the DataFrame
df.show()

# Stop the SparkSession
spark.stop()