@kishanpython
Last active August 19, 2022 07:23
pyspark repeat function notebook
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"name": "pyspark_repeat.ipynb",
"provenance": []
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
}
},
"cells": [
{
"cell_type": "markdown",
"source": [
"# 𝐑𝐞𝐩𝐞𝐚𝐭() function in #pyspark ✍️\n",
"\n",
"𝐑𝐞𝐩𝐞𝐚𝐭():-\n",
"======\n",
"☑️ In PySpark, the repeat() function is used to repeat a column's values.\n",
"\n",
"☑️ 𝐫𝐞𝐩𝐞𝐚𝐭(𝐬𝐭𝐫, 𝐧) - Returns a string that repeats the given string value 𝐧 times.\n",
"\n",
"=================================================\n",
"\n",
"Follow for more:- \n",
"https://www.linkedin.com/in/kishanyadav/\n",
"\n",
"\n",
"############...... H@@py Learning!!! .......##########"
],
"metadata": {
"id": "55i2WzI53ceu"
}
},
{
"cell_type": "code",
"source": [
"# install pyspark in google colab\n",
"!pip install pyspark"
],
"metadata": {
"id": "eq0lZ5biTCFe"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# importing necessary libs\n",
"from pyspark.sql import SparkSession\n",
"from pyspark.sql.functions import expr, repeat\n",
"\n",
"# creating session\n",
"spark = SparkSession.builder.appName(\"practice\").getOrCreate()\n",
"\n",
"# create dataframe\n",
"data = [(\"Prashant\", 25, 80), (\"Ankit\", 26, 90), (\"Ramakant\", 24, 85)]\n",
"columns = [\"student_name\", \"student_age\", \"student_score\"]\n",
"df_students = spark.createDataFrame(data=data, schema=columns)\n",
"df_students.show()\n",
"\n",
"# repeat the student_name column twice and save the result in a new column\n",
"df_repeated = df_students.withColumn(\"student_name_repeated\", expr(\"repeat(student_name, 2)\"))\n",
"df_repeated.show()"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "6pSMJtnR2aJF",
"outputId": "fc06ddc7-4d29-4ad8-a456-a9d2d8ae6b2a"
},
"execution_count": 4,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"+------------+-----------+-------------+\n",
"|student_name|student_age|student_score|\n",
"+------------+-----------+-------------+\n",
"|    Prashant|         25|           80|\n",
"|       Ankit|         26|           90|\n",
"|    Ramakant|         24|           85|\n",
"+------------+-----------+-------------+\n",
"\n",
"+------------+-----------+-------------+---------------------+\n",
"|student_name|student_age|student_score|student_name_repeated|\n",
"+------------+-----------+-------------+---------------------+\n",
"|    Prashant|         25|           80|     PrashantPrashant|\n",
"|       Ankit|         26|           90|           AnkitAnkit|\n",
"|    Ramakant|         24|           85|     RamakantRamakant|\n",
"+------------+-----------+-------------+---------------------+\n",
"\n"
]
}
]
}
]
}
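
The notebook describes repeat(str, n) as returning the string repeated n times. As a rough illustration of that per-row behaviour, here is a plain-Python sketch that needs no Spark session. The helper `repeat_value` and the list `rows_repeated` are hypothetical names introduced only for this example. The null handling and the empty result for n <= 0 are assumptions about how Spark SQL treats those cases, so check them against your Spark version.

```python
from typing import Optional


def repeat_value(value: Optional[str], n: int) -> Optional[str]:
    """Plain-Python model of repeat(str, n) applied to a single value."""
    if value is None:
        # Assumption: a null input stays null, as with most Spark SQL string functions.
        return None
    # Python string multiplication yields "" when n <= 0.
    return value * n


# Same rows as the notebook's df_students DataFrame.
rows = [("Prashant", 25, 80), ("Ankit", 26, 90), ("Ramakant", 24, 85)]

# Append the repeated name as a fourth field, mirroring student_name_repeated.
rows_repeated = [
    (name, age, score, repeat_value(name, 2)) for name, age, score in rows
]

for row in rows_repeated:
    print(row)
```

This only models the semantics for one value at a time; in the notebook itself the work is done by Spark through `expr("repeat(student_name, 2)")`.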