Last active
August 19, 2022 07:23
pyspark repeat function notebook
{ | |
"nbformat": 4, | |
"nbformat_minor": 0, | |
"metadata": { | |
"colab": { | |
"name": "pyspark_repeat.ipynb", | |
"provenance": [] | |
}, | |
"kernelspec": { | |
"name": "python3", | |
"display_name": "Python 3" | |
}, | |
"language_info": { | |
"name": "python" | |
} | |
}, | |
"cells": [ | |
{ | |
"cell_type": "markdown", | |
"source": [ | |
"# πππ©πππ() function in #pyspark βοΈ\n", | |
"\n", | |
"πππ©πππ():-\n", | |
"======\n", | |
"βοΈ In pyspark to repeat the column values, we will be using the repeat() function. \n", | |
"\n", | |
"βοΈ π«ππ©πππ(π¬ππ«, π§) - Returns the string which repeats the given string value π§ times.\n", | |
"\n", | |
"=================================================\n", | |
"\n", | |
"Follow for more:- \n", | |
"https://www.linkedin.com/in/kishanyadav/\n", | |
"\n", | |
"\n", | |
"############...... H@@py Learning!!! .......##########" | |
], | |
"metadata": { | |
"id": "55i2WzI53ceu" | |
} | |
}, | |
{ | |
"cell_type": "code", | |
"source": [ | |
"# install pyspark in google colab\n", | |
"!pip install pyspark" | |
], | |
"metadata": { | |
"id": "eq0lZ5biTCFe" | |
}, | |
"execution_count": null, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "code", | |
"source": [ | |
"# importing neccessary libs\n", | |
"from pyspark.sql import SparkSession\n", | |
"from pyspark.sql.functions import expr, repeat\n", | |
"\n", | |
"# creating session\n", | |
"spark = SparkSession.builder.appName(\"practice\").getOrCreate()\n", | |
"\n", | |
"# # create dataframe\n", | |
"data = [(\"Prashant\",25, 80), (\"Ankit\",26, 90),(\"Ramakant\", 24, 85)]\n", | |
"columns= [\"student_name\", \"student_age\", \"student_score\"]\n", | |
"df_students = spark.createDataFrame(data = data, schema = columns)\n", | |
"df_students.show()\n", | |
"\n", | |
"# repeating the column (student_name) twice and saving results in new column\n", | |
"df_repeated = df_students.withColumn(\"student_name_repeated\",(expr(\"repeat(student_name, 2)\")))\n", | |
"df_repeated.show()" | |
], | |
"metadata": { | |
"colab": { | |
"base_uri": "https://localhost:8080/" | |
}, | |
"id": "6pSMJtnR2aJF", | |
"outputId": "fc06ddc7-4d29-4ad8-a456-a9d2d8ae6b2a" | |
}, | |
"execution_count": 4, | |
"outputs": [ | |
{ | |
"output_type": "stream", | |
"name": "stdout", | |
"text": [ | |
"+------------+-----------+-------------+\n", | |
"|student_name|student_age|student_score|\n", | |
"+------------+-----------+-------------+\n", | |
"| Prashant| 25| 80|\n", | |
"| Ankit| 26| 90|\n", | |
"| Ramakant| 24| 85|\n", | |
"+------------+-----------+-------------+\n", | |
"\n", | |
"+------------+-----------+-------------+---------------------+\n", | |
"|student_name|student_age|student_score|student_name_repeated|\n", | |
"+------------+-----------+-------------+---------------------+\n", | |
"| Prashant| 25| 80| PrashantPrashant|\n", | |
"| Ankit| 26| 90| AnkitAnkit|\n", | |
"| Ramakant| 24| 85| RamakantRamakant|\n", | |
"+------------+-----------+-------------+---------------------+\n", | |
"\n" | |
] | |
} | |
] | |
} | |
] | |
} |
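The notebook above describes repeat(str, n) as returning the given string repeated n times. A minimal plain-Python sketch of that per-row computation, runnable without a Spark session, might look like the following. Note that `repeat_str` is a hypothetical helper and not part of PySpark, and its handling of null inputs and non-positive counts is an assumption about Spark SQL's behavior rather than something the notebook demonstrates.

```python
from typing import Optional


def repeat_str(value: Optional[str], n: int) -> Optional[str]:
    """Hypothetical pure-Python model of repeat(str, n); not a PySpark API."""
    # Assumption: a null (None) input gives a null result.
    if value is None:
        return None
    # Assumption: a count of zero or less gives an empty string.
    return value * max(n, 0)


# Same names as the student_name column in the notebook.
names = ["Prashant", "Ankit", "Ramakant"]
repeated = [repeat_str(name, 2) for name in names]
print(repeated)
```

With n = 2 this should reproduce the values shown in the student_name_repeated column of the notebook output.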