Created
August 19, 2022 12:25
-
-
Save kishanpython/b55ab49d742f6233ec5b2ec016e80c07 to your computer and use it in GitHub Desktop.
Use of ๐ฌ๐ญ๐๐ซ๐ญ๐ฌ๐ฐ๐ข๐ญ๐ก() and ๐๐ง๐๐ฌ๐ฐ๐ข๐ญ๐ก() in #pyspark โ๏ธ
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
{ | |
"nbformat": 4, | |
"nbformat_minor": 0, | |
"metadata": { | |
"colab": { | |
"name": "pyspark_startswith_and_endswith.ipynb", | |
"provenance": [] | |
}, | |
"kernelspec": { | |
"name": "python3", | |
"display_name": "Python 3" | |
}, | |
"language_info": { | |
"name": "python" | |
} | |
}, | |
"cells": [ | |
{ | |
"cell_type": "markdown", | |
"source": [ | |
"# Use of ๐ฌ๐ญ๐๐ซ๐ญ๐ฌ๐ฐ๐ข๐ญ๐ก() and ๐๐ง๐๐ฌ๐ฐ๐ข๐ญ๐ก() in #pyspark โ๏ธ\n", | |
"\n", | |
"\n", | |
"###๐๐ญ๐๐ซ๐ญ๐ฌ๐ฐ๐ข๐ญ๐ก():-\n", | |
"\n", | |
"========\n", | |
"\n", | |
"โ๏ธ It will return the boolean value (๐ง๐ฟ๐๐ฒ/๐๐ฎ๐น๐๐ฒ).\n", | |
"\n", | |
"\n", | |
"\n", | |
"โ๏ธ It return ๐ง๐ฟ๐๐ฒ when DataFrame column value starts with a string specified as an argument to this method.\n", | |
"\n", | |
" \n", | |
"\n", | |
"โ๏ธ If Not matched returns ๐๐ฎ๐น๐๐ฒ.\n", | |
"\n", | |
"\n", | |
"\n", | |
"===========================================================\n", | |
"\n", | |
"\n", | |
"\n", | |
"###๐๐ง๐๐ฌ๐ฐ๐ข๐ญ๐ก():-\n", | |
"\n", | |
"========\n", | |
"\n", | |
"โ๏ธ It will return the boolean value (๐ง๐ฟ๐๐ฒ/๐๐ฎ๐น๐๐ฒ).\n", | |
"\n", | |
"\n", | |
"\n", | |
"โ๏ธ It return ๐ง๐ฟ๐๐ฒ when DataFrame column value ends with a string specified as an argument to this method. \n", | |
"\n", | |
"\n", | |
"\n", | |
"โ๏ธ If Not matched returns ๐๐ฎ๐น๐๐ฒ.\n", | |
"\n", | |
"\n", | |
"\n", | |
"๐๐จ๐ญ๐:-\n", | |
"\n", | |
"==== \n", | |
"\n", | |
"โ ๏ธ Return ๐๐๐๐, if either of column values or input strings is ๐๐๐๐.\n", | |
"\n", | |
"โ ๏ธ Return ๐ง๐ฟ๐๐ฒ, if input check strings is empty.\n", | |
"\n", | |
"โ ๏ธ These methods are case-sensitive. \n", | |
"\n", | |
"\n", | |
"\n", | |
"\n", | |
"=================================================\n", | |
"\n", | |
"Follow for more:- \n", | |
"https://www.linkedin.com/in/kishanyadav/\n", | |
"\n", | |
"\n", | |
"############...... H@@py Learning!!! .......##########" | |
], | |
"metadata": { | |
"id": "55i2WzI53ceu" | |
} | |
}, | |
{ | |
"cell_type": "code", | |
"source": [ | |
"# install pyspark in google colab\n", | |
"!pip install pyspark" | |
], | |
"metadata": { | |
"id": "eq0lZ5biTCFe" | |
}, | |
"execution_count": null, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "code", | |
"source": [ | |
"# importing neccessary libs\n", | |
"from pyspark.sql import SparkSession\n", | |
"from pyspark.sql.functions import col\n", | |
"\n", | |
"# creating session\n", | |
"spark = SparkSession.builder.appName(\"practice\").getOrCreate()\n", | |
"\n", | |
"# # create dataframe\n", | |
"data = [(\"Prashant\",25, 80), (\"Ankit\",26, 90),(\"Ramakant\", 24, 85), (None, 23, 87)]\n", | |
"columns= [\"student_name\", \"student_age\", \"student_score\"]\n", | |
"df_students = spark.createDataFrame(data = data, schema = columns)\n", | |
"df_students.show()\n", | |
"\n", | |
"# Result return will be boolean\n", | |
"# the output val is null for last row value b/c the corresponding value of students_name column is NULL \n", | |
"df_internal_res = df_students.select(col(\"student_name\").endswith(\"it\").alias(\"internal_bool_val\"))\n", | |
"df_internal_res.show()\n", | |
"\n", | |
"# Now we use the filter() method to fetch rows correspond to True value.\n", | |
"# first we see result for startswith method\n", | |
"df_check_start = df_students.filter(col(\"student_name\").startswith(\"Pra\"))\n", | |
"df_check_start.show()\n", | |
"\n", | |
"# let see endswith results\n", | |
"df_check_end = df_students.filter(col(\"student_name\").endswith(\"ant\"))\n", | |
"df_check_end.show()\n", | |
"\n", | |
"# if arguments in methods is empty then o/p will be True \n", | |
"df_check_empty = df_students.filter(col(\"student_name\").endswith(\"\"))\n", | |
"df_check_empty.show()\n" | |
], | |
"metadata": { | |
"colab": { | |
"base_uri": "https://localhost:8080/" | |
}, | |
"id": "6pSMJtnR2aJF", | |
"outputId": "20dfc1bd-6323-48b4-cfda-29161fec14ea" | |
}, | |
"execution_count": 20, | |
"outputs": [ | |
{ | |
"output_type": "stream", | |
"name": "stdout", | |
"text": [ | |
"+------------+-----------+-------------+\n", | |
"|student_name|student_age|student_score|\n", | |
"+------------+-----------+-------------+\n", | |
"| Prashant| 25| 80|\n", | |
"| Ankit| 26| 90|\n", | |
"| Ramakant| 24| 85|\n", | |
"| null| 23| 87|\n", | |
"+------------+-----------+-------------+\n", | |
"\n", | |
"+-----------------+\n", | |
"|internal_bool_val|\n", | |
"+-----------------+\n", | |
"| false|\n", | |
"| true|\n", | |
"| false|\n", | |
"| null|\n", | |
"+-----------------+\n", | |
"\n", | |
"+------------+-----------+-------------+\n", | |
"|student_name|student_age|student_score|\n", | |
"+------------+-----------+-------------+\n", | |
"| Prashant| 25| 80|\n", | |
"+------------+-----------+-------------+\n", | |
"\n", | |
"+------------+-----------+-------------+\n", | |
"|student_name|student_age|student_score|\n", | |
"+------------+-----------+-------------+\n", | |
"| Prashant| 25| 80|\n", | |
"| Ramakant| 24| 85|\n", | |
"+------------+-----------+-------------+\n", | |
"\n", | |
"+------------+-----------+-------------+\n", | |
"|student_name|student_age|student_score|\n", | |
"+------------+-----------+-------------+\n", | |
"| Prashant| 25| 80|\n", | |
"| Ankit| 26| 90|\n", | |
"| Ramakant| 24| 85|\n", | |
"+------------+-----------+-------------+\n", | |
"\n" | |
] | |
} | |
] | |
} | |
] | |
} |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment