Skip to content

Instantly share code, notes, and snippets.

@kishanpython
Created August 19, 2022 12:25
Show Gist options
  • Save kishanpython/b55ab49d742f6233ec5b2ec016e80c07 to your computer and use it in GitHub Desktop.
Save kishanpython/b55ab49d742f6233ec5b2ec016e80c07 to your computer and use it in GitHub Desktop.
Use of ๐ฌ๐ญ๐š๐ซ๐ญ๐ฌ๐ฐ๐ข๐ญ๐ก() and ๐ž๐ง๐๐ฌ๐ฐ๐ข๐ญ๐ก() in #pyspark โœ๏ธ
Display the source blob
Display the rendered blob
Raw
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"name": "pyspark_startswith_and_endswith.ipynb",
"provenance": []
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
}
},
"cells": [
{
"cell_type": "markdown",
"source": [
"# Use of ๐ฌ๐ญ๐š๐ซ๐ญ๐ฌ๐ฐ๐ข๐ญ๐ก() and ๐ž๐ง๐๐ฌ๐ฐ๐ข๐ญ๐ก() in #pyspark โœ๏ธ\n",
"\n",
"\n",
"###๐’๐ญ๐š๐ซ๐ญ๐ฌ๐ฐ๐ข๐ญ๐ก():-\n",
"\n",
"========\n",
"\n",
"โ˜‘๏ธ It will return the boolean value (๐—ง๐—ฟ๐˜‚๐—ฒ/๐—™๐—ฎ๐—น๐˜€๐—ฒ).\n",
"\n",
"\n",
"\n",
"โ˜‘๏ธ It return ๐—ง๐—ฟ๐˜‚๐—ฒ when DataFrame column value starts with a string specified as an argument to this method.\n",
"\n",
" \n",
"\n",
"โ˜‘๏ธ If Not matched returns ๐—™๐—ฎ๐—น๐˜€๐—ฒ.\n",
"\n",
"\n",
"\n",
"===========================================================\n",
"\n",
"\n",
"\n",
"###๐„๐ง๐๐ฌ๐ฐ๐ข๐ญ๐ก():-\n",
"\n",
"========\n",
"\n",
"โ˜‘๏ธ It will return the boolean value (๐—ง๐—ฟ๐˜‚๐—ฒ/๐—™๐—ฎ๐—น๐˜€๐—ฒ).\n",
"\n",
"\n",
"\n",
"โ˜‘๏ธ It return ๐—ง๐—ฟ๐˜‚๐—ฒ when DataFrame column value ends with a string specified as an argument to this method. \n",
"\n",
"\n",
"\n",
"โ˜‘๏ธ If Not matched returns ๐—™๐—ฎ๐—น๐˜€๐—ฒ.\n",
"\n",
"\n",
"\n",
"๐๐จ๐ญ๐ž:-\n",
"\n",
"==== \n",
"\n",
"โš ๏ธ Return ๐๐”๐‹๐‹, if either of column values or input strings is ๐๐”๐‹๐‹.\n",
"\n",
"โš ๏ธ Return ๐—ง๐—ฟ๐˜‚๐—ฒ, if input check strings is empty.\n",
"\n",
"โš ๏ธ These methods are case-sensitive. \n",
"\n",
"\n",
"\n",
"\n",
"=================================================\n",
"\n",
"Follow for more:- \n",
"https://www.linkedin.com/in/kishanyadav/\n",
"\n",
"\n",
"############...... H@@py Learning!!! .......##########"
],
"metadata": {
"id": "55i2WzI53ceu"
}
},
{
"cell_type": "code",
"source": [
"# install pyspark in google colab\n",
"!pip install pyspark"
],
"metadata": {
"id": "eq0lZ5biTCFe"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# importing neccessary libs\n",
"from pyspark.sql import SparkSession\n",
"from pyspark.sql.functions import col\n",
"\n",
"# creating session\n",
"spark = SparkSession.builder.appName(\"practice\").getOrCreate()\n",
"\n",
"# # create dataframe\n",
"data = [(\"Prashant\",25, 80), (\"Ankit\",26, 90),(\"Ramakant\", 24, 85), (None, 23, 87)]\n",
"columns= [\"student_name\", \"student_age\", \"student_score\"]\n",
"df_students = spark.createDataFrame(data = data, schema = columns)\n",
"df_students.show()\n",
"\n",
"# Result return will be boolean\n",
"# the output val is null for last row value b/c the corresponding value of students_name column is NULL \n",
"df_internal_res = df_students.select(col(\"student_name\").endswith(\"it\").alias(\"internal_bool_val\"))\n",
"df_internal_res.show()\n",
"\n",
"# Now we use the filter() method to fetch rows correspond to True value.\n",
"# first we see result for startswith method\n",
"df_check_start = df_students.filter(col(\"student_name\").startswith(\"Pra\"))\n",
"df_check_start.show()\n",
"\n",
"# let see endswith results\n",
"df_check_end = df_students.filter(col(\"student_name\").endswith(\"ant\"))\n",
"df_check_end.show()\n",
"\n",
"# if arguments in methods is empty then o/p will be True \n",
"df_check_empty = df_students.filter(col(\"student_name\").endswith(\"\"))\n",
"df_check_empty.show()\n"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "6pSMJtnR2aJF",
"outputId": "20dfc1bd-6323-48b4-cfda-29161fec14ea"
},
"execution_count": 20,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"+------------+-----------+-------------+\n",
"|student_name|student_age|student_score|\n",
"+------------+-----------+-------------+\n",
"| Prashant| 25| 80|\n",
"| Ankit| 26| 90|\n",
"| Ramakant| 24| 85|\n",
"| null| 23| 87|\n",
"+------------+-----------+-------------+\n",
"\n",
"+-----------------+\n",
"|internal_bool_val|\n",
"+-----------------+\n",
"| false|\n",
"| true|\n",
"| false|\n",
"| null|\n",
"+-----------------+\n",
"\n",
"+------------+-----------+-------------+\n",
"|student_name|student_age|student_score|\n",
"+------------+-----------+-------------+\n",
"| Prashant| 25| 80|\n",
"+------------+-----------+-------------+\n",
"\n",
"+------------+-----------+-------------+\n",
"|student_name|student_age|student_score|\n",
"+------------+-----------+-------------+\n",
"| Prashant| 25| 80|\n",
"| Ramakant| 24| 85|\n",
"+------------+-----------+-------------+\n",
"\n",
"+------------+-----------+-------------+\n",
"|student_name|student_age|student_score|\n",
"+------------+-----------+-------------+\n",
"| Prashant| 25| 80|\n",
"| Ankit| 26| 90|\n",
"| Ramakant| 24| 85|\n",
"+------------+-----------+-------------+\n",
"\n"
]
}
]
}
]
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment