Skip to content

Instantly share code, notes, and snippets.

@mukul54
Created January 14, 2023 19:29
Show Gist options
  • Save mukul54/a1e41dcbb866d97bc434f0e416f01620 to your computer and use it in GitHub Desktop.
Save mukul54/a1e41dcbb866d97bc434f0e416f01620 to your computer and use it in GitHub Desktop.
pyspark_array_function
ARRAY FUNCTION SYNTAX ARRAY FUNCTION DESCRIPTION
array_contains(column: Column, value: Any) Check if a value presents in an array column. Return below values.true - Returns if value presents in an array.false - When valu eno presents.null - when array is null.
array_distinct(e: Column) Return distinct values from the array after removing duplicates.
array_except(col1: Column, col2: Column) Returns all elements from col1 array but not in col2 array.
array_intersect(col1: Column, col2: Column) Returns all elements that are present in col1 and col2 arrays.
array_join(column: Column, delimiter: String, nullReplacement: String)array_join(column: Column, delimiter: String) Concatenates all elments of array column with using provided delimeter. When Null valeus are present, they replaced with 'nullReplacement' string
array_max(e: Column) Return maximum values in an array
array_min(e: Column) Return minimum values in an array
array_position(column: Column, value: Any) Returns a position/index of first occurrence of the 'value' in the given array. Returns position as long type and the position is not zero based instead starts with 1.Returns zero when value is not found.Returns null when any of the arguments are null.
array_remove(column: Column, element: Any) Returns an array after removing all provided 'value' from the given array.
array_repeat(e: Column, count: Int) Creates an array containing the first argument repeated the number of times given by the second argument.
array_repeat(left: Column, right: Column) Creates an array containing the first argument repeated the number of times given by the second argument.
array_sort(e: Column) Returns the sorted array of the given input array. All null values are placed at the end of the array.
array_union(col1: Column, col2: Column) Returns an array of elements that are present in both arrays (all elements from both arrays) with out duplicates.
arrays_overlap(a1: Column, a2: Column) true - if a1 and a2 have at least one non-null element in commonfalse - if a1 and a2 have completely different elements.null - if both the arrays are non-empty and any of them contains a null
arrays_zip(e: Column*) Returns a merged array of structs in which the N-th struct contains all N-th values of input
concat(exprs: Column*) Concatenates all elements from a given columns
element_at(column: Column, value: Any) Returns an element of an array located at the 'value' input position.
exists(column: Column, f: Column => Column) Checks if the column presents in an array column.
explode(e: Column) Create a row for each element in the array column
explode_outer ( e : Column ) Create a row for each element in the array column. Unlike explode, if the array is null or empty, it returns null.
filter(column: Column, f: Column => Column)filter(column: Column, f: (Column, Column) => Column) Returns an array of elements for which a predicate holds in a given array
flatten(e: Column) Creates a single array from an array of arrays column.
forall(column: Column, f: Column => Column) Returns whether a predicate holds for every element in the array.
posexplode(e: Column) Creates a row for each element in the array and creaes a two columns "pos' to hold the position of the array element and the 'col' to hold the actual array value.
posexplode_outer(e: Column) Creates a row for each element in the array and creaes a two columns "pos' to hold the position of the array element and the 'col' to hold the actual array value. Unlike posexplode, if the array is null or empty, it returns null,null for pos and col columns.
reverse(e: Column) Returns the array of elements in a reverse order.
sequence(start: Column, stop: Column) Generate the sequence of numbers from start to stop number.
sequence ( start : Column , stop : Column , step : Column ) Generate the sequence of numbers from start to stop number by incrementing with given step value.
shuffle(e: Column) Shuffle the given array
size(e: Column) Return the length of an array.
slice(x: Column, start: Int, length: Int) Returns an array of elements from position 'start' and the given length.
sort_array(e: Column) Sorts the array in an ascending order. Null values are placed at the beginning.
sort_array(e: Column, asc: Boolean) Sorts the array in an ascending or descending order based of the boolean parameter. For assending, Null values are placed at the beginning. And for desending they are places at the end.
transform(column: Column, f: Column => Column)transform(column: Column, f: (Column, Column) => Column) Returns an array of elments after applying transformation.
zip_with(left: Column, right: Column, f: (Column, Column) => Column) Merges two input arrays.
aggregate(expr: Column,zero: Column,merge: (Column, Column) => Column,finish: Column => Column) Aggregates
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment