@afranzi
Created January 31, 2019 15:04
# Error 1 - to_upper returns a Column instead of a str
self.assertEqual(to_upper('potato'), 'POTATO')
"""
Column<b'(<lambda>(potato) = POTATO)'>
ValueError: Cannot convert column into bool:
please use '&' for 'and', '|' for 'or', '~' for 'not' when building DataFrame boolean expressions.
"""
# Error 2 - Spark is expecting a column name <str> or <Column>.
to_upper(None)
"""
TypeError: Invalid argument, not a string or column: None of type <class 'NoneType'>.
For column literals, use 'lit', 'array', 'struct' or 'create_map' function.
"""
# Error 3 - [Bonus track] - Same as error 2
def to_upper(s):
    if s is not None:
        return s.upper()
to_upper_list = udf(lambda s: [to_upper(i) for i in s], StringType())
to_upper_list(['potato', 'carrot', 'tomato'])
"""
TypeError: Invalid argument, not a string or column: ['potato', 'carrot', 'tomato'] of type <class 'list'>.
For column literals, use 'lit', 'array', 'struct' or 'create_map' function.
"""