@yvan
Last active February 22, 2023 23:36
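# pandas
# assumes df is a DataFrame and col3 holds the name of the column to check
import pandas as pd

row_ct = df.shape[0]
# errors='coerce' turns unparseable values into NaN, and count() skips NaN
num_ct = pd.to_numeric(df[col3], errors='coerce').count()
# another check using regex: optional minus sign, digits, optional comma groups,
# optional decimal part, or an empty string
num_regex = r"^((-)?[0-9]+)(,[0-9]+)*(\.[0-9]+)?$|(^$)"
as_str = df[col3].fillna('').astype(str)
all_are_nums = as_str.str.match(num_regex).all()
if (num_ct == row_ct) or all_are_nums:
    # strip thousands separators first, otherwise coerce turns "1,000" into NaN
    df[col3] = pd.to_numeric(as_str.str.replace(',', '', regex=False), errors='coerce')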
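# The regex check and pd.to_numeric do not accept exactly the same strings,
# so it can help to see what num_regex matches on its own. The sketch below
# uses only the standard library; looks_numeric and the sample strings are
# hypothetical and not part of the original snippet.

```python
import re

# Same pattern as the pandas check above.
num_regex = r"^((-)?[0-9]+)(,[0-9]+)*(\.[0-9]+)?$|(^$)"


def looks_numeric(value):
    """Hypothetical helper: True if the string matches num_regex."""
    return re.match(num_regex, value) is not None


samples = ["42", "-3.5", "1,000", "1,234,567.89", "", "abc", "1e5", "+7", "12."]
results = {s: looks_numeric(s) for s in samples}

for s, ok in results.items():
    print(repr(s), ok)
```

# Comma-grouped values such as "1,000" pass the regex, but pd.to_numeric with
# errors='coerce' turns them into NaN unless the commas are removed first.
# Scientific notation such as "1e5" goes the other way: pd.to_numeric parses
# it, while the regex rejects it.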
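# pyspark
# assumes df is a Spark DataFrame and col3 holds the name of the column to check
from pyspark.sql import functions as F

row_ct = df.count()
# count the rows whose value is non-null after casting to float
rowswith_col3float = df.filter(F.col(col3).cast("float").isNotNull()).count()
if row_ct == rowswith_col3float:
    df = df.withColumn(col3, F.col(col3).cast("float"))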