devforfu / gendata.py
Created March 6, 2019 13:46
A simple random dataset generating script
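The docstring of `generate_dataset` describes three steps: normal draws for the numerical columns, uuid4 labels for the categorical columns, and a share of values replaced with NaN. A minimal sketch of those steps, using only the standard library, might look as follows. It is not the gist's implementation: the name `sketch_dataset`, the `seed` parameter, the `num_i`/`cat_i` column names, and the dict-of-lists return type are all hypothetical, and the reading of `max_nan` as a per-column upper bound on the missing fraction is an assumption.

```python
import math
import random
import uuid


def sketch_dataset(n_rows, num_count, cat_count, max_nan=0.1, max_cat_size=100, seed=None):
    """Hypothetical stdlib-only sketch; returns a dict mapping column name -> list."""
    rng = random.Random(seed)
    columns = {}

    # Numerical features: independent draws from N(0, 1).
    for i in range(num_count):
        columns[f"num_{i}"] = [rng.gauss(0.0, 1.0) for _ in range(n_rows)]

    # Categorical features: pick a cardinality C with 2 <= C <= max_cat_size,
    # build C uuid4-formatted labels, then sample every row from those labels.
    for i in range(cat_count):
        cardinality = rng.randint(2, max_cat_size)
        levels = [
            str(uuid.UUID(int=rng.getrandbits(128), version=4))
            for _ in range(cardinality)
        ]
        columns[f"cat_{i}"] = [rng.choice(levels) for _ in range(n_rows)]

    # Missing values: assumed here to mean that each column loses a random
    # fraction of its rows, never more than max_nan.
    for values in columns.values():
        n_missing = int(rng.uniform(0.0, max_nan) * n_rows)
        for row in rng.sample(range(n_rows), n_missing):
            values[row] = math.nan

    return columns


# Example call: 200 rows, 3 numerical and 2 categorical columns.
data = sketch_dataset(200, num_count=3, cat_count=2, max_cat_size=5, seed=0)
```

Building the labels from the seeded generator keeps the whole dataset reproducible for a given `seed`; calling `uuid.uuid4()` directly would be simpler but would give different labels on every run.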
def generate_dataset(n_rows, num_count, cat_count, max_nan=0.1, max_cat_size=100):
    """Randomly generate datasets with numerical and categorical features.

    The numerical features are drawn from the standard normal distribution,
    X ~ N(0, 1). The categorical features are generated as random uuid4
    strings with cardinality C, where 2 <= C <= max_cat_size.

    Also, a max_nan proportion of both numerical and categorical features is
    replaced with NaN values.
    """