Gleb Mikhaylov glebmikha
glebmikha / eda_prompt.md
Last active May 28, 2024 12:51
EDA Prompt for ChatGPT (and Humans)
  1. Calculate the percentage of missing values in each column and sort them in descending order.
    1. Missing values and outliers are not problems to be fixed! They are facts.
    2. During EDA you must not “fix” them, because you have to deal with your data and your problem as they are.
    3. If you see missing values, just report them.
  2. Identify and understand your target variable.
    1. Understand the type of the target variable: binary, categorical, or numeric.
    2. Examine the distribution of the target variable.
      1. For a binary variable (which must be converted to 0s and 1s if it is stored as strings), the mean is used; it equals the proportion of 1s.
      2. For a categorical variable, value counts are used.
      3. For a numeric variable, a histogram or the pandas describe() table is used.
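
The two steps above can be sketched in pandas roughly as follows. This is a minimal illustration, not the author's own code: the helper names `missing_percentage` and `summarize_target`, the `positive_label` parameter, and the toy columns are all assumptions made for the example. In particular, choosing which of the two binary labels counts as 1 is a guess here (the last label in sorted order) and should be set explicitly for real data.

```python
import pandas as pd


def missing_percentage(df: pd.DataFrame) -> pd.Series:
    """Percentage of missing values per column, sorted in descending order."""
    # isna() gives a boolean frame; its column means are the missing fractions.
    return (df.isna().mean() * 100).sort_values(ascending=False)


def summarize_target(target: pd.Series, positive_label=None):
    """Summarize a target column according to its type.

    Missing values are only skipped for the summary; the data is not modified.
    """
    values = target.dropna()
    if values.nunique() == 2:
        # Binary target: encode as 0/1 and report the proportion of 1s.
        # Assumption: if no positive label is given, use the last one in sorted order.
        if positive_label is None:
            positive_label = sorted(values.unique())[-1]
        return float((values == positive_label).mean())
    if pd.api.types.is_numeric_dtype(values):
        # Numeric target: the describe table (a histogram is the visual alternative).
        return values.describe()
    # Categorical target: value counts.
    return values.value_counts()


# Hypothetical toy data, for illustration only.
df = pd.DataFrame(
    {
        "age": [22, 35, None, 41],
        "city": [None, None, "Riga", None],
        "churned": ["yes", "no", "no", "yes"],
    }
)

missing = missing_percentage(df)
print(missing)                          # report the missing values, do not fix them
print(summarize_target(df["churned"]))  # proportion of "yes"
print(summarize_target(df["age"]))      # describe table
```

For a numeric target, `df["age"].hist()` would give the histogram mentioned above, provided matplotlib is installed.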