@Jarvis3310
Author

Common data preprocessing steps are as follows:
1. Handling missing values

  • Drop the affected rows, if the dataset is large enough
  • Impute (fill in) the missing values
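
Both options for missing values can be sketched in a few lines. This is a minimal illustration, not a prescribed method: the toy `rows` data and the helper names `drop_missing` and `impute_mean` are hypothetical, and mean imputation is only one of several possible filling strategies.

```python
from statistics import mean

# Hypothetical toy dataset; None marks a missing value.
rows = [
    {"age": 25, "income": 50000},
    {"age": None, "income": 62000},
    {"age": 31, "income": None},
    {"age": 40, "income": 58000},
]

def drop_missing(rows):
    """Keep only the rows in which every field is present."""
    return [r for r in rows if all(v is not None for v in r.values())]

def impute_mean(rows):
    """Replace each missing value with the mean of the observed values in its column."""
    columns = list(rows[0].keys())
    means = {
        c: mean(r[c] for r in rows if r[c] is not None)
        for c in columns
    }
    return [
        {c: (r[c] if r[c] is not None else means[c]) for c in columns}
        for r in rows
    ]

dropped = drop_missing(rows)   # rows 2 and 3 are discarded
imputed = impute_mean(rows)    # all four rows are kept, gaps are filled

print(len(dropped), len(imputed))
print(imputed[1]["age"], imputed[2]["income"])
```

Dropping is simplest but throws away half of this tiny dataset, which is why it is usually reserved for cases where plenty of data remains.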

2. Handling categorical data (ordinal or nominal), for example with One-hot encoding
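
As a rough sketch of how categorical values can be turned into numbers: unordered (nominal) categories get one-hot vectors, while ordered (ordinal) categories can be mapped to integer ranks. The helper names `one_hot` and `ordinal_encode`, the sample values, and the choice of rank encoding for ordered data are assumptions made for illustration.

```python
def one_hot(values):
    """Encode unordered categories as 0/1 vectors, one column per category."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    encoded = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1
        encoded.append(row)
    return categories, encoded

def ordinal_encode(values, order):
    """Encode ordered categories as integer ranks following the given order."""
    rank = {c: i for i, c in enumerate(order)}
    return [rank[v] for v in values]

colors = ["red", "green", "blue", "green"]   # nominal: no natural order
sizes = ["M", "S", "L", "M"]                 # ordinal: S < M < L

categories, color_vectors = one_hot(colors)
size_codes = ordinal_encode(sizes, order=["S", "M", "L"])

print(categories)
print(color_vectors)
print(size_codes)
```

One-hot encoding avoids implying an order between categories such as colors, at the cost of one extra column per category.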
3. Feature scaling

  • Normalization
  • Standardization
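
For reference, the two scaling methods listed above are commonly defined as follows. The notation is assumed here rather than taken from the note: x is a feature value, x_min and x_max are the feature's minimum and maximum, μ is its mean, and σ is its standard deviation.

```latex
x'_{\text{norm}} = \frac{x - x_{\min}}{x_{\max} - x_{\min}}, \qquad
x'_{\text{std}} = \frac{x - \mu}{\sigma}
```

Normalization maps every value into the range [0, 1], while standardization produces values with mean 0 and standard deviation 1.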

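After standardization, each feature has a mean of 0 and a standard deviation of 1. Note that this rescales the data without changing the shape of its distribution, so skewed data stays skewed and does not become normally distributed. With all features on a comparable scale, the iterative weight updates used in machine learning (gradient descent) converge more easily. Standardization is also generally less affected by outliers than min-max normalization, although extreme values still shift the mean and standard deviation.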
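
The effect of both scaling methods can be checked on a small sample that contains one outlier. This is a sketch under stated assumptions: the sample `data` and the helper names `min_max_normalize`, `standardize`, and `skewness` are hypothetical, and the population standard deviation (`pstdev`) is used for simplicity.

```python
from statistics import mean, pstdev

def min_max_normalize(xs):
    """Rescale values linearly into the range [0, 1]."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]

def standardize(xs):
    """Rescale values to mean 0 and (population) standard deviation 1."""
    mu, sigma = mean(xs), pstdev(xs)
    return [(x - mu) / sigma for x in xs]

def skewness(xs):
    """Third standardized moment; 0 for a symmetric distribution."""
    mu, sigma = mean(xs), pstdev(xs)
    return mean(((x - mu) / sigma) ** 3 for x in xs)

# Hypothetical sample with a single large outlier (100.0).
data = [1.0, 2.0, 3.0, 4.0, 100.0]

z = standardize(data)
n = min_max_normalize(data)

print("standardized mean/std:", round(mean(z), 6), round(pstdev(z), 6))
print("skewness before/after:", skewness(data), skewness(z))
print("min-max values of the non-outliers:", n[:4])
```

The skewness is the same before and after standardization, which illustrates that only location and scale change. In the min-max result the four ordinary values are squeezed close to 0 because the outlier defines the upper bound.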
