@mylamour
Last active September 25, 2017 05:52
Jupyter test on gist; also machine learning training for colleagues
@mylamour
Author

Q:

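  • To keep things simple and fast, we usually train on only part of the dataset. Can we do that here?

No. The MNIST dataset used here has already been processed and is sorted in order, so a slice taken from it would not cover all the digits.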

  • If not, what would make it possible?

Pair each sample with its label, then shuffle.
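The pairing-then-shuffling step can be sketched as follows. This is a minimal illustration using only the standard library; the toy `images` and `labels` lists are hypothetical stand-ins for the notebook's real arrays, which are not shown in this thread.

```python
import random

# Hypothetical stand-in for a label-sorted dataset: 10 classes, 5 samples each.
labels = [digit for digit in range(10) for _ in range(5)]
images = [[digit] * 4 for digit in labels]  # fake 4-pixel "images"

# Without shuffling, the first 20 samples only contain a few digits.
head_classes = set(labels[:20])

# Pair each image with its label, shuffle the pairs, then split them again
# so that every image keeps its own label.
rng = random.Random(0)  # fixed seed so the run is repeatable
pairs = list(zip(images, labels))
rng.shuffle(pairs)
shuffled_images, shuffled_labels = map(list, zip(*pairs))

subset_images = shuffled_images[:20]
subset_labels = shuffled_labels[:20]
print(sorted(head_classes))        # digits seen without shuffling
print(sorted(set(subset_labels)))  # digits seen after shuffling
```

Shuffling the pairs rather than the two lists separately is what keeps images and labels aligned.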

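  • Are there other ways to speed things up?

Convert the images to grayscale, scale them down, or apply a dimensionality-reduction algorithm.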
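As a rough sketch of the first two ideas, the helpers below convert RGB pixels to grayscale and shrink an image by block averaging. The function names `to_grayscale` and `downscale` are made up for this example, the luminance weights are the common ITU-R BT.601 ones, and plain nested lists are assumed instead of whatever array type the notebook uses.

```python
def to_grayscale(rgb_image):
    """Convert rows of (r, g, b) pixels to one luminance value per pixel."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in rgb_image]


def downscale(image, factor):
    """Shrink a 2-D grayscale image by averaging factor x factor blocks."""
    height, width = len(image), len(image[0])
    out = []
    for top in range(0, height - factor + 1, factor):
        row = []
        for left in range(0, width - factor + 1, factor):
            block = [image[top + dy][left + dx]
                     for dy in range(factor) for dx in range(factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


# A 28x28 image has 784 features; halving each side leaves 196.
small = downscale([[0.0] * 28 for _ in range(28)], 2)
print(len(small), len(small[0]))
```

Fewer features per sample means fewer terms in every distance computation, which is where the speed-up comes from.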

Q:

  • Can K take other values?

Of course. K is just the number of nearest neighbors, not the number of classes.

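  • Is it faster with a different K? Are there other effects? How do you choose?

If K is too small, the model is overly sensitive to the nearest sample points; if those neighbors are noise, the model easily makes mistakes. A smaller K means a more complex model that tends to overfit. A larger K means a simpler model. Cross-validation can be used to choose K.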
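Choosing K by cross-validation could look roughly like this. It is a sketch under stated assumptions, not the notebook's code: the two-cluster toy data, the candidate K values, and the helper names `knn_predict` and `cv_accuracy` are all made up for illustration.

```python
import random
from collections import Counter


def knn_predict(train, query, k):
    """Majority vote among the k training points closest to query."""
    def sq_dist(item):
        return sum((a - b) ** 2 for a, b in zip(item[0], query))
    nearest = sorted(train, key=sq_dist)[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def cv_accuracy(data, k, folds=5):
    """Accuracy of k-NN pooled over `folds` cross-validation splits."""
    correct = 0
    for f in range(folds):
        test = data[f::folds]  # every folds-th sample, starting at f
        train = [item for i, item in enumerate(data) if i % folds != f]
        correct += sum(knn_predict(train, x, k) == y for x, y in test)
    return correct / len(data)


# Hypothetical toy data: two noisy 2-D clusters, labelled by their centre.
rng = random.Random(0)
data = [((rng.gauss(c, 1.0), rng.gauss(c, 1.0)), c)
        for c in (0, 3) for _ in range(30)]
rng.shuffle(data)

scores = {k: cv_accuracy(data, k) for k in (1, 3, 5, 7, 9)}
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

Odd values of K are used here so that a two-class vote cannot tie.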

Q:

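  • Is grayscale faster?
  • Why do we sometimes visualize high-dimensional data?

To see how the data is distributed, which helps with dimensionality reduction and parameter tuning.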

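  • How are features extracted?

You could say that feature engineering determines the whole result. First look at whether there is a single feature or several, and whether they are continuous or discrete; then at the relationship between the independent and dependent variables, and so on.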

  • How do you visualize features?

@mylamour
Author

Distance metrics

  • Minkowski distance
  • Manhattan distance
  • Euclidean distance
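The three metrics listed above are closely related: Manhattan and Euclidean distance are the Minkowski distance with order p = 1 and p = 2 respectively. A minimal sketch, with function names chosen only for this example:

```python
def minkowski(a, b, p):
    """Minkowski distance of order p (p >= 1) between equal-length vectors."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def manhattan(a, b):
    """Minkowski distance with p = 1: sum of absolute differences."""
    return minkowski(a, b, 1)


def euclidean(a, b):
    """Minkowski distance with p = 2: straight-line distance."""
    return minkowski(a, b, 2)


print(manhattan((0, 0), (3, 4)))  # 7.0
print(euclidean((0, 0), (3, 4)))  # 5.0
```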
