Last active
November 4, 2015 13:12
Document-Term Matrix
LabelBinarizer : labels → one-hot representation
LabelEncoder : labels → 0, 1, ..., N-1
OneHotEncoder : 0, 1, ..., N-1 → one-hot representation
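
The three mappings above can be checked against each other. The following is a minimal sketch, assuming scikit-learn and NumPy are installed; the sample labels are made up for illustration and are not from the original notes.

```python
import numpy as np
from sklearn.preprocessing import LabelBinarizer, LabelEncoder, OneHotEncoder

# Hypothetical sample labels, chosen only for illustration.
labels = ["cat", "dog", "bird", "dog"]

# LabelEncoder: labels -> integer codes 0..N-1 (classes are sorted first).
codes = LabelEncoder().fit_transform(labels)

# LabelBinarizer: labels -> one-hot rows in a single step.
onehot_direct = LabelBinarizer().fit_transform(labels)

# OneHotEncoder: integer codes -> one-hot rows.
# It expects a 2-D column and returns a sparse matrix by default.
onehot_from_codes = OneHotEncoder().fit_transform(codes.reshape(-1, 1)).toarray()

print(codes)
print(onehot_direct)
print(np.array_equal(onehot_direct, onehot_from_codes))
```

With three or more classes, LabelBinarizer gives the same matrix as LabelEncoder followed by OneHotEncoder. With exactly two classes, LabelBinarizer returns a single 0/1 column instead, so the equivalence does not hold there.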
https://github.com/recruit-tech/summpy/blob/master/summpy/lexrank.py#L34
CountVectorizer : text → document-term matrix
TfidfTransformer : document-term matrix → TF-IDF weighting, L2 normalization
TfidfVectorizer : text → TF-IDF weighting, L2 normalization
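
Read together, these three lines suggest that TfidfVectorizer behaves like CountVectorizer followed by TfidfTransformer. The following is a minimal sketch of that relationship, assuming scikit-learn with default parameters; the three documents are made-up examples.

```python
import numpy as np
from sklearn.feature_extraction.text import (
    CountVectorizer,
    TfidfTransformer,
    TfidfVectorizer,
)

# Hypothetical toy corpus, chosen only for illustration.
docs = ["the cat sat", "the dog sat", "the dog barked"]

# Step 1: text -> document-term matrix of raw counts (sparse).
counts = CountVectorizer().fit_transform(docs)

# Step 2: counts -> TF-IDF weights, each row scaled to unit L2 norm.
two_step = TfidfTransformer().fit_transform(counts)

# Both steps at once: text -> TF-IDF weights with L2 normalization.
one_step = TfidfVectorizer().fit_transform(docs)

print(counts.shape)
print(np.allclose(two_step.toarray(), one_step.toarray()))
print(np.linalg.norm(one_step.toarray(), axis=1))
```

The two-step form is useful when the raw counts are needed as well, for example to feed a different weighting scheme. The one-step form is shorter when only the TF-IDF matrix matters.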