@hschafer
Created May 8, 2018 21:55

So it seems like it takes an SFrame whose tf-idf column holds one dictionary of scores per row:

name      |   tf-idf
---------------------
Obama     |  {'a': 0.1, 'b': 0.2}
Bush      |  {'a': 0.3, 'b': 0.4, 'c': 0.5}

After both steps that build the triples, x should contain one row per (document, feature) pair:

index    |      name      |   feature     |    value     |   tf-idf
-----------------------------------------------------------------------------------
0        |      Obama     |       'a'     |     0.1      |   {'a': 0.1, 'b': 0.2}
0        |      Obama     |       'b'     |     0.2      |   ....
1        |      Bush      |       'a'     |     0.3      |   {'a': 0.3, 'b': 0.4, 'c': 0.5}
1        |      Bush      |       'b'     |     0.4      |   ...
1        |      Bush      |       'c'     |     0.5      |   ...
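
A rough sketch of that stacking step in plain Python, in case it helps to see it spelled out. The list of dicts below is only a stand-in for the SFrame, and `stack_tfidf` is a made-up helper name, not a call from any library:

```python
# Hypothetical stand-in for the SFrame: one dict per row, with a dict-valued column.
rows = [
    {"name": "Obama", "tf-idf": {"a": 0.1, "b": 0.2}},
    {"name": "Bush", "tf-idf": {"a": 0.3, "b": 0.4, "c": 0.5}},
]


def stack_tfidf(rows):
    """Emit one record per (row, feature) pair, keeping the row's index."""
    triples = []
    for index, row in enumerate(rows):
        # Sorting only makes the output order deterministic for this sketch.
        for feature, value in sorted(row["tf-idf"].items()):
            triples.append(
                {
                    "index": index,
                    "name": row["name"],
                    "feature": feature,
                    "value": value,
                }
            )
    return triples


triples = stack_tfidf(rows)
for record in triples:
    print(record)
```

Each input row turns into as many output rows as its dictionary has keys, so Obama contributes two rows and Bush three.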

After the one-hot encoder steps, feature_encoding should be:

feature       |   category    | index 
------------------------------------------
'feature'     |      'a'      |      0
'feature'     |      'b'      |      1
'feature'     |      'c'      |      2
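
As a sketch of what the encoder is doing here: every distinct value in the feature column gets its own integer index. The helper below is hypothetical, and assigning indices in sorted order is an assumption; the real encoder may number the categories differently.

```python
def build_feature_encoding(features):
    """Map each distinct feature name to an integer index.

    Assumption: indices follow sorted order of the names.
    """
    categories = sorted(set(features))
    return {category: index for index, category in enumerate(categories)}


# The feature column from the two example documents (Obama: a, b; Bush: a, b, c).
feature_column = ["a", "b", "a", "b", "c"]
feature_encoding = build_feature_encoding(feature_column)
print(feature_encoding)
```

With the example data there are three distinct features, so the indices run from 0 to 2.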

And after assigning feature_id, x should be

index |      name    |   feature  |    value    |    feature_id   |  tf-idf
-----------------------------------------------------------------------------------
0     |      Obama   |    'a'     |    0.1      |        0        |  {'a': 0.1, 'b': 0.2}
0     |      Obama   |    'b'     |    0.2      |        1        |  ....
1     |      Bush    |    'a'     |    0.3      |        0        |  {'a': 0.3, 'b': 0.4, 'c': 0.5}
1     |      Bush    |    'b'     |    0.4      |        1        |  ...
1     |      Bush    |    'c'     |    0.5      |        2        |  ...
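
A minimal sketch of the feature_id assignment, again in plain Python with made-up helper names. The last part reads (index, feature_id, value) as (row, column, entry) coordinates and fills a small dense grid; that reading is an assumption about what the table is for, not something stated above.

```python
# (index, name, feature, value) rows, as in the stacked table.
triples = [
    (0, "Obama", "a", 0.1),
    (0, "Obama", "b", 0.2),
    (1, "Bush", "a", 0.3),
    (1, "Bush", "b", 0.4),
    (1, "Bush", "c", 0.5),
]

# Assumed encoding: one integer per distinct feature.
feature_encoding = {"a": 0, "b": 1, "c": 2}


def assign_feature_id(triples, encoding):
    """Append the encoded feature_id to every row by looking up its feature."""
    return [
        (index, name, feature, value, encoding[feature])
        for index, name, feature, value in triples
    ]


x = assign_feature_id(triples, feature_encoding)

# Assumption: treat (index, feature_id, value) as (row, column, entry).
num_rows = max(row[0] for row in x) + 1
num_cols = len(feature_encoding)
dense = [[0.0] * num_cols for _ in range(num_rows)]
for index, _name, _feature, value, feature_id in x:
    dense[index][feature_id] = value

for line in dense:
    print(line)
```

Obama has no 'c' entry, so that cell stays at zero; only the five listed values are ever written.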