So it seems like it takes an SFrame with a tf-idf column that holds one dict of scores per row:
name | tf-idf
---------------------
Obama | {'a': 0.1, 'b': 0.2}
Bush | {'a': 0.3, 'b': 0.4, 'c': 0.5}
After both steps that build the triples (numbering the rows, then stacking the tf-idf dict column), x should have one row per (document, feature) pair. The stack replaces the tf-idf column with the feature and value columns, so the original dict no longer appears:
index | name  | feature | value
-------------------------------
0     | Obama | 'a'     | 0.1
0     | Obama | 'b'     | 0.2
1     | Bush  | 'a'     | 0.3
1     | Bush  | 'b'     | 0.4
1     | Bush  | 'c'     | 0.5
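To make the triples step concrete, here is a small plain-Python emulation of numbering the rows and stacking the dict column. It is not GraphLab/Turi Create code; the list-of-dicts layout and the helper names add_row_number and stack_dict_column are hypothetical stand-ins for the SFrame operations.

```python
# Plain-Python emulation (NOT the SFrame API) of the two steps that build the triples.
rows = [
    {"name": "Obama", "tf_idf": {"a": 0.1, "b": 0.2}},
    {"name": "Bush", "tf_idf": {"a": 0.3, "b": 0.4, "c": 0.5}},
]

def add_row_number(rows):
    """Hypothetical helper: attach a 0-based 'index' to every row."""
    return [dict(row, index=i) for i, row in enumerate(rows)]

def stack_dict_column(rows, column):
    """Hypothetical helper: emit one row per (row, key) pair, dropping the dict column."""
    out = []
    for row in rows:
        for feature, value in row[column].items():
            out.append({"index": row["index"], "name": row["name"],
                        "feature": feature, "value": value})
    return out

x = stack_dict_column(add_row_number(rows), "tf_idf")
for r in x:
    print(r["index"], r["name"], r["feature"], r["value"])
```

Each document contributes as many rows as it has keys, which is why Obama appears twice and Bush three times.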
After the one-hot encoder steps, feature_encoding (the mapping from each distinct feature value to an integer index) should be:
feature   | category | index
----------------------------
'feature' | 'a'      | 0
'feature' | 'b'      | 1
'feature' | 'c'      | 2
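The mapping itself is just "give every distinct feature an integer". A minimal sketch of that idea in plain Python follows, assuming indices are handed out in order of first appearance; the real encoder may number categories in a different order, and fit_feature_index is a hypothetical name, not a library function.

```python
# Stacked rows from the previous step (hard-coded here so the example stands alone).
x = [
    {"index": 0, "name": "Obama", "feature": "a", "value": 0.1},
    {"index": 0, "name": "Obama", "feature": "b", "value": 0.2},
    {"index": 1, "name": "Bush", "feature": "a", "value": 0.3},
    {"index": 1, "name": "Bush", "feature": "b", "value": 0.4},
    {"index": 1, "name": "Bush", "feature": "c", "value": 0.5},
]

def fit_feature_index(rows, column="feature"):
    """Hypothetical helper: map each distinct value of `column` to 0, 1, 2, ..."""
    mapping = {}
    for row in rows:
        # setdefault only inserts on first sight, so repeated features keep their index.
        mapping.setdefault(row[column], len(mapping))
    return mapping

feature_encoding = fit_feature_index(x)
for category, index in feature_encoding.items():
    print("feature", category, index)
```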
And after assigning feature_id, x should be:
index | name  | feature | value | feature_id
--------------------------------------------
0     | Obama | 'a'     | 0.1   | 0
0     | Obama | 'b'     | 0.2   | 1
1     | Bush  | 'a'     | 0.3   | 0
1     | Bush  | 'b'     | 0.4   | 1
1     | Bush  | 'c'     | 0.5   | 2
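Once every row has a feature_id, the (index, feature_id, value) columns are exactly the row, column, and value arrays of a sparse matrix in coordinate form. The sketch below, in plain Python with the mapping hard-coded as an assumption, looks up each feature_id and then expands the triples into a dense matrix so the result is easy to check by eye.

```python
x = [
    {"index": 0, "name": "Obama", "feature": "a", "value": 0.1},
    {"index": 0, "name": "Obama", "feature": "b", "value": 0.2},
    {"index": 1, "name": "Bush", "feature": "a", "value": 0.3},
    {"index": 1, "name": "Bush", "feature": "b", "value": 0.4},
    {"index": 1, "name": "Bush", "feature": "c", "value": 0.5},
]
feature_encoding = {"a": 0, "b": 1, "c": 2}  # assumed mapping from the encoder step

# Attach the integer column id for each row's feature.
for row in x:
    row["feature_id"] = feature_encoding[row["feature"]]

# Row ids, column ids, and values: the three arrays of a coordinate-format matrix.
i = [row["index"] for row in x]
j = [row["feature_id"] for row in x]
v = [row["value"] for row in x]

# Expand into a dense documents-by-features matrix; absent pairs stay 0.0.
n_rows, n_cols = max(i) + 1, len(feature_encoding)
dense = [[0.0] * n_cols for _ in range(n_rows)]
for r, c, val in zip(i, j, v):
    dense[r][c] = val
print(dense)
```

With SciPy available, the same arrays could instead go straight into scipy.sparse.csr_matrix((v, (i, j)), shape=(n_rows, n_cols)), which avoids materializing the zeros.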