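- `save_as_text`: don't use this unless you only want to read the dictionary's contents as plain text. Otherwise it will cause problems if you later want to go back and revise or filter the dictionary; use `save()` / `load()` for a dictionary you intend to keep working with.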
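- If you load an existing dictionary and then alter it (for example by filtering tokens), the corpus must also be updated, because filtering renumbers the token ids that the corpus refers to. This is outlined here: Q8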
- You have to limit the number of features (dictionary tokens) on large datasets; otherwise memory consumption is huge.
- This is true regardless of whether the corpus is loaded in RAM or serialized to disk.
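- `iterations` argument: refers to the maximum number of inference iterations per document in the E-step of EM. The number of full passes over the corpus is set separately by `passes`.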
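A serialized bag-of-words corpus stores token ids, not tokens. Filtering a gensim `Dictionary` (`filter_extremes` or `filter_tokens`, which compact the ids afterwards) renumbers the surviving tokens, so an old corpus silently points at the wrong words. The minimal sketch below shows the remapping idea using only the standard library. Plain dicts stand in for the old `id2token` and new `token2id` mappings, and `build_old2new` and `remap_corpus` are hypothetical helpers, not gensim API. gensim's `gensim.models.VocabTransform` accepts a similar `old2new` mapping.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

# One document in bag-of-words form: (token_id, count) pairs.
BowDoc = List[Tuple[int, int]]


def build_old2new(old_id2token: Dict[int, str], new_token2id: Dict[str, int]) -> Dict[int, int]:
    """Map each old token id to its new id; tokens removed by filtering are omitted."""
    return {
        old_id: new_token2id[token]
        for old_id, token in old_id2token.items()
        if token in new_token2id
    }


def remap_corpus(corpus: Iterable[BowDoc], old2new: Dict[int, int]) -> Iterator[BowDoc]:
    """Lazily rewrite each document with the new ids, dropping filtered tokens.

    Being a generator, it streams documents one at a time instead of
    materializing the whole corpus in memory.
    """
    for doc in corpus:
        yield sorted((old2new[tok_id], count) for tok_id, count in doc if tok_id in old2new)


if __name__ == "__main__":
    # Hypothetical toy data: "the" was filtered out and "dog" moved from id 2 to id 1.
    old_id2token = {0: "cat", 1: "the", 2: "dog"}
    new_token2id = {"cat": 0, "dog": 1}
    old_corpus = [[(0, 2), (1, 5)], [(1, 3), (2, 1)]]
    old2new = build_old2new(old_id2token, new_token2id)
    print(list(remap_corpus(old_corpus, old2new)))  # [[(0, 2)], [(1, 1)]]
```

The alternative is to rebuild the corpus from the raw text with the filtered dictionary. That is simpler, but on 20 million messages a remap over the existing serialized corpus avoids re-tokenizing everything.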
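Streaming the corpus from disk doesn't save memory here, because gensim's `LdaModel` keeps topic-word statistics as dense arrays. Each array has one row per topic and one column per dictionary token, so its size grows with vocabulary size however the corpus is stored. The back-of-the-envelope sketch below assumes a single float32 `num_topics x vocab_size` array. The real model holds several arrays of this shape, so treat the result as a lower bound. The topic count and vocabulary sizes are illustrative, not measurements from this dataset. `topic_word_matrix_bytes` and `fmt_mib` are hypothetical helpers.

```python
def topic_word_matrix_bytes(num_topics: int, vocab_size: int, bytes_per_value: int = 4) -> int:
    """Bytes for one dense num_topics x vocab_size array (4 = float32, 8 = float64)."""
    return num_topics * vocab_size * bytes_per_value


def fmt_mib(n_bytes: int) -> str:
    """Format a byte count in mebibytes with one decimal place."""
    return f"{n_bytes / 2**20:,.1f} MiB"


if __name__ == "__main__":
    # Hypothetical sizes: an unfiltered vocabulary vs one capped at 100,000 tokens
    # (e.g. via Dictionary.filter_extremes(keep_n=100000)).
    for vocab_size in (2_000_000, 100_000):
        size = topic_word_matrix_bytes(num_topics=100, vocab_size=vocab_size)
        print(f"{vocab_size:>9,} tokens -> {fmt_mib(size)} per array")
```

Capping the vocabulary shrinks every one of these arrays proportionally, which is why limiting features matters more than how the corpus itself is stored.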
Last active: August 8, 2016 18:08
General notes from using gensim on 20 million messages