Last active January 24, 2020 10:11
analyticascent/556ddbc3c74043676545b6ea43fa907d
text_classification_demo.ipynb
Today I uploaded what comes very close to being the final version of the notebook.
There may still be enough spelling and grammatical errors to warrant another revision. One thing I'm tempted to do is include links to the official documentation for each library used, so people can learn more about what each one does and which parameters can be changed. I'm also trying to think of a clearer, more concise way to explain document-term matrices to readers.
At this point I can't think of any other major ways to improve it without making it either too wordy or not detailed enough for newcomers.
@2112bytes - Those three libraries are more or less the three main libraries used in most machine learning projects (although which specific tools you need from `sklearn` will depend on what you're trying to do).
When it comes to `random_state` values, it's been hard for me to find one that results in accuracy below 91% or above 93%. For example, setting it to `15` results in 91.06% and `100` results in 92.98%. Part of why the results are so consistent is that the two users truly do have very different ways of talking about the same thing (such as how often they link to things), and the number of available training samples is quite high.
By default, `train_test_split` uses 75% of the sample data to train the model and 25% to test it. That's one of the things made clear in the online documentation that I probably need to mention in the next revision.
I won't post a new version until I'm sure it'll be the last. Thus far two coders (yourself included) and one non-coder have given me feedback, so I plan to keep seeking input (especially from non-coders) until I can't find any more room for improvement. I really appreciate the feedback I've gotten from you and others thus far!
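Both the default split and the role of `random_state` are easy to check directly. This is a minimal sketch on hypothetical stand-in data (100 numbered samples with alternating labels), not the two users' posts from the notebook. When neither `test_size` nor `train_size` is given, `train_test_split` holds out 25% for testing, and reusing the same `random_state` reproduces the same shuffle.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: 100 one-feature samples with alternating 0/1 labels
X = np.arange(100).reshape(-1, 1)
y = np.array([0, 1] * 50)

# Neither test_size nor train_size is given, so test_size falls back to 0.25
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=15)
print(len(X_train), len(X_test))  # 75 25

# Reusing random_state=15 reproduces exactly the same shuffle
_, X_test_again, _, _ = train_test_split(X, y, random_state=15)
print(np.array_equal(X_test, X_test_again))  # True

# A different seed (here 100) shuffles differently, so the test set changes
_, X_test_other, _, _ = train_test_split(X, y, random_state=100)
print(np.array_equal(X_test, X_test_other))
```

Changing `random_state` changes which samples end up in the test set. That is one reason accuracy shifts slightly from one value to the next (for example, 91.06% vs. 92.98% above), even when the model and data are otherwise identical.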