Some context for anyone who happens to stumble across this:
I've put this together as a way to give a few people I know (most of whom have never coded before) an idea of how text classification works. Reading through the Markdown documentation and the code cells in the above post should take most people about 30 minutes, but I'm not yet sure how long it would take a non-coder to understand why the code does what it does.
Questions from anyone who stumbles upon this page are more than welcome. I plan on including this as part of a GitHub repo meant to serve as a Python crash course: https://github.com/analyticascent/python-inflection
The end goal of that repo is for someone with little or no coding experience to be able to pick up enough Python to be dangerous in ten days or less.
I ran through this and it was very instructive. I came away with a few things:
This was super helpful; having working, explained code is meaningful. The concepts are explained well. So on understanding that code...
* The "brains" of it appear to be in the vectorizing and the algorithm, and the choices made in those would affect the outcome in more complex scenarios.
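To make the point about vectorizing choices concrete, here is a minimal sketch in plain Python. It is not the notebook's code: the vocabulary, the class weights, and the dot-product scoring rule are all made up for illustration. It only shows that the same message can get a different label depending on whether words are counted or merely marked as present.

```python
from collections import Counter

def vectorize(text, vocabulary, binary=False):
    """Turn a text into a list of numbers, one per vocabulary word."""
    counts = Counter(text.lower().split())
    if binary:
        # Binary choice: 1 if the word appears at all, otherwise 0.
        return [1 if counts[word] > 0 else 0 for word in vocabulary]
    # Count choice: how many times each word appears.
    return [counts[word] for word in vocabulary]

def classify(vector, class_profiles):
    """Toy scoring rule (an assumption): pick the label with the largest dot product."""
    scores = {
        label: sum(v * w for v, w in zip(vector, weights))
        for label, weights in class_profiles.items()
    }
    return max(scores, key=scores.get)

# Hypothetical vocabulary and per-class weights, invented for this example.
vocabulary = ["free", "meeting", "prize"]
class_profiles = {"spam": [1, 0, 1], "ham": [0, 3, 0]}

message = "free free free free meeting"

count_label = classify(vectorize(message, vocabulary), class_profiles)
binary_label = classify(vectorize(message, vocabulary, binary=True), class_profiles)

print(count_label)   # spam: four occurrences of "free" outweigh one "meeting"
print(binary_label)  # ham: with presence only, "meeting" carries more weight
```

Swapping the scoring rule for a different algorithm would shift the results in the same way, which is why both choices matter.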
Today I uploaded what comes very close to being the final version of the notebook.
There may be enough spelling and grammatical errors to warrant another revision, but one thing I'm tempted to do is include links to the official documentation for each of the libraries used, so people can learn more about what they do and which parameters can be changed. I'm also trying to think of a clearer, more concise way to describe to readers what document-term matrices are.
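One possible way to show what a document-term matrix is: a small table with one row per document and one column per word, where each cell holds how many times that word appears in that document. The sketch below builds one in plain Python with no libraries. The function name and the three example sentences are made up for illustration, and splitting on whitespace is a simplifying assumption, not what the notebook does.

```python
def build_document_term_matrix(documents):
    """Return (vocabulary, matrix): one row per document, one column per word."""
    # Split each document into lowercase words (a deliberately simple tokenizer).
    tokenized = [doc.lower().split() for doc in documents]
    # The vocabulary is every distinct word, sorted so column order is predictable.
    vocabulary = sorted({word for tokens in tokenized for word in tokens})
    # Each cell counts how often a vocabulary word appears in a document.
    matrix = [[tokens.count(word) for word in vocabulary] for tokens in tokenized]
    return vocabulary, matrix

documents = [
    "the cat sat",
    "the dog sat down",
    "the cat chased the dog",
]

vocabulary, matrix = build_document_term_matrix(documents)

print(vocabulary)  # ['cat', 'chased', 'dog', 'down', 'sat', 'the']
for row in matrix:
    print(row)
# [1, 0, 0, 0, 1, 1]
# [0, 0, 1, 1, 1, 1]
# [1, 1, 1, 0, 0, 2]
```

Each row is the numeric version of one sentence, and that is the form a classifier can actually work with.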
At this point I can't really think of any other major ways to improve it without making it too wordy for newcomers or not detailed enough for the same group.
@2112bytes - Those three libraries are more or less the three main libraries used in most machine learning projects (although what specific tools are needed from
When it comes to
I won't post a new version until I'm sure it'll be the last. Thus far two coders (yourself included) and one non-coder have given me feedback, so I plan on seeking more input (especially from non-coders) until I can't find any more room for improvement. I really appreciate the feedback I've gotten from you and others thus far!