The Generality Widget

The goal of this document

The goal of this document is to describe how and why we created The Generality Widget, focusing on:

  • What data we used
  • How we coded the data
  • How we prepared the data
  • How we analyzed the data
  • How we used the results of the analysis in the widget
  • The intended use and specific functionality of the widget

What data we used

How we coded the data

  • We used a combined approach, applying machine learning first and human-driven coding second, to inductively develop a coding frame for students' consideration of the generality of their explanations and/or models.
  • We also sought to understand how reliably we could apply the coding frame with supervised ML.

How we prepared the data

How we analyzed the data

  • used the R packages quanteda (https://quanteda.io/) and quanteda.textmodels (https://cran.r-project.org/web/packages/quanteda.textmodels/quanteda.textmodels.pdf) for naive Bayes and support vector machine classifiers (see the first sketch after this list)
  • also used quanteda.classifiers (https://github.com/quanteda/quanteda.classifiers) for a deep learning/neural network classifier
  • used leave-one-out cross-validation with the weighted kappa and percent agreement statistics to select the best classifier (see the second sketch after this list)
  • (we could have focused on other metrics)
  • (we could inspect the confusion matrix to see for which codes the classifier performed best)
  • (we did not focus a great deal on optimizing or tuning each specific classifier)
  • (we need to better understand how and why each specific algorithm works)
  • selected the support vector machine, which performed best relative to the other classifiers and adequately in absolute terms (kappa = .65, percent agreement = .70); the neural network performed very similarly
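To make the first two bullets concrete, here is a minimal sketch of fitting the naive Bayes and support vector machine classifiers with quanteda and quanteda.textmodels. The data frame `coded` and its `text` and `code` columns are assumed names for the hand-coded responses, not the project's actual objects; quanteda.classifiers offers neural-network models with a similar interface.

```r
# Minimal sketch, assuming hand-coded responses in a data frame `coded`
# with columns `text` (student response) and `code` (human-assigned code).
library(quanteda)
library(quanteda.textmodels)

corp  <- corpus(coded, text_field = "text")
toks  <- tokens(corp, remove_punct = TRUE)
dfmat <- dfm(toks)

# Naive Bayes and support vector machine classifiers
nb_fit  <- textmodel_nb(dfmat, y = coded$code)
svm_fit <- textmodel_svm(dfmat, y = coded$code)

# Predicted codes (here on the training documents; use held-out data in practice)
nb_pred  <- predict(nb_fit, newdata = dfmat)
svm_pred <- predict(svm_fit, newdata = dfmat)
```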
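And a minimal sketch of the leave-one-out cross-validation and evaluation described above, reusing `dfmat` and `coded` from the previous sketch. The weighted kappa here comes from irr::kappa2(), which is an assumption about the implementation rather than a record of what the project actually used.

```r
# Leave-one-out cross-validation for one classifier (the SVM), as a sketch.
library(irr)

n     <- ndoc(dfmat)
preds <- vector("character", n)

for (i in seq_len(n)) {
  fit      <- textmodel_svm(dfmat[-i, ], y = coded$code[-i])
  preds[i] <- as.character(predict(fit, newdata = dfmat[i, ]))
}

# Percent agreement and weighted kappa between predicted and human codes;
# squared weights treat the codes as ordered, so use weight = "unweighted"
# if the codes are purely nominal.
percent_agreement <- mean(preds == coded$code)
weighted_kappa    <- kappa2(data.frame(preds, coded$code), weight = "squared")$value

# Confusion matrix: for which codes does the classifier perform best or worst?
table(predicted = preds, human = coded$code)
```

Repeating this loop for each classifier and comparing the resulting kappa and percent agreement values is one way to arrive at the model comparison described in the final bullet above.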

How we used the results of the analysis in the widget
