The goal of this document is to describe how and why we created the Generality Widget, focusing on:
- What data we used
- How we coded the data
- How we prepared the data
- How we analyzed the data
- How we used the results of the analysis in the widget
- The intended use and specific functionality of the widget
- Data from 845 students' responses over five time points (with a good deal of missing data): https://drive.google.com/file/d/1PLRN2dCBXQaf5BPvIEEyuUspjrfoHwy-/view?usp=sharing
- The responses come from "embedded" assessments targeting students' explanations and/or models of a phenomenon
- Students wrote 1-3 sentences and indicated whether their explanation and/or model was general or specific
- We used a combined approach, first ML-driven and then human-driven, to inductively develop a coding frame for students' consideration of the generality of their explanations and/or models.
- We also sought to understand how reliably we could apply the coding frame with supervised ML
- Manual identification and removal of blank responses
- NLP techniques: stemming and stop-word removal; see some details here: https://gist.github.com/jrosen48/6b5051640975d53d2f5d3b88f8c6a3fe
- winnowing our codes to six (by ignoring sub-codes)
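The preparation steps above were carried out in R (see the linked gist for details). Purely to illustrate the idea, here is a minimal Python sketch of blank-response removal, stop-word removal, and stemming; the stop-word list and the suffix-stripping "stemmer" are toy stand-ins, not the ones actually used.

```python
import re

# Tiny illustrative stop-word list; the real list used in the R pipeline
# (see the gist) is much longer.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it"}


def crude_stem(token):
    """Very rough suffix stripping; a stand-in for a real stemmer
    (e.g., the Porter stemmer provided by most NLP toolkits)."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def preprocess(response):
    """Lowercase, tokenize, drop stop words, and stem one response."""
    tokens = re.findall(r"[a-z]+", response.lower())
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]


responses = [
    "The model explains charging in general",
    "",  # blank responses are identified and removed before analysis
    "It is specific to balloons",
]
processed = [preprocess(r) for r in responses if r.strip()]
```

The same response then enters the analysis as a bag of stemmed tokens, e.g. `"The model explains charging in general"` becomes `["model", "explain", "charg", "general"]`.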
- used the R packages quanteda (https://quanteda.io/) and quanteda.textmodels (https://cran.r-project.org/web/packages/quanteda.textmodels/quanteda.textmodels.pdf) for naive Bayes and support vector machine classifiers
- also used quanteda.classifiers (https://github.com/quanteda/quanteda.classifiers) for a deep learning/neural network classifier
- used leave-one-out cross-validation and the weighted kappa and percent agreement statistics to select the best classifier
- (we could also consider other metrics)
- (we could inspect the confusion matrix to see for which codes the classifier performed best)
- (we did not focus a great deal on optimizing or tuning each specific classifier)
- (we still need to better understand how and why each specific algorithm works)
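To make the leave-one-out procedure concrete, here is a minimal Python sketch. The classifier is a deliberately simple token-overlap nearest-neighbour stand-in (the actual work used the quanteda naive Bayes, SVM, and neural network models), and the example responses and labels are invented; only the cross-validation loop itself is the point.

```python
def predict_nearest(train_docs, train_labels, doc):
    """1-nearest-neighbour by shared tokens: a toy stand-in classifier,
    used here only so the cross-validation loop has something to call."""
    overlap = lambda a, b: len(set(a.lower().split()) & set(b.lower().split()))
    best = max(range(len(train_docs)), key=lambda i: overlap(train_docs[i], doc))
    return train_labels[best]


def leave_one_out(docs, labels, predict):
    """Hold out each response once, train on the rest, predict the held-out one."""
    preds = []
    for i in range(len(docs)):
        train_docs = docs[:i] + docs[i + 1:]
        train_labels = labels[:i] + labels[i + 1:]
        preds.append(predict(train_docs, train_labels, docs[i]))
    return preds


# Invented example data
docs = [
    "works for any charged object in general",
    "any two objects attract in general",
    "only this balloon sticks to the wall",
    "this balloon moved toward the wall",
]
labels = ["general", "general", "specific", "specific"]

preds = leave_one_out(docs, labels, predict_nearest)
agreement = sum(p == l for p, l in zip(preds, labels)) / len(labels)
```

The held-out predictions are then compared against the human codes with the agreement statistics described above.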
- selected the support vector machine, which seemed to perform both best relative to the other classifiers and adequately in absolute terms (kappa = .65, percent agreement = .70); the neural network performed very similarly
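As an illustration of the two selection statistics, here is a from-scratch Python sketch of percent agreement and weighted Cohen's kappa. Linear weights are assumed for concreteness (quadratic weights are the other common choice), and the truth/prediction vectors are invented; statistical packages provide tested implementations of both statistics.

```python
def percent_agreement(truth, pred):
    """Proportion of cases where the predicted code matches the human code."""
    return sum(t == p for t, p in zip(truth, pred)) / len(truth)


def weighted_kappa(truth, pred, weights="linear"):
    """Weighted Cohen's kappa for ordinal codes: chance-corrected agreement
    where near-misses are penalized less than distant disagreements."""
    n = len(truth)
    categories = sorted(set(truth) | set(pred))
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}

    # Observed confusion matrix and its marginals
    obs = [[0.0] * k for _ in range(k)]
    for t, p in zip(truth, pred):
        obs[index[t]][index[p]] += 1
    row = [sum(obs[i]) for i in range(k)]
    col = [sum(obs[i][j] for i in range(k)) for j in range(k)]

    def w(i, j):
        d = abs(i - j)
        return d if weights == "linear" else d * d

    observed = sum(w(i, j) * obs[i][j] for i in range(k) for j in range(k))
    expected = sum(w(i, j) * row[i] * col[j] / n for i in range(k) for j in range(k))
    return 1.0 - observed / expected


# Invented codes on an ordinal 0-3 scale
truth = [0, 0, 1, 1, 2, 2, 2, 3]
pred = [0, 1, 1, 1, 2, 2, 3, 3]
```

On these toy vectors, percent agreement is .75 and linearly weighted kappa is about .78; the real values above come from the full leave-one-out predictions.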
- the fitted model can be used to make predictions on new data; this is what the Shiny app does: https://jmichaelrosenberg.shinyapps.io/generality-shiny/
- we are logging responses/feedback through the app (this could be improved; right now it's fairly manual)
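To sketch the fit-once, predict-new-responses workflow behind the app: the deployed model is the SVM fitted in R with quanteda.textmodels, but the same idea can be shown with a from-scratch multinomial naive Bayes in Python. The documents, labels, and new response below are invented examples.

```python
import math
from collections import Counter


class TinyNaiveBayes:
    """Multinomial naive Bayes with Laplace smoothing: an illustrative
    stand-in for the quanteda SVM that the Shiny app actually uses."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.priors = Counter(labels)
        self.n_docs = len(docs)
        self.word_counts = {c: Counter() for c in self.classes}
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = doc.lower().split()
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, doc):
        """Score each class by log prior + smoothed log likelihoods."""
        tokens = doc.lower().split()
        v = len(self.vocab)
        best_class, best_lp = None, -math.inf
        for c in self.classes:
            lp = math.log(self.priors[c] / self.n_docs)
            total = sum(self.word_counts[c].values())
            for t in tokens:
                lp += math.log((self.word_counts[c][t] + 1) / (total + v))
            if lp > best_lp:
                best_class, best_lp = c, lp
        return best_class


# Invented training data; the real model was fitted to the coded responses
docs = [
    "this works for any charged object",
    "any two objects attract in general",
    "only this balloon sticks to the wall",
    "the balloon in our experiment moved",
]
labels = ["general", "general", "specific", "specific"]

model = TinyNaiveBayes().fit(docs, labels)
prediction = model.predict("charges attract in general")
```

The app does the analogous step in R: it loads the fitted SVM once, then classifies each new response a user submits.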