To broaden the scope of MOOCs, an automated machine-scoring tool is a necessity. This paper aids the grader in evaluating student answers to open-ended questions through the ASAG task, meeting the need for accessible, feasible education at large scale using a handful of statistical and deep learning toolkits.
Our goal is simple: using NLP techniques, pick the better of the response-based and the reference-based approach, juxtaposing the produced outputs with the expected output to discriminate between correct and incorrect student answers.
By comparing the performance of response-based and reference-based models, we aim to single out the better semantic, analytic, and predictive tool of the two, and thereby contribute to ASAG.
Chapter 1
introduction, motivation, and goal behind the research
Chapter 2
describing the terminology and intricacies of the ASAG technique
Chapter 3
elaboration of each component
Chapter 4
applying the pipeline to three different standard datasets and contrasting the results with the state of the art
Chapter 5
limitations and possible future work
Chapter 6
conclusion
MOOCs, a little bit of history: a course named Connectivism and Connective Knowledge
as research continued on it, Coursera and edX came into being.
tl;dr: assessment for learning
an interactive learning experience between teachers and students, based on a continual feedback loop, incrementally updating what the students are learning
Summative assessment is arguably critical because it evaluates students' mastery of the subject. It is a challenge for automated tools, both for open-ended assessment (ASAG) and for essay scoring (AES).
Basically objective questions
concept mapping -> juxtaposing students' answers and expected answers by normalizing morphological and syntactic variations of words, phrases, clauses, or sentences
information extraction -> matching semantic patterns extracted from course material, comparing and contrasting students' answers with the teachers' formulated answers
corpus-based methods -> utilizing collections of answers, relying on statistical analysis, enhancing paraphrase recognition, and calculating distance measures
machine learning -> predictive and clustering models
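As a sketch of the distance measures corpus-based methods rely on, the snippet below computes cosine similarity over raw term-frequency vectors; the tokenization and function names are illustrative, not part of any cited system:

```python
from collections import Counter
import math

def tf_vector(text):
    """Term-frequency vector of a lowercased, whitespace-tokenized answer."""
    return Counter(text.lower().split())

def cosine_similarity(a, b):
    """Cosine of the angle between the term-frequency vectors of two answers."""
    va, vb = tf_vector(a), tf_vector(b)
    dot = sum(va[t] * vb[t] for t in va)
    na = math.sqrt(sum(c * c for c in va.values()))
    nb = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (na * nb) if na and nb else 0.0
```

Real corpus-based systems would add stemming, stopword removal, and weighting such as TF-IDF on top of this skeleton.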
evaluation
Here we need to evaluate which method or methods work best. We can pose the problem as a Kaggle competition.
how the chosen method performs on a new set of unseen answers
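To make "which method works best" concrete, we need shared metrics on unseen answers; below is a minimal sketch of accuracy and macro-averaged F1 in plain Python (illustrative, not tied to any particular leaderboard):

```python
def accuracy(gold, pred):
    """Fraction of answers whose predicted label matches the gold label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)

def f1_for_label(gold, pred, label):
    """F1 score for one label, from true/false positives and false negatives."""
    tp = sum(g == label and p == label for g, p in zip(gold, pred))
    fp = sum(g != label and p == label for g, p in zip(gold, pred))
    fn = sum(g == label and p != label for g, p in zip(gold, pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def macro_f1(gold, pred):
    """Unweighted mean of per-label F1 scores (robust to class imbalance)."""
    labels = set(gold)
    return sum(f1_for_label(gold, pred, l) for l in labels) / len(labels)
```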
three approaches:
- reference-based approaches
- response-based approaches
- hybrid approaches
comparing a student answer and a model answer at different levels, taking various aspects into account, such as content, structure, and style
counting the ratio of matches and the sequencing of words
another study learns the costs of edit operations (for instance, stemming matches, synonym alignment, word shifts, insertion, deletion, paraphrasing, substitution), which essentially measures the similarity between two sentences
checks stylistic aspects of text
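The edit-operation idea above can be sketched as a token-level weighted edit distance; the operation costs below are arbitrary placeholders, whereas the cited study learns them from data:

```python
def weighted_edit_distance(a_tokens, b_tokens, ins=1.0, dele=1.0, sub=1.5):
    """Token-level edit distance with configurable per-operation costs."""
    m, n = len(a_tokens), len(b_tokens)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = i * dele          # delete all of a's first i tokens
    for j in range(1, n + 1):
        d[0][j] = j * ins           # insert all of b's first j tokens
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0.0 if a_tokens[i - 1] == b_tokens[j - 1] else sub
            d[i][j] = min(d[i - 1][j] + dele,       # deletion
                          d[i][j - 1] + ins,        # insertion
                          d[i - 1][j - 1] + cost)   # match or substitution
    return d[m][n]

def edit_similarity(a, b):
    """Turn the distance into a rough similarity in (-inf, 1]."""
    ta, tb = a.lower().split(), b.lower().split()
    return 1.0 - weighted_edit_distance(ta, tb) / max(len(ta), len(tb), 1)
```

A learned-cost variant would also handle stemming matches and synonym alignment as cheap substitutions, which this sketch omits.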
a Deep Belief Network coalesces three ideas: capturing the similarity between student answers and model answers, representing the difficulty level, and estimating the probability of a student's mastery based on past performance
a few models to get it done:
Naive Bayes
Logistic Regression
Decision Tree
Artificial Neural Network
Support Vector Machine
DBM
It generates a vector space classification from all students' answers to identify syntactic, semantic, and lexical characteristics.
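One minimal reading of this vector-space classification is a bag-of-words nearest-centroid classifier over already-graded answers; everything here (function names, the centroid choice) is an illustrative assumption, not the actual model:

```python
from collections import Counter
import math

def bow(text):
    """Bag-of-words vector of a lowercased answer."""
    return Counter(text.lower().split())

def cosine(u, v):
    dot = sum(u[t] * v[t] for t in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def centroid(texts):
    """Sum the bag-of-words vectors of all answers sharing a label."""
    c = Counter()
    for t in texts:
        c.update(bow(t))
    return c

def classify(answer, graded):
    """graded: dict mapping label -> list of example answers with that label.
    Returns the label whose centroid is most similar to the new answer."""
    cents = {label: centroid(texts) for label, texts in graded.items()}
    return max(cents, key=lambda label: cosine(bow(answer), cents[label]))
```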
random forest and gradient boosting machine models to generate an average score
four-gram and six-gram models, essentially used as features for RF, GBM, Ridge Regression (RR), Support Vector Regression (SVR), and k-nearest neighbors (KNN)
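How n-gram features might feed one of the listed regressors can be sketched with a k-nearest-neighbors scorer (KNN being one of the models named above); the feature set and similarity measure are simplifications:

```python
from collections import Counter
import math

def ngrams(text, n):
    """Counter of word n-grams of a fixed length n."""
    toks = text.lower().split()
    return Counter(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))

def features(text, max_n=2):
    """Combined unigram-through-max_n n-gram feature vector."""
    f = Counter()
    for n in range(1, max_n + 1):
        f.update(ngrams(text, n))
    return f

def cos(u, v):
    dot = sum(u[k] * v[k] for k in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def knn_score(answer, scored, k=3):
    """scored: list of (answer_text, grade) pairs.
    Predicts the mean grade of the k most similar graded answers."""
    fa = features(answer)
    nearest = sorted(scored, key=lambda p: cos(fa, features(p[0])), reverse=True)
    top = nearest[:k]
    return sum(grade for _, grade in top) / len(top)
```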
basically the amalgamation of reference-based and response-based approaches; in theory it can be stacked with other algorithms, for example Canonical Correlation Analysis (CCA)
training set
validation set
test set
three classification algorithms
random forest classifier
support vector machine
stacked logistic regression
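Stacking combines the base classifiers' scores as input features for a logistic meta-classifier; the sketch below hand-rolls that meta-step with gradient descent on toy data, purely to illustrate the idea:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_meta(base_outputs, labels, lr=0.5, epochs=200):
    """Logistic regression over base-model scores.
    base_outputs: list of feature rows, one score per base model.
    Returns one weight per base model plus a bias, fit by SGD."""
    w = [0.0] * len(base_outputs[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(base_outputs, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y  # gradient of log-loss w.r.t. the logit
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

def predict_meta(w, b, x):
    """1 if the meta-classifier's probability reaches 0.5, else 0."""
    return 1 if sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= 0.5 else 0
```

In the actual pipeline the feature rows would be the random forest's and SVM's predicted probabilities for each student answer, produced on held-out folds to avoid leakage.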
AI covering linguistic aspects such as morphology, syntax, semantics, and discourse, for the purpose of applying it to various fields, for example QA, text recognition, text summarization, text generation, spelling correction, and detecting the tone of a text
It gives us the ability to read machine-readable content from the internet, gleaning huge amounts of data from Wikipedia or anchored links.
N-grams -> contiguous sequences of n words
Word Embeddings -> dense vector representations that mitigate the curse of dimensionality
GloVe -> a count-based unsupervised learning method; the model learns from ratios of co-occurrence probabilities
Word2Vec -> a prediction-based unsupervised learning method which can both predict a word from its context using the CBOW model and predict the context from a word using the skip-gram architecture
DBpedia -> generates vector space representations of DBpedia entities
FastText -> mitigates a problem skip-gram suffers from: by modeling subword n-grams it can represent rare and out-of-vocabulary words
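The CBOW/skip-gram contrast in the Word2Vec note can be illustrated by the training pairs each architecture extracts (a pure-Python sketch of pair generation only, not of the embedding training itself):

```python
def skipgram_pairs(tokens, window=1):
    """(target, context) pairs: skip-gram predicts each context word from the target."""
    pairs = []
    for i, target in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((target, tokens[j]))
    return pairs

def cbow_pairs(tokens, window=1):
    """(context_tuple, target) pairs: CBOW predicts the target from its context."""
    pairs = []
    for i, target in enumerate(tokens):
        ctx = tuple(tokens[j]
                    for j in range(max(0, i - window),
                                   min(len(tokens), i + window + 1))
                    if j != i)
        if ctx:
            pairs.append((ctx, target))
    return pairs
```

FastText would additionally decompose each target word into character n-grams before training on the same kind of pairs.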
projecting sentence representations into continuous spaces, using an RNN to extract the information of a sentence and feeding it into a multi-layer perceptron; importantly, training stops once the learning rate drops below the threshold of 10^-5
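The stopping rule above (halt once the learning rate falls below 10^-5) can be sketched as a training-loop skeleton; the decay schedule here is an arbitrary placeholder:

```python
def train_until_lr_threshold(initial_lr=0.1, decay=0.5, threshold=1e-5):
    """Toy training loop: decay the learning rate each epoch and stop
    once it falls below the threshold. Returns (epochs run, final lr)."""
    lr, epoch = initial_lr, 0
    while lr >= threshold:
        # ... one epoch of RNN-encoder + MLP training would go here ...
        lr *= decay
        epoch += 1
    return epoch, lr
```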
jargon-level explanation of the whole thesis paper
a few diagrams of how the algorithms work
It has limitations because of language nuances, and we need to curate large datasets for the training, validation, and test sets. Some open-ended questions involve subjective matters we have to take into account, and our deep learning algorithms need to adapt to varied circumstances. Therefore, we have to trial distinct algorithms for distinct cases.