Github repo for the Course: Stanford Machine Learning (Coursera)
Quiz Needs to be viewed here at the repo (because the image solutions cant be viewed as part of a gist)
True or False | Statement | Explanation |
---|---|---|
False | Given historical weather records, predict if tomorrow's weather will be sunny or rainy | K-means cannot make classification predictions as it does not label its inputs. |
True | Given a set of news articles from many different websites, find out what topics are the main topics covered | You can use K-means to cluster, and each cluster will correspond to a different market segment. |
True | From the user usage patterns on a website, figure out what different groups of users exists. | You can use K-means to cluster users with each cluster corresponding to a different market segment. |
False | Given many emails, you want to determine if they are Spam or Non-Spam emails. | K-means cannot make classification predictions as it does not label its inputs |
True | Given a database of information about your users, automatically group them into different market segments. | You can use K-means to cluster the database entries, and each cluster will correspond to a different market segment. |
True | Given sales data from a large number of products in a supermarket, figure out which products tend to form coherent groups (say are frequently purchased together) and thus should be put on the same shelf. | Market Segmentation. |
False | Given sales data from a large number of products in a supermarket, estimate future sales for each of these products. | Such a prediction is a regression problem, and K-means does not use labels on the data, so it cannot perform regression. |
Answer | Explanation |
---|---|
c(i) = 1 | x(i) is closest to μ1, so c(i) = 1 |
True or False | Statement | Explanation |
---|---|---|
False | Randomly initialize the cluster centroids | Done earlier |
False | Test on the cross-validation set | Any sort of testing is outside the scope of K-means algorithm itself |
True | Move the cluster centroids, where the centroids, μk are updated | The cluster update is the second step of the K-means loop |
True | The cluster assignment step, where the parameters c(i) are updated | This is the correst first step of the Kmeans loop |
Answer | Explanation |
---|---|
![]() |
This is the distortion cost function which we seek to minimize |
True or False | Statement | Explanation |
---|---|---|
False | Once an example has been assigned to a particular centroid, it will never be reassigned to another centroid | Not sure yet |
True | A good way to initialize K-means is to select K (distinct) examples from the training set and set the cluster centroids equal to these selected examples. | This is the recommended method of initialization. |
True | On every iteration of K-means, the cost funtion J(c(1), ..., c(m), μ1, ..., μk (the distortion function) should either stay the same or decrease; in particular, it should not increase | True |
False | K-Means will always give the same results regardless of the initialization of the centroids. | K-means is sensitive to different initializations, which is why you should run it multiple times from different random initializations |
True | For some datasets, the "right" or "correct" value of K (the number of clusters) can be ambiguous, and hard even for a human expert looking carefully at the data to decide. | Look at an elbow curve for an example. It can often be ambiguous. |
True | If we are worried about K-means getting stuck in bad local optima, one way to ameliorate (reduce) this problem is if we try using multiple random initializations. | None needed |