- Google News articles
- no timing feature in the dataset: link
- clustering stories
- Source ranking weighs several factors: whether the content is original, its timeliness, its coverage of recent developments, its relevance to the cluster at hand, and in some cases local relevance (content from a local source about local events). link
- is a topic modeling problem link
- modeling link
- news algorithm architecture link
- links link
- links link
- on Techmeme link
- incremental clustering link
- factor link
- link..link
- duplicate news link
- headline clustering link
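
The incremental clustering and headline clustering notes above can be illustrated with a minimal single-pass sketch. Everything here is an assumption made for illustration, not something taken from the linked material: the Jaccard similarity on token sets, the `threshold` value, the tiny stopword list, and the hypothetical function names.

```python
import re

# Tiny illustrative stopword list (an assumption, not a standard list).
STOPWORDS = {"the", "will", "you", "a", "an", "of", "to", "in", "on", "and", "for"}

def tokens(headline):
    """Lowercase a headline and return its set of non-stopword tokens."""
    words = re.findall(r"[a-z0-9]+", headline.lower())
    return {w for w in words if w not in STOPWORDS}

def jaccard(a, b):
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cluster_incrementally(headlines, threshold=0.3):
    """Single-pass clustering: each headline joins the most similar existing
    cluster, or starts a new one if no cluster reaches the threshold."""
    clusters = []  # each cluster: {"vocab": set of tokens, "items": list of headlines}
    for headline in headlines:
        t = tokens(headline)
        best, best_sim = None, 0.0
        for c in clusters:
            sim = jaccard(t, c["vocab"])
            if sim > best_sim:
                best, best_sim = c, sim
        if best is not None and best_sim >= threshold:
            best["items"].append(headline)
            best["vocab"] |= t  # grow the cluster vocabulary as stories arrive
        else:
            clusters.append({"vocab": set(t), "items": [headline]})
    return [c["items"] for c in clusters]

headlines = [
    "Central bank raises interest rates again",
    "Interest rates raised by central bank",
    "Local team wins championship final",
    "Championship final won by local team",
]
clusters = cluster_incrementally(headlines)
for group in clusters:
    print(group)
```

Because each headline is compared only against clusters that already exist, new stories can be added without re-clustering the whole collection. The trade-off is that the result depends on arrival order and on the chosen threshold.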
TF-IDF link
- very common words such as “the”, “will”, and “you” are called stopwords
- TF-IDF stands for “term frequency-inverse document frequency”
- ![score](https://cdn-images-1.medium.com/max/1600/1*nSqHXwOIJ2fa_EFLTh5KYw.png)
- a paper involving incremental TF-IDF [link](https://arxiv.org/pdf/1810.00664.pdf)
- incremental TF-IDF [link](https://stats.stackexchange.com/questions/18819/incremental-idf-inverse-document-frequency)
- mitigate non-convex datasets by combining hierarchical clustering and k-means link
- TF-IDF clustering: multiple approaches link
- cosine distance vs. Euclidean distance link
- retrieve cluster from hierarchical clustering model link
- determining the number of clusters: Calinski-Harabasz (CH) index link
- determining the number of clusters: tutorial link
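
The TF-IDF, incremental IDF, and cosine vs. Euclidean notes above can be tied together in a small standard-library sketch. The class name `IncrementalTfidf`, the smoothed IDF formula `log((1 + N) / (1 + df)) + 1`, and the three-word stopword list are assumptions chosen for illustration. They are not taken from the linked paper or from any particular library.

```python
import math
import re
from collections import Counter

STOPWORDS = {"the", "will", "you"}  # tiny illustrative list

def tokenize(text):
    """Lowercase, split into alphanumeric tokens, and drop stopwords."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS]

class IncrementalTfidf:
    """Keeps document-frequency counts that are updated as documents arrive."""

    def __init__(self):
        self.n_docs = 0
        self.df = Counter()  # term -> number of documents containing it

    def add(self, text):
        """Register one more document in the running statistics."""
        self.n_docs += 1
        self.df.update(set(tokenize(text)))

    def idf(self, term):
        # Smoothed IDF (an assumption): stays finite for unseen terms.
        return math.log((1 + self.n_docs) / (1 + self.df[term])) + 1

    def vector(self, text):
        """Sparse TF-IDF vector (dict) using the statistics seen so far."""
        words = tokenize(text)
        tf = Counter(words)
        return {w: (count / len(words)) * self.idf(w) for w, count in tf.items()}

def cosine_similarity(u, v):
    dot = sum(x * v.get(w, 0.0) for w, x in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def euclidean_distance(u, v):
    keys = set(u) | set(v)
    return math.sqrt(sum((u.get(w, 0.0) - v.get(w, 0.0)) ** 2 for w in keys))

docs = ["The market will rally", "Market rally continues", "You will love the new phone"]
model = IncrementalTfidf()
for d in docs:
    model.add(d)

a, b, c = (model.vector(d) for d in docs)
scaled = {w: 2 * x for w, x in a.items()}  # same direction as a, twice the length

print(cosine_similarity(a, b), cosine_similarity(a, c))
print(cosine_similarity(a, scaled), euclidean_distance(a, scaled))
```

The `scaled` vector points in the same direction as `a`, so cosine similarity treats the two as identical while Euclidean distance does not. That insensitivity to vector length is one common reason cosine is preferred for TF-IDF text vectors.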
intro link
gensim Python library link
- topic modeling + deep learning?
- hierarchical clustering unsupervised?
- missing time feature from dataset