
@tinychaos42
tinychaos42 / OptimizelyCLA
Created June 26, 2023 11:14
Optimizely CLA
OPTIMIZELY CONTRIBUTION LICENSE AGREEMENT
This contribution license agreement (“Agreement”) is an agreement between you
and Optimizely North America Inc., and grants certain rights to Optimizely
North America Inc. and its affiliates (collectively, “Optimizely”) with respect
to your open-source contributions to Optimizely’s Repositories. This Agreement
is effective on the date of your acceptance and is confirmed by you Submitting
Contributions.
1. Definitions – (and words denoting the singular includes the plural and vice
13 2016-04-18 17:16:55
222 2016-04-19 09:51:55
287 2016-04-19 09:54:30
801 2016-04-21 16:52:54
803 2016-04-21 16:52:54
812 2016-04-21 16:53:02
816 2016-04-21 16:54:08
1114 2016-04-21 17:39:34
1415 2016-04-21 19:49:07
1471 2016-04-24 19:01:09
NOI ?
<nonident/>
Web site does not collect identified data.
ADM not needed
<admin/>
Web Site and System Administration: Information may be used for the technical support of the Web site and its computer system. This would include processing computer account information, information used in the course of securing and maintaining the site, and verification of Web site activity by the site or its agents.
DEV ?
<develop/>
@tinychaos42
tinychaos42 / clusters
Created January 17, 2012 22:05
The resulting clusters
Creating word bags...
Calculating index numbers...
Checking top keywords in each document...
Checking correlations...
Creating clusters based on the correlations...
Swapping document id-s with titles for readability...
Cluster 1 contents:
The Queen toasts Barack Obama and special relationship with the US
Obama gives message of support to the Queen at lavish state banquet
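The log lists "Checking correlations..." and "Creating clusters based on the correlations..." but does not say how correlation is measured. The PHP sketch below assumes one common choice: cosine similarity between per-document weight vectors. Documents land in the same cluster when they are linked by pairs whose similarity reaches a threshold (single-link grouping). The function names, the sample vectors and the 0.2 threshold are hypothetical, not taken from the gist.

```php
<?php
// ASSUMPTION: "correlation" = cosine similarity of [term => weight] vectors.
// Hypothetical helpers, not the code from cluster.php.

function cosineSimilarity(array $a, array $b): float
{
    // Dot product over the terms the two documents share.
    $dot = 0.0;
    foreach ($a as $term => $weight) {
        if (isset($b[$term])) {
            $dot += $weight * $b[$term];
        }
    }
    $normA = sqrt(array_sum(array_map(fn($x) => $x * $x, $a)));
    $normB = sqrt(array_sum(array_map(fn($x) => $x * $x, $b)));
    if ($normA == 0.0 || $normB == 0.0) {
        return 0.0; // an empty vector is treated as unrelated to everything
    }
    return $dot / ($normA * $normB);
}

// Single-link grouping: breadth-first search over the graph whose edges are
// document pairs with similarity >= $threshold. Returns a list of id lists.
function clusterDocuments(array $vectors, float $threshold): array
{
    $ids = array_keys($vectors);
    $clusterOf = []; // docId => cluster index
    $clusters = [];
    foreach ($ids as $start) {
        if (isset($clusterOf[$start])) {
            continue; // already placed in an earlier cluster
        }
        $index = count($clusters);
        $clusters[$index] = [];
        $clusterOf[$start] = $index;
        $queue = [$start];
        while ($queue) {
            $current = array_shift($queue);
            $clusters[$index][] = $current;
            foreach ($ids as $other) {
                if (!isset($clusterOf[$other])
                    && cosineSimilarity($vectors[$current], $vectors[$other]) >= $threshold) {
                    $clusterOf[$other] = $index;
                    $queue[] = $other;
                }
            }
        }
    }
    return $clusters;
}

// Made-up weight vectors for illustration only.
$vectors = [
    'queen-toast'   => ['queen' => 0.3, 'obama' => 0.3, 'toasts' => 0.5],
    'queen-banquet' => ['queen' => 0.3, 'obama' => 0.2, 'banquet' => 0.6],
    'football'      => ['football' => 0.7, 'results' => 0.4],
];
print_r(clusterDocuments($vectors, 0.2));
```

With these made-up vectors the two Queen/Obama stories share weighted terms (similarity about 0.33) and end up together, while the unrelated document forms its own cluster. Single-link grouping is simple but can chain loosely related documents together; raising the threshold counters that.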
@tinychaos42
tinychaos42 / algorithm.textile
Created January 17, 2012 21:58
Explanation of the clustering algorithm

Task 3: Article Clustering

The algorithm I used is essentially tf-idf (term frequency-inverse document frequency). For each term in each document it calculates two quantities. The first is the term frequency: the number of occurrences of the term in that specific document, normalized by the length of the document. The second is the inverse document frequency, which measures how rare the term is across the whole document store: the logarithm of the total number of documents divided by the number of documents that contain the term. Multiplying the two gives the term's weight in that document, so terms that are frequent in one article but rare in the collection score highest. After some research I concluded that this algorithm is well suited to the task, can be implemented cleanly and readably, and is not overly complex.
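A minimal PHP sketch of the weighting described above follows. It is not the code from cluster.php: the function name `tfidf`, the assumption that each document has already been split into lower-case tokens, and the use of the natural logarithm are illustrative choices. Each weight is (occurrences of the term in the document / document length) * log(N / number of documents containing the term).

```php
<?php
// Minimal tf-idf sketch. Hypothetical helper, not the code from cluster.php.
// ASSUMPTION: $docs maps a document id to a list of lower-case tokens.

/**
 * Returns [docId => [term => weight]] with
 * weight = (count of term in doc / doc length) * log(N / docs containing term).
 */
function tfidf(array $docs): array
{
    $n = count($docs);

    // Document frequency: how many documents contain each term at least once.
    $df = [];
    foreach ($docs as $tokens) {
        foreach (array_unique($tokens) as $term) {
            $df[$term] = ($df[$term] ?? 0) + 1;
        }
    }

    $weights = [];
    foreach ($docs as $id => $tokens) {
        $weights[$id] = [];
        $length = count($tokens);
        if ($length === 0) {
            continue; // an empty document has no terms to weight
        }
        foreach (array_count_values($tokens) as $term => $count) {
            $tf = $count / $length;      // term frequency, normalized by length
            $idf = log($n / $df[$term]); // natural log; the base only rescales weights
            $weights[$id][$term] = $tf * $idf;
        }
    }
    return $weights;
}

$docs = [
    'a' => ['queen', 'toasts', 'obama'],
    'b' => ['obama', 'support', 'queen', 'banquet'],
    'c' => ['queen', 'football', 'results'],
];
print_r(tfidf($docs)['a']);
```

Here "queen" appears in every document, so its idf is log(1) = 0 and it contributes nothing, while "toasts", which occurs only in document `a`, gets the largest weight there.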

During the research I found two other options, which I judged either only marginally relevant or too complex for the task. The first one was a Bag-of-words solu

@tinychaos42
tinychaos42 / cluster.php
Created January 17, 2012 21:24
The clustering algorithm
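<?php
// No argument given: process the demo JSON file (data.json).
// Otherwise, read the JSON file whose path is the first CLI argument.
$path = isset($argv[1]) ? $argv[1] : 'data.json';
$file = file_get_contents($path);

// file_get_contents() returns false (and emits a warning) on failure.
if ($file === false)
{
    fwrite(STDERR, "Could not read input file: $path\n");
    exit(1);
}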
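The clusters log above starts with "Creating word bags...". A minimal sketch of that step could look like the following. It assumes ASCII text, lower-casing, splitting on anything that is not a letter or digit, and a small illustrative stop-word list, none of which is confirmed by cluster.php; the function name `makeWordBag` is hypothetical.

```php
<?php
// Hypothetical sketch of a "Creating word bags..." step (not the gist's code).
// ASSUMPTIONS: ASCII text, lower-casing, splitting on non-alphanumerics,
// and a tiny illustrative stop-word list.

function makeWordBag(string $text, array $stopWords = ['a', 'and', 'of', 'the', 'to', 'with']): array
{
    // Lower-case, then split on every run of characters that are not a-z or 0-9.
    $tokens = preg_split('/[^a-z0-9]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);

    // Drop stop words; array_values() re-indexes the remaining tokens.
    $tokens = array_values(array_filter($tokens, fn($t) => !in_array($t, $stopWords, true)));

    // Bag of words: term => number of occurrences.
    return array_count_values($tokens);
}

print_r(makeWordBag('The Queen toasts Barack Obama and special relationship with the US'));
```

The resulting counts feed the term-frequency half of tf-idf: dividing each count by the document's token count gives the normalized term frequency.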