Hierarchical clustering php vs python
Cluster 500 2-dimensional euclidean points using hierarchical clustering with group average linkage and cosine similarity as distance metric. The python implementation is from the nltk library and the php one is from NlpTools.
To run the test you need to install those libraries, you will find instructions in the corresponding sites.
PHP: 7.9s (best of 4)
Python: 198.8s (best of 4)
The php implementation is ~25 times faster with this dataset of 500 points. In addition, the php implementation is pure php with no C extensions for matrix manipulation (numpy).
In fact nltk is asymptotically slower since the algorithm is uses for hierarchical clustering is O(n3) and the php implementation is O(n2logn).
PHP: 60M (As reported by top under RES)
Python: 34M (As reported by top under RES)
To achieve the above time complexity there has been a sacrifice in memory consumption, you can read more about the implementation in an article describing it.