Euclidean Distance vs Cosine Similarity (Time)

import time
import numpy as np

# Note: every timing below also includes generating the two random vectors.
for trial in range(10):
    # Cosine similarity: dot product divided by the product of the norms.
    start = time.time()
    for _ in range(10000):
        a, b = np.random.rand(100), np.random.rand(100)
        np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    print('Cosine similarity took', time.time() - start)

    # 2 * (1 - cosine similarity) is the squared Euclidean distance
    # between a and b after both are scaled to unit length.
    start = time.time()
    for _ in range(10000):
        a, b = np.random.rand(100), np.random.rand(100)
        2 * (1 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    print('Euclidean from 2*(1 - cosine_similarity) took', time.time() - start)

    # Euclidean distance on the raw (unnormalized) vectors.
    start = time.time()
    for _ in range(10000):
        a, b = np.random.rand(100), np.random.rand(100)
        np.linalg.norm(a - b)
    print('Euclidean Distance using np.linalg.norm() took', time.time() - start)

    # The same distance written out by hand.
    start = time.time()
    for _ in range(10000):
        a, b = np.random.rand(100), np.random.rand(100)
        np.sqrt(np.sum((a - b)**2))
    print('Euclidean Distance using np.sqrt(np.sum((a-b)**2)) took', time.time() - start)
    print('--------------------------------------------------------')

[out]:

Cosine similarity took 0.15826010704
Euclidean from 2*(1 - cosine_similarity) took 0.179041862488
Euclidean Distance using np.linalg.norm() took 0.10684299469
Euclidean Distance using np.sqrt(np.sum((a-b)**2)) took 0.113723039627
--------------------------------------------------------
Cosine similarity took 0.161732912064
Euclidean from 2*(1 - cosine_similarity) took 0.178358793259
Euclidean Distance using np.linalg.norm() took 0.107393980026
Euclidean Distance using np.sqrt(np.sum((a-b)**2)) took 0.111194849014
--------------------------------------------------------
Cosine similarity took 0.16274189949
Euclidean from 2*(1 - cosine_similarity) took 0.178978919983
Euclidean Distance using np.linalg.norm() took 0.106336116791
Euclidean Distance using np.sqrt(np.sum((a-b)**2)) took 0.111373186111
--------------------------------------------------------
Cosine similarity took 0.161939144135
Euclidean from 2*(1 - cosine_similarity) took 0.177414178848
Euclidean Distance using np.linalg.norm() took 0.106301784515
Euclidean Distance using np.sqrt(np.sum((a-b)**2)) took 0.11181807518
--------------------------------------------------------
Cosine similarity took 0.162333965302
Euclidean from 2*(1 - cosine_similarity) took 0.177582979202
Euclidean Distance using np.linalg.norm() took 0.105742931366
Euclidean Distance using np.sqrt(np.sum((a-b)**2)) took 0.111120939255
--------------------------------------------------------
Cosine similarity took 0.16153883934
Euclidean from 2*(1 - cosine_similarity) took 0.176836967468
Euclidean Distance using np.linalg.norm() took 0.106392860413
Euclidean Distance using np.sqrt(np.sum((a-b)**2)) took 0.110891103745
--------------------------------------------------------
Cosine similarity took 0.16018986702
Euclidean from 2*(1 - cosine_similarity) took 0.177738189697
Euclidean Distance using np.linalg.norm() took 0.105060100555
Euclidean Distance using np.sqrt(np.sum((a-b)**2)) took 0.110497951508
--------------------------------------------------------
Cosine similarity took 0.159607887268
Euclidean from 2*(1 - cosine_similarity) took 0.178565979004
Euclidean Distance using np.linalg.norm() took 0.106383085251
Euclidean Distance using np.sqrt(np.sum((a-b)**2)) took 0.11084485054
--------------------------------------------------------
Cosine similarity took 0.161075115204
Euclidean from 2*(1 - cosine_similarity) took 0.177822828293
Euclidean Distance using np.linalg.norm() took 0.106630086899
Euclidean Distance using np.sqrt(np.sum((a-b)**2)) took 0.110257148743
--------------------------------------------------------
Cosine similarity took 0.161051988602
Euclidean from 2*(1 - cosine_similarity) took 0.181928873062
Euclidean Distance using np.linalg.norm() took 0.106360197067
Euclidean Distance using np.sqrt(np.sum((a-b)**2)) took 0.111301898956
--------------------------------------------------------
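
The label 'Euclidean from 2*(1 - cosine_similarity)' in the benchmark above rests on an identity that only holds for unit-length vectors, and it yields the squared distance rather than the distance itself. A minimal sketch to check this numerically, assuming a hypothetical helper `unit()` that is not part of the original snippet:

```python
import numpy as np

def unit(v):
    """Scale v to unit length (hypothetical helper for this sketch)."""
    return v / np.linalg.norm(v)

a, b = np.random.rand(100), np.random.rand(100)

cos_sim = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
from_cosine = 2 * (1 - cos_sim)

# Distance between the normalized vectors, and its square.
unit_dist = np.linalg.norm(unit(a) - unit(b))
squared_unit_dist = unit_dist ** 2

# 2 * (1 - cos) matches the squared distance of the unit vectors;
# taking the square root recovers the distance itself.
print(np.isclose(from_cosine, squared_unit_dist))
print(np.isclose(np.sqrt(from_cosine), unit_dist))
```

For vectors that are not normalized, such as the raw `np.random.rand(100)` inputs timed above, `2 * (1 - cos_sim)` is generally not equal to `np.linalg.norm(a - b)`, so the timings compare the cost of the expressions rather than equivalent results.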

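Assuming that the scalar normalization term in the cosine similarity doesn't matter (for example, when the vectors are already scaled to unit length), only the dot product remains: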

import time
import numpy as np

# Note: every timing below also includes generating the two random vectors.
for trial in range(10):
    # Dot product only; this equals the cosine similarity for unit-length vectors.
    start = time.time()
    for _ in range(10000):
        np.dot(np.random.rand(100), np.random.rand(100))
    print('Cosine similarity took', time.time() - start)

    # Squared Euclidean distance derived from the dot product (unit-length vectors).
    start = time.time()
    for _ in range(10000):
        2 * (1 - np.dot(np.random.rand(100), np.random.rand(100)))
    print('Euclidean from 2*(1 - cosine_similarity) took', time.time() - start)

    # Euclidean distance via np.linalg.norm().
    start = time.time()
    for _ in range(10000):
        np.linalg.norm(np.random.rand(100) - np.random.rand(100))
    print('Euclidean Distance using np.linalg.norm() took', time.time() - start)

    # The same distance written out by hand.
    start = time.time()
    for _ in range(10000):
        np.sqrt(np.sum((np.random.rand(100) - np.random.rand(100))**2))
    print('Euclidean Distance using np.sqrt(np.sum((a-b)**2)) took', time.time() - start)
    print('--------------------------------------------------------')

[out]:

Cosine similarity took 0.0457179546356
Euclidean from 2*(1 - cosine_similarity) took 0.0642158985138
Euclidean Distance using np.linalg.norm() took 0.105226993561
Euclidean Distance using np.sqrt(np.sum((a-b)**2)) took 0.112046957016
--------------------------------------------------------
Cosine similarity took 0.0465199947357
Euclidean from 2*(1 - cosine_similarity) took 0.0622699260712
Euclidean Distance using np.linalg.norm() took 0.10528087616
Euclidean Distance using np.sqrt(np.sum((a-b)**2)) took 0.114109992981
--------------------------------------------------------
Cosine similarity took 0.0462138652802
Euclidean from 2*(1 - cosine_similarity) took 0.0617589950562
Euclidean Distance using np.linalg.norm() took 0.106434106827
Euclidean Distance using np.sqrt(np.sum((a-b)**2)) took 0.112962007523
--------------------------------------------------------
Cosine similarity took 0.0471642017365
Euclidean from 2*(1 - cosine_similarity) took 0.0623321533203
Euclidean Distance using np.linalg.norm() took 0.106025934219
Euclidean Distance using np.sqrt(np.sum((a-b)**2)) took 0.112977981567
--------------------------------------------------------
Cosine similarity took 0.046373128891
Euclidean from 2*(1 - cosine_similarity) took 0.0621299743652
Euclidean Distance using np.linalg.norm() took 0.104951858521
Euclidean Distance using np.sqrt(np.sum((a-b)**2)) took 0.112581968307
--------------------------------------------------------
Cosine similarity took 0.0461659431458
Euclidean from 2*(1 - cosine_similarity) took 0.0618479251862
Euclidean Distance using np.linalg.norm() took 0.105072975159
Euclidean Distance using np.sqrt(np.sum((a-b)**2)) took 0.112339019775
--------------------------------------------------------
Cosine similarity took 0.0464940071106
Euclidean from 2*(1 - cosine_similarity) took 0.0628280639648
Euclidean Distance using np.linalg.norm() took 0.104840993881
Euclidean Distance using np.sqrt(np.sum((a-b)**2)) took 0.111845016479
--------------------------------------------------------
Cosine similarity took 0.0464551448822
Euclidean from 2*(1 - cosine_similarity) took 0.0619559288025
Euclidean Distance using np.linalg.norm() took 0.104452848434
Euclidean Distance using np.sqrt(np.sum((a-b)**2)) took 0.112362861633
--------------------------------------------------------
Cosine similarity took 0.0458228588104
Euclidean from 2*(1 - cosine_similarity) took 0.063982963562
Euclidean Distance using np.linalg.norm() took 0.105643987656
Euclidean Distance using np.sqrt(np.sum((a-b)**2)) took 0.110949039459
--------------------------------------------------------
Cosine similarity took 0.0464689731598
Euclidean from 2*(1 - cosine_similarity) took 0.0611040592194
Euclidean Distance using np.linalg.norm() took 0.105679988861
Euclidean Distance using np.sqrt(np.sum((a-b)**2)) took 0.111272096634
--------------------------------------------------------
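
Both benchmarks above generate the random vectors inside the timed loops, so part of each measurement is the cost of `np.random.rand`. One possible variation, sketched here with the standard-library `timeit` module, creates the inputs once and times only the expressions. The `candidates` and `timings` names and the choice of 3 repeats are assumptions for illustration, not part of the original snippet:

```python
import timeit
import numpy as np

# Generate the inputs once so that only the computation is timed.
a, b = np.random.rand(100), np.random.rand(100)

# Hypothetical mapping from a label to the expression being timed.
candidates = {
    'Cosine similarity':
        lambda: np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)),
    'Dot product only':
        lambda: np.dot(a, b),
    'np.linalg.norm(a - b)':
        lambda: np.linalg.norm(a - b),
    'np.sqrt(np.sum((a - b)**2))':
        lambda: np.sqrt(np.sum((a - b) ** 2)),
}

timings = {}
for label, func in candidates.items():
    # Best of 3 repeats, each repeat running the expression 10000 times.
    timings[label] = min(timeit.repeat(func, number=10000, repeat=3))
    print(label, 'took', timings[label])
```

Taking the minimum over repeats is a common way to reduce noise from other processes. The absolute numbers will differ from the recorded output above because the vector generation is no longer included.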