Gist Qyoom/dc92137f659a57a59306 (last active August 29, 2015 14:01)
Scala worksheet for experimenting with Spark Vector class
import org.apache.spark.util.Vector

// Note: org.apache.spark.util.Vector was deprecated in Spark 1.0
// in favor of org.apache.spark.mllib.linalg.Vector.
object Vector_lab_1 {
  println("Vector_lab_1")

  val vec1 = Vector(Array(1.5, 2.5, 3.5))
  val vec2 = Vector(Array(1.0, 2.0, 3.0))
  val vec3 = Vector(Array(5.1, 6.1, 7.1, 8.1))
  val vec4 = Vector(Array(1.0, 1.0, 1.0))
  val vec5 = Vector(6, 3.2, 9.8)               // varargs constructor
  val vec6 = Vector(3, (x: Int) => 1 * 2.6)    // length 3; the initializer ignores x, so every element is 2.6

  vec1(0)                                      // element access: 1.5
  vec1(0) + vec2(1)                            // 1.5 + 2.0 = 3.5
  vec1 + vec2                                  // element-wise addition
  vec1 add vec2                                // same as +
  // vec1 + vec3                               // IllegalArgumentException: vectors of different length
  // (vec1 + vec2).reduce(_ + _)               // reduce is not a member of Vector
  vec1 - vec2                                  // element-wise subtraction
  vec1 dot vec2                                // 1.5*1.0 + 2.5*2.0 + 3.5*3.0 = 17.0
  1.5 + 5 + 10.5                               // the same sum, expanded by hand
  // vec1 * vec2                               // error: * takes a scalar, not a Vector
  // vec1 dot vec3                             // IllegalArgumentException: vectors of different length
  vec1 plusDot(vec2, vec4)                     // (vec1 + vec2) dot vec4
  vec1 plusDot(vec4, vec2)                     // (vec1 + vec4) dot vec2

  vec1                                         // unchanged so far: (1.5, 2.5, 3.5)
  vec1 += vec2                                 // in-place addition; vec1 is now (2.5, 4.5, 6.5)
  vec1
  vec1 addInPlace(vec2)                        // same as +=
  3 * vec2                                     // scalar multiplication, either order
  vec2 * 3
  vec3.sum                                     // 5.1 + 6.1 + 7.1 + 8.1 = 26.4
  vec3 / 3.9                                   // element-wise scalar division
  vec2.unary_-                                 // negation: (-1.0, -2.0, -3.0)
  vec1 squaredDist vec2                        // sum of squared element-wise differences
  vec1 dist vec2                               // Euclidean distance = sqrt(squaredDist)
}
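The core operations in the worksheet can be reproduced in plain Scala, with no Spark dependency, which is handy for checking the arithmetic. The object and method names below are illustrative, not part of any library:

```scala
// Plain-Scala sketch (no Spark required) of a few operations from the
// worksheet above; vec1 and vec2 mirror the worksheet's values.
object VectorLabCheck {
  val vec1 = Array(1.5, 2.5, 3.5)
  val vec2 = Array(1.0, 2.0, 3.0)

  // element-wise addition, like vec1 + vec2
  def add(a: Array[Double], b: Array[Double]): Array[Double] =
    a.zip(b).map { case (x, y) => x + y }

  // dot product, like vec1 dot vec2
  def dot(a: Array[Double], b: Array[Double]): Double =
    a.zip(b).map { case (x, y) => x * y }.sum

  // Euclidean distance, like vec1 dist vec2
  def dist(a: Array[Double], b: Array[Double]): Double =
    math.sqrt(a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum)

  def main(args: Array[String]): Unit = {
    println(add(vec1, vec2).mkString(", "))  // 2.5, 4.5, 6.5
    println(dot(vec1, vec2))                 // 17.0
    println(dist(vec1, vec2))
  }
}
```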
References:
http://en.wikipedia.org/wiki/Dot_product
"[Dot product] is the sum of the products of the corresponding entries of the two sequences of numbers. Geometrically, it is the product of the magnitudes of the two vectors and the cosine of the angle between them."
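The two forms in that definition (sum of products vs. magnitudes times cosine) can be checked against each other on a pair of 2-D vectors whose angle is known in advance; this is a plain-Scala sketch, with names chosen here for illustration:

```scala
object DotForms {
  // a and b enclose an angle of 45 degrees
  val a = Array(1.0, 0.0)
  val b = Array(1.0, 1.0)

  // algebraic form: sum of the products of corresponding entries
  val algebraic: Double = a.zip(b).map { case (x, y) => x * y }.sum

  // geometric form: |a| * |b| * cos(theta)
  def norm(v: Array[Double]): Double = math.sqrt(v.map(x => x * x).sum)
  val geometric: Double = norm(a) * norm(b) * math.cos(math.Pi / 4)

  def main(args: Array[String]): Unit = {
    println(algebraic)                        // 1.0
    println(math.abs(algebraic - geometric))  // ~0, floating-point rounding aside
  }
}
```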
Manning, Christopher D. and Hinrich Schütze, Foundations of Statistical Natural Language Processing, MIT Press, 1999, p. 539
"The vector space model is one of the most widely used models for ad-hoc retrieval, mainly because of its conceptual simplicity and the appeal of the underlying metaphor of using spatial proximity for semantic proximity. Documents and queries are represented in a high-dimensional space, in which each dimension of the space corresponds to a word in the document collection. The most relevant documents for a query are expected to be those represented by the vectors closest to the query, that is, documents that use similar words to the query. Rather than considering the magnitude of the vectors, closeness is often calculated by just looking at angles and choosing documents that enclose the smallest angle with the query vector."
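The model described above reduces to cosine similarity between term-count vectors. A minimal plain-Scala sketch, with a made-up three-word vocabulary and counts chosen purely for illustration:

```scala
// Vector-space model sketch: each dimension is a word, each document is
// a term-count vector, and closeness is the cosine of the angle between
// vectors (magnitude is ignored, matching the description above).
object CosineSketch {
  def dot(a: Array[Double], b: Array[Double]): Double =
    a.zip(b).map { case (x, y) => x * y }.sum

  def norm(v: Array[Double]): Double = math.sqrt(dot(v, v))

  def cosine(a: Array[Double], b: Array[Double]): Double =
    dot(a, b) / (norm(a) * norm(b))

  // dimensions: counts of the words ("spark", "vector", "cat")
  val query = Array(1.0, 1.0, 0.0)
  val doc1  = Array(3.0, 2.0, 0.0)   // shares the query's vocabulary
  val doc2  = Array(0.0, 0.0, 5.0)   // disjoint vocabulary

  def main(args: Array[String]): Unit = {
    println(cosine(query, doc1))     // close to 1: small angle, most relevant
    println(cosine(query, doc2))     // 0.0: orthogonal, unrelated
  }
}
```

Note the cosine ranges over [0, 1] for non-negative count vectors, so ranking documents by it directly implements "choosing documents that enclose the smallest angle with the query vector."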