Created
July 16, 2018 13:28
-
-
Save btbytes/e2f9d7faa802083f60eb6cf96501cf52 to your computer and use it in GitHub Desktop.
APL for ML: K-means clustering (from http://dpaste.com/3Z4K4B0)
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
⍝ This is a comment in APL. Comments are denoted by the "⍝" symbol | |
⍝ which is called "lamp" or "thumb". A comment begins with the lamp | |
⍝ symbol and continues to the end of the line. | |
⍝ | |
⍝ Here is a coordinate in 2-D space. It is a vector of two elements. | |
p←0.712 0.848 | |
⍝ The symbol "←" denotes assignment and is called "copula". We have | |
⍝ given the name "p" to our first coordinate. Here is another 2-D | |
⍝ coordinate we will name "q". | |
q←0.962 0.362 | |
⍝ | |
⍝ Most APL functions are called "verbs". A verb that takes a single | |
⍝ argument is called a "monadic" verb, while a verb that takes two | |
⍝ arguments is called a "dyadic" verb. Many symbols have both a | |
⍝ monadic and dyadic meaning. | |
⍝ | |
⍝ Monadic verbs appear to the left of their argument. The symbol "≢" | |
⍝ is called "tally" when used as a monadic verb; it returns the number | |
⍝ of elements in a vector. | |
≢p | |
⍝ The result of "≢p" is "2" since "p" is a vector with two elements. | |
⍝ | |
⍝ Dyadic verbs appear between their two arguments. Subtraction is a | |
⍝ dyadic verb denoted by the symbol "-" called "minus". | |
p-q | |
⍝ The result of "p-q" is "¯0.25 0.486" which is the difference of each | |
⍝ element of "q" from the corresponding element of "p". Negative | |
⍝ numbers are indicated by a prefix symbol "¯" called "high minus". | |
⍝ | |
⍝ Most APL operations work element-wise with vector arguments. | |
⍝ | |
⍝ The dyadic verb called "exponential" or "power" is denoted by the | |
⍝ "⋆" symbol. It raises the argument on the left to the power of the | |
⍝ argument on the right. The expression "7⋆2" can be read as "seven | |
⍝ squared"; the expression "9⋆.5" can be read as "square root of | |
⍝ nine". | |
(p-q)⋆2 | |
⍝ The result of "(p-q)⋆2" is "0.0625 0.236196" which is the square of | |
⍝ each element in "p-q". When one of the arguments of a dyadic verb is | |
⍝ a vector and the other is a scalar, the function will extend the | |
⍝ scalar argument by repeating it to match the length of the vector | |
⍝ argument. | |
⍝ | |
⍝ Parentheses are used for grouping parts of an expression; the part | |
⍝ inside the parentheses is evaluated before the parts outside. APL | |
⍝ expressions without parentheses have no special operator precedence | |
⍝ rules; they always evaluate from right-to-left. | |
2⋆7-4 | |
⍝ The result of "2⋆7-4" is "8" because "7-4" is evaluated first, the | |
⍝ result of which is "3"; the result of "2⋆3" is "8". | |
⍝ | |
⍝ Addition is a dyadic verb denoted by the symbol "+" which is called | |
⍝ "plus". | |
1+1+2+3+5 | |
⍝ The result of "1+1+2+3+5" is "12". | |
⍝ | |
⍝ Some special APL functions are called "operators". An operator takes | |
⍝ an additional argument, often another function, in addition to its | |
⍝ normal arguments. The operator denoted by "/" is called "reduce". | |
+/1 1 2 3 5 | |
⍝ The reduce operator takes a dyadic verb argument on the left, which | |
⍝ it evaluates between each element of the vector argument on the | |
⍝ right. The result of "+/1 1 2 3 5" is also "12". The symbols "+/" | |
⍝ together can be read as "sum". | |
⍝ | |
⍝ The "euclidean distance" between two coordinates is the length of | |
⍝ the straight line that connects them. It is defined as the square | |
⍝ root of the sum of the squares of the element-wise differences of | |
⍝ the elements of the coordinates. | |
⍝ <https://en.wikipedia.org/wiki/Euclidean_distance> | |
(+/(p-q)⋆2)⋆.5 | |
⍝ It is complicated to describe this formula in plain language but | |
⍝ simple in APL. The result of "(+/(p-q)⋆2)⋆.5" is "0.5465308774". | |
⍝ | |
⍝ Braces are used for defining custom verbs called "direct functions". | |
p {(+/(⍺-⍵)⋆2)⋆.5} q | |
⍝ The symbols "⍺" and "⍵" are called "alpha" and "omega" which | |
⍝ respectively denote the left and right arguments of a dyadic direct | |
⍝ function. We can assign a name to a direct function using a copula. | |
Distance←{(+/(⍺-⍵)⋆2)⋆.5} | |
⍝ We can now use the name "Distance" to measure the euclidean distance | |
⍝ between any two coordinates. | |
p Distance q | |
⍝ The result of "p Distance q" is also "0.5465308774". | |
⍝ | |
⍝ Here is a vector of coordinates in 2-D space, including "p" and "q" | |
⍝ and seven more points, which we will call "coords". | |
coords←p q (0.301 0.234) (0.347 0.168) (0.57 0.003) (0.16 0.009) (0.252 0.539) (0.595 0.487) (0.439 0.412) | |
⍝ | |
⍝ The symbol "." denotes an operator called "dot". The symbol "∘" | |
⍝ denotes an operator called "jot" or "compose". When used together | |
⍝ the symbols "∘." are called the "outer product" operator. This | |
⍝ operator applies its function argument to each combination of the | |
⍝ elements of its left and right arguments. | |
2 3 ∘.+ 1 1 2 3 5 | |
⍝ The result of "2 3 ∘.+ 1 1 2 3 5" is a matrix that looks like this: | |
⍝ | |
⍝ 3 3 4 5 7 | |
⍝ 4 4 5 6 8 | |
⍝ | |
⍝ The shape of the result depends on the number of elements in each | |
⍝ argument. In this case, the left argument has two elements and the | |
⍝ right argument has five elements. The result is a two-by-five | |
⍝ matrix. | |
⍝ | |
⍝ The symbol "⍴" is called "rho". As a monadic verb it is called | |
⍝ "shape" because it returns the length of each dimension of its | |
⍝ argument. | |
⍴2 3 ∘.+ 1 1 2 3 5 | |
⍝ The result of "⍴2 3 ∘.+ 1 1 2 3 5" is "2 5". | |
⍝ | |
⍝ We can use the outer product operator with our "Distance" function | |
⍝ to compute the distance from each coordinate to each of "p" and "q". | |
⍝ We will call the result "deltas". | |
deltas←coords ∘.Distance p q | |
⍝ The result is a nine-by-two matrix that looks like this: | |
⍝ | |
⍝ 0 0.5465308774 | |
⍝ 0.5465308774 0 | |
⍝ 0.7388619628 0.6732792883 | |
⍝ 0.771767452 0.6448728557 | |
⍝ 0.8568482946 0.5315496214 | |
⍝ 1.004303241 0.8762493937 | |
⍝ 0.554148897 0.7317301415 | |
⍝ 0.3794864951 0.387703495 | |
⍝ 0.5144171459 0.525384621 | |
⍝ | |
⍝ The symbol "⌊" is called "downstile". As a dyadic verb it is also | |
⍝ called "minimum" because it returns the smaller of its two | |
⍝ arguments. | |
5⌊3 | |
⍝ The result of "5⌊3" is "3". | |
⍝ | |
⍝ When its argument is a matrix, the reduce operator works on each row | |
⍝ independently. | |
⌊/deltas | |
⍝ The result of "⌊/deltas" is a vector of nine elements containing the | |
⍝ smallest value from each row of "deltas". It looks like this: | |
⍝ | |
⍝ 0 0 0.6732792883 0.6448728557 0.5315496214 0.8762493937 0.554148897 0.3794864951 0.5144171459 | |
⍝ | |
⍝ The symbol "=" is called "equal to". As a dyadic verb it compares | |
⍝ its arguments and returns "1" when they are equal and "0" when they | |
⍝ are not. | |
3=1 1 2 3 5 | |
⍝ The result of "3=1 1 2 3 5" is "0 0 0 1 0". | |
⍝ | |
⍝ We can use "equal to" to find whether "p" or "q" is closer to each | |
⍝ coordinate by comparing the elements of "deltas" to the minima of | |
⍝ their respective rows. We call the result "clusters". | |
clusters←deltas=[1]⌊/deltas | |
⍝ When the arguments of a verb have different shapes, we can specify | |
⍝ the dimension along which the verb will be applied in brackets, | |
⍝ which are called the "axis" operator. In this case we want to | |
⍝ compare along the first dimension. | |
⍝ | |
⍝ The result is a nine-by-two matrix that looks like this: | |
⍝ | |
⍝ 1 0 | |
⍝ 0 1 | |
⍝ 0 1 | |
⍝ 0 1 | |
⍝ 0 1 | |
⍝ 0 1 | |
⍝ 1 0 | |
⍝ 1 0 | |
⍝ 1 0 | |
⍝ | |
⍝ The number of coordinates in each cluster is the sum of each column | |
⍝ of the "clusters" matrix. | |
counts←+/[1]clusters | |
⍝ The result of "+/[1]clusters" is "4 5". | |
⍝ | |
⍝ Multiplication is a dyadic verb denoted by the symbol "×" called | |
⍝ "times". Division is a dyadic verb denoted by the symbol "÷" called | |
⍝ "divide". | |
¯1 0 1×3 | |
⍝ The result of "¯1 0 1×3" is "¯3 0 3". | |
1 1 2 3 5÷5 | |
⍝ The result of "1 1 2 3 5÷5" is "0.2 0.2 0.4 0.6 1". | |
⍝ | |
⍝ We can use multiplication to select the coordinates of each cluster. | |
⍝ To find the average (mean) coordinate in each cluster, we divide its | |
⍝ sum by its count. | |
means←(+/[1]clusters×[1]coords)÷counts | |
⍝ The result "means" is a vector of two coordinates "(0.4995 0.5715) (0.468 0.1552)". | |
⍝ | |
⍝ This completes been a single iteration of the algorithm for "k-means | |
⍝ clustering" of our coordinates where k is 2. | |
⍝ <https://en.wikipedia.org/wiki/K-means_clustering> | |
⍝ | |
⍝ We would now want to repeat this iteration with the new means in | |
⍝ place of "p" and "q", until the results converge. First, let us | |
⍝ define a direct function for this step. | |
Step←{(+/[1]clusters×[1]⍺)÷+/[1]clusters←deltas=[1]⌊/deltas←⍺ ∘.Distance ⍵} | |
⍝ We can now use our function called "Step" to compute the same answer | |
⍝ as before. | |
coords Step p q | |
⍝ The result of "coords Step p q" is also "(0.4995 0.5715) (0.468 0.1552)". | |
⍝ | |
⍝ The symbol "≡" is called "match". As a dyadic verb it returns "1" | |
⍝ when its arguments are completely identical, and "0" otherwise. It | |
⍝ is different from "=" in that it compares the arguments entirely | |
⍝ rather than element-wise. | |
¯1 0 1≡1 0 1 | |
⍝ The result of "¯1 0 1≡1 0 1" is "0". | |
1 2 3≡1 2 3 | |
⍝ The result of "1 2 3≡1 2 3" is "1". | |
¯1 0 1=1 0 1 | |
⍝ The result of "¯1 0 1=1 0 1" is "0 1 1". | |
1 2 3=1 2 3 | |
⍝ The result of "1 2 3=1 2 3" is "1 1 1". | |
⍝ | |
⍝ The symbol "⍣" denotes the "power operator". When used with two | |
⍝ functions, it recursively evaluates the left function, and uses the | |
⍝ right function to determine when to stop. When the right function is | |
⍝ "≡" it is called a "fixpoint operator". | |
means←coords Step⍣≡ p q | |
⍝ This repeats our "Step" function until the answer converges. The | |
⍝ result is a vector of the cluster means "(0.592 0.5296) (0.3445 0.1035)". |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment