************* Module pyspark
W: 48, 0: Wildcard import pyspark.status (wildcard-import)
************* Module pyspark.broadcast
W: 27, 4: Redefining built-in 'unicode' (redefined-builtin)
C: 1, 0: Missing module docstring (missing-docstring)
C: 27, 4: Invalid class name "unicode" (invalid-name)
C: 33, 0: Invalid constant name "_broadcastRegistry" (invalid-name)
W: 37, 4: Redefining name '_broadcastRegistry' from outer scope (line 33) (redefined-outer-name)
C: 36, 0: Missing function docstring (missing-docstring)
W: 37, 4: Module import itself (import-self)
echo "$whatever"
# To make the shell search this directory for an executable
export PATH="newdirec:$PATH"
# To clear any extra directories Python searches for packages
export PYTHONPATH=""
# To iterate over the words of a space-separated sentence
for i in $string_separated_sentence; do
    echo "$i"
done
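The PYTHONPATH note can be checked from Python itself. This sketch (the helper name is mine) shows that a directory exported as PYTHONPATH appears on a fresh interpreter's sys.path, which is also why exporting it empty stops Python from searching extra directories:

```python
import os
import subprocess
import sys
import tempfile


def python_path_contains(directory):
    """Launch a fresh interpreter with `directory` on PYTHONPATH and
    report whether it ends up on that interpreter's sys.path."""
    env = dict(os.environ, PYTHONPATH=directory)
    out = subprocess.check_output(
        [sys.executable, "-c", "import sys; print('\\n'.join(sys.path))"],
        env=env,
    )
    return directory in out.decode().splitlines()
```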
import org.apache.spark.mllib.linalg.{Vectors, Matrices}
import org.apache.spark.mllib.stat.distribution.MultivariateGaussian
import org.apache.spark.mllib.clustering.GaussianMixture
import scala.util.Random

val rng = new Random(0)
val nSamplesArray = Array(100, 200)
val nFeaturesArray = Array(10, 20, 50, 100, 200)
val trainData = {
  if (sparse)
    data.map(sample => sample.asInstanceOf[SparseVector]).cache()
  else
    data.map(u => u.toBreeze.toDenseVector).cache()
}
// Since trainData can now have two possible element types, the statement below fails to compile.
val sums = {
[error] /home/manoj/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/dsl/package.scala:308: polymorphic expression cannot be instantiated to expected type;
[error]  found   : [T(in method apply)]org.apache.spark.sql.catalyst.dsl.ScalaUdfBuilder[T(in method apply)]
[error]  required: org.apache.spark.sql.catalyst.dsl.package.ScalaUdfBuilder[T(in method functionToUdfBuilder)]
[error]   implicit def functionToUdfBuilder[T: TypeTag](func: Function1[_, T]): ScalaUdfBuilder[T] = ScalaUdfBuilder(func)
[error]                                                                                                             ^
[error] /home/manoj/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/dsl/package.scala:310: polymorphic expression cannot be instantiated to expected type;
[error]  found   : [T(in method apply)]org.apache.spark.sql.catalyst.dsl.ScalaUdfBuilder[T(in method apply)]
[error]  required: org.apache.spark.sql.catalyst.dsl.package.ScalaUdfBuilder[T(in method functionToUdfBuilder)]
1. It does not scale well for really sparse data in high dimensions, where memory
   blows up; for example, on the newsgroups dataset with around 80k features it
   runs out of memory on my laptop.
2. From the profile, it seems about as optimized as possible. It is
   slightly faster than MiniBatchKMeans for high n_clusters (around 1000),
   slower than MiniBatchKMeans for higher n_features, and
   slightly faster than MiniBatchKMeans for higher n_features (~400) and n_clusters (~1000).
3. Setting the threshold is a problem: almost every time I had to set it manually.
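The threshold issue in point 3 can be seen with a toy version of the absorption rule Birch uses (my own stripped-down sketch, not the scikit-learn code): a sample joins its nearest subcluster only if it lies within the threshold, so the number of subclusters swings wildly with that one knob.

```python
import numpy as np


def greedy_threshold_clusters(X, threshold):
    """Toy Birch-style absorption: each sample joins the nearest
    existing centroid if it is within `threshold` of it, otherwise
    it opens a new subcluster."""
    centroids, counts = [], []
    for x in X:
        if centroids:
            d = np.linalg.norm(np.asarray(centroids) - x, axis=1)
            j = int(np.argmin(d))
            if d[j] <= threshold:
                # running-mean update of the absorbing centroid
                counts[j] += 1
                centroids[j] = centroids[j] + (x - centroids[j]) / counts[j]
                continue
        centroids.append(x.astype(float))
        counts.append(1)
    return np.asarray(centroids)
```

With the same data, a small change in threshold doubles or halves the subcluster count, which is why it ends up being tuned by hand.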
Total time: 0.18872 s
Line # Hits Time Per Hit % Time Line Contents
==============================================================
18 @profile
19 def _iterate_X(X):
20 """
21 This little hack returns a densified row when iterating over a sparse
22 matrix, instead of constructing a sparse matrix for every row, which is
23 expensive.
# Store downloaded tamil songs in a directory from tamiltunes.com
# Supply links like http://tamiltunes.com/kayal-2014.html
# TODO: Format stuff like % in songs
import urllib
import os
a = raw_input("Enter link ")
b = urllib.urlopen(a)
html = b.read().split()
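The script above goes on to scrape song links out of the downloaded page. That extraction step can be sketched on its own (the helper name and regex are mine, not from the script):

```python
import re


def extract_links(html, suffix=".mp3"):
    """Pull href targets ending in `suffix` out of raw HTML.
    A real scraper would prefer an HTML parser over a regex."""
    return [href for href in re.findall(r'href="([^"]+)"', html)
            if href.endswith(suffix)]
```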
5.816400000000000000e+04 8.134310000000000000e+05
5.816000000000000000e+04 8.451400000000000000e+04
5.814800000000000000e+04 5.500010000000000000e+05
5.814400000000000000e+04 8.226450000000000000e+05
5.813400000000000000e+04 4.472990000000000000e+05
5.812100000000000000e+04 8.176100000000000000e+04
5.811500000000000000e+04 3.284160000000000000e+05
5.810900000000000000e+04 3.391240000000000000e+05
5.809800000000000000e+04 2.581170000000000000e+05
5.809100000000000000e+04 3.233850000000000000e+05
==============================================================
110 @profile
111 def insert_cf_subcluster(self, subcluster):
112 """
113 Insert a new subcluster into the node
114 """
115 265652 183822 0.7 1.9 if not self.subclusters_:
116 1 3 3.0 0.0 self.update(subcluster)
117 1 0 0.0 0.0 return False
118