
robenalt / macruby_string_token_lang_detect.rb
Created August 3, 2010 16:38
detect languages and tokenize via core foundation
framework 'Foundation'
class String
  def language
    CFStringTokenizerCopyBestStringLanguage(self, CFRangeMake(0, self.size))
  end

  def tokens
    tokens = []
    # MacRuby capitalizes C constants that start lowercase, e.g. KCFStringTokenizerUnitWord
    tokenizer = CFStringTokenizerCreate(nil, self, CFRangeMake(0, self.length),
                                        KCFStringTokenizerUnitWord, nil)
    while CFStringTokenizerAdvanceToNextToken(tokenizer) != KCFStringTokenizerTokenNone
      range = CFStringTokenizerGetCurrentTokenRange(tokenizer)
      tokens << self[range.location, range.length]
    end
    tokens
  end
end
robenalt / gist:1107782
Created July 26, 2011 19:41
Command Line Pipe from inside ruby
IO.popen("grep -i what", "w").write(IO.popen("find .").read)
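The same pipe can be sketched in Python with the standard `subprocess` module; the `pipe` helper name is my own, and the `find`/`grep` arguments are the ones from the Ruby one-liner:

```python
import subprocess

def pipe(producer, consumer):
    """Run producer, feed its stdout into consumer, and return consumer's stdout.

    Equivalent to `producer | consumer` in the shell.
    """
    out = subprocess.run(producer, capture_output=True, text=True).stdout
    return subprocess.run(consumer, input=out, capture_output=True, text=True).stdout

# like the Ruby one-liner: find . | grep -i what
# pipe(["find", "."], ["grep", "-i", "what"])
```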
robenalt / gist:3802791
Created September 29, 2012 00:59
Fresh Mountain Lion OS X 10.8 DP3
package mllib
import scala.util.Random
import org.jblas.DoubleMatrix
import org.apache.spark.SparkContext
import org.apache.spark.rdd._
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext._
robenalt / gist:f6ad257236db5ae8fb88
Last active February 7, 2017 15:17
Load a Remote file to hadoop through an edge node via ssh piping
# Load a Remote file to hadoop through ssh
cat /some/path/tofile.csv | ssh user@host "hadoop fs -put - /some_hdfs/path/for/the/file.csv"
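The same streaming idea can be sketched in Python, again keeping the file out of memory by handing its descriptor straight to the child process. The function name is hypothetical, and `user@host` plus the HDFS path are the gist's own placeholders:

```python
import subprocess

def stream_file_to_command(local_path, command):
    """Stream a local file into a command's stdin, like `cat local_path | command`.

    For the gist's use case, command would be something like (placeholders):
        ["ssh", "user@host", "hadoop fs -put - /some_hdfs/path/for/the/file.csv"]
    """
    with open(local_path, "rb") as f:
        # stdin=f streams the file without buffering it all in Python
        return subprocess.run(command, stdin=f, stdout=subprocess.PIPE, check=True)
```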
robenalt / iterm.scpt
Last active June 6, 2016 18:35 — forked from gnachman/iterm.scpt
Fix docker quickstart terminal for iTerm2 version 3
on write_to_file(this_data, target_file, append_data)
	try
		set the target_file to the target_file as string
		set the open_target_file to open for access file target_file with write permission
		if append_data is false then set eof of the open_target_file to 0
		write this_data to the open_target_file starting at eof
		close access the open_target_file
		return true
	on error
		try
			close access file target_file
		end try
		return false
	end try
end write_to_file
robenalt / run_commandline.scala
Last active June 10, 2016 19:01
run a commandline and capture output
import scala.sys.process._

// run a shell command and capture its stdout as a String
val result = "ls -la".!!
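A comparable capture in Python, sketched with the standard library: `check_output` raises `CalledProcessError` on a non-zero exit status, much as Scala's `!!` throws.

```python
import subprocess

# capture the command's stdout as a string; raises CalledProcessError on failure,
# similar to how "ls -la".!! throws on a non-zero exit status in Scala
result = subprocess.check_output(["ls", "-la"], text=True)
```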
robenalt / read_avro_spark_1.3.0.py
Last active June 10, 2016 19:08
1.3.0 pyspark read avro
# pyspark --packages com.databricks:spark-avro_2.10:1.0.0
# read avro files from 1.3.0 spark
df = sqlCtx.load("/path/to/my_avro", "com.databricks.spark.avro")
robenalt / scala_spark_logger.scala
Last active June 10, 2016 19:18
spark scala set logger level
// Set logging level for spark scala
import org.apache.log4j.{Level, Logger}

Logger.getLogger("org").setLevel(Level.WARN)
Logger.getLogger("akka").setLevel(Level.WARN)
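On the Python side, a comparable quieting of a chatty library logger can be sketched with the standard `logging` module; treating `py4j` as the noisy logger name is an assumption here:

```python
import logging

# raise a noisy library logger to WARNING, mirroring the Scala snippet
# ("py4j" as the chatty logger name on the PySpark driver is an assumption)
logging.getLogger("py4j").setLevel(logging.WARNING)
```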
robenalt / save_dataframe_pyspark.py
Last active June 10, 2016 19:30
pyspark save dataframe
# from pyspark.sql import HiveContext
# sqlContext = HiveContext(sc)

query = """
select * from db.sometable where col > 50
"""
results = sqlContext.sql(query)
# use the DataFrame's own writer instead of constructing a DataFrameWriter by hand
results.write.saveAsTable('db.new_table_name', format='parquet', mode='overwrite',
                          path='/path/to/new/data/files')