Skip to content

Instantly share code, notes, and snippets.

@mwinkle
mwinkle / azure-blob-xplat-cli-sample
Last active May 23, 2017 23:36
Azure XPlat CLI in order to list and download blobs
# first, make sure you have downloaded the Azure XPlat CLI
# http://azure.microsoft.com/en-us/documentation/articles/xplat-cli/#install
# eg, sudo npm install azure-cli -g
# login to Azure (will prompt for your username/password)
azure login
# show all of your storage accounts
azure storage account list
@mwinkle
mwinkle / foo.py
Last active August 18, 2016 21:33
PySpark UDF for calling Text Analytics, a Microsoft Cognitive Service
from pyspark.sql.functions import udf
import httplib, urllib, base64, json
def whichLanguage(text):
headers = {
# Request headers
'Content-Type': 'application/json',
'Ocp-Apim-Subscription-Key': '{your subscription key here}',
}
params = urllib.urlencode({
text_file = sc.textFile("hdfs://sandbox.hortonworks.com/user/guest/install.log")
counts = text_file.flatMap( lambda line: line.split(" ")) \
.map(lambda word: (word, 1) ) \
.reduceByKey(lambda a, b : a + b)
counts.saveAsTextFile("hdfs://sandbox.hortonworks.com/user/guest/output1.txt")
su hdfs
hadoop fs –mkdir /user/root
hadoop fs –chmod 777 /user/root
hadoop fs –chmod 777 /user/guest
exit
wget http://www.gutenberg.org/files/50831/50831-0.txt
@mwinkle
mwinkle / get-gitrepos.ps1
Last active December 17, 2015 01:39
Very basic one-liner to enumerate all git repo's in a directory (and child directories)
# start of something to help me manage git repos...
# starting with a very basic list of them.
gci -recurse -force -include .git | select -Property Parent, {$_.Parent.FullName}
# ideal next step would potentially parse & return a list of remotes...
# once I get remotes, I should also output current branches
# serialize to a json doc and allow me to pick that back up elsewhere
@mwinkle
mwinkle / fSharpProcessor.fs
Last active November 12, 2015 02:12
An example of using F# to implement a U-SQL UDO (in this case, a processor).
// sample U-SQL UDO (Processor) written in F#
// Note, currently (11/2015) requires deployment of F#.Core
namespace fSharpProcessor
open Microsoft.Analytics.Interfaces
type myProcessor() =
inherit IProcessor()
' sample U-SQL UDO (Processor) written in VB.NET
Imports Microsoft.Analytics.Interfaces
Public Class vbProcessor
Inherits IProcessor
Private CountryTranslation As New Dictionary(Of String, String) From
{{"Deutschland", "Germany"},
{"Schwiiz", "Switzerland"},
# assumes table is my_json, with one column containing all of the json body
add file wasb:///example/apps/process_json.py;
SELECT transform(json_body)
USING 'd:\python27\python.exe process_json.py'
AS id, lessonbranch, elapsedseconds, activity, the_date
FROM my_json;
@mwinkle
mwinkle / process_json.py
Last active August 29, 2015 14:17
python Processing Json docs
# this is a python streaming program designed to be called from a Hive query
# this will process a complex json document, and will return the right set of columns and rows
# a second GIST will contain the hive query that can be used to process this
import sys
import json
# this returns five columns
# id, lessonbranch, elapsedseconds, activity, datetime
@mwinkle
mwinkle / manytosinglepython.py
Created March 14, 2015 01:21
Python file used to consolidate json files to a single line wtih no CR's
import sys
lines = []
for line in sys.stdin:
lines.append(line)
if len(lines) > 0:
cleaned_lines = [line.strip() for line in lines]
single_line = ' '.join(cleaned_lines)