Skip to content

Instantly share code, notes, and snippets.

@mwinkle
mwinkle / get-gitrepos.ps1
Last active December 17, 2015 01:39
Very basic one-liner to enumerate all git repo's in a directory (and child directories)
# start of something to help me manage git repos...
# starting with a very basic list of them.
gci -recurse -force -include .git | select -Property Parent, {$_.Parent.FullName}
# ideal next step would potentially parse & return a list of remotes...
# once I get remotes, I should also output current branches
# serialize to a json doc and allow me to pick that back up elsewhere
@mwinkle
mwinkle / gist:9808807
Last active August 29, 2015 13:57
This is the result of work done with https://issues.apache.org/jira/browse/HIVE-3554 and https://issues.apache.org/jira/browse/HIVE-5795 that enable easy traversal of subdirectories, and exclusion of header rows. This makes Hive a turnkey way to query any type of log data partitioned by time series, without having to explicitly program the parti…
CREATE EXTERNAL TABLE websites_logs_raw (datereq STRING,
timereq STRING,
s_sitename STRING,
cs_method STRING,
cs_uri_stem STRING,
cs_uri_query STRING,
s_port STRING,
cs_username STRING,
c_ip STRING,
cs_User_Agent STRING,
@mwinkle
mwinkle / AzureDB-HDInsight-Integration.txt
Created April 6, 2014 18:01
Interacting with Azure SQL DB From HDInsight
Sqoop using Command line (SQL to Hive)
bin\sqoop import --connect "jdbc:sqlserver://[yourserver].database.windows.net:1433;database=AdventureWorks2012;user=[username];password=[password]" --table SalesOrderDetail --hive-import -m 10 -- --schema Sales
Sqoop using Remote PowerShell
New-AzureHDInsightSqoopJobDefinition –Command 'import --connect "jdbc:sqlserver://[yourserver].database.windows.net:1433;database=AdventureWorks2012;user=[username];password=[password]" --table SalesOrderDetail --hive-import -m 10 -- --schema Sales'
Storing from Pig to Mongo
Obtain jar's from https://github.com/mongodb/mongo-hadoop
REGISTER wasb://container@account/mongo-hadoop/mongo-java-driver-2.11.4.jar;
REGISTER wasb://container@account/mongo-hadoop/mongo-hadoop-core-1.2.1-SNAPSHOT-hadoop_2.2.jar;
REGISTER wasb://container@account/mongo-hadoop/mongo-hadoop-pig-1.2.1-SNAPSHOT-hadoop_2.2.jar;
-- pig code
@mwinkle
mwinkle / HDInsight-Transient-Clusters.ps1
Created April 6, 2014 18:05
This gist shows how to pass in arbitrary storage accounts, as well as HCatalog Metadata stores on a per job basis via PowerShell.
$subscriptionName = "<subscriptionname>"
$clusterName = "<cluster name>"
Select-AzureSubscription -SubscriptionName $subscriptionName
Use-AzureHDInsightCluster $clusterName
$querystring = "show tables; select name, salary from emp;"
[HashTable] $defines
$defines = @{}
$defines.Add("hive.metastore.uris", "")
$defines.Add("javax.jdo.option.ConnectionURL", "jdbc:sqlserver://<name>.database.windows.net;database=<dbname>; encrypt=true;trustServerCertificate=true;create=false")
$defines.Add("fs.azure.account.key.<name1>.blob.core.windows.net", "<clear text key1>")
@mwinkle
mwinkle / gist:ac9dbb152a1e10e06c16
Created July 2, 2014 06:00
HDInsight Hive with Groovy inline UDF's
set hive.aux.jars.path=file:///c:/apps/dist/hive-0.13.0.2.1.3.0-1887/lib/groovy-all-2.1.6.jar
COMPILE `import org.apache.hadoop.hive.ql.exec.UDF \;
public class Pyth extends UDF {
public double evaluate(double a, double b){
return Math.sqrt((a*a) + (b*b)) \;
}
} ` AS GROOVY NAMED Pyth.groovy;
CREATE TEMPORARY FUNCTION Pyth as 'Pyth';
@mwinkle
mwinkle / HDI_SAS_Cluster.ps1
Last active August 29, 2015 14:03
Creating an HDInsight Cluster using a SAS protected storage container
#This script lets you access data in a second storage account, where you are only granted access
#via a Shared Access Signature to a given container
#For more details on SAS, see this http://azure.microsoft.com/en-us/documentation/articles/storage-dotnet-shared-access-signature-part-1/
#Configure SAS key
$sasStorageAccountName = "YOUR_STORAGE_ACCCOUNT_NAME"
$sasContainerName = "YOUR_SAS_CONTAINER_NAME"
$context = New-AzureStorageContext -StorageAccountName $sasStorageAccountName -StorageAccountKey (Get-AzureStorageKey -StorageAccountName $sasStorageAccountName).Primary
$sas = New-AzureStorageContainerSASToken -Name $sasContainerName -Permission rl -Context $context -ExpiryTime (get-date).AddDays(3)
@mwinkle
mwinkle / gist:8fb31f6c7c096a60dc44
Created August 15, 2014 01:03
Uploading data to Azure Blob Store
$storageAccountName = "STORAGE_ACCOUNT_NAME"
$destinationContainer = "DESTINATION_CONTAINER_NAME"
$context = New-AzureStorageContext $storageAccountName -StorageAccountKey (get-azurestoragekey $storageAccountName).Primary
Set-AzureStorageBlobContent -Blob "path/to/your/blob" -Container $destinationContainer -File "c:\your\local\path" -Context $context -Force
@mwinkle
mwinkle / hivehbasetable.hql
Created November 14, 2014 17:38
Hive Table in Hbase
CREATE EXTERNAL TABLE reverse_ip_hbase (
rowkey STRING,
IpAddress string,
Continent string,
Country string,
CountryIso2 string,
CountryConfidence INT,
Region string,
State string,
StateConfidence INT,
@mwinkle
mwinkle / azure-blob-xplat-cli-sample
Last active May 23, 2017 23:36
Azure XPlat CLI in order to list and download blobs
# first, make sure you have downloaded the Azure XPlat CLI
# http://azure.microsoft.com/en-us/documentation/articles/xplat-cli/#install
# eg, sudo npm install azure-cli -g
# login to Azure (will prompt for your username/password)
azure login
# show all of your storage accounts
azure storage account list