wjohnson / get-schema-purview.py
Last active January 13, 2021 16:10 — forked from mdrakiburrahman/get-schema-purview.py
Extracting metadata from Azure Purview with Synapse Spark Pools
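# Reusable Functions
import requests

def azuread_auth(tenant_id: str, client_id: str, client_secret: str, resource_url: str):
    """
    Authenticates Service Principal to the provided Resource URL, and returns the OAuth Access Token
    """
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/token"
    # Passing a dict lets requests URL-encode secrets containing '&', '+' or '='
    payload = {'grant_type': 'client_credentials', 'client_id': client_id,
               'client_secret': client_secret, 'resource': resource_url}
    headers = {
        'Content-Type': 'application/x-www-form-urlencoded'
    }
    # Assumed completion (the preview is truncated here): POST and extract the token
    response = requests.post(url, data=payload, headers=headers)
    response.raise_for_status()
    return response.json()['access_token']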
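Building the form body by f-string interpolation breaks as soon as the client secret contains a reserved character such as `&`, `+` or `=`. A minimal stdlib sketch, assuming the same Azure AD v1 `oauth2/token` endpoint and form fields used by `azuread_auth`, that escapes each value with `urllib.parse.urlencode`. The helper `build_token_request` is hypothetical, the tenant, client and resource values are placeholders, and nothing is sent over the network.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_token_request(tenant_id: str, client_id: str, client_secret: str,
                        resource_url: str) -> Request:
    """Build (but do not send) a client-credentials token request.

    Hypothetical helper: the endpoint and field names mirror azuread_auth.
    """
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/token"
    # urlencode escapes reserved characters such as '&', '+' and '='
    body = urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "resource": resource_url,
    }).encode("ascii")
    return Request(url, data=body, method="POST",
                   headers={"Content-Type": "application/x-www-form-urlencoded"})


# Placeholder values only
req = build_token_request("my-tenant", "my-app", "s3cr&t+=", "https://example.invalid/resource")
print(req.full_url)
print(req.data.decode())
```

Sending it is then a matter of passing `req` to `urllib.request.urlopen` and reading `access_token` from the JSON response.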
{
"type" : "record",
"name" : "TrainingExample",
"namespace" : "com.linkedin.metronome.avro.generated",
"fields" : [ {
"name" : "uid",
"type" : [ "null", "string", "long", "int" ],
"doc" : "a unique id for the training event",
"default" : null
}, {
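The `uid` field above is a union whose default is `null`. The Avro specification's long-standing rule is that a union field's default value corresponds to the first schema in the union, which is why `"null"` is listed first. A minimal stdlib sketch of that check, assuming primitive first branches only; `union_default_ok` is a hypothetical helper, not part of any Avro library.

```python
import json

# Python types a JSON default may take for each Avro primitive.
# bool is excluded from the numeric checks because True/False are ints in Python.
_DEFAULT_CHECKS = {
    "null": lambda v: v is None,
    "boolean": lambda v: isinstance(v, bool),
    "int": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "long": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "float": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "double": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "string": lambda v: isinstance(v, str),
    "bytes": lambda v: isinstance(v, str),
}


def union_default_ok(field: dict) -> bool:
    """Return True if a union field's default matches the union's first branch.

    Simplified sketch: only primitive first branches are supported.
    """
    branches = field["type"]
    if not isinstance(branches, list) or "default" not in field:
        return True  # not a union, or no default to check
    first = branches[0]
    if not isinstance(first, str) or first not in _DEFAULT_CHECKS:
        raise ValueError(f"unsupported first branch: {first!r}")
    return _DEFAULT_CHECKS[first](field["default"])


uid = json.loads('''{"name": "uid", "type": ["null", "string", "long", "int"],
                     "default": null}''')
print(union_default_ok(uid))  # True: null default, "null" listed first
bad = {"name": "uid", "type": ["string", "null"], "default": None}
print(union_default_ok(bad))  # False: the default would have to be a string
```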
wjohnson / model-automation-options.R
Last active July 28, 2016 01:07
Model Automation in R
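#http://archive.ics.uci.edu/ml/datasets/Bank+Marketing
# stringsAsFactors = TRUE keeps y a factor (R >= 4.0 defaults it to FALSE)
data <- read.csv("~/in/bank/bank-full.csv", header = TRUE,
                 sep = ";", stringsAsFactors = TRUE)
####Decision Tree####
library(rpart)
library(rpart.plot)
# method = "class" fits a classification tree for the yes/no response y
rp <- rpart(y ~ ., data = data, method = "class")
rpart.plot(rp)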
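For a factor response such as `y`, rpart grows a CART-style classification tree and, by default, chooses each split by minimising Gini impurity. A minimal Python sketch of the core step, searching one numeric feature for the threshold with the lowest size-weighted Gini impurity; it is an illustration of the criterion, not rpart's implementation (which also handles categorical splits, surrogates and complexity pruning). The toy values are made up.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(xs, ys):
    """Return (threshold, weighted_impurity) for the best split x <= threshold.

    Tries the midpoint between each pair of consecutive distinct sorted x
    values and keeps the one with the lowest size-weighted child impurity.
    """
    pairs = sorted(zip(xs, ys))
    n = len(pairs)
    best = (None, gini(ys))  # no split: impurity of the parent node
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # equal x values cannot be separated
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = ((lo + hi) / 2, score)
    return best


# Made-up toy data: a numeric feature vs. a yes/no outcome
duration = [50, 80, 120, 300, 450, 600]
subscribed = ["no", "no", "no", "yes", "yes", "yes"]
print(best_split(duration, subscribed))  # (210.0, 0.0)
```

A full tree applies this search to every feature at every node and recurses on the two children until a stopping rule is met.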
wjohnson / recsys-pyspark.py
Last active June 17, 2020 10:01
Using PySpark's ALS Matrix Factorization Model for RecSys
#Get the data here http://grouplens.org/datasets/movielens/
from pyspark import SparkContext

# Reuse the shell/notebook SparkContext if one exists, otherwise create one
sc = SparkContext.getOrCreate()
movielens = sc.textFile("../in/ml-100k/u.data")
movielens.first() #u'196\t242\t3\t881250949'
movielens.count() #100000
# Clean up the data by splitting it.
# The MovieLens README says each line is tab-separated:
# user id, item id, rating, timestamp
clean_data = movielens.map(lambda x: x.split('\t'))
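After the split, each record is a list of four strings. ALS factorises the rating matrix by alternating between two regularised least-squares problems: solve for user factors with item factors fixed, then for item factors with user factors fixed. A minimal pure-Python sketch with rank 1, so each update has a closed form, assuming the user/item/rating/timestamp layout described above. It is not Spark's implementation; `parse` and `als_rank1` are hypothetical names, and only the first sample line comes from the data shown above (the other two are made up).

```python
from collections import defaultdict


def parse(line: str):
    """Split a u.data line into (user, item, rating); the timestamp is dropped."""
    user, item, rating, _timestamp = line.split("\t")
    return int(user), int(item), float(rating)


def als_rank1(ratings, lam=0.1, iterations=20):
    """Rank-1 alternating least squares on (user, item, rating) triples.

    Minimises sum (r - p[u] * q[i])**2 + lam * (sum p**2 + sum q**2).
    With one side fixed, each update is the exact regularised
    least-squares solution for the other side.
    """
    by_user, by_item = defaultdict(list), defaultdict(list)
    for u, i, r in ratings:
        by_user[u].append((i, r))
        by_item[i].append((u, r))
    p = {u: 1.0 for u in by_user}
    q = {i: 1.0 for i in by_item}
    for _ in range(iterations):
        for u, rows in by_user.items():  # fix q, solve for each p[u]
            p[u] = sum(r * q[i] for i, r in rows) / (sum(q[i] ** 2 for i, _ in rows) + lam)
        for i, rows in by_item.items():  # fix p, solve for each q[i]
            q[i] = sum(r * p[u] for u, r in rows) / (sum(p[u] ** 2 for u, _ in rows) + lam)
    return p, q


# First line from the data above; the other two are made-up examples
lines = ["196\t242\t3\t881250949", "196\t302\t4\t0", "22\t242\t1\t0"]
p, q = als_rank1([parse(x) for x in lines], lam=0.01)
print(round(p[196] * q[302], 2))  # reconstructed rating for a rated pair
```

Spark's ALS generalises this to `rank` latent factors per user and item, where each alternating step solves a small `rank x rank` linear system instead of a single division.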