
@dataGriff
dataGriff / 01_MountDataLake
Created November 17, 2019 17:04
ETLDataBricksJSONToParquet
# Configure OAuth (service principal) access to the ADLS Gen 2 account
spark.conf.set("fs.azure.account.auth.type", "OAuth")
spark.conf.set("fs.azure.account.oauth.provider.type", "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set("fs.azure.account.oauth2.client.id", "<registeredappid>")
spark.conf.set("fs.azure.account.oauth2.client.secret", "<passwordhere>")
spark.conf.set("fs.azure.account.oauth2.client.endpoint", "https://login.microsoftonline.com/<tenantid>/oauth2/token")
# Allow the container (file system) to be created on first access, then switch it back off
spark.conf.set("fs.azure.createRemoteFileSystemDuringInitialization", "true")
dbutils.fs.ls("abfss://<container>@<accountname>.dfs.core.windows.net/")
spark.conf.set("fs.azure.createRemoteFileSystemDuringInitialization", "false")
@dataGriff
dataGriff / ConvertToJSONString.js
Created December 23, 2019 13:28
StreamJSONtoSQL
function ConvertToJSONString(InputJSON) {
var InputJSONString = JSON.stringify(InputJSON);
return InputJSONString;
}
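The UDF above is a thin wrapper around `JSON.stringify`. For comparison, the same serialisation step in Python (the input object here is a made-up example):

```python
import json


def convert_to_json_string(input_obj):
    # Serialise a parsed object back into a JSON string
    return json.dumps(input_obj)


print(convert_to_json_string({"dog": "Rex"}))  # {"dog": "Rex"}
```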
@dataGriff
dataGriff / DatabricksUtilsHelp.txt
Created January 4, 2020 17:11
Databricks help commands
dbutils.help()
dbutils.fs.help()
dbutils.notebook.help()
dbutils.widgets.help()
dbutils.secrets.help()
@dataGriff
dataGriff / AddDirectoryGen2.ps1
Last active January 18, 2020 18:17
Adding Directory to Gen 2 Lake with powershell
# Sign in and select the subscription that holds the storage account
Connect-AzAccount
Select-AzSubscription -SubscriptionId ddcf4317-347c-4cb5-9153-1508b94f3088
$ctx = New-AzStorageContext -StorageAccountName 'griffvnetlk2' -UseConnectedAccount
# Write-Host $ctx.StorageAccount
$storageAccount = Get-AzStorageAccount -ResourceGroupName "testvnet-rg" -AccountName "griffvnetlk2"
$ctx = $storageAccount.Context
# Create the directory in the lake ("<filesystem>" and "raw/" are placeholder names)
New-AzDataLakeGen2Item -Context $ctx -FileSystem "<filesystem>" -Path "raw/" -Directory
@dataGriff
dataGriff / AlertMetricDeployIfNotExists.json
Last active September 20, 2020 17:50
AlertActivityDeployIfNotExists.json
{
"mode": "All",
"policyRule": {
"if": {
"field": "type",
"equals": "Microsoft.Storage/storageAccounts"
},
"then": {
"effect": "deployIfNotExists",
"details": {
@dataGriff
dataGriff / dropFields_Example.py
Created March 14, 2021 17:21
Example of how can use dropFields in pyspark potentially for sensitive data
# https://towardsdatascience.com/spark-3-nested-fields-not-so-nested-anymore-9b8d34b00b95
# https://medium.com/@fqaiser94/manipulating-nested-data-just-got-easier-in-apache-spark-3-1-1-f88bc9003827
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType
from pyspark.sql.functions import struct, from_json, to_json, col
spark = (SparkSession
    .builder
    .appName('dog_removecols')
    .getOrCreate())
{
"mode": "All",
"policyRule": {
"if": {
"allOf": [
{
"field": "Microsoft.Web/sites/config/ipSecurityRestrictions[*].ipAddress",
"notIn": "[parameters('allowedIPs')]"
},
{

Agile on the Beach

  1. What is the power of the technology?

  2. What limitations does the technology diminish?

  3. What rules enabled us to manage this limitation?

  4. What new rules will we need?

This could be used for user stories and how to refine them. You should always start with the chaotic or complex tasks for a feature and move through complicated until you land in obvious, so the tasks can be done. If you do the obvious tasks first and leave the complicated until later, you may have to change the obvious implementation, because those simple approaches may no longer work once the complex or complicated parts have been interpreted.

clear, complicated, complex, chaotic, confusion