Dave Ruijter DaveRuijter

@DaveRuijter
DaveRuijter / pipeline-backup-weekly.yml
Created October 21, 2021 20:30
This YAML is part of the Data Lake Backup Strategy
parameters:
- name: backupStore
  displayName: 'Backup 05 store'
  type: boolean
  default: true
- name: backupBronze
  displayName: 'Backup 10 bronze'
  type: boolean
  default: true
- name: backupSilver
DaveRuijter / pipeline-backup-daily.yml
Created October 21, 2021 20:32
This YAML pipeline is part of the Data Lake Backup Strategy
parameters:
- name: backupStore
  displayName: 'Backup 05 store'
  type: boolean
  default: true
- name: backupBronze
  displayName: 'Backup 10 bronze'
  type: boolean
  default: true
- name: backupSilver
DaveRuijter / job-backup-dls.yml
Last active November 23, 2021 19:32
This YAML file is part of the Backup Strategy
parameters:
- name: backups
  displayName: 'Array of backups'
  type: object
  default: []
- name: serviceConnectionName
  displayName: 'Name of the DevOps Service Connection'
  type: string
- name: execute
  displayName: 'Execute this Job'
DaveRuijter / is_pipeline_running.json
Created December 12, 2021 11:39
ADF/ASA pipeline to verify if a pipeline is running / in progress
{
    "name": "00_is_pipeline_running",
    "properties": {
        "activities": [
            {
                "name": "Get Pipeline Runs",
                "type": "WebActivity",
                "dependsOn": [
                    {
                        "activity": "getSubscriptionID",
DaveRuijter / multicolumn_expression_evaluation.py
Created January 2, 2022 09:04
Custom multi-column SQL expression evaluation expectation for the Great Expectations framework
from great_expectations.expectations.expectation import MulticolumnMapExpectation
from great_expectations.expectations.util import render_evaluation_parameter_string
from great_expectations.render.util import (
    num_to_str,
    substitute_none_for_missing,
    parse_row_condition_string_pandas_engine,
)
from scipy import stats as stats
from great_expectations.execution_engine import (
    PandasExecutionEngine,
DaveRuijter / generate_hash.py
Created April 3, 2022 07:13
A couple of functions to easily create an integer-based hash. Use it for the key column of a dimension.
spark.udf.register("udf_removehtmltagsfromstring", udf_removehtmltagsfromstring, "string")
# This is the central hashing function, used by other functions. It uses the blake2b hashing algorithm. With a central function, we can adjust the hashing when needed.
def udf_centralhash(string: str) -> int:
    val = hashlib.blake2b(
        digest_size=6
    )  # Increase digest size to make the hashing bigger. 6 seems a good start for our use for dimensions.
    val.update(string.encode("utf-8"))  # give the input string as utf-8 to the blake2b object
    intval = int(val.hexdigest(), 16)  # and convert it to an integer
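The preview above is cut off before the function returns, but its core idea (hash a string with blake2b, then read the digest as an integer) can be sketched on its own, outside Spark. The function name `central_hash` and the sample input are hypothetical, and the full gist may post-process the integer differently.

```python
import hashlib


def central_hash(text: str, digest_size: int = 6) -> int:
    """Hash a string with blake2b and return the digest as a non-negative integer.

    Hypothetical standalone sketch; not the gist's complete function.
    """
    h = hashlib.blake2b(digest_size=digest_size)
    h.update(text.encode("utf-8"))  # hash the UTF-8 bytes of the input
    return int(h.hexdigest(), 16)   # interpret the hex digest as an integer


key = central_hash("customer-42")
# A 6-byte digest is at most 48 bits wide, so it fits in a signed 64-bit column.
assert 0 <= key < 2 ** 48
```

With `digest_size=6` the result is bounded by 2^48, which is why it can serve as a compact integer key; a larger digest lowers the collision risk but needs a wider column.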
DaveRuijter / AddServicePrincipalToPowerBIWorkspaces.ps1
Last active July 22, 2022 09:11
Script to add a Service Principal to Power BI workspaces
# =================================================================================================================================================
## This script will add the given Service Principal to Power BI workspaces
## It will first ask for the (correct) ObjectId of the Service Principal
## Then it will ask for the credentials of a Power BI Service Administrator
## Note: this script only works with v2 workspaces (you can't add a Service Principal to a v1 workspace)
# =================================================================================================================================================
## Parameters
DaveRuijter / pipeline-release-administration.yml
Created October 3, 2021 19:47
Azure DevOps YAML pipeline for release administration, with automatic SemVer based on GitVersion and automatic release notes publication
trigger:
- main
- master
pool:
  vmImage: ubuntu-latest
## Job to create release and add tag
jobs:
- job: CalculateVersion
DaveRuijter / gitversion.yml
Created October 3, 2021 20:27
My GitVersion configuration for automatic SemVer
next-version: 1.0
assembly-versioning-scheme: MajorMinorPatch
assembly-file-versioning-scheme: MajorMinorPatchTag
assembly-informational-format: '{InformationalVersion}'
mode: ContinuousDelivery
increment: Inherit
continuous-delivery-fallback-tag: ci
tag-prefix: '[vV]'
major-version-bump-message: '\+semver:\s?(breaking|major)'
minor-version-bump-message: '\+semver:\s?(feature|minor)'
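To see what the two bump-message patterns above accept, they can be tried against sample commit messages. The sketch below only exercises the regular expressions with Python's `re` module; it does not model how GitVersion itself applies them (GitVersion uses .NET regular expressions and has further rules), and the helper name `bump_kind` and the sample messages are hypothetical.

```python
import re

# The two patterns shown in the configuration preview above.
MAJOR = re.compile(r"\+semver:\s?(breaking|major)")
MINOR = re.compile(r"\+semver:\s?(feature|minor)")


def bump_kind(message: str) -> str:
    """Return which of the two shown patterns a commit message matches, if any."""
    if MAJOR.search(message):
        return "major"
    if MINOR.search(message):
        return "minor"
    return "none"  # neither of the two patterns shown here matched


print(bump_kind("Drop the v1 API +semver: breaking"))
print(bump_kind("Add CSV export +semver:feature"))
print(bump_kind("Fix typo in README"))
```

Note that `\s?` allows at most one whitespace character after the colon, so a message containing `+semver:  major` (two spaces) would match neither pattern.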
DaveRuijter / pl_PBI_dataset_refresh.json
Last active November 28, 2022 19:23
Azure Data Factory pipeline to refresh a Power BI dataset using a Service Principal and Azure Key Vault.
{
    "name": "pipeline1",
    "properties": {
        "activities": [
            {
                "name": "Call dataset refresh",
                "type": "WebActivity",
                "dependsOn": [
                    {
                        "activity": "Get AAD Token",