@Gorgoras
Last active November 25, 2020 18:15
Check for datasets that are not used by any pipeline in Azure Data Factory. Just clone the repo, set repo_path and dataFactory_name, and run!
import os

# set repo location and ADF name
repo_path = "D:/Work/Python/Azure"
dataFactory_name = "Highwayman-ADFv2"
full_path = "/".join([repo_path, dataFactory_name])

# get list of dataset and pipeline files
datasets = os.listdir("{}/dataset".format(full_path))
pipelines = os.listdir("{}/pipeline".format(full_path))

# just the name of the dataset, without .json
# (splitext keeps dataset names that contain dots intact)
list_datasets = [os.path.splitext(x)[0] for x in datasets]

# iterate over pipelines looking for dataset names
for pipeline in pipelines:
    # the with block closes the file once it has been read
    with open("{}/pipeline/{}".format(full_path, pipeline), "r", encoding="utf-8") as f:
        stringPipe = f.read()
    for o in range(0, len(list_datasets)):
        # if the dataset is being used, replace it with ''
        if list_datasets[o] in stringPipe:
            list_datasets[o] = ""

# remove used datasets from the list
list_datasets = [s for s in list_datasets if s != ""]

# datasets not removed are the ones not being used
print("These are your datasets that are not being used.")
print(list_datasets)
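
The script above treats a dataset as used whenever its name appears anywhere in a pipeline file, so a dataset called Sales would count as used if only SalesArchive is referenced. A possible stricter variant is sketched below. It is not part of the original gist, and the helper names `collect_references` and `find_unused_datasets` are made up for illustration. It assumes pipeline files mark dataset references as JSON objects carrying a `referenceName` and `"type": "DatasetReference"`, and that the repo has the same dataset and pipeline folders the script reads.

```python
import json
import os


def collect_references(node, ref_type, found=None):
    """Recursively collect referenceName strings from objects whose type is ref_type."""
    if found is None:
        found = set()
    if isinstance(node, dict):
        name = node.get("referenceName")
        if node.get("type") == ref_type and isinstance(name, str):
            found.add(name)
        for value in node.values():
            collect_references(value, ref_type, found)
    elif isinstance(node, list):
        for item in node:
            collect_references(item, ref_type, found)
    return found


def find_unused_datasets(full_path):
    """Return the sorted dataset names that no pipeline file references."""
    dataset_names = {
        os.path.splitext(name)[0]
        for name in os.listdir(os.path.join(full_path, "dataset"))
        if name.endswith(".json")
    }
    used = set()
    pipeline_dir = os.path.join(full_path, "pipeline")
    for name in os.listdir(pipeline_dir):
        with open(os.path.join(pipeline_dir, name), "r", encoding="utf-8") as f:
            # assumption: every pipeline file is valid JSON
            collect_references(json.load(f), "DatasetReference", used)
    return sorted(dataset_names - used)
```

Calling `find_unused_datasets(full_path)` with the same `full_path` as in the script would return the candidates as a sorted list. Like the script, this sketch only looks at the pipeline folder, so a dataset referenced only from another kind of resource would still be reported.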
@NowinskiK

Good stuff! I think it is a very interesting idea, so I will use it to extend my PS module:
Azure-Player/azure.datafactory.tools#37

@Gorgoras
Author

Absolutely! As I said on Twitter, it can also be applied to linked services by iterating over the datasets. I love your module! I've been trying to find some time to study it and contribute :)
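
A rough sketch of that linked-service variant could look like the following. It is an illustration rather than code from the gist: the function name `unused_linked_services` and the `scan_folders` parameter are hypothetical, and it assumes the repo keeps linked services in a linkedService folder next to dataset and pipeline. It reuses the gist's text search, but matches the name together with its JSON quotes so that a name that is a prefix of another one is not counted as used.

```python
import os


def unused_linked_services(full_path, scan_folders=("dataset",)):
    """Return linked service names whose quoted name appears in no scanned file."""
    # assumption: one <name>.json file per linked service in this folder
    names = [
        os.path.splitext(file_name)[0]
        for file_name in os.listdir(os.path.join(full_path, "linkedService"))
    ]
    # read every file from the folders that may reference a linked service
    texts = []
    for folder in scan_folders:
        folder_path = os.path.join(full_path, folder)
        for file_name in os.listdir(folder_path):
            with open(os.path.join(folder_path, file_name), "r", encoding="utf-8") as f:
                texts.append(f.read())
    # look for the name including its surrounding double quotes
    return sorted(
        name for name in names
        if not any('"{}"'.format(name) in text for text in texts)
    )
```

By default only the dataset folder is scanned, as the comment suggests. Linked services can also be referenced from other resources, so passing something like `scan_folders=("dataset", "pipeline", "linkedService")` may give fewer false alarms. A file in linkedService would normally contain its own quoted name, though, so scanning that folder with this simple check would hide every linked service unless its own file is skipped.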

@NowinskiK

Thanks, and please do. If you want, I can leave this feature for you to implement and contribute. No rush.

@Gorgoras
Author

Great, I'll try my best to rewrite this in PowerShell and make it work like you describe in the issue.
