@danielscholl
Last active September 15, 2022 14:21
---
title: "Tutorial: Deploy Microsoft Energy Data Services and ingest OSDU community open data using Azure CLI"
description: In this tutorial, you'll learn how to create a platform instance with the Azure CLI and load it with the OSDU community-contributed data set.
author: danielscholl
ms.author: dascholl
ms.service: energy
ms.topic: tutorial
ms.date: 9/15/2022
ms.custom: template-tutorial
---
# Tutorial: Deploy Microsoft Energy Data Services and ingest OSDU community open data using Azure CLI
## Overview and Prerequisites
In this tutorial, you'll learn how to create a data platform using the Azure CLI and ingest [Open Test Data](https://community.opengroup.org/osdu/platform/data-flow/data-loading/open-test-data).
This tutorial can be completed with the interactive experience offered through Azure Cloud Shell and knowledge of Azure CLI and bash scripts.
Use __ctrl-shift-v__ (__cmd-shift-v__ on macOS) to paste tutorial text into an [Azure Cloud Shell](https://docs.microsoft.com/en-us/azure/cloud-shell/overview) bash environment.
[![Launch Cloud Shell in a new window](https://docs.microsoft.com/en-us//cli/azure/media/cloud-shell-try-it/launch-cloud-shell.png)](https://shell.azure.com)
The tool [direnv](https://direnv.net/) offers an easy way to load environment variables. To install it in Cloud Shell, run the following commands.
```bash
# Export bin installation directory
export bin_dir=$HOME/.local/bin
# Install direnv
curl -sfL https://direnv.net/install.sh | bash
# Hook Installation (the quoted 'EOF' writes the command unexpanded, so it runs each time the shell starts)
cat >> ~/.bashrc <<'EOF'
eval "$(direnv hook bash)"
EOF
```
You perform the following steps in this tutorial:
* Register an Application.
* Deploy the data platform.
* Retrieve Tokens.
* Add User.
* Add Legal Tag.
* Author a dataload Dockerfile and deployment.
* Execute the dataload.
* Validate the dataload.
## Shell variables
Shell variables store values for future use and can be used to pass values to command parameters. Shell variables allow for the reuse of commands, both on their own and in scripts. This tutorial uses shell variables for easier customization of command parameters.
The following shell variables are required:
| Name | Description |
| --------- | ----------- |
| RANDOM_NUMBER | Random 4-digit value used to keep resource names unique. |
| AZURE_TENANT | Azure Active Directory tenant ID. |
| AZURE_LOCATION | Azure region location that resources are deployed to. |
| NAME | Name to be used for the resources. |
| APPLICATION | Name for the application to be registered with the Microsoft Identity Platform. |
| PARTITION | Name of a data partition. |
```azurecli
# Create Shell Variables
RANDOM_NUMBER=$((RANDOM % 9000 + 1000))
AZURE_TENANT=$(az account show --query tenantId -otsv)
AZURE_LOCATION="eastus"
NAME="meds${RANDOM_NUMBER}"
APPLICATION="meds-${RANDOM_NUMBER}"
PARTITION="opendes"
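# Persist Variables to .envrc file (direnv only loads exported variables;
# run `direnv allow` once to approve the file)
cat >> .envrc <<EOF
export NAME=${NAME}
export DATA_PARTITION=${NAME}-${PARTITION}
export AZURE_TENANT=${AZURE_TENANT}
EOF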
```
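Later commands silently produce malformed names or URLs when one of these variables is empty, for example after a Cloud Shell session restarts. A minimal sketch of a guard, assuming a hypothetical helper named `require_vars` that is not part of the tutorial:

```bash
# require_vars: hypothetical helper (not part of the tutorial) that reports
# every named shell variable that is unset or empty.
require_vars() {
  local missing=0 name
  for name in "$@"; do
    # ${!name} is bash indirect expansion: the value of the variable named by $name.
    if [ -z "${!name:-}" ]; then
      echo "missing: ${name}" >&2
      missing=1
    fi
  done
  return "${missing}"
}
```

For example, `require_vars NAME AZURE_TENANT AZURE_LOCATION APPLICATION PARTITION` returns a non-zero status and names each variable that still needs a value.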
## Register an Application with the Microsoft Identity Platform
The Microsoft Identity Platform performs identity and access management (IAM) for registered applications. The data platform integrates natively with Azure Identity Management. An [application registration](https://docs.microsoft.com/en-us/azure/active-directory/develop/quickstart-register-app) will make use of a delegated [User.Read](https://docs.microsoft.com/en-us/graph/api/user-get?view=graph-rest-1.0&tabs=http) API permission for the Microsoft Graph.
The following shell variables are created:
| Name | Description |
| --------- | ----------- |
| CLIENT_ID | The Application (client) ID. |
| CLIENT_SECRET | The Application (client) Secret. |
```azurecli
# Graph API User.Read (Delegated)
cat > manifest.json <<EOF
[
{
"resourceAppId": "00000003-0000-0000-c000-000000000000",
"resourceAccess": [
{
"id": "e1fe6dd8-ba31-4d61-89e7-88639da4683d",
"type": "Scope"
}
]
}
]
EOF
# Create Application
CLIENT_ID=$(az ad app create \
--display-name ${APPLICATION}-app \
--required-resource-accesses "manifest.json" \
--query appId -otsv) && rm manifest.json
# Create Identity
az ad sp create --id $CLIENT_ID
# Create Secret
CLIENT_SECRET=$(az ad app credential reset --id $CLIENT_ID --query password -otsv)
# Persist Variables to .envrc file (direnv only loads exported variables)
cat >> .envrc <<EOF
export CLIENT_ID=$CLIENT_ID
export CLIENT_SECRET="$CLIENT_SECRET"
EOF
```
## Deploy the data platform
The data platform is a collection of microservice APIs deployed as a managed service offering compliant with the [Open Subsurface Data Universe](https://community.opengroup.org/osdu).
> Warning: This is a long running command and will take about 1 hour to complete.
```azurecli
# Create Resource Group
az group create \
--name ${NAME}-rg \
--location $AZURE_LOCATION -ojsonc
# Create Platform
az resource create \
--name ${NAME} \
--resource-group ${NAME}-rg \
--location $AZURE_LOCATION \
--resource-type Microsoft.OpenEnergyPlatform/energyservices \
--is-full-object \
--properties "{ \"location\": \"$AZURE_LOCATION\", \
\"properties\": { \
\"authAppId\": \"$CLIENT_ID\", \
\"publicNetworkAccess\": \"Enabled\", \
\"dataPartitionNames\": [{ \"name\": \"$PARTITION\" }] \
} \
}" -ojsonc
```
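Because the deployment runs for about an hour, the Cloud Shell session may disconnect before the command returns. The following is a minimal sketch of a generic polling helper that could be used to check on the deployment afterwards. The name `wait_for_value` is hypothetical, and the helper only compares the output of whatever command it is given against an expected string.

```bash
# wait_for_value: hypothetical polling helper (not part of the tutorial).
# Usage: wait_for_value EXPECTED MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
wait_for_value() {
  local expected="$1" attempts="$2" delay="$3" actual="" i
  shift 3
  for ((i = 1; i <= attempts; i++)); do
    # Run the command; a failing command counts as "no value yet".
    actual="$("$@" 2>/dev/null || true)"
    if [ "${actual}" = "${expected}" ]; then
      echo "reached '${expected}' after ${i} attempt(s)"
      return 0
    fi
    # Pause between attempts, but not after the last one.
    if [ "${i}" -lt "${attempts}" ]; then
      sleep "${delay}"
    fi
  done
  echo "gave up after ${attempts} attempt(s); last value: '${actual}'" >&2
  return 1
}
```

One possible use is `wait_for_value Succeeded 60 60 az resource show --name "$NAME" --resource-group "${NAME}-rg" --resource-type Microsoft.OpenEnergyPlatform/energyservices --query properties.provisioningState -otsv`. The `properties.provisioningState` path and the `Succeeded` value are assumptions about the shape of this resource, so inspect the full `az resource show` output first.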
## Get an API access token
The APIs can be accessed using the identity of the registered application. To obtain a token, send a curl request using the [OAuth 2.0 client credentials flow](https://docs.microsoft.com/en-us/azure/active-directory/develop/v2-oauth2-client-creds-grant-flow#request-an-access-token). The resulting token represents the identity of the data platform service account.
The following shell variables will be created:
| Name | Description |
| --------- | ----------- |
| TOKEN | The service account bearer token. |
```azurecli
# Get Bearer Token
TOKEN=$(curl --request POST \
--url https://login.microsoftonline.com/${AZURE_TENANT}/oauth2/v2.0/token \
--header "content-type: application/x-www-form-urlencoded" \
--data "grant_type=client_credentials" \
--data "client_id=${CLIENT_ID}" \
--data "client_secret=${CLIENT_SECRET}" \
--data "scope=${CLIENT_ID}/.default" |jq -r .access_token)
```
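When the token request fails, `jq -r .access_token` prints the literal string `null`, and the failure only surfaces later as an authorization error. A minimal sketch of a sanity check follows, assuming the token is a JWT (three dot-separated base64url segments), which is the format the Microsoft identity platform issues here. The helper name `looks_like_jwt` is hypothetical.

```bash
# looks_like_jwt: hypothetical helper (not part of the tutorial); succeeds when
# the value has the three non-empty, dot-separated base64url segments of a JWT
# (header.payload.signature). It does not validate the signature or the claims.
looks_like_jwt() {
  local token="$1"
  local pattern='^[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+$'
  [[ "${token}" =~ ${pattern} ]]
}
```

For example, `looks_like_jwt "$TOKEN" || echo "token request failed"` catches an empty or `null` value before it is sent to the platform.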
## Add a user to the data platform
The data platform has native integration with [Azure Identity Management](https://docs.microsoft.com/en-us/azure/security/fundamentals/identity-management-overview) for authentication. However, authorization checks are performed within the data platform. The service account identity token is used to add a user to the data platform and assign it the owner role.
```azurecli
# Retrieve the object ID (oid) for the signed in user
AZURE_USER=$(az ad signed-in-user show --query id -otsv)
# Add user to the data platform
curl --request POST \
--url https://${NAME}.energy.azure.com/api/entitlements/v2/groups/users@${NAME}-${PARTITION}.dataservices.energy/members \
--header "accept: application/json" \
--header "content-type: application/json" \
--header "authorization: Bearer ${TOKEN}" \
--header "data-partition-id: ${NAME}-${PARTITION}" \
--data "{\"email\": \"${AZURE_USER}\",\"role\": \"MEMBER\"}" | jq
# Assign owner role to user
curl --request POST \
--url https://${NAME}.energy.azure.com/api/entitlements/v2/groups/users.datalake.ops@${NAME}-${PARTITION}.dataservices.energy/members \
--header "accept: application/json" \
--header "content-type: application/json" \
--header "authorization: Bearer ${TOKEN}" \
--header "data-partition-id: ${NAME}-${PARTITION}" \
--data "{\"email\": \"${AZURE_USER}\",\"role\": \"MEMBER\"}" | jq
# List user permissions (Optional)
curl --request GET \
--url "https://${NAME}.energy.azure.com/api/entitlements/v2/members/${AZURE_USER}/groups?type=none" \
--header "content-type: application/json" \
--header "authorization: Bearer ${TOKEN}" \
--header "data-partition-id: ${NAME}-${PARTITION}" | jq
```
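The group addresses in the requests above follow the pattern `<group>@<instance>-<partition>.dataservices.energy`. A small sketch that builds such an address can make the pattern easier to reuse; the helper name `entitlements_group` is hypothetical, and the pattern is taken only from the URLs shown in this section.

```bash
# entitlements_group: hypothetical helper (not part of the tutorial) that builds
# a group address in the form used by the requests above:
#   <group>@<instance>-<partition>.dataservices.energy
entitlements_group() {
  local group="$1" instance="$2" partition="$3"
  printf '%s@%s-%s.dataservices.energy\n' "${group}" "${instance}" "${partition}"
}
```

For example, `entitlements_group users.datalake.ops "$NAME" "$PARTITION"` prints the owner group address used in the second request.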
## Authenticate a user
The data platform supports [OpenID Connect authentication](https://docs.microsoft.com/en-us/azure/active-directory/fundamentals/auth-oidc) with Azure Active Directory. OIDC enables apps to securely acquire an access token that can be used to access resources secured by the Microsoft Identity platform. Apps can refresh tokens to get other access tokens for the signed in user.
This section will assist in setting up a [Storage Account static web site](https://docs.microsoft.com/en-us/azure/storage/blobs/storage-blob-static-website) to host a web page and obtain [OAuth 2.0 Authorization Codes](https://docs.microsoft.com/en-us/azure/active-directory/develop/v2-oauth2-auth-code-flow).
The following shell variables will be created:
| Name | Description |
| --------- | ----------- |
| WEB_ENDPOINT | The FQDN of the Static Web Site. |
| AUTH_CODE | The OAuth 2.0 Authorization Code. |
| REFRESH_TOKEN | The OAuth 2.0 Refresh Token. |
```azurecli
# Create Storage Account
az storage account create \
--name ${NAME} \
--resource-group ${NAME}-rg \
--location $AZURE_LOCATION \
--sku Standard_LRS -ojsonc
# Configure for Static Web Site
az storage blob service-properties update \
--account-name ${NAME} \
--static-website \
--404-document error.html \
--index-document index.html -ojsonc
# Retrieve the endpoint
WEB_ENDPOINT=$(az storage account show \
--name ${NAME} \
--resource-group ${NAME}-rg \
--query "primaryEndpoints.web" \
--output tsv)
```
__Create Web Page__
```azurecli
# Working Directory
mkdir -p ./dataload
# Create web page
cat > dataload/index.html << EOF
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
<meta http-equiv="x-ua-compatible" content="ie=edge">
<title>OAuth Tool</title>
<!-- Font Awesome -->
<link rel="stylesheet" href="https://use.fontawesome.com/releases/v5.8.2/css/all.css">
<!-- Bootstrap core CSS -->
<link href="https://maxcdn.bootstrapcdn.com/bootstrap/4.0.0/css/bootstrap.min.css" rel="stylesheet">
<style type="text/css">
textarea {
margin-bottom: 5px;
}
</style>
</head>
<body>
<div id="app">
<form id="oauth-form" @submit.prevent="getToken" class="text-center border border-light p-5" action="#!">
<p class="h2 mb-4">Microsoft Identity Platform (v2.0)</p>
<div class="form-group">
<div class="form-row">
<div class="form-group col-md-6">
<label class="float-left" for="clientId">ClientId</label>
<input type="text" class="form-control" id="clientId" v-model="clientId">
</div>
<div class="form-group col-md-6">
<label class="float-left" for="tenantId">TenantId</label>
<input type="text" class="form-control" id="tenantId" v-model="tenantId">
</div>
</div>
<div class="form-row">
<div class="col-md-6 mb-3">
<label class="float-left" for="redirectUrl">RedirectUrl</label>
<input type="text" class="form-control" id="redirectUrl" v-model="redirectUrl">
</div>
<div class="col-md-3 mb-3">
<label class="float-left" for="responseType">ResponseType</label>
<input type="text" class="form-control" id="responseType" v-model="responseType">
</div>
<div class="col-md-3 mb-3">
<label class="float-left" for="responseMode">ResponseMode</label>
<input type="text" class="form-control" id="responseMode" v-model="responseMode">
</div>
</div>
<div class="form-row">
<div class="col-md-12 mb-3">
<label class="float-left" for="scope">Scope</label>
<input type="text" class="form-control" id="scope" v-model="scope">
</div>
</div>
<div class="form-row">
<div class="col-md-12 mb-1">
<a :href="signOutUrl" class="btn btn-link float-left" v-if="authorizationCode">Logout</a>
<a :href="authorizeUrl" class="btn btn-primary float-right">Authorize</a>
</div>
</div>
</div>
<hr />
<div class="form-group shadow-textarea">
<div class="form-row">
<div class="col-md-4 mb-4">
<label class="float-left" for="authorizationCode">ResponseType: code</label>
<textarea id="authorizationCode" v-model="authorizationCode" class="form-control z-depth-1"
rows="10"></textarea>
</div>
<div class="col-md-4 mb-4">
<label class="float-left" for="accessToken">ResponseType: token</label>
<textarea id="accessToken" v-model="accessToken" class="form-control z-depth-1" rows="10"></textarea>
<a @click="decodeAccess()" class="btn btn-info btn-sm float-right" v-if="accessToken">Decode</a>
</div>
<div class="col-md-4 mb-4">
<label class="float-left" for="idToken">ResponseType: id_token</label>
<textarea id="idToken" v-model="idToken" class="form-control z-depth-1" rows="10"></textarea>
<a @click="decodeId()" class="btn btn-info btn-sm float-right" v-if="idToken">Decode</a>
</div>
</div>
</div>
</form>
</div>
<!-- SCRIPTS -->
<!-- JQuery -->
<script type="text/javascript" src="https://code.jquery.com/jquery-3.2.1.slim.min.js"></script>
<!-- Bootstrap core JavaScript -->
<script type="text/javascript" src="https://maxcdn.bootstrapcdn.com/bootstrap/4.0.0/js/bootstrap.min.js"></script>
<!-- Vue JavaScript -->
<script type="text/javascript" src="https://cdn.jsdelivr.net/npm/vue@2/dist/vue.js"></script>
<script>
var app = new Vue({
el: '#oauth-form',
data: {
tenantId: '${AZURE_TENANT}',
clientId: '${CLIENT_ID}',
redirectUrl: '${WEB_ENDPOINT}',
responseType: 'code',
responseMode: 'fragment',
scope: '${CLIENT_ID}/.default openid profile offline_access',
authorizationCode: null,
accessToken: null,
idToken: null
},
computed: {
authorizeUrl: function () {
return "https://login.microsoftonline.com/" + this.tenantId +
"/oauth2/v2.0/authorize?" +
"client_id=" + this.clientId +
"&response_type=" + this.responseType +
"&redirect_uri=" + encodeURIComponent(this.redirectUrl) +
"&response_mode=" + this.responseMode +
"&scope=" + encodeURIComponent(this.scope) +
"&state=12345" +
"&nonce=dummy123";
},
signOutUrl: function () {
// The logout endpoint is addressed by tenant, not by client ID
return "https://login.microsoftonline.com/" + this.tenantId + "/oauth2/logout";
},
redirect: function () {
return location.protocol + '//' + location.host + location.pathname
}
},
methods: {
// Invoked from @click handlers, so these belong in methods rather than computed
decodeId: function () {
window.open("https://jwt.ms/#id_token=" + this.idToken, "_blank");
},
decodeAccess: function () {
window.open("https://jwt.ms/#id_token=" + this.accessToken, "_blank");
}
},
beforeMount: function () {
if (window.location.hash) {
var params = window.location.hash.substr(1).split('&').reduce(function (result, item) {
var parts = item.split('=');
result[parts[0]] = parts[1];
return result;
}, {});
this.authorizationCode = params['code'];
this.accessToken = params['access_token'];
this.idToken = params['id_token'];
}
}
})
</script>
</body>
</html>
EOF
# Publish web page
az storage blob upload-batch \
--account-name ${NAME} \
--source dataload \
--destination '$web' -ojsonc && rm -rf dataload/index.html
```
__Update Application Registration__
A [redirect URI](https://docs.microsoft.com/en-us/azure/active-directory/develop/reply-url), or reply URL, is the location where the authorization server sends the user once the app has been successfully authorized and granted an authorization code or access token. The authorization server sends the code or token to the redirect URI, so it's important to register the correct location as part of the app registration.
```azurecli
# Update app registration
az ad app update --id $CLIENT_ID --web-redirect-uris $WEB_ENDPOINT -ojsonc
# Display Web Endpoint
echo $WEB_ENDPOINT
```
__Retrieve Authorization Code__
The website can now request authorization codes for users. Open the website, click the Authorize button, and an authorization code should be returned. Copy the authorization code to the clipboard and save it as a variable.
> Note: An authorization code can be used only once and typically expires after about 60 seconds.
```azurecli
AUTH_CODE="<your_code>"
```
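Because the page requests `response_mode=fragment`, the code also appears in the browser address bar after the `#` character. As an alternative to copying from the text area, the following minimal sketch extracts one parameter from a pasted redirect URL. The helper name `extract_fragment_param` is hypothetical, and the value is returned exactly as it appears in the URL, with no percent-decoding.

```bash
# extract_fragment_param: hypothetical helper (not part of the tutorial) that
# prints the value of one parameter from the fragment (the part after '#')
# of a redirect URL. Returns non-zero when the parameter is absent.
extract_fragment_param() {
  local url="$1" wanted="$2" fragment pair
  local -a pairs
  # No '#' means there is no fragment to parse.
  [[ "${url}" == *"#"* ]] || return 1
  fragment="${url#*#}"
  # Split the fragment into key=value pairs on '&'.
  IFS='&' read -r -a pairs <<< "${fragment}"
  for pair in "${pairs[@]}"; do
    if [ "${pair%%=*}" = "${wanted}" ]; then
      printf '%s\n' "${pair#*=}"
      return 0
    fi
  done
  return 1
}
```

For example, `AUTH_CODE=$(extract_fragment_param "<pasted_url>" code)` stores the code in the variable used by the next step.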
__Retrieve Refresh Token__
Now that an authorization code has been acquired, it can be redeemed for a refresh token. Refresh tokens have a much longer lifetime of about 90 days and can be used to create access tokens, which have a lifetime of about 1 hour.
```azurecli
# Request a refresh token
REFRESH_TOKEN=$(curl --request POST \
--url https://login.microsoftonline.com/${AZURE_TENANT}/oauth2/v2.0/token \
--header "content-type: application/x-www-form-urlencoded" \
--data "grant_type=authorization_code" \
--data "redirect_uri=${WEB_ENDPOINT}" \
--data "client_id=${CLIENT_ID}" \
--data "client_secret=${CLIENT_SECRET}" \
--data "scope=${CLIENT_ID}/.default openid profile offline_access" \
--data "code=${AUTH_CODE}" | jq -r .refresh_token)
# Persist Variables to .envrc file (direnv only loads exported variables)
cat >> .envrc <<EOF
export REFRESH_TOKEN="$REFRESH_TOKEN"
EOF
```
## Add a Legal Tag in the data platform
A [legal tag](https://community.opengroup.org/osdu/platform/security-and-compliance/legal/-/blob/master/docs/tutorial/ComplianceService.md#what-is-a-legaltag) represents the legal status of data in the platform. It is a collection of properties that governs how the data can be consumed and ingested and is required for data ingestion.
The following shell variables will be created:
| Name | Description |
| --------- | ----------- |
| ACCESS_TOKEN | The OAuth 2.0 Access Token. |
```azurecli
# Request Access Token
ACCESS_TOKEN=$(curl --request POST \
--url https://login.microsoftonline.com/${AZURE_TENANT}/oauth2/v2.0/token \
--header "content-type: application/x-www-form-urlencoded" \
--data "grant_type=refresh_token" \
--data "client_id=${CLIENT_ID}" \
--data "client_secret=$CLIENT_SECRET" \
--data refresh_token=$REFRESH_TOKEN \
--data "scope=${CLIENT_ID}/.default openid profile offline_access" |jq -r .access_token)
# Create Legal Tag
curl --request POST \
--url https://${NAME}.energy.azure.com/api/legal/v1/legaltags \
--header "accept: application/json" \
--header "authorization: Bearer ${ACCESS_TOKEN}" \
--header "content-type: application/json" \
--header "data-partition-id: ${NAME}-${PARTITION}" \
--data-raw '{
"name": "legal-tag-load",
"description": "Open Data Set Legal Contract",
"properties": {
"contractId": "A1234",
"countryOfOrigin": ["US"],
"dataType": "Public Domain Data",
"expirationDate": "2099-01-25",
"exportClassification": "EAR99",
"originator": "Contoso",
"personalData": "No Personal Data",
"securityClassification": "Public"
}
}'
# Retrieve Legal Tag (Optional)
curl --request GET \
--url "https://${NAME}.energy.azure.com/api/legal/v1/legaltags/${NAME}-${PARTITION}-legal-tag-load" \
--header "authorization: Bearer ${ACCESS_TOKEN}" \
--header "data-partition-id: ${NAME}-${PARTITION}" |jq
```
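Note that the tag is created with the name `legal-tag-load` but retrieved as `${NAME}-${PARTITION}-legal-tag-load`, which suggests the service prefixes the data partition ID. The sketch below builds that qualified name and checks an expiration date, assuming a tag needs an `expirationDate` in the future to remain usable. Both helper names are hypothetical.

```bash
# qualified_legal_tag: hypothetical helper (not part of the tutorial) that
# builds the name used to retrieve a tag: <instance>-<partition>-<tag>
qualified_legal_tag() {
  local instance="$1" partition="$2" tag="$3"
  printf '%s-%s-%s\n' "${instance}" "${partition}" "${tag}"
}

# legal_tag_is_current: hypothetical helper; succeeds when a YYYY-MM-DD date is
# later than today. Dates in this format sort lexicographically, so a plain
# string comparison is enough.
legal_tag_is_current() {
  local expiration="$1" today
  today="$(date +%F)"
  [[ "${expiration}" > "${today}" ]]
}
```

For example, `qualified_legal_tag "$NAME" "$PARTITION" legal-tag-load` prints the name used in the retrieval request above, and `legal_tag_is_current 2099-01-25` succeeds for the expiration date used in this tutorial.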
## Ingest the Test Data
The community open test data set includes a collection of Python scripts that can be used to submit jobs to the [ingestion workflow](https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-workflow#introduction). The data set is large, and the scripts have specific installation requirements. A custom Docker image, configured to generate the manifests for this data platform instance, will be built and stored in a container registry. A [managed identity](https://docs.microsoft.com/en-us/azure/container-registry/container-registry-authentication-managed-identity) will be used for image retrieval.
The following shell variables will be created:
| Name | Description |
| --------- | ----------- |
| REGISTRY_ID | The resource id of the Container Registry. |
| USER_PRINCIPAL | The principal id of the managed user identity. |
```azurecli
# Create Registry
REGISTRY_ID=$(az acr create \
--name ${NAME} \
--resource-group ${NAME}-rg \
--sku Basic \
--query id --output tsv)
# Create Managed Identity
USER_PRINCIPAL=$(az identity create \
--name ${NAME} \
--resource-group ${NAME}-rg \
--query principalId --output tsv)
# Create Role Assignment
az role assignment create --assignee $USER_PRINCIPAL --scope $REGISTRY_ID --role acrpull -ojsonc
```
### Build the loader image
Using the Azure Container Registry [Build Tasks](https://docs.microsoft.com/en-us/azure/container-registry/container-registry-tasks-overview) capability, the Docker image can be built and uploaded directly by the registry.
> Warning: This is a long running command and can take up to 20 minutes to complete.
```azurecli
# Create Dockerfile
cat <<"EOF" > dataload/Dockerfile
FROM alpine:latest as builder
RUN apk --update add --virtual build-dependencies --no-cache wget tar
RUN apk --update add libc6-compat ca-certificates
WORKDIR /download
RUN wget -O azcopyv10.tar.gz https://aka.ms/downloadazcopy-v10-linux && \
tar -xzvf azcopyv10.tar.gz --strip-components=1 && \
mv azcopy /usr/local/bin/azcopy && \
rm -rf azcopy* && rm NOTICE.txt && \
apk del build-dependencies
RUN azcopy copy 'https://azureingestiondata.blob.core.windows.net/tno-datasets' "." --recursive
RUN echo "**** Download TNO Data ****" && \
wget https://community.opengroup.org/osdu/platform/data-flow/data-loading/open-test-data/-/archive/Azure/M8/open-test-data-Azure-M8.tar.gz
RUN echo "**** Extract R3 Loader ****" && \
tar -xzvf open-test-data-Azure-M8.tar.gz --strip-components=2 open-test-data-Azure-M8/rc--3.0.0 && \
rm open-test-data-Azure-M8.tar.gz && \
chmod +x 6-data-load-scripts/scripts/*.sh
FROM ubuntu:20.04 as base
ARG USERNAME=app
ARG USER_UID=1000
ARG USER_GID=$USER_UID
RUN apt-get update \
&& apt-get -y --no-install-recommends install \
python3 \
python3-pip \
jq \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
RUN ln -s /usr/bin/python3 /usr/bin/python
WORKDIR /app
COPY --from=builder /download ./
RUN groupadd --gid $USER_GID $USERNAME \
&& useradd --uid $USER_UID --gid $USER_GID -m $USERNAME \
#
# [Optional] Add sudo support. Omit if you don't need to install software after connecting.
&& apt-get update \
&& apt-get install -y sudo \
&& echo $USERNAME ALL=\(root\) NOPASSWD:ALL > /etc/sudoers.d/$USERNAME \
&& chmod 0440 /etc/sudoers.d/$USERNAME \
&& chown -R $USERNAME:$USER_GID /app
USER $USERNAME
WORKDIR /app/2-scripts/load_manifest_scripts
RUN echo "**** Install Python Requirements ****" && \
pip install --upgrade pip && \
pip install -r requirements.txt
WORKDIR /app/6-data-load-scripts/azure
RUN echo "**** Install Python Requirements ****" && \
pip install -r requirements.txt
WORKDIR /app/6-data-load-scripts/scripts
ARG TENANT_ID
ARG NAME="platform"
ARG PARTITION="opendes"
ENV TENANT_ID=$TENANT_ID
ENV DATA_PARTITION=$NAME-$PARTITION
RUN echo "**** Generate Manifests ****" && \
./generate-manifests.sh -p ${DATA_PARTITION}
ENV PATH="/app/6-data-load-scripts/scripts:$PATH"
ENV MANIFESTS_REPO_DIRECTORY="../.."
ENV WORK_PRODUCT_DATASETS_DIRECTORY="../../tno-datasets/wpc-datasets"
ENV SEQUENCE_FILE="ingestion_sequence.json"
ENV DOMAIN="contoso.com"
ENV LOGIN_ENDPOINT="https://login.microsoftonline.com/$TENANT_ID/oauth2/v2.0/token"
ENV OSDU_ENDPOINT="https://$NAME.energy.azure.com"
ENV LEGAL_TAG="$DATA_PARTITION-legal-tag-load"
CMD /bin/bash
EOF
# Build the image
az acr build \
--registry $NAME \
--file dataload/Dockerfile \
--image data-load \
--build-arg TENANT_ID=$AZURE_TENANT \
--build-arg NAME=$NAME \
./dataload && rm dataload/Dockerfile
```
### Execute the loader image
Azure Container Instances runs containers on demand, which suits automation tasks. The data loader image can now be deployed as an [Azure Container Instance](https://docs.microsoft.com/en-us/azure/container-instances/container-instances-overview), which will submit the generated manifests to the data platform.
> Warning: This is a long running command and can take up to 30 minutes to complete.
```azurecli
# Create Deploymentfile
cat <<EOF > dataload/deploy.yaml
apiVersion: 2021-10-01
location: $AZURE_LOCATION
name: $NAME-loader
identity:
  type: UserAssigned
  userAssignedIdentities:
    $(az identity show --name ${NAME} --resource-group ${NAME}-rg --query id --output tsv): {}
properties:
  imageRegistryCredentials:
  - server: $NAME.azurecr.io
    identity: $(az identity show --name ${NAME} --resource-group ${NAME}-rg --query id --output tsv)
  containers:
  - name: masterdata
    properties:
      command:
      - data-loader.sh
      environmentVariables:
      - name: 'CLIENT_ID'
        value: '$CLIENT_ID'
      - name: 'CLIENT_SECRET'
        secureValue: '$CLIENT_SECRET'
      - name: 'REFRESH_TOKEN'
        value: '$REFRESH_TOKEN'
      image: $NAME.azurecr.io/data-load:latest
      ports: []
      resources:
        requests:
          cpu: 1.0
          memoryInGB: 2
  - name: filedata
    properties:
      command:
      - wpc-datasets-loader.sh
      environmentVariables:
      - name: 'CLIENT_ID'
        value: '$CLIENT_ID'
      - name: 'CLIENT_SECRET'
        secureValue: '$CLIENT_SECRET'
      - name: 'REFRESH_TOKEN'
        value: '$REFRESH_TOKEN'
      image: $NAME.azurecr.io/data-load:latest
      ports: []
      resources:
        requests:
          cpu: 1.0
          memoryInGB: 2
  - name: wpcdata
    properties:
      command:
      - data-loader.sh
      environmentVariables:
      - name: 'MANIFESTS_REPO_DIRECTORY'
        value: '../../4-instances/TNO'
      - name: 'SEQUENCE_FILE'
        value: 'wpc_sequence.json'
      - name: 'CLIENT_ID'
        value: '$CLIENT_ID'
      - name: 'CLIENT_SECRET'
        secureValue: '$CLIENT_SECRET'
      - name: 'REFRESH_TOKEN'
        value: '$REFRESH_TOKEN'
      image: $NAME.azurecr.io/data-load:latest
      ports: []
      resources:
        requests:
          cpu: 1.0
          memoryInGB: 2
  osType: Linux
  restartPolicy: Never
tags: null
type: Microsoft.ContainerInstance/containerGroups
EOF
# Run the container instance
az container create \
--resource-group ${NAME}-rg \
--file dataload/deploy.yaml -ojsonc && rm dataload/deploy.yaml
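# Tail the log of the first container (repeat with filedata or wpcdata for the others)
az container logs --name ${NAME}-loader --resource-group ${NAME}-rg --container-name masterdata --follow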
```
### Validate the loaded data
Now that the data has been loaded, it can be validated with requests to the [search API](https://community.opengroup.org/osdu/platform/system/search-service/-/blob/master/docs/tutorial/SearchService.md). Each query below filters on the legal tag and sets `limit` to 1 so that the response stays small; compare the `totalCount` field of each response with the expected count noted in the comment.
The following shell variables will be created:
| Name | Description |
| --------- | ----------- |
| ACCESS_TOKEN | The OAuth 2.0 Access Token. |
```azurecli
# Request Access Token
ACCESS_TOKEN=$(curl --request POST \
--url https://login.microsoftonline.com/${AZURE_TENANT}/oauth2/v2.0/token \
--header "content-type: application/x-www-form-urlencoded" \
--data "grant_type=refresh_token" \
--data "client_id=${CLIENT_ID}" \
--data "client_secret=$CLIENT_SECRET" \
--data "refresh_token=$REFRESH_TOKEN" \
--data "scope=${CLIENT_ID}/.default openid profile offline_access" |jq -r .access_token)
# Generic Files (Expect 10000)
curl --request POST \
--url https://${NAME}.energy.azure.com/api/search/v2/query \
--header "content-type: application/json" \
--header "authorization: Bearer ${ACCESS_TOKEN}" \
--header "data-partition-id: ${NAME}-${PARTITION}" \
--data-raw "{
\"kind\": \"osdu:wks:dataset--File.Generic:1.0.0\",
\"query\": \"legal.legaltags:${NAME}-${PARTITION}-legal-tag-load\",
\"offset\": 0,
\"limit\": 1
}" | jq
# Reference Data Manifests (Expect 92)
curl --request POST \
--url https://${NAME}.energy.azure.com/api/search/v2/query \
--header "content-type: application/json" \
--header "authorization: Bearer ${ACCESS_TOKEN}" \
--header "data-partition-id: ${NAME}-${PARTITION}" \
--data-raw "{
\"kind\": \"osdu:wks:reference-data--*:1.0.0\",
\"query\": \"legal.legaltags:${NAME}-${PARTITION}-legal-tag-load\",
\"offset\": 0,
\"limit\": 1
}" | jq
# Master Data Field (Expect 422)
curl --request POST \
--url https://${NAME}.energy.azure.com/api/search/v2/query \
--header "content-type: application/json" \
--header "authorization: Bearer ${ACCESS_TOKEN}" \
--header "data-partition-id: ${NAME}-${PARTITION}" \
--data-raw "{
\"kind\": \"osdu:wks:master-data--Field:1.0.0\",
\"query\": \"legal.legaltags:${NAME}-${PARTITION}-legal-tag-load\",
\"offset\": 0,
\"limit\": 1
}" | jq
# Master Data GeoPoliticalEntities (Expect 406)
curl --request POST \
--url https://${NAME}.energy.azure.com/api/search/v2/query \
--header "content-type: application/json" \
--header "authorization: Bearer ${ACCESS_TOKEN}" \
--header "data-partition-id: ${NAME}-${PARTITION}" \
--data-raw "{
\"kind\": \"osdu:wks:master-data--GeoPoliticalEntity:1.0.0\",
\"query\": \"legal.legaltags:${NAME}-${PARTITION}-legal-tag-load\",
\"offset\": 0,
\"limit\": 1
}" | jq
# Master Data Organisation (Expect 213)
curl --request POST \
--url https://${NAME}.energy.azure.com/api/search/v2/query \
--header "content-type: application/json" \
--header "authorization: Bearer ${ACCESS_TOKEN}" \
--header "data-partition-id: ${NAME}-${PARTITION}" \
--data-raw "{
\"kind\": \"osdu:wks:master-data--Organisation:1.0.0\",
\"query\": \"legal.legaltags:${NAME}-${PARTITION}-legal-tag-load\",
\"offset\": 0,
\"limit\": 1
}" | jq
# Master Data Well (Expect 4947)
curl --request POST \
--url https://${NAME}.energy.azure.com/api/search/v2/query \
--header "content-type: application/json" \
--header "authorization: Bearer ${ACCESS_TOKEN}" \
--header "data-partition-id: ${NAME}-${PARTITION}" \
--data-raw "{
\"kind\": \"osdu:wks:master-data--Well:1.0.0\",
\"query\": \"legal.legaltags:${NAME}-${PARTITION}-legal-tag-load\",
\"offset\": 0,
\"limit\": 1
}" | jq
# Master Data Wellbore (Expect 6484)
curl --request POST \
--url https://${NAME}.energy.azure.com/api/search/v2/query \
--header "content-type: application/json" \
--header "authorization: Bearer ${ACCESS_TOKEN}" \
--header "data-partition-id: ${NAME}-${PARTITION}" \
--data-raw "{
\"kind\": \"osdu:wks:master-data--Wellbore:1.0.0\",
\"query\": \"legal.legaltags:${NAME}-${PARTITION}-legal-tag-load\",
\"offset\": 0,
\"limit\": 1
}" | jq
```
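Comparing each `totalCount` with its expected value by eye is error-prone across seven queries. A minimal sketch of a comparison helper follows; the name `check_count` is hypothetical, and the helper only compares two values that are passed to it.

```bash
# check_count: hypothetical helper (not part of the tutorial) that compares a
# totalCount value with the expected number and prints a one-line verdict.
check_count() {
  local label="$1" expected="$2" actual="$3"
  if [ "${actual}" = "${expected}" ]; then
    echo "PASS ${label}: ${actual}"
  else
    echo "FAIL ${label}: expected ${expected}, got ${actual:-nothing}"
    return 1
  fi
}
```

To use it, replace the trailing `| jq` of a query with `| jq -r .totalCount`, capture the output in a variable such as `COUNT`, and then run, for example, `check_count "Master Data Well" 4947 "$COUNT"`.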