--- | |
title: Tutorial: Deploy Microsoft Energy Data Services and ingest OSDU community open data using Azure CLI
description: In this tutorial, you'll learn how to create a platform instance with the Azure CLI and load it with the OSDU community-contributed data set.
author: danielscholl | |
ms.author: dascholl | |
ms.service: energy | |
ms.topic: tutorial | |
ms.date: 9/15/2022 | |
ms.custom: template-tutorial | |
--- | |
# Tutorial: Deploy Microsoft Energy Data Services and ingest OSDU community open data using Azure CLI
## Overview and Prerequisites | |
In this tutorial, you'll learn how to create a data platform using the Azure CLI and ingest [Open Test Data](https://community.opengroup.org/osdu/platform/data-flow/data-loading/open-test-data). | |
This tutorial can be completed with the interactive experience offered through Azure Cloud Shell and knowledge of Azure CLI and bash scripts. | |
Use __ctrl-shift-v__ (__cmd-shift-v__ on macOS) to paste tutorial text into an [Azure Cloud Shell](https://docs.microsoft.com/en-us/azure/cloud-shell/overview) bash environment. | |
[![Launch Cloud Shell in a new window](https://docs.microsoft.com/en-us//cli/azure/media/cloud-shell-try-it/launch-cloud-shell.png)](https://shell.azure.com) | |
The tool [direnv](https://direnv.net/) provides an easy way to load environment variables. To install it in Cloud Shell, run the following commands.
```bash | |
# Export bin installation directory | |
export bin_dir=$HOME/.local/bin | |
# Install direnv | |
curl -sfL https://direnv.net/install.sh | bash | |
# Hook installation (the quoted 'EOF' keeps $(...) literal so it runs at shell startup)
cat >> ~/.bashrc <<'EOF'
eval "$(direnv hook bash)"
EOF
``` | |
You perform the following steps in this tutorial:
* Register an Application. | |
* Deploy the data platform. | |
* Retrieve Tokens. | |
* Add User. | |
* Add Legal Tag. | |
* Author a dataload Dockerfile and deployment. | |
* Execute the dataload. | |
* Validate the dataload. | |
## Shell variables | |
Shell variables store values for future use and can be used to pass values to command parameters. Shell variables allow for the reuse of commands, both on their own and in scripts. This tutorial uses shell variables for easier customization of command parameters. | |
The following shell variables are required: | |
| Name | Description | | |
| --------- | ----------- | | |
| RANDOM_NUMBER | Random 4-digit value used to keep resource names unique. |
| AZURE_TENANT | Azure Active Directory tenant ID. | | |
| AZURE_LOCATION | Azure region location that resources are deployed to. | | |
| NAME | Name to be used for the resources. | | |
| APPLICATION | Name for the application to be registered with the Microsoft Identity Platform. | | |
| PARTITION | Name of a data partition. | | |
```azurecli | |
# Create Shell Variables | |
RANDOM_NUMBER=$((RANDOM % 9000 + 1000))
AZURE_TENANT=$(az account show --query tenantId -otsv) | |
AZURE_LOCATION="eastus" | |
NAME="meds${RANDOM_NUMBER}" | |
APPLICATION="meds-${RANDOM_NUMBER}" | |
PARTITION="opendes" | |
# Persist variables to the .envrc file (direnv only loads exported variables)
cat >> .envrc <<EOF
export NAME=${NAME}
export DATA_PARTITION=${NAME}-${PARTITION}
export AZURE_TENANT=${AZURE_TENANT}
EOF
``` | |
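Later sections assume these variables are set in the current shell. As a minimal sketch, a check like the following could be run before each section to fail early when a value is missing. The `require_vars` helper is a hypothetical name used for illustration and is not part of the Azure CLI.

```bash
# require_vars is a hypothetical helper: it reports every named variable
# that is unset or empty and returns non-zero if any are missing.
require_vars() {
  local missing=0 name value
  for name in "$@"; do
    # Look up the value of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "Missing required variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example: check the variables defined in this section.
# require_vars AZURE_TENANT AZURE_LOCATION NAME APPLICATION PARTITION
```

The function only reads variables, so it is safe to run repeatedly, for example after reconnecting to Cloud Shell.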
## Register an Application with the Microsoft Identity Platform | |
The Microsoft Identity Platform performs identity and access management (IAM) for registered applications. The data platform integrates natively with Azure Identity Management. An [application registration](https://docs.microsoft.com/en-us/azure/active-directory/develop/quickstart-register-app) will make use of a delegated [User.Read](https://docs.microsoft.com/en-us/graph/api/user-get?view=graph-rest-1.0&tabs=http) API permission for the Microsoft Graph. | |
The following shell variables are created: | |
| Name | Description | | |
| --------- | ----------- | | |
| CLIENT_ID | The Application (client) ID. | | |
| CLIENT_SECRET | The Application (client) Secret. | | |
```azurecli | |
# Graph API User.Read (Delegated) | |
cat > manifest.json <<EOF | |
[ | |
{ | |
"resourceAppId": "00000003-0000-0000-c000-000000000000", | |
"resourceAccess": [ | |
{ | |
"id": "e1fe6dd8-ba31-4d61-89e7-88639da4683d", | |
"type": "Scope" | |
} | |
] | |
} | |
] | |
EOF | |
# Create Application | |
CLIENT_ID=$(az ad app create \ | |
--display-name ${APPLICATION}-app \ | |
--required-resource-accesses "manifest.json" \ | |
--query appId -otsv) && rm manifest.json | |
# Create Identity | |
az ad sp create --id $CLIENT_ID | |
# Create Secret | |
CLIENT_SECRET=$(az ad app credential reset --id $CLIENT_ID --query password -otsv) | |
# Persist variables to the .envrc file (direnv only loads exported variables)
cat >> .envrc <<EOF
export CLIENT_ID=$CLIENT_ID
export CLIENT_SECRET=$CLIENT_SECRET
EOF
``` | |
## Deploy the data platform | |
The data platform is a collection of microservice APIs deployed as a managed service offering compliant with the [Open Subsurface Data Universe](https://community.opengroup.org/osdu). | |
> Warning: This is a long running command and will take about 1 hour to complete. | |
```azurecli | |
# Create Resource Group | |
az group create \ | |
--name ${NAME}-rg \ | |
--location $AZURE_LOCATION -ojsonc | |
# Create Platform | |
az resource create \ | |
--name ${NAME} \ | |
--resource-group ${NAME}-rg \ | |
--location $AZURE_LOCATION \ | |
--resource-type Microsoft.OpenEnergyPlatform/energyservices \ | |
--is-full-object \ | |
--properties "{ \"location\": \"$AZURE_LOCATION\", \ | |
\"properties\": { \ | |
\"authAppId\": \"$CLIENT_ID\", \ | |
\"publicNetworkAccess\": \"Enabled\", \ | |
\"dataPartitionNames\": [{ \"name\": \"$PARTITION\" }] \ | |
} \ | |
}" -ojsonc | |
``` | |
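Because the deployment runs for a long time, the shell session may be interrupted before the command returns. The following is a sketch of one way to check on the deployment afterwards by polling. `wait_until` and `platform_ready` are hypothetical helper names, and the check assumes the resource reports a `properties.provisioningState` of `Succeeded` when it is ready.

```bash
# wait_until is a hypothetical helper: it re-runs a command until it succeeds,
# sleeping between attempts, and gives up after a maximum number of tries.
# Usage: wait_until <max_attempts> <sleep_seconds> <command> [args...]
wait_until() {
  local max_attempts="$1" delay="$2" attempt=1
  shift 2
  while ! "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "Gave up after ${attempt} attempts" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}

# Assumed check: succeeds once the resource reports a Succeeded provisioning state.
platform_ready() {
  local state
  state=$(az resource show \
    --name "${NAME}" \
    --resource-group "${NAME}-rg" \
    --resource-type Microsoft.OpenEnergyPlatform/energyservices \
    --query properties.provisioningState -otsv)
  [ "$state" = "Succeeded" ]
}

# Example: poll every 60 seconds for up to 90 minutes.
# wait_until 90 60 platform_ready
```

Keeping the retry loop separate from the check means the same loop can be reused for other long-running steps in this tutorial.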
## Get an API access token | |
The APIs can be accessed using the identity of the registered application. To obtain a token for that identity, send a curl request that uses the [OAuth 2.0 client credentials flow](https://docs.microsoft.com/en-us/azure/active-directory/develop/v2-oauth2-client-creds-grant-flow#request-an-access-token). The returned token represents the identity of the data platform service account.
The following shell variables will be created: | |
| Name | Description | | |
| --------- | ----------- | | |
| TOKEN | The service account bearer token. | | |
```azurecli | |
# Get Bearer Token | |
TOKEN=$(curl --request POST \ | |
--url https://login.microsoftonline.com/${AZURE_TENANT}/oauth2/v2.0/token \ | |
--header "content-type: application/x-www-form-urlencoded" \ | |
--data "grant_type=client_credentials" \ | |
--data "client_id=${CLIENT_ID}" \ | |
--data "client_secret=${CLIENT_SECRET}" \ | |
--data "scope=${CLIENT_ID}/.default" |jq -r .access_token) | |
``` | |
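Access tokens issued by this endpoint are JSON Web Tokens, so the claims can be inspected locally when a request is rejected. The sketch below decodes the payload segment without verifying the signature. `jwt_payload` is a hypothetical helper name, and the sketch assumes a `base64` command that accepts `-d`, as in the Cloud Shell bash environment.

```bash
# jwt_payload is a hypothetical helper: it prints the decoded payload (claims)
# segment of a JWT. It does not verify the signature.
jwt_payload() {
  local payload pad
  # A JWT is header.payload.signature; take the second dot-separated segment
  # and convert the base64url alphabet back to standard base64.
  payload=$(printf '%s' "$1" | cut -d '.' -f 2 | tr '_-' '/+')
  # base64url omits padding; restore it so the length is a multiple of 4.
  pad=$(( (4 - ${#payload} % 4) % 4 ))
  while [ "$pad" -gt 0 ]; do
    payload="${payload}="
    pad=$((pad - 1))
  done
  printf '%s' "$payload" | base64 -d
}

# Example: show the audience and expiry claims of the service account token.
# jwt_payload "$TOKEN" | jq '{aud, exp}'
```

This is for inspection only. Never treat a locally decoded token as validated.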
## Add a user to the data platform | |
The data platform has native integration with [Azure Identity Management](https://docs.microsoft.com/en-us/azure/security/fundamentals/identity-management-overview) for authentication. However, authorization checks are performed within the data platform. The service account identity token is used to add a user to the data platform and assign it the owner role. | |
```azurecli | |
# Retrieve the object ID (oid) for the signed in user | |
AZURE_USER=$(az ad signed-in-user show --query id -otsv) | |
# Add user to the data platform
curl --request POST \ | |
--url https://${NAME}.energy.azure.com/api/entitlements/v2/groups/users@${NAME}-${PARTITION}.dataservices.energy/members \ | |
--header "accept: application/json" \ | |
--header "content-type: application/json" \ | |
--header "authorization: Bearer ${TOKEN}" \ | |
--header "data-partition-id: ${NAME}-${PARTITION}" \ | |
--data "{\"email\": \"${AZURE_USER}\",\"role\": \"MEMBER\"}" | jq | |
# Assign owner role to user | |
curl --request POST \ | |
--url https://${NAME}.energy.azure.com/api/entitlements/v2/groups/users.datalake.ops@${NAME}-${PARTITION}.dataservices.energy/members \ | |
--header "accept: application/json" \ | |
--header "content-type: application/json" \ | |
--header "authorization: Bearer ${TOKEN}" \ | |
--header "data-partition-id: ${NAME}-${PARTITION}" \ | |
--data "{\"email\": \"${AZURE_USER}\",\"role\": \"MEMBER\"}" | jq | |
# List user permissions (Optional) | |
curl --request GET \ | |
--url "https://${NAME}.energy.azure.com/api/entitlements/v2/members/${AZURE_USER}/groups?type=none" \ | |
--header "content-type: application/json" \ | |
--header "authorization: Bearer ${TOKEN}" \ | |
--header "data-partition-id: ${NAME}-${PARTITION}" | jq | |
``` | |
## Authenticate a user | |
The data platform supports [OpenID Connect authentication](https://docs.microsoft.com/en-us/azure/active-directory/fundamentals/auth-oidc) with Azure Active Directory. OIDC enables apps to securely acquire an access token that can be used to access resources secured by the Microsoft Identity platform. Apps can refresh tokens to get other access tokens for the signed in user. | |
This section will assist in setting up a [Storage Account static web site](https://docs.microsoft.com/en-us/azure/storage/blobs/storage-blob-static-website) to host a web page and obtain [OAuth 2.0 Authorization Codes](https://docs.microsoft.com/en-us/azure/active-directory/develop/v2-oauth2-auth-code-flow). | |
The following shell variables will be created: | |
| Name | Description | | |
| --------- | ----------- | | |
| WEB_ENDPOINT | The FQDN of the Static Web Site. | | |
| AUTH_CODE | The OAuth 2.0 Authorization Code. | | |
| REFRESH_TOKEN | The OAuth 2.0 Refresh Token. | | |
```azurecli | |
# Create Storage Account | |
az storage account create \ | |
--name ${NAME} \ | |
--resource-group ${NAME}-rg \ | |
--location $AZURE_LOCATION \ | |
--sku Standard_LRS -ojsonc | |
# Configure for Static Web Site | |
az storage blob service-properties update \ | |
--account-name ${NAME} \ | |
--static-website \ | |
--404-document error.html \ | |
--index-document index.html -ojsonc | |
# Retrieve the endpoint | |
WEB_ENDPOINT=$(az storage account show \ | |
--name ${NAME} \ | |
--resource-group ${NAME}-rg \ | |
--query "primaryEndpoints.web" \ | |
--output tsv) | |
``` | |
__Create Web Page__ | |
```azurecli | |
# Working Directory | |
mkdir -p ./dataload | |
# Create web page | |
cat > dataload/index.html << EOF | |
<!DOCTYPE html> | |
<html lang="en"> | |
<head> | |
<meta charset="utf-8"> | |
<meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no"> | |
<meta http-equiv="x-ua-compatible" content="ie=edge"> | |
<title>OAuth Tool</title> | |
<!-- Font Awesome --> | |
<link rel="stylesheet" href="https://use.fontawesome.com/releases/v5.8.2/css/all.css"> | |
<!-- Bootstrap core CSS --> | |
<link href="https://maxcdn.bootstrapcdn.com/bootstrap/4.0.0/css/bootstrap.min.css" rel="stylesheet"> | |
<style type="text/css"> | |
textarea { | |
margin-bottom: 5px; | |
} | |
</style> | |
</head> | |
<body> | |
<div id="app"> | |
<form id="oauth-form" @submit.prevent="getToken" class="text-center border border-light p-5" action="#!"> | |
<p class="h2 mb-4">Microsoft Identity Platform (v2.0)</p> | |
<div class="form-group"> | |
<div class="form-row"> | |
<div class="form-group col-md-6"> | |
<label class="float-left" for="clientId">ClientId</label> | |
<input type="text" class="form-control" id="clientId" v-model="clientId"> | |
</div> | |
<div class="form-group col-md-6"> | |
<label class="float-left" for="tenantId">TenantId</label> | |
<input type="text" class="form-control" id="tenantId" v-model="tenantId"> | |
</div> | |
</div> | |
<div class="form-row"> | |
<div class="col-md-6 mb-3"> | |
<label class="float-left" for="redirectUrl">RedirectUrl</label>
<input type="text" class="form-control" id="redirectUrl" v-model="redirectUrl"> | |
</div> | |
<div class="col-md-3 mb-3"> | |
<label class="float-left" for="responseType">ResponseType</label> | |
<input type="text" class="form-control" id="responseType" v-model="responseType"> | |
</div> | |
<div class="col-md-3 mb-3"> | |
<label class="float-left" for="responseMode">ResponseMode</label> | |
<input type="text" class="form-control" id="responseMode" v-model="responseMode"> | |
</div> | |
</div> | |
<div class="form-row"> | |
<div class="col-md-12 mb-3"> | |
<label class="float-left" for="scope">Scope</label> | |
<input type="text" class="form-control" id="scope" v-model="scope"> | |
</div> | |
</div> | |
<div class="form-row"> | |
<div class="col-md-12 mb-1"> | |
<a :href="signOutUrl" class="btn btn-link float-left" v-if="authorizationCode">Logout</a>
<a :href="authorizeUrl" class="btn btn-primary float-right">Authorize</a>
</div> | |
</div> | |
</div> | |
<hr /> | |
<div class="form-group shadow-textarea"> | |
<div class="form-row"> | |
<div class="col-md-4 mb-4"> | |
<label class="float-left" for="authorizationCode">ResponseType: code</label>
<textarea id="authorizationCode" v-model="authorizationCode" class="form-control z-depth-1" | |
rows="10"></textarea> | |
</div> | |
<div class="col-md-4 mb-4"> | |
<label class="float-left" for="accessToken">ResponseType: token</label> | |
<textarea id="accessToken" v-model="accessToken" class="form-control z-depth-1" rows="10"></textarea> | |
<a @click="decodeAccess()" class="btn btn-info btn-sm float-right" v-if="accessToken">Decode</a> | |
</div> | |
<div class="col-md-4 mb-4"> | |
<label class="float-left" for="idToken">ResponseType: id_token</label> | |
<textarea id="idToken" v-model="idToken" class="form-control z-depth-1" rows="10"></textarea> | |
<a @click="decodeId()" class="btn btn-info btn-sm float-right" v-if="idToken">Decode</a> | |
</div> | |
</div> | |
</div> | |
</form> | |
</div> | |
<!-- SCRIPTS --> | |
<!-- JQuery --> | |
<script type="text/javascript" src="https://code.jquery.com/jquery-3.2.1.slim.min.js"></script> | |
<!-- Bootstrap core JavaScript --> | |
<script type="text/javascript" src="https://maxcdn.bootstrapcdn.com/bootstrap/4.0.0/js/bootstrap.min.js"></script> | |
<!-- Vue JavaScript --> | |
<script type="text/javascript" src="https://cdn.jsdelivr.net/npm/vue/dist/vue.js"></script> | |
<script> | |
var app = new Vue({ | |
el: '#oauth-form', | |
data: { | |
tenantId: '${AZURE_TENANT}', | |
clientId: '${CLIENT_ID}', | |
redirectUrl: '${WEB_ENDPOINT}', | |
responseType: 'code', | |
responseMode: 'fragment', | |
scope: '${CLIENT_ID}/.default openid profile offline_access', | |
authorizationCode: null, | |
accessToken: null, | |
idToken: null | |
}, | |
computed: { | |
authorizeUrl: function () { | |
return "https://login.microsoftonline.com/" + this.tenantId + | |
"/oauth2/v2.0/authorize?" + | |
"client_id=" + this.clientId + | |
"&response_type=" + this.responseType + | |
"&redirect_uri=" + this.redirectUrl + | |
"&response_mode=" + this.responseMode + | |
"&scope=" + this.scope + | |
"&state=12345" + | |
"&nonce=dummy123"; | |
}, | |
signOutUrl: function () {
return "https://login.microsoftonline.com/" + this.tenantId + "/oauth2/v2.0/logout";
},
redirect: function () {
return location.protocol + '//' + location.host + location.pathname
}
},
methods: {
// Defined as methods, not computed properties, because they are invoked from @click handlers.
decodeId: function () {
window.open("https://jwt.ms/#id_token=" + this.idToken, "_blank");
},
decodeAccess: function () {
window.open("https://jwt.ms/#id_token=" + this.accessToken, "_blank");
}
},
beforeMount: function () { | |
if (window.location.hash) { | |
var params = window.location.hash.substr(1).split('&').reduce(function (result, item) { | |
var parts = item.split('='); | |
result[parts[0]] = parts[1]; | |
return result; | |
}, {}); | |
this.authorizationCode = params['code']; | |
this.accessToken = params['access_token']; | |
this.idToken = params['id_token']; | |
} | |
} | |
}) | |
</script> | |
</body> | |
</html> | |
EOF | |
# Publish web page | |
az storage blob upload-batch \ | |
--account-name ${NAME} \ | |
--source dataload \ | |
--destination '$web' -ojsonc && rm -rf dataload/index.html | |
``` | |
__Update Application Registration__ | |
A [redirect URI](https://docs.microsoft.com/en-us/azure/active-directory/develop/reply-url), or reply URL, is the location where the authorization server sends the user once the app has been successfully authorized and granted an authorization code or access token. The authorization server sends the code or token to the redirect URI, so it's important to register the correct location as part of the app registration. | |
```azurecli | |
# Update app registration | |
az ad app update --id $CLIENT_ID --web-redirect-uris $WEB_ENDPOINT -ojsonc | |
# Display Web Endpoint | |
echo $WEB_ENDPOINT | |
``` | |
__Retrieve Authorization Code__ | |
The website now requests authorization codes for users. Open the website, select the Authorize button, and an authorization code should be returned. Copy the authorization code to the clipboard and save it as a variable.
> Note: An authorization code can be used only once and typically has a lifetime of about 60 seconds.
```azurecli | |
AUTH_CODE="<your_code>" | |
``` | |
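The web page requests the code with a response mode of `fragment`, so the code also appears after the `#` in the browser address bar once sign-in completes. As an alternative to copying it from the page, the following sketch extracts it from a pasted redirect URL. `auth_code_from_url` is a hypothetical helper name.

```bash
# auth_code_from_url is a hypothetical helper: it prints the value of the
# "code" parameter from a redirect URL that uses response_mode=fragment.
auth_code_from_url() {
  local fragment
  # Keep only the part after '#'.
  fragment="${1#*\#}"
  # Put each parameter on its own line and print the one named "code".
  printf '%s\n' "$fragment" | tr '&' '\n' | sed -n 's/^code=//p'
}

# Example: paste the full URL from the address bar between the quotes.
# AUTH_CODE=$(auth_code_from_url "<redirect_url>")
```

The URL is quoted so that the `&` characters are not interpreted by the shell.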
__Retrieve Refresh Token__ | |
Now that an authorization code has been acquired, it can be redeemed for a refresh token. Refresh tokens have a much longer lifetime of about 90 days and can be used to create access tokens, which have a lifetime of about 1 hour.
```azurecli | |
# Request a refresh token | |
REFRESH_TOKEN=$(curl --request POST \ | |
--url https://login.microsoftonline.com/${AZURE_TENANT}/oauth2/v2.0/token \ | |
--header "content-type: application/x-www-form-urlencoded" \ | |
--data "grant_type=authorization_code" \ | |
--data "redirect_uri=${WEB_ENDPOINT}" \ | |
--data "client_id=${CLIENT_ID}" \ | |
--data "client_secret=${CLIENT_SECRET}" \ | |
--data "scope=${CLIENT_ID}/.default openid profile offline_access" \ | |
--data code=${AUTH_CODE} | jq -r .refresh_token) | |
# Persist variables to the .envrc file (direnv only loads exported variables)
cat >> .envrc <<EOF
export REFRESH_TOKEN=$REFRESH_TOKEN
EOF
``` | |
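When the token request fails, for example because the authorization code has expired, the response has no `refresh_token` field and `jq -r` prints the literal string `null`. A small guard, sketched here with the hypothetical helper name `token_is_set`, can catch that before the value is used in later steps.

```bash
# token_is_set is a hypothetical helper: it succeeds only when the value
# looks like a real token rather than an empty string or jq's "null".
token_is_set() {
  [ -n "${1:-}" ] && [ "$1" != "null" ]
}

# Example: warn when the token request did not return a usable value.
# token_is_set "$REFRESH_TOKEN" || echo "Token request failed; request a new authorization code." >&2
```

The same check applies to the `TOKEN` and `ACCESS_TOKEN` variables, which are extracted with `jq -r` in the same way.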
## Add a Legal Tag in the data platform
A [legal tag](https://community.opengroup.org/osdu/platform/security-and-compliance/legal/-/blob/master/docs/tutorial/ComplianceService.md#what-is-a-legaltag) represents the legal status of data in the platform. It is a collection of properties that governs how the data can be consumed and ingested and is required for data ingestion. | |
The following shell variables will be created: | |
| Name | Description | | |
| --------- | ----------- | | |
| ACCESS_TOKEN | The OAuth 2.0 Access Token. | | |
```azurecli | |
# Request Access Token | |
ACCESS_TOKEN=$(curl --request POST \ | |
--url https://login.microsoftonline.com/${AZURE_TENANT}/oauth2/v2.0/token \ | |
--header "content-type: application/x-www-form-urlencoded" \ | |
--data "grant_type=refresh_token" \ | |
--data "client_id=${CLIENT_ID}" \ | |
--data "client_secret=$CLIENT_SECRET" \ | |
--data refresh_token=$REFRESH_TOKEN \ | |
--data "scope=${CLIENT_ID}/.default openid profile offline_access" |jq -r .access_token) | |
# Create Legal Tag | |
curl --request POST \ | |
--url https://${NAME}.energy.azure.com/api/legal/v1/legaltags \ | |
--header "accept: application/json" \ | |
--header "authorization: Bearer ${ACCESS_TOKEN}" \
--header "content-type: application/json" \ | |
--header "data-partition-id: ${NAME}-${PARTITION}" \ | |
--data-raw '{ | |
"name": "legal-tag-load", | |
"description": "Open Data Set Legal Contract", | |
"properties": { | |
"contractId": "A1234", | |
"countryOfOrigin": ["US"], | |
"dataType": "Public Domain Data", | |
"expirationDate": "2099-01-25", | |
"exportClassification": "EAR99", | |
"originator": "Contoso", | |
"personalData": "No Personal Data", | |
"securityClassification": "Public" | |
} | |
}' | |
# Retrieve Legal Tag (Optional) | |
curl --request GET \ | |
--url "https://${NAME}.energy.azure.com/api/legal/v1/legaltags/${NAME}-${PARTITION}-legal-tag-load" \ | |
--header "authorization: Bearer ${ACCESS_TOKEN}" \
--header "data-partition-id: ${NAME}-${PARTITION}" |jq | |
``` | |
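Note that the tag is created with the short name `legal-tag-load` but retrieved under a name prefixed with the data partition ID. The sketch below makes that relationship explicit. `legal_tag_name` is a hypothetical helper name, and the naming pattern is taken from the requests in this tutorial rather than from a service specification.

```bash
# legal_tag_name is a hypothetical helper: it prints the full legal tag name
# used in this tutorial, which is <instance>-<partition>-<short tag name>.
legal_tag_name() {
  printf '%s-%s-%s\n' "$1" "$2" "$3"
}

# Example: the name used when retrieving the tag and when loading data.
# LEGAL_TAG=$(legal_tag_name "$NAME" "$PARTITION" "legal-tag-load")
```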
## Ingest the Test Data | |
The community open test data set includes a collection of Python scripts that submit jobs to the [ingestion workflow](https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-workflow#introduction). The data set is large and the scripts have specific installation requirements. A custom Docker image, configured to generate the manifests for this data platform instance, will be built and stored in a container registry, and a [managed identity](https://docs.microsoft.com/en-us/azure/container-registry/container-registry-authentication-managed-identity) will be created for image retrieval.
The following shell variables will be created: | |
| Name | Description | | |
| --------- | ----------- | | |
| REGISTRY_ID | The resource id of the Container Registry. | | |
| USER_PRINCIPAL | The principal id of the managed user identity. | | |
```azurecli | |
# Create Registry | |
REGISTRY_ID=$(az acr create \ | |
--name ${NAME} \ | |
--resource-group ${NAME}-rg \ | |
--sku Basic \ | |
--query id --output tsv) | |
# Create Managed Identity | |
USER_PRINCIPAL=$(az identity create \ | |
--name ${NAME} \ | |
--resource-group ${NAME}-rg \ | |
--query principalId --output tsv) | |
# Create Role Assignment | |
az role assignment create --assignee $USER_PRINCIPAL --scope $REGISTRY_ID --role acrpull -ojsonc | |
``` | |
### Build the loader image | |
Using the Azure Container Registry [Build Tasks](https://docs.microsoft.com/en-us/azure/container-registry/container-registry-tasks-overview) capability, the Docker image can be built and stored directly in the registry.
> Warning: This is a long running command and can take up to 20 minutes to complete. | |
```azurecli | |
# Create Dockerfile | |
cat <<"EOF" > dataload/Dockerfile | |
FROM alpine:latest as builder | |
RUN apk --update add --virtual build-dependencies --no-cache wget tar | |
RUN apk --update add libc6-compat ca-certificates | |
WORKDIR /download | |
RUN wget -O azcopyv10.tar.gz https://aka.ms/downloadazcopy-v10-linux && \ | |
tar -xzvf azcopyv10.tar.gz --strip-components=1 && \ | |
mv azcopy /usr/local/bin/azcopy && \ | |
rm -rf azcopy* && rm NOTICE.txt && \ | |
apk del build-dependencies | |
RUN azcopy copy 'https://azureingestiondata.blob.core.windows.net/tno-datasets' "." --recursive | |
RUN echo "**** Download TNO Data ****" && \ | |
wget https://community.opengroup.org/osdu/platform/data-flow/data-loading/open-test-data/-/archive/Azure/M8/open-test-data-Azure-M8.tar.gz | |
RUN echo "**** Extract R3 Loader ****" && \ | |
tar -xzvf open-test-data-Azure-M8.tar.gz --strip-components=2 open-test-data-Azure-M8/rc--3.0.0 && \ | |
rm open-test-data-Azure-M8.tar.gz && \ | |
chmod +x 6-data-load-scripts/scripts/*.sh | |
FROM ubuntu:20.04 as base | |
ARG USERNAME=app | |
ARG USER_UID=1000 | |
ARG USER_GID=$USER_UID | |
RUN apt-get update \ | |
&& apt-get -y --no-install-recommends install \ | |
python3 \ | |
python3-pip \ | |
jq \ | |
&& apt-get clean \ | |
&& rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/* | |
RUN ln -s /usr/bin/python3 /usr/bin/python | |
WORKDIR /app | |
COPY --from=builder /download ./ | |
RUN groupadd --gid $USER_GID $USERNAME \ | |
&& useradd --uid $USER_UID --gid $USER_GID -m $USERNAME \ | |
# | |
# [Optional] Add sudo support. Omit if you don't need to install software after connecting. | |
&& apt-get update \ | |
&& apt-get install -y sudo \ | |
&& echo $USERNAME ALL=\(root\) NOPASSWD:ALL > /etc/sudoers.d/$USERNAME \ | |
&& chmod 0440 /etc/sudoers.d/$USERNAME \ | |
&& chown -R $USERNAME:$USER_GID /app | |
USER $USERNAME | |
WORKDIR /app/2-scripts/load_manifest_scripts | |
RUN echo "**** Install Python Requirements ****" && \ | |
pip install --upgrade pip && \ | |
pip install -r requirements.txt | |
WORKDIR /app/6-data-load-scripts/azure | |
RUN echo "**** Install Python Requirements ****" && \ | |
pip install -r requirements.txt | |
WORKDIR /app/6-data-load-scripts/scripts | |
ARG TENANT_ID | |
ARG NAME="platform" | |
ARG PARTITION="opendes" | |
ENV TENANT_ID=$TENANT_ID | |
ENV DATA_PARTITION=$NAME-$PARTITION | |
RUN echo "**** Generate Manifests ****" && \ | |
./generate-manifests.sh -p ${DATA_PARTITION} | |
ENV PATH="/app/6-data-load-scripts/scripts:$PATH" | |
ENV MANIFESTS_REPO_DIRECTORY="../.." | |
ENV WORK_PRODUCT_DATASETS_DIRECTORY="../../tno-datasets/wpc-datasets" | |
ENV SEQUENCE_FILE="ingestion_sequence.json" | |
ENV DOMAIN="contoso.com" | |
ENV LOGIN_ENDPOINT="https://login.microsoftonline.com/$TENANT_ID/oauth2/v2.0/token" | |
ENV OSDU_ENDPOINT="https://$NAME.energy.azure.com" | |
ENV LEGAL_TAG="$DATA_PARTITION-legal-tag-load" | |
CMD /bin/bash | |
EOF | |
# Build the image | |
az acr build \ | |
--registry $NAME \ | |
--file dataload/Dockerfile \ | |
--image data-load \ | |
--build-arg TENANT_ID=$AZURE_TENANT \ | |
--build-arg NAME=$NAME \ | |
./dataload && rm dataload/Dockerfile | |
``` | |
### Execute the loader image | |
Azure can run containers on demand, which works well for automation tasks. The data loader image can now be deployed as an [Azure Container Instance](https://docs.microsoft.com/en-us/azure/container-instances/container-instances-overview) that submits the generated manifests to the data platform.
> Warning: This is a long running command and can take up to 30 minutes to complete. | |
```azurecli | |
# Create Deploymentfile | |
cat <<EOF > dataload/deploy.yaml | |
apiVersion: 2021-10-01
location: $AZURE_LOCATION
name: $NAME-loader
identity:
  type: UserAssigned
  userAssignedIdentities:
    $(az identity show --name ${NAME} --resource-group ${NAME}-rg --query id --output tsv): {}
properties:
  imageRegistryCredentials:
  - server: $NAME.azurecr.io
    identity: $(az identity show --name ${NAME} --resource-group ${NAME}-rg --query id --output tsv)
  containers:
  - name: masterdata
    properties:
      command:
      - data-loader.sh
      environmentVariables:
      - name: 'CLIENT_ID'
        value: '$CLIENT_ID'
      - name: 'CLIENT_SECRET'
        secureValue: '$CLIENT_SECRET'
      - name: 'REFRESH_TOKEN'
        value: '$REFRESH_TOKEN'
      image: $NAME.azurecr.io/data-load:latest
      ports: []
      resources:
        requests:
          cpu: 1.0
          memoryInGB: 2
  - name: filedata
    properties:
      command:
      - wpc-datasets-loader.sh
      environmentVariables:
      - name: 'CLIENT_ID'
        value: '$CLIENT_ID'
      - name: 'CLIENT_SECRET'
        secureValue: '$CLIENT_SECRET'
      - name: 'REFRESH_TOKEN'
        value: '$REFRESH_TOKEN'
      image: $NAME.azurecr.io/data-load:latest
      ports: []
      resources:
        requests:
          cpu: 1.0
          memoryInGB: 2
  - name: wpcdata
    properties:
      command:
      - data-loader.sh
      environmentVariables:
      - name: 'MANIFESTS_REPO_DIRECTORY'
        value: '../../4-instances/TNO'
      - name: 'SEQUENCE_FILE'
        value: 'wpc_sequence.json'
      - name: 'CLIENT_ID'
        value: '$CLIENT_ID'
      - name: 'CLIENT_SECRET'
        secureValue: '$CLIENT_SECRET'
      - name: 'REFRESH_TOKEN'
        value: '$REFRESH_TOKEN'
      image: $NAME.azurecr.io/data-load:latest
      ports: []
      resources:
        requests:
          cpu: 1.0
          memoryInGB: 2
  osType: Linux
  restartPolicy: Never
tags: null
type: Microsoft.ContainerInstance/containerGroups
EOF | |
# Run the container instance | |
az container create \ | |
--resource-group ${NAME}-rg \ | |
--file dataload/deploy.yaml -ojsonc && rm dataload/deploy.yaml | |
# Tail the Log file | |
az container logs --name ${NAME}-loader --resource-group ${NAME}-rg --follow
``` | |
### Validate the loaded data | |
Now that the data has been loaded, it can be validated with requests to the [search API](https://community.opengroup.org/osdu/platform/system/search-service/-/blob/master/docs/tutorial/SearchService.md). Each query filters on the legal tag and sets `limit` to 1. Compare the `totalCount` field in the response with the expected count noted in the comment above each request.
The following shell variables will be created: | |
| Name | Description | | |
| --------- | ----------- | | |
| ACCESS_TOKEN | The OAuth 2.0 Access Token. | | |
```azurecli | |
# Request Access Token | |
ACCESS_TOKEN=$(curl --request POST \ | |
--url https://login.microsoftonline.com/${AZURE_TENANT}/oauth2/v2.0/token \ | |
--header "content-type: application/x-www-form-urlencoded" \ | |
--data "grant_type=refresh_token" \ | |
--data "client_id=${CLIENT_ID}" \ | |
--data "client_secret=$CLIENT_SECRET" \ | |
--data "refresh_token=$REFRESH_TOKEN" \ | |
--data "scope=${CLIENT_ID}/.default openid profile offline_access" |jq -r .access_token) | |
# Generic Files (Expect 10000) | |
curl --request POST \ | |
--url https://${NAME}.energy.azure.com/api/search/v2/query \ | |
--header "content-type: application/json" \ | |
--header "authorization: Bearer ${ACCESS_TOKEN}" \ | |
--header "data-partition-id: ${NAME}-${PARTITION}" \ | |
--data-raw "{ | |
\"kind\": \"osdu:wks:dataset--File.Generic:1.0.0\", | |
\"query\": \"legal.legaltags:${NAME}-${PARTITION}-legal-tag-load\", | |
\"offset\": 0, | |
\"limit\": 1 | |
}" | jq | |
# Reference Data Manifests (Expect 92) | |
curl --request POST \ | |
--url https://${NAME}.energy.azure.com/api/search/v2/query \ | |
--header "content-type: application/json" \ | |
--header "authorization: Bearer ${ACCESS_TOKEN}" \ | |
--header "data-partition-id: ${NAME}-${PARTITION}" \ | |
--data-raw "{ | |
\"kind\": \"osdu:wks:reference-data--*:1.0.0\", | |
\"query\": \"legal.legaltags:${NAME}-${PARTITION}-legal-tag-load\", | |
\"offset\": 0, | |
\"limit\": 1 | |
}" | jq | |
# Master Data Field (Expect 422) | |
curl --request POST \ | |
--url https://${NAME}.energy.azure.com/api/search/v2/query \ | |
--header "content-type: application/json" \ | |
--header "authorization: Bearer ${ACCESS_TOKEN}" \ | |
--header "data-partition-id: ${NAME}-${PARTITION}" \ | |
--data-raw "{ | |
\"kind\": \"osdu:wks:master-data--Field:1.0.0\", | |
\"query\": \"legal.legaltags:${NAME}-${PARTITION}-legal-tag-load\", | |
\"offset\": 0, | |
\"limit\": 1 | |
}" | jq | |
# Master Data GeoPoliticalEntities (Expect 406) | |
curl --request POST \ | |
--url https://${NAME}.energy.azure.com/api/search/v2/query \ | |
--header "content-type: application/json" \ | |
--header "authorization: Bearer ${ACCESS_TOKEN}" \ | |
--header "data-partition-id: ${NAME}-${PARTITION}" \ | |
--data-raw "{ | |
\"kind\": \"osdu:wks:master-data--GeoPoliticalEntity:1.0.0\", | |
\"query\": \"legal.legaltags:${NAME}-${PARTITION}-legal-tag-load\", | |
\"offset\": 0, | |
\"limit\": 1 | |
}" | jq | |
# Master Data Organisation (Expect 213) | |
curl --request POST \ | |
--url https://${NAME}.energy.azure.com/api/search/v2/query \ | |
--header "content-type: application/json" \ | |
--header "authorization: Bearer ${ACCESS_TOKEN}" \ | |
--header "data-partition-id: ${NAME}-${PARTITION}" \ | |
--data-raw "{ | |
\"kind\": \"osdu:wks:master-data--Organisation:1.0.0\", | |
\"query\": \"legal.legaltags:${NAME}-${PARTITION}-legal-tag-load\", | |
\"offset\": 0, | |
\"limit\": 1 | |
}" | jq | |
# Master Data Well (Expect 4947) | |
curl --request POST \ | |
--url https://${NAME}.energy.azure.com/api/search/v2/query \ | |
--header "content-type: application/json" \ | |
--header "authorization: Bearer ${ACCESS_TOKEN}" \
--header "data-partition-id: ${NAME}-${PARTITION}" \ | |
--data-raw "{ | |
\"kind\": \"osdu:wks:master-data--Well:1.0.0\", | |
\"query\": \"legal.legaltags:${NAME}-${PARTITION}-legal-tag-load\", | |
\"offset\": 0, | |
\"limit\": 1 | |
}" | jq | |
# Master Data Wellbore (Expect 6484) | |
curl --request POST \ | |
--url https://${NAME}.energy.azure.com/api/search/v2/query \ | |
--header "content-type: application/json" \ | |
--header "authorization: Bearer ${ACCESS_TOKEN}" \ | |
--header "data-partition-id: ${NAME}-${PARTITION}" \ | |
--data-raw "{ | |
\"kind\": \"osdu:wks:master-data--Wellbore:1.0.0\", | |
\"query\": \"legal.legaltags:${NAME}-${PARTITION}-legal-tag-load\", | |
\"offset\": 0, | |
\"limit\": 1 | |
}" | jq | |
``` |
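The validation requests above differ only in the `kind` value and the expected count, so the comparison can be scripted. The following is a minimal sketch with two hypothetical helpers: `search_body` builds the request body, and `total_count` reads the `totalCount` field from a response without requiring `jq`. It assumes the response contains `totalCount` as a plain integer field.

```bash
# search_body is a hypothetical helper: it prints the JSON body for a search
# query of one kind, filtered on the legal tag, returning a single record.
search_body() {
  local kind="$1" legal_tag="$2"
  printf '{"kind": "%s", "query": "legal.legaltags:%s", "offset": 0, "limit": 1}\n' \
    "$kind" "$legal_tag"
}

# total_count is a hypothetical helper: it reads a search response on stdin
# and prints the number in its totalCount field.
total_count() {
  sed -n 's/.*"totalCount"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Example: compare the Well count with the expected value.
# count=$(curl --silent --request POST \
#   --url "https://${NAME}.energy.azure.com/api/search/v2/query" \
#   --header "content-type: application/json" \
#   --header "authorization: Bearer ${ACCESS_TOKEN}" \
#   --header "data-partition-id: ${NAME}-${PARTITION}" \
#   --data-raw "$(search_body "osdu:wks:master-data--Well:1.0.0" "${NAME}-${PARTITION}-legal-tag-load")" \
#   | total_count)
# [ "$count" = "4947" ] && echo "Well count matches" || echo "Well count is ${count}, expected 4947"
```

Counts can lag behind ingestion while indexing completes, so a mismatch shortly after the load finishes is worth rechecking before investigating further.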