[CheatSheet] Azure

Azure


Helloworld

  1. Install azure-cli: pip install azure-cli
  2. Log in: az login; to clear cached credentials: az logout or az account clear

If azure-cli was installed via pip, run . az.completion.sh to enable auto-completion.
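
After az login, Python code can reuse the CLI credential through azure-identity. A minimal sketch (assumes pip install azure-identity; the management scope URL is only an example):

from azure.identity import DefaultAzureCredential

# DefaultAzureCredential falls back to the Azure CLI login created by `az login`
credential = DefaultAzureCredential()
token = credential.get_token("https://management.azure.com/.default")  # ARM scope (example)
print(token.expires_on)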

Subscription

A logical container representing a set of services and resources.

  1. List subscription: az account list
  2. Set subscription: az account set --subscription "your subscription name"
  3. Show current subscription: az account show --output table|json|yaml

A resource is a single unit of compute, storage, or networking (for example, a network interface). A resource group is a logical container for Azure resources. A service is a higher-level concept representing a collection of related resources and functionality.
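
As an illustration of this hierarchy, a hedged sketch that lists the resource groups in a subscription (assumes pip install azure-identity azure-mgmt-resource; <SUBSCRIPTION_ID> is a placeholder):

from azure.identity import DefaultAzureCredential
from azure.mgmt.resource import ResourceManagementClient

# One client per subscription; resource groups are the containers for resources
client = ResourceManagementClient(DefaultAzureCredential(), "<SUBSCRIPTION_ID>")
for rg in client.resource_groups.list():
    print(rg.name, rg.location)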

Services:

  • Azure Virtual Machines
  • Azure App Service
  • Azure SQL Database
  • Azure Storage

IoT

Overview of device management with Microsoft Azure IoT Hub | Microsoft Learn

First, you need to create an Azure IoT Hub instance and configure IoT Edge devices. Follow the official documentation to set up the IoT Hub and IoT Edge: Azure IoT Hub and IoT Edge setup

graph TD
    A[Plan] --> B[Provision]
    B --> C[Configure]
    C --> D[Monitor]
    D --> E[Retire]

    A --> A1[1. Create metadata]
    A --> A2[2. Group devices]
    A --> A3[3. Device twin to store metadata: \n tags and properties]

    B --> B1[1. Create flexible device \n identities and credentials]
    B --> B2[2. Report their capabilities and \n conditions through device twin]

    C --> C1[1. Changes and firmware updates to devices]
    C --> C2[2. `desired` and `direct methods` or \n `broadcast jobs`]

    D --> D1[1. Device twin to report real-time operational conditions]

    E --> E1[1. Device twin to maintain device info]
    E --> E2[2. IoT Hub registry for securely revoking \ndevice credentials and identities]
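
A hedged Python sketch of the device-twin steps above, using the azure-iot-hub service SDK to read a twin and patch its tags and desired properties (assumes pip install azure-iot-hub; the connection string, device ID, and tag/property names are placeholders):

from azure.iot.hub import IoTHubRegistryManager
from azure.iot.hub.models import Twin, TwinProperties

registry = IoTHubRegistryManager("<IOTHUB_CONNECTION_STRING>")
twin = registry.get_twin("device-0")

# Patch tags (plan/group devices) and desired properties (configure)
patch = Twin(
    tags={"location": {"region": "US"}},
    properties=TwinProperties(desired={"telemetryConfig": {"sendFrequency": "5m"}}),
)
registry.update_twin("device-0", patch, twin.etag)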

IoT Hub

graph LR
  A[Event Hubs] --> B[Functions]
  A --> C[Stream Analytics]
  A --> D[Time Series Insights]
  A --> E[Apache Spark]
  A --> F[Databricks]

Hub Query Language

Understand the Azure IoT Hub query language | Microsoft Learn

  • Columns: DeviceID, LastActivityTime
  • SELECT COUNT() as TotalNumber

"jobs" provide a way to execute operations on sets of devices.

SELECT <select_list>
  FROM <from_specification>
  [WHERE <filter_condition>]
  [GROUP BY <group_specification>]

SELECT * FROM devices.jobs
  WHERE devices.jobs.deviceId = 'device-0'

SELECT * FROM devices.jobs
  WHERE devices.jobs.deviceId = 'myDeviceId'
    AND devices.jobs.jobType = 'scheduleUpdateTwin'
    AND devices.jobs.status = 'completed'
    AND devices.jobs.createdTimeUtc > '2016-09-01'

SELECT properties.reported.telemetryConfig.status AS status,
  COUNT() AS numberOfDevices
FROM devices
GROUP BY properties.reported.telemetryConfig.status

SELECT DeviceId, LastActivityTime
FROM devices
WHERE status = 'enabled' AND connectionState = 'Disconnected'

SELECT COUNT() as totalNumberOfDevices FROM devices

SELECT * FROM devices
WHERE tags.location.region = 'US'

SELECT * FROM devices
WHERE properties.reported.connectivity IN ['wired', 'wifi']

SELECT * FROM devices
WHERE is_defined(properties.reported.connectivity)

SELECT * FROM devices.modules
  WHERE properties.reported.status = 'scanning'
  AND deviceId IN ['device1', 'device2']
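
The same queries can be run from Python through the service SDK; a minimal sketch (assumes pip install azure-iot-hub; the connection string is a placeholder):

from azure.iot.hub import IoTHubRegistryManager
from azure.iot.hub.models import QuerySpecification

registry = IoTHubRegistryManager("<IOTHUB_CONNECTION_STRING>")
spec = QuerySpecification(query="SELECT * FROM devices WHERE tags.location.region = 'US'")
for twin in registry.query_iot_hub(spec).items:
    print(twin.device_id)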

Routing

  • "IoT Hub" > "Message routing" > "Add" > "Add endpoint" > "Add route" (Events Hub, Storage, Cosmos DB)
  • The built-in Event Hubs endpoint will be disabled once a route is added.
  • Hierarchy: Create Endpoint (Storage, Cosmos, Hub, etc) > Create Route

A message to IoT Hub contains three parts:

  • System properties
  • Application properties: for example, the device can send a timestamp in the iothub-creation-time-utc property to record when the message was sent.
  • Message body

Query language for routing (Tutorial - Configure message routing | Microsoft Learn):

  • System properties: $contentType = 'application/json' or $iothub-connection-device-id = 'myDevice'
  • Application properties: test = 'true'
  • Message body: $body.Weather.HistoricalData[0].Month = 'Feb'
  • Logic: $contentEncoding = 'UTF-8' AND processingPath = 'hot'
  • Twin: $twin.properties.desired.telemetryConfig.sendFrequency = '5m'
    • $twin.tags.deploymentLocation.floor = 1

Use base64 to read binary data in IoT Hub message body: json.loads(base64.b64decode(msg["Body"]).decode("utf-8"))
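
On the device side, the properties used by routing queries are set on the message before sending. A hedged sketch with the azure-iot-device SDK (the connection string, payload, and property values are placeholders):

import json
from azure.iot.device import IoTHubDeviceClient, Message

client = IoTHubDeviceClient.create_from_connection_string("<DEVICE_CONNECTION_STRING>")
msg = Message(json.dumps({"Weather": {"Temperature": 50}}))
msg.content_type = "application/json"            # matched by $contentType
msg.content_encoding = "utf-8"                   # matched by $contentEncoding
msg.custom_properties["processingPath"] = "hot"  # application property
client.send_message(msg)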

IoT Edge

Develop module for Linux devices using Azure IoT Edge tutorial | Microsoft Learn


Stream Analytics

A comparison of the pros and cons of Azure service alternatives to Azure Stream Analytics:

Azure Functions
  • Pros: serverless architecture allows for automatic scaling and reduced costs; supports a variety of programming languages, including Python; can be used to perform real-time data processing and analytics; easy to set up and use
  • Cons: limited to processing small amounts of data; limited to processing data in response to events rather than continuously; limited control over the underlying infrastructure

Azure Data Factory
  • Pros: supports a variety of sources and destinations, including Azure Blob Storage, Azure SQL Database, and Azure Cosmos DB; provides a visual interface for designing and monitoring data pipelines; integrates with other Azure services, such as Azure Databricks and Azure HDInsight; can be used for batch processing and scheduled data transfers
  • Cons: limited real-time data processing capabilities; limited control over the underlying infrastructure; can be complex to set up and configure

Azure Event Hubs
  • Pros: highly scalable and can handle millions of events per second; supports various protocols, including AMQP, Kafka, and HTTP; can ingest data from various sources, including IoT devices and applications; provides built-in support for event processing and streaming analytics
  • Cons: limited control over the underlying infrastructure; limited support for data transformation and enrichment; can be complex to set up and configure

Azure Databricks
  • Pros: fully managed service that supports Apache Spark; provides a collaborative environment for data engineers, data scientists, and machine learning practitioners; built-in support for data transformation and machine learning; can be used for batch processing, real-time processing, and machine learning
  • Cons: can be expensive, especially for large data volumes; limited control over the underlying infrastructure; can be complex to set up and configure

Azure HDInsight
  • Pros: fully managed service that supports various big data technologies, including Hadoop, Spark, and Hive; provides a flexible and scalable environment for processing and analyzing large datasets; built-in support for data transformation and machine learning; can be used for batch processing, real-time processing, and machine learning
  • Cons: can be expensive, especially for large data volumes; limited control over the underlying infrastructure; can be complex to set up and configure

Functions

Prerequisites: 1) install the Azure CLI, 2) install Azure Functions Core Tools

Two types of bindings:

  • Trigger: Respond to events sent to an event hub event stream
  • Output binding: Write events to an event stream
Type                      Trigger   Input binding   Output binding
HTTP                      x
Timer                     x
Azure Queue Storage       x                         x
Azure Service Bus topic   x                         x
Azure Service Bus queue   x                         x
Azure Cosmos DB           x         x               x
Azure Blob Storage        x         x               x
Azure Event Hubs          x                         x
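
A minimal sketch of a trigger plus an output binding in the Python v2 programming model (the queue name, blob container path, and the AzureWebJobsStorage setting are assumptions):

import azure.functions as func

app = func.FunctionApp()

# Queue trigger (input event) combined with a blob output binding
@app.queue_trigger(arg_name="msg", queue_name="inqueue", connection="AzureWebJobsStorage")
@app.blob_output(arg_name="outblob", path="processed/{rand-guid}.txt", connection="AzureWebJobsStorage")
def process_queue_item(msg: func.QueueMessage, outblob: func.Out[str]) -> None:
    outblob.set(msg.get_body().decode("utf-8"))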
brew tap azure/functions
brew install azure-functions-core-tools@4
# file: .zshrc
# rosetta terminal setup
if [ "$(arch)" = "i386" ]; then
    alias python="/usr/local/bin/python3"
    alias brew86='/usr/local/bin/brew'
    alias pyenv86="arch -x86_64 pyenv"
    alias func="/usr/local/Cellar/azure-functions-core-tools@4/4.0.4785/func"
fi

Function Project Structure

 <project_root>/
 | - .venv/ # used by local development.
 | - .vscode/
 | - function_app.py
 | - additional_functions.py
 | - tests/
 | | - test_my_function.py
 | - .funcignore #  ignore .vscode/ .venv/
 | - host.json #  configuration options that affect all functions in a function app instance. This file does get published to Azure
 | - local.settings.json #  store app settings and connection strings when it's running locally. This file doesn't get published to Azure
 | - requirements.txt
 | - Dockerfile

Function step-by-step

  1. Install Azure Functions Core Tools
  2. Create Virtual Environment: python -m venv .venv and source .venv/bin/activate
  3. func init LocalFunctionProj --python -m V2
    • Create a function in an existing project: func new --template "Http Trigger" --name MyHttpTrigger
    • func new --template "Azure Queue Storage Trigger" --name MyQueueTrigger
  4. cd LocalFunctionProj
  5. func templates list -l python:
    1. Azure Blob Storage trigger
    2. Azure Cosmos DB trigger
    3. Durable Functions activity
    4. Durable Functions entity
    5. Durable Functions HTTP starter
    6. Durable Functions orchestrator
    7. Azure Event Grid trigger
    8. Azure Event Hub trigger
    9. HTTP trigger
    10. Kafka output
    11. Kafka trigger
    12. Azure Queue Storage trigger
    13. RabbitMQ trigger
    14. Azure Service Bus Queue trigger
    15. Azure Service Bus Topic trigger
    16. Timer trigger
  6. (optional) Run the function locally
    1. start storage emulator: azurite. This is used when AzureWebJobsStorage setting in the local.settings.json project file is set to UseDevelopmentStorage=true
    2. Start the function locally: func start (arm64 is not supported)
    3. x86 emulation on ARM64
      1. Enable Rosetta for Terminal: select the Terminal application, open "Get Info", and check "Open using Rosetta"
      2. make sure your shell is zsh
      3. Run command arch
      4. Reinstall all dependencies
  7. Create Azure resources for your function
    1. az login
    2. az config param-persist on
    3. az group create --name AzureFunctionsQuickstart-rg --location <REGION>
    4. az storage account create --name <STORAGE_NAME> --sku Standard_LRS
    5. az functionapp create --consumption-plan-location westeurope --runtime python --runtime-version 3.9 --functions-version 4 --name <APP_NAME> --os-type linux --storage-account <STORAGE_NAME>
  8. (optional) Get your storage connection strings
    1. Azure Portal
    2. "Storage accounts"
    3. "Settings"
    4. "Access keys"
    5. Copy the "Connection string"
  9. Deploy func azure functionapp publish <APP_NAME>
  10. Update the app setting: az functionapp config appsettings set --name <FUNCTION_APP_NAME> --resource-group <RESOURCE_GROUP_NAME> --settings AzureWebJobsFeatureFlags=EnableWorkerIndexing (the v2 model requires AzureWebJobsFeatureFlags=EnableWorkerIndexing, but it is already included when the project is created with -m V2)
  11. Verify func azure functionapp logstream <APP_NAME> --browser
  12. Cleanup az group delete --name AzureFunctionsQuickstart-rg
  13. Kubernetes cluster: func kubernetes deploy --name <DEPLOYMENT_NAME> --registry <REGISTRY_USERNAME>
  14. Extensions
    • Install all extensions func extensions install
    • Specific extension func extensions install --package Microsoft.Azure.WebJobs.Extensions.Storage --version 5.0.0
  15. Monitor executions in Azure Functions
  16. Configure monitoring for Azure Functions
  17. Enable streaming logs:
    1. built-in: func azure functionapp logstream <FunctionAppName>
    2. live metrics: func azure functionapp logstream <FunctionAppName> --browser

Azure function core tools reference

Python developer reference for Azure Functions | Microsoft Learn

  1. func init <name> --python -m V2 [--docker]
  2. func templates list -l python
  3. func new --name <name> --template <template> --language <language>
    • --authlevel <authlevel>: function, anonymous, admin (for HTTP trigger)
  4. func start: start local runtime host
  5. Function App:
    • func azure functionapp fetch-app-settings <APP_NAME>
    • func azure functionapp list-functions <app_name>
    • func azure functionapp logstream <APP_NAME> [--browser]: Connects the local cmd to streaming logs
    • func azure functionapp publish <FunctionAppName>
      • --additional-packages: List of packages to install when building native dependencies
      • --build [remote|local]: build action when deploying to a Linux function app
      • --list-ignored-files: Displays a list of files that are ignored during publishing, which is based on the .funcignore file.
      • --list-included-files: Displays a list of files that are published
      • --no-build: Project isn't built during publishing. For Python, pip install isn't performed.
      • --slot: Optional name of a specific slot to which to publish.
    • func azure storage fetch-connection-string <STORAGE_ACCOUNT_NAME>
  6. Deploy:
    • func kubernetes deploy [--max-replicas] [--min-replicas] [--name]
    • func kubernetes install
    • func kubernetes remove
  7. Setting:
    • func settings list
    • func settings add <name> <value> and func settings delete <SETTING_NAME>
    • func settings decrypt and func settings encrypt

Function v2 blueprint

  • Break up the function app into modular components
  • Reusable APIs
# http_blueprint.py
import logging

import azure.functions as func

bp = func.Blueprint()

@bp.route(route="default_template")
def default_template(req: func.HttpRequest) -> func.HttpResponse:
    logging.info('Python HTTP trigger function processed a request.')

    name = req.params.get('name')
    if not name:
        try:
            req_body = req.get_json()
        except ValueError:
            pass
        else:
            name = req_body.get('name')

    if name:
        return func.HttpResponse(
            f"Hello, {name}. This HTTP-triggered function "
            f"executed successfully.")
    else:
        return func.HttpResponse(
            "This HTTP-triggered function executed successfully. "
            "Pass a name in the query string or in the request body for a"
            " personalized response.",
            status_code=200
        )
# function_app.py
import azure.functions as func
from http_blueprint import bp

app = func.FunctionApp()

app.register_functions(bp)

Horizontal scaling for functions

graph LR
  A[Event Producers] -->|kafka| B[Azure Event Hubs]
  B -->|partition| C1[partition1]
  B -->|partition| C2[partition2]
  B -->|partition| C3[partition3]
  B -->|partition| C4[partition4]

  C1 -->|Consumer Group 1| D1[Function\nLease 1-2]
  C2 -->|Consumer Group 1| D1
  C3 -->|Consumer Group 1| D2[Function\nLease 2-4]
  C4 -->|Consumer Group 1| D2
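
Within a consumer group, each partition is processed by only one function host instance at a time; the host scales out by distributing partition leases across instances, which is what the diagram above shows. A hedged Python v2 sketch of an Event Hubs trigger (the hub name and connection setting name are placeholders):

import logging
import azure.functions as func

app = func.FunctionApp()

@app.event_hub_message_trigger(arg_name="event", event_hub_name="myhub",
                               connection="EVENT_HUB_CONNECTION")
def consume(event: func.EventHubEvent) -> None:
    # Invoked per event; scale-out happens by distributing partition leases
    logging.info("body: %s", event.get_body().decode("utf-8"))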

Deploy and test your Azure Functions

Once you have implemented the Azure Functions for processing IoT data and exposing it via REST API, you can deploy them to Azure. Follow the official documentation to deploy your Python Azure Functions: Deploy Python Azure Functions

After deploying your Azure Functions, you can test them by sending data from your factory production machines to the IoT Hub and then calling the REST API to retrieve the data.

For example, you can use a tool like Postman to send HTTP requests to your REST API and verify that the data is being returned correctly.

By following these steps, you should be able to build an IoT data pipeline for factory production machines using Azure IoT Edge, Azure IoT Hub, and Azure Functions in Python. The data is saved to SQL at hourly intervals, and the SQL data is exposed through a REST API.

  1. Set up Azure resources: Create an Azure IoT Hub and an Azure SQL Database. Follow the official Azure documentation to learn how to set up these resources and configure them.

  2. Set up Azure IoT Edge: Install Azure IoT Edge on your production machines and configure it to send data to your Azure IoT Hub.

  3. Set up Azure Functions: Create an Azure Functions app and a function that triggers on an IoT Hub event. Use this function to process the data sent by the IoT Edge device and save it to the SQL Database.

  4. Expose SQL data using a REST API: Create an Azure Function that exposes the data from the SQL Database using a REST API (see the sketch after this list).

  5. Build and deploy your solution: Write Python code to implement the data processing logic in your Azure Functions and deploy the code to Azure.
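
A hedged sketch for step 4: an HTTP-triggered function (Python v2 model) that reads rows from the SQL database and returns them as JSON. The table and column names and the SQL_CONNECTION_STRING app setting are assumptions; it uses pyodbc, so a matching ODBC driver must be available in the function app.

import json
import os
import pyodbc
import azure.functions as func

app = func.FunctionApp()

@app.route(route="measurements", auth_level=func.AuthLevel.FUNCTION)
def get_measurements(req: func.HttpRequest) -> func.HttpResponse:
    conn = pyodbc.connect(os.environ["SQL_CONNECTION_STRING"])  # assumed app setting
    rows = conn.cursor().execute(
        "SELECT TOP 100 DeviceId, AvgValue, WindowEnd FROM dbo.HourlyTelemetry"  # assumed table
    ).fetchall()
    payload = [{"deviceId": r.DeviceId, "avg": r.AvgValue, "windowEnd": str(r.WindowEnd)}
               for r in rows]
    return func.HttpResponse(json.dumps(payload), mimetype="application/json")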

To start developing your solution in Python, you can use the Azure SDK for Python. The SDK provides a set of Python libraries that allow you to interact with Azure services and resources from your Python code. You can use the SDK to manage Azure resources, send and receive data from IoT devices, and invoke Azure Functions.

The Azure SDK for Python can be installed using pip, like this:

pip install azure-iot-device azure-iothub-service-client azure-functions

Once you have the SDK installed, you can start writing Python code to interact with Azure services and resources. The official Azure documentation provides detailed guidance on how to use the Azure SDK for Python, as well as sample code and tutorials to help you get started.
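
For example, a device-side sketch that sends telemetry to IoT Hub with azure-iot-device (the connection string and payload fields are placeholders):

import json
import time
from azure.iot.device import IoTHubDeviceClient, Message

client = IoTHubDeviceClient.create_from_connection_string("<DEVICE_CONNECTION_STRING>")
while True:
    reading = {"machineId": "press-01", "temperature": 71.3, "ts": time.time()}  # hypothetical payload
    client.send_message(Message(json.dumps(reading)))
    time.sleep(60)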


Azure Stream Analytics


Stream Analytics can connect to:

  • Azure Event Hubs and Azure IoT Hub for streaming data ingestion, and
  • Azure Blob storage to ingest historical data.
  • Job input can also include static or slow-changing reference data from Azure Blob storage or SQL Database that you can join to streaming data to perform lookup operations.

Runs on:

  • serverless in the cloud
  • IoT Edge, for ultra-low-latency analytics

Used for:

  • Dashboards for data visualization
  • Real-time alerts from temporal and spatial patterns or anomalies
  • Extract, Transform, Load (ETL)

The Stream Analytics query language is consistent with SQL and supports:

  • simple data manipulation,
  • aggregation functions,
  • complex geospatial functions,
  • defining and invoking your own functions (JavaScript and C# only), and
  • calling functions defined in Azure Machine Learning

You can continue to use Stream Analytics by sending events to Event Hubs through the Event Hubs Kafka API, without changing the event sender.

Spark Structured Streaming + Databricks for Python

Property                              Description
EventProcessedUtcTime                 The date and time the event was processed by Stream Analytics.
EventEnqueuedUtcTime                  The date and time the event was received by the IoT Hub.
PartitionId                           The zero-based partition ID for the input adapter.
IoTHub.MessageId                      An ID used to correlate two-way communication in IoT Hub.
IoTHub.CorrelationId                  An ID used in message responses and feedback in IoT Hub.
IoTHub.ConnectionDeviceId             The authentication ID of the device used to send this message.
IoTHub.ConnectionDeviceGenerationId   The generation ID of the authenticated device.
IoTHub.EnqueuedTime                   The time the message was received by the IoT Hub.

Stream Analytics Query Language Reference - Stream Analytics Query | Microsoft Learn

SELECT
    DeviceID,
    Location.Lat,
    Location.Long,
    SensorReadings.SensorMetadata.Version
FROM input
-------------------
SELECT input.Location.*
FROM input
-------------------
SELECT
    input.DeviceID,
    thresholds.SensorName
FROM input      -- stream input
JOIN thresholds -- reference data input
ON
    input.DeviceId = thresholds.DeviceId
WHERE
    GetRecordPropertyValue(input.SensorReadings, thresholds.SensorName) > thresholds.Value
    -- the where statement selects the property value coming from the reference data
-------------------
SELECT
    GetArrayElement(arrayField, 0) AS firstElement
FROM input
------------------- Select all array elements as individual events. The APPLY operator together with the GetArrayElements built-in function extracts all array elements as individual events
SELECT
    arrayElement.ArrayIndex,
    arrayElement.ArrayValue
FROM input as event
CROSS APPLY GetArrayElements(event.arrayField) AS arrayElement
Function Name    Description
AVG              Calculates the average value of a numeric input over a specified time window.
COUNT            Counts the number of input events over a specified time window.
Collect          Collects input events into an array over a specified time window.
CollectTop       Collects the top N input events into an array over a specified time window.
MAX              Finds the maximum value of a numeric input over a specified time window.
MIN              Finds the minimum value of a numeric input over a specified time window.
Percentile_Cont  Calculates the continuous percentile of a numeric input over a specified time window.
Percentile_Disc  Calculates the discrete percentile of a numeric input over a specified time window.
STDEV            Calculates the standard deviation of a numeric input over a specified time window.
STDEVP           Calculates the population standard deviation of a numeric input over a specified time window.
SUM              Calculates the sum of a numeric input over a specified time window.
TopOne           Finds the top input event over a specified time window.
VAR              Calculates the variance of a numeric input over a specified time window.
VARP             Calculates the population variance of a numeric input over a specified time window.

Stream Analytics Using Reference Data

Use reference data for lookups in Azure Stream Analytics | Microsoft Learn

The LicensePlate data can be joined with a static dataset that has registration details to identify license plates that have expired.

SELECT I1.EntryTime, I1.LicensePlate, I1.TollId, R.RegistrationId
FROM Input1 I1 TIMESTAMP BY EntryTime
JOIN Registration R
ON I1.LicensePlate = R.LicensePlate
WHERE R.Expired = '1'

Examples

Currently, Azure Stream Analytics (ASA) only supports inserting (appending) rows to SQL outputs (Azure SQL Database and Azure Synapse Analytics). This article discusses workarounds to enable UPDATE, UPSERT, or MERGE on SQL databases, with Azure Functions as the intermediary layer.
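
A hedged sketch of that workaround: a function receives rows from ASA and performs the MERGE itself via pyodbc (the table and column names are assumptions):

import pyodbc

def upsert_device_status(conn_str: str, device_id: str, status: str) -> None:
    # MERGE keeps one row per device instead of appending (ASA's SQL output can only insert)
    with pyodbc.connect(conn_str) as conn:
        conn.execute(
            """
            MERGE dbo.DeviceStatus AS t
            USING (SELECT ? AS DeviceId, ? AS Status) AS s
            ON t.DeviceId = s.DeviceId
            WHEN MATCHED THEN UPDATE SET t.Status = s.Status
            WHEN NOT MATCHED THEN INSERT (DeviceId, Status) VALUES (s.DeviceId, s.Status);
            """,
            device_id, status,
        )
        conn.commit()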

Detecting fraudulent calls using a self-join on "CallRecTime" (idea: if the same IMSI calls through two different switches within 1-5 seconds, it is flagged as a fraudulent call)

 SELECT System.Timestamp AS WindowEnd, COUNT(*) AS FraudulentCalls
 INTO "MyPBIoutput"
 FROM "CallStream" CS1 TIMESTAMP BY CallRecTime
 JOIN "CallStream" CS2 TIMESTAMP BY CallRecTime
 ON CS1.CallingIMSI = CS2.CallingIMSI
 AND DATEDIFF(ss, CS1, CS2) BETWEEN 1 AND 5
 WHERE CS1.SwitchNum != CS2.SwitchNum
 GROUP BY TumblingWindow(Duration(second, 1))

Azure Bicep

Create Bicep files - Visual Studio Code - Azure Resource Manager | Microsoft Learn

infrastructure-as-code: instruction manual for your infrastructure. The manual details the end configuration of your resources and how to reach that configuration state.

Deploy bicep file

  1. Right-click the Bicep file inside the VSCode, and then select Deploy Bicep file.
  2. From the Select Resource Group listbox on the top, select Create new Resource Group.
  3. Enter exampleRG as the resource group name, and then press [ENTER].
  4. Select a location for the resource group, and then press [ENTER].
  5. From Select a parameter file, select None.
  6. Enter a unique storage account name, and then press [ENTER]. If you get an error message indicating the storage account is already taken, the storage name you provided is in use. Provide a name that is more likely to be unique.
  7. From Create parameters file from values used in this deployment?, select No.

Using Azure-cli:

az group create --name exampleRG --location eastus
az deployment group create --resource-group exampleRG --template-file main.bicep --parameters storageName=uniquename

Cleanup resources: az group delete --name exampleRG


Azure App Service (Web App)

  1. Make sure Docker image build and run in local successfully
  2. Create user-assigned managed identity (Azure Portal -> Managed Identities)
  3. Create container registry (Azure Portal -> Container registries)
  4. Go to resource -> Container registries -> Access keys -> Enable Admin user -> Copy Username and Password
  5. Push the image to Azure Container registry
    1. docker login strandaiapistaging.azurecr.io
    2. docker tag strand-linux strandaiapistaging.azurecr.io/stranddemo:latest
    3. docker push strandaiapistaging.azurecr.io/stranddemo:latest
  6. Authorize managed identity for your registry
    1. Container registry -> Access control (IAM) -> Role assignments -> Add role assignment
    2. Select AcrPull
    3. Member: select the managed identity you created
  7. Create web app: App Services -> Create -> Web App
  8. Configure the Web App:
    1. Web App -> Configuration -> Application settings -> New application setting -> WEBSITES_PORT: 8502
    2. Identity -> User assigned -> Select the managed identity you created
    3. Deployment Center -> Authentication -> Managed Identity -> Select the managed identity you created (also enable Continuous Deployment at the end)
docker rmi strandgptapiprd.azurecr.io/strandapi:latest
docker tag strand-prod-amd64 strandgptapiprd.azurecr.io/strandapi:latest
docker push strandgptapiprd.azurecr.io/strandapi:latest