Setup ADO

Bootstrap Project

Set up the Resource Group, service principal, and keyvault. The service principal is created with Owner on the resource group as well as the ability to read and write to the storage account. This can be done manually, but the following gist can be used to perform these actions automatically if executed by someone who has permission to create service principals and resource groups (a manual az CLI sketch also follows the step list below). See https://gist.github.com/brentmcconnell/d1bb14d31ab69578c5d9ef816015ddda.

Example execution: ./az-terraform-basic -g name_of_rg_to_create -r region. No arguments are required; -g is optional and -r defaults to eastus, so set -r if you want a different region.

After this program executes you will have a resource group that contains a storage account and a keyvault. These will be used by Azure DevOps to execute pipelines using the service principal's credentials in the keyvault.

Steps performed by the script above.

  1. Create the Resource Group (take note of the name; it will be used later)
  2. Create the SP with Owner on the RG
  3. Create the KeyVault in the RG (take note of the name; it will be used later)
  4. Create the Storage Account in the RG
    1. Create a container called tfstate in the SA
  5. Set a policy on the KV to allow the SP access to secrets
  6. Load the SP values into the KV (used by Terraform). If you want everything to go smoothly, use the names suggested below; otherwise you will need to modify many files.
    1. SP-CLIENTID
    2. SP-PASSWORD
    3. SP-TENANTID
    4. SP-SUBSCRIPTIONID
  7. Load Access Key into KV (used by Terraform)
    1. SA-ACCESS_KEY
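
If you prefer to perform the bootstrap by hand rather than running the gist, the following az CLI sketch covers the same steps. All resource names are placeholders, and jq is assumed for parsing the service principal output; adjust to your environment.

    # Placeholder names -- substitute your own (KV and SA names must be globally unique)
    RG=project-rg
    LOCATION=eastus
    KV=projectkv01
    SA=projectsa01

    # 1. Resource Group
    az group create -n $RG -l $LOCATION

    # 2. Service principal with Owner on the RG (output saved for the secrets below)
    az ad sp create-for-rbac --role Owner \
        --scopes $(az group show -n $RG --query id -o tsv) -o json > sp.json

    # 3 + 5. KeyVault (access-policy model so set-policy works) and SP access to secrets
    az keyvault create -n $KV -g $RG -l $LOCATION --enable-rbac-authorization false
    az keyvault set-policy -n $KV --spn $(jq -r .appId sp.json) --secret-permissions get list

    # 4. Storage Account and tfstate container
    az storage account create -n $SA -g $RG -l $LOCATION --sku Standard_LRS
    KEY=$(az storage account keys list -n $SA -g $RG --query "[0].value" -o tsv)
    az storage container create -n tfstate --account-name $SA --account-key $KEY

    # 6. Load the SP values into the KV under the names expected later
    az keyvault secret set --vault-name $KV -n SP-CLIENTID       --value "$(jq -r .appId sp.json)"
    az keyvault secret set --vault-name $KV -n SP-PASSWORD       --value "$(jq -r .password sp.json)"
    az keyvault secret set --vault-name $KV -n SP-TENANTID       --value "$(jq -r .tenant sp.json)"
    az keyvault secret set --vault-name $KV -n SP-SUBSCRIPTIONID --value "$(az account show --query id -o tsv)"

    # 7. Load the access key (the gist calls this secret SA-ACCESS_KEY; Key Vault secret
    #    names only allow letters, digits, and hyphens, so a hyphenated name is used here --
    #    match whatever name your variable group ends up expecting)
    az keyvault secret set --vault-name $KV -n SA-ACCESS-KEY --value "$KEY"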

Create ADO Project

The ADO project is the interface between users and Azure. In this step you'll create an ADO project and a Service Connection to Azure, and link variable groups to the values in the Azure keyvault.

  1. Create an ADO project
  2. Create Service Connection (Azure Resource Manager type) in Project settings with SP values from the original keyvault.
    1. Service principal (manual)
    2. Enter the SP info taken from the KeyVault (see the retrieval commands after this list)
      1. Scope “Subscription”
      2. Subscription id = SP-SUBSCRIPTIONID
      3. Subscription Name can be anything
      4. Service Principal Id = SP-CLIENTID
      5. Service principal key = SP-PASSWORD
      6. Tenant ID = SP-TENANTID
      7. Choose a name for the service connection; it will be used later
      8. Grant access to all pipelines = checked
  3. Create Variable Group called SP_VALUES under Pipelines -> Library and link to KeyVault in RG.
    1. Choose “Link secrets from an Azure key vault as variables”
    2. Use the Service Connection Created Above under Azure Subscription
    3. Select the project key vault
    4. Select +Add
    5. Import all the values from the Keyvault
    6. Select “Save”
  4. Create a Variable Group named AZURE_INFO under Pipelines -> Library (see the CLI sketch after this list)
    1. AZURE_LOCATION=Location for resources to be created
    2. AZURE_RESOURCE_GROUP=Resource Group from step #1
    3. AZURE_CLOUD=Public
    4. AZURE_KEYVAULT_NAME=KV name created in step #1
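
If it helps, the SP values for the Service Connection dialog can be read back out of the keyvault, and the AZURE_INFO group can also be created with the azure-devops CLI extension instead of the UI. The vault, organization, project, and variable values below are placeholders; the KeyVault-linked SP_VALUES group is still easiest to create in the UI as described above.

    # Read the SP secrets to paste into the Service Connection dialog
    for S in SP-CLIENTID SP-PASSWORD SP-TENANTID SP-SUBSCRIPTIONID; do
      echo -n "$S: "
      az keyvault secret show --vault-name projectkv01 -n $S --query value -o tsv
    done

    # Create the AZURE_INFO variable group (requires: az extension add --name azure-devops)
    az devops configure --defaults organization=https://dev.azure.com/MYORG project=MYPROJECT
    az pipelines variable-group create --name AZURE_INFO --authorize true --variables \
        AZURE_LOCATION=eastus \
        AZURE_RESOURCE_GROUP=project-rg \
        AZURE_CLOUD=Public \
        AZURE_KEYVAULT_NAME=projectkv01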

Create VM Image for Workload

  1. Import https://github.com/brentmcconnell/packer-img-sequencing to your ADO repo
  2. Create New ADO Pipeline for packer-img-sequencing with source created above. There is already a pipeline in the repo.
    1. Ensure that the environment variables in azure-pipelines.yml point to the KV values. They should match if you created them with the names given above in #1. Also ensure that the Library links in the pipeline are the variable groups you created above (i.e. SP_VALUES and AZURE_INFO)
    2. Run Pipeline.
      This execution will create temporary Packer resources in the RG defined in step #2. After the run you should have a new image in your RG.
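
To confirm the Packer run produced the image, you can list the images in the resource group (the RG name is a placeholder):

    az image list -g project-rg --query "[].name" -o table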

Create the ADO Agent image

  1. Import https://github.com/brentmcconnell/packer-img-ado-agent to an ADO repo in your project
  2. Create New ADO Pipeline for packer-img-ado-agent with source created above
    1. Ensure that the environment variables in azure-pipelines.yml point to the KV values and the AZURE_INFO group. SP_VALUES and AZURE_INFO are the defaults in the pipeline.
    2. Run Pipeline

Create ADO VMSS

  1. Import https://github.com/brentmcconnell/terraform-ado-vmss into your ADO project
  2. Create new pipeline for terraform-ado-vmss
    1. Modify azure-pipelines.yml during import
      1. Change azureSubscription to the name chosen for the Service Connection when creating the ADO project
      2. Save Pipeline Only
    2. Modify provider.tf with the storage account name from above (see the lookup commands after this list)
      1. Change storage_account_name
    3. Modify terraform.tfvars values in repo
      1. Change project-rg
      2. Change agent-image to the image that was created above by the packer-img-ado-agent pipeline
      3. Change agent-image-rg
    4. Run Pipeline
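
The storage account name needed for provider.tf can be looked up with the CLI, and once the pipeline has run you can confirm the scale set exists (the RG name is a placeholder):

    # Storage account created by the bootstrap step (for storage_account_name in provider.tf)
    az storage account list -g project-rg --query "[].name" -o tsv

    # After the run, confirm the scale set was created
    az vmss list -g project-rg --query "[].name" -o table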

Create ADO agent that points to Azure VMSS

  1. Project Settings -> Agent Pools and “Add pool”
  2. New type -> virtual machine scale set
  3. Pool type = Azure VMSS
  4. Azure subscription = Name of Service Connection
  5. Select “ado-vmss” from dropdown list
  6. Select automatically tear down VMSS after use
  7. Max 2
  8. Number on standby 0
  9. Grant access to all pipelines = checked (true)

Create Storage Account for Data

  1. Import https://github.com/brentmcconnell/terraform-simple-sa to your ADO project
  2. Modify azure-pipelines.yml
    1. Change azureSubscription to the name chosen for the Service Connection when creating the ADO project
    2. Change the agent section to the VMSS agent pool name created in ADO
  3. Modify provider.tf with the storage account info from above (see the commands after this list)
    1. Change storage_account_name
  4. Modify terraform.tfvars values in repo
    1. Change project-rg
    2. Change location
    3. Change prefix
  5. Create new pipeline for terraform-simple-sa
  6. Run Pipeline
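
Once the pipeline has run, you can confirm the new storage account and its containers from the CLI (the account and RG names below are placeholders):

    # Find the storage account created by terraform-simple-sa
    az storage account list -g project-rg --query "[].name" -o tsv

    # List its containers (the input and output containers are used later for test data)
    az storage container list --account-name mydatasa01 \
        --account-key $(az storage account keys list -n mydatasa01 -g project-rg --query "[0].value" -o tsv) \
        -o table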

Setup Workload job

  1. Import https://github.com/brentmcconnell/terraform-fmc to your ADO project
  2. Modify azure-pipelines.yml
    1. Change “pool” to the name of the ADO agent pool
    2. Change azureSubscription to the name chosen for the Service Connection when creating the ADO project
    3. Change default value for STORAGE_ACCOUNT
  3. Modify iac/provider.tf with SA info from above or use generated provider.tf file from Step #1
    1. Change storage_account_name
  4. Modify iac/terraform.tfvars values in repo
    1. Change prefix
    2. Change project-rg
    3. Change vmImage to the image that was created by packer-img-sequencing (see the lookup commands after this list)
    4. Change agent-vnet-rg
    5. Change agent-vnet-name
    6. Change datadisk_size_gb based on needs
  5. Create new pipeline for terraform-fmc
  6. Run Pipeline
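
The vmImage and agent-vnet values for terraform.tfvars can be looked up with the CLI rather than hunted down in the portal (the RG name is a placeholder):

    # Image built by packer-img-sequencing (for vmImage)
    az image list -g project-rg --query "[].name" -o tsv

    # Virtual networks and their resource groups (for agent-vnet-name / agent-vnet-rg)
    az network vnet list --query "[].{name:name, resourceGroup:resourceGroup}" -o table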

Run Workload with Test Data

  1. You'll need some test data to actually see it work. The easiest and fastest is probably the ecoli dataset.
  2. The storage account created above for storing the test data should already have 2 containers in it called "input" and "output".
  3. Retrieve the ecoli data via curl -L -o pacbio.fastq http://gembox.cbcb.umd.edu/mhap/raw/ecoli_p6_25x.filtered.fastq
  4. Upload pacbio.fastq to the input container (see the upload commands after this list). I usually use a directory structure for input data like /ecoli/pacbio.fastq
  5. Run the terraform-fmc pipeline.
    • In the popup ensure you select "main" for branch/tag.
    • Job prefix can be anything. It is used on the work machine to create a directory structure that separates work if a VM is not destroyed after use. This allows you, for instance, to execute different job types without them interfering with each other.
    • Make sure the storage account is the one used to hold the ecoli data
    • Input container should be "input" unless you've changed where the data is in the storage account
    • The input path is where you stored the pacbio.fastq file. In my case I would use ecoli/*.fastq because I stored the pacbio.fastq file in a directory called ecoli in the container
    • Output container is the container you want the data to go into in the storage account
    • Output path is where you want the data to go in the container once processing is done. In my case I'll use ecoli/.
    • Size of genome can stay 5m
    • The checkboxes should be self-explanatory
    • Run the job
    • At this point you should be able to view the logs in ADO to see what is happening.
  6. Review the storage account and determine if you have data in the output container after the job completes
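
The download, upload, and final output check from the steps above can all be done from the CLI. The storage account and RG names are placeholders; the container and path names match the ones used above.

    # Download the ecoli test data
    curl -L -o pacbio.fastq http://gembox.cbcb.umd.edu/mhap/raw/ecoli_p6_25x.filtered.fastq

    # Upload it into the input container under an ecoli/ prefix
    KEY=$(az storage account keys list -n mydatasa01 -g project-rg --query "[0].value" -o tsv)
    az storage blob upload --account-name mydatasa01 --account-key $KEY \
        -c input -n ecoli/pacbio.fastq -f pacbio.fastq

    # After the terraform-fmc job completes, check the output container
    az storage blob list --account-name mydatasa01 --account-key $KEY \
        -c output --prefix ecoli/ -o table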