Setup ADO

Bootstrap Project

Set up the Resource Group, service principal, and keyvault. The service principal is created with Owner on the resource group as well as the ability to read and write to the storage account. This can be done manually, but the following gist can be used to perform these actions automatically if executed by someone who has permission to create service principals and resource groups (a manual az CLI sketch also follows the step list below). See https://gist.github.com/brentmcconnell/d1bb14d31ab69578c5d9ef816015ddda.

Example execution: ./az-terraform-basic -g name_of_rg_to_create -r region. No arguments are required; -g is optional and -r defaults to eastus, so set -r if you want a different region.

After this program executes you will have a resource group that contains a storage account and a keyvault. These will be used by Azure DevOps to execute pipelines using the service principal's credentials in the keyvault.

Steps performed by the script above.

  1. Create the Resource Group (take note of the name; it will be used later)
  2. Create the SP with Owner on the RG
  3. Create the KeyVault in the RG (take note of the name; it will be used later)
  4. Create the Storage Account in the RG
    1. Create a container called tfstate in the SA
  5. Set a policy on the KV to allow the SP access to secrets
  6. Load the SP values into the KV (used by Terraform). If you want everything to go smoothly, use the names suggested below; otherwise you will need to modify many files.
    1. SP-CLIENTID
    2. SP-PASSWORD
    3. SP-TENANTID
    4. SP-SUBSCRIPTIONID
  7. Load Access Key into KV (used by Terraform)
    1. SA-ACCESS_KEY
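
If you prefer to perform the bootstrap by hand rather than running the gist, the following az CLI sketch covers the same steps. All resource names are placeholders, and jq is assumed for parsing the service principal output; adjust to your environment.

    # Placeholder names -- substitute your own (KV and SA names must be globally unique)
    RG=project-rg
    LOCATION=eastus
    KV=projectkv01
    SA=projectsa01

    # 1. Resource Group
    az group create -n $RG -l $LOCATION

    # 2. Service principal with Owner on the RG (output saved for the secrets below)
    az ad sp create-for-rbac --role Owner \
        --scopes $(az group show -n $RG --query id -o tsv) -o json > sp.json

    # 3 + 5. KeyVault (access-policy model so set-policy works) and SP access to secrets
    az keyvault create -n $KV -g $RG -l $LOCATION --enable-rbac-authorization false
    az keyvault set-policy -n $KV --spn $(jq -r .appId sp.json) --secret-permissions get list

    # 4. Storage Account and tfstate container
    az storage account create -n $SA -g $RG -l $LOCATION --sku Standard_LRS
    KEY=$(az storage account keys list -n $SA -g $RG --query "[0].value" -o tsv)
    az storage container create -n tfstate --account-name $SA --account-key $KEY

    # 6. Load the SP values into the KV under the names expected later
    az keyvault secret set --vault-name $KV -n SP-CLIENTID       --value "$(jq -r .appId sp.json)"
    az keyvault secret set --vault-name $KV -n SP-PASSWORD       --value "$(jq -r .password sp.json)"
    az keyvault secret set --vault-name $KV -n SP-TENANTID       --value "$(jq -r .tenant sp.json)"
    az keyvault secret set --vault-name $KV -n SP-SUBSCRIPTIONID --value "$(az account show --query id -o tsv)"

    # 7. Load the access key (the gist calls this secret SA-ACCESS_KEY; Key Vault secret
    #    names only allow letters, digits, and hyphens, so a hyphenated name is used here --
    #    match whatever name your variable group ends up expecting)
    az keyvault secret set --vault-name $KV -n SA-ACCESS-KEY --value "$KEY"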

Create ADO Project

The ADO project is the interface between users and Azure. In this step you'll create an ADO project and a Service Connection to Azure, and link variable groups to the values in the Azure keyvault.

  1. Create an ADO project
  2. Create Service Connection (Azure Resource Manager type) in Project settings with SP values from the original keyvault.
    1. Service principal (manual)
    2. Enter the SP info taken from the KeyVault (see the retrieval commands after this list)
      1. Scope “Subscription”
      2. Subscription id = SP-SUBSCRIPTIONID
      3. Subscription Name can be anything
      4. Service Principal Id = SP-CLIENTID
      5. Service principal key = SP-PASSWORD
      6. Tenant ID = SP-TENANTID
      7. Choose a name for the service connection; it will be used later
      8. Grant access to all pipelines = checked
  3. Create Variable Group called SP_VALUES under Pipelines -> Library and link to KeyVault in RG.
    1. Choose “Link secrets from an Azure key vault as variables”
    2. Use the Service Connection Created Above under Azure Subscription
    3. Select the project key vault
    4. Select +Add
    5. Import all the values from the Keyvault
    6. Select “Save”
  4. Create a Variable Group named AZURE_INFO under Pipelines -> Library (see the CLI sketch after this list)
    1. AZURE_LOCATION=Location for resources to be created
    2. AZURE_RESOURCE_GROUP=Resource Group from step #1
    3. AZURE_CLOUD=Public
    4. AZURE_KEYVAULT_NAME=KV name created in step #1
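
If it helps, the SP values for the Service Connection dialog can be read back out of the keyvault, and the AZURE_INFO group can also be created with the azure-devops CLI extension instead of the UI. The vault, organization, project, and variable values below are placeholders; the KeyVault-linked SP_VALUES group is still easiest to create in the UI as described above.

    # Read the SP secrets to paste into the Service Connection dialog
    for S in SP-CLIENTID SP-PASSWORD SP-TENANTID SP-SUBSCRIPTIONID; do
      echo -n "$S: "
      az keyvault secret show --vault-name projectkv01 -n $S --query value -o tsv
    done

    # Create the AZURE_INFO variable group (requires: az extension add --name azure-devops)
    az devops configure --defaults organization=https://dev.azure.com/MYORG project=MYPROJECT
    az pipelines variable-group create --name AZURE_INFO --authorize true --variables \
        AZURE_LOCATION=eastus \
        AZURE_RESOURCE_GROUP=project-rg \
        AZURE_CLOUD=Public \
        AZURE_KEYVAULT_NAME=projectkv01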

Create VM Image for Workload

  1. Import https://github.com/brentmcconnell/packer-img-sequencing to your ADO repo
  2. Create New ADO Pipeline for packer-img-sequencing with source created above. There is already a pipeline in the repo.
    1. Ensure that the environment variables in azure-pipelines.yml point to the KV values. They should match if you created them with the names given above in #1. Also ensure that the Library links in the pipeline are the variable groups you created above (i.e. SP_VALUES and AZURE_INFO)
    2. Run Pipeline.
      This execution will create temporary Packer resources in the RG defined in step #2. After the run you should have a new image in your RG.
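
To confirm the Packer run produced the image, you can list the images in the resource group (the RG name is a placeholder):

    az image list -g project-rg --query "[].name" -o table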

Create the ADO Agent image

  1. Import https://github.com/brentmcconnell/packer-img-ado-agent to an ADO repo in your project
  2. Create New ADO Pipeline for packer-img-ado-agent with source created above
    1. Ensure that the environment variables in azure-pipelines.yml point to the KV values and the AZURE_INFO group. SP_VALUES and AZURE_INFO are the defaults in the pipeline.
    2. Run Pipeline

Create ADO VMSS

  1. Import https://github.com/brentmcconnell/terraform-ado-vmss into your ADO project
  2. Create new pipeline for terraform-ado-vmss
    1. Modify azure-pipelines.yml during import
      1. Change azureSubscription to the name chosen for the Service Connection when creating the ADO project
      2. Save Pipeline Only
    2. Modify provider.tf with the storage account name from above (see the lookup commands after this list)
      1. Change storage_account_name
    3. Modify terraform.tfvars values in repo
      1. Change project-rg
      2. Change agent-image to the image that was created above by the packer-img-ado-agent pipeline
      3. Change agent-image-rg
    4. Run Pipeline
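
The storage account name needed for provider.tf can be looked up with the CLI, and once the pipeline has run you can confirm the scale set exists (the RG name is a placeholder):

    # Storage account created by the bootstrap step (for storage_account_name in provider.tf)
    az storage account list -g project-rg --query "[].name" -o tsv

    # After the run, confirm the scale set was created
    az vmss list -g project-rg --query "[].name" -o table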

Create ADO agent that points to Azure VMSS

  1. Project Settings -> Agent Pools and “Add pool”
  2. New type -> virtual machine scale set
  3. Pool type = Azure VMSS
  4. Azure subscription = Name of Service Connection
  5. Select “ado-vmss” from dropdown list
  6. Select automatically tear down VMSS after use
  7. Max 2
  8. Number on standby 0
  9. Grant access to all pipelines = checked (true)

Create Storage Account for Data

  1. Import https://github.com/brentmcconnell/terraform-simple-sa to your ADO project
  2. Modify azure-pipelines.yml
    1. Change azureSubscription to the name chosen for the Service Connection when creating the ADO project
    2. Change the agent section to the VMSS agent pool name created in ADO
  3. Modify provider.tf with the storage account info from above (see the commands after this list)
    1. Change storage_account_name
  4. Modify terraform.tfvars values in repo
    1. Change project-rg
    2. Change location
    3. Change prefix
  5. Create new pipeline for terraform-simple-sa
  6. Run Pipeline
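
Once the pipeline has run, you can confirm the new storage account and its containers from the CLI (the account and RG names below are placeholders):

    # Find the storage account created by terraform-simple-sa
    az storage account list -g project-rg --query "[].name" -o tsv

    # List its containers (the input and output containers are used later for test data)
    az storage container list --account-name mydatasa01 \
        --account-key $(az storage account keys list -n mydatasa01 -g project-rg --query "[0].value" -o tsv) \
        -o table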

Setup Workload job

  1. Import https://github.com/brentmcconnell/terraform-fmc to your ADO project
  2. Modify azure-pipelines.yml
    1. Change “pool” to the name of the ADO agent pool
    2. Change azureSubscription to the name chosen for the Service Connection when creating the ADO project
    3. Change default value for STORAGE_ACCOUNT
  3. Modify iac/provider.tf with SA info from above or use generated provider.tf file from Step #1
    1. Change storage_account_name
  4. Modify iac/terraform.tfvars values in repo
    1. Change prefix
    2. Change project-rg
    3. Change vmImage to the image that was created by packer-img-sequencing (see the lookup commands after this list)
    4. Change agent-vnet-rg
    5. Change agent-vnet-name
    6. Change datadisk_size_gb based on needs
  5. Create new pipeline for terraform-fmc
  6. Run Pipeline
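
The vmImage and agent-vnet values for terraform.tfvars can be looked up with the CLI rather than hunted down in the portal (the RG name is a placeholder):

    # Image built by packer-img-sequencing (for vmImage)
    az image list -g project-rg --query "[].name" -o tsv

    # Virtual networks and their resource groups (for agent-vnet-name / agent-vnet-rg)
    az network vnet list --query "[].{name:name, resourceGroup:resourceGroup}" -o table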

Run Workload with Test Data

  1. You'll need some test data to actually see it work. The easiest and fastest is probably the ecoli dataset.
  2. The storage account created above for storing the test data should already have 2 containers in it called "input" and "output".
  3. Retrieve the ecoli data via curl -L -o pacbio.fastq http://gembox.cbcb.umd.edu/mhap/raw/ecoli_p6_25x.filtered.fastq
  4. Upload pacbio.fastq to the input container (see the upload commands after this list). I usually use a directory structure for input data like /ecoli/pacbio.fastq
  5. Run the terraform-fmc pipeline.
    • In the popup ensure you select "main" for branch/tag.
    • Job prefix can be anything. It is used on the work machine to create a directory structure that separates work if a VM is not destroyed after use. This allows you, for instance, to execute different job types without them interfering with each other.
    • Make sure the storage account is the one used to hold the ecoli data
    • Input container should be "input" unless you've changed where the data is in the storage account
    • The input path is where you stored the pacbio.fastq file. In my case I would use ecoli/*.fastq because I stored the pacbio.fastq file in a directory called ecoli in the container
    • Output container is the container you want the data to go into in the storage account
    • Output path is where you want the data to go in the container once processing is done. In my case I'll use ecoli/.
    • Size of genome can stay 5m
    • The checkboxes should be self-explanatory
    • Run the job
    • At this point you should be able to view the logs in ADO to see what is happening.
  6. Review the storage account and determine if you have data in the output container after the job completes
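
The download, upload, and final output check from the steps above can all be done from the CLI. The storage account and RG names are placeholders; the container and path names match the ones used above.

    # Download the ecoli test data
    curl -L -o pacbio.fastq http://gembox.cbcb.umd.edu/mhap/raw/ecoli_p6_25x.filtered.fastq

    # Upload it into the input container under an ecoli/ prefix
    KEY=$(az storage account keys list -n mydatasa01 -g project-rg --query "[0].value" -o tsv)
    az storage blob upload --account-name mydatasa01 --account-key $KEY \
        -c input -n ecoli/pacbio.fastq -f pacbio.fastq

    # After the terraform-fmc job completes, check the output container
    az storage blob list --account-name mydatasa01 --account-key $KEY \
        -c output --prefix ecoli/ -o table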