@kongou-ae
Last active October 15, 2018 13:22
The alerts of Azure Stack

The alerts of Azure Stack 1808 update

BRPAlertTemplates

  • "Backup failed because of an unknown error"
    • "Critical"
    • "Infrastructure backup failed because of an unknown error."
  • "Backup failed because can't access backup share"
    • "Critical"
    • "Infrastructure backup failed because the backup file share is not accessible. This might be because of an authentication issue, or access is denied by the external file server."
  • "Backup failed because backup file share path is not valid"
    • "Critical"
    • "Infrastructure backup failed because of an issue with the path of the external backup file share."
  • "Backup failed because of network connectivity issues"
    • "Critical"
    • "Infrastructure backup failed to complete because of an issue connecting to the external backup file share. This may be a temporary issue caused by a network outage, or some other infrastructure issue."
  • "Backup failed because file share is full"
    • "Critical"
    • "Infrastructure backup failed because the backup file share is out of capacity."
  • "Cannot write to the backup file share"
    • "Critical"
    • "The file share is accessible over the network, but infrastructure backup failed to write to the file share."
  • "Backup file share is not accessible"
    • "Critical"
    • "Infrastructure backup failed to access the file share over the network. This could be due to a problem with the backup account, the backup accounts permissions to the file share, or a general network issue."
  • "Backup failed during file copy to external share"
    • "Critical"
    • "Infrastructure backup failed because of an issue writing the backup files to the external backup file share. This may be a temporary issue caused by a network outage or some other issue with the infrastructure."
  • "Backup only partially completed because of an unexpected error"
    • "Critical"
    • "Infrastructure backup completed only for a subset of services. This might be a temporary issue with the backup controller."
  • "Backup is not enabled for a location"
    • "Warning"
    • "Backup is not enabled for the location. If this is a production environment, make sure to enable backup."
  • "Infrastructure backups are paused"
    • "Warning"
    • "Backup scheduler is currently paused so infrastructure backups have not been created in the past 24 hours. This warning will appear every 24 hours until the issue is resolved."
  • "The scheduled backup was skipped due to a conflict with failed operations"
    • "Warning"
    • "The infrastructure backup process did not start because another operation failed to complete. Automatic backups cannot start if another process has not completed. Examples of operations that must complete before an automatic backup can start include an Azure Stack update, secrets rotation, or a field replaceable unit."
  • "Backup could not be deleted"
    • "Warning"
    • "Infrastructure backup failed to delete backup data from the backup location. The backup data that was not deleted has the following IDs: {FailedToDeleteBackupIds}. The backup location is accessible over the network. This issue could be due to a problem with the permissions of the backup account, the permissions set on the files on the backup location, or due to locked files."
  • "The backup file share is almost full"
    • "Warning"
    • "The backup file share is at {ExternalShareCapacityThreshold} utilization. If you don't free up space, future infrastructure backups will fail to complete."
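
Many descriptions in this list contain `{Name}` placeholders such as `{ExternalShareCapacityThreshold}`. The list does not say how Azure Stack fills them in. As a rough sketch, assuming they are substituted by name, Python's standard `str.format` can show what a rendered message might look like. The helper names `placeholder_names` and `render` and the sample value `85%` are hypothetical and not part of Azure Stack.

```python
import string

# Template text copied from the alert list above.
TEMPLATE = (
    "The backup file share is at {ExternalShareCapacityThreshold} utilization. "
    "If you don't free up space, future infrastructure backups will fail to complete."
)


def placeholder_names(template):
    """Return the {Name} fields found in a template, in order of appearance."""
    # Formatter.parse yields (literal_text, field_name, format_spec, conversion).
    return [field for _, field, _, _ in string.Formatter().parse(template) if field]


def render(template, **values):
    """Substitute values for the {Name} placeholders (hypothetical helper)."""
    return template.format(**values)


if __name__ == "__main__":
    print(placeholder_names(TEMPLATE))
    # The value "85%" is made up for illustration only.
    print(render(TEMPLATE, ExternalShareCapacityThreshold="85%"))
```

Listing the placeholder names first is handy when checking which values a given template expects before trying to render it.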

ArmAlertTemplates

AzureBridgeServiceAlertTemplates

  • "Activation Required"
    • "Warning"
    • "Azure Stack is not activated."
  • "Activation Expired"
    • "Warning"
    • "Azure Stack activation expired. Please reactivate your Azure Stack."
  • "Activation Expiring Soon"
    • "Warning"
    • "Your Azure Stack activation will expire on {AzureBridgeActivationExpiration}"

BMCAliveAlertTemplates

  • "BMC credentials are not valid"
    • "Warning"
    • "The baseboard management controller (BMC) credentials on {NodeName} do not match the credentials stored in Azure Stack."
  • "BMC connection timeout"
    • "Warning"
    • "Azure Stack cannot connect to baseboard management controller (BMC) on {NodeName}"
  • "The BMC default credentials remain in use. Consider updating the credentials to use a strong password."
    • "Warning"
    • "The default credentials for the bmc on are in use"

BridgeServiceAlertTemplates

  • "Unable to connect to {UsageBridgeStore}"
    • "Warning"
    • "The Azure Stack Usage Bridge service is not able to connect to storage. Resource utilization data will not be sent."
  • "Unable to connect to the remote service"
    • "Warning"
    • "The Azure Stack Usage Bridge service is unable to connect to the remote service. Resource utilization data will not be sent."
  • "Unable to process usage data"
    • "Warning"
    • "The Azure Stack Usage Bridge service has encountered an error. Resource utilization data will not be sent."

ComputeControllerAlertTemplate

  • "The compute scale unit is inaccessible for virtual machine placement"
    • "Critical"
    • "Scale unit {ScaleUnitName} is inaccessible. No new virtual machines can be created on the scale unit. Virtual machines on the scale unit may be inaccessible."
  • "Node inaccessible for virtual machine placement"
    • "Critical"
    • "Node {ScaleUnitNodeName} in the scale unit is inaccessible. There is now decreased capacity for virtual machine creation. Virtual machines on node {ScaleUnitNodeName} will be moved to other nodes. If there is no available capacity, some virtual machines may not be restarted."
  • "Low memory capacity"
    • "Warning"
    • "The region has consumed more than {Percentage} of available memory. Creating virtual machines with large amounts of memory may fail."
  • "Low memory capacity"
    • "Critical"
    • "The region has consumed more than {Percentage} of available memory. Creating virtual machines with large amounts of memory may fail."
  • "Low core capacity"
    • "Warning"
    • "The region has consumed more than {Percentage} of available logical cores. Creating virtual machines with large core counts may fail."
  • "Low core capacity"
    • "Critical"
    • "The region has consumed more than {Percentage} of available logical cores. Creating virtual machines with large core counts may fail."
  • "Unable to provision virtual machines for specific class and size due to low memory capacity"
    • "Critical"
    • "Low memory capacity prevented one or more virtual machines of the size {VMSize} to be provisioned."
  • "Unable to provision virtual machines for specific class and size due to low logical core capacity"
    • "Critical"
    • "Low logical core capacity prevented one or more virtual machines of the size {VMSize} to be provisioned."
  • "Infrastructure role is unresponsive"
    • "Critical"
    • "The compute controller infrastructure role is unresponsive. The region is unable to create new virtual machines. Virtual machine actions will not be available. Additionally, Azure Stack administrators are not able to administer scale units and nodes."

DomainControllerAlertTemplates

  • "Infrastructure role is unhealthy"
    • "Warning"
    • "The infrastructure role Directory Management is operating in a degraded state."
  • "Infrastructure role is unhealthy"
    • "Critical"
    • "The infrastructure role Directory Management is not functional."
  • "Infrastructure role is unhealthy"
    • "Warning"
    • "The infrastructure role, Directory Management, has reported time synchronization errors."
  • "Pending Service account password expiration"
    • "Critical"
    • "A service account password will expire within 7 days"
  • "Low disk space for Azure Stack infrastructure"
    • "Warning"
    • "The computer {Component} has only {FreeSpace} GB of available disk space. Azure Stack service availability may be at risk if disk space continues to be consumed, and Azure Stack updates will fail until you free up space."

FrpHeartbeatAlertTemplates

  • "Scale unit node is offline"
    • "Critical"
    • "The node {NodeName} in the scale unit is inaccessible. There is less capacity available for tenant workloads. A process has been started to move tenant workloads from this node to other nodes. If there is no available capacity, some workloads may not restart."
  • "Infrastructure role instance unavailable"
    • "Warning"
    • "The infrastructure role instance {NodeName} is unavailable. This may impact performance and availability of Azure Stack services."

HrpAlertTemplates

NRPFabricAlertTemplate

  • "Node unreachable"
    • "Critical"
    • "MUX is unhealthy (Common case is BGPRouter disconnected)"
  • "Route publication failure"
    • "Critical"
    • "Loadbalancer Mux is not connected to a BGP router."
  • "Load balancer MUX is overloaded"
    • "Critical"
    • "There are performance issues with the MUX indicating that it may be at full capacity."
  • "Certificate not authorized"
    • "Critical"
    • "Failed to connect to Mux due to network or cert errors"
  • "Certificate not trusted"
    • "Critical"
    • "Failed to connect to Mux due to network or cert errors"
  • "Public IP address utilization at 70% across all pools."
    • "Warning"
    • "Public IP address utilization is at 70% across all pools. If utilization reaches 100%, users will be unable to create VM instances, or create public IP addresses."
  • "Public IP address utilization at 90% across all pools."
    • "Critical"
    • "Public IP address utilization is at 90% across all pools. If utilization reaches 100%, users will be unable to create VM instances, or create public IP addresses."
  • "Public IP address utilization at 100% across all pools."
    • "Critical"
    • "Public IP address utilization is at 100% across all pools. Users are unable to create VM instances, or create public IP addresses."
  • "Edge Gateway Pool at 70% utilization"
    • "Warning"
    • "The Edge Gateway Pool {Name} is 70% utilized. If utilization reaches 100% users will be unable to create new gateway connections and performance may be impacted."
  • "Edge Gateway Pool at 90% utilization"
    • "Critical"
    • "The Edge Gateway Pool {Name} is 90% utilized. If utilization reaches 100% users will be unable to create new gateway connections and performance may be impacted."
  • "Edge Gateway Pool at 100% utilization"
    • "Critical"
    • "The Edge Gateway Pool {Name} is 100% utilized. Users will be unable to create new gateway connections and performance may be impacted."
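
The public IP and Edge Gateway Pool alerts above follow the same pattern: Warning at 70% utilization, Critical at 90% and at 100%. A minimal sketch of that mapping is below. It assumes an alert applies at or above each listed percentage, which the list itself does not state, and the function name `utilization_severity` is hypothetical.

```python
def utilization_severity(percent_used):
    """Map a utilization percentage to the severity listed above.

    Assumption: a threshold applies at or above the listed percentage.
    Returns None when utilization is below the lowest listed threshold (70%).
    """
    if percent_used >= 90:
        # Both the 90% and the 100% alerts are listed as Critical.
        return "Critical"
    if percent_used >= 70:
        return "Warning"
    return None


if __name__ == "__main__":
    for sample in (50, 70, 90, 100):
        print(sample, utilization_severity(sample))
```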

OEMActivationOneNodePreviewAlertTemplates

  • "Missing OEM activation BIOS marker."
    • "Warning"
    • "The node is missing the OEM activation BIOS marker. A missing activation BIOS marker prevents Windows from activating. If the evaluation period for Windows is exceeded, Azure Stack will stop functioning."
  • "Invalid OEM activation BIOS marker."
    • "Warning"
    • "The node has an invalid OEM activation BIOS marker. An invalid activation BIOS marker prevents Windows from activating. If the evaluation period for Windows is exceeded, Azure Stack will stop functioning."
  • "Missing OEM activation license file."
    • "Warning"
    • "The node is missing the OEM activation license file. A missing activation license file prevents Windows from activating. If the evaluation period for Windows is exceeded, Azure Stack will stop functioning."
  • "Mismatched OEM activation license."
    • "Warning"
    • "The physical OEM activation BIOS marker does not match the license file. A mismatched activation license file prevents Windows from activating. If the evaluation period for Windows is exceeded, Azure Stack will stop functioning."
  • "OEM activation error."
    • "Warning"
    • "The node failed OEM activation with error {LicenseStatusReason}. Windows is not activated on this node. If the evaluation period for Windows is exceeded, Azure Stack will stop functioning."

PolicyServiceAlertTemplate

SCAlertTemplates

  • "Add Scale Unit Node operation failed to expand the storage capacity."
    • "Warning"
    • "Storage capacity failed to be expanded as part of the operation that adds a scale unit node in cluster {ClusterName}."

SecretExpirationAlertTemplates

  • "Pending internal certificate expiration"
    • "Warning"
    • "An internal certificate will expire within 30 days."
  • "Pending internal certificate expiration"
    • "Critical"
    • "An internal certificate will expire within 7 days."
  • "Pending external certificate expiration"
    • "Warning"
    • "An external certificate will expire within 30 days."
  • "Pending external certificate expiration"
    • "Critical"
    • "An external certificate will expire within 7 days."
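
The certificate alerts above escalate from Warning (expiry within 30 days) to Critical (expiry within 7 days). The following sketch shows one way to express that rule. It assumes "within N days" means N days or fewer remaining, and the function name `expiration_severity` and the sample dates are hypothetical.

```python
from datetime import datetime, timedelta


def expiration_severity(expires_at, now):
    """Return the severity for a certificate expiring at `expires_at`.

    Assumption: "within N days" means N days or fewer remain.
    Returns None when more than 30 days remain.
    """
    remaining = expires_at - now
    if remaining <= timedelta(days=7):
        return "Critical"
    if remaining <= timedelta(days=30):
        return "Warning"
    return None


if __name__ == "__main__":
    # Sample dates are made up for illustration only.
    now = datetime(2018, 10, 1)
    print(expiration_severity(datetime(2018, 10, 5), now))
    print(expiration_severity(datetime(2018, 10, 20), now))
    print(expiration_severity(datetime(2019, 1, 1), now))
```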

SecurityAlertTemplates

  • "Code Integrity Off"
    • "Critical"
    • "Code Integrity on {Component} is not enabled. Azure Stack is at risk of running unauthorized binaries."
  • "Code Integrity in Audit Mode"
    • "Critical"
    • "Code Integrity on {Component} is in audit mode. Azure Stack is at risk of running unauthorized binaries."
  • "User Account Created"
    • "Critical"
    • "A user account {UserName} was created for {Component}. It's a potential security risk."

ServiceFabricAlertTemplates

  • "Infrastructure role is unhealthy"
    • "Warning"
    • "The infrastructure role {Name} is experiencing issues."
  • "Infrastructure role is unhealthy"
    • "Warning"
    • "The infrastructure role {Name} is experiencing issues."
  • "Infrastructure role cannot be monitored"
    • "Warning"
    • "The infrastructure role {Name} cannot be monitored."

StorageAlertTemplates

  • "File share to volume mapping lost"
    • "Warning"
    • "One or more file shares have lost mapping to their volumes. Access to those shares may fail."
  • "Internal data store offline."
    • "Warning"
    • "The Internal data store service {ServiceName} is offline."

UrpAlertTemplates

WhsAlertTemplates

  • "A physical disk has failed"
    • "Warning"
    • "A physical disk located at {FaultingObjectLocation} has failed. The storage repair process has started."
  • "Connectivity has been lost to a physical disk"
    • "Warning"
    • "Connectivity has been lost to the physical disk located at {FaultingObjectLocation}."
  • "A physical disk is failing"
    • "Warning"
    • "The physical disk located at {FaultingObjectLocation} is sometimes unresponsive and is showing signs of failure."
  • "A failure of a physical disk is predicted to occur soon"
    • "Warning"
    • "A failure of the physical disk at {FaultingObjectLocation} is predicted to occur soon."
  • "A physical disk is quarantined because it is not supported"
    • "Warning"
    • "The physical disk located at {FaultingObjectLocation} is quarantined because it is not supported by your solution vendor. Only disks that are approved for the solution and have the correct disk firmware are supported."
  • "A physical disk is quarantined because its firmware version is not supported"
    • "Warning"
    • "The physical disk located at {FaultingObjectLocation} is quarantined because its firmware version is not supported by your solution vendor. The physical disk does not have the minimum firmware version level required by this Azure Stack solution."
  • "A replacement disk has existing data and is quarantined"
    • "Warning"
    • "The replacement disk located at {FaultingObjectLocation} was previously used and may contain data from an unknown storage system. The disk is quarantined."
  • "Failed attempt to update firmware on a physical disk"
    • "Warning"
    • "There was a failed attempt to update firmware on the physical disk located at {FaultingObjectLocation}."
  • "Storage device failure"
    • "Critical"
    • "A storage device failure occurred which may cause one or more file shares to be inaccessible. Some data may be lost."
  • "The scale unit does not have the minimum recommended storage reserve capacity"
    • "Warning"
    • "The scale unit {FaultingObjectDescription} does not have the minimum recommended storage reserve capacity of one node. This may limit the ability to restore data resiliency if one or more drive failures occur."
  • "The node is isolated from the scale unit"
    • "Critical"
    • "The node {FaultingObjectDescription} is isolated from the scale unit because of connectivity issues."
  • "Node quarantined because of recurring failures"
    • "Critical"
    • "The node {FaultingObjectDescription} is quarantined by the scale unit because of recurring failures. There is less capacity available for tenant workloads. A process has been started to move tenant workloads from this node to other nodes. If there is no available capacity, some workloads may not restart."
  • "Scale unit will go down if one more node fails"
    • "Critical"
    • "The scale unit {FaultingObjectDescription} has multiple node failures. If another node fails, the scale unit will go down."
  • "Network interface disconnected"
    • "Critical"
    • "The network interface {FaultingObjectDescription} is disconnected. The node is unavailable and there is less capacity available for tenant workloads. A process has been started to move tenant workloads from this node to other nodes. If there is no available capacity, some workloads may not restart."
  • "Node has missing network adapter(s) associated with the scale unit network"
    • "Critical"
    • "The node {FaultingObjectDescription} has missing network adapter(s) associated with the scale unit network {FaultingObjectUniqueId}. The node is unavailable and there is less capacity available for tenant workloads. A process has been started to move tenant workloads from this node to other nodes. If there is no available capacity, some workloads may not restart."
  • "A network interface has failed"
    • "Critical"
    • "The network interface {FaultingObjectDescription} has failed. The node is unavailable and there is less capacity available for tenant workloads. A process has been started to move tenant workloads from this node to other nodes. If there is no available capacity, some workloads may not restart."
  • "A network interface is not enabled"
    • "Critical"
    • "The network interface {FaultingObjectDescription} is not enabled. The node is unavailable and there is less capacity available for tenant workloads. A process has been started to move tenant workloads from this node to other nodes. If there is no available capacity, some workloads may not restart."
  • "A file share is over 80% utilized"
    • "Warning"
    • "The file share {FaultingObjectDescription} on volume {VolumeName} is over 80% utilized. If it reaches 100%, this can affect system functionality."
  • "A file share is over 90% utilized"
    • "Critical"
    • "The file share {FaultingObjectDescription} on volume {VolumeName} is over 90% utilized. If it reaches 100%, this can affect system functionality."

NRPAlertTemplate

  • "Node not connected to network controller"
    • "Critical"
    • "{NodeName} is not connected to the network controller."

SRPAlertTemplate

  • "Storage Resource Provider infrastructure/dependencies not available."
    • "Warning"
    • "Storage Resource Provider dependencies are unavailable and some functionality may be lost."
  • "Storage Resource Provider internal data store unavailable"
    • "Critical"
    • "Cannot connect to the Storage Resource Provider internal data store."
  • "Storage Resource Provider internal data store corruption"
    • "Critical"
    • "Storage Resource Provider internal data store corruption is detected."
  • "Corrupted blob"
    • "Critical"
    • "Couldn't access some of the blobs stored on these file share(s):\n{FileShares}."
  • "The blob service isn't running on a node"
    • "Critical"
    • "Couldn't utilize all of your physical hardware for blob service resiliency and scale-out because the blob service didn't start on these node(s):\n{Nodes}."
  • "The blob service can't attach to a file share"
    • "Critical"
    • "Couldn't access blobs stored on these file share(s):\n{FileShares}."
  • "Blob service data is corrupted"
    • "Critical"
    • "We couldn't access blobs stored on these shares because of problems with data corruption:\n{Shares}."
  • "Blob service data store is corrupted"
    • "Critical"
    • "Failures in the following shares indicate corruption issues in the blob service data store:\n{Shares}."
  • "Table server data corruption"
    • "Critical"
    • "There's a data corruption error in the Table server, which could cause a drop in availability for the table or queue service and result in data loss. The affected data repositories are:\n{DataRepositories}."
  • "Account and Container Service data corruption"
    • "Critical"
    • "There's a data corruption error in the Account and Container Service, which could cause a drop in availability for the table, queue, or blob service and result in data loss. The affected data repositories are:\n{DataRepositories}."
  • "Share is not accessible"
    • "Critical"
    • "Services on nodes {Nodes} can't access data on the following shares:\n{Shares}\nStorage services might be unavailable."
  • "Storage service internal communication error"
    • "Critical"
    • "Storage service internal communication error occurred when sending requests to the following nodes:\n{Nodes}."
  • "ESENT data corruption"
    • "Critical"
    • "There is ESENT data corruption in the following services:\n{Services}."
  • "Table service errors"
    • "Critical"
    • "Errors in the Table service may cause a decrease in tenant workload performance."
  • "Table service errors"
    • "Warning"
    • "Errors in the Table service may cause a decrease in tenant workload performance."
  • "Blob service errors"
    • "Critical"
    • "Errors in the Blob service may cause a decrease in tenant workload performance."
  • "Blob service errors"
    • "Warning"
    • "Errors in the Blob service may cause a decrease in tenant workload performance."
  • "Queue service errors"
    • "Critical"
    • "Errors in the Queue service may cause a decrease in tenant workload performance."
  • "Queue service errors"
    • "Warning"
    • "Errors in the Queue service may cause a decrease in tenant workload performance."

WhsAlertTemplatesSrp

  • "A file share is over 80% utilized"
    • "Warning"
    • "The file share {FaultingObjectDescription} on volume {VolumeName} is over 80% utilized. If it reaches 100%, affected tenants will not be able to use blobs, tables, or queues."
  • "A file share is over 90% utilized"
    • "Critical"
    • "The file share {FaultingObjectDescription} on volume {VolumeName} is over 90% utilized. If it reaches 100%, affected tenants will not be able to use blobs, tables, or queues."