Azure Batch + Azure Monitor Agent: correlating Computer names with pool names (Python SDK)

Francesco Cipolla 60 Reputation points

Introduction

We're using the Python azure-mgmt-batch SDK to create Azure Batch pools backed by VMSS, with Azure Monitor Agent (AMA) enabled via the extensions field in the ARM pool definition. AMA is shipping Linux performance counters (CPU, memory etc.) to a Log Analytics workspace via a Data Collection Rule.

Here is the relevant pool creation code just for reference:

from azure.mgmt.batch.models import (
    BatchPoolIdentity,
    DeploymentConfiguration,
    ImageReference,
    Pool,
    PoolIdentityType,
    UserAssignedIdentities,
    VirtualMachineConfiguration,
    VMExtension,
)
from azure.mgmt.monitor.models import DataCollectionRuleAssociationProxyOnlyResource

pool_params = Pool(
    identity=BatchPoolIdentity(
        type=PoolIdentityType.USER_ASSIGNED,
        user_assigned_identities={
            node_monitoring_identity_resource_id: UserAssignedIdentities()
        },
    ),
    vm_size=self.pool_vm_size,
    deployment_configuration=DeploymentConfiguration(
        virtual_machine_configuration=VirtualMachineConfiguration(
            image_reference=ImageReference(...),
            node_agent_sku_id=sku_to_use,
            extensions=[
                # Azure Monitor Agent, authenticated with the user-assigned identity
                VMExtension(
                    name="AzureMonitorAgent",
                    publisher="Microsoft.Azure.Monitor",
                    type="AzureMonitorLinuxAgent",
                    type_handler_version="1.0",
                    auto_upgrade_minor_version=True,
                    enable_automatic_upgrade=True,
                    settings={
                        "authentication": {
                            "managedIdentity": {
                                "identifier-name": "mi_res_id",
                                "identifier-value": node_monitoring_identity_resource_id,
                            }
                        }
                    },
                )
            ],
        )
    ),
    ...
)
# Keep the returned pool so its ARM resource ID can be used below
created_pool = batch_mgmt_client.pool.create(
    resource_group_name=batch_account_resource_group,
    account_name=batch_account_name,
    pool_name=pool_name,
    parameters=pool_params,
)

# DCR association created on the pool ARM resource after creation
monitor_client.data_collection_rule_associations.create(
    resource_uri=created_pool.id,
    association_name="ama-dcr-association",
    body=DataCollectionRuleAssociationProxyOnlyResource(
        data_collection_rule_id=dcr_resource_id,
    ),
)

The problem:

Querying CPU metrics per node works fine, example KQL query:

Perf
 | where ObjectName == "Processor" and CounterName == "% Processor Time" and InstanceName == "total"
 | summarize avg(CounterValue) by Computer, bin(TimeGenerated, 5m)

Result:

┌─────────────────────────┬──────────────────────┬──────────────────┐
│ Computer                │ TimeGenerated        │ avg_CounterValue │
├─────────────────────────┼──────────────────────┼──────────────────┤
│ <computer_name_a>000000 │ 2026-04-27T23:30:00Z │ 78.4             │
│ <computer_name_a>000001 │ 2026-04-27T23:30:00Z │ 81.2             │
│ <computer_name_a>000002 │ 2026-04-27T23:30:00Z │ 75.9             │
│ <computer_name_b>000000 │ 2026-04-27T23:30:00Z │ 12.1             │
│ <computer_name_b>000001 │ 2026-04-27T23:30:00Z │ 10.8             │
└─────────────────────────┴──────────────────────┴──────────────────┘

I need to correlate these Computer names with the Batch pool name so that metrics can be aggregated per pool in Grafana and the end user doesn't have to cross-check computer names. This matters most for pools that allocate dozens of nodes which are automatically deleted as soon as the corresponding Batch job ends.

There is no field in the Perf table, the _ResourceId, or the Computer name that contains the Batch pool name.
The VMSS UUID and the Computer name prefix are both auto-generated by Azure and have no documented relationship to the pool name.

My initial goal was to inject a custom BatchPoolRegistry_CL table into the same Log Analytics workspace containing the PoolName → Computer mapping, so we can join it with the Perf table.

What I've investigated:

  • BatchManagementClient.pool.get(): the ARM pool response does not expose the underlying VMSS resource group or UUID
  • compute_node.list() (legacy azure-batch SDK): node IDs follow the tvmps_{hex}_p format with no obvious link to the VMSS UUID or Computer name

Questions:

  1. Is there any API in azure-mgmt-batch (Python management SDK) that exposes the underlying VMSS resource group or a stable identifier correlating with what AMA reports as Computer?
  2. Alternatively, is there an officially supported way to associate a Batch pool name with its AMA-monitored nodes?

Thanks!

  1. Manish Deshpande 7,010 Reputation points • Microsoft External Staff • Moderator

    Hello Francesco Cipolla

    Thanks for reaching out to Microsoft Q&A. This is a really well-scoped problem, and you've clearly done solid investigation already. Let me address both of your questions directly and then walk you through the recommended solution.

    Why This Happens

    This is a known gap in the Azure Batch + AMA integration. Azure Monitor in Batch Pool is only supported for Batch accounts created with Pool Allocation Mode in User Subscription mode. In Batch Service mode, nodes are created in Azure-managed subscriptions that users don't have access to, which makes enabling data collection for those nodes impossible.

    πŸ‘ User's image

    Link: https://techcommunity.microsoft.com/blog/azurepaasblog/integrating-azure-monitor-in-azure-batch-to-monitor-batch-pool-nodes-performance/4428929

    Even in User Subscription mode, the core issue is that Azure Batch has a predefined naming convention for the resources it creates (e.g., {guid}-AzureBatch-VMSS) and does not currently support customizing the names of scale sets or nodes. The VMSS UUID and the Computer name prefix reported by AMA are both auto-generated, and there is no documented field in the Batch Management API that directly exposes this mapping.

    Q1: Is there any azure-mgmt-batch API that exposes the underlying VMSS resource group or a stable identifier correlating with what AMA reports as Computer?

    No, not directly. The batch_mgmt_client.pool.get() response does not expose the underlying VMSS resource ID or name. The Compute Node - List REST API (GET /pools/{poolId}/nodes) returns a node id in tvm-{hash}_{index}-{timestamp} format. This id field contains the node identifier but does not directly map to the VMSS instance name or the Computer hostname reported by AMA.

    Q2: Is there an officially supported way to associate a Batch pool name with its AMA-monitored nodes?

    There is no single turnkey API for this today. However, there are two practical approaches you can combine to build the BatchPoolRegistry_CL lookup table you described: one from the control plane and one from inside the node.

    Alternative Options:
    Resource Tags Propagation + KQL Join Method

    The resourceTags property allows you to associate user-defined tags with a Batch pool. When specified, these tags are propagated to the backing Azure resources associated with the pool. This property can only be specified when the Batch account was created with poolAllocationMode set to UserSubscription.

  2. Francesco Cipolla 60 Reputation points

    Hello Manish!
    Thanks for your detailed reply!

    I have some follow-up questions regarding your alternative, which seems to be the only way forward.
    Assuming the Batch account was indeed created with UserSubscription, then it means we can join the KQL query by fetching the correct VMSS via the PoolName tag (if I understood correctly).
    However, that would only work for as long as the VMSS lives, correct?
    If the VMSS is deleted after the corresponding Batch Job completes, we would lose this way of making the association via KQL query as the VMSS is no longer available.

    To get historical data for a particular Batch pool we would still need to create a custom table, which should be possible in theory if tags are correctly propagated to the pool's underlying resources.
    Given the Perf table already comes with the full _ResourceId which is in the form of:

     /subscriptions/<my-subscription>/resourcegroups/azurebatch-<auto-generated-rg>/providers/microsoft.compute/virtualmachinescalesets/<VMSS-name>/virtualmachines/0
    

    It should be possible to do the following:

    1. Get the VMSS name filtering based on tags (after Batch ARM pool is created)
    2. Push <VMSS name to Pool name> association to a custom table
    3. KQL join custom table against Perf's _ResourceId string.

    Please do let me know if I'm getting your suggestion correctly, or if I missed important details!

    Thanks.
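
    The third step in the list above depends on pulling the VMSS name out of Perf's _ResourceId. As a rough sketch of that parsing and lookup done client-side (for example, to validate the mapping before writing the KQL join), something like the following could work. The function names and the (vmss_name, pool_name) row shape are hypothetical and not part of any Azure SDK. Matching is case-insensitive because the _ResourceId quoted above is all lowercase, while the VMSS name returned by ARM may not be.

```python
import re
from typing import Dict, Iterable, Optional, Tuple

# Matches the scale set segment of an ARM resource ID. Case is ignored because
# the _ResourceId shown in this thread is lowercase.
_VMSS_SEGMENT = re.compile(r"/virtualmachinescalesets/([^/]+)", re.IGNORECASE)


def vmss_name_from_resource_id(resource_id: str) -> Optional[str]:
    """Return the lowercased VMSS name embedded in a resource ID, or None."""
    match = _VMSS_SEGMENT.search(resource_id)
    return match.group(1).lower() if match else None


def build_pool_lookup(registry_rows: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Map lowercased VMSS name -> pool name from (vmss_name, pool_name) pairs.

    The pair layout is an assumption about how the custom table is exported.
    """
    return {vmss.lower(): pool for vmss, pool in registry_rows}


def pool_for_perf_row(resource_id: str, lookup: Dict[str, str]) -> Optional[str]:
    """Resolve the pool name for one Perf row, or None if it is unmapped."""
    vmss = vmss_name_from_resource_id(resource_id)
    return lookup.get(vmss) if vmss else None
```

    Normalising both sides to lowercase before comparing avoids silent join misses when the registry stores the name exactly as ARM returned it.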

  3. Francesco Cipolla 60 Reputation points

    Hi Manish,
    Great! Again, thanks so much for all the useful info.
    I'm now in the implementation process following the alternative proposal we discussed above.
    Will post an update after I'm able to test the full flow.

    Thank you!

  4. Manish Deshpande 7,010 Reputation points • Microsoft External Staff • Moderator

    Hello Francesco

    Just wanted to check whether there is any update, or whether you have had a chance to test the response I posted.

  5. Francesco Cipolla 60 Reputation points

    Hi Manish, finally sharing my update!
    I was able to test this approach successfully, since our production Batch account was created in UserSubscription mode.
    I had to create some new resources for ingesting data into the custom table, namely a Data Collection Rule and a Data Collection Endpoint that define the new pipeline from the Logs Ingestion API to BatchPoolRegistry_CL. After that everything works as expected: I am now recording the association between PoolName and its underlying VMSS, which gives me a way to query historical data.
    Again, thank you very much for all your clarifications!

    EDIT: it seems I don't see the accept answer button on your previous message

    EDIT #2: another thing worth mentioning, when I double checked the VMSS resource I could already see a tag for PoolName. Though we did not specifically add it in our list of Pool tags, so my assumption is this was automatically handled by batch.
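
    For readers setting up the same pipeline, here is a minimal sketch of how one registry row could be built and sent through the Logs Ingestion API with the azure-monitor-ingestion package. The column names (PoolName, VmssName, VmssResourceId), the helper names, and the stream name are assumptions for illustration; the real ones must match the stream declared in your Data Collection Rule. The SDK imports are deferred so that the record-building part runs without Azure packages installed.

```python
from datetime import datetime, timezone
from typing import Dict, List, Optional


def build_registry_record(pool_name: str, vmss_name: str, vmss_resource_id: str,
                          now: Optional[datetime] = None) -> Dict[str, str]:
    """Build one row for the custom table (column names are assumptions)."""
    now = now or datetime.now(timezone.utc)
    return {
        "TimeGenerated": now.isoformat(),
        "PoolName": pool_name,
        # Lowercased so the value compares cleanly against Perf's _ResourceId
        "VmssName": vmss_name.lower(),
        "VmssResourceId": vmss_resource_id.lower(),
    }


def upload_registry_records(endpoint: str, rule_id: str, stream_name: str,
                            records: List[Dict[str, str]]) -> None:
    """Send rows via the Logs Ingestion API (needs azure-monitor-ingestion)."""
    # Imported lazily so build_registry_record works without the Azure SDK.
    from azure.identity import DefaultAzureCredential
    from azure.monitor.ingestion import LogsIngestionClient

    client = LogsIngestionClient(endpoint=endpoint,
                                 credential=DefaultAzureCredential())
    # rule_id is the DCR immutable ID; stream_name is declared in that DCR
    # (for example "Custom-BatchPoolRegistry_CL", an assumed name).
    client.upload(rule_id=rule_id, stream_name=stream_name, logs=records)
```

    Writing the row right after the pool's VMSS becomes visible keeps the mapping available even when the pool is deleted minutes later.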



Answer accepted by question author

Manish Deshpande 7,010 Reputation points • Microsoft External Staff • Moderator

Yes, you are understanding the recommendation correctly, and yes, you will still need a custom table for historical correlation.

  • VMSS-based lookups are only valid while the VMSS exists
  • Once a Batch pool is deleted (and its VMSS is torn down), Log Analytics retains Perf data but loses the control‑plane context
  • Therefore, the correct and supported approach is:
    1. Use resourceTags on the Batch pool to discover the VMSS while it exists
    2. Extract VMSS name → Pool name
    3. Persist that mapping into a custom Log Analytics table
    4. Join historical Perf._ResourceId against that table

You did not miss any important steps.

This behavior is by design and not specific to AMA.

  • Azure Batch deliberately abstracts away the underlying infrastructure
  • The Batch control plane never exposes VMSS identity via azure-mgmt-batch
  • Azure Monitor (AMA) reports telemetry at the VM / VMSS layer, not the Batch layer
  • Once a VMSS is deleted, Azure Resource Graph, ARM, and the Compute APIs no longer contain its metadata
  • Log Analytics keeps Perf rows, but _ResourceId becomes the only remaining anchor

There is currently no native field in Perf that stores the Batch pool name, nor is there a supported way to backfill that automatically after deletion.

Tag the Batch pool (control plane)

When creating the pool, set resourceTags:

pool_params = Pool(
 ...
 resource_tags={
 "BatchPoolName": pool_name,
 "BatchAccountName": batch_account_name
 },
 ...
)

Discover VMSS while it exists

Query Compute / ARM to find VMSS with that tag:

Tags["BatchPoolName"] == "<pool-name>"

https://learn.microsoft.com/en-us/rest/api/compute/virtual-machine-scale-sets/list?view=rest-compute-2025-11-01&tabs=HTTP
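
As a sketch of that discovery step in Python, the filter below works on any iterable of objects exposing `name` and `tags`, which is what `ComputeManagementClient(...).virtual_machine_scale_sets.list_all()` from azure-mgmt-compute yields. The helper name `find_vmss_for_pool` is hypothetical, and the stand-in objects exist only so the example runs without credentials. The tag key defaults to the BatchPoolName tag set above. It can be switched to PoolName if, as noted earlier in the thread, Batch already applies that tag on its own.

```python
from types import SimpleNamespace
from typing import Iterable, List


def find_vmss_for_pool(scale_sets: Iterable, pool_name: str,
                       tag_key: str = "BatchPoolName") -> List:
    """Return the scale sets whose `tag_key` tag equals `pool_name`."""
    matches = []
    for vmss in scale_sets:
        tags = vmss.tags or {}  # tags can be None on untagged resources
        if tags.get(tag_key) == pool_name:
            matches.append(vmss)
    return matches


# Stand-in objects (made-up names) so the sketch runs without Azure access.
# In real use, pass compute_client.virtual_machine_scale_sets.list_all().
sample_scale_sets = [
    SimpleNamespace(name="0000-AzureBatch-VMSS",
                    tags={"BatchPoolName": "render-pool"}),
    SimpleNamespace(name="unrelated-vmss", tags=None),
]
found = find_vmss_for_pool(sample_scale_sets, "render-pool")
print([vmss.name for vmss in found])
```

Filtering client-side keeps the example independent of any server-side tag filter syntax. For subscriptions with many scale sets, narrowing the listing to the auto-generated Batch resource group first would reduce the number of items scanned.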

Query historical Perf data reliably

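Because _ResourceId is stored on every Perf row for as long as that row is retained in the workspace, the join keeps working after the VMSS is gone:

// Column names in BatchPoolRegistry_CL depend on your custom table schema
let PoolMap = BatchPoolRegistry_CL
    | project PoolName, VmssName = tolower(VmssName);
Perf
| where ObjectName == "Processor"
| where CounterName == "% Processor Time"
// _ResourceId is lowercase in Perf, so match the segment case-insensitively
| extend VmssPrefix = tolower(extract(@"(?i)virtualmachinescalesets/([^/]+)", 1, _ResourceId))
| join kind=leftouter PoolMap on $left.VmssPrefix == $right.VmssName
| summarize avg(CounterValue) by PoolName, bin(TimeGenerated, 5m)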

Your understanding is exactly correct. The VMSS-based correlation works only while the backing infrastructure exists. For historical analysis, the supported pattern is to materialize the VMSS → Batch pool association at pool creation time and persist it into a custom Log Analytics table, then join against the _ResourceId column in Perf.

Thanks,
Manish.

  1. Manish Deshpande 7,010 Reputation points • Microsoft External Staff • Moderator

    Hello @Francesco Cipolla

    I have posted my response. It would be a great help if you could accept and upvote it.

    Thanks,
    Manish



1 additional answer

  1. AI answer

    1. The provided information about Azure Batch pools, nodes, and VM configuration does not describe any API in the azure-mgmt-batch Python management SDK that exposes the underlying VM scale set resource group, VMSS ID, or any other stable identifier that can be directly correlated with the Computer field reported by Azure Monitor Agent in the Perf table.
    2. The available documentation only covers:
      • How to define a pool with VirtualMachineConfiguration, including image_reference, node_agent_sku_id, and optional extensions.
      • General concepts of pools and nodes, including that Batch uses virtual machine scale sets under the hood and that each node has a unique name and IP address.
      • How to connect to Linux nodes via SSH and create users.
      • How to create pools from custom images or Azure Compute Gallery images.
      • Best practices for pool configuration and monitoring Batch with Azure Monitor.

    None of these describe an officially supported mechanism to:

    • Retrieve the backing VMSS resource or its identifiers from the Batch management plane, or
    • Join Batch pool metadata (such as pool name) with AMA-reported Computer values in Log Analytics.

    Within the provided material, there is therefore no documented, supported way to associate Batch pool names with AMA-monitored nodes via a stable identifier that appears in the Perf table.


    AI-generated content may be incorrect. Read our transparency notes for more information.