Problem Description
Newly created Cosmos DB containers on both accounts (yam-npe-n4cilab3 and yam-npe-n4cilab3-2) immediately enter a 410 Gone / substatus 1000 state and never recover without external intervention. The containers remain in this state for hours to weeks. Throughout that time the control-plane API keeps reporting "Collection is not yet available for read. Please retry in some time."
This is a recurring pattern affecting both accounts simultaneously, suggesting a degraded storage node in West US that both accounts are being assigned to when new containers are created.
Pattern
Our dev labs delete all databases daily (cleanup cycle) and recreate them via service migrations. After cleanup:
- Databases are deleted (succeeds)
- Migrations create new databases and containers (succeeds)
- New containers immediately enter 410 Gone (substatus 1000)
- They remain in this state indefinitely; we have observed durations of 4 hours, 6 days, and 2 weeks on separate occurrences
- The control-plane endpoint (az cosmosdb sql container show) returns "Collection is not yet available for read. Please retry in some time."
- The only remediation we have found is deleting the container/database and hoping it gets assigned to a different partition on recreation
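The check our automation needs after each migration run can be sketched as a bounded poll of the control plane. This is a minimal sketch, not our actual lab tooling: the function names, the timeout handling, and the decision to treat the "not yet available" message as the only not-ready signal are assumptions made for illustration. The `az cosmosdb sql container show` flags used are the standard ones.

```python
import subprocess
import time
from typing import Callable, List

# Message fragment reported by the control plane while the container is stuck.
NOT_READY_MARKER = "Collection is not yet available for read"


def build_show_command(resource_group: str, account: str,
                       database: str, container: str) -> List[str]:
    """Assemble the `az cosmosdb sql container show` call for one container."""
    return [
        "az", "cosmosdb", "sql", "container", "show",
        "--resource-group", resource_group,
        "--account-name", account,
        "--database-name", database,
        "--name", container,
    ]


def az_probe(command: List[str]) -> bool:
    """Return True if the CLI call succeeds without the not-ready message."""
    result = subprocess.run(command, capture_output=True, text=True)
    output = result.stdout + result.stderr
    return result.returncode == 0 and NOT_READY_MARKER not in output


def wait_until_readable(probe: Callable[[], bool], timeout_s: float,
                        interval_s: float,
                        sleep: Callable[[float], None] = time.sleep,
                        clock: Callable[[], float] = time.monotonic) -> bool:
    """Poll `probe` until it returns True or `timeout_s` elapses.

    Returns False on timeout, which the caller can treat as "stuck":
    delete and recreate the container, or escalate.
    """
    deadline = clock() + timeout_s
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

A caller would pass `lambda: az_probe(build_show_command(...))` as the probe. The `sleep` and `clock` parameters exist only so the loop can be exercised without real waiting.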
Error Details (from SDK diagnostics)
ClassName: CosmosException
statusCode: 410
substatus: 1000
error: "Gone - The requested resource is no longer available at the server"
operationType: ReadFeed
resourceType: StoredProcedure
connectionMode: DIRECT (RNTBD)
Control-plane (az cosmosdb sql container show):
"Collection is not yet available for read. Please retry in some time."
ActivityId: [varies per attempt]
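To separate this failure from other 410 responses in our logs, the signature above can be matched mechanically. The sketch below assumes the diagnostics are available as flat "key: value" lines as excerpted here. Real SDK diagnostics are usually richer (often JSON), so the parser and both function names are illustrative only.

```python
from typing import Dict

# Excerpt of the diagnostics fields quoted in this ticket.
SAMPLE = """ClassName: CosmosException
statusCode: 410
substatus: 1000
operationType: ReadFeed
resourceType: StoredProcedure
connectionMode: DIRECT (RNTBD)"""


def parse_diagnostics(text: str) -> Dict[str, str]:
    """Split 'key: value' lines into a dict; lines without a colon are skipped."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def is_gone_1000(fields: Dict[str, str]) -> bool:
    """True when the fields carry the 410 / substatus 1000 signature."""
    return fields.get("statusCode") == "410" and fields.get("substatus") == "1000"


if __name__ == "__main__":
    print(is_gone_1000(parse_diagnostics(SAMPLE)))
```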
Key Observations
- Both accounts affected simultaneously: yam-npe-n4cilab3 and yam-npe-n4cilab3-2 are separate accounts in the same resource group and region, and both exhibit the same problem. This strongly suggests a shared degraded storage node/cluster in West US.
- Happens immediately on creation: The 410 state begins as soon as the container is created by migrations; the containers never become healthy.
- Does not self-heal: Incidents have persisted for 4 hours, 6 days, and 2 weeks. This is not a transient initialization delay.
- Recurring: We observe this pattern repeatedly after each daily cleanup cycle, suggesting the accounts are consistently being assigned partitions on the same degraded node.
- Other containers on the same accounts are healthy: Only specific newly created containers are affected; existing containers on the same accounts work normally.
Ask
- Identify the degraded storage node(s) in West US that are serving these partition IDs and investigate why partitions assigned to them immediately enter an unrecoverable 410 Gone state.
- Migrate the affected partitions (or the accounts themselves) off the degraded node so that newly created containers become available normally.
- Advise on whether there is a way to request partition reassignment without deleting and recreating the entire Cosmos account, given that the ~30 minute recreation cost impacts our daily lab automation.