Description
Error
I'm trying to undelete https://commons.wikimedia.org/w/index.php?title=File:Tajiks_of_Uzbekistan.PNG
and I repeatedly get:
Sorry! This site is experiencing technical difficulties. Try waiting a few minutes and refreshing. (Cannot access the database: Cannot access the database: Database servers in cluster28 are overloaded. In order to protect application servers, the circuit breaking to databases of this section have been activated. Please try again a few seconds.)
Impact
File cannot be undeleted.
Related Objects
- Mentioned In
- T413974: Northward Datacenter Switchover (March 2026; codfw to eqiad)
- T422166: scap can’t deploy (blob upload unknown) after apus.discovery.wmnet is repooled in codfw
- T422111: es1042 not starting after powercycle
- T422140: Fatal exception of type "Wikibase\DataModel\Services\Lookup\EntityLookupException"
Event Timeline
For https://commons.wikimedia.org/w/index.php?title=File:DepEd_Undersecretary_Michael_T._Poa.jpg
I get
(Cannot access the database: Cannot access the database: Database servers in cluster30 are overloaded. In order to protect application servers, the circuit breaking to databases of this section have been activated. Please try again a few seconds.)
The same error appears for other clusters as well (26, 31), even when just opening pages to read.
I've been experiencing these errors intermittently on English Wikipedia today, but only on trying to save edits. Each time trying again has resulted in the save being successful.
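For scripts or bots that hit the same intermittent failure, the "try again" workaround described above can be automated. The following is only a minimal sketch: `TransientError` and `retry` are hypothetical names, not part of MediaWiki or any client library, and the caller is assumed to raise `TransientError` when a save fails with a temporary server-side error.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical stand-in for a temporary failure such as a 5xx response."""


def retry(operation, attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call operation() until it succeeds or the attempts run out."""
    for attempt in range(attempts):
        try:
            return operation()
        except TransientError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Exponential backoff with jitter, so that many clients
            # do not all retry at the same moment.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay / 2))
```

Backing off between attempts matters here because the servers are reported as overloaded; immediate retries from many clients would add to the load.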
I experienced such errors when diffing and saving edits.
Should I expect the coming backport window to be cancelled or delayed due to this incident?
In T422130#11781793, @1F616EMO wrote: Should I expect the coming backport window to be cancelled or delayed due to this incident?
Very likely yes. A deployment won't take place unless incident responders are comfortable it won't affect or distract from the incident.
In T422130#11781814, @RhinosF1 wrote: Very likely yes. A deployment won't take place unless incident responders are comfortable it won't affect or distract from the incident.
Thanks for the info, I've rescheduled my backports.
I've just encountered what I presume is the same error, this time when trying to use the reply tool:
[6a4d47bf-961e-4513-9b1f-c6970e11f156] Caught exception of type Wikimedia\Rdbms\DBConnectionError
I know the user-unfriendliness of that error message is a separate issue, but I'm not sure where to report it.
We are hopeful the situation should have improved after codfw was repooled, adding additional capacity. Root cause of the circuit breaking is still being investigated.
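For readers unfamiliar with the term, "circuit breaking" generally means rejecting calls to an overloaded backend for a while instead of letting every request pile onto it. The sketch below illustrates the generic pattern only; it is not MediaWiki's implementation, and the class name, thresholds, and timings are all assumptions chosen for illustration.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected without reaching the backend."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_after=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, operation):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                # Still cooling down: fail fast to protect the caller.
                raise CircuitOpenError("backend overloaded, try again shortly")
            # Cool-down elapsed: let one trial call through (half-open).
        try:
            result = operation()
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open (or re-open) the circuit
            raise
        # Success: close the circuit and forget past failures.
        self.failures = 0
        self.opened_at = None
        return result
```

In this pattern the user-visible error comes from the breaker itself rather than from the database, which matches the wording of the message quoted in the description.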
The immediate impact has been mitigated, so I'm reducing the priority. The task might still be used to collect followups.
FWIW, I'm still currently encountering this error on frwiki, and it prevents my local custom JS/CSS files from loading.
Unexpectedly not loaded:
- , , , …
Not impacted — loading as expected:
- , , , …
- ,
In T422130#11784439, @Od1n wrote: FWIW, I'm still currently encountering this error on frwiki, and it prevents my local custom JS/CSS files from loading.
Please let us know if you are still experiencing this issue.
Right now, I’m still seeing the JS error in the console.
Interestingly, and are still not loading, but is now loading (and I don’t have a page).
I'm still seeing the issue. The UUID is always the same, so I'm posting it here in case it helps:
[e0e9c2f5-9aa0-47a2-92a1-6f9e523708fe] 2026-04-02 11:37:52: Fatal exception of type "Wikimedia\Rdbms\DBConnectionError"
The failing network request is consistently this URL:
https://fr.wikipedia.org/w/load.php?lang=fr&modules=user&skin=vector&user=Od1n&version=8ea0b
The error only occurs with this specific version value; if I change or remove the version parameter, the request succeeds.
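For anyone wanting to reproduce the comparison, the URL variants can be built programmatically. This is a minimal sketch using only the Python standard library; `with_param` is a hypothetical helper, and the replacement value `00000` is an arbitrary placeholder, not a real module version.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_param(url, name, value=None):
    """Return url with query parameter `name` replaced, or removed if value is None."""
    parts = urlsplit(url)
    query = [
        (key, val)
        for key, val in parse_qsl(parts.query, keep_blank_values=True)
        if key != name
    ]
    if value is not None:
        query.append((name, value))
    return urlunsplit(parts._replace(query=urlencode(query)))


failing = (
    "https://fr.wikipedia.org/w/load.php"
    "?lang=fr&modules=user&skin=vector&user=Od1n&version=8ea0b"
)
without_version = with_param(failing, "version")
other_version = with_param(failing, "version", "00000")  # arbitrary placeholder
```

Fetching `failing`, `without_version`, and `other_version` in the same browser session and comparing the responses would show whether only the exact original URL is affected.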
As an additional note, yesterday while editing I was repeatedly asked to re‑authenticate — not every time, but often when opening or submitting an edit page. I'm not sure whether this is related, but mentioning it in case it’s useful.
DB circuit breaking hasn't been active for two days now (based on Logstash logs), so your issue seems to be completely different. One suggestion: clear your caches (Ctrl+Shift+R). It could be that the error page somehow got cached in your browser (which it really shouldn't, as this is a 500 response, but you never know with browsers). The fact that I can load that URL with correct content and no problem suggests it's not related to this. If that doesn't fix it, I'd say file a new bug so we can investigate it separately.
I've cleared my browser cache and restarted Chrome.
- I still encounter the exact same error (same UUID and timestamp), even when requesting the asset in a Chrome Incognito window or from a different Chrome profile.
- But it works when I request it using another browser (Firefox).
This really feels like a reverse‑proxy issue — some stale or polluted cache that hasn’t been invalidated and is still being served based on IP, user‑agent, or similar, while a different browser triggers a cache miss.
Hopefully it will sort itself out within a few days at most.
If you're logged in, requests should bypass all CDN caches, since logged-in responses can pollute the cache (e.g. if you set your interface language to something else, we don't want to serve that to logged-out users :D). Images are an exception, but that's not relevant here. That doesn't mean there can't be bugs here and there, though. I'll notify the traffic team to investigate. Something is caching this at some layer, since UUIDs by nature shouldn't repeat themselves.
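The "same UUID means a cached response" reasoning can be checked mechanically when comparing several error bodies. The following is a rough sketch; `ERROR_RE`, `request_ids`, and `looks_cached` are hypothetical names, and the pattern only assumes that the request ID appears in square brackets, as in the error lines quoted in this task.

```python
import re

# Matches a bracketed request ID such as [e0e9c2f5-9aa0-47a2-92a1-6f9e523708fe].
ERROR_RE = re.compile(
    r"\[(?P<request_id>[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12})\]"
)


def request_ids(bodies):
    """Extract the request ID from each error body (None if absent)."""
    ids = []
    for body in bodies:
        match = ERROR_RE.search(body)
        ids.append(match.group("request_id") if match else None)
    return ids


def looks_cached(bodies):
    """True if at least two bodies carry the same request ID."""
    ids = [rid for rid in request_ids(bodies) if rid is not None]
    return len(ids) != len(set(ids))
```

A freshly generated error should carry a new ID each time, so a repeated ID across separate requests points at a stored copy of an earlier response rather than a new failure.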
I was still encountering the issue, and I’ve just resolved it by making an edit to MediaWiki:Group-sysop.js, which I noticed was included in the bundle, in order to trigger a server‑side cache refresh.
In T422130#11789154, @Marostegui wrote: Is this good to be closed?
Judging by T422130#11782760, I guess it's just waiting for followups to be filed (if any).
