Description
Problem
As part of the work under WE5.4 to protect our infrastructure from abusive scraping, we want to be able to understand the provenance of image requests. This means being able to distinguish when and where a URL to an image was generated.
This will allow us to use this information as a signal in request filtering at the CDN, by helping to determine whether a request comes from a browser session visiting the website, from an API query, from the dumps, or from hotlinking.
This intervention was originally proposed in Dec 2025 in Urgent needs for de-risking WE4.3, WE5.4, and our infrastructure (WMF-restricted)
Approach
Generate signed URLs for image requests, by adding query parameters that contain the provenance information and a signature that can be trivially validated at the CDN. The signature should be an HMAC that includes the URL, source (web, api, dumps), timestamp and a secret.
Acceptance criteria
- Generated image URLs include provenance query parameters
- Generated image URLs include an HMAC signature
- Signature contents and HMAC algorithm agreed with SRE
- SRE can configure the CDN based on the source that generated an image URL
- SRE can configure the CDN based on the freshness of an image URL
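The approach and acceptance criteria above can be sketched as follows. This is only an illustration, not the agreed format: the parameter names `src`, `ts` and `sig`, the SHA-256 digest, the newline-joined message layout, and the choice to sign only the URL path are all assumptions made for the example. The signature contents and HMAC algorithm are still to be agreed with SRE.

```python
import hashlib
import hmac
import time
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical shared secret; a real deployment would load this from private config.
SECRET = b"example-secret"


def _signature(path, source, ts, secret):
    # Assumed message layout: URL path, source and timestamp joined by newlines.
    message = "\n".join([path, source, str(ts)]).encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def sign_url(url, source, timestamp=None, secret=SECRET):
    """Append hypothetical src/ts/sig query parameters to an image URL."""
    ts = int(time.time()) if timestamp is None else int(timestamp)
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query += [("src", source), ("ts", str(ts)),
              ("sig", _signature(parts.path, source, ts, secret))]
    return urlunsplit(parts._replace(query=urlencode(query)))


def verify_url(url, max_age, now=None, secret=SECRET):
    """Return the source if the signature is valid and fresh, else None."""
    parts = urlsplit(url)
    params = dict(parse_qsl(parts.query))
    try:
        source, ts, sig = params["src"], int(params["ts"]), params["sig"]
    except (KeyError, ValueError):
        return None
    expected = _signature(parts.path, source, ts, secret)
    # Constant-time comparison to avoid leaking how many characters matched.
    if not hmac.compare_digest(expected.encode("utf-8"), sig.encode("utf-8")):
        return None
    now = int(time.time()) if now is None else int(now)
    if now - ts > max_age:
        return None  # URL is older than the allowed freshness window
    return source
```

A validator shaped like this would let the edge act on both criteria at once: the returned source distinguishes web, api and dumps, and `max_age` expresses the freshness policy.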
Status updates
- 4 Feb 2026, T414338#11584348
- 13 Mar 2026, T414338#11804201
- 9 Apr 2026, T414338#11804224
- 15 May 2026, T414338#11925175
Details
Related Objects
| Status | Subtype | Assigned | Task |
|---|---|---|---|
| Open | | matmarex | T414338 FY25-26 WE5.4.12: Identify the provenance of image requests |
| Resolved | BUG REPORT | matmarex | T419458 Media dialog in VisualEditor shows odd UTM param strings where file type should be |
| Resolved | BUG REPORT | Krinkle | T422586 MediaViewer downloads high-res image twice if original is a medium-size JPEG |
| Open | | Krinkle | T424082 MediaViewer preview sometimes lacks provenance parameters |
| | | | Restricted Task |
Mentioned In
- T425580: [Spike] [BUG] POTD Gallery doesn't load, crashes upon share
- T426373: Mediaviews Analysis returns API not found error
- T424082: MediaViewer preview sometimes lacks provenance parameters
- T422586: MediaViewer downloads high-res image twice if original is a medium-size JPEG
- T418957: Add client-side logging for non-MediaWiki action API errors (HTTP 429)
- T419921: TypeError: MediaWiki\Extension\OAuth\ResourceServer::getUser(): Return value must be of type MediaWiki\User\User, false returned
- T417278: Choosing client credentials grant for OAuth 2 results in an access token (JWT) with the 'sub' field empty
- T419135: Gadget-Stockphoto.js on Commons uses non-common thumbnail sizes, leading to a HTTP 429
- T419458: Media dialog in VisualEditor shows odd UTM param strings where file type should be
- T246054: Consider dropping the '1.5x' size logos from srcsets
- T414805: FY 25/26 WE 5.4.10 Standard Thumbnail Sizes Only
- T417309: mw.util.parseImageUrl() returns invalid thumb URLs for images where original size is under requested width
- T414337: Identify requests for media files from logged-in users
Mentioned Here
- T425580: [Spike] [BUG] POTD Gallery doesn't load, crashes upon share
- T424082: MediaViewer preview sometimes lacks provenance parameters
- rMEXT1269442c61b4: Updated mediawiki/extensions Project: mediawiki/extensions/WikibaseQuery…
- T426217: MediaViewer downloads high-res image twice if thumb URL is re-used
- T419135: Gadget-Stockphoto.js on Commons uses non-common thumbnail sizes, leading to a HTTP 429
- T422586: MediaViewer downloads high-res image twice if original is a medium-size JPEG
- T419458: Media dialog in VisualEditor shows odd UTM param strings where file type should be
- T417278: Choosing client credentials grant for OAuth 2 results in an access token (JWT) with the 'sub' field empty
- T418957: Add client-side logging for non-MediaWiki action API errors (HTTP 429)
- T419921: TypeError: MediaWiki\Extension\OAuth\ResourceServer::getUser(): Return value must be of type MediaWiki\User\User, false returned
- T402792: Consider rate limiting non-standard thumbnail sizes
- T414337: Identify requests for media files from logged-in users
Event Timeline
Change #1239464 merged by jenkins-bot:
[mediawiki/core@master] Media: Add provenance parameters to thumbnail and media file URLs
In T414338#11652561, @matmarex wrote: @Joe @CDanis I heard you're the people to talk to about the desired data and format of these query parameters.
Currently, the proposed patch https://gerrit.wikimedia.org/r/c/mediawiki/core/+/1239464 includes the following data:
- Site which is requesting the image, e.g. 'www.mediawiki.org'
- Generator (the software component involved), e.g. 'parser' or 'imageinfo'. Entry point is used as fallback if not specified, e.g. 'index', 'api', 'rest'
- Format of the requested image, 'original', 'thumbnail' or 'thumbnail_unscaled'
The format is UTM parameters (respectively utm_source, utm_campaign and utm_content, in this order), on the assumption that they'll be stripped by search engines etc.
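As an illustration of the parameter layout just described, here is a minimal Python sketch that appends the three UTM parameters in the stated order and reads them back. It is not the code from the MediaWiki core patch; the function names `add_provenance` and `read_provenance` are hypothetical, and only the parameter names, their order, and the example values come from this discussion.

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit


def add_provenance(url, site, generator, fmt):
    """Append utm_source, utm_campaign and utm_content, in that order."""
    parts = urlsplit(url)
    extra = urlencode([
        ("utm_source", site),         # site requesting the image
        ("utm_campaign", generator),  # generator, or entry point as fallback
        ("utm_content", fmt),         # original, thumbnail or thumbnail_unscaled
    ])
    # Keep any existing query string and add the provenance parameters after it.
    query = f"{parts.query}&{extra}" if parts.query else extra
    return urlunsplit(parts._replace(query=query))


def read_provenance(url):
    """Return the provenance fields of a URL, with None for missing ones."""
    params = parse_qs(urlsplit(url).query)
    return {
        "site": params.get("utm_source", [None])[0],
        "generator": params.get("utm_campaign", [None])[0],
        "format": params.get("utm_content", [None])[0],
    }
```

For example, `add_provenance("https://upload.wikimedia.org/wikipedia/commons/a/a9/Example.jpg", "www.mediawiki.org", "parser", "thumbnail")` yields a URL whose query string is `utm_source=www.mediawiki.org&utm_campaign=parser&utm_content=thumbnail`.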
Your thoughts on that would be appreciated. I also have two questions:
- Do we need to sign these parameters so that they can't be spoofed, or do we start by assuming everyone will play nice? If yes, what format would be convenient? Can we just stick the data in a JWT instead of having separate parameters?
- If we're considering switching to a JWT, would it be more convenient to start with a single JSON parameter instead of separate parameters? (I mean something like https://upload.wikimedia.org/wikipedia/commons/a/a9/Example.jpg?utm_source={"site":"mediawiki.localhost","generator":"parser","format":"thumbnail"})
Sorry it's been a few weeks of intense work on other stuff. The proposed format is good as far as I'm concerned, as a first step.
I think adding a signature is useful. It would be enough to have a simple signature like a simple SHA1 of the other parameters as follows: which we can add in (again abusing the term). I would go with a simple sha1 instead of using hmac because the risk of compromise is pretty low.
Change #1253625 had a related patch set uploaded (by Bartosz Dziewoński; author: Bartosz Dziewoński):
[operations/mediawiki-config@master] Enable $wgTrackMediaRequestProvenance on testwikis and beta cluster
Change #1253625 merged by jenkins-bot:
[operations/mediawiki-config@master] Enable $wgTrackMediaRequestProvenance on testwikis and beta cluster
Mentioned in SAL (#wikimedia-operations) [2026-03-16T20:57:26Z] <catrope@deploy2002> Started scap sync-world: Backport for [[gerrit:1253623|Fix client credentials access tokens (T417278 T419921)]], [[gerrit:1253625|Enable $wgTrackMediaRequestProvenance on testwikis and beta cluster (T414338)]], [[gerrit:1253626|Configure $wgApiClientErrorSampleRate (T418957)]]
Mentioned in SAL (#wikimedia-operations) [2026-03-16T20:59:17Z] <catrope@deploy2002> matmarex, catrope: Backport for [[gerrit:1253623|Fix client credentials access tokens (T417278 T419921)]], [[gerrit:1253625|Enable $wgTrackMediaRequestProvenance on testwikis and beta cluster (T414338)]], [[gerrit:1253626|Configure $wgApiClientErrorSampleRate (T418957)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
Mentioned in SAL (#wikimedia-operations) [2026-03-16T21:05:37Z] <catrope@deploy2002> Finished scap sync-world: Backport for [[gerrit:1253623|Fix client credentials access tokens (T417278 T419921)]], [[gerrit:1253625|Enable $wgTrackMediaRequestProvenance on testwikis and beta cluster (T414338)]], [[gerrit:1253626|Configure $wgApiClientErrorSampleRate (T418957)]] (duration: 08m 06s)
Change #1260029 had a related patch set uploaded (by Krinkle; author: Krinkle):
[operations/mediawiki-config@master] Enable $wgTrackMediaRequestProvenance on group0 wikis
Change #1260029 merged by jenkins-bot:
[operations/mediawiki-config@master] Enable $wgTrackMediaRequestProvenance on group0 wikis
Mentioned in SAL (#wikimedia-operations) [2026-03-31T23:10:45Z] <krinkle@deploy1003> Started scap sync-world: Backport for [[gerrit:1260029|Enable $wgTrackMediaRequestProvenance on group0 wikis (T414338)]]
Mentioned in SAL (#wikimedia-operations) [2026-03-31T23:12:45Z] <krinkle@deploy1003> krinkle: Backport for [[gerrit:1260029|Enable $wgTrackMediaRequestProvenance on group0 wikis (T414338)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
Mentioned in SAL (#wikimedia-operations) [2026-03-31T23:51:06Z] <krinkle@deploy1003> Finished scap sync-world: Backport for [[gerrit:1260029|Enable $wgTrackMediaRequestProvenance on group0 wikis (T414338)]] (duration: 40m 21s)
Change #1267437 had a related patch set uploaded (by Krinkle; author: Krinkle):
[operations/mediawiki-config@master] Enable wgTrackMediaRequestProvenance on most group1 wikis
Change #1267437 merged by jenkins-bot:
[operations/mediawiki-config@master] Enable wgTrackMediaRequestProvenance on most group1 wikis
Mentioned in SAL (#wikimedia-operations) [2026-04-08T07:36:29Z] <krinkle@deploy1003> Started scap sync-world: Backport for [[gerrit:1267437|Enable wgTrackMediaRequestProvenance on most group1 wikis (T414338)]]
Mentioned in SAL (#wikimedia-operations) [2026-04-08T07:38:18Z] <krinkle@deploy1003> krinkle: Backport for [[gerrit:1267437|Enable wgTrackMediaRequestProvenance on most group1 wikis (T414338)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
Mentioned in SAL (#wikimedia-operations) [2026-04-08T07:46:04Z] <krinkle@deploy1003> Finished scap sync-world: Backport for [[gerrit:1267437|Enable wgTrackMediaRequestProvenance on most group1 wikis (T414338)]] (duration: 09m 34s)
Change #1269440 had a related patch set uploaded (by Krinkle; author: Krinkle):
[operations/mediawiki-config@master] Enable wgTrackMediaRequestProvenance on wikidata.org
Change #1269441 had a related patch set uploaded (by Krinkle; author: Krinkle):
[operations/mediawiki-config@master] Enable wgTrackMediaRequestProvenance on Commons
Change #1269442 had a related patch set uploaded (by Krinkle; author: Krinkle):
[operations/mediawiki-config@master] Enable wgTrackMediaRequestProvenance on remaining Wikipedias
Progress update (2-6 Mar, 9-13 Mar; copied here from Asana for transparency):
- Investigate and fix broken thumbnails on officewiki (Timo investigated and found missing thumbnail steps on private wikis, Amir enabled this).
- Test and merge trial implementation of media provenance URLs in MediaWiki core behind a feature flag (developed by Bartosz and Timo). T414338
- Enable media provenance feature in Beta Cluster and on testwikis in production. T414338
Progress update (9 Apr 2026):
- Enable media provenance on 573 additional wikis (including all Wiktionary and Wikivoyage wikis, and 18 Wikipedias). We are now live on 720/1068 wikis. T414338
- Found regression in MediaViewer causing double downloads. T422586
- Prepare Stockphoto gadget on Commons ahead of rollout to prevent regression. T419135
Next steps:
- Deploy media provenance feature to Wikidata, Commons, and 346 remaining Wikipedias.
Change #1276086 had a related patch set uploaded (by Krinkle; author: Krinkle):
[mediawiki/extensions/MultimediaViewer@master] mmv.bootstrap: Avoid double download when thumb is unscaled original
Change #1276086 merged by jenkins-bot:
[mediawiki/extensions/MultimediaViewer@master] mmv.bootstrap: Avoid double download when thumb is unscaled original
Change #1269440 merged by jenkins-bot:
[operations/mediawiki-config@master] Enable wgTrackMediaRequestProvenance on wikidata.org
Change #1269441 merged by jenkins-bot:
[operations/mediawiki-config@master] Enable wgTrackMediaRequestProvenance on Commons
Mentioned in SAL (#wikimedia-operations) [2026-05-01T19:51:14Z] <krinkle@deploy1003> Started scap sync-world: Backport for [[gerrit:1269440|Enable wgTrackMediaRequestProvenance on wikidata.org (T414338)]], [[gerrit:1269441|Enable wgTrackMediaRequestProvenance on Commons (T414338)]]
Mentioned in SAL (#wikimedia-operations) [2026-05-01T19:52:57Z] <krinkle@deploy1003> krinkle: Backport for [[gerrit:1269440|Enable wgTrackMediaRequestProvenance on wikidata.org (T414338)]], [[gerrit:1269441|Enable wgTrackMediaRequestProvenance on Commons (T414338)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
Mentioned in SAL (#wikimedia-operations) [2026-05-01T20:06:42Z] <krinkle@deploy1003> Finished scap sync-world: Backport for [[gerrit:1269440|Enable wgTrackMediaRequestProvenance on wikidata.org (T414338)]], [[gerrit:1269441|Enable wgTrackMediaRequestProvenance on Commons (T414338)]] (duration: 15m 27s)
I think these changes may be the cause behind https://commons.wikimedia.org/wiki/MediaWiki_talk:Gadget-GoogleImagesTineye.js#c-Masur-20251229182900-Reverse_Image_Search_-_Google_and_TinEye_failing_to_retrieve_source_images_from
the gadget's logic is simple. it takes the url of the image and gives it to the search engines in the form of https://lens.google.com/uploadbyurl?url=https%3A%2F%2Fupload.wikimedia.org%2Fwikipedia%2Fcommons%2Fthumb%2Ff%2Ffa%2FStatue_of_Taras_Shevchenko_in_Shevchenkove%252C_Shevchenkove_Raion_2019_by_Venzz_04.jpg%3Futm_source%3Dcommons.wikimedia.org%26utm_campaign%3Dindex%26utm_content%3Doriginal
but as i tested manually, search engines cannot get the image, no matter the link comes with or without the new trackers
utm_source=commons.wikimedia.org&utm_campaign=index&utm_content=original
please explain how to get the gadget working again, i.e. how to get a link of a file that can be read by other websites.
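The URL construction described in this comment can be sketched in Python as follows. This is an illustration only, not the gadget's actual JavaScript and not a fix for the blocking: the function names are hypothetical, and the only details taken from the thread are the `https://lens.google.com/uploadbyurl?url=` endpoint and the three `utm_*` parameter names. The second helper shows how the provenance parameters could be removed before handing a URL to a third party, although the comment above reports that the fetch fails either way.

```python
from urllib.parse import parse_qsl, quote, urlencode, urlsplit, urlunsplit

PROVENANCE_PARAMS = {"utm_source", "utm_campaign", "utm_content"}


def lens_search_url(image_url):
    """Percent-encode the whole image URL and pass it as the url parameter."""
    return "https://lens.google.com/uploadbyurl?url=" + quote(image_url, safe="")


def strip_provenance(image_url):
    """Drop the provenance parameters, keeping any other query parameters."""
    parts = urlsplit(image_url)
    kept = [(name, value)
            for name, value in parse_qsl(parts.query, keep_blank_values=True)
            if name not in PROVENANCE_PARAMS]
    return urlunsplit(parts._replace(query=urlencode(kept)))
```

Because the image URL is encoded a second time when embedded, characters that were already percent-encoded in the file name are escaped again, which is why a `%2C` in the file name appears as `%252C` in the example link above.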
@RoyZuo A more robust way would be to make the gadget download the image (or a thumbnail), then upload it to the search engine, instead of asking the search engine to fetch it from us, which may be blocked if they don't respect our user-agent policy.
In the meantime, it looks like using a thumbnail URL instead of the original file URL works, at least for now.
I've tested everything I wanted to test on Commons and Wikidata.
- https://commons.wikimedia.org/wiki/Main_Page
- https://commons.wikimedia.org/wiki/File:Bloemknoppen_van_een_Crocosmia._25-06-2024._(d.j.b)_02.jpg
- https://www.wikidata.org/wiki/Q2
I expected Wikidata to perhaps not get the provenance params or not work with MMV, but it all looks good. I did find a bug, T426217: MediaViewer downloads high-res image twice if thumb URL is re-used, but that's pre-existing and not caused or made more common by provenance params, and so does not need to block roll-out.
In T414338#11804130, @gerritbot wrote: Change #1269442 had a related patch set uploaded (by Krinkle; author: Krinkle):
[operations/mediawiki-config@master] Enable wgTrackMediaRequestProvenance on remaining Wikipedias
I've scheduled this for tomorrow afternoon, 13:00 UTC.
Change #1269442 merged by jenkins-bot:
[operations/mediawiki-config@master] Enable wgTrackMediaRequestProvenance on remaining Wikipedias
Mentioned in SAL (#wikimedia-operations) [2026-05-14T13:42:53Z] <krinkle@deploy1003> Started scap sync-world: Backport for [[gerrit:1269442|Enable wgTrackMediaRequestProvenance on remaining Wikipedias (T414338)]]
Mentioned in SAL (#wikimedia-operations) [2026-05-14T13:44:41Z] <krinkle@deploy1003> krinkle: Backport for [[gerrit:1269442|Enable wgTrackMediaRequestProvenance on remaining Wikipedias (T414338)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
Mentioned in SAL (#wikimedia-operations) [2026-05-14T13:49:57Z] <krinkle@deploy1003> Finished scap sync-world: Backport for [[gerrit:1269442|Enable wgTrackMediaRequestProvenance on remaining Wikipedias (T414338)]] (duration: 07m 03s)
Progress update (15 May 2026):
- SRE now includes media provenance as a signal in calculating the X-Is-Browser score on the edge.
- Fixed regression in MediaViewer causing high-res double downloads (blocking rollout). T422586
- Enable media provenance on Wikidata and Wikimedia Commons (720 -> 722/1068 wikis).
- Did broad manual testing across Wikimedia Commons and Wikidata post-rollout.
- Found bug in MediaViewer causing lack of provenance params in some cases (pre-existing, not blocking rollout). T424082
- Found bug in MediaViewer causing low-res double downloads (pre-existing, not blocking rollout). T426217
- Enable media provenance on remaining 346 Wikipedias, including English Wikipedia. Now live on all 1068 wikis.
The update to enable this feature on all wikis has to be reverted due to T425580: [Spike] [BUG] POTD Gallery doesn't load, crashes upon share
As explained by @RoyZuo above, we have at Wikimedia Commons a serious problem if the gadget that supports image reverse search on Google Lens, TinEye and Yandex doesn't work. Right now, TinEye and Yandex work but Google Lens fails as it is unable to access the images. I am not responsible for the gadget but as one of the admins at Commons I can tell you that this gadget is absolutely essential to fight against copyright violations. We delete about 2000 copyvios every day and we cannot do this efficiently if Google Lens cannot be conveniently queried. Hence, some solution is required such that this gadget can pass URLs that are subsequently not blocked when the respective services download them.
Change #1288925 had a related patch set uploaded (by Krinkle; author: Seddon):
[operations/mediawiki-config@master] Revert "Enable wgTrackMediaRequestProvenance on Commons"
Change #1288925 merged by jenkins-bot:
[operations/mediawiki-config@master] Revert "Enable wgTrackMediaRequestProvenance on Commons"
Mentioned in SAL (#wikimedia-operations) [2026-05-18T21:31:09Z] <krinkle@deploy1003> Started scap sync-world: Backport for [[gerrit:1288925|Revert "Enable wgTrackMediaRequestProvenance on Commons" (T414338 T425580)]]
Mentioned in SAL (#wikimedia-operations) [2026-05-18T21:32:56Z] <krinkle@deploy1003> seddon, krinkle: Backport for [[gerrit:1288925|Revert "Enable wgTrackMediaRequestProvenance on Commons" (T414338 T425580)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
Mentioned in SAL (#wikimedia-operations) [2026-05-18T21:42:39Z] <krinkle@deploy1003> Finished scap sync-world: Backport for [[gerrit:1288925|Revert "Enable wgTrackMediaRequestProvenance on Commons" (T414338 T425580)]] (duration: 11m 29s)
In T414338#11885830, @RoyZuo wrote: I think these changes may be the cause behind https://commons.wikimedia.org/wiki/MediaWiki_talk:Gadget-GoogleImagesTineye.js#c-Masur-20251229182900-Reverse_Image_Search_-_Google_and_TinEye_failing_to_retrieve_source_images_from
but as i tested manually, search engines cannot get the image, no matter the link comes with or without the new trackers
@RoyZuo and @AFBorchert this and the note on VP, implying this has been going on for several weeks, suggest that this is very likely NOT caused by this ticket. More likely, the anti-scraping measures that the Foundation implemented earlier have caught these systems as well.
Matmarex has already given advice on how to change the gadget in a way that might make it work more reliably. This can be done right now. Or you can open a separate ticket to investigate why these websites are blocked from accessing us, but it might be that it is not actually possible to distinguish these systems from illegitimate scrapers. It's hard to say.
@AFBorchert use browsers like opera, which have "search image with google lens" when you right click on it. probably some extensions for other browsers also do this.
basically the same method described by matmarex: searching the copied image.
but i'm not gonna put my time into making that a gadget on commons.
who broke it should fix it. or who wants to.
@TheDJ I am not familiar with the architecture and the algorithms of the protection system against unwanted scraping. To me it appears quite likely that the amount of traffic from a particular site can play a role, causing the tool to work or to fail for some sites. But it appears to me very likely that the gadget failures are linked to the protection system. Regarding the gadget: I am not the author of the gadget or in any way involved in its development. However, downloading the image and uploading it to the various reverse search services, as suggested by @matmarex, does not appear to be the straightforward solution. I think it would be better to be able to generate image URLs within the gadget that are subsequently accepted by the protection system. My point is that Wikimedia Commons and its defense against copyright violations is a critical part of the infrastructure. This perspective should be IMHO taken into account when designing and updating the protection system.
Change #1295921 had a related patch set uploaded (by Slyngshede; author: Slyngshede):
[operations/puppet@production] P:cache:haproxy add image generator information
Change #1295921 merged by BCornwall:
[operations/puppet@production] P:cache:haproxy add image generator information
@SLyngshede-WMF I'm wondering: Instead of a new header () would it make sense to use the existing header? For example,
