| Kallichore |
| Tue, Jun 16, 4:57 PM |
| F88950660: Screenshot_T429376.jpg |
| Tue, Jun 16, 7:28 PM |
Description
Two files on Wikimedia Commons cannot be embedded in articles when using Parsoid. Purging does not fix the problem permanently.
Demo of the problem: https://de.wikipedia.org/w/index.php?title=Benutzer:Kallichore/Test4&action=parsermigration-edit
What happens: The legacy parser embeds the images, while Parsoid just shows red text with the file names inside a box. However, when the problem started 3 days ago, the legacy parser was affected as well.
Affected files:
https://commons.wikimedia.org/wiki/File:M%C3%BChlgrabenquelle.jpg
https://commons.wikimedia.org/wiki/File:M%C3%BChlgrabenm%C3%BCndung.jpg
Parsoid generates this in the article HTML:
<figure class="mw-default-size" typeof="mw:Error mw:File/Thumb" id="mwAg" data-mw='{"errors":[{"key":"apierror-filedoesnotexist","message":"This image does not exist."}]}'><a href="//de.wikipedia.org/wiki/Special:FilePath/Mühlgrabenquelle.jpg" class="new" title="Datei:Mühlgrabenquelle.jpg" id="mwAw"><span class="mw-file-element mw-broken-media" resource="./Datei:Mühlgrabenquelle.jpg" data-width="250" id="mwBA">Datei:Mühlgrabenquelle.jpg</span></a><figcaption id="mwBQ"></figcaption></figure>
Reports about this problem:
de.wiki: https://de.wikipedia.org/wiki/Wikipedia:Fragen_zur_Wikipedia#M%C3%BChlgraben_(Aubach)
commons: https://commons.wikimedia.org/wiki/Commons:Village_pump/Technical#Files_exist_on_commons,_but_cannot_be_embedded_in_articles
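To check other pages for the same symptom, the Parsoid error markup quoted above can be detected mechanically. The following is only a minimal sketch, assuming the broken-media marker always looks like the sample in this report (a `typeof` containing `mw:Error` plus a JSON `data-mw` attribute with an `errors` list). The class and function names are made up for illustration, and the sample HTML is a shortened copy of the figure above.

```python
import json
from html.parser import HTMLParser


class BrokenMediaFinder(HTMLParser):
    """Collects elements whose typeof contains mw:Error (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.broken = []  # list of (element id, [error keys])

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        types = (attr.get("typeof") or "").split()
        if "mw:Error" not in types:
            return
        # data-mw carries the error details as JSON in the sample markup.
        data_mw = json.loads(attr.get("data-mw") or "{}")
        keys = [err.get("key") for err in data_mw.get("errors", [])]
        self.broken.append((attr.get("id"), keys))


def find_broken_media(html):
    finder = BrokenMediaFinder()
    finder.feed(html)
    return finder.broken


# Shortened copy of the figure from this report.
SAMPLE = """<figure class="mw-default-size" typeof="mw:Error mw:File/Thumb" id="mwAg" data-mw='{"errors":[{"key":"apierror-filedoesnotexist","message":"This image does not exist."}]}'><span class="mw-file-element mw-broken-media">Datei:Mühlgrabenquelle.jpg</span></figure>"""

if __name__ == "__main__":
    print(find_broken_media(SAMPLE))
```

The error key `apierror-filedoesnotexist` is what distinguishes this case from other media errors, so filtering on it should narrow the results to files that could not be resolved at all.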
Related Objects
- Mentioned Here
  - rMWf8733dc456f6: Fix MediaHandler caching to not preserve language
  - rMWf4335ac245d3: filebackend: rename cheapCache/expensiveCache fields in FileBackendStore
  - rMWec747693fb07: Introduce ShadowPage concept
  - rMW55efcf2c9eb7: imageinfo: Include metadata for file revisions with missing blobs
  - T28741: Migrate file tables to a modern layout (image/oldimage; file/filerevision; add primary keys)
  - T416548: Start reading from file table on wmf production
Event Timeline
Here is a screenshot of the migration tool showing the problem:
I don't think this is Parsoid related. Just now, I purged both images on commons via "?action=purge", and now https://de.wikipedia.org/w/index.php?title=Benutzer:Kallichore/Test4&action=parsermigration-edit shows identical behavior for both images. I don't offhand know what the problem is, but I'll try to find someone who may know what is happening.
Hmm, locally, from , I see
{
"batchcomplete": "",
"query": {
"pages": {
"-1": {
"ns": 6,
"title": "File:M\u00fchlgrabenquelle.jpg",
"missing": "",
"known": "",
"imagerepository": "wikimediacommons",
"imageinfo": [
{
"url": "https://upload.wikimedia.org/wikipedia/commons/3/34/M%C3%BChlgrabenquelle.jpg",
"descriptionurl": "https://commons.wikimedia.org/wiki/File:M%C3%BChlgrabenquelle.jpg",
"descriptionshorturl": "https://commons.wikimedia.org/w/index.php?curid=19091096"
}
]
}
}
}
}
but at https://de.wikipedia.org/w/api.php?action=query&titles=Datei:M%C3%BChlgrabenquelle.jpg&prop=imageinfo&iiprop=url the response is
{
"batchcomplete": "",
"query": {
"pages": {
"-1": {
"ns": 6,
"title": "Datei:M\u00fchlgrabenquelle.jpg",
"missing": "",
"imagerepository": ""
}
}
}
}
https://commons.wikimedia.org/wiki/File:M%C3%BChlgrabenquelle.jpg now says at the top:
Working fine now. The file was uploaded fifteen years ago, and there is no indication it has been moved or deleted recently (or at all).
No upload log either, which is weird, but maybe we did not have that yet in 2012?
The imageinfo API on dewiki still doesn't see it though, even though https://de.wikipedia.org/wiki/Datei:M%C3%BChlgrabenquelle.jpg works. The previous state is stuck in memcached, I imagine?
No idea how the situation arose in the first place.
It doesn't work for https://commons.wikimedia.org/wiki/File:M%C3%BChlgrabenm%C3%BCndung.jpg though, and Parsoid still doesn't see the image (because of the imageinfo API issue; effectively, that is what Parsoid uses).
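The difference between the two imageinfo responses quoted earlier can be summarised programmatically. This is a rough sketch, assuming the default (formatversion=1) response shape shown in this thread, where `missing` marks the absence of a local page and an empty `imagerepository` means no repository resolved the file. The function name `file_status` and the trimmed sample dicts are illustrative only.

```python
def file_status(response):
    """Summarise each page of an action=query&prop=imageinfo response."""
    result = {}
    for page in response["query"]["pages"].values():
        repo = page.get("imagerepository", "")
        result[page["title"]] = {
            # "missing" only says there is no local description page.
            "local_page": "missing" not in page,
            "repository": repo,
            # Embeddable only if some repo answered with imageinfo.
            "usable": bool(repo) and bool(page.get("imageinfo")),
        }
    return result


# Trimmed copies of the two responses quoted in this thread.
EXPECTED = {"query": {"pages": {"-1": {
    "ns": 6, "title": "File:M\u00fchlgrabenquelle.jpg",
    "missing": "", "known": "", "imagerepository": "wikimediacommons",
    "imageinfo": [{"url": "https://upload.wikimedia.org/wikipedia/commons/3/34/M%C3%BChlgrabenquelle.jpg"}],
}}}}
BROKEN = {"query": {"pages": {"-1": {
    "ns": 6, "title": "Datei:M\u00fchlgrabenquelle.jpg",
    "missing": "", "imagerepository": "",
}}}}

if __name__ == "__main__":
    print(file_status(EXPECTED))
    print(file_status(BROKEN))
```

Both responses report `missing`, which is normal for a Commons file without a local description page; only the empty `imagerepository` separates the broken case from the working one.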
Ideas (from searching for recently merged core patches with "image"/"file"/"media"/"cache"):
- something wrong with T28741: Migrate file tables to a modern layout (image/oldimage; file/filerevision; add primary keys) / T416548: Start reading from file table on wmf production - SCHEMA_COMPAT_READ_NEW was enabled on dewiki a month ago, enabled on Commons for a short test period 12 days ago
  - probably not recent enough, we have errors on Commons itself where the feature flag was reverted 12 days ago, all caches would clear after 7 days I think? but maybe the wrong flag is used when Commons DB is accessed from another wiki
- something wrong with ShadowPage (rMWec747693fb07: Introduce ShadowPage concept)
  - cross-wiki file pages are a shadow-page-ish concept but at a glance it doesn't seem like ShadowPage is actually used for them? Also shouldn't affect imageinfo.
- some recent change about splitting media handler cache by language (rMWf8733dc456f6: Fix MediaHandler caching to not preserve language)
  - don't see how it would affect file existence checks
- there was a small change in the filebackend caching (rMWf4335ac245d3: filebackend: rename cheapCache/expensiveCache fields in FileBackendStore)
  - trivial refactoring, can't see how anything could have gone wrong with it
- there was a change to how the imageinfo API handles missing files (rMW55efcf2c9eb7: imageinfo: Include metadata for file revisions with missing blobs)
  - doesn't affect file pages themselves, but I guess it's not completely impossible that we have two unrelated bugs at the same time
I only searched in commit summaries, and only skimmed the changes which looked relevant, so might have easily missed other changes. Still, my money is on the file table migration.
filerevision entries are missing:
mysql:research@dbstore1007.eqiad.wmnet [commonswiki]> select img_name, img_timestamp from image where img_name in ('Mühlgrabenquelle.jpg', 'Mühlgrabenmündung.jpg', 'Weapons_Instructor_Course.jpg');
+-------------------------------+----------------+
| img_name | img_timestamp |
+-------------------------------+----------------+
| Mühlgrabenmündung.jpg | 20120415085340 |
| Mühlgrabenquelle.jpg | 20120415085541 |
| Weapons_Instructor_Course.jpg | 20170617192717 |
+-------------------------------+----------------+
3 rows in set (0.002 sec)
mysql:research@dbstore1007.eqiad.wmnet [commonswiki]> select file_id, file_name, file_deleted, file_latest from file where file_name in ('Mühlgrabenquelle.jpg', 'Mühlgrabenmündung.jpg', 'Weapons_Instructor_Course.jpg');
+-----------+-------------------------------+--------------+-------------+
| file_id | file_name | file_deleted | file_latest |
+-----------+-------------------------------+--------------+-------------+
| 91422617 | Mühlgrabenmündung.jpg | 0 | 0 |
| 92086515 | Mühlgrabenquelle.jpg | 0 | 0 |
| 111052288 | Weapons_Instructor_Course.jpg | 0 | 119353871 |
+-----------+-------------------------------+--------------+-------------+
3 rows in set (0.002 sec)
mysql:research@dbstore1007.eqiad.wmnet [commonswiki]> select fr_id, fr_timestamp, fr_deleted from filerevision where fr_file in (91422617, 92086515, 111052288);
+-----------+----------------+------------+
| fr_id | fr_timestamp | fr_deleted |
+-----------+----------------+------------+
| 119353871 | 20170617192717 | 0 |
+-----------+----------------+------------+
1 row in set (0.003 sec)
(Weapons_Instructor_Course.jpg is a random image picked from the Commons main page as control group)
mysql:research@dbstore1007.eqiad.wmnet [commonswiki]> select count(*) from file where file_latest = 0;
+----------+
| count(*) |
+----------+
| 740350 |
+----------+
1 row in set (0.544 sec)
mysql:research@dbstore1007.eqiad.wmnet [commonswiki]> select count(*) from file where file_latest = 0 and file_deleted = 0;
+----------+
| count(*) |
+----------+
| 580 |
+----------+
1 row in set (53.210 sec)
Some files also have a revision but no :
mysql:research@dbstore1007.eqiad.wmnet [commonswiki]> select count(*) from filerevision left join file on fr_file = file_id where file_id is null;
+----------+
| count(*) |
+----------+
| 26 |
+----------+
1 row in set (7 min 0.544 sec)
So my hypothesis is: the migration script failed for a tiny number of Commons files and did not import any filerevision row (or in some cases imported the row but did not update ); this is mostly invisible because Commons is still using the old tables, but either something got cached 12 days ago when READ_NEW was enabled experimentally for a few minutes, and stuck there; or READ_NEW is used incorrectly when other wikis access the Commons DB via ForeignDBRepo.
Similar errors might or might not exist in other wiki databases (which already read the new tables so the file would just be deterministically missing there). I spot-checked enwiki and dewiki; dewiki had zero affected files, enwiki had no files with missing revisions but 4 files with missing field.
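For spot-checking other wikis, the two consistency checks used above can be reproduced on a toy database. This is only a sketch: it uses an in-memory SQLite database with a reduced schema containing just the columns queried in this thread, not the real MediaWiki table definitions. The three file rows and the one revision row are copied from the output above (with `fr_file` for the control image inferred from that output); the orphaned revision row (`fr_id` 999) is hypothetical and exists only to exercise the second check.

```python
import sqlite3


def build_demo_db():
    """Reduced stand-in for the file/filerevision tables (not the real schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE file (file_id INTEGER PRIMARY KEY, file_name TEXT,
                           file_deleted INTEGER, file_latest INTEGER);
        CREATE TABLE filerevision (fr_id INTEGER PRIMARY KEY, fr_file INTEGER,
                                   fr_timestamp TEXT, fr_deleted INTEGER);
    """)
    conn.executemany("INSERT INTO file VALUES (?, ?, ?, ?)", [
        (91422617, "Mühlgrabenmündung.jpg", 0, 0),
        (92086515, "Mühlgrabenquelle.jpg", 0, 0),
        (111052288, "Weapons_Instructor_Course.jpg", 0, 119353871),
    ])
    conn.executemany("INSERT INTO filerevision VALUES (?, ?, ?, ?)", [
        (119353871, 111052288, "20170617192717", 0),
        (999, 12345, "20200101000000", 0),  # hypothetical orphan revision
    ])
    return conn


def find_inconsistencies(conn):
    # Non-deleted files that point at no latest revision.
    no_latest = [row[0] for row in conn.execute(
        "SELECT file_name FROM file "
        "WHERE file_latest = 0 AND file_deleted = 0 ORDER BY file_name")]
    # Revisions whose file row does not exist.
    orphans = [row[0] for row in conn.execute(
        "SELECT fr_id FROM filerevision "
        "LEFT JOIN file ON fr_file = file_id "
        "WHERE file_id IS NULL ORDER BY fr_id")]
    return no_latest, orphans


if __name__ == "__main__":
    print(find_inconsistencies(build_demo_db()))
```

Restricting the first check to `file_deleted = 0` matters: in the counts above, most rows with `file_latest = 0` belong to deleted files, and only 580 are live files that should have a revision.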
