URL: https://phabricator.wikimedia.org/T429376

Maniphest T429376

Files exist but cannot be embedded with Parsoid
Open, Needs Triage · Public · BUG REPORT

Description

Two files on Wikimedia Commons cannot be embedded in articles when using Parsoid. Purging does not fix the problem permanently.

Demo of the problem: https://de.wikipedia.org/w/index.php?title=Benutzer:Kallichore/Test4&action=parsermigration-edit

What happens: The legacy parser embeds the images, while Parsoid just shows red text with the file names inside a box. However, when the problem started 3 days ago, the legacy parser was affected as well.

Affected files:
https://commons.wikimedia.org/wiki/File:M%C3%BChlgrabenquelle.jpg
https://commons.wikimedia.org/wiki/File:M%C3%BChlgrabenm%C3%BCndung.jpg

Parsoid generates this in the article HTML:

<figure class="mw-default-size" typeof="mw:Error mw:File/Thumb" id="mwAg" data-mw='{"errors":[{"key":"apierror-filedoesnotexist","message":"This image does not exist."}]}'><a href="//de.wikipedia.org/wiki/Special:FilePath/Mühlgrabenquelle.jpg" class="new" title="Datei:Mühlgrabenquelle.jpg" id="mwAw"><span class="mw-file-element mw-broken-media" resource="./Datei:Mühlgrabenquelle.jpg" data-width="250" id="mwBA">Datei:Mühlgrabenquelle.jpg</span></a><figcaption id="mwBQ"></figcaption></figure>

Reports about this problem:
de.wiki: https://de.wikipedia.org/wiki/Wikipedia:Fragen_zur_Wikipedia#M%C3%BChlgraben_(Aubach)
commons: https://commons.wikimedia.org/wiki/Commons:Village_pump/Technical#Files_exist_on_commons,_but_cannot_be_embedded_in_articles

Event Timeline

ssastry subscribed.
Comment Actions

I don't think this is Parsoid related. Just now, I purged both images on commons via "?action=purge", and now https://de.wikipedia.org/w/index.php?title=Benutzer:Kallichore/Test4&action=parsermigration-edit shows identical behavior for both images. I don't offhand know what the problem is, but I'll try to find someone who may know what is happening.

Comment Actions

Hmm, locally, from , I see

{
    "batchcomplete": "",
    "query": {
        "pages": {
            "-1": {
                "ns": 6,
                "title": "File:M\u00fchlgrabenquelle.jpg",
                "missing": "",
                "known": "",
                "imagerepository": "wikimediacommons",
                "imageinfo": [
                    {
                        "url": "https://upload.wikimedia.org/wikipedia/commons/3/34/M%C3%BChlgrabenquelle.jpg",
                        "descriptionurl": "https://commons.wikimedia.org/wiki/File:M%C3%BChlgrabenquelle.jpg",
                        "descriptionshorturl": "https://commons.wikimedia.org/w/index.php?curid=19091096"
                    }
                ]
            }
        }
    }
}

but at https://de.wikipedia.org/w/api.php?action=query&titles=Datei:M%C3%BChlgrabenquelle.jpg&prop=imageinfo&iiprop=url the response is

{
    "batchcomplete": "",
    "query": {
        "pages": {
            "-1": {
                "ns": 6,
                "title": "Datei:M\u00fchlgrabenquelle.jpg",
                "missing": "",
                "imagerepository": ""
            }
        }
    }
}
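
The difference between the two responses can be checked mechanically. The following is only a minimal sketch, not how Parsoid or MediaWiki evaluates the response: it assumes that a page counts as resolved when it carries a non-empty "imageinfo" list together with a non-empty "imagerepository". The function names `resolved_repository` and `classify_response` are hypothetical, and the two sample payloads are trimmed copies of the responses quoted above.

```python
import json
from typing import Optional


def resolved_repository(page: dict) -> Optional[str]:
    """Return the repository that served the file, or None if none did.

    Assumption: a page is treated as resolved only if it has a non-empty
    "imageinfo" list and a non-empty "imagerepository" value.
    """
    if page.get("imageinfo") and page.get("imagerepository"):
        return page["imagerepository"]
    return None


def classify_response(raw: str) -> dict:
    """Map each page title in a prop=imageinfo response to its repository."""
    pages = json.loads(raw)["query"]["pages"]
    return {page["title"]: resolved_repository(page) for page in pages.values()}


# Trimmed copy of the first response quoted above (file found on Commons).
WORKING = r"""
{"batchcomplete": "", "query": {"pages": {"-1": {
    "ns": 6, "title": "File:M\u00fchlgrabenquelle.jpg",
    "missing": "", "known": "", "imagerepository": "wikimediacommons",
    "imageinfo": [{"url": "https://upload.wikimedia.org/wikipedia/commons/3/34/M%C3%BChlgrabenquelle.jpg"}]
}}}}
"""

# Copy of the dewiki response quoted above (no repository claims the file).
BROKEN = r"""
{"batchcomplete": "", "query": {"pages": {"-1": {
    "ns": 6, "title": "Datei:M\u00fchlgrabenquelle.jpg",
    "missing": "", "imagerepository": ""
}}}}
"""

if __name__ == "__main__":
    print(classify_response(WORKING))
    print(classify_response(BROKEN))
```

Note that both pages carry "missing" (there is no local description page in either case), so that key alone does not distinguish the two responses; the empty "imagerepository" and the absent "imageinfo" do.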
Comment Actions

Working fine now. The file was uploaded fifteen years ago, and there is no indication it has been moved or deleted recently (or at all).
No upload log either, which is weird, but maybe we did not have that yet in 2012?

The imageinfo API on dewiki still doesn't see it though, even though https://de.wikipedia.org/wiki/Datei:M%C3%BChlgrabenquelle.jpg works. The previous state is stuck in memcached, I imagine?

No idea how the situation arose in the first place.

Comment Actions

It doesn't work for https://commons.wikimedia.org/wiki/File:M%C3%BChlgrabenm%C3%BCndung.jpg though, and Parsoid still doesn't see the image (because of the imageinfo API issue: that API is effectively what Parsoid uses).
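
To re-check both affected files against the dewiki imageinfo API, the query URLs can be built as in the sketch below. It only reproduces the URL shape already quoted earlier in this task (action=query, prop=imageinfo, iiprop=url); the helper name `imageinfo_url` is hypothetical, and this is not a description of how Parsoid itself issues the request.

```python
from urllib.parse import quote, urlencode


def imageinfo_url(host: str, title: str) -> str:
    """Build a prop=imageinfo query URL for one file title on the given wiki."""
    params = {
        "action": "query",
        "titles": title,
        "prop": "imageinfo",
        "iiprop": "url",
    }
    # quote_via=quote with safe=":" percent-encodes the umlauts as UTF-8
    # but leaves the namespace colon as-is, matching the URL quoted above.
    return f"https://{host}/w/api.php?" + urlencode(params, quote_via=quote, safe=":")


if __name__ == "__main__":
    for title in ("Datei:Mühlgrabenquelle.jpg", "Datei:Mühlgrabenmündung.jpg"):
        print(imageinfo_url("de.wikipedia.org", title))
```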

Comment Actions

Ideas (from searching for recently merged core patches with "image"/"file"/"media"/"cache"):

I only searched in commit summaries, and only skimmed the changes that looked relevant, so I might easily have missed other changes. Still, my money is on the file table migration.

Comment Actions

filerevision entries are missing:

mysql:research@dbstore1007.eqiad.wmnet [commonswiki]> select img_name, img_timestamp from image where img_name in ('Mühlgrabenquelle.jpg', 'Mühlgrabenmündung.jpg', 'Weapons_Instructor_Course.jpg');
+-------------------------------+----------------+
| img_name                      | img_timestamp  |
+-------------------------------+----------------+
| Mühlgrabenmündung.jpg         | 20120415085340 |
| Mühlgrabenquelle.jpg          | 20120415085541 |
| Weapons_Instructor_Course.jpg | 20170617192717 |
+-------------------------------+----------------+
3 rows in set (0.002 sec)

mysql:research@dbstore1007.eqiad.wmnet [commonswiki]> select file_id, file_name, file_deleted, file_latest from file where file_name in ('Mühlgrabenquelle.jpg', 'Mühlgrabenmündung.jpg', 'Weapons_Instructor_Course.jpg');
+-----------+-------------------------------+--------------+-------------+
| file_id   | file_name                     | file_deleted | file_latest |
+-----------+-------------------------------+--------------+-------------+
|  91422617 | Mühlgrabenmündung.jpg         |            0 |           0 |
|  92086515 | Mühlgrabenquelle.jpg          |            0 |           0 |
| 111052288 | Weapons_Instructor_Course.jpg |            0 |   119353871 |
+-----------+-------------------------------+--------------+-------------+
3 rows in set (0.002 sec)

mysql:research@dbstore1007.eqiad.wmnet [commonswiki]> select fr_id, fr_timestamp, fr_deleted from filerevision where fr_file in (91422617, 92086515, 111052288);
+-----------+----------------+------------+
| fr_id     | fr_timestamp   | fr_deleted |
+-----------+----------------+------------+
| 119353871 | 20170617192717 |          0 |
+-----------+----------------+------------+
1 row in set (0.003 sec)

(Weapons_Instructor_Course.jpg is a random image picked from the Commons main page as control group)
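
The three queries above can be folded into a single consistency check. The sketch below reproduces it on an in-memory SQLite database, so it runs without access to the replicas. Assumptions: the table definitions are reduced to the columns shown in the output above (the real MediaWiki schema has more), the fr_file value of the single filerevision row is inferred from file_latest rather than taken from the output, and the helper name `files_without_latest_revision` and the joined query are illustrative, not what was run on dbstore1007.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Simplified stand-ins for the real tables: only the columns that appear
# in the query output above.
conn.executescript("""
CREATE TABLE file (
    file_id      INTEGER PRIMARY KEY,
    file_name    TEXT,
    file_deleted INTEGER,
    file_latest  INTEGER
);
CREATE TABLE filerevision (
    fr_id        INTEGER PRIMARY KEY,
    fr_file      INTEGER,
    fr_timestamp TEXT,
    fr_deleted   INTEGER
);
""")

# Rows copied from the query output above.
conn.executemany(
    "INSERT INTO file VALUES (?, ?, ?, ?)",
    [
        (91422617, "Mühlgrabenmündung.jpg", 0, 0),
        (92086515, "Mühlgrabenquelle.jpg", 0, 0),
        (111052288, "Weapons_Instructor_Course.jpg", 0, 119353871),
    ],
)
# fr_file is inferred: file 111052288 is the one whose file_latest is 119353871.
conn.execute(
    "INSERT INTO filerevision VALUES (?, ?, ?, ?)",
    (119353871, 111052288, "20170617192717", 0),
)


def files_without_latest_revision(db: sqlite3.Connection) -> list:
    """List non-deleted files whose file_latest matches no filerevision row."""
    rows = db.execute("""
        SELECT file_name
        FROM file
        LEFT JOIN filerevision ON fr_id = file_latest
        WHERE file_deleted = 0 AND fr_id IS NULL
        ORDER BY file_name
    """).fetchall()
    return [name for (name,) in rows]


if __name__ == "__main__":
    print(files_without_latest_revision(conn))
```

With the three sample rows, the check flags the two Mühlgraben files and leaves the control image alone.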

Comment Actions
mysql:research@dbstore1007.eqiad.wmnet [commonswiki]> select count(*) from file where file_latest = 0;
+----------+
| count(*) |
+----------+
|   740350 |
+----------+
1 row in set (0.544 sec)

mysql:research@dbstore1007.eqiad.wmnet [commonswiki]> select count(*) from file where file_latest = 0 and file_deleted = 0;
+----------+
| count(*) |
+----------+
|      580 |
+----------+
1 row in set (53.210 sec)
Comment Actions

Some files also have a revision but no :

mysql:research@dbstore1007.eqiad.wmnet [commonswiki]> select count(*) from filerevision left join file on fr_file = file_id where file_id is null;
+----------+
| count(*) |
+----------+
|       26 |
+----------+
1 row in set (7 min 0.544 sec)
Comment Actions

So my hypothesis is: the migration script failed for a tiny number of Commons files and did not import any filerevision row (or in some cases imported the row but did not update ); this is mostly invisible because Commons is still using the old tables, but either something got cached 12 days ago when READ_NEW was enabled experimentally for a few minutes, and stuck there; or READ_NEW is used incorrectly when other wikis access the Commons DB via ForeignDBRepo.

Similar errors might or might not exist in other wiki databases (which already read the new tables so the file would just be deterministically missing there). I spot-checked enwiki and dewiki; dewiki had zero affected files, enwiki had no files with missing revisions but 4 files with missing field.
