VOOZH about

URL: https://phabricator.wikimedia.org/T48143

⇱ ⚓ T48143 Spam blacklist ineffective on encoded URLs inside file inclusion syntax's link parameter


Maniphest T48143

Spam blacklist ineffective on encoded URLs inside file inclusion syntax's link parameter
Closed, ResolvedPublic

Assigned To
Authored By
MZMcBride
Mar 15 2013, 3:50 AM
Referenced Files
F7104295: T48143-REL1_27.patch
Mar 30 2017, 11:57 PM
F7104297: T48143-REL1_28.patch
Mar 30 2017, 11:57 PM
F7104296: T48143-REL1_23.patch
Mar 30 2017, 11:57 PM
F7104294: T48143-master.patch
Mar 30 2017, 11:57 PM
F7103902: T48143-REL1_28.patch
Mar 30 2017, 11:35 PM
F7103900: T48143-master.patch
Mar 30 2017, 11:35 PM
F7103901: T48143-REL1_27.patch
Mar 30 2017, 11:35 PM
F7102995: T48143-master.patch
Mar 30 2017, 11:00 PM

Description

Spam blacklist entries:


\bwikipediaforum\.com\b

\bwikipediocracy\.com\b

File inclusion syntax that makes a working disallowed link:


[[File:Symbol wtf vote.svg|20px|link=http://wikipediocracy%2Ecom]]


Version: unspecified
Severity: normal

Event Timeline

bzimport raised the priority of this task from to Medium.Nov 22 2014, 1:18 AM
bzimport added a project: Security-General.
bzimport set Reference to bz46143.
bzimport changed Security from none to Software security bug.
Restricted Application changed the visibility from "Public (No Login Required)" to "acl*security (Project)". · View Herald TranscriptNov 22 2014, 1:18 AM
Restricted Application changed the edit policy from "All Users" to "acl*security (Project)". · View Herald Transcript
Comment Actions

Thanks MZMcBride, it looks like Chrome and Opera treat a %2E in the hostname of the url as a valid . character.

The quick fix is just to replace %2e characters before matching, like the fix for bug 12896. But it may be good to do more extensive normalization in the near future. Maybe something like:

function normalizeUrl( $url ) {
$bits = wfParseUrl( strtolower( Sanitizer::cleanUrl( $url ) ) );
$bits['host'] = str_replace( array( '%2e', '.'), '.', $bits['host'] );
$bits['path'] = str_replace( array( '%2e', '.'), '.', $bits['path'] );
return wfAssembleUrl( $bits );
}

Comment Actions

Created attachment 11941
Decode %2E before matching blacklist

Attached:

bug46143.patch846 BDownload
Comment Actions

(In reply to comment #1)

The quick fix is just to replace %2e characters before matching, like the fix
for bug 12896. But it may be good to do more extensive normalization in the
near future.

Hmmm, this is something to do with file inclusion syntax's link parameter specifically, I think.

If I attempt to insert "http://badsite%2Ecom" or "[http://badsite%2Ecom hello]" or "[[File:Symbol wtf vote.svg|10px|link=http://badsite.com]]" into a wiki page, it appropriately fails if the domain is blacklisted.

However, "[[File:Symbol wtf vote.svg|10px|link=http://badsite%2Ecom]]" works.

URL unquoting (i.e., changing "%2E" to ".", etc.) is already being done in some cases for comparison against the spam blacklist, so the question is why this particular syntax is able to be saved without being flagged as blacklisted.

Comment Actions

Nice catch. Before most links are added to the list of external links, they're passed through Parser::replaceUnusualEscapes first. Image links are not at the moment.

Comment Actions

Created attachment 11942
Filter urls with replaceUnusualEscapes before adding to external links list

This would be instead of patching SpamBlacklist, so that any other uses of the external links list will have consistent url normalization.

Attached:

bug46143-parser.patch1 KBDownload
Restricted Application changed the visibility from "acl*security (Project)" to "Custom Policy". · View Herald TranscriptNov 24 2014, 9:28 PM
Restricted Application changed the edit policy from "acl*security (Project)" to "Custom Policy". · View Herald Transcript
Restricted Application changed the visibility from "Custom Policy" to "Custom Policy". · View Herald TranscriptDec 30 2014, 9:45 PM
Restricted Application changed the edit policy from "Custom Policy" to "Custom Policy". · View Herald Transcript
Bawolff subscribed.
Comment Actions

I think it'd be better to move the normalization check directly into ParserOutput::addExternalLink. You should never not normalize the link, and doing it separately seems to be inviting people to forget.

To that end:

Normalize urls always when adding to ParserOutput::addExternalLink3 KBDownload

Also, a similar issue is present in <gallery> tags, which the above patch fixes.

This comment was removed by Reedy.
This comment was removed by Reedy.
Comment Actions
T48143-master.patch3 KBDownload
T48143-REL1_27.patch3 KBDownload
T48143-REL1_23.patch3 KBDownload
T48143-REL1_28.patch3 KBDownload
Comment Actions

Closing for ease of tracking progress. Patches attached to parent bug, due for next release

Reedy changed the visibility from "Custom Policy" to "Public (No Login Required)".Apr 6 2017, 8:57 PM
Reedy changed the edit policy from "Custom Policy" to "All Users".
Comment Actions

Change 346846 merged by jenkins-bot:
[mediawiki/core@master] SECURITY: Always normalize link url before adding to ParserOutput

https://gerrit.wikimedia.org/r/346846

Comment Actions

Change 346865 merged by jenkins-bot:
[mediawiki/core@REL1_28] SECURITY: Always normalize link url before adding to ParserOutput

https://gerrit.wikimedia.org/r/346865

Comment Actions

Change 346855 merged by jenkins-bot:
[mediawiki/core@REL1_27] SECURITY: Always normalize link url before adding to ParserOutput

https://gerrit.wikimedia.org/r/346855

Content licensed under Creative Commons Attribution-ShareAlike (CC BY-SA) 4.0 unless otherwise noted; code licensed under GNU General Public License (GPL) 2.0 or later and other open source licenses. By using this site, you agree to the Terms of Use, Privacy Policy, and Code of Conduct. · Wikimedia Foundation · Privacy Policy · Code of Conduct · Terms of Use · Disclaimer · CC-BY-SA · GPL · Credits