URL: https://en.wiktionary.org/wiki/Module:data_consistency_check

Module:data consistency check


The following documentation is located at Module:data consistency check/documentation.

This module checks the validity and internal consistency of the language, language family, and script data used on Wiktionary: the modules in Category:Language data modules as well as Module:scripts/data.

Output

Discrepancies detected:

  • Proto-language with no family: Proto-Amuesha-Chamicuro (awd-amc-pro) should be the proto-language of "awd-amc", which doesn't exist.
  • Proto-language with no family: Proto-Kampa (awd-kmp-pro) should be the proto-language of "awd-kmp", which doesn't exist.
  • Proto-language with no family: Proto-Paresi-Waura (awd-prw-pro) should be the proto-language of "awd-prw", which doesn't exist.
  • Proto-language with no family: Proto-Rukai (dru-pro) should be the proto-language of "dru", but Rukai (dru) is not a family.
  • Proto-language with no family: Proto-Puroik (sit-khp-pro) should be the proto-language of "sit-khp", which doesn't exist.
  • Blissymbolic script (Blis) is not used by any language and has no characters listed for auto-detection.
  • Cypro-Minoan script (Cpmn) is not used by any language.
  • Hiragana script (Hira) is not used by any language.
  • Kana script (Hrkt) is not used by any language.
  • Image-rendered script (Image) is not used by any language and has no characters listed for auto-detection.
  • International Phonetic Alphabet (Ipach) is not used by any language and has no characters listed for auto-detection.
  • Moon script (Moon) is not used by any language and has no characters listed for auto-detection.
  • Morse code (Morse) is not used by any language and has no characters listed for auto-detection.
  • musical notation (Music) is not used by any language.
  • Proto-Cuneiform script (Pcun) is not used by any language and has no characters listed for auto-detection.
  • Proto-Elamite script (Pelm) is not used by any language and has no characters listed for auto-detection.
  • Proto-Sinaitic script (Psin) is not used by any language and has no characters listed for auto-detection.
  • Rongorongo script (Roro) is not used by any language and has no characters listed for auto-detection.
  • Rumi numerals (Rumin) is not used by any language.
  • flag semaphore (Semap) is not used by any language and has no characters listed for auto-detection.
  • Visible Speech script (Visp) is not used by any language and has no characters listed for auto-detection.
  • mathematical notation (Zmth) is not used by any language.
  • symbolic script (Zsym) is not used by any language.
  • undetermined script (Zyyy) is not used by any language and has no characters listed for auto-detection.
  • uncoded script (Zzzz) is not used by any language and has no characters listed for auto-detection.
  • The codes fa-Arab, ug-Arab, ks-Arab, ps-Arab, ur-Arab, ku-Arab, tt-Arab, ota-Arab, mzn-Arab and sd-Arab are currently alias codes. Only one code should be used in the data.
  • The codes ms-Arab and kk-Arab are currently alias codes. Only one code should be used in the data.

Checks performed

For multiple data modules:

  • Codes for languages, families and etymology-only languages must be unique and cannot clash with one another.
  • Canonical names for languages, families, and etymology-only languages must not be found in the list of other names.
  • Each name in the list of other names must appear only once.
  • otherNames, if present, must be an array.
  • Wikidata item IDs must be a positive integer or a string starting with Q and ending with decimal digits.
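As a rough illustration of the last rule, the check as worded above could be written in plain Lua roughly as follows. The function name is_valid_wikidata_id is hypothetical, and this is only a sketch of the documented rule, not the module's own code.

```lua
-- Hypothetical helper: accepts a positive integer, or a string made of
-- "Q" followed by one or more decimal digits, as described above.
local function is_valid_wikidata_id(value)
	if type(value) == "number" then
		return value > 0 and value % 1 == 0
	elseif type(value) == "string" then
		return value:match("^Q%d+$") ~= nil
	end
	return false
end

print(is_valid_wikidata_id(1860))
print(is_valid_wikidata_id("Q1860"))
print(is_valid_wikidata_id("1860"))
```

Under these assumptions the first two calls are accepted and the third is rejected, because a string without the leading Q does not match.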

The following must be true of the data used by Module:languages:

  • Each code must be defined in the correct submodule according to whether it is two-letter, three-letter or exceptional.
  • The canonical name (field 1) must be present and must not be the same as the canonical name of another language.
  • If field 2 is not nil, it must be a valid Wikidata item ID.
  • If field 3 or family is given and not nil, it must be a valid family code.
  • If field 4 or scripts is given and not nil, it must be an array, and each string in the array must be a valid script code.
  • If ancestors is given, it must be an array, and each string in the array must be a valid language or etymology language code.
  • If family is given, it must be a valid family code.
  • If type is given, it must be one of the recognised values (regular, reconstructed, appendix-constructed).
  • If entry_name is given, it must be a table that contains either two arrays (from and to) or a string (remove_diacritics) or both.
  • If sort_key is given, it may either be a string, or a table that in turn contains either two arrays (from and to) or a string (remove_diacritics).
  • If entry_name or sort_key is given, the from array must be longer or equal in length to the to array.
  • If standard_chars is given, it must form a valid Lua string pattern when placed between square brackets with ^ before it ("[^...]"). (It should match all characters regularly used in the language, but that cannot be tested.)
  • If override_translit is set, translit must also be set, because there must be a transliteration module that can override manual transliteration.
  • If link_tr is present, it must be true.
  • Have no data keys besides these: 1,2,3,"entry_name","sort_key","display","otherNames","aliases","varieties","type","scripts","ancestors","wikimedia_codes","wikipedia_article","standard_chars","translit","override_translit","link_tr".
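The from/to rules in the list above can be illustrated with a small standalone function. This is a sketch under stated assumptions: check_from_to is a hypothetical name, and the sample tables are invented for the example rather than taken from any real language entry.

```lua
-- Hypothetical helper: validates the shape of an entry_name or sort_key table.
-- Returns true, or false plus a short reason.
local function check_from_to(spec)
	local from, to = spec.from, spec.to
	-- from and to must be given together or not at all.
	if (from ~= nil) ~= (to ~= nil) then
		return false, "from and to must be given together"
	end
	if from == nil then
		return true -- only remove_diacritics (or nothing) is used
	end
	if type(from) ~= "table" or type(to) ~= "table" then
		return false, "from and to must be arrays"
	end
	-- from must be at least as long as to.
	if #to > #from then
		return false, "to is longer than from"
	end
	return true
end

print(check_from_to({from = {"a", "b"}, to = {"x"}}))
print(check_from_to({from = {"a"}, to = {"x", "y"}}))
```

A shorter to array is accepted here because the rule only requires from to be at least as long as to.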

Checks not performed:

  • If translit is present, it should be the name of a module, and this module should contain a tr function that takes a pagename (and optionally a language code and script code) as arguments.
  • If sort_key is a string, it should be the name of a module, and this module should contain a makeSortKey function that takes a pagename (and optionally a language code and script code) as arguments.
  • If entry_name or sort_key is a table and contains a field remove_diacritics, the value of the field should be a string that forms a valid Lua pattern when it is placed inside negated set notation ([^...]).

These are not checked here, because module errors will quickly crop up in entries if these conditions are not met, assuming that Module:utilities attempts to generate a sortkey for a category pertaining to the language in question, or full_link attempts to use the transliteration module.

Module:languages/code to canonical name and Module:languages/canonical names must contain all the codes and canonical names found in the data submodules of Module:languages, and no more.

The following must be true of the data used by Module:etymology languages:

  • canonicalName must be given.
  • parent must be given and must be a valid language, family or etymology-only language code.
  • If ancestors is given, it must be an array, and each string in the array must be a valid language or etymology language code. The etymology language should also be listed as the ancestor of a regular language.
  • Have no data keys besides these: "canonicalName","otherNames","parent","ancestors","wikipedia_article","wikidata_item".
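The "no data keys besides these" rule amounts to a whitelist lookup. The following is a minimal sketch of that idea in plain Lua; find_invalid_keys and the sample entry are hypothetical, and the module's own check_data_keys function (in the source below) is built on Module:array instead.

```lua
-- Hypothetical helper: returns a sorted list of keys in `data` that are not
-- in the `allowed` list.
local function find_invalid_keys(data, allowed)
	local allowed_set = {}
	for _, key in ipairs(allowed) do
		allowed_set[key] = true
	end
	local invalid = {}
	for key in pairs(data) do
		if not allowed_set[key] then
			invalid[#invalid + 1] = key
		end
	end
	-- Compare as strings so that mixed numeric and string keys can be sorted.
	table.sort(invalid, function(a, b)
		return tostring(a) < tostring(b)
	end)
	return invalid
end

-- Keys allowed for etymology-only languages, as listed above.
local allowed = {
	"canonicalName", "otherNames", "parent",
	"ancestors", "wikipedia_article", "wikidata_item",
}

-- Invented entry with one key ("family") that is not on the list.
local entry = {canonicalName = "Example", parent = "en", family = "gem"}
for _, key in ipairs(find_invalid_keys(entry, allowed)) do
	print("invalid key: " .. key)
end
```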

Codes in Module:families data must:

  • Have canonicalName, which must not be the same as the canonical name of another family.
  • If family is given, it must be a valid family code.
  • Have at least one language or subfamily belonging to it.
  • Have no data keys besides these: "canonicalName","otherNames","family","protoLanguage","wikidata_item".
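The requirement that every family has at least one language or subfamily can be sketched as a two-pass scan: mark every family that something belongs to, then report the rest. The data tables and the name find_empty_families below are invented for illustration. Field 3 of a language entry is its family code, as described earlier.

```lua
-- Hypothetical helper: returns a sorted list of family codes that have no
-- language and no subfamily belonging to them.
local function find_empty_families(languages, families)
	local nonempty = {}
	for _, lang in pairs(languages) do
		if lang[3] then
			nonempty[lang[3]] = true
		end
	end
	for _, fam in pairs(families) do
		if fam.family then
			nonempty[fam.family] = true
		end
	end
	local empty = {}
	for code in pairs(families) do
		if not nonempty[code] then
			empty[#empty + 1] = code
		end
	end
	table.sort(empty)
	return empty
end

-- Toy data: "gmw" has a language, "gem" has a subfamily, "xyz" has neither.
local languages = {en = {"English", 1860, "gmw"}}
local families = {
	gmw = {canonicalName = "West Germanic", family = "gem"},
	gem = {canonicalName = "Germanic"},
	xyz = {canonicalName = "Example family"},
}
print(table.concat(find_empty_families(languages, families), ", "))
```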

Codes in Module:scripts data must:

  • Have canonicalName.
  • Have at least one language that lists it as one of its scripts.
  • Have a characters pattern for script autodetection, and this must form a valid Lua string pattern when placed between square brackets ("[...]"). (It should match all characters in the script, but that cannot be tested.)
  • Have no data keys besides these: "canonicalName","otherNames","parent","systems","wikipedia_article","characters","direction".
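The pattern-validity check on characters can be approximated in plain Lua by compiling the bracketed pattern inside pcall. Lua accepts a reversed range such as z-a without raising an error (it simply never matches), so a separate range check is needed. The sketch below is byte-based and therefore only meaningful for ASCII input; the module's own validate_pattern function (in the source below) works on UTF-8 code points. The name check_characters is hypothetical.

```lua
-- Hypothetical helper: returns true, or false plus a reason.
local function check_characters(chars)
	if type(chars) ~= "string" then
		return false, "not a string"
	end
	-- Byte-level range check (ASCII only in this sketch).
	for lower, higher in chars:gmatch("(.)%-(.)") do
		if lower:byte() > higher:byte() then
			return false, "reversed range " .. lower .. "-" .. higher
		end
	end
	-- A malformed set makes string.find raise an error, which pcall catches.
	local ok, err = pcall(string.find, "", "[" .. chars .. "]")
	if not ok then
		return false, err
	end
	return true
end

print(check_characters("a-z0-9"))
print(check_characters("z-a"))
print(check_characters("a%"))
```

The last call fails because the trailing % escapes the closing bracket, which leaves the set unterminated.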

-- TODO:
-- ietf_subtag field used with a 2/3-letter language/family code except qaa-qtz, or a 4-letter script code.
-- Check against files containing up-to-date ISO data, to cross-check validity.
local export = {}

local mw = mw
local require = require
local string = string
local Array = require("Module:array")
local m_en_utilities = require("Module:en-utilities")
local m_etym_languages_canonical_names = require("Module:etymology languages/canonical names")
local m_etym_languages_codes = require("Module:etymology languages/code to canonical name")
local m_etym_languages_data = require("Module:etymology languages/data")
local m_families = require("Module:families")
local m_families_canonical_names = require("Module:families/canonical names")
local m_families_codes = require("Module:families/code to canonical name")
local m_families_data = require("Module:families/data")
local m_languages = require("Module:languages")
local m_languages_canonical_names = require("Module:languages/canonical names")
local m_languages_codes = require("Module:languages/code to canonical name")
local m_languages_data_all = require("Module:languages/data/all")
local m_load = require("Module:load")
local m_scripts = require("Module:scripts")
local m_scripts_canonical_names = require("Module:scripts/canonical names")
local m_scripts_codes = require("Module:scripts/code to canonical name")
local m_scripts_data = require("Module:scripts/data")
local m_str_utils = require("Module:string utilities")
local m_table = require("Module:table")

local add_indefinite_article = m_en_utilities.add_indefinite_article
local codepoint = m_str_utils.codepoint
local concat = table.concat
local dump = mw.dumpObject
local format = string.format
local gcodepoint = m_str_utils.gcodepoint
local get_data_module_name = m_languages.getDataModuleName
local get_family_by_code = m_families.getByCode
local get_family_by_canonical_name = m_families.getByCanonicalName
local get_indefinite_article = m_en_utilities.get_indefinite_article
local get_language_by_code = m_languages.getByCode
local get_language_by_canonical_name = m_languages.getByCanonicalName
local get_script_by_code = m_scripts.getByCode
local get_script_by_canonical_name = m_scripts.getByCanonicalName
local gmatch = string.gmatch
local gsub = string.gsub
local insert = table.insert
local ipairs = ipairs
local is_callable = require("Module:fun").is_callable
local is_positive_integer = require("Module:math").is_positive_integer
local is_known_language_tag = mw.language.isKnownLanguageTag
local isutf8 = mw.ustring.isutf8
local json_decode = mw.text.jsonDecode
local language_link = require("Module:links").language_link
local list_to_set = m_table.listToSet
local list_to_text = mw.text.listToText
local load_data = m_load.load_data
local log = mw.log
local main_loader = package.loaders[2]
local make_family = m_families.makeObject
local make_lang = m_languages.makeObject
local make_script = m_scripts.makeObject
local match = string.match
local new_title = mw.title.new
local next = next
local pairs = pairs
local pcall = pcall
local remove_comments = require("Module:string/removeComments")
local safe_require = m_load.safe_require
local sorted_pairs = m_table.sortedPairs
local split = m_str_utils.split
local sub = string.sub
local table_len = m_table.length
local tag_text = require("Module:script utilities").tag_text
local type = type
local umatch = m_str_utils.match
local unpack = unpack or table.unpack -- Lua 5.2 compatibility

local aliases = require("Module:languages/data").aliases

local messages

local function discrepancy(modname, ...)
	local success, result = pcall(function(...)
		messages[modname]:insert(format(...))
	end, ...)
	if not success then
		log(result, ...)
	end
end

local messages_mt = {}

function messages_mt:__index(k)
	local val = Array()
	self[k] = val
	return val
end

local all_codes = {}
local language_names = {}
local etym_language_names = {}
local family_names = {}
local script_names = {}
local nonempty_families = {}
local allowed_empty_families = {tbq = true}
local nonempty_scripts = {}

local function link(obj, code_first)
	return type(obj) == "string" and obj or
		code_first and format("<code>%s</code> (%s)", obj:getCode(), obj:makeCategoryLink()) or
		format("%s (<code>%s</code>)", obj:makeCategoryLink(), obj:getCode())
end

local function check_data_keys(...)
	local valid_keys = Array(...):toSet()
	return function(modname, obj, data)
		local invalid_keys
		for k in pairs(data) do
			if not valid_keys[k] then
				if not invalid_keys then
					invalid_keys = Array(k)
				else
					invalid_keys:insert(k)
				end
			end
		end
		if invalid_keys == nil then
			return
		end
		local plural = #invalid_keys ~= 1
		discrepancy(modname,
			"The data key%s %s for %s %s invalid.",
			plural and "s" or "",
			invalid_keys:map(function(key)
				return "<code>" .. key .. "</code>"
			end):concat(", "),
			link(obj),
			plural and "are" or "is"
		)
	end
end

-- Modification of isArray in [[Module:table]].
-- This assumes all keys are either integers or non-numbers.
-- If there are fractional numbers, the results might be incorrect.
-- For instance, find_gap{"a", "b", [0.5] = true} evaluates to 3, but there
-- isn't a gap at 3 in the sense of there being an integer key greater than 3.
local function find_gap(t, can_contain_non_number_keys)
	local i = 0
	for k in pairs(t) do
		if not (can_contain_non_number_keys and type(k) ~= "number") then
			i = i + 1
			if t[i] == nil then
				return i
			end
		end
	end
end

local function check_true_or_string_or_nil(modname, obj, data, key)
	local field = data[key]
	if not (field == nil or field == true or type(field) == "string") then
		discrepancy(modname,
			"%s has %s <code>%s</code> value that is not <code>nil</code>, <code>true</code> or a string: <code>%s</code>",
			link(obj), get_indefinite_article(key), key, dump(data[key])
		)
	end
end

local function check_array(modname, obj, data, array_name, parent_array_name, can_contain_non_number_keys)
	local parent_table = data
	if parent_array_name then
		parent_table = assert(data[parent_array_name], parent_array_name)
		parent_array_name = "the <code>" .. parent_array_name .. "</code> field in "
	else
		parent_array_name = ""
	end
	local array_type = type(parent_table[array_name])
	if array_type == "table" then
		local gap = find_gap(parent_table[array_name], can_contain_non_number_keys)
		if gap then
			discrepancy(modname,
				"The <code>%s</code> array in %sthe data table for %s has a gap at index %d.",
				array_name,
				parent_array_name,
				link(obj),
				gap
			)
		else
			return true
		end
	else
		discrepancy(modname,
			"The <code>%s</code> field in %sthe data table for %s should be an array (table) but is %s.",
			array_name,
			parent_array_name,
			link(obj),
			array_type == "nil" and "nil" or "a " .. array_type
		)
	end
end

local function check_no_alias_codes(modname, mod_data)
	local lookup, discrepancies = {}, {}
	for k, v in pairs(mod_data) do
		local check = lookup[v]
		if check then
			discrepancies[check] = discrepancies[check] or {"<code>" .. check .. "</code>"}
			insert(discrepancies[check], "<code>" .. k .. "</code>")
		else
			lookup[v] = k
		end
	end
	for _, v in pairs(discrepancies) do
		discrepancy(modname,
			"The codes %s are currently alias codes. Only one code should be used in the data.",
			list_to_text(v, ", ", " and ")
		)
	end
end

local function check_wikidata_item(modname, obj, data, key)
	local data_item = data[key]
	if data_item == nil or is_positive_integer(data_item) then
		return
	end
	discrepancy(modname,
		"%s has a Wikidata item ID that is not a positive integer: <code>%s</code>",
		link(obj), dump(data_item)
	)
end

local function check_name_field(modname, obj, data, canonical_name, data_key, allow_nested, allow_canonical_name_in_table)
	local array = data[data_key]
	if not array then
		return
	end
	check_array(modname, obj, data, data_key, nil, true)
	local names = {}
	local function check_other_name(other_name)
		if not allow_canonical_name_in_table and other_name == canonical_name then
			discrepancy(modname,
				"%s has its canonical name (<code>%s</code>) repeated in the table of <code>%s</code>.",
				link(obj), dump(canonical_name), data_key
			)
		end
		if names[other_name] then
			discrepancy(modname,
				"The name %s is found twice or more in the list of <code>%s</code> for %s.",
				other_name, data_key, link(obj)
			)
		end
		names[other_name] = true
	end
	for _, other_name in ipairs(array) do
		if type(other_name) == "table" then
			if not allow_nested then
				discrepancy(modname,
					"A nested table is found in the list of <code>%s</code> for %s, but isn't allowed.",
					data_key, link(obj)
				)
			else
				for _, on in ipairs(other_name) do
					check_other_name(on)
				end
			end
		else
			check_other_name(other_name)
		end
	end
end

local function check_other_names_aliases_varieties(modname, obj, data, canonical_name)
	if data.other_names then
		check_name_field(modname, obj, data, canonical_name, "other_names")
	end
	if data.aliases then
		check_name_field(modname, obj, data, canonical_name, "aliases")
	end
	if data.varieties then
		-- Sometimes a variety legitimately has the same name as the language as a whole, so allow that.
		check_name_field(modname, obj, data, canonical_name, "varieties", "allow_nested", "allow_canonical_name_in_table")
	end
end

local function validate_pattern(pattern, modname, obj, standard_chars)
	if type(pattern) ~= "string" then
		return discrepancy(modname,
			"\"%s\", the %spattern for %s, is not a string.",
			pattern, standard_chars and "standard character " or "", link(obj)
		)
	elseif not isutf8(pattern) then
		return discrepancy(modname,
			"%s specifies a pattern for %scharacter detection which is not valid UTF-8: <code>%s</code>",
			link(obj), standard_chars and "standard " or "", dump(pattern)
		)
	end
	-- Collect every range whose first code point is not below its second.
	local ranges
	for lower, higher in gmatch(pattern, "(.[\128-\191]*)%-%%?(.[\128-\191]*)") do
		if codepoint(lower) >= codepoint(higher) then
			ranges = ranges or Array()
			insert(ranges, {lower, higher})
		end
	end
	if ranges and ranges[1] then
		local plural = #ranges ~= 1 and "s" or ""
		discrepancy(modname,
			"%s specifies an invalid pattern " ..
			"for %scharacter detection: <code>%s</code>. The first codepoint%s " ..
			"in the range%s %s %s must be less than or equal to the second.",
			link(obj), standard_chars and "standard " or "", dump(pattern), plural, plural,
			ranges:map(function(range)
				return format(range[1] .. "-" .. range[2] .. " (U+%X, U+%X)", codepoint(range[1]), codepoint(range[2]))
			end):concat(", "),
			#ranges ~= 1 and "are" or "is"
		)
	end
	-- Compiling the pattern inside pcall catches malformed sets.
	local success, result = pcall(umatch, "", "[" .. pattern .. "]")
	if not success then
		discrepancy(modname,
			"%s specifies an invalid pattern for %scharacter detection: <code>%s</code> (%s)",
			link(obj), standard_chars and "standard " or "", dump(pattern), result
		)
	end
end

local remove_exceptions_addition = 0xF0000
local maximum_code_point = 0x10FFFF
local remove_exceptions_maximum_code_point = maximum_code_point - remove_exceptions_addition

-- TODO: check modules exist.
-- TODO: validate script codes and check inner tables.
local function check_replacement_data(modname, obj, data, key, func_name)
	local replacements = data[key]
	if replacements == nil then
		return
	end
	local replacements_type = type(replacements)
	if replacements_type == "string" then
		local mod = main_loader("Module:" .. replacements)
		if not mod then
			discrepancy(modname,
				"The <code>%s</code> field in the data table for %s specifies the module [[Module:%s]], which does not exist.",
				key, link(obj), replacements
			)
		else
			mod = mod()
			if not (type(mod) == "table" and is_callable(mod[func_name])) then
				discrepancy(modname,
					"The <code>%s</code> field in the data table for %s specifies the module [[Module:%s]], which exists, but does not contain the expected function <code>%s()</code>.",
					key, link(obj), replacements, func_name
				)
			end
		end
		return
	elseif replacements_type ~= "table" then
		discrepancy(modname,
			"The <code>%s</code> field in the data table for %s must be a string or table, not a %s.",
			key, link(obj), replacements_type
		)
		return
	end
	local from, to = replacements.from, replacements.to
	if (from ~= nil) ~= (to ~= nil) then
		discrepancy(modname,
			"The <code>from</code> and <code>to</code> arrays in the <code>%s</code> table for %s are not both defined or both undefined.",
			key, link(obj)
		)
	elseif from then
		for _, k in ipairs{"from", "to"} do
			check_array(modname, obj, data, k, key)
		end
	end
	local remove_diacritics = replacements.remove_diacritics
	if not (remove_diacritics == nil or type(remove_diacritics) == "string") then
		discrepancy(modname,
			"The <code>remove_diacritics</code> field in the <code>%s</code> table for %s must be a string.",
			key, link(obj)
		)
	end
	local remove_exceptions = replacements.remove_exceptions
	if remove_exceptions then
		if check_array(modname, obj, data, "remove_exceptions", key) then
			for sequence_i, sequence in ipairs(remove_exceptions) do
				local code_point_i = 0
				for code_point in gcodepoint(sequence) do
					code_point_i = code_point_i + 1
					if code_point > remove_exceptions_maximum_code_point then
						discrepancy(modname,
							"Code point #%d (0x%04X) in field #%d of the <code>remove_exceptions</code> array for %s is over U+%04X.",
							code_point_i, code_point, sequence_i, link(obj), remove_exceptions_maximum_code_point
						)
					end
				end
			end
		end
	end
	-- The condition flags a "to" array that is longer than "from".
	if from and to and table_len(to) > table_len(from) then
		discrepancy(modname,
			"The <code>from</code> array in the <code>%s</code> table for %s must be longer than or the same length as the <code>to</code> array.",
			key, link(obj)
		)
	end
end

local function check_replacements_data(modname, obj, data)
	for _, replacement_spec in ipairs{
		{"translit", "tr"},
		{"display_text", "makeDisplayText"},
		{"strip_diacritics", "stripDiacritics"},
		{"sort_key", "makeSortKey"},
	} do
		check_replacement_data(modname, obj, data, unpack(replacement_spec))
	end
end

local function has_ancestor(lang, code)
	for _, anc in ipairs(lang:getAncestors()) do
		if code == anc:getCode() or has_ancestor(anc, code) then
			return true
		end
	end
end

local function get_default_ancestors(lang)
	if lang:hasType("language", "etymology-only") then
		local parent = lang:getParent()
		if not has_ancestor(parent, lang:getCode()) then
			return parent:getAncestorCodes()
		end
	end
	local fam_code, def_anc = lang:getFamilyCode()
	while fam_code and fam_code ~= "qfa-not" do
		local fam = m_families_data[fam_code]
		def_anc = fam.protoLanguage or
			m_languages_data_all[fam_code .. "-pro"] and fam_code .. "-pro" or
			m_etym_languages_data[fam_code .. "-pro"] and fam_code .. "-pro"
		if def_anc and def_anc ~= lang:getCode() then
			return {def_anc}
		end
		fam_code = fam[3]
	end
end

local function iterate_ancestor(obj, modname, anc_code)
	local anc = get_language_by_code(anc_code, nil, true)
	if not anc then
		discrepancy(modname,
			"%s lists the invalid language code <code>%s</code> as its ancestor.",
			link(obj), dump(anc_code)
		)
		return
	end
	local anc_fam = anc:getFamily()
	if not anc_fam then
		discrepancy(modname,
			"%s has no family.",
			link(anc)
		)
		return
	end
	local anc_fam_code = anc_fam:getCode()
	local def_ancs = get_default_ancestors(obj)
	if def_ancs then
		for _, def_anc in ipairs(def_ancs) do
			def_anc = get_language_by_code(def_anc, nil, true)
			if def_anc and (
				anc_code == def_anc:getCode() or
				has_ancestor(def_anc, anc_code) or
				def_anc:hasParent(anc_code) and not has_ancestor(anc, def_anc:getCode())
			) then
				discrepancy(modname,
					"%s has the ancestor %s listed in its ancestor field, which is redundant, since it is determined to be ancestral automatically.",
					link(obj), link(anc)
				)
			end
		end
	end
	if not obj:inFamily(anc_fam_code) then
		discrepancy(modname,
			"%s has %s set as an ancestor, but is not in the %s.",
			link(obj), link(anc), link(anc_fam)
		)
	end
	local fam, proto = obj
	repeat
		fam = fam:getFamily()
		proto = fam and fam:getProtoLanguage()
	until proto or not fam or fam:getCode() == "qfa-not"
	if proto and not (
		proto:getCode() == anc:getCode() or
		proto:hasAncestor(anc:getCode()) or
		anc:hasAncestor(proto:getCode())
	) then
		local fam = obj:getFamily()
		discrepancy(modname,
			"%s is in the %s and has %s set as an ancestor, but it is not possible to form an ancestral chain between them.",
			link(obj), link(fam), link(anc)
		)
	end
end

local function check_ancestors(modname, obj, data)
	local ancestors = data.ancestors
	if ancestors == nil then
		return
	end
	local ancestors_type = type(ancestors)
	if ancestors_type == "string" then
		ancestors = split(ancestors, ",", true, true)
	elseif ancestors_type ~= "table" then
		discrepancy(modname,
			"The <code>ancestors</code> field in the data table for %s must be a string or table, not a %s.",
			link(obj), ancestors_type
		)
	end
	for _, anc in ipairs(ancestors) do
		iterate_ancestor(obj, modname, anc)
	end
end

local function check_wikimedia_codes(modname, obj, data)
	local wikimedia_codes = data.wikimedia_codes
	if wikimedia_codes == nil then
		return
	end
	local wikimedia_codes_type = type(wikimedia_codes)
	if wikimedia_codes_type == "string" then
		wikimedia_codes = split(wikimedia_codes, ",", true, true)
	elseif wikimedia_codes_type ~= "table" then
		discrepancy(modname,
			"The <code>wikimedia_codes</code> field in the data table for %s must be a string or table, not a %s.",
			link(obj), wikimedia_codes_type
		)
	end
	for _, code in ipairs(wikimedia_codes) do
		if not is_known_language_tag(code) then
			discrepancy(modname,
				"%s lists the invalid Wikimedia code <code>%s</code> in the <code>wikimedia_codes</code> field.",
				link(obj), dump(code)
			)
		end
	end
end

local function check_code_to_name_and_name_to_code_maps(
	source_module_type,
	source_module_description,
	code_to_module_map, name_to_code_map,
	code_to_name_modname, code_to_name_module,
	name_to_code_modname, name_to_code_module
)
	local function check_code_and_name(modname, code, canonical_name)
		-- Check the code is in code_to_module_map and that it didn't originate from the wrong data module.
		local check_mod = code_to_module_map[code] or code_to_module_map[aliases[code]]
		if not (check_mod and match(check_mod, "^" .. source_module_type .. "/data")) then
			if not name_to_code_map[canonical_name] then
				discrepancy(modname,
					"The code <code>%s</code> and the canonical name %s should be removed; they are not found in %s.",
					code, canonical_name, source_module_description
				)
			else
				discrepancy(modname,
					"<code>%s</code>, the code for the canonical name %s, is wrong; it should be <code>%s</code>.",
					code, canonical_name, name_to_code_map[canonical_name]
				)
			end
		elseif not name_to_code_map[canonical_name] then
			local data_table = require("Module:" .. code_to_module_map[code])[code]
			discrepancy(modname,
				"%s, the canonical name for the code <code>%s</code>, is wrong; it should be %s.",
				canonical_name, code, data_table[1]
			)
		end
	end
	for code, canonical_name in pairs(code_to_name_module) do
		check_code_and_name(code_to_name_modname, code, canonical_name)
	end
	for canonical_name, code in pairs(name_to_code_module) do
		check_code_and_name(name_to_code_modname, code, canonical_name)
	end
end

local function check_extraneous_extra_data(
	data_modname, data_module, extra_data_modname, extra_data_module)
	for code, _ in pairs(extra_data_module) do
		if not data_module[code] then
			discrepancy(extra_data_modname,
				"The code <code>%s</code> is not found in [[Module:%s]], and should be removed from [[Module:%s]].",
				code, data_modname, extra_data_modname
			)
		end
	end
end

-- TODO: add collision check between the canonical names "X" and "X [Ll]anguage".
local function check_languages(frame)
	local check_language_data_keys = check_data_keys(
		1, 2, 3, 4, -- canonical name, Wikidata item, family, scripts
		"display_text", "generate_forms", "strip_diacritics", "sort_key",
		"other_names", "aliases", "varieties", "ietf_subtag",
		"type", "ancestors", "pseudo_families",
		"wikimedia_codes", "wikipedia_article", "standard_chars",
		"translit", "override_translit", "link_tr",
		"dotted_dotless_i"
	)
	local function check_language(modname, code, data, extra_modname, extra_data)
		local obj, code_modname, canonical_name = make_lang(code, data, true), get_data_module_name(code), data[1]
		-- FIXME: this module should use the prefixed module name throughout.
		code_modname = code_modname:gsub("^Module:", "")
		if code_modname ~= modname then
			if code_modname == "languages/data/2" then
				discrepancy(modname,
					"%s is a two-letter code, so should be moved to [[Module:%s]].",
					link(obj), code_modname
				)
			elseif code_modname == "languages/data/exceptional" then
				discrepancy(modname,
					"%s is an exceptional code, as it does not consist of two or three lowercase letters, so should be moved to [[Module:%s]].",
					link(obj), code_modname
				)
			else
				discrepancy(modname,
					"%s is a three-letter code beginning with '%s', so should be moved to [[Module:%s]].",
					link(obj), sub(code, 1, 1), code_modname
				)
			end
		end
		check_language_data_keys(modname, obj, data)
		if all_codes[code] then
			discrepancy(modname,
				"The code <code>%s</code> is not unique; it is also defined in [[Module:%s]].",
				code, all_codes[code]
			)
		else
			if not m_languages_codes[code] then
				discrepancy("languages/code to canonical name",
					"The code %s is missing.",
					link(obj, true)
				)
			end
			all_codes[code] = modname
		end
		-- TODO: these checks should be consolidated with the proto-language checks in the family data,
		-- since bad settings there affect the warnings here (e.g. xxx-pro assigned to yyy when xxx also
		-- doesn't exist - a warning that xxx has "no family" would be misleading).
		if sub(code, -4) == "-pro" then
			local fam_code = sub(code, 1, -5)
			local fam = get_language_by_code(fam_code, nil, true, true)
			if not fam then
				discrepancy(modname,
					"'''Proto-language with no family''': %s should be the proto-language of <code>%s</code>, which doesn't exist.",
					link(obj), dump(fam_code)
				)
			elseif not fam:hasType("family") then
				discrepancy(modname,
					"'''Proto-language with no family''': %s should be the proto-language of <code>%s</code>, but %s is not a family.",
					link(obj), dump(fam_code), link(fam)
				)
			else
				-- Reinstate this as low-priority once message priorities have been implemented.
--				local expected_name = "Proto-" .. fam:getCanonicalName()
--				if canonical_name ~= expected_name then
--					discrepancy(modname,
--						"%s does not have the expected name \"%s\", even though it is the proto-language of the %s.",
--						link(obj), expected_name, link(fam)
--					)
--				end
			end
		end
		if not canonical_name then
			discrepancy(modname,
				"The code <code>%s</code> has no canonical name specified.",
				code
			)
		elseif language_names[canonical_name] then
			local canonical_lang = get_language_by_canonical_name(canonical_name)
			if not canonical_lang then
				discrepancy(modname,
					"%s has a canonical name that cannot be looked up.",
					link(obj)
				)
			elseif data.main_code ~= canonical_lang:getCode() then
				discrepancy(modname,
					"%s has a canonical name that is not unique; it is also used by the code <code>%s</code>.",
					link(obj), language_names[canonical_name]
				)
			end
		else
			if not m_languages_canonical_names[canonical_name] then
				discrepancy("languages/canonical names",
					"The canonical name %s is missing.",
					link(obj)
				)
			end
			language_names[canonical_name] = code
		end
		check_wikidata_item(modname, obj, data, 2)
		if extra_data then
			check_other_names_aliases_varieties(modname, obj, extra_data, canonical_name)
		end
		local lang_type = data.type
		if lang_type and not (lang_type == "regular" or lang_type == "reconstructed" or lang_type == "appendix-constructed") then
			discrepancy(modname,
				"%s is of the invalid type <code>%s</code>.",
				link(obj), lang_type
			)
		end
		if data.aliases then
			discrepancy(modname,
				"%s has an <code>aliases</code> key in [[Module:%s]]. This must be moved to [[Module:%s]].",
				link(obj), modname, extra_modname
			)
		end
		if data.varieties then
			discrepancy(modname,
				"%s has the <code>varieties</code> key in [[Module:%s]]. This must be moved to [[Module:%s]].",
				link(obj), modname, extra_modname
			)
		end
		if data.other_names then
			discrepancy(modname,
				"%s has the <code>other_names</code> key in [[Module:%s]]. This must be moved to [[Module:%s]].",
				link(obj), modname, extra_modname
			)
		end
		if not extra_data then
			discrepancy(extra_modname,
				"%s has data in [[Module:%s]], but does not have corresponding data in [[Module:%s]].",
				link(obj), modname, extra_modname
			)
		--[[elseif extra_data.other_names then
			discrepancy(extra_modname,
				"%s has <code>other_names</code> key, but these should be changed to either <code>aliases</code> or <code>varieties</code>.",
				link(obj)
			)]]
		end
		local sc = data[4]
		if sc then
			if type(sc) == "string" then
				sc = split(sc, "%s*,%s*", true)
			end
			if type(sc) == "table" then
				if not sc[1] then
					discrepancy(modname,
						"%s has no scripts listed.",
						link(obj)
					)
				else
					for _, sccode in ipairs(sc) do
						local cur_sc = m_scripts_data[sccode]
						if not (cur_sc or sccode == "All" or sccode == "Hants") then
							discrepancy(modname,
								"%s lists the invalid script code <code>%s</code>.",
								link(obj), dump(sccode)
							)
						--[[elseif not cur_sc.characters then
							discrepancy(modname,
								"%s lists the %s, which does not have any characters.",
								link(obj), link(get_script_by_code(sccode))
							)]]
						end
						nonempty_scripts[sccode] = true
					end
				end
			else
				discrepancy(modname,
					"The %s field for %s must be a table or string.",
					4, link(obj)
				)
			end
		end
		if data.ancestors then
			check_ancestors(modname, obj, data)
		end
		if data.wikimedia_codes then
			check_wikimedia_codes(modname, obj, data)
		end
		if data[3] then
			local family = data[3]
			if not m_families_data[family] then
				discrepancy(modname,
					"%s has the invalid family code <code>%s</code>.",
					link(obj), dump(family)
				)
			end
			nonempty_families[family] = true
		end
		check_replacements_data(modname, obj, data)
		if data.standard_chars then
			if type(data.standard_chars) == "table" then
				local sccodes = {}
				-- `sc` is nil when the language lists no scripts, and may be a
				-- non-table if field 4 was malformed, so guard before iterating.
				if type(sc) == "table" then
					for _, sccode in ipairs(sc) do
						sccodes[sccode] = true
					end
				end
				for sccode in pairs(data.standard_chars) do
					if not (sccodes[sccode] or sccode == 1) then
						discrepancy(modname,
							"The field %s in the <code>standard_chars</code> table for %s does not match any script for that language.",
							sccode, link(obj)
						)
					end
				end
			elseif type(data.standard_chars) ~= "string" then
				discrepancy(modname,
					"The <code>standard_chars</code> field in the data table for %s must be a string or table.",
					link(obj)
				)
			end
		end
		check_true_or_string_or_nil(modname, obj, data, "override_translit")
		check_true_or_string_or_nil(modname, obj, data, "link_tr")
		-- This no longer applies, since scripts can have script-wide transliteration methods.
		-- if data.override_translit and not data.translit then
		-- 	discrepancy(modname,
		-- 		"%s has the <code>override_translit</code> field set, but no transliteration module",
		-- 		link(obj)
		-- 	)
		-- end
	end
	local function check_module(modname)
		local mod_data = load_data("Module:" .. modname)
		local extra_modname = modname .. "/extra"
		local extra_mod_data = load_data("Module:" .. extra_modname)
		for code, data in pairs(mod_data) do
			check_language(modname, code, data, extra_modname, extra_mod_data[code])
		end
		check_no_alias_codes(modname, mod_data)
		check_no_alias_codes(extra_modname, extra_mod_data)
		check_extraneous_extra_data(modname, mod_data, extra_modname, extra_mod_data)
	end
	-- Check two-letter codes
	check_module("languages/data/2")
	-- Check three-letter codes
	for i = 0x61, 0x7A do -- a to z
		check_module(format("languages/data/3/%c", i))
	end
	-- Check exceptional codes
	check_module("languages/data/exceptional")
	-- These checks must be done while all_codes only contains language codes:
	-- that is, after language data modules have been processed, but before
	-- etymology languages, families, and scripts have.
	check_code_to_name_and_name_to_code_maps(
		"languages",
		"a submodule of [[Module:languages]]",
		all_codes, language_names,
		"languages/code to canonical name", m_languages_codes,
		"languages/canonical names", m_languages_canonical_names
	)
	-- Check [[Template:langname-lite]]
	local modname = "Template:langname-lite"
	for code, name in gmatch(remove_comments(new_title(modname):getContent()), "\n\t*|#*([^\n]+)=([^\n]*)") do
		if #code > 1 and code ~= "default" then
			for _, code in pairs(split(code, "|", true)) do
				local lang = get_language_by_code(code, nil, true, true)
				if match(name, "etymcode") then
					local nonEtym_name = frame:preprocess(name)
					local nonEtym_real_name = lang:getFullName()
					if nonEtym_name ~= nonEtym_real_name then
						discrepancy(modname,
							"Code: <code>%s</code>. Saw name: %s. Expected name: %s.",
							code, nonEtym_name, nonEtym_real_name
						)
					end
					name = frame:preprocess(gsub(name, "{{{allow etym|}}}", "1"))
				elseif match(name, "familycode") then
					name = match(name, "familycode|(.-)|")
				else
					name = name
				end
				if not lang then
					discrepancy(modname,
						"Code: <code>%s</code>. Saw name: %s. Language not present in data.",
						code, name
					)
				else
					local real_name = lang:getCanonicalName()
					if name ~= real_name then
						discrepancy(modname,
							"Code: <code>%s</code>. Saw name: %s. Expected name: %s.",
							code, name, real_name
						)
					end
				end
			end
		end
	end
end
local function check_etym_languages()
	local modname = "etymology languages/data"
	local check_etymology_language_data_keys = check_data_keys(
		1, 2, 3, 4, -- canonical name, Wikidata item, family, scripts
		"parent", "display_text", "generate_forms", "strip_diacritics", "sort_key",
		"other_names", "aliases", "varieties", "ietf_subtag",
		"type", "main_code", "ancestors", "pseudo_families",
		"wikimedia_codes", "wikipedia_article", "standard_chars",
		"translit", "override_translit", "link_tr",
		"dotted_dotless_i"
	)
	local checked = {}
	for code, data in pairs(m_etym_languages_data) do
		local obj, canonical_name, parent = make_lang(code, data, true), data[1], data.parent
		check_etymology_language_data_keys(modname, obj, data)
		if all_codes[code] then
			discrepancy(modname,
				"The code <code>%s</code> is not unique; it is also defined in [[Module:%s]].",
				code, all_codes[code]
			)
		else
			if not m_etym_languages_codes[code] then
				discrepancy("etymology languages/code to canonical name",
					"The code %s is missing.",
					link(obj, true)
				)
			end
			all_codes[code] = modname
		end
		if not canonical_name then
			discrepancy(modname,
				"The code <code>%s</code> has no canonical name specified.",
				code
			)
		elseif language_names[canonical_name] then
			local canonical_lang = get_language_by_canonical_name(canonical_name, nil, true)
			if not canonical_lang then
				discrepancy(modname,
					"%s has a canonical name that cannot be looked up.",
					link(obj)
				)
			elseif data.main_code ~= canonical_lang:getCode() then
				discrepancy(modname,
					"%s has a canonical name that is not unique; it is also used by the code <code>%s</code>.",
					link(obj), language_names[canonical_name]
				)
			end
		else
			if not m_etym_languages_canonical_names[canonical_name] then
				discrepancy("etymology languages/canonical names",
					"The canonical name %s is missing.",
					link(obj)
				)
			end
			etym_language_names[canonical_name] = code
		end
		check_other_names_aliases_varieties(modname, obj, data, canonical_name)
		if parent then
			if type(parent) ~= "string" then
				discrepancy(modname,
					"%s has a parent code that is %s rather than a string.",
					link(obj), parent == nil and "nil" or "a " .. type(parent)
				)
			elseif not (m_languages_data_all[parent] or m_etym_languages_data[parent]) then
				discrepancy(modname,
					"%s has the invalid parent code <code>%s</code>%s.",
					link(obj), dump(parent), m_families_data[parent] and " (a family code)" or ""
				)
			end
			nonempty_families[parent] = true
		else
			discrepancy(modname,
				"%s has no parent code.",
				link(obj)
			)
		end
		if data.ancestors then
			check_ancestors(modname, obj, data)
		end
		if data.wikimedia_codes then
			check_wikimedia_codes(modname, obj, data)
		end
		if data[3] then
			local family = data[3]
			if not m_families_data[family] then
				discrepancy(modname,
					"%s has the invalid family code <code>%s</code>.",
					link(obj), dump(family)
				)
			end
			nonempty_families[family] = true
		end
		check_replacements_data(modname, obj, data)
		check_wikidata_item(modname, obj, data, 2)
		local stack = {}
		while data do
			if checked[code] then
				break
			elseif stack[code] then
				local parent = data.parent
				discrepancy(modname,
					"%s has a cyclic parental relationship to %s",
					link(make_lang(code, data, true)),
					link(get_language_by_code(parent, nil, true))
				)
				break
			end
			stack[code] = true
			code = data.parent
			data = m_etym_languages_data[code]
		end
		for code in pairs(stack) do
			checked[code] = true
		end
	end
	check_no_alias_codes(modname, m_etym_languages_data)
	check_code_to_name_and_name_to_code_maps(
		"etymology languages",
		"[[Module:etymology languages/data]]",
		all_codes, etym_language_names,
		"etymology languages/code to canonical name", m_etym_languages_codes,
		"etymology languages/canonical names", m_etym_languages_canonical_names
	)
end
-- TODO: add collision check between the canonical names "X" and "X [Ll]anguages".
local function check_families()
	local modname = "families/data"
	local check_family_data_keys = check_data_keys(
		1, 2, 3, -- canonical name, Wikidata item, (parent) family
		"type", "ietf_subtag",
		"protoLanguage", "other_names", "aliases", "varieties", "pseudo_families"
	)
	local checked, double_check_if_empty = {["qfa-not"] = true}, {}
	for code, data in pairs(m_families_data) do
		local obj, canonical_name, family, protolang = make_family(code, data), data[1], data[3], data.protoLanguage
		check_family_data_keys(modname, obj, data)
		if all_codes[code] then
			discrepancy(modname,
				"The code <code>%s</code> is not unique; it is also defined in [[Module:%s]].",
				code, all_codes[code]
			)
		else
			if not m_families_codes[code] then
				discrepancy("families/code to canonical name",
					"The code %s is missing.",
					link(obj, true)
				)
			end
			all_codes[code] = modname
		end
		if not canonical_name then
			discrepancy(modname,
				"The code <code>%s</code> has no canonical name specified.",
				code
			)
		elseif family_names[canonical_name] then
			local canonical_family = get_family_by_canonical_name(canonical_name)
			if not canonical_family then
				discrepancy(modname,
					"%s has a canonical name that cannot be looked up.",
					link(obj)
				)
			elseif data.main_code ~= canonical_family:getCode() then
				discrepancy(modname,
					"%s has a canonical name that is not unique; it is also used by the code <code>%s</code>.",
					link(obj), family_names[canonical_name]
				)
			end
		else
			if not m_families_canonical_names[canonical_name] then
				discrepancy("families/canonical names",
					"The canonical name %s is missing.",
					link(obj)
				)
			end
			family_names[canonical_name] = code
		end
		check_other_names_aliases_varieties(modname, obj, data, canonical_name)
		if family then
			if family == code and code ~= "qfa-not" then
				discrepancy(modname,
					"%s has itself as its family.",
					link(obj)
				)
			elseif not m_families_data[family] then
				discrepancy(modname,
					"%s has the invalid parent family code <code>%s</code>.",
					link(obj), dump(family)
				)
			end
			nonempty_families[family] = true
		end
		if protolang then
			local protolang_obj = get_language_by_code(protolang, nil, true)
			if not protolang_obj then
				discrepancy(modname,
					"%s has the invalid proto-language code <code>%s</code>.",
					link(obj), dump(protolang)
				)
			elseif protolang == code .. "-pro" then
				discrepancy(modname,
					"%s has %s listed as its proto-language, which is redundant, since it is determined to be the proto-language automatically.",
					link(obj), link(protolang_obj)
				)
			elseif sub(protolang, -4) == "-pro" then
				discrepancy(modname,
					"%s has %s listed as its proto-language, which is supposed to be the proto-language for the family <code>%s</code>.",
					link(obj), link(protolang_obj), sub(protolang, 1, -5)
				)
			end
		end
		check_wikidata_item(modname, obj, data, 2)
		-- Could be a false-positive if a child family occurs on a later
		-- iteration, so set aside any that fail for a second check. This avoids
		-- having to iterate through the whole list of families once
		-- nonempty_families has been fully populated.
		if not (nonempty_families[code] or allowed_empty_families[code]) then
			double_check_if_empty[code] = obj
		end
		local stack = {}
		while data do
			if checked[code] then
				break
			elseif stack[code] then
				local parent = data[3]
				discrepancy(modname,
					"%s has a cyclic familial relationship to %s",
					link(make_family(code, data)),
					link(get_family_by_code(parent))
				)
				break
			end
			stack[code] = true
			code = data[3]
			data = m_families_data[code]
		end
		for code in pairs(stack) do
			checked[code] = true
		end
	end
	-- Any families set aside as candidates for having no children are checked
	-- again, now that nonempty_families is definitely complete.
	for code, obj in next, double_check_if_empty do
		if not (nonempty_families[code] or allowed_empty_families[code]) then
			discrepancy(modname,
				"%s has no child families or languages.",
				link(obj)
			)
		end
	end
	check_no_alias_codes(modname, m_families_data)
	check_code_to_name_and_name_to_code_maps(
		"families",
		"[[Module:families/data]]",
		all_codes, family_names,
		"families/code to canonical name", m_families_codes,
		"families/canonical names", m_families_canonical_names
	)
end
-- TODO: add collision check between the canonical names "X" and "X [Ss]cript".
local function check_scripts()
	local modname = "scripts/data"
	local check_script_data_keys = check_data_keys(
		1, 2, 3, -- canonical name, Wikidata item, writing systems
		"other_names", "aliases", "varieties", "parent", "ietf_subtag", "type",
		"wikipedia_article", "ranges", "characters", "spaces", "capitalized", "translit", "direction",
		"character_category", "normalizationFixes", "sort_by_scraping",
		"display_text", "sort_key", "strip_diacritics"
	)
	-- Just to satisfy requirements of check_code_to_name_and_name_to_code_maps.
	local script_code_to_module_map = {}
	for code, data in pairs(m_scripts_data) do
		local obj, canonical_name = make_script(code, data), data[1]
		if not m_scripts_codes[code] and #code == 4 then
			discrepancy("scripts/code to canonical name",
				"The code %s is missing",
				link(obj, true)
			)
		end
		check_script_data_keys(modname, obj, data)
		if not canonical_name then
			discrepancy(modname,
				"The code <code>%s</code> has no canonical name specified.",
				code
			)
		elseif script_names[canonical_name] then
			local canonical_script = get_script_by_canonical_name(canonical_name)
			if not canonical_script then
				discrepancy(modname,
					"%s has a canonical name that cannot be looked up.",
					link(obj)
				)
			--[[elseif data.main_code ~= canonical_script:getCode() then
				discrepancy(modname,
					"%s has a canonical name that is not unique; it is also used by the code <code>%s</code>.",
					link(obj), script_names[canonical_name]
				)]]
			end
		else
			if not m_scripts_canonical_names[canonical_name] and #code == 4 then
				discrepancy("scripts/canonical names",
					"The canonical name %s is missing.",
					link(obj)
				)
			end
			script_names[canonical_name] = code
		end
		check_other_names_aliases_varieties(modname, obj, data, canonical_name)
		if not nonempty_scripts[code] then
			discrepancy(modname,
				"%s is not used by any language%s.",
				link(obj), data.characters and ""
					or " and has no characters listed for auto-detection"
			)
		--[[elseif not data.characters then
			discrepancy(modname,
				"%s has no characters listed for auto-detection.",
				link(obj)
			)--]]
		end
		if data.characters then
			validate_pattern(data.characters, modname, obj, false)
		end
		check_wikidata_item(modname, obj, data, 2)
		script_code_to_module_map[code] = modname
	end
	check_no_alias_codes(modname, m_scripts_data)
	check_code_to_name_and_name_to_code_maps(
		"scripts",
		"a submodule of [[Module:scripts]]",
		script_code_to_module_map, script_names,
		"scripts/code to canonical name", m_scripts_codes,
		"scripts/canonical names", m_scripts_canonical_names
	)
end
-- FIXME: this is quite messy.
local function check_wikidata_languages()
	local data = json_decode(new_title("Module:languages/data/wikidata.json"):getContent())
	local seen = {{}, {}, {}, [5] = {}}
	for _, item in ipairs(data) do
		local id = item.id
		for k, v in pairs(item) do
			if k ~= "id" then
				local _seen = seen[k]
				for _, code in ipairs(v) do
					local _code = code[1]
					local _type = type(_seen[_code])
					if _type == "table" then
						insert(_seen[_code], id)
					elseif _type == "string" then
						_seen[_code] = {_seen[_code], id}
					else
						_seen[_code] = id
					end
				end
			end
		end
	end
	local modname = "languages/data/wikidata.json"
	for k, v in pairs(seen) do
		for code, ids in pairs(v) do
			if type(ids) == "table" then
				local t = {}
				for i, id in ipairs(ids) do
					t[i] = format("<code>[[d:%s|%s]]</code>", id, id)
				end
				discrepancy(modname,
					"<code>%s</code> is set as an ISO 639-%d code on multiple items: %s.",
					code, k, list_to_text(t)
				)
			end
		end
	end
end
local function check_labels()
	local check_label_data_keys = check_data_keys(
		"display", "Wikipedia", "glossary",
		"plain_categories", "topical_categories", "pos_categories", "regional_categories", "sense_categories",
		"omit_preComma", "omit_postComma", "omit_preSpace",
		"deprecated", "track"
	)
	local function check_label(modname, code, data)
		local _type = type(data)
		if _type == "table" then
			check_label_data_keys(modname, code, data)
		elseif _type ~= "string" then
			-- Two placeholders for two arguments: the label code and the
			-- type name with its indefinite article already attached.
			discrepancy(modname,
				"The data for the label <code>%s</code> is %s; only tables and strings are allowed.",
				code, add_indefinite_article(_type)
			)
		end
	end
	for _, module in ipairs{"", "/regional", "/topical"} do
		local modname = "Module:labels/data" .. module
		module = require(modname)
		for label, data in pairs(module) do
			check_label(modname, label, data)
		end
	end
	for code in pairs(m_languages_codes) do
		local modname = "Module:labels/data/lang/" .. code
		local module = safe_require(modname)
		if module then
			for label, data in pairs(module) do
				check_label(modname, label, data)
			end
		end
	end
end
local function check_zh_trad_simp()
	local m_ts = require("Module:zh/data/ts")
	local m_st = require("Module:zh/data/st")
	local ruby = require("Module:ja-ruby").ruby_auto
	local lang = get_language_by_code("zh")
	local Hant = get_script_by_code("Hant")
	local Hans = get_script_by_code("Hans")
	local data = {[0] = m_st, m_ts}
	local mod = {[0] = "st", "ts"}
	local var = {[0] = "Simp.", "Trad."}
	local sc = {[0] = Hans, Hant}
	local function find_stable_loop(chars, other, j)
		local display = ruby({["markup"] = "[" .. other .. "](" .. var[(j + 1) % 2] .. ")"})
		display = language_link{term = other, alt = display, lang = lang, sc = sc[(j + 1) % 2], tr = "-"}
		insert(chars, display)
		if data[(j + 1) % 2][other] == other then
			insert(chars, other)
			return chars, 1
		elseif not data[(j + 1) % 2][other] then
			insert(chars, "not found")
			return chars, 2
		elseif data[j % 2][data[(j + 1) % 2][other]] ~= other then
			return find_stable_loop(chars, data[(j + 1) % 2][other], j + 1)
		else
			local display = ruby({["markup"] = "[" .. data[(j + 1) % 2][other] .. "](" .. var[j % 2] .. ")"})
			display = language_link{term = data[(j + 1) % 2][other], alt = display, lang = lang, sc = sc[j % 2], tr = "-"}
			insert(chars, display .. " (")
			display = ruby({["markup"] = "[" .. data[j % 2][data[(j + 1) % 2][other]] .. "](" .. var[(j + 1) % 2] .. ")"})
			display = language_link{term = data[j % 2][data[(j + 1) % 2][other]], alt = display, lang = lang, sc = sc[(j + 1) % 2], tr = "-"}
			insert(chars, display .. " etc.)")
			return chars, 3
		end
		return chars
	end
	for i = 0, 1, 1 do
		for ch, other_ch in pairs(data[i]) do
			if data[(i + 1) % 2][other_ch] ~= ch then
				local chars, issue = {}
				local display = ruby({["markup"] = "[" .. ch .. "](" .. var[i] .. ")"})
				display = language_link{term = ch, alt = display, lang = lang, sc = sc[i], tr = "-"}
				insert(chars, display)
				chars, issue = find_stable_loop(chars, other_ch, i)
				if issue == 1 or issue == 2 then
					local sc_this, mod_this, j = {}
					if match(chars[#chars - 1], var[(i + 1) % 2]) then
						j = 1
					else
						j = 0
					end
					mod_this = mod[(i + j) % 2]
					sc_this = {[0] = sc[(i + j) % 2], sc[(i + j + 1) % 2]}
					for k, ch in ipairs(chars) do
						chars[k] = tag_text(ch, lang, sc_this[k % 2], "term")
					end
					local modname = "zh/data/" .. mod_this
					if issue == 1 then
						discrepancy(modname,
							"character references itself: %s",
							concat(chars, " → ")
						)
					elseif issue == 2 then
						discrepancy(modname,
							"missing character: %s",
							concat(chars, " → ")
						)
					end
				elseif issue == 3 then
					for j, ch in ipairs(chars) do
						chars[j] = tag_text(ch, lang, sc[(i + j) % 2], "term")
					end
					discrepancy("zh/data/" .. mod[i],
						"possible mismatched character: %s",
						concat(chars, " → ")
					)
				end
			end
		end
	end
end
local function check_serialization(modname)
	local serializers = {
		["Hani-sortkey/data/serialized"] = "Hani-sortkey/serializer",
	}
	if not serializers[modname] then
		return nil
	end
	local serializer = serializers[modname]
	local current_data = require("Module:" .. serializer).main(true)
	local stored_data = require("Module:" .. modname)
	if current_data ~= stored_data then
		discrepancy(modname,
			"<strong><u>Important!</u> Serialized data is out of sync. Use [[Module:%s]] to update it. If you have made any changes to the underlying data, the serialized data <u>must</u> be updated before these changes will take effect.</strong>",
			serializer
		)
	end
end
local find_code = require("Module:memoize")(function(message)
	return match(message, "<code>([^<]+)</code>")
end)
local function compare_messages(message1, message2)
	local code1, code2 = find_code(message1), find_code(message2)
	if code1 and code2 then
		return code1 < code2
	else
		return message1 < message2
	end
end
-- Warning: cannot be called twice in the same module invocation because
-- some module-global variables are not reset between calls.
local function do_checks(frame, modules)
	messages = setmetatable({}, messages_mt)
	if modules["zh/data/ts"] or modules["zh/data/st"] then
		check_zh_trad_simp()
	end
	check_languages(frame)
	check_etym_languages()
	-- families and scripts must be checked AFTER languages; languages checks fill out
	-- the nonempty_families and nonempty_scripts tables, used for testing if a family/script
	-- is ever used in the data
	check_families()
	check_scripts()
	check_wikidata_languages()
	if modules["labels/data"] then
		check_labels()
	end
	for module in pairs(modules) do
		check_serialization(module)
	end
	setmetatable(messages, nil)
	for _, msglist in pairs(messages) do
		msglist:sort(compare_messages)
	end
	local ret = messages
	messages = nil
	return ret
end
local function format_message(modname, msglist)
	local header
	if match(modname, "^Module:") or match(modname, "^Template:") then
		header = "===[[" .. modname .. "]]==="
	else
		header = "===[[Module:" .. modname .. "]]==="
	end
	return header .. msglist:map(function(msg)
		return "\n* " .. msg
	end):concat()
end
function export.check_modules_t(frame)
	local args = frame.args
	local modules = list_to_set(args)
	local ret = Array()
	local messages = do_checks(frame, modules)
	for _, module in ipairs(args) do
		local msglist = messages[module]
		if msglist then
			ret:insert(format_message(module, msglist))
		end
	end
	return ret:concat("\n")
end
function export.perform(frame)
	local messages = do_checks(frame, {})
	-- Format the messages
	local ret = Array()
	for modname, msglist in sorted_pairs(messages) do
		ret:insert(format_message(modname, msglist))
	end
	-- Are there any messages?
	-- TODO: check how many messages there are.
	if false then --if i == 1 then
		return "<b class=\"success\">Glory to Arstotzka.</b>"
	else
		ret:insert(1, "<b class=\"warning\">Discrepancies detected:</b>")
		return ret:concat("\n")
	end
end
return export