URL: https://en.wiktionary.org/wiki/Module:labels

Module:labels - Wiktionary, the free dictionary


From Wiktionary, the free dictionary

The following documentation is located at Module:labels/documentation.

This module supports Module:labels/templates, which in turn is used by the template {{label}}, as well as {{term-label}} and {{accent}}.

See Module:labels/data and its submodules Module:labels/data/qualifiers, Module:labels/data/regional and Module:labels/data/topical, as well as lang-specific submodules such as Module:labels/data/lang/en (for English) and Module:labels/data/lang/grc (for Ancient Greek), for lists of defined labels, and for labels that are aliases (or "redirects") for other labels.

Testcases

A label specific to "grc" (Ancient Greek)
code result
{{label|grc|Attic}} (Attic)
{{label|en|Attic}} (Attic)

Exported functions

Labels go through several stages of processing to get from the original (raw) label specified in the Wikicode to the final (formatted) label displayed to the user. The following terminology will help keep things straight:

  • The "raw label" is the label specified in the Wikicode.
  • The "non-canonical label" is the label extracted from the raw label, used for lookup in the label modules in order to fetch the associated label data structure and determine the canonical form of the label. Normally this is the same as the raw label, but it will be different if the raw label is of the form !label (e.g. !Australian) or label!display (e.g. Southern US!Southern). The former syntax indicates that the label should display as-is instead of in its canonical form (which in the example given is Australia), and the latter syntax indicates that the label should display in the form specified after the exclamation point.
  • The "canonical label" is the result of applying alias resolution to the non-canonical label. Normally, the canonical label rather than the non-canonical label is what is shown to the user.
  • The "display form of the label" is what is shown to the user, not considering links and HTML that may wrap the display form to get the formatted form of the label. The display form comes from the .display field of the module label data for the label; if no such field exists in the label data, it is normally the canonical label. However, if the display override exists (see below), it takes precedence over the .display field or canonical label when determining the display form of the label.
  • The "display override", if specified, overrides all other means of determining the display form of the label. It is specified in two circumstances: the !label and label!display raw label formats (i.e. the same circumstances in which the raw label and non-canonical label differ).
  • The "formatted form of the label" is the final form of the label shown directly to the user. It generally appears to the user as the display form of the label, but in the Wikicode, the formatted form may wrap the display form with a link to Wikipedia, the Wiktionary glossary or another Wiktionary entry, and that link in turn may be wrapped in an HTML span with a "deprecated" CSS class attached, causing the label to display differently (to indicate that it is deprecated).
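
The relationship between the raw label, the non-canonical label and the display override can be sketched in plain Lua. This is an illustrative sketch only: `parse_raw_label` is a hypothetical helper, not a function exported by this module, and the real parsing code may differ in its details.

```lua
-- Hypothetical helper (not part of Module:labels): split a raw label into the
-- non-canonical label (used for lookup) and an optional display override.
local function parse_raw_label(raw)
	-- "!label": look up "label" and display it exactly as written.
	local as_is = raw:match("^!(.+)$")
	if as_is then
		return as_is, as_is
	end
	-- "label!display": look up "label" but display "display".
	local label, display = raw:match("^(.-)!(.+)$")
	if label and label ~= "" then
		return label, display
	end
	-- Plain label: the raw and non-canonical labels are the same, no override.
	return raw, nil
end

print(parse_raw_label("!Australian"))          -- Australian   Australian
print(parse_raw_label("Southern US!Southern")) -- Southern US  Southern
print(parse_raw_label("Attic"))                -- Attic        nil
```

Alias resolution (turning the non-canonical label into the canonical label) would happen after this step and is not shown.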

export.get_langs_to_extract_wikipedia_articles_from_wikidata

function export.get_langs_to_extract_wikipedia_articles_from_wikidata(lang)

Given language lang (a full language, etymology-language or family), fetch a list of Wikimedia languages to check when converting a Wikidata item to a Wikipedia article. English is always first, followed by the Wikimedia language code(s) of lang if lang is a language (which may or may not be the same as lang's Wiktionary code), followed by the macrolanguage of lang for certain languages and families (currently, only languages and families in the Chinese and Arabic families). If lang is nil, only return English. Note that the same code may occur more than once in the list. This is exported because it's also used by Module:category tree/poscatboiler/data/language varieties.

export.fetch_categories

function export.fetch_categories(canon_label, labdata, lang, mode, for_doc, category_types)

Fetch the categories to add to a page, given that the label whose canonical form is canon_label with language lang has been seen. labdata is the label data structure for label, fetched from the appropriate submodule. mode specifies how the label was invoked (see get_label_info() for more information). The return value is a list of the actual categories, unless for_doc is specified, in which case the categories returned are marked up for display on a documentation page. If for_doc is given, lang may be nil to format the categories in a language-independent fashion; otherwise, it must be specified. If category_types is specified, it should be a set object (i.e. with category types as keys and true as values), and only categories of the specified types will be returned.
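
As a rough illustration of the `category_types` set, the sketch below filters category lists by type in plain Lua. The names `list_to_set` and `get_cats` and the sample label data are assumptions made for this example; only the idea of a set with category types as keys and `true` as values comes from the description above.

```lua
-- Hypothetical helper: build a set (keys mapped to true) from a list.
local function list_to_set(list)
	local set = {}
	for _, item in ipairs(list) do
		set[item] = true
	end
	return set
end

-- Made-up label data, for illustration only.
local labdata = {
	topical_categories = {"Medicine"},
	regional_categories = "Attic", -- a bare string is treated as a one-item list
}

-- Return the categories of one type, or an empty list if that type is
-- filtered out by the set or absent from the label data.
local function get_cats(data, cat_type, category_types)
	if category_types and not category_types[cat_type] then
		return {}
	end
	local cats = data[cat_type]
	if not cats then
		return {}
	end
	if type(cats) ~= "table" then
		return {cats}
	end
	return cats
end

local only_regional = list_to_set({"regional_categories"})
print(#get_cats(labdata, "topical_categories", only_regional))   -- 0
print(get_cats(labdata, "regional_categories", only_regional)[1]) -- Attic
```

Passing `nil` as the set leaves every category type enabled.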

export.get_submodules

function export.get_submodules(lang)

Return the list of all labels data modules for a label whose language is lang. The return value is a list of module names, with overriding modules earlier in the list (that is, if a label occurs in two modules in the list, the earlier-listed module takes precedence). If lang is nil, only return non-language-specific submodules.
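
The precedence rule (earlier modules override later ones) amounts to a first-match lookup over the ordered list. The sketch below uses in-memory tables as hypothetical stand-ins for loaded data modules; the module names and label data in it are invented for the example.

```lua
-- Hypothetical stand-ins for loaded data modules, ordered most specific first,
-- as a call such as get_submodules(lang) would order the module names.
local modules_in_order = {
	{ name = "lang-specific", labels = { Attic = { display = "Attic (lang-specific)" } } },
	{ name = "regional", labels = {
		Attic = { display = "Attic (regional)" },
		Doric = { display = "Doric" },
	} },
}

-- Return the first label data found, plus the name of the module it came from.
local function find_label(label)
	for _, mod in ipairs(modules_in_order) do
		local labdata = mod.labels[label]
		if labdata then
			return labdata, mod.name
		end
	end
	return nil
end

local _, source = find_label("Attic")
print(source) -- lang-specific
```

A label defined only in a later module (here `Doric`) is still found; a label defined nowhere yields `nil`.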

export.format_label

function export.format_label(label, labdata, lang, deprecated, override_display, mode)

Return the formatted form of a label label (which should be the canonical form of the label; see comment at top), given (a) the label data structure labdata from one of the data modules; (b) the language object lang of the language being processed, or nil for no language; (c) deprecated (true if the label is deprecated, otherwise the deprecation information is taken from labdata); (d) override_display (if specified, override the display form of the label with the specified string, instead of any value in labdata.display or labdata.special_display or the canonical label in label itself); (e) mode (same as data.mode passed to get_label_info()). Returns two values: the formatted label form and a boolean indicating whether the label is deprecated.

NOTE: Under normal circumstances, do not use this. Instead, use get_label_info(), which searches all the data modules for a given label and handles other complications.

export.get_label_info

function export.get_label_info(data)

Return information on a label. On input data is an object with the following fields:

  • label: The raw label to return information on.
  • lang: The language of the label. Must be specified unless for_doc is given.
  • mode: How the label was invoked. One of the following:
    • nil or "label": invoked through {{lb}} or another template that handles labels in the same fashion, e.g. {{alt}}, {{quote}} or {{syn}};
    • "term-label": invoked through {{tlb}};
    • "accent": invoked through {{a}} or the |a= or |aa= parameters of other pronunciation templates, such as {{IPA}}, {{rhymes}} or {{homophones}};
    • "form-of": invoked through {{alt form}}, {{standard spelling of}} or other form-of template. This changes the display and/or categorization of a minority of labels. (The majority work the same for all modes.)
  • for_doc: Data is being fetched for documentation purposes. This causes the raw categories returned in categories to be formatted for documentation display.
  • nocat: If true, don't add the label to any categories.
  • force_cat: Force adding categories even in namespaces that normally exclude them (e.g. userspace and discussion pages).
  • notrack: Disable all tracking for this label.
  • sort: Sort key for categorization.
  • already_seen: An object used to track labels already seen, so they aren't displayed twice. Tracking is according to the display form of the label, so if two labels have the same display form, the second one won't be displayed (but its categories will still be added). If already_seen is nil, this tracking doesn't happen.
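
The effect of `already_seen` can be pictured with a small stand-alone sketch. The label records and category names below are made up, and the loop only imitates the described behaviour (suppress repeated display forms, keep all categories); it is not the module's actual code.

```lua
-- Hypothetical label records: each has a display form and a list of categories.
local labels = {
	{ display = "Southern US", categories = { "Example category A" } },
	{ display = "Southern US", categories = { "Example category B" } },
	{ display = "informal", categories = { "Example category C" } },
}

local already_seen = {}
local shown, categories = {}, {}
for _, lab in ipairs(labels) do
	-- Categories are always collected, even for a repeated display form.
	for _, cat in ipairs(lab.categories) do
		table.insert(categories, cat)
	end
	-- Only the first label with a given display form is displayed.
	if not already_seen[lab.display] then
		already_seen[lab.display] = true
		table.insert(shown, lab.display)
	end
end

print(table.concat(shown, ", ")) -- Southern US, informal
print(#categories)               -- 3
```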

The return value is an object with the following fields:

  • raw_text: If specified, the object does not describe a label but simply raw text surrounding labels. This occurs when double angle bracket (<<...>>) notation is used. get_label_info() does not currently return objects with this field set, but process_raw_labels() does. The value is "begin" (this is the first raw text portion derived from a double angle bracket spec, provided there are at least two raw text portions); "end" (this is the last raw text portion derived from a double angle bracket spec, provided there are at least two portions); "middle" (this is neither the first nor the last raw text portion); or "only" (this is a raw text portion standing by itself). The particular value determines the handling of commas and spaces on one or both sides of the raw text. If this field is specified, only the label field (containing the actual raw text) and the category field (containing an empty list) are set; all other fields are nil.
  • raw_label: The raw label that was passed in.
  • non_canonical: The label prior to canonicalization (i.e. alias resolution). Usually this is the same as raw_label, but if the raw label was preceded by an exclamation point (meaning "display the raw label as-is"), this field will contain the label stripped of the exclamation point, and if the raw label is of the form label!display (meaning "display the label in the specified form"), this field will contain the label before the exclamation point.
  • canonical: If the label in non_canonical is an alias, this contains the canonical name of the label; otherwise it will be nil.
  • override_display: If specified, this contains a string that overrides the normal display form of the label. The display form of a label is the .display field of the label data if present, and otherwise is normally the canonical form of the label (i.e. after alias resolution). (This is not the same as the formatted form of the label, found in label, which is the final form shown to the user and includes links to Wikipedia, the glossary, etc. as well as an HTML wrapper if the label is deprecated.) If override_display is specified, however, this is used in place of the normal display form of the label. This currently happens in two circumstances: (1) the label was preceded by ! to indicate that the raw label should be displayed rather than the canonical form; (2) the label was given in the form label!display (meaning "display the label in the specified display form").
  • label: The formatted form of the label. This is what is actually shown to the user. If the label is recognized (found in some module), this will typically be in the form of a link.
  • categories: A list of the categories to add the label to; an empty list if nocat was specified.
  • formatted_categories: A string containing the formatted categories; nil if nocat or for_doc was specified, or if categories is empty. Currently will be an empty string if there are categories to format but the namespace is one that normally excludes categories (e.g. userspace and discussion pages), and force_cat isn't specified.
  • deprecated: True if the label is deprecated.
  • recognized: If true, the label was found in some module.
  • data: The data structure for the label, as fetched from the label modules. For unrecognized labels, this will be an empty object.

export.split_labels_on_comma

function export.split_labels_on_comma(term)

Split a string containing comma-separated raw labels into the individual labels. This will not split on a comma followed by whitespace, and it will not split inside of matched <...> or [...]. The code is written to be efficient, so that it does not load modules (e.g. Module:parse utilities) unnecessarily.
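
The splitting rules described above can be approximated by a simple character scan. The function below is a simplified sketch written for this documentation, not the module's implementation: it tracks `<` and `[` with a single depth counter and does not verify that brackets are properly matched.

```lua
-- Simplified sketch: split on commas that are at bracket depth zero and are
-- not followed by whitespace.
local function split_on_comma(term)
	local parts, current, depth = {}, {}, 0
	for i = 1, #term do
		local ch = term:sub(i, i)
		if ch == "<" or ch == "[" then
			depth = depth + 1
		elseif (ch == ">" or ch == "]") and depth > 0 then
			depth = depth - 1
		end
		local next_ch = term:sub(i + 1, i + 1)
		if ch == "," and depth == 0 and not next_ch:match("^%s$") then
			-- Split point: finish the current label and start a new one.
			table.insert(parts, table.concat(current))
			current = {}
		else
			table.insert(current, ch)
		end
	end
	table.insert(parts, table.concat(current))
	return parts
end

print(table.concat(split_on_comma("US,informal"), " | "))     -- US | informal
print(table.concat(split_on_comma("a, b,c"), " | "))          -- a, b | c
print(table.concat(split_on_comma("x<<a,b>>,y"), " | "))      -- x<<a,b>> | y
```

Because commas and brackets are single-byte ASCII characters, scanning byte by byte leaves multi-byte UTF-8 text intact.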

export.process_raw_labels

function export.process_raw_labels(data)

Return a list of objects corresponding to a set of raw labels. Each object returned is of the format returned by get_label_info(). This is similar to looping over the labels and calling get_label_info() on each one, but it also correctly handles embedded double angle bracket specs <<...>> found in the labels. (In such a case, there will be more objects returned than raw labels passed in.) On input, data is an object with the following fields:

  • labels: The list of labels to process.
  • lang: The language of the labels. Must be specified.
  • mode: How the label was invoked; see get_label_info() for more information.
  • nocat: If true, don't add the label to any categories.
  • force_cat: Force adding categories even in namespaces that normally exclude them (e.g. userspace and discussion pages).
  • notrack: Disable all tracking for this label.
  • sort: Sort key for categorization.
  • already_seen: An object used to track labels already seen, so they aren't displayed twice. Tracking is according to the display form of the label, so if two labels have the same display form, the second one won't be displayed (but its categories will still be added). If already_seen is nil, this tracking doesn't happen.
  • ok_to_destructively_modify: If set, the data structure will be destructively modified in the process of this function running.

export.split_and_process_raw_labels

function export.split_and_process_raw_labels(data)

Split a comma-separated string of raw labels and process each label to get a list of objects suitable for passing to format_processed_labels(). Each object returned is of the format returned by get_label_info(). This is equivalent to calling split_labels_on_comma() followed by process_raw_labels(). On input, data is an object with the following fields:

  • labels: The string containing the raw comma-separated labels.
  • lang: The language of the labels. Must be specified.
  • mode: How the label was invoked; see get_label_info() for more information.
  • nocat: If true, don't add the label to any categories.
  • force_cat: Force adding categories even in namespaces that normally exclude them (e.g. userspace and discussion pages).
  • notrack: Disable all tracking for this label.
  • sort: Sort key for categorization.
  • already_seen: An object used to track labels already seen, so they aren't displayed twice. Tracking is according to the display form of the label, so if two labels have the same display form, the second one won't be displayed (but its categories will still be added). If already_seen is nil, this tracking doesn't happen.
  • ok_to_destructively_modify: If set, the data structure will be destructively modified in the process of this function running.

export.format_processed_labels

function export.format_processed_labels(data)

Format one or more already-processed labels for display and categorization. "Already-processed" means that get_label_info() or process_raw_labels() has been called on the raw labels to convert them into objects containing information on how to display and categorize the labels. This is a lower-level alternative to show_labels() and is meant for modules such as Module:alternative forms, Module:quote and Module:etymology/templates/descendant that support displaying labels along with some other information.

On input data is an object with the following fields:

  • labels: List of the label objects to format, in the format returned by get_label_info().
  • lang: The language of the labels.
  • open: Open bracket or parenthesis to display before the concatenated labels. If specified, it is wrapped in the "ib-brac" and "label-brac" CSS classes. If nil or false, no open bracket is displayed.
  • close: Close bracket or parenthesis to display after the concatenated labels. If specified, it is wrapped in the "ib-brac" and "label-brac" CSS classes. If nil or false, no close bracket is displayed.
  • no_ib_content: By default, the concatenated formatted labels inside of the open/close brackets are wrapped in the "ib-content" and "label-content" CSS classes. Specify this to suppress this wrapping.
  • raw: Suppress all CSS wrapping of content, including open/close parentheses, content and comma delimiters (which are normally wrapped in "ib-comma" and "label-comma" CSS classes).
  • ok_to_destructively_modify: If set, the data structure, and the data.labels table inside of it, will be destructively modified in the process of this function running.
  • split_output: If not given, the return value is a concatenation of the formatted concatenated labels and formatted categories. Otherwise, two values are returned: the formatted labels and the categories. If split_output is the value "raw", the categories are returned in list form, where the list elements are strings of the form suitable for passing to format_categories() in Module:utilities. If split_output is any other value besides nil, the categories are returned as a pre-formatted concatenated string.

The return value (or the first return value, if split_output is given) is a string containing the concatenated labels, optionally surrounded by open/close brackets or parentheses. Normally, labels are separated by comma-space sequences, but this may be suppressed for certain labels. If nocat wasn't given to get_label_info() or process_raw_labels(), and split_output wasn't given, the label objects will contain formatted categories in them, which will be inserted into the returned text. (Use split_output if you need the categories returned separately.) The concatenated text inside of the open/close brackets is normally wrapped in the "ib-content" CSS class, but this can be suppressed, as mentioned above.

export.show_labels

function export.show_labels(data)

Format one or more labels for display and categorization. This provides the implementation of the {{label}}/{{lb}}, {{term label}}/{{tlb}} and {{accent}}/{{a}} templates, and can also be called from a module. The return value is a string to be inserted into the generated page, including the display and categories. On input data is an object with the following fields:

  • labels: List of the labels to format.
  • lang: The language of the labels.
  • mode: How the label was invoked; see get_label_info() for more information.
  • nocat: If true, don't add the labels to any categories.
  • force_cat: Force adding categories even in namespaces that normally exclude them (e.g. userspace and discussion pages).
  • notrack: Disable all tracking for these labels.
  • sort: Sort key for categorization.
  • no_track_already_seen: Don't track already-seen labels. If not specified, already-seen labels are not displayed again, but still categorize. See the documentation of get_label_info().
  • open: Open bracket or parenthesis to display before the concatenated labels. If nil, defaults to an open parenthesis. Set to false to disable.
  • close: Close bracket or parenthesis to display after the concatenated labels. If nil, defaults to a close parenthesis. Set to false to disable.
  • no_ib_content: As in format_processed_labels().
  • raw: As in format_processed_labels(). Also suppress wrapping the entire formatted result in a usage label CSS class (see below).
  • ok_to_destructively_modify: If set, the data structure will be destructively modified in the process of this function running.

Compared with format_processed_labels(), this function has the following differences:

  1. The labels specified in labels are raw labels (i.e. strings) rather than formatted objects.
  2. The open and close brackets default to parentheses ("round brackets") rather than not being displayed by default.
  3. Tracking of already-seen labels is enabled unless explicitly turned off using no_track_already_seen.
  4. The entire formatted result is wrapped in a "usage-label-<var>type</var>" CSS class (depending on the value of mode), unless raw is given.

export.alias

function export.alias(labels, key, aliases)

Helper function for the data modules.

export.split_display_form

function export.split_display_form(label)

Split the display form of a label. Returns two values: link and display. If the display form consists of a two-part link, link is the first part and display is the second part. If the display form consists of a single-part link, link and display are the same. Otherwise (the display form is not a link or contains an embedded link), link is the same as the passed-in label and display is nil.

export.combine_display_form_parts

function export.combine_display_form_parts(link, display)

Combine the link and display parts of the display form of a label as returned by split_display_form(). If display is nil, link is returned directly. Otherwise, a one-part or two-part link is constructed depending on whether link and display are the same. (As a special case, if both consist of a blank string, the return value is a blank string rather than a malformed link.)
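
The documented behaviour of the split/combine pair can be imitated with Lua patterns. The two local functions below are stand-alone re-implementations written from the descriptions above, under the assumption that a link is a whole-string `[[...]]` with at most one pipe; the module's own functions may handle more cases.

```lua
-- Sketch of the described splitting behaviour (local re-implementation).
local function split_display_form(label)
	-- Whole string is a two-part link: [[link|display]]
	local link, display = label:match("^%[%[([^%[%]|]+)|([^%[%]|]+)%]%]$")
	if link then
		return link, display
	end
	-- Whole string is a single-part link: [[link]]
	link = label:match("^%[%[([^%[%]|]+)%]%]$")
	if link then
		return link, link
	end
	-- Not a link, or a string that merely contains an embedded link.
	return label, nil
end

-- Sketch of the described combining behaviour (local re-implementation).
local function combine_display_form_parts(link, display)
	if display == nil then
		return link
	end
	if link == "" and display == "" then
		return "" -- avoid producing a malformed "[[]]"
	end
	if link == display then
		return "[[" .. link .. "]]"
	end
	return "[[" .. link .. "|" .. display .. "]]"
end

print(split_display_form("[[w:Attic Greek|Attic]]"))  -- w:Attic Greek  Attic
print(combine_display_form_parts("Attic", "Attic"))   -- [[Attic]]
```

Splitting and then recombining a display form in this sketch returns the original string for each of the three cases.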

export.finalize_data

function export.finalize_data(labels)

Used to finalize the data into the form that is actually returned.


local export = {}
export.lang_specific_data_list_module = "Module:labels/data/lang"
export.lang_specific_data_modules_prefix = "Module:labels/data/lang/"
local load_module = "Module:load"
local parse_utilities_module = "Module:parse utilities"
local string_utilities_module = "Module:string utilities"
local utilities_module = "Module:utilities"
local insert = table.insert
local require_when_needed = require("Module:require when needed")
local unpack = unpack or table.unpack -- Lua 5.2 compatibility
local dump = mw.dumpObject
local m_lang_specific_data = mw.loadData(export.lang_specific_data_list_module)
local m_table = require_when_needed("Module:table")
--[==[ intro:
Labels go through several stages of processing to get from the original (raw) label specified in the Wikicode to the
final (formatted) label displayed to the user. The following terminology will help keep things straight:
* The "raw label" is the label specified in the Wikicode.
* The "non-canonical label" is the label extracted from the raw label, used for looking up in the label modules in order
 to fetch the associated label data structure and determine the canonical form of the label. Normally this is the same
 as the raw label, but it will be different if the raw label is of the form `!<var>label</var>` (e.g. `!Australian`)
 or `<var>label</var>!<var>display</var>` (e.g. `Southern US!Southern`). The former syntax indicates that the label
 should display as-is instead of in its canonical form (which in the example given is `Australia`), and the latter
 syntax indicates that the label should display in the form specified after the exclamation point.
* The "canonical label" is the result of applying alias resolution to the non-canonical label. Normally, the
 canonical label rather than the non-canonical label is what is shown to the user.
* The "display form of the label" is what is shown to the user, not considering links and HTML that may wrap the
 display form to get the formatted form of the label. The display form comes from the `.display` field of the module
 label data for the label; if no such field exists in the label data, it is normally the canonical label. However, if
 the display override exists (see below), it takes precedence over the `.display` field or canonical label when
 determining the display form of the label.
* The "display override", if specified, overrides all other means of determining the display form of the label. It is
 specified in two circumstances, i.e. in the `!<var>label</var>` and `<var>label</var>!<var>display</var>` raw label
 formats (i.e. in the same circumstances where the raw label and non-canonical label are different).
* The "formatted form of the label" is the final form of the label shown directly to the user. It generally appears to
 the user as the display form of the label, but in the Wikicode, the formatted form may wrap the display form with a
 link to Wikipedia, the Wiktionary glossary or another Wiktionary entry, and that link in turn may be wrapped in an
 HTML span with a "deprecated" CSS class attached, causing the label to display differently (to indicate that it is
 deprecated).
]==]
-- for testing
local force_cat = false
local m_headword_data = mw.loadData("Module:headword/data")
local SUBPAGENAME = m_headword_data.pagename
-- Disable tracking on heavy pages to save time.
local pages_where_tracking_is_disabled = m_headword_data.large_pages
-- Add tracking category for PAGE. The tracking category linked to is [[Wiktionary:Tracking/labels/PAGE]].
-- We also add to [[Wiktionary:Tracking/labels/PAGE/LANGCODE]] and [[Wiktionary:Tracking/labels/PAGE/MODE]] if
-- LANGCODE and/or MODE given.
local function track(page, langcode, mode)
	if pages_where_tracking_is_disabled[SUBPAGENAME] then
		return true
	end
	-- avoid including links in pages (may cause error)
	page = page:gsub("%[", "("):gsub("%]", ")"):gsub("|", "!")
	require("Module:debug/track")("labels/" .. page)
	if langcode then
		require("Module:debug/track")("labels/" .. page .. "/" .. langcode)
	end
	if mode then
		require("Module:debug/track")("labels/" .. page .. "/" .. mode)
	end
	-- We don't currently add a tracking label for both langcode and mode to reduce the total number of labels, to
	-- save some memory.
	return true
end
local function ucfirst(txt)
	return mw.getContentLanguage():ucfirst(txt)
end
local mode_to_outer_class = {
	["label"] = "usage-label-sense",
	["term-label"] = "usage-label-term",
	["accent"] = "usage-label-accent",
	["form-of"] = "usage-label-form-of",
}
local mode_to_property_prefix = {
	["label"] = false,
	["term-label"] = false, -- handled specially
	["accent"] = "accent_",
	["form-of"] = "form_of_",
}
local function validate_mode(mode)
	mode = mode or "label"
	if not mode_to_outer_class[mode] then
		local allowed_values = {}
		for key, _ in pairs(mode_to_outer_class) do
			insert(allowed_values, "'" .. key .. "'")
		end
		table.sort(allowed_values)
		error(("Invalid value '%s' for `mode`; should be one of %s"):format(mode, table.concat(allowed_values, ", ")))
	end
	return mode
end
local function getprop(labdata, mode, prop)
	local mode_prefix = mode_to_property_prefix[mode]
	return mode_prefix and labdata[mode_prefix .. prop] or labdata[prop]
end
local function check_type(label, lang, prop, value, expected_types)
	if value == nil or expected_types == nil then
		return value
	end
	if type(expected_types) ~= "table" then
		expected_types = {expected_types}
	end
	local valtype = type(value)
	local matches = false
	for _, expected_type in ipairs(expected_types) do
		if type(expected_type) == "string" then
			if valtype == expected_type then
				matches = true
				break
			end
		elseif value == expected_type then
			matches = true
			break
		end
	end
	if not matches then
		local function join_untagged_or(elements)
			return m_table.serialCommaJoin(elements, {conj = "or", dontTag = true})
		end
		local quoted_types = {}
		local quoted_values = {}
		for _, expected_type in ipairs(expected_types) do
			if type(expected_type) == "string" then
				insert(quoted_types, "'" .. expected_type .. "'")
			else
				insert(quoted_values, "'" .. dump(expected_type) .. "'")
			end
		end
		local possible_matches = {}
		if quoted_types[1] then
			insert(possible_matches, ("be of type%s %s"):format(
				quoted_types[2] and "s" or "", join_untagged_or(quoted_types)))
		end
		if quoted_values[1] then
			insert(possible_matches, ("have the value%s %s"):format(
				quoted_values[2] and "s" or "", join_untagged_or(quoted_values)))
		end
		error(("Internal error: For label '%s', langcode '%s', property '%s' should %s but is of type '%s' with value %s"):format(
			label, lang and lang:getCode() or "UNKNOWN", prop, join_untagged_or(possible_matches), valtype, dump(value)))
	end
end
-- HACK! For languages in any of the given families, check the specified-language Wikipedia for appropriate
-- Wikipedia articles for the language in question (esp. useful for obscure etymology-only languages that may not
-- have English articles for them, like many Chinese lects).
local families_to_wikipedia_languages = {
	{"zhx", "zh"},
	{"sem-arb", "ar"},
}
--[==[
Given language `lang` (a full language, etymology-language or family), fetch a list of Wikimedia languages to check
when converting a Wikidata item to a Wikipedia article. English is always first, followed by the Wikimedia language
code(s) of `lang` if `lang` is a language (which may or may not be the same as `lang`'s Wiktionary code), followed
by the macrolanguage of `lang` for certain languages and families (currently, only languages and families in the Chinese
and Arabic families). If `lang` is nil, only return English. Note that the same code may occur more than once in the
list. This is exported because it's also used by [[Module:category tree/poscatboiler/data/language varieties]].
]==]
function export.get_langs_to_extract_wikipedia_articles_from_wikidata(lang)
	local wikipedia_langs = {}
	insert(wikipedia_langs, "en")
	if lang then
		local article_lang = lang
		while article_lang do
			if article_lang:hasType("language") then
				local wmcodes = article_lang:getWikimediaLanguageCodes()
				for _, wmcode in ipairs(wmcodes) do
					insert(wikipedia_langs, wmcode)
				end
			end
			article_lang = article_lang:getParent()
		end
		for _, family_to_wp_lang in ipairs(families_to_wikipedia_languages) do
			local family, wp_lang = unpack(family_to_wp_lang)
			if lang:inFamily(family) then
				insert(wikipedia_langs, wp_lang)
			end
		end
	end
	return wikipedia_langs
end
--[==[
Fetch the categories to add to a page, given that the label whose canonical form is `canon_label` with language `lang`
has been seen. `labdata` is the label data structure for `label`, fetched from the appropriate submodule. `mode`
specifies how the label was invoked (see {get_label_info()} for more information). The return value is a list of the
actual categories, unless `for_doc` is specified, in which case the categories returned are marked up for display on a
documentation page. If `for_doc` is given, `lang` may be nil to format the categories in a language-independent fashion;
otherwise, it must be specified. If `category_types` is specified, it should be a set object (i.e. with category types
as keys and {true} as values), and only categories of the specified types will be returned.
]==]
function export.fetch_categories(canon_label, labdata, lang, mode, for_doc, category_types)
	local categories = {}
	mode = validate_mode(mode)
	local langcode, canonical_name
	if lang then
		langcode = lang:getFullCode()
		canonical_name = lang:getFullName()
	elseif for_doc then
		langcode = "<var>[langcode]</var>"
		canonical_name = "<var>[language name]</var>"
	else
		error("Internal error: Must specify `lang` unless `for_doc` is given")
	end
	local function labprop(prop, expected_types)
		local retval = getprop(labdata, mode, prop)
		check_type(canon_label, lang, prop, retval, expected_types)
		return retval
	end
	local empty_list = {}
	local function get_cats(cat_type)
		if category_types and not category_types[cat_type] then
			return empty_list
		end
		local cats = labprop(cat_type)
		if not cats then
			return empty_list
		end
		if type(cats) ~= "table" then
			return {cats}
		end
		return cats
	end
	local topical_categories = get_cats("topical_categories")
	local sense_categories = get_cats("sense_categories")
	local pos_categories = get_cats("pos_categories")
	local regional_categories = get_cats("regional_categories")
	local plain_categories = get_cats("plain_categories")
	local function insert_cat(cat, sense_cat)
		if for_doc then
			cat = "<code>" .. cat .. "</code>"
			if sense_cat then
				if mode == "term-label" then
					cat = cat .. " (using {{tl|tlb}})"
				else
					cat = cat .. " (using {{tl|lb}} or form-of template)"
				end
				cat = mw.getCurrentFrame():preprocess(cat)
			end
		end
		insert(categories, cat)
	end
	for _, cat in ipairs(topical_categories) do
		insert_cat(langcode .. ":" .. (cat == true and ucfirst(canon_label) or cat))
	end
	for _, cat in ipairs(sense_categories) do
		if cat == true then
			cat = canon_label
		end
		cat = mode == "term-label" and cat .. " terms" or "terms with " .. cat .. " senses"
		insert_cat(canonical_name .. " " .. cat, true)
	end
	for _, cat in ipairs(pos_categories) do
		insert_cat(canonical_name .. " " .. (cat == true and canon_label or cat))
	end
	for _, cat in ipairs(regional_categories) do
		insert_cat((cat == true and ucfirst(canon_label) or cat) .. " " .. canonical_name)
	end
	for _, cat in ipairs(plain_categories) do
		insert_cat(cat == true and ucfirst(canon_label) or cat)
	end
	return categories
end
--[==[
Return the list of all labels data modules for a label whose language is `lang`. The return value is a list of
module names, with overriding modules earlier in the list (that is, if a label occurs in two modules in the list,
the earlier-listed module takes precedence). If `lang` is nil, only return non-language-specific submodules.
]==]
function export.get_submodules(lang)
	local submodules = {
		"Module:labels/data",
		"Module:labels/data/qualifiers",
		"Module:labels/data/regional",
		"Module:labels/data/topical",
	}
	if not lang then
		return submodules
	end

	-- get language-specific labels from data module
	local langcode = lang:getFullCode()
	if m_lang_specific_data.langs_with_lang_specific_modules[langcode] then
		-- prefer per-language label in order to pick subvariety labels over regional ones
		insert(submodules, 1, export.lang_specific_data_modules_prefix .. langcode)
	end
	return submodules
end
--[==[
Return the formatted form of a label `label` (which should be the canonical form of the label; see comment at top),
given (a) the label data structure `labdata` from one of the data modules; (b) the language object `lang` of the
language being processed, or nil for no language; (c) `deprecated` (true if the label is deprecated, otherwise the
deprecation information is taken from `labdata`); (d) `override_display` (if specified, override the display form of the
label with the specified string, instead of any value in `labdata.display` or `labdata.special_display` or the canonical
label in `label` itself); (e) `mode` (same as `data.mode` passed to {get_label_info()}). Returns two values: the
formatted label form and a boolean indicating whether the label is deprecated.
'''NOTE: Under normal circumstances, do not use this.''' Instead, use {get_label_info()}, which searches all the data
modules for a given label and handles other complications.
]==]
function export.format_label(label, labdata, lang, deprecated, override_display, mode)
	local formatted_label

	mode = validate_mode(mode)

	local function labprop(prop, expected_types)
		local retval = getprop(labdata, mode, prop)
		check_type(label, lang, prop, retval, expected_types)
		return retval
	end

	deprecated = deprecated or labprop("deprecated")
	if not override_display and labprop("special_display") then
		local function add_language_name(str)
			if str == "canonical_name" then
				if lang then
					return lang:getFullName()
				else
					return "<code><var>[language name]</var></code>"
				end
			else
				return ""
			end
		end
		formatted_label = labprop("special_display", "string"):gsub("<(.-)>", add_language_name)
	else
--[=[
			We proceed as follows:
			1. The display form comes from either (a) the `override_display` variable if set (this happens when
			 the user uses a label like '!British'); (b) the `display` property, if set; or (c) the label itself.
			2. If the display form contains a link, use it directly and ignore the other display-related settings.
			 (NOTE: Settings `Wikipedia` and `Wikidata` may still be used on the category page itself, by the
			 category tree code.)
			3. Otherwise, use one of the other display-related settings, in the following order:
			 `glossary` > `Wiktionary` > `Wikipedia` > `Wikidata`. Specifically:
			 a. If any of the values is equal to `true`, that is equivalent to specifying a string consisting of
				 the canonical label.
			 b. If `glossary` is set, it specifies the anchor in [[Appendix:Glossary]].
			 c. If `Wiktionary` is set, it specifies an arbitrary Wiktionary page or page + anchor (e.g. a
				 separate Appendix entry).
			 d. If `Wikipedia` is set, it specifies an arbitrary Wikipedia article, or a list of such items (in
				 this case, we select the first one, but the category tree uses all of them).
			 e. If `Wikidata` is set, it specifies an arbitrary Wikidata item to retrieve a Wikipedia article from,
				 or a list of such items (in this case, we select the first one, but the category tree uses all of
				 them). If the item is of the form `wmcode:id`, the Wikipedia article corresponding to `id` in the
				 `wmcode`-language Wikipedia is fetched if available. Otherwise, the English-language Wikipedia
				 article corresponding to `id` is retrieved if available, falling back to the Wikimedia language(s)
				 corresponding to `lang` and then (in certain cases) to the macrolanguage that `lang` is part of.
			Note that if `mode` is specified, prefixed properties (e.g. `accent_display` for `mode` == "accent",
			`form_display` for `mode` == "form") are checked before the bare equivalent (e.g. `display`).
		]=]
		local display = override_display or labprop("display", "string") or label

		-- There are several 'Foo spelling' labels specially designed for use in the |from= param in
		-- {{alternative form of}}, {{standard spelling of}} and the like. Often the display includes the word
		-- "spelling" at the end (e.g. if it's defaulted), which is useful when the label is used with {{tl|lb}} or
		-- {{tl|tlb}}; but it causes redundancy when used with the form-of templates, which add the word "form",
		-- "spelling", "standard spelling", etc. after the label.
		if mode == "form-of" then
			display = display:gsub(" spelling$", "")
		end

		if display:find("%[%[") then
			formatted_label = display
		else
			local glossary = labprop("glossary", {"string", true})
			local Wiktionary = labprop("Wiktionary", {"string", true})
			local Wikipedia = labprop("Wikipedia", {"string", true, "table"})
			local Wikidata = labprop("Wikidata", {"string", true, "table"})
			if glossary then
				local glossary_entry = glossary == true and label or glossary
				formatted_label = "[[Appendix:Glossary#" .. glossary_entry .. "|" .. display .. "]]"
			elseif Wiktionary then
				local Wiktionary_entry = Wiktionary == true and label or Wiktionary
				if Wiktionary == display then
					formatted_label = "[[" .. display .. "]]"
				else
					formatted_label = "[[" .. Wiktionary_entry .. "|" .. display .. "]]"
				end
			elseif Wikipedia then
				if type(Wikipedia) == "table" then
					Wikipedia = Wikipedia[1]
				end
				local Wikipedia_entry = Wikipedia == true and label or Wikipedia
				formatted_label = "[[w:" .. Wikipedia_entry .. "|" .. display .. "]]"
			elseif Wikidata then
				if not mw.wikibase then
					error(("Unable to retrieve data from Wikidata ID for label '%s'; `mw.wikibase` not defined"
						):format(label))
				end
				local function make_formatted_label(wmcode, id)
					local article = mw.wikibase.sitelink(id, wmcode .. "wiki")
					if article then
						local link = wmcode == "en" and "w:" .. article or "w:" .. wmcode .. ":" .. article
						return ("[[%s|%s]]"):format(link, display)
					else
						return nil
					end
				end
				if type(Wikidata) == "table" then
					Wikidata = Wikidata[1]
				end
				local wmcode, id = Wikidata:match("^(.*):(.*)$")
				if wmcode then
					formatted_label = make_formatted_label(wmcode, id)
				else
					local langs_to_check = export.get_langs_to_extract_wikipedia_articles_from_wikidata(lang)
					for _, wmcode in ipairs(langs_to_check) do
						formatted_label = make_formatted_label(wmcode, Wikidata)
						if formatted_label then
							break
						end
					end
				end
				formatted_label = formatted_label or display
			else
				formatted_label = display
			end
		end
	end

	if deprecated then
		formatted_label = '<span class="deprecated-label">' .. formatted_label .. '</span>'
	end

	return formatted_label, deprecated
end
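--[=[
Illustrative sketch, not part of the module: the `special_display` branch of {format_label()} expands
angle-bracket placeholders with a single `gsub` call. The standalone helper below, whose name
`expand_special_display` is hypothetical, assumes the language name is passed in as a plain string
(or nil) rather than read from a language object, and that unrecognized placeholders are replaced
with an empty string, as in the function above.

```lua
-- Hypothetical helper; mirrors the placeholder expansion used for `special_display`.
local function expand_special_display(template, language_name)
	-- gsub returns two values; the assignment keeps only the expanded string.
	local expanded = template:gsub("<(.-)>", function(key)
		if key == "canonical_name" then
			return language_name or "<code><var>[language name]</var></code>"
		end
		-- Any other placeholder is dropped.
		return ""
	end)
	return expanded
end

print(expand_special_display("<canonical_name> slang", "English")) -- prints "English slang"
```

When no language name is available, the documentation placeholder is substituted instead, which is the
behavior relied on for label documentation pages.
]=]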
--[==[
Return information on a label. On input `data` is an object with the following fields:
* `label`: The raw label to return information on.
* `lang`: The language of the label. Must be specified unless `for_doc` is given.
* `mode`: How the label was invoked. One of the following:
 ** {nil} or {"label"}: invoked through {{tl|lb}} or another template that handles labels in the same fashion,
 e.g. {{tl|alt}}, {{tl|quote}} or {{tl|syn}};
 ** {"term-label"}: invoked through {{tl|tlb}};
 ** {"accent"}: invoked through {{tl|a}} or the {{para|a}} or {{para|aa}} parameters of other pronunciation templates,
 such as {{tl|IPA}}, {{tl|rhymes}} or {{tl|homophones}};
 ** {"form-of"}: invoked through {{tl|alt form}}, {{tl|standard spelling of}} or other form-of template.
 This changes the display and/or categorization of a minority of labels. (The majority work the same for all modes.)
* `for_doc`: Data is being fetched for documentation purposes. This causes the raw categories returned in
 `categories` to be formatted for documentation display.
* `nocat`: If true, don't add the label to any categories.
* `force_cat`: Force adding categories even in namespaces that normally exclude them (e.g. userspace and discussion
 pages).
* `notrack`: Disable all tracking for this label.
* `sort`: Sort key for categorization.
* `already_seen`: An object used to track labels already seen, so they aren't displayed twice. Tracking is according
 to the display form of the label, so if two labels have the same display form, the second one won't be displayed
 (but its categories will still be added). If `already_seen` is {nil}, this tracking doesn't happen.
The return value is an object with the following fields:
* `raw_text`: If specified, the object does not describe a label but simply raw text surrounding labels. This occurs
 when double angle bracket (<<...>>) notation is used. {get_label_info()} does not currently return objects with this
 field set, but {process_raw_labels()} does. The value is {"begin"} (this is the first raw text portion derived from
 a double angle bracket spec, provided there are at least two raw text portions); {"end"} (this is the last raw text
 portion derived from a double angle bracket spec, provided there are at least two portions); {"middle"} (this is
 neither the first nor the last raw text portion); or {"only"} (this is a raw text portion standing by itself). The
 particular value determines the handling of commas and spaces on one or both sides of the raw text. If this field is
 specified, only the `label` field (containing the actual raw text) and the `categories` field (containing an empty
 list) are set; all other fields are {nil}.
* `raw_label`: The raw label that was passed in.
* `non_canonical`: The label prior to canonicalization (i.e. alias resolution). Usually this is the same as `raw_label`,
 but if the raw label was preceded by an exclamation point (meaning "display the raw label as-is"), this field will
 contain the label stripped of the exclamation point, and if the raw label is of the form
 `<var>label</var>!<var>display</var>` (meaning "display the label in the specified form"), this field will contain the
 label before the exclamation point.
* `canonical`: If the label in `non_canonical` is an alias, this contains the canonical name of the label; otherwise it
 will be {nil}.
* `override_display`: If specified, this contains a string that overrides the normal display form of the label. The
 display form of a label is the `.display` field of the label data if present, and otherwise is normally the canonical
 form of the label (i.e. after alias resolution). (This is not the same as the formatted form of the label, found in
 `label`, which is the final form shown to the user and includes links to Wikipedia, the glossary, etc. as well as an
 HTML wrapper if the label is deprecated.) If `override_display` is specified, however, this is used in place of the
 normal display form of the label. This currently happens in two circumstances: (1) the label was preceded by ! to
 indicate that the raw label should be displayed rather than the canonical form; (2) the label was given in the form
 `<var>label</var>!<var>display</var>` (meaning "display the label in the specified `<var>display</var>` form").
* `label`: The formatted form of the label. This is what is actually shown to the user. If the label is recognized
 (found in some module), this will typically be in the form of a link.
* `categories`: A list of the categories to add the label to; an empty list if `nocat` was specified.
* `formatted_categories`: A string containing the formatted categories; {nil} if `nocat` or `for_doc` was specified,
 or if `categories` is empty. Currently will be an empty string if there are categories to format but the namespace is
 one that normally excludes categories (e.g. userspace and discussion pages), and `force_cat` isn't specified.
* `deprecated`: True if the label is deprecated.
* `recognized`: If true, the label was found in some module.
* `data`: The data structure for the label, as fetched from the label modules. For unrecognized labels, this will
 be an empty object.
]==]
function export.get_label_info(data)
	if not data.label then
		error("`data` must now be an object containing the params")
	end

	local mode = validate_mode(data.mode)
	local ret = {categories = {}}
	local label = data.label
	local raw_label = label
	ret.raw_label = raw_label
	local override_display
	if label:find("^!") then
		label = label:gsub("^!", "")
		override_display = label
	elseif label:find("![^%s]") then
		label, override_display = label:match("^(.-)!([^%s].*)$")
		if not label then
			error(("Internal error: This Lua pattern should never fail to match for label '%s'"):format(raw_label))
		end
	end
	local non_canonical = label
	ret.non_canonical = non_canonical

	local deprecated = false
	local labdata
	local submodule
	local data_langcode = data.lang and data.lang:getCode() or nil

	local submodules_to_check = export.get_submodules(data.lang)
	for _, submodule_to_check in ipairs(submodules_to_check) do
		submodule = mw.loadData(submodule_to_check)
		local this_labdata = submodule[label]
		local resolved_label
		if type(this_labdata) == "string" then
			resolved_label = this_labdata
			this_labdata = submodule[this_labdata]
			if not this_labdata then
				error(("Internal error: Label alias '%s' points to '%s', which is undefined in module [[%s]]"):format(
					label, resolved_label, submodule_to_check))
			end
			if type(this_labdata) == "string" then
				error(("Internal error: Label alias '%s' points to '%s', which is also an alias (of '%s') in module [[%s]]"):format(
					label, resolved_label, this_labdata, submodule_to_check))
			end
		end
		if this_labdata then
			-- Make sure either there's no lang restriction, or we're processing lang-independent, or our language
			-- is among the listed languages. Otherwise, continue processing (which could conceivably pick up a
			-- lang-appropriate version of the label in another label data module).
			local lablangs = getprop(this_labdata, mode, "langs")
			if not lablangs or not data_langcode then
				labdata = this_labdata
				label = resolved_label or label
				break
			end
			local lang_in_list = false
			for _, langcode in ipairs(lablangs) do
				if langcode == data_langcode then
					lang_in_list = true
					break
				end
			end
			if lang_in_list then
				labdata = this_labdata
				label = resolved_label or label
				break
			elseif not data.notrack then
				-- Track use of a label that fails the lang restriction.
				-- [[Special:WhatLinksHere/Wiktionary:Tracking/labels/wrong-lang-label]]
				-- [[Special:WhatLinksHere/Wiktionary:Tracking/labels/wrong-lang-label/LANGCODE]]
				-- [[Special:WhatLinksHere/Wiktionary:Tracking/labels/wrong-lang-label/LABEL]]
				-- [[Special:WhatLinksHere/Wiktionary:Tracking/labels/wrong-lang-label/LABEL/LANGCODE]]
				track("wrong-lang-label", data_langcode)
				track("wrong-lang-label/" .. label, data_langcode)
				if resolved_label then
					track("wrong-lang-label/" .. resolved_label, data_langcode)
				end
			end
		end
	end

	if labdata then
		ret.recognized = true
	else
		labdata = {}
		ret.recognized = false
	end

	local function labprop(prop)
		return getprop(labdata, mode, prop)
	end

	if labprop("deprecated") then
		deprecated = true
	end
	if label ~= non_canonical then
		-- Note that this is an alias and store the canonical version.
		ret.canonical = label
	end

	if not data.notrack then -- labprop("track") then -- track all labels now
		-- Track label (after converting aliases to canonical form; but also track raw label (alias) if different
		-- from canonical label).
		-- [[Special:WhatLinksHere/Wiktionary:Tracking/labels/label/LABEL]]
		-- [[Special:WhatLinksHere/Wiktionary:Tracking/labels/label/LABEL/LANGCODE]]
		-- [[Special:WhatLinksHere/Wiktionary:Tracking/labels/label/LABEL/MODE]]
		track("label/" .. label, data_langcode, mode)
		if label ~= non_canonical then
			track("label/" .. non_canonical, data_langcode, mode)
		end
	end

	local formatted_label
	formatted_label, deprecated = export.format_label(label, labdata, data.lang, deprecated, override_display, mode)
	ret.deprecated = deprecated
	if deprecated then
		if not data.nocat then
			local depcat = "Entries with deprecated labels"
			if data.for_doc then
				depcat = "<code>" .. depcat .. "</code>"
			end
			insert(ret.categories, depcat)
		end
	end

	local label_for_already_seen =
		(labprop("topical_categories") or labprop("regional_categories")
		or labprop("plain_categories") or labprop("pos_categories")
		or labprop("sense_categories")) and formatted_label
		or nil

	-- Track label text. If label text was previously used, don't show it, but include the categories.
	-- For an example, see [[hypocretin]].
	if data.already_seen and data.already_seen[label_for_already_seen] then
		ret.label = ""
	else
		if formatted_label:find("{") then
			formatted_label = mw.getCurrentFrame():preprocess(formatted_label)
		end
		ret.label = formatted_label
	end

	if data.nocat then
		-- do nothing
	else
		local cats = export.fetch_categories(label, labdata, data.lang, mode, data.for_doc)
		for _, cat in ipairs(cats) do
			insert(ret.categories, cat)
		end
		if not ret.categories[1] or data.for_doc then
			-- Don't try to format categories if we're doing this for documentation ({{label/doc}}), because there
			-- will be HTML in the categories.
			-- do nothing
		else
			ret.formatted_categories = require(utilities_module).format_categories(ret.categories, data.lang,
				data.sort, nil, force_cat or data.force_cat)
		end
	end

	ret.data = labdata

	if label_for_already_seen and data.already_seen then
		data.already_seen[label_for_already_seen] = true
	end

	return ret
end
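--[=[
Illustrative sketch, not part of the module: the two exclamation-point syntaxes handled at the top of
{get_label_info()} can be tried in isolation. The helper name `parse_raw_label` is hypothetical; it
returns the non-canonical label and the override display form (or nil) and does no lookup in the
data modules.

```lua
-- Hypothetical helper; mirrors only the `!label` and `label!display` handling.
local function parse_raw_label(raw_label)
	local label, override_display = raw_label, nil
	if label:find("^!") then
		-- `!label`: display the raw label as-is.
		label = label:gsub("^!", "")
		override_display = label
	elseif label:find("![^%s]") then
		-- `label!display`: display the form given after the exclamation point.
		label, override_display = label:match("^(.-)!([^%s].*)$")
	end
	return label, override_display
end

print(parse_raw_label("!Australian"))         -- prints "Australian	Australian"
print(parse_raw_label("Southern US!Southern")) -- prints "Southern US	Southern"
print(parse_raw_label("colloquial"))          -- prints "colloquial	nil"
```

An exclamation point followed by whitespace is left alone, so ordinary punctuation inside a label
does not trigger the override syntax.
]=]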
--[==[
Split a string containing comma-separated raw labels into the individual labels. This will not split on a comma
followed by whitespace, and it will not split inside of matched <...> or [...]. The code is written to be efficient, so
that it does not load modules (e.g. [[Module:parse utilities]]) unnecessarily.
]==]
function export.split_labels_on_comma(term)
	if term:find("[%[<]") then
		-- Do it the "hard way". We don't want to split anything inside of <...> or <<...>> even if there are commas
		-- inside of the angle brackets. For good measure we do the same for [...] and [[...]]. We first parse balanced
		-- segment runs involving either [...] or <...>. Then we split alternating runs on comma (but not on
		-- comma+whitespace). Then we rejoin the split runs. For example, given the following:
		-- "regional,older <<non-rhotic,and,non-hoarse-horse>> speakers", the first call to
		-- parse_multi_delimiter_balanced_segment_run() produces
		--
		-- {"regional,older ", "<<non-rhotic,and,non-hoarse-horse>>", " speakers"}
		--
		-- After calling split_alternating_runs_on_comma(), we get the following:
		--
		-- {{"regional"}, {"older ", "<<non-rhotic,and,non-hoarse-horse>>", " speakers"}}
		--
		-- After rejoining each group, we get:
		--
		-- {"regional", "older <<non-rhotic,and,non-hoarse-horse>> speakers"}
		--
		-- which is the desired output. When processing the second "label" string, the code in process_raw_labels()
		-- will do a similar process to this to pull out the labels inside of the <<...>> notation.
		local put = require(parse_utilities_module)
		local segments = put.parse_multi_delimiter_balanced_segment_run(term, {{"<", ">"}, {"[", "]"}})
		-- This won't split on comma+whitespace.
		local comma_separated_groups = put.split_alternating_runs_on_comma(segments)
		for i, group in ipairs(comma_separated_groups) do
			comma_separated_groups[i] = table.concat(group)
		end
		return comma_separated_groups
	elseif term:find(",%s") then
		-- This won't split on comma+whitespace.
		return require(parse_utilities_module).split_on_comma(term)
	elseif term:find(",") then
		return require(string_utilities_module).split(term, ",")
	else
		return {term}
	end
end
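--[=[
Illustrative sketch, not part of the module: the rule "split on a comma unless it is followed by
whitespace" can be shown without the helper modules. The function name `split_on_bare_comma` is
hypothetical, and the sketch covers only the bracket-free case; it is not the implementation in
[[Module:parse utilities]], whose handling of edge cases such as a trailing comma may differ.

```lua
-- Hypothetical helper: split on commas that are not followed by whitespace.
local function split_on_bare_comma(term)
	local parts, current = {}, {}
	for i = 1, #term do
		local ch = term:sub(i, i)
		local following = term:sub(i + 1, i + 1)
		if ch == "," and not following:find("^%s") then
			-- Bare comma: close the current label and start a new one.
			parts[#parts + 1] = table.concat(current)
			current = {}
		else
			-- Ordinary character, or a comma that is part of the label text.
			current[#current + 1] = ch
		end
	end
	parts[#parts + 1] = table.concat(current)
	return parts
end

local parts = split_on_bare_comma("regional,older speakers, mostly,rare")
print(table.concat(parts, " | ")) -- prints "regional | older speakers, mostly | rare"
```

Iterating byte by byte is safe here because the comma and the whitespace characters matched by `%s`
are single-byte in UTF-8.
]=]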
--[==[
Return a list of objects corresponding to a set of raw labels. Each object returned is of the format returned by
{get_label_info()}. This is similar to looping over the labels and calling {get_label_info()} on each one, but it also
correctly handles embedded double angle bracket specs <<...>> found in the labels. (In such a case, there will be more
objects returned than raw labels passed in.) On input, `data` is an object with the following fields:
* `labels`: The list of labels to process.
* `lang`: The language of the labels. Must be specified.
* `mode`: How the label was invoked; see {get_label_info()} for more information.
* `nocat`: If true, don't add the label to any categories.
* `force_cat`: Force adding categories even in namespaces that normally exclude them (e.g. userspace and discussion
 pages).
* `notrack`: Disable all tracking for this label.
* `sort`: Sort key for categorization.
* `already_seen`: An object used to track labels already seen, so they aren't displayed twice. Tracking is according
 to the display form of the label, so if two labels have the same display form, the second one won't be displayed
 (but its categories will still be added). If `already_seen` is {nil}, this tracking doesn't happen.
* `ok_to_destructively_modify`: If set, the `data` structure will be destructively modified in the process of this
 function running.
]==]
function export.process_raw_labels(data)
	local label_infos = {}

	if not data.ok_to_destructively_modify then
		data = m_table.shallowCopy(data)
		data.ok_to_destructively_modify = true
	end

	local function get_info_and_insert(label)
		-- Reuse this structure to save memory.
		data.label = label
		insert(label_infos, export.get_label_info(data))
	end

	for _, label in ipairs(data.labels) do
		if label:find("<<") then
			local segments = require(string_utilities_module).split(label, "<<(.-)>>")
			for i, segment in ipairs(segments) do
				if i % 2 == 1 then
					local raw_text_type = i == 1 and "begin" or i == #segments and "end" or "middle"
					insert(label_infos, {raw_text = raw_text_type, label = segment, categories = {}})
				else
					local segment_labels = export.split_labels_on_comma(segment)
					for _, segment_label in ipairs(segment_labels) do
						get_info_and_insert(segment_label)
					end
				end
			end
		else
			get_info_and_insert(label)
		end
	end

	return label_infos
end
--[==[
Split a comma-separated string of raw labels and process each label to get a list of objects suitable for passing to
{format_processed_labels()}. Each object returned is of the format returned by {get_label_info()}. This is equivalent to
calling {split_labels_on_comma()} followed by {process_raw_labels()}. On input, `data` is an object with the following
fields:
* `labels`: The string containing the raw comma-separated labels.
* `lang`: The language of the labels. Must be specified.
* `mode`: How the label was invoked; see {get_label_info()} for more information.
* `nocat`: If true, don't add the label to any categories.
* `force_cat`: Force adding categories even in namespaces that normally exclude them (e.g. userspace and discussion
 pages).
* `notrack`: Disable all tracking for this label.
* `sort`: Sort key for categorization.
* `already_seen`: An object used to track labels already seen, so they aren't displayed twice. Tracking is according
 to the display form of the label, so if two labels have the same display form, the second one won't be displayed
 (but its categories will still be added). If `already_seen` is {nil}, this tracking doesn't happen.
* `ok_to_destructively_modify`: If set, the `data` structure will be destructively modified in the process of this
 function running.
]==]
function export.split_and_process_raw_labels(data)
	if not data.ok_to_destructively_modify then
		data = m_table.shallowCopy(data)
		data.ok_to_destructively_modify = true
	end
	data.labels = export.split_labels_on_comma(data.labels)
	return export.process_raw_labels(data)
end
--[==[
Format one or more already-processed labels for display and categorization. "Already-processed" means that
{get_label_info()} or {process_raw_labels()} has been called on the raw labels to convert them into objects containing
information on how to display and categorize the labels. This is a lower-level alternative to {show_labels()} and is
meant for modules such as [[Module:alternative forms]], [[Module:quote]] and [[Module:etymology/templates/descendant]]
that support displaying labels along with some other information.
On input `data` is an object with the following fields:
* `labels`: List of the label objects to format, in the format returned by {get_label_info()}.
* `lang`: The language of the labels.
* `open`: Open bracket or parenthesis to display before the concatenated labels. If specified, it is wrapped in the
 {"ib-brac"} and {"label-brac"} CSS classes. If {nil} or {false}, no open bracket is displayed.
* `close`: Close bracket or parenthesis to display after the concatenated labels. If specified, it is wrapped in the
 {"ib-brac"} and {"label-brac"} CSS classes. If {nil} or {false}, no close bracket is displayed.
* `no_ib_content`: By default, the concatenated formatted labels inside of the open/close brackets are wrapped in the
 {"ib-content"} and {"label-content"} CSS classes. Specify this to suppress this wrapping.
* `raw`: Suppress all CSS wrapping of content, including open/close parentheses, content and comma delimiters (which
 are normally wrapped in {"ib-comma"} and {"label-comma"} CSS classes).
* `ok_to_destructively_modify`: If set, the `data` structure, and the `data.labels` table inside of it, will be
 destructively modified in the process of this function running.
* `split_output`: If not given, the return value is a concatenation of the formatted concatenated labels and formatted
 categories. Otherwise, two values are returned: the formatted labels and the categories. If `split_output` is
 the value {"raw"}, the categories are returned in list form, where the list elements are strings of the form suitable
 for passing to {format_categories()} in [[Module:utilities]]. If `split_output` is any other value besides {nil}, the
 categories are returned as a pre-formatted concatenated string.
The return value (or the first return value, if `split_output` is given) is a string containing the concatenated labels,
optionally surrounded by open/close brackets or parentheses. Normally, labels are separated by comma-space sequences,
but this may be suppressed for certain labels. If `nocat` wasn't given to {get_label_info()} or {process_raw_labels()},
and `split_output` wasn't given, the label objects will contain formatted categories in them, which will be inserted
into the returned text. (Use `split_output` if you need the categories returned separately.) The concatenated text
inside of the open/close brackets is normally wrapped in the {"ib-content"} CSS class, but this can be suppressed, as
mentioned above.
]==]
function export.format_processed_labels(data)
	if not data.labels then
		error("`data` must now be an object containing the params")
	end
	if not data.ok_to_destructively_modify then
		data = m_table.shallowCopy(data)
		data.labels = m_table.deepCopy(data.labels)
		data.ok_to_destructively_modify = true
	end
	local labels = data.labels
	if not labels[1] then
		error("You must specify at least one label.")
	end

	-- Show the labels
	local omit_preComma = false
	local omit_postComma = true
	local omit_preSpace = false
	local omit_postSpace = true

	for _, label in ipairs(labels) do
		omit_preComma = omit_postComma
		omit_preSpace = omit_postSpace
		local raw_text_omit_before = label.raw_text == "middle" or label.raw_text == "end"
		local raw_text_omit_after = label.raw_text == "middle" or label.raw_text == "begin"
		label.omit_comma = omit_preComma or (label.data and label.data.omit_preComma) or raw_text_omit_before
		omit_postComma = (label.data and label.data.omit_postComma) or raw_text_omit_after
		label.omit_space = omit_preSpace or (label.data and label.data.omit_preSpace) or raw_text_omit_before
		omit_postSpace = (label.data and label.data.omit_postSpace) or raw_text_omit_after
	end

	if data.lang then
		local lang_functions_module = export.lang_specific_data_modules_prefix .. data.lang:getCode() .. "/functions"
		local m_lang_functions = require(load_module).safe_require(lang_functions_module)
		if m_lang_functions and m_lang_functions.postprocess_handlers then
			for _, handler in ipairs(m_lang_functions.postprocess_handlers) do
				handler(data)
			end
		end
	end

	local function wrap_css(txt, suffix)
		if data.raw then
			return txt
		end
		return ("<span class=\"ib-%s label-%s\">%s</span>"):format(suffix, suffix, txt)
	end

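	local categories = nil
	-- `split_output` is documented as a field of `data`; read it from there.
	local split_output = data.split_output
	local formatted_categories = split_output and split_output ~= "raw" and {} or nil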

	for i, labelinfo in ipairs(labels) do
		local label
		-- Need to check for 'not raw_text' here because blank labels may legitimately occur as raw text if a double
		-- angle bracket spec occurs at the beginning of a label. In this case we've already taken into account the
		-- context and don't want to leave out a preceding comma and space e.g. in a case like
		-- {{lb|en|rare|<<dialect>> or <<eye dialect>>}}. FIXME: We should reconsider whether we need this special case
		-- at all.
		if labelinfo.label == "" and not labelinfo.raw_text then
			label = ""
		else
			label = (labelinfo.omit_comma and "" or wrap_css(",", "comma")) ..
				(labelinfo.omit_space and "" or "&#32;") ..
				labelinfo.label
		end
		if split_output then
			labels[i] = label
			if split_output == "raw" then
				if labelinfo.categories and labelinfo.categories[1] then
					if categories then
						m_table.extend(categories, labelinfo.categories)
					else
						categories = labelinfo.categories
					end
				end
			elseif labelinfo.formatted_categories then
				insert(formatted_categories, labelinfo.formatted_categories)
			end
		else
			labels[i] = label .. (labelinfo.formatted_categories or "")
		end
	end

	local function wrap_open_close(val)
		if val then
			return wrap_css(val, "brac")
		else
			return ""
		end
	end

	local concatenated_labels = table.concat(labels, "")
	if not data.no_ib_content then
		concatenated_labels = wrap_css(concatenated_labels, "content")
	end
	local ret_labels = wrap_open_close(data.open) .. concatenated_labels .. wrap_open_close(data.close)
	if split_output == "raw" then
		return ret_labels, categories
	elseif split_output then
		return ret_labels, concat(formatted_categories)
	else
		return ret_labels
	end
end
--[==[
Format one or more labels for display and categorization. This provides the implementation of the
{{tl|label}}/{{tl|lb}}, {{tl|term label}}/{{tl|tlb}} and {{tl|accent}}/{{tl|a}} templates, and can also be called from a
module. The return value is a string to be inserted into the generated page, including the display and categories. On
input `data` is an object with the following fields:
* `labels`: List of the labels to format.
* `lang`: The language of the labels.
* `mode`: How the label was invoked; see {get_label_info()} for more information.
* `nocat`: If true, don't add the labels to any categories.
* `force_cat`: Force adding categories even in namespaces that normally exclude them (e.g. userspace and discussion
 pages).
* `notrack`: Disable all tracking for these labels.
* `sort`: Sort key for categorization.
* `no_track_already_seen`: Don't track already-seen labels. If not specified, already-seen labels are not displayed
 again, but still categorize. See the documentation of {get_label_info()}.
* `open`: Open bracket or parenthesis to display before the concatenated labels. If {nil}, defaults to an open
 parenthesis. Set to {false} to disable.
* `close`: Close bracket or parenthesis to display after the concatenated labels. If {nil}, defaults to a close
 parenthesis. Set to {false} to disable.
* `no_ib_content`: As in `format_processed_labels()`.
* `raw`: As in `format_processed_labels()`. Also suppress wrapping the entire formatted result in a usage label CSS
 class (see below).
* `ok_to_destructively_modify`: If set, the `data` structure will be destructively modified in the process of this
 function running.
Compared with {format_processed_labels()}, this function has the following differences:
# The labels specified in `labels` are raw labels (i.e. strings) rather than formatted objects.
# The open and close brackets default to parentheses ("round brackets") rather than not being displayed by default.
# Tracking of already-seen labels is enabled unless explicitly turned off using `no_track_already_seen`.
# The entire formatted result is wrapped in a {"usage-label-<var>type</var>"} CSS class (depending on the value of
 `mode`), unless `raw` is given.
]==]
function export.show_labels(data)
	if not data.labels then
		error("`data` must now be an object containing the params")
	end
	if not data.ok_to_destructively_modify then
		data = m_table.shallowCopy(data)
		data.ok_to_destructively_modify = true
	end
	local labels = data.labels
	if not labels[1] then
		error("You must specify at least one label.")
	end

	local mode = validate_mode(data.mode)

	if not data.no_track_already_seen then
		data.already_seen = {}
	end

	data.labels = export.process_raw_labels(data)
	if data.open == nil then
		data.open = "("
	end
	if data.close == nil then
		data.close = ")"
	end
	local formatted = export.format_processed_labels(data)
	if data.raw then
		return formatted
	else
		return "<span class=\"" .. mode_to_outer_class[mode] .. "\">" .. formatted .. "</span>"
	end
end
--[==[Helper function for the data modules.]==]
function export.alias(labels, key, aliases)
	m_table.alias(labels, key, aliases)
end
--[==[
Split the display form of a label. Returns two values: `link` and `display`. If the display form consists of a
two-part link, `link` is the first part and `display` is the second part. If the display form consists of a
single-part link, `link` and `display` are the same. Otherwise (the display form is not a link or contains an
embedded link), `link` is the same as the passed-in `label` and `display` is nil.
]==]
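function export.split_display_form(label)
	if not label:find("%[%[") then
		return label, nil
	end
	-- Two-part link, e.g. [[link|display]].
	local link, display = label:match("^%[%[([^%[%]|]+)|([^%[%]|]+)%]%]$")
	if link then
		return link, display
	end
	-- Single-part link, e.g. [[link]]. The quantifier belongs inside the capture so that the whole link text
	-- is captured.
	link = label:match("^%[%[([^%[%]|]+)%]%]$")
	if link then
		return link, link
	end
	return label, nil
end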
--[==[
Combine the `link` and `display` parts of the display form of a label as returned by {split_display_form()}.
If `display` is nil, `link` is returned directly. Otherwise, a one-part or two-part link is constructed
depending on whether `link` and `display` are the same. (As a special case, if both consist of a blank string,
the return value is a blank string rather than a malformed link.)
]==]
function export.combine_display_form_parts(link, display)
	if not display then
		return link
	end
	if link == display then
		if link == "" then
			return ""
		else
			return ("[[%s]]"):format(link)
		end
	end
	return ("[[%s|%s]]"):format(link, display)
end
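--[=[
Illustrative sketch, not part of the module: splitting a display form and recombining the parts
should give back the original string. The two local functions below are standalone restatements of
{split_display_form()} and {combine_display_form_parts()} written for this example (with the
single-part pattern quantified inside the capture), and the sample display forms are made up.

```lua
-- Standalone restatement for illustration; returns `link, display`.
local function split_display_form(label)
	if not label:find("%[%[") then
		return label, nil
	end
	local link, display = label:match("^%[%[([^%[%]|]+)|([^%[%]|]+)%]%]$")
	if link then
		return link, display
	end
	link = label:match("^%[%[([^%[%]|]+)%]%]$")
	if link then
		return link, link
	end
	-- Embedded or malformed link: treat the whole string as the link part.
	return label, nil
end

-- Standalone restatement for illustration; inverse of split_display_form().
local function combine_display_form_parts(link, display)
	if not display then
		return link
	end
	if link == display then
		return link == "" and "" or ("[[%s]]"):format(link)
	end
	return ("[[%s|%s]]"):format(link, display)
end

-- Hypothetical display forms: two-part link, one-part link, plain text, embedded link.
for _, form in ipairs({"[[w:Attic Greek|Attic]]", "[[slang]]", "Attic", "mostly [[US]]"}) do
	assert(combine_display_form_parts(split_display_form(form)) == form)
end
```

The round trip holds for plain text and embedded links because `display` comes back as nil in those
cases, and the combining function then returns `link` unchanged.
]=]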
--[==[Used to finalize the data into the form that is actually returned.]==]
function export.finalize_data(labels)
	local shallow_copy = m_table.shallowCopy
	local aliases = {}
	for label, data in pairs(labels) do
		if type(data) == "table" then
			if data.aliases then
				for _, alias in ipairs(data.aliases) do
					aliases[alias] = label
				end
				data.aliases = nil
			end
			if data.deprecated_aliases then
				local data2 = shallow_copy(data)
				data2.deprecated = true
				data2.canonical = label
				for _, alias in ipairs(data2.deprecated_aliases) do
					aliases[alias] = data2
				end
				data.deprecated_aliases = nil
				data2.deprecated_aliases = nil
			end
		end
	end
	for label, data in pairs(aliases) do
		labels[label] = data
	end
	return labels
end
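--[=[
Illustrative sketch, not part of the module: the effect of {finalize_data()} on a small label table.
The function below is a standalone restatement with its own shallow copy in place of the
[[Module:table]] helper, and the label entry (names, aliases and category value) is invented
for the example.

```lua
-- Stand-in for the shallow-copy helper used by the module.
local function shallow_copy(t)
	local copy = {}
	for k, v in pairs(t) do
		copy[k] = v
	end
	return copy
end

-- Standalone restatement for illustration.
local function finalize_data(labels)
	local aliases = {}
	for label, data in pairs(labels) do
		if type(data) == "table" then
			if data.aliases then
				-- Plain aliases become string redirects to the canonical label.
				for _, alias in ipairs(data.aliases) do
					aliases[alias] = label
				end
				data.aliases = nil
			end
			if data.deprecated_aliases then
				-- Deprecated aliases get their own copy of the data, flagged as deprecated.
				local data2 = shallow_copy(data)
				data2.deprecated = true
				data2.canonical = label
				for _, alias in ipairs(data2.deprecated_aliases) do
					aliases[alias] = data2
				end
				data.deprecated_aliases = nil
				data2.deprecated_aliases = nil
			end
		end
	end
	-- New keys are added only after the traversal has finished.
	for label, data in pairs(aliases) do
		labels[label] = data
	end
	return labels
end

-- Hypothetical label entry.
local labels = finalize_data({
	["Australia"] = {
		aliases = {"Australian", "AU"},
		deprecated_aliases = {"Aus"},
		regional_categories = "Australian",
	},
})
assert(labels["Australian"] == "Australia")
assert(labels["Aus"].deprecated == true and labels["Aus"].canonical == "Australia")
```

Collecting the aliases in a separate table first avoids adding keys to `labels` while it is being
traversed with `pairs`, which Lua does not allow.
]=]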

return export