URL: https://deepwiki.com/netgen/query-translator/5.2-solr-extendeddismax-generator

Solr ExtendedDisMax Generator

Purpose and Scope

The Solr ExtendedDisMax Generator converts a Galach SyntaxTree into a query string for Solr's Extended DisMax (edismax) query parser, which makes it possible to build user-friendly search interfaces backed by Solr.

For information about the Elasticsearch equivalent, see Elasticsearch QueryString Generator. For details about shared Lucene visitor components used by both generators, see Lucene Generators Common Components. For the visitor pattern architecture, see Visitor Pattern Implementation.

Sources: lib/Languages/Galach/Generators/ExtendedDisMax.php:1-37

Generator Architecture

The ExtendedDisMax class serves as the main entry point for generating Solr queries. It follows the same architectural pattern as other generators in the system: a simple facade that delegates to a visitor implementation.


Component Architecture: ExtendedDisMax Generator delegates to visitor collection

The generator constructor accepts a single Visitor parameter, typically an Aggregate visitor that dispatches to specialized visitor implementations. The generate() method accepts a SyntaxTree and optional parameters, returning the generated query string.
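The facade-and-dispatcher arrangement described above can be sketched in a few lines. The classes below (SketchVisitor, SketchWord, SketchQuery, SketchAggregate, SketchGenerator) are hypothetical stand-ins, not the library's classes, and nodes are modelled as plain arrays rather than SyntaxTree nodes. The sketch assumes PHP 8.0 or later.

```php
<?php
// Hypothetical stand-ins for illustration only; not the library's API.
interface SketchVisitor
{
    public function accept(array $node): bool;
    public function visit(array $node, SketchVisitor $dispatcher): string;
}

final class SketchWord implements SketchVisitor
{
    public function accept(array $node): bool
    {
        return $node['type'] === 'word';
    }

    public function visit(array $node, SketchVisitor $dispatcher): string
    {
        return $node['value'];
    }
}

final class SketchQuery implements SketchVisitor
{
    public function accept(array $node): bool
    {
        return $node['type'] === 'query';
    }

    public function visit(array $node, SketchVisitor $dispatcher): string
    {
        // Root node: render every child through the dispatcher, join with spaces.
        $parts = array_map(
            fn (array $child): string => $dispatcher->visit($child, $dispatcher),
            $node['children']
        );

        return implode(' ', $parts);
    }
}

final class SketchAggregate implements SketchVisitor
{
    /** @param SketchVisitor[] $visitors */
    public function __construct(private array $visitors)
    {
    }

    public function accept(array $node): bool
    {
        return true;
    }

    public function visit(array $node, SketchVisitor $dispatcher): string
    {
        // Dispatch to the first visitor that accepts the node type.
        foreach ($this->visitors as $visitor) {
            if ($visitor->accept($node)) {
                return $visitor->visit($node, $this);
            }
        }

        throw new LogicException('No visitor accepts node type ' . $node['type']);
    }
}

final class SketchGenerator
{
    public function __construct(private SketchVisitor $visitor)
    {
    }

    // The facade does nothing but hand the tree to its visitor.
    public function generate(array $tree): string
    {
        return $this->visitor->visit($tree, $this->visitor);
    }
}
```

The generator itself holds no translation logic, so supporting a new node type in this sketch only means adding another visitor to the array passed to SketchAggregate.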

Sources: lib/Languages/Galach/Generators/ExtendedDisMax.php:1-37, tests/Galach/Generators/ExtendedDisMaxTest.php:227-261

Visitor Configuration

The ExtendedDisMax generator requires a collection of visitors to handle different node types in the syntax tree. The typical configuration includes:

Visitor Class                 Node Type Handled   Purpose
Lucene\Common\Query           Query               Root node visitor
Lucene\Common\Group           Group               Handles grouped expressions with domain mapping
Lucene\Common\LogicalAnd      LogicalAnd          Translates AND operators
Lucene\Common\LogicalOr       LogicalOr           Translates OR operators
Lucene\Common\LogicalNot      LogicalNot          Translates NOT operators
Lucene\Common\Mandatory       Mandatory           Translates + (mandatory) operator
Lucene\Common\Prohibited      Prohibited          Translates - (prohibited) operator
Lucene\Common\Phrase          Phrase              Handles quoted phrases with domain mapping
Lucene\Common\Tag             Tag                 Translates #tag syntax to field query
Lucene\Common\User            User                Translates @user syntax to field query
Lucene\ExtendedDisMax\Word    Word                Handles word tokens with ExtendedDisMax escaping

All visitors except Word are shared between the ExtendedDisMax and QueryString generators (see Lucene Generators Common Components). The Word visitor differs because it implements ExtendedDisMax-specific character escaping rules.

Sources: tests/Galach/Generators/ExtendedDisMaxTest.php:227-261

Field Mapping Configuration

Field mapping allows translation of Galach domain prefixes and special syntax into Solr field names. Three types of field mapping are configured:

Domain Prefix Mapping

The Group and Phrase visitors accept a field map array and a default field name. With a map from domain to special_text_t and default_text_t as the default field, a user query containing the domain prefix domain:(one two) is mapped to special_text_t:(one two). Unrecognized domains fall back to the default field (e.g., unexpected:(one two) becomes default_text_t:(one two)).
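The lookup-with-fallback behaviour can be sketched as follows. The function names mapDomain() and renderGroup() are hypothetical and only model the mapping rule; they are not the constructor signatures of the library's Group or Phrase visitors.

```php
<?php
// Hypothetical helpers modelling the domain-to-field rule; not the library's API.

// Return the Solr field for a Galach domain, falling back to the default field.
function mapDomain(string $domain, array $fieldMap, string $defaultField): string
{
    return $fieldMap[$domain] ?? $defaultField;
}

// Render a grouped expression that carries a domain prefix.
function renderGroup(string $domain, string $inner, array $fieldMap, string $defaultField): string
{
    return mapDomain($domain, $fieldMap, $defaultField) . ':(' . $inner . ')';
}

$fieldMap = ['domain' => 'special_text_t'];

echo renderGroup('domain', 'one two', $fieldMap, 'default_text_t'), PHP_EOL;
echo renderGroup('unexpected', 'one two', $fieldMap, 'default_text_t'), PHP_EOL;
```

The two calls print special_text_t:(one two) and default_text_t:(one two), matching the mapped and unmapped cases described above.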

Tag Field Mapping

The Tag visitor maps Galach #tag syntax to a specific Solr field. For example, with tags_ms as the configured field, input #tag translates to tags_ms:tag.

User Field Mapping

The User visitor maps Galach @user syntax to a specific Solr field. For example, with user_s as the configured field, input @user translates to user_s:user.
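Both the tag and the user mapping follow the same rule: drop the marker character and prefix the remainder with the configured field. A minimal sketch, assuming PHP 8.0 or later and using a hypothetical helper renderMarkedToken() that is not part of the library:

```php
<?php
// Hypothetical helper; the library's Tag and User visitors may be structured differently.
function renderMarkedToken(string $token, string $marker, string $field): string
{
    if (!str_starts_with($token, $marker)) {
        throw new InvalidArgumentException("Token does not start with '{$marker}'");
    }

    // Strip the marker (# or @) and prefix the rest with the Solr field name.
    return $field . ':' . substr($token, strlen($marker));
}

echo renderMarkedToken('#tag', '#', 'tags_ms'), PHP_EOL;
echo renderMarkedToken('@user', '@', 'user_s'), PHP_EOL;
```

The sketch leaves out any escaping of the tag or user value.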


Field Mapping Flow: Domain prefixes and special syntax translate to Solr field names

Sources: tests/Galach/Generators/ExtendedDisMaxTest.php:16-20, tests/Galach/Generators/ExtendedDisMaxTest.php:232-256, tests/Galach/Generators/ExtendedDisMaxTest.php:46-116

Character Escaping Rules

The ExtendedDisMax\Word visitor implements character escaping rules specific to Solr's Extended DisMax parser. The escapeWord() method applies a regex pattern to escape special characters.

Escaped Characters

The following characters are escaped with a backslash prefix when they appear in word tokens:

Character(s)   Escaped Form   Reason
+              \+             Mandatory operator
-              \-             Prohibited operator
&&             \&&            AND operator alternative
||             \||            OR operator alternative
!              \!             NOT operator alternative
( )            \( \)          Grouping delimiters
{ }            \{ \}          Range query delimiters
[ ]            \[ \]          Range query delimiters
^              \^             Boost operator
"              \"             Phrase delimiter
~              \~             Fuzzy/proximity operator
*              \*             Wildcard operator
?              \?             Single-character wildcard
:              \:             Field separator
/              \/             Regex delimiter
\              \\             Escape character itself
(space)        \ (backslash followed by a space)   Word boundary

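The implementation uses a single preg_replace() call with a pattern that matches all special characters. The pattern is shown as it is written in a PHP string literal, so each doubled backslash stands for a single backslash in the regular expression:

/(\\+|-|&&|\\|\\||!|\\(|\\)|\\{|}|\\[|]|\\^|"|~|\\*|\\?|:|\\/|\\\\| )/

Each match is replaced with \\$1, that is, a literal backslash followed by the matched text. The two-character sequences && and || therefore receive one backslash in front of the pair.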
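The following standalone function is a sketch of that preg_replace() call, using the pattern quoted above. The name escapeWordSketch() is an assumption made for illustration; the sketch operates on a bare string and does not reproduce the tokenizer's own handling of escape sequences in the user's input.

```php
<?php
// Standalone sketch of the escaping step; the function name is hypothetical.
function escapeWordSketch(string $word): string
{
    // In a single-quoted PHP string, \\ is one backslash, so '\\+' is the regex \+.
    $pattern = '/(\\+|-|&&|\\|\\||!|\\(|\\)|\\{|}|\\[|]|\\^|"|~|\\*|\\?|:|\\/|\\\\| )/';

    // '\\\\$1' is the replacement \\$1: a literal backslash, then the match.
    // preg_replace() returns null on error, in which case the input is returned as-is.
    return preg_replace($pattern, '\\\\$1', $word) ?? $word;
}

echo escapeWordSketch('one'), PHP_EOL;      // no special characters
echo escapeWordSketch('a+b'), PHP_EOL;      // plus sign gets a backslash
echo escapeWordSketch('a&&b'), PHP_EOL;     // one backslash before the pair
echo escapeWordSketch('x:y'), PHP_EOL;      // colon gets a backslash
```

A single & or | is not in the pattern, so it passes through unchanged in this sketch.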

Escaping Behavior Examples

Input Query   Output Query   Notes
one           one            No special characters
\             \\             Single backslash escaped
\\            \\             Already escaped backslash preserved
\+            \+             Escaped operator preserved
\\&&          \\\&&          Mixed escaping handled
\*            \\\*           Wildcard escape handled

Sources: lib/Languages/Galach/Generators/Lucene/ExtendedDisMax/Word.php:19-26, tests/Galach/Generators/ExtendedDisMaxTest.php:117-201

Operator Normalization

The generator normalizes operator syntax to ExtendedDisMax canonical forms:


Operator Normalization: Alternative syntax forms convert to canonical operators

All logical operators are normalized to their word forms (AND, OR, NOT) rather than symbol forms (&&, ||, !). Unary operators (+, -) are preserved as-is.
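The normalization rule can be illustrated with a small lookup table. This is a sketch of the observable behaviour only; the constant and function names are hypothetical, and the library's visitors may arrive at the same output by different means.

```php
<?php
// Hypothetical names; models the input/output behaviour, not the library's visitors.
const CANONICAL_OPERATORS = ['&&' => 'AND', '||' => 'OR', '!' => 'NOT'];

// Map a symbol form to its word form; anything else is returned unchanged.
function canonicalOperator(string $token): string
{
    return CANONICAL_OPERATORS[$token] ?? $token;
}

function renderBinary(string $left, string $operator, string $right): string
{
    return $left . ' ' . canonicalOperator($operator) . ' ' . $right;
}

function renderUnary(string $operator, string $operand): string
{
    $canonical = canonicalOperator($operator);

    // NOT is a word and needs a separating space; + and - attach directly.
    return $canonical === 'NOT' ? 'NOT ' . $operand : $canonical . $operand;
}

echo renderBinary('one', '&&', 'two'), PHP_EOL;
echo renderUnary('!', 'one'), PHP_EOL;
echo renderUnary('+', 'one'), PHP_EOL;
```

The three calls print one AND two, NOT one, and +one.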

Sources: tests/Galach/Generators/ExtendedDisMaxTest.php:54-84

Translation Examples

The following table demonstrates complete translations from Galach syntax to ExtendedDisMax format:

Input Query      Output Query        Description
one              one                 Simple word
'one'            'one'               Single-quoted word
"one"            "one"               Phrase
one two          one two             Multiple words
(one two)        (one two)           Grouped expression
one AND two      one AND two         Logical AND
one && two       one AND two         Alternative AND normalized
one OR two       one OR two          Logical OR
one || two       one OR two          Alternative OR normalized
NOT one          NOT one             Logical NOT
!one             NOT one             Alternative NOT normalized
+one             +one                Mandatory term
-one             -one                Prohibited term
@user            user_s:user         User tag translated
#tag             tags_ms:tag         Tag translated
domain:one       special_text_t:one  Domain prefix mapped
unexpected:one   default_text_t:one  Unmapped domain defaults

Sources: tests/Galach/Generators/ExtendedDisMaxTest.php:22-201

Usage Pattern


Execution Flow: SyntaxTree traversal produces Solr query string

Typical usage involves:

  1. Configure visitors with field mappings and default fields
  2. Create Aggregate visitor with configured visitor collection
  3. Instantiate ExtendedDisMax generator with Aggregate visitor
  4. Call generate() with a SyntaxTree to produce output

The generator is stateless and thread-safe once configured. The same generator instance can process multiple syntax trees.

Sources: lib/Languages/Galach/Generators/ExtendedDisMax.php:25-36, tests/Galach/Generators/ExtendedDisMaxTest.php:210-222

Comparison with QueryString Generator

The ExtendedDisMax and QueryString generators share the same architecture and most visitor implementations. The primary difference is the character escaping rules in their respective Word visitors:

Generator        Word Visitor                 Additional Escaped Characters
ExtendedDisMax   Lucene\ExtendedDisMax\Word   None (baseline)
QueryString      Lucene\QueryString\Word      =, >, <

The QueryString generator escapes three additional characters (=, >, <) required by Elasticsearch's Query String Query parser. All other functionality is identical, achieved through visitor reuse.
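The difference between the two Word visitors can be modelled as one escaping routine fed with two character lists. The sketch below builds the pattern with preg_quote() instead of a hand-written regex, which is a swapped-in technique for illustration; the constant and function names are hypothetical and PHP 7.4 or later is assumed.

```php
<?php
// Hypothetical names; the library hard-codes one regex per Word visitor instead.
const EDISMAX_SPECIALS = [
    '+', '-', '&&', '||', '!', '(', ')', '{', '}', '[', ']',
    '^', '"', '~', '*', '?', ':', '/', '\\', ' ',
];
const QUERY_STRING_EXTRAS = ['=', '>', '<'];

function escapeSpecials(string $word, array $specials): string
{
    // Quote each special sequence for use inside a /.../ delimited pattern.
    $alternatives = array_map(
        fn (string $special): string => preg_quote($special, '/'),
        $specials
    );
    $pattern = '/(' . implode('|', $alternatives) . ')/';

    // Prefix every match with a literal backslash.
    return preg_replace($pattern, '\\\\$1', $word) ?? $word;
}

$queryStringSpecials = array_merge(EDISMAX_SPECIALS, QUERY_STRING_EXTRAS);

echo escapeSpecials('a=b', EDISMAX_SPECIALS), PHP_EOL;     // unchanged
echo escapeSpecials('a=b', $queryStringSpecials), PHP_EOL; // = gets a backslash
```

With the baseline list, = passes through untouched; with the three extras merged in, it is escaped, which mirrors the single difference listed in the table.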

Sources: lib/Languages/Galach/Generators/Lucene/ExtendedDisMax/Word.php:19-26, lib/Languages/Galach/Generators/Lucene/QueryString/Word.php:19-26, tests/Galach/Generators/QueryStringTest.php:13-32