VOOZH about

URL: https://deepwiki.com/netgen/query-translator/3.2-parser

⇱ Parser | netgen/query-translator | DeepWiki


Loading...
Menu

Parser

The Parser class implements a shift-reduce parsing algorithm that transforms a TokenSequence into a SyntaxTree. Located at lib/Languages/Galach/Parser.php24 it uses an internal stack-based state machine to build a hierarchical node structure representing the query according to Galach grammar rules.

The parser ensures all input produces valid output by applying corrections when encountering malformed syntax. Error handling is covered in detail in page 3.3.

Parser Position in Pipeline

The Parser consumes TokenSequence from the Tokenizer and produces SyntaxTree for Generators:


Sources: lib/Languages/Galach/Parser.php159-174 lib/Values/TokenSequence.php11-36 lib/Values/SyntaxTree.php

Class Structure

The Parser implements the Parsing interface and maintains three private properties during parsing:

PropertyTypePurpose
$tokensToken[]Input token array, consumed via array_shift()
$stackSplStackWorking stack holding tokens and nodes
$correctionsCorrection[]Accumulated corrections during parsing

Sources: lib/Languages/Galach/Parser.php24-174 lib/Parsing.php

Shift-Reduce Algorithm

The parse() method implements a shift-reduce parser using a main loop that alternates between shifting tokens and reducing nodes:


The parser maintains no explicit state variable - state is encoded in the stack contents and remaining token array.

Sources: lib/Languages/Galach/Parser.php159-174 lib/Languages/Galach/Parser.php176-182 lib/Languages/Galach/Parser.php184-207 lib/Languages/Galach/Parser.php472-483

Node Type Hierarchy

The parser constructs seven node types, all implementing the Node interface. Each node type corresponds to a specific syntactic construct:

Node ClassPropertiesRepresentsCreated In
Querynodes[]Root node containing all top-level nodesreduceQuery() line 482
TermtokenLeaf node for words, phrases, tags, usersshiftTerm() line 287
Groupnodes[], tokenLeft, tokenRightParenthesized subqueryshiftGroupEnd() line 299 reduceGroup() line 393
LogicalAndleftOperand, rightOperand, tokenBinary AND operationreduceLogicalAnd() line 358
LogicalOrleftOperand, rightOperand, tokenBinary OR operationreduceLogicalOr() line 390
LogicalNotoperand, tokenUnary NOT operationreduceLogicalNot() line 334
Mandatoryoperand, tokenUnary + (required) operatorreducePreference() line 316
Prohibitedoperand, tokenUnary - (excluded) operatorreducePreference() line 319

Sources: lib/Languages/Galach/Values/Node/Query.php lib/Languages/Galach/Values/Node/Term.php lib/Languages/Galach/Values/Node/Group.php lib/Languages/Galach/Values/Node/LogicalAnd.php lib/Languages/Galach/Values/Node/LogicalOr.php lib/Languages/Galach/Values/Node/LogicalNot.php lib/Languages/Galach/Values/Node/Mandatory.php lib/Languages/Galach/Values/Node/Prohibited.php

Shift Operations and Token-to-Method Mapping

Each token type maps to a specific shift method via the $shifts lookup array line 89-101:


Sources: lib/Languages/Galach/Parser.php89-101 lib/Languages/Galach/Parser.php209-305

Reduction Rules and Precedence

The reduce() method line 184-207 applies reductions based on node type using the $nodeToReductionGroup and $reductionGroups mappings line 103-136:


This implements operator precedence: preference operators (+/-) bind tightest, followed by NOT, then AND, then OR.

Sources: lib/Languages/Galach/Parser.php103-136 lib/Languages/Galach/Parser.php184-207 lib/Languages/Galach/Parser.php461-470

Operator Precedence Implementation

Precedence is enforced by the order of reduction attempts. Higher precedence operators are reduced before lower precedence ones:

PrecedenceOperatorsReduction MethodLines
1 (Highest)Grouping ()reduceGroup()393-415
2Preference + -reducePreference()307-320
3Logical NOT NOT !reduceLogicalNot()322-335
4Logical AND ANDreduceLogicalAnd()349-359
5 (Lowest)Logical OR ORreduceLogicalOr()369-391

The reduceLogicalOr() method includes special logic line 376-384 to respect AND precedence: when inside a query (not a group) and the next token is AND, it pushes the current node back onto the stack and returns null, allowing the AND to be reduced first.

Example: one OR two AND three produces Query(LogicalOr(Term(one), LogicalAnd(Term(two), Term(three)))) because AND is reduced before OR.

Sources: lib/Languages/Galach/Parser.php307-391 tests/Galach/IntegrationTest.php104-148

Stack State Machine

The parser's behavior is determined by the state of its SplStack. The stack contains a mix of Token and Node objects that represent the parse state:


Stack inspection methods guide parser decisions:

  • isTopStackToken($typeMask) line 508-511 checks if stack top is a token matching the mask
  • isToken($token, $typeMask) line 495-506 validates token type

Sources: lib/Languages/Galach/Parser.php148-151 lib/Languages/Galach/Parser.php495-511

Initialization: Group Delimiter Matching

The init() method line 453-459 calls cleanupGroupDelimiters() line 592-613 to remove unmatched parentheses before parsing begins:


The algorithm uses two tracking arrays line 615-638:

  • $trackLeft accumulates left delimiter indexes
  • $trackRight accumulates unmatched right delimiter indexes
  • When a right delimiter is encountered with a pending left, they match and the left is popped
  • Remaining indexes in both arrays represent unmatched delimiters

Sources: lib/Languages/Galach/Parser.php453-459 lib/Languages/Galach/Parser.php592-638

Reduction Method Details

Each reduction method checks specific stack conditions and returns either a new node (reduction succeeded) or the original node (no reduction):

reducePreference() line 307-320:

  • Checks if stack top is TOKEN_MANDATORY or TOKEN_PROHIBITED
  • Pops token and wraps node in Mandatory or Prohibited node
  • Returns wrapped node or original node

reduceLogicalNot() line 322-335:

  • Checks if stack top is TOKEN_LOGICAL_NOT or TOKEN_LOGICAL_NOT_2
  • Special case: if node is already Mandatory or Prohibited, ignores NOT operators via ignoreLogicalNotOperatorsPrecedingPreferenceOperator() line 337-347
  • Otherwise pops token and wraps in LogicalNot node

reduceLogicalAnd() line 349-359:

  • Requires stack count > 1 and stack top is TOKEN_LOGICAL_AND
  • Pops token and left operand from stack
  • Creates LogicalAnd(leftOperand, node, token)

reduceLogicalOr() line 369-391:

  • Requires stack count > 1 and stack top is TOKEN_LOGICAL_OR
  • If not in group: checks if next token is AND; if so, pushes node back and returns null (lets AND be reduced first)
  • Pops token and left operand
  • Creates LogicalOr(leftOperand, node, token)

reduceGroup() line 393-415:

  • Pops TOKEN_GROUP_END from stack
  • Pops all tokens except TOKEN_GROUP_BEGIN via popTokens(~TOKEN_GROUP_BEGIN) line 398
  • If next is TOKEN_GROUP_BEGIN, empty group detected - ignore via ignoreEmptyGroup() line 402-403
  • Otherwise: reduces remaining OR, collects all nodes from stack top, pops left delimiter
  • Sets group properties and returns

Sources: lib/Languages/Galach/Parser.php307-415

Token Type Shortcuts

The parser uses bitmask shortcuts defined in $tokenShortcuts line 78-87 for efficient token type checking:

ShortcutToken TypesUsage
operatorNotTOKEN_LOGICAL_NOT | TOKEN_LOGICAL_NOT_2Checking for any NOT variant
operatorPreferenceTOKEN_MANDATORY | TOKEN_PROHIBITEDChecking for +/- operators
operatorPrefixMANDATORY | PROHIBITED | LOGICAL_NOT_2Operators that can precede terms
operatorUnaryAll unary operators (NOT, +, -, !)Checking for any unary operator
operatorBinaryTOKEN_LOGICAL_AND | TOKEN_LOGICAL_ORChecking for binary operators
operatorAll operator tokensChecking for any operator
groupDelimiterTOKEN_GROUP_BEGIN | TOKEN_GROUP_ENDChecking for parentheses
binaryOperatorAndWhitespaceAND | OR | WHITESPACEUsed in ignoreFollowingOperators() line 558-570

These are used extensively in validation methods throughout the parser for efficient bitwise token type checking.

Sources: lib/Languages/Galach/Parser.php78-87

Final Query Assembly

The reduceQuery() method line 472-483 performs final cleanup and creates the root Query node:

  1. popTokens() line 474 - Removes any dangling tokens from stack (generates corrections for missing operands)
  2. reduceRemainingLogicalOr() line 475 - Performs final OR reduction if stack top is a node
  3. Collect nodes line 478-480 - Pops all remaining items from stack (should all be nodes at this point)
  4. Create Query line 482 - Wraps collected nodes in new Query($nodes)
  5. Push back - Pushes Query node as sole stack item

The resulting stack contains exactly one Query node which becomes the SyntaxTree root.

Sources: lib/Languages/Galach/Parser.php472-483

Parse Example: one AND NOT two

Detailed trace of stack and token operations:

StepActiontokens[]stackResult
1init()[Word(one), AND, NOT, Word(two)][]Initialize
2shift() Word(one)[AND, NOT, Word(two)][]Returns Term(one)
3reduce(Term(one))[AND, NOT, Word(two)][]Try reductions, none apply
4(reduction complete)[AND, NOT, Word(two)][Term(one)]Push to stack
5shift() AND[NOT, Word(two)][Term(one)]Validate & push
6(returns null)[NOT, Word(two)][Term(one), AND]No node to reduce
7shift() NOT[Word(two)][Term(one), AND]Push token
8(returns null)[Word(two)][Term(one), AND, NOT]No node to reduce
9shift() Word(two)[][Term(one), AND, NOT]Returns Term(two)
10reduce(Term(two))[][Term(one), AND, NOT]Try reductions
11reduceLogicalNot()[][Term(one), AND]Pops NOT, creates LogicalNot(Term(two))
12reduceLogicalAnd()[][Term(one)]Pops AND & Term(one), creates LogicalAnd(Term(one), LogicalNot(Term(two)))
13(reduction complete)[][LogicalAnd(...)]Push to stack
14reduceQuery()[][]Collects [LogicalAnd(...)], creates Query([LogicalAnd(...)])
15Return[][Query(...)]Returns SyntaxTree

Sources: tests/Galach/IntegrationTest.php182-201

Integration Pattern

Typical usage sequence showing interaction with Tokenizer and Generators:


The SyntaxTree object contains:

  • rootNode: The Query node (or other root)
  • tokenSequence: Reference to original TokenSequence
  • corrections: Array of Correction objects applied during parsing

Sources: tests/Galach/IntegrationTest.php35-36 lib/Values/SyntaxTree.php

Error Correction Types

The Parser defines several types of corrections that it applies when encountering problematic input:

Correction TypeConstantDescription
Adjacent Unary OperatorCORRECTION_ADJACENT_UNARY_OPERATOR_PRECEDING_OPERATOR_IGNOREDParser ignored adjacent unary operator preceding another operator
Missing OperandCORRECTION_UNARY_OPERATOR_MISSING_OPERAND_IGNOREDParser ignored unary operator missing an operand
Missing Left OperandCORRECTION_BINARY_OPERATOR_MISSING_LEFT_OPERAND_IGNOREDParser ignored binary operator missing left side operand
Missing Right OperandCORRECTION_BINARY_OPERATOR_MISSING_RIGHT_OPERAND_IGNOREDParser ignored binary operator missing right side operand
Operator Following OperatorCORRECTION_BINARY_OPERATOR_FOLLOWING_OPERATOR_IGNOREDParser ignored binary operator following another operator
NOT Before PreferenceCORRECTION_LOGICAL_NOT_OPERATORS_PRECEDING_PREFERENCE_IGNOREDParser ignored logical not operators preceding mandatory or prohibited operator
Empty GroupCORRECTION_EMPTY_GROUP_IGNOREDParser ignored empty group and connecting operators
Unmatched Left DelimiterCORRECTION_UNMATCHED_GROUP_LEFT_DELIMITER_IGNOREDParser ignored unmatched left side group delimiter
Unmatched Right DelimiterCORRECTION_UNMATCHED_GROUP_RIGHT_DELIMITER_IGNOREDParser ignored unmatched right side group delimiter
Bailout TokenCORRECTION_BAILOUT_TOKEN_IGNOREDParser ignored bailout type token

Sources: lib/Languages/Galach/Parser.php28-76

Conclusion

The Parser component is a critical part of the QueryTranslator system. It employs a shift-reduce parsing algorithm to transform token sequences into structured syntax trees that represent queries in the Galach language. The Parser handles various operator types, grouping structures, and implements robust error correction mechanisms to handle invalid or problematic input.

For more information about token extraction and tokenization, see Tokenization. For details on how the syntax tree is used to generate backend queries, see Query Generation.