URL: https://deepwiki.com/netgen/query-translator/3-the-galach-language

The Galach Language

Galach is the reference query language implementation in the QueryTranslator library. It implements an "unofficial standard" syntax that resembles the Lucene Query Parser syntax used by Solr and Elasticsearch, making it familiar to users of modern search interfaces.

The language is named after the common language from Frank Herbert's Dune universe, reflecting its role as the shared lingua franca for search queries across different backends.

Design Philosophy

Galach is designed as a user-facing query language that:

  1. Follows established conventions: Based on syntax patterns found in popular search engines
  2. Resembles Lucene Query Parser: Similar to the syntax used by Solr and Elasticsearch
  3. Provides error tolerance: No input is considered invalid; corrections are applied gracefully
  4. Enables customization: Extensible through the TokenExtractor abstraction
  5. Supports multiple backends: Single AST representation translates to different search engines

The language serves as both a production-ready query parser and a reference implementation demonstrating the QueryTranslator framework's capabilities.

Language Implementation Structure

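Galach is implemented in the lib/Languages/Galach directory. The key components referenced on this page are:

  • Tokenizer.php: lexical analysis and the token type constants
  • TokenExtractor.php, TokenExtractor/Full.php, TokenExtractor/Text.php: token extraction
  • Parser.php: syntax analysis and the correction type constants
  • Generators/Native.php, Generators/ExtendedDisMax.php, Generators/QueryString.php: output generation
  • Values/Token/: token value objects
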
Sources: lib/Languages/Galach/README.md (lines 1-63), README.md (lines 17-23)

Position Within QueryTranslator Framework


Galach demonstrates the framework's design by providing concrete implementations of all extension points. The Tokenizer and Parser classes are marked final, showing that customization happens through the TokenExtractor abstraction rather than inheritance.

Sources: lib/Languages/Galach/README.md (lines 241-436), lib/Languages/Galach/README.md (lines 422-427)

Syntax Overview

Galach syntax is based on conventions familiar from popular search engines and the Lucene Query Parser. The quick reference:

word "phrase" (group) +mandatory -prohibited AND && OR || NOT ! #tag @user domain:term

Example Query:

cheese AND (bacon OR eggs) +type:breakfast -@spamuser

This query matches documents that contain "cheese" and either "bacon" or "eggs", requires the term "breakfast" in the "type" domain, and excludes content from the user "spamuser".

Token Type Mapping

[Diagram: Token Types to Code Entities]


The token types are defined as constants in lib/Languages/Galach/Tokenizer.php (lines 18-32) and instantiated as specific token value objects in lib/Languages/Galach/Values/Token/.

Operator Precedence

Operators are applied in the following order:

  1. Unary operators (highest precedence): NOT, !, +, -
  2. Logical AND: AND, &&
  3. Logical OR (lowest precedence): OR, ||

Grouping with parentheses () overrides the default precedence.
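
The precedence rules above can be sketched in a few lines. This is a minimal illustration in Python, not the library's code: it uses recursive descent rather than the shift-reduce algorithm of the real Parser, it assumes well-formed input (no corrections are applied), and the node labels (`NOT`, `MANDATORY`, `PROHIBITED`, `AND`, `OR`) are names chosen for this example.

```python
import re

# Illustrative token pattern: group delimiters, symbolic operators, words.
TOKEN_RE = re.compile(r"\(|\)|&&|\|\||[!+-]|\w+")

# Unary operators bind tightest; labels are hypothetical, chosen for this sketch.
UNARY = {"NOT": "NOT", "!": "NOT", "+": "MANDATORY", "-": "PROHIBITED"}


def parse(query):
    """Parse a well-formed query into nested tuples that reflect precedence."""
    tokens = TOKEN_RE.findall(query)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        return token

    def parse_or():  # lowest precedence
        node = parse_and()
        while peek() in ("OR", "||"):
            advance()
            node = ("OR", node, parse_and())
        return node

    def parse_and():  # binds tighter than OR
        node = parse_unary()
        while peek() in ("AND", "&&"):
            advance()
            node = ("AND", node, parse_unary())
        return node

    def parse_unary():  # highest precedence
        if peek() in UNARY:
            return (UNARY[advance()], parse_unary())
        return parse_primary()

    def parse_primary():
        token = advance()
        if token == "(":
            node = parse_or()  # parentheses restart at the lowest level
            advance()          # consume the closing ")"
            return node
        return token

    return parse_or()
```

With this sketch, `parse("one OR two AND three")` yields `("OR", "one", ("AND", "two", "three"))`, while `parse("(one OR two) AND three")` yields `("AND", ("OR", "one", "two"), "three")`.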

Detailed syntax documentation is provided in Syntax and Features.

Sources: lib/Languages/Galach/README.md (lines 18-24), lib/Languages/Galach/SYNTAX.md (lines 1-229), README.md (lines 19-22)

Processing Pipeline

Galach processing follows a three-phase compiler architecture: lexical analysis (tokenization), syntax analysis (parsing), and code generation.

[Diagram: Complete Processing Flow]


Phase 1: Lexical Analysis

The Tokenizer class (lib/Languages/Galach/Tokenizer.php) breaks the input string into tokens using a TokenExtractor implementation. Two extractors are provided:

| Extractor | Path | Supported Features |
| --- | --- | --- |
| Full | lib/Languages/Galach/TokenExtractor/Full.php | Complete Galach syntax including domains, tags, users |
| Text | lib/Languages/Galach/TokenExtractor/Text.php | Text subset only: words, phrases, operators, grouping |

The tokenizer produces a TokenSequence (lib/Values/TokenSequence.php) containing all recognized tokens.
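
To make the lexical phase concrete, here is a rough Python sketch of regex-driven tokenization over the quick-reference syntax. It is an illustration only: the patterns and the token type names (`WORD`, `DOMAIN_TERM`, `BAILOUT`, and so on) are assumptions made for this example, not the constants or expressions used by the library, and whitespace is dropped here for brevity.

```python
import re

# Ordered (pattern, token type) pairs; earlier entries take priority.
# Both the patterns and the type names are illustrative assumptions.
PATTERNS = [
    (r"\s+", "WHITESPACE"),
    (r"AND\b|&&", "LOGICAL_AND"),
    (r"OR\b|\|\|", "LOGICAL_OR"),
    (r"NOT\b|!", "LOGICAL_NOT"),
    (r"\+", "MANDATORY"),
    (r"-", "PROHIBITED"),
    (r"\(", "GROUP_BEGIN"),
    (r"\)", "GROUP_END"),
    (r'"[^"]*"', "PHRASE"),
    (r"#\w+", "TAG"),
    (r"@\w+", "USER"),
    (r"\w+:\w+", "DOMAIN_TERM"),
    (r"\w+", "WORD"),
]
COMPILED = [(re.compile(pattern), token_type) for pattern, token_type in PATTERNS]


def tokenize(query):
    """Return (token type, lexeme) pairs, skipping whitespace."""
    tokens = []
    position = 0
    while position < len(query):
        for regex, token_type in COMPILED:
            match = regex.match(query, position)
            if match:
                lexeme = match.group(0)
                break
        else:
            # No pattern matched: emit the single character as a bailout token.
            lexeme, token_type = query[position], "BAILOUT"
        if token_type != "WHITESPACE":
            tokens.append((token_type, lexeme))
        position += len(lexeme)
    return tokens
```

Applied to the example query `cheese AND (bacon OR eggs) +type:breakfast -@spamuser`, this sketch produces eleven tokens, starting with `("WORD", "cheese")` and ending with `("USER", "@spamuser")`. An unterminated quote, as in `one "`, falls through to the `BAILOUT` branch.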

See Tokenization for detailed documentation.

Phase 2: Syntax Analysis

The Parser class (lib/Languages/Galach/Parser.php) implements a shift-reduce algorithm that constructs a SyntaxTree (lib/Values/SyntaxTree.php) from the token sequence. The parser creates a hierarchy of Node objects representing the abstract syntax tree.

The parser applies corrections for any invalid input, ensuring every query produces a valid AST. Correction information is preserved in the SyntaxTree.

See Parser for implementation details and Error Handling and Corrections for correction types.

Phase 3: Code Generation

Generators traverse the SyntaxTree using the Visitor pattern and produce backend-specific output:

| Generator | Path | Target |
| --- | --- | --- |
| Native | lib/Languages/Galach/Generators/Native.php | Galach format (for cleanup/normalization) |
| ExtendedDisMax | lib/Languages/Galach/Generators/ExtendedDisMax.php | Solr Extended DisMax Query Parser |
| QueryString | lib/Languages/Galach/Generators/QueryString.php | Elasticsearch Query String Query |

Each generator is composed of multiple visitor implementations, one for each node type in the syntax tree.
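
The idea of a generator assembled from per-node visitors can be sketched as follows. This Python example is a simplified illustration under stated assumptions: the node classes (`Term`, `Not`, `And`, `Or`, `Group`), the `Generator` class, and both output styles are hypothetical and do not reproduce the library's node types or the exact output of its generators.

```python
from dataclasses import dataclass


# Hypothetical node types for a tiny syntax tree.
@dataclass
class Term:
    value: str


@dataclass
class Not:
    operand: object


@dataclass
class And:
    left: object
    right: object


@dataclass
class Or:
    left: object
    right: object


@dataclass
class Group:
    inner: object


class Generator:
    """Dispatches each node to the visitor registered for its type."""

    def __init__(self, visitors):
        self.visitors = visitors  # maps node class -> callable(node, generator)

    def generate(self, node):
        return self.visitors[type(node)](node, self)


# Two generators built from different visitor sets over the same tree.
keyword_style = Generator({
    Term: lambda node, gen: node.value,
    Not: lambda node, gen: "NOT " + gen.generate(node.operand),
    And: lambda node, gen: f"{gen.generate(node.left)} AND {gen.generate(node.right)}",
    Or: lambda node, gen: f"{gen.generate(node.left)} OR {gen.generate(node.right)}",
    Group: lambda node, gen: f"({gen.generate(node.inner)})",
})

symbol_style = Generator({
    Term: lambda node, gen: node.value,
    Not: lambda node, gen: "!" + gen.generate(node.operand),
    And: lambda node, gen: f"{gen.generate(node.left)} && {gen.generate(node.right)}",
    Or: lambda node, gen: f"{gen.generate(node.left)} || {gen.generate(node.right)}",
    Group: lambda node, gen: f"({gen.generate(node.inner)})",
})

tree = And(Term("cheese"), Group(Or(Term("bacon"), Term("eggs"))))
```

Here `keyword_style.generate(tree)` returns `cheese AND (bacon OR eggs)` and `symbol_style.generate(tree)` returns `cheese && (bacon || eggs)`. Supporting another backend in this sketch means supplying another visitor set; the tree itself is unchanged.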

See Query Generation for detailed documentation.

Sources: lib/Languages/Galach/README.md (lines 32-50), lib/Languages/Galach/README.md (lines 437-473)

Error Handling Philosophy

Galach implements a key design principle: no input is considered invalid. The parser is completely resistant to errors and will produce a valid SyntaxTree from any input string.

When invalid syntax is encountered, the parser applies corrections to produce valid output while preserving the user's intended meaning as much as possible. All applied corrections are recorded in the SyntaxTree object and available for inspection.

[Diagram: Error Correction Flow]


This approach makes Galach suitable for user-facing search interfaces where graceful degradation is essential. The correction information can be used to:

  • Clean up user input by regenerating with the Native generator
  • Provide syntax highlighting in rich input interfaces
  • Display error feedback to users
  • Log common syntax errors for analysis

Correction Types

Galach defines 10 correction types as constants in lib/Languages/Galach/Parser.php (lines 12-21):

| Constant | Example Input | Corrected Result |
| --- | --- | --- |
| CORRECTION_ADJACENT_UNARY_OPERATOR_PRECEDING_OPERATOR_IGNORED | ++one | +one |
| CORRECTION_UNARY_OPERATOR_MISSING_OPERAND_IGNORED | one NOT | one |
| CORRECTION_BINARY_OPERATOR_MISSING_LEFT_OPERAND_IGNORED | AND two | two |
| CORRECTION_BINARY_OPERATOR_MISSING_RIGHT_OPERAND_IGNORED | one AND | one |
| CORRECTION_BINARY_OPERATOR_FOLLOWING_OPERATOR_IGNORED | one AND OR two | one two |
| CORRECTION_LOGICAL_NOT_OPERATORS_PRECEDING_PREFERENCE_IGNORED | NOT +one | +one |
| CORRECTION_EMPTY_GROUP_IGNORED | one AND () | one |
| CORRECTION_UNMATCHED_GROUP_LEFT_DELIMITER_IGNORED | one ( | one |
| CORRECTION_UNMATCHED_GROUP_RIGHT_DELIMITER_IGNORED | ) one | one |
| CORRECTION_BAILOUT_TOKEN_IGNORED | one " | one |

Each correction is represented by a Correction value object (lib/Values/Correction.php) containing the correction type and the affected tokens.
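
As an illustration of recording a correction together with the affected token, the following Python sketch handles only the two unmatched-delimiter cases from the table. It is a stand-alone pass over a plain token list written for this example; the real parser applies corrections as part of its shift-reduce processing, and the function name and the `(type, token)` tuple shape are assumptions.

```python
def drop_unmatched_delimiters(tokens):
    """Remove unmatched "(" and ")" tokens and report what was ignored."""
    corrections = []      # (correction type, affected token) pairs
    kept = []             # tokens that survive the pass
    open_positions = []   # indexes into `kept` of "(" still awaiting ")"

    for token in tokens:
        if token == "(":
            open_positions.append(len(kept))
            kept.append(token)
        elif token == ")":
            if open_positions:
                open_positions.pop()  # matched with the most recent "("
                kept.append(token)
            else:
                corrections.append(
                    ("CORRECTION_UNMATCHED_GROUP_RIGHT_DELIMITER_IGNORED", token)
                )
        else:
            kept.append(token)

    # Any "(" left open never found its ")"; delete from the end so that
    # earlier indexes stay valid.
    for index in reversed(open_positions):
        corrections.append(
            ("CORRECTION_UNMATCHED_GROUP_LEFT_DELIMITER_IGNORED", kept[index])
        )
        del kept[index]

    return kept, corrections
```

For the table's examples, `drop_unmatched_delimiters(["one", "("])` and `drop_unmatched_delimiters([")", "one"])` both return `["one"]` as the kept tokens, each with a single recorded correction. A caller could use the recorded pairs for highlighting or user feedback.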

Detailed correction behavior is documented in Error Handling and Corrections.

Sources: lib/Languages/Galach/README.md (lines 114-240), README.md (lines 24-29)

Customization

Galach provides customization through the TokenExtractor abstraction. The Tokenizer and Parser classes are marked final, meaning customization happens at the lexical analysis phase rather than through inheritance.

Customization Capabilities

  1. Change special characters: Modify operators (AND, OR, +, -, etc.) and delimiters ((, ), ")
  2. Enable/disable features: Use only a subset of the full syntax (e.g., disable domains, tags, users)
  3. Implement custom term tokens: Create specialized token types with custom extraction logic
  4. Modify domain prefix syntax: Change how domain prefixes are recognized

TokenExtractor Extension Point

[Diagram: TokenExtractor Class Hierarchy]


The abstract TokenExtractor class (lib/Languages/Galach/TokenExtractor.php) requires implementing:

  • getExpressionTypeMap(): Returns a map of regex patterns to Tokenizer::TOKEN_* constants
  • createTermToken(): Creates token instances from matched regex data
  • createGroupBeginToken(): (Optional) Customizes TOKEN_GROUP_BEGIN token creation

The Full and Text extractors serve as reference implementations demonstrating different feature sets.
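
The extension point can be pictured as an abstract class whose subclasses supply only the pattern map. The Python sketch below is a loose analogy, not a port of the PHP class: `TokenExtractorSketch`, `WordsOnly`, `GermanOperators`, the method names, and the token type strings are all hypothetical and exist only to show how swapping the map changes the recognized syntax without touching the tokenizing loop.

```python
import re
from abc import ABC, abstractmethod


class TokenExtractorSketch(ABC):
    """Hypothetical base class: subclasses only provide the pattern map."""

    @abstractmethod
    def expression_type_map(self):
        """Return {regex: token type}; insertion order sets priority."""

    def extract(self, query, position):
        for pattern, token_type in self.expression_type_map().items():
            match = re.compile(pattern).match(query, position)
            if match:
                return token_type, match.group(0)
        # Fallback for a character that no pattern recognizes.
        return "BAILOUT", query[position]


class WordsOnly(TokenExtractorSketch):
    """Reduced syntax: keyword operators and plain words only."""

    def expression_type_map(self):
        return {
            r"\s+": "WHITESPACE",
            r"AND\b": "LOGICAL_AND",
            r"OR\b": "LOGICAL_OR",
            r"\S+": "WORD",
        }


class GermanOperators(TokenExtractorSketch):
    """Same token types, different operator spellings."""

    def expression_type_map(self):
        return {
            r"\s+": "WHITESPACE",
            r"UND\b": "LOGICAL_AND",
            r"ODER\b": "LOGICAL_OR",
            r"\S+": "WORD",
        }


def tokenize(query, extractor):
    """Tokenizing loop that stays the same for every extractor."""
    tokens, position = [], 0
    while position < len(query):
        token_type, lexeme = extractor.extract(query, position)
        if token_type != "WHITESPACE":
            tokens.append((token_type, lexeme))
        position += len(lexeme)
    return tokens
```

In this sketch, `tokenize("eins UND zwei", GermanOperators())` recognizes `UND` as a logical AND, while `tokenize("one AND #tag", WordsOnly())` treats `#tag` as an ordinary word because the reduced map has no tag pattern.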

See Customization and Extension for detailed implementation guidance.

Sources: lib/Languages/Galach/README.md (lines 241-427), lib/Languages/Galach/TokenExtractor.php

Example Usage

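The Galach README walks through end-to-end usage of the language processor, following the three phases described in Processing Pipeline: tokenizing the query string, parsing the resulting token sequence, and generating output from the syntax tree.
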
Sources: lib/Languages/Galach/README.md (lines 51-112)