
URL: https://deepwiki.com/netgen/query-translator

netgen/query-translator | DeepWiki



Overview

Purpose and Scope

This document introduces Query Translator, a library that converts user-entered search strings into backend-specific query formats by way of an Abstract Syntax Tree (AST). It covers the library's purpose, architecture, key features, and role as a foundation for building user-level query languages.

For detailed information about the compiler-style architecture, see Architecture. For information about the reference query language implementation, see The Galach Language. For customization and extension mechanisms, see Customization and Extension.

Sources: README.md 1-74, composer.json 1-53


What is Query Translator?

Query Translator is a PHP library that translates search queries through an Abstract Syntax Tree representation. It takes a search string entered by a user and converts it into a format that a search backend can understand.

The library provides:

  • Lexical analysis: Breaking query strings into tokens
  • Syntax analysis: Parsing tokens into an Abstract Syntax Tree (SyntaxTree)
  • Code generation: Converting the AST into backend-specific query formats

The library is designed as a foundational component for building user-level query languages, not as a complete solution for specific use cases. It provides the infrastructure and extension points that developers can use to implement their own query translation requirements.

Sources: README.md 10-50, composer.json 3-14


Core Concepts

Abstract Syntax Tree (AST) Representation

The library produces a SyntaxTree as an intermediate representation. This AST serves as the central pivot point that separates parsing from generation:

Input Query String → Tokens → SyntaxTree → Backend-Specific Output

The AST representation enables:

  • Multi-backend support: One parsed query can generate output for multiple backends without re-parsing
  • Query analysis: The tree structure can be analyzed and manipulated before generation
  • Extensibility: Custom backends can be added by implementing new generators
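
To make the pivot role of the tree concrete, here is a minimal sketch in PHP. It does not use the library's classes: the array-based tree and the functions `renderInfix` and `renderStructured` are hypothetical stand-ins, and the two output shapes are illustrative rather than real Solr or Elasticsearch syntax.

```php
<?php
// Hypothetical tree for the query: title:search AND (php OR python).
// Plain arrays stand in for the library's Node objects.
$tree = [
    'type' => 'and',
    'children' => [
        ['type' => 'term', 'domain' => 'title', 'word' => 'search'],
        ['type' => 'or', 'children' => [
            ['type' => 'term', 'domain' => null, 'word' => 'php'],
            ['type' => 'term', 'domain' => null, 'word' => 'python'],
        ]],
    ],
];

// Renderer 1: an infix string, similar in spirit to Lucene-style syntax.
function renderInfix(array $node): string
{
    if ($node['type'] === 'term') {
        $prefix = $node['domain'] === null ? '' : $node['domain'] . ':';

        return $prefix . $node['word'];
    }

    $parts = array_map('renderInfix', $node['children']);

    return '(' . implode(' ' . strtoupper($node['type']) . ' ', $parts) . ')';
}

// Renderer 2: a nested array, as a JSON-based backend might expect.
function renderStructured(array $node): array
{
    if ($node['type'] === 'term') {
        $field = $node['domain'] ?? '_all'; // '_all' is an assumed default field

        return ['match' => [$field => $node['word']]];
    }

    return [$node['type'] => array_map('renderStructured', $node['children'])];
}

// The tree is built once and rendered twice; nothing is parsed again.
$infix = renderInfix($tree);           // (title:search AND (php OR python))
$structured = renderStructured($tree);
```

Because both renderers read the same tree, supporting another backend in this sketch means writing one more function, not another parser.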

Robust Error Handling

The parser never rejects input: no query string is treated as invalid. Instead:

  • The parser applies corrections to produce a valid SyntaxTree
  • Correction information is preserved in the tree
  • This enables graceful degradation and rich UI features (syntax highlighting, error feedback, suggestions)
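
The idea of correcting input while recording what was corrected can be sketched as follows. This is an illustration only, not the library's correction system: the function `correctParentheses`, the flat token list, and the shape of the correction records are all assumptions made for the example.

```php
<?php
// Hypothetical sketch: balance parentheses instead of rejecting the query.
// Returns the kept tokens plus a list of corrections that a UI could display.
function correctParentheses(array $tokens): array
{
    $kept = [];        // list of [original position, token]
    $open = [];        // indexes into $kept of "(" tokens not yet closed
    $corrections = [];

    foreach ($tokens as $position => $token) {
        if ($token === ')') {
            if (count($open) === 0) {
                // Nothing to close: drop the token and record why.
                $corrections[] = ['position' => $position, 'note' => 'unmatched ")" ignored'];
                continue;
            }
            array_pop($open);
        } elseif ($token === '(') {
            $open[] = count($kept);
        }

        $kept[] = [$position, $token];
    }

    // Drop every "(" that was never closed, last one first so that
    // the remaining indexes stay valid.
    foreach (array_reverse($open) as $index) {
        $corrections[] = ['position' => $kept[$index][0], 'note' => 'unmatched "(" ignored'];
        array_splice($kept, $index, 1);
    }

    return [
        'tokens' => array_column($kept, 1),
        'corrections' => $corrections,
    ];
}

$result = correctParentheses(['one', ')', '(', 'two', 'OR', 'three']);
// $result['tokens'] is ['one', 'two', 'OR', 'three'];
// $result['corrections'] holds two entries, for positions 1 and 2.
```

Keeping the original positions in the correction records is what would let an interface underline the offending characters instead of only reporting that something was changed.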

Multi-Backend Support

The library supports multiple output formats out of the box:

  • Native Galach: Regenerates the query in Galach format
  • Solr ExtendedDisMax: Generates queries for Solr's Extended DisMax parser
  • Elasticsearch QueryString: Generates queries for Elasticsearch's Query String Query

Sources: README.md 10-30, README.md 37-50


High-Level Architecture

Query Translator follows a classic compiler architecture with three distinct phases: lexical analysis, syntax analysis, and code generation.


Diagram: Query Translation Pipeline with Component Names

This architecture provides clear separation of concerns:

  • Lexical Analysis handles character-level processing and tokenization
  • Syntax Analysis constructs the AST with error correction
  • Code Generation produces backend-specific output using the Visitor pattern

Sources: README.md 10-16


Repository Structure

The codebase is organized to separate interfaces, implementations, and testing:


Diagram: Repository Structure with Key Directories

Directory               Purpose
lib/                    Production code implementing the query translator
lib/Values/             Value objects representing Tokens and Nodes
lib/Languages/Galach/   Reference implementation of the Galach query language
tests/                  PHPUnit test suite mirroring the lib/ structure
vendor/                 Composer-managed dependencies

Sources: composer.json 33-41, .php_cs.dist 1-32


Key Components

TokenExtractor

An abstract class providing the primary extension point for customizing lexical rules. It defines regex patterns that the Tokenizer uses to recognize tokens.

Two implementations are provided:

  • Full TokenExtractor: Complete syntax support including tags, users, and domain prefixes
  • Text TokenExtractor: Simplified subset for basic text queries

Sources: README.md 33-35

Tokenizer

Processes the input string using patterns from TokenExtractor to produce a TokenSequence. Handles multi-byte character offsets, and emits BAILOUT tokens for input that no pattern recognizes.
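
A regex-driven tokenizer of this kind can be sketched in a few lines. The sketch below is not the library's Tokenizer: the `TOKEN_PATTERNS` table, the token array shape, and the `tokenize` function are hypothetical, and the patterns cover only a small part of the Galach syntax. It assumes valid UTF-8 input.

```php
<?php
// Hypothetical pattern table; order matters, so LOGICAL is tried before WORD.
// \G anchors each pattern at the current offset. BAILOUT is the catch-all.
const TOKEN_PATTERNS = [
    'WHITESPACE'  => '/\G\s+/u',
    'GROUP_BEGIN' => '/\G\(/u',
    'GROUP_END'   => '/\G\)/u',
    'LOGICAL'     => '/\G(?:AND|OR|NOT)\b/u',
    'WORD'        => '/\G[\p{L}\p{N}_]+/u',
    'BAILOUT'     => '/\G./su',
];

function tokenize(string $input): array
{
    $tokens = [];
    $byteOffset = 0; // preg_match() offsets are measured in bytes
    $charOffset = 0; // positions reported to the caller are in characters
    $length = strlen($input);

    while ($byteOffset < $length) {
        // Last resort if nothing matches (only possible with invalid UTF-8):
        // consume a single byte so the loop always makes progress.
        $type = 'BAILOUT';
        $lexeme = $input[$byteOffset];

        foreach (TOKEN_PATTERNS as $candidate => $pattern) {
            if (preg_match($pattern, $input, $match, 0, $byteOffset) === 1) {
                $type = $candidate;
                $lexeme = $match[0];
                break;
            }
        }

        $tokens[] = ['type' => $type, 'lexeme' => $lexeme, 'position' => $charOffset];

        $byteOffset += strlen($lexeme);
        // Count characters, not bytes, so multi-byte input keeps correct positions.
        $charOffset += (int) preg_match_all('/./su', $lexeme);
    }

    return $tokens;
}

// "café" is 4 characters but 5 bytes, so byte and character offsets diverge.
$tokens = tokenize("caf\u{e9}" . ' AND $x');
```

In this input the `$` character matches none of the specific patterns, so it becomes a BAILOUT token rather than causing a failure, and the `AND` token is reported at character position 5 even though it starts at byte 6.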

Parser

Implements a shift-reduce parsing algorithm that constructs a SyntaxTree from a TokenSequence. Includes the correction system that ensures all input produces valid output.

Sources: README.md 26-29

SyntaxTree

The Abstract Syntax Tree representation consisting of Node objects organized hierarchically. Contains:

  • Node types: Query, Term, Group, LogicalAnd, LogicalOr, LogicalNot, Mandatory, Prohibited
  • Correction information: Details about syntax corrections applied during parsing

Generators

Use the Visitor pattern to traverse the SyntaxTree and produce backend-specific output:

  • Native Generator: Produces Galach format with proper escaping
  • ExtendedDisMax Generator: Produces Solr ExtendedDisMax queries with field mapping
  • QueryString Generator: Produces Elasticsearch QueryString queries
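
The Visitor-based generation described above can be illustrated with a small self-contained sketch. The classes `Term`, `LogicalOr`, `NodeVisitor`, `TermVisitor`, `LogicalOrVisitor`, and `AggregateVisitor` are hypothetical and only mirror the general pattern; the library's own visitor classes and signatures may differ.

```php
<?php
// Hypothetical node classes standing in for the library's Node types.
class Term
{
    public $word;

    public function __construct(string $word)
    {
        $this->word = $word;
    }
}

class LogicalOr
{
    public $left;
    public $right;

    public function __construct($left, $right)
    {
        $this->left = $left;
        $this->right = $right;
    }
}

// A visitor states which node it handles and renders that node,
// delegating child nodes back to the dispatcher.
interface NodeVisitor
{
    public function accept($node): bool;

    public function visit($node, NodeVisitor $dispatcher): string;
}

class TermVisitor implements NodeVisitor
{
    public function accept($node): bool
    {
        return $node instanceof Term;
    }

    public function visit($node, NodeVisitor $dispatcher): string
    {
        // Backslash-escape characters that would otherwise act as operators.
        return addcslashes($node->word, '+-!():"\\');
    }
}

class LogicalOrVisitor implements NodeVisitor
{
    public function accept($node): bool
    {
        return $node instanceof LogicalOr;
    }

    public function visit($node, NodeVisitor $dispatcher): string
    {
        $left = $dispatcher->visit($node->left, $dispatcher);
        $right = $dispatcher->visit($node->right, $dispatcher);

        return '(' . $left . ' OR ' . $right . ')';
    }
}

// Routes each node to the first registered visitor that accepts it.
class AggregateVisitor implements NodeVisitor
{
    private $visitors;

    public function __construct(array $visitors)
    {
        $this->visitors = $visitors;
    }

    public function accept($node): bool
    {
        return true; // unsupported nodes fail in visit() instead
    }

    public function visit($node, NodeVisitor $dispatcher): string
    {
        foreach ($this->visitors as $visitor) {
            if ($visitor->accept($node)) {
                return $visitor->visit($node, $this);
            }
        }

        throw new LogicException('No visitor accepts ' . get_class($node));
    }
}

$generator = new AggregateVisitor([new TermVisitor(), new LogicalOrVisitor()]);
$tree = new LogicalOr(new Term('c++'), new Term('java'));
$output = $generator->visit($tree, $generator); // (c\+\+ OR java)
```

In this arrangement, targeting a different backend means registering a different set of visitors, while the node classes stay unchanged.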

Sources: README.md 14-16


The Galach Language

Galach is the reference query language implementation provided by this library. Its syntax follows what appears to be the unofficial standard for user-entered search queries and is similar to the Lucene Query Parser syntax.

Syntax Overview

Feature         Syntax         Example
Word term       word           search
Phrase term     "phrase"       "hello world"
Grouping        (expression)   (one OR two)
Mandatory       +term          +required
Prohibited      -term          -unwanted
Logical AND     AND or &&      one AND two
Logical OR      OR or ||       one OR two
Logical NOT     NOT or !       NOT excluded
Tag             #tag           #important
User            @user          @john
Domain prefix   domain:term    title:search

The parser handles operator precedence and implicit AND operations, and it gracefully corrects syntax errors.

For complete syntax documentation, see Syntax and Features. For parser implementation details, see Parser. For error correction mechanisms, see Error Handling and Corrections.

Sources: README.md 17-22


Use Cases

The library is designed as a foundation for implementing various query translation scenarios:

User-Level Query Language

Build a user-friendly query language on top of your search backend, providing features like tags, user mentions, and natural syntax while translating to the backend's native format.

Common Query Language

Implement a single query syntax that works across multiple search backends (Solr, Elasticsearch, databases), abstracting away backend-specific differences.

Query Language Control

Take control over query language options that are provided by search backends, customizing which features are available and how they work.

Enhanced Error Handling

Provide better error handling and user feedback than what search backends offer natively, using the correction system to guide users.

Query Analysis and Manipulation

Analyze and transform queries before sending them to the backend, implementing features like query suggestions, auto-correction, or security filtering.

Rich Input Interface

Implement rich input features such as syntax highlighting, error feedback, inline suggestions, and query building UI components using the AST and correction information.

Sources: README.md 37-45


Component Relationships


Diagram: Component Relationships with File Paths

This diagram shows how the major components interact and where they are located in the codebase. The flow moves from top to bottom through the three main phases:

  1. Tokenization: TokenExtractor and Tokenizer produce Token objects in a TokenSequence
  2. Parsing: Parser consumes TokenSequence to produce SyntaxTree containing Node objects
  3. Generation: Visitor implementations traverse the SyntaxTree to generate output

Sources: composer.json 34-36


Getting Started

Installation

Add the library to your project using Composer:

    composer require netgen/query-translator


Requirements

  • PHP 7.0 or higher (also compatible with PHP 8.x)

Basic Usage Flow

  1. Choose a TokenExtractor implementation (Full or Text) based on your syntax needs
  2. Create a Tokenizer with the chosen TokenExtractor
  3. Tokenize the input string to produce a TokenSequence
  4. Parse the TokenSequence to produce a SyntaxTree
  5. Generate output using one of the provided generators or implement your own

For detailed usage examples and customization options, refer to the Galach documentation at lib/Languages/Galach/.

Customization

The library provides four primary extension points:

  • TokenExtractor: Customize lexical rules and token types
  • Visitor: Implement custom code generation for new backends
  • Field Mapping: Configure backend-specific field name translations
  • Term Clauses: Define custom term clause implementations
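
As an illustration of the field mapping idea, the sketch below translates a user-facing domain prefix into a backend field name. The `$fieldMap` contents, the field names, the fallback behavior, and the function `renderMappedTerm` are assumptions made for the example and are not taken from the library's configuration format.

```php
<?php
// Hypothetical field map: user-facing domain => backend field name.
$fieldMap = [
    'title' => 'title_t',
    'author' => 'author_name_s',
];

// Render one term, falling back to a default field for unmapped domains.
function renderMappedTerm(string $domain, string $word, array $fieldMap, string $defaultField): string
{
    $field = isset($fieldMap[$domain]) ? $fieldMap[$domain] : $defaultField;

    return $field . ':' . $word;
}

$mapped = renderMappedTerm('title', 'search', $fieldMap, 'text');    // title_t:search
$fallback = renderMappedTerm('secret', 'search', $fieldMap, 'text'); // text:search
```

A map like this also acts as an allow-list: a domain that is not mapped never reaches the backend under its own name.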

For detailed customization guidance, see Customization and Extension.

Sources: README.md 52-62, composer.json 25-26


Demo Application

A separate demo application is available at netgen/query-translator-demo that showcases the library's features with an interactive web interface.

Sources: README.md 64-73


Development Infrastructure

The project maintains high code quality through:

Component      Purpose                                          Configuration File
PHPUnit        Unit testing framework                           phpunit.xml
PHP-CS-Fixer   Code style enforcement (Symfony standards)       .php_cs.dist
Codecov        Code coverage reporting                          GitHub Actions integration
Composer       Dependency management and PSR-4 autoloading      composer.json

The GitHub Actions CI/CD pipeline tests across PHP versions 7.0 through 8.5, ensuring broad compatibility.

For detailed information about testing and development workflows, see Testing and Development.

Sources: composer.json 28-31, .php_cs.dist 1-32