Last indexed: 13 February 2026 (50f4d4)

Overview

Purpose and Scope

This document introduces the Query Translator library, a search query translation system that converts user-entered query strings into backend-specific query formats by way of an Abstract Syntax Tree (AST) representation. This page covers the library's purpose, architecture, key features, and role as a foundational library for building user-level query languages.

For detailed information about the compiler-style architecture, see Architecture. For information about the reference query language implementation, see The Galach Language. For customization and extension mechanisms, see Customization and Extension.

Sources: README.md:1-74, composer.json:1-53

What is Query Translator?

Query Translator is a PHP library that translates search queries by way of an Abstract Syntax Tree representation. It takes a search string entered by a user and converts it into a format that search backends can understand.

The library provides:

Lexical analysis: Breaking query strings into tokens
Syntax analysis: Parsing tokens into an Abstract Syntax Tree (SyntaxTree)
Code generation: Converting the AST into backend-specific query formats

The library is designed as a foundational component for building user-level query languages, not as a complete solution for specific use cases. It provides the infrastructure and extension points that developers can use to implement their own query translation requirements.

Sources: README.md:10-50, composer.json:3-14

Core Concepts

Abstract Syntax Tree (AST) Representation

The library produces a SyntaxTree as an intermediate representation. This AST serves as the central pivot point that separates parsing from generation:

Input Query String → Tokens → SyntaxTree → Backend-Specific Output

The AST representation enables:

Multi-backend support: One parsed query can generate output for multiple backends without re-parsing
Query analysis: The tree structure can be analyzed and manipulated before generation
Extensibility: Custom backends can be added by implementing new generators

Robust Error Handling

The parser never rejects input: no query string is considered invalid. Instead of failing on malformed syntax:

The parser applies corrections to produce a valid SyntaxTree
Correction information is preserved in the tree
This enables graceful degradation and rich UI features (syntax highlighting, error feedback, suggestions)
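
The "correct instead of reject" idea can be sketched in a few lines. The example below is only an illustration under stated assumptions: `correctGroups()` and the shape of its correction records are hypothetical and not part of the library, and the real parser covers far more cases than unmatched parentheses.

```php
<?php

/**
 * Toy correction pass (hypothetical, not the library's parser): ignores
 * unmatched "(" and ")" tokens and records each fix instead of failing.
 *
 * @param string[] $tokens
 * @return array{tokens: string[], corrections: string[]}
 */
function correctGroups(array $tokens): array
{
    $output = [];
    $corrections = [];
    $open = []; // still-open "(" tokens: index in $output and original position

    foreach ($tokens as $position => $token) {
        if ($token === '(') {
            $open[] = ['index' => count($output), 'position' => $position];
        } elseif ($token === ')') {
            if ($open === []) {
                // Nothing to close: skip the token and remember why.
                $corrections[] = "ignored unmatched ')' at token {$position}";
                continue;
            }
            array_pop($open);
        }
        $output[] = $token;
    }

    // Any "(" still open at the end of input never got closed.
    foreach ($open as $unmatched) {
        unset($output[$unmatched['index']]);
        $corrections[] = "ignored unmatched '(' at token {$unmatched['position']}";
    }

    return ['tokens' => array_values($output), 'corrections' => $corrections];
}

$result = correctGroups([')', '(', 'one', 'OR', 'two']);
echo implode(' ', $result['tokens']), PHP_EOL; // one OR two
print_r($result['corrections']);
```

Because the corrections travel with the result, a UI could highlight the ignored tokens rather than showing a generic "invalid query" message.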

Multi-Backend Support

The library supports multiple output formats out of the box:

Native Galach: Regenerates the query in Galach format
Solr ExtendedDisMax: Generates queries for Solr's Extended DisMax parser
Elasticsearch QueryString: Generates queries for Elasticsearch's Query String Query

Sources: README.md:10-30, README.md:37-50

High-Level Architecture

The Query Translator follows a classic compiler architecture with three distinct phases:

Diagram: Query Translation Pipeline with Component Names

This architecture provides clear separation of concerns:

Lexical Analysis handles character-level processing and tokenization
Syntax Analysis constructs the AST with error correction
Code Generation produces backend-specific output using the Visitor pattern

Sources: README.md:10-16

Repository Structure

The codebase is organized to separate interfaces, implementations, and testing:

Diagram: Repository Structure with Key Directories

Directory	Purpose
`lib/`	Production code implementing the query translator
`lib/Values/`	Value objects representing Tokens and Nodes
`lib/Languages/Galach/`	Reference implementation of the Galach query language
`tests/`	PHPUnit test suite mirroring the lib/ structure
`vendor/`	Composer-managed dependencies

Sources: composer.json:33-41, .php_cs.dist:1-32

Key Components

TokenExtractor

An abstract class providing the primary extension point for customizing lexical rules. It defines regex patterns that the Tokenizer uses to recognize tokens.

Two implementations are provided:

Full TokenExtractor: Complete syntax support including tags, users, and domain prefixes
Text TokenExtractor: Simplified subset for basic text queries

Sources: README.md:33-35

Tokenizer

Processes the input string using patterns from TokenExtractor to produce a TokenSequence. It tracks token positions correctly for multi-byte characters and wraps unrecognized input in BAILOUT tokens.
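
A regex-driven tokenizer of this kind can be sketched as follows. This is a simplified illustration, not the library's code: the function names, the token array layout, and the handful of patterns are assumptions, and the input is assumed to be valid UTF-8.

```php
<?php

/** Counts UTF-8 characters without depending on the mbstring extension. */
function characterLength(string $text): int
{
    return (int) preg_match_all('/./su', $text);
}

/**
 * Toy tokenizer (hypothetical names and patterns, not the library's code).
 *
 * @return array<int, array{type: string, lexeme: string, position: int}>
 */
function tokenize(string $input): array
{
    // Tried in order; \G anchors each match at the current byte offset.
    $patterns = [
        'WHITESPACE'  => '/\G\s+/u',
        'GROUP_BEGIN' => '/\G\(/u',
        'GROUP_END'   => '/\G\)/u',
        'PHRASE'      => '/\G"[^"]*"/u',
        'WORD'        => '/\G[^\s()"]+/u',
    ];

    $tokens = [];
    $bytePosition = 0;      // offset that preg_match() expects
    $characterPosition = 0; // offset that is meaningful to the user
    $length = strlen($input);

    while ($bytePosition < $length) {
        $type = 'BAILOUT';
        $lexeme = null;

        foreach ($patterns as $candidate => $pattern) {
            if (preg_match($pattern, $input, $found, 0, $bytePosition) === 1) {
                $type = $candidate;
                $lexeme = $found[0];
                break;
            }
        }

        if ($lexeme === null) {
            // Nothing matched: bail out on exactly one character.
            preg_match('/\G./su', $input, $found, 0, $bytePosition);
            $lexeme = $found[0];
        }

        $tokens[] = ['type' => $type, 'lexeme' => $lexeme, 'position' => $characterPosition];
        $bytePosition += strlen($lexeme);
        $characterPosition += characterLength($lexeme);
    }

    return $tokens;
}

foreach (tokenize('héllo "wörld') as $token) {
    printf("%s %s @%d\n", $token['type'], $token['lexeme'], $token['position']);
}
```

In the sample input the unterminated quote matches no pattern, so it becomes a single-character BAILOUT token, and the following word is still reported at its character position rather than its byte position.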

Parser

Implements a shift-reduce parsing algorithm that constructs a SyntaxTree from a TokenSequence. Includes the correction system that ensures all input produces valid output.
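
To make the shift-reduce idea concrete, here is a deliberately small sketch that handles only words, `NOT`, and `AND`. It is an assumption-laden toy: `parseTokens()`, `reduceTop()`, and the array-based nodes are hypothetical, and the library's grammar, node classes, and correction handling are considerably richer.

```php
<?php

/**
 * Toy shift-reduce parser for words, NOT and AND (hypothetical sketch).
 *
 * @param string[] $tokens
 * @return array[] Root-level nodes left on the stack
 */
function parseTokens(array $tokens): array
{
    $stack = [];

    foreach ($tokens as $token) {
        // Shift: push the next token as an operator or a term node.
        $stack[] = in_array($token, ['AND', 'NOT'], true)
            ? ['type' => 'operator', 'value' => $token]
            : ['type' => 'term', 'word' => $token];

        // Reduce: keep collapsing the top of the stack while a rule applies.
        while (reduceTop($stack)) {
        }
    }

    // Operators that never found their operands are dropped in this sketch.
    return array_values(array_filter($stack, function (array $item): bool {
        return $item['type'] !== 'operator';
    }));
}

/** Applies one reduction to the top of the stack; returns false if none fits. */
function reduceTop(array &$stack): bool
{
    $count = count($stack);
    if ($count < 2 || $stack[$count - 1]['type'] === 'operator') {
        return false;
    }

    $top = $stack[$count - 1];
    $below = $stack[$count - 2];

    // NOT <node>  =>  not(node)
    if ($below['type'] === 'operator' && $below['value'] === 'NOT') {
        array_splice($stack, -2, 2, [['type' => 'not', 'operand' => $top]]);

        return true;
    }

    // <node> AND <node>  =>  and(left, right)
    if ($count >= 3 && $below['type'] === 'operator' && $below['value'] === 'AND') {
        $left = $stack[$count - 3];
        if ($left['type'] !== 'operator') {
            array_splice($stack, -3, 3, [['type' => 'and', 'left' => $left, 'right' => $top]]);

            return true;
        }
    }

    return false;
}

print_r(parseTokens(['one', 'AND', 'NOT', 'two', 'three']));
```

Reducing `NOT` before `AND` gives the unary operator higher precedence, and whatever remains on the stack at the end becomes the list of top-level nodes.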

Sources: README.md:26-29

SyntaxTree

The Abstract Syntax Tree representation consisting of Node objects organized hierarchically. Contains:

Node types: Query, Term, Group, LogicalAnd, LogicalOr, LogicalNot, Mandatory, Prohibited
Correction information: Details about syntax corrections applied during parsing

Generators

Use the Visitor pattern to traverse the SyntaxTree and produce backend-specific output:

Native Generator: Produces Galach format with proper escaping
ExtendedDisMax Generator: Produces Solr ExtendedDisMax queries with field mapping
QueryString Generator: Produces Elasticsearch QueryString queries
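
The following sketch shows the general shape of Visitor-based generation: one tree, two generators, two output formats. All class names and both output formats are made up for illustration; the library's own Node and Visitor classes are structured differently, and the output here is not Solr or Elasticsearch syntax.

```php
<?php

// Hypothetical node and visitor types, for illustration only.
interface Visitor
{
    public function visitTerm(Term $node): string;
    public function visitAnd(LogicalAnd $node): string;
    public function visitNot(LogicalNot $node): string;
}

interface Node
{
    public function accept(Visitor $visitor): string;
}

final class Term implements Node
{
    public $word;
    public function __construct(string $word) { $this->word = $word; }
    public function accept(Visitor $visitor): string { return $visitor->visitTerm($this); }
}

final class LogicalAnd implements Node
{
    public $left;
    public $right;
    public function __construct(Node $left, Node $right) { $this->left = $left; $this->right = $right; }
    public function accept(Visitor $visitor): string { return $visitor->visitAnd($this); }
}

final class LogicalNot implements Node
{
    public $operand;
    public function __construct(Node $operand) { $this->operand = $operand; }
    public function accept(Visitor $visitor): string { return $visitor->visitNot($this); }
}

/** Renders the tree as infix text, e.g. "(one AND NOT two)". */
final class InfixGenerator implements Visitor
{
    public function visitTerm(Term $node): string { return $node->word; }
    public function visitAnd(LogicalAnd $node): string
    {
        return '(' . $node->left->accept($this) . ' AND ' . $node->right->accept($this) . ')';
    }
    public function visitNot(LogicalNot $node): string { return 'NOT ' . $node->operand->accept($this); }
}

/** Renders the same tree in a function-call style, e.g. "and(one, not(two))". */
final class CallStyleGenerator implements Visitor
{
    public function visitTerm(Term $node): string { return $node->word; }
    public function visitAnd(LogicalAnd $node): string
    {
        return 'and(' . $node->left->accept($this) . ', ' . $node->right->accept($this) . ')';
    }
    public function visitNot(LogicalNot $node): string { return 'not(' . $node->operand->accept($this) . ')'; }
}

// The tree is built once and handed to each generator unchanged.
$tree = new LogicalAnd(new Term('one'), new LogicalNot(new Term('two')));
echo $tree->accept(new InfixGenerator()), PHP_EOL;     // (one AND NOT two)
echo $tree->accept(new CallStyleGenerator()), PHP_EOL; // and(one, not(two))
```

Adding a backend in this style means adding one more visitor class; the node classes and the parsing code stay untouched.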

Sources: README.md:14-16

The Galach Language

Galach is the reference query language implementation provided by this library. Its syntax follows what appears to be the unofficial standard for user-entered search queries and is similar to the Lucene Query Parser syntax.

Syntax Overview

Feature	Syntax	Example
Word term	`word`	`search`
Phrase term	`"phrase"`	`"hello world"`
Grouping	`(expression)`	`(one OR two)`
Mandatory	`+term`	`+required`
Prohibited	`-term`	`-unwanted`
Logical AND	`AND` or `&&`	`one AND two`
Logical OR	`OR` or `||`	`one OR two`
Logical NOT	`NOT` or `!`	`NOT excluded`
Tag	`#tag`	`#important`
User	`@user`	`@john`
Domain prefix	`domain:term`	`title:search`

The parser handles operator precedence and implicit AND operations, and it gracefully corrects syntax errors.

For complete syntax documentation, see Syntax and Features. For parser implementation details, see Parser. For error correction mechanisms, see Error Handling and Corrections.

Sources: README.md:17-22

Use Cases

The library is designed as a foundation for implementing various query translation scenarios:

User-Level Query Language

Build a user-friendly query language on top of your search backend, providing features like tags, user mentions, and natural syntax while translating to the backend's native format.

Common Query Language

Implement a single query syntax that works across multiple search backends (Solr, Elasticsearch, databases), abstracting away backend-specific differences.

Query Language Control

Take control over query language options that are provided by search backends, customizing which features are available and how they work.

Enhanced Error Handling

Provide better error handling and user feedback than what search backends offer natively, using the correction system to guide users.

Query Analysis and Manipulation

Analyze and transform queries before sending them to the backend, implementing features like query suggestions, auto-correction, or security filtering.
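
As one possible form of security filtering, the sketch below walks a tree and removes terms that target domains outside an allow-list. It assumes a simplified array-based tree with `type`, `domain`, and `children` keys; these keys and the `filterDomains()` helper are hypothetical, since the library represents nodes as objects.

```php
<?php

/**
 * Removes terms whose domain is not allow-listed (hypothetical array tree).
 *
 * @param string[] $allowedDomains
 * @return array|null The filtered node, or null if nothing is left of it
 */
function filterDomains(array $node, array $allowedDomains): ?array
{
    if ($node['type'] === 'term') {
        $domain = $node['domain'] ?? null;

        // Terms without a domain prefix are always kept.
        return ($domain === null || in_array($domain, $allowedDomains, true)) ? $node : null;
    }

    // Composite node: filter children recursively and drop the removed ones.
    $children = [];
    foreach ($node['children'] as $child) {
        $filtered = filterDomains($child, $allowedDomains);
        if ($filtered !== null) {
            $children[] = $filtered;
        }
    }

    if ($children === []) {
        return null; // an empty group carries no meaning, so it is removed too
    }

    $node['children'] = $children;

    return $node;
}

$tree = [
    'type' => 'query',
    'children' => [
        ['type' => 'term', 'domain' => 'title', 'word' => 'search'],
        ['type' => 'term', 'domain' => 'password', 'word' => 'secret'],
        ['type' => 'group', 'children' => [
            ['type' => 'term', 'domain' => 'password', 'word' => 'hunter2'],
        ]],
    ],
];

print_r(filterDomains($tree, ['title']));
```

Working on the tree rather than on the raw string means the filter cannot be bypassed by quoting or spacing tricks in the user's input.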

Rich Input Interface

Implement rich input features such as syntax highlighting, error feedback, inline suggestions, and query building UI components using the AST and correction information.

Sources: README.md:37-45

Component Relationships

Diagram: Component Relationships with File Paths

This diagram shows how the major components interact and where they are located in the codebase. The flow moves from top to bottom through the three main phases:

Tokenization: TokenExtractor and Tokenizer produce Token objects in a TokenSequence
Parsing: Parser consumes TokenSequence to produce SyntaxTree containing Node objects
Generation: Visitor implementations traverse the SyntaxTree to generate output

Sources: composer.json:34-36

Getting Started

Installation

Add the library to your project using Composer by requiring the `netgen/query-translator` package.

Requirements

PHP 7.0 or higher (also compatible with PHP 8.x)

Basic Usage Flow

Choose a TokenExtractor implementation (Full or Text) based on your syntax needs
Create a Tokenizer with the chosen TokenExtractor
Tokenize the input string to produce a TokenSequence
Parse the TokenSequence to produce a SyntaxTree
Generate output using one of the provided generators or implement your own

For detailed usage examples and customization options, refer to the Galach documentation at lib/Languages/Galach/.

Customization

The library provides four primary extension points:

TokenExtractor: Customize lexical rules and token types
Visitor: Implement custom code generation for new backends
Field Mapping: Configure backend-specific field name translations
Term Clauses: Define custom term clause implementations
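
Field mapping in particular is easy to picture. The sketch below translates a user-facing domain prefix into a backend field name and falls back to a default field. The `mapField()` and `renderTerm()` helpers, the field names, and the fallback policy are assumptions made for illustration, not the library's configuration format.

```php
<?php

/**
 * Resolves a query-language domain to a backend field name (hypothetical).
 *
 * @param array<string, string> $fieldMap Domain => backend field
 */
function mapField(?string $domain, array $fieldMap, string $defaultField): string
{
    if ($domain === null) {
        return $defaultField; // term had no "domain:" prefix
    }

    // Assumed policy: unknown domains fall back to the default field.
    return $fieldMap[$domain] ?? $defaultField;
}

/** Renders a single term as "field:word" for a fictional backend. */
function renderTerm(?string $domain, string $word, array $fieldMap, string $defaultField): string
{
    return mapField($domain, $fieldMap, $defaultField) . ':' . $word;
}

$fieldMap = ['title' => 'title_t', 'author' => 'author_name_s']; // made-up field names

echo renderTerm('title', 'search', $fieldMap, 'text_t'), PHP_EOL; // title_t:search
echo renderTerm(null, 'search', $fieldMap, 'text_t'), PHP_EOL;    // text_t:search
```

Keeping the map outside the generator lets the same user-facing domains point at different physical fields per backend.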

For detailed customization guidance, see Customization and Extension.

Sources: README.md:52-62, composer.json:25-26

Demo Application

A separate demo application is available at netgen/query-translator-demo that showcases the library's features with an interactive web interface.

Sources: README.md:64-73

Development Infrastructure

Code quality is maintained with the following tooling:

Component	Purpose	Configuration File
PHPUnit	Unit testing framework	`phpunit.xml`
PHP-CS-Fixer	Code style enforcement (Symfony standards)	`.php_cs.dist`
Codecov	Code coverage reporting	GitHub Actions integration
Composer	Dependency management and PSR-4 autoloading	`composer.json`

The GitHub Actions CI/CD pipeline tests across PHP versions 7.0 through 8.5, ensuring broad compatibility.

For detailed information about testing and development workflows, see Testing and Development.

Sources: composer.json:28-31, .php_cs.dist:1-32

URL: https://deepwiki.com/netgen/query-translator
