Voozh

The pdb.agg function accepts an Elasticsearch-compatible JSON aggregate query string. It executes the aggregate using the columnar portion of the ParadeDB index, which can significantly accelerate performance compared to vanilla Postgres. For example, the following query counts the total number of results for a search query.

SELECT pdb.agg('{"value_count": {"field": "id"}}')
FROM mock_items
WHERE category === 'electronics';

Expected Response

 agg
----------------
 {"value": 5.0}
(1 row)

This query counts the number of results for every distinct group:

SELECT rating, pdb.agg('{"value_count": {"field": "id"}}')
FROM mock_items
WHERE category === 'electronics'
GROUP BY rating
ORDER BY rating
LIMIT 5;

Expected Response

 rating | agg
--------+----------------
 3 | {"value": 1.0}
 4 | {"value": 3.0}
 5 | {"value": 1.0}
(3 rows)

Multiple Aggregations

To compute multiple aggregations at once, simply include multiple pdb.agg functions in the target list:

SELECT
 pdb.agg('{"avg": {"field": "rating"}}') AS avg_rating,
 pdb.agg('{"value_count": {"field": "id"}}') AS count
FROM mock_items
WHERE category === 'electronics';

Expected Response

 avg_rating | count
----------------+----------------
 {"value": 4.0} | {"value": 5.0}
(1 row)

Performance Optimization

On every query, ParadeDB runs checks to ensure that deleted or updated-away rows are not factored into the result set. If your table is not frequently updated or you can tolerate an approximate result, the performance of aggregate queries can be improved by disabling these visibility checks. To do so, set the second argument of pdb.agg to false.

SELECT pdb.agg('{"value_count": {"field": "id"}}', false)
FROM mock_items
WHERE description ||| 'running shoes';

Disabling this check can improve query times by 2-4x in some cases (at the expense of correctness).

If a single query contains multiple pdb.agg calls, all of them must use the same visibility setting (either all true or all false).

JSON Fields

If metadata is a JSON field with key color, use metadata.color as the field name:

SELECT pdb.agg('{"terms": {"field": "metadata.color"}}')
FROM mock_items
WHERE id @@@ pdb.all();

If a text or JSON field is used inside pdb.agg, it must use the literal or literal normalized tokenizer.

Was this page helpful?

Filtering

Date Histogram

URL: https://docs.paradedb.com/documentation/aggregates/overview

⇱ Aggregate Syntax - ParadeDB

Documentation Index

Multiple Aggregations

Performance Optimization

JSON Fields

URL: https://docs.paradedb.com/documentation/aggregates/overview

⇱ Aggregate Syntax - ParadeDB

Documentation Index

​Multiple Aggregations

​Performance Optimization

​JSON Fields

Multiple Aggregations

Performance Optimization

JSON Fields