Elasticsearch is a powerful tool not just for search but also for performing complex data analytics. Metric aggregations are a crucial aspect of this capability, allowing users to compute metrics like averages, sums, and more on numeric fields within their data.
This guide will delve into metric aggregations in Elasticsearch, explaining what they are, how they work, and providing detailed examples to illustrate their use.
Metric aggregations in Elasticsearch calculate metrics based on the values of numeric fields in your documents. Unlike bucket aggregations, which group documents into buckets, metric aggregations work directly on the numeric values and return statistical metrics. They are essential for summarizing large datasets and deriving insights such as averages, minimums, maximums, sums, and more.
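Elasticsearch offers several types of metric aggregations, each serving a different purpose. This guide covers avg, sum, min, max, stats, extended_stats, value_count, percentiles, percentile_ranks, cardinality, and geo_bounds.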
To make the explanations concrete, let's assume we have an Elasticsearch index called products with documents that look like this:
{
"product_id": 1,
"name": "Laptop",
"category": "electronics",
"price": 1000,
"quantity_sold": 5,
"rating": 4.5
}
The average aggregation computes the average value of a numeric field. Let's calculate the average price of products in our index.
Query:
GET /products/_search
{
"size": 0,
"aggs": {
"avg_price": {
"avg": {
"field": "price"
}
}
}
}
Output:
{
"aggregations": {
"avg_price": {
"value": 550.0
}
}
}
In this example, the average price of products is $550.0.
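As a plain-Python sketch of what the avg aggregation computes (not how Elasticsearch implements it), assume the index holds ten hypothetical products priced 100 through 1000 in steps of 100, which is consistent with the output above. Documents without a value for the field are ignored by default, so the sketch skips them too.

```python
# Hypothetical documents: the prices are an assumption chosen to match
# the sample output, not data taken from a real index.
docs = [{"price": p} for p in range(100, 1001, 100)]
docs.append({"name": "Unpriced item"})  # no "price" field, so it is skipped


def avg(documents, field):
    """Mean of `field` over the documents that actually have a value for it."""
    values = [d[field] for d in documents if d.get(field) is not None]
    return sum(values) / len(values) if values else None


avg_price = avg(docs, "price")
print(avg_price)  # 550.0
```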
The sum aggregation calculates the total sum of a numeric field. Let's calculate the total quantity sold for all products.
Query:
GET /products/_search
{
"size": 0,
"aggs": {
"total_quantity_sold": {
"sum": {
"field": "quantity_sold"
}
}
}
}
Output:
{
"aggregations": {
"total_quantity_sold": {
"value": 25.0
}
}
}
In this example, the total quantity sold for all products is 25.
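The same total can be reproduced locally. In this sketch the per-product quantities are hypothetical: only the first value (5, for the Laptop) comes from the sample document, and the rest are assumptions chosen so that the total matches the output above.

```python
# Hypothetical quantity_sold values for ten products (assumed data).
quantities_sold = [5, 1, 2, 3, 4, 1, 2, 3, 2, 2]

# The sum aggregation reports its result as a double, so convert the
# integer total to a float for a like-for-like value.
total_quantity_sold = float(sum(quantities_sold))
print(total_quantity_sold)  # 25.0
```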
The min aggregation finds the minimum value of a numeric field. Let's find the minimum price of products.
Query:
GET /products/_search
{
"size": 0,
"aggs": {
"min_price": {
"min": {
"field": "price"
}
}
}
}
Output:
{
"aggregations": {
"min_price": {
"value": 100.0
}
}
}
In this example, the minimum price of products is $100.0.
The max aggregation finds the maximum value of a numeric field. Let's find the maximum price of products.
Query:
GET /products/_search
{
"size": 0,
"aggs": {
"max_price": {
"max": {
"field": "price"
}
}
}
}
Output:
{
"aggregations": {
"max_price": {
"value": 1000.0
}
}
}
In this example, the maximum price of products is $1000.0.
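A minimal local sketch of min and max together, again assuming ten hypothetical prices from 100 to 1000. The helper name min_max is illustrative only. An empty input yields None here, which is meant to mirror the null value Elasticsearch reports when no document has the field.

```python
# Assumed prices, consistent with the min and max outputs shown above.
prices = [float(p) for p in range(100, 1001, 100)]


def min_max(values):
    """Return (minimum, maximum), or (None, None) for an empty input."""
    values = list(values)
    return min(values, default=None), max(values, default=None)


min_price, max_price = min_max(prices)
print(min_price, max_price)  # 100.0 1000.0
```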
The stats aggregation provides a summary of statistics, including count, sum, min, max, and average. Let's get the stats for the price field.
Query:
GET /products/_search
{
"size": 0,
"aggs": {
"price_stats": {
"stats": {
"field": "price"
}
}
}
}
Output:
{
"aggregations": {
"price_stats": {
"count": 10,
"min": 100.0,
"max": 1000.0,
"avg": 550.0,
"sum": 5500.0
}
}
}
In this example, we get a summary of statistics for the price field.
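The five stats values can all be gathered in a single pass over the data, which is one reason to prefer stats over five separate aggregations. The sketch below illustrates that idea in plain Python with the assumed prices 100 through 1000. It is not Elasticsearch's implementation.

```python
from math import inf

# Assumed prices, consistent with the stats output shown above.
prices = list(range(100, 1001, 100))


def stats(values):
    """Compute count, min, max, avg and sum in one pass."""
    count, total, lo, hi = 0, 0.0, inf, -inf
    for v in values:
        count += 1
        total += v
        lo = min(lo, float(v))
        hi = max(hi, float(v))
    if count == 0:
        return {"count": 0, "min": None, "max": None, "avg": None, "sum": 0.0}
    return {"count": count, "min": lo, "max": hi,
            "avg": total / count, "sum": total}


price_stats = stats(prices)
print(price_stats)
```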
The extended stats aggregation provides additional statistics such as variance, standard deviation, and sum of squares. Let's get the extended stats for the price field.
Query:
GET /products/_search
{
"size": 0,
"aggs": {
"extended_price_stats": {
"extended_stats": {
"field": "price"
}
}
}
}
Output:
{
"aggregations": {
"extended_price_stats": {
"count": 10,
"min": 100.0,
"max": 1000.0,
"avg": 550.0,
"sum": 5500.0,
"sum_of_squares": 3850000.0,
"variance": 82500.0,
"std_deviation": 287.228
}
}
}
In this example, we get extended statistics for the price field, including variance and standard deviation.
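The extended statistics are tied together by a simple identity: the population variance equals sum_of_squares / count minus the square of the average. The sketch below checks this for ten assumed prices, 100 through 1000, which reproduce the count, sum, and sum_of_squares shown above. For that data the variance works out to 82500.0 and the standard deviation to roughly 287.23.

```python
import math

# Assumed prices: ten products from 100 to 1000 in steps of 100.
prices = list(range(100, 1001, 100))

count = len(prices)
total = float(sum(prices))
sum_of_squares = float(sum(p * p for p in prices))
avg = total / count

# Population variance: mean of the squares minus the square of the mean.
variance = sum_of_squares / count - avg ** 2
std_deviation = math.sqrt(variance)

print(sum_of_squares)           # 3850000.0
print(variance)                 # 82500.0
print(round(std_deviation, 2))  # 287.23
```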
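The value count aggregation counts the values extracted from a field, not the documents themselves. Because every product has exactly one product_id, counting product_id values gives the number of products.
Query: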
GET /products/_search
{
"size": 0,
"aggs": {
"product_count": {
"value_count": {
"field": "product_id"
}
}
}
}
Output:
{
"aggregations": {
"product_count": {
"value": 10
}
}
}
In this example, the number of products is 10.
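The difference between counting values and counting documents shows up with multi-valued or missing fields. The sketch below illustrates it with three hypothetical documents. The tags field is an assumption introduced only for this example and is not part of the products index described earlier.

```python
# Hypothetical documents; "tags" is an illustrative multi-valued field.
docs = [
    {"product_id": 1, "tags": ["sale", "new", "laptop"]},
    {"product_id": 2, "tags": ["sale"]},
    {"product_id": 3},  # no tags at all
]


def value_count(documents, field):
    """Count field values: arrays contribute one per element, missing fields none."""
    total = 0
    for doc in documents:
        value = doc.get(field)
        if value is None:
            continue
        total += len(value) if isinstance(value, list) else 1
    return total


print(value_count(docs, "product_id"))  # 3
print(value_count(docs, "tags"))        # 4
```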
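The percentiles aggregation calculates one or more percentiles over numeric values. By default the results are approximate, because Elasticsearch computes them with the TDigest algorithm. Let's calculate the 25th, 50th, and 75th percentiles for the price field.
Query: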
GET /products/_search
{
"size": 0,
"aggs": {
"price_percentiles": {
"percentiles": {
"field": "price",
"percents": [25, 50, 75]
}
}
}
}
Output:
{
"aggregations": {
"price_percentiles": {
"values": {
"25.0": 275.0,
"50.0": 550.0,
"75.0": 825.0
}
}
}
}
In this example, we get the 25th, 50th, and 75th percentiles for the price field.
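For intuition, quartiles can be computed exactly on a small list with Python's standard library. The sketch below assumes ten prices from 100 to 1000 and uses the default interpolation of statistics.quantiles, which happens to reproduce the values above. Elasticsearch's TDigest estimate on a real index may differ slightly, so treat this as an illustration rather than a description of the server's algorithm.

```python
import statistics

# Assumed prices: ten products from 100 to 1000 in steps of 100.
prices = list(range(100, 1001, 100))

# n=4 returns the three cut points between quartiles
# (25th, 50th and 75th percentiles), using linear interpolation.
p25, p50, p75 = statistics.quantiles(prices, n=4)
print(p25, p50, p75)  # 275.0 550.0 825.0
```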
The percentile ranks aggregation reports, for each value you supply, the percentage of observed values that fall at or below it. Let's calculate the percentile ranks for prices 300 and 600.
Query:
GET /products/_search
{
"size": 0,
"aggs": {
"price_percentile_ranks": {
"percentile_ranks": {
"field": "price",
"values": [300, 600]
}
}
}
}
Output:
{
"aggregations": {
"price_percentile_ranks": {
"values": {
"300.0": 30.0,
"600.0": 60.0
}
}
}
}
In this example, 30% of product prices are at or below 300 and 60% are at or below 600.
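A percentile rank is the inverse question to a percentile: given a value, what share of the data lies at or below it? The sketch below computes that share exactly for ten assumed prices from 100 to 1000. The function name percentile_rank is illustrative, and Elasticsearch itself answers the question approximately.

```python
# Assumed prices: ten products from 100 to 1000 in steps of 100.
prices = list(range(100, 1001, 100))


def percentile_rank(values, threshold):
    """Percentage of values that are less than or equal to `threshold`."""
    at_or_below = sum(1 for v in values if v <= threshold)
    return 100.0 * at_or_below / len(values)


ranks = {v: percentile_rank(prices, v) for v in (300, 600)}
print(ranks)  # {300: 30.0, 600: 60.0}
```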
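The cardinality aggregation estimates the number of distinct values in a field. The count is approximate (it is based on the HyperLogLog++ algorithm), although it is typically exact for small sets such as this one. Let's estimate the number of distinct categories. The query targets the category.keyword sub-field, which assumes category was indexed as text with a keyword sub-field, as the default dynamic mapping does.
Query: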
GET /products/_search
{
"size": 0,
"aggs": {
"distinct_categories": {
"cardinality": {
"field": "category.keyword"
}
}
}
}
Output:
{
"aggregations": {
"distinct_categories": {
"value": 3
}
}
}
In this example, there are 3 distinct categories.
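For a small dataset, the result is simply the size of the set of values. The sketch below uses hypothetical category names (only electronics appears in the sample document) and an exact set, whereas Elasticsearch trades exactness for bounded memory at high cardinalities.

```python
# Hypothetical categories for ten products; only "electronics" comes
# from the sample document, the other names are assumptions.
categories = [
    "electronics", "clothing", "books", "electronics", "books",
    "clothing", "electronics", "books", "clothing", "electronics",
]

# An exact distinct count: duplicates collapse inside the set.
distinct_categories = len(set(categories))
print(distinct_categories)  # 3
```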
The geo-bounds aggregation computes the bounding box that contains all geo-point values of a field. The products index has no location data, so this example assumes a separate locations index whose location field is mapped as geo_point.
Query:
GET /locations/_search
{
"size": 0,
"aggs": {
"geo_bounds": {
"geo_bounds": {
"field": "location"
}
}
}
}
Output:
{
"aggregations": {
"geo_bounds": {
"bounds": {
"top_left": {
"lat": 40.73,
"lon": -74.1
},
"bottom_right": {
"lat": 40.01,
"lon": -71.12
}
}
}
}
}
In this example, the geo-bounds aggregation calculates the bounding box for the geo-points.
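The bounding box is defined by its extremes: the top-left corner takes the largest latitude and smallest longitude, and the bottom-right corner takes the smallest latitude and largest longitude. The sketch below applies that rule to three hypothetical points chosen to match the output above. It deliberately ignores boxes that cross the international date line, which the real aggregation can handle.

```python
# Hypothetical (lat, lon) pairs; assumed data, not taken from a real index.
points = [(40.73, -73.5), (40.5, -74.1), (40.01, -71.12)]


def geo_bounds(pts):
    """Smallest lat/lon box containing all points (no date-line handling)."""
    lats = [lat for lat, _ in pts]
    lons = [lon for _, lon in pts]
    return {
        "top_left": {"lat": max(lats), "lon": min(lons)},
        "bottom_right": {"lat": min(lats), "lon": max(lons)},
    }


bounds = geo_bounds(points)
print(bounds)
```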
Metric aggregations in Elasticsearch are a powerful way to perform statistical analysis on your data. They allow you to calculate averages, sums, minimums, maximums, and more, providing valuable insights into your data. By understanding and utilizing these aggregations, you can unlock the full potential of Elasticsearch for your data analytics needs. Whether you're summarizing sales data, analyzing user behavior, or exploring any other type of numeric data, metric aggregations are an essential tool in your Elasticsearch toolkit.