Available solutions

This template is for Zabbix version: 7.4

Also available for: 7.2 7.0 6.4 6.2 6.0 5.4 5.0

Source: https://git.zabbix.com/projects/ZBX/repos/zabbix/browse/templates/app/etcd_http?at=release/7.4

Etcd by HTTP

Overview

This template is designed for monitoring etcd with Zabbix and works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.

The Etcd by HTTP template collects metrics with the HTTP agent from the /metrics endpoint.

Refer to the vendor documentation.

For users of etcd version <= 3.4:

In etcd v3.5, some metrics have been deprecated. For details, see Upgrade etcd from 3.4 to 3.5. Please upgrade your etcd instance or use an older version of the Etcd by HTTP template.

Requirements

Zabbix version: 7.4 and higher.

Tested versions

This template has been tested on:

Etcd 3.5.6

Configuration

Zabbix should be configured according to the instructions in the Templates out of the box section.

Setup

Make sure that etcd allows the collection of metrics. You can test it by running: curl -L http://localhost:2379/metrics.
Check that etcd is accessible from the Zabbix proxy or Zabbix server, depending on where you plan to do the monitoring. To verify, run: curl -L http://<etcd_node_address>:2379/metrics.
Add the template to the etcd node. Set the hostname or IP address of the etcd host in the {$ETCD.HOST} macro. By default, the template uses the client port. You can configure the location of the metrics endpoint with the --listen-metrics-urls flag.

For more details, see the etcd documentation.

Additional points to consider:

If you have specified a non-standard port for etcd, change the {$ETCD.SCHEME} and {$ETCD.PORT} macros accordingly.
You can set the {$ETCD.USER} and {$ETCD.PASSWORD} macros in the template or override them at the host level if necessary.
To test availability, run: zabbix_get -s etcd-host -k etcd.health.
See the Macros used section: the macros listed there set the trigger thresholds.
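The setup steps can be illustrated with a minimal Python sketch of how the macro values combine into the URL being polled and how a health response turns into the 0/1 value of the Node health item. The function names are hypothetical, the /health path and the sample payloads are assumptions based on the etcd health check, and the set of accepted "true" spellings is a simplification of the Boolean to decimal step.

```python
import json
from urllib.parse import urlunsplit


def endpoint_url(scheme: str, host: str, port: int, path: str) -> str:
    """Combine the values of {$ETCD.SCHEME}, {$ETCD.HOST} and {$ETCD.PORT} with a path."""
    return urlunsplit((scheme, f"{host}:{port}", path, "", ""))


def health_to_decimal(body: str) -> int:
    """Approximate the Node health preprocessing chain.

    JSON Path $.health, then Boolean to decimal, then "Custom on fail: Set value to 0".
    """
    try:
        value = json.loads(body)["health"]
    except (ValueError, KeyError, TypeError):
        # Invalid JSON or a missing field counts as a failed step, so the value is 0.
        return 0
    # Simplified truthiness check; anything unrecognized is treated as 0.
    return 1 if str(value).strip().lower() in ("true", "1", "yes") else 0


if __name__ == "__main__":
    print(endpoint_url("http", "localhost", 2379, "/metrics"))
    print(health_to_decimal('{"health":"true"}'))
```

No network request is made here; the sketch only shows how the pieces fit together, so it can be tried without a running etcd node.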

Macros used

Name	Description	Default
{$ETCD.HOST}	The hostname or IP address of the `etcd` API endpoint.	`<SET ETCD HOST>`
{$ETCD.PORT}	The port of the `etcd` API endpoint.	`2379`
{$ETCD.SCHEME}	The request scheme which may be `http` or `https`.	`http`
{$ETCD.USER}
{$ETCD.PASSWORD}
{$ETCD.LEADER.CHANGES.MAX.WARN}	The maximum number of leader changes.	`5`
{$ETCD.PROPOSAL.FAIL.MAX.WARN}	The maximum number of proposal failures.	`2`
{$ETCD.HTTP.FAIL.MAX.WARN}	The maximum number of HTTP request failures.	`2`
{$ETCD.PROPOSAL.PENDING.MAX.WARN}	The maximum number of proposals in queue.	`5`
{$ETCD.OPEN.FDS.MAX.WARN}	The maximum percentage of used file descriptors.	`90`
{$ETCD.GRPC_CODE.MATCHES}	The filter of discoverable gRPC codes. See more details on https://github.com/grpc/grpc/blob/master/doc/statuscodes.md.	`.*`
{$ETCD.GRPC_CODE.NOT_MATCHES}	The filter to exclude discovered gRPC codes. See more details on https://github.com/grpc/grpc/blob/master/doc/statuscodes.md.	`CHANGE_IF_NEEDED`
{$ETCD.GRPC.ERRORS.MAX.WARN}	The maximum number of gRPC request failures.	`1`
{$ETCD.GRPC_CODE.TRIGGER.MATCHES}	The filter of discoverable gRPC codes, which will create triggers.	`Aborted|Unavailable`

Items

Name	Description	Type	Key and additional info
Service's TCP port state		Simple check	net.tcp.service["{$ETCD.SCHEME}","{$ETCD.HOST}","{$ETCD.PORT}"] Preprocessing Discard unchanged with heartbeat: `10m`
Get node metrics		HTTP agent	etcd.get_metrics
Node health		HTTP agent	etcd.health Preprocessing JSON Path: `$.health` Boolean to decimal ⛔️Custom on fail: Set value to: `0` Discard unchanged with heartbeat: `10m`
Server is a leader	Whether this member is a leader: 1 if it is, 0 otherwise.	Dependent item	etcd.is.leader Preprocessing Prometheus pattern: `VALUE(etcd_server_is_leader)` ⛔️Custom on fail: Set value to: `0` Discard unchanged with heartbeat: `10m`
Server has a leader	Whether a leader exists: 1 if it exists, 0 if it does not.	Dependent item	etcd.has.leader Preprocessing Prometheus pattern: `VALUE(etcd_server_has_leader)` Discard unchanged with heartbeat: `10m`
Leader changes	The number of leader changes the member has seen since its start.	Dependent item	etcd.leader.changes Preprocessing Prometheus pattern: `VALUE(etcd_server_leader_changes_seen_total)`
Proposals committed per second	The number of consensus proposals committed.	Dependent item	etcd.proposals.committed.rate Preprocessing Prometheus pattern: `VALUE(etcd_server_proposals_committed_total)` Change per second
Proposals applied per second	The number of consensus proposals applied.	Dependent item	etcd.proposals.applied.rate Preprocessing Prometheus pattern: `VALUE(etcd_server_proposals_applied_total)` Change per second
Proposals failed per second	The number of failed proposals seen.	Dependent item	etcd.proposals.failed.rate Preprocessing Prometheus pattern: `VALUE(etcd_server_proposals_failed_total)` Change per second
Proposals pending	The current number of pending proposals to commit.	Dependent item	etcd.proposals.pending Preprocessing Prometheus pattern: `VALUE(etcd_server_proposals_pending)`
Reads per second	The number of read actions by `get/getRecursive`, local to this member.	Dependent item	etcd.reads.rate Preprocessing Prometheus to JSON: `etcd_debugging_store_reads_total` JavaScript: `The text is too long. Please see the template.` Change per second
Writes per second	The number of writes (e.g., `set/compareAndDelete`) seen by this member.	Dependent item	etcd.writes.rate Preprocessing Prometheus to JSON: `etcd_debugging_store_writes_total` JavaScript: `The text is too long. Please see the template.` Change per second
Client gRPC received bytes per second	The number of bytes received from gRPC clients per second.	Dependent item	etcd.network.grpc.received.rate Preprocessing Prometheus pattern: `VALUE(etcd_network_client_grpc_received_bytes_total)` Change per second
Client gRPC sent bytes per second	The number of bytes sent to gRPC clients per second.	Dependent item	etcd.network.grpc.sent.rate Preprocessing Prometheus pattern: `VALUE(etcd_network_client_grpc_sent_bytes_total)` Change per second
HTTP requests received	The number of requests received into the system (successfully parsed and `authd`).	Dependent item	etcd.http.requests.rate Preprocessing Prometheus to JSON: `etcd_http_received_total` JavaScript: `The text is too long. Please see the template.` Change per second
HTTP 5XX	The number of handled failures of requests (non-watches), by the method (`GET/PUT` etc.), and the code `5XX`.	Dependent item	etcd.http.requests.5xx.rate Preprocessing Prometheus to JSON: `etcd_http_failed_total{code=~"5.+"}` JavaScript: `The text is too long. Please see the template.` Change per second
HTTP 4XX	The number of handled failures of requests (non-watches), by the method (`GET/PUT` etc.), and the code `4XX`.	Dependent item	etcd.http.requests.4xx.rate Preprocessing Prometheus to JSON: `etcd_http_failed_total{code=~"4.+"}` JavaScript: `The text is too long. Please see the template.` Change per second
RPCs received per second	The number of RPC stream messages received on the server.	Dependent item	etcd.grpc.received.rate Preprocessing Prometheus to JSON: `grpc_server_msg_received_total` JavaScript: `The text is too long. Please see the template.` Change per second
RPCs sent per second	The number of gRPC stream messages sent by the server.	Dependent item	etcd.grpc.sent.rate Preprocessing Prometheus to JSON: `grpc_server_msg_sent_total` JavaScript: `The text is too long. Please see the template.` Change per second
RPCs started per second	The number of RPCs started on the server.	Dependent item	etcd.grpc.started.rate Preprocessing Prometheus to JSON: `grpc_server_started_total` JavaScript: `The text is too long. Please see the template.` Change per second
Get version		HTTP agent	etcd.get_version
Server version	The version of the `etcd server`.	Dependent item	etcd.server.version Preprocessing JSON Path: `$.etcdserver` Discard unchanged with heartbeat: `1d`
Cluster version	The version of the `etcd cluster`.	Dependent item	etcd.cluster.version Preprocessing JSON Path: `$.etcdcluster` Discard unchanged with heartbeat: `1d`
DB size	The total size of the underlying database.	Dependent item	etcd.db.size Preprocessing Prometheus pattern: `VALUE(etcd_mvcc_db_total_size_in_bytes)`
Keys compacted per second	The number of DB keys compacted per second.	Dependent item	etcd.keys.compacted.rate Preprocessing Prometheus pattern: `VALUE(etcd_debugging_mvcc_db_compaction_keys_total)` ⛔️Custom on fail: Set value to: `0` Change per second
Keys expired per second	The number of expired keys per second.	Dependent item	etcd.keys.expired.rate Preprocessing Prometheus pattern: `VALUE(etcd_debugging_store_expires_total)` Change per second
Keys total	The total number of keys.	Dependent item	etcd.keys.total Preprocessing Prometheus pattern: `VALUE(etcd_debugging_mvcc_keys_total)`
Uptime	`Etcd` server uptime.	Dependent item	etcd.uptime Preprocessing Prometheus pattern: `VALUE(process_start_time_seconds)` JavaScript: `The text is too long. Please see the template.`
Virtual memory	The size of virtual memory expressed in bytes.	Dependent item	etcd.virtual.bytes Preprocessing Prometheus pattern: `VALUE(process_virtual_memory_bytes)`
Resident memory	The size of resident memory expressed in bytes.	Dependent item	etcd.res.bytes Preprocessing Prometheus pattern: `VALUE(process_resident_memory_bytes)`
CPU	The total user and system CPU time spent in seconds.	Dependent item	etcd.cpu.util Preprocessing Prometheus pattern: `VALUE(process_cpu_seconds_total)` Change per second
Open file descriptors	The number of open file descriptors.	Dependent item	etcd.open.fds Preprocessing Prometheus pattern: `VALUE(process_open_fds)`
Maximum open file descriptors	The maximum number of open file descriptors.	Dependent item	etcd.max.fds Preprocessing Prometheus pattern: `VALUE(process_max_fds)`
Deletes per second	The number of deletes seen by this member per second.	Dependent item	etcd.delete.rate Preprocessing Prometheus pattern: `VALUE(etcd_mvcc_delete_total)` Change per second
PUT per second	The number of puts seen by this member per second.	Dependent item	etcd.put.rate Preprocessing Prometheus pattern: `VALUE(etcd_mvcc_put_total)` Change per second
Range per second	The number of ranges seen by this member per second.	Dependent item	etcd.range.rate Preprocessing Prometheus pattern: `VALUE(etcd_debugging_mvcc_range_total)` Change per second
Transaction per second	The number of transactions seen by this member per second.	Dependent item	etcd.txn.rate Preprocessing Prometheus pattern: `VALUE(etcd_debugging_mvcc_range_total)` Change per second
Pending events	The total number of pending events to be sent.	Dependent item	etcd.events.sent.rate Preprocessing Prometheus pattern: `VALUE(etcd_debugging_mvcc_pending_events_total)`
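Many items in the table above combine a Prometheus pattern step of the form VALUE(metric) with a Change per second step. The following Python sketch shows roughly what those two steps do to a raw /metrics payload. The sample text and function names are illustrative assumptions, not part of the template, and only unlabelled metric lines are handled.

```python
SAMPLE = """\
# HELP etcd_server_has_leader Whether or not a leader exists.
# TYPE etcd_server_has_leader gauge
etcd_server_has_leader 1
etcd_server_proposals_committed_total 1200
"""


def prometheus_value(text: str, metric: str) -> float:
    """Return the value of an unlabelled metric line, similar to VALUE(<metric>)."""
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        name, _, rest = line.partition(" ")
        if name == metric:
            return float(rest.split()[0])
    raise LookupError(f"metric not found: {metric}")


def change_per_second(prev_value, prev_time, value, time):
    """Compute (value - prev_value) / (time - prev_time).

    Returns None when the counter went backwards (for example after a restart)
    or when no time has passed, so that no misleading rate is stored.
    """
    if time <= prev_time or value < prev_value:
        return None
    return (value - prev_value) / (time - prev_time)


if __name__ == "__main__":
    committed = prometheus_value(SAMPLE, "etcd_server_proposals_committed_total")
    print(change_per_second(committed, 0, committed + 300, 60))
```

Because every dependent item reads from the single Get node metrics response, one HTTP request per polling interval feeds all of these calculations.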

Triggers

Name	Description	Expression	Severity	Dependencies and additional info
Etcd: Service is unavailable		`last(/Etcd by HTTP/net.tcp.service["{$ETCD.SCHEME}","{$ETCD.HOST}","{$ETCD.PORT}"])=0`	Average	Manual close: Yes
Etcd: Node healthcheck failed	See more details on https://etcd.io/docs/v3.5/op-guide/monitoring/#health-check.	`last(/Etcd by HTTP/etcd.health)=0`	Average	Depends on: Etcd: Service is unavailable
Etcd: Failed to fetch info data	Zabbix has not received any data for items for the last 30 minutes.	`nodata(/Etcd by HTTP/etcd.is.leader,30m)=1`	Warning	Manual close: Yes Depends on: Etcd: Service is unavailable
Etcd: Member has no leader	If a member does not have a leader, it is totally unavailable.	`last(/Etcd by HTTP/etcd.has.leader)=0`	Average
Etcd: Instance has seen too many leader changes	Rapid leadership changes impact the performance of `etcd` significantly. It also signals that the leader is unstable, perhaps due to network connectivity issues or excessive load hitting the `etcd cluster`.	`(max(/Etcd by HTTP/etcd.leader.changes,15m)-min(/Etcd by HTTP/etcd.leader.changes,15m))>{$ETCD.LEADER.CHANGES.MAX.WARN}`	Warning
Etcd: Too many proposal failures	Normally related to two issues: temporary failures related to a leader election or longer downtime caused by a loss of quorum in the cluster.	`min(/Etcd by HTTP/etcd.proposals.failed.rate,5m)>{$ETCD.PROPOSAL.FAIL.MAX.WARN}`	Warning
Etcd: Too many proposals are queued to commit	A rising number of pending proposals suggests that there is a high client load or that the member cannot commit proposals.	`min(/Etcd by HTTP/etcd.proposals.pending,5m)>{$ETCD.PROPOSAL.PENDING.MAX.WARN}`	Warning
Etcd: Too many HTTP requests failures	Too many requests failed on `etcd` instance with the `5xx HTTP code`.	`min(/Etcd by HTTP/etcd.http.requests.5xx.rate,5m)>{$ETCD.HTTP.FAIL.MAX.WARN}`	Warning
Etcd: Server version has changed	Etcd version has changed. Acknowledge to close the problem manually.	`last(/Etcd by HTTP/etcd.server.version,#1)<>last(/Etcd by HTTP/etcd.server.version,#2) and length(last(/Etcd by HTTP/etcd.server.version))>0`	Info	Manual close: Yes
Etcd: Cluster version has changed	Etcd version has changed. Acknowledge to close the problem manually.	`last(/Etcd by HTTP/etcd.cluster.version,#1)<>last(/Etcd by HTTP/etcd.cluster.version,#2) and length(last(/Etcd by HTTP/etcd.cluster.version))>0`	Info	Manual close: Yes
Etcd: Host has been restarted	Uptime is less than 10 minutes.	`last(/Etcd by HTTP/etcd.uptime)<10m`	Info	Manual close: Yes
Etcd: Current number of open files is too high	Heavy file descriptor usage (i.e., near the process's file descriptor limit) indicates a potential file descriptor exhaustion issue. If the file descriptors are exhausted, `etcd` may panic because it cannot create new WAL files.	`min(/Etcd by HTTP/etcd.open.fds,5m)/last(/Etcd by HTTP/etcd.max.fds)*100>{$ETCD.OPEN.FDS.MAX.WARN}`	Warning
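To make the arithmetic behind two of these expressions concrete, here is a small Python sketch that evaluates the leader changes and open file descriptors conditions over lists of collected values. The function names and sample values are hypothetical; the default thresholds are the defaults of {$ETCD.LEADER.CHANGES.MAX.WARN} and {$ETCD.OPEN.FDS.MAX.WARN} from the macros table.

```python
def too_many_leader_changes(values_15m, max_warn=5):
    """Mirror (max(...,15m) - min(...,15m)) > {$ETCD.LEADER.CHANGES.MAX.WARN}.

    values_15m holds the leader change counter samples from the last 15 minutes.
    """
    return (max(values_15m) - min(values_15m)) > max_warn


def open_fds_too_high(open_fds_5m, max_fds, max_warn=90):
    """Mirror min(open.fds,5m) / last(max.fds) * 100 > {$ETCD.OPEN.FDS.MAX.WARN}.

    Using the minimum over 5 minutes means a short spike does not fire the trigger.
    """
    return min(open_fds_5m) / max_fds * 100 > max_warn


if __name__ == "__main__":
    print(too_many_leader_changes([3, 4, 10]))
    print(open_fds_too_high([950, 990], 1024))
```

The counter difference is compared with a strict "greater than", so exactly five leader changes within the window does not fire the trigger with the default threshold.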

LLD rule gRPC codes discovery

Name	Description	Type	Key and additional info
gRPC codes discovery		Dependent item	etcd.grpc_code.discovery Preprocessing Prometheus to JSON: `grpc_server_handled_total` JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `1h`

Item prototypes for gRPC codes discovery

Name	Description	Type	Key and additional info
RPCs completed with code {#GRPC.CODE}	The number of RPCs completed on the server with grpc_code {#GRPC.CODE}.	Dependent item	etcd.grpc.handled.rate[{#GRPC.CODE}] Preprocessing Prometheus to JSON: `grpc_server_handled_total{grpc_method="{#GRPC.CODE}"}` JavaScript: `The text is too long. Please see the template.` Change per second

Trigger prototypes for gRPC codes discovery

Name	Description	Expression	Severity	Dependencies and additional info
Etcd: Too many failed gRPC requests with code: {#GRPC.CODE}		`min(/Etcd by HTTP/etcd.grpc.handled.rate[{#GRPC.CODE}],5m)>{$ETCD.GRPC.ERRORS.MAX.WARN}`	Warning
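The three gRPC filter macros are regular expressions. The Python sketch below shows one plausible reading of how they interact, assuming unanchored regex matching and assuming that {$ETCD.GRPC_CODE.TRIGGER.MATCHES} only decides which discovered codes get a trigger. The function names and the list of codes are illustrative; the exact filter wiring lives in the template itself.

```python
import re


def is_discovered(code, matches=r".*", not_matches=r"CHANGE_IF_NEEDED"):
    """A code is kept if it matches {$ETCD.GRPC_CODE.MATCHES}
    and does not match {$ETCD.GRPC_CODE.NOT_MATCHES}."""
    return (re.search(matches, code) is not None
            and re.search(not_matches, code) is None)


def gets_trigger(code, trigger_matches=r"Aborted|Unavailable"):
    """Assumed role of {$ETCD.GRPC_CODE.TRIGGER.MATCHES}: select codes that get a trigger."""
    return re.search(trigger_matches, code) is not None


if __name__ == "__main__":
    codes = ["OK", "Aborted", "Unavailable", "NotFound"]
    print([c for c in codes if is_discovered(c)])
    print([c for c in codes if gets_trigger(c)])
```

With the default macro values every code is discovered, and only Aborted and Unavailable match the trigger filter.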

LLD rule Peers discovery

Name	Description	Type	Key and additional info
Peers discovery		Dependent item	etcd.peer.discovery Preprocessing Prometheus to JSON: `etcd_network_peer_sent_bytes_total`

Item prototypes for Peers discovery

Name	Description	Type	Key and additional info
Etcd peer {#ETCD.PEER}: Bytes sent	The number of bytes sent to a peer with the ID `{#ETCD.PEER}`.	Dependent item	etcd.bytes.sent.rate[{#ETCD.PEER}] Preprocessing Prometheus pattern: `VALUE(etcd_network_peer_sent_bytes_total{To="{#ETCD.PEER}"})` ⛔️Custom on fail: Set value to: `0` Change per second
Etcd peer {#ETCD.PEER}: Bytes received	The number of bytes received from a peer with the ID `{#ETCD.PEER}`.	Dependent item	etcd.bytes.received.rate[{#ETCD.PEER}] Preprocessing Prometheus pattern: `The text is too long. Please see the template.` ⛔️Custom on fail: Set value to: `0` Change per second
Etcd peer {#ETCD.PEER}: Send failures	The number of sent failures from a peer with the ID `{#ETCD.PEER}`.	Dependent item	etcd.sent.fail.rate[{#ETCD.PEER}] Preprocessing Prometheus pattern: `The text is too long. Please see the template.` ⛔️Custom on fail: Set value to: `0` Change per second
Etcd peer {#ETCD.PEER}: Receive failures	The number of received failures from a peer with the ID `{#ETCD.PEER}`.	Dependent item	etcd.received.fail.rate[{#ETCD.PEER}] Preprocessing Prometheus pattern: `The text is too long. Please see the template.` ⛔️Custom on fail: Set value to: `0` Change per second
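Peer discovery is driven by the labels of etcd_network_peer_sent_bytes_total. As a rough Python sketch, the code below collects the distinct To label values from a sample payload and reads the per-peer value that the Bytes sent prototype would then turn into a rate. The peer IDs, function names, and the mapping of the To label to {#ETCD.PEER} are assumptions made for illustration.

```python
import re

# Hypothetical sample with two made-up peer IDs.
SAMPLE = """\
etcd_network_peer_sent_bytes_total{To="8e9e05c52164694d"} 1024
etcd_network_peer_sent_bytes_total{To="91bc3c398fb3c146"} 2048
"""

LINE = re.compile(r'^etcd_network_peer_sent_bytes_total\{To="([^"]+)"\}\s+(\S+)$')


def discover_peers(text):
    """Return one discovery row per distinct To label, in order of appearance."""
    peers = []
    for line in text.splitlines():
        match = LINE.match(line)
        if match and match.group(1) not in peers:
            peers.append(match.group(1))
    return [{"{#ETCD.PEER}": peer} for peer in peers]


def peer_sent_bytes(text, peer):
    """Return the counter for one peer, or 0.0 if it is absent.

    The fallback mirrors the "Custom on fail: Set value to 0" step.
    """
    for line in text.splitlines():
        match = LINE.match(line)
        if match and match.group(1) == peer:
            return float(match.group(2))
    return 0.0


if __name__ == "__main__":
    print(discover_peers(SAMPLE))
    print(peer_sent_bytes(SAMPLE, "8e9e05c52164694d"))
```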

Feedback

Please report any issues with the template at https://support.zabbix.com

You can also provide feedback, discuss the template, or ask for help at ZABBIX forums

This template is for Zabbix version: 7.2

Also available for: 7.4 7.0 6.4 6.2 6.0 5.4 5.0

Source: https://git.zabbix.com/projects/ZBX/repos/zabbix/browse/templates/app/etcd_http?at=release/7.2

Etcd by HTTP

Overview

This template is designed for monitoring etcd with Zabbix and works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.

The Etcd by HTTP template collects metrics with the HTTP agent from the /metrics endpoint.

Refer to the vendor documentation.

For users of etcd version <= 3.4:

In etcd v3.5, some metrics have been deprecated. For details, see Upgrade etcd from 3.4 to 3.5. Please upgrade your etcd instance or use an older version of the Etcd by HTTP template.

Requirements

Zabbix version: 7.2 and higher.

Tested versions

This template has been tested on:

Etcd 3.5.6

Configuration

Zabbix should be configured according to the instructions in the Templates out of the box section.

Setup

Make sure that etcd allows the collection of metrics. You can test it by running: curl -L http://localhost:2379/metrics.
Check that etcd is accessible from the Zabbix proxy or Zabbix server, depending on where you plan to do the monitoring. To verify, run: curl -L http://<etcd_node_address>:2379/metrics.
Add the template to the etcd node. Set the hostname or IP address of the etcd host in the {$ETCD.HOST} macro. By default, the template uses the client port. You can configure the location of the metrics endpoint with the --listen-metrics-urls flag.

For more details, see the etcd documentation.

Additional points to consider:

If you have specified a non-standard port for etcd, change the {$ETCD.SCHEME} and {$ETCD.PORT} macros accordingly.
You can set the {$ETCD.USER} and {$ETCD.PASSWORD} macros in the template or override them at the host level if necessary.
To test availability, run: zabbix_get -s etcd-host -k etcd.health.
See the Macros used section: the macros listed there set the trigger thresholds.

Macros used

Name	Description	Default
{$ETCD.HOST}	The hostname or IP address of the `etcd` API endpoint.	`<SET ETCD HOST>`
{$ETCD.PORT}	The port of the `etcd` API endpoint.	`2379`
{$ETCD.SCHEME}	The request scheme which may be `http` or `https`.	`http`
{$ETCD.USER}
{$ETCD.PASSWORD}
{$ETCD.LEADER.CHANGES.MAX.WARN}	The maximum number of leader changes.	`5`
{$ETCD.PROPOSAL.FAIL.MAX.WARN}	The maximum number of proposal failures.	`2`
{$ETCD.HTTP.FAIL.MAX.WARN}	The maximum number of HTTP request failures.	`2`
{$ETCD.PROPOSAL.PENDING.MAX.WARN}	The maximum number of proposals in queue.	`5`
{$ETCD.OPEN.FDS.MAX.WARN}	The maximum percentage of used file descriptors.	`90`
{$ETCD.GRPC_CODE.MATCHES}	The filter of discoverable gRPC codes. See more details on https://github.com/grpc/grpc/blob/master/doc/statuscodes.md.	`.*`
{$ETCD.GRPC_CODE.NOT_MATCHES}	The filter to exclude discovered gRPC codes. See more details on https://github.com/grpc/grpc/blob/master/doc/statuscodes.md.	`CHANGE_IF_NEEDED`
{$ETCD.GRPC.ERRORS.MAX.WARN}	The maximum number of gRPC request failures.	`1`
{$ETCD.GRPC_CODE.TRIGGER.MATCHES}	The filter of discoverable gRPC codes, which will create triggers.	`Aborted|Unavailable`

Items

Name	Description	Type	Key and additional info
Service's TCP port state		Simple check	net.tcp.service["{$ETCD.SCHEME}","{$ETCD.HOST}","{$ETCD.PORT}"] Preprocessing Discard unchanged with heartbeat: `10m`
Get node metrics		HTTP agent	etcd.get_metrics
Node health		HTTP agent	etcd.health Preprocessing JSON Path: `$.health` Boolean to decimal ⛔️Custom on fail: Set value to: `0` Discard unchanged with heartbeat: `10m`
Server is a leader	Whether this member is a leader: 1 if it is, 0 otherwise.	Dependent item	etcd.is.leader Preprocessing Prometheus pattern: `VALUE(etcd_server_is_leader)` ⛔️Custom on fail: Set value to: `0` Discard unchanged with heartbeat: `10m`
Server has a leader	Whether a leader exists: 1 if it exists, 0 if it does not.	Dependent item	etcd.has.leader Preprocessing Prometheus pattern: `VALUE(etcd_server_has_leader)` Discard unchanged with heartbeat: `10m`
Leader changes	The number of leader changes the member has seen since its start.	Dependent item	etcd.leader.changes Preprocessing Prometheus pattern: `VALUE(etcd_server_leader_changes_seen_total)`
Proposals committed per second	The number of consensus proposals committed.	Dependent item	etcd.proposals.committed.rate Preprocessing Prometheus pattern: `VALUE(etcd_server_proposals_committed_total)` Change per second
Proposals applied per second	The number of consensus proposals applied.	Dependent item	etcd.proposals.applied.rate Preprocessing Prometheus pattern: `VALUE(etcd_server_proposals_applied_total)` Change per second
Proposals failed per second	The number of failed proposals seen.	Dependent item	etcd.proposals.failed.rate Preprocessing Prometheus pattern: `VALUE(etcd_server_proposals_failed_total)` Change per second
Proposals pending	The current number of pending proposals to commit.	Dependent item	etcd.proposals.pending Preprocessing Prometheus pattern: `VALUE(etcd_server_proposals_pending)`
Reads per second	The number of read actions by `get/getRecursive`, local to this member.	Dependent item	etcd.reads.rate Preprocessing Prometheus to JSON: `etcd_debugging_store_reads_total` JavaScript: `The text is too long. Please see the template.` Change per second
Writes per second	The number of writes (e.g., `set/compareAndDelete`) seen by this member.	Dependent item	etcd.writes.rate Preprocessing Prometheus to JSON: `etcd_debugging_store_writes_total` JavaScript: `The text is too long. Please see the template.` Change per second
Client gRPC received bytes per second	The number of bytes received from gRPC clients per second.	Dependent item	etcd.network.grpc.received.rate Preprocessing Prometheus pattern: `VALUE(etcd_network_client_grpc_received_bytes_total)` Change per second
Client gRPC sent bytes per second	The number of bytes sent to gRPC clients per second.	Dependent item	etcd.network.grpc.sent.rate Preprocessing Prometheus pattern: `VALUE(etcd_network_client_grpc_sent_bytes_total)` Change per second
HTTP requests received	The number of requests received into the system (successfully parsed and `authd`).	Dependent item	etcd.http.requests.rate Preprocessing Prometheus to JSON: `etcd_http_received_total` JavaScript: `The text is too long. Please see the template.` Change per second
HTTP 5XX	The number of handled failures of requests (non-watches), by the method (`GET/PUT` etc.), and the code `5XX`.	Dependent item	etcd.http.requests.5xx.rate Preprocessing Prometheus to JSON: `etcd_http_failed_total{code=~"5.+"}` JavaScript: `The text is too long. Please see the template.` Change per second
HTTP 4XX	The number of handled failures of requests (non-watches), by the method (`GET/PUT` etc.), and the code `4XX`.	Dependent item	etcd.http.requests.4xx.rate Preprocessing Prometheus to JSON: `etcd_http_failed_total{code=~"4.+"}` JavaScript: `The text is too long. Please see the template.` Change per second
RPCs received per second	The number of RPC stream messages received on the server.	Dependent item	etcd.grpc.received.rate Preprocessing Prometheus to JSON: `grpc_server_msg_received_total` JavaScript: `The text is too long. Please see the template.` Change per second
RPCs sent per second	The number of gRPC stream messages sent by the server.	Dependent item	etcd.grpc.sent.rate Preprocessing Prometheus to JSON: `grpc_server_msg_sent_total` JavaScript: `The text is too long. Please see the template.` Change per second
RPCs started per second	The number of RPCs started on the server.	Dependent item	etcd.grpc.started.rate Preprocessing Prometheus to JSON: `grpc_server_started_total` JavaScript: `The text is too long. Please see the template.` Change per second
Get version		HTTP agent	etcd.get_version
Server version	The version of the `etcd server`.	Dependent item	etcd.server.version Preprocessing JSON Path: `$.etcdserver` Discard unchanged with heartbeat: `1d`
Cluster version	The version of the `etcd cluster`.	Dependent item	etcd.cluster.version Preprocessing JSON Path: `$.etcdcluster` Discard unchanged with heartbeat: `1d`
DB size	The total size of the underlying database.	Dependent item	etcd.db.size Preprocessing Prometheus pattern: `VALUE(etcd_mvcc_db_total_size_in_bytes)`
Keys compacted per second	The number of DB keys compacted per second.	Dependent item	etcd.keys.compacted.rate Preprocessing Prometheus pattern: `VALUE(etcd_debugging_mvcc_db_compaction_keys_total)` ⛔️Custom on fail: Set value to: `0` Change per second
Keys expired per second	The number of expired keys per second.	Dependent item	etcd.keys.expired.rate Preprocessing Prometheus pattern: `VALUE(etcd_debugging_store_expires_total)` Change per second
Keys total	The total number of keys.	Dependent item	etcd.keys.total Preprocessing Prometheus pattern: `VALUE(etcd_debugging_mvcc_keys_total)`
Uptime	`Etcd` server uptime.	Dependent item	etcd.uptime Preprocessing Prometheus pattern: `VALUE(process_start_time_seconds)` JavaScript: `The text is too long. Please see the template.`
Virtual memory	The size of virtual memory expressed in bytes.	Dependent item	etcd.virtual.bytes Preprocessing Prometheus pattern: `VALUE(process_virtual_memory_bytes)`
Resident memory	The size of resident memory expressed in bytes.	Dependent item	etcd.res.bytes Preprocessing Prometheus pattern: `VALUE(process_resident_memory_bytes)`
CPU	The total user and system CPU time spent in seconds.	Dependent item	etcd.cpu.util Preprocessing Prometheus pattern: `VALUE(process_cpu_seconds_total)` Change per second
Open file descriptors	The number of open file descriptors.	Dependent item	etcd.open.fds Preprocessing Prometheus pattern: `VALUE(process_open_fds)`
Maximum open file descriptors	The maximum number of open file descriptors.	Dependent item	etcd.max.fds Preprocessing Prometheus pattern: `VALUE(process_max_fds)`
Deletes per second	The number of deletes seen by this member per second.	Dependent item	etcd.delete.rate Preprocessing Prometheus pattern: `VALUE(etcd_mvcc_delete_total)` Change per second
PUT per second	The number of puts seen by this member per second.	Dependent item	etcd.put.rate Preprocessing Prometheus pattern: `VALUE(etcd_mvcc_put_total)` Change per second
Range per second	The number of ranges seen by this member per second.	Dependent item	etcd.range.rate Preprocessing Prometheus pattern: `VALUE(etcd_debugging_mvcc_range_total)` Change per second
Transaction per second	The number of transactions seen by this member per second.	Dependent item	etcd.txn.rate Preprocessing Prometheus pattern: `VALUE(etcd_debugging_mvcc_range_total)` Change per second
Pending events	The total number of pending events to be sent.	Dependent item	etcd.events.sent.rate Preprocessing Prometheus pattern: `VALUE(etcd_debugging_mvcc_pending_events_total)`

Triggers

Name	Description	Expression	Severity	Dependencies and additional info
Etcd: Service is unavailable		`last(/Etcd by HTTP/net.tcp.service["{$ETCD.SCHEME}","{$ETCD.HOST}","{$ETCD.PORT}"])=0`	Average	Manual close: Yes
Etcd: Node healthcheck failed	See more details on https://etcd.io/docs/v3.5/op-guide/monitoring/#health-check.	`last(/Etcd by HTTP/etcd.health)=0`	Average	Depends on: Etcd: Service is unavailable
Etcd: Failed to fetch info data	Zabbix has not received any data for items for the last 30 minutes.	`nodata(/Etcd by HTTP/etcd.is.leader,30m)=1`	Warning	Manual close: Yes Depends on: Etcd: Service is unavailable
Etcd: Member has no leader	If a member does not have a leader, it is totally unavailable.	`last(/Etcd by HTTP/etcd.has.leader)=0`	Average
Etcd: Instance has seen too many leader changes	Rapid leadership changes impact the performance of `etcd` significantly. It also signals that the leader is unstable, perhaps due to network connectivity issues or excessive load hitting the `etcd cluster`.	`(max(/Etcd by HTTP/etcd.leader.changes,15m)-min(/Etcd by HTTP/etcd.leader.changes,15m))>{$ETCD.LEADER.CHANGES.MAX.WARN}`	Warning
Etcd: Too many proposal failures	Normally related to two issues: temporary failures related to a leader election or longer downtime caused by a loss of quorum in the cluster.	`min(/Etcd by HTTP/etcd.proposals.failed.rate,5m)>{$ETCD.PROPOSAL.FAIL.MAX.WARN}`	Warning
Etcd: Too many proposals are queued to commit	A rising number of pending proposals suggests that there is a high client load or that the member cannot commit proposals.	`min(/Etcd by HTTP/etcd.proposals.pending,5m)>{$ETCD.PROPOSAL.PENDING.MAX.WARN}`	Warning
Etcd: Too many HTTP requests failures	Too many requests failed on `etcd` instance with the `5xx HTTP code`.	`min(/Etcd by HTTP/etcd.http.requests.5xx.rate,5m)>{$ETCD.HTTP.FAIL.MAX.WARN}`	Warning
Etcd: Server version has changed	Etcd version has changed. Acknowledge to close the problem manually.	`last(/Etcd by HTTP/etcd.server.version,#1)<>last(/Etcd by HTTP/etcd.server.version,#2) and length(last(/Etcd by HTTP/etcd.server.version))>0`	Info	Manual close: Yes
Etcd: Cluster version has changed	Etcd version has changed. Acknowledge to close the problem manually.	`last(/Etcd by HTTP/etcd.cluster.version,#1)<>last(/Etcd by HTTP/etcd.cluster.version,#2) and length(last(/Etcd by HTTP/etcd.cluster.version))>0`	Info	Manual close: Yes
Etcd: Host has been restarted	Uptime is less than 10 minutes.	`last(/Etcd by HTTP/etcd.uptime)<10m`	Info	Manual close: Yes
Etcd: Current number of open files is too high	Heavy file descriptor usage (i.e., near the process's file descriptor limit) indicates a potential file descriptor exhaustion issue. If the file descriptors are exhausted, `etcd` may panic because it cannot create new WAL files.	`min(/Etcd by HTTP/etcd.open.fds,5m)/last(/Etcd by HTTP/etcd.max.fds)*100>{$ETCD.OPEN.FDS.MAX.WARN}`	Warning

LLD rule gRPC codes discovery

Name	Description	Type	Key and additional info
gRPC codes discovery		Dependent item	etcd.grpc_code.discovery Preprocessing Prometheus to JSON: `grpc_server_handled_total` JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `1h`

Item prototypes for gRPC codes discovery

Name Description Type Key and additional info

RPCs completed with code {#GRPC.CODE}

Name	Description	Type	Key and additional info
RPCs completed with code {#GRPC.CODE}	The number of RPCs completed on the server with grpc_code {#GRPC.CODE}.	Dependent item	etcd.grpc.handled.rate[{#GRPC.CODE}] Preprocessing Prometheus to JSON: `grpc_server_handled_total{grpc_method="{#GRPC.CODE}"}` JavaScript: `The text is too long. Please see the template.` Change per second

Trigger prototypes for gRPC codes discovery

Name	Description	Expression	Severity	Dependencies and additional info
Etcd: Too many failed gRPC requests with code: {#GRPC.CODE}	`min(/Etcd by HTTP/etcd.grpc.handled.rate[{#GRPC.CODE}],5m)>{$ETCD.GRPC.ERRORS.MAX.WARN}`	Warning

LLD rule Peers discovery

Name	Description	Type	Key and additional info
Peers discovery	Dependent item	etcd.peer.discovery Preprocessing Prometheus to JSON: `etcd_network_peer_sent_bytes_total`

Item prototypes for Peers discovery

Name	Description	Type	Key and additional info
Etcd peer {#ETCD.PEER}: Bytes sent	The number of bytes sent to a peer with the ID `{#ETCD.PEER}`.	Dependent item	etcd.bytes.sent.rate[{#ETCD.PEER}] Preprocessing Prometheus pattern: `VALUE(etcd_network_peer_sent_bytes_total{To="{#ETCD.PEER}"})` ⛔️Custom on fail: Set value to: `0` Change per second
Etcd peer {#ETCD.PEER}: Bytes received	The number of bytes received from a peer with the ID `{#ETCD.PEER}`.	Dependent item	etcd.bytes.received.rate[{#ETCD.PEER}] Preprocessing Prometheus pattern: `The text is too long. Please see the template.` ⛔️Custom on fail: Set value to: `0` Change per second
Etcd peer {#ETCD.PEER}: Send failures	The number of send failures for the peer with the ID `{#ETCD.PEER}`.	Dependent item	etcd.sent.fail.rate[{#ETCD.PEER}] Preprocessing Prometheus pattern: `The text is too long. Please see the template.` ⛔️Custom on fail: Set value to: `0` Change per second
Etcd peer {#ETCD.PEER}: Receive failures	The number of receive failures for the peer with the ID `{#ETCD.PEER}`.	Dependent item	etcd.received.fail.rate[{#ETCD.PEER}] Preprocessing Prometheus pattern: `The text is too long. Please see the template.` ⛔️Custom on fail: Set value to: `0` Change per second
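
As a rough illustration of the per-peer items, the Python sketch below mimics a pattern such as `VALUE(etcd_network_peer_sent_bytes_total{To="{#ETCD.PEER}"})` followed by "Custom on fail: Set value to: 0". It is a simplified stand-in, not the Zabbix implementation: it only handles a single `To` label, and the function name, peer ID, and sample text are hypothetical.

```python
import re


def peer_value(metrics_text: str, metric: str, peer_id: str) -> float:
    """Return the value of metric{To="<peer_id>"}, or 0.0 when no sample matches."""
    pattern = re.compile(
        r"^" + re.escape(metric) + r'\{To="' + re.escape(peer_id) + r'"\}\s+(\S+)\s*$',
        re.MULTILINE,
    )
    match = pattern.search(metrics_text)
    # A missing sample falls back to 0, like "Custom on fail: Set value to: 0".
    return float(match.group(1)) if match else 0.0


# Hypothetical excerpt of the /metrics output.
metrics = (
    'etcd_network_peer_sent_bytes_total{To="8e9e05c52164694d"} 1024\n'
    'etcd_network_peer_sent_bytes_total{To="91bc3c398fb3c146"} 2048\n'
)

print(peer_value(metrics, "etcd_network_peer_sent_bytes_total", "8e9e05c52164694d"))  # 1024.0
print(peer_value(metrics, "etcd_network_peer_sent_bytes_total", "unknown"))  # 0.0
```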

Feedback

Please report any issues with the template at https://support.zabbix.com

You can also provide feedback, discuss the template, or ask for help on the Zabbix forums.

This template is for Zabbix version: 7.0

Also available for: 7.4 7.2 6.4 6.2 6.0 5.4 5.0

Source: https://git.zabbix.com/projects/ZBX/repos/zabbix/browse/templates/app/etcd_http?at=release/7.0

Etcd by HTTP

Overview

This template is designed to monitor etcd with Zabbix and works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.

The Etcd by HTTP template collects metrics with the HTTP agent from the /metrics endpoint.

Refer to the vendor documentation.

For users of etcd version <= 3.4!

In etcd v3.5, some metrics have been deprecated. For more details, see Upgrade etcd from 3.4 to 3.5. Please upgrade your etcd instance or use an older version of the Etcd by HTTP template.

Requirements

Zabbix version: 7.0 and higher.

Tested versions

This template has been tested on:

Etcd 3.5.6

Configuration

Zabbix should be configured according to the instructions in the Templates out of the box section.

Setup

Make sure that etcd allows the collection of metrics. You can test it by running: curl -L http://localhost:2379/metrics.
Check if etcd is accessible from Zabbix proxy or Zabbix server depending on where you are planning to do the monitoring. To verify it, run curl -L http://<etcd_node_address>:2379/metrics.
Add the template to the etcd node. Set the hostname or IP address of the etcd host in the {$ETCD.HOST} macro. By default, the template uses the client port. You can configure the location of the metrics endpoint by adding the --listen-metrics-urls flag.

For more details, see the etcd documentation.

Additional points to consider:

If you have specified a non-standard port for etcd, don't forget to change the macros {$ETCD.SCHEME} and {$ETCD.PORT}.
You can set the {$ETCD.USER} and {$ETCD.PASSWORD} macros in the template for use at the host level if necessary.
To test availability, run: zabbix_get -s etcd-host -k etcd.health.
See the Macros used section, as the macros set the trigger thresholds.
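
The `curl` checks in the setup steps can also be scripted. The Python sketch below is one possible way to do it with the standard library only; the function names are hypothetical, and the defaults mirror the `{$ETCD.SCHEME}`, `{$ETCD.HOST}`, and `{$ETCD.PORT}` values used in the examples above.

```python
from urllib.request import urlopen


def metrics_url(scheme: str = "http", host: str = "localhost", port: int = 2379) -> str:
    """Build the metrics URL from the scheme, host, and port values."""
    return f"{scheme}://{host}:{port}/metrics"


def etcd_metrics_reachable(url: str, timeout: float = 5.0) -> bool:
    """Return True if the endpoint answers with HTTP 200 (redirects are followed)."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:  # connection refused, timeout, HTTP error status, ...
        return False


print(metrics_url())  # http://localhost:2379/metrics
```

Calling `etcd_metrics_reachable(metrics_url(host="<etcd_node_address>"))` from the Zabbix server or proxy host would then correspond to the second check.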

Macros used

Name	Description	Default
{$ETCD.HOST}	The hostname or IP address of the `etcd` API endpoint.	`<SET ETCD HOST>`
{$ETCD.PORT}	The port of the `etcd` API endpoint.	`2379`
{$ETCD.SCHEME}	The request scheme which may be `http` or `https`.	`http`
{$ETCD.USER}
{$ETCD.PASSWORD}
{$ETCD.LEADER.CHANGES.MAX.WARN}	The maximum number of leader changes.	`5`
{$ETCD.PROPOSAL.FAIL.MAX.WARN}	The maximum number of proposal failures.	`2`
{$ETCD.HTTP.FAIL.MAX.WARN}	The maximum number of HTTP request failures.	`2`
{$ETCD.PROPOSAL.PENDING.MAX.WARN}	The maximum number of proposals in queue.	`5`
{$ETCD.OPEN.FDS.MAX.WARN}	The maximum percentage of used file descriptors.	`90`
{$ETCD.GRPC_CODE.MATCHES}	The filter of discoverable gRPC codes. See more details on https://github.com/grpc/grpc/blob/master/doc/statuscodes.md.	`.*`
{$ETCD.GRPC_CODE.NOT_MATCHES}	The filter to exclude discovered gRPC codes. See more details on https://github.com/grpc/grpc/blob/master/doc/statuscodes.md.	`CHANGE_IF_NEEDED`
{$ETCD.GRPC.ERRORS.MAX.WARN}	The maximum number of gRPC request failures.	`1`
{$ETCD.GRPC_CODE.TRIGGER.MATCHES}	The filter of discoverable gRPC codes, which will create triggers.	`Aborted|Unavailable`
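
To make the three gRPC code filters more concrete, here is a small Python sketch of how they could combine. It assumes unanchored regular expression matching and takes the trigger filter as the alternation `Aborted|Unavailable`; the function names are hypothetical and this is not how Zabbix implements the filters internally.

```python
import re


def is_discovered(code: str, matches: str = r".*", not_matches: str = r"CHANGE_IF_NEEDED") -> bool:
    """A code is kept when it matches MATCHES and does not match NOT_MATCHES."""
    return re.search(matches, code) is not None and re.search(not_matches, code) is None


def gets_trigger(code: str, trigger_matches: str = r"Aborted|Unavailable") -> bool:
    """Only codes matching TRIGGER.MATCHES receive a trigger prototype."""
    return re.search(trigger_matches, code) is not None


for code in ("OK", "Aborted", "Unavailable"):
    print(code, is_discovered(code), gets_trigger(code))
```

With the defaults, every code is discovered, but only `Aborted` and `Unavailable` would get a trigger.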

Items

Name	Description	Type	Key and additional info
Service's TCP port state	Simple check	net.tcp.service["{$ETCD.SCHEME}","{$ETCD.HOST}","{$ETCD.PORT}"] Preprocessing Discard unchanged with heartbeat: `10m`
Get node metrics	HTTP agent	etcd.get_metrics
Node health	HTTP agent	etcd.health Preprocessing JSON Path: `$.health` Boolean to decimal ⛔️Custom on fail: Set value to: `0` Discard unchanged with heartbeat: `10m`
Server is a leader	Whether this member is a leader: 1 - it is; 0 - otherwise.	Dependent item	etcd.is.leader Preprocessing Prometheus pattern: `VALUE(etcd_server_is_leader)` ⛔️Custom on fail: Set value to: `0` Discard unchanged with heartbeat: `10m`
Server has a leader	Whether a leader exists: 1 - it exists; 0 - it does not.	Dependent item	etcd.has.leader Preprocessing Prometheus pattern: `VALUE(etcd_server_has_leader)` Discard unchanged with heartbeat: `10m`
Leader changes	The number of leader changes the member has seen since its start.	Dependent item	etcd.leader.changes Preprocessing Prometheus pattern: `VALUE(etcd_server_leader_changes_seen_total)`
Proposals committed per second	The number of consensus proposals committed.	Dependent item	etcd.proposals.committed.rate Preprocessing Prometheus pattern: `VALUE(etcd_server_proposals_committed_total)` Change per second
Proposals applied per second	The number of consensus proposals applied.	Dependent item	etcd.proposals.applied.rate Preprocessing Prometheus pattern: `VALUE(etcd_server_proposals_applied_total)` Change per second
Proposals failed per second	The number of failed proposals seen.	Dependent item	etcd.proposals.failed.rate Preprocessing Prometheus pattern: `VALUE(etcd_server_proposals_failed_total)` Change per second
Proposals pending	The current number of pending proposals to commit.	Dependent item	etcd.proposals.pending Preprocessing Prometheus pattern: `VALUE(etcd_server_proposals_pending)`
Reads per second	The number of read actions by `get/getRecursive`, local to this member.	Dependent item	etcd.reads.rate Preprocessing Prometheus to JSON: `etcd_debugging_store_reads_total` JavaScript: `The text is too long. Please see the template.` Change per second
Writes per second	The number of writes (e.g., `set/compareAndDelete`) seen by this member.	Dependent item	etcd.writes.rate Preprocessing Prometheus to JSON: `etcd_debugging_store_writes_total` JavaScript: `The text is too long. Please see the template.` Change per second
Client gRPC received bytes per second	The number of bytes received from gRPC clients per second.	Dependent item	etcd.network.grpc.received.rate Preprocessing Prometheus pattern: `VALUE(etcd_network_client_grpc_received_bytes_total)` Change per second
Client gRPC sent bytes per second	The number of bytes sent to gRPC clients per second.	Dependent item	etcd.network.grpc.sent.rate Preprocessing Prometheus pattern: `VALUE(etcd_network_client_grpc_sent_bytes_total)` Change per second
HTTP requests received	The number of requests received into the system (successfully parsed and `authd`).	Dependent item	etcd.http.requests.rate Preprocessing Prometheus to JSON: `etcd_http_received_total` JavaScript: `The text is too long. Please see the template.` Change per second
HTTP 5XX	The number of handled failures of requests (non-watches), by the method (`GET/PUT` etc.), and the code `5XX`.	Dependent item	etcd.http.requests.5xx.rate Preprocessing Prometheus to JSON: `etcd_http_failed_total{code=~"5.+"}` JavaScript: `The text is too long. Please see the template.` Change per second
HTTP 4XX	The number of handled failures of requests (non-watches), by the method (`GET/PUT` etc.), and the code `4XX`.	Dependent item	etcd.http.requests.4xx.rate Preprocessing Prometheus to JSON: `etcd_http_failed_total{code=~"4.+"}` JavaScript: `The text is too long. Please see the template.` Change per second
RPCs received per second	The number of RPC stream messages received on the server.	Dependent item	etcd.grpc.received.rate Preprocessing Prometheus to JSON: `grpc_server_msg_received_total` JavaScript: `The text is too long. Please see the template.` Change per second
RPCs sent per second	The number of gRPC stream messages sent by the server.	Dependent item	etcd.grpc.sent.rate Preprocessing Prometheus to JSON: `grpc_server_msg_sent_total` JavaScript: `The text is too long. Please see the template.` Change per second
RPCs started per second	The number of RPCs started on the server.	Dependent item	etcd.grpc.started.rate Preprocessing Prometheus to JSON: `grpc_server_started_total` JavaScript: `The text is too long. Please see the template.` Change per second
Get version	HTTP agent	etcd.get_version
Server version	The version of the `etcd server`.	Dependent item	etcd.server.version Preprocessing JSON Path: `$.etcdserver` Discard unchanged with heartbeat: `1d`
Cluster version	The version of the `etcd cluster`.	Dependent item	etcd.cluster.version Preprocessing JSON Path: `$.etcdcluster` Discard unchanged with heartbeat: `1d`
DB size	The total size of the underlying database.	Dependent item	etcd.db.size Preprocessing Prometheus pattern: `VALUE(etcd_mvcc_db_total_size_in_bytes)`
Keys compacted per second	The number of DB keys compacted per second.	Dependent item	etcd.keys.compacted.rate Preprocessing Prometheus pattern: `VALUE(etcd_debugging_mvcc_db_compaction_keys_total)` ⛔️Custom on fail: Set value to: `0` Change per second
Keys expired per second	The number of expired keys per second.	Dependent item	etcd.keys.expired.rate Preprocessing Prometheus pattern: `VALUE(etcd_debugging_store_expires_total)` Change per second
Keys total	The total number of keys.	Dependent item	etcd.keys.total Preprocessing Prometheus pattern: `VALUE(etcd_debugging_mvcc_keys_total)`
Uptime	`Etcd` server uptime.	Dependent item	etcd.uptime Preprocessing Prometheus pattern: `VALUE(process_start_time_seconds)` JavaScript: `The text is too long. Please see the template.`
Virtual memory	The size of virtual memory expressed in bytes.	Dependent item	etcd.virtual.bytes Preprocessing Prometheus pattern: `VALUE(process_virtual_memory_bytes)`
Resident memory	The size of resident memory expressed in bytes.	Dependent item	etcd.res.bytes Preprocessing Prometheus pattern: `VALUE(process_resident_memory_bytes)`
CPU	The total user and system CPU time spent in seconds.	Dependent item	etcd.cpu.util Preprocessing Prometheus pattern: `VALUE(process_cpu_seconds_total)` Change per second
Open file descriptors	The number of open file descriptors.	Dependent item	etcd.open.fds Preprocessing Prometheus pattern: `VALUE(process_open_fds)`
Maximum open file descriptors	The maximum number of open file descriptors.	Dependent item	etcd.max.fds Preprocessing Prometheus pattern: `VALUE(process_max_fds)`
Deletes per second	The number of deletes seen by this member per second.	Dependent item	etcd.delete.rate Preprocessing Prometheus pattern: `VALUE(etcd_mvcc_delete_total)` Change per second
PUT per second	The number of puts seen by this member per second.	Dependent item	etcd.put.rate Preprocessing Prometheus pattern: `VALUE(etcd_mvcc_put_total)` Change per second
Range per second	The number of ranges seen by this member per second.	Dependent item	etcd.range.rate Preprocessing Prometheus pattern: `VALUE(etcd_debugging_mvcc_range_total)` Change per second
Transaction per second	The number of transactions seen by this member per second.	Dependent item	etcd.txn.rate Preprocessing Prometheus pattern: `VALUE(etcd_debugging_mvcc_range_total)` Change per second
Pending events	The total number of pending events to be sent.	Dependent item	etcd.events.sent.rate Preprocessing Prometheus pattern: `VALUE(etcd_debugging_mvcc_pending_events_total)`
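
Many items above pair a `VALUE(<metric>)` pattern with "Change per second". The Python sketch below shows the arithmetic on two scrapes of a counter, assuming the rate is computed as (value - previous value) / (time - previous time). It only handles unlabelled metrics, and the function names and sample values are hypothetical.

```python
import re


def prometheus_value(metrics_text: str, metric: str) -> float:
    """Return the value of an unlabelled sample, similar to VALUE(<metric>)."""
    match = re.search(r"^" + re.escape(metric) + r"\s+(\S+)\s*$", metrics_text, re.MULTILINE)
    if match is None:
        raise LookupError(f"metric not found: {metric}")
    return float(match.group(1))


def change_per_second(prev_value: float, prev_time: float, value: float, now: float) -> float:
    """Rate of a counter between two collection times, in units per second."""
    return (value - prev_value) / (now - prev_time)


# Two hypothetical scrapes of /metrics taken 60 seconds apart.
first = "etcd_server_proposals_committed_total 1500\n"
second = "etcd_server_proposals_committed_total 1800\n"

rate = change_per_second(
    prometheus_value(first, "etcd_server_proposals_committed_total"), 0.0,
    prometheus_value(second, "etcd_server_proposals_committed_total"), 60.0,
)
print(rate)  # 5.0
```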

Triggers

Name	Description	Expression	Severity	Dependencies and additional info
Etcd: Service is unavailable	`last(/Etcd by HTTP/net.tcp.service["{$ETCD.SCHEME}","{$ETCD.HOST}","{$ETCD.PORT}"])=0`	Average	Manual close: Yes
Etcd: Node healthcheck failed	See more details on https://etcd.io/docs/v3.5/op-guide/monitoring/#health-check.	`last(/Etcd by HTTP/etcd.health)=0`	Average	Depends on: Etcd: Service is unavailable
Etcd: Failed to fetch info data	Zabbix has not received any data for items for the last 30 minutes.	`nodata(/Etcd by HTTP/etcd.is.leader,30m)=1`	Warning	Manual close: Yes Depends on: Etcd: Service is unavailable
Etcd: Member has no leader	If a member does not have a leader, it is totally unavailable.	`last(/Etcd by HTTP/etcd.has.leader)=0`	Average
Etcd: Instance has seen too many leader changes	Rapid leadership changes impact the performance of `etcd` significantly. It also signals that the leader is unstable, perhaps due to network connectivity issues or excessive load hitting the `etcd cluster`.	`(max(/Etcd by HTTP/etcd.leader.changes,15m)-min(/Etcd by HTTP/etcd.leader.changes,15m))>{$ETCD.LEADER.CHANGES.MAX.WARN}`	Warning
Etcd: Too many proposal failures	Normally related to two issues: temporary failures related to a leader election or longer downtime caused by a loss of quorum in the cluster.	`min(/Etcd by HTTP/etcd.proposals.failed.rate,5m)>{$ETCD.PROPOSAL.FAIL.MAX.WARN}`	Warning
Etcd: Too many proposals are queued to commit	A rising number of pending proposals suggests that there is a high client load or that the member cannot commit proposals.	`min(/Etcd by HTTP/etcd.proposals.pending,5m)>{$ETCD.PROPOSAL.PENDING.MAX.WARN}`	Warning
Etcd: Too many HTTP requests failures	Too many requests to the `etcd` instance failed with a `5xx` HTTP code.	`min(/Etcd by HTTP/etcd.http.requests.5xx.rate,5m)>{$ETCD.HTTP.FAIL.MAX.WARN}`	Warning
Etcd: Server version has changed	Etcd version has changed. Acknowledge to close the problem manually.	`last(/Etcd by HTTP/etcd.server.version,#1)<>last(/Etcd by HTTP/etcd.server.version,#2) and length(last(/Etcd by HTTP/etcd.server.version))>0`	Info	Manual close: Yes
Etcd: Cluster version has changed	Etcd version has changed. Acknowledge to close the problem manually.	`last(/Etcd by HTTP/etcd.cluster.version,#1)<>last(/Etcd by HTTP/etcd.cluster.version,#2) and length(last(/Etcd by HTTP/etcd.cluster.version))>0`	Info	Manual close: Yes
Etcd: Host has been restarted	Uptime is less than 10 minutes.	`last(/Etcd by HTTP/etcd.uptime)<10m`	Info	Manual close: Yes
Etcd: Current number of open files is too high	Heavy file descriptor usage (i.e., close to the process's file descriptor limit) indicates a potential file descriptor exhaustion issue. If the file descriptors are exhausted, `etcd` may panic because it cannot create new WAL files.	`min(/Etcd by HTTP/etcd.open.fds,5m)/last(/Etcd by HTTP/etcd.max.fds)*100>{$ETCD.OPEN.FDS.MAX.WARN}`	Warning
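
The open files trigger compares a percentage with `{$ETCD.OPEN.FDS.MAX.WARN}`. As a rough worked example, the Python sketch below restates the expression `min(open.fds,5m)/last(max.fds)*100>threshold` on hypothetical sample values; the function name and the numbers are illustrative only.

```python
def open_fds_too_high(open_fds_5m: list[float], max_fds: float, warn_percent: float = 90.0) -> bool:
    """True when even the lowest usage in the window exceeds the threshold."""
    return min(open_fds_5m) / max_fds * 100 > warn_percent


# Usage stays above 90% of a 1024 descriptor limit for the whole window.
print(open_fds_too_high([950, 960, 940], max_fds=1024))  # True
# A single dip to 800 descriptors (about 78%) keeps the trigger quiet.
print(open_fds_too_high([950, 800, 960], max_fds=1024))  # False
```

Using `min` over the window means that a short spike alone does not fire the trigger; the usage has to stay high for the full five minutes.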

LLD rule gRPC codes discovery

Name	Description	Type	Key and additional info
gRPC codes discovery	Dependent item	etcd.grpc_code.discovery Preprocessing Prometheus to JSON: `grpc_server_handled_total` JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `1h`

Item prototypes for gRPC codes discovery

Name	Description	Type	Key and additional info
RPCs completed with code {#GRPC.CODE}	The number of RPCs completed on the server with grpc_code {#GRPC.CODE}.	Dependent item	etcd.grpc.handled.rate[{#GRPC.CODE}] Preprocessing Prometheus to JSON: `grpc_server_handled_total{grpc_method="{#GRPC.CODE}"}` JavaScript: `The text is too long. Please see the template.` Change per second
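
The JavaScript step of this item prototype is not shown on this page. One plausible reading, used here purely as an assumption, is that it adds up the values of all samples left after the label filter so that a single counter per gRPC code is passed to "Change per second". The Python sketch below shows that summation on hypothetical Prometheus to JSON output.

```python
import json


def handled_total(prom_json: str) -> float:
    """Add up the counter values of every sample left after the label filter."""
    return sum(float(sample["value"]) for sample in json.loads(prom_json))


# Hypothetical Prometheus-to-JSON output for a single gRPC code.
filtered = json.dumps([
    {"name": "grpc_server_handled_total", "value": "12",
     "labels": {"grpc_code": "Unavailable", "grpc_method": "Watch"}},
    {"name": "grpc_server_handled_total", "value": "7",
     "labels": {"grpc_code": "Unavailable", "grpc_method": "Range"}},
])

print(handled_total(filtered))  # 19.0
```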

Trigger prototypes for gRPC codes discovery

Name	Description	Expression	Severity	Dependencies and additional info
Etcd: Too many failed gRPC requests with code: {#GRPC.CODE}	`min(/Etcd by HTTP/etcd.grpc.handled.rate[{#GRPC.CODE}],5m)>{$ETCD.GRPC.ERRORS.MAX.WARN}`	Warning

LLD rule Peers discovery

Name	Description	Type	Key and additional info
Peers discovery	Dependent item	etcd.peer.discovery Preprocessing Prometheus to JSON: `etcd_network_peer_sent_bytes_total`

Item prototypes for Peers discovery

Name	Description	Type	Key and additional info
Etcd peer {#ETCD.PEER}: Bytes sent	The number of bytes sent to a peer with the ID `{#ETCD.PEER}`.	Dependent item	etcd.bytes.sent.rate[{#ETCD.PEER}] Preprocessing Prometheus pattern: `VALUE(etcd_network_peer_sent_bytes_total{To="{#ETCD.PEER}"})` ⛔️Custom on fail: Set value to: `0` Change per second
Etcd peer {#ETCD.PEER}: Bytes received	The number of bytes received from a peer with the ID `{#ETCD.PEER}`.	Dependent item	etcd.bytes.received.rate[{#ETCD.PEER}] Preprocessing Prometheus pattern: `The text is too long. Please see the template.` ⛔️Custom on fail: Set value to: `0` Change per second
Etcd peer {#ETCD.PEER}: Send failures	The number of send failures for the peer with the ID `{#ETCD.PEER}`.	Dependent item	etcd.sent.fail.rate[{#ETCD.PEER}] Preprocessing Prometheus pattern: `The text is too long. Please see the template.` ⛔️Custom on fail: Set value to: `0` Change per second
Etcd peer {#ETCD.PEER}: Receive failures	The number of receive failures for the peer with the ID `{#ETCD.PEER}`.	Dependent item	etcd.received.fail.rate[{#ETCD.PEER}] Preprocessing Prometheus pattern: `The text is too long. Please see the template.` ⛔️Custom on fail: Set value to: `0` Change per second

Feedback

Please report any issues with the template at https://support.zabbix.com

You can also provide feedback, discuss the template, or ask for help on the Zabbix forums.

This template is for Zabbix version: 6.4

Also available for: 7.4 7.2 7.0 6.2 6.0 5.4 5.0

Source: https://git.zabbix.com/projects/ZBX/repos/zabbix/browse/templates/app/etcd_http?at=release/6.4

Etcd by HTTP

Overview

This template is designed to monitor etcd with Zabbix and works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.

The Etcd by HTTP template collects metrics with the HTTP agent from the /metrics endpoint.

Refer to the vendor documentation.

For users of etcd version <= 3.4!

In etcd v3.5, some metrics have been deprecated. For more details, see Upgrade etcd from 3.4 to 3.5. Please upgrade your etcd instance or use an older version of the Etcd by HTTP template.

Requirements

Zabbix version: 6.4 and higher.

Tested versions

This template has been tested on:

Etcd 3.5.6

Configuration

Zabbix should be configured according to the instructions in the Templates out of the box section.

Setup

Make sure that etcd allows the collection of metrics. You can test it by running: curl -L http://localhost:2379/metrics.
Check if etcd is accessible from Zabbix proxy or Zabbix server depending on where you are planning to do the monitoring. To verify it, run curl -L http://<etcd_node_address>:2379/metrics.
Add the template to the etcd node. Set the hostname or IP address of the etcd host in the {$ETCD.HOST} macro. By default, the template uses the client port. You can configure the location of the metrics endpoint by adding the --listen-metrics-urls flag.

For more details, see the etcd documentation.

Additional points to consider:

If you have specified a non-standard port for etcd, don't forget to change the macros {$ETCD.SCHEME} and {$ETCD.PORT}.
You can set the {$ETCD.USER} and {$ETCD.PASSWORD} macros in the template for use at the host level if necessary.
To test availability, run: zabbix_get -s etcd-host -k etcd.health.
See the Macros used section, as the macros set the trigger thresholds.
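
The `etcd.health` item used for the availability check is described in the Items table as JSON Path `$.health`, then Boolean to decimal, with the value set to `0` on failure. The Python sketch below approximates that chain in a simplified form: it only recognizes `true` as healthy, and the function name and response bodies are hypothetical.

```python
import json


def node_health(body: str) -> int:
    """Map a health response body to 1 (healthy) or 0 (unhealthy or unparsable)."""
    try:
        health = json.loads(body)["health"]  # JSON Path: $.health
    except (ValueError, KeyError, TypeError):
        return 0  # Custom on fail: Set value to: 0
    # Simplified Boolean to decimal: only "true" counts as 1.
    return 1 if str(health).lower() == "true" else 0


print(node_health('{"health": "true"}'))   # 1
print(node_health('{"health": "false"}'))  # 0
print(node_health("connection reset"))     # 0
```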

Macros used

Name	Description	Default
{$ETCD.HOST}	The hostname or IP address of the `etcd` API endpoint.	`<SET ETCD HOST>`
{$ETCD.PORT}	The port of the `etcd` API endpoint.	`2379`
{$ETCD.SCHEME}	The request scheme which may be `http` or `https`.	`http`
{$ETCD.USER}
{$ETCD.PASSWORD}
{$ETCD.LEADER.CHANGES.MAX.WARN}	The maximum number of leader changes.	`5`
{$ETCD.PROPOSAL.FAIL.MAX.WARN}	The maximum number of proposal failures.	`2`
{$ETCD.HTTP.FAIL.MAX.WARN}	The maximum number of HTTP request failures.	`2`
{$ETCD.PROPOSAL.PENDING.MAX.WARN}	The maximum number of proposals in queue.	`5`
{$ETCD.OPEN.FDS.MAX.WARN}	The maximum percentage of used file descriptors.	`90`
{$ETCD.GRPC_CODE.MATCHES}	The filter of discoverable gRPC codes. See more details on https://github.com/grpc/grpc/blob/master/doc/statuscodes.md.	`.*`
{$ETCD.GRPC_CODE.NOT_MATCHES}	The filter to exclude discovered gRPC codes. See more details on https://github.com/grpc/grpc/blob/master/doc/statuscodes.md.	`CHANGE_IF_NEEDED`
{$ETCD.GRPC.ERRORS.MAX.WARN}	The maximum number of gRPC request failures.	`1`
{$ETCD.GRPC_CODE.TRIGGER.MATCHES}	The filter of discoverable gRPC codes, which will create triggers.	`Aborted|Unavailable`

Items

Name	Description	Type	Key and additional info
Etcd: Service's TCP port state	Simple check	net.tcp.service["{$ETCD.SCHEME}","{$ETCD.HOST}","{$ETCD.PORT}"] Preprocessing Discard unchanged with heartbeat: `10m`
Etcd: Get node metrics	HTTP agent	etcd.get_metrics
Etcd: Node health	HTTP agent	etcd.health Preprocessing JSON Path: `$.health` Boolean to decimal ⛔️Custom on fail: Set value to: `0` Discard unchanged with heartbeat: `10m`
Etcd: Server is a leader	Whether this member is a leader: 1 - it is; 0 - otherwise.	Dependent item	etcd.is.leader Preprocessing Prometheus pattern: `VALUE(etcd_server_is_leader)` ⛔️Custom on fail: Set value to: `0` Discard unchanged with heartbeat: `10m`
Etcd: Server has a leader	Whether a leader exists: 1 - it exists; 0 - it does not.	Dependent item	etcd.has.leader Preprocessing Prometheus pattern: `VALUE(etcd_server_has_leader)` Discard unchanged with heartbeat: `10m`
Etcd: Leader changes	The number of leader changes the member has seen since its start.	Dependent item	etcd.leader.changes Preprocessing Prometheus pattern: `VALUE(etcd_server_leader_changes_seen_total)`
Etcd: Proposals committed per second	The number of consensus proposals committed.	Dependent item	etcd.proposals.committed.rate Preprocessing Prometheus pattern: `VALUE(etcd_server_proposals_committed_total)` Change per second
Etcd: Proposals applied per second	The number of consensus proposals applied.	Dependent item	etcd.proposals.applied.rate Preprocessing Prometheus pattern: `VALUE(etcd_server_proposals_applied_total)` Change per second
Etcd: Proposals failed per second	The number of failed proposals seen.	Dependent item	etcd.proposals.failed.rate Preprocessing Prometheus pattern: `VALUE(etcd_server_proposals_failed_total)` Change per second
Etcd: Proposals pending	The current number of pending proposals to commit.	Dependent item	etcd.proposals.pending Preprocessing Prometheus pattern: `VALUE(etcd_server_proposals_pending)`
Etcd: Reads per second	The number of read actions by `get/getRecursive`, local to this member.	Dependent item	etcd.reads.rate Preprocessing Prometheus to JSON: `etcd_debugging_store_reads_total` JavaScript: `The text is too long. Please see the template.` Change per second
Etcd: Writes per second	The number of writes (e.g., `set/compareAndDelete`) seen by this member.	Dependent item	etcd.writes.rate Preprocessing Prometheus to JSON: `etcd_debugging_store_writes_total` JavaScript: `The text is too long. Please see the template.` Change per second
Etcd: Client gRPC received bytes per second	The number of bytes received from gRPC clients per second.	Dependent item	etcd.network.grpc.received.rate Preprocessing Prometheus pattern: `VALUE(etcd_network_client_grpc_received_bytes_total)` Change per second
Etcd: Client gRPC sent bytes per second	The number of bytes sent to gRPC clients per second.	Dependent item	etcd.network.grpc.sent.rate Preprocessing Prometheus pattern: `VALUE(etcd_network_client_grpc_sent_bytes_total)` Change per second
Etcd: HTTP requests received	The number of requests received into the system (successfully parsed and `authd`).	Dependent item	etcd.http.requests.rate Preprocessing Prometheus to JSON: `etcd_http_received_total` JavaScript: `The text is too long. Please see the template.` Change per second
Etcd: HTTP 5XX	The number of handled failures of requests (non-watches), by the method (`GET/PUT` etc.), and the code `5XX`.	Dependent item	etcd.http.requests.5xx.rate Preprocessing Prometheus to JSON: `etcd_http_failed_total{code=~"5.+"}` JavaScript: `The text is too long. Please see the template.` Change per second
Etcd: HTTP 4XX	The number of handled failures of requests (non-watches), by the method (`GET/PUT` etc.), and the code `4XX`.	Dependent item	etcd.http.requests.4xx.rate Preprocessing Prometheus to JSON: `etcd_http_failed_total{code=~"4.+"}` JavaScript: `The text is too long. Please see the template.` Change per second
Etcd: RPCs received per second	The number of RPC stream messages received on the server.	Dependent item	etcd.grpc.received.rate Preprocessing Prometheus to JSON: `grpc_server_msg_received_total` JavaScript: `The text is too long. Please see the template.` Change per second
Etcd: RPCs sent per second	The number of gRPC stream messages sent by the server.	Dependent item	etcd.grpc.sent.rate Preprocessing Prometheus to JSON: `grpc_server_msg_sent_total` JavaScript: `The text is too long. Please see the template.` Change per second
Etcd: RPCs started per second	The number of RPCs started on the server.	Dependent item	etcd.grpc.started.rate Preprocessing Prometheus to JSON: `grpc_server_started_total` JavaScript: `The text is too long. Please see the template.` Change per second
Etcd: Get version	HTTP agent	etcd.get_version
Etcd: Server version	The version of the `etcd server`.	Dependent item	etcd.server.version Preprocessing JSON Path: `$.etcdserver` Discard unchanged with heartbeat: `1d`
Etcd: Cluster version	The version of the `etcd cluster`.	Dependent item	etcd.cluster.version Preprocessing JSON Path: `$.etcdcluster` Discard unchanged with heartbeat: `1d`
Etcd: DB size	The total size of the underlying database.	Dependent item	etcd.db.size Preprocessing Prometheus pattern: `VALUE(etcd_mvcc_db_total_size_in_bytes)`
Etcd: Keys compacted per second	The number of DB keys compacted per second.	Dependent item	etcd.keys.compacted.rate Preprocessing Prometheus pattern: `VALUE(etcd_debugging_mvcc_db_compaction_keys_total)` ⛔️Custom on fail: Set value to: `0` Change per second
Etcd: Keys expired per second	The number of expired keys per second.	Dependent item	etcd.keys.expired.rate Preprocessing Prometheus pattern: `VALUE(etcd_debugging_store_expires_total)` Change per second
Etcd: Keys total	The total number of keys.	Dependent item	etcd.keys.total Preprocessing Prometheus pattern: `VALUE(etcd_debugging_mvcc_keys_total)`
Etcd: Uptime	`Etcd` server uptime.	Dependent item	etcd.uptime Preprocessing Prometheus pattern: `VALUE(process_start_time_seconds)` JavaScript: `The text is too long. Please see the template.`
Etcd: Virtual memory	The size of virtual memory expressed in bytes.	Dependent item	etcd.virtual.bytes Preprocessing Prometheus pattern: `VALUE(process_virtual_memory_bytes)`
Etcd: Resident memory	The size of resident memory expressed in bytes.	Dependent item	etcd.res.bytes Preprocessing Prometheus pattern: `VALUE(process_resident_memory_bytes)`
Etcd: CPU	The total user and system CPU time spent in seconds.	Dependent item	etcd.cpu.util Preprocessing Prometheus pattern: `VALUE(process_cpu_seconds_total)` Change per second
Etcd: Open file descriptors	The number of open file descriptors.	Dependent item	etcd.open.fds Preprocessing Prometheus pattern: `VALUE(process_open_fds)`
Etcd: Maximum open file descriptors	The maximum number of open file descriptors.	Dependent item	etcd.max.fds Preprocessing Prometheus pattern: `VALUE(process_max_fds)`
Etcd: Deletes per second	The number of deletes seen by this member per second.	Dependent item	etcd.delete.rate Preprocessing Prometheus pattern: `VALUE(etcd_mvcc_delete_total)` Change per second
Etcd: PUT per second	The number of puts seen by this member per second.	Dependent item	etcd.put.rate Preprocessing Prometheus pattern: `VALUE(etcd_mvcc_put_total)` Change per second
Etcd: Range per second	The number of ranges seen by this member per second.	Dependent item	etcd.range.rate Preprocessing Prometheus pattern: `VALUE(etcd_debugging_mvcc_range_total)` Change per second
Etcd: Transaction per second	The number of transactions seen by this member per second.	Dependent item	etcd.txn.rate Preprocessing Prometheus pattern: `VALUE(etcd_debugging_mvcc_range_total)` Change per second
Etcd: Pending events	The total number of pending events to be sent.	Dependent item	etcd.events.sent.rate Preprocessing Prometheus pattern: `VALUE(etcd_debugging_mvcc_pending_events_total)`
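
The Uptime item reads `process_start_time_seconds` and then applies a JavaScript step that is not shown on this page. A natural interpretation, stated here as an assumption, is that the script subtracts the start time from the current time. The Python sketch below shows that calculation together with the 10-minute check used by the restart trigger; the names and timestamps are hypothetical.

```python
import time
from typing import Optional


def uptime_seconds(process_start_time_seconds: float, now: Optional[float] = None) -> int:
    """Seconds elapsed since the process start timestamp (Unix time)."""
    current = time.time() if now is None else now
    return max(0, int(current - process_start_time_seconds))


def recently_restarted(uptime: int, threshold_seconds: int = 600) -> bool:
    """Mirror of the trigger condition: uptime below 10 minutes."""
    return uptime < threshold_seconds


uptime = uptime_seconds(1_700_000_000.0, now=1_700_000_300.0)
print(uptime, recently_restarted(uptime))  # 300 True
```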

Triggers

Name	Description	Expression	Severity	Dependencies and additional info
Etcd: Service is unavailable	`last(/Etcd by HTTP/net.tcp.service["{$ETCD.SCHEME}","{$ETCD.HOST}","{$ETCD.PORT}"])=0`	Average	Manual close: Yes
Etcd: Node healthcheck failed	See more details on https://etcd.io/docs/v3.5/op-guide/monitoring/#health-check.	`last(/Etcd by HTTP/etcd.health)=0`	Average	Depends on: Etcd: Service is unavailable
Etcd: Failed to fetch info data	Zabbix has not received any data for items for the last 30 minutes.	`nodata(/Etcd by HTTP/etcd.is.leader,30m)=1`	Warning	Manual close: Yes Depends on: Etcd: Service is unavailable
Etcd: Member has no leader	If a member does not have a leader, it is totally unavailable.	`last(/Etcd by HTTP/etcd.has.leader)=0`	Average
Etcd: Instance has seen too many leader changes	Rapid leadership changes impact the performance of `etcd` significantly. It also signals that the leader is unstable, perhaps due to network connectivity issues or excessive load hitting the `etcd cluster`.	`(max(/Etcd by HTTP/etcd.leader.changes,15m)-min(/Etcd by HTTP/etcd.leader.changes,15m))>{$ETCD.LEADER.CHANGES.MAX.WARN}`	Warning
Etcd: Too many proposal failures	Normally related to two issues: temporary failures related to a leader election or longer downtime caused by a loss of quorum in the cluster.	`min(/Etcd by HTTP/etcd.proposals.failed.rate,5m)>{$ETCD.PROPOSAL.FAIL.MAX.WARN}`	Warning
Etcd: Too many proposals are queued to commit	Rising pending proposals suggests there is a high client load, or the member cannot commit proposals.	`min(/Etcd by HTTP/etcd.proposals.pending,5m)>{$ETCD.PROPOSAL.PENDING.MAX.WARN}`	Warning
Etcd: Too many HTTP requests failures	Too many requests failed on `etcd` instance with the `5xx HTTP code`.	`min(/Etcd by HTTP/etcd.http.requests.5xx.rate,5m)>{$ETCD.HTTP.FAIL.MAX.WARN}`	Warning
Etcd: Server version has changed	Etcd version has changed. Acknowledge to close the problem manually.	`last(/Etcd by HTTP/etcd.server.version,#1)<>last(/Etcd by HTTP/etcd.server.version,#2) and length(last(/Etcd by HTTP/etcd.server.version))>0`	Info	Manual close: Yes
Etcd: Cluster version has changed	Etcd version has changed. Acknowledge to close the problem manually.	`last(/Etcd by HTTP/etcd.cluster.version,#1)<>last(/Etcd by HTTP/etcd.cluster.version,#2) and length(last(/Etcd by HTTP/etcd.cluster.version))>0`	Info	Manual close: Yes
Etcd: Host has been restarted	Uptime is less than 10 minutes.	`last(/Etcd by HTTP/etcd.uptime)<10m`	Info	Manual close: Yes
Etcd: Current number of open files is too high	Heavy file descriptor usage (i.e., near the process's file descriptor limit) indicates a potential file descriptor exhaustion issue. If the file descriptors are exhausted, `etcd` may panic because it cannot create new WAL files.	`min(/Etcd by HTTP/etcd.open.fds,5m)/last(/Etcd by HTTP/etcd.max.fds)*100>{$ETCD.OPEN.FDS.MAX.WARN}`	Warning
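To make the arithmetic behind two of the expressions above concrete, here is a minimal Python sketch. It is not Zabbix code: the function names and sample values are hypothetical, and the thresholds 5 and 90 are assumed from the defaults of `{$ETCD.LEADER.CHANGES.MAX.WARN}` and `{$ETCD.OPEN.FDS.MAX.WARN}` shown in the macro tables on this page.

```python
def leader_changes_exceeded(samples, max_warn=5):
    """Mimic max(...,15m) - min(...,15m) > threshold over a window of counter samples."""
    return (max(samples) - min(samples)) > max_warn


def open_fds_exceeded(open_fds_window, max_fds, max_warn_pct=90):
    """Mimic min(open.fds,5m) / last(max.fds) * 100 > threshold."""
    return min(open_fds_window) / max_fds * 100 > max_warn_pct


# Hypothetical samples collected during the evaluation windows.
print(leader_changes_exceeded([12, 12, 15, 19]))         # 19 - 12 = 7, which is above 5
print(open_fds_exceeded([950, 960, 940], max_fds=1024))  # 940 / 1024 is roughly 91.8 %
```

Because the file descriptor expression uses `min` over the window, a single short spike does not fire the trigger; every sample in the window has to stay above the threshold.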

LLD rule gRPC codes discovery

Name	Description	Type	Key and additional info
gRPC codes discovery		Dependent item	etcd.grpc_code.discovery Preprocessing Prometheus to JSON: `grpc_server_handled_total` JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `1h`

Item prototypes for gRPC codes discovery

Name	Description	Type	Key and additional info
Etcd: RPCs completed with code {#GRPC.CODE}	The number of RPCs completed on the server with grpc_code {#GRPC.CODE}.	Dependent item	etcd.grpc.handled.rate[{#GRPC.CODE}] Preprocessing Prometheus to JSON: `grpc_server_handled_total{grpc_method="{#GRPC.CODE}"}` JavaScript: `The text is too long. Please see the template.` Change per second

Trigger prototypes for gRPC codes discovery

Name	Description	Expression	Severity	Dependencies and additional info
Etcd: Too many failed gRPC requests with code: {#GRPC.CODE}		`min(/Etcd by HTTP/etcd.grpc.handled.rate[{#GRPC.CODE}],5m)>{$ETCD.GRPC.ERRORS.MAX.WARN}`	Warning

LLD rule Peers discovery

Name	Description	Type	Key and additional info
Peers discovery		Dependent item	etcd.peer.discovery Preprocessing Prometheus to JSON: `etcd_network_peer_sent_bytes_total`

Item prototypes for Peers discovery

Name	Description	Type	Key and additional info
Etcd: Etcd peer {#ETCD.PEER}: Bytes sent	The number of bytes sent to a peer with the ID `{#ETCD.PEER}`.	Dependent item	etcd.bytes.sent.rate[{#ETCD.PEER}] Preprocessing Prometheus pattern: `VALUE(etcd_network_peer_sent_bytes_total{To="{#ETCD.PEER}"})` ⛔️Custom on fail: Set value to: `0` Change per second
Etcd: Etcd peer {#ETCD.PEER}: Bytes received	The number of bytes received from a peer with the ID `{#ETCD.PEER}`.	Dependent item	etcd.bytes.received.rate[{#ETCD.PEER}] Preprocessing Prometheus pattern: `The text is too long. Please see the template.` ⛔️Custom on fail: Set value to: `0` Change per second
Etcd: Etcd peer {#ETCD.PEER}: Send failures	The number of send failures from a peer with the ID `{#ETCD.PEER}`.	Dependent item	etcd.sent.fail.rate[{#ETCD.PEER}] Preprocessing Prometheus pattern: `The text is too long. Please see the template.` ⛔️Custom on fail: Set value to: `0` Change per second
Etcd: Etcd peer {#ETCD.PEER}: Receive failures	The number of receive failures from a peer with the ID `{#ETCD.PEER}`.	Dependent item	etcd.received.fail.rate[{#ETCD.PEER}] Preprocessing Prometheus pattern: `The text is too long. Please see the template.` ⛔️Custom on fail: Set value to: `0` Change per second
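As a rough illustration of how peer discovery could work, the sketch below pulls the `To` label out of `etcd_network_peer_sent_bytes_total` lines and emits one row per peer. It assumes that the `To` label is what ends up in `{#ETCD.PEER}`; the sample payload, the peer IDs, and the `discover_peers` helper are hypothetical, and the real template does this with Zabbix preprocessing rather than Python.

```python
import re

# Hypothetical excerpt of /metrics output; the peer IDs are made up.
SAMPLE = """\
etcd_network_peer_sent_bytes_total{To="8e9e05c52164694d"} 1024
etcd_network_peer_sent_bytes_total{To="91bc3c398fb3c146"} 2048
etcd_server_has_leader 1
"""

LINE_RE = re.compile(r'^etcd_network_peer_sent_bytes_total\{To="([^"]+)"\}\s+(\S+)$')


def discover_peers(metrics_text):
    """Return LLD-style rows, one per peer ID found in the `To` label."""
    rows = []
    for line in metrics_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            rows.append({"{#ETCD.PEER}": match.group(1)})
    return rows


print(discover_peers(SAMPLE))
```

Each discovered row then drives one set of the item prototypes listed above.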

Feedback

Please report any issues with the template at https://support.zabbix.com

You can also provide feedback, discuss the template, or ask for help on the Zabbix forums.

This template is for Zabbix version: 6.2

Also available for: 7.4 7.2 7.0 6.4 6.0 5.4 5.0

Source: https://git.zabbix.com/projects/ZBX/repos/zabbix/browse/templates/app/etcd_http?at=release/6.2

Etcd by HTTP

Overview

For Zabbix version: 6.2 and higher. This template is designed to monitor etcd with Zabbix and works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.

The template Etcd by HTTP collects metrics from the /metrics endpoint using the HTTP agent.

Refer to the vendor documentation.

For users of etcd version <= 3.4!

In etcd v3.5, some metrics have been deprecated. For details, see Upgrade etcd from 3.4 to 3.5. Please upgrade your etcd instance or use an older version of the Etcd by HTTP template.

This template has been tested on:

Etcd, version 3.5.6

Setup

See Zabbix template operation for basic instructions.

Follow these instructions:

Import the template into Zabbix.
After importing the template, make sure that etcd allows the collection of metrics. You can test it by running: curl -L http://localhost:2379/metrics.
Check if etcd is accessible from Zabbix proxy or Zabbix server depending on where you are planning to do the monitoring. To verify it, run curl -L http://<etcd_node_address>:2379/metrics.
Add the template to each etcd node. By default, the template uses the client port. You can change the metrics endpoint location with the --listen-metrics-urls flag (for more details, see the etcd documentation).

Additional points to consider:

If you have specified a non-standard port for etcd, don't forget to change the macros {$ETCD.SCHEME} and {$ETCD.PORT}.
You can set the {$ETCD.USER} and {$ETCD.PASSWORD} macros in the template for use at the host level if necessary.
To test availability, run: zabbix_get -s etcd-host -k etcd.health.
See the macros section, as the macros set the trigger thresholds.
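The curl checks in the steps above can also be scripted. The following is a minimal sketch, assuming the endpoint returns plain Prometheus text; `fetch_metrics` and `metric_value` are hypothetical helper names, and the sample payload is invented so that the parsing part runs without network access.

```python
from urllib.request import urlopen


def fetch_metrics(host, scheme="http", port=2379, timeout=5):
    """Download the raw /metrics payload (needs network access to the etcd node)."""
    url = f"{scheme}://{host}:{port}/metrics"
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def metric_value(metrics_text, name):
    """Return the value of an unlabelled metric, or None if it is absent."""
    for line in metrics_text.splitlines():
        if line.startswith("#"):
            continue  # skip HELP/TYPE comment lines
        parts = line.split()
        if len(parts) >= 2 and parts[0] == name:
            return float(parts[1])
    return None


# Offline demonstration with a hypothetical payload.
sample = "# TYPE etcd_server_has_leader gauge\netcd_server_has_leader 1\n"
print(metric_value(sample, "etcd_server_has_leader"))
```

Against a live node, `metric_value(fetch_metrics("<etcd_node_address>"), "etcd_server_has_leader")` would exercise the same path that the template's HTTP agent item relies on.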

Configuration

No specific Zabbix configuration is required.

Macros used

Name	Description	Default
{$ETCD.GRPC.ERRORS.MAX.WARN}	The maximum number of gRPC request failures.	`1`
{$ETCD.GRPC_CODE.MATCHES}	The filter of discoverable gRPC codes. See more details on https://github.com/grpc/grpc/blob/master/doc/statuscodes.md.	`.*`
{$ETCD.GRPC_CODE.NOT_MATCHES}	The filter to exclude discovered gRPC codes. See more details on https://github.com/grpc/grpc/blob/master/doc/statuscodes.md.	`CHANGE_IF_NEEDED`
{$ETCD.GRPC_CODE.TRIGGER.MATCHES}	The filter of discoverable gRPC codes, which will create triggers.	`Aborted
{$ETCD.HTTP.FAIL.MAX.WARN}	The maximum number of HTTP request failures.	`2`
{$ETCD.LEADER.CHANGES.MAX.WARN}	The maximum number of leader changes.	`5`
{$ETCD.OPEN.FDS.MAX.WARN}	The maximum percentage of used file descriptors.	`90`
{$ETCD.PASSWORD}	-	``
{$ETCD.PORT}	The port of `etcd` API endpoint.	`2379`
{$ETCD.PROPOSAL.FAIL.MAX.WARN}	The maximum number of proposal failures.	`2`
{$ETCD.PROPOSAL.PENDING.MAX.WARN}	The maximum number of proposals in queue.	`5`
{$ETCD.SCHEME}	The request scheme which may be `http` or `https`.	`http`
{$ETCD.USER}	-	``

Template links

There are no template links in this template.

Discovery rules

Name	Description	Type	Key and additional info
gRPC codes discovery	-	DEPENDENT	etcd.grpc_code.discovery Preprocessing: - PROMETHEUS_TO_JSON: `grpc_server_handled_total` - JAVASCRIPT: `The text is too long. Please see the template.` - DISCARD_UNCHANGED_HEARTBEAT: `1h` Filter: AND - {#GRPC.CODE} NOT_MATCHES_REGEX `{$ETCD.GRPC_CODE.NOT_MATCHES}` - {#GRPC.CODE} MATCHES_REGEX `{$ETCD.GRPC_CODE.MATCHES}` Overrides: trigger - {#GRPC.CODE} MATCHES_REGEX `{$ETCD.GRPC_CODE.TRIGGER.MATCHES}` - TRIGGER_PROTOTYPE LIKE `Too many failed gRPC requests` - DISCOVER
Peers discovery	-	DEPENDENT	etcd.peer.discovery Preprocessing: - PROMETHEUS_TO_JSON: `etcd_network_peer_sent_bytes_total`
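The filter and override of the gRPC codes discovery rule can be read as two regex checks. The sketch below is an approximation in Python, not the Zabbix implementation: the include and exclude defaults come from the macros table above, while the trigger regex `Aborted|Unavailable` is an assumed example value and the function names are hypothetical.

```python
import re


def is_discovered(code, matches=r".*", not_matches=r"CHANGE_IF_NEEDED"):
    """Filter (AND): the code must match the include regex and must not match the exclude regex."""
    return bool(re.search(matches, code)) and not re.search(not_matches, code)


def gets_trigger(code, trigger_matches=r"Aborted|Unavailable"):
    """Override: only codes matching this regex get the trigger prototype discovered."""
    return bool(re.search(trigger_matches, code))


for code in ["OK", "Aborted", "Unavailable"]:
    print(code, is_discovered(code), gets_trigger(code))
```

With these values every code yields an item, but only the codes matching the trigger regex also yield a "Too many failed gRPC requests" trigger.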

Items collected

Group	Name	Description	Type	Key and additional info
Etcd	Etcd: Service's TCP port state	-	SIMPLE	net.tcp.service["{$ETCD.SCHEME}","{HOST.CONN}","{$ETCD.PORT}"] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: `10m`
Etcd	Etcd: Node health	-	HTTP_AGENT	etcd.health Preprocessing: - JSONPATH: `$.health` - BOOL_TO_DECIMAL ⛔️ON_FAIL: `CUSTOM_VALUE -> 0` - DISCARD_UNCHANGED_HEARTBEAT: `10m`
Etcd	Etcd: Server is a leader	Whether this member is a leader: 1 - it is; 0 - otherwise.	DEPENDENT	etcd.is.leader Preprocessing: - PROMETHEUS_PATTERN: `etcd_server_is_leader` ⛔️ON_FAIL: `CUSTOM_VALUE -> 0` - DISCARD_UNCHANGED_HEARTBEAT: `10m`
Etcd	Etcd: Server has a leader	Whether a leader exists: 1 - it exists; 0 - it does not.	DEPENDENT	etcd.has.leader Preprocessing: - PROMETHEUS_PATTERN: `etcd_server_has_leader` - DISCARD_UNCHANGED_HEARTBEAT: `10m`
Etcd	Etcd: Leader changes	The number of leader changes the member has seen since its start.	DEPENDENT	etcd.leader.changes Preprocessing: - PROMETHEUS_PATTERN: `etcd_server_leader_changes_seen_total`
Etcd	Etcd: Proposals committed per second	The number of consensus proposals committed.	DEPENDENT	etcd.proposals.committed.rate Preprocessing: - PROMETHEUS_PATTERN: `etcd_server_proposals_committed_total` - CHANGE_PER_SECOND
Etcd	Etcd: Proposals applied per second	The number of consensus proposals applied.	DEPENDENT	etcd.proposals.applied.rate Preprocessing: - PROMETHEUS_PATTERN: `etcd_server_proposals_applied_total` - CHANGE_PER_SECOND
Etcd	Etcd: Proposals failed per second	The number of failed proposals seen.	DEPENDENT	etcd.proposals.failed.rate Preprocessing: - PROMETHEUS_PATTERN: `etcd_server_proposals_failed_total` - CHANGE_PER_SECOND
Etcd	Etcd: Proposals pending	The current number of pending proposals to commit.	DEPENDENT	etcd.proposals.pending Preprocessing: - PROMETHEUS_PATTERN: `etcd_server_proposals_pending`
Etcd	Etcd: Reads per second	The number of read actions by `get/getRecursive`, local to this member.	DEPENDENT	etcd.reads.rate Preprocessing: - PROMETHEUS_TO_JSON: `etcd_debugging_store_reads_total` - JAVASCRIPT: `The text is too long. Please see the template.` - CHANGE_PER_SECOND
Etcd	Etcd: Writes per second	The number of writes (e.g., `set/compareAndDelete`) seen by this member.	DEPENDENT	etcd.writes.rate Preprocessing: - PROMETHEUS_TO_JSON: `etcd_debugging_store_writes_total` - JAVASCRIPT: `The text is too long. Please see the template.` - CHANGE_PER_SECOND
Etcd	Etcd: Client gRPC received bytes per second	The number of bytes received from gRPC clients per second.	DEPENDENT	etcd.network.grpc.received.rate Preprocessing: - PROMETHEUS_PATTERN: `etcd_network_client_grpc_received_bytes_total` - CHANGE_PER_SECOND
Etcd	Etcd: Client gRPC sent bytes per second	The number of bytes sent to gRPC clients per second.	DEPENDENT	etcd.network.grpc.sent.rate Preprocessing: - PROMETHEUS_PATTERN: `etcd_network_client_grpc_sent_bytes_total` - CHANGE_PER_SECOND
Etcd	Etcd: HTTP requests received	The number of requests received into the system (successfully parsed and `authd`).	DEPENDENT	etcd.http.requests.rate Preprocessing: - PROMETHEUS_TO_JSON: `etcd_http_received_total` - JAVASCRIPT: `The text is too long. Please see the template.` - CHANGE_PER_SECOND
Etcd	Etcd: HTTP 5XX	The number of handled failures of requests (non-watches), by the method (`GET/PUT` etc.), and the code `5XX`.	DEPENDENT	etcd.http.requests.5xx.rate Preprocessing: - PROMETHEUS_TO_JSON: `etcd_http_failed_total{code=~"5.+"}` - JAVASCRIPT: `The text is too long. Please see the template.` - CHANGE_PER_SECOND
Etcd	Etcd: HTTP 4XX	The number of handled failures of requests (non-watches), by the method (`GET/PUT` etc.), and the code `4XX`.	DEPENDENT	etcd.http.requests.4xx.rate Preprocessing: - PROMETHEUS_TO_JSON: `etcd_http_failed_total{code=~"4.+"}` - JAVASCRIPT: `The text is too long. Please see the template.` - CHANGE_PER_SECOND
Etcd	Etcd: RPCs received per second	The number of RPC stream messages received on the server.	DEPENDENT	etcd.grpc.received.rate Preprocessing: - PROMETHEUS_TO_JSON: `grpc_server_msg_received_total` - JAVASCRIPT: `The text is too long. Please see the template.` - CHANGE_PER_SECOND
Etcd	Etcd: RPCs sent per second	The number of gRPC stream messages sent by the server.	DEPENDENT	etcd.grpc.sent.rate Preprocessing: - PROMETHEUS_TO_JSON: `grpc_server_msg_sent_total` - JAVASCRIPT: `The text is too long. Please see the template.` - CHANGE_PER_SECOND
Etcd	Etcd: RPCs started per second	The number of RPCs started on the server.	DEPENDENT	etcd.grpc.started.rate Preprocessing: - PROMETHEUS_TO_JSON: `grpc_server_started_total` - JAVASCRIPT: `The text is too long. Please see the template.` - CHANGE_PER_SECOND
Etcd	Etcd: Server version	The version of the `etcd server`.	DEPENDENT	etcd.server.version Preprocessing: - JSONPATH: `$.etcdserver` - DISCARD_UNCHANGED_HEARTBEAT: `1d`
Etcd	Etcd: Cluster version	The version of the `etcd cluster`.	DEPENDENT	etcd.cluster.version Preprocessing: - JSONPATH: `$.etcdcluster` - DISCARD_UNCHANGED_HEARTBEAT: `1d`
Etcd	Etcd: DB size	The total size of the underlying database.	DEPENDENT	etcd.db.size Preprocessing: - PROMETHEUS_PATTERN: `etcd_mvcc_db_total_size_in_bytes`
Etcd	Etcd: Keys compacted per second	The number of DB keys compacted per second.	DEPENDENT	etcd.keys.compacted.rate Preprocessing: - PROMETHEUS_PATTERN: `etcd_debugging_mvcc_db_compaction_keys_total` ⛔️ON_FAIL: `CUSTOM_VALUE -> 0` - CHANGE_PER_SECOND
Etcd	Etcd: Keys expired per second	The number of expired keys per second.	DEPENDENT	etcd.keys.expired.rate Preprocessing: - PROMETHEUS_PATTERN: `etcd_debugging_store_expires_total` - CHANGE_PER_SECOND
Etcd	Etcd: Keys total	The total number of keys.	DEPENDENT	etcd.keys.total Preprocessing: - PROMETHEUS_PATTERN: `etcd_debugging_mvcc_keys_total`
Etcd	Etcd: Uptime	`Etcd` server uptime.	DEPENDENT	etcd.uptime Preprocessing: - PROMETHEUS_PATTERN: `process_start_time_seconds` - JAVASCRIPT: `//use boottime to calculate uptime return (Math.floor(Date.now()/1000)-Number(value));`
Etcd	Etcd: Virtual memory	The size of virtual memory expressed in bytes.	DEPENDENT	etcd.virtual.bytes Preprocessing: - PROMETHEUS_PATTERN: `process_virtual_memory_bytes`
Etcd	Etcd: Resident memory	The size of resident memory expressed in bytes.	DEPENDENT	etcd.res.bytes Preprocessing: - PROMETHEUS_PATTERN: `process_resident_memory_bytes`
Etcd	Etcd: CPU	The total user and system CPU time spent in seconds.	DEPENDENT	etcd.cpu.util Preprocessing: - PROMETHEUS_PATTERN: `process_cpu_seconds_total` - CHANGE_PER_SECOND
Etcd	Etcd: Open file descriptors	The number of open file descriptors.	DEPENDENT	etcd.open.fds Preprocessing: - PROMETHEUS_PATTERN: `process_open_fds`
Etcd	Etcd: Maximum open file descriptors	The maximum number of open file descriptors.	DEPENDENT	etcd.max.fds Preprocessing: - PROMETHEUS_PATTERN: `process_max_fds`
Etcd	Etcd: Deletes per second	The number of deletes seen by this member per second.	DEPENDENT	etcd.delete.rate Preprocessing: - PROMETHEUS_PATTERN: `etcd_mvcc_delete_total` - CHANGE_PER_SECOND
Etcd	Etcd: PUT per second	The number of puts seen by this member per second.	DEPENDENT	etcd.put.rate Preprocessing: - PROMETHEUS_PATTERN: `etcd_mvcc_put_total` - CHANGE_PER_SECOND
Etcd	Etcd: Range per second	The number of ranges seen by this member per second.	DEPENDENT	etcd.range.rate Preprocessing: - PROMETHEUS_PATTERN: `etcd_debugging_mvcc_range_total` - CHANGE_PER_SECOND
Etcd	Etcd: Transaction per second	The number of transactions seen by this member per second.	DEPENDENT	etcd.txn.rate Preprocessing: - PROMETHEUS_PATTERN: `etcd_debugging_mvcc_range_total` - CHANGE_PER_SECOND
Etcd	Etcd: Pending events	The total number of pending events to be sent.	DEPENDENT	etcd.events.sent.rate Preprocessing: - PROMETHEUS_PATTERN: `etcd_debugging_mvcc_pending_events_total`
Etcd	Etcd: RPCs completed with code {#GRPC.CODE}	The number of RPCs completed on the server with grpc_code {#GRPC.CODE}.	DEPENDENT	etcd.grpc.handled.rate[{#GRPC.CODE}] Preprocessing: - PROMETHEUS_TO_JSON: `grpc_server_handled_total{grpc_method="{#GRPC.CODE}"}` - JAVASCRIPT: `The text is too long. Please see the template.` - CHANGE_PER_SECOND
Etcd	Etcd: Etcd peer {#ETCD.PEER}: Bytes sent	The number of bytes sent to a peer with the ID `{#ETCD.PEER}`.	DEPENDENT	etcd.bytes.sent.rate[{#ETCD.PEER}] Preprocessing: - PROMETHEUS_PATTERN: `etcd_network_peer_sent_bytes_total{To="{#ETCD.PEER}"}` ⛔️ON_FAIL: `CUSTOM_VALUE -> 0` - CHANGE_PER_SECOND
Etcd	Etcd: Etcd peer {#ETCD.PEER}: Bytes received	The number of bytes received from a peer with the ID `{#ETCD.PEER}`.	DEPENDENT	etcd.bytes.received.rate[{#ETCD.PEER}] Preprocessing: - PROMETHEUS_PATTERN: `etcd_network_peer_received_bytes_total{From="{#ETCD.PEER}"}` ⛔️ON_FAIL: `CUSTOM_VALUE -> 0` - CHANGE_PER_SECOND
Etcd	Etcd: Etcd peer {#ETCD.PEER}: Send failures	The number of send failures from a peer with the ID `{#ETCD.PEER}`.	DEPENDENT	etcd.sent.fail.rate[{#ETCD.PEER}] Preprocessing: - PROMETHEUS_PATTERN: `etcd_network_peer_sent_failures_total{To="{#ETCD.PEER}"}` ⛔️ON_FAIL: `CUSTOM_VALUE -> 0` - CHANGE_PER_SECOND
Etcd	Etcd: Etcd peer {#ETCD.PEER}: Receive failures	The number of receive failures from a peer with the ID `{#ETCD.PEER}`.	DEPENDENT	etcd.received.fail.rate[{#ETCD.PEER}] Preprocessing: - PROMETHEUS_PATTERN: `etcd_network_peer_received_failures_total{To="{#ETCD.PEER}"}` ⛔️ON_FAIL: `CUSTOM_VALUE -> 0` - CHANGE_PER_SECOND
Zabbix raw items	Etcd: Get node metrics	-	HTTP_AGENT	etcd.get_metrics
Zabbix raw items	Etcd: Get version	-	HTTP_AGENT	etcd.get_version
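Two preprocessing steps recur throughout the table above: CHANGE_PER_SECOND and the uptime JavaScript. The sketch below restates both in Python for illustration only. The function names and sample numbers are hypothetical, and the rate formula is the usual difference of values divided by difference of timestamps.

```python
import math
import time


def change_per_second(prev_value, prev_ts, value, ts):
    """Rate between two counter samples: (value - prev_value) / (ts - prev_ts)."""
    return (value - prev_value) / (ts - prev_ts)


def uptime_seconds(process_start_time, now=None):
    """Python port of the uptime step: floor(current time) minus process_start_time_seconds."""
    now = time.time() if now is None else now
    return math.floor(now) - float(process_start_time)


# Hypothetical samples: a counter grows by 600 over 60 seconds.
print(change_per_second(1000, 0, 1600, 60))
# Hypothetical start time 300 seconds before "now".
print(uptime_seconds(1700000000, now=1700000300.7))
```

This is why counters such as `etcd_mvcc_put_total` appear as per-second rates in Zabbix, while `etcd.uptime` is derived from the process start time rather than read directly.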

Triggers

Name	Description	Expression	Severity	Dependencies and additional info
Etcd: Service is unavailable	-	`last(/Etcd by HTTP/net.tcp.service["{$ETCD.SCHEME}","{HOST.CONN}","{$ETCD.PORT}"])=0`	AVERAGE	Manual close: YES
Etcd: Node healthcheck failed	See more details on https://etcd.io/docs/v3.5/op-guide/monitoring/#health-check.	`last(/Etcd by HTTP/etcd.health)=0`	AVERAGE	Depends on: - Etcd: Service is unavailable
Etcd: Failed to fetch info data	Zabbix has not received data for items for the last 30 minutes.	`nodata(/Etcd by HTTP/etcd.is.leader,30m)=1`	WARNING	Manual close: YES Depends on: - Etcd: Service is unavailable
Etcd: Member has no leader	If a member does not have a leader, it is totally unavailable.	`last(/Etcd by HTTP/etcd.has.leader)=0`	AVERAGE
Etcd: Instance has seen too many leader changes	Rapid leadership changes impact the performance of `etcd` significantly. It also signals that the leader is unstable, perhaps due to network connectivity issues or excessive load hitting the `etcd cluster`.	`(max(/Etcd by HTTP/etcd.leader.changes,15m)-min(/Etcd by HTTP/etcd.leader.changes,15m))>{$ETCD.LEADER.CHANGES.MAX.WARN}`	WARNING
Etcd: Too many proposal failures	Normally related to two issues: temporary failures related to a leader election or longer downtime caused by a loss of quorum in the cluster.	`min(/Etcd by HTTP/etcd.proposals.failed.rate,5m)>{$ETCD.PROPOSAL.FAIL.MAX.WARN}`	WARNING
Etcd: Too many proposals are queued to commit	Rising pending proposals suggests there is a high client load, or the member cannot commit proposals.	`min(/Etcd by HTTP/etcd.proposals.pending,5m)>{$ETCD.PROPOSAL.PENDING.MAX.WARN}`	WARNING
Etcd: Too many HTTP requests failures	Too many requests failed on `etcd` instance with the `5xx HTTP code`.	`min(/Etcd by HTTP/etcd.http.requests.5xx.rate,5m)>{$ETCD.HTTP.FAIL.MAX.WARN}`	WARNING
Etcd: Server version has changed	The Etcd version has changed. Acknowledge to close manually.	`last(/Etcd by HTTP/etcd.server.version,#1)<>last(/Etcd by HTTP/etcd.server.version,#2) and length(last(/Etcd by HTTP/etcd.server.version))>0`	INFO	Manual close: YES
Etcd: Cluster version has changed	The Etcd version has changed. Acknowledge to close manually.	`last(/Etcd by HTTP/etcd.cluster.version,#1)<>last(/Etcd by HTTP/etcd.cluster.version,#2) and length(last(/Etcd by HTTP/etcd.cluster.version))>0`	INFO	Manual close: YES
Etcd: Host has been restarted	The host uptime is less than 10 minutes.	`last(/Etcd by HTTP/etcd.uptime)<10m`	INFO	Manual close: YES
Etcd: Current number of open files is too high	Heavy file descriptor usage (i.e., near the process's file descriptor limit) indicates a potential file descriptor exhaustion issue. If the file descriptors are exhausted, `etcd` may panic because it cannot create new WAL files.	`min(/Etcd by HTTP/etcd.open.fds,5m)/last(/Etcd by HTTP/etcd.max.fds)*100>{$ETCD.OPEN.FDS.MAX.WARN}`	WARNING
Etcd: Too many failed gRPC requests with code: {#GRPC.CODE}	-	`min(/Etcd by HTTP/etcd.grpc.handled.rate[{#GRPC.CODE}],5m)>{$ETCD.GRPC.ERRORS.MAX.WARN}`	WARNING
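The version-change and rate triggers above follow two simple patterns, sketched here in Python under stated assumptions: `history` is a hypothetical newest-first list of stored values, and the function names are illustrative rather than part of Zabbix.

```python
def version_changed(history):
    """Mimic last(#1) <> last(#2) and length(last()) > 0 on a newest-first history list."""
    if len(history) < 2:
        return False  # not enough values to compare
    return history[0] != history[1] and len(history[0]) > 0


def sustained_above(window, threshold):
    """Mimic min(item,5m) > threshold: every sample in the window must exceed the threshold."""
    return min(window) > threshold


print(version_changed(["3.5.7", "3.5.6"]))  # values differ and the newest is non-empty
print(sustained_above([3.0, 2.5, 4.1], 2))  # lowest sample 2.5 is still above 2
print(sustained_above([3.0, 0.0, 4.1], 2))  # one quiet sample keeps the trigger off
```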

Feedback

Please report any issues with the template at https://support.zabbix.com.

This template is for Zabbix version: 6.0

Also available for: 7.4 7.2 7.0 6.4 6.2 5.4 5.0

Source: https://git.zabbix.com/projects/ZBX/repos/zabbix/browse/templates/app/etcd_http?at=release/6.0

Etcd by HTTP

Overview

This template is designed to monitor etcd with Zabbix and works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.

The template Etcd by HTTP collects metrics from the /metrics endpoint using the HTTP agent.

Refer to the vendor documentation.

For users of etcd version <= 3.4!

In etcd v3.5, some metrics have been deprecated. For details, see Upgrade etcd from 3.4 to 3.5. Please upgrade your etcd instance or use an older version of the Etcd by HTTP template.

Requirements

Zabbix version: 6.0 and higher.

Tested versions

This template has been tested on:

Etcd 3.5.6

Configuration

Zabbix should be configured according to the instructions in the Templates out of the box section.

Setup

Follow these instructions:

Import the template into Zabbix.
After importing the template, make sure that etcd allows the collection of metrics. You can test it by running: curl -L http://localhost:2379/metrics.
Check if etcd is accessible from Zabbix proxy or Zabbix server depending on where you are planning to do the monitoring. To verify it, run curl -L http://<etcd_node_address>:2379/metrics.
Add the template to each etcd node. By default, the template uses the client port. You can change the metrics endpoint location with the --listen-metrics-urls flag (for more details, see the etcd documentation).

Additional points to consider:

If you have specified a non-standard port for etcd, don't forget to change the macros {$ETCD.SCHEME} and {$ETCD.PORT}.
You can set the {$ETCD.USER} and {$ETCD.PASSWORD} macros in the template for use at the host level if necessary.
To test availability, run: zabbix_get -s etcd-host -k etcd.health.
See the macros section, as the macros set the trigger thresholds.

Macros used

Name	Description	Default
{$ETCD.PORT}	The port of `etcd` API endpoint.	`2379`
{$ETCD.SCHEME}	The request scheme which may be `http` or `https`.	`http`
{$ETCD.USER}
{$ETCD.PASSWORD}
{$ETCD.LEADER.CHANGES.MAX.WARN}	The maximum number of leader changes.	`5`
{$ETCD.PROPOSAL.FAIL.MAX.WARN}	The maximum number of proposal failures.	`2`
{$ETCD.HTTP.FAIL.MAX.WARN}	The maximum number of HTTP request failures.	`2`
{$ETCD.PROPOSAL.PENDING.MAX.WARN}	The maximum number of proposals in queue.	`5`
{$ETCD.OPEN.FDS.MAX.WARN}	The maximum percentage of used file descriptors.	`90`
{$ETCD.GRPC_CODE.MATCHES}	The filter of discoverable gRPC codes. See more details on https://github.com/grpc/grpc/blob/master/doc/statuscodes.md.	`.*`
{$ETCD.GRPC_CODE.NOT_MATCHES}	The filter to exclude discovered gRPC codes. See more details on https://github.com/grpc/grpc/blob/master/doc/statuscodes.md.	`CHANGE_IF_NEEDED`
{$ETCD.GRPC.ERRORS.MAX.WARN}	The maximum number of gRPC request failures.	`1`
{$ETCD.GRPC_CODE.TRIGGER.MATCHES}	The filter of discoverable gRPC codes, which will create triggers.	`Aborted\|Unavailable`

Items

Name	Description	Type	Key and additional info
Etcd: Service's TCP port state		Simple check	net.tcp.service["{$ETCD.SCHEME}","{HOST.CONN}","{$ETCD.PORT}"] Preprocessing Discard unchanged with heartbeat: `10m`
Etcd: Get node metrics		HTTP agent	etcd.get_metrics
Etcd: Node health		HTTP agent	etcd.health Preprocessing JSON Path: `$.health` Boolean to decimal ⛔️Custom on fail: Set value to: `0` Discard unchanged with heartbeat: `10m`
Etcd: Server is a leader	Whether this member is a leader: 1 - it is; 0 - otherwise.	Dependent item	etcd.is.leader Preprocessing Prometheus pattern: `VALUE(etcd_server_is_leader)` ⛔️Custom on fail: Set value to: `0` Discard unchanged with heartbeat: `10m`
Etcd: Server has a leader	Whether a leader exists: 1 - it exists; 0 - it does not.	Dependent item	etcd.has.leader Preprocessing Prometheus pattern: `VALUE(etcd_server_has_leader)` Discard unchanged with heartbeat: `10m`
Etcd: Leader changes	The number of leader changes the member has seen since its start.	Dependent item	etcd.leader.changes Preprocessing Prometheus pattern: `VALUE(etcd_server_leader_changes_seen_total)`
Etcd: Proposals committed per second	The number of consensus proposals committed.	Dependent item	etcd.proposals.committed.rate Preprocessing Prometheus pattern: `VALUE(etcd_server_proposals_committed_total)` Change per second
Etcd: Proposals applied per second	The number of consensus proposals applied.	Dependent item	etcd.proposals.applied.rate Preprocessing Prometheus pattern: `VALUE(etcd_server_proposals_applied_total)` Change per second
Etcd: Proposals failed per second	The number of failed proposals seen.	Dependent item	etcd.proposals.failed.rate Preprocessing Prometheus pattern: `VALUE(etcd_server_proposals_failed_total)` Change per second
Etcd: Proposals pending	The current number of pending proposals to commit.	Dependent item	etcd.proposals.pending Preprocessing Prometheus pattern: `VALUE(etcd_server_proposals_pending)`
Etcd: Reads per second	The number of read actions by `get/getRecursive`, local to this member.	Dependent item	etcd.reads.rate Preprocessing Prometheus to JSON: `etcd_debugging_store_reads_total` JavaScript: `The text is too long. Please see the template.` Change per second
Etcd: Writes per second	The number of writes (e.g., `set/compareAndDelete`) seen by this member.	Dependent item	etcd.writes.rate Preprocessing Prometheus to JSON: `etcd_debugging_store_writes_total` JavaScript: `The text is too long. Please see the template.` Change per second
Etcd: Client gRPC received bytes per second	The number of bytes received from gRPC clients per second.	Dependent item	etcd.network.grpc.received.rate Preprocessing Prometheus pattern: `VALUE(etcd_network_client_grpc_received_bytes_total)` Change per second
Etcd: Client gRPC sent bytes per second	The number of bytes sent to gRPC clients per second.	Dependent item	etcd.network.grpc.sent.rate Preprocessing Prometheus pattern: `VALUE(etcd_network_client_grpc_sent_bytes_total)` Change per second
Etcd: HTTP requests received	The number of requests received into the system (successfully parsed and `authd`).	Dependent item	etcd.http.requests.rate Preprocessing Prometheus to JSON: `etcd_http_received_total` JavaScript: `The text is too long. Please see the template.` Change per second
Etcd: HTTP 5XX	The number of handled failures of requests (non-watches), by the method (`GET/PUT` etc.), and the code `5XX`.	Dependent item	etcd.http.requests.5xx.rate Preprocessing Prometheus to JSON: `etcd_http_failed_total{code=~"5.+"}` JavaScript: `The text is too long. Please see the template.` Change per second
Etcd: HTTP 4XX	The number of handled failures of requests (non-watches), by the method (`GET/PUT` etc.), and the code `4XX`.	Dependent item	etcd.http.requests.4xx.rate Preprocessing Prometheus to JSON: `etcd_http_failed_total{code=~"4.+"}` JavaScript: `The text is too long. Please see the template.` Change per second
Etcd: RPCs received per second	The number of RPC stream messages received on the server.	Dependent item	etcd.grpc.received.rate Preprocessing Prometheus to JSON: `grpc_server_msg_received_total` JavaScript: `The text is too long. Please see the template.` Change per second
Etcd: RPCs sent per second	The number of gRPC stream messages sent by the server.	Dependent item	etcd.grpc.sent.rate Preprocessing Prometheus to JSON: `grpc_server_msg_sent_total` JavaScript: `The text is too long. Please see the template.` Change per second
Etcd: RPCs started per second	The number of RPCs started on the server.	Dependent item	etcd.grpc.started.rate Preprocessing Prometheus to JSON: `grpc_server_started_total` JavaScript: `The text is too long. Please see the template.` Change per second
Etcd: Get version		HTTP agent	etcd.get_version
Etcd: Server version	The version of the `etcd server`.	Dependent item	etcd.server.version Preprocessing JSON Path: `$.etcdserver` Discard unchanged with heartbeat: `1d`
Etcd: Cluster version	The version of the `etcd cluster`.	Dependent item	etcd.cluster.version Preprocessing JSON Path: `$.etcdcluster` Discard unchanged with heartbeat: `1d`
Etcd: DB size	The total size of the underlying database.	Dependent item	etcd.db.size Preprocessing Prometheus pattern: `VALUE(etcd_mvcc_db_total_size_in_bytes)`
Etcd: Keys compacted per second	The number of DB keys compacted per second.	Dependent item	etcd.keys.compacted.rate Preprocessing Prometheus pattern: `VALUE(etcd_debugging_mvcc_db_compaction_keys_total)` ⛔️Custom on fail: Set value to: `0` Change per second
Etcd: Keys expired per second	The number of expired keys per second.	Dependent item	etcd.keys.expired.rate Preprocessing Prometheus pattern: `VALUE(etcd_debugging_store_expires_total)` Change per second
Etcd: Keys total	The total number of keys.	Dependent item	etcd.keys.total Preprocessing Prometheus pattern: `VALUE(etcd_debugging_mvcc_keys_total)`
Etcd: Uptime	`Etcd` server uptime.	Dependent item	etcd.uptime Preprocessing Prometheus pattern: `VALUE(process_start_time_seconds)` JavaScript: `The text is too long. Please see the template.`
Etcd: Virtual memory	The size of virtual memory expressed in bytes.	Dependent item	etcd.virtual.bytes Preprocessing Prometheus pattern: `VALUE(process_virtual_memory_bytes)`
Etcd: Resident memory	The size of resident memory expressed in bytes.	Dependent item	etcd.res.bytes Preprocessing Prometheus pattern: `VALUE(process_resident_memory_bytes)`
Etcd: CPU	The total user and system CPU time spent in seconds.	Dependent item	etcd.cpu.util Preprocessing Prometheus pattern: `VALUE(process_cpu_seconds_total)` Change per second
Etcd: Open file descriptors	The number of open file descriptors.	Dependent item	etcd.open.fds Preprocessing Prometheus pattern: `VALUE(process_open_fds)`
Etcd: Maximum open file descriptors	The maximum number of open file descriptors.	Dependent item	etcd.max.fds Preprocessing Prometheus pattern: `VALUE(process_max_fds)`
Etcd: Deletes per second	The number of deletes seen by this member per second.	Dependent item	etcd.delete.rate Preprocessing Prometheus pattern: `VALUE(etcd_mvcc_delete_total)` Change per second
Etcd: PUT per second	The number of puts seen by this member per second.	Dependent item	etcd.put.rate Preprocessing Prometheus pattern: `VALUE(etcd_mvcc_put_total)` Change per second
Etcd: Range per second	The number of ranges seen by this member per second.	Dependent item	etcd.range.rate Preprocessing Prometheus pattern: `VALUE(etcd_debugging_mvcc_range_total)` Change per second
Etcd: Transaction per second	The number of transactions seen by this member per second.	Dependent item	etcd.txn.rate Preprocessing Prometheus pattern: `VALUE(etcd_debugging_mvcc_range_total)` Change per second
Etcd: Pending events	The total number of pending events to be sent.	Dependent item	etcd.events.sent.rate Preprocessing Prometheus pattern: `VALUE(etcd_debugging_mvcc_pending_events_total)`
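
Most dependent items in the table above chain a Prometheus pattern step with Change per second. As a rough illustration of what those two steps compute, the sketch below reads an unlabelled counter from /metrics text and turns two readings into a per-second rate. The helper names, the sample payload, and the 60-second spacing are assumptions made for this example; Zabbix performs these steps in its own preprocessing, not with this code.

```python
def prometheus_value(exposition, metric):
    """Return the value of an unlabelled metric from Prometheus text output."""
    for line in exposition.splitlines():
        parts = line.split()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if len(parts) < 2 or parts[0].startswith("#"):
            continue
        if parts[0] == metric:
            return float(parts[1])
    raise KeyError(metric)


def change_per_second(prev_value, prev_ts, value, ts):
    """Rate of a counter between two samples: delta value / delta time."""
    return (value - prev_value) / (ts - prev_ts)


# Hypothetical payloads taken 60 seconds apart (not real etcd output).
SAMPLE_T0 = """# TYPE etcd_mvcc_put_total counter
etcd_mvcc_put_total 1200
"""
SAMPLE_T1 = """# TYPE etcd_mvcc_put_total counter
etcd_mvcc_put_total 1500
"""

puts_t0 = prometheus_value(SAMPLE_T0, "etcd_mvcc_put_total")
puts_t1 = prometheus_value(SAMPLE_T1, "etcd_mvcc_put_total")
rate = change_per_second(puts_t0, 0, puts_t1, 60)
print(rate)  # 5.0 puts per second
```

Items without a Change per second step, such as etcd.db.size, keep the extracted value as it is.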

Triggers

Name	Description	Expression	Severity	Dependencies and additional info
Etcd: Service is unavailable	`last(/Etcd by HTTP/net.tcp.service["{$ETCD.SCHEME}","{HOST.CONN}","{$ETCD.PORT}"])=0`	Average	Manual close: Yes
Etcd: Node healthcheck failed	See more details on https://etcd.io/docs/v3.5/op-guide/monitoring/#health-check.	`last(/Etcd by HTTP/etcd.health)=0`	Average	Depends on: Etcd: Service is unavailable
Etcd: Failed to fetch info data	Zabbix has not received any data for items for the last 30 minutes.	`nodata(/Etcd by HTTP/etcd.is.leader,30m)=1`	Warning	Manual close: Yes Depends on: Etcd: Service is unavailable
Etcd: Member has no leader	If a member does not have a leader, it is totally unavailable.	`last(/Etcd by HTTP/etcd.has.leader)=0`	Average
Etcd: Instance has seen too many leader changes	Rapid leadership changes impact the performance of `etcd` significantly. It also signals that the leader is unstable, perhaps due to network connectivity issues or excessive load hitting the `etcd cluster`.	`(max(/Etcd by HTTP/etcd.leader.changes,15m)-min(/Etcd by HTTP/etcd.leader.changes,15m))>{$ETCD.LEADER.CHANGES.MAX.WARN}`	Warning
Etcd: Too many proposal failures	Normally related to two issues: temporary failures related to a leader election or longer downtime caused by a loss of quorum in the cluster.	`min(/Etcd by HTTP/etcd.proposals.failed.rate,5m)>{$ETCD.PROPOSAL.FAIL.MAX.WARN}`	Warning
Etcd: Too many proposals are queued to commit	Rising pending proposals suggests there is a high client load, or the member cannot commit proposals.	`min(/Etcd by HTTP/etcd.proposals.pending,5m)>{$ETCD.PROPOSAL.PENDING.MAX.WARN}`	Warning
Etcd: Too many HTTP requests failures	Too many requests failed on `etcd` instance with the `5xx HTTP code`.	`min(/Etcd by HTTP/etcd.http.requests.5xx.rate,5m)>{$ETCD.HTTP.FAIL.MAX.WARN}`	Warning
Etcd: Server version has changed	Etcd version has changed. Acknowledge to close the problem manually.	`last(/Etcd by HTTP/etcd.server.version,#1)<>last(/Etcd by HTTP/etcd.server.version,#2) and length(last(/Etcd by HTTP/etcd.server.version))>0`	Info	Manual close: Yes
Etcd: Cluster version has changed	Etcd version has changed. Acknowledge to close the problem manually.	`last(/Etcd by HTTP/etcd.cluster.version,#1)<>last(/Etcd by HTTP/etcd.cluster.version,#2) and length(last(/Etcd by HTTP/etcd.cluster.version))>0`	Info	Manual close: Yes
Etcd: Host has been restarted	Uptime is less than 10 minutes.	`last(/Etcd by HTTP/etcd.uptime)<10m`	Info	Manual close: Yes
Etcd: Current number of open files is too high	Heavy usage of a file descriptor (i.e., near the limit of the process's file descriptor) indicates a potential file descriptor exhaustion issue. If the file descriptors are exhausted, `etcd` may panic because it cannot create new WAL files.	`min(/Etcd by HTTP/etcd.open.fds,5m)/last(/Etcd by HTTP/etcd.max.fds)*100>{$ETCD.OPEN.FDS.MAX.WARN}`	Warning
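
To make the last expression in the table easier to read, here is a minimal sketch of how the open file descriptor trigger could be evaluated by hand. The function name and the sample readings are hypothetical; only the formula and the default threshold of 90 for {$ETCD.OPEN.FDS.MAX.WARN} come from this page.

```python
def open_fds_problem(open_fds_5m, max_fds, max_warn=90):
    """Evaluate min(open_fds, 5m) / last(max_fds) * 100 > max_warn."""
    used_percent = min(open_fds_5m) / max_fds * 100
    return used_percent > max_warn


# Hypothetical readings of etcd.open.fds collected over five minutes.
sustained = open_fds_problem([950, 970, 990], max_fds=1024)
short_spike = open_fds_problem([950, 100, 990], max_fds=1024)
print(sustained, short_spike)  # True False
```

Because the expression uses min() over the window, a single low reading within the five minutes keeps the trigger from firing, so only sustained usage above the threshold raises a problem.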

LLD rule gRPC codes discovery

Name	Description	Type	Key and additional info
gRPC codes discovery	Dependent item	etcd.grpc_code.discovery Preprocessing Prometheus to JSON: `grpc_server_handled_total` JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `1h`

Item prototypes for gRPC codes discovery

Name	Description	Type	Key and additional info
Etcd: RPCs completed with code {#GRPC.CODE}	The number of RPCs completed on the server with grpc_code {#GRPC.CODE}.	Dependent item	etcd.grpc.handled.rate[{#GRPC.CODE}] Preprocessing Prometheus to JSON: `grpc_server_handled_total{grpc_method="{#GRPC.CODE}"}` JavaScript: `The text is too long. Please see the template.` Change per second

Trigger prototypes for gRPC codes discovery

Name	Description	Expression	Severity	Dependencies and additional info
Etcd: Too many failed gRPC requests with code: {#GRPC.CODE}	`min(/Etcd by HTTP/etcd.grpc.handled.rate[{#GRPC.CODE}],5m)>{$ETCD.GRPC.ERRORS.MAX.WARN}`	Warning

LLD rule Peers discovery

Name	Description	Type	Key and additional info
Peers discovery	Dependent item	etcd.peer.discovery Preprocessing Prometheus to JSON: `etcd_network_peer_sent_bytes_total`

Item prototypes for Peers discovery

Name	Description	Type	Key and additional info
Etcd: Etcd peer {#ETCD.PEER}: Bytes sent	The number of bytes sent to a peer with the ID `{#ETCD.PEER}`.	Dependent item	etcd.bytes.sent.rate[{#ETCD.PEER}] Preprocessing Prometheus pattern: `VALUE(etcd_network_peer_sent_bytes_total{To="{#ETCD.PEER}"})` ⛔️Custom on fail: Set value to: `0` Change per second
Etcd: Etcd peer {#ETCD.PEER}: Bytes received	The number of bytes received from a peer with the ID `{#ETCD.PEER}`.	Dependent item	etcd.bytes.received.rate[{#ETCD.PEER}] Preprocessing Prometheus pattern: `The text is too long. Please see the template.` ⛔️Custom on fail: Set value to: `0` Change per second
Etcd: Etcd peer {#ETCD.PEER}: Send failures	The number of sent failures from a peer with the ID `{#ETCD.PEER}`.	Dependent item	etcd.sent.fail.rate[{#ETCD.PEER}] Preprocessing Prometheus pattern: `The text is too long. Please see the template.` ⛔️Custom on fail: Set value to: `0` Change per second
Etcd: Etcd peer {#ETCD.PEER}: Receive failures	The number of received failures from a peer with the ID `{#ETCD.PEER}`.	Dependent item	etcd.received.fail.rate[{#ETCD.PEER}] Preprocessing Prometheus pattern: `The text is too long. Please see the template.` ⛔️Custom on fail: Set value to: `0` Change per second
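
The peer discovery rule is built from the etcd_network_peer_sent_bytes_total metric, and each prototype above then selects one peer by its label. The following sketch shows one way the peer IDs could be derived from the metrics text. It assumes that {#ETCD.PEER} is taken from the `To` label (as the Bytes sent pattern suggests); the regular expressions, the function name, and the sample peer IDs are hypothetical.

```python
import re

# Matches lines such as: metric_name{label="value"} 123
LINE_RE = re.compile(r'^([A-Za-z_:][A-Za-z0-9_:]*)\{([^}]*)\}\s+(\S+)')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def discover_peers(exposition, metric="etcd_network_peer_sent_bytes_total"):
    """Return LLD-style rows, one per distinct peer ID in the `To` label."""
    peers = []
    for line in exposition.splitlines():
        match = LINE_RE.match(line.strip())
        if not match or match.group(1) != metric:
            continue
        labels = dict(LABEL_RE.findall(match.group(2)))
        peer = labels.get("To")
        if peer and peer not in peers:
            peers.append(peer)
    return [{"{#ETCD.PEER}": peer} for peer in peers]


# Hypothetical /metrics excerpt with two peers.
SAMPLE = '''# TYPE etcd_network_peer_sent_bytes_total counter
etcd_network_peer_sent_bytes_total{To="8e9e05c52164694d"} 1024
etcd_network_peer_sent_bytes_total{To="91bc3c398fb3c146"} 2048
etcd_network_peer_received_bytes_total{From="8e9e05c52164694d"} 512
'''

rows = discover_peers(SAMPLE)
print(rows)
```

Each returned row would correspond to one set of the four item prototypes listed above.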

Feedback

Please report any issues with the template at https://support.zabbix.com

You can also provide feedback, discuss the template, or ask for help at ZABBIX forums

This template is for Zabbix version: 5.4

Also available for: 7.4 7.2 7.0 6.4 6.2 6.0 5.0

Source: https://git.zabbix.com/projects/ZBX/repos/zabbix/browse/templates/app/etcd_http?at=release/5.4

Etcd by HTTP

Overview

For Zabbix version: 5.4 and higher
This template monitors etcd with Zabbix and works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.

The template collects metrics with the HTTP agent from the /metrics endpoint. See https://etcd.io/docs/v3.4.0/op-guide/monitoring/#metrics-endpoint.

This template was tested on:

Etcd, version 3.0+

Setup

See Zabbix template operation for basic instructions.

Import the template into Zabbix.
After importing the template, make sure that etcd allows metric collection. Test by running: curl -L http://localhost:2379/metrics
Check that etcd is accessible from the Zabbix proxy or Zabbix server, depending on where you plan to do the monitoring. To verify, run: curl -L http://<etcd_node_address>:2379/metrics
Add the template to each node with etcd. By default, the template uses the client port. You can configure the metrics endpoint location with the --listen-metrics-urls flag (see the etcd docs).

If you have specified a non-standard port for etcd, don't forget to change the macros {$ETCD.SCHEME} and {$ETCD.PORT}.

If necessary, you can set the {$ETCD.USER} and {$ETCD.PASSWORD} macros in the template for use at the host level.

Test availability: zabbix_get -s etcd-host -k etcd.health

Also see the Macros used section, as it sets the trigger values.
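
The etcd.health item used in the availability test is preprocessed, according to the item table on this page, with JSONPath `$.health`, a Boolean to decimal step, and a custom on-fail value of 0. The sketch below imitates that chain so the resulting 0/1 values are easier to interpret. It assumes a response body shaped like `{"health": "true"}`; the function name and the sample bodies are hypothetical.

```python
import json


def health_to_decimal(body, on_fail=0):
    """Imitate JSONPath $.health -> Boolean to decimal -> custom on fail."""
    try:
        value = str(json.loads(body)["health"]).strip().lower()
    except (ValueError, KeyError, TypeError):
        # Invalid JSON or a missing "health" field: fall back to on_fail.
        return on_fail
    if value == "true":
        return 1
    if value == "false":
        return 0
    return on_fail


healthy = health_to_decimal('{"health": "true"}')
unhealthy = health_to_decimal('{"health": "false"}')
broken = health_to_decimal("connection refused")
print(healthy, unhealthy, broken)  # 1 0 0
```

A value of 0 is what the "Node healthcheck failed" trigger reacts to, so an unparsable response is treated the same way as an unhealthy node.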

Zabbix configuration

No specific Zabbix configuration is required.

Macros used

Name	Description	Default
{$ETCD.GRPC.ERRORS.MAX.WARN}	Maximum number of gRPC requests failures.	`1`
{$ETCD.GRPC_CODE.MATCHES}	Filter of discoverable gRPC codes https://github.com/grpc/grpc/blob/master/doc/statuscodes.md.	`.*`
{$ETCD.GRPC_CODE.NOT_MATCHES}	Filter to exclude discovered gRPC codes https://github.com/grpc/grpc/blob/master/doc/statuscodes.md.	`CHANGE_IF_NEEDED`
{$ETCD.GRPC_CODE.TRIGGER.MATCHES}	Filter of discoverable gRPC codes for which triggers will be created.	`Aborted
{$ETCD.HTTP.FAIL.MAX.WARN}	Maximum number of HTTP requests failures.	`2`
{$ETCD.LEADER.CHANGES.MAX.WARN}	Maximum number of leader changes.	`5`
{$ETCD.OPEN.FDS.MAX.WARN}	Maximum percentage of used file descriptors.	`90`
{$ETCD.PASSWORD}	-	``
{$ETCD.PORT}	The port of Etcd API endpoint.	`2379`
{$ETCD.PROPOSAL.FAIL.MAX.WARN}	Maximum number of proposal failures.	`2`
{$ETCD.PROPOSAL.PENDING.MAX.WARN}	Maximum number of proposals in queue.	`5`
{$ETCD.SCHEME}	Request scheme which may be http or https.	`http`
{$ETCD.USER}	-	``

Template links

There are no template links in this template.

Discovery rules

Name	Description	Type	Key and additional info
gRPC codes discovery	-	DEPENDENT	etcd.grpc_code.discovery Preprocessing: - PROMETHEUS_TO_JSON: `grpc_server_handled_total` - JAVASCRIPT: `The text is too long. Please see the template.` - DISCARD_UNCHANGED_HEARTBEAT: `1h` Filter: AND - {#GRPC.CODE} NOT_MATCHES_REGEX `{$ETCD.GRPC_CODE.NOT_MATCHES}` - {#GRPC.CODE} MATCHES_REGEX `{$ETCD.GRPC_CODE.MATCHES}` Overrides: trigger - {#GRPC.CODE} MATCHES_REGEX `{$ETCD.GRPC_CODE.TRIGGER.MATCHES}` - TRIGGER_PROTOTYPE LIKE `Too many failed gRPC requests` - DISCOVER
Peers discovery	-	DEPENDENT	etcd.peer.discovery Preprocessing: - PROMETHEUS_TO_JSON: `etcd_network_peer_sent_bytes_total`
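
The filter and override of the gRPC codes discovery rule can be read as three regular expression checks. The sketch below is only an approximation in Python's `re` module (Zabbix uses its own regular expression engine); the function names and the list of codes are hypothetical. The pattern `Aborted` is used purely for illustration, because the default of {$ETCD.GRPC_CODE.TRIGGER.MATCHES} is truncated on this page.

```python
import re


def passes_filter(code, matches=r".*", not_matches=r"CHANGE_IF_NEEDED"):
    """AND filter: code must match MATCHES and must not match NOT_MATCHES."""
    return (re.search(matches, code) is not None
            and re.search(not_matches, code) is None)


def gets_trigger(code, trigger_matches):
    """Override: the trigger prototype is discovered only for matching codes."""
    return re.search(trigger_matches, code) is not None


# Hypothetical gRPC codes seen in grpc_server_handled_total.
codes = ["OK", "Aborted", "Unavailable", "NotFound"]

discovered = [c for c in codes if passes_filter(c)]
with_trigger = [c for c in discovered if gets_trigger(c, r"Aborted")]
print(discovered)    # all four codes get an item
print(with_trigger)  # ['Aborted']
```

With the defaults, every code gets an item, while the trigger is created only for codes that match the override pattern.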

Items collected

Group	Name	Description	Type	Key and additional info
Etcd	Etcd: Service's TCP port state	-	SIMPLE	net.tcp.service["{$ETCD.SCHEME}","{HOST.CONN}","{$ETCD.PORT}"] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: `10m`
Etcd	Etcd: Node health	-	HTTP_AGENT	etcd.health Preprocessing: - JSONPATH: `$.health` - BOOL_TO_DECIMAL ⛔️ON_FAIL: `CUSTOM_VALUE -> 0` - DISCARD_UNCHANGED_HEARTBEAT: `10m`
Etcd	Etcd: Server is a leader	Whether or not this member is a leader. 1 if it is, 0 otherwise.	DEPENDENT	etcd.is.leader Preprocessing: - PROMETHEUS_PATTERN: `etcd_server_is_leader` ⛔️ON_FAIL: `CUSTOM_VALUE -> 0` - DISCARD_UNCHANGED_HEARTBEAT: `10m`
Etcd	Etcd: Server has a leader	Whether or not a leader exists. 1 if a leader exists, 0 otherwise.	DEPENDENT	etcd.has.leader Preprocessing: - PROMETHEUS_PATTERN: `etcd_server_has_leader` - DISCARD_UNCHANGED_HEARTBEAT: `10m`
Etcd	Etcd: Leader changes	The number of leader changes the member has seen since its start.	DEPENDENT	etcd.leader.changes Preprocessing: - PROMETHEUS_PATTERN: `etcd_server_leader_changes_seen_total`
Etcd	Etcd: Proposals committed per second	The number of consensus proposals committed.	DEPENDENT	etcd.proposals.committed.rate Preprocessing: - PROMETHEUS_PATTERN: `etcd_server_proposals_committed_total` - CHANGE_PER_SECOND
Etcd	Etcd: Proposals applied per second	The number of consensus proposals applied.	DEPENDENT	etcd.proposals.applied.rate Preprocessing: - PROMETHEUS_PATTERN: `etcd_server_proposals_applied_total` - CHANGE_PER_SECOND
Etcd	Etcd: Proposals failed per second	The number of failed proposals seen.	DEPENDENT	etcd.proposals.failed.rate Preprocessing: - PROMETHEUS_PATTERN: `etcd_server_proposals_failed_total` - CHANGE_PER_SECOND
Etcd	Etcd: Proposals pending	The current number of pending proposals to commit.	DEPENDENT	etcd.proposals.pending Preprocessing: - PROMETHEUS_PATTERN: `etcd_server_proposals_pending`
Etcd	Etcd: Reads per second	The number of read actions (get/getRecursive) local to this member.	DEPENDENT	etcd.reads.rate Preprocessing: - PROMETHEUS_TO_JSON: `etcd_debugging_store_reads_total` - JAVASCRIPT: `The text is too long. Please see the template.` - CHANGE_PER_SECOND
Etcd	Etcd: Writes per second	Number of writes (e.g. set/compareAndDelete) seen by this member.	DEPENDENT	etcd.writes.rate Preprocessing: - PROMETHEUS_TO_JSON: `etcd_debugging_store_writes_total` - JAVASCRIPT: `The text is too long. Please see the template.` - CHANGE_PER_SECOND
Etcd	Etcd: Client gRPC received bytes per second	The number of bytes received from grpc clients per second.	DEPENDENT	etcd.network.grpc.received.rate Preprocessing: - PROMETHEUS_PATTERN: `etcd_network_client_grpc_received_bytes_total` - CHANGE_PER_SECOND
Etcd	Etcd: Client gRPC sent bytes per second	The number of bytes sent to gRPC clients per second.	DEPENDENT	etcd.network.grpc.sent.rate Preprocessing: - PROMETHEUS_PATTERN: `etcd_network_client_grpc_sent_bytes_total` - CHANGE_PER_SECOND
Etcd	Etcd: HTTP requests received	Number of requests received into the system (successfully parsed and authd).	DEPENDENT	etcd.http.requests.rate Preprocessing: - PROMETHEUS_TO_JSON: `etcd_http_received_total` - JAVASCRIPT: `The text is too long. Please see the template.` - CHANGE_PER_SECOND
Etcd	Etcd: HTTP 5XX	Number of handle failures of requests (non-watches), by method (GET/PUT etc.), and code 5XX.	DEPENDENT	etcd.http.requests.5xx.rate Preprocessing: - PROMETHEUS_TO_JSON: `etcd_http_failed_total{code=~"5.+"}` - JAVASCRIPT: `The text is too long. Please see the template.` - CHANGE_PER_SECOND
Etcd	Etcd: HTTP 4XX	Number of handle failures of requests (non-watches), by method (GET/PUT etc.), and code 4XX.	DEPENDENT	etcd.http.requests.4xx.rate Preprocessing: - PROMETHEUS_TO_JSON: `etcd_http_failed_total{code=~"4.+"}` - JAVASCRIPT: `The text is too long. Please see the template.` - CHANGE_PER_SECOND
Etcd	Etcd: RPCs received per second	The number of RPC stream messages received on the server.	DEPENDENT	etcd.grpc.received.rate Preprocessing: - PROMETHEUS_TO_JSON: `grpc_server_msg_received_total` - JAVASCRIPT: `The text is too long. Please see the template.` - CHANGE_PER_SECOND
Etcd	Etcd: RPCs sent per second	The number of gRPC stream messages sent by the server.	DEPENDENT	etcd.grpc.sent.rate Preprocessing: - PROMETHEUS_TO_JSON: `grpc_server_msg_sent_total` - JAVASCRIPT: `The text is too long. Please see the template.` - CHANGE_PER_SECOND
Etcd	Etcd: RPCs started per second	The number of RPCs started on the server.	DEPENDENT	etcd.grpc.started.rate Preprocessing: - PROMETHEUS_TO_JSON: `grpc_server_started_total` - JAVASCRIPT: `The text is too long. Please see the template.` - CHANGE_PER_SECOND
Etcd	Etcd: Server version	Version of the Etcd server.	DEPENDENT	etcd.server.version Preprocessing: - JSONPATH: `$.etcdserver` - DISCARD_UNCHANGED_HEARTBEAT: `1d`
Etcd	Etcd: Cluster version	Version of the Etcd cluster.	DEPENDENT	etcd.cluster.version Preprocessing: - JSONPATH: `$.etcdcluster` - DISCARD_UNCHANGED_HEARTBEAT: `1d`
Etcd	Etcd: DB size	Total size of the underlying database.	DEPENDENT	etcd.db.size Preprocessing: - PROMETHEUS_PATTERN: `etcd_debugging_mvcc_db_total_size_in_bytes`
Etcd	Etcd: Keys compacted per second	The number of DB keys compacted per second.	DEPENDENT	etcd.keys.compacted.rate Preprocessing: - PROMETHEUS_PATTERN: `etcd_debugging_mvcc_db_compaction_keys_total` ⛔️ON_FAIL: `CUSTOM_VALUE -> 0` - CHANGE_PER_SECOND
Etcd	Etcd: Keys expired per second	The number of expired keys per second.	DEPENDENT	etcd.keys.expired.rate Preprocessing: - PROMETHEUS_PATTERN: `etcd_debugging_store_expires_total` - CHANGE_PER_SECOND
Etcd	Etcd: Keys total	Total number of keys.	DEPENDENT	etcd.keys.total Preprocessing: - PROMETHEUS_PATTERN: `etcd_debugging_mvcc_keys_total`
Etcd	Etcd: Uptime	Etcd server uptime.	DEPENDENT	etcd.uptime Preprocessing: - PROMETHEUS_PATTERN: `process_start_time_seconds` - JAVASCRIPT: `/* use boottime to calculate uptime */ return (Math.floor(Date.now()/1000)-Number(value));`
Etcd	Etcd: Virtual memory	Virtual memory size in bytes.	DEPENDENT	etcd.virtual.bytes Preprocessing: - PROMETHEUS_PATTERN: `process_virtual_memory_bytes`
Etcd	Etcd: Resident memory	Resident memory size in bytes.	DEPENDENT	etcd.res.bytes Preprocessing: - PROMETHEUS_PATTERN: `process_resident_memory_bytes`
Etcd	Etcd: CPU	Total user and system CPU time spent in seconds.	DEPENDENT	etcd.cpu.util Preprocessing: - PROMETHEUS_PATTERN: `process_cpu_seconds_total` - CHANGE_PER_SECOND
Etcd	Etcd: Open file descriptors	Number of open file descriptors.	DEPENDENT	etcd.open.fds Preprocessing: - PROMETHEUS_PATTERN: `process_open_fds`
Etcd	Etcd: Maximum open file descriptors	The maximum number of open file descriptors.	DEPENDENT	etcd.max.fds Preprocessing: - PROMETHEUS_PATTERN: `process_max_fds`
Etcd	Etcd: Deletes per second	The number of deletes seen by this member per second.	DEPENDENT	etcd.delete.rate Preprocessing: - PROMETHEUS_PATTERN: `etcd_debugging_mvcc_delete_total` - CHANGE_PER_SECOND
Etcd	Etcd: PUT per second	The number of puts seen by this member per second.	DEPENDENT	etcd.put.rate Preprocessing: - PROMETHEUS_PATTERN: `etcd_debugging_mvcc_put_total` - CHANGE_PER_SECOND
Etcd	Etcd: Range per second	The number of ranges seen by this member per second.	DEPENDENT	etcd.range.rate Preprocessing: - PROMETHEUS_PATTERN: `etcd_debugging_mvcc_range_total` - CHANGE_PER_SECOND
Etcd	Etcd: Transaction per second	The number of transactions seen by this member per second.	DEPENDENT	etcd.txn.rate Preprocessing: - PROMETHEUS_PATTERN: `etcd_debugging_mvcc_range_total` - CHANGE_PER_SECOND
Etcd	Etcd: Pending events	Total number of pending events to be sent.	DEPENDENT	etcd.events.sent.rate Preprocessing: - PROMETHEUS_PATTERN: `etcd_debugging_mvcc_pending_events_total`
Etcd	Etcd: RPCs completed with code {#GRPC.CODE}	The number of RPCs completed on the server with grpc_code {#GRPC.CODE}.	DEPENDENT	etcd.grpc.handled.rate[{#GRPC.CODE}] Preprocessing: - PROMETHEUS_TO_JSON: `grpc_server_handled_total{grpc_method="{#GRPC.CODE}"}` - JAVASCRIPT: `The text is too long. Please see the template.` - CHANGE_PER_SECOND
Etcd	Etcd: Etcd peer {#ETCD.PEER}: Bytes sent	The number of bytes sent to peer with ID {#ETCD.PEER}.	DEPENDENT	etcd.bytes.sent.rate[{#ETCD.PEER}] Preprocessing: - PROMETHEUS_PATTERN: `etcd_network_peer_sent_bytes_total{To="{#ETCD.PEER}"}` ⛔️ON_FAIL: `CUSTOM_VALUE -> 0` - CHANGE_PER_SECOND
Etcd	Etcd: Etcd peer {#ETCD.PEER}: Bytes received	The number of bytes received from peer with ID {#ETCD.PEER}.	DEPENDENT	etcd.bytes.received.rate[{#ETCD.PEER}] Preprocessing: - PROMETHEUS_PATTERN: `etcd_network_peer_received_bytes_total{From="{#ETCD.PEER}"}` ⛔️ON_FAIL: `CUSTOM_VALUE -> 0` - CHANGE_PER_SECOND
Etcd	Etcd: Etcd peer {#ETCD.PEER}: Send failures	The number of send failures from peer with ID {#ETCD.PEER}.	DEPENDENT	etcd.sent.fail.rate[{#ETCD.PEER}] Preprocessing: - PROMETHEUS_PATTERN: `etcd_network_peer_sent_failures_total{To="{#ETCD.PEER}"}` ⛔️ON_FAIL: `CUSTOM_VALUE -> 0` - CHANGE_PER_SECOND
Etcd	Etcd: Etcd peer {#ETCD.PEER}: Receive failures	The number of receive failures from the peer with ID {#ETCD.PEER}.	DEPENDENT	etcd.received.fail.rate[{#ETCD.PEER}] Preprocessing: - PROMETHEUS_PATTERN: `etcd_network_peer_received_failures_total{To="{#ETCD.PEER}"}` ⛔️ON_FAIL: `CUSTOM_VALUE -> 0` - CHANGE_PER_SECOND
Zabbix_raw_items	Etcd: Get node metrics	-	HTTP_AGENT	etcd.get_metrics
Zabbix_raw_items	Etcd: Get version	-	HTTP_AGENT	etcd.get_version
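
The Etcd: Uptime item in the table above subtracts process_start_time_seconds from the current time in its JavaScript step. For readers who want to check a value by hand, the same arithmetic is sketched here in Python. The function name and the sample timestamps are hypothetical; the `now` parameter exists only to make the example repeatable.

```python
import math
import time


def uptime_seconds(process_start_time_seconds, now=None):
    """Uptime = floor(current Unix time) - process start time, in seconds."""
    if now is None:
        now = time.time()
    return math.floor(now) - float(process_start_time_seconds)


# Hypothetical values: the process started 300 seconds before "now".
uptime = uptime_seconds("1700000000", now=1700000300.7)
print(uptime)  # 300.0
```

An uptime of 300 seconds is below the 10-minute threshold used by the restart trigger in the Triggers table, so that trigger would fire in this example.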

Triggers

Name	Description	Expression	Severity	Dependencies and additional info
Etcd: Service is unavailable	-	`last(/Etcd by HTTP/net.tcp.service["{$ETCD.SCHEME}","{HOST.CONN}","{$ETCD.PORT}"])=0`	AVERAGE	Manual close: YES
Etcd: Node healthcheck failed	https://etcd.io/docs/v3.4.0/op-guide/monitoring/#health-check	`last(/Etcd by HTTP/etcd.health)=0`	AVERAGE	Depends on: - Etcd: Service is unavailable
Etcd: Failed to fetch info data (or no data for 30m)	Zabbix has not received data for items for the last 30 minutes.	`nodata(/Etcd by HTTP/etcd.is.leader,30m)=1`	WARNING	Manual close: YES Depends on: - Etcd: Service is unavailable
Etcd: Member has no leader	If a member does not have a leader, it is totally unavailable.	`last(/Etcd by HTTP/etcd.has.leader)=0`	AVERAGE
Etcd: Instance has seen too many leader changes (over {$ETCD.LEADER.CHANGES.MAX.WARN} for 15m)	Rapid leadership changes impact the performance of etcd significantly. It also signals that the leader is unstable, perhaps due to network connectivity issues or excessive load hitting the etcd cluster.	`(max(/Etcd by HTTP/etcd.leader.changes,15m)-min(/Etcd by HTTP/etcd.leader.changes,15m))>{$ETCD.LEADER.CHANGES.MAX.WARN}`	WARNING
Etcd: Too many proposal failures (over {$ETCD.PROPOSAL.FAIL.MAX.WARN} for 5m)	Normally related to two issues: temporary failures related to a leader election or longer downtime caused by a loss of quorum in the cluster.	`min(/Etcd by HTTP/etcd.proposals.failed.rate,5m)>{$ETCD.PROPOSAL.FAIL.MAX.WARN}`	WARNING
Etcd: Too many proposals are queued to commit (over {$ETCD.PROPOSAL.PENDING.MAX.WARN} for 5m)	Rising pending proposals suggest there is a high client load or the member cannot commit proposals.	`min(/Etcd by HTTP/etcd.proposals.pending,5m)>{$ETCD.PROPOSAL.PENDING.MAX.WARN}`	WARNING
Etcd: Too many HTTP requests failures (over {$ETCD.HTTP.FAIL.MAX.WARN} for 5m)	Too many requests failed on the etcd instance with a 5xx HTTP code.	`min(/Etcd by HTTP/etcd.http.requests.5xx.rate,5m)>{$ETCD.HTTP.FAIL.MAX.WARN}`	WARNING
Etcd: Server version has changed (new version: {ITEM.VALUE})	Etcd version has changed. Ack to close.	`last(/Etcd by HTTP/etcd.server.version,#1)<>last(/Etcd by HTTP/etcd.server.version,#2) and length(last(/Etcd by HTTP/etcd.server.version))>0`	INFO	Manual close: YES
Etcd: Cluster version has changed (new version: {ITEM.VALUE})	Etcd version has changed. Ack to close.	`last(/Etcd by HTTP/etcd.cluster.version,#1)<>last(/Etcd by HTTP/etcd.cluster.version,#2) and length(last(/Etcd by HTTP/etcd.cluster.version))>0`	INFO	Manual close: YES
Etcd: has been restarted (uptime < 10m)	Uptime is less than 10 minutes	`last(/Etcd by HTTP/etcd.uptime)<10m`	INFO	Manual close: YES
Etcd: Current number of open files is too high (over {$ETCD.OPEN.FDS.MAX.WARN}% for 5m)	Heavy file descriptor usage (i.e., near the process's file descriptor limit) indicates a potential file descriptor exhaustion issue. If the file descriptors are exhausted, etcd may panic because it cannot create new WAL files.	`min(/Etcd by HTTP/etcd.open.fds,5m)/last(/Etcd by HTTP/etcd.max.fds)*100>{$ETCD.OPEN.FDS.MAX.WARN}`	WARNING
Etcd: Too many failed gRPC requests with code: {#GRPC.CODE} (over {$ETCD.GRPC.ERRORS.MAX.WARN} in 5m)	-	`min(/Etcd by HTTP/etcd.grpc.handled.rate[{#GRPC.CODE}],5m)>{$ETCD.GRPC.ERRORS.MAX.WARN}`	WARNING
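
The leader changes trigger differs from the other rows because etcd.leader.changes is a raw counter without a per-second step, so the expression compares the highest and lowest reading in the window. A minimal sketch of that evaluation, with a hypothetical function name and sample readings and the default threshold of 5 from the macros table:

```python
def too_many_leader_changes(samples_15m, max_warn=5):
    """Evaluate (max(changes, 15m) - min(changes, 15m)) > max_warn."""
    return max(samples_15m) - min(samples_15m) > max_warn


# Hypothetical counter readings collected over 15 minutes.
unstable = too_many_leader_changes([10, 12, 17])  # 7 changes in the window
stable = too_many_leader_changes([10, 12, 15])    # exactly 5, not over 5
print(unstable, stable)  # True False
```

Since the counter only grows while the member is running, the difference between the maximum and the minimum equals the number of leader changes observed during the window.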

Feedback

Please report any issues with the template at https://support.zabbix.com

This template is for Zabbix version: 5.0

Also available for: 7.4 7.2 7.0 6.4 6.2 6.0 5.4

Source: https://git.zabbix.com/projects/ZBX/repos/zabbix/browse/templates/app/etcd_http?at=release/5.0

Template App Etcd by HTTP

Overview

For Zabbix version: 5.0 and higher
This template monitors etcd with Zabbix and works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.

The template collects metrics with the HTTP agent from the /metrics endpoint. See https://etcd.io/docs/v3.4.0/op-guide/monitoring/#metrics-endpoint.

This template was tested on:

Etcd, version 3.0+

Setup

See Zabbix template operation for basic instructions.

Import the template into Zabbix.
After importing the template, make sure that etcd allows metric collection. Test by running: curl -L http://localhost:2379/metrics
Check that etcd is accessible from the Zabbix proxy or Zabbix server, depending on where you plan to do the monitoring. To verify, run: curl -L http://<etcd_node_address>:2379/metrics
Add the template to each node with etcd. By default, the template uses the client port. You can configure the metrics endpoint location with the --listen-metrics-urls flag (see the etcd docs).

If you have specified a non-standard port for etcd, don't forget to change the macros {$ETCD.SCHEME} and {$ETCD.PORT}.

If necessary, you can set the {$ETCD.USER} and {$ETCD.PASSWORD} macros in the template for use at the host level.

Test availability: zabbix_get -s etcd-host -k etcd.health

Also see the Macros used section, as it sets the trigger values.

Zabbix configuration

No specific Zabbix configuration is required.

Macros used

Name	Description	Default
{$ETCD.GRPC.ERRORS.MAX.WARN}	Maximum number of gRPC requests failures	`1`
{$ETCD.GRPC_CODE.MATCHES}	Filter of discoverable gRPC codes https://github.com/grpc/grpc/blob/master/doc/statuscodes.md	`.*`
{$ETCD.GRPC_CODE.NOT_MATCHES}	Filter to exclude discovered gRPC codes https://github.com/grpc/grpc/blob/master/doc/statuscodes.md	`CHANGE_IF_NEEDED`
{$ETCD.GRPC_CODE.TRIGGER.MATCHES}	Filter of discoverable gRPC codes which will create triggers	`Aborted
{$ETCD.HTTP.FAIL.MAX.WARN}	Maximum number of HTTP requests failures	`2`
{$ETCD.LEADER.CHANGES.MAX.WARN}	Maximum number of leader changes	`5`
{$ETCD.OPEN.FDS.MAX.WARN}	Maximum percentage of used file descriptors	`90`
{$ETCD.PASSWORD}	-	``
{$ETCD.PORT}	The port of Etcd API endpoint	`2379`
{$ETCD.PROPOSAL.FAIL.MAX.WARN}	Maximum number of proposal failures	`2`
{$ETCD.PROPOSAL.PENDING.MAX.WARN}	Maximum number of proposals in queue	`5`
{$ETCD.SCHEME}	Request scheme which may be http or https	`http`
{$ETCD.USER}	-	``

Template links

There are no template links in this template.

Discovery rules

Name	Description	Type	Key and additional info
gRPC codes discovery	-	DEPENDENT	etcd.grpc_code.discovery Preprocessing: - PROMETHEUS_TO_JSON: `grpc_server_handled_total` - JAVASCRIPT: `The text is too long. Please see the template.` - DISCARD_UNCHANGED_HEARTBEAT: `1h` Filter: AND - A: {#GRPC.CODE} NOT_MATCHES_REGEX `{$ETCD.GRPC_CODE.NOT_MATCHES}` - B: {#GRPC.CODE} MATCHES_REGEX `{$ETCD.GRPC_CODE.MATCHES}` Overrides: trigger - {#GRPC.CODE} MATCHES_REGEX `{$ETCD.GRPC_CODE.TRIGGER.MATCHES}` - TRIGGER_PROTOTYPE LIKE `Too many failed gRPC requests` - DISCOVER
Peers discovery	-	DEPENDENT	etcd.peer.discovery Preprocessing: - PROMETHEUS_TO_JSON: `etcd_network_peer_sent_bytes_total`

Items collected

Group	Name	Description	Type	Key and additional info
Etcd	Etcd: Service's TCP port state	-	SIMPLE	net.tcp.service["{$ETCD.SCHEME}","{HOST.CONN}","{$ETCD.PORT}"] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: `10m`
Etcd	Etcd: Node health	-	HTTP_AGENT	etcd.health Preprocessing: - JSONPATH: `$.health` - BOOL_TO_DECIMAL - DISCARD_UNCHANGED_HEARTBEAT: `10m`
Etcd	Etcd: Server is a leader	Whether or not this member is a leader. 1 if it is, 0 otherwise.	DEPENDENT	etcd.is.leader Preprocessing: - PROMETHEUS_PATTERN: `etcd_server_is_leader` ⛔️ON_FAIL: `CUSTOM_VALUE -> 0` - DISCARD_UNCHANGED_HEARTBEAT: `10m`
Etcd	Etcd: Server has a leader	Whether or not a leader exists. 1 if a leader exists, 0 otherwise.	DEPENDENT	etcd.has.leader Preprocessing: - PROMETHEUS_PATTERN: `etcd_server_has_leader` - DISCARD_UNCHANGED_HEARTBEAT: `10m`
Etcd	Etcd: Leader changes	The number of leader changes the member has seen since its start.	DEPENDENT	etcd.leader.changes Preprocessing: - PROMETHEUS_PATTERN: `etcd_server_leader_changes_seen_total`
Etcd	Etcd: Proposals committed per second	The number of consensus proposals committed.	DEPENDENT	etcd.proposals.committed.rate Preprocessing: - PROMETHEUS_PATTERN: `etcd_server_proposals_committed_total` - CHANGE_PER_SECOND
Etcd	Etcd: Proposals applied per second	The number of consensus proposals applied.	DEPENDENT	etcd.proposals.applied.rate Preprocessing: - PROMETHEUS_PATTERN: `etcd_server_proposals_applied_total` - CHANGE_PER_SECOND
Etcd	Etcd: Proposals failed per second	The number of failed proposals seen.	DEPENDENT	etcd.proposals.failed.rate Preprocessing: - PROMETHEUS_PATTERN: `etcd_server_proposals_failed_total` - CHANGE_PER_SECOND
Etcd	Etcd: Proposals pending	The current number of pending proposals to commit.	DEPENDENT	etcd.proposals.pending Preprocessing: - PROMETHEUS_PATTERN: `etcd_server_proposals_pending`
Etcd	Etcd: Reads per second	The number of read actions (get/getRecursive) local to this member.	DEPENDENT	etcd.reads.rate Preprocessing: - PROMETHEUS_TO_JSON: `etcd_debugging_store_reads_total` - JAVASCRIPT: `The text is too long. Please see the template.` - CHANGE_PER_SECOND
Etcd	Etcd: Writes per second	Number of writes (e.g. set/compareAndDelete) seen by this member.	DEPENDENT	etcd.writes.rate Preprocessing: - PROMETHEUS_TO_JSON: `etcd_debugging_store_writes_total` - JAVASCRIPT: `The text is too long. Please see the template.` - CHANGE_PER_SECOND
Etcd	Etcd: Client gRPC received bytes per second	The number of bytes received from grpc clients per second	DEPENDENT	etcd.network.grpc.received.rate Preprocessing: - PROMETHEUS_PATTERN: `etcd_network_client_grpc_received_bytes_total` - CHANGE_PER_SECOND
Etcd	Etcd: Client gRPC sent bytes per second	The number of bytes sent to gRPC clients per second.	DEPENDENT	etcd.network.grpc.sent.rate Preprocessing: - PROMETHEUS_PATTERN: `etcd_network_client_grpc_sent_bytes_total` - CHANGE_PER_SECOND
Etcd	Etcd: HTTP requests received	Number of requests received into the system (successfully parsed and authd).	DEPENDENT	etcd.http.requests.rate Preprocessing: - PROMETHEUS_TO_JSON: `etcd_http_received_total` - JAVASCRIPT: `The text is too long. Please see the template.` - CHANGE_PER_SECOND
Etcd	Etcd: HTTP 5XX	Number of handle failures of requests (non-watches), by method (GET/PUT etc.), and code 5XX.	DEPENDENT	etcd.http.requests.5xx.rate Preprocessing: - PROMETHEUS_TO_JSON: `etcd_http_failed_total{code=~"5.+"}` - JAVASCRIPT: `The text is too long. Please see the template.` - CHANGE_PER_SECOND
Etcd	Etcd: HTTP 4XX	Number of handle failures of requests (non-watches), by method (GET/PUT etc.), and code 4XX.	DEPENDENT	etcd.http.requests.4xx.rate Preprocessing: - PROMETHEUS_TO_JSON: `etcd_http_failed_total{code=~"4.+"}` - JAVASCRIPT: `The text is too long. Please see the template.` - CHANGE_PER_SECOND
Etcd	Etcd: RPCs received per second	The number of RPC stream messages received on the server.	DEPENDENT	etcd.grpc.received.rate Preprocessing: - PROMETHEUS_TO_JSON: `grpc_server_msg_received_total` - JAVASCRIPT: `The text is too long. Please see the template.` - CHANGE_PER_SECOND
Etcd	Etcd: RPCs sent per second	The number of gRPC stream messages sent by the server.	DEPENDENT	etcd.grpc.sent.rate Preprocessing: - PROMETHEUS_TO_JSON: `grpc_server_msg_sent_total` - JAVASCRIPT: `The text is too long. Please see the template.` - CHANGE_PER_SECOND
Etcd	Etcd: RPCs started per second	The number of RPCs started on the server.	DEPENDENT	etcd.grpc.started.rate Preprocessing: - PROMETHEUS_TO_JSON: `grpc_server_started_total` - JAVASCRIPT: `The text is too long. Please see the template.` - CHANGE_PER_SECOND
Etcd	Etcd: Server version	Version of the Etcd server.	DEPENDENT	etcd.server.version Preprocessing: - JSONPATH: `$.etcdserver` - DISCARD_UNCHANGED_HEARTBEAT: `1d`
Etcd	Etcd: Cluster version	Version of the Etcd cluster.	DEPENDENT	etcd.cluster.version Preprocessing: - JSONPATH: `$.etcdcluster` - DISCARD_UNCHANGED_HEARTBEAT: `1d`
Etcd	Etcd: DB size	Total size of the underlying database.	DEPENDENT	etcd.db.size Preprocessing: - PROMETHEUS_PATTERN: `etcd_debugging_mvcc_db_total_size_in_bytes`
Etcd	Etcd: Keys compacted per second	The number of DB keys compacted per second.	DEPENDENT	etcd.keys.compacted.rate Preprocessing: - PROMETHEUS_PATTERN: `etcd_debugging_mvcc_db_compaction_keys_total` ⛔️ON_FAIL: `CUSTOM_VALUE -> 0` - CHANGE_PER_SECOND
Etcd	Etcd: Keys expired per second	The number of expired keys per second.	DEPENDENT	etcd.keys.expired.rate Preprocessing: - PROMETHEUS_PATTERN: `etcd_debugging_store_expires_total` - CHANGE_PER_SECOND
Etcd	Etcd: Keys total	Total number of keys.	DEPENDENT	etcd.keys.total Preprocessing: - PROMETHEUS_PATTERN: `etcd_debugging_mvcc_keys_total`
Etcd	Etcd: Uptime	Etcd server uptime.	DEPENDENT	etcd.uptime Preprocessing: - PROMETHEUS_PATTERN: `process_start_time_seconds` - JAVASCRIPT: `//use boottime to calculate uptime return (Math.floor(Date.now()/1000)-Number(value));`
Etcd	Etcd: Virtual memory	Virtual memory size in bytes.	DEPENDENT	etcd.virtual.bytes Preprocessing: - PROMETHEUS_PATTERN: `process_virtual_memory_bytes`
Etcd	Etcd: Resident memory	Resident memory size in bytes.	DEPENDENT	etcd.res.bytes Preprocessing: - PROMETHEUS_PATTERN: `process_resident_memory_bytes`
Etcd	Etcd: CPU	Total user and system CPU time spent in seconds.	DEPENDENT	etcd.cpu.util Preprocessing: - PROMETHEUS_PATTERN: `process_cpu_seconds_total` - CHANGE_PER_SECOND
Etcd	Etcd: Open file descriptors	Number of open file descriptors.	DEPENDENT	etcd.open.fds Preprocessing: - PROMETHEUS_PATTERN: `process_open_fds`
Etcd	Etcd: Maximum open file descriptors	The maximum number of open file descriptors.	DEPENDENT	etcd.max.fds Preprocessing: - PROMETHEUS_PATTERN: `process_max_fds`
Etcd	Etcd: Deletes per second	The number of deletes seen by this member per second.	DEPENDENT	etcd.delete.rate Preprocessing: - PROMETHEUS_PATTERN: `etcd_debugging_mvcc_delete_total` - CHANGE_PER_SECOND
Etcd	Etcd: PUT per second	The number of puts seen by this member per second.	DEPENDENT	etcd.put.rate Preprocessing: - PROMETHEUS_PATTERN: `etcd_debugging_mvcc_put_total` - CHANGE_PER_SECOND
Etcd	Etcd: Range per second	The number of ranges seen by this member per second.	DEPENDENT	etcd.range.rate Preprocessing: - PROMETHEUS_PATTERN: `etcd_debugging_mvcc_range_total` - CHANGE_PER_SECOND
Etcd	Etcd: Transaction per second	The number of transactions seen by this member per second.	DEPENDENT	etcd.txn.rate Preprocessing: - PROMETHEUS_PATTERN: `etcd_debugging_mvcc_txn_total` - CHANGE_PER_SECOND
Etcd	Etcd: Events sent per second	The number of events sent by this member per second.	DEPENDENT	etcd.events.sent.rate Preprocessing: - PROMETHEUS_PATTERN: `etcd_debugging_mvcc_events_total` - CHANGE_PER_SECOND
Etcd	Etcd: Pending events	Total number of pending events to be sent.	DEPENDENT	etcd.events.sent.rate Preprocessing: - PROMETHEUS_PATTERN: `etcd_debugging_mvcc_pending_events_total`
Etcd	Etcd: RPCs completed with code {#GRPC.CODE}	The number of RPCs completed on the server with grpc_code {#GRPC.CODE}.	DEPENDENT	etcd.grpc.handled.rate[{#GRPC.CODE}] Preprocessing: - PROMETHEUS_TO_JSON: `grpc_server_handled_total{grpc_code="{#GRPC.CODE}"}` - JAVASCRIPT: `The text is too long. Please see the template.` - CHANGE_PER_SECOND
Etcd	Etcd: Etcd peer {#ETCD.PEER}: Bytes sent	The number of bytes sent to the peer with ID {#ETCD.PEER}.	DEPENDENT	etcd.bytes.sent.rate[{#ETCD.PEER}] Preprocessing: - PROMETHEUS_PATTERN: `etcd_network_peer_sent_bytes_total{To="{#ETCD.PEER}"}` ⛔️ON_FAIL: `CUSTOM_VALUE -> 0` - CHANGE_PER_SECOND
Etcd	Etcd: Etcd peer {#ETCD.PEER}: Bytes received	The number of bytes received from the peer with ID {#ETCD.PEER}.	DEPENDENT	etcd.bytes.received.rate[{#ETCD.PEER}] Preprocessing: - PROMETHEUS_PATTERN: `etcd_network_peer_received_bytes_total{From="{#ETCD.PEER}"}` ⛔️ON_FAIL: `CUSTOM_VALUE -> 0` - CHANGE_PER_SECOND
Etcd	Etcd: Etcd peer {#ETCD.PEER}: Send failures	The number of send failures to the peer with ID {#ETCD.PEER}.	DEPENDENT	etcd.sent.fail.rate[{#ETCD.PEER}] Preprocessing: - PROMETHEUS_PATTERN: `etcd_network_peer_sent_failures_total{To="{#ETCD.PEER}"}` ⛔️ON_FAIL: `CUSTOM_VALUE -> 0` - CHANGE_PER_SECOND
Etcd	Etcd: Etcd peer {#ETCD.PEER}: Receive failures	The number of receive failures from the peer with ID {#ETCD.PEER}.	DEPENDENT	etcd.received.fail.rate[{#ETCD.PEER}] Preprocessing: - PROMETHEUS_PATTERN: `etcd_network_peer_received_failures_total{From="{#ETCD.PEER}"}` ⛔️ON_FAIL: `CUSTOM_VALUE -> 0` - CHANGE_PER_SECOND
Zabbix_raw_items	Etcd: Get node metrics	-	HTTP_AGENT	etcd.get_metrics
Zabbix_raw_items	Etcd: Get version	-	HTTP_AGENT	etcd.get_version
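
Many of the dependent items above chain a `PROMETHEUS_PATTERN` step with `CHANGE_PER_SECOND`, and the uptime item subtracts `process_start_time_seconds` from the current time. The following Python sketch only illustrates that arithmetic; it is not how Zabbix implements preprocessing. The sample metrics text and the function names `prometheus_value`, `change_per_second` and `uptime_seconds` are hypothetical, and the parser handles only unlabelled metric lines.

```python
import math
import re

# Hypothetical excerpts of two consecutive /metrics responses, 60 seconds apart.
SAMPLE_T0 = """\
# HELP etcd_debugging_mvcc_put_total Total number of puts seen by this member.
# TYPE etcd_debugging_mvcc_put_total counter
etcd_debugging_mvcc_put_total 1200
process_open_fds 42
"""
SAMPLE_T1 = SAMPLE_T0.replace("1200", "1500")


def prometheus_value(text, metric):
    """Return the value of an unlabelled metric line (rough analogue of PROMETHEUS_PATTERN)."""
    pattern = re.compile(
        r"^" + re.escape(metric) + r"[ \t]+(\S+)[ \t]*$", re.MULTILINE
    )
    match = pattern.search(text)
    if match is None:
        raise KeyError(metric)
    return float(match.group(1))


def change_per_second(prev_value, prev_ts, value, ts):
    """Compute (value - prev_value) / (ts - prev_ts).

    Returns None when the counter went backwards (for example after a
    restart), mirroring the documented behaviour of discarding that sample.
    """
    if ts <= prev_ts:
        raise ValueError("timestamps must increase")
    if value < prev_value:
        return None
    return (value - prev_value) / (ts - prev_ts)


def uptime_seconds(start_time, now):
    """Same arithmetic as the uptime JavaScript step: floor(now) - start time."""
    return math.floor(now) - start_time
```

With the sample data, the counter grows from 1200 to 1500 over 60 seconds, so `change_per_second` yields a rate of 5 puts per second.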

Triggers

Name	Description	Expression	Severity	Dependencies and additional info
Etcd: Service is unavailable	-	`{TEMPLATE_NAME:net.tcp.service["{$ETCD.SCHEME}","{HOST.CONN}","{$ETCD.PORT}"].last()}=0`	AVERAGE	Manual close: YES
Etcd: Node healthcheck failed	https://etcd.io/docs/v3.4.0/op-guide/monitoring/#health-check	`{TEMPLATE_NAME:etcd.health.last()}=0`	AVERAGE	Depends on: - Etcd: Service is unavailable
Etcd: Failed to fetch info data (or no data for 30m)	Zabbix has not received data for items for the last 30 minutes	`{TEMPLATE_NAME:etcd.is.leader.nodata(30m)}=1`	WARNING	Manual close: YES Depends on: - Etcd: Service is unavailable
Etcd: Member has no leader	If a member does not have a leader, it is totally unavailable.	`{TEMPLATE_NAME:etcd.has.leader.last()}=0`	AVERAGE
Etcd: Instance has seen too many leader changes (over {$ETCD.LEADER.CHANGES.MAX.WARN} for 15m)	Rapid leadership changes impact the performance of etcd significantly. It also signals that the leader is unstable, perhaps due to network connectivity issues or excessive load hitting the etcd cluster.	`{TEMPLATE_NAME:etcd.leader.changes.delta(15m)}>{$ETCD.LEADER.CHANGES.MAX.WARN}`	WARNING
Etcd: Too many proposal failures (over {$ETCD.PROPOSAL.FAIL.MAX.WARN} for 5m)	Normally related to two issues: temporary failures related to a leader election or longer downtime caused by a loss of quorum in the cluster.	`{TEMPLATE_NAME:etcd.proposals.failed.rate.min(5m)}>{$ETCD.PROPOSAL.FAIL.MAX.WARN}`	WARNING
Etcd: Too many proposals are queued to commit (over {$ETCD.PROPOSAL.PENDING.MAX.WARN} for 5m)	Rising pending proposals suggest that there is a high client load or that the member cannot commit proposals.	`{TEMPLATE_NAME:etcd.proposals.pending.min(5m)}>{$ETCD.PROPOSAL.PENDING.MAX.WARN}`	WARNING
Etcd: Too many HTTP requests failures (over {$ETCD.HTTP.FAIL.MAX.WARN} for 5m)	Too many requests failed on the etcd instance with a 5xx HTTP code.	`{TEMPLATE_NAME:etcd.http.requests.5xx.rate.min(5m)}>{$ETCD.HTTP.FAIL.MAX.WARN}`	WARNING
Etcd: Server version has changed (new version: {ITEM.VALUE})	Etcd server version has changed. Acknowledge to close manually.	`{TEMPLATE_NAME:etcd.server.version.diff()}=1 and {TEMPLATE_NAME:etcd.server.version.strlen()}>0`	INFO	Manual close: YES
Etcd: Cluster version has changed (new version: {ITEM.VALUE})	Etcd cluster version has changed. Acknowledge to close manually.	`{TEMPLATE_NAME:etcd.cluster.version.diff()}=1 and {TEMPLATE_NAME:etcd.cluster.version.strlen()}>0`	INFO	Manual close: YES
Etcd: has been restarted (uptime < 10m)	Uptime is less than 10 minutes.	`{TEMPLATE_NAME:etcd.uptime.last()}<10m`	INFO	Manual close: YES
Etcd: Current number of open files is too high (over {$ETCD.OPEN.FDS.MAX.WARN}% for 5m)	Heavy file descriptor usage (i.e., near the process's file descriptor limit) indicates a potential file descriptor exhaustion issue. If the file descriptors are exhausted, etcd may panic because it cannot create new WAL files.	`{TEMPLATE_NAME:etcd.open.fds.min(5m)}/{TEMPLATE_NAME:etcd.max.fds.last()}*100>{$ETCD.OPEN.FDS.MAX.WARN}`	WARNING
Etcd: Too many failed gRPC requests with code: {#GRPC.CODE} (over {$ETCD.GRPC.ERRORS.MAX.WARN} in 5m)	-	`{TEMPLATE_NAME:etcd.grpc.handled.rate[{#GRPC.CODE}].min(5m)}>{$ETCD.GRPC.ERRORS.MAX.WARN}`	WARNING
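
The trigger expressions above combine a history function (`last()`, `min(5m)`, `delta(15m)`) with a macro threshold. The Python sketch below illustrates how three of these expressions evaluate over a window of collected values. It is an assumption-laden illustration, not Zabbix code: the function names and the threshold values used in the usage note are hypothetical, and the template's real defaults are listed in the macros section.

```python
def min_over_threshold(values, threshold):
    """min(window) > threshold: fires only if every value in the window exceeds it."""
    return min(values) > threshold


def leader_changes_trigger(values_15m, max_changes):
    """delta(15m) > threshold, where delta is max minus min within the window."""
    return max(values_15m) - min(values_15m) > max_changes


def open_fds_trigger(open_fds_5m, max_fds_last, warn_percent):
    """min(open fds over 5m) / last(max fds) * 100 > threshold."""
    return min(open_fds_5m) / max_fds_last * 100 > warn_percent
```

For example, with a hypothetical 90% threshold, `open_fds_trigger([950, 920, 980], 1000, 90)` fires because even the lowest sample (92%) is above the limit, while a single dip to 800 open descriptors within the window would keep the trigger in the OK state. Using `min()` over a window in this way avoids alerting on short spikes.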

Feedback

Please report any issues with the template at https://support.zabbix.com

URL: https://www.zabbix.com/integrations/etcd

etcd monitoring and integration with Zabbix

Zabbix + etcd

etcd

Available solutions

Etcd by HTTP

Overview

Requirements

Tested versions

Configuration

Setup

Macros used

Items

Triggers

LLD rule gRPC codes discovery

Item prototypes for gRPC codes discovery

Trigger prototypes for gRPC codes discovery

LLD rule Peers discovery

Item prototypes for Peers discovery

Feedback
